bioimage_py
Efficient, parallel, and distributed implementation of image analysis and segmentation functionality for biomedical imaging.
Reimplements functionality from elf and cluster_tools in a more efficient and scalable manner.
Note: this package is in an early state and mainly provides support for data conversion, downsampling, and some initial segmentation functionality (connected components and watershed). The functionality will be extended soon; the implementation of seamlessly switching between local and distributed execution (via slurm) is already in place. Any feedback on issues you find or on how to improve usability is welcome!
Installation
This package can be installed via pip:
pip install bioimage-py
or via conda:
conda install -c conda-forge bioimage-py
You can also install it from source by cloning the repository and then running
python -m pip install -e .
This pulls in the core dependencies (numpy, pandas, scikit-image, cloudpickle, tqdm, threadpoolctl and
bioimage-cpp), which are enough for in-memory (numpy) workflows and the local execution backend.
Optional dependencies
File-backed and distributed I/O, and the individual file-format backends, are optional extras. Install
the ones you need, e.g. python -m pip install -e ".[io]" or combine several
(python -m pip install -e ".[io,nifti]"):
| Extra | Pulls in | Enables |
|---|---|---|
| io | zarr>=3, z5py | Chunked zarr / n5 arrays; required for file-backed and distributed (subprocess/slurm) runs. |
| hdf5 | h5py | HDF5 input (read). HDF5 is rejected as a distributed output. |
| mrc | mrcfile | MRC / REC volumes (read-only). |
| nifti | nibabel | NIfTI volumes (read-only). |
| imagestack | imageio, tifffile | TIFF files and folders of image slices. |
| msr | msr-reader | MSR / OBF microscopy files (read-only). |
| cloudvolume | cloud-volume | CloudVolume (precomputed) layers; writable, Linux only. |
| webknossos | webknossos | WebKnossos layers; read-only, remote or local. |
| io-all | all of the above | Every supported I/O backend in one go. |
| test | pytest, zarr>=3, scikit-image, scipy, openpyxl | Running the test suite. |
| dev | flake8, pyflakes | Linting. |
Distributed (subprocess / slurm) execution always requires a file-backed output, so install at
least the io extra for those workflows.
Usage
Operations run block-wise and share a common interface: pass block_shape and num_workers for
parallel local execution, or job_type="slurm" to run distributed (one task per
block). For distributed runs the output must be a file-backed (zarr/n5) array.
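To make the block-wise model concrete, here is a minimal sketch of how a volume shape can be split into blocks and handed to a thread pool. It does not use bioimage_py: `iter_blocks`, `process_blockwise` and `block_size` are hypothetical names, and the C-order block numbering is an assumption, not necessarily the numbering the package uses.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def iter_blocks(shape, block_shape):
    """Yield (block_id, slices) covering `shape` in C order (hypothetical helper)."""
    # Number of blocks per axis, rounded up so partial border blocks are included.
    grid = [-(-s // b) for s, b in zip(shape, block_shape)]
    for block_id, index in enumerate(product(*(range(g) for g in grid))):
        slices = tuple(
            slice(i * b, min((i + 1) * b, s))
            for i, b, s in zip(index, block_shape, shape)
        )
        yield block_id, slices


def process_blockwise(shape, block_shape, func, num_workers=1):
    """Apply `func(slices)` to every block in a thread pool; return results by block id."""
    blocks = list(iter_blocks(shape, block_shape))
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda item: func(item[1]), blocks)
        return {block_id: res for (block_id, _), res in zip(blocks, results)}


def block_size(slices):
    """Number of voxels inside one block."""
    size = 1
    for sl in slices:
        size *= sl.stop - sl.start
    return size


sizes = process_blockwise((100, 128, 128), (64, 64, 64), block_size, num_workers=4)
print(len(sizes))           # number of blocks
print(sum(sizes.values()))  # total number of voxels covered
```

Because each block only touches its own slice of the output, blocks can be processed in any order and by any worker, which is what makes the same call shape usable for threads and for one-task-per-block distributed runs.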
copy — block-wise copy of one source into another
Useful for converting between storage formats (e.g. a tiff stack to zarr) or for persisting an on-the-fly wrapper transformation to file.
import zarr
import bioimage_py as bp
# Convert a tiff stack (single multi-page file, or a folder of slices via bp.open_source(folder, "*.tif"))
# to a chunked zarr array.
src = bp.open_source("stack.tif")
out = zarr.open_array("out.zarr", mode="w", shape=src.shape, dtype=src.dtype, chunks=(64, 64, 64))
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=8)
# Persist a wrapper (here a threshold) to file instead of recomputing it on every read.
from bioimage_py.wrapper import ThresholdSource
mask = zarr.open_array("mask.zarr", mode="w", shape=src.shape, dtype="bool", chunks=(64, 64, 64))
bp.copy(ThresholdSource(src, 128), mask, block_shape=(64, 64, 64), num_workers=8)
# Distributed: output must be file-backed (zarr/n5).
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=8, job_type="slurm")
If output is omitted, a numpy array is allocated and returned (local execution only).
downsample — block-wise downsampling by an integer factor
Defaults are label-safe (order=0 nearest, no anti-aliasing). For intensity/image data pass
order=1 (or higher) and anti_aliasing=True for a smooth, alias-free result.
import zarr
import bioimage_py as bp
# Image data: smooth, anti-aliased 2x downsample into a new zarr array.
raw = zarr.open_array("raw.zarr", mode="r")
# Ceil division, so odd-sized axes match the ceil-mode downscaled shape.
target = tuple(-(-s // 2) for s in raw.shape)
out = zarr.open_array("raw_s1.zarr", mode="w", shape=target, dtype=raw.dtype, chunks=(64, 64, 64))
bp.downsample(raw, 2, out, order=1, anti_aliasing=True, block_shape=(64, 64, 64), num_workers=8)
# Label data: keep the defaults so no label ids are invented. Returns a numpy array when no output given.
seg = zarr.open_array("seg.zarr", mode="r")
small = bp.downsample(seg, 2)
# Anisotropic factor (downsample y/x only): bp.downsample(raw, (1, 2, 2), out, ...)
The downscaled shape is computed with bioimage_py.util.downscale_shape (ceil mode); under the hood
downsample wraps the input in a bioimage_py.wrapper.ResizedSource and copies it block-wise.
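As an illustration of what "ceil mode" means for the output shape, the following sketch computes a downscaled shape with integer ceil division. `downscale_shape_sketch` is a hypothetical stand-in written for this example; it assumes that ceil mode means `ceil(size / factor)` per axis and is not the package's own `downscale_shape`.

```python
def downscale_shape_sketch(shape, factors):
    """Ceil-mode downscaled shape: a partially covered last sample is kept."""
    if isinstance(factors, int):
        factors = (factors,) * len(shape)
    if len(factors) != len(shape):
        raise ValueError("one factor per axis is required")
    # -(-s // f) is integer ceil division.
    return tuple(-(-s // f) for s, f in zip(shape, factors))


print(downscale_shape_sketch((101, 64, 65), 2))          # (51, 32, 33)
print(downscale_shape_sketch((101, 64, 65), (1, 2, 2)))  # (101, 32, 33)
```

For odd-sized axes this differs from floor division (`s // 2`), so an output array allocated ahead of time should use the ceil-mode shape.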
Re-running failed blocks
A distributed run that loses some blocks (a transient node failure, an out-of-memory kill, a slurm
timeout) raises a RunnerError. Each worker persists progress per block, so the error reports the
precise failed_block_ids (only the blocks that did not complete, not the whole task) and, for
distributed backends, the preserved tmp_folder — the completed work is not thrown away.
import bioimage_py as bp
from bioimage_py.runner import RunnerError
try:
    bp.filters.gaussian_smoothing(raw, 2.0, output=out, block_shape=(64, 64, 64),
                                  num_workers=64, job_type="slurm")
except RunnerError as e:
    print(e.failed_block_ids)  # e.g. [128, 129, 511]
    print(e.tmp_folder)  # /shared/tmp/bioimage_py_xxxx (preserved for resume/debug)
Recommended — resume_from (distributed only). Re-issue the same call pointing at the
preserved temp folder: only the incomplete blocks are re-run, and the result is merged with the
blocks that already finished. This is correct for array-output ops (the missing blocks are written)
and return-value ops (stats.mean, morphology.morphology, …), which reduce over the full merged
set:
bp.filters.gaussian_smoothing(raw, 2.0, output=out, block_shape=(64, 64, 64),
num_workers=64, job_type="slurm", resume_from=e.tmp_folder)
resume_from resumes from the original run's serialized payload, so pass it to finish the same
call — the input/output/parameters on the resuming call are ignored in favour of the originals.
Simpler — block_ids (a fresh re-run of just those blocks). For array-output and other
per-block-independent ops you can re-run the reported blocks directly; this works on every backend,
including local:
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=8, job_type="slurm",
block_ids=e.failed_block_ids)
resume_from and block_ids are mutually exclusive. Two ops differ: segmentation.label has a
global cross-block merge, so a failed label is re-run whole (it accepts neither argument);
morphology.regionprops re-runs per object via item_ids / resume_from. A local run keeps no
temp folder, so re-run it (optionally with block_ids=e.failed_block_ids); resume_from is rejected
for job_type="local".
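The re-run pattern can be illustrated without a cluster. The sketch below simulates a run in which two blocks fail once, then re-runs only those blocks. `BlockRunError`, `run_blocks` and `flaky` are hypothetical stand-ins invented for this example; they only mimic the idea of an error that carries `failed_block_ids` and are not the package's runner.

```python
class BlockRunError(RuntimeError):
    """Stand-in for an error that reports which blocks failed (hypothetical)."""

    def __init__(self, failed_block_ids):
        super().__init__(f"{len(failed_block_ids)} block(s) failed")
        self.failed_block_ids = failed_block_ids


def run_blocks(func, num_blocks, done, block_ids=None):
    """Run `func` per block, record completed ids in `done`, raise if any block fails."""
    todo = range(num_blocks) if block_ids is None else block_ids
    failed = []
    for block_id in todo:
        try:
            func(block_id)
            done.add(block_id)
        except Exception:
            failed.append(block_id)
    if failed:
        raise BlockRunError(failed)


attempts = {}


def flaky(block_id):
    """Fail the first attempt on blocks 3 and 7 to mimic a transient node failure."""
    attempts[block_id] = attempts.get(block_id, 0) + 1
    if block_id in (3, 7) and attempts[block_id] == 1:
        raise RuntimeError("transient failure")


done = set()
try:
    run_blocks(flaky, 10, done)
except BlockRunError as e:
    print(e.failed_block_ids)
    # Second pass: only the failed blocks are re-run; completed work is kept.
    run_blocks(flaky, 10, done, block_ids=e.failed_block_ids)

print(sorted(done) == list(range(10)))
```

Note that the retry happens inside the `except` block: Python unbinds the exception variable when the block ends, so the failed ids (or the temp folder) need to be used or copied there.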
Slurm configuration
Slurm settings are cluster- and user-specific (partition, account, qos, node constraint, the shared
tmp_root, ...). Pass them per call as a SlurmConfig:
from bioimage_py import SlurmConfig
cfg = SlurmConfig(tmp_root="/scratch/shared/me", partition="gpu", account="myproj", time="01:00:00")
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=64, job_type="slurm", job_config=cfg)
To avoid repeating these every time, store them once as user defaults in
~/.config/bioimage-py/config.toml (honoring $XDG_CONFIG_HOME). Use the helper rather than
editing the file by hand — it validates field names and preserves the rest of the file:
from bioimage_py import write_slurm_config
write_slurm_config(tmp_root="/scratch/shared/me", partition="gpu", account="myproj")
These defaults are picked up automatically whenever a slurm run gets no explicit job_config
(e.g. bp.copy(..., job_type="slurm")). To combine the stored defaults with per-run tweaks, use
SlurmConfig.load(**overrides) (overrides win); a directly constructed SlurmConfig(...) is used
verbatim and does not read the file. Set BIOIMAGE_PY_NO_CONFIG=1 to ignore the file
(reproducible CI), or BIOIMAGE_PY_CONFIG=/path/to/config.toml to point at a specific file (e.g. a
shared cluster-wide config).
"""Efficient, parallel, and distributed implementation of image analysis and segmentation functionality for biomedical imaging.

Reimplements functionality from [elf](https://github.com/constantinpape/elf) and [cluster_tools](https://github.com/constantinpape/cluster_tools) in a more efficient and scalable manner.

**Note:** this package is in an early state and mainly provides support for data conversion, downsampling, and some initial segmentation functionality (connected components and watershed).
The functionality will be extended soon; the implementation of seamlessly switching between local and distributed execution (via slurm) is already in place. Any feedback on issues you find or on how to improve usability is welcome!

.. include:: ../docs/installation.md
.. include:: ../docs/usage.md
"""  # noqa
from . import evaluation, filters, io, morphology, operations, segmentation, stats  # noqa: F401
from .copy import copy
from .downsample import downsample
from .runner import SlurmConfig, config_file_path, get_runner, write_slurm_config
from .sources import as_source, open_cloudvolume, open_source, open_webknossos
from .util import to_roi
from .__version__ import __version__

__all__ = [
    "__version__",
    "stats",
    "filters",
    "segmentation",
    "morphology",
    "evaluation",
    "operations",
    "io",
    "copy",
    "downsample",
    "get_runner",
    "SlurmConfig",
    "config_file_path",
    "write_slurm_config",
    "as_source",
    "open_source",
    "open_cloudvolume",
    "open_webknossos",
    "to_roi",
]
def copy(
    input: SourceLike,
    output: Optional[SourceLike] = None,
    *,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
    """Copy a source into an output, block-wise.

    The data is read from ``input`` and written into ``output`` one block at a time. The input may
    be any source, including a read-only ``FileSource`` (e.g. a tiff stack) or a ``wrapper`` source
    whose transformation is computed on read; copying it materializes the transformed data to the
    output. The data is written into the output as-is, so the output array's dtype governs and a
    cast is applied on assignment when it differs from the input dtype.

    Args:
        input: The input data to copy (a numpy/zarr/n5 array or a `Source`).
        output: The output array to write into. Optional for local execution: a numpy array
            matching the input shape and dtype is allocated and returned if omitted; **required**
            for distributed execution, where it must be a writable, file-backed (zarr/n5) array.
        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
            for unchunked data.
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask; only voxels within the mask are copied (out-of-mask output
            voxels are left unchanged).
        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks
            into the existing ``output``). Mutually exclusive with ``resume_from``.
        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
            ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive
            with ``block_ids``.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    return _copy_source(input, output, block_shape=block_shape, job_type=job_type,
                        job_config=job_config, num_workers=num_workers, mask=mask, name="copy",
                        block_ids=block_ids, resume_from=resume_from)
Copy a source into an output, block-wise.
The data is read from input and written into output one block at a time. The input may
be any source, including a read-only FileSource (e.g. a tiff stack) or a wrapper source
whose transformation is computed on read; copying it materializes the transformed data to the
output. The data is written into the output as-is, so the output array's dtype governs and a
cast is applied on assignment when it differs from the input dtype.
Args:
input: The input data to copy (a numpy/zarr/n5 array or a Source).
output: The output array to write into. Optional for local execution — a numpy array
matching the input shape and dtype is allocated and returned if omitted; required
for distributed execution, where it must be a writable, file-backed (zarr/n5) array.
block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
for unchunked data.
job_type: Execution backend: one of "local", "subprocess" or "slurm".
job_config: Backend configuration (a RunnerConfig / SlurmConfig).
num_workers: Number of parallel workers (threads for local, tasks for distributed
backends).
mask: Optional binary mask; only voxels within the mask are copied (out-of-mask output
voxels are left unchanged).
block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks
into the existing output). Mutually exclusive with resume_from.
resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
runner.run); the missing blocks are written into output. Mutually exclusive
with block_ids.
Returns:
The output array (the provided output, or a newly allocated numpy array).
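The dtype and mask semantics described above can be pictured with plain numpy assignment. This is a sketch of the behaviour for a single in-memory block, assuming the cast follows numpy's assignment rules; it does not call `copy` itself, and a file-backed output may handle the cast differently.

```python
import numpy as np

src = np.array([[0.2, 1.7], [250.9, 3.5]], dtype="float32")
out = np.full(src.shape, 9, dtype="uint8")  # pre-filled so untouched voxels stay visible
mask = np.array([[True, False], [True, True]])

# Assignment casts to the output dtype (float32 -> uint8 truncates toward zero here),
# and only in-mask voxels are written.
out[mask] = src[mask]
print(out)
```

The out-of-mask voxel keeps its previous value, which is why a masked copy into a freshly created array leaves the fill value outside the mask.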
def downsample(
    input: SourceLike,
    scale_factor: ScaleFactor,
    output: Optional[SourceLike] = None,
    *,
    order: int = 0,
    anti_aliasing: bool = False,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
    """Downsample a source by an integer factor, block-wise.

    The input is wrapped in a :class:`~bioimage_py.wrapper.ResizedSource` at the downscaled shape
    (computed with :func:`~bioimage_py.util.downscale_shape`, ceil mode) and copied into the output.

    The defaults (``order=0`` nearest, ``anti_aliasing=False``) are label-safe: they preserve the
    input values and are appropriate for segmentations. For intensity / image data pass ``order=1``
    (or higher) and ``anti_aliasing=True`` for a smooth, alias-free downsample.

    Args:
        input: The input data to downsample (a numpy/zarr/n5 array or a `Source`). 2D or 3D.
        scale_factor: The downscaling factor: a single int (isotropic) or a per-axis sequence of
            ints. Each factor must be ``>= 1`` (1 leaves that axis unchanged).
        output: The output array to write into. Optional for local execution: a numpy array of the
            downscaled shape and the input dtype is allocated and returned if omitted; **required**
            for distributed execution, where it (and the input) must be file-backed (zarr/n5).
        order: The interpolation order (0 to 5). Use ``0`` (nearest) for label data.
        anti_aliasing: Whether to Gaussian pre-smooth before sampling to avoid aliasing.
            Recommended for image data; leave ``False`` for labels.
        block_shape: Shape of the processing blocks (in the downscaled output space). Defaults to
            the resized source's chunk shape; required for unchunked data.
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask in the downscaled output space; only voxels within the mask are
            written (out-of-mask output voxels are left unchanged).
        block_ids: Restrict processing to these block ids (in the downscaled output space), e.g. to
            re-run previously failed blocks into the existing ``output``. Mutually exclusive with
            ``resume_from``.
        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
            ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive
            with ``block_ids``.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    src = as_source(input)
    ndim = src.ndim
    if isinstance(scale_factor, Integral):
        factors: Tuple[int, ...] = (int(scale_factor),) * ndim
    else:
        factors = tuple(int(f) for f in scale_factor)
        if len(factors) != ndim:
            raise ValueError(
                f"scale_factor {scale_factor} does not match the input dimensionality {ndim}."
            )
    if any(f < 1 for f in factors):
        raise ValueError(
            f"downsample requires scale factors >= 1, got {factors}; "
            "use a ResizedSource directly to upsample."
        )

    target_shape = downscale_shape(src.shape, factors)
    wrapped = ResizedSource(src, target_shape, order=order, anti_aliasing=anti_aliasing)
    return _copy_source(wrapped, output, block_shape=block_shape, job_type=job_type,
                        job_config=job_config, num_workers=num_workers, mask=mask, name="downsample",
                        block_ids=block_ids, resume_from=resume_from)
Downsample a source by an integer factor, block-wise.
The input is wrapped in a bioimage_py.wrapper.ResizedSource at the downscaled shape
(computed with bioimage_py.util.downscale_shape, ceil mode) and copied into the output.
The defaults (order=0 nearest, anti_aliasing=False) are label-safe — they preserve the
input values and are appropriate for segmentations. For intensity / image data pass order=1
(or higher) and anti_aliasing=True for a smooth, alias-free downsample.
Args:
input: The input data to downsample (a numpy/zarr/n5 array or a Source). 2D or 3D.
scale_factor: The downscaling factor: a single int (isotropic) or a per-axis sequence of
ints. Each factor must be >= 1 (1 leaves that axis unchanged).
output: The output array to write into. Optional for local execution — a numpy array of the
downscaled shape and the input dtype is allocated and returned if omitted; required
for distributed execution, where it (and the input) must be file-backed (zarr/n5).
order: The interpolation order (0 to 5). Use 0 (nearest) for label data.
anti_aliasing: Whether to Gaussian pre-smooth before sampling to avoid aliasing.
Recommended for image data; leave False for labels.
block_shape: Shape of the processing blocks (in the downscaled output space). Defaults to
the resized source's chunk shape; required for unchunked data.
job_type: Execution backend: one of "local", "subprocess" or "slurm".
job_config: Backend configuration (a RunnerConfig / SlurmConfig).
num_workers: Number of parallel workers (threads for local, tasks for distributed
backends).
mask: Optional binary mask in the downscaled output space; only voxels within the mask are
written (out-of-mask output voxels are left unchanged).
block_ids: Restrict processing to these block ids (in the downscaled output space), e.g. to
re-run previously failed blocks into the existing output. Mutually exclusive with
resume_from.
resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
runner.run); the missing blocks are written into output. Mutually exclusive
with block_ids.
Returns:
The output array (the provided output, or a newly allocated numpy array).
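Why the nearest-neighbour default matters for labels can be seen on a toy 1D example. The sketch below uses plain Python lists rather than the package: subsampling stands in for `order=0`, and pairwise averaging stands in for a smoothing interpolation. Both are simplifications chosen for illustration.

```python
labels = [1, 1, 5, 5, 5, 9, 9, 9]

# Nearest-neighbour style downsampling by 2: keep every second sample.
nearest = labels[::2]

# Averaging neighbouring pairs, a simple stand-in for a smoothing interpolation.
averaged = [(a + b) / 2 for a, b in zip(labels[::2], labels[1::2])]

print(nearest)   # [1, 5, 5, 9]
print(averaged)  # [1.0, 5.0, 7.0, 9.0]
print(set(nearest) <= set(labels))   # True: only existing ids appear
print(set(averaged) <= set(labels))  # False: 7.0 is not a label id
```

The averaged result contains the value 7.0 at the boundary between objects 5 and 9, an id that exists nowhere in the input. That is the failure mode the label-safe defaults avoid.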
def get_runner(job_type: str, config: Optional[RunnerConfig] = None) -> Runner:
    """Return a runner for the given job type.

    Args:
        job_type: One of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        config: Optional runner configuration.

    Returns:
        A :class:`~bioimage_py.runner.base.Runner` instance.

    Raises:
        ValueError: If ``job_type`` is unknown.
    """
    try:
        cls = _RUNNERS[job_type.lower()]
    except KeyError:
        raise ValueError(f"Unknown job_type {job_type!r}; expected one of {sorted(_RUNNERS)}.")
    return cls(config)
Return a runner for the given job type.
Args:
job_type: One of "local", "subprocess" or "slurm".
config: Optional runner configuration.
Returns:
A bioimage_py.runner.base.Runner instance.
Raises:
ValueError: If job_type is unknown.
@dataclass
class SlurmConfig(RunnerConfig):
    """Configuration for the slurm runner.

    Inherits ``poll_interval``, ``tmp_root`` and ``python_executable`` from
    :class:`RunnerConfig`. For slurm, ``tmp_root`` is **required** and must point at a
    shared filesystem visible to all compute nodes (not node-local ``/tmp``), and
    ``num_workers`` (passed to the op / ``run``) is interpreted as the array throttle (the
    maximum number of tasks allowed to run concurrently), independently of how many tasks
    the work is partitioned into.

    Cluster-specific values (``partition``, ``account``, ``constraint``, ``tmp_root``, ...)
    can be stored once in a user config file and reused as defaults; see
    :meth:`load` and :func:`write_slurm_config`.

    Attributes:
        partition: The slurm partition to submit to.
        time: The per-task time limit (slurm time format, e.g. ``"01:00:00"``).
        mem: The per-task memory limit (e.g. ``"8G"``).
        cpus_per_task: Number of CPUs requested per task.
        gpus: Number of GPUs requested per task (emitted as ``--gpus`` only when > 0).
        account: The accounting project to charge.
        qos: The quality-of-service to request.
        constraint: A node feature constraint.
        shebang: Optional environment setup for the generated job script. If given, its
            first line must be an interpreter line (starting with ``#!``) which is placed at
            the top of the script; any remaining lines are emitted as an activation preamble
            *after* the ``#SBATCH`` directives (so the directives are still honoured). The
            preamble is for making the package importable on the node (e.g. ``module load``
            / ``LD_LIBRARY_PATH`` exports), not for choosing the interpreter: the worker is
            always launched with the absolute ``python_executable`` (defaulting to the
            submitting ``sys.executable``). ``None`` uses ``#!/bin/bash`` and that absolute
            interpreter, which needs no activation when the env lives on a shared
            filesystem. Example::

                shebang = "#!/bin/bash\\nmodule load gcc\\nexport LD_LIBRARY_PATH=...:$LD_LIBRARY_PATH"

        max_array_size: Override for the maximum number of array tasks per job. ``None``
            queries the cluster's ``MaxArraySize`` (falling back to a safe default). A run
            partitioned into more tasks than this is rejected up front with a clear error.
        latency_wait: Seconds to wait for a finished task's ``.success`` sentinel to become
            visible on a shared (NFS) filesystem before giving up on it. A task that the
            scheduler reports ``COMPLETED`` wrote its sentinel, but the orchestrating node's
            attribute cache can lag the compute node by up to the mount's ``acdirmax``
            (typically 60 s); this must comfortably exceed that. It only bounds the wait on a
            ``COMPLETED``-but-not-yet-visible task: a task is resolved the moment its
            sentinel appears, so a generous value does not slow down successful runs.
    """

    partition: Optional[str] = None
    time: Optional[str] = None
    mem: Optional[str] = None
    cpus_per_task: int = 1
    gpus: int = 0
    account: Optional[str] = None
    qos: Optional[str] = None
    constraint: Optional[str] = None
    shebang: Optional[str] = None
    max_array_size: Optional[int] = None
    latency_wait: float = 120.0

    @classmethod
    def load(cls, path: Optional[str] = None, **overrides: Any) -> "SlurmConfig":
        """Build a config from the user config file, with explicit overrides taking precedence.

        Precedence is ``overrides`` > config file ``[slurm]`` section > dataclass defaults.
        This is the way to combine the stored user defaults with per-run tweaks; constructing
        ``SlurmConfig(...)`` directly does **not** consult the file (an explicitly built config
        is used verbatim).

        Args:
            path: Path to the config file. ``None`` resolves the default location (see
                :func:`config_file_path`). A missing file is treated as empty.
            **overrides: Field values that override the file defaults. Each name must be a
                valid ``SlurmConfig`` field.

        Returns:
            A :class:`SlurmConfig` with file defaults filled in and overrides applied.

        Raises:
            ValueError: If the file or ``overrides`` contain an unknown field name.
        """
        _validate_keys(overrides, "load() overrides")
        merged: Dict[str, Any] = dict(_read_slurm_defaults(path))
        merged.update(overrides)
        return cls(**merged)
Configuration for the slurm runner.
Inherits poll_interval, tmp_root and python_executable from
RunnerConfig. For slurm, tmp_root is required and must point at a
shared filesystem visible to all compute nodes (not node-local /tmp), and
num_workers (passed to the op / run) is interpreted as the array throttle — the
maximum number of tasks allowed to run concurrently — independently of how many tasks
the work is partitioned into.
Cluster-specific values (partition, account, constraint, tmp_root, ...)
can be stored once in a user config file and reused as defaults; see
load() and write_slurm_config().
Attributes:
partition: The slurm partition to submit to.
time: The per-task time limit (slurm time format, e.g. "01:00:00").
mem: The per-task memory limit (e.g. "8G").
cpus_per_task: Number of CPUs requested per task.
gpus: Number of GPUs requested per task (emitted as --gpus only when > 0).
account: The accounting project to charge.
qos: The quality-of-service to request.
constraint: A node feature constraint.
shebang: Optional environment setup for the generated job script. If given, its
first line must be an interpreter line (starting with #!) which is placed at
the top of the script; any remaining lines are emitted as an activation preamble
after the #SBATCH directives (so the directives are still honoured). The
preamble is for making the package importable on the node (e.g. module load
/ LD_LIBRARY_PATH exports), not for choosing the interpreter: the worker is
always launched with the absolute python_executable (defaulting to the
submitting sys.executable). None uses #!/bin/bash and that absolute
interpreter, which needs no activation when the env lives on a shared
filesystem. Example:
shebang = "#!/bin/bash\nmodule load gcc\nexport LD_LIBRARY_PATH=...:$LD_LIBRARY_PATH"
max_array_size: Override for the maximum number of array tasks per job. None
queries the cluster's MaxArraySize (falling back to a safe default). A run
partitioned into more tasks than this is rejected up front with a clear error.
latency_wait: Seconds to wait for a finished task's .success sentinel to become
visible on a shared (NFS) filesystem before giving up on it. A task that the
scheduler reports COMPLETED wrote its sentinel, but the orchestrating node's
attribute cache can lag the compute node by up to the mount's acdirmax
(typically 60 s); this must comfortably exceed that. It only bounds the wait on a
COMPLETED-but-not-yet-visible task: a task is resolved the moment its
sentinel appears, so a generous value does not slow down successful runs.
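The script layout described for shebang (interpreter line first, then the #SBATCH directives, then the preamble) and the array throttle can be sketched as follows. `render_job_script` is a hypothetical helper written for this example, and the worker command and paths are placeholders; the generated text is an assumption about the general shape of such a script, not the package's actual output. The `--array=0-N%limit` syntax itself is standard sbatch.

```python
def render_job_script(shebang, directives, n_tasks, throttle, command):
    """Assemble a slurm array job script (hypothetical helper, not the library's code)."""
    lines = (shebang or "#!/bin/bash").split("\n")
    interpreter, preamble = lines[0], lines[1:]
    if not interpreter.startswith("#!"):
        raise ValueError("the first shebang line must start with '#!'")
    script = [interpreter]
    # Directives must precede the first executable line, or sbatch ignores them.
    script += [f"#SBATCH --{key}={value}" for key, value in directives.items()]
    # '%throttle' caps how many array tasks run at the same time.
    script.append(f"#SBATCH --array=0-{n_tasks - 1}%{throttle}")
    script += preamble
    script.append(command)
    return "\n".join(script) + "\n"


script = render_job_script(
    shebang="#!/bin/bash\nmodule load gcc",
    directives={"partition": "gpu", "time": "01:00:00", "mem": "8G"},
    n_tasks=100,
    throttle=8,
    command="/abs/path/to/python worker.py $SLURM_ARRAY_TASK_ID",  # placeholder command
)
print(script)
```

Placing the preamble after the directives matters because sbatch stops reading #SBATCH lines at the first executable command; a `module load` placed above them would silently disable the resource requests.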
    @classmethod
    def load(cls, path: Optional[str] = None, **overrides: Any) -> "SlurmConfig":
        """Build a config from the user config file, with explicit overrides taking precedence.

        Precedence is ``overrides`` > config file ``[slurm]`` section > dataclass defaults.
        This is the way to combine the stored user defaults with per-run tweaks; constructing
        ``SlurmConfig(...)`` directly does **not** consult the file (an explicitly built config
        is used verbatim).

        Args:
            path: Path to the config file. ``None`` resolves the default location (see
                :func:`config_file_path`). A missing file is treated as empty.
            **overrides: Field values that override the file defaults. Each name must be a
                valid ``SlurmConfig`` field.

        Returns:
            A :class:`SlurmConfig` with file defaults filled in and overrides applied.

        Raises:
            ValueError: If the file or ``overrides`` contain an unknown field name.
        """
        _validate_keys(overrides, "load() overrides")
        merged: Dict[str, Any] = dict(_read_slurm_defaults(path))
        merged.update(overrides)
        return cls(**merged)
Build a config from the user config file, with explicit overrides taking precedence.
Precedence is overrides > config file [slurm] section > dataclass defaults.
This is the way to combine the stored user defaults with per-run tweaks; constructing
SlurmConfig(...) directly does not consult the file (an explicitly built config
is used verbatim).
Args:
path: Path to the config file. None resolves the default location (see
config_file_path()). A missing file is treated as empty.
**overrides: Field values that override the file defaults. Each name must be a
valid SlurmConfig field.
Returns:
A SlurmConfig with file defaults filled in and overrides applied.
Raises:
ValueError: If the file or overrides contain an unknown field name.
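The precedence rule (overrides, then the file's [slurm] section, then dataclass defaults) can be illustrated with a small stand-in. `ConfigSketch` and `load_sketch` are hypothetical names that carry only three of the documented fields, and the file section is passed in as a plain dict instead of being read from disk.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ConfigSketch:
    """Tiny stand-in with a few of the documented fields (hypothetical class)."""
    partition: Optional[str] = None
    time: Optional[str] = None
    cpus_per_task: int = 1


def load_sketch(file_section, **overrides):
    """Merge file defaults and overrides; unknown names are rejected."""
    valid = {f.name for f in fields(ConfigSketch)}
    for source in (file_section, overrides):
        unknown = set(source) - valid
        if unknown:
            raise ValueError(f"unknown field(s): {sorted(unknown)}")
    merged = dict(file_section)
    merged.update(overrides)  # overrides win over the file
    return ConfigSketch(**merged)


file_section = {"partition": "gpu", "time": "01:00:00"}  # as if read from [slurm]
cfg = load_sketch(file_section, time="04:00:00")
print(cfg)
```

Here `partition` comes from the file, `time` from the override, and `cpus_per_task` from the dataclass default, one value from each level of the precedence chain.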
def config_file_path(path: Optional[str] = None) -> Path:
    """Resolve the path to the user config file.

    Resolution order: an explicit ``path`` argument, then the ``BIOIMAGE_PY_CONFIG``
    environment variable, then ``$XDG_CONFIG_HOME/bioimage-py/config.toml`` (falling back to
    ``~/.config/bioimage-py/config.toml``).

    Args:
        path: An explicit path that short-circuits the resolution. ``None`` resolves the
            default location.

    Returns:
        The resolved path (not guaranteed to exist).
    """
    if path is not None:
        return Path(path).expanduser()
    env = os.environ.get("BIOIMAGE_PY_CONFIG")
    if env:
        return Path(env).expanduser()
    base = os.environ.get("XDG_CONFIG_HOME") or os.path.join(os.path.expanduser("~"), ".config")
    return Path(base) / "bioimage-py" / "config.toml"
Resolve the path to the user config file.
Resolution order: an explicit path argument, then the BIOIMAGE_PY_CONFIG
environment variable, then $XDG_CONFIG_HOME/bioimage-py/config.toml (falling back to
~/.config/bioimage-py/config.toml).
Args:
path: An explicit path that short-circuits the resolution. None resolves the
default location.
Returns: The resolved path (not guaranteed to exist).
def write_slurm_config(path: Optional[str] = None, *, replace: bool = False, **fields: Any) -> str:
    """Create or update the user config file with default slurm settings.

    This is the supported way to set up cluster-specific defaults (partition, account,
    constraint, ``tmp_root``, ...) instead of editing the file by hand. Provided fields are
    merged into the existing ``[slurm]`` table by default (so the file can be built up over
    several calls); ``None`` values are skipped, and any other top-level tables in the file
    (reserved for future named profiles) are preserved.

    Args:
        path: Path to write to. ``None`` resolves the default location (see
            :func:`config_file_path`); the parent directory is created if needed.
        replace: If ``True``, replace the whole ``[slurm]`` table instead of merging into it.
        **fields: Default field values to store. Each name must be a valid ``SlurmConfig``
            field.

    Returns:
        The path that was written.

    Raises:
        ValueError: If ``fields`` contains an unknown field name.
    """
    _validate_keys(fields, "write_slurm_config()")
    provided = {k: v for k, v in fields.items() if v is not None}
    fp = config_file_path(path)
    data = _parse_toml(fp)
    section = {} if replace else dict(data.get("slurm", {}))
    section.update(provided)
    data["slurm"] = section

    import tomli_w  # local import: only the writer needs the (optional-at-runtime) dependency.

    fp.parent.mkdir(parents=True, exist_ok=True)
    with open(fp, "wb") as f:
        tomli_w.dump(data, f)
    return str(fp)
Create or update the user config file with default slurm settings.
This is the supported way to set up cluster-specific defaults (partition, account,
constraint, tmp_root, ...) instead of editing the file by hand. Provided fields are
merged into the existing [slurm] table by default (so the file can be built up over
several calls); None values are skipped, and any other top-level tables in the file
(reserved for future named profiles) are preserved.
Args:
path: Path to write to. None resolves the default location (see
config_file_path()); the parent directory is created if needed.
replace: If True, replace the whole [slurm] table instead of merging into it.
**fields: Default field values to store. Each name must be a valid SlurmConfig
field.
Returns: The path that was written.
Raises:
ValueError: If fields contains an unknown field name.
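The merge and replace semantics are easiest to see on plain dictionaries. The sketch below reproduces only the table-update step, without any file I/O. `merge_slurm_fields`, `ALLOWED_FIELDS`, and the `profiles` table are hypothetical names for illustration, and the allowed field set is an assumption based on the examples named in the docstring, not the real `SlurmConfig` definition. Unlike the library function, this sketch returns a copy instead of updating the parsed data in place.

```python
from typing import Any, Dict

# Assumption: a small subset of SlurmConfig field names, taken from the docstring examples.
ALLOWED_FIELDS = {"partition", "account", "constraint", "tmp_root"}


def merge_slurm_fields(data: Dict[str, Any], *, replace: bool = False, **fields: Any) -> Dict[str, Any]:
    """Return a copy of ``data`` with ``fields`` applied to its ``slurm`` table."""
    unknown = sorted(set(fields) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"Unknown slurm config field(s): {unknown}")
    # None means "not provided", so it never overwrites a stored value.
    provided = {k: v for k, v in fields.items() if v is not None}
    # Start from the existing table unless the caller asked for a clean replacement.
    section = {} if replace else dict(data.get("slurm", {}))
    section.update(provided)
    result = dict(data)  # other top-level tables are carried over untouched
    result["slurm"] = section
    return result


existing = {"slurm": {"partition": "gpu"}, "profiles": {"name": "default"}}
merged = merge_slurm_fields(existing, account="lab", constraint=None)
replaced = merge_slurm_fields(existing, replace=True, account="lab")
```

Here `merged` keeps the stored `partition` and gains `account`, while `replaced` contains only `account` in its `slurm` table. Both keep the unrelated `profiles` table.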
```python
def as_source(obj: "SourceLike") -> Source:
    """Convert a supported object into a :class:`Source`.

    Idempotent on :class:`Source` inputs. numpy / zarr / z5py arrays are wrapped in an
    :class:`ArraySource`. Bare paths are intentionally not supported (see the design doc).

    Args:
        obj: The object to convert.

    Returns:
        A :class:`Source`.

    Raises:
        TypeError: If the object cannot be converted (e.g. a string path).
    """
    if isinstance(obj, Source):
        return obj
    if isinstance(obj, (str, bytes)):
        raise TypeError(
            "Passing strings / file paths as a source is not supported. Open the array "
            "yourself (e.g. with zarr or z5py) and pass the handle."
        )
    for predicate, converter in _CONVERTERS:
        if predicate(obj):
            return converter(obj)
    # numpy and any array-like with shape/dtype fall back to ArraySource.
    if isinstance(obj, np.ndarray) or (hasattr(obj, "shape") and hasattr(obj, "dtype")):
        return ArraySource(obj)
    raise TypeError(f"Cannot convert object of type {type(obj)!r} to a Source.")
```
Convert a supported object into a Source.
Idempotent on Source inputs. numpy / zarr / z5py arrays are wrapped in an
ArraySource. Bare paths are intentionally not supported (see the design doc).
Args: obj: The object to convert.
Returns:
A Source.
Raises: TypeError: If the object cannot be converted (e.g. a string path).
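The dispatch order (pass-through, string rejection, registered converters, duck-typed fallback) can be sketched with stand-in classes. Everything below is illustrative: `Source`, `ArraySource`, `FakeArray`, `CONVERTERS`, and `to_source` are simplified stand-ins defined here, not the library's own classes.

```python
from typing import Any, Callable, List, Tuple


class Source:
    """Stand-in for the library's Source base class."""


class ArraySource(Source):
    """Stand-in wrapper around any object exposing ``shape`` and ``dtype``."""

    def __init__(self, array: Any) -> None:
        self.array = array


class FakeArray:
    """Hypothetical array-like used only for this demonstration."""

    shape = (4, 4)
    dtype = "uint8"


# (predicate, converter) pairs, checked before the duck-typed fallback.
CONVERTERS: List[Tuple[Callable[[Any], bool], Callable[[Any], Source]]] = []


def to_source(obj: Any) -> Source:
    if isinstance(obj, Source):
        return obj  # idempotent: an existing Source is returned unchanged
    if isinstance(obj, (str, bytes)):
        raise TypeError("String paths are not accepted; pass an opened array handle.")
    for predicate, converter in CONVERTERS:
        if predicate(obj):
            return converter(obj)
    # Anything that looks like an array is wrapped generically.
    if hasattr(obj, "shape") and hasattr(obj, "dtype"):
        return ArraySource(obj)
    raise TypeError(f"Cannot convert object of type {type(obj)!r} to a Source.")


wrapped = to_source(FakeArray())
same = to_source(wrapped)
```

Rejecting strings before the duck-typed check keeps a path from being silently treated as data. Callers are expected to open the container themselves and pass the handle.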
```python
def open_source(
    path: PathLike,
    internal_path: Optional[str] = None,
    format: Optional[str] = None,
    mode: str = "r",
    **kwargs: Any,
) -> FileSource:
    """Open a file-backed array as a :class:`Source`.

    The format is inferred from the path extension (overridable via ``format``). ``internal_path``
    selects the array inside a container; when omitted it defaults to the format's natural key
    (e.g. ``"data"`` for mrc/nifti, ``"mag1"`` for knossos, ``""`` for a single image stack), and is
    required for multi-array containers (hdf5/zarr/n5).

    Args:
        path: Path to the file or folder to open.
        internal_path: Key of the array inside the container; format-dependent default if omitted.
        format: Force a registered format name, overriding extension inference.
        mode: Open mode. ``"r"`` (default) is read-only; write modes (``"a"``/``"r+"``/``"w"``)
            are only honored for writable formats (zarr/n5/hdf5).
        kwargs: Extra keyword arguments forwarded to the backend constructor.

    Returns:
        A :class:`FileSource` with a reopenable ``kind="file"`` spec.
    """
    fmt = format if format is not None else infer_format(path)
    # Validate the format is installed up front (raises a clear error otherwise).
    constructor_for_format(fmt)

    handle = open_file(path, mode=mode, format=fmt, **kwargs)
    key = internal_path if internal_path is not None else getattr(handle, "default_key", None)
    dataset, recorded_key = _resolve_dataset(handle, key)

    writable = is_writable_format(fmt) and mode != "r"
    return FileSource(
        dataset,
        path=path,
        internal_path=recorded_key,
        format=fmt,
        mode=mode,
        open_kwargs=kwargs,
        writable=writable,
    )
```
Open a file-backed array as a Source.
The format is inferred from the path extension (overridable via format). internal_path
selects the array inside a container; when omitted it defaults to the format's natural key
(e.g. "data" for mrc/nifti, "mag1" for knossos, "" for a single image stack), and is
required for multi-array containers (hdf5/zarr/n5).
Args:
path: Path to the file or folder to open.
internal_path: Key of the array inside the container; format-dependent default if omitted.
format: Force a registered format name, overriding extension inference.
mode: Open mode. "r" (default) is read-only; write modes ("a"/"r+"/"w")
are only honored for writable formats (zarr/n5/hdf5).
kwargs: Extra keyword arguments forwarded to the backend constructor.
Returns:
A FileSource with a reopenable kind="file" spec.
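How `internal_path` and `mode` interact with the format can be summarized in a few lines. This is a minimal sketch built from the rules stated in the docstring. The helper `resolve_key_and_writable`, the three lookup tables, the exact format name strings (in particular `"imagestack"`), and the choice of `ValueError` for a missing key are all assumptions, not the library's implementation.

```python
from typing import Optional, Tuple

# Assumed format names; the groupings follow the docstring.
WRITABLE_FORMATS = {"zarr", "n5", "hdf5"}
MULTI_ARRAY_FORMATS = {"hdf5", "zarr", "n5"}
DEFAULT_KEYS = {"mrc": "data", "nifti": "data", "knossos": "mag1", "imagestack": ""}


def resolve_key_and_writable(
    fmt: str, mode: str = "r", internal_path: Optional[str] = None
) -> Tuple[str, bool]:
    """Hypothetical helper: pick the array key and decide whether writes are honored."""
    if internal_path is None:
        if fmt in MULTI_ARRAY_FORMATS:
            # Assumption: the library's actual exception type may differ.
            raise ValueError(f"internal_path is required for {fmt!r} containers")
        internal_path = DEFAULT_KEYS[fmt]
    # A write mode only takes effect when the format itself supports writing.
    writable = fmt in WRITABLE_FORMATS and mode != "r"
    return internal_path, writable


read_only_mrc = resolve_key_and_writable("mrc", mode="a")
writable_zarr = resolve_key_and_writable("zarr", mode="a", internal_path="raw")
```

In this sketch, asking for mode `"a"` on an MRC volume still yields a read-only result, which matches the statement that write modes are only honored for zarr, n5, and hdf5.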
```python
def open_cloudvolume(
    cloudpath: str,
    mip: int = 0,
    fill_missing: bool = False,
    bounded: bool = True,
    cache: bool = False,
    non_aligned_writes: bool = True,
    offset: Optional[Tuple[int, int, int]] = None,
    size: Optional[Tuple[int, int, int]] = None,
    **kwargs: Any,
) -> CloudVolumeSource:
    """Open a CloudVolume (precomputed) layer as a writable ZYX :class:`Source`.

    Args:
        cloudpath: The CloudVolume cloudpath (e.g. ``"precomputed://..."`` or ``"file://..."``).
        mip: The resolution (mip) level to open.
        fill_missing: Whether to zero-fill missing chunks instead of raising. For a *writable*
            output whose volume size is not a multiple of the chunk size, set this to ``True`` so
            the partial boundary chunks can be read-modify-written into a fresh layer.
        bounded: Whether reads/writes are restricted to the volume bounds.
        cache: Whether to enable CloudVolume's local cache.
        non_aligned_writes: Whether to allow writes that are not chunk-aligned (needed for the
            partial blocks at the volume boundary in block-wise writes).
        offset: Optional absolute XYZ origin of the view; defaults to the layer's ``voxel_offset``.
        size: Optional XYZ size of the view; defaults to the layer's ``volume_size``.
        kwargs: Extra keyword arguments forwarded to ``CloudVolume``.

    Returns:
        A :class:`CloudVolumeSource`.
    """
    from cloudvolume import CloudVolume

    open_params: Dict[str, Any] = dict(
        mip=mip,
        fill_missing=fill_missing,
        bounded=bounded,
        cache=cache,
        non_aligned_writes=non_aligned_writes,
    )
    open_params.update(kwargs)
    volume = CloudVolume(cloudpath, progress=False, **open_params)
    return CloudVolumeSource(volume, offset=offset, size=size, open_params=open_params)
```
Open a CloudVolume (precomputed) layer as a writable ZYX Source.
Args:
cloudpath: The CloudVolume cloudpath (e.g. "precomputed://..." or "file://...").
mip: The resolution (mip) level to open.
fill_missing: Whether to zero-fill missing chunks instead of raising. For a writable
output whose volume size is not a multiple of the chunk size, set this to True so
the partial boundary chunks can be read-modify-written into a fresh layer.
bounded: Whether reads/writes are restricted to the volume bounds.
cache: Whether to enable CloudVolume's local cache.
non_aligned_writes: Whether to allow writes that are not chunk-aligned (needed for the
partial blocks at the volume boundary in block-wise writes).
offset: Optional absolute XYZ origin of the view; defaults to the layer's voxel_offset.
size: Optional XYZ size of the view; defaults to the layer's volume_size.
kwargs: Extra keyword arguments forwarded to CloudVolume.
Returns:
A CloudVolumeSource.
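Whether a writable output needs `fill_missing=True` depends on whether the volume size divides evenly by the chunk size. The check below is a small sketch of that condition; `has_partial_boundary_chunks` is a hypothetical helper, and the sizes are made-up example values.

```python
from typing import Sequence


def has_partial_boundary_chunks(volume_size: Sequence[int], chunk_size: Sequence[int]) -> bool:
    """Return True if any axis leaves an incomplete chunk at the volume boundary."""
    return any(size % chunk != 0 for size, chunk in zip(volume_size, chunk_size))


# 1000 is not a multiple of 64, so the last chunk along X and Y is only partly covered.
needs_fill = has_partial_boundary_chunks((1000, 1000, 128), (64, 64, 64))
# Every axis divides evenly: no partial chunks at the boundary.
aligned = has_partial_boundary_chunks((128, 128, 64), (64, 64, 64))
```

When the check returns `True` for a fresh output layer, the docstring's guidance is to open it with `fill_missing=True` so the partial boundary chunks can be read, modified, and written back.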
```python
def open_webknossos(
    dataset_name_or_url: str,
    organization_id: Optional[str] = None,
    layer_name: str = "",
    mag: int = 1,
    offset: Optional[Tuple[int, int, int]] = None,
    size: Optional[Tuple[int, int, int]] = None,
) -> WebKnossosSource:
    """Open a (remote) WebKnossos layer as a read-only ZYX :class:`Source`.

    Args:
        dataset_name_or_url: The WebKnossos dataset name or URL (or an annotation URL).
        organization_id: The organization id (required when opening by dataset name).
        layer_name: The name of the layer to open.
        mag: The magnification (resolution) level.
        offset: Optional absolute XYZ origin of the view; defaults to the layer bbox ``topleft``.
        size: Optional XYZ size of the view; defaults to the layer bbox ``size``.

    Returns:
        A :class:`WebKnossosSource`.
    """
    return WebKnossosSource(
        dataset_name_or_url=dataset_name_or_url,
        organization_id=organization_id,
        layer_name=layer_name,
        mag=mag,
        offset=offset,
        size=size,
    )
```
Open a (remote) WebKnossos layer as a read-only ZYX Source.
Args:
dataset_name_or_url: The WebKnossos dataset name or URL (or an annotation URL).
organization_id: The organization id (required when opening by dataset name).
layer_name: The name of the layer to open.
mag: The magnification (resolution) level.
offset: Optional absolute XYZ origin of the view; defaults to the layer bbox topleft.
size: Optional XYZ size of the view; defaults to the layer bbox size.
Returns:
A WebKnossosSource.
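Note the mixed axis conventions: `offset` and `size` are given in XYZ order, while the resulting source is indexed in ZYX order. The sketch below only illustrates that reordering on a bounding box. `xyz_view_to_zyx_slices` is a hypothetical helper, and how `WebKnossosSource` applies the view internally is not shown in this reference.

```python
from typing import Tuple


def xyz_view_to_zyx_slices(
    offset: Tuple[int, int, int], size: Tuple[int, int, int]
) -> Tuple[slice, slice, slice]:
    """Turn an XYZ (offset, size) pair into the absolute bounding box as ZYX slices."""
    x, y, z = offset
    size_x, size_y, size_z = size
    # Reverse the axis order: the last XYZ axis becomes the first ZYX axis.
    return (slice(z, z + size_z), slice(y, y + size_y), slice(x, x + size_x))


bbox_zyx = xyz_view_to_zyx_slices(offset=(10, 20, 30), size=(100, 200, 50))
```

For this input the Z axis spans 30 to 80, Y spans 20 to 220, and X spans 10 to 110.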
```python
def to_roi(block: BlockDescriptor) -> Tuple[slice, ...]:
    """Convert a ``bioimage_cpp.utils`` ``Block`` into a tuple of slices.

    Args:
        block: A ``Block`` (carrying ``begin``/``end`` coordinate lists). For halo
            operations pass one of ``block.outer_block`` / ``block.inner_block`` /
            ``block.inner_block_local``.

    Returns:
        A tuple of slices that indexes a source or array.
    """
    return tuple(slice(int(b), int(e)) for b, e in zip(block.begin, block.end))
```
Convert a bioimage_cpp.utils Block into a tuple of slices.
Args:
block: A Block (carrying begin/end coordinate lists). For halo
operations pass one of block.outer_block / block.inner_block /
block.inner_block_local.
Returns: A tuple of slices that indexes a source or array.
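A short, self-contained illustration of the conversion follows. The `Block` dataclass here is a stand-in that models only the `begin` and `end` attributes; it is not the real `bioimage_cpp.utils` type, and the nested-list "image" is made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Stand-in for a bioimage_cpp.utils Block: only begin/end are modeled."""

    begin: List[int]
    end: List[int]


def to_roi(block: Block) -> Tuple[slice, ...]:
    # One half-open slice per axis, built from matching begin/end coordinates.
    return tuple(slice(int(b), int(e)) for b, e in zip(block.begin, block.end))


# A 4 x 6 "image" whose values encode their position as 10 * row + column.
image = [[10 * row + col for col in range(6)] for row in range(4)]

roi = to_roi(Block(begin=[1, 2], end=[3, 5]))
rows, cols = roi
patch = [row[cols] for row in image[rows]]
```

With a numpy, zarr, or similar array the tuple can be used directly as an index, for example `array[roi]`.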