bioimage_py

Efficient, parallel, and distributed implementation of image analysis and segmentation functionality for biomedical imaging.

Reimplements functionality from elf and cluster_tools in a more efficient and scalable manner.

Note: this package is in an early state and mainly provides support for data conversion, downsampling, and some initial segmentation functionality (connected components and watershed). The functionality will be extended soon; the implementation of seamlessly switching between local and distributed execution (via slurm) is already in place. Any feedback on issues you find or on how to improve usability is welcome!

Installation

This package can be installed via pip:

pip install bioimage-py

or via conda:

conda install -c conda-forge bioimage-py

You can also install it from source by cloning the repository and then running

python -m pip install -e .

This pulls the core dependencies (numpy, pandas, scikit-image, cloudpickle, tqdm, threadpoolctl and bioimage-cpp), which are enough for in-memory (numpy) workflows and the local execution backend.

Optional dependencies

File-backed and distributed I/O, and the individual file-format backends, are optional extras. Install the ones you need, e.g. python -m pip install -e ".[io]" or combine several (python -m pip install -e ".[io,nifti]"):

Extra	Pulls in	Enables
`io`	`zarr>=3`, `z5py`	Chunked zarr / n5 arrays — required for file-backed and distributed (`subprocess`/`slurm`) runs.
`hdf5`	`h5py`	HDF5 input (read). HDF5 is rejected as a distributed output.
`mrc`	`mrcfile`	MRC / REC volumes (read-only).
`nifti`	`nibabel`	NIfTI volumes (read-only).
`imagestack`	`imageio`, `tifffile`	TIFF files and folders of image slices.
`msr`	`msr-reader`	MSR / OBF microscopy files (read-only).
`cloudvolume`	`cloud-volume`	`CloudVolume` (precomputed) layers — writable, Linux only.
`webknossos`	`webknossos`	WebKnossos layers — read-only, remote or local.
`io-all`	all of the above	Every supported I/O backend in one go.
`test`	`pytest`, `zarr>=3`, `scikit-image`, `scipy`, `openpyxl`	Running the test suite.
`dev`	`flake8`, `pyflakes`	Linting.

Distributed (subprocess / slurm) execution always requires a file-backed output, so install at least the io extra for those workflows.

Usage

Operations run block-wise and share a common interface: pass block_shape and num_workers for parallel local execution, or job_type="slurm" to run distributed (one task per block). For distributed runs the output must be a file-backed (zarr/n5) array.
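To make the block-wise interface more concrete, here is a minimal, library-independent sketch of how a volume could be tiled into blocks of a given block_shape. The helper name iter_blocks and the C-order (row-major) block numbering are assumptions for illustration only; the text above does not say how bioimage_py numbers its blocks.

```python
import itertools
import math


def iter_blocks(shape, block_shape):
    """Yield (block_id, slices) for a regular block grid over `shape`.

    Hypothetical helper: block ids are assigned in C-order here, which is an
    assumption and not necessarily the numbering bioimage_py uses.
    """
    # Number of blocks per axis; the last block on an axis may be smaller.
    grid = [math.ceil(s / b) for s, b in zip(shape, block_shape)]
    for block_id, grid_pos in enumerate(itertools.product(*(range(g) for g in grid))):
        slices = tuple(
            slice(p * b, min((p + 1) * b, s))
            for p, b, s in zip(grid_pos, block_shape, shape)
        )
        yield block_id, slices


# A 100 x 100 image with 64 x 64 blocks gives a 2 x 2 grid of blocks.
blocks = dict(iter_blocks((100, 100), (64, 64)))
```

Each block can be processed independently of the others, which is what lets the same operation run on local threads or as separate distributed tasks.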

`copy` — block-wise copy of one source into another

Useful for converting between storage formats (e.g. a tiff stack to zarr) or for persisting an on-the-fly wrapper transformation to file.

import zarr
import bioimage_py as bp

# Convert a tiff stack (single multi-page file, or a folder of slices via bp.open_source(folder, "*.tif"))
# to a chunked zarr array.
src = bp.open_source("stack.tif")
out = zarr.open_array("out.zarr", mode="w", shape=src.shape, dtype=src.dtype, chunks=(64, 64, 64))
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=8)

# Persist a wrapper (here a threshold) to file instead of recomputing it on every read.
from bioimage_py.wrapper import ThresholdSource
mask = zarr.open_array("mask.zarr", mode="w", shape=src.shape, dtype="bool", chunks=(64, 64, 64))
bp.copy(ThresholdSource(src, 128), mask, block_shape=(64, 64, 64), num_workers=8)

# Distributed: output must be file-backed (zarr/n5).
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=8, job_type="slurm")

If output is omitted, a numpy array is allocated and returned (local execution only).

`downsample` — block-wise downsampling by an integer factor

Defaults are label-safe (order=0 nearest, no anti-aliasing). For intensity/image data pass order=1 (or higher) and anti_aliasing=True for a smooth, alias-free result.

import zarr
import bioimage_py as bp

# Image data: smooth, anti-aliased 2x downsample into a new zarr array.
raw = zarr.open_array("raw.zarr", mode="r")
target = tuple((s + 1) // 2 for s in raw.shape)  # ceil division, matching downscale_shape (ceil mode)
out = zarr.open_array("raw_s1.zarr", mode="w", shape=target, dtype=raw.dtype, chunks=(64, 64, 64))
bp.downsample(raw, 2, out, order=1, anti_aliasing=True, block_shape=(64, 64, 64), num_workers=8)

# Label data: keep the defaults so no label ids are invented. Returns a numpy array when no output given.
seg = zarr.open_array("seg.zarr", mode="r")
small = bp.downsample(seg, 2)

# Anisotropic factor (downsample y/x only): bp.downsample(raw, (1, 2, 2), out, ...)

The downscaled shape is computed with bioimage_py.util.downscale_shape (ceil mode); under the hood downsample wraps the input in a bioimage_py.wrapper.ResizedSource and copies it block-wise.
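The following sketch illustrates what "ceil mode" means for the output shape. The function ceil_downscale_shape is a hypothetical stand-in written for this example, based only on the description above; the actual bioimage_py.util.downscale_shape may differ in details such as validation.

```python
def ceil_downscale_shape(shape, factors):
    """Ceil-mode downscaled shape: each axis becomes ceil(size / factor)."""
    if len(shape) != len(factors):
        raise ValueError("shape and factors must have the same length")
    # -(-s // f) is integer ceil division and avoids floating point.
    return tuple(-(-s // f) for s, f in zip(shape, factors))


full = (101, 64, 65)
ceil_shape = ceil_downscale_shape(full, (2, 2, 2))
floor_shape = tuple(s // 2 for s in full)
aniso_shape = ceil_downscale_shape(full, (1, 2, 2))
```

For axes with an odd size, the ceil-mode shape is one voxel larger than the floor-division shape, so an output array allocated with floor division would likely be too small on those axes.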

Re-running failed blocks

A distributed run that loses some blocks (a transient node failure, an out-of-memory kill, a slurm timeout) raises a RunnerError. Each worker persists progress per block, so the error reports the precise failed_block_ids (only the blocks that did not complete, not the whole task) and, for distributed backends, the preserved tmp_folder — the completed work is not thrown away.

import bioimage_py as bp
from bioimage_py.runner import RunnerError

try:
    bp.filters.gaussian_smoothing(raw, 2.0, output=out, block_shape=(64, 64, 64),
                                  num_workers=64, job_type="slurm")
except RunnerError as e:
    err = e  # keep a reference: Python unbinds `e` when the except block ends
    print(err.failed_block_ids)  # e.g. [128, 129, 511]
    print(err.tmp_folder)        # /shared/tmp/bioimage_py_xxxx  (preserved for resume/debug)

Recommended: resume_from (distributed only). Re-issue the same call pointing at the preserved temp folder: only the incomplete blocks are re-run, and the result is merged with the blocks that already finished. This is correct both for array-output ops (the missing blocks are written) and for return-value ops (stats.mean, morphology.morphology, …), which reduce over the full merged set:

bp.filters.gaussian_smoothing(raw, 2.0, output=out, block_shape=(64, 64, 64),
                              num_workers=64, job_type="slurm", resume_from=err.tmp_folder)

resume_from resumes from the original run's serialized payload, so pass it to finish the same call; the input/output/parameters on the resuming call are ignored in favour of the originals.

Simpler: block_ids (a fresh re-run of just those blocks). For array-output and other per-block-independent ops you can re-run the reported blocks directly; this works on every backend, including local:

bp.copy(src, out, block_shape=(64, 64, 64), num_workers=8, job_type="slurm",
        block_ids=err.failed_block_ids)

resume_from and block_ids are mutually exclusive. Two ops differ: segmentation.label has a global cross-block merge, so a failed label is re-run whole (it accepts neither argument); morphology.regionprops re-runs per object via item_ids / resume_from. A local run keeps no temp folder, so re-run it (optionally with block_ids=e.failed_block_ids); resume_from is rejected for job_type="local".
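The block_ids approach can be wrapped in a small retry loop. The sketch below is self-contained and does not call bioimage_py: BlockRunError is a hypothetical stand-in for RunnerError, and run_with_retries and flaky_op are names invented for this example. It only applies to per-block-independent ops, as described above.

```python
class BlockRunError(Exception):
    """Hypothetical stand-in for bioimage_py.runner.RunnerError."""

    def __init__(self, failed_block_ids):
        super().__init__(f"{len(failed_block_ids)} block(s) failed")
        self.failed_block_ids = list(failed_block_ids)


def run_with_retries(op, max_retries=2):
    """Call op(block_ids=...) and re-run only the failed blocks on error."""
    block_ids = None  # None means "process all blocks"
    for attempt in range(max_retries + 1):
        try:
            return op(block_ids=block_ids)
        except BlockRunError as e:
            if attempt == max_retries:
                raise
            # Copy the ids now: `e` is unbound once the except block ends.
            block_ids = list(e.failed_block_ids)


# Simulated op: blocks 2 and 5 fail on the first attempt only.
done, calls = set(), []


def flaky_op(block_ids=None):
    ids = list(range(8)) if block_ids is None else list(block_ids)
    calls.append(ids)
    failed = [i for i in ids if i in (2, 5) and len(calls) == 1]
    done.update(i for i in ids if i not in failed)
    if failed:
        raise BlockRunError(failed)
    return sorted(done)


result = run_with_retries(flaky_op)
```

With the real library, op would be a closure around the original call (for example a bp.copy call with fixed arguments) that forwards block_ids.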

Slurm configuration

Slurm settings are cluster- and user-specific (partition, account, qos, node constraint, the shared tmp_root, ...). Pass them per call as a SlurmConfig:

import bioimage_py as bp
from bioimage_py import SlurmConfig

cfg = SlurmConfig(tmp_root="/scratch/shared/me", partition="gpu", account="myproj", time="01:00:00")
bp.copy(src, out, block_shape=(64, 64, 64), num_workers=64, job_type="slurm", job_config=cfg)

To avoid repeating these every time, store them once as user defaults in ~/.config/bioimage-py/config.toml (honoring $XDG_CONFIG_HOME). Use the helper rather than editing the file by hand — it validates field names and preserves the rest of the file:

from bioimage_py import write_slurm_config

write_slurm_config(tmp_root="/scratch/shared/me", partition="gpu", account="myproj")

These defaults are picked up automatically whenever a slurm run gets no explicit job_config (e.g. bp.copy(..., job_type="slurm")). To combine the stored defaults with per-run tweaks, use SlurmConfig.load(**overrides) (overrides win); a directly constructed SlurmConfig(...) is used verbatim and does not read the file. Set BIOIMAGE_PY_NO_CONFIG=1 to ignore the file (reproducible CI), or BIOIMAGE_PY_CONFIG=/path/to/config.toml to point at a specific file (e.g. a shared cluster-wide config).
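The precedence rule (overrides, then stored defaults, then dataclass defaults) can be illustrated with a reduced, self-contained model. MiniSlurmConfig is a hypothetical stand-in with only three fields, and the stored defaults are passed as a dict rather than read from a TOML file; it is a sketch of the merge logic, not the library's class.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class MiniSlurmConfig:
    """Hypothetical, reduced stand-in for SlurmConfig (three fields only)."""

    tmp_root: Optional[str] = None
    partition: Optional[str] = None
    time: Optional[str] = None

    @classmethod
    def load(cls, file_defaults: Dict[str, Any], **overrides: Any) -> "MiniSlurmConfig":
        valid = {f.name for f in fields(cls)}
        unknown = (set(file_defaults) | set(overrides)) - valid
        if unknown:
            raise ValueError(f"Unknown field name(s): {sorted(unknown)}")
        merged = dict(file_defaults)  # values stored in the config file
        merged.update(overrides)      # per-run overrides win
        return cls(**merged)          # unset fields keep the dataclass defaults


stored = {"tmp_root": "/scratch/shared/me", "partition": "gpu"}
cfg = MiniSlurmConfig.load(stored, partition="cpu", time="04:00:00")
verbatim = MiniSlurmConfig(partition="cpu")  # direct construction ignores `stored`
```

Here cfg takes tmp_root from the stored defaults and partition and time from the overrides, while verbatim leaves tmp_root unset because a directly constructed config never consults the stored values.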


"""Efficient, parallel, and distributed implementation of image analysis and segmentation functionality for biomedical imaging.

Reimplements functionality from [elf](https://github.com/constantinpape/elf) and [cluster_tools](https://github.com/constantinpape/cluster_tools) in a more efficient and scalable manner.

**Note:** this package is in an early state and mainly provides support for data conversion, downsampling, and some initial segmentation functionality (connected components and watershed).
The functionality will be extended soon; the implementation of seamlessly switching between local and distributed execution (via slurm) is already in place. Any feedback on issues you find or on how to improve usability is welcome!

.. include:: ../docs/installation.md
.. include:: ../docs/usage.md
"""  # noqa
from . import evaluation, filters, io, morphology, operations, segmentation, stats  # noqa: F401
from .copy import copy
from .downsample import downsample
from .runner import SlurmConfig, config_file_path, get_runner, write_slurm_config
from .sources import as_source, open_cloudvolume, open_source, open_webknossos
from .util import to_roi
from .__version__ import __version__

__all__ = [
    "__version__",
    "stats",
    "filters",
    "segmentation",
    "morphology",
    "evaluation",
    "operations",
    "io",
    "copy",
    "downsample",
    "get_runner",
    "SlurmConfig",
    "config_file_path",
    "write_slurm_config",
    "as_source",
    "open_source",
    "open_cloudvolume",
    "open_webknossos",
    "to_roi",
]

__version__ = '0.2.1'

def copy(input: 'SourceLike', output: 'Optional[SourceLike]' = None, *, block_shape: Optional[Tuple[int, ...]] = None, job_type: str = 'local', job_config: Optional[bioimage_py.runner.RunnerConfig] = None, num_workers: int = 1, mask: 'Optional[SourceLike]' = None, block_ids: Optional[Sequence[int]] = None, resume_from: Optional[str] = None) -> 'SourceLike':

def copy(
    input: SourceLike,
    output: Optional[SourceLike] = None,
    *,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
    """Copy a source into an output, block-wise.

    The data is read from ``input`` and written into ``output`` one block at a time. The input may
    be any source, including a read-only ``FileSource`` (e.g. a tiff stack) or a ``wrapper`` source
    whose transformation is computed on read; copying it materializes the transformed data to the
    output. The data is written into the output as-is, so the output array's dtype governs and a
    cast is applied on assignment when it differs from the input dtype.

    Args:
        input: The input data to copy (a numpy/zarr/n5 array or a `Source`).
        output: The output array to write into. Optional for local execution — a numpy array
            matching the input shape and dtype is allocated and returned if omitted; **required**
            for distributed execution, where it must be a writable, file-backed (zarr/n5) array.
        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
            for unchunked data.
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask; only voxels within the mask are copied (out-of-mask output
            voxels are left unchanged).
        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks
            into the existing ``output``). Mutually exclusive with ``resume_from``.
        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
            ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive
            with ``block_ids``.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    return _copy_source(input, output, block_shape=block_shape, job_type=job_type,
                        job_config=job_config, num_workers=num_workers, mask=mask, name="copy",
                        block_ids=block_ids, resume_from=resume_from)

Copy a source into an output, block-wise.

The data is read from input and written into output one block at a time. The input may be any source, including a read-only FileSource (e.g. a tiff stack) or a wrapper source whose transformation is computed on read; copying it materializes the transformed data to the output. The data is written into the output as-is, so the output array's dtype governs and a cast is applied on assignment when it differs from the input dtype.

Args:
    input: The input data to copy (a numpy/zarr/n5 array or a Source).
    output: The output array to write into. Optional for local execution: a numpy array matching the input shape and dtype is allocated and returned if omitted. Required for distributed execution, where it must be a writable, file-backed (zarr/n5) array.
    block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required for unchunked data.
    job_type: Execution backend: one of "local", "subprocess" or "slurm".
    job_config: Backend configuration (a RunnerConfig / SlurmConfig).
    num_workers: Number of parallel workers (threads for local, tasks for distributed backends).
    mask: Optional binary mask; only voxels within the mask are copied (out-of-mask output voxels are left unchanged).
    block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks into the existing output). Mutually exclusive with resume_from.
    resume_from: Distributed only; the preserved temp folder of a failed run to resume (see runner.run); the missing blocks are written into output. Mutually exclusive with block_ids.

Returns: The output array (the provided output, or a newly allocated numpy array).

def downsample(input: 'SourceLike', scale_factor: Union[int, Sequence[int]], output: 'Optional[SourceLike]' = None, *, order: int = 0, anti_aliasing: bool = False, block_shape: Optional[Tuple[int, ...]] = None, job_type: str = 'local', job_config: Optional[bioimage_py.runner.RunnerConfig] = None, num_workers: int = 1, mask: 'Optional[SourceLike]' = None, block_ids: Optional[Sequence[int]] = None, resume_from: Optional[str] = None) -> 'SourceLike':

def downsample(
    input: SourceLike,
    scale_factor: ScaleFactor,
    output: Optional[SourceLike] = None,
    *,
    order: int = 0,
    anti_aliasing: bool = False,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
    """Downsample a source by an integer factor, block-wise.

    The input is wrapped in a :class:`~bioimage_py.wrapper.ResizedSource` at the downscaled shape
    (computed with :func:`~bioimage_py.util.downscale_shape`, ceil mode) and copied into the output.

    The defaults (``order=0`` nearest, ``anti_aliasing=False``) are label-safe — they preserve the
    input values and are appropriate for segmentations. For intensity / image data pass ``order=1``
    (or higher) and ``anti_aliasing=True`` for a smooth, alias-free downsample.

    Args:
        input: The input data to downsample (a numpy/zarr/n5 array or a `Source`). 2D or 3D.
        scale_factor: The downscaling factor: a single int (isotropic) or a per-axis sequence of
            ints. Each factor must be ``>= 1`` (1 leaves that axis unchanged).
        output: The output array to write into. Optional for local execution — a numpy array of the
            downscaled shape and the input dtype is allocated and returned if omitted; **required**
            for distributed execution, where it (and the input) must be file-backed (zarr/n5).
        order: The interpolation order (0 to 5). Use ``0`` (nearest) for label data.
        anti_aliasing: Whether to Gaussian pre-smooth before sampling to avoid aliasing.
            Recommended for image data; leave ``False`` for labels.
        block_shape: Shape of the processing blocks (in the downscaled output space). Defaults to
            the resized source's chunk shape; required for unchunked data.
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask in the downscaled output space; only voxels within the mask are
            written (out-of-mask output voxels are left unchanged).
        block_ids: Restrict processing to these block ids (in the downscaled output space), e.g. to
            re-run previously failed blocks into the existing ``output``. Mutually exclusive with
            ``resume_from``.
        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
            ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive
            with ``block_ids``.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    src = as_source(input)
    ndim = src.ndim
    if isinstance(scale_factor, Integral):
        factors: Tuple[int, ...] = (int(scale_factor),) * ndim
    else:
        factors = tuple(int(f) for f in scale_factor)
        if len(factors) != ndim:
            raise ValueError(
                f"scale_factor {scale_factor} does not match the input dimensionality {ndim}."
            )
    if any(f < 1 for f in factors):
        raise ValueError(
            f"downsample requires scale factors >= 1, got {factors}; "
            "use a ResizedSource directly to upsample."
        )

    target_shape = downscale_shape(src.shape, factors)
    wrapped = ResizedSource(src, target_shape, order=order, anti_aliasing=anti_aliasing)
    return _copy_source(wrapped, output, block_shape=block_shape, job_type=job_type,
                        job_config=job_config, num_workers=num_workers, mask=mask, name="downsample",
                        block_ids=block_ids, resume_from=resume_from)

Downsample a source by an integer factor, block-wise.

The input is wrapped in a bioimage_py.wrapper.ResizedSource at the downscaled shape (computed with bioimage_py.util.downscale_shape, ceil mode) and copied into the output.

The defaults (order=0 nearest, anti_aliasing=False) are label-safe — they preserve the input values and are appropriate for segmentations. For intensity / image data pass order=1 (or higher) and anti_aliasing=True for a smooth, alias-free downsample.

Args:
    input: The input data to downsample (a numpy/zarr/n5 array or a Source). 2D or 3D.
    scale_factor: The downscaling factor: a single int (isotropic) or a per-axis sequence of ints. Each factor must be >= 1 (1 leaves that axis unchanged).
    output: The output array to write into. Optional for local execution: a numpy array of the downscaled shape and the input dtype is allocated and returned if omitted. Required for distributed execution, where it (and the input) must be file-backed (zarr/n5).
    order: The interpolation order (0 to 5). Use 0 (nearest) for label data.
    anti_aliasing: Whether to Gaussian pre-smooth before sampling to avoid aliasing. Recommended for image data; leave False for labels.
    block_shape: Shape of the processing blocks (in the downscaled output space). Defaults to the resized source's chunk shape; required for unchunked data.
    job_type: Execution backend: one of "local", "subprocess" or "slurm".
    job_config: Backend configuration (a RunnerConfig / SlurmConfig).
    num_workers: Number of parallel workers (threads for local, tasks for distributed backends).
    mask: Optional binary mask in the downscaled output space; only voxels within the mask are written (out-of-mask output voxels are left unchanged).
    block_ids: Restrict processing to these block ids (in the downscaled output space), e.g. to re-run previously failed blocks into the existing output. Mutually exclusive with resume_from.
    resume_from: Distributed only; the preserved temp folder of a failed run to resume (see runner.run); the missing blocks are written into output. Mutually exclusive with block_ids.

Returns: The output array (the provided output, or a newly allocated numpy array).
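The "label-safe" wording can be illustrated with a tiny 1D example. Strided sampling and pairwise averaging are used here as simplified stand-ins for order=0 and for smoothing interpolation; they are assumptions made for illustration and not the resampling that ResizedSource actually performs.

```python
labels = [3, 3, 3, 7, 7, 7, 7, 12]  # a 1D "segmentation" with ids 3, 7 and 12

# Nearest-neighbour style 2x downsample: keep every second value.
nearest = labels[::2]

# Averaging 2x downsample: each output value is the mean of two neighbours.
averaged = [(a + b) / 2 for a, b in zip(labels[::2], labels[1::2])]

# Values that appear after averaging but are not valid label ids.
invented = sorted(set(averaged) - set(labels))
```

The nearest result only contains ids that exist in the input, while averaging produces values at object borders that correspond to no object. That is the reason to keep the defaults for segmentations and to switch to a higher order only for intensity data.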

def get_runner(job_type: str, config: Optional[bioimage_py.runner.RunnerConfig] = None) -> bioimage_py.runner.Runner:

def get_runner(job_type: str, config: Optional[RunnerConfig] = None) -> Runner:
    """Return a runner for the given job type.

    Args:
        job_type: One of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        config: Optional runner configuration.

    Returns:
        A :class:`~bioimage_py.runner.base.Runner` instance.

    Raises:
        ValueError: If ``job_type`` is unknown.
    """
    try:
        cls = _RUNNERS[job_type.lower()]
    except KeyError:
        raise ValueError(f"Unknown job_type {job_type!r}; expected one of {sorted(_RUNNERS)}.")
    return cls(config)

Return a runner for the given job type.

Args:
    job_type: One of "local", "subprocess" or "slurm".
    config: Optional runner configuration.

Returns: A bioimage_py.runner.base.Runner instance.

Raises: ValueError: If job_type is unknown.

@dataclass

class SlurmConfig(bioimage_py.runner.config.RunnerConfig):

@dataclass
class SlurmConfig(RunnerConfig):
    """Configuration for the slurm runner.

    Inherits ``poll_interval``, ``tmp_root`` and ``python_executable`` from
    :class:`RunnerConfig`. For slurm, ``tmp_root`` is **required** and must point at a
    shared filesystem visible to all compute nodes (not node-local ``/tmp``), and
    ``num_workers`` (passed to the op / ``run``) is interpreted as the array throttle — the
    maximum number of tasks allowed to run concurrently — independently of how many tasks
    the work is partitioned into.

    Cluster-specific values (``partition``, ``account``, ``constraint``, ``tmp_root``, ...)
    can be stored once in a user config file and reused as defaults; see
    :meth:`load` and :func:`write_slurm_config`.

    Attributes:
        partition: The slurm partition to submit to.
        time: The per-task time limit (slurm time format, e.g. ``"01:00:00"``).
        mem: The per-task memory limit (e.g. ``"8G"``).
        cpus_per_task: Number of CPUs requested per task.
        gpus: Number of GPUs requested per task (emitted as ``--gpus`` only when > 0).
        account: The accounting project to charge.
        qos: The quality-of-service to request.
        constraint: A node feature constraint.
        shebang: Optional environment setup for the generated job script. If given, its
            first line must be an interpreter line (starting with ``#!``) which is placed at
            the top of the script; any remaining lines are emitted as an activation preamble
            *after* the ``#SBATCH`` directives (so the directives are still honoured). The
            preamble is for making the package importable on the node (e.g. ``module load``
            / ``LD_LIBRARY_PATH`` exports), not for choosing the interpreter: the worker is
            always launched with the absolute ``python_executable`` (defaulting to the
            submitting ``sys.executable``). ``None`` uses ``#!/bin/bash`` and that absolute
            interpreter, which needs no activation when the env lives on a shared
            filesystem. Example::

                shebang = "#!/bin/bash\\nmodule load gcc\\nexport LD_LIBRARY_PATH=...:$LD_LIBRARY_PATH"

        max_array_size: Override for the maximum number of array tasks per job. ``None``
            queries the cluster's ``MaxArraySize`` (falling back to a safe default). A run
            partitioned into more tasks than this is rejected up front with a clear error.
        latency_wait: Seconds to wait for a finished task's ``.success`` sentinel to become
            visible on a shared (NFS) filesystem before giving up on it. A task that the
            scheduler reports ``COMPLETED`` wrote its sentinel, but the orchestrating node's
            attribute cache can lag the compute node by up to the mount's ``acdirmax``
            (typically 60 s); this must comfortably exceed that. It only bounds the wait on a
            ``COMPLETED``-but-not-yet-visible task — a task is resolved the moment its
            sentinel appears, so a generous value does not slow down successful runs.
    """

    partition: Optional[str] = None
    time: Optional[str] = None
    mem: Optional[str] = None
    cpus_per_task: int = 1
    gpus: int = 0
    account: Optional[str] = None
    qos: Optional[str] = None
    constraint: Optional[str] = None
    shebang: Optional[str] = None
    max_array_size: Optional[int] = None
    latency_wait: float = 120.0

    @classmethod
    def load(cls, path: Optional[str] = None, **overrides: Any) -> "SlurmConfig":
        """Build a config from the user config file, with explicit overrides taking precedence.

        Precedence is ``overrides`` > config file ``[slurm]`` section > dataclass defaults.
        This is the way to combine the stored user defaults with per-run tweaks; constructing
        ``SlurmConfig(...)`` directly does **not** consult the file (an explicitly built config
        is used verbatim).

        Args:
            path: Path to the config file. ``None`` resolves the default location (see
                :func:`config_file_path`). A missing file is treated as empty.
            **overrides: Field values that override the file defaults. Each name must be a
                valid ``SlurmConfig`` field.

        Returns:
            A :class:`SlurmConfig` with file defaults filled in and overrides applied.

        Raises:
            ValueError: If the file or ``overrides`` contain an unknown field name.
        """
        _validate_keys(overrides, "load() overrides")
        merged: Dict[str, Any] = dict(_read_slurm_defaults(path))
        merged.update(overrides)
        return cls(**merged)

Configuration for the slurm runner.

Inherits poll_interval, tmp_root and python_executable from RunnerConfig. For slurm, tmp_root is required and must point at a shared filesystem visible to all compute nodes (not node-local /tmp), and num_workers (passed to the op / run) is interpreted as the array throttle — the maximum number of tasks allowed to run concurrently — independently of how many tasks the work is partitioned into.

Cluster-specific values (partition, account, constraint, tmp_root, ...) can be stored once in a user config file and reused as defaults; see load() and write_slurm_config().

Attributes:
    partition: The slurm partition to submit to.
    time: The per-task time limit (slurm time format, e.g. "01:00:00").
    mem: The per-task memory limit (e.g. "8G").
    cpus_per_task: Number of CPUs requested per task.
    gpus: Number of GPUs requested per task (emitted as --gpus only when > 0).
    account: The accounting project to charge.
    qos: The quality-of-service to request.
    constraint: A node feature constraint.
    shebang: Optional environment setup for the generated job script. If given, its first line must be an interpreter line (starting with #!) which is placed at the top of the script; any remaining lines are emitted as an activation preamble after the #SBATCH directives (so the directives are still honoured). The preamble is for making the package importable on the node (e.g. module load / LD_LIBRARY_PATH exports), not for choosing the interpreter: the worker is always launched with the absolute python_executable (defaulting to the submitting sys.executable). None uses #!/bin/bash and that absolute interpreter, which needs no activation when the env lives on a shared filesystem. Example:

        shebang = "#!/bin/bash\nmodule load gcc\nexport LD_LIBRARY_PATH=...:$LD_LIBRARY_PATH"

    max_array_size: Override for the maximum number of array tasks per job. None queries the cluster's MaxArraySize (falling back to a safe default). A run partitioned into more tasks than this is rejected up front with a clear error.
    latency_wait: Seconds to wait for a finished task's .success sentinel to become visible on a shared (NFS) filesystem before giving up on it. A task that the scheduler reports COMPLETED wrote its sentinel, but the orchestrating node's attribute cache can lag the compute node by up to the mount's acdirmax (typically 60 s); this must comfortably exceed that. It only bounds the wait on a COMPLETED-but-not-yet-visible task: a task is resolved the moment its sentinel appears, so a generous value does not slow down successful runs.
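The latency_wait behaviour described above can be pictured as a bounded polling loop. This is a minimal sketch under stated assumptions: wait_for_sentinel, the poll_interval default and the file name task_0.success are hypothetical, and the library's actual waiting logic is not shown in this document.

```python
import tempfile
import time
from pathlib import Path


def wait_for_sentinel(path, latency_wait, poll_interval=0.05):
    """Return True as soon as `path` exists, False after `latency_wait` seconds."""
    deadline = time.monotonic() + latency_wait
    while True:
        if Path(path).exists():
            return True  # resolved the moment the sentinel is visible
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


with tempfile.TemporaryDirectory() as tmp:
    sentinel = Path(tmp) / "task_0.success"  # hypothetical file name
    # Sentinel never appears: the wait gives up after the short limit.
    missing = wait_for_sentinel(sentinel, latency_wait=0.2)
    # Sentinel already visible: a generous limit costs nothing.
    sentinel.touch()
    start = time.monotonic()
    found = wait_for_sentinel(sentinel, latency_wait=120.0)
    elapsed = time.monotonic() - start
```

The second call returns immediately even with a 120 second limit, which matches the statement that a generous value does not slow down successful runs.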

SlurmConfig( poll_interval: float = 10.0, tmp_root: Optional[str] = None, python_executable: Optional[str] = None, partition: Optional[str] = None, time: Optional[str] = None, mem: Optional[str] = None, cpus_per_task: int = 1, gpus: int = 0, account: Optional[str] = None, qos: Optional[str] = None, constraint: Optional[str] = None, shebang: Optional[str] = None, max_array_size: Optional[int] = None, latency_wait: float = 120.0)

partition: Optional[str] = None

time: Optional[str] = None

mem: Optional[str] = None

cpus_per_task: int = 1

gpus: int = 0

account: Optional[str] = None

qos: Optional[str] = None

constraint: Optional[str] = None

shebang: Optional[str] = None

max_array_size: Optional[int] = None

latency_wait: float = 120.0
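The shebang and num_workers semantics described for SlurmConfig suggest a job script laid out roughly as in the sketch below. build_job_script, worker.py and the interpreter path are hypothetical names for this example, and the script the library really generates may look different. The --array, --partition and --time options and the %N concurrency limit are standard sbatch features.

```python
def build_job_script(command, n_tasks, num_workers, shebang=None, partition=None, time=None):
    """Assemble a slurm array job script (hypothetical layout)."""
    lines = (shebang or "#!/bin/bash").split("\n")
    if not lines[0].startswith("#!"):
        raise ValueError("The first line of shebang must start with '#!'")
    interpreter, preamble = lines[0], lines[1:]
    # The %N suffix caps how many array tasks run at the same time.
    directives = [f"#SBATCH --array=0-{n_tasks - 1}%{num_workers}"]
    if partition:
        directives.append(f"#SBATCH --partition={partition}")
    if time:
        directives.append(f"#SBATCH --time={time}")
    # sbatch stops reading directives at the first executable line,
    # so the activation preamble has to come after them.
    return "\n".join([interpreter, *directives, *preamble, command]) + "\n"


script = build_job_script(
    "/opt/env/bin/python worker.py",  # hypothetical absolute interpreter and worker
    n_tasks=512,
    num_workers=64,
    shebang="#!/bin/bash\nmodule load gcc",
    partition="gpu",
    time="01:00:00",
)
```

In this layout the work is split into 512 array tasks, but at most 64 of them run concurrently, which is how num_workers can act as a throttle independently of the number of tasks.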

@classmethod

def load(cls, path: Optional[str] = None, **overrides: Any) -> SlurmConfig:

Build a config from the user config file, with explicit overrides taking precedence.

Precedence is overrides > config file [slurm] section > dataclass defaults. This is the way to combine the stored user defaults with per-run tweaks; constructing SlurmConfig(...) directly does not consult the file (an explicitly built config is used verbatim).

Args:
    path: Path to the config file. None resolves the default location (see config_file_path()). A missing file is treated as empty.
    **overrides: Field values that override the file defaults. Each name must be a valid SlurmConfig field.

Returns: A SlurmConfig with file defaults filled in and overrides applied.

Raises: ValueError: If the file or overrides contain an unknown field name.
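
The precedence rule can be illustrated with a minimal, self-contained sketch. `MiniSlurmConfig` and `load_sketch` are hypothetical stand-ins (not part of bioimage_py), and the file defaults are passed in as a plain dict rather than read from the TOML file.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class MiniSlurmConfig:
    """Hypothetical stand-in carrying a few of the SlurmConfig fields."""

    partition: Optional[str] = None
    time: Optional[str] = None
    cpus_per_task: int = 1


def load_sketch(file_defaults: Dict[str, Any], **overrides: Any) -> MiniSlurmConfig:
    """Merge in the order: dataclass defaults < file defaults < overrides."""
    valid = {f.name for f in fields(MiniSlurmConfig)}
    unknown = sorted((set(file_defaults) | set(overrides)) - valid)
    if unknown:
        raise ValueError(f"Unknown field name(s): {unknown}")
    merged = dict(file_defaults)  # values as they would come from the [slurm] table
    merged.update(overrides)      # per-run tweaks take precedence
    return MiniSlurmConfig(**merged)


# Assume the [slurm] table of the config file holds these values.
file_defaults = {"partition": "gpu", "time": "01:00:00"}
cfg = load_sketch(file_defaults, time="04:00:00")
```

Here `partition` comes from the file, `time` from the override, and `cpus_per_task` keeps its dataclass default.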

def config_file_path(path: Optional[str] = None) -> pathlib.Path:

def config_file_path(path: Optional[str] = None) -> Path:
    """Resolve the path to the user config file.

    Resolution order: an explicit ``path`` argument, then the ``BIOIMAGE_PY_CONFIG``
    environment variable, then ``$XDG_CONFIG_HOME/bioimage-py/config.toml`` (falling back to
    ``~/.config/bioimage-py/config.toml``).

    Args:
        path: An explicit path that short-circuits the resolution. ``None`` resolves the
            default location.

    Returns:
        The resolved path (not guaranteed to exist).
    """
    if path is not None:
        return Path(path).expanduser()
    env = os.environ.get("BIOIMAGE_PY_CONFIG")
    if env:
        return Path(env).expanduser()
    base = os.environ.get("XDG_CONFIG_HOME") or os.path.join(os.path.expanduser("~"), ".config")
    return Path(base) / "bioimage-py" / "config.toml"

Resolve the path to the user config file.

Resolution order: an explicit path argument, then the BIOIMAGE_PY_CONFIG environment variable, then $XDG_CONFIG_HOME/bioimage-py/config.toml (falling back to ~/.config/bioimage-py/config.toml).

Args: path: An explicit path that short-circuits the resolution. None resolves the default location.

Returns: The resolved path (not guaranteed to exist).
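
The resolution order can be traced with a small sketch. `resolve_sketch` is a hypothetical variant that takes the environment as an explicit mapping, so the example does not modify `os.environ`; the paths used are made up for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_sketch(path: Optional[str], env: Mapping[str, str]) -> Path:
    """Same resolution order as described above, with the environment passed in."""
    if path is not None:
        return Path(path).expanduser()
    configured = env.get("BIOIMAGE_PY_CONFIG")
    if configured:  # an empty value is treated like an unset variable
        return Path(configured).expanduser()
    base = env.get("XDG_CONFIG_HOME") or os.path.join(os.path.expanduser("~"), ".config")
    return Path(base) / "bioimage-py" / "config.toml"


# 1. An explicit path wins over everything else.
explicit = resolve_sketch("/tmp/my.toml", {"BIOIMAGE_PY_CONFIG": "/etc/other.toml"})
# 2. The environment variable wins over the XDG location.
from_env = resolve_sketch(None, {"BIOIMAGE_PY_CONFIG": "/etc/other.toml", "XDG_CONFIG_HOME": "/xdg"})
# 3. XDG_CONFIG_HOME is used when set ...
from_xdg = resolve_sketch(None, {"XDG_CONFIG_HOME": "/xdg"})
# 4. ... otherwise the path falls back to ~/.config.
fallback = resolve_sketch(None, {})
```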

def write_slurm_config(path: Optional[str] = None, *, replace: bool = False, **fields: Any) -> str:

def write_slurm_config(path: Optional[str] = None, *, replace: bool = False, **fields: Any) -> str:
    """Create or update the user config file with default slurm settings.

    This is the supported way to set up cluster-specific defaults (partition, account,
    constraint, ``tmp_root``, ...) instead of editing the file by hand. Provided fields are
    merged into the existing ``[slurm]`` table by default (so the file can be built up over
    several calls); ``None`` values are skipped, and any other top-level tables in the file
    (reserved for future named profiles) are preserved.

    Args:
        path: Path to write to. ``None`` resolves the default location (see
            :func:`config_file_path`); the parent directory is created if needed.
        replace: If ``True``, replace the whole ``[slurm]`` table instead of merging into it.
        **fields: Default field values to store. Each name must be a valid ``SlurmConfig``
            field.

    Returns:
        The path that was written.

    Raises:
        ValueError: If ``fields`` contains an unknown field name.
    """
    _validate_keys(fields, "write_slurm_config()")
    provided = {k: v for k, v in fields.items() if v is not None}
    fp = config_file_path(path)
    data = _parse_toml(fp)
    section = {} if replace else dict(data.get("slurm", {}))
    section.update(provided)
    data["slurm"] = section

    import tomli_w  # local import: only the writer needs the (optional-at-runtime) dependency.

    fp.parent.mkdir(parents=True, exist_ok=True)
    with open(fp, "wb") as f:
        tomli_w.dump(data, f)
    return str(fp)

Create or update the user config file with default slurm settings.

This is the supported way to set up cluster-specific defaults (partition, account, constraint, tmp_root, ...) instead of editing the file by hand. Provided fields are merged into the existing [slurm] table by default (so the file can be built up over several calls); None values are skipped, and any other top-level tables in the file (reserved for future named profiles) are preserved.

Args:
    path: Path to write to. None resolves the default location (see config_file_path()); the parent directory is created if needed.
    replace: If True, replace the whole [slurm] table instead of merging into it.
    **fields: Default field values to store. Each name must be a valid SlurmConfig field.

Returns: The path that was written.

Raises: ValueError: If fields contains an unknown field name.
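
The difference between merging and replacing can be shown on plain dicts, without touching any file. `update_slurm_table` is a hypothetical helper that mirrors only the merge step; the `profiles` table and all field values are made up for illustration.

```python
from typing import Any, Dict


def update_slurm_table(data: Dict[str, Any], *, replace: bool = False, **fields: Any) -> Dict[str, Any]:
    """Return a copy of the parsed config with the [slurm] table merged or replaced."""
    provided = {k: v for k, v in fields.items() if v is not None}  # None values are skipped
    section = {} if replace else dict(data.get("slurm", {}))
    section.update(provided)
    result = dict(data)  # other top-level tables are carried over unchanged
    result["slurm"] = section
    return result


existing = {
    "slurm": {"partition": "gpu", "account": "lab"},
    "profiles": {"big": {"mem": "64G"}},  # hypothetical extra table
}

# Default: merge into the existing table; account=None leaves the stored value alone.
merged = update_slurm_table(existing, time="02:00:00", account=None)

# replace=True: start from an empty [slurm] table.
replaced = update_slurm_table(existing, replace=True, partition="cpu")
```

After the merge the `[slurm]` table holds `partition`, `account` and `time`; after the replace it holds only `partition`. The other top-level table survives in both cases.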

def as_source(obj: 'SourceLike') -> bioimage_py.sources.Source:

def as_source(obj: "SourceLike") -> Source:
    """Convert a supported object into a :class:`Source`.

    Idempotent on :class:`Source` inputs. numpy / zarr / z5py arrays are wrapped in an
    :class:`ArraySource`. Bare paths are intentionally not supported (see the design doc).

    Args:
        obj: The object to convert.

    Returns:
        A :class:`Source`.

    Raises:
        TypeError: If the object cannot be converted (e.g. a string path).
    """
    if isinstance(obj, Source):
        return obj
    if isinstance(obj, (str, bytes)):
        raise TypeError(
            "Passing strings / file paths as a source is not supported. Open the array "
            "yourself (e.g. with zarr or z5py) and pass the handle."
        )
    for predicate, converter in _CONVERTERS:
        if predicate(obj):
            return converter(obj)
    # numpy and any array-like with shape/dtype fall back to ArraySource.
    if isinstance(obj, np.ndarray) or (hasattr(obj, "shape") and hasattr(obj, "dtype")):
        return ArraySource(obj)
    raise TypeError(f"Cannot convert object of type {type(obj)!r} to a Source.")

Convert a supported object into a Source.

Idempotent on Source inputs. numpy / zarr / z5py arrays are wrapped in an ArraySource. Bare paths are intentionally not supported (see the design doc).

Args: obj: The object to convert.

Returns: A Source.

Raises: TypeError: If the object cannot be converted (e.g. a string path).
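
The fallback rule (anything exposing `shape` and `dtype` is treated as array-like, while strings are rejected) can be sketched without the library. `FakeArray` and `source_kind` are hypothetical names; the sketch leaves out the `Source` passthrough and the registered converters.

```python
from typing import Any


class FakeArray:
    """Hypothetical array-like: exposes shape and dtype but is not a numpy array."""

    shape = (4, 4)
    dtype = "uint8"


def source_kind(obj: Any) -> str:
    """Classify an object the way the duck-typing fallback would."""
    if isinstance(obj, (str, bytes)):
        # Paths are rejected on purpose: the caller opens the array and passes the handle.
        raise TypeError("Strings / file paths are not supported; pass an opened array handle.")
    if hasattr(obj, "shape") and hasattr(obj, "dtype"):
        return "array-like"
    raise TypeError(f"Cannot convert object of type {type(obj)!r}.")


kind = source_kind(FakeArray())
```

A plain list has neither attribute and is rejected, as is a path string such as `"data.zarr"`.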

def open_source(path: Union[os.PathLike, str], internal_path: Optional[str] = None, format: Optional[str] = None, mode: str = 'r', **kwargs: Any) -> bioimage_py.sources.FileSource:

def open_source(
    path: PathLike,
    internal_path: Optional[str] = None,
    format: Optional[str] = None,
    mode: str = "r",
    **kwargs: Any,
) -> FileSource:
    """Open a file-backed array as a :class:`Source`.

    The format is inferred from the path extension (overridable via ``format``). ``internal_path``
    selects the array inside a container; when omitted it defaults to the format's natural key
    (e.g. ``"data"`` for mrc/nifti, ``"mag1"`` for knossos, ``""`` for a single image stack), and is
    required for multi-array containers (hdf5/zarr/n5).

    Args:
        path: Path to the file or folder to open.
        internal_path: Key of the array inside the container; format-dependent default if omitted.
        format: Force a registered format name, overriding extension inference.
        mode: Open mode. ``"r"`` (default) is read-only; write modes (``"a"``/``"r+"``/``"w"``)
            are only honored for writable formats (zarr/n5/hdf5).
        kwargs: Extra keyword arguments forwarded to the backend constructor.

    Returns:
        A :class:`FileSource` with a reopenable ``kind="file"`` spec.
    """
    fmt = format if format is not None else infer_format(path)
    # Validate the format is installed up front (raises a clear error otherwise).
    constructor_for_format(fmt)

    handle = open_file(path, mode=mode, format=fmt, **kwargs)
    key = internal_path if internal_path is not None else getattr(handle, "default_key", None)
    dataset, recorded_key = _resolve_dataset(handle, key)

    writable = is_writable_format(fmt) and mode != "r"
    return FileSource(
        dataset,
        path=path,
        internal_path=recorded_key,
        format=fmt,
        mode=mode,
        open_kwargs=kwargs,
        writable=writable,
    )

Open a file-backed array as a Source.

The format is inferred from the path extension (overridable via format). internal_path selects the array inside a container; when omitted it defaults to the format's natural key (e.g. "data" for mrc/nifti, "mag1" for knossos, "" for a single image stack), and is required for multi-array containers (hdf5/zarr/n5).

Args:
    path: Path to the file or folder to open.
    internal_path: Key of the array inside the container; format-dependent default if omitted.
    format: Force a registered format name, overriding extension inference.
    mode: Open mode. "r" (default) is read-only; write modes ("a"/"r+"/"w") are only honored for writable formats (zarr/n5/hdf5).
    kwargs: Extra keyword arguments forwarded to the backend constructor.

Returns: A FileSource with a reopenable kind="file" spec.

def open_cloudvolume(cloudpath: str, mip: int = 0, fill_missing: bool = False, bounded: bool = True, cache: bool = False, non_aligned_writes: bool = True, offset: Optional[Tuple[int, int, int]] = None, size: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> bioimage_py.sources.CloudVolumeSource:

def open_cloudvolume(
    cloudpath: str,
    mip: int = 0,
    fill_missing: bool = False,
    bounded: bool = True,
    cache: bool = False,
    non_aligned_writes: bool = True,
    offset: Optional[Tuple[int, int, int]] = None,
    size: Optional[Tuple[int, int, int]] = None,
    **kwargs: Any,
) -> CloudVolumeSource:
    """Open a CloudVolume (precomputed) layer as a writable ZYX :class:`Source`.

    Args:
        cloudpath: The CloudVolume cloudpath (e.g. ``"precomputed://..."`` or ``"file://..."``).
        mip: The resolution (mip) level to open.
        fill_missing: Whether to zero-fill missing chunks instead of raising. For a *writable*
            output whose volume size is not a multiple of the chunk size, set this to ``True`` so
            the partial boundary chunks can be read-modify-written into a fresh layer.
        bounded: Whether reads/writes are restricted to the volume bounds.
        cache: Whether to enable CloudVolume's local cache.
        non_aligned_writes: Whether to allow writes that are not chunk-aligned (needed for the
            partial blocks at the volume boundary in block-wise writes).
        offset: Optional absolute XYZ origin of the view; defaults to the layer's ``voxel_offset``.
        size: Optional XYZ size of the view; defaults to the layer's ``volume_size``.
        kwargs: Extra keyword arguments forwarded to ``CloudVolume``.

    Returns:
        A :class:`CloudVolumeSource`.
    """
    from cloudvolume import CloudVolume

    open_params: Dict[str, Any] = dict(
        mip=mip,
        fill_missing=fill_missing,
        bounded=bounded,
        cache=cache,
        non_aligned_writes=non_aligned_writes,
    )
    open_params.update(kwargs)
    volume = CloudVolume(cloudpath, progress=False, **open_params)
    return CloudVolumeSource(volume, offset=offset, size=size, open_params=open_params)

Open a CloudVolume (precomputed) layer as a writable ZYX Source.

Args:
    cloudpath: The CloudVolume cloudpath (e.g. "precomputed://..." or "file://...").
    mip: The resolution (mip) level to open.
    fill_missing: Whether to zero-fill missing chunks instead of raising. For a writable output whose volume size is not a multiple of the chunk size, set this to True so the partial boundary chunks can be read-modify-written into a fresh layer.
    bounded: Whether reads/writes are restricted to the volume bounds.
    cache: Whether to enable CloudVolume's local cache.
    non_aligned_writes: Whether to allow writes that are not chunk-aligned (needed for the partial blocks at the volume boundary in block-wise writes).
    offset: Optional absolute XYZ origin of the view; defaults to the layer's voxel_offset.
    size: Optional XYZ size of the view; defaults to the layer's volume_size.
    kwargs: Extra keyword arguments forwarded to CloudVolume.

Returns: A CloudVolumeSource.

def open_webknossos(dataset_name_or_url: str, organization_id: Optional[str] = None, layer_name: str = '', mag: int = 1, offset: Optional[Tuple[int, int, int]] = None, size: Optional[Tuple[int, int, int]] = None) -> bioimage_py.sources.WebKnossosSource:

def open_webknossos(
    dataset_name_or_url: str,
    organization_id: Optional[str] = None,
    layer_name: str = "",
    mag: int = 1,
    offset: Optional[Tuple[int, int, int]] = None,
    size: Optional[Tuple[int, int, int]] = None,
) -> WebKnossosSource:
    """Open a (remote) WebKnossos layer as a read-only ZYX :class:`Source`.

    Args:
        dataset_name_or_url: The WebKnossos dataset name or URL (or an annotation URL).
        organization_id: The organization id (required when opening by dataset name).
        layer_name: The name of the layer to open.
        mag: The magnification (resolution) level.
        offset: Optional absolute XYZ origin of the view; defaults to the layer bbox ``topleft``.
        size: Optional XYZ size of the view; defaults to the layer bbox ``size``.

    Returns:
        A :class:`WebKnossosSource`.
    """
    return WebKnossosSource(
        dataset_name_or_url=dataset_name_or_url,
        organization_id=organization_id,
        layer_name=layer_name,
        mag=mag,
        offset=offset,
        size=size,
    )

Open a (remote) WebKnossos layer as a read-only ZYX Source.

Args:
    dataset_name_or_url: The WebKnossos dataset name or URL (or an annotation URL).
    organization_id: The organization id (required when opening by dataset name).
    layer_name: The name of the layer to open.
    mag: The magnification (resolution) level.
    offset: Optional absolute XYZ origin of the view; defaults to the layer bbox topleft.
    size: Optional XYZ size of the view; defaults to the layer bbox size.

Returns: A WebKnossosSource.

def to_roi(block: Union[bioimage_cpp._core.Block, bioimage_cpp._core.BlockWithHalo]) -> Tuple[slice, ...]:

def to_roi(block: BlockDescriptor) -> Tuple[slice, ...]:
    """Convert a ``bioimage_cpp.utils`` ``Block`` into a tuple of slices.

    Args:
        block: A ``Block`` (carrying ``begin``/``end`` coordinate lists). For halo
            operations pass one of ``block.outer_block`` / ``block.inner_block`` /
            ``block.inner_block_local``.

    Returns:
        A tuple of slices that indexes a source or array.
    """
    return tuple(slice(int(b), int(e)) for b, e in zip(block.begin, block.end))

Convert a bioimage_cpp.utils Block into a tuple of slices.

Args: block: A Block (carrying begin/end coordinate lists). For halo operations pass one of block.outer_block / block.inner_block / block.inner_block_local.

Returns: A tuple of slices that indexes a source or array.
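
As a minimal sketch of the conversion, the example below uses a hypothetical `FakeBlock` in place of a real `bioimage_cpp` block; it only assumes the `begin`/`end` coordinate lists named in the docstring.

```python
from typing import NamedTuple, Sequence, Tuple


class FakeBlock(NamedTuple):
    """Hypothetical stand-in for a block carrying begin/end coordinates."""

    begin: Sequence[int]
    end: Sequence[int]


def to_roi_sketch(block: FakeBlock) -> Tuple[slice, ...]:
    """Pair up begin/end per axis and turn each pair into a slice."""
    return tuple(slice(int(b), int(e)) for b, e in zip(block.begin, block.end))


block = FakeBlock(begin=[0, 64, 128], end=[32, 128, 192])
roi = to_roi_sketch(block)

# The extent along each axis follows directly from the slices.
block_shape = tuple(s.stop - s.start for s in roi)
```

The resulting tuple can be used directly as an index, e.g. `array[roi]` on a 3D array.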