bioimage_py.segmentation.watershed

Block-wise seeded watershed (single-stage, halo-based).

Ports elf.parallel.seeded_watershed: the user supplies a height map and pre-computed integer seed markers, and each block runs bioimage_cpp.segmentation.watershed over a halo-padded region, writing back the (halo-free) inner block. Seed ids are an input -- they are preserved verbatim, with no cross-block merge or global offsetting -- so, unlike the multi-stage label, this is a single-stage operation that supports block_ids / resume_from re-runs.

Because each block grows its catchment basins only within its halo-padded region, the block-wise result equals a whole-array watershed only where the halo covers the relevant basins; pick a halo large enough for the object extents at block boundaries. What is always exact is backend determinism: for a fixed (block_shape, halo) every backend produces bit-identical output.

  1"""Block-wise seeded watershed (single-stage, halo-based).
  2
  3Ports ``elf.parallel.seeded_watershed``: the user supplies a height map and pre-computed integer
  4seed markers, and each block runs ``bioimage_cpp.segmentation.watershed`` over a halo-padded region,
  5writing back the (halo-free) inner block. Seed ids are an *input* -- they are preserved verbatim,
  6with no cross-block merge or global offsetting -- so, unlike the multi-stage ``label``, this is a
  7single-stage operation that supports ``block_ids`` / ``resume_from`` re-runs.
  8
  9Because each block grows its catchment basins only within its halo-padded region, the block-wise
 10result equals a whole-array watershed only where the halo covers the relevant basins; pick a halo
 11large enough for the object extents at block boundaries. What is always exact is backend determinism:
 12for a fixed ``(block_shape, halo)`` every backend produces bit-identical output.
 13"""
 14from __future__ import annotations
 15
 16from typing import Optional, Sequence, Tuple
 17
 18import bioimage_cpp as bic
 19import numpy as np
 20
 21from ..runner import get_runner
 22from ..runner.config import RunnerConfig
 23from ..sources import Source, SourceLike, as_source
 24from ..util import BlockDescriptor, check_rerun_args, full_roi, is_direct, to_roi
 25
 26__all__ = ["watershed"]
 27
 28# Dtypes that bic.segmentation.watershed accepts directly (no copy needed).
 29_HMAP_DTYPES = (np.float32, np.float64)
 30_SEED_DTYPES = (np.uint32, np.uint64, np.int32, np.int64)
 31
 32
 33def _as_hmap(data: np.ndarray) -> np.ndarray:
 34    """Cast a height map to float32 unless it is already a float type."""
 35    return data if data.dtype in _HMAP_DTYPES else data.astype("float32")
 36
 37
 38def _as_seeds(data: np.ndarray) -> np.ndarray:
 39    """Cast seeds to uint32 unless they are already an integer type watershed accepts."""
 40    return data if data.dtype in _SEED_DTYPES else data.astype("uint32")
 41
 42
 43def _watershed_block(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source],
 44                     mask: Optional[Source]) -> None:
 45    """Per-block seeded watershed: read the halo'd region, run watershed, write the inner block."""
 46    hmap_src, seeds_src = inputs[0], inputs[1]
 47    out_src = outputs[0]
 48    outer = to_roi(block.outer_block)
 49    inner = to_roi(block.inner_block)
 50    inner_local = to_roi(block.inner_block_local)
 51
 52    if mask is not None:
 53        block_mask = mask[outer].astype(bool)
 54        if not block_mask[inner_local].any():
 55            return None
 56    else:
 57        block_mask = None
 58
 59    block_hmap = _as_hmap(hmap_src[outer])
 60    block_seeds = _as_seeds(seeds_src[outer])
 61    ws = bic.segmentation.watershed(block_hmap, block_seeds, mask=block_mask)
 62    out_src[inner] = ws[inner_local]
 63    return None
 64
 65
 66def watershed(
 67    input: SourceLike,
 68    seeds: SourceLike,
 69    output: Optional[SourceLike] = None,
 70    *,
 71    halo: Optional[Sequence[int]] = None,
 72    block_shape: Optional[Tuple[int, ...]] = None,
 73    job_type: str = "local",
 74    job_config: Optional[RunnerConfig] = None,
 75    num_workers: int = 1,
 76    mask: Optional[SourceLike] = None,
 77    block_ids: Optional[Sequence[int]] = None,
 78    resume_from: Optional[str] = None,
 79) -> SourceLike:
 80    """Compute a seeded watershed over a height map, block-wise.
 81
 82    Each block runs a seeded watershed (``bioimage_cpp.segmentation.watershed``) on a halo-padded
 83    region and writes back the halo-free inner block. The ``seeds`` define the segments and their
 84    ids are preserved verbatim -- there is no cross-block merge, so pass globally consistent seeds
 85    (e.g. a connected-component labeling of a seed mask) for a coherent result.
 86
 87    Being single-stage, this operation supports ``block_ids`` / ``resume_from`` re-runs. The
 88    block-wise output matches a whole-array watershed only where ``halo`` covers the relevant
 89    catchment basins, but is bit-identical across backends for a fixed ``(block_shape, halo)``.
 90
 91    Args:
 92        input: The height map (a numpy/zarr/n5 array or a `Source`). It is cast to ``float32`` for
 93            the watershed if it is not already a float type.
 94        seeds: The pre-computed integer seed markers (``0`` = background), same shape as ``input``.
 95            A non-integer seed map is cast to ``uint32``.
 96        output: The integer output array to write the segmentation into. Optional for local
 97            execution -- a ``uint64`` numpy array is allocated and returned if omitted; **required**
 98            for distributed execution. It must be wide enough to hold the seed ids (``uint64``
 99            recommended; a ``uint32`` watershed result writes losslessly into a ``uint64`` output).
100        halo: Per-axis halo enlarging each block; **required** for the block-wise path (there is no
101            principled default for a watershed). Choose it large enough to cover object extents at
102            block boundaries. Ignored by the direct (single-block) path.
103        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
104            for unchunked data.
105        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
106        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
107        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
108            backends).
109        mask: Optional binary mask; voxels outside the mask are excluded and stay ``0``.
110        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks
111            into the existing ``output``). Mutually exclusive with ``resume_from``.
112        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
113            ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive
114            with ``block_ids``.
115
116    Returns:
117        The output array (the provided ``output``, or a newly allocated ``uint64`` numpy array).
118    """
119    check_rerun_args(job_type, resume_from, block_ids)
120
121    src = as_source(input)
122    seeds_src = as_source(seeds)
123    ndim = src.ndim
124    if tuple(seeds_src.shape) != tuple(src.shape):
125        raise ValueError(
126            f"seeds shape {seeds_src.shape} does not match input shape {src.shape}."
127        )
128
129    # A subset/resume rerun is inherently block-wise, so it cannot use the direct (whole-array) path.
130    direct = (is_direct(job_type, num_workers, block_shape)
131              and block_ids is None and resume_from is None)
132
133    if output is None:
134        if job_type != "local":
135            raise ValueError(
136                f"'output' is required for distributed execution (job_type={job_type!r}); "
137                "pass a file-backed (zarr/n5) output array."
138            )
139        out_array: SourceLike = np.zeros(tuple(src.shape), dtype="uint64")
140    else:
141        out_array = output
142
143    out = as_source(out_array)
144    if not np.issubdtype(out.dtype, np.integer):
145        raise ValueError(f"output must have an integer dtype, got {out.dtype}.")
146
147    if direct:
148        block_hmap = _as_hmap(src[full_roi(ndim)])
149        block_seeds = _as_seeds(seeds_src[full_roi(ndim)])
150        block_mask = as_source(mask)[full_roi(ndim)].astype(bool) if mask is not None else None
151        out[full_roi(ndim)] = bic.segmentation.watershed(block_hmap, block_seeds, mask=block_mask)
152        return out_array
153
154    if halo is None:
155        raise ValueError(
156            "halo is required for block-wise watershed; choose one large enough to cover object "
157            "extents at block boundaries."
158        )
159
160    runner = get_runner(job_type, job_config)
161    runner.run(_watershed_block, [input, seeds], outputs=[out_array], halo=halo,
162               block_shape=block_shape, mask=mask, num_workers=num_workers,
163               block_ids=block_ids, resume_from=resume_from, name="watershed")
164    return out_array
def watershed( input: 'SourceLike', seeds: 'SourceLike', output: 'Optional[SourceLike]' = None, *, halo: Optional[Sequence[int]] = None, block_shape: Optional[Tuple[int, ...]] = None, job_type: str = 'local', job_config: Optional[bioimage_py.runner.RunnerConfig] = None, num_workers: int = 1, mask: 'Optional[SourceLike]' = None, block_ids: Optional[Sequence[int]] = None, resume_from: Optional[str] = None) -> 'SourceLike':
 67def watershed(
 68    input: SourceLike,
 69    seeds: SourceLike,
 70    output: Optional[SourceLike] = None,
 71    *,
 72    halo: Optional[Sequence[int]] = None,
 73    block_shape: Optional[Tuple[int, ...]] = None,
 74    job_type: str = "local",
 75    job_config: Optional[RunnerConfig] = None,
 76    num_workers: int = 1,
 77    mask: Optional[SourceLike] = None,
 78    block_ids: Optional[Sequence[int]] = None,
 79    resume_from: Optional[str] = None,
 80) -> SourceLike:
 81    """Compute a seeded watershed over a height map, block-wise.
 82
 83    Each block runs a seeded watershed (``bioimage_cpp.segmentation.watershed``) on a halo-padded
 84    region and writes back the halo-free inner block. The ``seeds`` define the segments and their
 85    ids are preserved verbatim -- there is no cross-block merge, so pass globally consistent seeds
 86    (e.g. a connected-component labeling of a seed mask) for a coherent result.
 87
 88    Being single-stage, this operation supports ``block_ids`` / ``resume_from`` re-runs. The
 89    block-wise output matches a whole-array watershed only where ``halo`` covers the relevant
 90    catchment basins, but is bit-identical across backends for a fixed ``(block_shape, halo)``.
 91
 92    Args:
 93        input: The height map (a numpy/zarr/n5 array or a `Source`). It is cast to ``float32`` for
 94            the watershed if it is not already a float type.
 95        seeds: The pre-computed integer seed markers (``0`` = background), same shape as ``input``.
 96            A non-integer seed map is cast to ``uint32``.
 97        output: The integer output array to write the segmentation into. Optional for local
 98            execution -- a ``uint64`` numpy array is allocated and returned if omitted; **required**
 99            for distributed execution. It must be wide enough to hold the seed ids (``uint64``
100            recommended; a ``uint32`` watershed result writes losslessly into a ``uint64`` output).
101        halo: Per-axis halo enlarging each block; **required** for the block-wise path (there is no
102            principled default for a watershed). Choose it large enough to cover object extents at
103            block boundaries. Ignored by the direct (single-block) path.
104        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
105            for unchunked data.
106        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
107        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
108        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
109            backends).
110        mask: Optional binary mask; voxels outside the mask are excluded and stay ``0``.
111        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks
112            into the existing ``output``). Mutually exclusive with ``resume_from``.
113        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
114            ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive
115            with ``block_ids``.
116
117    Returns:
118        The output array (the provided ``output``, or a newly allocated ``uint64`` numpy array).
119    """
120    check_rerun_args(job_type, resume_from, block_ids)
121
122    src = as_source(input)
123    seeds_src = as_source(seeds)
124    ndim = src.ndim
125    if tuple(seeds_src.shape) != tuple(src.shape):
126        raise ValueError(
127            f"seeds shape {seeds_src.shape} does not match input shape {src.shape}."
128        )
129
130    # A subset/resume rerun is inherently block-wise, so it cannot use the direct (whole-array) path.
131    direct = (is_direct(job_type, num_workers, block_shape)
132              and block_ids is None and resume_from is None)
133
134    if output is None:
135        if job_type != "local":
136            raise ValueError(
137                f"'output' is required for distributed execution (job_type={job_type!r}); "
138                "pass a file-backed (zarr/n5) output array."
139            )
140        out_array: SourceLike = np.zeros(tuple(src.shape), dtype="uint64")
141    else:
142        out_array = output
143
144    out = as_source(out_array)
145    if not np.issubdtype(out.dtype, np.integer):
146        raise ValueError(f"output must have an integer dtype, got {out.dtype}.")
147
148    if direct:
149        block_hmap = _as_hmap(src[full_roi(ndim)])
150        block_seeds = _as_seeds(seeds_src[full_roi(ndim)])
151        block_mask = as_source(mask)[full_roi(ndim)].astype(bool) if mask is not None else None
152        out[full_roi(ndim)] = bic.segmentation.watershed(block_hmap, block_seeds, mask=block_mask)
153        return out_array
154
155    if halo is None:
156        raise ValueError(
157            "halo is required for block-wise watershed; choose one large enough to cover object "
158            "extents at block boundaries."
159        )
160
161    runner = get_runner(job_type, job_config)
162    runner.run(_watershed_block, [input, seeds], outputs=[out_array], halo=halo,
163               block_shape=block_shape, mask=mask, num_workers=num_workers,
164               block_ids=block_ids, resume_from=resume_from, name="watershed")
165    return out_array

Compute a seeded watershed over a height map, block-wise.

Each block runs a seeded watershed (bioimage_cpp.segmentation.watershed) on a halo-padded region and writes back the halo-free inner block. The seeds define the segments and their ids are preserved verbatim -- there is no cross-block merge, so pass globally consistent seeds (e.g. a connected-component labeling of a seed mask) for a coherent result.

Being single-stage, this operation supports block_ids / resume_from re-runs. The block-wise output matches a whole-array watershed only where halo covers the relevant catchment basins, but is bit-identical across backends for a fixed (block_shape, halo).

Args: input: The height map (a numpy/zarr/n5 array or a Source). It is cast to float32 for the watershed if it is not already a float type. seeds: The pre-computed integer seed markers (0 = background), same shape as input. A non-integer seed map is cast to uint32. output: The integer output array to write the segmentation into. Optional for local execution -- a uint64 numpy array is allocated and returned if omitted; required for distributed execution. It must be wide enough to hold the seed ids (uint64 recommended; a uint32 watershed result writes losslessly into a uint64 output). halo: Per-axis halo enlarging each block; required for the block-wise path (there is no principled default for a watershed). Choose it large enough to cover object extents at block boundaries. Ignored by the direct (single-block) path. block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required for unchunked data. job_type: Execution backend: one of "local", "subprocess" or "slurm". job_config: Backend configuration (a RunnerConfig / SlurmConfig). num_workers: Number of parallel workers (threads for local, tasks for distributed backends). mask: Optional binary mask; voxels outside the mask are excluded and stay 0. block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks into the existing output). Mutually exclusive with resume_from. resume_from: Distributed only; the preserved temp folder of a failed run to resume (see runner.run); the missing blocks are written into output. Mutually exclusive with block_ids.

Returns: The output array (the provided output, or a newly allocated uint64 numpy array).