bioimage_py.segmentation.watershed
Block-wise seeded watershed (single-stage, halo-based).
Ports elf.parallel.seeded_watershed: the user supplies a height map and pre-computed integer
seed markers, and each block runs bioimage_cpp.segmentation.watershed over a halo-padded region,
writing back the (halo-free) inner block. Seed ids are an input -- they are preserved verbatim,
with no cross-block merge or global offsetting -- so, unlike the multi-stage label, this is a
single-stage operation that supports block_ids / resume_from re-runs.
Because each block grows its catchment basins only within its halo-padded region, the block-wise
result equals a whole-array watershed only where the halo covers the relevant basins; pick a halo
large enough for the object extents at block boundaries. What is always exact is backend determinism:
for a fixed (block_shape, halo) every backend produces bit-identical output.
1"""Block-wise seeded watershed (single-stage, halo-based). 2 3Ports ``elf.parallel.seeded_watershed``: the user supplies a height map and pre-computed integer 4seed markers, and each block runs ``bioimage_cpp.segmentation.watershed`` over a halo-padded region, 5writing back the (halo-free) inner block. Seed ids are an *input* -- they are preserved verbatim, 6with no cross-block merge or global offsetting -- so, unlike the multi-stage ``label``, this is a 7single-stage operation that supports ``block_ids`` / ``resume_from`` re-runs. 8 9Because each block grows its catchment basins only within its halo-padded region, the block-wise 10result equals a whole-array watershed only where the halo covers the relevant basins; pick a halo 11large enough for the object extents at block boundaries. What is always exact is backend determinism: 12for a fixed ``(block_shape, halo)`` every backend produces bit-identical output. 13""" 14from __future__ import annotations 15 16from typing import Optional, Sequence, Tuple 17 18import bioimage_cpp as bic 19import numpy as np 20 21from ..runner import get_runner 22from ..runner.config import RunnerConfig 23from ..sources import Source, SourceLike, as_source 24from ..util import BlockDescriptor, check_rerun_args, full_roi, is_direct, to_roi 25 26__all__ = ["watershed"] 27 28# Dtypes that bic.segmentation.watershed accepts directly (no copy needed). 29_HMAP_DTYPES = (np.float32, np.float64) 30_SEED_DTYPES = (np.uint32, np.uint64, np.int32, np.int64) 31 32 33def _as_hmap(data: np.ndarray) -> np.ndarray: 34 """Cast a height map to float32 unless it is already a float type.""" 35 return data if data.dtype in _HMAP_DTYPES else data.astype("float32") 36 37 38def _as_seeds(data: np.ndarray) -> np.ndarray: 39 """Cast seeds to uint32 unless they are already an integer type watershed accepts.""" 40 return data if data.dtype in _SEED_DTYPES else data.astype("uint32") 41 42 43def _watershed_block(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source], 44 mask: Optional[Source]) -> None: 45 """Per-block seeded watershed: read the halo'd region, run watershed, write the inner block.""" 46 hmap_src, seeds_src = inputs[0], inputs[1] 47 out_src = outputs[0] 48 outer = to_roi(block.outer_block) 49 inner = to_roi(block.inner_block) 50 inner_local = to_roi(block.inner_block_local) 51 52 if mask is not None: 53 block_mask = mask[outer].astype(bool) 54 if not block_mask[inner_local].any(): 55 return None 56 else: 57 block_mask = None 58 59 block_hmap = _as_hmap(hmap_src[outer]) 60 block_seeds = _as_seeds(seeds_src[outer]) 61 ws = bic.segmentation.watershed(block_hmap, block_seeds, mask=block_mask) 62 out_src[inner] = ws[inner_local] 63 return None 64 65 66def watershed( 67 input: SourceLike, 68 seeds: SourceLike, 69 output: Optional[SourceLike] = None, 70 *, 71 halo: Optional[Sequence[int]] = None, 72 block_shape: Optional[Tuple[int, ...]] = None, 73 job_type: str = "local", 74 job_config: Optional[RunnerConfig] = None, 75 num_workers: int = 1, 76 mask: Optional[SourceLike] = None, 77 block_ids: Optional[Sequence[int]] = None, 78 resume_from: Optional[str] = None, 79) -> SourceLike: 80 """Compute a seeded watershed over a height map, block-wise. 81 82 Each block runs a seeded watershed (``bioimage_cpp.segmentation.watershed``) on a halo-padded 83 region and writes back the halo-free inner block. The ``seeds`` define the segments and their 84 ids are preserved verbatim -- there is no cross-block merge, so pass globally consistent seeds 85 (e.g. a connected-component labeling of a seed mask) for a coherent result. 86 87 Being single-stage, this operation supports ``block_ids`` / ``resume_from`` re-runs. The 88 block-wise output matches a whole-array watershed only where ``halo`` covers the relevant 89 catchment basins, but is bit-identical across backends for a fixed ``(block_shape, halo)``. 90 91 Args: 92 input: The height map (a numpy/zarr/n5 array or a `Source`). It is cast to ``float32`` for 93 the watershed if it is not already a float type. 94 seeds: The pre-computed integer seed markers (``0`` = background), same shape as ``input``. 95 A non-integer seed map is cast to ``uint32``. 96 output: The integer output array to write the segmentation into. Optional for local 97 execution -- a ``uint64`` numpy array is allocated and returned if omitted; **required** 98 for distributed execution. It must be wide enough to hold the seed ids (``uint64`` 99 recommended; a ``uint32`` watershed result writes losslessly into a ``uint64`` output). 100 halo: Per-axis halo enlarging each block; **required** for the block-wise path (there is no 101 principled default for a watershed). Choose it large enough to cover object extents at 102 block boundaries. Ignored by the direct (single-block) path. 103 block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required 104 for unchunked data. 105 job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``. 106 job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`). 107 num_workers: Number of parallel workers (threads for ``local``, tasks for distributed 108 backends). 109 mask: Optional binary mask; voxels outside the mask are excluded and stay ``0``. 110 block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks 111 into the existing ``output``). Mutually exclusive with ``resume_from``. 112 resume_from: Distributed only; the preserved temp folder of a failed run to resume (see 113 ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive 114 with ``block_ids``. 115 116 Returns: 117 The output array (the provided ``output``, or a newly allocated ``uint64`` numpy array). 118 """ 119 check_rerun_args(job_type, resume_from, block_ids) 120 121 src = as_source(input) 122 seeds_src = as_source(seeds) 123 ndim = src.ndim 124 if tuple(seeds_src.shape) != tuple(src.shape): 125 raise ValueError( 126 f"seeds shape {seeds_src.shape} does not match input shape {src.shape}." 127 ) 128 129 # A subset/resume rerun is inherently block-wise, so it cannot use the direct (whole-array) path. 130 direct = (is_direct(job_type, num_workers, block_shape) 131 and block_ids is None and resume_from is None) 132 133 if output is None: 134 if job_type != "local": 135 raise ValueError( 136 f"'output' is required for distributed execution (job_type={job_type!r}); " 137 "pass a file-backed (zarr/n5) output array." 138 ) 139 out_array: SourceLike = np.zeros(tuple(src.shape), dtype="uint64") 140 else: 141 out_array = output 142 143 out = as_source(out_array) 144 if not np.issubdtype(out.dtype, np.integer): 145 raise ValueError(f"output must have an integer dtype, got {out.dtype}.") 146 147 if direct: 148 block_hmap = _as_hmap(src[full_roi(ndim)]) 149 block_seeds = _as_seeds(seeds_src[full_roi(ndim)]) 150 block_mask = as_source(mask)[full_roi(ndim)].astype(bool) if mask is not None else None 151 out[full_roi(ndim)] = bic.segmentation.watershed(block_hmap, block_seeds, mask=block_mask) 152 return out_array 153 154 if halo is None: 155 raise ValueError( 156 "halo is required for block-wise watershed; choose one large enough to cover object " 157 "extents at block boundaries." 158 ) 159 160 runner = get_runner(job_type, job_config) 161 runner.run(_watershed_block, [input, seeds], outputs=[out_array], halo=halo, 162 block_shape=block_shape, mask=mask, num_workers=num_workers, 163 block_ids=block_ids, resume_from=resume_from, name="watershed") 164 return out_array
67def watershed( 68 input: SourceLike, 69 seeds: SourceLike, 70 output: Optional[SourceLike] = None, 71 *, 72 halo: Optional[Sequence[int]] = None, 73 block_shape: Optional[Tuple[int, ...]] = None, 74 job_type: str = "local", 75 job_config: Optional[RunnerConfig] = None, 76 num_workers: int = 1, 77 mask: Optional[SourceLike] = None, 78 block_ids: Optional[Sequence[int]] = None, 79 resume_from: Optional[str] = None, 80) -> SourceLike: 81 """Compute a seeded watershed over a height map, block-wise. 82 83 Each block runs a seeded watershed (``bioimage_cpp.segmentation.watershed``) on a halo-padded 84 region and writes back the halo-free inner block. The ``seeds`` define the segments and their 85 ids are preserved verbatim -- there is no cross-block merge, so pass globally consistent seeds 86 (e.g. a connected-component labeling of a seed mask) for a coherent result. 87 88 Being single-stage, this operation supports ``block_ids`` / ``resume_from`` re-runs. The 89 block-wise output matches a whole-array watershed only where ``halo`` covers the relevant 90 catchment basins, but is bit-identical across backends for a fixed ``(block_shape, halo)``. 91 92 Args: 93 input: The height map (a numpy/zarr/n5 array or a `Source`). It is cast to ``float32`` for 94 the watershed if it is not already a float type. 95 seeds: The pre-computed integer seed markers (``0`` = background), same shape as ``input``. 96 A non-integer seed map is cast to ``uint32``. 97 output: The integer output array to write the segmentation into. Optional for local 98 execution -- a ``uint64`` numpy array is allocated and returned if omitted; **required** 99 for distributed execution. It must be wide enough to hold the seed ids (``uint64`` 100 recommended; a ``uint32`` watershed result writes losslessly into a ``uint64`` output). 101 halo: Per-axis halo enlarging each block; **required** for the block-wise path (there is no 102 principled default for a watershed). Choose it large enough to cover object extents at 103 block boundaries. Ignored by the direct (single-block) path. 104 block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required 105 for unchunked data. 106 job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``. 107 job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`). 108 num_workers: Number of parallel workers (threads for ``local``, tasks for distributed 109 backends). 110 mask: Optional binary mask; voxels outside the mask are excluded and stay ``0``. 111 block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks 112 into the existing ``output``). Mutually exclusive with ``resume_from``. 113 resume_from: Distributed only; the preserved temp folder of a failed run to resume (see 114 ``runner.run``); the missing blocks are written into ``output``. Mutually exclusive 115 with ``block_ids``. 116 117 Returns: 118 The output array (the provided ``output``, or a newly allocated ``uint64`` numpy array). 119 """ 120 check_rerun_args(job_type, resume_from, block_ids) 121 122 src = as_source(input) 123 seeds_src = as_source(seeds) 124 ndim = src.ndim 125 if tuple(seeds_src.shape) != tuple(src.shape): 126 raise ValueError( 127 f"seeds shape {seeds_src.shape} does not match input shape {src.shape}." 128 ) 129 130 # A subset/resume rerun is inherently block-wise, so it cannot use the direct (whole-array) path. 131 direct = (is_direct(job_type, num_workers, block_shape) 132 and block_ids is None and resume_from is None) 133 134 if output is None: 135 if job_type != "local": 136 raise ValueError( 137 f"'output' is required for distributed execution (job_type={job_type!r}); " 138 "pass a file-backed (zarr/n5) output array." 139 ) 140 out_array: SourceLike = np.zeros(tuple(src.shape), dtype="uint64") 141 else: 142 out_array = output 143 144 out = as_source(out_array) 145 if not np.issubdtype(out.dtype, np.integer): 146 raise ValueError(f"output must have an integer dtype, got {out.dtype}.") 147 148 if direct: 149 block_hmap = _as_hmap(src[full_roi(ndim)]) 150 block_seeds = _as_seeds(seeds_src[full_roi(ndim)]) 151 block_mask = as_source(mask)[full_roi(ndim)].astype(bool) if mask is not None else None 152 out[full_roi(ndim)] = bic.segmentation.watershed(block_hmap, block_seeds, mask=block_mask) 153 return out_array 154 155 if halo is None: 156 raise ValueError( 157 "halo is required for block-wise watershed; choose one large enough to cover object " 158 "extents at block boundaries." 159 ) 160 161 runner = get_runner(job_type, job_config) 162 runner.run(_watershed_block, [input, seeds], outputs=[out_array], halo=halo, 163 block_shape=block_shape, mask=mask, num_workers=num_workers, 164 block_ids=block_ids, resume_from=resume_from, name="watershed") 165 return out_array
Compute a seeded watershed over a height map, block-wise.
Each block runs a seeded watershed (bioimage_cpp.segmentation.watershed) on a halo-padded
region and writes back the halo-free inner block. The seeds define the segments and their
ids are preserved verbatim -- there is no cross-block merge, so pass globally consistent seeds
(e.g. a connected-component labeling of a seed mask) for a coherent result.
Being single-stage, this operation supports block_ids / resume_from re-runs. The
block-wise output matches a whole-array watershed only where halo covers the relevant
catchment basins, but is bit-identical across backends for a fixed (block_shape, halo).
Args:
input: The height map (a numpy/zarr/n5 array or a Source). It is cast to float32 for
the watershed if it is not already a float type.
seeds: The pre-computed integer seed markers (0 = background), same shape as input.
A non-integer seed map is cast to uint32.
output: The integer output array to write the segmentation into. Optional for local
execution -- a uint64 numpy array is allocated and returned if omitted; required
for distributed execution. It must be wide enough to hold the seed ids (uint64
recommended; a uint32 watershed result writes losslessly into a uint64 output).
halo: Per-axis halo enlarging each block; required for the block-wise path (there is no
principled default for a watershed). Choose it large enough to cover object extents at
block boundaries. Ignored by the direct (single-block) path.
block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
for unchunked data.
job_type: Execution backend: one of "local", "subprocess" or "slurm".
job_config: Backend configuration (a RunnerConfig / SlurmConfig).
num_workers: Number of parallel workers (threads for local, tasks for distributed
backends).
mask: Optional binary mask; voxels outside the mask are excluded and stay 0.
block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks
into the existing output). Mutually exclusive with resume_from.
resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
runner.run); the missing blocks are written into output. Mutually exclusive
with block_ids.
Returns:
The output array (the provided output, or a newly allocated uint64 numpy array).