bioimage_py.segmentation.size_filter
Block-wise segmentation filtering: a generic per-block predicate and a size filter.
segmentation_filter is the generic, single-pass form: it applies a user filter_function (and
optional relabel) to each block, so it supports block_ids / resume_from. Both callables
are cloudpickled to the workers, so they must be picklable (capture only picklable values).
size_filter removes objects below min_size / above max_size. It is multi-stage (a global
unique count reduction, then a filter pass via segmentation_filter), so it does not accept
block_ids / resume_from.
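As a minimal sketch of the (block_seg, block_mask) contract described above, the following builds a filter callable from a module-level function plus functools.partial, so the only captured state is a NumPy array of ids. The names drop_ids and make_drop_filter are hypothetical and not part of bioimage_py; the sketch uses NumPy only and does not call the library.

```python
from functools import partial
from typing import Callable, Optional, Sequence

import numpy as np


def drop_ids(block_seg: np.ndarray, block_mask: Optional[np.ndarray], ids: np.ndarray) -> np.ndarray:
    """Set voxels whose label is in ``ids`` to 0 (hypothetical helper, not library code)."""
    discard = np.isin(block_seg, ids)
    if block_mask is not None:
        # Honour the masking contract: only in-mask voxels may be changed.
        discard &= block_mask
    out = block_seg.copy()  # never modify the input block in place
    out[discard] = 0
    return out


def make_drop_filter(ids: Sequence[int]) -> Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]:
    """Bind the ids to discard; the result has the ``f(block_seg, block_mask)`` shape."""
    return partial(drop_ids, ids=np.asarray(ids))


block = np.array([[1, 1, 2], [3, 2, 2]], dtype=np.uint32)
mask = np.array([[True, True, False], [True, True, True]])

drop_2_3 = make_drop_filter([2, 3])
no_mask_result = drop_2_3(block, None)  # all voxels labelled 2 or 3 become 0
masked_result = drop_2_3(block, mask)   # the out-of-mask voxel at [0, 2] keeps its label
```

A callable built this way has the shape that segmentation_filter expects for filter_function; whether it is the most convenient form for a given pipeline is a design choice, since cloudpickle can also serialize closures.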
"""Block-wise segmentation filtering: a generic per-block predicate and a size filter.

``segmentation_filter`` is the generic, single-pass form: it applies a user ``filter_function`` (and
optional ``relabel``) to each block, so it supports ``block_ids`` / ``resume_from``. Both callables
are cloudpickled to the workers, so they must be picklable (capture only picklable values).

``size_filter`` removes objects below ``min_size`` / above ``max_size``. It is multi-stage (a global
``unique`` count reduction, then a filter pass via ``segmentation_filter``), so it does **not** accept
``block_ids`` / ``resume_from``.
"""
from __future__ import annotations

from typing import Callable, Dict, Optional, Sequence, Tuple

import bioimage_cpp as bic
import numpy as np

from ..runner import get_runner
from ..runner.config import RunnerConfig
from ..sources import Source, SourceLike, as_source
from ..stats.unique import unique
from ..util import BlockDescriptor, ComputeFn, check_rerun_args, full_roi, is_direct, same_array, to_roi

__all__ = ["segmentation_filter", "size_filter"]

# A per-block predicate/relabel callable: ``f(block_seg, block_mask) -> block_seg``. ``block_mask`` is
# the boolean in-mask array for the block (or ``None``); when given, only its in-mask voxels are used.
BlockFn = Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]


def _make_filter_block(filter_function: BlockFn, relabel: Optional[BlockFn]) -> ComputeFn:
    """Build the per-block function applying ``filter_function`` (and optional ``relabel``)."""

    def _compute(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source],
                 mask: Optional[Source]) -> None:
        input_, output_ = inputs[0], outputs[0]
        roi = to_roi(block)
        if mask is None:
            block_mask = None
        else:
            block_mask = mask[roi].astype(bool)
            if not block_mask.any():
                return None
        filtered = filter_function(input_[roi], block_mask)
        if relabel is not None:
            filtered = relabel(filtered, block_mask)
        if block_mask is None:
            output_[roi] = filtered
        else:  # keep out-of-mask voxels of the output unchanged.
            output_[roi] = np.where(block_mask, filtered, output_[roi])
        return None

    return _compute


def segmentation_filter(
    input: SourceLike,
    filter_function: BlockFn,
    output: Optional[SourceLike] = None,
    *,
    relabel: Optional[BlockFn] = None,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
    """Filter a segmentation with a custom per-block criterion, block-wise.

    Args:
        input: The input segmentation (a numpy/zarr/n5 array or a `Source`).
        filter_function: A picklable callable ``filter_function(block_seg, block_mask)`` returning the
            filtered block. ``block_mask`` is the block's boolean in-mask array, or ``None`` when no
            mask is used; when a mask is used, restrict the criterion to the in-mask voxels.
        output: The output array to write into. Optional for local execution -- a numpy array
            matching the input shape and dtype is allocated and returned if omitted; **required** for
            distributed execution (a writable, file-backed zarr/n5 array).
        relabel: Optional picklable callable ``relabel(block_seg, block_mask)`` applied after
            ``filter_function`` (e.g. a consecutive relabeling); same masking contract.
        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
            for unchunked data.
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask; out-of-mask output voxels are left unchanged.
        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks).
            Mutually exclusive with ``resume_from``.
        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
            ``runner.run``). Mutually exclusive with ``block_ids``.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    check_rerun_args(job_type, resume_from, block_ids)
    src = as_source(input)
    ndim = src.ndim
    direct = (is_direct(job_type, num_workers, block_shape) and mask is None
              and block_ids is None and resume_from is None)

    if output is None:
        if job_type != "local":
            raise ValueError(
                f"'output' is required for distributed execution (job_type={job_type!r}); "
                "pass a file-backed (zarr/n5) output array."
            )
        out_array: SourceLike = np.zeros(tuple(src.shape), dtype=src.dtype)
    else:
        out_array = output
    out = as_source(out_array)
    if not direct and same_array(out, src):
        raise ValueError("Block-wise segmentation_filter needs 'output' to differ from 'input'.")

    if direct:
        filtered = filter_function(src[full_roi(ndim)], None)
        if relabel is not None:
            filtered = relabel(filtered, None)
        out[full_roi(ndim)] = filtered
        return out_array

    runner = get_runner(job_type, job_config)
    runner.run(_make_filter_block(filter_function, relabel), [input], outputs=[out_array],
               block_shape=block_shape, mask=mask, num_workers=num_workers,
               block_ids=block_ids, resume_from=resume_from, name="segmentation_filter")
    return out_array


def _make_size_filter(filter_ids: np.ndarray) -> BlockFn:
    """Build the filter callable that sets voxels of the discarded ids to ``0``."""

    def filter_function(block_seg: np.ndarray, block_mask: Optional[np.ndarray]) -> np.ndarray:
        discard = np.isin(block_seg, filter_ids)
        if block_mask is not None:
            discard &= block_mask
        out = block_seg.copy()
        out[discard] = 0
        return out

    return filter_function


def _make_size_relabel(mapping: Dict[int, int]) -> BlockFn:
    """Build the relabel callable mapping surviving ids to consecutive values."""

    def relabel(block_seg: np.ndarray, block_mask: Optional[np.ndarray]) -> np.ndarray:
        if block_mask is None:
            return bic.utils.take_dict(mapping, block_seg)
        out = block_seg.copy()
        out[block_mask] = bic.utils.take_dict(mapping, block_seg[block_mask])
        return out

    return relabel


def size_filter(
    input: SourceLike,
    output: Optional[SourceLike] = None,
    *,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    relabel: bool = True,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
) -> SourceLike:
    """Remove objects smaller than ``min_size`` and/or larger than ``max_size`` from a segmentation.

    Multi-stage (a global ``unique`` count reduction, then a filter pass), so it does **not** accept
    ``block_ids`` / ``resume_from``. By default it relabels the result consecutively; pass
    ``relabel=False`` to keep the original ids of the surviving objects.

    Args:
        input: The input segmentation (a numpy/zarr/n5 array or a `Source`); must be integer-typed.
        output: The output array to write into. Optional for local execution -- a numpy array
            matching the input shape and dtype is allocated and returned if omitted; **required** for
            distributed execution (a writable, file-backed zarr/n5 array).
        min_size: The minimum object size; smaller objects are removed. At least one of ``min_size`` /
            ``max_size`` is required.
        max_size: The maximum object size; larger objects are removed.
        relabel: Whether to relabel the surviving objects consecutively after filtering.
        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
            for unchunked data. Required when a ``mask`` is given (the size reduction is block-wise).
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask; out-of-mask output voxels are left unchanged.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    if min_size is None and max_size is None:
        raise ValueError("size_filter requires at least one of 'min_size' or 'max_size'.")
    src = as_source(input)
    if not np.issubdtype(np.dtype(src.dtype), np.integer):
        raise ValueError(f"size_filter expects an integer label image, got dtype {src.dtype}.")

    # Pass 1: unique ids with their sizes.
    ids, counts = unique(input, return_counts=True, block_shape=block_shape, job_type=job_type,
                         job_config=job_config, num_workers=num_workers, mask=mask)

    # In-process: ids to discard and the consecutive relabeling of the survivors.
    discard = np.zeros(ids.shape, dtype=bool)
    if min_size is not None:
        discard |= counts < min_size
    if max_size is not None:
        discard |= counts > max_size
    filter_ids = ids[discard]

    relabel_fn: Optional[BlockFn] = None
    if relabel:
        # Reserve 0 for background and map the surviving foreground ids to 1..K consecutively, so a
        # surviving object can never collide with the (possibly newly introduced) background 0.
        remaining_fg = ids[(~discard) & (ids != 0)]
        mapping: Dict[int, int] = {int(v): i for i, v in enumerate(remaining_fg.tolist(), start=1)}
        mapping[0] = 0
        relabel_fn = _make_size_relabel(mapping)

    return segmentation_filter(input, _make_size_filter(filter_ids), output, relabel=relabel_fn,
                               block_shape=block_shape, job_type=job_type, job_config=job_config,
                               num_workers=num_workers, mask=mask)
def segmentation_filter(
    input: SourceLike,
    filter_function: BlockFn,
    output: Optional[SourceLike] = None,
    *,
    relabel: Optional[BlockFn] = None,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
Filter a segmentation with a custom per-block criterion, block-wise.
Args:
    input: The input segmentation (a numpy/zarr/n5 array or a Source).
    filter_function: A picklable callable filter_function(block_seg, block_mask) returning the
        filtered block. block_mask is the block's boolean in-mask array, or None when no
        mask is used; when a mask is used, restrict the criterion to the in-mask voxels.
    output: The output array to write into. Optional for local execution -- a numpy array
        matching the input shape and dtype is allocated and returned if omitted; required for
        distributed execution (a writable, file-backed zarr/n5 array).
    relabel: Optional picklable callable relabel(block_seg, block_mask) applied after
        filter_function (e.g. a consecutive relabeling); same masking contract.
    block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
        for unchunked data.
    job_type: Execution backend: one of "local", "subprocess" or "slurm".
    job_config: Backend configuration (a RunnerConfig / SlurmConfig).
    num_workers: Number of parallel workers (threads for local, tasks for distributed
        backends).
    mask: Optional binary mask; out-of-mask output voxels are left unchanged.
    block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks).
        Mutually exclusive with resume_from.
    resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
        runner.run). Mutually exclusive with block_ids.

Returns:
    The output array (the provided output, or a newly allocated numpy array).
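The per-block behaviour documented above can be emulated in plain NumPy. The sketch below is not the library's runner: blockwise_filter and drop_odd_labels are hypothetical names, and the loop runs in-process over a regular block grid. It only illustrates the documented semantics: blocks with no in-mask voxels are skipped, relabel runs after filter_function, and out-of-mask output voxels are left as they were.

```python
from itertools import product
from typing import Callable, Optional, Tuple

import numpy as np

BlockFn = Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]


def blockwise_filter(seg: np.ndarray, filter_function: BlockFn, block_shape: Tuple[int, ...],
                     mask: Optional[np.ndarray] = None,
                     relabel: Optional[BlockFn] = None) -> np.ndarray:
    """Apply ``filter_function`` block by block (hypothetical in-process stand-in)."""
    out = np.zeros_like(seg)
    # Start coordinates of every block along each axis.
    starts = [range(0, size, step) for size, step in zip(seg.shape, block_shape)]
    for corner in product(*starts):
        # Clip the last block of each axis to the array bounds.
        roi = tuple(slice(c, min(c + b, s)) for c, b, s in zip(corner, block_shape, seg.shape))
        block_mask = None if mask is None else mask[roi].astype(bool)
        if block_mask is not None and not block_mask.any():
            continue  # nothing in-mask here: leave the output block untouched
        filtered = filter_function(seg[roi], block_mask)
        if relabel is not None:
            filtered = relabel(filtered, block_mask)
        if block_mask is None:
            out[roi] = filtered
        else:
            # Write only in-mask voxels; out-of-mask voxels keep their current value.
            out[roi] = np.where(block_mask, filtered, out[roi])
    return out


def drop_odd_labels(block_seg: np.ndarray, block_mask: Optional[np.ndarray]) -> np.ndarray:
    """Toy criterion: remove every object with an odd id (the write-back handles the mask)."""
    out = block_seg.copy()
    out[block_seg % 2 == 1] = 0
    return out


seg = np.arange(16, dtype=np.uint32).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2] = True  # only the first two rows are in-mask

unmasked = blockwise_filter(seg, drop_odd_labels, (2, 2))
masked = blockwise_filter(seg, drop_odd_labels, (2, 2), mask=mask)
```

Because the output here starts as zeros, the out-of-mask rows of masked stay zero; with a pre-filled output array they would keep whatever values were already stored there.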
def size_filter(
    input: SourceLike,
    output: Optional[SourceLike] = None,
    *,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    relabel: bool = True,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
) -> SourceLike:
Remove objects smaller than min_size and/or larger than max_size from a segmentation.
Multi-stage (a global unique count reduction, then a filter pass), so it does not accept
block_ids / resume_from. By default it relabels the result consecutively; pass
relabel=False to keep the original ids of the surviving objects.
Args:
    input: The input segmentation (a numpy/zarr/n5 array or a Source); must be integer-typed.
    output: The output array to write into. Optional for local execution -- a numpy array
        matching the input shape and dtype is allocated and returned if omitted; required for
        distributed execution (a writable, file-backed zarr/n5 array).
    min_size: The minimum object size; smaller objects are removed. At least one of min_size /
        max_size is required.
    max_size: The maximum object size; larger objects are removed.
    relabel: Whether to relabel the surviving objects consecutively after filtering.
    block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
        for unchunked data. Required when a mask is given (the size reduction is block-wise).
    job_type: Execution backend: one of "local", "subprocess" or "slurm".
    job_config: Backend configuration (a RunnerConfig / SlurmConfig).
    num_workers: Number of parallel workers (threads for local, tasks for distributed
        backends).
    mask: Optional binary mask; out-of-mask output voxels are left unchanged.

Returns:
    The output array (the provided output, or a newly allocated numpy array).
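The two stages described above (a global id/size table, then a filter pass with optional consecutive relabeling) can be sketched on an in-memory array with NumPy alone. This is an illustration under assumptions, not the library's implementation: size_filter_numpy is a hypothetical name, np.unique stands in for the block-wise unique reduction, and np.searchsorted replaces the dictionary lookup used for relabeling. Masks and distributed execution are left out.

```python
from typing import Optional

import numpy as np


def size_filter_numpy(seg: np.ndarray, min_size: Optional[int] = None,
                      max_size: Optional[int] = None, relabel: bool = True) -> np.ndarray:
    """Single-array sketch of a size filter (hypothetical, unmasked, in-memory)."""
    if min_size is None and max_size is None:
        raise ValueError("at least one of min_size / max_size is required")

    # Stage 1: every id with its voxel count (np.unique returns the ids sorted).
    ids, counts = np.unique(seg, return_counts=True)
    discard = np.zeros(ids.shape, dtype=bool)
    if min_size is not None:
        discard |= counts < min_size
    if max_size is not None:
        discard |= counts > max_size

    # Stage 2: set the voxels of the discarded ids to background 0.
    out = seg.copy()
    out[np.isin(seg, ids[discard])] = 0

    if relabel:
        # Reserve 0 for background; map surviving foreground ids to 1..K in sorted order.
        survivors = ids[(~discard) & (ids != 0)]
        fg = out != 0
        out[fg] = np.searchsorted(survivors, out[fg]) + 1
    return out


seg = np.array([[0, 0, 5, 5],
                [7, 0, 5, 9],
                [7, 7, 5, 9]], dtype=np.uint32)

# Sizes: id 5 has 4 voxels, id 7 has 3, id 9 has 2 (background 0 has 3).
relabeled = size_filter_numpy(seg, min_size=3)
kept_ids = size_filter_numpy(seg, min_size=3, max_size=3, relabel=False)
```

In relabeled, object 9 is removed and the survivors 5 and 7 become 1 and 2; in kept_ids, only object 7 survives and keeps its original id.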