bioimage_py.segmentation.size_filter

Block-wise segmentation filtering: a generic per-block predicate and a size filter.

segmentation_filter is the generic, single-pass form: it applies a user filter_function (and optional relabel) to each block, so it supports block_ids / resume_from. Both callables are cloudpickled to the workers, so they must be picklable (capture only picklable values).
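
As a minimal sketch of the callable contract described above, the following builds a filter function that zeroes a fixed set of ids. The factory name make_drop_ids and the sample arrays are illustrative assumptions, not part of bioimage_py. The closure captures only a numpy array, so it should remain picklable.

```python
from typing import Optional, Sequence

import numpy as np


def make_drop_ids(drop_ids: Sequence[int]):
    """Hypothetical factory: return a filter callable that zeroes the given ids."""
    drop = np.asarray(drop_ids)  # a plain array is a picklable value to capture

    def filter_function(block_seg: np.ndarray, block_mask: Optional[np.ndarray]) -> np.ndarray:
        discard = np.isin(block_seg, drop)
        if block_mask is not None:
            discard &= block_mask  # restrict the criterion to in-mask voxels
        out = block_seg.copy()  # never modify the input block in place
        out[discard] = 0
        return out

    return filter_function


# Made-up sample block and mask.
block = np.array([[1, 1, 2], [3, 2, 2]])
mask = np.array([[True, True, False], [True, True, True]])

f = make_drop_ids([2])
no_mask = f(block, None)    # every voxel of id 2 is zeroed
with_mask = f(block, mask)  # the out-of-mask voxel of id 2 is kept
```

A callable of this shape could then be passed as filter_function to segmentation_filter.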

size_filter removes objects below min_size / above max_size. It is multi-stage (a global unique count reduction, then a filter pass via segmentation_filter), so it does not accept block_ids / resume_from.
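
The two stages can be illustrated on a small in-memory array with plain numpy. This is only a sketch of the idea under assumed sample data: the library runs the count reduction and the filter pass block-wise, which this example does not attempt to reproduce.

```python
import numpy as np

# Made-up label image: sizes are 0 -> 2, 1 -> 2, 2 -> 1, 3 -> 4 voxels.
seg = np.array([[1, 1, 2],
                [3, 3, 3],
                [3, 0, 0]])
min_size, max_size = 2, 3

# Stage 1: global reduction to the unique ids and their voxel counts.
ids, counts = np.unique(seg, return_counts=True)

# In-process: ids strictly below min_size or strictly above max_size are discarded.
discard = (counts < min_size) | (counts > max_size)
filter_ids = ids[discard]

# Stage 2: the filter pass sets the voxels of the discarded ids to 0.
filtered = np.where(np.isin(seg, filter_ids), 0, seg)
```

The second stage depends on the complete id list from the first, which is why the counts have to be gathered over the whole volume before any block is filtered.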

"""Block-wise segmentation filtering: a generic per-block predicate and a size filter.

``segmentation_filter`` is the generic, single-pass form: it applies a user ``filter_function`` (and
optional ``relabel``) to each block, so it supports ``block_ids`` / ``resume_from``. Both callables
are cloudpickled to the workers, so they must be picklable (capture only picklable values).

``size_filter`` removes objects below ``min_size`` / above ``max_size``. It is multi-stage (a global
``unique`` count reduction, then a filter pass via ``segmentation_filter``), so it does **not** accept
``block_ids`` / ``resume_from``.
"""
from __future__ import annotations

from typing import Callable, Dict, Optional, Sequence, Tuple

import bioimage_cpp as bic
import numpy as np

from ..runner import get_runner
from ..runner.config import RunnerConfig
from ..sources import Source, SourceLike, as_source
from ..stats.unique import unique
from ..util import BlockDescriptor, ComputeFn, check_rerun_args, full_roi, is_direct, same_array, to_roi

__all__ = ["segmentation_filter", "size_filter"]

# A per-block predicate/relabel callable: ``f(block_seg, block_mask) -> block_seg``. ``block_mask`` is
# the boolean in-mask array for the block (or ``None``); when given, only its in-mask voxels are used.
BlockFn = Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]


def _make_filter_block(filter_function: BlockFn, relabel: Optional[BlockFn]) -> ComputeFn:
    """Build the per-block function applying ``filter_function`` (and optional ``relabel``)."""

    def _compute(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source],
                 mask: Optional[Source]) -> None:
        input_, output_ = inputs[0], outputs[0]
        roi = to_roi(block)
        if mask is None:
            block_mask = None
        else:
            block_mask = mask[roi].astype(bool)
            if not block_mask.any():
                return None
        filtered = filter_function(input_[roi], block_mask)
        if relabel is not None:
            filtered = relabel(filtered, block_mask)
        if block_mask is None:
            output_[roi] = filtered
        else:  # keep out-of-mask voxels of the output unchanged.
            output_[roi] = np.where(block_mask, filtered, output_[roi])
        return None

    return _compute


def segmentation_filter(
    input: SourceLike,
    filter_function: BlockFn,
    output: Optional[SourceLike] = None,
    *,
    relabel: Optional[BlockFn] = None,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:
    """Filter a segmentation with a custom per-block criterion, block-wise.

    Args:
        input: The input segmentation (a numpy/zarr/n5 array or a `Source`).
        filter_function: A picklable callable ``filter_function(block_seg, block_mask)`` returning the
            filtered block. ``block_mask`` is the block's boolean in-mask array, or ``None`` when no
            mask is used; when a mask is used, restrict the criterion to the in-mask voxels.
        output: The output array to write into. Optional for local execution -- a numpy array
            matching the input shape and dtype is allocated and returned if omitted; **required** for
            distributed execution (a writable, file-backed zarr/n5 array).
        relabel: Optional picklable callable ``relabel(block_seg, block_mask)`` applied after
            ``filter_function`` (e.g. a consecutive relabeling); same masking contract.
        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
            for unchunked data.
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask; out-of-mask output voxels are left unchanged.
        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks).
            Mutually exclusive with ``resume_from``.
        resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
            ``runner.run``). Mutually exclusive with ``block_ids``.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    check_rerun_args(job_type, resume_from, block_ids)
    src = as_source(input)
    ndim = src.ndim
    direct = (is_direct(job_type, num_workers, block_shape) and mask is None
              and block_ids is None and resume_from is None)

    if output is None:
        if job_type != "local":
            raise ValueError(
                f"'output' is required for distributed execution (job_type={job_type!r}); "
                "pass a file-backed (zarr/n5) output array."
            )
        out_array: SourceLike = np.zeros(tuple(src.shape), dtype=src.dtype)
    else:
        out_array = output
    out = as_source(out_array)
    if not direct and same_array(out, src):
        raise ValueError("Block-wise segmentation_filter needs 'output' to differ from 'input'.")

    if direct:
        filtered = filter_function(src[full_roi(ndim)], None)
        if relabel is not None:
            filtered = relabel(filtered, None)
        out[full_roi(ndim)] = filtered
        return out_array

    runner = get_runner(job_type, job_config)
    runner.run(_make_filter_block(filter_function, relabel), [input], outputs=[out_array],
               block_shape=block_shape, mask=mask, num_workers=num_workers,
               block_ids=block_ids, resume_from=resume_from, name="segmentation_filter")
    return out_array


def _make_size_filter(filter_ids: np.ndarray) -> BlockFn:
    """Build the filter callable that sets voxels of the discarded ids to ``0``."""

    def filter_function(block_seg: np.ndarray, block_mask: Optional[np.ndarray]) -> np.ndarray:
        discard = np.isin(block_seg, filter_ids)
        if block_mask is not None:
            discard &= block_mask
        out = block_seg.copy()
        out[discard] = 0
        return out

    return filter_function


def _make_size_relabel(mapping: Dict[int, int]) -> BlockFn:
    """Build the relabel callable mapping surviving ids to consecutive values."""

    def relabel(block_seg: np.ndarray, block_mask: Optional[np.ndarray]) -> np.ndarray:
        if block_mask is None:
            return bic.utils.take_dict(mapping, block_seg)
        out = block_seg.copy()
        out[block_mask] = bic.utils.take_dict(mapping, block_seg[block_mask])
        return out

    return relabel


def size_filter(
    input: SourceLike,
    output: Optional[SourceLike] = None,
    *,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    relabel: bool = True,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = "local",
    job_config: Optional[RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
) -> SourceLike:
    """Remove objects smaller than ``min_size`` and/or larger than ``max_size`` from a segmentation.

    Multi-stage (a global ``unique`` count reduction, then a filter pass), so it does **not** accept
    ``block_ids`` / ``resume_from``. By default it relabels the result consecutively; pass
    ``relabel=False`` to keep the original ids of the surviving objects.

    Args:
        input: The input segmentation (a numpy/zarr/n5 array or a `Source`); must be integer-typed.
        output: The output array to write into. Optional for local execution -- a numpy array
            matching the input shape and dtype is allocated and returned if omitted; **required** for
            distributed execution (a writable, file-backed zarr/n5 array).
        min_size: The minimum object size; smaller objects are removed. At least one of ``min_size`` /
            ``max_size`` is required.
        max_size: The maximum object size; larger objects are removed.
        relabel: Whether to relabel the surviving objects consecutively after filtering.
        block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required
            for unchunked data. Required when a ``mask`` is given (the size reduction is block-wise).
        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
            backends).
        mask: Optional binary mask; out-of-mask output voxels are left unchanged.

    Returns:
        The output array (the provided ``output``, or a newly allocated numpy array).
    """
    if min_size is None and max_size is None:
        raise ValueError("size_filter requires at least one of 'min_size' or 'max_size'.")
    src = as_source(input)
    if not np.issubdtype(np.dtype(src.dtype), np.integer):
        raise ValueError(f"size_filter expects an integer label image, got dtype {src.dtype}.")

    # Pass 1: unique ids with their sizes.
    ids, counts = unique(input, return_counts=True, block_shape=block_shape, job_type=job_type,
                         job_config=job_config, num_workers=num_workers, mask=mask)

    # In-process: ids to discard and the consecutive relabeling of the survivors.
    discard = np.zeros(ids.shape, dtype=bool)
    if min_size is not None:
        discard |= counts < min_size
    if max_size is not None:
        discard |= counts > max_size
    filter_ids = ids[discard]

    relabel_fn: Optional[BlockFn] = None
    if relabel:
        # Reserve 0 for background and map the surviving foreground ids to 1..K consecutively, so a
        # surviving object can never collide with the (possibly newly introduced) background 0.
        remaining_fg = ids[(~discard) & (ids != 0)]
        mapping: Dict[int, int] = {int(v): i for i, v in enumerate(remaining_fg.tolist(), start=1)}
        mapping[0] = 0
        relabel_fn = _make_size_relabel(mapping)

    return segmentation_filter(input, _make_size_filter(filter_ids), output, relabel=relabel_fn,
                               block_shape=block_shape, job_type=job_type, job_config=job_config,
                               num_workers=num_workers, mask=mask)

def segmentation_filter(
    input: SourceLike,
    filter_function: Callable[[numpy.ndarray, Optional[numpy.ndarray]], numpy.ndarray],
    output: Optional[SourceLike] = None,
    *,
    relabel: Optional[Callable[[numpy.ndarray, Optional[numpy.ndarray]], numpy.ndarray]] = None,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = 'local',
    job_config: Optional[bioimage_py.runner.RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
    block_ids: Optional[Sequence[int]] = None,
    resume_from: Optional[str] = None,
) -> SourceLike:

Filter a segmentation with a custom per-block criterion, block-wise.

Args:
    input: The input segmentation (a numpy/zarr/n5 array or a Source).
    filter_function: A picklable callable filter_function(block_seg, block_mask) returning the
        filtered block. block_mask is the block's boolean in-mask array, or None when no mask is
        used; when a mask is used, restrict the criterion to the in-mask voxels.
    output: The output array to write into. Optional for local execution -- a numpy array matching
        the input shape and dtype is allocated and returned if omitted; required for distributed
        execution (a writable, file-backed zarr/n5 array).
    relabel: Optional picklable callable relabel(block_seg, block_mask) applied after
        filter_function (e.g. a consecutive relabeling); same masking contract.
    block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required for
        unchunked data.
    job_type: Execution backend: one of "local", "subprocess" or "slurm".
    job_config: Backend configuration (a RunnerConfig / SlurmConfig).
    num_workers: Number of parallel workers (threads for local, tasks for distributed backends).
    mask: Optional binary mask; out-of-mask output voxels are left unchanged.
    block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks).
        Mutually exclusive with resume_from.
    resume_from: Distributed only; the preserved temp folder of a failed run to resume (see
        runner.run). Mutually exclusive with block_ids.

Returns:
    The output array (the provided output, or a newly allocated numpy array).
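
The mask behaviour documented above ("out-of-mask output voxels are left unchanged") can be illustrated with plain numpy. This is a sketch of the per-block write only, assuming the np.where merge used in the module source; the arrays are made-up sample data.

```python
import numpy as np

# Values already present in the output block before the write.
existing = np.array([7, 7, 7, 7])
# Result of filter_function (and optional relabel) for the same block.
filtered = np.array([1, 0, 2, 0])
# Boolean in-mask array for the block.
block_mask = np.array([True, True, False, False])

# In-mask voxels take the filtered value; out-of-mask voxels keep the existing output value.
written = np.where(block_mask, filtered, existing)
```

Note that "unchanged" refers to the output array, not the input: per the module source, an output allocated by the function itself starts as zeros, so its out-of-mask voxels stay 0.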

def size_filter(
    input: SourceLike,
    output: Optional[SourceLike] = None,
    *,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    relabel: bool = True,
    block_shape: Optional[Tuple[int, ...]] = None,
    job_type: str = 'local',
    job_config: Optional[bioimage_py.runner.RunnerConfig] = None,
    num_workers: int = 1,
    mask: Optional[SourceLike] = None,
) -> SourceLike:

Remove objects smaller than min_size and/or larger than max_size from a segmentation.

Multi-stage (a global unique count reduction, then a filter pass), so it does not accept block_ids / resume_from. By default it relabels the result consecutively; pass relabel=False to keep the original ids of the surviving objects.

Args:
    input: The input segmentation (a numpy/zarr/n5 array or a Source); must be integer-typed.
    output: The output array to write into. Optional for local execution -- a numpy array matching
        the input shape and dtype is allocated and returned if omitted; required for distributed
        execution (a writable, file-backed zarr/n5 array).
    min_size: The minimum object size; smaller objects are removed. At least one of min_size /
        max_size is required.
    max_size: The maximum object size; larger objects are removed.
    relabel: Whether to relabel the surviving objects consecutively after filtering.
    block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required for
        unchunked data. Required when a mask is given (the size reduction is block-wise).
    job_type: Execution backend: one of "local", "subprocess" or "slurm".
    job_config: Backend configuration (a RunnerConfig / SlurmConfig).
    num_workers: Number of parallel workers (threads for local, tasks for distributed backends).
    mask: Optional binary mask; out-of-mask output voxels are left unchanged.

Returns:
    The output array (the provided output, or a newly allocated numpy array).
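
The consecutive relabeling applied when relabel=True can be sketched as follows. The mapping construction mirrors the module source; the sample ids are made up, and a plain list comprehension stands in for the bic.utils.take_dict lookup, whose behaviour is not documented on this page.

```python
import numpy as np

# Made-up result of the count reduction: id 4 was marked for removal.
ids = np.array([0, 4, 9, 15])
discard = np.array([False, True, False, False])

# Reserve 0 for background and number the surviving foreground ids 1..K.
remaining_fg = ids[(~discard) & (ids != 0)]
mapping = {int(v): i for i, v in enumerate(remaining_fg.tolist(), start=1)}
mapping[0] = 0

# A filtered block in which the voxels of id 4 have already been set to 0.
filtered = np.array([0, 0, 9, 15, 9])

# Stand-in for the dictionary lookup used by the library.
relabeled = np.array([mapping[int(v)] for v in filtered])
```

With relabel=False this step is skipped, so the surviving objects would keep their original ids (9 and 15 in this example).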