bioimage_py.morphology.morphology

Per-label morphology (size, center of mass, bounding box) via the runner's return channel.

This is a reduction operation: each block computes per-label sufficient statistics, and the main process merges them into one table (one row per label). It mirrors the stats ops (bioimage_py.stats.mean_and_std()) — the per-block statistics flow through runner.run(..., has_return_val=True) and the merge is pure numpy — so it behaves identically across the local / subprocess / slurm backends. No custom C++ (nifty) code is required.

View Source

  1"""Per-label morphology (size, center of mass, bounding box) via the runner's return channel.
  2
  3This is a reduction operation: each block computes per-label sufficient statistics, and the main
  4process merges them into one table (one row per label). It mirrors the ``stats`` ops
  5(:func:`bioimage_py.stats.mean_and_std`) — the per-block statistics flow through
  6``runner.run(..., has_return_val=True)`` and the merge is pure numpy — so it behaves identically across
  7the ``local`` / ``subprocess`` / ``slurm`` backends. No custom C++ (nifty) code is required.
  8"""
  9from __future__ import annotations
 10
 11from typing import Callable, List, Optional, Sequence, Tuple
 12
 13import numpy as np
 14import pandas as pd
 15
 16from ..runner import get_runner
 17from ..runner.config import RunnerConfig
 18from ..sources import Source, SourceLike, as_source
 19from ..util import BlockDescriptor, check_direct, check_rerun_args, full_roi, to_roi
 20
 21__all__ = ["morphology"]
 22
 23
 24def _axis_names(ndim: int) -> List[str]:
 25    """Return per-axis names: ``z``/``y``/``x`` for 2D/3D, else ``axis0`` ... ``axis{ndim-1}``."""
 26    if ndim == 2:
 27        return ["y", "x"]
 28    if ndim == 3:
 29        return ["z", "y", "x"]
 30    return [f"axis{a}" for a in range(ndim)]
 31
 32
 33def _block_table(seg: np.ndarray, offset: Sequence[int]) -> Optional[np.ndarray]:
 34    """Compute the per-label morphology table for one block.
 35
 36    A single sort of the flattened labels feeds every statistic (count, weighted coordinate sum and
 37    bounding box), which is markedly faster than ``np.unique`` + ``bincount`` (those pay for a second
 38    sort to group the bounding-box coordinates).
 39
 40    Args:
 41        seg: The integer label block. ``0`` is treated as background and dropped.
 42        offset: The block's begin coordinate, added to make coordinates global.
 43
 44    Returns:
 45        A ``(K, 2 + 3 * ndim)`` float64 array with columns
 46        ``[label, size, wsum_axis..., bb_min_axis..., bb_max_axis...]`` where ``wsum`` is the weighted
 47        coordinate sum (``size * com``) and ``bb_max`` is the inclusive max coordinate. ``None`` if the
 48        block contains no foreground.
 49    """
 50    ndim = seg.ndim
 51    flat = seg.ravel()
 52    order = np.argsort(flat)  # default quicksort: intra-group order is irrelevant to sum/min/max/count.
 53    sl = flat[order]
 54    starts = np.flatnonzero(np.concatenate(([True], sl[1:] != sl[:-1])))
 55    uniq = sl[starts]
 56    size = np.diff(np.append(starts, sl.size)).astype("float64")
 57
 58    table = np.empty((uniq.shape[0], 2 + 3 * ndim), dtype="float64")
 59    table[:, 0] = uniq
 60    table[:, 1] = size
 61    for a in range(ndim):
 62        reshape = [1] * ndim
 63        reshape[a] = seg.shape[a]
 64        coord = np.broadcast_to(
 65            (np.arange(seg.shape[a], dtype="float64") + offset[a]).reshape(reshape), seg.shape
 66        ).ravel()[order]
 67        table[:, 2 + a] = np.add.reduceat(coord, starts)
 68        table[:, 2 + ndim + a] = np.minimum.reduceat(coord, starts)
 69        table[:, 2 + 2 * ndim + a] = np.maximum.reduceat(coord, starts)
 70
 71    table = table[table[:, 0] != 0]  # drop background (label 0); uniq is sorted, so it is row 0 if present.
 72    return table if table.shape[0] else None
 73
 74
 75def _merge_tables(tables: List[np.ndarray], ndim: int) -> Tuple[np.ndarray, ...]:
 76    """Merge per-block tables into per-label statistics.
 77
 78    Uses the same single-sort + ``reduceat`` scheme as :func:`_block_table`: group the stacked rows by
 79    label, sum the size and weighted-coordinate columns, and take the min/max of the bounding-box
 80    columns. The center of mass is then ``weighted_sum / size`` per axis.
 81
 82    Args:
 83        tables: The non-``None`` per-block tables (``(_, 2 + 3 * ndim)`` arrays).
 84        ndim: The number of spatial dimensions.
 85
 86    Returns:
 87        A tuple ``(labels, size, com, bb_min, bb_max)`` with shapes ``(K,)``, ``(K,)``, ``(K, ndim)``,
 88        ``(K, ndim)``, ``(K, ndim)``. ``bb_max`` is the inclusive max coordinate.
 89    """
 90    if not tables:
 91        return (np.zeros((0,), "float64"), np.zeros((0,), "float64"),
 92                np.zeros((0, ndim), "float64"), np.zeros((0, ndim), "float64"),
 93                np.zeros((0, ndim), "float64"))
 94    stacked = np.vstack(tables)
 95    order = np.argsort(stacked[:, 0])
 96    stacked = stacked[order]
 97    labels = stacked[:, 0]
 98    starts = np.flatnonzero(np.concatenate(([True], labels[1:] != labels[:-1])))
 99
100    uniq = labels[starts]
101    summed = np.add.reduceat(stacked[:, 1:2 + ndim], starts, axis=0)  # [size, wsum_axis...]
102    size = summed[:, 0]
103    com = summed[:, 1:] / size[:, None]
104    bb_min = np.minimum.reduceat(stacked[:, 2 + ndim:2 + 2 * ndim], starts, axis=0)
105    bb_max = np.maximum.reduceat(stacked[:, 2 + 2 * ndim:2 + 3 * ndim], starts, axis=0)
106    return uniq, size, com, bb_min, bb_max
107
108
109def _to_dataframe(labels: np.ndarray, size: np.ndarray, com: np.ndarray,
110                  bb_min: np.ndarray, bb_max: np.ndarray, ndim: int) -> "pd.DataFrame":
111    """Assemble the per-label statistics into a sorted pandas DataFrame.
112
113    ``bb_max`` is converted to an exclusive slice stop (``max_coordinate + 1``) so the bounding box of a
114    row is ``tuple(slice(row.bb_min_<ax>, row.bb_max_<ax>) for ax in axes)``.
115    """
116    axes = _axis_names(ndim)
117    columns = {"label": labels.astype("uint64"), "size": size.astype("int64")}
118    for a, ax in enumerate(axes):
119        columns[f"com_{ax}"] = com[:, a].astype("float64")
120    for a, ax in enumerate(axes):
121        columns[f"bb_min_{ax}"] = bb_min[:, a].astype("int64")
122    for a, ax in enumerate(axes):
123        columns[f"bb_max_{ax}"] = (bb_max[:, a] + 1).astype("int64")
124    return pd.DataFrame(columns).sort_values("label").reset_index(drop=True)
125
126
127def _morphology_block(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source],
128                      mask: Optional[Source]) -> Optional[np.ndarray]:
129    """Per-block morphology table (``None`` if the block has no foreground)."""
130    roi = to_roi(block)
131    seg = inputs[0][roi]
132    if mask is not None:
133        block_mask = mask[roi].astype(bool)
134        if not block_mask.any():
135            return None
136        seg = np.where(block_mask, seg, 0)  # Masked-out voxels fall into the dropped background.
137    return _block_table(seg, list(block.begin))
138
139
140def morphology(
141    input: SourceLike,
142    num_workers: int = 1,
143    block_shape: Optional[Tuple[int, ...]] = None,
144    job_type: str = "local",
145    job_config: Optional[RunnerConfig] = None,
146    mask: Optional[SourceLike] = None,
147    block_ids: Optional[Sequence[int]] = None,
148    resume_from: Optional[str] = None,
149    pre_cleanup: Optional[Callable[[str], None]] = None,
150) -> "pd.DataFrame":
151    """Compute per-label morphology (size, center of mass, bounding box) of a labeled volume.
152
153    Statistics are computed block-wise and merged so the result is exact regardless of how labels
154    straddle block boundaries. The background label ``0`` is excluded.
155
156    Args:
157        input: The input label image (a numpy/zarr/n5 array or a `Source`); must be integer-typed.
158        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
159            backends).
160        block_shape: Shape of the processing blocks. Defaults to the input chunk shape;
161            required for unchunked data.
162        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
163        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
164        mask: Optional binary mask; voxels outside the mask are excluded from the computation.
165        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks);
166            the table then reflects only those blocks.
167        resume_from: Distributed only; the preserved temp folder of a failed run to resume and
168            merge (see ``runner.run``). The returned table then covers the full volume (the
169            already-completed blocks merged with the re-run ones). Mutually exclusive with
170            ``block_ids``.
171        pre_cleanup: Optional ``pre_cleanup(tmp_folder)`` callback invoked on the orchestrating
172            process with the job temp folder right before it is deleted (distributed backends only).
173            Use it to read out the per-task timing files under ``tmp_folder/timings/`` before cleanup.
174            Ignored for the ``local`` backend and for the direct (single-worker, unchunked) path, which
175            have no temp folder.
176
177    Returns:
178        A pandas DataFrame with one row per label, sorted by label, with columns ``label``, ``size``,
179        ``com_<axis>`` (center of mass), ``bb_min_<axis>`` and ``bb_max_<axis>``. The bounding box is
180        slice-ready: ``bb_min`` is inclusive and ``bb_max`` is the exclusive stop, so the object's box
181        is ``tuple(slice(bb_min_<axis>, bb_max_<axis>) for axis in axes)``. Axis names are ``z``/``y``/
182        ``x`` for 2D/3D data and ``axis0`` ... otherwise.
183    """
184    check_rerun_args(job_type, resume_from, block_ids)
185    src = as_source(input)
186    if not np.issubdtype(np.dtype(src.dtype), np.integer):
187        raise ValueError(f"morphology expects an integer label image, got dtype {src.dtype}.")
188    ndim = src.ndim
189
190    if check_direct(job_type, num_workers, block_shape, mask, block_ids):
191        table = _block_table(src[full_roi(src.ndim)], [0] * ndim)
192        tables = [table] if table is not None else []
193    else:
194        runner = get_runner(job_type, job_config)
195        results = runner.run(_morphology_block, [input], num_workers=num_workers,
196                             block_shape=block_shape, mask=mask, block_ids=block_ids,
197                             resume_from=resume_from, has_return_val=True, name="morphology",
198                             pre_cleanup=pre_cleanup)
199        tables = [r for r in results if r is not None]
200
201    return _to_dataframe(*_merge_tables(tables, ndim), ndim)

def morphology( input: 'SourceLike', num_workers: int = 1, block_shape: Optional[Tuple[int, ...]] = None, job_type: str = 'local', job_config: Optional[bioimage_py.runner.RunnerConfig] = None, mask: 'Optional[SourceLike]' = None, block_ids: Optional[Sequence[int]] = None, resume_from: Optional[str] = None, pre_cleanup: Optional[Callable[[str], NoneType]] = None) -> pandas.DataFrame: View Source

141def morphology(
142    input: SourceLike,
143    num_workers: int = 1,
144    block_shape: Optional[Tuple[int, ...]] = None,
145    job_type: str = "local",
146    job_config: Optional[RunnerConfig] = None,
147    mask: Optional[SourceLike] = None,
148    block_ids: Optional[Sequence[int]] = None,
149    resume_from: Optional[str] = None,
150    pre_cleanup: Optional[Callable[[str], None]] = None,
151) -> "pd.DataFrame":
152    """Compute per-label morphology (size, center of mass, bounding box) of a labeled volume.
153
154    Statistics are computed block-wise and merged so the result is exact regardless of how labels
155    straddle block boundaries. The background label ``0`` is excluded.
156
157    Args:
158        input: The input label image (a numpy/zarr/n5 array or a `Source`); must be integer-typed.
159        num_workers: Number of parallel workers (threads for ``local``, tasks for distributed
160            backends).
161        block_shape: Shape of the processing blocks. Defaults to the input chunk shape;
162            required for unchunked data.
163        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
164        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
165        mask: Optional binary mask; voxels outside the mask are excluded from the computation.
166        block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks);
167            the table then reflects only those blocks.
168        resume_from: Distributed only; the preserved temp folder of a failed run to resume and
169            merge (see ``runner.run``). The returned table then covers the full volume (the
170            already-completed blocks merged with the re-run ones). Mutually exclusive with
171            ``block_ids``.
172        pre_cleanup: Optional ``pre_cleanup(tmp_folder)`` callback invoked on the orchestrating
173            process with the job temp folder right before it is deleted (distributed backends only).
174            Use it to read out the per-task timing files under ``tmp_folder/timings/`` before cleanup.
175            Ignored for the ``local`` backend and for the direct (single-worker, unchunked) path, which
176            have no temp folder.
177
178    Returns:
179        A pandas DataFrame with one row per label, sorted by label, with columns ``label``, ``size``,
180        ``com_<axis>`` (center of mass), ``bb_min_<axis>`` and ``bb_max_<axis>``. The bounding box is
181        slice-ready: ``bb_min`` is inclusive and ``bb_max`` is the exclusive stop, so the object's box
182        is ``tuple(slice(bb_min_<axis>, bb_max_<axis>) for axis in axes)``. Axis names are ``z``/``y``/
183        ``x`` for 2D/3D data and ``axis0`` ... otherwise.
184    """
185    check_rerun_args(job_type, resume_from, block_ids)
186    src = as_source(input)
187    if not np.issubdtype(np.dtype(src.dtype), np.integer):
188        raise ValueError(f"morphology expects an integer label image, got dtype {src.dtype}.")
189    ndim = src.ndim
190
191    if check_direct(job_type, num_workers, block_shape, mask, block_ids):
192        table = _block_table(src[full_roi(src.ndim)], [0] * ndim)
193        tables = [table] if table is not None else []
194    else:
195        runner = get_runner(job_type, job_config)
196        results = runner.run(_morphology_block, [input], num_workers=num_workers,
197                             block_shape=block_shape, mask=mask, block_ids=block_ids,
198                             resume_from=resume_from, has_return_val=True, name="morphology",
199                             pre_cleanup=pre_cleanup)
200        tables = [r for r in results if r is not None]
201
202    return _to_dataframe(*_merge_tables(tables, ndim), ndim)

Compute per-label morphology (size, center of mass, bounding box) of a labeled volume.

Statistics are computed block-wise and merged so the result is exact regardless of how labels straddle block boundaries. The background label 0 is excluded.

Args: input: The input label image (a numpy/zarr/n5 array or a Source); must be integer-typed. num_workers: Number of parallel workers (threads for local, tasks for distributed backends). block_shape: Shape of the processing blocks. Defaults to the input chunk shape; required for unchunked data. job_type: Execution backend: one of "local", "subprocess" or "slurm". job_config: Backend configuration (a RunnerConfig / SlurmConfig). mask: Optional binary mask; voxels outside the mask are excluded from the computation. block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks); the table then reflects only those blocks. resume_from: Distributed only; the preserved temp folder of a failed run to resume and merge (see runner.run). The returned table then covers the full volume (the already-completed blocks merged with the re-run ones). Mutually exclusive with block_ids. pre_cleanup: Optional pre_cleanup(tmp_folder) callback invoked on the orchestrating process with the job temp folder right before it is deleted (distributed backends only). Use it to read out the per-task timing files under tmp_folder/timings/ before cleanup. Ignored for the local backend and for the direct (single-worker, unchunked) path, which have no temp folder.

Returns: A pandas DataFrame with one row per label, sorted by label, with columns label, size, com_<axis> (center of mass), bb_min_<axis> and bb_max_<axis>. The bounding box is slice-ready: bb_min is inclusive and bb_max is the exclusive stop, so the object's box is tuple(slice(bb_min_<axis>, bb_max_<axis>) for axis in axes). Axis names are z/y/ x for 2D/3D data and axis0 ... otherwise.