bioimage_py.morphology.morphology
Per-label morphology (size, center of mass, bounding box) via the runner's return channel.
This is a reduction operation: each block computes per-label sufficient statistics, and the main
process merges them into one table (one row per label). It mirrors the stats ops
(bioimage_py.stats.mean_and_std()) — the per-block statistics flow through
runner.run(..., has_return_val=True) and the merge is pure numpy — so it behaves identically across
the local / subprocess / slurm backends. No custom C++ (nifty) code is required.
1"""Per-label morphology (size, center of mass, bounding box) via the runner's return channel. 2 3This is a reduction operation: each block computes per-label sufficient statistics, and the main 4process merges them into one table (one row per label). It mirrors the ``stats`` ops 5(:func:`bioimage_py.stats.mean_and_std`) — the per-block statistics flow through 6``runner.run(..., has_return_val=True)`` and the merge is pure numpy — so it behaves identically across 7the ``local`` / ``subprocess`` / ``slurm`` backends. No custom C++ (nifty) code is required. 8""" 9from __future__ import annotations 10 11from typing import Callable, List, Optional, Sequence, Tuple 12 13import numpy as np 14import pandas as pd 15 16from ..runner import get_runner 17from ..runner.config import RunnerConfig 18from ..sources import Source, SourceLike, as_source 19from ..util import BlockDescriptor, check_direct, check_rerun_args, full_roi, to_roi 20 21__all__ = ["morphology"] 22 23 24def _axis_names(ndim: int) -> List[str]: 25 """Return per-axis names: ``z``/``y``/``x`` for 2D/3D, else ``axis0`` ... ``axis{ndim-1}``.""" 26 if ndim == 2: 27 return ["y", "x"] 28 if ndim == 3: 29 return ["z", "y", "x"] 30 return [f"axis{a}" for a in range(ndim)] 31 32 33def _block_table(seg: np.ndarray, offset: Sequence[int]) -> Optional[np.ndarray]: 34 """Compute the per-label morphology table for one block. 35 36 A single sort of the flattened labels feeds every statistic (count, weighted coordinate sum and 37 bounding box), which is markedly faster than ``np.unique`` + ``bincount`` (those pay for a second 38 sort to group the bounding-box coordinates). 39 40 Args: 41 seg: The integer label block. ``0`` is treated as background and dropped. 42 offset: The block's begin coordinate, added to make coordinates global. 43 44 Returns: 45 A ``(K, 2 + 3 * ndim)`` float64 array with columns 46 ``[label, size, wsum_axis..., bb_min_axis..., bb_max_axis...]`` where ``wsum`` is the weighted 47 coordinate sum (``size * com``) and ``bb_max`` is the inclusive max coordinate. ``None`` if the 48 block contains no foreground. 49 """ 50 ndim = seg.ndim 51 flat = seg.ravel() 52 order = np.argsort(flat) # default quicksort: intra-group order is irrelevant to sum/min/max/count. 53 sl = flat[order] 54 starts = np.flatnonzero(np.concatenate(([True], sl[1:] != sl[:-1]))) 55 uniq = sl[starts] 56 size = np.diff(np.append(starts, sl.size)).astype("float64") 57 58 table = np.empty((uniq.shape[0], 2 + 3 * ndim), dtype="float64") 59 table[:, 0] = uniq 60 table[:, 1] = size 61 for a in range(ndim): 62 reshape = [1] * ndim 63 reshape[a] = seg.shape[a] 64 coord = np.broadcast_to( 65 (np.arange(seg.shape[a], dtype="float64") + offset[a]).reshape(reshape), seg.shape 66 ).ravel()[order] 67 table[:, 2 + a] = np.add.reduceat(coord, starts) 68 table[:, 2 + ndim + a] = np.minimum.reduceat(coord, starts) 69 table[:, 2 + 2 * ndim + a] = np.maximum.reduceat(coord, starts) 70 71 table = table[table[:, 0] != 0] # drop background (label 0); uniq is sorted, so it is row 0 if present. 72 return table if table.shape[0] else None 73 74 75def _merge_tables(tables: List[np.ndarray], ndim: int) -> Tuple[np.ndarray, ...]: 76 """Merge per-block tables into per-label statistics. 77 78 Uses the same single-sort + ``reduceat`` scheme as :func:`_block_table`: group the stacked rows by 79 label, sum the size and weighted-coordinate columns, and take the min/max of the bounding-box 80 columns. The center of mass is then ``weighted_sum / size`` per axis. 81 82 Args: 83 tables: The non-``None`` per-block tables (``(_, 2 + 3 * ndim)`` arrays). 84 ndim: The number of spatial dimensions. 85 86 Returns: 87 A tuple ``(labels, size, com, bb_min, bb_max)`` with shapes ``(K,)``, ``(K,)``, ``(K, ndim)``, 88 ``(K, ndim)``, ``(K, ndim)``. ``bb_max`` is the inclusive max coordinate. 89 """ 90 if not tables: 91 return (np.zeros((0,), "float64"), np.zeros((0,), "float64"), 92 np.zeros((0, ndim), "float64"), np.zeros((0, ndim), "float64"), 93 np.zeros((0, ndim), "float64")) 94 stacked = np.vstack(tables) 95 order = np.argsort(stacked[:, 0]) 96 stacked = stacked[order] 97 labels = stacked[:, 0] 98 starts = np.flatnonzero(np.concatenate(([True], labels[1:] != labels[:-1]))) 99 100 uniq = labels[starts] 101 summed = np.add.reduceat(stacked[:, 1:2 + ndim], starts, axis=0) # [size, wsum_axis...] 102 size = summed[:, 0] 103 com = summed[:, 1:] / size[:, None] 104 bb_min = np.minimum.reduceat(stacked[:, 2 + ndim:2 + 2 * ndim], starts, axis=0) 105 bb_max = np.maximum.reduceat(stacked[:, 2 + 2 * ndim:2 + 3 * ndim], starts, axis=0) 106 return uniq, size, com, bb_min, bb_max 107 108 109def _to_dataframe(labels: np.ndarray, size: np.ndarray, com: np.ndarray, 110 bb_min: np.ndarray, bb_max: np.ndarray, ndim: int) -> "pd.DataFrame": 111 """Assemble the per-label statistics into a sorted pandas DataFrame. 112 113 ``bb_max`` is converted to an exclusive slice stop (``max_coordinate + 1``) so the bounding box of a 114 row is ``tuple(slice(row.bb_min_<ax>, row.bb_max_<ax>) for ax in axes)``. 115 """ 116 axes = _axis_names(ndim) 117 columns = {"label": labels.astype("uint64"), "size": size.astype("int64")} 118 for a, ax in enumerate(axes): 119 columns[f"com_{ax}"] = com[:, a].astype("float64") 120 for a, ax in enumerate(axes): 121 columns[f"bb_min_{ax}"] = bb_min[:, a].astype("int64") 122 for a, ax in enumerate(axes): 123 columns[f"bb_max_{ax}"] = (bb_max[:, a] + 1).astype("int64") 124 return pd.DataFrame(columns).sort_values("label").reset_index(drop=True) 125 126 127def _morphology_block(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source], 128 mask: Optional[Source]) -> Optional[np.ndarray]: 129 """Per-block morphology table (``None`` if the block has no foreground).""" 130 roi = to_roi(block) 131 seg = inputs[0][roi] 132 if mask is not None: 133 block_mask = mask[roi].astype(bool) 134 if not block_mask.any(): 135 return None 136 seg = np.where(block_mask, seg, 0) # Masked-out voxels fall into the dropped background. 137 return _block_table(seg, list(block.begin)) 138 139 140def morphology( 141 input: SourceLike, 142 num_workers: int = 1, 143 block_shape: Optional[Tuple[int, ...]] = None, 144 job_type: str = "local", 145 job_config: Optional[RunnerConfig] = None, 146 mask: Optional[SourceLike] = None, 147 block_ids: Optional[Sequence[int]] = None, 148 resume_from: Optional[str] = None, 149 pre_cleanup: Optional[Callable[[str], None]] = None, 150) -> "pd.DataFrame": 151 """Compute per-label morphology (size, center of mass, bounding box) of a labeled volume. 152 153 Statistics are computed block-wise and merged so the result is exact regardless of how labels 154 straddle block boundaries. The background label ``0`` is excluded. 155 156 Args: 157 input: The input label image (a numpy/zarr/n5 array or a `Source`); must be integer-typed. 158 num_workers: Number of parallel workers (threads for ``local``, tasks for distributed 159 backends). 160 block_shape: Shape of the processing blocks. Defaults to the input chunk shape; 161 required for unchunked data. 162 job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``. 163 job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`). 164 mask: Optional binary mask; voxels outside the mask are excluded from the computation. 165 block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks); 166 the table then reflects only those blocks. 167 resume_from: Distributed only; the preserved temp folder of a failed run to resume and 168 merge (see ``runner.run``). The returned table then covers the full volume (the 169 already-completed blocks merged with the re-run ones). Mutually exclusive with 170 ``block_ids``. 171 pre_cleanup: Optional ``pre_cleanup(tmp_folder)`` callback invoked on the orchestrating 172 process with the job temp folder right before it is deleted (distributed backends only). 173 Use it to read out the per-task timing files under ``tmp_folder/timings/`` before cleanup. 174 Ignored for the ``local`` backend and for the direct (single-worker, unchunked) path, which 175 have no temp folder. 176 177 Returns: 178 A pandas DataFrame with one row per label, sorted by label, with columns ``label``, ``size``, 179 ``com_<axis>`` (center of mass), ``bb_min_<axis>`` and ``bb_max_<axis>``. The bounding box is 180 slice-ready: ``bb_min`` is inclusive and ``bb_max`` is the exclusive stop, so the object's box 181 is ``tuple(slice(bb_min_<axis>, bb_max_<axis>) for axis in axes)``. Axis names are ``z``/``y``/ 182 ``x`` for 2D/3D data and ``axis0`` ... otherwise. 183 """ 184 check_rerun_args(job_type, resume_from, block_ids) 185 src = as_source(input) 186 if not np.issubdtype(np.dtype(src.dtype), np.integer): 187 raise ValueError(f"morphology expects an integer label image, got dtype {src.dtype}.") 188 ndim = src.ndim 189 190 if check_direct(job_type, num_workers, block_shape, mask, block_ids): 191 table = _block_table(src[full_roi(src.ndim)], [0] * ndim) 192 tables = [table] if table is not None else [] 193 else: 194 runner = get_runner(job_type, job_config) 195 results = runner.run(_morphology_block, [input], num_workers=num_workers, 196 block_shape=block_shape, mask=mask, block_ids=block_ids, 197 resume_from=resume_from, has_return_val=True, name="morphology", 198 pre_cleanup=pre_cleanup) 199 tables = [r for r in results if r is not None] 200 201 return _to_dataframe(*_merge_tables(tables, ndim), ndim)
141def morphology( 142 input: SourceLike, 143 num_workers: int = 1, 144 block_shape: Optional[Tuple[int, ...]] = None, 145 job_type: str = "local", 146 job_config: Optional[RunnerConfig] = None, 147 mask: Optional[SourceLike] = None, 148 block_ids: Optional[Sequence[int]] = None, 149 resume_from: Optional[str] = None, 150 pre_cleanup: Optional[Callable[[str], None]] = None, 151) -> "pd.DataFrame": 152 """Compute per-label morphology (size, center of mass, bounding box) of a labeled volume. 153 154 Statistics are computed block-wise and merged so the result is exact regardless of how labels 155 straddle block boundaries. The background label ``0`` is excluded. 156 157 Args: 158 input: The input label image (a numpy/zarr/n5 array or a `Source`); must be integer-typed. 159 num_workers: Number of parallel workers (threads for ``local``, tasks for distributed 160 backends). 161 block_shape: Shape of the processing blocks. Defaults to the input chunk shape; 162 required for unchunked data. 163 job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``. 164 job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`). 165 mask: Optional binary mask; voxels outside the mask are excluded from the computation. 166 block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks); 167 the table then reflects only those blocks. 168 resume_from: Distributed only; the preserved temp folder of a failed run to resume and 169 merge (see ``runner.run``). The returned table then covers the full volume (the 170 already-completed blocks merged with the re-run ones). Mutually exclusive with 171 ``block_ids``. 172 pre_cleanup: Optional ``pre_cleanup(tmp_folder)`` callback invoked on the orchestrating 173 process with the job temp folder right before it is deleted (distributed backends only). 174 Use it to read out the per-task timing files under ``tmp_folder/timings/`` before cleanup. 175 Ignored for the ``local`` backend and for the direct (single-worker, unchunked) path, which 176 have no temp folder. 177 178 Returns: 179 A pandas DataFrame with one row per label, sorted by label, with columns ``label``, ``size``, 180 ``com_<axis>`` (center of mass), ``bb_min_<axis>`` and ``bb_max_<axis>``. The bounding box is 181 slice-ready: ``bb_min`` is inclusive and ``bb_max`` is the exclusive stop, so the object's box 182 is ``tuple(slice(bb_min_<axis>, bb_max_<axis>) for axis in axes)``. Axis names are ``z``/``y``/ 183 ``x`` for 2D/3D data and ``axis0`` ... otherwise. 184 """ 185 check_rerun_args(job_type, resume_from, block_ids) 186 src = as_source(input) 187 if not np.issubdtype(np.dtype(src.dtype), np.integer): 188 raise ValueError(f"morphology expects an integer label image, got dtype {src.dtype}.") 189 ndim = src.ndim 190 191 if check_direct(job_type, num_workers, block_shape, mask, block_ids): 192 table = _block_table(src[full_roi(src.ndim)], [0] * ndim) 193 tables = [table] if table is not None else [] 194 else: 195 runner = get_runner(job_type, job_config) 196 results = runner.run(_morphology_block, [input], num_workers=num_workers, 197 block_shape=block_shape, mask=mask, block_ids=block_ids, 198 resume_from=resume_from, has_return_val=True, name="morphology", 199 pre_cleanup=pre_cleanup) 200 tables = [r for r in results if r is not None] 201 202 return _to_dataframe(*_merge_tables(tables, ndim), ndim)
Compute per-label morphology (size, center of mass, bounding box) of a labeled volume.
Statistics are computed block-wise and merged so the result is exact regardless of how labels
straddle block boundaries. The background label 0 is excluded.
Args:
input: The input label image (a numpy/zarr/n5 array or a Source); must be integer-typed.
num_workers: Number of parallel workers (threads for local, tasks for distributed
backends).
block_shape: Shape of the processing blocks. Defaults to the input chunk shape;
required for unchunked data.
job_type: Execution backend: one of "local", "subprocess" or "slurm".
job_config: Backend configuration (a RunnerConfig / SlurmConfig).
mask: Optional binary mask; voxels outside the mask are excluded from the computation.
block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks);
the table then reflects only those blocks.
resume_from: Distributed only; the preserved temp folder of a failed run to resume and
merge (see runner.run). The returned table then covers the full volume (the
already-completed blocks merged with the re-run ones). Mutually exclusive with
block_ids.
pre_cleanup: Optional pre_cleanup(tmp_folder) callback invoked on the orchestrating
process with the job temp folder right before it is deleted (distributed backends only).
Use it to read out the per-task timing files under tmp_folder/timings/ before cleanup.
Ignored for the local backend and for the direct (single-worker, unchunked) path, which
have no temp folder.
Returns:
A pandas DataFrame with one row per label, sorted by label, with columns label, size,
com_<axis> (center of mass), bb_min_<axis> and bb_max_<axis>. The bounding box is
slice-ready: bb_min is inclusive and bb_max is the exclusive stop, so the object's box
is tuple(slice(bb_min_<axis>, bb_max_<axis>) for axis in axes). Axis names are z/y/
x for 2D/3D data and axis0 ... otherwise.