bioimage_py.evaluation.contingency_table
Block-wise contingency table (sparse segmentation overlap) via the runner's return channel.
The contingency table counts, for every pair of labels (a, b), the number of pixels assigned to
a in the first segmentation and b in the second. It is the shared primitive underlying nearly
every segmentation-comparison metric (variation of information, rand index, cremi score, object VI,
object matching, symmetric best dice).
It is a reduction operation: each block computes its sparse overlap counts and the main process sums
them into one table. The overlap counts are additive across blocks with no halo, because the blocks
partition the volume disjointly. This mirrors the morphology / stats ops — the per-block tables
flow through runner.run(..., has_return_val=True) and the merge is pure numpy — so it behaves
identically across the local / subprocess / slurm backends. The per-block counting itself is
delegated to bioimage_cpp.utils.segmentation_overlap().
1"""Block-wise contingency table (sparse segmentation overlap) via the runner's return channel. 2 3The contingency table counts, for every pair of labels ``(a, b)``, the number of pixels assigned to 4``a`` in the first segmentation and ``b`` in the second. It is the shared primitive underlying nearly 5every segmentation-comparison metric (variation of information, rand index, cremi score, object VI, 6object matching, symmetric best dice). 7 8It is a reduction operation: each block computes its sparse overlap counts and the main process sums 9them into one table. The overlap counts are additive across blocks with no halo, because the blocks 10partition the volume disjointly. This mirrors the ``morphology`` / ``stats`` ops — the per-block tables 11flow through ``runner.run(..., has_return_val=True)`` and the merge is pure numpy — so it behaves 12identically across the ``local`` / ``subprocess`` / ``slurm`` backends. The per-block counting itself is 13delegated to :func:`bioimage_cpp.utils.segmentation_overlap`. 14""" 15from __future__ import annotations 16 17from dataclasses import dataclass 18from typing import Callable, Dict, List, Optional, Sequence, Tuple 19 20import numpy as np 21import bioimage_cpp as bic 22 23from ..runner import get_runner 24from ..runner.config import RunnerConfig 25from ..sources import Source, SourceLike, as_source 26from ..util import BlockDescriptor, check_direct, check_rerun_args, full_roi, to_roi 27 28__all__ = ["ContingencyTable", "contingency_table"] 29 30 31@dataclass(frozen=True) 32class ContingencyTable: 33 """The sparse contingency table between two segmentations. 34 35 All arrays use ``uint64`` (lossless for label ids and counts). Background (label ``0``) is kept; 36 ignoring it is a metric-level concern handled by the callers of this primitive. 37 38 Attributes: 39 pairs: The ``(N, 2)`` array of co-occurring label pairs ``[label_a, label_b]``, sorted 40 lexicographically by ``(label_a, label_b)``. 41 counts: The ``(N,)`` overlap count for each pair in `pairs`. 42 labels_a: The ``(Ka,)`` sorted unique labels present in the first segmentation. 43 sizes_a: The ``(Ka,)`` size (pixel count) of each label in `labels_a`. 44 labels_b: The ``(Kb,)`` sorted unique labels present in the second segmentation. 45 sizes_b: The ``(Kb,)`` size (pixel count) of each label in `labels_b`. 46 n_points: The total number of pixels counted (after any masking). 47 """ 48 49 pairs: np.ndarray 50 counts: np.ndarray 51 labels_a: np.ndarray 52 sizes_a: np.ndarray 53 labels_b: np.ndarray 54 sizes_b: np.ndarray 55 n_points: int 56 57 def as_dicts(self) -> Tuple[Dict[int, int], Dict[int, int]]: 58 """Return the per-label sizes as ``({label_a: size}, {label_b: size})`` dictionaries. 59 60 Returns: 61 A dictionary mapping each label in the first segmentation to its size, and the analogous 62 dictionary for the second segmentation. 63 """ 64 a_dict = {int(lab): int(cnt) for lab, cnt in zip(self.labels_a, self.sizes_a)} 65 b_dict = {int(lab): int(cnt) for lab, cnt in zip(self.labels_b, self.sizes_b)} 66 return a_dict, b_dict 67 68 def drop_ignore(self, ignore_a: Optional[Sequence[int]] = None, 69 ignore_b: Optional[Sequence[int]] = None) -> "ContingencyTable": 70 """Return a copy with the voxels of the given ignore labels removed. 71 72 A pair is dropped if its A-label is in ``ignore_a`` **or** its B-label is in ``ignore_b`` (the 73 marginal sizes and ``n_points`` are recomputed over what remains). This is equivalent to 74 excluding those voxels before counting. Passing ``None`` (or an empty sequence) for a side is a 75 no-op for that side; dropping every present label yields the empty table. 76 77 Args: 78 ignore_a: Labels to ignore in the first segmentation. 79 ignore_b: Labels to ignore in the second segmentation. 80 81 Returns: 82 A new `ContingencyTable` without the ignored voxels. 83 """ 84 if ignore_a is None and ignore_b is None: 85 return self 86 drop = np.zeros(self.pairs.shape[0], dtype=bool) 87 if ignore_a is not None: 88 drop |= np.isin(self.pairs[:, 0], np.asarray(list(ignore_a), dtype=self.pairs.dtype)) 89 if ignore_b is not None: 90 drop |= np.isin(self.pairs[:, 1], np.asarray(list(ignore_b), dtype=self.pairs.dtype)) 91 keep = ~drop 92 return _table_from_pairs(self.pairs[keep], self.counts[keep]) 93 94 95def _overlap_rows(a: np.ndarray, b: np.ndarray) -> Optional[np.ndarray]: 96 """Compute the sparse overlap counts of two equally-shaped label arrays. 97 98 Returns a ``(K, 3)`` uint64 array with columns ``[label_a, label_b, count]`` (plain rather than the 99 structured ``overlap_table`` so the merge can ``vstack`` / ``lexsort`` / ``reduceat`` directly), or 100 ``None`` if there is no overlap (e.g. empty inputs). 101 """ 102 table = bic.utils.segmentation_overlap(a, b).overlap_table() 103 if table.shape[0] == 0: 104 return None 105 return np.stack([table["label_a"], table["label_b"], table["count"]], axis=1).astype("uint64") 106 107 108def _merge_tables(tables: List[np.ndarray]) -> ContingencyTable: 109 """Merge per-block overlap rows into one :class:`ContingencyTable`. 110 111 Groups the stacked ``[label_a, label_b, count]`` rows by their ``(label_a, label_b)`` pair and sums 112 the counts (single ``lexsort`` + ``reduceat``, mirroring ``morphology._merge_tables``). Marginals are 113 derived from the merged pairs, since every pixel maps to exactly one pair. 114 115 Args: 116 tables: The non-``None`` per-block ``(_, 3)`` uint64 arrays. 117 118 Returns: 119 The merged contingency table. 120 """ 121 if not tables: 122 return _table_from_pairs(np.zeros((0, 2), "uint64"), np.zeros((0,), "uint64")) 123 124 stacked = np.vstack(tables) 125 order = np.lexsort((stacked[:, 1], stacked[:, 0])) # sort by label_a, then label_b. 126 stacked = stacked[order] 127 a, b = stacked[:, 0], stacked[:, 1] 128 starts = np.flatnonzero(np.concatenate(([True], (a[1:] != a[:-1]) | (b[1:] != b[:-1])))) 129 pairs = stacked[starts][:, :2].copy() 130 counts = np.add.reduceat(stacked[:, 2], starts) 131 return _table_from_pairs(pairs, counts) 132 133 134def _table_from_pairs(pairs: np.ndarray, counts: np.ndarray) -> ContingencyTable: 135 """Build a :class:`ContingencyTable` from merged, ``(a, b)``-sorted unique pairs and their counts. 136 137 Derives the per-label marginals (sum over the other label) and ``n_points`` from the pairs. The 138 caller must pass unique pairs whose first column is sorted ascending — true of 139 :func:`_merge_tables`'s output, and preserved by the boolean row-selection in 140 :meth:`ContingencyTable.drop_ignore`. 141 142 Args: 143 pairs: The ``(N, 2)`` uint64 ``[label_a, label_b]`` pairs (unique, sorted by ``(a, b)``). 144 counts: The ``(N,)`` uint64 overlap count for each pair. 145 146 Returns: 147 The contingency table (empty when ``pairs`` has no rows). 148 """ 149 if pairs.shape[0] == 0: 150 empty = np.zeros((0,), "uint64") 151 return ContingencyTable(np.zeros((0, 2), "uint64"), empty, empty, empty, empty, empty, 0) 152 153 # Marginal for a: pairs[:, 0] is already sorted ascending, so group it directly. 154 a_col = pairs[:, 0] 155 a_starts = np.flatnonzero(np.concatenate(([True], a_col[1:] != a_col[:-1]))) 156 labels_a = a_col[a_starts].copy() 157 sizes_a = np.add.reduceat(counts, a_starts) 158 159 # Marginal for b: re-sort by label_b (stable, to keep grouping deterministic), then group. 160 b_order = np.argsort(pairs[:, 1], kind="stable") 161 b_sorted = pairs[:, 1][b_order] 162 b_counts = counts[b_order] 163 b_starts = np.flatnonzero(np.concatenate(([True], b_sorted[1:] != b_sorted[:-1]))) 164 labels_b = b_sorted[b_starts].copy() 165 sizes_b = np.add.reduceat(b_counts, b_starts) 166 167 return ContingencyTable(pairs, counts, labels_a, sizes_a, labels_b, sizes_b, int(counts.sum())) 168 169 170def _contingency_block(block: BlockDescriptor, inputs: Sequence[Source], outputs: Sequence[Source], 171 mask: Optional[Source]) -> Optional[np.ndarray]: 172 """Per-block overlap rows (``None`` if the block is fully masked out or has no overlap).""" 173 roi = to_roi(block) 174 a = inputs[0][roi] 175 b = inputs[1][roi] 176 if mask is not None: 177 block_mask = mask[roi].astype(bool) 178 if not block_mask.any(): 179 return None 180 a, b = a[block_mask], b[block_mask] # 1D, same length; segmentation_overlap accepts this. 181 return _overlap_rows(a, b) 182 183 184def contingency_table( 185 seg_a: SourceLike, 186 seg_b: SourceLike, 187 num_workers: int = 1, 188 block_shape: Optional[Tuple[int, ...]] = None, 189 job_type: str = "local", 190 job_config: Optional[RunnerConfig] = None, 191 mask: Optional[SourceLike] = None, 192 block_ids: Optional[Sequence[int]] = None, 193 resume_from: Optional[str] = None, 194 pre_cleanup: Optional[Callable[[str], None]] = None, 195) -> ContingencyTable: 196 """Compute the contingency table (sparse overlap counts) between two segmentations. 197 198 The two segmentations are compared pixel-by-pixel and the overlap counts are accumulated 199 block-wise, so the result is exact regardless of how labels straddle block boundaries. The pairing 200 is symmetric: which input is the candidate and which is the ground truth is up to the caller. 201 Background (label ``0``) is included; ignoring labels is left to the metrics built on top of this. 202 203 Args: 204 seg_a: The first segmentation (a numpy/zarr/n5 array or a `Source`); must be integer-typed. 205 seg_b: The second segmentation; must be integer-typed and the same shape as `seg_a`. 206 num_workers: Number of parallel workers (threads for ``local``, tasks for distributed 207 backends). 208 block_shape: Shape of the processing blocks. Defaults to the input chunk shape; 209 required for unchunked data. 210 job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``. 211 job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`). 212 mask: Optional binary mask; pixels outside the mask are excluded from the counts. 213 block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks); 214 the table then reflects only those blocks. 215 resume_from: Distributed only; the preserved temp folder of a failed run to resume and 216 merge (see ``runner.run``). The returned table then covers the full volume (the 217 already-completed blocks merged with the re-run ones). Mutually exclusive with 218 ``block_ids``. 219 pre_cleanup: Optional ``pre_cleanup(tmp_folder)`` callback invoked on the orchestrating 220 process with the job temp folder right before it is deleted (distributed backends only). 221 Ignored for the ``local`` backend and for the direct (single-worker, unchunked) path. 222 223 Returns: 224 The merged `ContingencyTable`. 225 """ 226 check_rerun_args(job_type, resume_from, block_ids) 227 src_a = as_source(seg_a) 228 src_b = as_source(seg_b) 229 for name, src in (("seg_a", src_a), ("seg_b", src_b)): 230 if not np.issubdtype(np.dtype(src.dtype), np.integer): 231 raise ValueError(f"contingency_table expects integer label images, got dtype {src.dtype} " 232 f"for {name}.") 233 234 if check_direct(job_type, num_workers, block_shape, mask, block_ids): 235 table = _overlap_rows(src_a[full_roi(src_a.ndim)], src_b[full_roi(src_b.ndim)]) 236 tables = [table] if table is not None else [] 237 else: 238 runner = get_runner(job_type, job_config) 239 results = runner.run(_contingency_block, [seg_a, seg_b], num_workers=num_workers, 240 block_shape=block_shape, mask=mask, block_ids=block_ids, 241 resume_from=resume_from, has_return_val=True, name="contingency_table", 242 pre_cleanup=pre_cleanup) 243 tables = [r for r in results if r is not None] 244 245 return _merge_tables(tables)
32@dataclass(frozen=True) 33class ContingencyTable: 34 """The sparse contingency table between two segmentations. 35 36 All arrays use ``uint64`` (lossless for label ids and counts). Background (label ``0``) is kept; 37 ignoring it is a metric-level concern handled by the callers of this primitive. 38 39 Attributes: 40 pairs: The ``(N, 2)`` array of co-occurring label pairs ``[label_a, label_b]``, sorted 41 lexicographically by ``(label_a, label_b)``. 42 counts: The ``(N,)`` overlap count for each pair in `pairs`. 43 labels_a: The ``(Ka,)`` sorted unique labels present in the first segmentation. 44 sizes_a: The ``(Ka,)`` size (pixel count) of each label in `labels_a`. 45 labels_b: The ``(Kb,)`` sorted unique labels present in the second segmentation. 46 sizes_b: The ``(Kb,)`` size (pixel count) of each label in `labels_b`. 47 n_points: The total number of pixels counted (after any masking). 48 """ 49 50 pairs: np.ndarray 51 counts: np.ndarray 52 labels_a: np.ndarray 53 sizes_a: np.ndarray 54 labels_b: np.ndarray 55 sizes_b: np.ndarray 56 n_points: int 57 58 def as_dicts(self) -> Tuple[Dict[int, int], Dict[int, int]]: 59 """Return the per-label sizes as ``({label_a: size}, {label_b: size})`` dictionaries. 60 61 Returns: 62 A dictionary mapping each label in the first segmentation to its size, and the analogous 63 dictionary for the second segmentation. 64 """ 65 a_dict = {int(lab): int(cnt) for lab, cnt in zip(self.labels_a, self.sizes_a)} 66 b_dict = {int(lab): int(cnt) for lab, cnt in zip(self.labels_b, self.sizes_b)} 67 return a_dict, b_dict 68 69 def drop_ignore(self, ignore_a: Optional[Sequence[int]] = None, 70 ignore_b: Optional[Sequence[int]] = None) -> "ContingencyTable": 71 """Return a copy with the voxels of the given ignore labels removed. 72 73 A pair is dropped if its A-label is in ``ignore_a`` **or** its B-label is in ``ignore_b`` (the 74 marginal sizes and ``n_points`` are recomputed over what remains). This is equivalent to 75 excluding those voxels before counting. Passing ``None`` (or an empty sequence) for a side is a 76 no-op for that side; dropping every present label yields the empty table. 77 78 Args: 79 ignore_a: Labels to ignore in the first segmentation. 80 ignore_b: Labels to ignore in the second segmentation. 81 82 Returns: 83 A new `ContingencyTable` without the ignored voxels. 84 """ 85 if ignore_a is None and ignore_b is None: 86 return self 87 drop = np.zeros(self.pairs.shape[0], dtype=bool) 88 if ignore_a is not None: 89 drop |= np.isin(self.pairs[:, 0], np.asarray(list(ignore_a), dtype=self.pairs.dtype)) 90 if ignore_b is not None: 91 drop |= np.isin(self.pairs[:, 1], np.asarray(list(ignore_b), dtype=self.pairs.dtype)) 92 keep = ~drop 93 return _table_from_pairs(self.pairs[keep], self.counts[keep])
The sparse contingency table between two segmentations.
All arrays use uint64 (lossless for label ids and counts). Background (label 0) is kept;
ignoring it is a metric-level concern handled by the callers of this primitive.
Attributes:
pairs: The (N, 2) array of co-occurring label pairs [label_a, label_b], sorted
lexicographically by (label_a, label_b).
counts: The (N,) overlap count for each pair in pairs.
labels_a: The (Ka,) sorted unique labels present in the first segmentation.
sizes_a: The (Ka,) size (pixel count) of each label in labels_a.
labels_b: The (Kb,) sorted unique labels present in the second segmentation.
sizes_b: The (Kb,) size (pixel count) of each label in labels_b.
n_points: The total number of pixels counted (after any masking).
58 def as_dicts(self) -> Tuple[Dict[int, int], Dict[int, int]]: 59 """Return the per-label sizes as ``({label_a: size}, {label_b: size})`` dictionaries. 60 61 Returns: 62 A dictionary mapping each label in the first segmentation to its size, and the analogous 63 dictionary for the second segmentation. 64 """ 65 a_dict = {int(lab): int(cnt) for lab, cnt in zip(self.labels_a, self.sizes_a)} 66 b_dict = {int(lab): int(cnt) for lab, cnt in zip(self.labels_b, self.sizes_b)} 67 return a_dict, b_dict
Return the per-label sizes as ({label_a: size}, {label_b: size}) dictionaries.
Returns: A dictionary mapping each label in the first segmentation to its size, and the analogous dictionary for the second segmentation.
69 def drop_ignore(self, ignore_a: Optional[Sequence[int]] = None, 70 ignore_b: Optional[Sequence[int]] = None) -> "ContingencyTable": 71 """Return a copy with the voxels of the given ignore labels removed. 72 73 A pair is dropped if its A-label is in ``ignore_a`` **or** its B-label is in ``ignore_b`` (the 74 marginal sizes and ``n_points`` are recomputed over what remains). This is equivalent to 75 excluding those voxels before counting. Passing ``None`` (or an empty sequence) for a side is a 76 no-op for that side; dropping every present label yields the empty table. 77 78 Args: 79 ignore_a: Labels to ignore in the first segmentation. 80 ignore_b: Labels to ignore in the second segmentation. 81 82 Returns: 83 A new `ContingencyTable` without the ignored voxels. 84 """ 85 if ignore_a is None and ignore_b is None: 86 return self 87 drop = np.zeros(self.pairs.shape[0], dtype=bool) 88 if ignore_a is not None: 89 drop |= np.isin(self.pairs[:, 0], np.asarray(list(ignore_a), dtype=self.pairs.dtype)) 90 if ignore_b is not None: 91 drop |= np.isin(self.pairs[:, 1], np.asarray(list(ignore_b), dtype=self.pairs.dtype)) 92 keep = ~drop 93 return _table_from_pairs(self.pairs[keep], self.counts[keep])
Return a copy with the voxels of the given ignore labels removed.
A pair is dropped if its A-label is in ignore_a or its B-label is in ignore_b (the
marginal sizes and n_points are recomputed over what remains). This is equivalent to
excluding those voxels before counting. Passing None (or an empty sequence) for a side is a
no-op for that side; dropping every present label yields the empty table.
Args: ignore_a: Labels to ignore in the first segmentation. ignore_b: Labels to ignore in the second segmentation.
Returns:
A new ContingencyTable without the ignored voxels.
185def contingency_table( 186 seg_a: SourceLike, 187 seg_b: SourceLike, 188 num_workers: int = 1, 189 block_shape: Optional[Tuple[int, ...]] = None, 190 job_type: str = "local", 191 job_config: Optional[RunnerConfig] = None, 192 mask: Optional[SourceLike] = None, 193 block_ids: Optional[Sequence[int]] = None, 194 resume_from: Optional[str] = None, 195 pre_cleanup: Optional[Callable[[str], None]] = None, 196) -> ContingencyTable: 197 """Compute the contingency table (sparse overlap counts) between two segmentations. 198 199 The two segmentations are compared pixel-by-pixel and the overlap counts are accumulated 200 block-wise, so the result is exact regardless of how labels straddle block boundaries. The pairing 201 is symmetric: which input is the candidate and which is the ground truth is up to the caller. 202 Background (label ``0``) is included; ignoring labels is left to the metrics built on top of this. 203 204 Args: 205 seg_a: The first segmentation (a numpy/zarr/n5 array or a `Source`); must be integer-typed. 206 seg_b: The second segmentation; must be integer-typed and the same shape as `seg_a`. 207 num_workers: Number of parallel workers (threads for ``local``, tasks for distributed 208 backends). 209 block_shape: Shape of the processing blocks. Defaults to the input chunk shape; 210 required for unchunked data. 211 job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``. 212 job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`). 213 mask: Optional binary mask; pixels outside the mask are excluded from the counts. 214 block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks); 215 the table then reflects only those blocks. 216 resume_from: Distributed only; the preserved temp folder of a failed run to resume and 217 merge (see ``runner.run``). The returned table then covers the full volume (the 218 already-completed blocks merged with the re-run ones). Mutually exclusive with 219 ``block_ids``. 220 pre_cleanup: Optional ``pre_cleanup(tmp_folder)`` callback invoked on the orchestrating 221 process with the job temp folder right before it is deleted (distributed backends only). 222 Ignored for the ``local`` backend and for the direct (single-worker, unchunked) path. 223 224 Returns: 225 The merged `ContingencyTable`. 226 """ 227 check_rerun_args(job_type, resume_from, block_ids) 228 src_a = as_source(seg_a) 229 src_b = as_source(seg_b) 230 for name, src in (("seg_a", src_a), ("seg_b", src_b)): 231 if not np.issubdtype(np.dtype(src.dtype), np.integer): 232 raise ValueError(f"contingency_table expects integer label images, got dtype {src.dtype} " 233 f"for {name}.") 234 235 if check_direct(job_type, num_workers, block_shape, mask, block_ids): 236 table = _overlap_rows(src_a[full_roi(src_a.ndim)], src_b[full_roi(src_b.ndim)]) 237 tables = [table] if table is not None else [] 238 else: 239 runner = get_runner(job_type, job_config) 240 results = runner.run(_contingency_block, [seg_a, seg_b], num_workers=num_workers, 241 block_shape=block_shape, mask=mask, block_ids=block_ids, 242 resume_from=resume_from, has_return_val=True, name="contingency_table", 243 pre_cleanup=pre_cleanup) 244 tables = [r for r in results if r is not None] 245 246 return _merge_tables(tables)
Compute the contingency table (sparse overlap counts) between two segmentations.
The two segmentations are compared pixel-by-pixel and the overlap counts are accumulated
block-wise, so the result is exact regardless of how labels straddle block boundaries. The pairing
is symmetric: which input is the candidate and which is the ground truth is up to the caller.
Background (label 0) is included; ignoring labels is left to the metrics built on top of this.
Args:
seg_a: The first segmentation (a numpy/zarr/n5 array or a Source); must be integer-typed.
seg_b: The second segmentation; must be integer-typed and the same shape as seg_a.
num_workers: Number of parallel workers (threads for local, tasks for distributed
backends).
block_shape: Shape of the processing blocks. Defaults to the input chunk shape;
required for unchunked data.
job_type: Execution backend: one of "local", "subprocess" or "slurm".
job_config: Backend configuration (a RunnerConfig / SlurmConfig).
mask: Optional binary mask; pixels outside the mask are excluded from the counts.
block_ids: Restrict processing to these block ids (e.g. to re-run previously failed blocks);
the table then reflects only those blocks.
resume_from: Distributed only; the preserved temp folder of a failed run to resume and
merge (see runner.run). The returned table then covers the full volume (the
already-completed blocks merged with the re-run ones). Mutually exclusive with
block_ids.
pre_cleanup: Optional pre_cleanup(tmp_folder) callback invoked on the orchestrating
process with the job temp folder right before it is deleted (distributed backends only).
Ignored for the local backend and for the direct (single-worker, unchunked) path.
Returns:
The merged ContingencyTable.