bioimage_py.evaluation.centroid_matching

Object matching scores (precision / recall / segmentation accuracy / f1) by centroid distance.

This complements the overlap-based bioimage_py.evaluation.matching: instead of densifying the contingency table and scoring object pairs by IoU, it matches objects whose centroids lie within a distance threshold of one another. This is the natural criterion when objects are point-like (spots, nuclei, detections) or when voxel overlap is too strict. Like dice_score it is not built on the contingency table — it needs object centroids, not voxel overlaps.

There are two layers. coordinate_matching is the low-level form: it matches two point sets directly. centroid_matching is the high-level form: it derives each segmentation's per-object centroids (the center of mass, via bioimage_py.morphology.morphology, which runs block-wise / distributed) and then defers to coordinate_matching. Both find the optimal one-to-one assignment with the Hungarian algorithm (scipy.optimize.linear_sum_assignment) and reuse the precision / recall / f1 / segmentation-accuracy formulas of the overlap-based matching.

  1"""Object matching scores (precision / recall / segmentation accuracy / f1) by centroid distance.
  2
  3This complements the overlap-based :func:`bioimage_py.evaluation.matching`: instead of densifying the
  4contingency table and scoring object pairs by IoU, it matches objects whose centroids lie within a
  5distance threshold of one another. This is the natural criterion when objects are point-like (spots,
  6nuclei, detections) or when voxel overlap is too strict. Like ``dice_score`` it is *not* built on the
  7contingency table — it needs object centroids, not voxel overlaps.
  8
  9There are two layers. ``coordinate_matching`` is the low-level form: it matches two point sets directly.
 10``centroid_matching`` is the high-level form: it derives each segmentation's per-object centroids (the
 11center of mass, via :func:`bioimage_py.morphology.morphology`, which runs block-wise / distributed) and
 12then defers to ``coordinate_matching``. Both find the optimal one-to-one assignment with the Hungarian
 13algorithm (``scipy.optimize.linear_sum_assignment``) and reuse the precision / recall / f1 /
 14segmentation-accuracy formulas of the overlap-based matching.
 15"""
 16from __future__ import annotations
 17
 18from typing import Dict, Optional, Sequence, Tuple
 19
 20import numpy as np
 21from scipy.optimize import linear_sum_assignment
 22from scipy.spatial.distance import cdist
 23
 24from ..morphology import morphology
 25from ..runner.config import RunnerConfig
 26from ..sources import SourceLike
 27from .matching import _f1, _precision, _recall, _segmentation_accuracy
 28
 29__all__ = ["coordinate_matching", "centroid_matching"]
 30
 31_EPS = 1e-12
 32
 33
 34def _count_matches(distances: np.ndarray, distance_threshold: float) -> int:
 35    """The number of true positives: optimal one-to-one assignment of points within the threshold."""
 36    n_matched = min(distances.shape)
 37    valid = distances <= distance_threshold
 38    if n_matched == 0 or not valid.any():
 39        return 0
 40    # Prioritize within-threshold pairs; the distance is a tie-breaker normalized so each term is
 41    # <= 1 / (2 * n_matched) and can never outweigh the integer validity term (cf. matching._compute_tps).
 42    costs = -valid.astype("float64") + distances / (2 * n_matched * (float(distances.max()) + _EPS))
 43    row_ind, col_ind = linear_sum_assignment(costs)
 44    return int(np.count_nonzero(valid[row_ind, col_ind]))
 45
 46
 47def _as_coordinates(coordinates: object, name: str) -> np.ndarray:
 48    """Coerce ``coordinates`` to a ``(N, ndim)`` float64 array (an empty set becomes ``(0, 0)``)."""
 49    arr = np.asarray(coordinates, dtype="float64")
 50    if arr.size == 0:
 51        return arr.reshape(0, 0)
 52    if arr.ndim != 2:
 53        raise ValueError(f"{name} must have shape (N, ndim), got shape {arr.shape}.")
 54    return arr
 55
 56
 57def _centroids(
 58    segmentation: SourceLike,
 59    ignore_label: Optional[int],
 60    num_workers: int,
 61    block_shape: Optional[Tuple[int, ...]],
 62    job_type: str,
 63    job_config: Optional[RunnerConfig],
 64    mask: Optional[SourceLike],
 65) -> np.ndarray:
 66    """Per-object centers of mass of ``segmentation`` as a ``(N, ndim)`` voxel-coordinate array."""
 67    table = morphology(segmentation, num_workers=num_workers, block_shape=block_shape,
 68                       job_type=job_type, job_config=job_config, mask=mask)
 69    if ignore_label is not None:
 70        table = table[table["label"] != ignore_label]
 71    com_columns = [column for column in table.columns if column.startswith("com_")]
 72    return table[com_columns].to_numpy(dtype="float64")
 73
 74
 75def coordinate_matching(
 76    coordinates_a: object,
 77    coordinates_b: object,
 78    *,
 79    distance_threshold: float,
 80    resolution: Optional[Sequence[float]] = None,
 81) -> Dict[str, float]:
 82    """Match two point sets by centroid distance and compute the matching scores.
 83
 84    Each point in ``coordinates_a`` is matched to at most one point in ``coordinates_b`` (and vice
 85    versa) via the optimal one-to-one assignment, counting a pair as a match only if their Euclidean
 86    distance does not exceed ``distance_threshold``. ``coordinates_a`` plays the role of the prediction
 87    and ``coordinates_b`` the reference, so precision is computed over ``coordinates_a`` and recall over
 88    ``coordinates_b`` (matching the orientation of :func:`matching`).
 89
 90    Args:
 91        coordinates_a: The candidate points to evaluate, an array-like of shape ``(N, ndim)``.
 92        coordinates_b: The reference points, an array-like of shape ``(M, ndim)`` with the same
 93            ``ndim`` as ``coordinates_a``.
 94        distance_threshold: Maximum centroid distance (in the units of the coordinates, or of
 95            ``resolution`` if given) for two points to count as a match.
 96        resolution: Optional per-axis spacing; when given, coordinates are scaled by it before
 97            distances are computed, so ``distance_threshold`` is interpreted in physical units.
 98
 99    Returns:
100        A mapping with keys ``precision``, ``recall``, ``segmentation_accuracy`` and ``f1``.
101    """
102    coords_a = _as_coordinates(coordinates_a, "coordinates_a")
103    coords_b = _as_coordinates(coordinates_b, "coordinates_b")
104    n_a, n_b = coords_a.shape[0], coords_b.shape[0]
105
106    if n_a and n_b:
107        if coords_a.shape[1] != coords_b.shape[1]:
108            raise ValueError(f"coordinates_a and coordinates_b must have the same ndim, got "
109                             f"{coords_a.shape[1]} and {coords_b.shape[1]}.")
110        if resolution is not None:
111            spacing = np.asarray(resolution, dtype="float64")
112            if spacing.shape != (coords_a.shape[1],):
113                raise ValueError(f"resolution must have one entry per axis ({coords_a.shape[1]}), got "
114                                 f"shape {spacing.shape}.")
115            coords_a, coords_b = coords_a * spacing, coords_b * spacing
116        tp = _count_matches(cdist(coords_a, coords_b), distance_threshold)
117    else:
118        tp = 0
119
120    fp, fn = n_a - tp, n_b - tp
121    return {"precision": _precision(tp, fp, fn), "recall": _recall(tp, fp, fn),
122            "segmentation_accuracy": _segmentation_accuracy(tp, fp, fn), "f1": _f1(tp, fp, fn)}
123
124
125def centroid_matching(
126    segmentation: SourceLike,
127    groundtruth: SourceLike,
128    *,
129    distance_threshold: float,
130    resolution: Optional[Sequence[float]] = None,
131    ignore_label: Optional[int] = 0,
132    num_workers: int = 1,
133    block_shape: Optional[Tuple[int, ...]] = None,
134    job_type: str = "local",
135    job_config: Optional[RunnerConfig] = None,
136    mask: Optional[SourceLike] = None,
137) -> Dict[str, float]:
138    """Compute object-matching scores between two segmentations by centroid distance.
139
140    Derives each object's center of mass with :func:`bioimage_py.morphology.morphology` (the centroid
141    extraction runs block-wise / distributed via the runner arguments) and then matches the two centroid
142    sets with :func:`coordinate_matching`.
143
144    Args:
145        segmentation: Candidate segmentation to evaluate (a numpy/zarr/n5 array or a `Source`); must be
146            integer-typed.
147        groundtruth: The groundtruth segmentation; same shape as ``segmentation``.
148        distance_threshold: Maximum centroid distance (in voxels, or physical units if ``resolution``
149            is given) for two objects to count as a match.
150        resolution: Optional per-axis voxel spacing; when given, centroids are scaled by it before
151            distances are computed, so ``distance_threshold`` is interpreted in physical units.
152        ignore_label: Object label removed from both segmentations before matching (e.g. background).
153            ``None`` keeps all objects. Note ``morphology`` already excludes label ``0``.
154        num_workers: Number of parallel workers used to extract the centroids.
155        block_shape: Shape of the processing blocks. Defaults to the input chunk shape.
156        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
157        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
158        mask: Optional binary mask; voxels outside the mask are excluded.
159
160    Returns:
161        A mapping with keys ``precision``, ``recall``, ``segmentation_accuracy`` and ``f1``.
162    """
163    coords_a = _centroids(segmentation, ignore_label, num_workers, block_shape, job_type, job_config,
164                          mask)
165    coords_b = _centroids(groundtruth, ignore_label, num_workers, block_shape, job_type, job_config,
166                          mask)
167    return coordinate_matching(coords_a, coords_b, distance_threshold=distance_threshold,
168                               resolution=resolution)
def coordinate_matching( coordinates_a: object, coordinates_b: object, *, distance_threshold: float, resolution: Optional[Sequence[float]] = None) -> Dict[str, float]:
 76def coordinate_matching(
 77    coordinates_a: object,
 78    coordinates_b: object,
 79    *,
 80    distance_threshold: float,
 81    resolution: Optional[Sequence[float]] = None,
 82) -> Dict[str, float]:
 83    """Match two point sets by centroid distance and compute the matching scores.
 84
 85    Each point in ``coordinates_a`` is matched to at most one point in ``coordinates_b`` (and vice
 86    versa) via the optimal one-to-one assignment, counting a pair as a match only if their Euclidean
 87    distance does not exceed ``distance_threshold``. ``coordinates_a`` plays the role of the prediction
 88    and ``coordinates_b`` the reference, so precision is computed over ``coordinates_a`` and recall over
 89    ``coordinates_b`` (matching the orientation of :func:`matching`).
 90
 91    Args:
 92        coordinates_a: The candidate points to evaluate, an array-like of shape ``(N, ndim)``.
 93        coordinates_b: The reference points, an array-like of shape ``(M, ndim)`` with the same
 94            ``ndim`` as ``coordinates_a``.
 95        distance_threshold: Maximum centroid distance (in the units of the coordinates, or of
 96            ``resolution`` if given) for two points to count as a match.
 97        resolution: Optional per-axis spacing; when given, coordinates are scaled by it before
 98            distances are computed, so ``distance_threshold`` is interpreted in physical units.
 99
100    Returns:
101        A mapping with keys ``precision``, ``recall``, ``segmentation_accuracy`` and ``f1``.
102    """
103    coords_a = _as_coordinates(coordinates_a, "coordinates_a")
104    coords_b = _as_coordinates(coordinates_b, "coordinates_b")
105    n_a, n_b = coords_a.shape[0], coords_b.shape[0]
106
107    if n_a and n_b:
108        if coords_a.shape[1] != coords_b.shape[1]:
109            raise ValueError(f"coordinates_a and coordinates_b must have the same ndim, got "
110                             f"{coords_a.shape[1]} and {coords_b.shape[1]}.")
111        if resolution is not None:
112            spacing = np.asarray(resolution, dtype="float64")
113            if spacing.shape != (coords_a.shape[1],):
114                raise ValueError(f"resolution must have one entry per axis ({coords_a.shape[1]}), got "
115                                 f"shape {spacing.shape}.")
116            coords_a, coords_b = coords_a * spacing, coords_b * spacing
117        tp = _count_matches(cdist(coords_a, coords_b), distance_threshold)
118    else:
119        tp = 0
120
121    fp, fn = n_a - tp, n_b - tp
122    return {"precision": _precision(tp, fp, fn), "recall": _recall(tp, fp, fn),
123            "segmentation_accuracy": _segmentation_accuracy(tp, fp, fn), "f1": _f1(tp, fp, fn)}

Match two point sets by centroid distance and compute the matching scores.

Each point in coordinates_a is matched to at most one point in coordinates_b (and vice versa) via the optimal one-to-one assignment, counting a pair as a match only if their Euclidean distance does not exceed distance_threshold. coordinates_a plays the role of the prediction and coordinates_b the reference, so precision is computed over coordinates_a and recall over coordinates_b (matching the orientation of matching()).

Args: coordinates_a: The candidate points to evaluate, an array-like of shape (N, ndim). coordinates_b: The reference points, an array-like of shape (M, ndim) with the same ndim as coordinates_a. distance_threshold: Maximum centroid distance (in the units of the coordinates, or of resolution if given) for two points to count as a match. resolution: Optional per-axis spacing; when given, coordinates are scaled by it before distances are computed, so distance_threshold is interpreted in physical units.

Returns: A mapping with keys precision, recall, segmentation_accuracy and f1.

def centroid_matching( segmentation: 'SourceLike', groundtruth: 'SourceLike', *, distance_threshold: float, resolution: Optional[Sequence[float]] = None, ignore_label: Optional[int] = 0, num_workers: int = 1, block_shape: Optional[Tuple[int, ...]] = None, job_type: str = 'local', job_config: Optional[bioimage_py.runner.RunnerConfig] = None, mask: 'Optional[SourceLike]' = None) -> Dict[str, float]:
126def centroid_matching(
127    segmentation: SourceLike,
128    groundtruth: SourceLike,
129    *,
130    distance_threshold: float,
131    resolution: Optional[Sequence[float]] = None,
132    ignore_label: Optional[int] = 0,
133    num_workers: int = 1,
134    block_shape: Optional[Tuple[int, ...]] = None,
135    job_type: str = "local",
136    job_config: Optional[RunnerConfig] = None,
137    mask: Optional[SourceLike] = None,
138) -> Dict[str, float]:
139    """Compute object-matching scores between two segmentations by centroid distance.
140
141    Derives each object's center of mass with :func:`bioimage_py.morphology.morphology` (the centroid
142    extraction runs block-wise / distributed via the runner arguments) and then matches the two centroid
143    sets with :func:`coordinate_matching`.
144
145    Args:
146        segmentation: Candidate segmentation to evaluate (a numpy/zarr/n5 array or a `Source`); must be
147            integer-typed.
148        groundtruth: The groundtruth segmentation; same shape as ``segmentation``.
149        distance_threshold: Maximum centroid distance (in voxels, or physical units if ``resolution``
150            is given) for two objects to count as a match.
151        resolution: Optional per-axis voxel spacing; when given, centroids are scaled by it before
152            distances are computed, so ``distance_threshold`` is interpreted in physical units.
153        ignore_label: Object label removed from both segmentations before matching (e.g. background).
154            ``None`` keeps all objects. Note ``morphology`` already excludes label ``0``.
155        num_workers: Number of parallel workers used to extract the centroids.
156        block_shape: Shape of the processing blocks. Defaults to the input chunk shape.
157        job_type: Execution backend: one of ``"local"``, ``"subprocess"`` or ``"slurm"``.
158        job_config: Backend configuration (a `RunnerConfig` / `SlurmConfig`).
159        mask: Optional binary mask; voxels outside the mask are excluded.
160
161    Returns:
162        A mapping with keys ``precision``, ``recall``, ``segmentation_accuracy`` and ``f1``.
163    """
164    coords_a = _centroids(segmentation, ignore_label, num_workers, block_shape, job_type, job_config,
165                          mask)
166    coords_b = _centroids(groundtruth, ignore_label, num_workers, block_shape, job_type, job_config,
167                          mask)
168    return coordinate_matching(coords_a, coords_b, distance_threshold=distance_threshold,
169                               resolution=resolution)

Compute object-matching scores between two segmentations by centroid distance.

Derives each object's center of mass with bioimage_py.morphology.morphology (the centroid extraction runs block-wise / distributed via the runner arguments) and then matches the two centroid sets with coordinate_matching().

Args: segmentation: Candidate segmentation to evaluate (a numpy/zarr/n5 array or a Source); must be integer-typed. groundtruth: The groundtruth segmentation; same shape as segmentation. distance_threshold: Maximum centroid distance (in voxels, or physical units if resolution is given) for two objects to count as a match. resolution: Optional per-axis voxel spacing; when given, centroids are scaled by it before distances are computed, so distance_threshold is interpreted in physical units. ignore_label: Object label removed from both segmentations before matching (e.g. background). None keeps all objects. Note morphology already excludes label 0. num_workers: Number of parallel workers used to extract the centroids. block_shape: Shape of the processing blocks. Defaults to the input chunk shape. job_type: Execution backend: one of "local", "subprocess" or "slurm". job_config: Backend configuration (a RunnerConfig / SlurmConfig). mask: Optional binary mask; voxels outside the mask are excluded.

Returns: A mapping with keys precision, recall, segmentation_accuracy and f1.