segmentation-measurement

segmentation-measurement is a Python library for post-processing, measuring, and analyzing instance segmentations from microscopy images. It provides:

  • Post-processing: filter small segments, remove small holes, compute ring-masks around segments.
  • Intensity measurements: per-object mean, median, max, standard deviation, and percentile statistics.
  • Morphology measurements: per-object area/volume, perimeter/surface area, sphericity, solidity, axis lengths, and equivalent diameter; supports anisotropic pixel/voxel sizes.
  • Cell-nucleus measurements: per-cell nucleus count, cell-to-nucleus area/volume ratio, and optional cytoplasmic vs. nuclear intensity ratios from paired cell and nucleus segmentations.
  • Threshold analysis: categorize objects into named groups based on any measurement column using automatic or manual thresholds.
  • Clustering analysis: cluster objects using k-means, DBSCAN, HDBSCAN, or Mean Shift on any combination of measurement features, with an interactive 2-D feature-reduction scatter plot (UMAP, t-SNE, or PCA).
  • Classification analysis: train a random forest or logistic regression classifier from interactive napari brush annotations and apply it to all segments, with optional export of the trained classifier.
  • Batch processing across multiple segmentations: define named groups of layers in the napari plugin and run any measurement or analysis widget over every member of a group with a single click, with results written back per-layer.
  • Napari plugin: interactive widgets for all of the above, with table visualization and export to CSV, TSV, and Excel.
  • CLI: command-line interface for all functionality.

All functions support 2D and 3D inputs.

Installation

Install the core library with pip:

pip install segmentation-measurement

To also install the napari plugin and its dependencies:

pip install "segmentation-measurement[napari]"

To install from source:

git clone https://github.com/computational-cell-analytics/segmentation-measurement-tool
cd segmentation-measurement-tool
pip install -e .
# or with napari support:
pip install -e ".[napari]"

Requires Python 3.9 or later.

Napari Plugin

The segmentation-measurement napari plugin provides interactive widgets for post-processing segmentation label layers and for computing and exploring per-segment measurements – all without writing any Python code.

Installation

pip install "segmentation-measurement[napari]"

After installation the plugin is automatically discovered by napari.

Opening the Widgets

Open any widget from the napari menu:

Plugins → Segmentation Measurement → Postprocessing

Plugins → Segmentation Measurement → Intensity Measurement → Intensity Measurement

Plugins → Segmentation Measurement → Morphology Measurement → Morphology Measurement

Plugins → Segmentation Measurement → Threshold Analysis → Threshold Analysis

Plugins → Segmentation Measurement → Cell-Nucleus Measurement → Cell-Nucleus Measurement

Plugins → Segmentation Measurement → Clustering Analysis → Clustering Analysis

Plugins → Segmentation Measurement → Classification Analysis → Classification Analysis

Plugins → Segmentation Measurement → Table Manipulation → Table Manipulation

Plugins → Segmentation Measurement → Group Manager → Group Manager

All widgets appear as dockable panels that can be placed anywhere in the napari window.

Working with measurement tables

Measurement and analysis results are stored as the features table of the source Labels layer. napari ships a built-in Features Table dock (Layers → Visualize → Features table widget) that displays this table for the currently selected layer. Whenever a widget in this plugin writes to a layer's features (after a measurement, after applying a threshold/cluster/classifier, after loading or editing a table), the dock is opened automatically and the source layer is selected so the result is visible immediately. The dock supports sorting, in-place editing, copy/paste, CSV save, and bidirectional row ↔ viewer selection sync.

Working with groups (batch processing)

A group is a named bundle of layers that you batch over. Each group lists ordered layers under three roles:

  • segmentation (required, ≥1 layer) — the primary label layers to process
  • nucleus_segmentation (optional) — paired nucleus segmentations for the Cell-Nucleus widget
  • intensity_image (optional) — paired intensity images for the Intensity widget and (optionally) the Cell-Nucleus widget

Within a group, layers across roles are paired by position: segmentation[i] is matched with nucleus_segmentation[i] and intensity_image[i]. An optional role's list must either be empty or have exactly the same length as the segmentation list.

Groups are defined in the Group Manager Widget. Once defined, every measurement and analysis widget exposes a Target combo at the top with two options:

  • <single layer> (default) — original single-layer behaviour. Pick one segmentation (and optional pair partners) via the existing combos.
  • a group name — operate on the group's members. Measurement widgets iterate over members and write results into each member's features. Analysis widgets concatenate features across members for joint computation, then split results back to each member's features and emit one output label layer per member named {output}_{layer_name} (or just {output} for a single-member group).

Renaming a layer that is referenced by a group automatically updates the group definition.


Postprocessing Widget

The Postprocessing widget applies one of four post-processing operations to an existing label layer and writes the result to a new layer or back to the same layer.

Layout

┌─────────────────────────────────┐
│ Input segmentation: [combo]     │
│ Output name:        [combo]     │
│ Method:             [combo]     │
│ ┌ Parameters ──────────────┐   │
│ │  <method-specific param> │   │
│ └──────────────────────────┘   │
│ [Run]                           │
└─────────────────────────────────┘

Controls

Input segmentation : Dropdown list of all Labels layers currently loaded in napari. Select the layer you want to process.

Output name : Dropdown showing existing Labels layers plus the default entry postprocessed. You can also type a new name directly. If the chosen name matches an existing layer the data of that layer is updated in place; otherwise a new Labels layer is added. Setting the output name equal to the input name processes the input layer in place.

Method : Choose one of the four post-processing operations (see below). The Parameters panel updates immediately to show the relevant controls.

Run : Apply the selected method with the current parameters.

Methods

Filter Small Segments

Removes segments whose pixel (2-D) or voxel (3-D) count is below the threshold. Removed segments become background (0).

  • Min size – Minimum number of pixels/voxels a segment must have to be retained (default: 100).
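Conceptually this is a size filter on the label histogram. A minimal numpy sketch of the idea (illustrative only; `filter_small_segments` is a hypothetical name, not the library's API):

```python
import numpy as np

def filter_small_segments(seg: np.ndarray, min_size: int = 100) -> np.ndarray:
    """Set labels with fewer than `min_size` pixels/voxels to background (0)."""
    counts = np.bincount(seg.ravel())          # pixel/voxel count per label
    small = np.flatnonzero(counts < min_size)  # labels below the threshold
    out = seg.copy()
    out[np.isin(out, small)] = 0
    return out
```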

Remove Small Holes

Fills enclosed background holes within segments when the hole size does not exceed the threshold. Other segments are never overwritten.

  • Max hole size – Maximum hole size in pixels/voxels to fill (default: 50).
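The per-segment behaviour (fill a hole only if it is small enough and contains no other segment) can be sketched with scipy.ndimage; `fill_small_holes` is a hypothetical name, not the library's API:

```python
import numpy as np
from scipy import ndimage

def fill_small_holes(seg: np.ndarray, max_hole_size: int = 50) -> np.ndarray:
    """Fill enclosed background holes inside each segment without overwriting others."""
    out = seg.copy()
    for lbl in np.unique(seg):
        if lbl == 0:
            continue
        mask = seg == lbl
        filled = ndimage.binary_fill_holes(mask)
        holes, n_holes = ndimage.label(filled & ~mask)
        for h in range(1, n_holes + 1):
            hole = holes == h
            # fill only small holes that are still pure background
            if hole.sum() <= max_hole_size and np.all(seg[hole] == 0):
                out[hole] = lbl
    return out
```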

Ring Mask

Creates an annular ring of a fixed width around each segment. Rings are placed only on background pixels; overlapping rings resolve in favour of the smaller label ID. This is commonly used to create pseudo-cytoplasm masks around segmented nuclei.

  • Ring width – Width of the ring in pixels/voxels (default: 5).
  • Keep original – When checked (default), the original segment pixels are retained in the output alongside the ring pixels. Uncheck to produce a ring-only mask where the original segment interiors are set to 0.
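A close approximation can be built on skimage's expand_labels; note that expand_labels resolves overlapping rings in favour of the *nearest* label, whereas the widget favours the smaller label ID, so this sketch (with a hypothetical `ring_mask` name) differs in that corner case:

```python
import numpy as np
from skimage.segmentation import expand_labels

def ring_mask(seg, ring_width=5, keep_original=True):
    # grow each label by `ring_width` into the background
    expanded = expand_labels(seg, distance=ring_width)
    rings = np.where(seg == 0, expanded, 0)  # ring pixels only on background
    return np.where(seg > 0, seg, rings) if keep_original else rings
```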

Watershed

Refines a segmentation using the watershed algorithm. The selected input segmentation is used as seed markers; a separate heatmap image layer provides the topographic landscape. The algorithm floods from low heatmap values upward, so the heatmap should have low values inside objects and high values at segment boundaries. If your heatmap instead has high values at object centres (e.g. a distance transform or foreground-probability map), negate it before passing it to the widget.

  • Heatmap – Image layer used as the watershed landscape (low values flooded first).
  • Mask (optional) – Label layer whose footprint restricts processing. Pixels outside the mask are set to 0 in the output. Select None to process all pixels (default).
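This maps directly onto skimage.segmentation.watershed. In the sketch below, point seeds stand in for the input segmentation's markers, and the negated distance transform plays the role of the heatmap:

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

# One elongated foreground region to be split into two objects.
# Negating the distance transform gives low values at object centres
# and high values near the boundary, as the widget expects.
fg = np.zeros((20, 40), dtype=bool)
fg[5:15, 5:35] = True
dist = ndimage.distance_transform_edt(fg)
markers = np.zeros(fg.shape, dtype=int)
markers[10, 10] = 1  # seed for the left object
markers[10, 30] = 2  # seed for the right object
refined = watershed(-dist, markers=markers, mask=fg)  # pixels outside mask stay 0
```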

Intensity Measurement Widget

The Intensity Measurement widget computes per-segment intensity statistics from a label layer and a co-registered intensity image.

Layout

┌─────────────────────────────────┐
│ Target:          [combo]        │
│ Segmentation:    [combo]        │
│ Intensity image: [combo]        │
│ [Measure intensities]           │
└─────────────────────────────────┘

Workflow

  1. Pick a Target: <single layer> (default) for the original single-pair behaviour, or a previously defined group name to batch over its members. In group mode the Segmentation and Intensity image combos become read-only and show the first member; the group must define a non-empty intensity_image role.
  2. Select a Segmentation layer (Labels) from the first dropdown.
  3. Select an Intensity image layer (Image) from the second dropdown.
  4. Click Measure intensities.

In group mode the measurement is run on every (segmentation[i], intensity_image[i]) pair in turn, writing results into each segmentation layer's features. See the Working with groups section for the broader pattern.

The result is merged into the segmentation layer's features table and the napari Features Table dock is opened automatically with the segmentation layer selected. Re-running the measurement on the same layer silently overwrites the existing intensity columns; running a different measurement (e.g. Morphology) on the same layer adds its columns alongside.

The columns added to layer.features are:

Column Description
index Integer segment label ID
mean_intensity Mean pixel intensity
median_intensity Median pixel intensity
max_intensity Maximum pixel intensity
min_intensity Minimum pixel intensity
std_intensity Standard deviation
percentile_10 10th percentile
percentile_25 25th percentile (Q1)
percentile_75 75th percentile (Q3)
percentile_90 90th percentile
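The statistics above amount to a per-label reduction over the intensity image. A plain numpy/pandas sketch producing the same column names (`measure_intensities` is a hypothetical name, not the library's API):

```python
import numpy as np
import pandas as pd

def measure_intensities(seg, img):
    rows = []
    for lbl in np.unique(seg):
        if lbl == 0:
            continue
        vals = img[seg == lbl]  # intensities of this segment's pixels
        rows.append({
            "index": int(lbl),
            "mean_intensity": vals.mean(),
            "median_intensity": np.median(vals),
            "max_intensity": vals.max(),
            "min_intensity": vals.min(),
            "std_intensity": vals.std(),
            **{f"percentile_{p}": np.percentile(vals, p) for p in (10, 25, 75, 90)},
        })
    return pd.DataFrame(rows)
```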

Saving the table

Use the Save as CSV button in the napari Features Table dock to export the features. For TSV / XLSX export, use the Save table button in the Table Manipulation Widget below.


Morphology Measurement Widget

The Morphology Measurement widget computes per-segment shape descriptors from a label layer. Physical pixel/voxel sizes can be specified per axis to obtain measurements in real-world units.

Layout (scrollable)

┌─────────────────────────────────┐
│ Target:       [combo]           │
│ Segmentation: [combo]           │
│ ┌ Physical pixel/voxel size ─┐ │
│ │ Y: [spinbox]               │ │
│ │ X: [spinbox]               │ │
│ └────────────────────────────┘ │
│ [Measure morphology]            │
└─────────────────────────────────┘

For 3-D data a third spinbox Z is added automatically.

Workflow

  1. Pick a Target: <single layer> (default) for the original single-layer behaviour, or a previously defined group name to batch over its segmentation members. In group mode the Segmentation combo becomes read-only and shows the first member; the same scale settings apply to every member.
  2. Select a Segmentation layer (Labels) from the dropdown. The scale spinboxes are pre-populated from the layer's scale attribute if it has been set; otherwise they default to 1.0.
  3. Adjust the per-axis scale values if needed. For isotropic data a single value applies to all axes; for anisotropic data set each axis independently.
  4. Click Measure morphology.

The result is merged into the segmentation layer's features table and the napari Features Table dock is opened automatically with the segmentation layer selected. Re-running the measurement on the same layer silently overwrites the morphology columns; running a different measurement on the same layer adds its columns alongside.

The columns added to layer.features (one row per segment) are:

2-D columns

Column Description
index Integer segment label ID
area Area in physical units (px² or µm² etc.)
perimeter Perimeter length in physical units
sphericity Circularity: 1.0 for a perfect circle, <1 for elongated or irregular shapes
solidity Ratio of segment area to convex hull area
axis_major_length Length of the major axis of the fitted ellipse
axis_minor_length Length of the minor axis of the fitted ellipse
equivalent_diameter Diameter of a circle with the same area

3-D columns

Column Description
index Integer segment label ID
volume Volume in physical units (vx³ or µm³ etc.)
surface_area Surface area computed via marching cubes
sphericity 1.0 for a perfect sphere, <1 for elongated or irregular shapes
solidity Ratio of segment volume to convex hull volume
axis_major_length Length of the major axis of the fitted ellipsoid
axis_minor_length Length of the minor axis of the fitted ellipsoid
equivalent_diameter Diameter of a sphere with the same volume

Saving the table

Use the Save as CSV button in the napari Features Table dock, or the Save table button in the Table Manipulation Widget for TSV / XLSX output.


Threshold Analysis Widget

The Threshold Analysis widget categorizes segments into named groups based on one or more thresholds applied to any numeric column of the selected layer's features table.

Layout (scrollable)

┌──────────────────────────────────────┐
│ Target:       [combo]                │
│ Segmentation: [combo]                │
│ ┌ Column histogram ────────────────┐ │
│ │ Column: [combo]                  │ │
│ │  <histogram plot>                │ │
│ └──────────────────────────────────┘ │
│ ┌ Categorization ──────────────────┐ │
│ │ Number of categories: [spin]     │ │
│ │ Threshold 1: [spin]  ...         │ │
│ │ Name 1: [edit]  ...              │ │
│ │ [Suggest thresholds]             │ │
│ │ Output layer: [edit]             │ │
│ │ [Categorize]                     │ │
│ └──────────────────────────────────┘ │
└──────────────────────────────────────┘

Workflow

Step 1 – Select the segmentation layer or group

The widget operates on the features table of the selected Labels layer (populated by Intensity Measurement, Morphology Measurement, Cell-Nucleus Measurement, or by loading a CSV via the Table Manipulation Widget).

  1. Run one of the measurement widgets first (or load a CSV).
  2. Pick a Target: <single layer> (default) for the original single-layer behaviour, or a previously defined group name. In group mode the Segmentation combo becomes read-only. The histogram and threshold suggestion are computed on the concatenation of all members' features so a single set of thresholds can be applied consistently across the batch.
  3. In single-layer mode, pick the layer from the Segmentation dropdown. The Column dropdown is filled with its numeric columns (excluding index and category_id).

Step 2 – Explore the histogram

The Column histogram section shows the distribution of the currently selected column. Threshold lines are drawn in red so you can evaluate the split visually before applying it.

Note: The histogram requires matplotlib. Install it with pip install matplotlib if it is not already available.

Step 3 – Set thresholds and categorize

  1. Set Number of categories (2–10). The threshold and name fields update automatically.
  2. Enter threshold values in the Threshold spin-boxes, or click Suggest thresholds to auto-populate them using equally-spaced quantiles of the selected column. The threshold lines on the histogram update in real time.
  3. Optionally rename each category in the Name fields (defaults: category_1, category_2, …).
  4. Enter an Output layer name (default: categories).
  5. Click Categorize.

Two things happen:

  • The source layer's features gains two new columns: category_id (integer, 1-based) and category_name (string). The Features Table dock is opened automatically with the source layer selected so you can inspect them.
  • A new Labels layer is created (or updated) in napari where each segment is assigned its category ID as the label value. Use napari's built-in colormap controls to distinguish the categories visually.

In group mode every member's features receives the new columns and one output Labels layer is created per member named {output}_{layer_name} (or just {output} when the group has a single member).

How thresholds are applied

Segments with a value below the first threshold are assigned category 1, segments between the first and second threshold are assigned category 2, and so on. Thresholds need not be sorted; the widget sorts them internally.
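This binning is exactly what numpy.digitize computes; a sketch of the rule (plus the equally-spaced-quantile suggestion) with hypothetical function names:

```python
import numpy as np

def categorize(values, thresholds, names=None):
    ths = np.sort(np.asarray(thresholds, dtype=float))  # widget sorts internally
    category_id = np.digitize(values, ths) + 1          # below first threshold -> 1
    if names is None:
        names = [f"category_{i}" for i in range(1, len(ths) + 2)]
    return category_id, np.asarray(names)[category_id - 1]

def suggest_thresholds(values, n_categories):
    # equally spaced quantiles, as used by "Suggest thresholds"
    qs = np.linspace(0, 1, n_categories + 1)[1:-1]
    return np.quantile(values, qs)
```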


Cell-Nucleus Measurement Widget

The Cell-Nucleus Measurement widget computes per-cell features that combine a cell segmentation with a nucleus segmentation. It reports the number of nuclei per cell, cell-to-nucleus area/volume ratios, and optionally cytoplasmic vs. nuclear intensity statistics.

Layout (scrollable)

┌─────────────────────────────────────┐
│ Target:                 [combo]     │
│ Cell segmentation:      [combo]     │
│ Nucleus segmentation:   [combo]     │
│ Intensity image (optional): [combo] │
│ ┌ Physical pixel/voxel size ──────┐ │
│ │ Y: [spinbox]                    │ │
│ │ X: [spinbox]                    │ │
│ └─────────────────────────────────┘ │
│ [Measure cell-nucleus]              │
└─────────────────────────────────────┘

For 3-D data a third spinbox Z is added automatically.

Workflow

  1. Pick a Target: <single layer> (default) for the original per-pair behaviour, or a previously defined group name to batch over its members. In group mode the cell, nucleus, and intensity-image combos become read-only and show the first member; the group must define a non-empty nucleus_segmentation role, while intensity_image is optional and is used when present.
  2. Select a Cell segmentation layer (Labels) from the first dropdown. The scale spinboxes are pre-populated from the layer's scale attribute if it has been set; otherwise they default to 1.0.
  3. Select a Nucleus segmentation layer (Labels) from the second dropdown. This layer must have the same spatial dimensions as the cell segmentation.
  4. Optionally select an Intensity image layer (Image) from the third dropdown. Choose (none) to skip intensity measurements.
  5. Adjust the per-axis scale values if needed (same convention as the Morphology widget).
  6. Click Measure cell-nucleus.

In group mode the measurement is run on every (segmentation[i], nucleus_segmentation[i], intensity_image[i] or None) triple in turn, writing results into each cell-segmentation layer's features.

The result is merged into the cell segmentation layer's features table and the napari Features Table dock is opened automatically with that layer selected. The table contains one row per cell.

Columns – without intensity image (2-D)

Column Description
index Integer cell label ID
n_nuclei Number of nucleus labels overlapping with this cell
cell_area Area of the whole cell in physical units (nucleus included)
nucleus_area Total area of nuclei within this cell in physical units
area_ratio cell_area / nucleus_area; NaN for cells with no nucleus

For 3-D data the columns are cell_volume, nucleus_volume, and volume_ratio instead.

Additional columns – with intensity image

When an intensity image is selected, columns are added for each statistic {stat} in mean, median, max, min, percentile_10, percentile_25, percentile_75, percentile_90:

Column Description
cell_{stat}_intensity Statistic over the cytoplasmic region (cell pixels where no nucleus is present)
nucleus_{stat}_intensity Statistic over all nuclear pixels within this cell
{stat}_intensity_ratio cell_{stat}_intensity / nucleus_{stat}_intensity; NaN when either region is empty or the nucleus value is zero
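The overlap logic behind these columns can be sketched in plain numpy (mean statistic only; `cell_nucleus_features` is a hypothetical name, not the library's API):

```python
import numpy as np

def cell_nucleus_features(cells, nuclei, img=None):
    rows = []
    for lbl in np.unique(cells):
        if lbl == 0:
            continue
        cell = cells == lbl
        nuc_ids = np.unique(nuclei[cell])        # nucleus labels overlapping this cell
        nuc_mask = cell & (nuclei > 0)
        cell_area, nucleus_area = int(cell.sum()), int(nuc_mask.sum())
        row = {
            "index": int(lbl),
            "n_nuclei": int((nuc_ids > 0).sum()),
            "cell_area": cell_area,              # nucleus included
            "nucleus_area": nucleus_area,
            "area_ratio": cell_area / nucleus_area if nucleus_area else float("nan"),
        }
        if img is not None:
            cyto = cell & (nuclei == 0)          # cytoplasm: cell pixels without nucleus
            c = img[cyto].mean() if cyto.any() else float("nan")
            n = img[nuc_mask].mean() if nuc_mask.any() else float("nan")
            row.update(cell_mean_intensity=c, nucleus_mean_intensity=n,
                       mean_intensity_ratio=c / n if n else float("nan"))
        rows.append(row)
    return rows
```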

Saving the table

Use the Save as CSV button in the napari Features Table dock, or the Save table button in the Table Manipulation Widget for TSV / XLSX output.


Clustering Analysis Widget

The Clustering Analysis widget groups segments into clusters based on all numeric columns in a previously computed measurement table. After clustering, a 2-D feature-reduction scatter plot visualises the result, and a new label layer is created where each segment is painted with its cluster ID. The scatter-plot colours and the label-layer colours are kept in sync.

Layout (scrollable)

┌──────────────────────────────────────┐
│ Target:       [combo]                │
│ Segmentation: [combo]                │
│ ┌ Feature reduction ───────────────┐ │
│ │ Method: [UMAP▾]  [Reduce]        │ │
│ │  <2-D scatter plot>              │ │
│ └──────────────────────────────────┘ │
│ ┌ Clustering ──────────────────────┐ │
│ │ Method: [K-Means▾]               │ │
│ │  <method-specific parameters>    │ │
│ │ Output layer: [edit]             │ │
│ │ [Cluster]                        │ │
│ └──────────────────────────────────┘ │
└──────────────────────────────────────┘

Workflow

Step 1 – Select the segmentation layer or group

The widget operates on the features table of the selected Labels layer. Run a measurement widget first (or load a CSV via the Table Manipulation widget), then either pick the layer from the Segmentation dropdown or pick a group from the Target dropdown.

In group mode the Segmentation combo becomes read-only. Features across the group's members are concatenated for joint clustering and the 2-D feature-reduction scatter plot, and one output Labels layer is created per member named {output}_{layer_name} (or just {output} for a single-member group). The same cluster colours are applied across all output layers so cluster IDs match visually.

Step 2 – Explore the feature space (optional)

  1. Choose a Feature reduction method: UMAP (default), TSNE, or PCA.
  2. Click Reduce to compute a 2-D embedding of the features and display an uncoloured scatter plot.

Note: UMAP requires the optional umap-learn package (pip install umap-learn). If it is not installed the widget falls back to PCA automatically. Changing the reduction method clears the cached embedding so the next Reduce or Cluster call recomputes it.
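The fallback behaviour can be sketched as a try/except import, with PCA as the guaranteed path (`reduce_2d` is a hypothetical name, not the widget's code):

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_2d(X, method="UMAP"):
    if method == "UMAP":
        try:
            from umap import UMAP  # optional dependency: pip install umap-learn
            return UMAP(n_components=2).fit_transform(X)
        except ImportError:
            method = "PCA"  # fall back automatically, as the widget does
    if method == "TSNE":
        from sklearn.manifold import TSNE
        return TSNE(n_components=2, perplexity=5.0).fit_transform(X)
    return PCA(n_components=2).fit_transform(X)
```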

Step 3 – Cluster

  1. Select a Clustering method from the dropdown (see table below).
  2. Adjust the method-specific parameters shown below the dropdown.
  3. Enter an Output layer name (default: clusters).
  4. Click Cluster.

Three things happen:

  • The source layer's features gains a new cluster_id column (1-based; -1 for noise) and the napari Features Table dock is opened automatically with that layer selected.
  • The scatter plot is redrawn with each point coloured by its cluster.
  • A new Labels layer is created (or updated) where each segment is painted with its cluster_id. The layer colours are set to exactly match the scatter-plot colours.

If you re-run clustering, the existing cluster_id column is excluded from the feature set so it does not affect the new result.

Clustering methods and parameters

Method Widget label Key parameters (defaults)
scikit-learn KMeans K-Means N clusters (3)
scikit-learn DBSCAN DBSCAN Eps (0.5), Min samples (5)
scikit-learn HDBSCAN HDBSCAN Min cluster size (5)
scikit-learn MeanShift Mean Shift Bandwidth (0 = auto)

Cluster IDs and label values

Cluster IDs are 1-based: the first cluster found is 1, the second is 2, and so on. Segments that are classified as noise by DBSCAN or HDBSCAN receive cluster_id = -1 and remain as background (0) in the output label layer.
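The ID convention can be sketched with scikit-learn (KMeans shown; `cluster_features` is a hypothetical name, and the z-score standardisation is an assumption, not confirmed by the source):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def cluster_features(X, n_clusters=3):
    Xs = StandardScaler().fit_transform(X)  # put features on a common scale
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Xs)
    # scikit-learn labels are 0-based; shift to the 1-based cluster_id convention.
    # DBSCAN/HDBSCAN noise points (label -1) would keep cluster_id = -1.
    return np.where(labels >= 0, labels + 1, -1)
```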

Color matching

The widget uses matplotlib's tab10 (or tab20 when there are more than 10 clusters) colormap to assign one colour per cluster. The same colour array is applied to the Labels layer via DirectLabelColormap, so the scatter-plot legend and the segmentation overlay always show identical colours.

Saving the table

Use the Save as CSV button in the napari Features Table dock, or the Save table button in the Table Manipulation Widget for TSV / XLSX output.


Classification Analysis Widget

The Classification Analysis widget lets you interactively annotate a small number of segments with class labels using napari's paint tools, train a random forest or logistic regression classifier on those annotations, and then apply it to every segment in the table. The result is written to a new label layer and two new columns in the measurement table. Trained classifiers can be exported to disk for batch use from the CLI.

Layout (scrollable)

┌──────────────────────────────────────┐
│ Target: [combo]                      │
│ ┌ Layers ──────────────────────────┐ │
│ │ Segmentation: [combo]            │ │
│ │ Annotation layer: [combo] [Create new] │
│ └──────────────────────────────────┘ │
│ ┌ Class names ─────────────────────┐ │
│ │  Label ID │ Class Name           │ │
│ │  <editable rows>                 │ │
│ └──────────────────────────────────┘ │
│ ┌ Classifier ──────────────────────┐ │
│ │ Method: [Random Forest▾]         │ │
│ │  <method-specific parameters>    │ │
│ │ Output layer: [edit]             │ │
│ │ [x] Live Update                  │ │
│ │ [Train & Apply]                  │ │
│ │ [Export classifier]              │ │
│ └──────────────────────────────────┘ │
└──────────────────────────────────────┘

Workflow

Step 1 – Select the segmentation layer or group

The widget operates on the features table of the selected Labels layer. Run a measurement widget first (or load a CSV via the Table Manipulation widget), then pick a Target: <single layer> (default) for the original single-layer behaviour, or a previously defined group name to classify across the group's segmentation members.

In group mode the Segmentation combo is restricted to the group's members so you can step through them one at a time to annotate each. The class-names table pools detected annotation IDs across every member of the group, so detected classes are not lost as you switch members. See Per-member annotation persistence below for how brushstrokes survive member switches.

Step 2 – Create an annotation layer

The Annotation layer combo defaults to (none) so per-member persistence cannot accidentally overwrite an unrelated label layer.

  1. Click Create new next to the Annotation layer dropdown. A new, empty Labels layer called annotations is added to napari and automatically selected.
  2. Alternatively, select an existing Labels layer from the Annotation layer dropdown if you already have annotations you want to use.

The active annotation layer is outlined by a thin frame in the image space. In group mode the frame follows the current member position in the grid.

Step 3 – Paint annotations

Use napari's built-in label painting tools to draw brushstrokes on the annotation layer.

  • Each label value you paint (1, 2, 3, …) represents a different class.
  • You can use napari's color picker and label selector to switch between classes.
  • Paint at least a few representative segments from each class. You do not need to annotate every segment — the classifier will be applied to the rest automatically.

Step 4 – Review projected annotations

Whenever you finish painting, erasing, or changing labels in the annotation layer, the widget waits briefly and then automatically:

  1. Reads the pixel-level brushstrokes from the annotation layer.
  2. For each segment in the segmentation, takes the majority-vote annotation label across all annotated pixels that overlap that segment. Segments with no annotation overlap receive annotation 0 (unannotated).
  3. Merges an annotation column into the source layer's features (overwriting any previous annotation values).
  4. Populates the Class names table with all annotation label IDs detected.

Because the projection is based on the full current annotation layer, removed brushstrokes set the corresponding segment annotation back to 0, and repainting with another label updates the value in the table.
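The majority vote per segment is a small numpy reduction; a sketch with a hypothetical function name:

```python
import numpy as np

def project_annotations(seg, ann):
    """Per-segment majority vote over painted pixels; 0 = unannotated."""
    result = {}
    for lbl in np.unique(seg):
        if lbl == 0:
            continue
        vals = ann[(seg == lbl) & (ann > 0)]  # annotated pixels inside this segment
        result[int(lbl)] = int(np.bincount(vals).argmax()) if vals.size else 0
    return result
```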

Step 5 – Name the classes (optional)

The Class names table lists each detected annotation label ID with an editable name field. Click a name cell and type to rename a class (e.g. change class_1 to mitotic, class_2 to interphase). These names are written to the classification_name output column.

Step 6 – Train and apply the classifier

  1. Choose a Method (default: Random Forest) and adjust the parameters if needed (see table below).
  2. Enter an Output layer name (default: classification).
  3. Leave Live Update enabled to retrain and apply the classifier automatically after annotation edits; disable it to run the classifier manually via the Train & Apply button.

Three things happen:

  • A scikit-learn classifier is trained on all rows where annotation > 0, using all numeric measurement columns as features (excluding index, annotation, classification_id, classification_name, cluster_id, category_id, and category_name). Features are z-score standardised internally.
  • The classifier is applied to every row in the layer's features (including unannotated ones). Results are merged back into the source layer's features as two new columns: classification_id (1-based integer) and classification_name (string). Manual Train & Apply also opens the Features Table dock with that layer selected; live updates keep the current layer selection unchanged.
  • A new Labels layer is created (or updated) in napari where each segment is painted with its classification_id. Colours are keyed off the class ID itself (using tab10 for ≤10 classes and tab20 beyond that), so the same class always renders in the same colour even when other classes are absent from a particular layer.

If the classifier is re-run, existing classification_id and classification_name columns are excluded from the feature set so they do not affect the new result.
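The train-on-annotated / apply-to-all pattern, including the joblib export of the scaler + classifier pipeline, can be sketched as follows (`train_and_apply` is a hypothetical name, not the library's API):

```python
import numpy as np
import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

def train_and_apply(X, annotation, export_path=None):
    ann = np.asarray(annotation)
    clf = make_pipeline(StandardScaler(),
                        RandomForestClassifier(n_estimators=100, random_state=0))
    clf.fit(X[ann > 0], ann[ann > 0])   # train only on annotated rows
    classification_id = clf.predict(X)  # apply to every row, annotated or not
    if export_path is not None:
        joblib.dump(clf, export_path)   # scaler + classifier exported together
    return classification_id
```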

In group mode the classifier is trained on the concatenation of all members' features (using each member's projected annotation column) and then applied per-member. One output Labels layer is created per member named {output}_{layer_name} (or just {output} for a single-member group). Class colours are keyed off the class ID, so the same class always renders in the same colour across all member outputs even if a particular member only contains a subset of classes.

Per-member annotation persistence

In group mode the widget keeps a per-member cache of your brushstrokes so they survive switching between members:

  • When you switch the Segmentation combo to a different member, the current annotation layer's contents are saved (sparse-compressed) for the previously-active member, and the cached annotations for the new member are loaded into the same annotation layer. If the new member has no cache yet, the layer is reset to zeros.
  • Switching the Target combo back to <single layer> (or to another group) saves the current member's annotations first so they are available again when you re-enter the group.
  • The cache is kept only as long as the widget is open and is keyed off the segmentation-layer name; renaming a member-layer mid-session decouples its cache.

The single-layer mode is unaffected: switching segmentation layers there does not alter the annotation layer.

Classification methods and parameters

Method Widget label Key parameters (defaults)
scikit-learn RandomForestClassifier Random Forest N estimators (100), Max depth (0 = unlimited)
scikit-learn LogisticRegression Logistic Regression C (1.0), Max iterations (1000)

Class IDs

Classification IDs are 1-based and match the annotation label values painted in the annotation layer. A segment that could not be classified (e.g. because all its feature values were NaN) receives classification_id = 0 and an empty classification_name.

Exporting the classifier

Click Export classifier to save the trained pipeline (StandardScaler + classifier) to a .joblib file. The exported file can be used with the analyze classify CLI command to apply it to new tables in batch.

Saving the table

Use the Save as CSV button in the napari Features Table dock, or the Save table button in the Table Manipulation Widget for TSV / XLSX output.


Group Manager Widget

The Group Manager is the single place where you define and edit the layer groups consumed by every measurement and analysis widget's Target combo. See Working with groups for the underlying data model and pairing semantics.

Layout (scrollable)

┌────────────────────────────────────────┐
│ ┌ Defined groups ────────────────────┐ │
│ │  exp_1  (3 seg, 3 nuc)             │ │
│ │  exp_2  (5 seg)                    │ │
│ │  ...                               │ │
│ │ [Delete selected]                  │ │
│ │ [Arrange selected as grid]         │ │
│ └────────────────────────────────────┘ │
│ ┌ Group editor ──────────────────────┐ │
│ │ Name: [edit]                       │ │
│ │ ┌ Segmentation layers (required)─┐ │ │
│ │ │  cells_01                      │ │ │
│ │ │  cells_02                      │ │ │
│ │ │[Add selected][Remove][Up][Down]│ │ │
│ │ └────────────────────────────────┘ │ │
│ │ ┌ Nucleus layers (optional)──────┐ │ │
│ │ │  nuclei_01                     │ │ │
│ │ │  nuclei_02                     │ │ │
│ │ │[Add selected][Remove][Up][Down]│ │ │
│ │ └────────────────────────────────┘ │ │
│ │ ┌ Intensity images (optional)────┐ │ │
│ │ │  raw_01                        │ │ │
│ │ │  raw_02                        │ │ │
│ │ │[Add selected][Remove][Up][Down]│ │ │
│ │ └────────────────────────────────┘ │ │
│ │ ┌ Pairing preview ───────────────┐ │ │
│ │ │ Seg       Nucleus   Intensity  │ │ │
│ │ │ cells_01  nuclei_01 raw_01     │ │ │
│ │ │ cells_02  nuclei_02 raw_02     │ │ │
│ │ └────────────────────────────────┘ │ │
│ │ [Save]                             │ │
│ └────────────────────────────────────┘ │
└────────────────────────────────────────┘

Workflow

  1. Select layers in napari's main layer list (you can multi-select with Ctrl/Shift).
  2. In the Segmentation layers section click Add selected to add the selected Labels layers to the role. Layers are appended in napari layer-panel order; non-Labels layers in the selection are skipped, as are duplicates already in the list.
  3. Optionally repeat for Nucleus layers (also Labels-only) and Intensity images (Image-only).
  4. Reorder entries with Up / Down so each row of the Pairing preview at the bottom matches the segmentation–nucleus–image triple you actually want. Within a group, layers across roles are paired by position.
  5. Optional roles must be either empty or have the same length as the segmentation list — saving with mismatched lengths is rejected.
  6. Type a Name and click Save. If a group with that name already exists it is replaced.

Selecting a group in the top list loads its current members back into the editor for editing. Click Delete selected to remove the highlighted group.

Click Arrange selected as grid to lay out the highlighted group in the viewer. The grid has one cell per segmentation layer. Nucleus and intensity layers, when present, are translated to the same grid cell as their position-matched segmentation layer. For each grid cell, the intensity image is stacked below the nucleus segmentation, and the segmentation layer is stacked on top. Layers are linked by role (segmentations with segmentations, nucleus segmentations with nucleus segmentations, intensity images with intensity images) while spatial transforms remain independent so the grid positions are preserved.

Renaming a layer that a group references automatically updates the group's stored layer name. Removing a layer leaves the dangling reference in place; running a measurement or analysis widget that targets the group will raise an informative error if a member layer is missing.


Table Manipulation Widget

The Table Manipulation widget edits the features table of a Labels layer. It can load an external CSV / TSV / XLSX file (which must contain an index column) and merge it into the layer's features, drop a column from the layer's features, and save the layer's features to a CSV / TSV / XLSX file.

Layout

┌────────────────────────────────────────┐
│ Segmentation: [combo]                  │
│ ┌ Load table from file ──────────────┐ │
│ │ [Load file…]                       │ │
│ └────────────────────────────────────┘ │
│ ┌ Drop column ───────────────────────┐ │
│ │ Column: [combo]   [Drop]           │ │
│ └────────────────────────────────────┘ │
│ [Save table]                            │
└────────────────────────────────────────┘

Loading a table from file

Click Load file… to read a CSV, TSV, or XLSX file. The file must contain an index column whose values are the segment label IDs; loading is rejected otherwise. The columns of the loaded file are merged into the selected layer's features (outer join on index). Columns present in both the file and the existing features are overwritten with the values from the file. The Features Table dock is opened automatically after the merge.

Tip: Multiple measurement widgets writing to the same layer follow the same merge rules — running Intensity then Morphology on the same Labels layer leaves all columns from both measurements in the layer's features.
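These merge semantics can be reproduced with plain pandas: `combine_first` keeps rows and columns that exist on only one side while letting the loaded file's values win on overlap (column names here are illustrative):

```python
import pandas as pd

# Existing layer features, indexed by segment label ID.
existing = pd.DataFrame({"area": [100.0, 250.0]}, index=[1, 2])

# Loaded file: overlaps with label 2's 'area' and adds label 3.
loaded = pd.DataFrame(
    {"area": [260.0, 90.0], "mean_intensity": [5.0, 7.0]}, index=[2, 3]
)

# Outer join on the index; overlapping cells take the file's values.
merged = loaded.combine_first(existing).sort_index()
print(merged.loc[2, "area"])  # 260.0 -- overwritten by the file
```

Label 1 keeps its old `area`, label 3 is new, and cells with no source on either side are NaN.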

Dropping a column

Select a column from the Drop column dropdown and click Drop. The column is removed from the selected layer's features. The index column is the segment identifier and is never offered for dropping.

Saving the table

Click Save table to export the selected layer's features to a CSV, TSV, or XLSX file. This complements the napari Features Table dock's CSV-only save by also supporting TSV and Excel output.

Command Line Interface

The segmentation-measurement CLI provides utilities for post-processing segmentations, computing measurements, and analyzing results directly from the terminal without writing any Python code.

Installation

pip install segmentation-measurement

After installation the segmentation-measurement command is available in your shell.

Overview

The CLI exposes five top-level commands:

| Command | Description |
| --- | --- |
| `open` | Open segmentations and optional matched images in napari |
| `postprocess` | Apply post-processing operations to segmentation TIFF files |
| `measure` | Compute per-segment measurements from segmentation TIFF files |
| `table` | Manipulate measurement tables (merge tables, drop columns) |
| `analyze` | Analyze measurement tables (threshold-based categorization, clustering, classification) |

Run any command with --help to see its full usage:

segmentation-measurement --help
segmentation-measurement open --help
segmentation-measurement postprocess --help
segmentation-measurement measure morphology --help
segmentation-measurement analyze threshold --help

Open In Napari (open)

Open one or more segmentation files in napari, optionally with matched intensity images and nucleus segmentations. Paths may be single file paths or glob expressions. Recursive glob expressions using ** are supported.

segmentation-measurement open \
    --segmentations "data/**/cells*.tif" \
    --intensities "data/**/raw*.tif" \
    --nuclei "data/**/nuclei*.tif"

Segmentations and nuclei are opened as Labels layers. Intensities are opened as Image layers. When a glob expression is used, layer names are derived relative to the directory above the first wildcard component. For example, data/**/seg.tif creates names such as well_a/seg and well_b/seg, so files with the same basename in different subfolders remain distinguishable.
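The naming rule can be sketched as a small pure function (`layer_name_from_glob` is hypothetical, not part of the library's API):

```python
from pathlib import PurePosixPath

def layer_name_from_glob(pattern: str, path: str) -> str:
    """Name = path relative to the directory above the first wildcard
    component of the glob pattern, without the file extension."""
    # Count leading pattern components that contain no wildcard.
    n_fixed = 0
    for part in PurePosixPath(pattern).parts:
        if any(ch in part for ch in "*?["):
            break
        n_fixed += 1
    relative = PurePosixPath(*PurePosixPath(path).parts[n_fixed:])
    return str(relative.with_suffix(""))

print(layer_name_from_glob("data/**/seg.tif", "data/well_a/seg.tif"))  # well_a/seg
```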

If multiple segmentations are opened, the default behavior is to create a group named opened_files and arrange the matched layers in a grid view. Use --no-grid to keep the group but skip grid arrangement, or --no-group to skip both grouping and grid arrangement.

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--segmentations` | path/glob list | yes | Segmentation file path(s) or glob expression(s) |
| `--intensities` | path/glob list | no | Optional intensity image path(s); expanded count must match segmentations |
| `--nuclei`, `--nucleus-segmentations` | path/glob list | no | Optional nucleus segmentation path(s); expanded count must match segmentations |
| `--no-group` | flag | no | Do not create a group when multiple segmentations are opened |
| `--no-grid` | flag | no | Create the group but do not arrange it as a grid |
| `--group-name` | string | no | Name for the created group; default is `opened_files` |

Examples

Open a single segmentation:

segmentation-measurement open --segmentations cells_01.tif

Open matched files as a grouped grid:

segmentation-measurement open \
    --segmentations "experiment/**/cells.tif" \
    --intensities "experiment/**/raw.tif" \
    --nuclei "experiment/**/nuclei.tif" \
    --group-name experiment

Open multiple segmentations without grouping:

segmentation-measurement open --segmentations "cells_*.tif" --no-group

Post-processing (postprocess)

The postprocess command provides three sub-commands that each read a segmentation TIFF, apply a transformation, and write the result to a new TIFF.

filter-small-segments

Remove segments whose size (in pixels for 2-D data, or voxels for 3-D data) is below a minimum threshold. Removed segments are replaced by the background label 0.

segmentation-measurement postprocess filter-small-segments \
    --input  segmentation.tif \
    --output filtered.tif \
    --min-size 200

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--input` | path | yes | Input segmentation TIFF file |
| `--output` | path | yes | Output segmentation TIFF file |
| `--min-size` | int | yes | Minimum segment size in pixels/voxels; segments strictly smaller than this value are removed |

Example – keep only segments with at least 500 pixels:

segmentation-measurement postprocess filter-small-segments \
    --input nuclei.tif --output nuclei_filtered.tif --min-size 500

remove-small-holes

Fill enclosed background holes inside segments when the hole is smaller than or equal to the specified maximum size. Pixels belonging to other segments are never overwritten.
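The fill rule can be sketched with SciPy (an illustration, not the library's actual implementation):

```python
import numpy as np
from scipy import ndimage

def fill_small_holes_sketch(seg: np.ndarray, max_hole_size: int) -> np.ndarray:
    """Per segment, fill enclosed background regions of at most
    max_hole_size pixels; pixels of other segments stay untouched."""
    out = seg.copy()
    labels = np.unique(seg)
    for label_id in labels[labels > 0]:
        # Enclosed regions of the segment's binary mask...
        filled = ndimage.binary_fill_holes(seg == label_id)
        # ...restricted to background, split into connected holes.
        holes, n_holes = ndimage.label(filled & (seg == 0))
        for hole_id in range(1, n_holes + 1):
            hole = holes == hole_id
            if hole.sum() <= max_hole_size:
                out[hole] = label_id
    return out

# 3x3 segment with a single-pixel hole in its centre.
seg = np.zeros((5, 5), dtype=int)
seg[1:4, 1:4] = 1
seg[2, 2] = 0
print(fill_small_holes_sketch(seg, max_hole_size=1)[2, 2])  # 1
```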

segmentation-measurement postprocess remove-small-holes \
    --input  segmentation.tif \
    --output filled.tif \
    --max-hole-size 50

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--input` | path | yes | Input segmentation TIFF file |
| `--output` | path | yes | Output segmentation TIFF file |
| `--max-hole-size` | int | yes | Maximum hole size in pixels/voxels to fill |

Example – fill holes up to 100 pixels:

segmentation-measurement postprocess remove-small-holes \
    --input cells.tif --output cells_filled.tif --max-hole-size 100

ring-mask

Compute a ring (annulus) of a specified width around each segment. By default the original segment pixels are retained in the output alongside the ring pixels. Pass --remove-original to produce a ring-only mask (original segment interiors set to 0). This is useful for creating pseudo-cytoplasm masks from segmented nuclei.

Rings are placed only on background pixels; if rings from different segments overlap, the segment with the smaller label ID takes precedence.
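The placement rules can be sketched with SciPy's binary dilation (illustrative, not the library's actual code): labels are processed in ascending order, and each ring may only claim pixels that are still background.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def ring_mask_sketch(seg: np.ndarray, ring_width: int, keep_original: bool = True) -> np.ndarray:
    out = seg.copy() if keep_original else np.zeros_like(seg)
    claimed = seg > 0  # segment interiors can never become ring pixels
    labels = np.unique(seg)
    for label_id in labels[labels > 0]:  # ascending: smaller IDs win overlaps
        dilated = binary_dilation(seg == label_id, iterations=ring_width)
        ring = dilated & ~claimed
        out[ring] = label_id
        claimed |= ring  # later labels cannot overwrite this ring
    return out

# Two point segments whose 2-pixel rings collide between them.
seg = np.zeros((1, 7), dtype=int)
seg[0, 1] = 1
seg[0, 5] = 2
print(ring_mask_sketch(seg, ring_width=2)[0].tolist())  # [1, 1, 1, 1, 2, 2, 2]
```

The contested pixel between the two rings goes to label 1, the smaller ID.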

segmentation-measurement postprocess ring-mask \
    --input  segmentation.tif \
    --output rings.tif \
    --ring-width 5

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--input` | path | yes | Input segmentation TIFF file |
| `--output` | path | yes | Output TIFF file for the ring mask |
| `--ring-width` | int | yes | Width of the ring in pixels/voxels |
| `--remove-original` | flag | no | Remove original segment pixels; output contains only ring pixels |

Example – create 8-pixel-wide rings around nuclei (original segments kept):

segmentation-measurement postprocess ring-mask \
    --input nuclei.tif --output cytoplasm_rings.tif --ring-width 8

Example – ring-only mask (original segments removed):

segmentation-measurement postprocess ring-mask \
    --input nuclei.tif --output rings_only.tif --ring-width 8 --remove-original

watershed

Refine a segmentation using the watershed algorithm. The input segmentation is used as seed markers; the heatmap is the topographic landscape that the algorithm floods. Because skimage.segmentation.watershed floods from low values upward, the heatmap should have low values at desired segment boundaries and high values in the interior. If your heatmap has the opposite convention (e.g. a distance transform or foreground-probability map where high values indicate cell centres), pass the negated image.

An optional binary mask restricts processing to a subset of pixels; unmasked pixels are set to 0 in the output.

segmentation-measurement postprocess watershed \
    --input  seeds.tif \
    --heatmap landscape.tif \
    --output refined.tif

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--input` | path | yes | Input segmentation TIFF file used as seed markers |
| `--heatmap` | path | yes | Heatmap image TIFF (low values flooded first) |
| `--output` | path | yes | Output segmentation TIFF file |
| `--mask` | path | no | Binary mask TIFF; only masked pixels are processed |

Example – watershed refinement with a foreground-probability heatmap (negated so high-probability regions are flooded first):

# Negate the probability map beforehand, then run watershed
segmentation-measurement postprocess watershed \
    --input seeds.tif --heatmap neg_prob.tif --output refined.tif

Measurements (measure)

intensities

Compute per-segment intensity statistics from a segmentation label image and a co-registered intensity image. Both files must be TIFF and must have identical shapes.

segmentation-measurement measure intensities \
    --segmentation segmentation.tif \
    --intensity    fluorescence.tif \
    --output       measurements.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--segmentation` | path | yes | Segmentation TIFF file (integer labels) |
| `--intensity` | path | yes | Intensity image TIFF file |
| `--output` | path | yes | Output table file; format inferred from extension |

Supported output formats

| Extension | Format |
| --- | --- |
| `.csv` (default) | Comma-separated values |
| `.tsv` | Tab-separated values |
| `.xlsx` | Excel workbook |

Output columns

| Column | Description |
| --- | --- |
| `index` | Integer segment label ID |
| `mean_intensity` | Mean pixel intensity within the segment |
| `median_intensity` | Median pixel intensity |
| `max_intensity` | Maximum pixel intensity |
| `min_intensity` | Minimum pixel intensity |
| `std_intensity` | Standard deviation of pixel intensities |
| `percentile_10` | 10th percentile of pixel intensities |
| `percentile_25` | 25th percentile (first quartile) |
| `percentile_75` | 75th percentile (third quartile) |
| `percentile_90` | 90th percentile of pixel intensities |

Example – save results as an Excel file:

segmentation-measurement measure intensities \
    --segmentation cells.tif \
    --intensity    gfp_channel.tif \
    --output       intensity_stats.xlsx
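The statistics are straightforward to reproduce with NumPy, which is handy for sanity-checking an output table; a minimal per-segment sketch:

```python
import numpy as np

# Tiny co-registered label image and intensity image.
seg = np.array([[1, 1, 0], [2, 2, 2]])
img = np.array([[10.0, 20.0, 0.0], [3.0, 5.0, 7.0]])

stats = {}
for label_id in np.unique(seg[seg > 0]):
    values = img[seg == label_id]  # intensities of this segment's pixels
    stats[int(label_id)] = {
        "mean_intensity": float(values.mean()),
        "median_intensity": float(np.median(values)),
        "percentile_90": float(np.percentile(values, 90)),
    }

print(stats[1]["mean_intensity"], stats[2]["median_intensity"])  # 15.0 5.0
```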

morphology

Compute per-segment shape descriptors from a segmentation label image. Supports isotropic and anisotropic pixel/voxel sizes so that results are returned in physical units.

segmentation-measurement measure morphology \
    --segmentation segmentation.tif \
    --output       morphology.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--segmentation` | path | yes | Segmentation TIFF file (integer labels) |
| `--output` | path | yes | Output table file; format inferred from extension |
| `--scale` | float(s) | no | Physical pixel/voxel size (default: 1.0) |

Scale argument

Pass a single value for isotropic spacing, or one value per spatial dimension for anisotropic spacing. Dimension order is (Y, X) for 2-D and (Z, Y, X) for 3-D.

# isotropic: 0.5 µm per pixel
--scale 0.5

# anisotropic 2-D: 0.5 µm in Y, 0.25 µm in X
--scale 0.5 0.25

# anisotropic 3-D: 2.0 µm in Z, 0.5 µm in Y and X
--scale 2.0 0.5 0.5
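For the area/volume columns, the conversion to physical units amounts to the pixel/voxel count multiplied by the product of the scale factors; checking the anisotropic 2-D case above:

```python
import numpy as np

scale = (0.5, 0.25)   # µm per pixel in (Y, X)
pixel_count = 200     # pixels belonging to one segment

pixel_area = float(np.prod(scale))  # area of a single pixel in µm²
area_um2 = pixel_count * pixel_area
print(area_um2)  # 25.0
```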

Output columns – 2-D

| Column | Description |
| --- | --- |
| `index` | Integer segment label ID |
| `area` | Area in physical units |
| `perimeter` | Perimeter length in physical units |
| `sphericity` | Circularity (1.0 = perfect circle) |
| `solidity` | Area / convex hull area |
| `axis_major_length` | Major axis of the fitted ellipse |
| `axis_minor_length` | Minor axis of the fitted ellipse |
| `equivalent_diameter` | Diameter of a circle with the same area |

Output columns – 3-D

| Column | Description |
| --- | --- |
| `index` | Integer segment label ID |
| `volume` | Volume in physical units |
| `surface_area` | Surface area via marching cubes |
| `sphericity` | Sphericity (1.0 = perfect sphere) |
| `solidity` | Volume / convex hull volume |
| `axis_major_length` | Major axis of the fitted ellipsoid |
| `axis_minor_length` | Minor axis of the fitted ellipsoid |
| `equivalent_diameter` | Diameter of a sphere with the same volume |

Examples

# 2-D, pixel units
segmentation-measurement measure morphology \
    --segmentation cells.tif --output morphology.csv

# 2-D, anisotropic scale
segmentation-measurement measure morphology \
    --segmentation cells.tif --output morphology.csv --scale 0.5 0.25

# 3-D, anisotropic scale (Z=2 µm, Y=X=0.5 µm)
segmentation-measurement measure morphology \
    --segmentation nuclei_3d.tif --output morphology_3d.csv --scale 2.0 0.5 0.5

cell-nucleus

Compute per-cell measurements that combine a cell segmentation with a nucleus segmentation. For each cell, the command reports how many nuclei it contains, the cell and nucleus area/volume in physical units, and their ratio. When an optional intensity image is supplied, it also reports intensity statistics for the cytoplasmic region (cell minus nucleus) and the nuclear region, together with their ratios.

segmentation-measurement measure cell-nucleus \
    --cell-segmentation   cells.tif \
    --nucleus-segmentation nuclei.tif \
    --output              cell_nucleus.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--cell-segmentation` | path | yes | Cell segmentation TIFF file (integer labels) |
| `--nucleus-segmentation` | path | yes | Nucleus segmentation TIFF file (integer labels); must have the same shape as the cell segmentation |
| `--output` | path | yes | Output table file; format inferred from extension |
| `--scale` | float(s) | no | Physical pixel/voxel size (default: 1.0); same syntax as `morphology` |
| `--intensity` | path | no | Intensity image TIFF file; when provided, per-region intensity statistics and their ratios are included |

Scale argument

Identical to the morphology sub-command: pass a single value for isotropic spacing, or one value per spatial dimension for anisotropic spacing in (Y, X) / (Z, Y, X) order.

Output columns – without intensity image (2-D)

| Column | Description |
| --- | --- |
| `index` | Integer cell label ID |
| `n_nuclei` | Number of nucleus labels overlapping with this cell |
| `cell_area` | Area of the cell in physical units (nucleus included) |
| `nucleus_area` | Total area of nuclei within this cell in physical units |
| `area_ratio` | `cell_area` / `nucleus_area`; NaN if the cell contains no nucleus |

For 3-D data the columns are cell_volume, nucleus_volume, and volume_ratio instead.

Additional columns – with intensity image

When --intensity is given, the following columns are added for each statistic {stat} in mean, median, max, min, percentile_10, percentile_25, percentile_75, percentile_90:

| Column | Description |
| --- | --- |
| `cell_{stat}_intensity` | Statistic over the cytoplasmic region (cell pixels where no nucleus is present) |
| `nucleus_{stat}_intensity` | Statistic over all nuclear pixels within this cell |
| `{stat}_intensity_ratio` | `cell_{stat}_intensity` / `nucleus_{stat}_intensity`; NaN when either region is empty or the nucleus value is zero |

Supported output formats – same as intensities (.csv, .tsv, .xlsx).

Examples

# Basic measurements, pixel units
segmentation-measurement measure cell-nucleus \
    --cell-segmentation cells.tif \
    --nucleus-segmentation nuclei.tif \
    --output cell_nucleus.csv

# With physical scale (0.5 µm/px, isotropic)
segmentation-measurement measure cell-nucleus \
    --cell-segmentation cells.tif \
    --nucleus-segmentation nuclei.tif \
    --output cell_nucleus.csv \
    --scale 0.5

# With intensity ratios
segmentation-measurement measure cell-nucleus \
    --cell-segmentation cells.tif \
    --nucleus-segmentation nuclei.tif \
    --intensity gfp_channel.tif \
    --output cell_nucleus_intensity.csv

# Anisotropic 3-D (Z=2 µm, Y=X=0.5 µm) with intensity
segmentation-measurement measure cell-nucleus \
    --cell-segmentation cells_3d.tif \
    --nucleus-segmentation nuclei_3d.tif \
    --intensity gfp_3d.tif \
    --output cell_nucleus_3d.csv \
    --scale 2.0 0.5 0.5

Table manipulation (table)

The table command operates on saved measurement tables (CSV / TSV / XLSX).

merge

Merge one or more saved measurement tables on a shared key column (index by default) and optionally drop columns from the merged result. The merge is an outer join: label IDs that appear in only some inputs are kept, with NaNs in the missing columns. Non-key columns must be disjoint between input tables — drop conflicts beforehand or via --drop-columns.

When only one input is given the command becomes a column-drop utility, useful for cleaning up an existing table. The index column is the segment identifier and is always preserved — passing it to --drop-columns raises an error.

segmentation-measurement table merge \
    --inputs intensity.csv morphology.csv \
    --output combined.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--inputs` | path… | yes | One or more input table files (CSV, TSV, XLSX) |
| `--output` | path | yes | Output table file (extension picks the format) |
| `--on` | str | no | Key column shared between input tables (default: `index`) |
| `--drop-columns` | str… | no | Columns to drop from the merged table |

Example — merge intensity and morphology tables and drop a column:

segmentation-measurement table merge \
    --inputs intensity.csv morphology.csv \
    --output combined.csv \
    --drop-columns std_intensity

Example — drop a column from a single existing table:

segmentation-measurement table merge \
    --inputs combined.csv \
    --output trimmed.csv \
    --drop-columns std_intensity max_intensity
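The outer-join behaviour can be reproduced directly with pandas (column names are illustrative):

```python
import pandas as pd

intensity = pd.DataFrame({"index": [1, 2], "mean_intensity": [10.0, 20.0]})
morphology = pd.DataFrame({"index": [2, 3], "area": [50.0, 80.0]})

# Outer join on the key column: label 1 gets NaN area,
# label 3 gets NaN mean_intensity, label 2 gets both values.
merged = pd.merge(intensity, morphology, on="index", how="outer")
print(len(merged))  # 3
```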

Analysis (analyze)

cluster

Cluster segments using their measurement features. All numeric columns are used as features (excluding index, cluster_id, category_id, and category_name). Features are z-score standardised before clustering.

The output table is the input table with an added cluster_id column. Cluster IDs are 1-based (1, 2, 3, …). Noise points — segments that no cluster claims, as produced by DBSCAN and HDBSCAN — are assigned cluster_id = -1.
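The standardise-then-cluster pipeline and the 1-based relabelling can be sketched with scikit-learn (toy data, not the library's actual code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: two well-separated groups of segments.
features = np.array([[1.0, 1.0], [1.2, 0.9], [10.0, 10.0], [10.1, 9.8]])

scaled = StandardScaler().fit_transform(features)  # z-score standardisation
raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
cluster_id = raw + 1  # shift scikit-learn's 0-based labels to 1-based IDs

print(sorted(set(cluster_id.tolist())))  # [1, 2]
```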

segmentation-measurement analyze cluster \
    --table      measurements.csv \
    --method     kmeans \
    --n-clusters 4 \
    --output     clustered.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--table` | path | yes | Input measurement table (CSV, TSV, or XLSX) |
| `--method` | str | no | Clustering method: `kmeans` (default), `dbscan`, `hdbscan`, or `mean_shift` |
| `--n-clusters` | int | no | K-Means: number of clusters (default: 3) |
| `--eps` | float | no | DBSCAN: neighbourhood radius (default: 0.5) |
| `--min-samples` | int | no | DBSCAN / HDBSCAN: minimum samples in a neighbourhood (default: 5) |
| `--min-cluster-size` | int | no | HDBSCAN: minimum cluster size (default: 5) |
| `--bandwidth` | float | no | Mean Shift: bandwidth; omit or set to 0 for automatic estimation |
| `--output` | path | yes | Output table file (CSV, TSV, or XLSX) |
| `--segmentation` | path | no | Segmentation TIFF; required when `--output-segmentation` is used |
| `--output-segmentation` | path | no | Output TIFF where each segment is painted with its `cluster_id`; noise segments are left as background (0) |

Output

The output table is the input table with one additional column:

| Column | Description |
| --- | --- |
| `cluster_id` | Integer cluster label (1-based); -1 for noise (DBSCAN / HDBSCAN only) |

When --output-segmentation is specified, each segment pixel is set to the cluster_id of that segment (background and noise segments remain 0).

Method defaults

| Method | Key parameters and defaults |
| --- | --- |
| `kmeans` | `--n-clusters 3` |
| `dbscan` | `--eps 0.5`, `--min-samples 5` |
| `hdbscan` | `--min-cluster-size 5` |
| `mean_shift` | bandwidth estimated automatically |

Examples

# K-Means with 5 clusters
segmentation-measurement analyze cluster \
    --table morphology.csv --method kmeans --n-clusters 5 --output clustered.csv

# DBSCAN – also write a cluster segmentation TIFF
segmentation-measurement analyze cluster \
    --table intensity.csv \
    --method dbscan --eps 1.0 --min-samples 3 \
    --output clustered.csv \
    --segmentation cells.tif \
    --output-segmentation clusters.tif

# HDBSCAN
segmentation-measurement analyze cluster \
    --table morphology.csv --method hdbscan --min-cluster-size 10 --output clustered.csv

# Mean Shift with automatic bandwidth
segmentation-measurement analyze cluster \
    --table intensity.csv --method mean_shift --output clustered.csv

threshold

Categorize segments into N named groups based on N-1 thresholds applied to one column of a measurement table. Thresholds can be provided explicitly or suggested automatically from the data distribution.

segmentation-measurement analyze threshold \
    --table      measurements.csv \
    --column     mean_intensity \
    --n-categories 3 \
    --output     categorized.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--table` | path | yes | Input measurement table (CSV, TSV, or XLSX) |
| `--column` | str | yes | Column name to threshold |
| `--n-categories` | int | yes | Number of output categories |
| `--thresholds` | float(s) | no | Explicit threshold values (`n_categories - 1` values); auto-suggested if omitted |
| `--category-names` | str(s) | no | Names for each category (`n_categories` values); defaults to `category_1`, `category_2`, … |
| `--output` | path | yes | Output table file (CSV, TSV, or XLSX) |
| `--segmentation` | path | no | Segmentation TIFF; required when `--output-segmentation` is used |
| `--output-segmentation` | path | no | Output TIFF where each segment is assigned its category ID |

Output

The output table is the input table with two additional columns:

| Column | Description |
| --- | --- |
| `category_id` | Integer category (1-based) |
| `category_name` | Human-readable category name |

Segments with a value below the first threshold are category 1; segments between the first and second threshold are category 2; and so on.
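This assignment rule matches `numpy.digitize` shifted to 1-based IDs:

```python
import numpy as np

values = np.array([50.0, 800.0, 2000.0])  # e.g. per-segment areas
thresholds = [500.0, 1500.0]              # n_categories - 1 threshold values

# digitize returns 0 for values below the first threshold; +1 makes it 1-based.
category_id = np.digitize(values, thresholds) + 1
print(category_id.tolist())  # [1, 2, 3]
```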

Examples

# Auto-suggest thresholds for 3 categories
segmentation-measurement analyze threshold \
    --table intensity.csv \
    --column mean_intensity \
    --n-categories 3 \
    --output categorized.csv

# Explicit thresholds with custom names
segmentation-measurement analyze threshold \
    --table morphology.csv \
    --column area \
    --n-categories 3 \
    --thresholds 500 1500 \
    --category-names small medium large \
    --output categorized.csv

# Also write a category segmentation TIFF
segmentation-measurement analyze threshold \
    --table intensity.csv \
    --column mean_intensity \
    --n-categories 2 \
    --thresholds 100 \
    --output categorized.csv \
    --segmentation cells.tif \
    --output-segmentation categories.tif

train-classifier

Train a random forest or logistic regression classifier from one or more annotated measurement tables and save the fitted pipeline to a .joblib file.

An annotated table is a measurement table that contains an integer annotation column (or whichever column name you specify with --annotation-column). Rows with a value of 0 in that column are treated as unannotated and excluded from training. Annotated rows are typically exported from the Classification Analysis napari widget, but you can also create the column manually.

All numeric columns are used as features (excluding index, annotation, classification_id, classification_name, cluster_id, category_id, and category_name). Features are z-score standardised inside the saved pipeline so no separate pre-processing step is needed when applying the classifier.
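Feature selection and the exclusion of unannotated rows can be sketched with pandas (the table contents are illustrative):

```python
import pandas as pd

table = pd.DataFrame({
    "index": [1, 2, 3],
    "area": [100.0, 200.0, 300.0],
    "mean_intensity": [5.0, 6.0, 7.0],
    "annotation": [1, 0, 2],  # 0 = unannotated, excluded from training
})

# Bookkeeping columns that are never used as features.
excluded = {"index", "annotation", "classification_id", "classification_name",
            "cluster_id", "category_id", "category_name"}

annotated = table[table["annotation"] != 0]
feature_columns = [c for c in annotated.columns if c not in excluded]

X = annotated[feature_columns].to_numpy()  # training features
y = annotated["annotation"].to_numpy()     # training labels
print(feature_columns, y.tolist())  # ['area', 'mean_intensity'] [1, 2]
```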

segmentation-measurement analyze train-classifier \
    --tables  annotated.csv \
    --method  random_forest \
    --output  classifier.joblib

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--tables` | path(s) | yes | One or more annotated measurement tables (CSV, TSV, or XLSX); when multiple files are given they are concatenated before training |
| `--output` | path | yes | Output classifier file (`.joblib`) |
| `--method` | str | no | Classifier type: `random_forest` (default) or `logistic_regression` |
| `--annotation-column` | str | no | Column containing integer annotation labels (default: `annotation`) |
| `--n-estimators` | int | no | RF: number of trees (default: 100) |
| `--max-depth` | int | no | RF: maximum tree depth; omit or set to 0 for unlimited |
| `--c` | float | no | LR: regularisation strength C (default: 1.0) |
| `--max-iter` | int | no | LR: maximum number of solver iterations (default: 1000) |

Examples

# Train a random forest from a single annotated CSV
segmentation-measurement analyze train-classifier \
    --tables annotated.csv \
    --output classifier.joblib

# Train from two experiments combined, with 200 trees
segmentation-measurement analyze train-classifier \
    --tables experiment1.csv experiment2.csv \
    --n-estimators 200 \
    --output classifier.joblib

# Train a logistic regression classifier
segmentation-measurement analyze train-classifier \
    --tables annotated.csv \
    --method logistic_regression \
    --c 0.1 \
    --output classifier.joblib

classify

Apply a previously trained classifier (saved with train-classifier or exported from the napari widget) to a measurement table. The output table gains two new columns: classification_id (1-based integer) and classification_name (string).

segmentation-measurement analyze classify \
    --table      measurements.csv \
    --classifier classifier.joblib \
    --output     classified.csv

Arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `--table` | path | yes | Input measurement table (CSV, TSV, or XLSX) |
| `--classifier` | path | yes | Trained classifier file (`.joblib`) |
| `--output` | path | yes | Output table file (CSV, TSV, or XLSX) |
| `--class-names` | str(s) | no | Names for each class in ascending class-label order (e.g. `--class-names mitotic interphase`); defaults to `class_1`, `class_2`, … |
| `--segmentation` | path | no | Segmentation TIFF; required when `--output-segmentation` is used |
| `--output-segmentation` | path | no | Output TIFF where each segment is painted with its `classification_id`; unclassified segments remain background (0) |

Output columns

| Column | Description |
| --- | --- |
| `classification_id` | Integer class label (1-based); 0 for rows whose features were all NaN |
| `classification_name` | Human-readable class name |

Examples

# Apply classifier and save results as CSV
segmentation-measurement analyze classify \
    --table new_measurements.csv \
    --classifier classifier.joblib \
    --output classified.csv

# Apply and assign human-readable names to classes
segmentation-measurement analyze classify \
    --table new_measurements.csv \
    --classifier classifier.joblib \
    --class-names mitotic interphase apoptotic \
    --output classified.csv

# Also write a classification segmentation TIFF
segmentation-measurement analyze classify \
    --table new_measurements.csv \
    --classifier classifier.joblib \
    --output classified.csv \
    --segmentation cells.tif \
    --output-segmentation classified_seg.tif
"""
.. include:: ../doc/start.md
.. include:: ../doc/napari.md
.. include:: ../doc/cli.md
"""

from segmentation_measurement.postprocessing import (
    apply_watershed,
    compute_ring_mask,
    filter_small_segments,
    remove_small_holes,
)
from segmentation_measurement.intensity import measure_intensities
from segmentation_measurement.morphology import measure_morphology
from segmentation_measurement.cell_nucleus import measure_cell_nucleus
from segmentation_measurement.table_manipulation import (
    drop_columns,
    merge_tables,
)
from segmentation_measurement.analysis import (
    apply_classifier,
    categorize_by_threshold,
    cluster_measurements,
    suggest_thresholds,
    train_classifier,
)

__all__ = [
    "filter_small_segments",
    "remove_small_holes",
    "compute_ring_mask",
    "apply_watershed",
    "measure_intensities",
    "measure_morphology",
    "measure_cell_nucleus",
    "merge_tables",
    "drop_columns",
    "suggest_thresholds",
    "categorize_by_threshold",
    "cluster_measurements",
    "train_classifier",
    "apply_classifier",
]

__version__ = "0.1.0"
import numpy as np


def filter_small_segments(segmentation: np.ndarray, min_size: int) -> np.ndarray:
    """Filter out segments below a minimum size threshold.

    Segments with fewer pixels/voxels than ``min_size`` are set to zero
    (background label).

    Args:
        segmentation (np.ndarray): Integer-valued label array where 0 is
            background and each positive integer represents a distinct segment.
            Supports arbitrary dimensionality.
        min_size (int): Minimum segment size in pixels/voxels. Segments
            strictly smaller than this threshold are removed.

    Returns:
        np.ndarray: Label array with small segments set to zero, same shape
            and dtype as input.
    """
    result = segmentation.copy()
    label_ids, counts = np.unique(segmentation, return_counts=True)
    for label_id, count in zip(label_ids, counts):
        if label_id == 0:
            continue
        if count < min_size:
            result[segmentation == label_id] = 0
    return result

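As a quick illustration of the documented behavior, here is a standalone NumPy sketch (not the library implementation; it uses a vectorized equivalent of the loop above):

```python
import numpy as np

# Label image with three segments: label 1 (4 px), label 2 (1 px), label 3 (2 px).
seg = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
])

# Drop every non-background label whose pixel count falls below min_size.
min_size = 2
ids, counts = np.unique(seg, return_counts=True)
too_small = ids[(ids != 0) & (counts < min_size)]
filtered = np.where(np.isin(seg, too_small), 0, seg)
# Label 2 (a single pixel) is removed; labels 1 and 3 survive.
```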
import numpy as np
# Alias skimage's binary hole filling so it is not shadowed by the function below.
from skimage.morphology import remove_small_holes as _remove_small_holes


def remove_small_holes(segmentation: np.ndarray, max_hole_size: int) -> np.ndarray:
    """Remove small holes from segments.

    For each segment, enclosed background regions (holes) smaller than or
    equal to ``max_hole_size`` pixels/voxels are filled with the segment's
    label. Pixels belonging to other segments are never overwritten.

    Args:
        segmentation (np.ndarray): Integer-valued label array where 0 is
            background and each positive integer represents a distinct segment.
            Supports arbitrary dimensionality.
        max_hole_size (int): Maximum hole size in pixels/voxels. Holes smaller
            than or equal to this threshold are filled.

    Returns:
        np.ndarray: Label array with small holes filled, same shape and dtype
            as input.
    """
    result = segmentation.copy()
    for label_id in np.unique(segmentation):
        if label_id == 0:
            continue
        binary_mask = segmentation == label_id
        filled_mask = _remove_small_holes(binary_mask, area_threshold=max_hole_size)
        new_pixels = filled_mask & ~binary_mask
        # Only fill background pixels; do not overwrite other segments.
        new_pixels &= (segmentation == 0)
        result[new_pixels] = label_id
    return result

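The fill step can be sketched with `scipy.ndimage.binary_fill_holes` (a simplified standalone version that fills every enclosed hole, ignoring the `max_hole_size` cutoff):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

# A 3x3 segment (label 1) with a one-pixel hole at its center.
seg = np.zeros((5, 5), dtype=int)
seg[1:4, 1:4] = 1
seg[2, 2] = 0

mask = seg == 1
filled = binary_fill_holes(mask)
# Only fill pixels that are background in the original segmentation.
new_pixels = filled & ~mask & (seg == 0)
result = seg.copy()
result[new_pixels] = 1
```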
import numpy as np
from scipy.ndimage import binary_dilation


def compute_ring_mask(
    segmentation: np.ndarray, ring_width: int, keep_original: bool = True
) -> np.ndarray:
    """Compute the ring mask around each segment.

    For each segment, a ring of specified width is computed by dilating the
    segment mask by ``ring_width`` iterations and subtracting the original
    mask. Ring pixels are only placed on background pixels of the original
    segmentation. If rings from different segments overlap, the segment with
    the smaller label ID takes precedence.

    This is useful for creating pseudo-cytosol masks around segmented nuclei.

    Args:
        segmentation (np.ndarray): Integer-valued label array where 0 is
            background and each positive integer represents a distinct segment.
            Supports arbitrary dimensionality.
        ring_width (int): Width of the ring in pixels/voxels.
        keep_original (bool): If ``True`` (default), original segment pixels
            are retained in the output alongside the ring pixels. If ``False``,
            only the ring pixels are labeled and original segment pixels are
            set to zero (background).

    Returns:
        np.ndarray: Label array containing the ring regions and, when
            ``keep_original`` is ``True``, also the original segment pixels.
            Same shape and dtype as input.
    """
    result = segmentation.copy() if keep_original else np.zeros_like(segmentation)
    for label_id in np.unique(segmentation):
        if label_id == 0:
            continue
        binary_mask = segmentation == label_id
        dilated = binary_dilation(binary_mask, iterations=ring_width)
        ring = dilated & ~binary_mask
        # Only place ring on background pixels; smaller label IDs take precedence
        # over later rings via the result == 0 guard.
        ring &= (segmentation == 0) & (result == 0)
        result[ring] = label_id
    return result

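A minimal standalone sketch of the ring construction for a single label, using SciPy's default 4-connected structuring element:

```python
import numpy as np
from scipy.ndimage import binary_dilation

# One single-pixel "nucleus" labeled 1.
seg = np.zeros((7, 7), dtype=int)
seg[3, 3] = 1

ring_width = 1
result = seg.copy()  # keep_original=True behavior
mask = seg == 1
dilated = binary_dilation(mask, iterations=ring_width)
# Ring = dilated minus original, restricted to background pixels.
ring = dilated & ~mask & (seg == 0) & (result == 0)
result[ring] = 1
# With the default 4-connected structuring element, the ring consists of
# the four direct neighbors of the center pixel.
```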
from typing import Optional

import numpy as np


def apply_watershed(
    segmentation: np.ndarray,
    heatmap: np.ndarray,
    mask: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Refine a segmentation using the watershed algorithm.

    Uses the input segmentation as seed markers and ``heatmap`` as the
    topographic landscape for ``skimage.segmentation.watershed``.  The
    watershed algorithm floods uphill from each marker, so pixels with
    *low* heatmap values are claimed first.  For heatmaps where high values
    indicate cell interiors or distance-to-boundary (e.g. a distance
    transform), pass the negated heatmap so that high-confidence regions are
    flooded first.

    Args:
        segmentation (np.ndarray): Integer-valued label array used as seed
            markers.  0 is background; each positive integer is a distinct
            seed.  Supports arbitrary dimensionality.
        heatmap (np.ndarray): Landscape image of the same spatial shape as
            ``segmentation``.  Low values are flooded before high values.
        mask (Optional[np.ndarray]): Boolean or binary array of the same
            shape as ``segmentation``.  Only pixels where ``mask`` is
            ``True`` are processed; all other pixels are set to 0 in the
            output.  If ``None`` (default), all pixels are processed.

    Returns:
        np.ndarray: Refined label array, same shape and dtype as
            ``segmentation``.
    """
    from skimage.segmentation import watershed
    result = watershed(heatmap, markers=segmentation, mask=mask)
    return result.astype(segmentation.dtype)

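The "negate the heatmap" advice can be illustrated with a distance transform (a sketch assuming SciPy and scikit-image are installed; the array shapes and seed positions are arbitrary):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.segmentation import watershed

# One elongated foreground blob that should be split into two objects,
# with one seed marker per intended object.
mask = np.zeros((9, 17), dtype=bool)
mask[2:7, 2:15] = True
markers = np.zeros(mask.shape, dtype=int)
markers[4, 4] = 1
markers[4, 12] = 2

# The distance transform is high inside objects, so negate it: watershed
# floods low values first and therefore claims object interiors first.
heatmap = -distance_transform_edt(mask)
labels = watershed(heatmap, markers=markers, mask=mask)
```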
import numpy as np
import pandas as pd
from skimage.measure import regionprops

# Output columns, used to return a well-formed empty table.
_COLUMNS = [
    "index", "mean_intensity", "median_intensity", "max_intensity",
    "min_intensity", "std_intensity", "percentile_10", "percentile_25",
    "percentile_75", "percentile_90",
]


def measure_intensities(segmentation: np.ndarray, intensity_image: np.ndarray) -> pd.DataFrame:
    """Compute per-segment intensity statistics.

    For each labeled segment, computes mean, median, maximum, minimum,
    standard deviation and common percentiles of pixel intensities.

    Args:
        segmentation (np.ndarray): Integer-valued label array where 0 is
            background. Supports arbitrary dimensionality.
        intensity_image (np.ndarray): Intensity image with the same shape as
            ``segmentation``.

    Returns:
        pd.DataFrame: One row per segment with columns ``index``,
            ``mean_intensity``, ``median_intensity``, ``max_intensity``,
            ``min_intensity``, ``std_intensity``, ``percentile_10``,
            ``percentile_25``, ``percentile_75``, ``percentile_90``.
            ``index`` holds the integer label ID of each segment.
    """
    props = regionprops(segmentation, intensity_image)
    if not props:
        return pd.DataFrame(columns=_COLUMNS)

    rows = []
    for region in props:
        # ``intensity_image`` was renamed to ``image_intensity`` in
        # scikit-image 0.19; keep the old name as a fallback for older versions.
        intensity_arr = getattr(region, "image_intensity", None)
        if intensity_arr is None:
            intensity_arr = region.intensity_image
        intensities = intensity_arr[region.image].astype(float)
        rows.append({
            "index": region.label,
            "mean_intensity": float(np.mean(intensities)),
            "median_intensity": float(np.median(intensities)),
            "max_intensity": float(np.max(intensities)),
            "min_intensity": float(np.min(intensities)),
            "std_intensity": float(np.std(intensities)),
            "percentile_10": float(np.percentile(intensities, 10)),
            "percentile_25": float(np.percentile(intensities, 25)),
            "percentile_75": float(np.percentile(intensities, 75)),
            "percentile_90": float(np.percentile(intensities, 90)),
        })
    return pd.DataFrame(rows)

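The per-segment statistics reduce to masking the intensity image with each label, which can be sketched in plain NumPy:

```python
import numpy as np

seg = np.array([[1, 1, 0],
                [0, 2, 2]])
img = np.array([[10.0, 20.0, 0.0],
                [0.0, 3.0, 5.0]])

rows = []
for label_id in np.unique(seg):
    if label_id == 0:
        continue
    # All intensity values covered by this segment.
    vals = img[seg == label_id].astype(float)
    rows.append({
        "index": int(label_id),
        "mean_intensity": float(vals.mean()),
        "median_intensity": float(np.median(vals)),
        "percentile_75": float(np.percentile(vals, 75)),
    })
```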
import numpy as np
import pandas as pd
from skimage.measure import (
    marching_cubes,
    mesh_surface_area,
    perimeter_crofton,
    regionprops,
)

# Output columns, used to order results and return well-formed empty tables.
_COLUMNS_2D = [
    "index", "area", "perimeter", "sphericity", "solidity",
    "axis_major_length", "axis_minor_length", "equivalent_diameter",
]
_COLUMNS_3D = [
    "index", "volume", "surface_area", "sphericity", "solidity",
    "axis_major_length", "axis_minor_length", "equivalent_diameter",
]


def measure_morphology(
    segmentation: np.ndarray,
    scale: float | tuple = 1.0,
) -> pd.DataFrame:
    """Compute per-segment morphological measurements.

    Supported dimensionality is 2D and 3D.  For 2D the measurements are area,
    perimeter, sphericity (circularity), solidity, major and minor axis
    lengths, and equivalent diameter.  For 3D the measurements are volume,
    surface area (via marching cubes), sphericity, solidity, major and minor
    axis lengths, and equivalent diameter.

    Physical units are applied via the ``scale`` parameter.  Anisotropic
    voxel sizes are supported by passing a per-dimension tuple.

    Args:
        segmentation (np.ndarray): Integer-valued label array where 0 is
            background.  Must be 2D or 3D.
        scale (float | tuple): Physical size of a pixel/voxel. A single float
            is interpreted as isotropic spacing. A tuple must have one value
            per spatial dimension in ``(Y, X)`` order for 2D or ``(Z, Y, X)``
            order for 3D. Defaults to 1.0 (pixel/voxel units).

    Returns:
        pd.DataFrame: One row per segment.  ``index`` holds the integer label
            ID of each segment.  2D columns: ``index``, ``area``,
            ``perimeter``, ``sphericity``, ``solidity``, ``axis_major_length``,
            ``axis_minor_length``, ``equivalent_diameter``.  3D columns:
            ``index``, ``volume``, ``surface_area``, ``sphericity``,
            ``solidity``, ``axis_major_length``, ``axis_minor_length``,
            ``equivalent_diameter``.

    Raises:
        ValueError: If ``segmentation`` is not 2D or 3D, or if ``scale``
            tuple length does not match ``ndim``.
    """
    ndim = segmentation.ndim
    if ndim not in (2, 3):
        raise ValueError(
            f"measure_morphology requires 2D or 3D input, got {ndim}D."
        )

    if isinstance(scale, (int, float)):
        scale_tuple = tuple([float(scale)] * ndim)
    else:
        scale_tuple = tuple(float(s) for s in scale)
        if len(scale_tuple) != ndim:
            raise ValueError(
                f"scale must have {ndim} elements for a {ndim}D segmentation, "
                f"got {len(scale_tuple)}."
            )

    # Pass spacing so regionprops returns all length/area/volume measurements
    # in physical units, handling anisotropic voxel sizes exactly.
    props = regionprops(segmentation, spacing=scale_tuple)
    if not props:
        columns = _COLUMNS_2D if ndim == 2 else _COLUMNS_3D
        return pd.DataFrame(columns=columns)

    rows = []
    for region in props:
        row: dict = {"index": region.label, "solidity": float(region.solidity)}

        if ndim == 2:
            area = float(region.area)  # physical area via spacing
            # region.perimeter raises NotImplementedError for anisotropic spacing.
            # Compute perimeter in pixel space then scale by the geometric mean of
            # the spacings.  For isotropic spacing this is exact; for anisotropic
            # it is the best single-factor approximation of the Crofton formula.
            pixel_perimeter = float(perimeter_crofton(region.image))
            perimeter = pixel_perimeter * float(np.sqrt(np.prod(scale_tuple)))
            sphericity = (
                4.0 * np.pi * area / (perimeter ** 2) if perimeter > 0 else 0.0
            )
            row["area"] = area
            row["perimeter"] = perimeter
            row["sphericity"] = sphericity
            row["axis_major_length"] = float(region.axis_major_length)
            row["axis_minor_length"] = float(region.axis_minor_length)
            row["equivalent_diameter"] = float(region.equivalent_diameter_area)

        else:  # 3D
            volume = float(region.area)  # physical volume via spacing

            binary = segmentation == region.label
            padded = np.pad(binary[region.slice], 1)
            try:
                verts, faces, _, _ = marching_cubes(
                    padded, level=0.5, spacing=scale_tuple
                )
                surface_area = float(mesh_surface_area(verts, faces))
            except (ValueError, RuntimeError):
                surface_area = 0.0

            sphericity = (
                np.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / surface_area
                if surface_area > 0 else 0.0
            )
            row["volume"] = volume
            row["surface_area"] = surface_area
            row["sphericity"] = sphericity
            row["axis_major_length"] = float(region.axis_major_length)
            row["axis_minor_length"] = float(region.axis_minor_length)
            row["equivalent_diameter"] = float(region.equivalent_diameter_area)

        rows.append(row)

    columns = _COLUMNS_2D if ndim == 2 else _COLUMNS_3D
    return pd.DataFrame(rows, columns=columns)

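A quick sanity check of the 2D sphericity (circularity) formula used above, `4 * pi * area / perimeter**2`: a perfect circle scores exactly 1, and an ideal square scores pi/4 regardless of its side length.

```python
import math

# 2-D sphericity (circularity): 4*pi*area / perimeter**2.
def circularity(area: float, perimeter: float) -> float:
    return 4.0 * math.pi * area / perimeter ** 2 if perimeter > 0 else 0.0

# Circle of radius r: area = pi*r**2, perimeter = 2*pi*r  -> 1.0.
# Square of side s: area = s**2, perimeter = 4*s          -> pi/4 ~ 0.785.
```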
import numpy as np
import pandas as pd

# ``_base_columns``, ``_compute_intensity_stats``, ``_nan_intensity_stats`` and
# ``_INTENSITY_STATS`` are module-level helpers (not shown in this excerpt).


def measure_cell_nucleus(
    cell_segmentation: np.ndarray,
    nucleus_segmentation: np.ndarray,
    scale: float | tuple = 1.0,
    intensity_image: np.ndarray | None = None,
) -> pd.DataFrame:
    """Compute per-cell measurements combining cell and nucleus segmentations.

    For each cell, computes the number of nuclei it contains, the ratio of
    cell to nuclear area/volume (in physical units), and optionally the ratio
    of intensity statistics between the cytoplasmic and nuclear regions.

    The cell area/volume encompasses the nucleus (nuclear mask is **not**
    excluded).  For intensity measurements, the nuclear mask **is** excluded
    from the cellular region so that cytoplasmic and nuclear intensities are
    measured independently.

    Supported dimensionality is 2D and 3D.

    Args:
        cell_segmentation (np.ndarray): Integer-valued label array of cells
            where 0 is background.  Must be 2D or 3D.
        nucleus_segmentation (np.ndarray): Integer-valued label array of
            nuclei where 0 is background.  Must have the same shape as
            ``cell_segmentation``.
        scale (float | tuple): Physical size of a pixel/voxel.  A single
            float is interpreted as isotropic spacing.  A tuple must have one
            value per spatial dimension in ``(Y, X)`` order for 2D or
            ``(Z, Y, X)`` order for 3D.  Defaults to 1.0 (pixel/voxel units).
        intensity_image (np.ndarray | None): Optional intensity image with
            the same shape as ``cell_segmentation``.  When provided, intensity
            statistics are computed for the cytoplasmic (cell minus nucleus)
            and nuclear regions and their ratios are reported.

    Returns:
        pd.DataFrame: One row per cell.  ``index`` holds the integer cell label
            ID.  Columns: ``index``, ``n_nuclei``, ``cell_area``/``cell_volume``,
            ``nucleus_area``/``nucleus_volume``, ``area_ratio``/``volume_ratio``.
            When *intensity_image* is given, additional columns are added for
            ``cell_{stat}_intensity``, ``nucleus_{stat}_intensity``, and
            ``{stat}_intensity_ratio`` for each stat in mean, median, max, min,
            percentile_10, percentile_25, percentile_75, percentile_90.  The
            ``area_ratio``/``volume_ratio`` is ``NaN`` for cells with no
            detected nucleus.  Intensity ratios are ``NaN`` when the cytoplasm
            or nucleus region is empty, or when the nucleus value is zero.

    Raises:
        ValueError: If ``cell_segmentation`` is not 2D or 3D, if the shapes
            of ``cell_segmentation`` and ``nucleus_segmentation`` do not match,
            if ``intensity_image`` shape does not match, or if ``scale`` tuple
            length does not match ``ndim``.
    """
    ndim = cell_segmentation.ndim
    if ndim not in (2, 3):
        raise ValueError(
            f"measure_cell_nucleus requires 2D or 3D input, got {ndim}D."
        )

    if cell_segmentation.shape != nucleus_segmentation.shape:
        raise ValueError(
            "cell_segmentation and nucleus_segmentation must have the same shape, "
            f"got {cell_segmentation.shape} and {nucleus_segmentation.shape}."
        )

    if intensity_image is not None and intensity_image.shape != cell_segmentation.shape:
        raise ValueError(
            "intensity_image must have the same shape as cell_segmentation, "
            f"got {intensity_image.shape} and {cell_segmentation.shape}."
        )

    if isinstance(scale, (int, float)):
        scale_tuple = tuple([float(scale)] * ndim)
    else:
        scale_tuple = tuple(float(s) for s in scale)
        if len(scale_tuple) != ndim:
            raise ValueError(
                f"scale must have {ndim} elements for a {ndim}D segmentation, "
                f"got {len(scale_tuple)}."
            )

    voxel_size = float(np.prod(scale_tuple))
    size_col = "area" if ndim == 2 else "volume"
    has_intensity = intensity_image is not None

    cell_ids = np.unique(cell_segmentation)
    cell_ids = cell_ids[cell_ids != 0]

    if len(cell_ids) == 0:
        return pd.DataFrame(columns=_base_columns(ndim, has_intensity))

    rows = []
    for cell_id in cell_ids:
        cell_mask = cell_segmentation == cell_id
        cell_size = float(np.sum(cell_mask)) * voxel_size

        nuc_ids = np.unique(nucleus_segmentation[cell_mask])
        nuc_ids = nuc_ids[nuc_ids != 0]
        n_nuclei = int(len(nuc_ids))

        nucleus_mask = (nucleus_segmentation != 0) & cell_mask
        nucleus_size = float(np.sum(nucleus_mask)) * voxel_size

        size_ratio = cell_size / nucleus_size if nucleus_size > 0 else float("nan")

        row: dict = {
            "index": int(cell_id),
            "n_nuclei": n_nuclei,
            f"cell_{size_col}": cell_size,
            f"nucleus_{size_col}": nucleus_size,
            f"{size_col}_ratio": size_ratio,
        }

        if has_intensity:
            cyto_mask = cell_mask & ~nucleus_mask
            cell_stats = (
                _compute_intensity_stats(intensity_image[cyto_mask])
                if np.any(cyto_mask)
                else _nan_intensity_stats()
            )
            nuc_stats = (
                _compute_intensity_stats(intensity_image[nucleus_mask])
                if np.any(nucleus_mask)
                else _nan_intensity_stats()
            )

            for stat in _INTENSITY_STATS:
                row[f"cell_{stat}_intensity"] = cell_stats[stat]
            for stat in _INTENSITY_STATS:
                row[f"nucleus_{stat}_intensity"] = nuc_stats[stat]
            for stat in _INTENSITY_STATS:
                c = cell_stats[stat]
                n = nuc_stats[stat]
                if not (np.isnan(c) or np.isnan(n)) and n != 0:
                    row[f"{stat}_intensity_ratio"] = c / n
                else:
                    row[f"{stat}_intensity_ratio"] = float("nan")

        rows.append(row)

    return pd.DataFrame(rows, columns=_base_columns(ndim, has_intensity))

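The nucleus-counting and size-ratio logic for a single cell can be sketched in plain NumPy (isotropic unit spacing assumed, so sizes are in pixels):

```python
import numpy as np

cells = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 2],
])
nuclei = np.array([
    [0, 5, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
])

cell_mask = cells == 1
nuc_labels = nuclei[cell_mask]
# Distinct non-background nucleus labels inside cell 1.
n_nuclei = len(np.unique(nuc_labels[nuc_labels != 0]))
cell_area = int(cell_mask.sum())          # nucleus pixels included
nucleus_area = int((nuc_labels != 0).sum())
area_ratio = cell_area / nucleus_area
# Cell 2 contains no nucleus, so its ratio would be NaN per the docs above.
```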
from functools import reduce
from typing import Sequence

import pandas as pd


def merge_tables(
    tables: Sequence[pd.DataFrame], on: str = "index"
) -> pd.DataFrame:
    """Merge multiple measurement tables on a shared key column.

    Performs an outer join on ``on`` so that label IDs present in some but not
    all tables are preserved (missing values become NaN).  All tables must
    contain the ``on`` column, and the *other* columns must be disjoint
    between tables — otherwise the merge would silently rename or duplicate
    measurement columns.  Use :func:`drop_columns` first to remove conflicts
    if needed.

    Args:
        tables (Sequence[pd.DataFrame]): Two or more measurement tables to
            merge.
        on (str): Name of the key column shared between tables. Defaults to
            ``"index"``.

    Returns:
        pd.DataFrame: A single table containing the union of all rows and
            columns.

    Raises:
        ValueError: If fewer than two tables are provided, if any table is
            missing the ``on`` column, or if non-``on`` columns overlap
            between tables.
    """
    tables = list(tables)
    if len(tables) < 2:
        raise ValueError("merge_tables requires at least two tables.")

    seen_columns: set[str] = set()
    for i, table in enumerate(tables):
        if on not in table.columns:
            raise ValueError(
                f"Table at index {i} is missing the key column '{on}'."
            )
        other_columns = set(table.columns) - {on}
        conflicts = seen_columns & other_columns
        if conflicts:
            raise ValueError(
                f"Column(s) {sorted(conflicts)} appear in more than one "
                f"table; drop them before merging."
            )
        seen_columns.update(other_columns)

    merged = reduce(
        lambda left, right: pd.merge(left, right, on=on, how="outer"), tables
    )
    return merged.sort_values(on).reset_index(drop=True)

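A small standalone sketch of the outer-join behavior (hypothetical morphology and intensity tables):

```python
from functools import reduce

import pandas as pd

morph = pd.DataFrame({"index": [1, 2], "area": [9.0, 4.0]})
inten = pd.DataFrame({"index": [2, 3], "mean_intensity": [7.0, 5.0]})

merged = reduce(
    lambda left, right: pd.merge(left, right, on="index", how="outer"),
    [morph, inten],
)
merged = merged.sort_values("index").reset_index(drop=True)
# Label 1 has no intensity row and label 3 no morphology row, so the
# outer join fills those cells with NaN.
```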
from typing import Iterable, Union

import pandas as pd

# The segment-identifier key is the only protected column.
PROTECTED_COLUMNS = {"index"}


def drop_columns(
    table: pd.DataFrame, columns: Union[str, Iterable[str]]
) -> pd.DataFrame:
    """Return a copy of ``table`` with the specified columns removed.

    The ``index`` column is the standard segment-identifier key throughout
    this package and may never be dropped.

    Args:
        table (pd.DataFrame): Input measurement table.
        columns (str | Iterable[str]): Single column name or an iterable of
            column names to drop.

    Returns:
        pd.DataFrame: New DataFrame without the dropped columns.

    Raises:
        ValueError: If any requested column is not present in ``table``, or
            if a protected column (``index``) is requested.
    """
    if isinstance(columns, str):
        columns_list = [columns]
    else:
        columns_list = list(columns)
    protected = [c for c in columns_list if c in PROTECTED_COLUMNS]
    if protected:
        raise ValueError(
            f"Column(s) {protected} are protected and cannot be dropped."
        )
    missing = [c for c in columns_list if c not in table.columns]
    if missing:
        raise ValueError(f"Column(s) {missing} not found in table.")
    return table.drop(columns=columns_list)

Return a copy of table with the specified columns removed.

The index column is the standard segment-identifier key throughout this package and may never be dropped.

Arguments:
  • table (pd.DataFrame): Input measurement table.
  • columns (str | Iterable[str]): Single column name or an iterable of column names to drop.
Returns:

pd.DataFrame: New DataFrame without the dropped columns.

Raises:
  • ValueError: If any requested column is not present in table, or if a protected column (index) is requested.
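The protected-column guard can be illustrated inline with plain pandas. `PROTECTED_COLUMNS` below is a stand-in for the package-level constant, which (per the docstring) contains at least "index":

```python
import pandas as pd

PROTECTED_COLUMNS = {"index"}  # stand-in for the package constant

table = pd.DataFrame(
    {"index": [1, 2], "area": [10.0, 20.0], "perimeter": [5.0, 8.0]}
)

# Dropping an ordinary measurement column returns a trimmed copy;
# the original table is left untouched.
trimmed = table.drop(columns=["perimeter"])

# Requesting a protected column is detected before anything is dropped.
requested = ["index"]
protected = [c for c in requested if c in PROTECTED_COLUMNS]
```

With `requested = ["index"]`, `protected` is non-empty, which is the condition that makes `drop_columns` raise a `ValueError`.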
def suggest_thresholds( measurements: pandas.DataFrame, column: str, n_categories: int) -> list[float]:
def suggest_thresholds(measurements: pd.DataFrame, column: str, n_categories: int) -> list[float]:
    """Suggest thresholds for categorizing segments.

    Computes ``n_categories - 1`` threshold values at equally-spaced quantiles
    of the specified column.

    Args:
        measurements (pd.DataFrame): Measurement DataFrame as returned by
            :func:`~segmentation_measurement.measure_intensities` or
            :func:`~segmentation_measurement.measure_morphology`.
        column (str): Column name to compute thresholds for.
        n_categories (int): Number of desired categories. Must be >= 2.

    Returns:
        list[float]: ``n_categories - 1`` threshold values in ascending order.

    Raises:
        ValueError: If ``n_categories`` < 2 or ``column`` is not in
            ``measurements``.
    """
    if n_categories < 2:
        raise ValueError("n_categories must be >= 2.")
    if column not in measurements.columns:
        raise ValueError(f"Column '{column}' not found in measurements.")
    values = measurements[column].dropna().values
    quantile_positions = np.linspace(0, 100, n_categories + 1)[1:-1]
    return [float(np.percentile(values, q)) for q in quantile_positions]

Suggest thresholds for categorizing segments.

Computes n_categories - 1 threshold values at equally-spaced quantiles of the specified column.

Arguments:
  • measurements (pd.DataFrame): Measurement DataFrame as returned by measure_intensities() or measure_morphology().
  • column (str): Column name to compute thresholds for.
  • n_categories (int): Number of desired categories. Must be >= 2.
Returns:

list[float]: n_categories - 1 threshold values in ascending order.

Raises:
  • ValueError: If n_categories < 2 or column is not in measurements.
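The quantile placement can be reproduced with numpy alone. For `n_categories = 4` the inner positions of `np.linspace(0, 100, 5)` are 25, 50, and 75, so the three thresholds land at those percentiles of the data:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
n_categories = 4

# Equally-spaced quantile positions, dropping the 0% and 100% endpoints:
positions = np.linspace(0, 100, n_categories + 1)[1:-1]  # [25, 50, 75]
thresholds = [float(np.percentile(values, q)) for q in positions]
```

For the values 1 through 8 this yields thresholds near 2.75, 4.5, and 6.25 (with numpy's default linear percentile interpolation).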
def categorize_by_threshold( measurements: pandas.DataFrame, column: str, thresholds: list[float], category_names: list[str] | None = None) -> pandas.DataFrame:
def categorize_by_threshold(
    measurements: pd.DataFrame,
    column: str,
    thresholds: list[float],
    category_names: list[str] | None = None,
) -> pd.DataFrame:
    """Assign categories to segments based on thresholds.

    Segments with values below the first threshold are assigned category 1,
    between consecutive thresholds category 2, ..., N.  Works with any
    measurement DataFrame (intensity or morphology).

    Args:
        measurements (pd.DataFrame): Measurement DataFrame as returned by
            :func:`~segmentation_measurement.measure_intensities` or
            :func:`~segmentation_measurement.measure_morphology`.
        column (str): Column name to apply thresholds to.
        thresholds (list[float]): ``n_categories - 1`` threshold values.
            Need not be sorted; they are sorted internally.
        category_names (list[str] | None): ``n_categories`` names, one per
            category. Defaults to ``"category_1"``, ``"category_2"``, etc.

    Returns:
        pd.DataFrame: Copy of ``measurements`` with added columns
            ``category_id`` (int, 1-based) and ``category_name`` (str).

    Raises:
        ValueError: If ``column`` is not in ``measurements`` or
            ``category_names`` has the wrong length.
    """
    if column not in measurements.columns:
        raise ValueError(f"Column '{column}' not found in measurements.")
    n_categories = len(thresholds) + 1
    if category_names is None:
        category_names = [f"category_{i + 1}" for i in range(n_categories)]
    if len(category_names) != n_categories:
        raise ValueError(
            f"Expected {n_categories} category names, got {len(category_names)}."
        )
    result = measurements.copy()
    values = result[column].values
    # NaN values (e.g. the background-padding row added so napari's Features
    # Table can map row position → label) are not categorized.  They get
    # ``category_id=0`` and an empty ``category_name``.
    if np.issubdtype(np.asarray(values).dtype, np.number):
        valid_mask = ~np.isnan(np.asarray(values, dtype=float))
    else:
        valid_mask = np.ones(len(values), dtype=bool)
    category_ids = np.zeros(len(values), dtype=int)
    if valid_mask.any():
        category_ids[valid_mask] = np.digitize(
            np.asarray(values, dtype=float)[valid_mask], sorted(thresholds)
        ) + 1
    result["category_id"] = category_ids
    result["category_name"] = [
        category_names[cid - 1] if cid > 0 else "" for cid in category_ids
    ]
    return result

Assign categories to segments based on thresholds.

Segments with values below the first threshold are assigned category 1, between consecutive thresholds category 2, ..., N. Works with any measurement DataFrame (intensity or morphology).

Arguments:
  • measurements (pd.DataFrame): Measurement DataFrame as returned by measure_intensities() or measure_morphology().
  • column (str): Column name to apply thresholds to.
  • thresholds (list[float]): n_categories - 1 threshold values. Need not be sorted; they are sorted internally.
  • category_names (list[str] | None): n_categories names, one per category. Defaults to "category_1", "category_2", etc.
Returns:

pd.DataFrame: Copy of measurements with added columns category_id (int, 1-based) and category_name (str).

Raises:
  • ValueError: If column is not in measurements or category_names has the wrong length.
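The 1-based binning rests on a small `np.digitize` detail: values below the first threshold map to bin 0, so adding 1 yields the documented category IDs. A minimal sketch with hand-picked values:

```python
import numpy as np

values = np.array([0.2, 1.5, 3.7])
thresholds = [3.0, 1.0]  # need not be sorted; sorted internally

# digitize returns 0 for values below the smallest threshold,
# 1 between the first and second, etc.; +1 makes the IDs 1-based.
category_ids = np.digitize(values, sorted(thresholds)) + 1
```

Here 0.2 falls below both thresholds (category 1), 1.5 between them (category 2), and 3.7 above both (category 3).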
def cluster_measurements( measurements: pandas.DataFrame, method: str = 'kmeans', **kwargs) -> pandas.DataFrame:
def cluster_measurements(
    measurements: pd.DataFrame,
    method: str = "kmeans",
    **kwargs,
) -> pd.DataFrame:
    """Apply clustering to measurement features.

    Clusters segments using all numeric measurement columns, excluding
    ``index``, ``cluster_id``, ``category_id``, and ``category_name``.
    Features are z-score standardised before clustering.

    Args:
        measurements (pd.DataFrame): Measurement DataFrame as returned by
            :func:`~segmentation_measurement.measure_intensities` or similar.
        method (str): Clustering method.  One of ``'kmeans'``, ``'dbscan'``,
            ``'hdbscan'``, or ``'mean_shift'``.  Defaults to ``'kmeans'``.
        **kwargs: Keyword arguments forwarded to the underlying scikit-learn
            estimator.  Sensible defaults are used when not provided:
            k-means – ``n_clusters=3``; DBSCAN – ``eps=0.5``,
            ``min_samples=5``; HDBSCAN – ``min_cluster_size=5``; Mean Shift –
            bandwidth is estimated automatically.

    Returns:
        pd.DataFrame: Copy of ``measurements`` with an added ``cluster_id``
            column containing integer cluster labels.  ``-1`` marks noise
            points for methods that support it (DBSCAN, HDBSCAN).

    Raises:
        ValueError: If ``method`` is unrecognised or no numeric feature
            columns are found in ``measurements``.
    """
    from sklearn.preprocessing import StandardScaler

    feature_cols = [
        c for c in measurements.select_dtypes(include="number").columns
        if c not in _CLUSTER_EXCLUDE
    ]
    if not feature_cols:
        raise ValueError("No numeric feature columns found in measurements.")

    X = measurements[feature_cols].values.astype(float)
    valid_mask = ~np.isnan(X).any(axis=1)
    X_valid = StandardScaler().fit_transform(X[valid_mask])

    model = _build_clustering_model(method, kwargs)
    labels_valid = model.fit_predict(X_valid).copy()
    # Shift to 1-based; noise (-1) stays -1
    labels_valid[labels_valid >= 0] += 1

    labels = np.full(len(measurements), -1, dtype=int)
    labels[valid_mask] = labels_valid

    result = measurements.copy()
    result["cluster_id"] = labels
    return result

Apply clustering to measurement features.

Clusters segments using all numeric measurement columns, excluding index, cluster_id, category_id, and category_name. Features are z-score standardised before clustering.

Arguments:
  • measurements (pd.DataFrame): Measurement DataFrame as returned by ~segmentation_measurement.measure_intensities() or similar.
  • method (str): Clustering method. One of 'kmeans', 'dbscan', 'hdbscan', or 'mean_shift'. Defaults to 'kmeans'.
  • **kwargs: Keyword arguments forwarded to the underlying scikit-learn estimator. Sensible defaults are used when not provided: k-means – n_clusters=3; DBSCAN – eps=0.5, min_samples=5; HDBSCAN – min_cluster_size=5; Mean Shift – bandwidth is estimated automatically.
Returns:

pd.DataFrame: Copy of measurements with an added cluster_id column containing integer cluster labels. -1 marks noise points for methods that support it (DBSCAN, HDBSCAN).

Raises:
  • ValueError: If method is unrecognised or no numeric feature columns are found in measurements.
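The core pipeline (z-score standardisation, fit-predict, shift to 1-based labels) can be sketched with scikit-learn directly. The two synthetic blobs below are made up for illustration; the library additionally excludes ID columns and NaN rows before this step:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two well-separated synthetic feature blobs (10 points each).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])

# Z-score standardise, then cluster, as cluster_measurements does.
X_std = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_std)

# Shift to 1-based cluster IDs; noise labels (-1) would stay -1.
labels[labels >= 0] += 1
```

With two tight blobs, the first ten rows all receive one cluster ID and the last ten the other.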
def train_classifier( measurements: pandas.DataFrame, annotation_column: str = 'annotation', method: str = 'random_forest', **kwargs) -> object:
def train_classifier(
    measurements: pd.DataFrame,
    annotation_column: str = "annotation",
    method: str = "random_forest",
    **kwargs,
) -> object:
    """Train a classifier on annotated measurement features.

    Args:
        measurements (pd.DataFrame): Measurement DataFrame with annotation labels.
            Rows where the annotation value equals zero are treated as unannotated
            and excluded from training.
        annotation_column (str): Column containing integer annotation labels
            (1-based). Defaults to ``'annotation'``.
        method (str): Classifier type. One of ``'logistic_regression'`` or
            ``'random_forest'``. Defaults to ``'random_forest'``.
        **kwargs: Keyword arguments forwarded to the underlying scikit-learn
            estimator. Sensible defaults are applied when not provided:
            logistic regression – ``max_iter=1000``; random forest –
            ``n_estimators=100``.

    Returns:
        object: Trained sklearn ``Pipeline`` (``StandardScaler`` + classifier)
            that can be passed to :func:`apply_classifier` or serialised with
            ``joblib``.

    Raises:
        ValueError: If ``annotation_column`` is missing in ``measurements``,
            no annotated rows exist (all annotation values are zero), no
            numeric feature columns are found, or ``method`` is unknown.
    """
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    if annotation_column not in measurements.columns:
        raise ValueError(f"Annotation column '{annotation_column}' not found.")

    feature_cols = [
        c for c in measurements.select_dtypes(include="number").columns
        if c not in _CLASSIFY_EXCLUDE and c != annotation_column
    ]
    if not feature_cols:
        raise ValueError("No numeric feature columns found in measurements.")

    annotated = measurements[measurements[annotation_column] > 0]
    if len(annotated) == 0:
        raise ValueError("No annotated rows found (all annotation values are 0).")

    X = annotated[feature_cols].values.astype(float)
    y = annotated[annotation_column].values.astype(int)

    valid_mask = ~np.isnan(X).any(axis=1)
    X, y = X[valid_mask], y[valid_mask]
    if len(X) == 0:
        raise ValueError("All annotated rows contain NaN feature values.")

    clf = _build_classifier_model(method, kwargs)
    pipeline = Pipeline([("scaler", StandardScaler()), ("clf", clf)])
    pipeline.fit(X, y)
    return pipeline

Train a classifier on annotated measurement features.

Arguments:
  • measurements (pd.DataFrame): Measurement DataFrame with annotation labels. Rows where the annotation value equals zero are treated as unannotated and excluded from training.
  • annotation_column (str): Column containing integer annotation labels (1-based). Defaults to 'annotation'.
  • method (str): Classifier type. One of 'logistic_regression' or 'random_forest'. Defaults to 'random_forest'.
  • **kwargs: Keyword arguments forwarded to the underlying scikit-learn estimator. Sensible defaults are applied when not provided: logistic regression – max_iter=1000; random forest – n_estimators=100.
Returns:

object: Trained sklearn Pipeline (StandardScaler + classifier) that can be passed to apply_classifier() or serialised with joblib.

Raises:
  • ValueError: If annotation_column is missing in measurements, no annotated rows exist (all annotation values are zero), no numeric feature columns are found, or method is unknown.
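The training flow (filter to annotated rows, then fit a StandardScaler + classifier pipeline) can be sketched with scikit-learn on a made-up table. The toy "area" feature and annotation values are illustrative only:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy measurement table: annotation 0 means unannotated, 1 and 2 are
# brush-annotated classes (e.g. small vs. large segments).
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5, 6, 7, 8],
    "area": [10.0, 12.0, 11.0, 13.0, 50.0, 52.0, 51.0, 53.0],
    "annotation": [1, 1, 0, 0, 2, 2, 0, 0],
})

# Keep only annotated rows for training, as train_classifier does.
annotated = df[df["annotation"] > 0]
X = annotated[["area"]].values.astype(float)
y = annotated["annotation"].values.astype(int)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipe.fit(X, y)
```

The fitted pipeline can then be applied to the unannotated rows (or serialised with joblib, as the docstring notes).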
def apply_classifier( measurements: pandas.DataFrame, classifier: object, class_names: list[str] | None = None, annotation_column: str = 'annotation') -> pandas.DataFrame:
def apply_classifier(
    measurements: pd.DataFrame,
    classifier: object,
    class_names: list[str] | None = None,
    annotation_column: str = "annotation",
) -> pd.DataFrame:
    """Apply a trained classifier to measurement features.

    Args:
        measurements (pd.DataFrame): Measurement DataFrame to classify.
        classifier (object): Trained sklearn estimator or ``Pipeline`` as
            returned by :func:`train_classifier`.
        class_names (list[str] | None): Optional names for each class, ordered
            by ascending class label (matching ``classifier.classes_``). Defaults
            to ``'class_<id>'`` strings.
        annotation_column (str): Name of the annotation column to exclude from
            features. Defaults to ``'annotation'``.

    Returns:
        pd.DataFrame: Copy of ``measurements`` with added columns
            ``classification_id`` (int, 1-based; 0 for rows with NaN features)
            and ``classification_name`` (str; empty string for unclassified rows).

    Raises:
        ValueError: If no numeric feature columns are found.
    """
    feature_cols = [
        c for c in measurements.select_dtypes(include="number").columns
        if c not in _CLASSIFY_EXCLUDE and c != annotation_column
    ]
    if not feature_cols:
        raise ValueError("No numeric feature columns found in measurements.")

    X = measurements[feature_cols].values.astype(float)
    valid_mask = ~np.isnan(X).any(axis=1)

    predictions = np.zeros(len(measurements), dtype=int)
    if valid_mask.any():
        predictions[valid_mask] = classifier.predict(X[valid_mask]).astype(int)

    classes = sorted(int(c) for c in classifier.classes_)
    if class_names is not None:
        name_map = {
            cid: (class_names[i] if i < len(class_names) else f"class_{cid}")
            for i, cid in enumerate(classes)
        }
    else:
        name_map = {cid: f"class_{cid}" for cid in classes}

    result = measurements.copy()
    result["classification_id"] = predictions
    result["classification_name"] = [
        name_map.get(int(p), f"class_{p}") if int(p) > 0 else ""
        for p in predictions
    ]
    return result

Apply a trained classifier to measurement features.

Arguments:
  • measurements (pd.DataFrame): Measurement DataFrame to classify.
  • classifier (object): Trained sklearn estimator or Pipeline as returned by train_classifier().
  • class_names (list[str] | None): Optional names for each class, ordered by ascending class label (matching classifier.classes_). Defaults to 'class_<id>' strings.
  • annotation_column (str): Name of the annotation column to exclude from features. Defaults to 'annotation'.
Returns:

pd.DataFrame: Copy of measurements with added columns classification_id (int, 1-based; 0 for rows with NaN features) and classification_name (str; empty string for unclassified rows).

Raises:
  • ValueError: If no numeric feature columns are found.
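The mapping from predictions to the two output columns can be sketched in isolation. The prediction vector and class names below are made up; prediction 0 stands for a row whose features contained NaN:

```python
import numpy as np

# Hypothetical per-row predictions: 0 = unclassified (NaN features),
# 1 and 2 are class labels from classifier.classes_.
predictions = np.array([0, 1, 2, 1])
classes = [1, 2]
class_names = ["negative", "positive"]

# Map each class label to its user-supplied name, ordered by
# ascending class label, as apply_classifier does.
name_map = {cid: class_names[i] for i, cid in enumerate(classes)}
names = [
    name_map.get(int(p), f"class_{p}") if p > 0 else ""
    for p in predictions
]
```

Unclassified rows keep classification_id 0 and an empty classification_name, so downstream grouping by name naturally skips them.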