feat(body_axis): add prior-free AP body-axis inference by khan-u · Pull Request #945 · neuroinformatics-unit/movement

khan-u · 2026-04-02T09:23:43Z

Description

Prior-Free Body-Axis Inference Pipeline

This PR depends on #875.

For review, the intended new work is only the prior-free body-axis inference changes. The overlapping compute_polarization changes are already under review in #875 and are included here only because this PR depends on that unmerged branch state.

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?

This PR solves a practical problem: when computing orientation polarization of animals from body-axis keypoints, the user must specify which keypoint pair defines "posterior → anterior", i.e. the from_node and the to_node. The AP validation pipeline automatically verifies this choice and, when needed, suggests alternatives, by leveraging the principle that animals generally move head-first. It does this without any anatomical priors, purely from geometry (PCA) and kinematics (velocity voting).

What does this PR do?

The core question it answers: given a set of keypoints for an animal, which direction is "front" (anterior) and which is "back" (posterior)?

The pipeline is implemented in a new module movement/kinematics/body_axis.py, which provides:

ValidateAPConfig: Configuration dataclass for all tunable parameters
FrameSelection: Dataclass bundling frame indices and segment assignments
APNodePairReport: Dataclass with detailed AP pair evaluation results
validate_ap(): Main validation function for a single individual
run_ap_validation(): Multi-individual validation entry point

The validation is called by compute_polarization() as a side-channel diagnostic when validate_ap=True and body_axis_keypoints is provided. The validation results do not affect the polarization computation itself but are stored in polarization.attrs["ap_validation_result"] for the user to inspect. Configuration parameters for various thresholds can be supplied by the user via ap_validation_config.

body_axis.py contains all AP validation infrastructure organized into sections:

Configuration and Data Classes
Tiered Validity and Centroid Computation
Velocity and Motion Detection
Run and Segment Detection
Skeleton Analysis
K-Medoids Clustering
PCA and Anterior Inference
AP Node-Pair Evaluation (3-Step Filter Cascade)
Scenario Assignment
Input Preparation and Validation
Pipeline Orchestration Functions
Main Validation Function
Multi-Individual Validation

Cross-dataset summary: all 5 datasets achieve correct AP pair identification with unanimous velocity voting (M=1.0) and high directional concentration (R>0.7).

Example Script | Detailed Log

The pipeline in validate_ap() works through these stages:

1. Tiered validity

Frames are classified as tier-1 (≥min_valid_frac of keypoints present AND ≥2 total) or tier-2 (all keypoints present).
This creates a quality hierarchy:
- tier-1 is used for motion segmentation (tolerates minor keypoint dropouts)
- tier-2 is required for skeleton construction and PCA (demands complete observations)
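
The tier rules above can be sketched as follows. This is a minimal illustration, not the code in body_axis.py: the function name classify_tiers and the (n_frames, n_keypoints, 2) NaN-padded array layout are assumptions made for the example.

```python
import numpy as np

def classify_tiers(position, min_valid_frac=0.6):
    """Return boolean tier-1 and tier-2 masks, one entry per frame.

    position: (n_frames, n_keypoints, 2) array; a missing keypoint is NaN.
    (Hypothetical helper; the layout is an assumption of this sketch.)
    """
    present = ~np.isnan(position).any(axis=-1)   # (n_frames, n_keypoints)
    n_present = present.sum(axis=1)
    n_keypoints = position.shape[1]
    # tier-1: enough keypoints present AND at least 2 in total
    tier1 = (n_present >= min_valid_frac * n_keypoints) & (n_present >= 2)
    # tier-2: every keypoint present
    tier2 = n_present == n_keypoints
    return tier1, tier2

pos = np.zeros((3, 5, 2))
pos[1, :1] = np.nan   # frame 1: 4 of 5 keypoints present
pos[2, :4] = np.nan   # frame 2: 1 of 5 keypoints present
tier1, tier2 = classify_tiers(pos)
```

In this toy input, frame 0 is valid at both tiers, frame 1 only at tier-1, and frame 2 at neither.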

2. Bounding-box centroid computation

Rather than using the arithmetic mean of keypoints (which is density-biased if keypoints cluster on one body region), the pipeline uses the midpoint of the axis-aligned bounding box, which makes the centroid invariant to annotation density asymmetry.
A centroid discrepancy diagnostic computes the normalized distance (distance / bbox diagonal) between the bbox and arithmetic centroids across tier-1 frames, reporting median/mean/max.
If the median discrepancy exceeds 5%, a warning is issued indicating likely asymmetric annotation density - validating the bbox centroid choice for that dataset.
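
A minimal sketch of the bbox centroid and the discrepancy diagnostic for a single frame is shown below. The helper names are hypothetical; the real module aggregates the discrepancy across tier-1 frames, which this example does not do.

```python
import numpy as np

def bbox_centroid(frame):
    """Midpoint and diagonal of the axis-aligned bounding box.

    frame: (n_keypoints, 2) array; NaN rows are ignored.
    """
    lo = np.nanmin(frame, axis=0)
    hi = np.nanmax(frame, axis=0)
    return (lo + hi) / 2.0, float(np.linalg.norm(hi - lo))

def centroid_discrepancy(frame):
    """Distance between bbox and arithmetic centroids, over the bbox diagonal."""
    bbox_c, diag = bbox_centroid(frame)
    mean_c = np.nanmean(frame, axis=0)
    return float(np.linalg.norm(bbox_c - mean_c) / diag)

# Four of five keypoints cluster near x = 8..10, pulling the mean forward.
frame = np.array([[0.0, 0.0], [8.0, 0.0], [9.0, 1.0], [10.0, 0.0], [10.0, 2.0]])
center, diagonal = bbox_centroid(frame)
disc = centroid_discrepancy(frame)
centered = frame - center   # centroid-centered, "shape-only" skeleton
```

Here the arithmetic mean sits well ahead of the bbox midpoint, so the normalized discrepancy is far above the 5% warning level described above.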

3. High-motion segment detection

Frame-to-frame centroid velocities are computed (valid only when both adjacent frames are tier-1 valid), then sliding windows of window_len speed samples (advanced by stride samples) compute median speeds.
A window is accepted only if every speed sample within it is valid (non-NaN).
Windows whose median speed meets or exceeds the pct_thresh percentile of all valid-window medians are classified as high-motion.
Consecutive qualifying windows form "runs" that must meet a minimum length (min_run_len); runs are converted to frame ranges and merged if overlapping or abutting.
This focuses analysis on frames where the animal is actually moving (and thus has informative velocity).
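
The windowing and run-merging logic can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, segments are returned as half-open ranges of speed-sample indices, and the conversion from speed samples to frame indices is left out because the description does not spell it out.

```python
import numpy as np

def high_motion_windows(speed, window_len=50, stride=5, pct_thresh=85.0):
    """Flag sliding windows whose median speed reaches the percentile threshold."""
    starts = np.arange(0, len(speed) - window_len + 1, stride)
    medians = np.full(len(starts), np.nan)
    for k, s in enumerate(starts):
        window = speed[s:s + window_len]
        if not np.isnan(window).any():          # accept only fully valid windows
            medians[k] = np.median(window)
    valid = ~np.isnan(medians)
    if not valid.any():
        return starts, valid
    thresh = np.percentile(medians[valid], pct_thresh)
    is_high = np.zeros(len(starts), dtype=bool)
    is_high[valid] = medians[valid] >= thresh
    return starts, is_high

def runs_to_segments(starts, is_high, window_len, min_run_len=1):
    """Turn runs of consecutive high-motion windows into merged sample ranges."""
    segments, k, n = [], 0, len(is_high)
    while k < n:
        if not is_high[k]:
            k += 1
            continue
        j = k
        while j + 1 < n and is_high[j + 1]:
            j += 1
        if j - k + 1 >= min_run_len:
            seg = (int(starts[k]), int(starts[j]) + window_len)  # end exclusive
            if segments and seg[0] <= segments[-1][1]:           # overlap or abut
                segments[-1] = (segments[-1][0], max(segments[-1][1], seg[1]))
            else:
                segments.append(seg)
        k = j + 1
    return segments

speed = np.full(200, 0.1)
speed[100:160] = 5.0                      # one burst of fast movement
starts, is_high = high_motion_windows(speed, window_len=20, stride=5)
segments = runs_to_segments(starts, is_high, window_len=20)
```

With a single burst of fast movement, the sketch yields one segment that covers the burst.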

4. Tier-2 filtering on segments

Selected segment frames are further filtered to retain only tier-2 valid frames (all keypoints present).
A warning is issued if retention falls below 30%.

5. Centroid-centered skeleton construction

Within the selected high-motion, tier-2 frames, each skeleton is centered on its per-frame bbox centroid (the same centroid type used for velocity computation) - removing translational variation and yielding a "shape-only" representation.

6. Postural clustering

Pairwise RMSDs between all centered skeletons are computed and partitioned into within-segment and between-segment groups.
If the between/within variance ratio exceeds postural_var_ratio_thresh AND at least 6 frames are available, k-medoids clustering (with silhouette-based model selection across k ∈ [2, min(max_clusters, n//2)]) partitions frames into postural clusters.
Clustering is accepted only if the best silhouette score exceeds 0.2; otherwise, the pipeline falls back to a global average.
The primary cluster is the largest by frame count.
This handles cases where an animal adopts distinct postures (e.g., rearing vs. walking), ensuring the body model comes from a single coherent posture.
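
As a rough illustration of the first part of this stage, the sketch below computes pairwise RMSDs and splits them into within-segment and between-segment groups. The RMSD definition used here (root of the mean squared keypoint distance) and the helper names are assumptions; the exact variance-ratio formula and the k-medoids step are not reproduced.

```python
import numpy as np

def pairwise_rmsd(skeletons):
    """RMSD between every pair of centered skeletons.

    skeletons: (n_frames, n_keypoints, 2), no NaNs (tier-2 frames).
    """
    diff = skeletons[:, None, :, :] - skeletons[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def split_within_between(rmsd, segment_ids):
    """Partition the upper-triangle RMSDs by whether both frames share a segment."""
    i, j = np.triu_indices(len(segment_ids), k=1)
    same = segment_ids[i] == segment_ids[j]
    return rmsd[i, j][same], rmsd[i, j][~same]

straight = np.array([[0.0, -1.0], [0.0, 0.0], [0.0, 1.0]])
bent = np.array([[0.5, -1.0], [0.0, 0.0], [0.5, 1.0]])
skeletons = np.stack([straight, straight, bent, bent])
segment_ids = np.array([0, 0, 1, 1])
rmsd = pairwise_rmsd(skeletons)
within, between = split_within_between(rmsd, segment_ids)
```

In this toy case the within-segment RMSDs are zero and the between-segment RMSDs are not, which is the situation in which clustering would be worth attempting.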

7. PCA on the average skeleton

SVD is performed on the valid (non-NaN) rows of the primary cluster's average centered skeleton, yielding PC1 (the longitudinal body axis) and PC2 (the lateral axis).
A geometric sign convention is applied post-SVD:
- PC1 is flipped so that PC1[1] >= 0 (y-component non-negative)
- PC2 is flipped so that PC2[0] >= 0 (x-component non-negative)
This ensures axis orientation is reproducible across runs and decoupled from the anatomical anterior/posterior assignment, which is determined separately in the next step.
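
The SVD step and the sign convention can be sketched as follows. The function name is hypothetical, and re-centering on the mean of the valid rows before the SVD is an assumption of this example rather than something stated above.

```python
import numpy as np

def body_axes(avg_skeleton):
    """Return (PC1, PC2) of an average skeleton with a fixed sign convention.

    avg_skeleton: (n_keypoints, 2); rows containing NaN are dropped.
    """
    valid = avg_skeleton[~np.isnan(avg_skeleton).any(axis=1)]
    centered = valid - valid.mean(axis=0)   # assumption: mean-center before SVD
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1, pc2 = vt[0].copy(), vt[1].copy()
    if pc1[1] < 0:      # make the y-component of PC1 non-negative
        pc1 = -pc1
    if pc2[0] < 0:      # make the x-component of PC2 non-negative
        pc2 = -pc2
    return pc1, pc2

avg_skeleton = np.array([
    [0.0, -3.0], [0.2, -1.0], [-0.2, 1.0], [0.0, 3.0], [np.nan, np.nan],
])
pc1, pc2 = body_axes(avg_skeleton)
```

For this elongated toy skeleton PC1 is close to the y-axis. The sign convention only fixes reproducibility; it says nothing about which end is anterior.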

8. Anterior direction inference via velocity voting

Centroid velocities are recomputed using only adjacent consecutive frames within the same segment AND the same cluster (preventing spanning gaps or mixing postures).
These velocity vectors are projected onto PC1.
If strictly more projections are positive than negative, anterior = +PC1; otherwise (including ties), anterior = −PC1.
The vote margin M = |n₊ − n₋| / (n₊ + n₋) quantifies confidence (0 = split, 1 = unanimous).
Separately, circular statistics on velocity angles yield the resultant length R = √(C² + S²) where C = mean(cos θ) and S = mean(sin θ), measuring directional concentration (0 = omnidirectional, 1 = unidirectional).
The product R×M is used as a composite quality score:
- in compute_polarization(), each individual's R×M is determined solely by its own motion and body shape (independent of the input keypoint pair)
- the best individual is selected by max R×M
If the vote margin falls below confidence_floor, the pipeline logs a warning that the anterior assignment is unreliable.
If multiple clusters exist, inter-cluster agreement on anterior polarity is reported.
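
The vote margin M and resultant length R defined above can be computed as in this sketch. The function name and the (n, 2) velocity layout are assumptions for illustration; segment and cluster bookkeeping is omitted.

```python
import numpy as np

def vote_anterior(velocities, pc1):
    """Return (anterior_sign, vote_margin M, resultant_length R).

    velocities: (n, 2) centroid velocity vectors; pc1: unit vector (2,).
    """
    proj = velocities @ pc1
    n_pos = int((proj > 0).sum())
    n_neg = int((proj < 0).sum())
    sign = 1 if n_pos > n_neg else -1              # ties default to -PC1
    total = n_pos + n_neg
    margin = abs(n_pos - n_neg) / total if total else 0.0
    theta = np.arctan2(velocities[:, 1], velocities[:, 0])
    c, s = np.cos(theta).mean(), np.sin(theta).mean()
    resultant = float(np.hypot(c, s))              # R = sqrt(C^2 + S^2)
    return sign, margin, resultant

velocities = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, -1.0]])
sign, margin, resultant = vote_anterior(velocities, np.array([0.0, 1.0]))
quality = resultant * margin                       # composite R x M score
```

With three velocities along +PC1 and one against, the vote picks +PC1 with M = 0.5, and the opposing vector also lowers R to 0.5.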

9. Input AP Node-Pair Filter Cascade

Given a candidate keypoint pair (e.g., tail_base → nose), the pipeline evaluates its quality through:
- Step I - Lateral alignment filter
  - Computes a combined score for each keypoint: effective_lateral = lateral_offset_norm + lateral_var_weight × lateral_std_norm + longitudinal_var_weight × longitudinal_std_norm.
  - This penalizes keypoints that (a) lie far from the body axis, (b) swing side-to-side over time, or (c) move along the AP axis.
  - Keypoints with effective score above lateral_thresh_pct (default: 50th percentile) are eliminated—this adaptive threshold retains roughly half the keypoints while preferring those closest to the body axis and most stable over time.
  - Degenerate cases:
    - (a) If all nodes are equally offset (max == min), all normalized offsets are set to 0 and all nodes pass.
    - (b) If all nodes are far from the axis but with spread, the nearest still scores 0 and passes.
- Step II - Opposite-sides constraint
  - Surviving keypoints are checked against the AP midpoint (centroid = mean of PC1 projections among valid keypoints).
  - Pairs are valid only if their two nodes lie on opposite sides of this midpoint (product of their signed distances from midpoint is negative).
  - Pairs on the same side cannot span the body axis.
- Step III - Distal/proximal classification
  - Each surviving pair's nodes are classified by their normalized distance from the midpoint (|pc1_coord − midpoint| / max distance among valid keypoints).
  - A pair is "distal" if both nodes have normalized midpoint distance above edge_thresh_pct (default: 70th percentile); otherwise it is "proximal".
  - The high percentile threshold preferentially selects body-core extremities (head/tail) over limbs.
  - Degenerate case:
    - If all valid nodes are near the midpoint, the most extreme still scores 1.0 and passes.
- Loss diagnostics
  - High Step I loss = few axial nodes
  - High Step II loss = midpoint poorly separates candidates
  - Low distal fraction = annotation lacks longitudinal spread
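
Steps I and II can be sketched as below, taking already-normalized per-keypoint quantities as input. The function names are hypothetical, and the normalization itself (the grid search selected body_width) is not reproduced here.

```python
from itertools import combinations

import numpy as np

def lateral_filter(offset_norm, lat_std_norm, lon_std_norm,
                   lateral_var_weight=1.0, longitudinal_var_weight=0.5,
                   lateral_thresh_pct=50.0):
    """Step I: keep keypoints whose combined score is at or below the percentile."""
    score = (offset_norm
             + lateral_var_weight * lat_std_norm
             + longitudinal_var_weight * lon_std_norm)
    keep = score <= np.percentile(score, lateral_thresh_pct)
    return score, keep

def opposite_side_pairs(pc1_coords, keep):
    """Step II: surviving pairs whose nodes straddle the AP midpoint."""
    midpoint = pc1_coords.mean()           # mean PC1 projection of valid keypoints
    idx = [int(k) for k in np.flatnonzero(keep)]
    return [(i, j) for i, j in combinations(idx, 2)
            if (pc1_coords[i] - midpoint) * (pc1_coords[j] - midpoint) < 0]

offset_norm = np.array([0.0, 0.1, 0.9, 0.8, 0.05])
lat_std_norm = np.array([0.1, 0.1, 0.5, 0.4, 0.1])
lon_std_norm = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
pc1_coords = np.array([-2.0, 2.0, 0.5, -0.5, 0.3])

score, keep = lateral_filter(offset_norm, lat_std_norm, lon_std_norm)
pairs = opposite_side_pairs(pc1_coords, keep)
```

In this toy example three of five keypoints survive Step I, and two of the three possible pairs straddle the midpoint and survive Step II.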

10. Suggested Pair

The filter cascade identifies a single suggested AP pair using variance-weighted scoring.
Each candidate pair's AP separation is weighted by the average stability of its two nodes: weighted_sep = separation × (1 − avg_lateral_std), where lateral_std is the normalized standard deviation of each node's lateral offset over time.
This penalizes high-variance extremity keypoints (e.g., leg tips) in favor of stable body-core keypoints (e.g., thorax, abdomen).
If any distal pairs exist, the one with maximum weighted separation is selected (type = "distal"); otherwise, the overall maximum-weighted-separation pair is selected (type = "proximal").
The suggested pair is ordered by order_pair_by_ap() so that element 0 is posterior (lower AP coordinate) and element 1 is anterior (higher AP coordinate), matching the body_axis_keypoints=(from_node, to_node) convention.
The ordered indices are stored in max_separation_distal_nodes or max_separation_nodes on the APNodePairReport.
The order check (input_pair_order_matches_inference) compares the input pair's AP coordinates:
- True if from_node's AP coordinate < to_node's AP coordinate (from_node is more posterior)
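
A minimal sketch of the variance-weighted selection and posterior → anterior ordering is given below. The function name and argument layout are hypothetical, and ap_coords is assumed to already be oriented so that larger values are more anterior.

```python
import numpy as np

def suggest_pair(pairs, ap_coords, lateral_std_norm, distal_pairs):
    """Pick the pair with the largest variance-weighted AP separation.

    pairs: surviving (i, j) index pairs; distal_pairs: subset classed as distal.
    Returns ((posterior_idx, anterior_idx), "distal" | "proximal").
    """
    def weighted_sep(pair):
        i, j = pair
        separation = abs(ap_coords[i] - ap_coords[j])
        avg_lateral_std = (lateral_std_norm[i] + lateral_std_norm[j]) / 2
        return separation * (1 - avg_lateral_std)

    distal = [p for p in pairs if p in distal_pairs]
    pool, kind = (distal, "distal") if distal else (pairs, "proximal")
    best = max(pool, key=weighted_sep)
    # element 0 is posterior (lower AP coordinate), element 1 is anterior
    ordered = tuple(sorted(best, key=lambda k: ap_coords[k]))
    return ordered, kind

ap_coords = np.array([-2.0, 2.0, 0.5, -0.5, 0.3])
lateral_std_norm = np.array([0.1, 0.9, 0.5, 0.4, 0.1])   # node 1 is unstable
suggested, kind = suggest_pair([(0, 1), (0, 4)], ap_coords, lateral_std_norm, set())
```

Here the pair (0, 1) has the larger raw separation, but node 1's high lateral variance lets the more stable pair (0, 4) win.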

11. Mutually Exclusive Scenarios

The outcome (accept/warn) is classified into one of 13 mutually exclusive scenarios, based on whether the input pair survived all filters, is distal, has maximum separation, etc.
See flowchart below.

Configuration (`ValidateAPConfig`)

All configurable thresholds are collected in a single dataclass in movement.kinematics.body_axis. Users pass overrides as a dict via ap_validation_config; any omitted key uses its default.

Parameter	Default	Stage/Step	Description
`min_valid_frac`	0.6	1	Minimum fraction of keypoints present for a frame to qualify as tier-1 valid. Must be in [0, 1].
`window_len`	50	3	Number of speed samples per sliding window for motion detection.
`stride`	5	3	Step size (in speed samples) between consecutive sliding window start positions.
`pct_thresh`	85.0	3	Percentile of valid-window median speeds above which a window is classified as high-motion. Must be in [0, 100].
`min_run_len`	1	3	Minimum number of consecutive qualifying windows to form a valid run.
`postural_var_ratio_thresh`	2.0	6	Between-segment / within-segment RMSD variance ratio above which postural clustering is triggered. Must be positive.
`max_clusters`	4	6	Upper bound on the number of clusters evaluated during k-medoids (actual upper bound is min(`max_clusters`, n//2)).
`confidence_floor`	0.1	8	Vote margin below which the anterior inference is flagged as unreliable. Must be in [0, 1].
`lateral_thresh_pct`	50.0	9-I	Percentile threshold for the Step I lateral alignment filter. Keypoints with effective lateral score above this percentile are eliminated. Must be in [0, 100].
`edge_thresh_pct`	70.0	9-III	Percentile threshold for the Step III distal/proximal classification. Pairs where both nodes have normalized midpoint distance above this percentile are classified as "distal". Must be in [0, 100].
`lateral_var_weight`	1.0	9-I	Weight for lateral (PC2) position variance penalty in combined filtering score. Higher values penalize keypoints with more side-to-side motion. Must be non-negative.
`longitudinal_var_weight`	0.5	9-I	Weight for longitudinal (PC1) position variance penalty in combined filtering score. Higher values penalize keypoints with more AP motion. Must be non-negative.
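
The "defaults plus dict overrides, validated on construction" pattern can be sketched with a small stand-in dataclass. SketchAPConfig is hypothetical and covers only a subset of the fields in the table; it is not the ValidateAPConfig implementation.

```python
from dataclasses import dataclass

@dataclass
class SketchAPConfig:
    """Hypothetical stand-in for ValidateAPConfig (subset of fields only)."""
    min_valid_frac: float = 0.6
    window_len: int = 50
    pct_thresh: float = 85.0
    lateral_thresh_pct: float = 50.0
    lateral_var_weight: float = 1.0

    def __post_init__(self):
        # Range checks run on every construction, including dict overrides.
        if not 0.0 <= self.min_valid_frac <= 1.0:
            raise ValueError("min_valid_frac must be in [0, 1]")
        if not isinstance(self.window_len, int) or self.window_len < 1:
            raise ValueError("window_len must be a positive integer")
        for name in ("pct_thresh", "lateral_thresh_pct"):
            if not 0.0 <= getattr(self, name) <= 100.0:
                raise ValueError(f"{name} must be in [0, 100]")
        if self.lateral_var_weight < 0:
            raise ValueError("lateral_var_weight must be non-negative")

# A user-supplied dict overrides only the keys it names.
overrides = {"lateral_var_weight": 0.5}
config = SketchAPConfig(**overrides)
```

Omitted keys keep their defaults, and out-of-range values fail early with a message containing "must be", which is the behaviour the tests described further down check for.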

Return: xarray Attribute `ap_validation_result`

When validate_ap=True and body_axis_keypoints is provided, compute_polarization() stores results in polarization.attrs["ap_validation_result"]:

{
    "all_results": [<per-individual result dicts>],
    "best_idx": int  # index into all_results (highest R×M score)
}

Per-Individual Result Dict Fields

Field	Type	Description
`success`	bool	Whether pipeline completed successfully
`anterior_sign`	int	Inferred anterior direction (+1 or -1 relative to PC1)
`vote_margin`	float	Confidence in anterior assignment (0-1)
`resultant_length`	float	Directional concentration of velocities (0-1)
`circ_mean_dir`	float	Circular mean direction angle (radians; present only on success)
`num_selected_frames`	int	Tier-2 frames used for inference
`num_clusters`	int	Number of postural clusters (1 if no clustering)
`primary_cluster`	int	Index of primary (largest) cluster
`PC1`	ndarray	First principal component vector (2,)
`PC2`	ndarray	Second principal component vector (2,)
`avg_skeleton`	ndarray	Average centered skeleton of primary cluster (n_keypoints, 2)
`vel_projs_pc1`	ndarray	Velocity projections onto PC1 (present only on success)
`lateral_std`	ndarray	Per-keypoint std of lateral (PC2) position (present only on success)
`longitudinal_std`	ndarray	Per-keypoint std of longitudinal (PC1) position (present only on success)
`pair_report`	dataclass	`APNodePairReport` with detailed AP pair evaluation
`log_lines`	list[str]	Captured diagnostic output (always populated; not printed to stdout when called via `compute_polarization()`, which hardcodes `verbose=False`)
`error_msg`	str	Error message if pipeline failed (empty string on success)
`individual`	Hashable	Individual name (added by `run_ap_validation()`)

The pair_report field contains scenario (1-13) and outcome ("accept"/"warn") from the flowchart below.

Usage

from movement.io import load_dataset
from movement.kinematics.collective import compute_polarization

# Load tracking data (must have a 'keypoints' dimension)
ds = load_dataset("tracking.slp", source_software="SLEAP", fps=30)

# Basic: compute body-axis polarization with AP validation
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "nose"),
    validate_ap=True,
)

# Validation results are stored in the output's attrs
ap = polarization.attrs["ap_validation_result"]
best = ap["all_results"][ap["best_idx"]]

# Check the inferred anterior direction and confidence
print(f"Anterior sign: {best['anterior_sign']}")   # +1 or -1 relative to PC1
print(f"Vote margin M: {best['vote_margin']:.3f}")  # 0 = split, 1 = unanimous
print(f"Resultant length R: {best['resultant_length']:.3f}")  # directional concentration

# Inspect the pair evaluation
pr = best["pair_report"]
print(f"Scenario: {pr.scenario} ({pr.outcome})")  # e.g. "5 (accept)"
print(f"Input pair order matches inference: {pr.input_pair_order_matches_inference}")

# Check the suggested pair (pipeline-verified posterior → anterior)
if len(pr.max_separation_distal_nodes) > 0:
    suggested = pr.max_separation_distal_nodes  # [posterior_idx, anterior_idx]
    print(f"Suggested distal pair: {suggested}")
elif len(pr.max_separation_nodes) > 0:
    suggested = pr.max_separation_nodes
    print(f"Suggested proximal pair: {suggested}")

# Override config thresholds (any omitted key uses its default)
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "nose"),
    validate_ap=True,
    ap_validation_config={
        "lateral_var_weight": 0.5,  # reduce penalty for side-to-side motion
        "confidence_floor": 0.2,    # stricter confidence warning
    },
)

# Disable validation (default behavior)
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "nose"),
    validate_ap=False,  # this is the default
)

# Read the diagnostic log (always captured; not printed when called via compute_polarization())
for line in best["log_lines"]:
    print(line)

# Direct access to body_axis module for standalone validation
from movement.kinematics.body_axis import validate_ap, ValidateAPConfig

# Run validation directly on a single individual with custom config
config = ValidateAPConfig(lateral_var_weight=0.5, confidence_floor=0.2)
result = validate_ap(
    ds.position.sel(individuals="mouse1"),
    from_node="tail_base",
    to_node="nose",
    config=config,
    verbose=True,  # prints diagnostic output
)

How has this PR been tested?

Yes, with a new file test_body_axis.py.

TestValidateAPConfig (2 tests)

Parameter boundary validation for the ValidateAPConfig dataclass. Tests all 12 configurable fields:

test_invalid_config_values_raise (23 parametrized cases): Each field is tested with out-of-range values - negative fractions, values above 1.0 for [0, 1] fields, zero or negative integers for count fields, floats where integers are required. All must raise ValueError with a message matching "must be".

test_valid_config_does_not_raise: Constructs a ValidateAPConfig with all fields set to non-default valid values and asserts no exception is raised.
The 12 fields tested are: min_valid_frac, window_len, stride, pct_thresh, min_run_len, postural_var_ratio_thresh, max_clusters, confidence_floor, lateral_thresh_pct, edge_thresh_pct, lateral_var_weight, longitudinal_var_weight - matching the configuration table above.

Empirical Validation

The 3-step filter cascade thresholds and pair scoring method were empirically optimized via two validation studies on 5 diverse multi-animal datasets (2Flies, 2Mice, 4Gerbils, 5Mice, 2Bees) with hand-curated ground-truth AP node rankings.

Analysis 1: Grid Search over Design and Parameter Space

Goal: Find the configuration that maximizes "both nodes in GT" (suggested pair contains two ground-truth AP nodes) with correct ordering across all datasets.

Example Script | Detailed Log | Results JSON

Details

Method: Exhaustive grid search over 705,024 configurations testing 6 method categories:

Midpoint: geometric center vs. centroid (mean)
Lateral threshold: fixed (0.3, 0.4, 0.5) vs. percentile (30, 40, 50, 60, 70)
Edge threshold: fixed (0.2, 0.3, 0.4, 0.5) vs. percentile (30, 40, 50, 60, 70, 80)
Normalization: body_width vs. min_max vs. percentile_rank
Formula: additive vs. multiplicative vs. RMS
Pair scoring: max_separation vs. weighted_variance vs. weighted_both
Weights: lateral (0.5, 1.0), longitudinal (0.0, 0.5, 1.0)

For each configuration, the best individual per dataset was selected via max R×M, then the 3-step filter cascade was applied to identify the suggested AP pair. Results were scored by: (1) how many datasets achieved "both in GT", (2) how many achieved correct ordering.

Results: Multiple configurations achieved 5/5 datasets with both nodes in GT and correct ordering. The top-ranked configuration:

Parameter	Selected Value
Midpoint	centroid (mean of PC1 projections)
Lateral threshold	50th percentile
Edge threshold	70th percentile
Normalization	body_width
Formula	additive
Pair scoring	weighted_variance
Weights	lateral=1.0, longitudinal=0.5

Implementation: These empirically-validated values are the defaults in ValidateAPConfig:

lateral_thresh_pct=50.0 (Step I lateral filter)
edge_thresh_pct=70.0 (Step III distal classification)
lateral_var_weight=1.0 (Step I lateral variance penalty)
longitudinal_var_weight=0.5 (Step I longitudinal variance penalty)

Per-dataset filter cascade results with optimal configuration:

Dataset	Step I (lateral)	Step II (opposite)	Step III (distal)	Suggested Pair	Status
2Flies	7/13 nodes	12/21 pairs	1/12 pairs	[2 → 0]	Both in GT, correct
2Mice	3/5 nodes	2/3 pairs	0/2 pairs	[3 → 0]	Both in GT, correct
4Gerbils	7/14 nodes	10/21 pairs	0/10 pairs	[9 → 5]	Both in GT, correct
5Mice	6/11 nodes	8/15 pairs	0/8 pairs	[6 → 1]	Both in GT, correct
2Bees	11/21 nodes	30/55 pairs	3/30 pairs	[2 → 1]	Both in GT, correct

Analysis 2: Metric Evaluation for 'Best' Individual Selection

Goal: Validate that R×M is the best metric for selecting the "reference individual" whose AP ordering others should align with.

Example Script | Detailed Log

Details

Method: For each of 5 metrics, select the individual with the highest score per dataset and check their ground-truth accuracy (% of AP node pairs correctly ordered vs. hand-curated GT).

Metrics tested:

R×M (resultant_length × vote_margin): Composite locomotion quality score
PC1 variance ratio: Fraction of total variance explained by the first principal component
Mean inverse lateral variance: Average of 1/σ² for each keypoint's lateral offset over time (rewards stable body-core keypoints)
Agreement score: Fraction of other individuals whose raw PC1 ordering matches this individual
Skeleton completeness: Fraction of frames with all keypoints present

Metric	100% Accuracy	Mean Accuracy	Per-Dataset
R×M	5/5	100.0%	2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:✓
mean_inv_lateral_var	5/5	100.0%	2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:✓
agreement_score	4/5	80.0%	2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:0%
skeleton_completeness	4/5	80.0%	2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:0%
pc1_variance_ratio	3/5	74.7%	2Flies:✓ 2Mice:✓ 4Gerbils:73% 5Mice:✓ 2Bees:0%

Detailed Per-Dataset Breakdown (Metric Selection)

4Gerbils (4 individuals):
  Individual      | R×M    | PC1 Var | InvLat  | Agree  | Compl  | GT Acc
  ---------------------------------------------------------------------------
  female          | 0.004  | 3.59    | 0.02    | 0.33   | 1.00   | 100.0%
  pup unshaved    | 0.245  | 4.07    | 0.05    | 0.33   | 1.00   | 100.0%  ← R×M selects
  male            | 0.016  | 6.08    | 0.02    | 0.00   | 1.00   |  73.3%  ← PC1 var would select (wrong)
  pup shaved      | 0.018  | 2.79    | 0.04    | 0.00   | 1.00   |  73.3%

5Mice (5 individuals):
  Individual      | R×M    | PC1 Var | InvLat  | Agree  | Compl  | GT Acc
  ---------------------------------------------------------------------------
  track_0         | 0.843  | 5.48    | 0.06    | 1.00   | 1.00   | 100.0%  ← R×M selects
  track_1         | 0.722  | 4.67    | 0.04    | 1.00   | 1.00   | 100.0%
  track_2         | 0.079  | 3.47    | 0.06    | 1.00   | 1.00   | 100.0%
  track_3         | 0.366  | 5.33    | 0.03    | 1.00   | 1.00   | 100.0%
  track_4         | 0.526  | 4.18    | 0.03    | 1.00   | 1.00   | 100.0%

2Bees (2 individuals):
  Individual      | R×M    | PC1 Var | InvLat  | Agree  | Compl  | GT Acc
  ---------------------------------------------------------------------------
  track_1         | 0.206  | 1.60    | 0.03    | 0.00   | 1.00   | 100.0%  ← R×M selects
  track_0         | 0.004  | 2.12    | 0.02    | 0.00   | 1.00   |   0.0%  ← PC1 Var, Agree, Compl would select (wrong)

The 2Bees case is particularly instructive: track_0 has a higher PC1 variance ratio and ties track_1 on skeleton completeness and agreement score (at the precision shown), yet its GT accuracy is 0%. Only R×M and mean_inv_lateral_var correctly identify track_1 as the trustworthy reference.

Conclusion: R×M and mean_inv_lateral_var both achieve perfect reference selection (5/5). R×M is preferred because it directly measures locomotion quality (the physical basis for AP inference) rather than an indirect proxy. Additionally, R×M requires no additional computation beyond what's already performed for anterior direction inference.

Other Datasets

[Figures: 2Flies (track_0), 2Mice (track_0), 4Gerbils (pup_unshaved), 5Mice (track_0)]
Flowchart: Input AP Node-Pair Filter Cascade

Terminology:

Survivors: Pairs that passed both Step I (lateral alignment) and Step II (opposite-sides
constraint).
Distal pair: A surviving pair where both nodes have normalized midpoint distance above
the edge_thresh_pct percentile (default: 70th).
Proximal pair: A surviving pair where at least one node has normalized midpoint
distance below the edge_thresh_pct percentile.
Max-sep overall: The surviving pair with the largest variance-weighted AP separation among
all survivors (distal or proximal).
Max-sep distal: The surviving pair with the largest variance-weighted AP separation among
distal survivors only.
Input pair rank: The input pair's rank by variance-weighted separation among all survivors
(rank 1 = largest weighted separation).

AP Node-Pair Filter Cascade Flowchart

STEP I: Lateral Alignment Filter
────────────────────────────────
                [All valid keypoints]
                         |
                         v
          effective_lateral_score <= lateral_thresh_pct?
                       /   \
                     Yes    No --> [Eliminated]
                      |
                      v
               [Candidate nodes]
                      |
                      v
        >= 2 candidates? --No--> [FAIL: Step I]
                      |
                     Yes
                      |
                      v
STEP II: Opposite-Sides Constraint
─────────────────────────────────
          pair on opposite sides of centroid (mean PC1)?
                       /   \
                     Yes    No --> [FAIL: Step II]
                      |
                      v
             [Surviving pairs]       <-- pairs that passed Steps I + II
                      |
                      v
STEP III: Distal/Proximal Classification
───────────────────────────────────────
     both nodes' midline_dist_norm >= edge_thresh_pct?
                       /   \
                     Yes    No
                      |      |
                      v      v
                [Distal] [Proximal]
                      \    /
                       \  /
                        \/
                        |
                        v
SUGGESTED PAIR SELECTION (variance-weighted)
────────────────────────────────────────────
     Any distal pairs among survivors?
            /    \
          Yes     No
           |       |
           v       v
     Max weighted-sep    Max weighted-sep
     distal pair         overall pair
                        |
                        v
SCENARIO ASSIGNMENT (13 mutually exclusive outcomes)
────────────────────────────────────────────────────

Single pair survived Steps I–II?
|
+--Yes--> Input pair == the survivor?
|         |
|         +--Yes--> Survivor is distal?
|         |         |
|         |         +--Yes--> #1 ACCEPT: input pair confirmed (distal)
|         |         +--No---> #2 WARN: input pair is proximal
|         |
|         +--No---> Survivor is distal?
|                   |
|                   +--Yes--> #3 WARN: input pair eliminated, suggest survivor
|                   +--No---> #4 WARN: input pair eliminated, only option is proximal
|
+--No (multiple pairs survived)
          |
          +--> Input pair among survivors?
               |
               +--Yes--> Input pair is distal?
               |         |
               |         +--Yes--> Input pair is max-sep overall?
               |         |         |
               |         |         +--Yes-----------> #5 ACCEPT: input pair is best
               |         |         |
               |         |         +--No--> Input pair is max-sep among distal?
               |         |                  |
               |         |                  +--Yes--> #7 ACCEPT: input pair is best distal
               |         |                  +--No---> #6 WARN: better distal pair exists
               |         |
               |         +--No (input pair is proximal)
               |                   |
               |                   +--> Input pair is max-sep overall?
               |                        |
               |                        +--Yes--> Any distal survivor?
               |                        |         |
               |                        |         +--Yes--> #8 WARN: input proximal, distal alternative exists
               |                        |         +--No---> #9 WARN: input proximal, all survivors proximal
               |                        |
               |                        +--No---> Any distal survivor?
               |                                  |
               |                                  +--Yes--> #10 WARN: input proximal, distal alternative exists
               |                                  +--No---> #11 WARN: input proximal, all survivors proximal
               |
               +--No (input pair not among survivors)
                         |
                         +--> Any distal survivor?
                              |
                              +--Yes--> #12 WARN: input eliminated, suggest max-sep distal
                              +--No---> #13 WARN: input eliminated, suggest max-sep overall
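
The scenario-assignment tree above maps directly onto a small decision function. The sketch below is a hypothetical transcription of the flowchart, not the module's code; the boolean argument names are assumptions, and the zero-survivor case (which the flowchart does not cover) raises an error.

```python
def assign_scenario(n_survivors, input_survived, input_is_distal=False,
                    input_is_max_overall=False, input_is_max_distal=False,
                    any_distal=False):
    """Return (scenario number, outcome) following the flowchart branches."""
    if n_survivors < 1:
        raise ValueError("flowchart assumes at least one surviving pair")
    if n_survivors == 1:
        if input_survived:
            return (1, "accept") if input_is_distal else (2, "warn")
        return (3, "warn") if any_distal else (4, "warn")
    if input_survived:
        if input_is_distal:
            if input_is_max_overall:
                return (5, "accept")
            return (7, "accept") if input_is_max_distal else (6, "warn")
        # input pair is proximal
        if input_is_max_overall:
            return (8, "warn") if any_distal else (9, "warn")
        return (10, "warn") if any_distal else (11, "warn")
    # input pair was eliminated by Steps I-II
    return (12, "warn") if any_distal else (13, "warn")

example = assign_scenario(n_survivors=4, input_survived=True,
                          input_is_distal=True, input_is_max_distal=True,
                          any_distal=True)
```

Only scenarios 1, 5 and 7 return "accept"; every other branch produces a warning.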

Future Refactoring Opportunity

The body_axis.py module (~2,900 lines) is intentionally monolithic in this PR to simplify review and iteration. Once the API stabilizes, general-purpose functionality could be extracted to existing or new utility modules:

movement/
├── kinematics/
│   ├── body_axis.py          # Reduced: AP-specific logic only
│   ├── collective.py
│   └── ...
├── utils/
│   ├── vector.py             # + circular_mean, resultant_length (from body_axis)
│   ├── clustering.py         # NEW: kmedoids, silhouette_score (from body_axis)
│   ├── temporal.py           # NEW: detect_runs, merge_segments (from body_axis)
│   └── ...

References

Is this a breaking change?

No.

Does this PR require an update to the documentation?

No - API docs auto-generate from docstrings.

Checklist

Code tested locally
Tests added for new functionality
Formatted with pre-commit


sonarqubecloud · 2026-04-05T09:58:18Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

khan-u and others added 14 commits March 31, 2026 10:04

feat: Add compute_polarization for collective behavior analysis

872cb0d

test(kinematics): add polarization edge cases and clarify first-frame…

fc2bab5

… behavior

[pre-commit.ci] auto fixes from pre-commit.com hooks

a1896bc

for more information, see https://pre-commit.ci

linting fix

0087208

test(collective): consolidate related polarization tests to reduce re…

0af9dc7

…dundancy

feat(kinematics): add displacement_frames and return_angle params to …

758ea04

…compute_polarization

refactor(collective): Rewrite compute_polarization with robust valida…

e271a8c

…tion, edge case handling, and simplified tests

test(collective): add .sel() keypoint selection test, clarify docs

2bd4386

refactor(polarization): rename heading_keypoints to body_axis_keypoin…

aef4ee9

…ts, clarify orientation vs heading terminology

linting fix

6efe00e

docs(polarization) use neutral u_hat notation for unit direction vector

8182d05

test(collective): add mathematical invariance, edge case, & validatio…

96ef674

…n for polarization

feat(polarization): add in_degrees parameter + unit test

7d61af5

SonarCloud warning fixes

9313b9d

khan-u force-pushed the body-axis-ap-inference branch 2 times, most recently from 5cd79d6 to 01d16a8 Compare April 2, 2026 10:05

khan-u marked this pull request as draft April 2, 2026 11:05

khan-u changed the title ~~feat(collective): add prior-free body-axis inference~~ WIP: feat(collective): add prior-free body-axis inference for compute_polarization Apr 2, 2026

This was referenced Apr 2, 2026

docs(examples): add AP inference validation demo khan-u/movement#2

Open

demo(examples): add AP inference validation demo #946

Closed

khan-u changed the title ~~WIP: feat(collective): add prior-free body-axis inference for compute_polarization~~ feat(collective): add prior-free body-axis inference for compute_polarization Apr 2, 2026

khan-u marked this pull request as ready for review April 2, 2026 13:49

khan-u mentioned this pull request Apr 3, 2026

demo(examples): multi-timescale polarization analysis + visualization with inferred AP node selection #947

Closed

5 tasks

khan-u added 2 commits April 3, 2026 18:21

refactor(collective): use more vector.py utilities for polarization c…

6dd7fbb

…omputation

feat(collective): add prior-free body-axis inference

51866c9

khan-u force-pushed the body-axis-ap-inference branch from 01d16a8 to 51866c9 Compare April 4, 2026 05:21

[pre-commit.ci] auto fixes from pre-commit.com hooks

1bd1618

for more information, see https://pre-commit.ci

khan-u force-pushed the body-axis-ap-inference branch from cbc1c25 to 1bd1618 Compare April 4, 2026 08:05

refactor(collective) extract out AP validation into body_axis.py

14e2674

khan-u changed the title ~~feat(collective): add prior-free body-axis inference for compute_polarization~~ feat(body_axis): add prior-free A-P body-axis inference Apr 4, 2026

khan-u changed the title ~~feat(body_axis): add prior-free A-P body-axis inference~~ feat(body_axis): add prior-free AP body-axis inference Apr 4, 2026

khan-u closed this Apr 5, 2026

khan-u reopened this Apr 5, 2026

khan-u force-pushed the body-axis-ap-inference branch 5 times, most recently from c78dce3 to 615666c Compare April 5, 2026 09:48

update(body_axis) new config params optimized via grid search

b1df3b9

khan-u force-pushed the body-axis-ap-inference branch from 358e817 to b1df3b9 Compare April 5, 2026 09:55

[pre-commit.ci] auto fixes from pre-commit.com hooks

15fd84d

for more information, see https://pre-commit.ci

khan-u wants to merge 20 commits into neuroinformatics-unit:main from khan-u:body-axis-ap-inference
