feat(body_axis): add prior-free AP body-axis inference #945

Open
khan-u wants to merge 20 commits into neuroinformatics-unit:main from khan-u:body-axis-ap-inference

@khan-u khan-u commented Apr 2, 2026

Description

Prior-Free Body-Axis Inference Pipeline

This PR depends on #875.

For review, the intended new work is only the prior-free body-axis inference changes. The overlapping compute_polarization changes are already under review in #875 and are included here only because this PR depends on that unmerged branch state.

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

This PR solves a practical problem: when computing orientation polarization of animals from body-axis keypoints, the user must specify which keypoint pair defines "posterior → anterior", i.e. the from_node and the to_node. The AP validation pipeline automatically verifies this choice and, when needed, suggests alternatives, by leveraging the principle that animals generally move head-first. It does this without any anatomical priors, purely from geometry (PCA) and kinematics (velocity voting).

What does this PR do?

The core question it answers: given a set of keypoints for an animal, which direction is "front" (anterior) and which is "back" (posterior)?

The pipeline is implemented in a new module movement/kinematics/body_axis.py, which provides:

  • ValidateAPConfig: Configuration dataclass for all tunable parameters
  • FrameSelection: Dataclass bundling frame indices and segment assignments
  • APNodePairReport: Dataclass with detailed AP pair evaluation results
  • validate_ap(): Main validation function for a single individual
  • run_ap_validation(): Multi-individual validation entry point

The validation is called by compute_polarization() as a side-channel diagnostic when validate_ap=True and body_axis_keypoints is provided. The validation results do not affect the polarization computation itself but are stored in polarization.attrs["ap_validation_result"] for the user to inspect. Configuration parameters for various thresholds can be supplied by the user via ap_validation_config.


body_axis.py contains all AP validation infrastructure organized into sections:

  • Configuration and Data Classes
  • Tiered Validity and Centroid Computation
  • Velocity and Motion Detection
  • Run and Segment Detection
  • Skeleton Analysis
  • K-Medoids Clustering
  • PCA and Anterior Inference
  • AP Node-Pair Evaluation (3-Step Filter Cascade)
  • Scenario Assignment
  • Input Preparation and Validation
  • Pipeline Orchestration Functions
  • Main Validation Function
  • Multi-Individual Validation

AP Validation Pipeline Overview

Cross-dataset summary: all 5 datasets achieve correct AP pair identification with unanimous velocity voting (M=1.0) and high directional concentration (R>0.7).

Example Script | Detailed Log

The pipeline in validate_ap() works through these stages:

1. Tiered validity
  • Frames are classified as tier-1 (≥min_valid_frac of keypoints present AND ≥2 total) or tier-2 (all keypoints present).
  • This creates a quality hierarchy:
    • tier-1 is used for motion segmentation (tolerates minor keypoint dropouts)
    • tier-2 is required for skeleton construction and PCA (demands complete observations)
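The two tiers can be illustrated with a small NumPy sketch. It assumes a position array of shape (n_frames, n_keypoints, 2) with NaN marking missing keypoints; the function name `tier_masks` is hypothetical and is not taken from body_axis.py.

```python
import numpy as np


def tier_masks(position, min_valid_frac=0.6):
    """Return boolean (tier1, tier2) masks with one entry per frame."""
    # A keypoint counts as present only if both of its coordinates are finite.
    present = ~np.isnan(position).any(axis=-1)  # (n_frames, n_keypoints)
    n_present = present.sum(axis=1)
    n_keypoints = position.shape[1]
    # Tier-1: enough keypoints present AND at least two in total.
    tier1 = (n_present / n_keypoints >= min_valid_frac) & (n_present >= 2)
    # Tier-2: every keypoint present.
    tier2 = n_present == n_keypoints
    return tier1, tier2


# Three frames, five keypoints: complete, one dropout, four dropouts.
pos = np.ones((3, 5, 2))
pos[1, 0] = np.nan
pos[2, :4] = np.nan
tier1, tier2 = tier_masks(pos)
```

In this toy input the second frame is tier-1 only (4 of 5 keypoints), and the third frame fails both tiers because a single keypoint remains.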
2. Bounding-box centroid computation
  • Rather than using the arithmetic mean of keypoints (which is density-biased if keypoints cluster on one body region), it uses the midpoint of the axis-aligned bounding box - making it invariant to annotation density asymmetry.
  • A centroid discrepancy diagnostic computes the normalized distance (distance / bbox diagonal) between the bbox and arithmetic centroids across tier-1 frames, reporting median/mean/max.
  • If the median discrepancy exceeds 5%, a warning is issued indicating likely asymmetric annotation density - validating the bbox centroid choice for that dataset.
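A minimal sketch of the centroid discrepancy diagnostic for a single frame, assuming a (n_keypoints, 2) array with at least two distinct valid points. The helper name `centroid_discrepancy` is hypothetical, and degenerate frames (zero bbox diagonal) are not handled here.

```python
import numpy as np


def centroid_discrepancy(frame):
    """Return (bbox_centroid, normalized distance to the arithmetic centroid)."""
    lo = np.nanmin(frame, axis=0)
    hi = np.nanmax(frame, axis=0)
    bbox_centroid = (lo + hi) / 2.0            # midpoint of the bounding box
    mean_centroid = np.nanmean(frame, axis=0)  # density-biased alternative
    diagonal = np.linalg.norm(hi - lo)
    discrepancy = np.linalg.norm(bbox_centroid - mean_centroid) / diagonal
    return bbox_centroid, discrepancy


# Three keypoints cluster near one end of the body, one sits far away.
frame = np.array([[0, 0], [1, 0], [1, 0], [1, 0], [10, 2]], dtype=float)
bbox_c, disc = centroid_discrepancy(frame)
```

Here the arithmetic mean is pulled toward the cluster of keypoints, so the normalized discrepancy is well above the 5% warning level described above.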
3. High-motion segment detection
  • Frame-to-frame centroid velocities are computed (valid only when both adjacent frames are tier-1 valid), then sliding windows of window_len speed samples (advanced by stride samples) compute median speeds.
  • A window is accepted only if every speed sample within it is valid (non-NaN).
  • Windows whose median speed meets or exceeds the pct_thresh percentile of all valid-window medians are classified as high-motion.
  • Consecutive qualifying windows form "runs" that must meet a minimum length (min_run_len); runs are converted to frame ranges and merged if overlapping or abutting.
  • This focuses analysis on frames where the animal is actually moving (and thus has informative velocity).
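The windowing and run logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in body_axis.py: `high_motion_segments` is a hypothetical name, `speeds` holds one sample per frame transition with NaN for invalid samples, and a run of windows is mapped to the inclusive frame range from its first window start to its last window start plus `window_len`.

```python
import numpy as np


def high_motion_segments(speeds, window_len=50, stride=5,
                         pct_thresh=85.0, min_run_len=1):
    """Return merged (start_frame, end_frame) ranges of high-motion runs."""
    starts = np.arange(0, len(speeds) - window_len + 1, stride)
    medians = np.full(len(starts), np.nan)
    for i, s in enumerate(starts):
        window = speeds[s:s + window_len]
        if not np.isnan(window).any():  # accept only fully valid windows
            medians[i] = np.median(window)
    valid = ~np.isnan(medians)
    if not valid.any():
        return []
    cutoff = np.percentile(medians[valid], pct_thresh)
    high = np.zeros(len(starts), dtype=bool)
    high[valid] = medians[valid] >= cutoff

    # Group consecutive qualifying windows into runs.
    segments = []
    i = 0
    while i < len(high):
        if not high[i]:
            i += 1
            continue
        j = i
        while j + 1 < len(high) and high[j + 1]:
            j += 1
        if j - i + 1 >= min_run_len:
            segments.append((int(starts[i]), int(starts[j]) + window_len))
        i = j + 1

    # Merge ranges that overlap or abut.
    merged = []
    for start, end in segments:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# A burst of movement between speed samples 40 and 59.
speeds = np.zeros(100)
speeds[40:60] = 5.0
segments = high_motion_segments(speeds, window_len=10, stride=5)
```

With these toy inputs, only the three windows lying fully inside the burst reach the 85th-percentile cutoff, and they form a single run.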
4. Tier-2 filtering on segments
  • Selected segment frames are further filtered to retain only tier-2 valid frames (all keypoints present).
  • A warning is issued if retention falls below 30%.
5. Centroid-centered skeleton construction
  • Within the selected high-motion, tier-2 frames, each skeleton is centered on its per-frame bbox centroid (the same centroid type used for velocity computation) - removing translational variation and yielding a "shape-only" representation.
6. Postural clustering
  • Pairwise RMSDs between all centered skeletons are computed and partitioned into within-segment and between-segment groups.
  • If the between/within variance ratio exceeds postural_var_ratio_thresh AND at least 6 frames are available, k-medoids clustering (with silhouette-based model selection across k ∈ [2, min(max_clusters, n//2)]) partitions frames into postural clusters.
  • Clustering is accepted only if the best silhouette score exceeds 0.2; otherwise, the pipeline falls back to a global average.
  • The primary cluster is the largest by frame count.
  • This handles cases where an animal adopts distinct postures (e.g., rearing vs. walking), ensuring the body model comes from a single coherent posture.
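As a rough illustration of the clustering step, the sketch below computes pairwise RMSDs between centered skeletons and runs a simple alternating k-medoids on the resulting distance matrix. The function names, the farthest-point initialisation, and the fixed k are assumptions made for this example; the variance-ratio trigger and the silhouette-based choice of k described above are omitted.

```python
import numpy as np


def pairwise_rmsd(skeletons):
    """skeletons: (n_frames, n_keypoints, 2) -> (n_frames, n_frames) RMSDs."""
    diff = skeletons[:, None] - skeletons[None, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


def k_medoids(dist, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    # Deterministic initialisation: most central frame, then farthest points.
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(dist[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = dist[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                within = dist[np.ix_(members, members)].sum(axis=1)
                new[c] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return dist[:, medoids].argmin(axis=1), medoids


# Four frames of an upright two-keypoint posture, two frames of a rotated one.
A = np.array([[0.0, -1.0], [0.0, 1.0]])
B = np.array([[-1.0, 0.0], [1.0, 0.0]])
skeletons = np.stack([A, 1.01 * A, 0.99 * A, 1.02 * A, B, 1.01 * B])
labels, medoids = k_medoids(pairwise_rmsd(skeletons), k=2)
primary = int(np.bincount(labels).argmax())  # largest cluster by frame count
```

The primary cluster here is the four-frame posture, mirroring the "largest by frame count" rule.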
7. PCA on the average skeleton
  • SVD is performed on the valid (non-NaN) rows of the primary cluster's average centered skeleton, yielding PC1 (the longitudinal body axis) and PC2 (the lateral axis).
  • A geometric sign convention is applied post-SVD:
    • PC1 is flipped so that PC1[1] >= 0 (y-component non-negative)
    • PC2 is flipped so that PC2[0] >= 0 (x-component non-negative)
  • This ensures axis orientation is reproducible across runs and decoupled from the anatomical anterior/posterior assignment, which is determined separately in the next step.
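The SVD step and the sign convention can be sketched as below. The name `body_axes` is hypothetical, and mean-centering the valid rows before the SVD is an assumption of this example (the description above only states that the skeleton is centered on its bbox centroid).

```python
import numpy as np


def body_axes(avg_skeleton):
    """Return (PC1, PC2) of an (n_keypoints, 2) skeleton, sign-normalised."""
    valid = ~np.isnan(avg_skeleton).any(axis=1)
    pts = avg_skeleton[valid]
    pts = pts - pts.mean(axis=0)  # assumption: mean-center before SVD
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    pc1, pc2 = vt[0].copy(), vt[1].copy()
    if pc1[1] < 0:   # convention: y-component of PC1 is non-negative
        pc1 = -pc1
    if pc2[0] < 0:   # convention: x-component of PC2 is non-negative
        pc2 = -pc2
    return pc1, pc2


# An elongated skeleton roughly aligned with the y-axis, one keypoint missing.
skeleton = np.array([
    [0.0, -2.0],
    [0.1, 0.0],
    [np.nan, np.nan],
    [-0.1, 1.0],
    [0.0, 3.0],
])
pc1, pc2 = body_axes(skeleton)
```

The sign flips only fix a reproducible orientation; which end of PC1 is anterior is still undecided at this point.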
8. Anterior direction inference via velocity voting
  • Centroid velocities are recomputed using only adjacent consecutive frames within the same segment AND the same cluster (preventing spanning gaps or mixing postures).
  • These velocity vectors are projected onto PC1.
  • If more projections are positive than negative (strict majority; ties default to −PC1), anterior = +PC1.
  • The vote margin M = |n₊ − n₋| / (n₊ + n₋) quantifies confidence (0 = split, 1 = unanimous).
  • Separately, circular statistics on velocity angles yield the resultant length R = √(C² + S²) where C = mean(cos θ) and S = mean(sin θ), measuring directional concentration (0 = omnidirectional, 1 = unidirectional).
  • The product R×M is used as a composite quality score:
    • in compute_polarization(), each individual's R×M is determined solely by its own motion and body shape (independent of the input keypoint pair)
    • the best individual is selected by max R×M
  • If the vote margin falls below confidence_floor, the pipeline logs a warning that the anterior assignment is unreliable.
  • If multiple clusters exist, inter-cluster agreement on anterior polarity is reported.
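The vote margin M and resultant length R defined above can be computed in a few lines. This is a sketch with a hypothetical function name (`vote_anterior`); it assumes the velocity vectors have already been restricted to same-segment, same-cluster frame pairs.

```python
import numpy as np


def vote_anterior(velocities, pc1):
    """Return (anterior_sign, vote_margin M, resultant_length R)."""
    proj = velocities @ pc1
    n_pos = int((proj > 0).sum())
    n_neg = int((proj < 0).sum())
    # Strict majority for +PC1; ties default to -PC1.
    anterior_sign = 1 if n_pos > n_neg else -1
    total = n_pos + n_neg
    margin = abs(n_pos - n_neg) / total if total else 0.0
    # Circular statistics on the velocity angles.
    theta = np.arctan2(velocities[:, 1], velocities[:, 0])
    C = np.cos(theta).mean()
    S = np.sin(theta).mean()
    resultant = float(np.hypot(C, S))
    return anterior_sign, margin, resultant


# Three steps along +y and one step along -y, with PC1 pointing along +y.
velocities = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, -1.0]])
sign, M, R = vote_anterior(velocities, np.array([0.0, 1.0]))
rxm = R * M  # composite quality score
```

For this input n₊ = 3 and n₋ = 1, so M = 0.5; the mean sine is also 0.5 and the mean cosine is zero, giving R = 0.5.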
9. Input AP Node-Pair Filter Cascade
  • Given a candidate keypoint pair (e.g., tail_base → nose), it evaluates quality through:
    • Step I - Lateral alignment filter
      • Computes a combined score for each keypoint: effective_lateral = lateral_offset_norm + lateral_var_weight × lateral_std_norm + longitudinal_var_weight × longitudinal_std_norm.
      • This penalizes keypoints that are (a) far from the body axis, (b) swing side-to-side over time, or (c) move along the AP axis.
      • Keypoints with effective score above lateral_thresh_pct (default: 50th percentile) are eliminated. This adaptive threshold retains roughly half the keypoints while preferring those closest to the body axis and most stable over time.
      • Degenerate cases:
        • (a) If all nodes are equally offset (max == min), all normalized offsets are set to 0 and all nodes pass.
        • (b) If all nodes are far from the axis but with spread, the nearest still scores 0 and passes.
    • Step II - Opposite-sides constraint
      • Surviving keypoints are checked against the AP midpoint (centroid = mean of PC1 projections among valid keypoints).
      • Pairs are valid only if their two nodes lie on opposite sides of this midpoint (product of their signed distances from midpoint is negative).
      • Pairs on the same side cannot span the body axis.
    • Step III - Distal/proximal classification
      • Each surviving pair's nodes are classified by their normalized distance from the midpoint (|pc1_coord − midpoint| / max distance among valid keypoints).
      • A pair is "distal" if both nodes have normalized midpoint distance above edge_thresh_pct (default: 70th percentile); otherwise it is "proximal".
      • The high percentile threshold preferentially selects body-core extremities (head/tail) over limbs.
      • Degenerate case:
        • If all valid nodes are near the midpoint, the most extreme still scores 1.0 and passes.
    • Loss diagnostics
      • High Step 1 loss = few axial nodes
      • High Step 2 loss = midpoint poorly separates candidates
      • Low distal fraction = annotation lacks longitudinal spread
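The three steps above can be sketched as a single function. This is a simplified illustration, not the implementation in body_axis.py: the name `filter_cascade` is hypothetical, all per-keypoint inputs are assumed to be already normalized, both percentiles are taken over all valid keypoints (an assumption), and the degenerate cases are not handled.

```python
from itertools import combinations

import numpy as np


def filter_cascade(pc1_coord, lat_offset, lat_std, long_std,
                   lateral_thresh_pct=50.0, edge_thresh_pct=70.0,
                   lateral_var_weight=1.0, longitudinal_var_weight=0.5):
    """Return (candidate nodes, surviving pairs, distal pairs)."""
    # Step I: keep keypoints at or below the percentile of the combined score.
    effective = (lat_offset + lateral_var_weight * lat_std
                 + longitudinal_var_weight * long_std)
    cutoff = np.percentile(effective, lateral_thresh_pct)
    candidates = [int(i) for i in np.flatnonzero(effective <= cutoff)]

    # Step II: keep pairs whose nodes lie on opposite sides of the midpoint.
    signed = pc1_coord - pc1_coord.mean()
    survivors = [(i, j) for i, j in combinations(candidates, 2)
                 if signed[i] * signed[j] < 0]

    # Step III: a pair is distal if both nodes are far from the midpoint.
    dist = np.abs(signed)
    dist_norm = dist / dist.max()
    edge_cut = np.percentile(dist_norm, edge_thresh_pct)
    distal = [(i, j) for i, j in survivors
              if dist_norm[i] > edge_cut and dist_norm[j] > edge_cut]
    return candidates, survivors, distal


# Six keypoints: 0, 1 and 4 sit close to the body axis, 2 and 5 far from it.
pc1_coord = np.array([-3.0, -1.0, 0.5, 1.0, 2.5, 0.0])
lat_offset = np.array([0.0, 0.1, 0.9, 0.2, 0.05, 1.0])
zeros = np.zeros(6)
candidates, survivors, distal = filter_cascade(pc1_coord, lat_offset, zeros, zeros)
```

In this toy input Step I retains three of six nodes, Step II rejects the pair (0, 1) because both nodes are posterior to the midpoint, and only (0, 4) is classified as distal.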
10. Suggested Pair
  • The filter cascade identifies a single suggested AP pair using variance-weighted scoring.
  • Each candidate pair's AP separation is weighted by the average stability of its two nodes: weighted_sep = separation × (1 − avg_lateral_std), where lateral_std is the normalized standard deviation of each node's lateral offset over time.
  • This penalizes high-variance extremity keypoints (e.g., leg tips) in favor of stable body-core keypoints (e.g., thorax, abdomen).
  • If any distal pairs exist, the one with maximum weighted separation is selected (type = "distal"); otherwise, the overall maximum-weighted-separation pair is selected (type = "proximal").
  • The suggested pair is ordered by order_pair_by_ap() so that element 0 is posterior (lower AP coordinate) and element 1 is anterior (higher AP coordinate), matching the body_axis_keypoints=(from_node, to_node) convention.
  • The ordered indices are stored in max_separation_distal_nodes or max_separation_nodes on the APNodePairReport.
  • The order check (input_pair_order_matches_inference) compares the input pair's AP coordinates:
    • True if from_node's AP coordinate < to_node's AP coordinate (from_node is more posterior)
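A minimal sketch of the variance-weighted selection and ordering described above. The function name `suggest_pair` and its argument layout are hypothetical; it assumes per-keypoint AP coordinates (higher = more anterior) and normalized lateral standard deviations are already available.

```python
def suggest_pair(survivors, distal, ap_coord, lat_std_norm):
    """Return ((posterior_idx, anterior_idx), kind) or (None, None)."""
    # Prefer distal pairs when any exist; otherwise use all survivors.
    pool, kind = (distal, "distal") if distal else (survivors, "proximal")
    if not pool:
        return None, None

    def weighted_sep(pair):
        i, j = pair
        separation = abs(ap_coord[i] - ap_coord[j])
        avg_lateral_std = (lat_std_norm[i] + lat_std_norm[j]) / 2.0
        return separation * (1.0 - avg_lateral_std)

    best = max(pool, key=weighted_sep)
    # Order so that element 0 is posterior (lower AP coordinate).
    posterior, anterior = sorted(best, key=lambda k: ap_coord[k])
    return (posterior, anterior), kind


ap_coord = [-3.0, -1.0, 0.5, 1.0, 2.5, 0.0]
lat_std_norm = [0.8, 0.0, 0.0, 0.0, 0.0, 0.0]  # node 0 swings side to side
pair, kind = suggest_pair([(0, 4), (4, 1)], [], ap_coord, lat_std_norm)
```

Here the pair (0, 4) has the larger raw separation (5.5 vs. 3.5), but node 0's high lateral variance reduces its weighted separation to 3.3, so the more stable pair is suggested and returned in posterior → anterior order.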
11. Mutually Exclusive Scenarios
  • The outcome is classified (accept/warn) based on whether the input pair survived all filters, is distal, has maximum separation, etc.
  • See flowchart below.

Configuration (`ValidateAPConfig`)

All configurable thresholds are collected in a single dataclass in movement.kinematics.body_axis. Users pass overrides as a dict via ap_validation_config; any omitted key uses its default.

| Parameter | Default | Stage/Step | Description |
| --- | --- | --- | --- |
| min_valid_frac | 0.6 | 1 | Minimum fraction of keypoints present for a frame to qualify as tier-1 valid. Must be in [0, 1]. |
| window_len | 50 | 3 | Number of speed samples per sliding window for motion detection. |
| stride | 5 | 3 | Step size (in speed samples) between consecutive sliding window start positions. |
| pct_thresh | 85.0 | 3 | Percentile of valid-window median speeds above which a window is classified as high-motion. Must be in [0, 100]. |
| min_run_len | 1 | 3 | Minimum number of consecutive qualifying windows to form a valid run. |
| postural_var_ratio_thresh | 2.0 | 6 | Between-segment / within-segment RMSD variance ratio above which postural clustering is triggered. Must be positive. |
| max_clusters | 4 | 6 | Upper bound on the number of clusters evaluated during k-medoids (actual upper bound is min(max_clusters, n//2)). |
| confidence_floor | 0.1 | 8 | Vote margin below which the anterior inference is flagged as unreliable. Must be in [0, 1]. |
| lateral_thresh_pct | 50.0 | 9-I | Percentile threshold for Step 1 lateral alignment filter. Keypoints with effective lateral score above this percentile are eliminated. Must be in [0, 100]. |
| edge_thresh_pct | 70.0 | 9-III | Percentile threshold for Step 3 distal/proximal classification. Pairs where both nodes have normalized midpoint distance above this percentile are classified as "distal". Must be in [0, 100]. |
| lateral_var_weight | 1.0 | 9-I | Weight for lateral (PC2) position variance penalty in combined filtering score. Higher values penalize keypoints with more side-to-side motion. Must be non-negative. |
| longitudinal_var_weight | 0.5 | 9-I | Weight for longitudinal (PC1) position variance penalty in combined filtering score. Higher values penalize keypoints with more AP motion. Must be non-negative. |
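The override-with-defaults pattern and the range checks can be illustrated with a small stand-in dataclass. `SketchAPConfig` is hypothetical and covers only four of the twelve fields; it is not the real ValidateAPConfig, and the exact error messages are assumptions apart from containing "must be".

```python
from dataclasses import dataclass


@dataclass
class SketchAPConfig:
    """Hypothetical stand-in for a validated configuration dataclass."""

    min_valid_frac: float = 0.6
    window_len: int = 50
    pct_thresh: float = 85.0
    lateral_var_weight: float = 1.0

    def __post_init__(self):
        if not 0.0 <= self.min_valid_frac <= 1.0:
            raise ValueError("min_valid_frac must be in [0, 1]")
        is_int = isinstance(self.window_len, int) and not isinstance(self.window_len, bool)
        if not is_int or self.window_len < 1:
            raise ValueError("window_len must be a positive integer")
        if not 0.0 <= self.pct_thresh <= 100.0:
            raise ValueError("pct_thresh must be in [0, 100]")
        if self.lateral_var_weight < 0:
            raise ValueError("lateral_var_weight must be non-negative")


def rejects(**kwargs):
    """Return True if the given overrides raise a 'must be' ValueError."""
    try:
        SketchAPConfig(**kwargs)
    except ValueError as err:
        return "must be" in str(err)
    return False


# Overrides arrive as a dict; any omitted key keeps its default.
overrides = {"pct_thresh": 90.0}
config = SketchAPConfig(**overrides)
```

Validating in `__post_init__` means an invalid override fails at construction time rather than partway through the pipeline.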

Return: xarray Attribute `ap_validation_result`

When validate_ap=True and body_axis_keypoints is provided, compute_polarization() stores results in polarization.attrs["ap_validation_result"]:

{
    "all_results": [<per-individual result dicts>],
    "best_idx": int  # index into all_results (highest R×M score)
}

Per-Individual Result Dict Fields

| Field | Type | Description |
| --- | --- | --- |
| success | bool | Whether pipeline completed successfully |
| anterior_sign | int | Inferred anterior direction (+1 or -1 relative to PC1) |
| vote_margin | float | Confidence in anterior assignment (0-1) |
| resultant_length | float | Directional concentration of velocities (0-1) |
| circ_mean_dir | float | Circular mean direction angle (radians; present only on success) |
| num_selected_frames | int | Tier-2 frames used for inference |
| num_clusters | int | Number of postural clusters (1 if no clustering) |
| primary_cluster | int | Index of primary (largest) cluster |
| PC1 | ndarray | First principal component vector (2,) |
| PC2 | ndarray | Second principal component vector (2,) |
| avg_skeleton | ndarray | Average centered skeleton of primary cluster (n_keypoints, 2) |
| vel_projs_pc1 | ndarray | Velocity projections onto PC1 (present only on success) |
| lateral_std | ndarray | Per-keypoint std of lateral (PC2) position (present only on success) |
| longitudinal_std | ndarray | Per-keypoint std of longitudinal (PC1) position (present only on success) |
| pair_report | dataclass | APNodePairReport with detailed AP pair evaluation |
| log_lines | list[str] | Captured diagnostic output (always populated; not printed to stdout when called via compute_polarization(), which hardcodes verbose=False) |
| error_msg | str | Error message if pipeline failed (empty string on success) |
| individual | Hashable | Individual name (added by run_ap_validation()) |

The pair_report field contains scenario (1-13) and outcome ("accept"/"warn") from the flowchart below.


Usage

from movement.io import load_dataset
from movement.kinematics.collective import compute_polarization

# Load tracking data (must have a 'keypoints' dimension)
ds = load_dataset("tracking.slp", source_software="SLEAP", fps=30)

# Basic: compute body-axis polarization with AP validation
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "nose"),
    validate_ap=True,
)

# Validation results are stored in the output's attrs
ap = polarization.attrs["ap_validation_result"]
best = ap["all_results"][ap["best_idx"]]

# Check the inferred anterior direction and confidence
print(f"Anterior sign: {best['anterior_sign']}")   # +1 or -1 relative to PC1
print(f"Vote margin M: {best['vote_margin']:.3f}")  # 0 = split, 1 = unanimous
print(f"Resultant length R: {best['resultant_length']:.3f}")  # directional concentration

# Inspect the pair evaluation
pr = best["pair_report"]
print(f"Scenario: {pr.scenario} ({pr.outcome})")  # e.g. "5 (accept)"
print(f"Input pair order matches inference: {pr.input_pair_order_matches_inference}")

# Check the suggested pair (pipeline-verified posterior → anterior)
if len(pr.max_separation_distal_nodes) > 0:
    suggested = pr.max_separation_distal_nodes  # [posterior_idx, anterior_idx]
    print(f"Suggested distal pair: {suggested}")
elif len(pr.max_separation_nodes) > 0:
    suggested = pr.max_separation_nodes
    print(f"Suggested proximal pair: {suggested}")

# Override config thresholds (any omitted key uses its default)
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "nose"),
    validate_ap=True,
    ap_validation_config={
        "lateral_var_weight": 0.5,  # reduce penalty for side-to-side motion
        "confidence_floor": 0.2,    # stricter confidence warning
    },
)

# Disable validation (default behavior)
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "nose"),
    validate_ap=False,  # this is the default
)

# Read the diagnostic log (always captured; not printed when called via compute_polarization())
for line in best["log_lines"]:
    print(line)

# Direct access to body_axis module for standalone validation
from movement.kinematics.body_axis import validate_ap, ValidateAPConfig

# Run validation directly on a single individual with custom config
config = ValidateAPConfig(lateral_var_weight=0.5, confidence_floor=0.2)
result = validate_ap(
    ds.position.sel(individuals="mouse1"),
    from_node="tail_base",
    to_node="nose",
    config=config,
    verbose=True,  # prints diagnostic output
)

How has this PR been tested?

Yes, with a new file test_body_axis.py.

TestValidateAPConfig (2 tests)

  • Parameter boundary validation for the ValidateAPConfig dataclass. Tests all 12 configurable fields:

    • test_invalid_config_values_raise (23 parametrized cases): Each field is tested with out-of-range values - negative fractions, values above 1.0 for [0, 1] fields, zero or negative integers for count fields, floats where integers are required. All must raise ValueError with a message matching "must be".
    • test_valid_config_does_not_raise: Constructs a ValidateAPConfig with all fields set to non-default valid values and asserts no exception is raised.

  • The 12 fields tested are: min_valid_frac, window_len, stride, pct_thresh, min_run_len, postural_var_ratio_thresh, max_clusters, confidence_floor, lateral_thresh_pct, edge_thresh_pct, lateral_var_weight, longitudinal_var_weight - matching the configuration table above.


Empirical Validation

The 3-step filter cascade thresholds and pair scoring method were empirically optimized via two validation studies on 5 diverse multi-animal datasets (2Flies, 2Mice, 4Gerbils, 5Mice, 2Bees) with hand-curated ground-truth AP node rankings.

Analysis 1: Grid Search over Design and Parameter Space

Find the configuration that maximizes "both nodes in GT" (suggested pair contains two ground-truth AP nodes) with correct ordering across all datasets.

Example Script | Detailed Log | Results JSON

Details

Method: Exhaustive grid search over 705,024 configurations spanning 6 method categories plus the variance weights:

  • Midpoint: geometric center vs. centroid (mean)
  • Lateral threshold: fixed (0.3, 0.4, 0.5) vs. percentile (30, 40, 50, 60, 70)
  • Edge threshold: fixed (0.2, 0.3, 0.4, 0.5) vs. percentile (30, 40, 50, 60, 70, 80)
  • Normalization: body_width vs. min_max vs. percentile_rank
  • Formula: additive vs. multiplicative vs. RMS
  • Pair scoring: max_separation vs. weighted_variance vs. weighted_both
  • Weights: lateral (0.5, 1.0), longitudinal (0.0, 0.5, 1.0)

For each configuration, the best individual per dataset was selected via max R×M, then the 3-step filter cascade was applied to identify the suggested AP pair. Results were scored by: (1) how many datasets achieved "both in GT", (2) how many achieved correct ordering.
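The ranking logic (first by number of datasets with both nodes in GT, then by number with correct ordering) can be sketched with `itertools.product`. The grid below is a small subset of the categories listed above, and `toy_evaluate` returns placeholder scores for illustration only; neither the function names nor the scores are taken from the actual analysis script.

```python
from itertools import product

# A reduced grid over four of the categorical design choices.
grid = {
    "midpoint": ["geometric_center", "centroid"],
    "normalization": ["body_width", "min_max", "percentile_rank"],
    "formula": ["additive", "multiplicative", "rms"],
    "pair_scoring": ["max_separation", "weighted_variance", "weighted_both"],
}


def rank_configurations(grid, evaluate):
    """Score every combination and sort by (both_in_gt, correct_order)."""
    keys = list(grid)
    scored = []
    for values in product(*(grid[key] for key in keys)):
        config = dict(zip(keys, values))
        scored.append((evaluate(config), config))
    # Tuples compare lexicographically: both-in-GT count first, then ordering.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


def toy_evaluate(config):
    """Placeholder scorer; these numbers are NOT results from the PR."""
    good = (config["formula"] == "additive"
            and config["pair_scoring"] == "weighted_variance")
    return (5, 5) if good else (3, 2)


ranking = rank_configurations(grid, toy_evaluate)
best_score, best_config = ranking[0]
```

In a real run, `evaluate` would apply the filter cascade to the best individual of each dataset and count the ground-truth matches.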

Results: Multiple configurations achieved 5/5 datasets with both nodes in GT and correct ordering. The top-ranked configuration:

| Parameter | Selected Value |
| --- | --- |
| Midpoint | centroid (mean of PC1 projections) |
| Lateral threshold | 50th percentile |
| Edge threshold | 70th percentile |
| Normalization | body_width |
| Formula | additive |
| Pair scoring | weighted_variance |
| Weights | lateral=1.0, longitudinal=0.5 |

Implementation: These empirically-validated values are the defaults in ValidateAPConfig:

  • lateral_thresh_pct=50.0 (Step I lateral filter)
  • edge_thresh_pct=70.0 (Step III distal classification)
  • longitudinal_var_weight=0.5 (Step I combined lateral score)

Per-dataset filter cascade results with optimal configuration:

| Dataset | Step 1 (lateral) | Step 2 (opposite) | Step 3 (distal) | Suggested Pair | Status |
| --- | --- | --- | --- | --- | --- |
| 2Flies | 7/13 nodes | 12/21 pairs | 1/12 pairs | [2 → 0] | Both in GT, correct |
| 2Mice | 3/5 nodes | 2/3 pairs | 0/2 pairs | [3 → 0] | Both in GT, correct |
| 4Gerbils | 7/14 nodes | 10/21 pairs | 0/10 pairs | [9 → 5] | Both in GT, correct |
| 5Mice | 6/11 nodes | 8/15 pairs | 0/8 pairs | [6 → 1] | Both in GT, correct |
| 2Bees | 11/21 nodes | 30/55 pairs | 3/30 pairs | [2 → 1] | Both in GT, correct |

Analysis 2: Metric Evaluation for 'Best' Individual Selection

Validate that R×M is the best metric for selecting the "reference individual" whose AP ordering others should align with.

Example Script | Detailed Log

Details

Method: For each of 5 metrics, select the individual with the highest score per dataset and check their ground-truth accuracy (% of AP node pairs correctly ordered vs. hand-curated GT).

Metrics tested:

  1. R×M (resultant_length × vote_margin): Composite locomotion quality score
  2. PC1 variance ratio: Fraction of total variance explained by the first principal component
  3. Mean inverse lateral variance: Average of 1/σ² for each keypoint's lateral offset over time (rewards stable body-core keypoints)
  4. Agreement score: Fraction of other individuals whose raw PC1 ordering matches this individual
  5. Skeleton completeness: Fraction of frames with all keypoints present

| Metric | 100% Accuracy | Mean Accuracy | Per-Dataset |
| --- | --- | --- | --- |
| R×M | 5/5 | 100.0% | 2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:✓ |
| mean_inv_lateral_var | 5/5 | 100.0% | 2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:✓ |
| agreement_score | 4/5 | 80.0% | 2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:0% |
| skeleton_completeness | 4/5 | 80.0% | 2Flies:✓ 2Mice:✓ 4Gerbils:✓ 5Mice:✓ 2Bees:0% |
| pc1_variance_ratio | 3/5 | 74.7% | 2Flies:✓ 2Mice:✓ 4Gerbils:73% 5Mice:✓ 2Bees:0% |

Detailed Per-Dataset Breakdown (Metric Selection)

4Gerbils (4 individuals):
  Individual      | R×M    | PC1 Var | InvLat  | Agree  | Compl  | GT Acc
  ---------------------------------------------------------------------------
  female          | 0.004  | 3.59    | 0.02    | 0.33   | 1.00   | 100.0%
  pup unshaved    | 0.245  | 4.07    | 0.05    | 0.33   | 1.00   | 100.0%  ← R×M selects
  male            | 0.016  | 6.08    | 0.02    | 0.00   | 1.00   |  73.3%  ← PC1 var would select (wrong)
  pup shaved      | 0.018  | 2.79    | 0.04    | 0.00   | 1.00   |  73.3%

5Mice (5 individuals):
  Individual      | R×M    | PC1 Var | InvLat  | Agree  | Compl  | GT Acc
  ---------------------------------------------------------------------------
  track_0         | 0.843  | 5.48    | 0.06    | 1.00   | 1.00   | 100.0%  ← R×M selects
  track_1         | 0.722  | 4.67    | 0.04    | 1.00   | 1.00   | 100.0%
  track_2         | 0.079  | 3.47    | 0.06    | 1.00   | 1.00   | 100.0%
  track_3         | 0.366  | 5.33    | 0.03    | 1.00   | 1.00   | 100.0%
  track_4         | 0.526  | 4.18    | 0.03    | 1.00   | 1.00   | 100.0%

2Bees (2 individuals):
  Individual      | R×M    | PC1 Var | InvLat  | Agree  | Compl  | GT Acc
  ---------------------------------------------------------------------------
  track_1         | 0.206  | 1.60    | 0.03    | 0.00   | 1.00   | 100.0%  ← R×M selects
  track_0         | 0.004  | 2.12    | 0.02    | 0.00   | 1.00   |   0.0%  ← PC1 var, Agree, Compl would select (wrong)

The 2Bees case is particularly instructive: track_0 has the higher PC1 variance ratio and is not separated from track_1 by skeleton completeness or agreement score (both individuals show 1.00 and 0.00 at the displayed precision), yet it has 0% GT accuracy. Only R×M (and mean_inv_lateral_var) correctly identify track_1 as the trustworthy reference.

[Figure: 2Bees AP Validation]

Conclusion: R×M and mean_inv_lateral_var both achieve perfect reference selection (5/5). R×M is preferred because it directly measures locomotion quality (the physical basis for AP inference) rather than an indirect proxy. Additionally, R×M requires no additional computation beyond what's already performed for anterior direction inference.
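The reference-individual selection can be reproduced from the 4Gerbils breakdown above with a simple arg-max per metric. The values are copied from that table; the helper name `select_reference` and the dictionary layout are made up for this illustration.

```python
# Scores and ground-truth accuracy copied from the 4Gerbils table above.
gerbils = {
    "female":       {"rxm": 0.004, "pc1_var": 3.59},
    "pup unshaved": {"rxm": 0.245, "pc1_var": 4.07},
    "male":         {"rxm": 0.016, "pc1_var": 6.08},
    "pup shaved":   {"rxm": 0.018, "pc1_var": 2.79},
}
gt_accuracy = {
    "female": 100.0,
    "pup unshaved": 100.0,
    "male": 73.3,
    "pup shaved": 73.3,
}


def select_reference(scores, metric):
    """Return the individual with the highest value of the given metric."""
    return max(scores, key=lambda name: scores[name][metric])


by_rxm = select_reference(gerbils, "rxm")
by_pc1_var = select_reference(gerbils, "pc1_var")
print(by_rxm, gt_accuracy[by_rxm])          # pup unshaved 100.0
print(by_pc1_var, gt_accuracy[by_pc1_var])  # male 73.3
```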

Other Datasets

2Flies (track_0): [Figure: 2Flies AP Validation]

2Mice (track_0): [Figure: 2Mice AP Validation]

4Gerbils (pup_unshaved): [Figure: 4Gerbils AP Validation]

5Mice (track_0): [Figure: 5Mice AP Validation]


Flowchart: Input AP Node-Pair Filter Cascade

Terminology:

  • Survivors: Pairs that passed both Step I (lateral alignment) and Step II (opposite-sides
    constraint).
  • Distal pair: A surviving pair where both nodes have normalized midpoint distance above
    the edge_thresh_pct percentile (default: 70th).
  • Proximal pair: A surviving pair where at least one node has normalized midpoint
    distance below the edge_thresh_pct percentile.
  • Max-sep overall: The surviving pair with the largest variance-weighted AP separation among
    all survivors (distal or proximal).
  • Max-sep distal: The surviving pair with the largest variance-weighted AP separation among
    distal survivors only.
  • Input pair rank: The input pair's rank by variance-weighted separation among all survivors
    (rank 1 = largest weighted separation).

AP Node-Pair Filter Cascade Flowchart

STEP I: Lateral Alignment Filter
────────────────────────────────
                [All valid keypoints]
                         |
                         v
          effective_lateral_score <= lateral_thresh_pct?
                       /   \
                     Yes    No --> [Eliminated]
                      |
                      v
               [Candidate nodes]
                      |
                      v
        >= 2 candidates? --No--> [FAIL: Step I]
                      |
                     Yes
                      |
                      v
STEP II: Opposite-Sides Constraint
─────────────────────────────────
          pair on opposite sides of centroid (mean PC1)?
                       /   \
                     Yes    No --> [FAIL: Step II]
                      |
                      v
             [Surviving pairs]       <-- pairs that passed Steps I + II
                      |
                      v
STEP III: Distal/Proximal Classification
───────────────────────────────────────
     both nodes' midline_dist_norm >= edge_thresh_pct?
                       /   \
                     Yes    No
                      |      |
                      v      v
                [Distal] [Proximal]
                      \    /
                       \  /
                        \/
                        |
                        v
SUGGESTED PAIR SELECTION (variance-weighted)
────────────────────────────────────────────
     Any distal pairs among survivors?
            /    \
          Yes     No
           |       |
           v       v
     Max weighted-sep    Max weighted-sep
     distal pair         overall pair
                        |
                        v
SCENARIO ASSIGNMENT (13 mutually exclusive outcomes)
────────────────────────────────────────────────────

Single pair survived Steps I–II?
|
+--Yes--> Input pair == the survivor?
|         |
|         +--Yes--> Survivor is distal?
|         |         |
|         |         +--Yes--> #1 ACCEPT: input pair confirmed (distal)
|         |         +--No---> #2 WARN: input pair is proximal
|         |
|         +--No---> Survivor is distal?
|                   |
|                   +--Yes--> #3 WARN: input pair eliminated, suggest survivor
|                   +--No---> #4 WARN: input pair eliminated, only option is proximal
|
+--No (multiple pairs survived)
          |
          +--> Input pair among survivors?
               |
               +--Yes--> Input pair is distal?
               |         |
               |         +--Yes--> Input pair is max-sep overall?
               |         |         |
               |         |         +--Yes-----------> #5 ACCEPT: input pair is best
               |         |         |
               |         |         +--No--> Input pair is max-sep among distal?
               |         |                  |
               |         |                  +--Yes--> #7 ACCEPT: input pair is best distal
               |         |                  +--No---> #6 WARN: better distal pair exists
               |         |
               |         +--No (input pair is proximal)
               |                   |
               |                   +--> Input pair is max-sep overall?
               |                        |
               |                        +--Yes--> Any distal survivor?
               |                        |         |
               |                        |         +--Yes--> #8 WARN: input proximal, distal alternative exists
               |                        |         +--No---> #9 WARN: input proximal, all survivors proximal
               |                        |
               |                        +--No---> Any distal survivor?
               |                                  |
               |                                  +--Yes--> #10 WARN: input proximal, distal alternative exists
               |                                  +--No---> #11 WARN: input proximal, all survivors proximal
               |
               +--No (input pair not among survivors)
                         |
                         +--> Any distal survivor?
                              |
                              +--Yes--> #12 WARN: input eliminated, suggest max-sep distal
                              +--No---> #13 WARN: input eliminated, suggest max-sep overall
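The scenario tree above maps directly onto a small decision function. This is a sketch with a hypothetical name and signature (`assign_scenario`); pairs are compared without regard to order, since the order check is reported separately, and the Step I / Step II failure cases (no survivors) are treated as an error here.

```python
def assign_scenario(input_pair, survivors, distal, max_sep_overall, max_sep_distal):
    """Return (scenario number, outcome) following the flowchart above."""
    def same(a, b):
        return b is not None and set(a) == set(b)

    if not survivors:
        raise ValueError("no surviving pairs: cascade failed before assignment")

    in_survivors = any(same(input_pair, p) for p in survivors)
    any_distal = len(distal) > 0

    # Branch 1: a single pair survived Steps I-II.
    if len(survivors) == 1:
        if in_survivors:
            return (1, "accept") if any_distal else (2, "warn")
        return (3, "warn") if any_distal else (4, "warn")

    # Branch 2: multiple survivors, input pair eliminated.
    if not in_survivors:
        return (12, "warn") if any_distal else (13, "warn")

    # Branch 3: multiple survivors, input pair among them.
    if any(same(input_pair, p) for p in distal):
        if same(input_pair, max_sep_overall):
            return (5, "accept")
        if same(input_pair, max_sep_distal):
            return (7, "accept")
        return (6, "warn")
    if same(input_pair, max_sep_overall):
        return (8, "warn") if any_distal else (9, "warn")
    return (10, "warn") if any_distal else (11, "warn")


survivors = [(0, 4), (1, 4)]
distal = [(0, 4)]
confirmed = assign_scenario((0, 4), survivors, distal, (0, 4), (0, 4))
proximal_input = assign_scenario((1, 4), survivors, distal, (0, 4), (0, 4))
eliminated = assign_scenario((2, 3), survivors, distal, (0, 4), (0, 4))
```

Only scenarios 1, 5 and 7 return "accept"; every other branch yields a warning together with a suggested alternative.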

Future Refactoring Opportunity

The body_axis.py module (~2,900 lines) is intentionally monolithic in this PR to simplify review and iteration. Once the API stabilizes, general-purpose functionality could be extracted to existing or new utility modules:

movement/
├── kinematics/
│   ├── body_axis.py          # Reduced: AP-specific logic only
│   ├── collective.py
│   └── ...
├── utils/
│   ├── vector.py             # + circular_mean, resultant_length (from body_axis)
│   ├── clustering.py         # NEW: kmedoids, silhouette_score (from body_axis)
│   ├── temporal.py           # NEW: detect_runs, merge_segments (from body_axis)
│   └── ...

References

Is this a breaking change?

No.

Does this PR require an update to the documentation?

No - API docs auto-generate from docstrings.

Checklist

  • Code tested locally
  • Tests added for new functionality
  • Formatted with pre-commit

@khan-u khan-u force-pushed the body-axis-ap-inference branch 2 times, most recently from 5cd79d6 to 01d16a8 Compare April 2, 2026 10:05
@khan-u khan-u marked this pull request as draft April 2, 2026 11:05
@khan-u khan-u changed the title from "feat(collective): add prior-free body-axis inference" to "WIP: feat(collective): add prior-free body-axis inference for compute_polarization" Apr 2, 2026
@khan-u khan-u changed the title from "WIP: feat(collective): add prior-free body-axis inference for compute_polarization" to "feat(collective): add prior-free body-axis inference for compute_polarization" Apr 2, 2026
@khan-u khan-u marked this pull request as ready for review April 2, 2026 13:49
@khan-u khan-u force-pushed the body-axis-ap-inference branch from 01d16a8 to 51866c9 Compare April 4, 2026 05:21
@khan-u khan-u force-pushed the body-axis-ap-inference branch from cbc1c25 to 1bd1618 Compare April 4, 2026 08:05
@khan-u khan-u changed the title from "feat(collective): add prior-free body-axis inference for compute_polarization" to "feat(body_axis): add prior-free A-P body-axis inference" Apr 4, 2026
@khan-u khan-u changed the title from "feat(body_axis): add prior-free A-P body-axis inference" to "feat(body_axis): add prior-free AP body-axis inference" Apr 4, 2026
@khan-u khan-u closed this Apr 5, 2026
@khan-u khan-u reopened this Apr 5, 2026
@khan-u khan-u force-pushed the body-axis-ap-inference branch 5 times, most recently from c78dce3 to 615666c Compare April 5, 2026 09:48
@khan-u khan-u force-pushed the body-axis-ap-inference branch from 358e817 to b1df3b9 Compare April 5, 2026 09:55
sonarqubecloud bot commented Apr 5, 2026
