feat(body_axis): add prior-free AP body-axis inference #945
Open
khan-u wants to merge 20 commits into neuroinformatics-unit:main



Description
Prior-Free Body-Axis Inference Pipeline
This PR depends on #875.
For review, the intended new work is only the prior-free body-axis inference changes. The overlapping compute_polarization changes are already under review in #875 and are included here only because this PR depends on that unmerged branch state.
What is this PR
Why is this PR needed?
This PR solves a practical problem: when computing orientation polarization of animals using body-axis keypoints, the user must specify which keypoint pair defines "posterior → anterior", i.e. the `from_node` and the `to_node`. The AP validation pipeline automatically verifies this choice and, when needed, suggests alternatives, by leveraging the principle that animals generally move head-first. It does this without any anatomical priors, purely from geometry (PCA) and kinematics (velocity voting).
What does this PR do?
The core question it answers: given a set of keypoints for an animal, which direction is "front" (anterior) and which is "back" (posterior)?
The pipeline is implemented in a new module, `movement/kinematics/body_axis.py`, which provides:
- `ValidateAPConfig`: Configuration dataclass for all tunable parameters
- `FrameSelection`: Dataclass bundling frame indices and segment assignments
- `APNodePairReport`: Dataclass with detailed AP pair evaluation results
- `validate_ap()`: Main validation function for a single individual
- `run_ap_validation()`: Multi-individual validation entry point
The validation is called by `compute_polarization()` as a side-channel diagnostic when `validate_ap=True` and `body_axis_keypoints` is provided. The validation results do not affect the polarization computation itself but are stored in `polarization.attrs["ap_validation_result"]` for the user to inspect. Configuration parameters for various thresholds can be supplied by the user via `ap_validation_config`.
`body_axis.py` contains all AP validation infrastructure organized into sections:
Cross-dataset summary: all 5 datasets achieve correct AP pair identification with unanimous velocity voting (M=1.0) and high directional concentration (R>0.7).
Example Script | Detailed Log
The pipeline in `validate_ap()` works through these stages:
1. Tiered validity
- Each frame is classified as tier-1 (at least `min_valid_frac` of keypoints present AND ≥2 total) or tier-2 (all keypoints present).
2. Bounding-box centroid computation
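As a rough illustration of stages 1 and 2, the sketch below classifies frames into the two tiers and computes a per-frame bounding-box centroid. It is not the code in `body_axis.py`. The array layout (frames × keypoints × 2, with NaN marking a missing keypoint), the function names, and the reading of "bounding-box centroid" as the midpoint of the per-frame min/max over present keypoints are all assumptions made for this example.

```python
import numpy as np


def tier_masks(pos, min_valid_frac):
    """Return (tier1, tier2) boolean masks over frames.

    pos: (n_frames, n_keypoints, 2) array; NaN marks a missing keypoint.
    """
    present = ~np.isnan(pos).any(axis=-1)  # (n_frames, n_keypoints)
    n_present = present.sum(axis=1)
    n_keypoints = pos.shape[1]
    # Tier-1: enough keypoints present AND at least two in total.
    tier1 = (n_present >= min_valid_frac * n_keypoints) & (n_present >= 2)
    # Tier-2: every keypoint present.
    tier2 = n_present == n_keypoints
    return tier1, tier2


def bbox_centroid(pos):
    """Midpoint of the per-frame bounding box over present keypoints."""
    present = ~np.isnan(pos).any(axis=-1)
    mask = present[..., None]
    lo = np.where(mask, pos, np.inf).min(axis=1)
    hi = np.where(mask, pos, -np.inf).max(axis=1)
    centroid = np.full(lo.shape, np.nan)
    ok = present.any(axis=1)  # frames with at least one keypoint
    centroid[ok] = (lo[ok] + hi[ok]) / 2
    return centroid


nan = np.nan
pos = np.array(
    [
        [[0, 0], [2, 0], [1, 4]],          # all keypoints present
        [[0, 0], [nan, nan], [4, 2]],      # one keypoint missing
        [[1, 1], [nan, nan], [nan, nan]],  # only one keypoint left
    ],
    dtype=float,
)
tier1, tier2 = tier_masks(pos, min_valid_frac=0.6)  # 0.6 is an arbitrary value
centroid = bbox_centroid(pos)
```

In this toy input the second frame passes tier-1 but not tier-2, and the third frame fails both because fewer than two keypoints are present.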
3. High-motion segment detection
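- Sliding windows of `window_len` speed samples (advanced by `stride` samples) compute median speeds.
- Windows whose median exceeds the `pct_thresh` percentile of all valid-window medians are classified as high-motion.
- Consecutive high-motion windows are grouped into runs (minimum length `min_run_len`); runs are converted to frame ranges and merged if overlapping or abutting.
4. Tier-2 filtering on segments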
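Stages 3 and 4 could be sketched together as follows. This is a minimal illustration, not the PR's implementation. It assumes that speed sample `i` is aligned with frame `i`, that a "valid" window is one without NaN, that the percentile comparison is inclusive, and that tier-2 filtering simply keeps the segment frames in which all keypoints are present. The function names are hypothetical.

```python
import numpy as np


def high_motion_segments(speed, window_len, stride, pct_thresh, min_run_len):
    """Return merged (start, stop) frame ranges, stop exclusive."""
    starts = np.arange(0, len(speed) - window_len + 1, stride)
    medians = np.full(len(starts), np.nan)
    for i, s in enumerate(starts):
        window = speed[s : s + window_len]
        if not np.isnan(window).any():  # assumed validity rule
            medians[i] = np.median(window)
    valid = ~np.isnan(medians)
    if not valid.any():
        return []
    thresh = np.percentile(medians[valid], pct_thresh)
    high = np.zeros(len(starts), dtype=bool)
    high[valid] = medians[valid] >= thresh

    segments = []
    i = 0
    while i < len(high):
        if not high[i]:
            i += 1
            continue
        j = i
        while j + 1 < len(high) and high[j + 1]:
            j += 1
        if j - i + 1 >= min_run_len:  # run is long enough
            start, stop = int(starts[i]), int(starts[j]) + window_len
            if segments and start <= segments[-1][1]:  # overlapping or abutting
                segments[-1] = (segments[-1][0], max(segments[-1][1], stop))
            else:
                segments.append((start, stop))
        i = j + 1
    return segments


def tier2_frames_in_segments(segments, tier2):
    """Indices of frames that lie in a segment and are tier-2 valid."""
    keep = np.zeros(len(tier2), dtype=bool)
    for start, stop in segments:
        keep[start:stop] = True
    return np.flatnonzero(keep & tier2)


# Toy speed trace: still, then a burst of motion, then still again.
speed = np.array([0.0] * 10 + [5.0] * 10 + [0.0] * 10)
segments = high_motion_segments(
    speed, window_len=4, stride=2, pct_thresh=70, min_run_len=2
)
tier2 = np.ones(len(speed), dtype=bool)
tier2[12] = False  # pretend one keypoint is missing in frame 12
frames = tier2_frames_in_segments(segments, tier2)
```

With these toy values the burst is recovered as a single segment covering frames 10 to 19, and frame 12 is dropped by the tier-2 filter.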
5. Centroid-centered skeleton construction
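One way to picture this stage, together with the PCA of stage 7, is the sketch below: each frame's keypoints are shifted so that the bounding-box centroid sits at the origin, the centered frames are averaged, and PCA on the average skeleton yields two axes whose signs are fixed by the `PC1[1] >= 0` and `PC2[0] >= 0` conventions. The helper names, the use of an eigendecomposition of the 2×2 covariance, and the restriction to complete (tier-2) frames are assumptions for illustration only.

```python
import numpy as np


def centered_average_skeleton(pos, centroid):
    """Average of centroid-centered skeletons.

    pos: (n_frames, n_keypoints, 2) with no missing keypoints.
    centroid: (n_frames, 2) per-frame centroid.
    """
    centered = pos - centroid[:, None, :]
    return centered.mean(axis=0)


def pca_axes(avg_skeleton):
    """Return (PC1, PC2) with PC1[1] >= 0 and PC2[0] >= 0."""
    x = avg_skeleton - avg_skeleton.mean(axis=0)
    cov = x.T @ x / len(x)
    _, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    pc1, pc2 = evecs[:, 1], evecs[:, 0]
    if pc1[1] < 0:  # y-component non-negative
        pc1 = -pc1
    if pc2[0] < 0:  # x-component non-negative
        pc2 = -pc2
    return pc1, pc2


# Toy skeleton elongated along y, translated differently in each frame.
base = np.array([[0, -2], [0, 0], [0, 2], [-0.5, 0], [0.5, 0]], dtype=float)
offsets = np.array([[0, 0], [3, 1], [-2, 5], [10, -4]], dtype=float)
pos = base[None, :, :] + offsets[:, None, :]
centroid = (pos.min(axis=1) + pos.max(axis=1)) / 2  # bounding-box midpoint

avg_skeleton = centered_average_skeleton(pos, centroid)
pc1, pc2 = pca_axes(avg_skeleton)
```

Because the toy skeleton is longest along y, PC1 comes out along the y-axis regardless of where the animal was in each frame.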
6. Postural clustering
- If the postural variance ratio exceeds `postural_var_ratio_thresh` AND at least 6 frames are available, k-medoids clustering (with silhouette-based model selection across k ∈ [2, min(`max_clusters`, n//2)]) partitions frames into postural clusters.
7. PCA on the average skeleton
- `PC1[1] >= 0` (y-component non-negative)
- `PC2[0] >= 0` (x-component non-negative)
8. Anterior direction inference via velocity voting
- … `compute_polarization()`, each individual's R×M is determined solely by its own motion and body shape (independent of the input keypoint pair).
- … `confidence_floor`, the pipeline logs a warning that the anterior assignment is unreliable.
9. Input AP Node-Pair Filter Cascade
- `effective_lateral = lateral_offset_norm + lateral_var_weight × lateral_std_norm + longitudinal_var_weight × longitudinal_std_norm`.
- Keypoints whose `effective_lateral` is above `lateral_thresh_pct` (default: 50th percentile) are eliminated; this adaptive threshold retains roughly half the keypoints while preferring those closest to the body axis and most stable over time.
- A keypoint is "distal" when its distance reaches `edge_thresh_pct` (default: 70th percentile); otherwise it is "proximal".
10. Suggested Pair
- Candidate pairs are scored by `weighted_sep = separation × (1 − avg_lateral_std)`, where `lateral_std` is the normalized standard deviation of each node's lateral offset over time.
- The suggested pair is ordered by `order_pair_by_ap()` so that element 0 is posterior (lower AP coordinate) and element 1 is anterior (higher AP coordinate), matching the `body_axis_keypoints=(from_node, to_node)` convention.
- The suggested pair is stored as `max_separation_distal_nodes` or `max_separation_nodes` on the `APNodePairReport`.
- An order check (`input_pair_order_matches_inference`) compares the input pair's AP coordinates:
  - `from_node`'s AP coordinate < `to_node`'s AP coordinate (`from_node` is more posterior)
11. Mutually Exclusive Scenarios
Configuration: (`ValidateAPConfig`)
All configurable thresholds are collected in a single dataclass in `movement.kinematics.body_axis`. Users pass overrides as a dict via `ap_validation_config`; any omitted key uses its default.
Configurable fields: `min_valid_frac`, `window_len`, `stride`, `pct_thresh`, `min_run_len`, `postural_var_ratio_thresh`, `max_clusters` (k is capped at min(`max_clusters`, n//2)), `confidence_floor`, `lateral_thresh_pct`, `edge_thresh_pct`, `lateral_var_weight`, `longitudinal_var_weight`.
Return: xarray Attribute `ap_validation_result`
When `validate_ap=True` and `body_axis_keypoints` is provided, `compute_polarization()` stores results in `polarization.attrs["ap_validation_result"]`:

    {
        "all_results": [<per-individual result dicts>],
        "best_idx": int  # index into all_results (highest R×M score)
    }

Per-Individual Result Dict Fields
- `success`
- `anterior_sign`
- `vote_margin`
- `resultant_length`
- `circ_mean_dir`
- `num_selected_frames`
- `num_clusters`
- `primary_cluster`
- `PC1`
- `PC2`
- `avg_skeleton`
- `vel_projs_pc1`
- `lateral_std`
- `longitudinal_std`
- `pair_report`: `APNodePairReport` with detailed AP pair evaluation
- `log_lines` (… `compute_polarization()`, which hardcodes `verbose=False`)
- `error_msg`
- `individual` (… `run_ap_validation()`)
The `pair_report` field contains `scenario` (1-13) and `outcome` ("accept"/"warn") from the flowchart above.
Usage
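A minimal sketch of reading the stored result, assuming the dict layout described above. The sample values, the field types (for example `anterior_sign` as ±1 and `error_msg` as a string), and the helper name `summarize_ap_validation` are made up for illustration. In real use the dict would come from `polarization.attrs["ap_validation_result"]`.

```python
def summarize_ap_validation(result):
    """One-line summary of the best individual's validation outcome."""
    best = result["all_results"][result["best_idx"]]
    if not best["success"]:
        return f"validation failed: {best['error_msg']}"
    score = best["resultant_length"] * best["vote_margin"]  # R×M
    return (
        f"best individual: {best['individual']}, "
        f"R×M = {score:.2f}, anterior_sign = {best['anterior_sign']}"
    )


# Stand-in for polarization.attrs["ap_validation_result"]; values are invented.
result = {
    "all_results": [
        {
            "individual": "track_0",
            "success": True,
            "resultant_length": 0.6,
            "vote_margin": 0.5,
            "anterior_sign": 1,
            "error_msg": "",
        },
        {
            "individual": "track_1",
            "success": True,
            "resultant_length": 0.8,
            "vote_margin": 1.0,
            "anterior_sign": -1,
            "error_msg": "",
        },
    ],
    "best_idx": 1,
}
summary = summarize_ap_validation(result)
print(summary)
```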
How has this PR been tested?
With a new test file, `test_body_axis.py`.
`TestValidateAPConfig` (2 tests)
Parameter boundary validation for the `ValidateAPConfig` dataclass. Tests all 12 configurable fields:
- `test_invalid_config_values_raise` (23 parametrized cases): Each field is tested with out-of-range values: negative fractions, values above 1.0 for [0, 1] fields, zero or negative integers for count fields, and floats where integers are required. All must raise `ValueError` with a message matching `"must be"`.
- `test_valid_config_does_not_raise`: Constructs a `ValidateAPConfig` with all fields set to non-default valid values and asserts no exception is raised.
The 12 fields tested are: `min_valid_frac`, `window_len`, `stride`, `pct_thresh`, `min_run_len`, `postural_var_ratio_thresh`, `max_clusters`, `confidence_floor`, `lateral_thresh_pct`, `edge_thresh_pct`, `lateral_var_weight`, `longitudinal_var_weight`, matching the configuration table above.
Empirical Validation
The 3-step filter cascade thresholds and pair scoring method were empirically optimized via two validation studies on 5 diverse multi-animal datasets (2Flies, 2Mice, 4Gerbils, 5Mice, 2Bees) with hand-curated ground-truth AP node rankings.
Analysis 1: Grid Search over Design and Parameter Space
Find the configuration that maximizes "both nodes in GT" (suggested pair contains two ground-truth AP nodes) with correct ordering across all datasets.
Example Script | Detailed Log | Results JSON
Method: Exhaustive grid search over 705,024 configurations testing 6 method categories:
For each configuration, the best individual per dataset was selected via max R×M, then the 3-step filter cascade was applied to identify the suggested AP pair. Results were scored by: (1) how many datasets achieved "both in GT", (2) how many achieved correct ordering.
Results: Multiple configurations achieved 5/5 datasets with both nodes in GT and correct ordering. The top-ranked configuration:
Implementation: These empirically validated values are the defaults in `ValidateAPConfig`:
- `lateral_thresh_pct=50.0` (Step I lateral filter)
- `edge_thresh_pct=70.0` (Step III distal classification)
- `longitudinal_var_weight=0.5` (variance-weighted pair scoring)
Analysis 2: Metric Evaluation for 'Best' Individual Selection
Validate that R×M is the best metric for selecting the "reference individual" whose AP ordering others should align with.
Example Script | Detailed Log
Method: For each of 5 metrics, select the individual with the highest score per dataset and check their ground-truth accuracy (% of AP node pairs correctly ordered vs. hand-curated GT).
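To make the accuracy measure concrete, here is a hedged sketch of one plausible way to compute "% of AP node pairs correctly ordered". It assumes the ground truth is a list of nodes from posterior to anterior and that each node has an inferred AP coordinate; the function name and data are hypothetical and may differ from the analysis script.

```python
from itertools import combinations


def pairwise_order_accuracy(ap_coord, gt_order):
    """Percentage of node pairs whose inferred AP order matches ground truth.

    ap_coord: dict mapping node name to inferred AP coordinate.
    gt_order: node names listed from posterior to anterior.
    """
    pairs = list(combinations(gt_order, 2))  # (more posterior, more anterior)
    correct = sum(ap_coord[a] < ap_coord[b] for a, b in pairs)
    return 100.0 * correct / len(pairs)


gt_order = ["tail", "body", "head"]
partly_right = {"tail": -1.0, "body": 0.2, "head": 0.1}
reversed_axis = {"tail": 1.0, "body": 0.0, "head": -1.0}

acc_partial = pairwise_order_accuracy(partly_right, gt_order)
acc_reversed = pairwise_order_accuracy(reversed_axis, gt_order)

# Selecting the reference individual by a metric is then a plain argmax.
metric_scores = {"track_0": 0.31, "track_1": 0.84}  # invented scores
reference = max(metric_scores, key=metric_scores.get)
```

Under this definition, an individual whose inferred axis is flipped end to end scores 0%.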
Metrics tested:
Detailed Per-Dataset Breakdown (Metric Selection)
The 2Bees case is particularly instructive: track_0 has higher PC1 variance ratio, higher skeleton completeness, and equal agreement score - yet 0% GT accuracy. Only R×M (and mean_inv_lateral_var) correctly identify track_1 as the trustworthy reference.
Conclusion: R×M and mean_inv_lateral_var both achieve perfect reference selection (5/5). R×M is preferred because it directly measures locomotion quality (the physical basis for AP inference) rather than an indirect proxy. Additionally, R×M requires no additional computation beyond what's already performed for anterior direction inference.
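For readers unfamiliar with the two quantities, the sketch below shows one conventional way to obtain them. It assumes M is the normalized difference between positive and negative votes of the velocity projections onto PC1, and R is the mean resultant length of the velocity directions from circular statistics. The exact definitions in `body_axis.py` may differ, and the function names and inputs are hypothetical.

```python
import numpy as np


def vote_margin(vel_projs_pc1):
    """Return (anterior_sign, M) from velocity projections onto PC1."""
    projs = np.asarray(vel_projs_pc1, dtype=float)
    n_pos = int(np.sum(projs > 0))
    n_neg = int(np.sum(projs < 0))
    total = n_pos + n_neg
    if total == 0:
        return 0, 0.0
    sign = 1 if n_pos >= n_neg else -1
    return sign, abs(n_pos - n_neg) / total  # 1.0 means unanimous


def resultant_length(angles):
    """Return (R, circular mean direction) for angles in radians."""
    z = np.exp(1j * np.asarray(angles, dtype=float)).mean()
    return float(np.abs(z)), float(np.angle(z))


# Toy data: every frame votes the same way and directions are tightly grouped.
anterior_sign, m = vote_margin([0.5, 1.2, 0.8, 0.3])
r, circ_mean_dir = resultant_length([0.1, -0.1, 0.2, -0.2])
score = r * m  # the R×M selection metric
```

A unanimous vote with concentrated directions gives a score close to 1, while scattered directions or split votes pull it toward 0.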
Other Datasets
2Flies (track_0):
2Mice (track_0):
4Gerbils (pup_unshaved):
5Mice (track_0):
Flowchart: Input AP Node-Pair Filter Cascade
Terminology:
- … constraint).
- Distal: … the `edge_thresh_pct` percentile (default: 70th).
- Proximal: … distance below the `edge_thresh_pct` percentile.
- `max_separation_nodes`: … all survivors (distal or proximal).
- `max_separation_distal_nodes`: … distal survivors only.
- … (rank 1 = largest weighted separation).
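The terms above can be tied together with a small sketch of the parts of the cascade that the description spells out: the Step I lateral filter, the distal/proximal split, and the variance-weighted pair score. It is an illustration under stated assumptions, not the PR's implementation. The "distance" used for the distal split is taken to be the absolute AP coordinate, the fallback from distal pairs to all survivors is guessed, `lateral_var_weight=1.0` is a placeholder default, and Step II is omitted because the description does not detail it.

```python
from itertools import combinations

import numpy as np


def suggest_pair(
    names,
    ap,
    lat_off,
    lat_std,
    lon_std,
    lateral_thresh_pct=50.0,
    edge_thresh_pct=70.0,
    lateral_var_weight=1.0,  # placeholder; the real default is not stated
    longitudinal_var_weight=0.5,
):
    """Return a suggested (posterior, anterior) node pair."""
    # Step I: drop nodes that are far from, or unstable about, the body axis.
    eff = lat_off + lateral_var_weight * lat_std + longitudinal_var_weight * lon_std
    keep = np.flatnonzero(eff <= np.percentile(eff, lateral_thresh_pct))

    # Distal vs proximal among survivors (distance = |AP coordinate|, assumed).
    dist = np.abs(ap[keep])
    distal = keep[dist >= np.percentile(dist, edge_thresh_pct)]
    pool = distal if len(distal) >= 2 else keep  # assumed fallback

    def weighted_sep(i, j):
        avg_lateral_std = (lat_std[i] + lat_std[j]) / 2
        return abs(ap[i] - ap[j]) * (1 - avg_lateral_std)

    i, j = max(combinations(pool, 2), key=lambda pair: weighted_sep(*pair))
    lo, hi = (i, j) if ap[i] < ap[j] else (j, i)  # posterior first
    return names[lo], names[hi]


names = ["tail", "hip", "spine", "neck", "head", "left_ear"]
ap = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 1.5])
lat_off = np.array([0.0, 0.1, 0.0, 0.1, 0.0, 0.9])
lat_std = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.3])
lon_std = np.full(6, 0.1)

pair = suggest_pair(names, ap, lat_off, lat_std, lon_std)
```

In this toy skeleton the off-axis ear is eliminated by the lateral filter, and the tail and head are the only distal survivors, so they form the suggested pair in posterior-to-anterior order.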
AP Node-Pair Filter Cascade Flowchart
Future Refactoring Opportunity
The `body_axis.py` module (~2,900 lines) is intentionally monolithic in this PR to simplify review and iteration. Once the API stabilizes, general-purpose functionality could be extracted to existing or new utility modules:
References
Is this a breaking change?
No.
Does this PR require an update to the documentation?
No - API docs auto-generate from docstrings.
Checklist