
Inference v1 PR 1/n: model inference (standalone)#424

Open
akashmjn wants to merge 15 commits into `main` from `akash/inference-v1-pr1`

Conversation

@akashmjn
Collaborator

@akashmjn akashmjn commented Feb 21, 2026

Summary

Part 1 of #405 — porting the OrcaHello SRKW detector away from fastai.
(PR branch created from akash/inference-v1-nofastai.)

  • This PR adds rewritten inference code under src/model_v1, standalone scripts to process files/folders under scripts, and pytests that verify parity vs fastai model outputs. Inference parameters are separated into model/config.yaml.
  • After merging, this enables using the detection model directly from HuggingFace: https://huggingface.co/orcasound/orcahello-srkw-detector-v1

The rewrite also made it easy to tune inference configs/logic, leading to much better results even with the same model weights: roughly 50-60% of the current false positives at ~90% of the current recall (see below).

Evaluation

Tested on detections from recent high-volume months (2025-12, 2026-01), fetched from the OrcaHello DB.

| month | category | detected | total | pct |
|---|---|---|---|---|
| 2025-12 | positive | 169 | 193 | 88% |
| 2025-12 | false_positive | 74 | 134 | 55% |
| 2025-12 | unmoderated | 488 | 859 | 57% |
| 2025-12 | TOTAL | 731 | 1186 | 62% |
| 2026-01 | positive | 196 | 228 | 86% |
| 2026-01 | false_positive | 113 | 196 | 58% |
| 2026-01 | unmoderated | 280 | 621 | 45% |
| 2026-01 | TOTAL | 589 | 1046 | 56% |

Usage

```python
from model_v1.inference import OrcaHelloSRKWDetectorV1

model = OrcaHelloSRKWDetectorV1.from_pretrained("orcasound/orcahello-srkw-detector-v1")
result = model.predict(audio_file_path)
# -> DetectionResult
#      .global_prediction: int (0/1)
#      .global_confidence: float (0-1)
#      .local_predictions: List[int] (0/1) for each time segment
#      .local_confidences: List[float] (0-1) for each time segment
#      .segment_predictions: List[SegmentPrediction]
#          .start_time_s: float
#          .duration_s: float
#          .confidence: float
#      .metadata: DetectionMetadata
#          .wav_file_path: str
#          .file_duration_s: float
#          .processing_time_s: float
```
Changes made to inference config:

  • tuned windowed inference to use 3.0s windows with 2.0s hop
  • global_confidence value is better calibrated with a mean_top_k aggregation strategy
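As a rough illustration of the two tuning points above (not the actual `src/model_v1` code; defaults and function names here are assumptions), windowed segmentation and the two aggregation strategies can be sketched as:

```python
from typing import List

def window_starts(duration_s: float, window_s: float = 3.0, hop_s: float = 2.0) -> List[float]:
    """Start times for overlapping inference windows (3.0 s windows, 2.0 s hop)."""
    starts, t = [], 0.0
    while t + window_s <= duration_s:
        starts.append(t)
        t += hop_s
    return starts

def global_confidence_mean_top_k(confidences: List[float], k: int = 2) -> float:
    """mean_top_k: average the k highest per-segment confidences."""
    top = sorted(confidences, reverse=True)[:k]
    return sum(top) / len(top)

def global_prediction_mean_thresholded(confidences: List[float],
                                       local_threshold: float = 0.5,
                                       min_positive_segments: int = 3) -> int:
    """mean_thresholded-style decision: positive only when enough segments
    clear the per-segment threshold."""
    return int(sum(c >= local_threshold for c in confidences) >= min_positive_segments)

confs = [0.9, 0.8, 0.2, 0.1, 0.7]
print(window_starts(10.0))                        # [0.0, 2.0, 4.0, 6.0]
print(global_confidence_mean_top_k(confs, k=2))   # (0.9 + 0.8) / 2
print(global_prediction_mean_thresholded(confs))  # 1 (three segments >= 0.5)
```

Averaging only the top-k segment confidences makes the clip-level score less sensitive to long stretches of silence, which is one reason a mean_top_k score tends to be better calibrated than a plain mean.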

Testing

  • Added test_model_inference.py comparing both per-segment and per-file inference
  • Compares against pre-generated outputs for test data file

Notes

The scope of this PR is limited to standalone inference; it doesn't touch the inference container. That is next, in #405.

akashmjn and others added 5 commits February 20, 2026 17:11
- Updated test_generate_segment_predictions_reference to save as JSON
- Updated segment_prediction_references fixture to load JSON
- Converted fastai classes to list for JSON serialization
- Added both segment and file prediction JSON references to repo
- Updated .gitignore to track JSON references instead of .pt
- Removed old .pt segment predictions reference

Tests verified:
- test_generate_segment_predictions_reference: PASSED (inference-venv)
- test_segment_predictions_match_fastai: PASSED (model-v1-venv)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@akashmjn akashmjn requested review from dthaler and removed request for TruaShamu and micya February 21, 2026 03:57
@akashmjn akashmjn requested a review from scottveirs as a code owner February 23, 2026 23:06
@akashmjn akashmjn requested a review from micya February 24, 2026 01:13
@dthaler

This comment was marked as resolved.

Copilot AI left a comment
Pull request overview

This PR is part 1 of a multi-part refactor (#405) that ports the OrcaHello SRKW (Southern Resident Killer Whale) detector from FastAI to pure PyTorch. The primary goals are to remove the fastai dependency, simplify deployment, and enable direct model usage from HuggingFace. The refactor also includes improved inference logic and configuration that reportedly reduces false positives by 40-45% while maintaining ~90% recall.

Changes:

  • Adds new src/model_v1 module with pure PyTorch implementation of audio preprocessing and model inference
  • Introduces configurable aggregation strategies (mean_thresholded and mean_top_k) for converting segment predictions to global predictions
  • Adds standalone scripts for inference, weight extraction, and HuggingFace upload
  • Includes comprehensive pytests verifying numerical parity with FastAI outputs
  • Adds production model configuration, model card, and RAIL license
  • Updates documentation (README, DEVELOPMENT.md) to reflect the new architecture

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/model_v1/inference.py | Main model class with ResNet50 architecture, inference methods, and aggregation logic |
| src/model_v1/audio_frontend.py | Audio preprocessing pipeline (not in diff but referenced) |
| src/model_v1/types.py | Type definitions including new GlobalPredictionConfig dataclass |
| src/model_v1/__init__.py | Module exports and version |
| tests/test_model_inference.py | Unit tests and parity checks against FastAI |
| tests/utils.py | Test utilities including DetectionResultDiff for comparing outputs |
| tests/conftest.py | Test fixtures including model loading from HuggingFace |
| tests/test_config.yaml | Test configuration preserving FastAI parity settings |
| tests/reference_outputs/*.json | Pre-generated reference outputs from FastAI for parity testing |
| scripts/run_inference.py | Standalone inference script with batch processing and reaggregation |
| scripts/extract_fastai_weights.py | Script to extract weights from FastAI model.pkl |
| scripts/upload_to_hf_hub.py | Script to upload model to HuggingFace Hub |
| model/config.yaml | Production inference configuration with tuned parameters |
| model/MODEL_CARD.md | Comprehensive model documentation for HuggingFace |
| model/LICENSE | OrcaHello RAIL license with marine conservation restrictions |
| model/img-orca_fin_waveform.jpg | Hero image for model card |
| src/model/fastai_inference.py | Added smooth_predictions parameter for parity testing |
| .github/workflows/InferenceSystem.yaml | Fixed Windows CI mkdir command |
| .gitignore | Added reference output files to exceptions |
| README.md | Updated with quick start guide and references to new docs |
| DEVELOPMENT.md | New developer documentation for testing and scripts |

Comment on lines +104 to +105:

```yaml
local_conf_threshold: 0.5
global_pred_threshold: 3
```

Copilot AI (Feb 24, 2026)

The MODEL_CARD.md contains outdated configuration parameter names that don't match the actual implementation. The example shows `local_conf_threshold` and `global_pred_threshold` (lines 104-105), but the actual code uses `pred_local_threshold` and `pred_global_threshold` in `GlobalPredictionConfig`. Additionally, the example is missing the new `aggregation_strategy` and `mean_top_k` parameters that are central to the new inference logic.

Suggested change:

```yaml
pred_local_threshold: 0.5         # per-segment prediction threshold
pred_global_threshold: 3          # number of segments above local threshold required for a positive clip
aggregation_strategy: mean_top_k  # aggregation method for clip-level prediction
mean_top_k: 5                     # k used when aggregation_strategy == "mean_top_k"
```
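For reference, a hedged sketch of what a `GlobalPredictionConfig` dataclass with the parameter names cited in this comment might look like. The defaults below are assumptions for illustration, not values copied from `src/model_v1/types.py`:

```python
from dataclasses import dataclass

# Hypothetical reconstruction from the parameter names discussed in this review;
# defaults are assumptions, not the repository's actual values.
@dataclass
class GlobalPredictionConfig:
    pred_local_threshold: float = 0.5         # per-segment prediction threshold
    pred_global_threshold: int = 3            # segments above local threshold for a positive clip
    aggregation_strategy: str = "mean_top_k"  # "mean_thresholded" or "mean_top_k"
    mean_top_k: int = 3                       # k used when aggregation_strategy == "mean_top_k"

cfg = GlobalPredictionConfig()
```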

```yaml
global_prediction:
  aggregation_strategy: "mean_top_k"  # "mean_thresholded" or "mean_top_k"
  mean_top_k: 2  # top segments to average for global_confidence (mean_top_k)
```

Copilot AI (Feb 24, 2026)

Inconsistent `mean_top_k` values across configuration files. The production config (model/config.yaml) uses `mean_top_k: 2`, while the test config (tests/test_config.yaml) and the code default in `GlobalPredictionConfig` use `mean_top_k: 3`. This inconsistency could lead to confusion about which value is actually recommended. Consider aligning these values or documenting why they differ.

Suggested change:

```yaml
  mean_top_k: 3  # top segments to average for global_confidence (mean_top_k)
```

1. Illegal or unethical whale watching behavior

> (a) In any way that violates any applicable national, federal, state, local or international law or regulation, including the U.S. Marine Mammal Protection Act and the rules issued by the Department of Fisheries and Oceans in Candaa.

Copilot AI (Feb 24, 2026)

Typo in LICENSE file: "Candaa" should be "Canada".

Suggested change:

> (a) In any way that violates any applicable national, federal, state, local or international law or regulation, including the U.S. Marine Mammal Protection Act and the rules issued by the Department of Fisheries and Oceans in Canada.
@akashmjn
Collaborator Author

> @akashmjn The PR description still has "TODO: Cleanup docs and readme". Is that done yet or should this still be considered as a draft not fully ready for review?

Done - ready for review

@dthaler
Collaborator

dthaler commented Feb 24, 2026

> > @akashmjn The PR description still has "TODO: Cleanup docs and readme". Is that done yet or should this still be considered as a draft not fully ready for review?
>
> Done - ready for review

Looks like it's still a work in progress:
https://github.com/orcasound/orcahello/pull/424/changes#diff-da5579f9e505ac62a05fefb0af6c2fe8a9c39467525799ecd9bc61266ba632d4R152

```markdown
TODO: WIP


### Testing Data & Metrics

TODO: WIP
```

Collaborator:

Fill this in

Collaborator Author:

I'll probably delete this section for now until there is a proper public-facing eval set on HuggingFace.

Eval results are in the PR description.

```bibtex
@misc{akash_mahajan_2026,
  author = { Akash Mahajan and Prakruti Gogia and Aayush Agrawal },
  title = { orcahello-srkw-detector-v1 (Revision 6ccff28) },
  year = { 2020 },
```

Collaborator:

Suggested change:

```bibtex
  year = { 2026 },
```

```python
    def __init__(self, config: Dict):
        super().__init__()

        # `config` needs to Dict not DetectorInferenceConfig for serialization with PyTorchModelHubMixin
```

Collaborator:

I can't parse "needs to Dict", did you mean "needs Dict"?
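Context for the comment: `PyTorchModelHubMixin` serializes the constructor arguments to a `config.json` on save, so they must be JSON-serializable, which a typed dataclass is not directly. A minimal framework-free sketch of flattening a typed config into a plain `dict` (the `DetectorInferenceConfig` fields here are hypothetical, not the repository's actual ones):

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical typed config; the real class lives in src/model_v1/types.py
@dataclass
class DetectorInferenceConfig:
    window_s: float = 3.0
    hop_s: float = 2.0

cfg = DetectorInferenceConfig()
config_dict = asdict(cfg)            # plain dict, safe to serialize
serialized = json.dumps(config_dict)
restored = json.loads(serialized)
# restored == {"window_s": 3.0, "hop_s": 2.0}
```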

```python
    for batch_start in range(0, len(spectrograms), max_batch_size):
        batch_end = min(batch_start + max_batch_size, len(spectrograms))
        batch = torch.stack(spectrograms[batch_start:batch_end])
        # Move batch to model's device and cast to model dtype (e.g. fp16)
```

Collaborator:

Suggested change:

```python
        # Move batch to model's device and cast to model dtype (e.g., fp16)
```

punctuation nit
