Inference v1 PR 1/n: model inference (standalone)#424
Conversation
- Updated test_generate_segment_predictions_reference to save as JSON
- Updated segment_prediction_references fixture to load JSON
- Converted fastai classes to list for JSON serialization
- Added both segment and file prediction JSON references to repo
- Updated .gitignore to track JSON references instead of .pt
- Removed old .pt segment predictions reference

Tests verified:
- test_generate_segment_predictions_reference: PASSED (inference-venv)
- test_segment_predictions_match_fastai: PASSED (model-v1-venv)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Pull request overview
This PR is part 1 of a multi-part refactor (#405) that ports the OrcaHello SRKW (Southern Resident Killer Whale) detector from FastAI to pure PyTorch. The primary goals are to remove the fastai dependency, simplify deployment, and enable direct model usage from HuggingFace. The refactor also includes improved inference logic and configuration that reportedly reduces false positives by 40-45% while maintaining ~90% recall.
Changes:
- Adds a new `src/model_v1` module with a pure PyTorch implementation of audio preprocessing and model inference
- Introduces configurable aggregation strategies (`mean_thresholded` and `mean_top_k`) for converting segment predictions to global predictions
- Adds standalone scripts for inference, weight extraction, and HuggingFace upload
- Includes comprehensive pytests verifying numerical parity with FastAI outputs
- Adds production model configuration, model card, and RAIL license
- Updates documentation (README, DEVELOPMENT.md) to reflect the new architecture
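The two aggregation strategies named above can be sketched roughly as follows. This is an illustrative sketch only, based on the parameter names discussed in the review comments (`pred_local_threshold`, `pred_global_threshold`, `mean_top_k`); the actual logic in `src/model_v1/inference.py` may differ in details.

```python
def aggregate_mean_thresholded(segment_scores, local_threshold=0.5, global_threshold=3):
    """Clip is positive when at least `global_threshold` segment scores exceed
    `local_threshold`; confidence is the mean of the scores above the threshold
    (falling back to the overall mean when none clear it)."""
    above = [s for s in segment_scores if s > local_threshold]
    is_positive = len(above) >= global_threshold
    pool = above if above else list(segment_scores)
    return is_positive, sum(pool) / len(pool)


def aggregate_mean_top_k(segment_scores, k=3, local_threshold=0.5, global_threshold=3):
    """Same positive/negative call, but confidence is the mean of the top-k
    segment scores, which tends to be better calibrated for clip ranking."""
    above = [s for s in segment_scores if s > local_threshold]
    top_k = sorted(segment_scores)[-k:]
    return len(above) >= global_threshold, sum(top_k) / len(top_k)


scores = [0.9, 0.8, 0.7, 0.1, 0.05]
print(aggregate_mean_top_k(scores, k=3))   # positive clip, confidence ~0.8
print(aggregate_mean_thresholded(scores))  # 3 segments above 0.5 -> positive
```

The intuition behind `mean_top_k` is that averaging only the strongest segments keeps one loud call from being diluted by many quiet segments in the same clip.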
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/model_v1/inference.py` | Main model class with ResNet50 architecture, inference methods, and aggregation logic |
| `src/model_v1/audio_frontend.py` | Audio preprocessing pipeline (not in diff but referenced) |
| `src/model_v1/types.py` | Type definitions including new GlobalPredictionConfig dataclass |
| `src/model_v1/__init__.py` | Module exports and version |
| `tests/test_model_inference.py` | Unit tests and parity checks against FastAI |
| `tests/utils.py` | Test utilities including DetectionResultDiff for comparing outputs |
| `tests/conftest.py` | Test fixtures including model loading from HuggingFace |
| `tests/test_config.yaml` | Test configuration preserving FastAI parity settings |
| `tests/reference_outputs/*.json` | Pre-generated reference outputs from FastAI for parity testing |
| `scripts/run_inference.py` | Standalone inference script with batch processing and reaggregation |
| `scripts/extract_fastai_weights.py` | Script to extract weights from FastAI model.pkl |
| `scripts/upload_to_hf_hub.py` | Script to upload model to HuggingFace Hub |
| `model/config.yaml` | Production inference configuration with tuned parameters |
| `model/MODEL_CARD.md` | Comprehensive model documentation for HuggingFace |
| `model/LICENSE` | OrcaHello RAIL license with marine conservation restrictions |
| `model/img-orca_fin_waveform.jpg` | Hero image for model card |
| `src/model/fastai_inference.py` | Added smooth_predictions parameter for parity testing |
| `.github/workflows/InferenceSystem.yaml` | Fixed Windows CI mkdir command |
| `.gitignore` | Added reference output files to exceptions |
| `README.md` | Updated with quick start guide and references to new docs |
| `DEVELOPMENT.md` | New developer documentation for testing and scripts |
```yaml
local_conf_threshold: 0.5
global_pred_threshold: 3
```
The MODEL_CARD.md contains outdated configuration parameter names that don't match the actual implementation. The example shows local_conf_threshold and global_pred_threshold (lines 104-105), but the actual code uses pred_local_threshold and pred_global_threshold in GlobalPredictionConfig. Additionally, the example is missing the new aggregation_strategy and mean_top_k parameters that are central to the new inference logic.
Suggested change (replacing the two outdated lines):

```yaml
pred_local_threshold: 0.5         # per-segment prediction threshold
pred_global_threshold: 3          # number of segments above local threshold required for a positive clip
aggregation_strategy: mean_top_k  # aggregation method for clip-level prediction
mean_top_k: 5                     # k used when aggregation_strategy == "mean_top_k"
```
```yaml
global_prediction:
  aggregation_strategy: "mean_top_k"  # "mean_thresholded" or "mean_top_k"
  mean_top_k: 2  # top segments to average for global_confidence (mean_top_k)
```
Inconsistent mean_top_k values across configuration files. The production config (model/config.yaml) uses mean_top_k: 2, while the test config (tests/test_config.yaml) and the code default in GlobalPredictionConfig use mean_top_k: 3. This inconsistency could lead to confusion about which value is actually recommended. Consider aligning these values or documenting why they differ.
Suggested change:

```yaml
mean_top_k: 3  # top segments to average for global_confidence (mean_top_k)
```
```
1. Illegal or unethical whale watching behavior

(a) In any way that violates any applicable national, federal, state, local or international law or regulation, including the U.S. Marine Mammal Protection Act and the rules issued by the Department of Fisheries and Oceans in Candaa.
```
Typo in LICENSE file: "Candaa" should be "Canada".
Suggested change:

```
(a) In any way that violates any applicable national, federal, state, local or international law or regulation, including the U.S. Marine Mammal Protection Act and the rules issued by the Department of Fisheries and Oceans in Canada.
```
Done - ready for review
Looks like it's still a work in progress:
```markdown
### Testing Data & Metrics

TODO: WIP
```
I’ll probably delete this section for now until there is a proper public facing eval set on HuggingFace.
Eval results are in the PR description.
```bibtex
@misc{akash_mahajan_2026,
  author = { Akash Mahajan and Prakruti Gogia and Aayush Agrawal },
  title = { orcahello-srkw-detector-v1 (Revision 6ccff28) },
  year = { 2020 },
```
Suggested change on:

```bibtex
year = { 2020 },
```
```python
def __init__(self, config: Dict):
    super().__init__()

    # `config` needs to Dict not DetectorInferenceConfig for serialization with PyTorchModelHubMixin
```
I can't parse "needs to Dict", did you mean "needs Dict"?
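For context on why a plain `Dict` is needed here: a hub mixin's config persistence typically serializes the constructor arguments to JSON, and a plain dict round-trips through `json.dumps` while a custom dataclass instance does not. The sketch below illustrates this; the `DetectorInferenceConfig` fields shown are hypothetical, for illustration only.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DetectorInferenceConfig:
    # hypothetical fields, for illustration only
    pred_local_threshold: float = 0.5
    pred_global_threshold: int = 3

cfg = DetectorInferenceConfig()

# A plain dict serializes cleanly to JSON...
print(json.dumps(asdict(cfg)))

# ...while the dataclass instance itself raises TypeError:
try:
    json.dumps(cfg)
except TypeError as exc:
    print("not JSON-serializable:", exc)
```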
```python
for batch_start in range(0, len(spectrograms), max_batch_size):
    batch_end = min(batch_start + max_batch_size, len(spectrograms))
    batch = torch.stack(spectrograms[batch_start:batch_end])
    # Move batch to model's device and cast to model dtype (e.g. fp16)
```
Suggested change:

```python
# Move batch to model's device and cast to model dtype (e.g., fp16)
```
punctuation nit
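The chunking pattern in the quoted snippet can be isolated and checked on its own; a minimal sketch, independent of torch:

```python
def batch_ranges(n_items, max_batch_size):
    """Yield (start, end) index pairs covering n_items in chunks of at most
    max_batch_size, mirroring the loop in the quoted snippet."""
    for batch_start in range(0, n_items, max_batch_size):
        yield batch_start, min(batch_start + max_batch_size, n_items)

# 10 spectrograms with max_batch_size=4 -> batch sizes 4, 4, 2
sizes = [end - start for start, end in batch_ranges(10, 4)]
print(sizes)  # [4, 4, 2]
```

The final partial batch falls out of the `min(...)` clamp, so no padding is needed before `torch.stack`.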
Summary
Part 1 of #405 — porting the OrcaHello SRKW detector away from fastai.
(PR branch created from akash/inference-v1-nofastai.)
Adds `src/model_v1`, standalone scripts to process files/folders under `scripts/`, and pytests that verify parity vs fastai model outputs. Inference parameters are separated into `model/config.yaml`. The rewrite also made it easy to tune inference configs/logic, leading to much better results even with the same model weights, i.e. ~50-60% of current false positives with ~90% of current recall (see below).
Evaluation
Tested detections from recent months fetched from the OrcaHello DB (`2025-12`, `2026-01`) with high detection counts.
Usage
Changes made to inference config:
- `global_confidence` value is better calibrated with a `mean_top_k` aggregation strategy
Testing
- `test_model_inference.py` comparing both per-segment and per-file inference
Notes
The scope of this PR is limited to standalone inference; it doesn't touch the inference container. That is next (#405).