Inference v1 PR 1/n: model inference (standalone)#424
Conversation
- Updated test_generate_segment_predictions_reference to save as JSON
- Updated segment_prediction_references fixture to load JSON
- Converted fastai classes to list for JSON serialization
- Added both segment and file prediction JSON references to repo
- Updated .gitignore to track JSON references instead of .pt
- Removed old .pt segment predictions reference

Tests verified:
- test_generate_segment_predictions_reference: PASSED (inference-venv)
- test_segment_predictions_match_fastai: PASSED (model-v1-venv)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Pull request overview
This PR is part 1 of a multi-part refactor (#405) that ports the OrcaHello SRKW (Southern Resident Killer Whale) detector from FastAI to pure PyTorch. The primary goals are to remove the fastai dependency, simplify deployment, and enable direct model usage from HuggingFace. The refactor also includes improved inference logic and configuration that reportedly reduces false positives by 40-45% while maintaining ~90% recall.
Changes:
- Adds a new `src/model_v1` module with a pure PyTorch implementation of audio preprocessing and model inference
- Introduces configurable aggregation strategies (`mean_thresholded` and `mean_top_k`) for converting segment predictions to global predictions
- Adds standalone scripts for inference, weight extraction, and HuggingFace upload
- Includes comprehensive pytests verifying numerical parity with FastAI outputs
- Adds production model configuration, model card, and RAIL license
- Updates documentation (README, DEVELOPMENT.md) to reflect the new architecture
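The two aggregation strategies named above can be sketched roughly as follows. This is an illustrative sketch only, based on the parameter names discussed in the review comments (`pred_local_threshold`, `pred_global_threshold`, `mean_top_k`); the actual logic in `src/model_v1/inference.py` may differ in details.

```python
def aggregate_mean_thresholded(segment_scores, local_threshold=0.5, global_threshold=3):
    """Clip is positive when at least `global_threshold` segment scores exceed
    `local_threshold`; confidence is the mean of the scores above the threshold
    (falling back to the overall mean when none clear it)."""
    above = [s for s in segment_scores if s > local_threshold]
    is_positive = len(above) >= global_threshold
    pool = above if above else list(segment_scores)
    return is_positive, sum(pool) / len(pool)


def aggregate_mean_top_k(segment_scores, k=3, local_threshold=0.5, global_threshold=3):
    """Same positive/negative call, but confidence is the mean of the top-k
    segment scores, which tends to be better calibrated for clip ranking."""
    above = [s for s in segment_scores if s > local_threshold]
    top_k = sorted(segment_scores)[-k:]
    return len(above) >= global_threshold, sum(top_k) / len(top_k)


scores = [0.9, 0.8, 0.7, 0.1, 0.05]
print(aggregate_mean_top_k(scores, k=3))   # positive clip, confidence ~0.8
print(aggregate_mean_thresholded(scores))  # 3 segments above 0.5 -> positive
```

The intuition behind `mean_top_k` is that averaging only the strongest segments keeps one loud call from being diluted by many quiet segments in the same clip.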
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/model_v1/inference.py` | Main model class with ResNet50 architecture, inference methods, and aggregation logic |
| `src/model_v1/audio_frontend.py` | Audio preprocessing pipeline (not in diff but referenced) |
| `src/model_v1/types.py` | Type definitions including new GlobalPredictionConfig dataclass |
| `src/model_v1/__init__.py` | Module exports and version |
| `tests/test_model_inference.py` | Unit tests and parity checks against FastAI |
| `tests/utils.py` | Test utilities including DetectionResultDiff for comparing outputs |
| `tests/conftest.py` | Test fixtures including model loading from HuggingFace |
| `tests/test_config.yaml` | Test configuration preserving FastAI parity settings |
| `tests/reference_outputs/*.json` | Pre-generated reference outputs from FastAI for parity testing |
| `scripts/run_inference.py` | Standalone inference script with batch processing and reaggregation |
| `scripts/extract_fastai_weights.py` | Script to extract weights from FastAI model.pkl |
| `scripts/upload_to_hf_hub.py` | Script to upload model to HuggingFace Hub |
| `model/config.yaml` | Production inference configuration with tuned parameters |
| `model/MODEL_CARD.md` | Comprehensive model documentation for HuggingFace |
| `model/LICENSE` | OrcaHello RAIL license with marine conservation restrictions |
| `model/img-orca_fin_waveform.jpg` | Hero image for model card |
| `src/model/fastai_inference.py` | Added smooth_predictions parameter for parity testing |
| `.github/workflows/InferenceSystem.yaml` | Fixed Windows CI mkdir command |
| `.gitignore` | Added reference output files to exceptions |
| `README.md` | Updated with quick start guide and references to new docs |
| `DEVELOPMENT.md` | New developer documentation for testing and scripts |
```yaml
local_conf_threshold: 0.5
global_pred_threshold: 3
```
The MODEL_CARD.md contains outdated configuration parameter names that don't match the actual implementation. The example shows local_conf_threshold and global_pred_threshold (lines 104-105), but the actual code uses pred_local_threshold and pred_global_threshold in GlobalPredictionConfig. Additionally, the example is missing the new aggregation_strategy and mean_top_k parameters that are central to the new inference logic.
Suggested change (replacing the two outdated lines):

```yaml
pred_local_threshold: 0.5         # per-segment prediction threshold
pred_global_threshold: 3          # number of segments above local threshold required for a positive clip
aggregation_strategy: mean_top_k  # aggregation method for clip-level prediction
mean_top_k: 5                     # k used when aggregation_strategy == "mean_top_k"
```
```yaml
global_prediction:
  aggregation_strategy: "mean_top_k"  # "mean_thresholded" or "mean_top_k"
  mean_top_k: 2  # top segments to average for global_confidence (mean_top_k)
```
Inconsistent mean_top_k values across configuration files. The production config (model/config.yaml) uses mean_top_k: 2, while the test config (tests/test_config.yaml) and the code default in GlobalPredictionConfig use mean_top_k: 3. This inconsistency could lead to confusion about which value is actually recommended. Consider aligning these values or documenting why they differ.
Suggested change:

```yaml
mean_top_k: 3  # top segments to average for global_confidence (mean_top_k)
```
```
1. Illegal or unethical whale watching behavior

(a) In any way that violates any applicable national, federal, state, local or international law or regulation, including the U.S. Marine Mammal Protection Act and the rules issued by the Department of Fisheries and Oceans in Candaa.
```
Typo in LICENSE file: "Candaa" should be "Canada".
Suggested change:

```
(a) In any way that violates any applicable national, federal, state, local or international law or regulation, including the U.S. Marine Mammal Protection Act and the rules issued by the Department of Fisheries and Oceans in Canada.
```
Done - ready for review
Looks like it's still a work in progress:
```markdown
### Testing Data & Metrics

TODO: WIP
```
I’ll probably delete this section for now until there is a proper public facing eval set on HuggingFace.
Eval results are in the PR description.
```bibtex
@misc{akash_mahajan_2026,
  author = { Akash Mahajan and Prakruti Gogia and Aayush Agrawal },
  title = { orcahello-srkw-detector-v1 (Revision 6ccff28) },
  year = { 2020 },
```
Suggested change on:

```bibtex
year = { 2020 },
```
```python
def __init__(self, config: Dict):
    super().__init__()

    # `config` needs to Dict not DetectorInferenceConfig for serialization with PyTorchModelHubMixin
```
I can't parse "needs to Dict", did you mean "needs Dict"?
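For context on why a plain `Dict` is needed here: a hub mixin's config persistence typically serializes the constructor arguments to JSON, and a plain dict round-trips through `json.dumps` while a custom dataclass instance does not. The sketch below illustrates this; the `DetectorInferenceConfig` fields shown are hypothetical, for illustration only.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DetectorInferenceConfig:
    # hypothetical fields, for illustration only
    pred_local_threshold: float = 0.5
    pred_global_threshold: int = 3

cfg = DetectorInferenceConfig()

# A plain dict serializes cleanly to JSON...
print(json.dumps(asdict(cfg)))

# ...while the dataclass instance itself raises TypeError:
try:
    json.dumps(cfg)
except TypeError as exc:
    print("not JSON-serializable:", exc)
```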
```python
for batch_start in range(0, len(spectrograms), max_batch_size):
    batch_end = min(batch_start + max_batch_size, len(spectrograms))
    batch = torch.stack(spectrograms[batch_start:batch_end])
    # Move batch to model's device and cast to model dtype (e.g. fp16)
```
Suggested change:

```python
# Move batch to model's device and cast to model dtype (e.g., fp16)
```
punctuation nit
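The chunking pattern in the quoted snippet can be isolated and checked on its own; a minimal sketch, independent of torch:

```python
def batch_ranges(n_items, max_batch_size):
    """Yield (start, end) index pairs covering n_items in chunks of at most
    max_batch_size, mirroring the loop in the quoted snippet."""
    for batch_start in range(0, n_items, max_batch_size):
        yield batch_start, min(batch_start + max_batch_size, n_items)

# 10 spectrograms with max_batch_size=4 -> batch sizes 4, 4, 2
sizes = [end - start for start, end in batch_ranges(10, 4)]
print(sizes)  # [4, 4, 2]
```

The final partial batch falls out of the `min(...)` clamp, so no padding is needed before `torch.stack`.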
Summary
Part 1 of #405 — porting the OrcaHello SRKW detector away from fastai.
(PR branch created from akash/inference-v1-nofastai.)
Adds `src/model_v1`, standalone scripts to process files/folders under `scripts/`, and pytests that verify parity vs fastai model outputs. Inference parameters are separated into `model/config.yaml`. The rewrite also made it easy to tune inference configs/logic, leading to much better results even with the same model weights, i.e. ~50-60% of current false positives with ~90% of current recall (see below).
Evaluation
Tested detections from recent months fetched from the OrcaHello DB (`2025-12`, `2026-01`) with high detection counts.
Usage
Changes made to inference config:
- `global_confidence` value is better calibrated with a `mean_top_k` aggregation strategy
Testing
- `test_model_inference.py` comparing both per-segment and per-file inference
Notes
The scope of this PR is limited to standalone inference; it doesn't touch the inference container. That is next (#405).