feat: Add HDF5 baseline inference cache to eliminate redundant forward passes in perturbation evaluation

## Problem

The current eval loop in `perceptionmetrics/models/torch_detection.py` 
re-runs full model inference for every perturbation condition. For N images 
× P perturbation types × I intensities this produces N·P·I forward passes.

For COCO val2017 (5,000 images) with 5 perturbation types and 5 intensities, 
that is 125,000 forward passes. The clean-baseline preprocessing and inference 
are repeated 25× even though the model and data are identical each time.
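
The arithmetic above can be checked with a few lines of Python. The variable names are illustrative only, and the "redundant" figure assumes the baseline is currently recomputed once per perturbation condition and would be computed once with a cache.

```python
# Numbers taken from the COCO val2017 example above.
n_images = 5_000
n_types = 5
n_intensities = 5

# One condition = one (perturbation type, intensity) pair.
conditions = n_types * n_intensities
forward_passes = n_images * conditions

# Assumption: the baseline runs once per condition today and once in total
# with a cache, so every repeat after the first is redundant.
redundant_baseline_passes = n_images * (conditions - 1)

print(conditions)                 # 25
print(forward_passes)             # 125000
print(redundant_baseline_passes)  # 120000
```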

## Proposed solution

A standalone `perceptionmetrics/utils/cache.py` with:
- `CacheWriter` — context manager, writes preprocessed tensors + detection 
  predictions (bboxes, labels, scores) to HDF5 after one baseline eval run
- `CacheReader` — validates `model_hash` + `schema_version` on open, lazy access
- `is_cache_valid(path, model_hash) → bool` — O(1) guard for eval loop

**Layer 1 only** (disk cache write/read). Integration into `torch_detection.py` 
is a follow-up PR.
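
As a rough illustration of the `is_cache_valid` guard, here is a minimal sketch using `h5py`. It assumes the metadata fields are stored as HDF5 attributes on a `metadata` group and that `schema_version` is a string. Neither detail is fixed by this issue, and the `SCHEMA_VERSION` value is hypothetical.

```python
import os
import tempfile

import h5py

SCHEMA_VERSION = "1"  # hypothetical value; the issue does not fix one


def is_cache_valid(path, model_hash):
    """Return True only if the file exists and its metadata matches."""
    if not os.path.isfile(path):
        return False
    try:
        with h5py.File(path, "r") as f:
            # Only the metadata attributes are read, never the image data.
            attrs = f["metadata"].attrs
            return bool(
                attrs["model_hash"] == model_hash
                and attrs["schema_version"] == SCHEMA_VERSION
            )
    except (OSError, KeyError):
        # An unreadable file or missing metadata counts as an invalid cache.
        return False


# Usage: write a metadata-only file, then probe it with a matching and a stale hash.
path = os.path.join(tempfile.mkdtemp(), "baseline.h5")
with h5py.File(path, "w") as f:
    meta = f.create_group("metadata")
    meta.attrs["model_hash"] = "abc123"
    meta.attrs["schema_version"] = SCHEMA_VERSION

print(is_cache_valid(path, "abc123"))  # matching hash
print(is_cache_valid(path, "stale"))   # different hash
```

Because the check touches only the metadata group, its cost should not depend on how many images are cached, which is what makes it usable as a cheap guard at the top of the eval loop.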

## HDF5 schema

    ├── metadata/            (model_name, coco_split, model_hash, timestamp, schema_version)
    ├── tensors/{img_id}     float32 (C, H, W)
    └── preds/{img_id}/
        ├── bboxes           float32 (N_det, 4)
        ├── labels           int64   (N_det,)
        └── scores           float32 (N_det,)

Images with zero detections still write empty `(0, 4)`, `(0,)`, and `(0,)` datasets for `bboxes`, `labels`, and `scores`; they are never skipped.
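
To make the layout concrete, here is a minimal round-trip sketch of a writer and reader following the schema above, including a zero-detection image. It is an illustration under assumptions, not the proposed implementation: the `write`/`read` method names, storing metadata as attributes, the `ValueError` on mismatch, and the `SCHEMA_VERSION` value are all hypothetical, and NumPy arrays stand in for model tensors.

```python
import os
import tempfile
import time

import h5py
import numpy as np

SCHEMA_VERSION = "1"  # hypothetical; not specified in the issue


class CacheWriter:
    """Context manager that writes one baseline run to an HDF5 file."""

    def __init__(self, path, model_name, coco_split, model_hash):
        self.path = path
        self.meta = {
            "model_name": model_name,
            "coco_split": coco_split,
            "model_hash": model_hash,
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "schema_version": SCHEMA_VERSION,
        }

    def __enter__(self):
        self.f = h5py.File(self.path, "w")
        group = self.f.create_group("metadata")
        for key, value in self.meta.items():
            group.attrs[key] = value
        return self

    def __exit__(self, exc_type, exc, tb):
        self.f.close()
        return False  # never swallow exceptions

    def write(self, img_id, tensor, bboxes, labels, scores):
        key = str(img_id)
        self.f.create_dataset(f"tensors/{key}", data=np.asarray(tensor, dtype=np.float32))
        preds = self.f.create_group(f"preds/{key}")
        preds.create_dataset("bboxes", data=np.asarray(bboxes, dtype=np.float32))
        preds.create_dataset("labels", data=np.asarray(labels, dtype=np.int64))
        preds.create_dataset("scores", data=np.asarray(scores, dtype=np.float32))


class CacheReader:
    """Validates metadata on open, then reads entries lazily by image ID."""

    def __init__(self, path, model_hash):
        self.f = h5py.File(path, "r")
        attrs = self.f["metadata"].attrs
        if attrs["model_hash"] != model_hash or attrs["schema_version"] != SCHEMA_VERSION:
            self.f.close()
            raise ValueError("stale or incompatible cache")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.f.close()
        return False

    def read(self, img_id):
        key = str(img_id)
        preds = self.f[f"preds/{key}"]
        # Data is loaded from disk only here, one image at a time.
        return {
            "tensor": self.f[f"tensors/{key}"][()],
            "bboxes": preds["bboxes"][()],
            "labels": preds["labels"][()],
            "scores": preds["scores"][()],
        }


path = os.path.join(tempfile.mkdtemp(), "baseline.h5")
with CacheWriter(path, "demo_model", "val2017", "abc123") as writer:
    writer.write(1, np.ones((3, 4, 4)), [[0, 0, 2, 2]], [7], [0.9])
    # Image 2 has no detections but is still written with empty datasets.
    writer.write(2, np.zeros((3, 4, 4)), np.zeros((0, 4)), np.zeros((0,)), np.zeros((0,)))

with CacheReader(path, "abc123") as reader:
    entry = reader.read(2)
    print(entry["bboxes"].shape, entry["labels"].dtype)
```

Writing the empty datasets means a lookup for a cached image never raises `KeyError`, so a missing key can be read as "not cached" rather than "no detections".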

## Why HDF5
Variable-length prediction arrays + fixed-shape image tensors in one file 
with O(1) random access by image ID. Parquet, LMDB, and zarr each fail one 
of these requirements.

Tests included: round-trip, stale hash, `is_cache_valid`, zero-detection, metadata.

I plan to submit a solution for this.

