What Does It Take to Build a Performant Selective Classifier?

Code for our paper analyzing the looseness of the upper bound on selective classification performance.

📄 Paper: https://arxiv.org/abs/2510.20242

🧠 Abstract

Selective classifiers improve reliability by abstaining on uncertain inputs, yet their performance often lags behind the perfect-ordering oracle that accepts examples in exact order of correctness. We formulate this shortfall as a coverage-uniform selective-classification gap and prove the first finite-sample decomposition that pinpoints five distinct sources of looseness: Bayes noise, approximation error, ranking error, statistical noise, and implementation or shift-induced slack. Our bound shows that monotone post-hoc calibration cannot reduce the gap, as it preserves the original score ordering; closing the gap therefore requires scoring mechanisms that can modify the ranking induced by the base model. We validate our gap decomposition on synthetic two-moons data and real-world vision benchmarks, isolating each error component via controlled experiments. Results confirm that (i) Bayes noise and limited model capacity alone explain large gaps, (ii) only non-monotone or feature-aware calibrators shrink the ranking term, and (iii) distribution shift adds a distinct slack that must be addressed by robust training. Our decomposition supplies a quantitative error budget and concrete design guidelines for building selective classifiers that approach ideal oracle behavior.
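
To make the central quantity concrete, here is a minimal sketch (our illustration, not code from this repository) that compares a model's accuracy-coverage curve against the perfect-ordering oracle, which accepts examples in exact order of correctness:

```python
import numpy as np

def selective_accuracy(correct, scores, coverage):
    """Accuracy on the top-`coverage` fraction of examples ranked by `scores`."""
    n_keep = max(1, int(round(coverage * len(correct))))
    keep = np.argsort(-scores)[:n_keep]  # accept the highest-scoring examples first
    return correct[keep].mean()

def coverage_uniform_gap(correct, scores, grid=100):
    """Average over a coverage grid of (oracle accuracy - model accuracy)."""
    coverages = np.linspace(0.01, 1.0, grid)
    # The oracle's score is the 0/1 correctness indicator itself.
    oracle = np.array([selective_accuracy(correct, correct, c) for c in coverages])
    model = np.array([selective_accuracy(correct, scores, c) for c in coverages])
    return float((oracle - model).mean())

rng = np.random.default_rng(0)
correct = (rng.random(1000) < 0.8).astype(float)  # toy correctness: 80% base accuracy
scores = rng.normal(loc=correct)                  # confidences only loosely track correctness
print(f"coverage-uniform gap: {coverage_uniform_gap(correct, scores):.3f}")
```

Any strictly increasing transform of `scores`, i.e. monotone post-hoc calibration, leaves `np.argsort(-scores)` unchanged, so every quantity above is unaffected; this is the calibration-invariance point made in the abstract.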

⚙️ Installation with uv

We use uv as our package manager: a fast Python dependency-management tool and a drop-in replacement for pip.

Step 1: Install uv (if not already installed)

pip install uv

Step 2: Create the virtual environment and install dependencies

uv venv
uv pip install -e .

Step 3: Activate environment

source .venv/bin/activate

🗂️ Codebase overview

Training:

  • train_main.py: Trains a standard model from scratch across datasets and selective prediction methods.
  • train_cifar_n.py: Trains a model on the CIFAR-10N/100N datasets.
  • train_lp.py: Trains a loss predictor on top of penultimate-layer representations (sketched after this list).
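
For intuition, the loss-prediction approach can be sketched as follows. This is a hypothetical illustration assuming cached penultimate features; train_lp.py defines the actual interface and training details:

```python
import torch
import torch.nn as nn

class LossPredictor(nn.Module):
    """Small MLP that predicts the base model's per-example loss from features."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, feats):
        return self.net(feats).squeeze(-1)

# Toy stand-ins: cached penultimate features and the base model's per-example losses.
feats = torch.randn(256, 512)
target_loss = torch.rand(256)

lp = LossPredictor(feat_dim=512)
opt = torch.optim.Adam(lp.parameters(), lr=1e-3)
for _ in range(100):  # a few illustrative training steps
    opt.zero_grad()
    loss = nn.functional.mse_loss(lp(feats), target_loss)
    loss.backward()
    opt.step()

# At selection time, low predicted loss means accept: use -lp(feats) as the score.
```

Because the predicted loss is a function of the features rather than a monotone transform of the base model's confidence, it can reorder examples, which is the kind of feature-aware scoring the decomposition identifies as necessary for shrinking the ranking term.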

Evaluation:

  • eval_arch.py: Evaluation across model architectures.
  • eval_cifar_c.py: Evaluation on the CIFAR-10C/100C datasets.
  • eval_cifar_n.py: Evaluation on the CIFAR-10N/100N datasets.
  • eval_shift.py: Evaluation on real-world distribution shifts (a generic risk-coverage sketch follows this list).
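
These scripts report selective prediction performance; one standard summary, shown here as an illustrative sketch rather than necessarily the exact metric the scripts compute, is the risk-coverage curve, i.e. the error rate on accepted examples as coverage grows:

```python
import numpy as np

def risk_coverage_curve(correct, scores):
    """Selective risk (error rate on accepted examples) at every coverage level."""
    order = np.argsort(-scores)  # accept the most confident examples first
    errors = 1.0 - correct[order]
    n = len(correct)
    coverages = np.arange(1, n + 1) / n
    risks = np.cumsum(errors) / np.arange(1, n + 1)
    return coverages, risks

rng = np.random.default_rng(0)
correct = (rng.random(1000) < 0.9).astype(float)  # toy correctness labels
scores = rng.normal(loc=correct)                  # toy confidence scores
coverages, risks = risk_coverage_curve(correct, scores)
print(f"area under the risk-coverage curve: {risks.mean():.4f}")
```

Distribution shift (e.g., the CIFAR-10C/100C corruptions) typically raises this curve at every coverage level, which corresponds to the shift-induced slack term in the decomposition.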

General:

  • synth_exp_plot.ipynb: Notebook used for the synthetic (two-moons) experiments and general plotting (a minimal setup is sketched below).
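
As a rough stand-in for the synthetic setup (the notebook is the authoritative version; this sketch only assumes its two-moons spirit), an under-capacity classifier on noisy two-moons data already exhibits a gap:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

# Coordinate noise creates class overlap, playing the role of Bayes noise.
X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

clf = LogisticRegression().fit(X_tr, y_tr)  # a linear model cannot fit the moons
conf = clf.predict_proba(X_te).max(axis=1)  # max-probability selection score
correct = (clf.predict(X_te) == y_te).astype(float)

# Selective accuracy at 80% coverage: keep the 800 most confident test points.
keep = np.argsort(-conf)[: int(0.8 * len(conf))]
print(f"accuracy@80% coverage: {correct[keep].mean():.3f} (full coverage: {correct.mean():.3f})")
```

Here approximation error (the linear boundary) and Bayes noise (the class overlap from noise=0.25) alone already produce a gap to the oracle, matching finding (i) in the abstract.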

🎓 BibTeX citation

@inproceedings{rabanser2025what,
  title = {What Does It Take to Build a Performant Selective Classifier?},
  author = {Stephan Rabanser and Nicolas Papernot},
  year = {2025},
  booktitle = {Advances in Neural Information Processing Systems}
}
