What Does It Take to Build a Performant Selective Classifier?

📄 Paper: https://arxiv.org/abs/2510.20242

🧠 Abstract

Selective classifiers improve reliability by abstaining on uncertain inputs, yet their performance often lags behind the perfect-ordering oracle that accepts examples in exact order of correctness. We formulate this shortfall as a coverage-uniform selective-classification gap and prove the first finite-sample decomposition that pinpoints five distinct sources of looseness: Bayes noise, approximation error, ranking error, statistical noise, and implementation or shift-induced slack. Our bound shows that monotone post-hoc calibration cannot reduce the gap, as it preserves the original score ordering; closing the gap therefore requires scoring mechanisms that can modify the ranking induced by the base model. We validate our gap decomposition on synthetic two-moons data and real-world vision benchmarks, isolating each error component via controlled experiments. Results confirm that (i) Bayes noise and limited model capacity alone explain large gaps, (ii) only non-monotone or feature-aware calibrators shrink the ranking term, and (iii) distribution shift adds a distinct slack that must be addressed by robust training. Our decomposition supplies a quantitative error budget and concrete design guidelines for building selective classifiers that approach ideal oracle behavior.
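To make the abstract's central claim concrete, the following minimal sketch compares a score-based selective classifier against the perfect-ordering oracle on a hand-made toy example. It is an illustration under simplifying assumptions, not the paper's exact definition or this repository's code: the function names, the toy scores, and the averaging of the gap over a small coverage grid are all hypothetical choices. It shows that temperature scaling, being monotone, leaves the gap unchanged, while a rescoring that changes the ranking can shrink it.

```python
import math

def selective_accuracy(scores, correct, coverage):
    """Accuracy on the top-`coverage` fraction of examples, ranked by score."""
    n = len(scores)
    k = max(1, math.ceil(coverage * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(correct[i] for i in order[:k]) / k

def oracle_accuracy(correct, coverage):
    """Accuracy of a perfect ordering that accepts every correct example first."""
    n = len(correct)
    k = max(1, math.ceil(coverage * n))
    return min(sum(correct), k) / k

def mean_gap(scores, correct, coverages):
    """Oracle accuracy minus selective accuracy, averaged over a coverage grid."""
    gaps = [oracle_accuracy(correct, c) - selective_accuracy(scores, correct, c)
            for c in coverages]
    return sum(gaps) / len(gaps)

def temperature_scale(scores, temperature):
    """Monotone recalibration of probabilities in (0, 1): sigmoid(logit(p) / T)."""
    scaled = []
    for p in scores:
        logit = math.log(p / (1.0 - p))
        scaled.append(1.0 / (1.0 + math.exp(-logit / temperature)))
    return scaled

# Toy data (hypothetical): confidence scores and whether each prediction is correct.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55]
correct = [1, 1, 0, 1, 1, 0, 1, 0]
coverages = [0.25, 0.5, 0.75, 1.0]

gap_raw = mean_gap(scores, correct, coverages)

# Monotone calibration keeps the ordering, so the gap is identical.
gap_scaled = mean_gap(temperature_scale(scores, 2.0), correct, coverages)

# A hand-crafted rescoring that demotes the confidently wrong example at index 2
# changes the ranking and therefore the gap.
reranked = list(scores)
reranked[2] = 0.50
gap_reranked = mean_gap(reranked, correct, coverages)

print(f"raw gap:      {gap_raw:.4f}")
print(f"scaled gap:   {gap_scaled:.4f}")
print(f"reranked gap: {gap_reranked:.4f}")
```

In practice the rescoring would come from a learned, feature-aware scorer rather than a manual edit; the manual edit only isolates the effect of changing the ranking.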

⚙️ Installation with `uv`

We use uv, a fast Python package and project manager, to manage dependencies. Its `uv pip` interface is a drop-in replacement for common pip commands.

Step 1: Install `uv` (if not already installed)

pip install uv

Step 2: Install dependencies

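If the project has no `.venv` directory yet, create one first with `uv venv`, since `uv pip install` expects a virtual environment by default.

uv pip install -e .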

Step 3: Activate environment

source .venv/bin/activate
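After activating the environment, a quick check can confirm that Python is actually running from it. The snippet below is a small sketch using only the standard library; the helper name `in_virtualenv` is hypothetical and not part of this repository.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line prints `False`, the activation step did not take effect in the current shell.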

🗂️ Codebase overview

Training:

train_main.py: Trains a standard model from scratch across datasets and selective prediction methods.
train_cifar_n.py: Trains a model on the CIFAR-10N/100N datasets.
train_lp.py: Trains a loss predictor on top of penultimate layer representations.
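The idea behind a loss predictor can be sketched in a few lines: fit a regressor from feature vectors to per-example losses, then use the negated predicted loss as a feature-aware selection score. The sketch below is an assumption-laden illustration, not the implementation in train_lp.py. It uses a linear model trained by full-batch gradient descent on synthetic, noise-free data, and every name in it is hypothetical; the actual architecture, targets, and training procedure may differ.

```python
import random

def fit_linear_loss_predictor(features, losses, lr=0.1, epochs=500):
    """Fit w, b minimizing mean squared error between w.x + b and the loss."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, losses):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(dim):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_loss(w, b, x):
    """Predicted per-example loss for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Synthetic stand-ins (assumption): 2-D "penultimate features" and losses that
# depend linearly on them, so the fit can be checked against known coefficients.
rng = random.Random(0)
features = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
losses = [0.5 + 0.8 * x[0] - 0.3 * x[1] for x in features]

w, b = fit_linear_loss_predictor(features, losses)

# Higher score means "more likely to be accepted": negate the predicted loss.
selection_scores = [-predict_loss(w, b, x) for x in features]
ranking = sorted(range(len(features)), key=lambda i: selection_scores[i], reverse=True)

print("learned weights:", [round(wi, 3) for wi in w], "bias:", round(b, 3))
```

Because the score depends on the features rather than only on the base model's confidence, it can reorder examples, which a monotone recalibration of the confidence cannot do.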

Evaluation:

eval_arch.py: Evaluation across model architectures.
eval_cifar_c.py: Evaluation on the CIFAR-10C/100C datasets.
eval_cifar_n.py: Evaluation on the CIFAR-10N/100N datasets.
eval_shift.py: Evaluation under real-world distribution shifts.

General:

synth_exp_plot.ipynb: Notebook used for synthetic experiments and general plotting.

🎓 BibTeX citation

@inproceedings{rabanser2025what,
  title = {What Does It Take to Build a Performant Selective Classifier?},
  author = {Stephan Rabanser and Nicolas Papernot},
  year = {2025},
  booktitle = {Advances in Neural Information Processing Systems}
}

cleverhans-lab/sc-gap
