📄 Paper: https://arxiv.org/abs/2510.20242
Selective classifiers improve reliability by abstaining on uncertain inputs, yet their performance often lags behind the perfect-ordering oracle that accepts examples in exact order of correctness. We formulate this shortfall as a coverage-uniform selective-classification gap and prove the first finite-sample decomposition that pinpoints five distinct sources of looseness: Bayes noise, approximation error, ranking error, statistical noise, and implementation or shift-induced slack. Our bound shows that monotone post-hoc calibration cannot reduce the gap, as it preserves the original score ordering; closing the gap therefore requires scoring mechanisms that can modify the ranking induced by the base model. We validate our gap decomposition on synthetic two-moons data and real-world vision benchmarks, isolating each error component via controlled experiments. Results confirm that (i) Bayes noise and limited model capacity alone explain large gaps, (ii) only non-monotone or feature-aware calibrators shrink the ranking term, and (iii) distribution shift adds a distinct slack that must be addressed by robust training. Our decomposition supplies a quantitative error budget and concrete design guidelines for building selective classifiers that approach ideal oracle behavior.
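To make the ordering argument concrete, here is a minimal sketch in pure Python. It is not code from this repository, and the gap it computes is a simplified empirical proxy (oracle accuracy minus model accuracy, averaged over a coverage grid), not the exact quantity defined in the paper. The function names, the toy data, and the sigmoid rescaling are all illustrative assumptions.

```python
import math
import random


def accuracy_at_coverage(scores, correct, coverage):
    """Accuracy on the top-`coverage` fraction of examples, ranked by score."""
    n = len(scores)
    k = max(1, math.ceil(coverage * n))  # number of accepted examples
    # Sort indices by descending score; the sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: -scores[i])
    return sum(correct[i] for i in order[:k]) / k


def oracle_accuracy_at_coverage(correct, coverage):
    """Perfect-ordering oracle: accepts every correct example before any incorrect one."""
    n = len(correct)
    k = max(1, math.ceil(coverage * n))
    return min(sum(correct), k) / k


def mean_gap(scores, correct, coverages):
    """Oracle-minus-model accuracy, averaged over a grid of coverage levels."""
    gaps = [
        oracle_accuracy_at_coverage(correct, c) - accuracy_at_coverage(scores, correct, c)
        for c in coverages
    ]
    return sum(gaps) / len(gaps)


random.seed(0)
n = 1000
# Hypothetical toy data: correctness is only loosely tied to the confidence score.
scores = [random.random() for _ in range(n)]
correct = [1 if random.random() < 0.5 + 0.4 * s else 0 for s in scores]
coverages = [c / 10 for c in range(1, 11)]

# A strictly increasing rescaling (sigmoid with a temperature of 0.1, chosen arbitrarily).
# It changes the score values but not the ranking they induce.
calibrated = [1 / (1 + math.exp(-(s - 0.5) / 0.1)) for s in scores]

gap_raw = mean_gap(scores, correct, coverages)
gap_cal = mean_gap(calibrated, correct, coverages)
print(f"gap (raw scores):          {gap_raw:.4f}")
print(f"gap (monotone-calibrated): {gap_cal:.4f}")
```

Because the rescaled scores sort the examples in the same order, both gaps come out identical in this sketch; only a scorer that reorders examples could change the accepted set at a given coverage.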
We use uv as our package manager. uv is a fast Python dependency management tool and a drop-in replacement for pip.
pip install uv
uv pip install -e .
source .venv/bin/activate
Training:
train_main.py: Trains a standard model from scratch across datasets and selective prediction methods.
train_cifar_n.py: Trains a model on the CIFAR-10N/100N datasets.
train_lp.py: Trains a loss predictor on top of penultimate layer representations.
Evaluation:
eval_arch.py: Evaluation across model architectures.
eval_cifar_c.py: Evaluation on the CIFAR-10C/100C datasets.
eval_cifar_n.py: Evaluation on the CIFAR-10N/100N datasets.
eval_shift.py: Evaluation on real-life shifts.
General:
synth_exp_plot.ipynb: Notebook used for synthetic experiments and general plotting.
@inproceedings{rabanser2025what,
title = {What Does It Take to Build a Performant Selective Classifier?},
author = {Stephan Rabanser and Nicolas Papernot},
year = {2025},
booktitle = {Advances in Neural Information Processing Systems}
}