Official PyTorch implementation of the paper Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation (CVPR 2025). This repo contains code for running CCFS on CIFAR-10/100 and Tiny-ImageNet on top of already-distilled data.
Dataset distillation (DD) excels in synthesizing a small number of images per class (IPC) but struggles to maintain its effectiveness in high-IPC settings. Recent works on dataset distillation demonstrate that combining distilled and real data can mitigate the effectiveness decay. However, our analysis of the combination paradigm reveals that the current one-shot and independent selection mechanism induces an incompatibility issue between distilled and real images. To address this issue, we introduce a novel curriculum coarse-to-fine selection (CCFS) method for efficient high-IPC dataset distillation. CCFS employs a curriculum selection framework for real data selection, where we leverage a coarse-to-fine strategy to select appropriate real data based on the current synthetic dataset in each curriculum. Extensive experiments validate CCFS, surpassing the state-of-the-art by +6.6% on CIFAR-10, +5.8% on CIFAR-100, and +3.4% on Tiny-ImageNet under high-IPC settings. Notably, CCFS achieves 60.2% test accuracy on ResNet-18 with a 20% compression ratio of Tiny-ImageNet, closely matching full-dataset training with only 0.3% degradation.
```
pandas==2.2.3
torch==2.2.1
torchvision==0.17.1
tqdm==4.66.2
```
To conduct a single experiment, you need to prepare a teacher checkpoint for relabeling, difficulty scores for the corresponding dataset, and already-distilled data organized in the following format:
```
/path/to/distilled_dataset/
├── 00000/
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.jpg
│   ├── image4.jpg
│   └── image5.jpg
├── 00001/
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.jpg
│   ├── image4.jpg
│   └── image5.jpg
├── 00002/
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.jpg
│   ├── image4.jpg
│   └── image5.jpg
```
The c-scores for CIFAR-10/100 and the forgetting scores for CIFAR-10/100 and Tiny-ImageNet are provided in `scores/`.
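The score files are NumPy arrays with one difficulty score per training example; a minimal loading sketch, assuming the scores are indexed in the dataset's default example order:

```python
import numpy as np

# Load per-example difficulty scores, e.g. the forgetting scores for
# Tiny-ImageNet shipped in scores/.
scores = np.load("./scores/forgetting_Tiny.npy")

# Rank examples from highest to lowest score. Whether a high value means
# "hard" depends on the score type (c-score vs. forgetting score), so
# verify the convention before relying on this ordering.
ranked = np.argsort(scores)[::-1]
print(scores.shape, ranked[:10])
```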
Follow the squeeze instructions in SRe2L (CIFAR / Tiny-ImageNet) to train the ResNet-18 teacher model:
| Dataset | Backbone | Epochs | Acc@1 (last) | Input Size |
|---|---|---|---|---|
| CIFAR-10 | ResNet-18 (modified) | 200 | 95.53 | 32 |
| CIFAR-100 | ResNet-18 (modified) | 200 | 78.72 | 32 |
| Tiny-ImageNet | ResNet-18 (modified) | 200 | 60.50 | 64 |
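The "modified" ResNet-18 refers to the usual small-input variant; a minimal sketch of that modification, assuming the common SRe2L-style recipe of a 3×3 stem and no initial max-pool (check the SRe2L code for the exact definition):

```python
import torch.nn as nn
from torchvision.models import resnet18

# Hypothetical sketch of a small-input ResNet-18: 3x3 stem with stride 1
# and the initial max-pool removed, the usual adaptation for 32x32/64x64
# images. Verify against the SRe2L recipe before reproducing results.
def resnet18_small_input(num_classes: int) -> nn.Module:
    model = resnet18(num_classes=num_classes)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model
```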
You can download the teacher model used in our experiments here.
In the main table of our paper, we used distilled data synthesized by CDA.
You can download the distilled data used in our experiments here.
Note that CCFS can be extended to most dataset distillation methods, as long as you have the distilled data and organize it into the image-folder structure shown above. We encourage adopting distilled data from other DD methods and configuring the corresponding data augmentation and training settings to verify the scalability of CCFS; a conversion sketch is given below.
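For example, if another DD method outputs its distilled images as tensors, they can be dumped into the expected layout. The helper below is hypothetical and assumes images are already scaled to [0, 1]:

```python
import os
import torch
from torchvision.utils import save_image

# Hypothetical helper: write distilled images (N, C, H, W) with integer
# labels into the class-per-subfolder layout shown above (00000/, 00001/, ...).
def export_to_imagefolder(images: torch.Tensor, labels: torch.Tensor, root: str):
    counters = {}
    for img, y in zip(images, labels):
        cls_dir = os.path.join(root, f"{int(y):05d}")
        os.makedirs(cls_dir, exist_ok=True)
        idx = counters.get(int(y), 0) + 1
        counters[int(y)] = idx
        save_image(img, os.path.join(cls_dir, f"image{idx}.jpg"))
```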
For the three small datasets (CIFAR-10/100 and Tiny-ImageNet), we provide a single-GPU implementation of CCFS. Run the following command to conduct CCFS on Tiny-ImageNet with IPC = 50:
```bash
CUDA_VISIBLE_DEVICES=0 python ccfs_tiny.py \
    --data-path /path/to/Tiny-ImageNet/ --filter-model resnet18 --teacher-model resnet18 \
    --teacher-path ./checkpoints/resnet18_tiny_200epochs.pth --eval-model resnet18 \
    --device cuda --batch-size 64 --epochs 100 --opt sgd --lr 0.2 --momentum 0.9 --weight-decay 1e-4 \
    --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --lr-warmup-decay 0.01 \
    --distill-data-path ./syn-data/cda_tiny_rn18_4k_ipc100 \
    -T 20 --image-per-class 50 --alpha 0.2 --curriculum-num 3 \
    --select-misclassified --select-method simple --balance \
    --score forgetting --score-path ./scores/forgetting_Tiny.npy \
    --output-dir ./selection_logs --num-eval 5
```

To facilitate running experiments, we provide scripts for the bulk experiments in the paper:
```bash
sh ./scripts/ccfs_tiny.sh
```

After running, the selected real-image indices are stored in `selected_indices.json`, and the experiment configuration is saved to `exp_log.txt`.
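For a sanity check, the final synthetic dataset can be reassembled from these indices. A minimal sketch, assuming `selected_indices.json` holds a flat list of train-set indices and that the dataset root contains a `train/` split (check the file written by `ccfs_tiny.py` for the exact schema):

```python
import json
import torch
from torchvision import datasets, transforms

# Rebuild the combined synthetic dataset: distilled images + selected real
# images. The JSON schema and the train/ subfolder are assumptions here.
with open("./selection_logs/Tiny/selected_indices.json") as f:
    selected = json.load(f)

tf = transforms.ToTensor()
real = datasets.ImageFolder("/path/to/Tiny-ImageNet/train", transform=tf)
distilled = datasets.ImageFolder("./syn-data/cda_tiny_rn18_4k_ipc100", transform=tf)

combined = torch.utils.data.ConcatDataset(
    [torch.utils.data.Subset(real, selected), distilled]
)
print(len(combined), "images in the combined synthetic dataset")
```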
To quickly validate the performance of the synthetic dataset without the relabeling process, we provide validation code with naive KD following SRe2L. Set `--selected_indices_path` to the correct `selected_indices.json` file and run the following command to conduct a quick validation on Tiny-ImageNet with IPC = 50:
```bash
CUDA_VISIBLE_DEVICES=0 python eval_tiny.py \
    --data-path /path/to/Tiny-ImageNet/ --eval-model resnet18 \
    --teacher-model resnet18 --teacher-path ./checkpoints/resnet18_tiny_200epochs.pth \
    --device cuda --batch-size 64 --epochs 100 --opt sgd --lr 0.2 --momentum 0.9 --weight-decay 1e-4 -T 20 \
    --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --lr-warmup-decay 0.01 \
    --distill-data-path ./syn-data/cda_tiny_rn18_4k_ipc100 \
    --selected_indices_path ./selection_logs/Tiny/selected_indices.json \
    --image-per-class 50 --num-eval 5
```
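For reference, naive KD here means standard temperature-scaled soft-label distillation. A generic sketch (the exact objective in `eval_tiny.py` may differ), with `T` corresponding to the `-T 20` flag:

```python
import torch.nn.functional as F

# Generic temperature-scaled KD loss: soften both distributions with T and
# match the student to the teacher via KL divergence.
def kd_loss(student_logits, teacher_logits, T: float = 20.0):
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)
```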
Performance of CCFS compared to the SOTA dataset distillation and coreset selection baselines is reported in the paper.

If you find this repository helpful for your project, please consider citing our work:
```bibtex
@inproceedings{chen2025ccfs,
  title={Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation},
  author={Chen, Yanda and Chen, Gongwei and Zhang, Miao and Guan, Weili and Nie, Liqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```

