
Deep Tabular Learning via Distillation and Language Guidance

Overview | Requirements | Datasets | Running DisTab

Overview

This is the official implementation of Deep Tabular Learning via Distillation and Language Guidance (DisTab). DisTab is built on transformer architectures and leverages distillation pre-training and language-guided embeddings for robust performance. This repository provides sample code and a usage guide.

Requirements

Key dependencies include PyTorch, PyTorch Lightning, OpenML, AutoGluon, gin-config, scikit-learn, and pandas.
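
The dependencies can be installed with pip. The command below is a minimal sketch using the standard PyPI package names; exact versions are not pinned in this README:

pip install torch pytorch-lightning openml autogluon gin-config scikit-learn pandas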

Datasets

We preprocess several datasets from OpenML for running DisTab. The pre-processed datasets include the language-guided embeddings described in the paper, using Llama-3-8B as the embedding model. To use the datasets, download and unzip them under the repository root. A dataset folder should be created and organized as follows:

dataset
├── adult
│   ├── head.json
│   ├── tab_data
├── higgs
│   ├── head.json
│   ├── tab_data
├── ...

Each dataset includes head.json (metadata, e.g. OpenML Task ID) and tab_data (the preprocessed tabular data).
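
To quickly inspect a dataset's metadata before running experiments, head.json can be pretty-printed directly; the path below assumes the adult dataset downloaded as above:

python -m json.tool dataset/adult/head.json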

Running DisTab

Configuration

Please configure experiment settings and model hyperparameters in gin-config files located in the gin_config folder.

The gin_config folder is organized as follows:

gin_config
├── single_task.gin
├── ...
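
A single_task.gin file binds, among other things, the output locations referenced in the next section. The sketch below is illustrative only: the configurable name run_single_task and the values are hypothetical, and only the parameter names (teacher_model_dir, fine_tuned_model_dir, fine_tuned_result_path) come from this README:

# Hypothetical bindings in gin_config/single_task.gin
run_single_task.teacher_model_dir = 'outputs/teacher_models'
run_single_task.fine_tuned_model_dir = 'outputs/fine_tuned_models'
run_single_task.fine_tuned_result_path = 'outputs/fine_tuned_results.res'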

Running

DisTab training consists of three stages: training a teacher model (a tree-based model), pre-training by distilling the teacher model, and fine-tuning. Each stage may be run independently for convenience.

For full training, including teacher-model training, distillation pre-training, and fine-tuning:

python run_single_task.py --gin_file gin_config/single_task.gin --task_name adult  --active_teacher_model --active_pre_training --active_fine_tuning

The teacher models are saved under teacher_model_dir (set in single_task.gin), with performance results written to baseline_tree_<task_type>_<metric>.res. The fine-tuned models are saved under fine_tuned_model_dir, with performance results written to fine_tuned_result_path (both also set in single_task.gin).

To only train the teacher model:

python run_single_task.py --gin_file gin_config/single_task.gin --task_name {task_name}  --active_teacher_model

To train DisTab only (pre-training and fine-tuning), when a teacher model is already available:

python run_single_task.py --gin_file gin_config/single_task.gin --task_name {task_name} --active_pre_training --active_fine_tuning
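
Since each stage has its own flag, a single stage can also be run in isolation. For example, assuming a distillation-pre-trained checkpoint already exists, fine-tuning alone would presumably be:

python run_single_task.py --gin_file gin_config/single_task.gin --task_name {task_name} --active_fine_tuning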
