DeVLBert: Learning Deconfounded Visio-Linguistic Representations

Original implementation of the paper DeVLBert: Learning Deconfounded Visio-Linguistic Representations.

Repository Setup

Create a fresh conda environment, and install all dependencies.

conda create -n devlbert python=3.6
conda activate devlbert
git clone https://github.com/shengyuzhang/DeVLBert.git
cd DeVLBert
pip install -r requirements.txt

Install PyTorch

conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

Install apex, following https://github.com/NVIDIA/apex

Compile tools

cd tools/refer
make
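
After the setup steps, a quick check that the key packages can be found may save a failed run later. The following is a minimal sketch and not part of the repository: the helper name check_environment is hypothetical, and it only tests that each package is importable, not that its version matches what the code expects.

```python
import importlib.util


def check_environment(modules=("torch", "torchvision", "apex")):
    """Report which of the expected packages can be located by Python."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

Run it inside the activated devlbert environment; any package reported as MISSING points to the setup step that still needs to be completed.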

Data Setup

See README.md under data and devlbert_tasks.yml for more details.

We follow the ViLBERT setup exactly.

Get DeVLBert pre-trained model

This repo implements only design D from our paper. The other designs can be implemented easily on top of it.

Pre-trained model for Evaluation

Download our pre-trained DeVLBert model from here and put it under save/devlbert/.

Train DeVLBert model by yourself

1: Follow Data Setup to get the training dataset. Download the pre-trained bert-base-uncased model from here and the bert-base-uncased vocabulary from here.

2: Run ./dic/get_noun_set.py, ./dic/count.py, and ./dic/get_id2class.py, in that order, to produce "./dic/id2class.npy". Then run get_dic.sh and ./dic/merge_dic.ipynb to build the confounder dictionaries.
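
The ordering of the three scripts in step 2 can be sketched as a small driver. This is an illustration only: the helper run_in_order is hypothetical, and it assumes each script runs without command-line arguments from the repository root, which the README does not state.

```python
import subprocess
import sys

# Scripts named in step 2, listed in the order they must run.
DIC_SCRIPTS = [
    "./dic/get_noun_set.py",
    "./dic/count.py",
    "./dic/get_id2class.py",
]


def run_in_order(scripts, run=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        # check=True raises CalledProcessError if the script exits non-zero,
        # so a later script never runs on incomplete output.
        run([sys.executable, script], check=True)


if __name__ == "__main__":
    run_in_order(DIC_SCRIPTS)
```

Stopping at the first failure matters here because each script presumably consumes what the previous one wrote.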

Absolute paths appear throughout our code. Their meanings are as follows:

"/mnt3/xuesheng/features_lmdb/CC/training_feat_part_" + str(rank) + ".lmdb" : We preprocess the Conceptual Captions dataset and split it into 8 segments.
"/mnt3/xuesheng/features_lmdb/CC/caption_train.json" : While processing the Conceptual Captions dataset, we save all captions. They are used during training, because the visio-linguistic alignment proxy task needs a wrong caption for each image.
"/mnt/xuesheng_1/bert-base-uncased" : The bert-base-uncased model and vocabulary are stored here.
"./dic/id2class.npy" "./dic/id2class1155.npy" : Keeping 2 and 4 confounder words per sentence yields the former and the latter, respectively.
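
When adapting these paths to another machine, the shard naming pattern and the wrong-caption idea can be sketched as follows. This is a hedged illustration, not the repository's code: the helpers shard_path and sample_wrong_caption are hypothetical, the captions are assumed to be available as a plain list, and the README does not say whether rank starts at 0 or 1.

```python
import random


def shard_path(root, rank):
    """Build one training shard path following the pattern quoted above."""
    return "{}/training_feat_part_{}.lmdb".format(root.rstrip("/"), rank)


def sample_wrong_caption(captions, true_index, rng=random):
    """Pick a caption that belongs to a different image than true_index.

    Assumes `captions` is a list with at least two entries.
    """
    if len(captions) < 2:
        raise ValueError("need at least two captions to sample a wrong one")
    # Draw from the n - 1 other positions, then shift past the true caption.
    wrong_index = rng.randrange(len(captions) - 1)
    if wrong_index >= true_index:
        wrong_index += 1
    return captions[wrong_index]
```

For example, shard_path("/data/CC", 3) gives "/data/CC/training_feat_part_3.lmdb". Drawing from the other n - 1 positions avoids a retry loop while keeping the choice uniform.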

3: Train with train.sh in two stages. First, run train.sh. Second, change the region mask probability from 0.15 to 0.3. Third, run train.sh again. In total we train for 24 (12 + 12) epochs. Training for longer can improve performance, especially on the Zero-Shot Image Retrieval task.
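
The two-stage schedule in step 3 can be written down as data, together with a toy version of region masking. This is only a sketch under assumptions: the names STAGES and mask_regions are hypothetical, and masking each region independently with the given probability is an assumption about how the setting behaves, not a description of the repository's code.

```python
import random

# Two pre-training stages as described in step 3: 12 epochs each, with the
# region mask probability raised from 0.15 to 0.3 for the second stage.
STAGES = [
    {"epochs": 12, "region_mask_prob": 0.15},
    {"epochs": 12, "region_mask_prob": 0.3},
]


def mask_regions(num_regions, prob, rng=random):
    """Return one boolean per image region; True means the region is masked."""
    return [rng.random() < prob for _ in range(num_regions)]


# Sum of the per-stage epoch counts (12 + 12).
total_epochs = sum(stage["epochs"] for stage in STAGES)
```

Doubling the probability in the second stage hides roughly twice as many regions per image on average.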

Evaluation

Follow Data Setup to get all datasets. Keep the dataset paths consistent with the paths in devlbert_tasks.yml.

VQA

1: Finetune: Run vqa_train.sh. Alternatively, download our trained model directly from here.

2: Inference: In devlbert_tasks.yml, comment out line 7 and uncomment line 8. Then run vqa_test.sh. The result is written to results/VQA_bert_base_6layer_6conect-{save_name of vqa_train.sh}-{save_name of vqa_test.sh}/test_result.json.
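
Since the result location depends on two save names, a small helper can make the pattern explicit. This is a sketch for illustration: vqa_result_path is a hypothetical name, and the two arguments stand for whatever save_name values are set in vqa_train.sh and vqa_test.sh.

```python
import posixpath


def vqa_result_path(train_save_name, test_save_name, root="results"):
    """Compose the test_result.json location from the two save names."""
    run_dir = "VQA_bert_base_6layer_6conect-{}-{}".format(
        train_save_name, test_save_name
    )
    return posixpath.join(root, run_dir, "test_result.json")
```

For example, vqa_result_path("a", "b") gives "results/VQA_bert_base_6layer_6conect-a-b/test_result.json".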

3: Evaluation: Sign up for an account on the VQA Challenge 2020 site, then submit your result to the Test-Dev or Test-Standard phase.

VCR

We only evaluate on the validation set. Run vcr_train.sh; the result appears in the first few lines of save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-{save_name of vcr_train.sh}/output.txt.
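
Because the result sits at the top of output.txt, reading only the first lines is enough. The helper below is a minimal sketch with a hypothetical name (head); the default of 5 lines is an assumption, since the README does not say exactly how many lines hold the result.

```python
from itertools import islice


def head(path, n=5):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r") as handle:
        # islice stops after n lines, so large logs are not loaded fully.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

The same helper works for any output.txt under save/, given the matching directory name.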

Image Retrieval

1: Finetune: Run ir_train.sh. Alternatively, download our trained model directly from here.

2: Evaluation: In devlbert_tasks.yml, comment out lines 60 and 64 and uncomment lines 61 and 65. Then run ir_test.sh. The result is printed to the screen when evaluation finishes.

Zero-Shot Image Retrieval

Run zsir_test.sh directly. The result is printed to the screen when evaluation finishes.

RefCOCO+

We only evaluate on the validation set. Run refcoco_train.sh; the result appears in the first few lines of save/refcoco+_bert_base_6layer_6conect-{save_name of refcoco_train.sh}/output.txt.

References

If you use DeVLBert in your research or wish to refer to the results, please cite our paper:

@article{zhang2020devlbert,
  title={DeVLBert: Learning Deconfounded Visio-Linguistic Representations},
  author={Zhang, Shengyu and Jiang, Tan and Wang, Tan and Kuang, Kun and Zhao, Zhou and Zhu, Jianke and Yu, Jin and Yang, Hongxia and Wu, Fei},
  journal={arXiv preprint arXiv:2008.06884},
  year={2020}
}
