The MED dataset and benchmark are available. You will need to download the original images of Visual Genome and DOCCI from the web; all other files are included in this repository.
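As a quick sanity check before running the pipeline, the sketch below verifies that the downloaded images are where the scripts expect them. The directory names are assumptions for illustration; point them at wherever you unpack the two image sets.

```python
from pathlib import Path

# Assumed layout -- adjust these paths to match your local setup.
IMAGE_DIRS = {
    "Visual Genome": Path("data/vg_images"),    # e.g. contents of images.zip / images2.zip
    "DOCCI": Path("data/docci_images"),         # unpacked DOCCI image archive
}

for name, root in IMAGE_DIRS.items():
    if not root.is_dir():
        raise SystemExit(f"{name}: expected images under {root}, but the directory does not exist")
    n = sum(1 for p in root.rglob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    print(f"{name}: found {n} images under {root}")
```

The repository is organized as follows: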
project/
├── script_gen_edit_prompt.py # Generate editing prompts
├── script_gen_vg_description.py # Generate full VG descriptions from multiple sentences
├── script_filter_images.py # Filter images suitable for editing
├── script_get_complete_caption.py # Generate complete original captions to capture all objects in the image
├── script_get_edit_caption.py # Generate edited captions based on editing prompts
├── script_gen_diff_and_judge.py # Generate difference descriptions and judge alignment with prompts
├── script_gen_sft.py # Construct difference descriptions as SFT training data
└── test_*.py # Evaluation scripts for the MED benchmark
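The scripts are meant to be run roughly in the order of the data pipeline. Below is a minimal driver sketch; it assumes each script can be invoked standalone with its defaults, which may not hold for every step, so check each file for the arguments it actually expects.

```python
import subprocess
import sys

# Assumed pipeline order; individual scripts may take arguments not shown here.
PIPELINE = [
    "script_gen_vg_description.py",    # merge VG sentences into full descriptions
    "script_filter_images.py",         # keep images suitable for editing
    "script_gen_edit_prompt.py",       # generate editing prompts
    "script_get_complete_caption.py",  # caption all objects in the original image
    "script_get_edit_caption.py",      # caption the edited image per the prompt
    "script_gen_diff_and_judge.py",    # describe differences and judge alignment
    "script_gen_sft.py",               # package difference descriptions as SFT data
]

for script in PIPELINE:
    print(f"==> {script}")
    subprocess.run([sys.executable, script], check=True)
```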
pip install openai
pip install -q -U google-genai  # requires Python >= 3.9
pip install -U transformers
pip install qwen-vl-utils
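The openai and google-genai packages are presumably used for the API-backed generation and judging steps, while transformers and qwen-vl-utils cover local Qwen2-VL inference. As a sanity check for the local setup, here is a minimal inference sketch following the standard Qwen2-VL usage; the checkpoint name, image path, and prompt are placeholders:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Placeholder checkpoint; substitute the model you are evaluating.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/edited_image.jpg"},  # placeholder path
        {"type": "text", "text": "Describe this image."},
    ],
}]

# Build the chat prompt and collect the vision inputs referenced in it.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
trimmed = generated[:, inputs.input_ids.shape[1]:]  # drop the prompt tokens
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

If you find our work helpful, please cite us: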
@article{bai2025hallucination,
  title={Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning},
  author={Bai, Tianyi and Fan, Yuxuan and Qiu, Jiantao and Sun, Fupeng and Song, Jiayi and Han, Junlin and Liu, Zichen and He, Conghui and Zhang, Wentao and Yuan, Binhang},
  journal={arXiv preprint arXiv:2506.07227},
  year={2025}
}