
Relaxed-System-Lab/hallu_med


Code for 'Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning'

The MED dataset and benchmark are available. The original Visual Genome and DOCCI images are not included, so you will need to download them from the web yourself. All other files are in this repository.

Code Organization

project/
├── script_gen_edit_prompt.py         # Generate editing prompts
├── script_gen_vg_description.py      # Generate full VG descriptions from multiple sentences
├── script_filter_images.py           # Filter images suitable for editing
├── script_get_complete_caption.py    # Generate complete original captions to capture all objects in the image
├── script_get_edit_caption.py        # Generate edited captions based on editing prompts
├── script_gen_diff_and_judge.py      # Generate difference descriptions and judge alignment with prompts
├── script_gen_sft.py                 # Construct difference descriptions as SFT training data
└── test_*.py                         # Code for testing on the MED benchmark

Environment Installation

pip install openai
pip install -q -U google-genai  # requires Python >= 3.9
pip install -U transformers
pip install qwen-vl-utils
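
After installing, a quick sanity check can confirm that the interpreter and packages are in place before running any of the scripts. The snippet below is a minimal sketch and not part of the repository: the helper names (`is_importable`, `missing_packages`, `python_ok`) are hypothetical, and the import names mapped to each pip package are assumptions based on how these packages are usually imported.

```python
import importlib.util
import sys

# Assumed mapping from pip package name to importable module name.
REQUIRED_MODULES = {
    "openai": "openai",
    "google-genai": "google.genai",
    "transformers": "transformers",
    "qwen-vl-utils": "qwen_vl_utils",
}


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is itself missing.
        return False


def missing_packages(required=None):
    """List the pip package names whose modules cannot be located."""
    required = REQUIRED_MODULES if required is None else required
    return [pip_name for pip_name, mod in required.items() if not is_importable(mod)]


def python_ok(min_version=(3, 9), version_info=None):
    """Check the interpreter version (google-genai needs Python >= 3.9)."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= tuple(min_version)


if __name__ == "__main__":
    if not python_ok():
        print("Python >= 3.9 is required for google-genai")
    for name in missing_packages():
        print(f"Missing package: pip install {name}")
```

If the script prints nothing, the version requirement is met and all four packages were found.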

Citation

If you find our work helpful, please cite us:

@article{bai2025hallucination,
  title={Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning},
  author={Bai, Tianyi and Fan, Yuxuan and Qiu, Jiantao and Sun, Fupeng and Song, Jiayi and Han, Junlin and Liu, Zichen and He, Conghui and Zhang, Wentao and Yuan, Binhang},
  journal={arXiv preprint arXiv:2506.07227},
  year={2025}
}
