¹The Hong Kong Polytechnic University   ²Sichuan University
*Corresponding author
This repository is the official implementation of DRAG (Debate-Augmented RAG), a novel training-free framework designed to reduce hallucinations in Retrieval-Augmented Generation (RAG) systems.
Retrieval-Augmented Generation (RAG) is designed to mitigate hallucinations in large language models (LLMs) by retrieving relevant external knowledge to support factual generation. However, biased or erroneous retrieval results can mislead the generation, compounding the hallucination problem rather than solving it. In this work, we refer to this cascading issue as Hallucination on Hallucination: a phenomenon in which the model's factual mistakes are not only caused by internal reasoning flaws but are also triggered or worsened by unreliable retrieved content.
To address this, we implement DRAG, a training-free framework that integrates multi-agent debate (MAD) mechanisms into both the retrieval and generation stages. These debates help dynamically refine queries, reduce bias, and promote factually grounded, robust answers.
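The two-stage debate loop can be sketched in a few lines. This is a minimal toy sketch under our own assumptions, not the repository's implementation: `retrieve`, `refine`, and the `agents` callables stand in for the retriever and the LLM debaters.

```python
from collections import Counter

def retrieval_debate(question, retrieve, refine, max_rounds=2):
    """Retrieval Debate (sketch): iteratively refine the query until it converges,
    then return the evidence retrieved for the final query."""
    query = question
    evidence = retrieve(query)
    for _ in range(max_rounds):
        refined = refine(query, evidence)  # in DRAG, an LLM agent critiques the query
        if refined == query:               # debate converged; stop early
            break
        query = refined
        evidence = retrieve(query)
    return evidence

def response_debate(question, evidence, agents, max_rounds=2):
    """Response Debate (sketch): agents answer, see peers' answers, revise,
    and a simple majority vote decides the final answer."""
    answers = [agent(question, evidence, []) for agent in agents]
    for _ in range(max_rounds):
        answers = [agent(question, evidence, answers) for agent in agents]
    return Counter(answers).most_common(1)[0][0]
```

In the real framework, both the query critique and the answer revisions come from prompted LLM agents, and the judge is an LLM rather than a majority vote.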
🔥 [May 24, 2025]: The paper and code were released!
🔥 [May 16, 2025]: Our paper was accepted by ACL 2025!
Clone this repository, then create a drag conda environment and install the packages.
# clone repository
git clone https://github.com/Huenao/Debate-Augmented-RAG.git
# create conda env
conda create -n drag
conda activate drag
# install packages
pip install -r requirements.txt

💡 Note: If you encounter any issues when installing the Python packages using the commands above, we recommend following the official installation instructions provided by FlashRAG#Installation instead.
The datasets used in this project follow the same format as those pre-processed by FlashRAG#Datasets. All datasets are available at Huggingface datasets.
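FlashRAG-formatted splits are JSON Lines files; a minimal loader is sketched below. The field names (`id`, `question`, `golden_answers`) follow FlashRAG's dataset schema, and the example path is illustrative — verify both against the files you actually download.

```python
import json

def load_flashrag_split(path):
    """Read one dataset split (e.g. dataset/NQ/test.jsonl) into a list of dicts,
    skipping blank lines. Each record is expected to carry at least
    'id', 'question', and 'golden_answers'."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                samples.append(json.loads(line))
    return samples
```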
After downloading the dataset, please create a /dataset folder in the project directory and place the downloaded data inside. The directory structure should be as follows:
Debate-Augmented-RAG
├── assets
├── config
├── dataset
│ ├── 2wiki
│ ├── HotpotQA
│ ├── NQ
│ ├── PopQA
│ ├── StrategyQA
│ └── TriviaQA
├── misc
├── model
├── output
├── wiki_corpus
├── main.py
├── README.md
└── requirements.txt

Currently, DRAG supports only the following six datasets: NQ, TriviaQA, PopQA, 2WikiMultihopQA, HotpotQA, and StrategyQA.
💡 Note: If you wish to use a custom dataset path, simply modify the `data_dir` field in `config/base_config.yaml` accordingly.
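For example, the relevant entry in `config/base_config.yaml` might look like the following (the value shown simply matches the default directory layout above):

```yaml
data_dir: "dataset/"
```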
We use the wiki18_100w dataset provided by FlashRAG#index as the document corpus, along with the preprocessed index generated by its e5-base-v2 retriever.
Both the document corpus and the index can be downloaded from the retrieval_corpus folder at ModelScope.
After downloading, please create a wiki_corpus folder in the project root and place both files inside it. The directory structure should look like:
Debate-Augmented-RAG
├── assets
├── config
├── dataset
├── misc
├── model
├── output
├── wiki_corpus
│ ├── e5_flat_inner.index
│ └── wiki18_100w.jsonl
├── main.py
├── README.md
└── requirements.txt

💡 Note: If you wish to use a custom corpus path, simply modify the `index_path` and `corpus_path` fields in `config/base_config.yaml` accordingly.
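Matching the directory layout above, the corresponding entries in `config/base_config.yaml` would look roughly like this (the exact key layout may differ in your config):

```yaml
index_path: "wiki_corpus/e5_flat_inner.index"
corpus_path: "wiki_corpus/wiki18_100w.jsonl"
```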
This project supports all LLMs compatible with HuggingFace and vLLM. Please specify the path to your downloaded model using the model2path field in config/base_config.yaml.
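For instance, a `model2path` entry might map the model name used on the command line to a local checkpoint directory (the path below is hypothetical):

```yaml
model2path:
  llama3-8B-instruct: "model/Meta-Llama-3-8B-Instruct"  # hypothetical local path
```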
python main.py --method_name "DRAG" \
--gpu_id "0" \
--dataset_name "StrategyQA" \
    --generator_model "llama3-8B-instruct"

- `--method_name`: Specifies the RAG method to use. Supported options: `DRAG` (default), `Naive Gen`, `Naive RAG`, `FLARE`, `Iter-RetGen`, `IRCoT`, `SuRe`, `Self-RAG`, `MAD`.
- `--gpu_id`: Specifies the GPU device ID to use.
- `--dataset_name`: Specifies the dataset to use. Supported options: `NQ`, `TriviaQA`, `PopQA`, `2wiki`, `HotpotQA`, `StrategyQA`.
- `--generator_model`: Specifies the generation model to use.
Additionally, when using DRAG, you can customize the number of debate rounds for each phase by setting the --max_query_debate_rounds and --max_answer_debate_rounds parameters, which control the Retrieval Debate and Response Debate stages, respectively.
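For example, a run with custom round counts might look like this (the round values here are illustrative, not recommended settings):

```
python main.py --method_name "DRAG" \
    --gpu_id "0" \
    --dataset_name "StrategyQA" \
    --generator_model "llama3-8B-instruct" \
    --max_query_debate_rounds 2 \
    --max_answer_debate_rounds 3
```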
To better visualize and analyze the results, we use HTML4Vision to generate HTML files that visualize the entire debate process.
python misc/vis_naive_gen.py --file_path output/path-to-results-folder

FlashRAG: A Python toolkit for the reproduction and development of Retrieval-Augmented Generation (RAG) research. We thank the authors for their excellent work.
Thank you for your interest in our work. If you find this work useful, please cite it as follows:
@inproceedings{hu-etal-2025-removal,
title = "Removal of Hallucination on Hallucination: Debate-Augmented {RAG}",
author = "Hu, Wentao and
Zhang, Wengyu and
Jiang, Yiyang and
Zhang, Chen Jason and
Wei, Xiaoyong and
  Li, Qing",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.770/",
pages = "15839--15853",
ISBN = "979-8-89176-251-0",
}