Dayeon Ki, Rachel Rudinger, Tianyi Zhou, Marine Carpuat
University of Maryland
This repository contains the code and dataset for our ACL 2025 Main paper
Multiple LLM Agents Debate for Equitable Cultural Alignment.
While previous efforts in cultural alignment have focused on single-model, single-turn approaches, we propose to exploit the complementary strengths of multiple LLMs to promote cultural adaptability. We introduce a Multi-Agent Debate framework, where two LLM-based agents debate over a cultural scenario and collaboratively reach a final decision, which improves both (i) overall accuracy and (ii) cultural group parity over single-model baselines.
- 2025-07-10: Our paper has been selected for an oral presentation (top 8% of accepted papers)!
- 2025-05-15: Our paper is accepted to ACL 2025! See you in Vienna!
How can multiple LLMs collaborate toward equitable alignment across cultures? We investigate a common form of multi-LLM collaboration: debate. We propose a Multi-Agent Debate framework, where two LLM agents debate over the given scenario and collaboratively arrive at a final decision with a judge LLM. We introduce two key variants as illustrated in the above figure:
- Debate-Only: multiple LLM agents exclusively engage in debate with a discussant
- Self-Reflect+Debate: each LLM agent dynamically chooses between self-reflection and debating during its turn
For a more comprehensive comparison, we investigate two additional single-LLM strategies:
- Single Model: a single LLM generates outputs
- Self-Reflection: an LLM generates verbal self-reflections on its own outputs and incorporates them in subsequent iterations
For evaluation, we use the NORMAD-ETI dataset, a benchmark designed to assess the cultural adaptability of LLMs. The dataset contains 2.6K stories reflecting social and cultural norms from 75 countries, derived from the social-etiquette norms outlined in the Cultural Atlas. Each story is associated with a country, a rule-of-thumb, and a ternary ground truth label in {Yes, No, Neither} as shown in the figure above. We categorize the 75 countries according to the Inglehart-Welzel cultural map and show the label and country distribution for each bin.
- Raw data: data/normad_raw.csv
- Country distribution: data/normad_country_dist.csv
- Refined data: data/normad.jsonl
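Each record in the refined data pairs a story with its country, rule-of-thumb, and ternary label. The sketch below uses hypothetical field names (story, country, rule_of_thumb, gold_label; the actual keys in data/normad.jsonl may differ) to show how one might tally the label distribution:

```python
from collections import Counter

# Hypothetical records mirroring the format described above; the actual
# key names in data/normad.jsonl may differ.
records = [
    {"story": "A guest removes their shoes before entering the host's home.",
     "country": "Japan",
     "rule_of_thumb": "It is customary to remove shoes indoors.",
     "gold_label": "Yes"},
    {"story": "A diner leaves a 20% tip at a restaurant.",
     "country": "South Korea",
     "rule_of_thumb": "Tipping is not expected.",
     "gold_label": "No"},
]

# Tally the ternary label distribution over {Yes, No, Neither}.
label_counts = Counter(r["gold_label"] for r in records)
print(dict(label_counts))
```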
We first investigate the effect of adding relevant cultural context on the cultural alignment of LLMs. We test two variants: without and with rule-of-thumb (RoT) information in the prompts. (single_llm/single_model/)
For running without RoT prompting,
python -u single_llm/single_model/{$LLM}.py \
--input_path $PATH_TO_INPUT_FILE \
--output_path $PATH_TO_OUTPUT_FILE \
--type without_rot

For running with RoT prompting,
python -u single_llm/single_model/{$LLM}.py \
--input_path $PATH_TO_INPUT_FILE \
--output_path $PATH_TO_OUTPUT_FILE \
--type with_rot

Arguments for the prompting code are as follows:
- $LLM: Name of the LLM (specific names can be found in the directory).
- --input_path: Path to the input data file (data/normad.jsonl).
- --output_path: Save path of the output file.
- --type: Without or with RoT information (without_rot or with_rot).
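The two prompt variants differ only in whether the rule-of-thumb is included. A minimal sketch of such a prompt builder (the wording is illustrative, not the exact prompt used in the paper):

```python
from typing import Optional

def build_prompt(story: str, rule_of_thumb: Optional[str], with_rot: bool) -> str:
    """Build a ternary cultural-acceptability prompt (illustrative wording)."""
    prompt = ("Is the behavior in the following story socially acceptable "
              "in the given country? Answer Yes, No, or Neither.\n\n")
    if with_rot and rule_of_thumb:
        # The with_rot variant adds the relevant cultural rule-of-thumb.
        prompt += f"Rule of thumb: {rule_of_thumb}\n"
    prompt += f"Story: {story}\nAnswer:"
    return prompt

print(build_prompt("A guest brings a gift.", "Gift-giving is customary.", with_rot=True))
```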
Building on previous work showing that LLMs can evaluate their outputs and learn from their own feedback, we explore self-reflection for each LLM. (single_llm/self_reflection/)
python -u single_llm/self_reflection/{$LLM}.py \
--input_path $PATH_TO_INPUT_FILE \
--output_path $PATH_TO_OUTPUT_FILE

Arguments for the prompting code are as follows:
- $LLM: Name of the LLM (specific names can be found in the directory).
- --input_path: Path to the input data file (data/normad.jsonl).
- --output_path: Save path of the output file.
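At a high level, self-reflection alternates generation, critique, and revision. A minimal sketch with `llm` standing in for any prompt-to-completion callable (a hypothetical stub here; the reflection wording is illustrative):

```python
def self_reflect(llm, prompt: str, n_iters: int = 2) -> str:
    """Generate an answer, then iteratively critique and revise it.

    `llm` is any callable mapping a prompt string to a completion string.
    """
    answer = llm(prompt)
    for _ in range(n_iters):
        # Ask the model to critique its own previous answer.
        critique = llm(f"{prompt}\nYour previous answer: {answer}\n"
                       "Reflect on possible cultural mistakes in this answer.")
        # Incorporate the self-reflection into the next iteration.
        answer = llm(f"{prompt}\nPrevious answer: {answer}\nReflection: {critique}\n"
                     "Give a revised final answer (Yes, No, or Neither):")
    return answer

# Toy stub so the sketch runs end to end.
print(self_reflect(lambda p: "Neither", "Is this behavior acceptable?"))
```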
LLMs often exhibit varying knowledge coverage, with the potential to complement each other due to differences in training data distributions and alignment processes. We tap into this knowledge complementarity through a form of multi-LLM collaboration, debate, in which two LLM-based agents debate and collaboratively evaluate the given scenario.
python -u multi_llm/{$FIRST_LLM}_{$SECOND_LLM}.py \
--input_path $PATH_TO_INPUT_FILE \
--output_path $PATH_TO_OUTPUT_FILE

Arguments for the prompting code are as follows:
- $FIRST_LLM: Name of the first participant LLM (specific names can be found in the directory).
- $SECOND_LLM: Name of the second participant LLM (specific names can be found in the directory).
- --input_path: Path to the input data file (data/normad.jsonl).
- --output_path: Save path of the output file.
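The debate loop alternates turns between the two agents, then a judge issues the final ternary decision. A minimal sketch with `agent_a`, `agent_b`, and `judge` as hypothetical prompt-to-completion callables (prompt wording is illustrative):

```python
def debate(agent_a, agent_b, judge, scenario: str, n_rounds: int = 2) -> str:
    """Two agents exchange arguments over a scenario; a judge decides.

    All three callables map a prompt string to a completion string.
    """
    transcript = []
    for _ in range(n_rounds):
        for name, agent in (("A", agent_a), ("B", agent_b)):
            history = "\n".join(transcript)
            turn = agent(f"Scenario: {scenario}\nDebate so far:\n{history}\n"
                         "Give your argument and current answer (Yes/No/Neither):")
            transcript.append(f"Agent {name}: {turn}")
    # The judge reads the full transcript and issues the final decision.
    return judge(f"Scenario: {scenario}\n" + "\n".join(transcript) +
                 "\nFinal decision (Yes, No, or Neither):")

# Toy stubs so the sketch runs end to end.
print(debate(lambda p: "Yes", lambda p: "No", lambda p: "Neither",
             "A diner leaves a 20% tip."))
```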
- For evaluating single-LLM baselines, use evaluate/accuracy_single.py. Add the model names to test in the MODEL_NAMES variable and run: python evaluate/accuracy_single.py
- For evaluating multi-LLM baselines, use evaluate/accuracy_multi.py. Add the name of the first model as FIRST_MODEL and the name of the second model as SECOND_MODEL, then run: python evaluate/accuracy_multi.py
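Beyond overall accuracy, the paper reports cultural group parity across the Inglehart-Welzel bins. One plausible way to compute per-bin accuracy and a min/max parity ratio (the exact parity definition may differ from the one used in the evaluation scripts):

```python
from collections import defaultdict

def group_accuracies(examples):
    """Accuracy per cultural bin; `examples` holds (bin, gold, prediction) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for bin_name, gold, pred in examples:
        total[bin_name] += 1
        correct[bin_name] += int(gold == pred)
    return {b: correct[b] / total[b] for b in total}

def parity(accs):
    """Min/max accuracy ratio across bins: 1.0 means perfectly equitable."""
    return min(accs.values()) / max(accs.values())

# Hypothetical predictions over two cultural bins.
examples = [
    ("Confucian", "Yes", "Yes"), ("Confucian", "No", "No"),
    ("Latin America", "Yes", "No"), ("Latin America", "No", "No"),
]
accs = group_accuracies(examples)
print(accs, parity(accs))
```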
If you find our work useful in your research, please consider citing:
@inproceedings{ki-etal-2025-multiple,
title = "Multiple {LLM} Agents Debate for Equitable Cultural Alignment",
author = "Ki, Dayeon and
Rudinger, Rachel and
Zhou, Tianyi and
Carpuat, Marine",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1210/",
doi = "10.18653/v1/2025.acl-long.1210",
pages = "24841--24877",
ISBN = "979-8-89176-251-0",
}
For questions, issues, or collaborations, please reach out to dayeonki@umd.edu.


