Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction

In proceedings of the 2025 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC)

This paper builds upon our previous work on Juro, an AI-powered chatbot designed to improve legal information access in the Democratic Republic of Congo (DRC), by addressing the specific challenge of automated citation detection in unstructured legal texts. We propose an end-to-end approach that combines Large Language Model (LLM)-based annotation and Named Entity Recognition (NER) for extracting key entities critical to constructing a legal knowledge graph. A total of 8,400 Congolese legal document titles were collected and annotated using the GPT-4o-mini model, followed by training in spaCy under two distinct configurations, one emphasizing accuracy and the other efficiency. We evaluated the system using both a split dataset and a human-annotated benchmark, demonstrating strong performance in identifying document types, reference numbers, and publication dates. An initial mapping algorithm connected documents based on annotated entities, revealing a preliminary citation graph of over 1,400 relationships. While the current methodology shows promise in automating entity extraction and preliminary graph construction, future developments will explore deeper relationship modeling, improved type coverage, and integration into the Juro framework to provide enhanced legal support.

How to cite this work

@INPROCEEDINGS{11299672,
  author={Ngandu, Bernard and Mateus, Jovita and Mbale, Landry and Bagula, Antoine},
  booktitle={2025 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC)}, 
  title={Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction}, 
  year={2025},
  volume={},
  number={},
  pages={994-999},
  keywords={Training;Accuracy;Law;Annotations;Large language models;Knowledge graphs;Named entity recognition;Manuals;Market research;Data mining;Automated Entity Extraction;Citation Detection;Congolese Legal Texts;GPT-4o-mini;Knowledge Graph;LLM Annotation;Legal AI;NER;spaCy;Unstructured Data},
  doi={10.1109/ETNCC66224.2025.11299672}
}

Usage

git clone https://github.com/bernard-ng/drc-legal-ner.git
cd drc-legal-ner

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
docker compose up

Annotation

Will generate a dataset of Congolese legal texts and annotate it using OpenAI's GPT-4o-mini you can do it synchronously or asynchronously (with batch API).

python -m processing.batch.requests --build
python -m processing.batch.requests --upload
python -m processing.batch.requests --create
python -m processing.batch.response  # 24h later

python -m process.annotate --method=async

python -m processing.format --label-studio  # for Human feedback and validation
python -m processing.format --spacy-binary  # Spacy compatible format for training

Tasks

make train_efficiency   # Train the model with efficiency
make train_accuracy     # Train the model with accuracy
make evaluate           # Evaluate the model
make benchmark          # Benchmark the model
make visualize          # Visualize NER
make clean              # Clean the model and results

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
dataset		dataset
misc		misc
processing		processing
results		results
visualization		visualization
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
app.py		app.py
compose.yaml		compose.yaml
config_accuracy.cfg		config_accuracy.cfg
config_efficiency.cfg		config_efficiency.cfg
graph.py		graph.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction

How to cite this work

Usage

About

Uh oh!

Uh oh!

Languages

bernard-ng/drc-legal-ner

Folders and files

Latest commit

History

Repository files navigation

Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction

How to cite this work

Usage

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages