Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction
In proceedings of the 2025 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC)
This paper builds upon our previous work on Juro, an AI-powered chatbot designed to improve legal information access in the Democratic Republic of Congo (DRC), by addressing the specific challenge of automated citation detection in unstructured legal texts. We propose an end-to-end approach that combines Large Language Model (LLM)-based annotation and Named Entity Recognition (NER) for extracting key entities critical to constructing a legal knowledge graph. A total of 8,400 Congolese legal document titles were collected and annotated using the GPT-4o-mini model, followed by training in spaCy under two distinct configurations, one emphasizing accuracy and the other efficiency. We evaluated the system using both a split dataset and a human-annotated benchmark, demonstrating strong performance in identifying document types, reference numbers, and publication dates. An initial mapping algorithm connected documents based on annotated entities, revealing a preliminary citation graph of over 1,400 relationships. While the current methodology shows promise in automating entity extraction and preliminary graph construction, future developments will explore deeper relationship modeling, improved type coverage, and integration into the Juro framework to provide enhanced legal support.
@INPROCEEDINGS{11299672,
author={Ngandu, Bernard and Mateus, Jovita and Mbale, Landry and Bagula, Antoine},
booktitle={2025 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC)},
title={Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction},
year={2025},
volume={},
number={},
pages={994-999},
keywords={Training;Accuracy;Law;Annotations;Large language models;Knowledge graphs;Named entity recognition;Manuals;Market research;Data mining;Automated Entity Extraction;Citation Detection;Congolese Legal Texts;GPT-4o-mini;Knowledge Graph;LLM Annotation;Legal AI;NER;spaCy;Unstructured Data},
doi={10.1109/ETNCC66224.2025.11299672}
}git clone https://github.com/bernard-ng/drc-legal-ner.git
cd drc-legal-ner
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
docker compose up- Annotation
Will generate a dataset of Congolese legal texts and annotate it using OpenAI's GPT-4o-mini you can do it synchronously or asynchronously (with batch API).
python -m processing.batch.requests --build
python -m processing.batch.requests --upload
python -m processing.batch.requests --create
python -m processing.batch.response # 24h later
python -m process.annotate --method=async
python -m processing.format --label-studio # for Human feedback and validation
python -m processing.format --spacy-binary # Spacy compatible format for training- Tasks
make train_efficiency # Train the model with efficiency
make train_accuracy # Train the model with accuracy
make evaluate # Evaluate the model
make benchmark # Benchmark the model
make visualize # Visualize NER
make clean # Clean the model and results