- Gabriel de Olim Gaul
- Siya Gowda
- Ryan Patel
This repository contains the code for an NLP coursework submission at Imperial College London: a Natural Language Processing (NLP) classifier that detects patronizing or condescending language in text, trained on the Don't Patronize Me! dataset. All experiments and hyperparameter tuning can be found in the models/ directory.
The best-performing model in this project is a finetuned DeBERTa model, incorporating:
- Synonym replacement for data augmentation.
- Class-weighted sampling to handle data imbalance.
- Preprocessing (punctuation removal and lemmatization); the last two techniques are sketched below.
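The snippet below is a minimal, illustrative sketch of how the preprocessing and class-weighted sampling steps might look, assuming spaCy's en_core_web_sm pipeline for lemmatization and PyTorch's WeightedRandomSampler. It is not the exact training code, and the function names are hypothetical.

```python
import string

import spacy
import torch
from torch.utils.data import WeightedRandomSampler

# Assumes the en_core_web_sm pipeline installed in the setup step below.
nlp = spacy.load("en_core_web_sm")


def preprocess(text):
    """Strip punctuation, then lemmatize each token with spaCy."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(tok.lemma_ for tok in nlp(text))


def make_weighted_sampler(labels):
    """Sample each example with probability inversely proportional to its class frequency."""
    labels = torch.tensor(labels)
    counts = torch.bincount(labels)
    weights = 1.0 / counts[labels].float()
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)


if __name__ == "__main__":
    print(preprocess("These poor souls truly need our help!"))
    sampler = make_weighted_sampler([0, 0, 0, 0, 1])  # heavily imbalanced toy labels
```

Sampling with weights inversely proportional to class frequency makes the rare positive (patronizing) examples appear roughly as often as negative ones during training.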
The repository is organized as follows:
📂 analysis # Code for analyzing the dataset and the final trained model
📂 dataset # Scripts for reading and splitting the dataset into training and validation sets
📂 models # Implementation of different models and experiments
├── Baseline models: BoW and TF-IDF with logistic regression (see the sketch after this tree)
├── DeBERTa finetuning with hyperparameter tuning
├── Data augmentation, sampling, and preprocessing techniques
📄 dev.txt # Final predictions for the development dataset
📄 test.txt # Final predictions for the test dataset
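For orientation, here is a minimal sketch of the kind of TF-IDF plus logistic regression baseline implemented under models/, assuming scikit-learn. The toy data and hyperparameters are illustrative, not the ones used in the coursework.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the Don't Patronize Me! paragraphs and binary labels.
texts = ["these poor souls need our help", "the report summarises local housing policy"]
labels = [1, 0]

# class_weight="balanced" mirrors the class-imbalance handling used elsewhere in the repo.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
baseline.fit(texts, labels)
print(baseline.predict(["such a heartwarming story about the needy"]))
```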
To run the code in this repository, install the required dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m spacy download en_core_web_sm
The models were trained on the GPU lab machines at Imperial College London.
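As a quick sanity check that a GPU is visible before finetuning, the sketch below loads a DeBERTa sequence classifier with Hugging Face transformers and runs a toy forward pass. The checkpoint name (microsoft/deberta-base) and binary label count are assumptions, not necessarily those of the final model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint choice; the coursework may have used a different DeBERTa variant.
checkpoint = "microsoft/deberta-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2).to(device)

# Toy forward pass to confirm the GPU setup works end to end.
batch = tokenizer(["a condescending example sentence"], return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))
```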
- The dataset used in this project: Don't Patronize Me! (Dataset Link)
- DeBERTa model from Microsoft for state-of-the-art NLP performance. (DeBERTa Paper)
This repository is part of a coursework submission for the NLP course at Imperial College London. Any unauthorized use, reproduction, or submission of this code as original work may constitute academic misconduct or plagiarism.