UnheardChunk/PCL-Detection

NLP Classifier for Patronizing and Condescending Language Detection

Contributors

  • Gabriel de Olim Gaul
  • Siya Gowda
  • Ryan Patel

Overview

This repository contains the code for an NLP coursework submission at Imperial College London. It implements a classifier that detects patronizing or condescending language in text, trained on the Don't Patronize Me! dataset. All experiments and hyperparameter tuning can be found in the models/ directory.

Model

The best-performing model in this project is a fine-tuned DeBERTa model, incorporating:

  • Synonym replacement for data augmentation.
  • Class-weighted sampling to handle class imbalance.
  • Preprocessing (punctuation removal and lemmatization).
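As an illustration of the class-weighted sampling idea, here is a minimal standard-library sketch. It is not the repository's implementation: the function names `class_balanced_weights` and `weighted_sample` and the toy data are hypothetical, and inverse class frequency is assumed as the weighting scheme. Each example gets a weight of one over the size of its class, so every class is drawn with roughly equal probability regardless of how rare it is.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Return one sampling weight per example: 1 / (size of that example's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample(examples, labels, k, seed=0):
    """Draw k (example, label) pairs with replacement using the class-balanced weights."""
    rng = random.Random(seed)
    weights = class_balanced_weights(labels)
    indices = rng.choices(range(len(examples)), weights=weights, k=k)
    return [(examples[i], labels[i]) for i in indices]


# Hypothetical toy data: 9 negative examples and 1 positive (patronizing) example.
texts = [f"text {i}" for i in range(10)]
labels = [0] * 9 + [1]

batch = weighted_sample(texts, labels, k=1000)
positive_share = sum(y for _, y in batch) / len(batch)
print(f"share of positive examples in the sampled batch: {positive_share:.2f}")
```

With these weights the positive class makes up about half of the sampled batch even though it is only a tenth of the data. In a PyTorch training loop the same per-example weights could be passed to `torch.utils.data.WeightedRandomSampler`; whether this project does so is not stated here.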

Repository Structure

The repository is organized as follows:

📂 analysis       # Code for analyzing the dataset and the final trained model
📂 dataset        # Scripts for reading and splitting the dataset into training and validation sets
📂 models         # Implementation of different models and experiments
    ├── Baseline models: BoW and TF-IDF with logistic regression
    ├── DeBERTa fine-tuning with hyperparameter tuning
    └── Data augmentation, sampling, and preprocessing techniques
📄 dev.txt        # Final predictions for the development dataset
📄 test.txt       # Final predictions for the test dataset
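The dataset scripts split the data into training and validation sets. How that split is performed is not described here, so the following is only a sketch under the assumption that a stratified split is wanted, which is a common choice for imbalanced data. The function name `stratified_split`, the 20% validation fraction, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices), splitting each class separately
    so that class proportions are approximately preserved in both sets."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one validation example per class.
        n_val = max(1, round(len(indices) * val_fraction))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Hypothetical toy labels: 90 negative and 10 positive examples.
labels = [0] * 90 + [1] * 10
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` with its `stratify` argument provides the same behaviour without custom code.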

Installation and Requirements

To run the code in this repository, create and activate a virtual environment, then install the required dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m spacy download en_core_web_sm

The model was trained on the GPU lab machines at Imperial College London.

Acknowledgments

  • The dataset used in this project: Don't Patronize Me! (Dataset Link)
  • DeBERTa model from Microsoft for state-of-the-art NLP performance. (DeBERTa Paper)

This repository is part of a coursework submission for the NLP course at Imperial College London. Any unauthorized use, reproduction, or submission of this code as original work may constitute academic misconduct or plagiarism.
