Hate Speech Detector AI Module

A machine-learning hate speech detection system with multiple model architectures and evaluation tools.

Features

Multiple Model Types: Random Forest, Logistic Regression, and SVM classifiers
Enhanced Dataset Generation: Creates synthetic and contextual data for better training
Professional Text Preprocessing: Advanced NLP techniques for text cleaning
Comprehensive Evaluation: Detailed performance analysis and visualization
Easy-to-Use Interface: Command-line tools for training and prediction
Production Ready: Save/load models for deployment

Quick Start

1. Install Dependencies

pip install pandas numpy scikit-learn matplotlib seaborn nltk textblob

2. Train the Model

python simple_model.py

3. Make Predictions

# Interactive mode
python simple_predict.py --interactive

# Single text prediction
python simple_predict.py --text "Your text here"

# Quick demo with examples
python quick_demo.py

File Structure

Python/HateSpeech/
├── requirements.txt                    # Dependencies
├── simple_model.py                     # Scikit-learn based models
├── simple_predict.py                   # Prediction interface
├── quick_demo.py                       # Demo script
├── data_generator.py                   # Dataset generation (TensorFlow version)
├── preprocessor.py                     # Text preprocessing utilities
├── model.py                           # TensorFlow models (advanced)
├── train.py                           # TensorFlow training pipeline
├── predict.py                         # TensorFlow prediction interface
├── evaluate.py                        # Model evaluation and analysis
├── README.md                          # This file
├── dataset_tweet.csv                  # Original dataset
├── simple_hate_speech_detector.pkl    # Trained scikit-learn model
└── enhanced_dataset.csv               # Enhanced dataset (generated)

Model Performance

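The current scikit-learn model achieves the following on test data:

Overall Accuracy: 89.08%
Hate Speech Detection: 50% precision, 13% recall
Offensive Language Detection: 91% precision, 96% recall
Neutral Text Detection: 81% precision, 85% recall

Hate speech recall is only 13%, so the model misses most hate speech examples; overall accuracy is carried by the other two classes.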
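
High overall accuracy alongside low recall on one class is easier to see with a small worked example. The sketch below uses made-up labels (not this project's data) and a hypothetical helper, per_class_metrics, to show how a classifier that rarely predicts the smallest class can still score well on accuracy.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return accuracy and {label: (precision, recall)} for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # how often each label really occurs
    pred_counts = Counter(y_pred)   # how often each label was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    metrics = {}
    for label in labels:
        precision = correct[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = correct[label] / true_counts[label] if true_counts[label] else 0.0
        metrics[label] = (precision, recall)
    accuracy = sum(correct.values()) / len(y_true)
    return accuracy, metrics

# Illustrative, made-up labels: 10 hate, 70 offensive, 20 neither.
y_true = ["hate"] * 10 + ["offensive"] * 70 + ["neither"] * 20
# Only 1 of the 10 hate examples is found; the other 9 are labeled offensive.
y_pred = ["hate"] * 1 + ["offensive"] * 9 + ["offensive"] * 70 + ["neither"] * 20

accuracy, metrics = per_class_metrics(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In practice, scikit-learn's classification_report produces the same per-class quantities directly from the true and predicted labels.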

Usage Examples

Training Different Models

from simple_model import SimpleHateSpeechDetector

# Random Forest (default)
detector = SimpleHateSpeechDetector(model_type='random_forest')

# Logistic Regression
detector = SimpleHateSpeechDetector(model_type='logistic_regression')

# Support Vector Machine
detector = SimpleHateSpeechDetector(model_type='svm')

Making Predictions

from simple_model import SimpleHateSpeechDetector

# Load trained model
detector = SimpleHateSpeechDetector()
detector.load_model('simple_hate_speech_detector.pkl')

# Single prediction
result = detector.predict_single("Your text here")
print(f"Class: {result['predicted_class']}")
print(f"Confidence: {result['confidence']:.4f}")

# Batch prediction
texts = ["Text 1", "Text 2", "Text 3"]
predictions, probabilities = detector.predict(texts)
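
The batch call above returns probabilities alongside the predictions. A minimal sketch of one way to use them follows: pick the most probable class and flag low-confidence results for manual review. The class order in CLASS_NAMES, the helper label_with_threshold, and the 0.6 threshold are all assumptions for illustration, not part of simple_model.py.

```python
# Assumed class order; check the trained model's own label mapping before reuse.
CLASS_NAMES = ["Hate Speech", "Offensive Language", "Neither"]

def label_with_threshold(probabilities, threshold=0.6):
    """Pick the most probable class and flag it if confidence is below threshold."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    return {
        "predicted_class": CLASS_NAMES[best],
        "confidence": confidence,
        "needs_review": confidence < threshold,
    }

# Hypothetical probability vector for one text.
result = label_with_threshold([0.20, 0.295, 0.505])
print(result["predicted_class"], result["confidence"], result["needs_review"])
```

A threshold like this trades coverage for reliability: fewer texts are labeled automatically, but the ones that are carry higher confidence.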

Classification Categories

The system classifies text into three categories:

Hate Speech: Content that promotes violence or discrimination against groups
Offensive Language: Profanity, insults, or inappropriate content
Neither: Normal, non-offensive content

Example Predictions

Text	Prediction	Confidence
"I love this beautiful day!"	Neither	50.5%
"Fuck you, you piece of shit"	Offensive Language	90.0%
"Kill all the Jews"	Hate Speech	67.1%
"The weather is nice today"	Neither	69.9%

Advanced Features (TensorFlow Version)

For more advanced models using TensorFlow:

Install TensorFlow:

pip install tensorflow

Use the advanced training pipeline:

python train.py

Advanced prediction:

python predict.py --interactive

Model Architectures Available

Scikit-learn Models (Current)

Random Forest: Ensemble method with good generalization
Logistic Regression: Linear model with interpretable results
Support Vector Machine: Effective for high-dimensional data
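
How simple_model.py wires these classifiers together is not shown here. As a rough sketch, assuming TF-IDF features feed whichever classifier is selected, a scikit-learn pipeline could look like this. The function build_pipeline, its hyperparameters, and the toy training texts are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def build_pipeline(model_type="random_forest"):
    """Build a TF-IDF + classifier pipeline (illustrative settings only)."""
    classifiers = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(kernel="linear"),
    }
    if model_type not in classifiers:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return Pipeline([
        # Word unigrams and bigrams, lowercased, weighted by TF-IDF.
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("clf", classifiers[model_type]),
    ])

# Toy data, only to show the fit/predict flow.
texts = ["what a lovely morning", "have a great day",
         "you are an idiot", "shut up you fool"]
labels = ["neither", "neither", "offensive", "offensive"]

model = build_pipeline("logistic_regression").fit(texts, labels)
prediction = model.predict(["have a lovely day"])[0]
print(prediction)
```

Keeping the vectorizer and classifier in one pipeline means a single object can be saved and loaded, and the same text features are applied at training and prediction time.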

TensorFlow Models (Advanced)

LSTM: Bidirectional LSTM for sequential text understanding
CNN: Convolutional layers for local pattern detection
Transformer: Multi-head attention for state-of-the-art performance

Data Enhancement

The system includes data generation capabilities:

Synthetic Data: Generated using hate speech, offensive, and neutral phrase templates
Contextual Data: Context-aware hate speech patterns with demographic groups
Original Data: Twitter dataset with manual annotations

Performance Optimization

Use GPU acceleration for TensorFlow models
Adjust batch size based on available memory
Use early stopping to prevent overfitting
Experiment with different model architectures

Troubleshooting

Common Issues

Memory Issues: Reduce vocabulary size or batch size
Slow Training: Use smaller models or fewer epochs
Poor Performance: Increase dataset size or try different model architecture
NLTK Errors: Ensure the required NLTK data packages are downloaded, for example via nltk.download()

Installation Issues

If you encounter issues with TensorFlow installation:

Use the scikit-learn version (simple_model.py)
Install TensorFlow CPU version: pip install tensorflow-cpu
Use conda for better dependency management

Contributing

To enhance the system:

Add new model architectures in simple_model.py
Implement additional preprocessing techniques in preprocessor.py
Create new dataset generators in data_generator.py
Add evaluation metrics in evaluate.py
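
As an illustration of the kind of preprocessing step that could be added, here is a small tweet normalizer that uses only the standard library. The function name normalize_tweet and the specific rules are assumptions; preprocessor.py may already handle some of them differently.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")

def normalize_tweet(text):
    """Lowercase, drop URLs and @mentions, keep hashtag words, collapse spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()

cleaned = normalize_tweet("RT @user: Check THIS out!! https://example.com #Wow")
print(cleaned)
```

Whatever rules are chosen, the same function should run at both training and prediction time so the model sees consistently cleaned text.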

License

This project is for educational and research purposes. Please ensure compliance with data usage and privacy regulations when deploying in production environments.

Current Status

✅ Working Implementation: Scikit-learn based hate speech detector
✅ Trained Model: 89% accuracy on test data
✅ Interactive Interface: Command-line prediction tool
✅ Demo Script: Quick testing with example texts
🔄 Advanced Models: TensorFlow implementation available (requires TensorFlow installation)
