Skip to content

zineb198/DefiIA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DefiIA

This repository is the code used during the Kaggle Challenge : IA and Ethics. The aim of this competition is to assign the correct job category to a job description. The data is therefore representative of what can be found on the English speaking part of the Internet, and thus contains a certain amount of bias. One of the goals of this competition is to design a solution that is both accurate as well as fair.
You can find an exploratory analysis of the data in the notebook folder of this git.

Tutorial

We have two ways to solve this problem :

  • Classical NLP methods.
  • Transformers : Bert. For the first one, you can either train locally or on the google cloud computing platform. For the second one, it is recommended to use google colaboratory.

1. Classical NLP methods

Installing environement dependencies

Run the following lines in your terminal :

pip install -r requirements.txt
python
import nltk
nltk.download('wordnet')
nltk.download('stopwords')

Load data

Download from the link the data.

There are multiple ways to run the code :

a. Training locally

First change the path in the three scripts (cleaning.py, embedding.py, classification.py).
From the command line, go into the script folder and run the following lines :

python cleaning.py
python embedding.py
python classification.py
From the code you can modify the parameters to change the embedding method for example.

b. Training on google cloud platform

In the instance.py file, modify the parameters for your virtual machine.
Change the path in the four scripts (project.py, cleaning.py, embedding.py, classification.py).
Move the requirements.txt file in the script folder. From the command line run the following line :

python main.py

In this architecture, you will find the models in the model folder and the submission files in the result folder.

2. Transformers : Bert

Upload the bert.ipynb in the notebook folder to colab.
Choose the GPU.
Install the packages and import the data.
Run the cells and download the submission file.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors