This repository contains the code used for the Kaggle challenge "AI and Ethics". The aim of the competition is to assign the correct job category to a job description.
The data is representative of what can be found on the English-speaking part of the Internet, and therefore contains a certain amount of bias. One of the goals of the competition is to design a solution that is both accurate and fair.
You can find an exploratory analysis of the data in the notebook folder of this repository.
There are two ways to solve this problem:
- Classical NLP methods, which you can train either locally or on the Google Cloud Platform.
- Transformers (BERT), for which Google Colaboratory is recommended.
Run the following line in your terminal:

pip install -r requirements.txt

Then, from a Python interpreter, download the required NLTK resources:

python
import nltk
nltk.download('wordnet')
nltk.download('stopwords')
Download the data from the link.
There are multiple ways to run the code:
First, change the data path in the three scripts (cleaning.py, embedding.py, classification.py).
From the command line, go into the script folder and run the following lines:
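For illustration, each script presumably defines its input and output locations near the top, along these lines. The variable names below are hypothetical, not the ones actually used in the scripts — edit whatever path variables the scripts actually contain:

```python
# Hypothetical path variables -- adapt the names and values to your machine.
DATA_PATH = "/path/to/data"      # folder containing the downloaded competition files
OUTPUT_PATH = "/path/to/output"  # where cleaned data, embeddings and predictions go
```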
python cleaning.py
python embedding.py
python classification.py
From the code, you can modify the parameters, for example to change the embedding method.
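As one plausible configuration, a TF-IDF embedding feeding a linear classifier can be built and swapped out as sketched below; the parameter names and values here are illustrative, not the ones the scripts actually expose:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_pipeline(embedding="tfidf"):
    """Text-classification pipeline; change `embedding` to swap the vectoriser."""
    if embedding == "tfidf":
        vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
    elif embedding == "counts":
        vectorizer = CountVectorizer(max_features=20000)
    else:
        raise ValueError(f"unknown embedding: {embedding}")
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))

# Usage: fit on cleaned descriptions and their job-category labels (toy data here).
clf = build_pipeline("tfidf")
clf.fit(
    ["nurse hospital patient", "software engineer code", "teacher classroom student"],
    ["nurse", "engineer", "teacher"],
)
```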
In the instance.py file, modify the parameters for your virtual machine.
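For illustration, the virtual-machine parameters in instance.py are presumably constants along these lines. The names and values below are hypothetical, not taken from the actual file:

```python
# Hypothetical VM settings -- adapt to the names actually used in instance.py.
PROJECT_ID = "my-gcp-project"   # your Google Cloud project
ZONE = "europe-west1-b"         # zone where the VM is created
MACHINE_TYPE = "n1-standard-4"  # CPU/RAM profile of the VM
```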
Change the path in the four scripts (project.py, cleaning.py, embedding.py, classification.py).
Move the requirements.txt file into the script folder.
From the command line, run the following line:
python main.py
With this architecture, the trained models are saved in the model folder and the submission files in the result folder.
Upload bert.ipynb from the notebook folder to Google Colab.
Select a GPU runtime.
Install the packages and import the data.
Run the cells and download the submission file.
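The notebook itself is not reproduced here, but its fine-tuning setup is presumably close to the standard Hugging Face recipe sketched below. The model name, sequence length, learning rate, and label count are assumptions, not values taken from bert.ipynb:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

def build_model(num_labels, model_name="bert-base-uncased"):
    """Pretrained BERT encoder with a freshly initialised classification head."""
    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    return tokenizer, model

def train_step(model, optimizer, batch):
    """One gradient step; `batch` is the tokenizer output plus a `labels` tensor."""
    model.train()
    loss = model(**batch).loss  # cross-entropy computed internally from `labels`
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

if __name__ == "__main__":
    # Downloads the pretrained weights -- run on the Colab GPU runtime.
    tokenizer, model = build_model(num_labels=10)  # label count is an assumption
    batch = tokenizer(["an example job description"], return_tensors="pt",
                      truncation=True, padding=True, max_length=128)
    batch["labels"] = torch.tensor([0])
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    print(train_step(model, optimizer, batch))
```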