Encoders

Testing Different Encoding Schemes for Categorical Variables

Most machine learning models cannot consume categorical variables directly, so these variables must first be encoded as numbers. How they are encoded affects how much the model can get out of the data, and there are many encoding techniques to choose from. This PR investigates various encoders on different types of data so that, when you need to encode something, you can go through the notebooks and pick from the top 3-5 encoders instead of exploring every technique. This PR adds the following notebooks:
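As a rough illustration of what "encoding" means here, the sketch below implements two common schemes, ordinal and one-hot encoding, in plain Python. It is not taken from the notebooks; the function names and the sample `colors` column are hypothetical and exist only for this example.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category (sorted by name)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Hypothetical categorical column
colors = ["red", "green", "blue", "green"]

codes, mapping = ordinal_encode(colors)
onehot, categories = one_hot_encode(colors)

print(codes)       # one integer per row
print(categories)  # column order of the one-hot matrix
print(onehot)      # one row of indicators per input value
```

Ordinal encoding keeps a single column but imposes an arbitrary order on the categories, while one-hot encoding avoids that order at the cost of one column per category, which matters for high-cardinality variables.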

- Encoders - gives a general overview and implementation of the encoders investigated
- Extracting data - used to create datasets with the metrics for each case
- Analysis - analyzes the results produced in the previous step

All the encoders are tested with a linear model (Linear Regression) and a tree-based model (Random Forest), and are analysed using the RMSE and MAE metrics.
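For reference, the two metrics can be written out in a few lines of plain Python. This is a minimal sketch of the standard definitions, not the code used in the notebooks; the function names and the sample values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical targets and predictions
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
print(rmse_value, mae_value)
```

RMSE penalizes large individual errors more heavily than MAE, so looking at both gives a fuller picture of how an encoder affects the model.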

We also used 3 datasets that differ in dimensionality and contain categorical variables of different cardinality.

Note: The purpose here is just to investigate encoders, so no effort has been put into improving the models themselves. We are more interested in the percentage change of the metric we want to optimize than in optimizing the model.
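One way to read "percentage change of the metric" is as the relative change against some reference run. The sketch below assumes that reading; the choice of baseline, the function name, and the numbers are all hypothetical and are not taken from the notebooks.

```python
def percent_change(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical RMSE values: a reference encoder vs. a candidate encoder
baseline_rmse = 4.0
candidate_rmse = 3.0

change = percent_change(baseline_rmse, candidate_rmse)
print(f"{change:+.1f}%")  # negative means the error went down
```

For error metrics such as RMSE and MAE, a negative percentage change indicates an improvement over the baseline.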

Instructions - Install the requirements using: pip install -r requirements.txt


ankurrajdev/Encoders
