lavanderhoney/shell_ai_fuel_blend_prediction

Shell.ai Fuel Blend Properties Prediction

Leaderboard Position: 53rd Place

Project Overview

This repository contains my solution to the Shell.ai Fuel Blend Properties Prediction Challenge, hosted on HackerEarth.
The objective was to develop a machine learning model to accurately predict 10 different properties of a fuel blend based on the fractions and chemical characteristics of its 5 constituent components.

My final solution, a stacked ensemble of CatBoostRegressor, TabPFN, and RandomForest, achieved 53rd place on the final leaderboard.

During the challenge, the code ran in Kaggle notebooks; here it is organized into separate notebooks.

Methodology

The project followed a structured approach, from data exploration to advanced model ensembling. The process is documented in a series of Jupyter notebooks:

  1. EDA and Preprocessing: Initial analysis revealed inconsistencies in component fractions, which were normalized to sum to 1. Correlation studies were conducted to understand baseline relationships.
  2. Feature Engineering: The most critical feature engineering step was the creation of "Weighted Property" features. These were calculated as the weighted average of each chemical property across the five components, using their respective fractions as weights. This significantly improved model performance.
  3. Modeling Experiments: A wide range of models was evaluated:
    • AutoML: Used FLAML for an initial broad search.
    • Neural Networks: Developed custom PyTorch models, including an automated Neural Architecture Search (NAS) using Optuna.
    • Tree-Based Models: Performed systematic hyperparameter optimization for CatBoost and cuML RandomForest using Optuna.
  4. Final Stacked Ensemble: The final model is a 2-level stack:
    • Level 0 Models: TabPFN, CatBoost, and cuML RandomForest Regressors.
    • Level 1 Meta-Model: A MultiTaskElasticNetCV regressor trained on the out-of-fold predictions from the base models. This ensemble achieved a final cross-validated Mean Absolute Percentage Error (MAPE) of 0.25 on the validation set.
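
The fraction normalization from step 1 and the "Weighted Property" features from step 2 can be sketched roughly as follows. This is a minimal illustration, not the notebook code: the array layout, the function names `normalize_fractions` and `weighted_properties`, and the toy data are all assumptions made for the example.

```python
import numpy as np


def normalize_fractions(fractions: np.ndarray) -> np.ndarray:
    """Rescale each row of component fractions so that it sums to 1."""
    totals = fractions.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("each blend needs a positive total fraction")
    return fractions / totals


def weighted_properties(fractions: np.ndarray, component_props: np.ndarray) -> np.ndarray:
    """Fraction-weighted average of every property across the components.

    fractions:       shape (n_blends, n_components)
    component_props: shape (n_blends, n_components, n_properties)
    returns:         shape (n_blends, n_properties)
    """
    weights = normalize_fractions(fractions)
    # Sum over the component axis: out[b, p] = sum_c weights[b, c] * props[b, c, p]
    return np.einsum("bc,bcp->bp", weights, component_props)


# Toy data (hypothetical): 2 blends, 5 components, 3 properties per component.
fractions = np.array([
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.5, 0.25, 0.25, 0.0, 0.5],  # inconsistent row: sums to 1.5 before normalization
])
component_props = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)

features = weighted_properties(fractions, component_props)
```

In a DataFrame workflow, the resulting columns would typically be appended to the original inputs so the models see both the raw values and the blended averages.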
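
The two-level stack in step 4 relies on out-of-fold (OOF) predictions, so the meta-model is only trained on predictions for rows the base models did not see during fitting. The sketch below shows that pattern with scikit-learn. It is an assumption-laden illustration: `RandomForestRegressor` and `Ridge` stand in for the TabPFN, CatBoost, and cuML models, the data is synthetic, and the fold counts are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskElasticNetCV, Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in data: 120 blends, 6 features, 3 targets (the challenge had 10).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = X @ rng.normal(size=(6, 3)) + 0.1 * rng.normal(size=(120, 3))

# Level 0: stand-in base models (assumption, not the models used in the repository).
base_models = [
    RandomForestRegressor(n_estimators=50, random_state=0),
    Ridge(alpha=1.0),
]

# OOF predictions: every row is predicted by a model fitted without that row.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
oof_blocks = [cross_val_predict(model, X, y, cv=cv) for model in base_models]
meta_features = np.hstack(oof_blocks)  # shape (120, n_models * n_targets)

# Level 1: multi-output elastic net trained on the stacked OOF predictions.
meta_model = MultiTaskElasticNetCV(cv=5, random_state=0)
meta_model.fit(meta_features, y)

# At inference time, refit the base models on all data and stack their predictions.
fitted = [model.fit(X, y) for model in base_models]
X_new = rng.normal(size=(4, 6))
new_meta = np.hstack([model.predict(X_new) for model in fitted])
stacked_pred = meta_model.predict(new_meta)
```

Training the meta-model on OOF predictions rather than in-sample predictions keeps it from simply rewarding whichever base model overfits the training data most.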

Repository Structure

  • /notebooks: Contains the narrative Jupyter notebooks detailing the project workflow.
  • /data: Contains the .csv files of the dataset provided.
  • requirements.txt: Lists all necessary Python packages to reproduce the environment.

How to Run

  1. Clone the repository: git clone https://github.com/lavanderhoney/shell_ai_fuel_blend_prediction.git
  2. Install the required packages: pip install -r requirements.txt
  3. Run the notebooks in numerical order in the /notebooks directory.

Technologies Used

  • Python
  • Pandas & NumPy for data manipulation
  • Scikit-learn for preprocessing and modeling
  • CatBoost, XGBoost, cuML for gradient boosting and random forests
  • TabPFN for transformer-based tabular modeling
  • PyTorch for neural networks
  • Optuna & FLAML for hyperparameter tuning and AutoML
  • Matplotlib & Seaborn for visualization