Shell.ai Fuel Blend Properties Prediction

Leaderboard Position: 53rd Place

Project Overview

This repository contains my solution to Shell.ai's Fuel Blend Properties Prediction Challenge, hosted on HackerEarth.
The objective was to develop a machine learning model to accurately predict 10 different properties of a fuel blend based on the fractions and chemical characteristics of its 5 constituent components.

My final solution, a stacked ensemble of CatBoostRegressor, TabPFN, and cuML RandomForest regressors, finished in 53rd place on the final leaderboard.

During the challenge, the code ran in Kaggle notebooks; in this repository it has been reorganized into separate notebooks.

Methodology

The project followed a structured approach, from data exploration to advanced model ensembling. The process is documented in a series of Jupyter notebooks:

EDA and Preprocessing: Initial analysis showed that the component fractions were inconsistent (they did not always sum to 1), so they were normalized row-wise to sum to 1. Correlation analysis was then used to examine baseline relationships in the data.
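The normalization step can be sketched with pandas as follows. This is a minimal illustration, not the notebook code: the column names `Component1_fraction` … `Component5_fraction` and the helper `normalize_fractions` are hypothetical, and rows whose fractions are all zero are deliberately left unchanged rather than turned into NaN.

```python
import pandas as pd

# Hypothetical column names; adjust to the dataset's actual headers.
FRACTION_COLS = [f"Component{i}_fraction" for i in range(1, 6)]


def normalize_fractions(df: pd.DataFrame, cols=FRACTION_COLS) -> pd.DataFrame:
    """Return a copy of df whose fraction columns sum to 1 in every row."""
    out = df.copy()
    totals = out[cols].sum(axis=1)
    # Rows whose fractions are all zero are divided by 1 (left as zeros) to avoid NaN
    out[cols] = out[cols].div(totals.where(totals != 0, 1.0), axis=0)
    return out
```

For example, a row of five fractions that each equal 0.22 (total 1.1) becomes five fractions of 0.2.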
Feature Engineering: The most important feature engineering step was the creation of "Weighted Property" features: for each chemical property, the average of that property across the five components, weighted by their respective fractions. These features significantly improved model performance.
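The weighted-property features can be sketched as a fraction-weighted average per property. This is an illustrative sketch, not the original notebook: the column patterns `Component{i}_fraction` / `Component{i}_Property{j}`, the output names `Weighted_Property{j}`, and the default of 10 properties per component are assumptions. Dividing by the row's fraction total keeps the result a true weighted average even if the fractions were not normalized first.

```python
import numpy as np
import pandas as pd


def add_weighted_properties(df: pd.DataFrame, n_components: int = 5, n_properties: int = 10) -> pd.DataFrame:
    """Append one fraction-weighted average column per chemical property.

    Assumes hypothetical column names Component{i}_fraction and Component{i}_Property{j}.
    """
    out = df.copy()
    frac_cols = [f"Component{i}_fraction" for i in range(1, n_components + 1)]
    weights = out[frac_cols].to_numpy(dtype=float)  # shape (n_rows, n_components)
    totals = weights.sum(axis=1)
    # Guard against all-zero rows; otherwise divide by the total so weights sum to 1
    totals = np.where(totals == 0, 1.0, totals)
    for j in range(1, n_properties + 1):
        prop_cols = [f"Component{i}_Property{j}" for i in range(1, n_components + 1)]
        values = out[prop_cols].to_numpy(dtype=float)  # shape (n_rows, n_components)
        out[f"Weighted_Property{j}"] = (weights * values).sum(axis=1) / totals
    return out
```

With two components at fractions 0.25 and 0.75 and property values 4.0 and 8.0, the weighted feature is 0.25 × 4.0 + 0.75 × 8.0 = 7.0.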
Modeling Experiments: A wide range of models were evaluated:
- AutoML: Used FLAML for an initial broad search.
- Neural Networks: Developed custom PyTorch models, including an automated Neural Architecture Search (NAS) using Optuna.
- Tree-Based Models: Performed systematic hyperparameter optimization for CatBoost and cuML RandomForest using Optuna.
Final Stacked Ensemble: The final model is a 2-level stack:
- Level 0 Models: TabPFN, CatBoost, and cuML RandomForest Regressors.
- Level 1 Meta-Model: A MultiTaskElasticNetCV regressor trained on the out-of-fold predictions of the base models. The ensemble achieved a final cross-validated Mean Absolute Percentage Error (MAPE) of 0.25 on the validation set.
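The two-level stack can be sketched with scikit-learn alone. This is an illustration of the out-of-fold stacking technique, not the repository's implementation: `fit_stack` is a hypothetical helper, `RandomForestRegressor` and `KNeighborsRegressor` stand in for TabPFN, CatBoost, and cuML RandomForest, and the fold count and `l1_ratio` grid are assumptions. Each base model contributes one meta-feature per target, and `MultiTaskElasticNetCV` fits all targets jointly on those features.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskElasticNetCV
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor


def fit_stack(base_models, X, Y, X_test, n_splits=5, seed=0):
    """Level 0: out-of-fold predictions per base model. Level 1: MultiTaskElasticNetCV.

    Y must have shape (n_samples, n_targets); each base model must support multi-output.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof_blocks, test_blocks = [], []
    for model in base_models:
        # Every training row is predicted by a model that never saw it during fitting
        oof = cross_val_predict(clone(model), X, Y, cv=kf)
        oof_blocks.append(oof.reshape(len(X), -1))
        # Refit on all training data to build the level-0 features for the test set
        fitted = clone(model).fit(X, Y)
        test_blocks.append(fitted.predict(X_test).reshape(len(X_test), -1))
    meta_X = np.hstack(oof_blocks)  # shape (n_samples, n_models * n_targets)
    meta_X_test = np.hstack(test_blocks)
    meta = MultiTaskElasticNetCV(cv=n_splits, l1_ratio=[0.1, 0.5, 0.9], max_iter=5000)
    meta.fit(meta_X, Y)
    return meta, meta.predict(meta_X_test)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 6))
    Y = X @ rng.normal(size=(6, 3)) + 10.0  # three synthetic targets
    X_test = rng.normal(size=(20, 6))
    base = [RandomForestRegressor(n_estimators=50, random_state=0), KNeighborsRegressor(n_neighbors=7)]
    meta, preds = fit_stack(base, X, Y, X_test)
    print(preds.shape)  # (20, 3)
```

Training the meta-model on out-of-fold rather than in-sample predictions prevents it from simply rewarding whichever base model overfits the training data most.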

Repository Structure

/notebooks: Contains the narrative Jupyter notebooks detailing the project workflow.
/data: Contains the .csv files of the dataset provided.
requirements.txt: Lists all necessary Python packages to reproduce the environment.

How to Run

Clone the repository: git clone https://github.com/lavanderhoney/shell_ai_fuel_blend_prediction.git
Install the required packages: pip install -r requirements.txt
Run the notebooks in numerical order in the /notebooks directory.

Technologies Used

Python
Pandas & NumPy for data manipulation
Scikit-learn for preprocessing and modeling
CatBoost, XGBoost, cuML for gradient boosting and random forests
TabPFN for transformer-based tabular modeling
PyTorch for neural networks
Optuna & FLAML for hyperparameter tuning and AutoML
Matplotlib & Seaborn for visualization

Repository: lavanderhoney/shell_ai_fuel_blend_prediction