Task:
Given 17 columns of training data (time, A-N, Y1, Y2) and 15 columns of test data (time, A-N):
- Goal: Train a model on the training data and predict values ( Y1, Y2 ) for the test data.
- Metric: Achieve the highest possible ( R^2 ), defined as:

\[
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad
SS_{\text{res}} = \sum_i \left( y_i - \hat{y}_i \right)^2, \qquad
SS_{\text{tot}} = \sum_i \left( y_i - \bar{y} \right)^2
\]
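As a sanity check on the metric, a minimal sketch of computing ( R^2 ) with NumPy (array names are illustrative; for a single target this matches `sklearn.metrics.r2_score`):

```python
import numpy as np

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 = 1 - SS_res / SS_tot, per the definition above."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```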
- Tree-based methods: XGBoost, LightGBM, CatBoost.
Correlation:
- Y1 strongly correlated with predictors ( r \approx 0.8 ) → possible linear dependence.
- Y2 weaker correlation → less linear predictability.
ADF Test (Augmented Dickey–Fuller):
- Tests for a unit root; rejecting the null indicates stationarity (constant mean and variance, autocovariance depending only on lag).
- Both Y1 and Y2 found to be stationary.
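A minimal stationarity check with `statsmodels`, assuming the training data is loaded into a DataFrame with the `Y1`/`Y2` columns from the task (the file path is illustrative):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

train = pd.read_csv("train.csv")  # illustrative path

for col in ["Y1", "Y2"]:
    stat, pvalue, *_ = adfuller(train[col].dropna())
    # p < 0.05 -> reject the unit-root null -> treat the series as stationary
    print(f"{col}: ADF statistic = {stat:.3f}, p-value = {pvalue:.4f}")
```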
ACF (Autocorrelation Function):
- Y1: no significant autocorrelation (noise-like).
- Y2: spikes above confidence bounds → some MA structure.
PACF (Partial Autocorrelation Function):
- Y1: no significant partial autocorrelation.
- Y2: significant spikes → some AR structure.
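The diagnostics above can be reproduced with the `statsmodels` plotting helpers, reusing the `train` DataFrame from the earlier sketch (the lag count is an arbitrary choice):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
for i, col in enumerate(["Y1", "Y2"]):
    # spikes outside the shaded confidence band indicate significant lags
    plot_acf(train[col].dropna(), lags=40, ax=axes[i, 0], title=f"ACF: {col}")
    plot_pacf(train[col].dropna(), lags=40, ax=axes[i, 1], title=f"PACF: {col}")
plt.tight_layout()
plt.show()
```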
Interpretation:
- Y1: mostly noise, no strong AR/MA structure.
- Y2: some autocorrelation, but not enough for a low-order AR/MA model.
- Linear models are likely to underfit → nonlinear models are more suitable.
- Large dataset with many columns → gradient boosting (e.g., XGBoost) can exploit feature interactions.
- Predictive performance is prioritized over interpretability.
- Tree ensembles handle the nonlinear structure: they can capture feature interactions and lag effects.
- Comparison of boosting frameworks (a baseline training sketch follows this list):
  - XGBoost: extreme gradient boosting, level-wise tree growth.
  - LightGBM: leaf-wise growth, faster on large datasets.
  - CatBoost: native categorical-feature handling, symmetric (oblivious) trees.
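A hedged baseline comparing the three frameworks on one target, reusing `train` from above; the single-letter feature names and all hyperparameters are assumptions for illustration:

```python
from sklearn.metrics import r2_score
import xgboost as xgb
import lightgbm as lgb
import catboost as cb

features = list("ABCDEFGHIJKLMN") + ["time"]  # assumed column names
split = int(len(train) * 0.8)                 # time-ordered holdout, no shuffling
X_tr, X_va = train[features][:split], train[features][split:]
y_tr, y_va = train["Y1"][:split], train["Y1"][split:]

models = {
    "XGBoost": xgb.XGBRegressor(n_estimators=500, learning_rate=0.05),
    "LightGBM": lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05),
    "CatBoost": cb.CatBoostRegressor(iterations=500, learning_rate=0.05, verbose=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: validation R^2 = {r2_score(y_va, model.predict(X_va)):.4f}")
```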
Rolling Window Testing:
- Train/test splits that respect time order.
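One way to implement this is scikit-learn's `TimeSeriesSplit`, which trains on an expanding past window and validates on the following block (reusing `train`, `features`, and the imports from the previous sketch):

```python
from sklearn.model_selection import TimeSeriesSplit

X, y = train[features], train["Y1"]
tscv = TimeSeriesSplit(n_splits=5)
for fold, (tr_idx, va_idx) in enumerate(tscv.split(X)):
    # every validation block lies strictly after its training window
    model = xgb.XGBRegressor(n_estimators=300)
    model.fit(X.iloc[tr_idx], y.iloc[tr_idx])
    score = r2_score(y.iloc[va_idx], model.predict(X.iloc[va_idx]))
    print(f"fold {fold}: R^2 = {score:.4f}")
```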
Hyperparameter Optimization:
- Cross-validation with time-series splits.
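A sketch of the search, combining `RandomizedSearchCV` with the time-series splitter so folds never leak future data (the search space is illustrative; `X`, `y`, and `TimeSeriesSplit` come from the previous sketch):

```python
from sklearn.model_selection import RandomizedSearchCV

param_dist = {  # illustrative search space
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [300, 500, 1000],
    "subsample": [0.7, 0.9, 1.0],
}
search = RandomizedSearchCV(
    xgb.XGBRegressor(),
    param_distributions=param_dist,
    n_iter=20,
    cv=TimeSeriesSplit(n_splits=5),  # respects time order
    scoring="r2",
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```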
Evaluation Metric:
- Compute ( R^2 ) on validation/test data.
Final Training:
- Retrain the best model on the full training set.
Prediction:
- Generate predictions for Y1 and Y2 on the test set.
- The final model was a blend of XGBoost and CatBoost.
- Ridge regression served as a meta-learner to determine the optimal weights for combining their predictions (a stacking sketch follows this list).
- This approach leveraged:
  - XGBoost: strong baseline nonlinear learner.
  - CatBoost: robust handling of categorical features.
  - Ridge regression: a stable, regularized way to combine both, preventing overfitting and balancing contributions.
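A hedged sketch of the whole final stage: out-of-fold base predictions feed a Ridge meta-learner, the base models are then retrained on the full training set, and blended predictions are produced for the test set. `test.csv`, the Ridge alpha, and all model settings are assumptions; `train`, `features`, and the imports come from the sketches above:

```python
import numpy as np
from sklearn.linear_model import Ridge

test = pd.read_csv("test.csv")  # illustrative path

for target in ["Y1", "Y2"]:
    y = train[target]
    base = {
        "xgb": xgb.XGBRegressor(n_estimators=500, learning_rate=0.05),
        "cat": cb.CatBoostRegressor(iterations=500, learning_rate=0.05, verbose=0),
    }

    # out-of-fold predictions so the meta-learner never sees in-sample fits
    oof = np.full((len(train), len(base)), np.nan)
    for j, model in enumerate(base.values()):
        for tr_idx, va_idx in TimeSeriesSplit(n_splits=5).split(train[features]):
            model.fit(train[features].iloc[tr_idx], y.iloc[tr_idx])
            oof[va_idx, j] = model.predict(train[features].iloc[va_idx])

    # fit the Ridge meta-learner only on rows that received out-of-fold predictions
    mask = ~np.isnan(oof).any(axis=1)
    meta = Ridge(alpha=1.0).fit(oof[mask], y[mask])

    # retrain base models on the full training set, then blend test predictions
    test_matrix = np.column_stack(
        [m.fit(train[features], y).predict(test[features]) for m in base.values()]
    )
    test[f"{target}_pred"] = meta.predict(test_matrix)
```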
- Implemented the Avellaneda–Stoikov market-making model; its parameters still need tuning in the testing environment (see the sketch after this list).
- Developed a late-game inefficiency capture strategy, targeting predictable pricing discrepancies near market close
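For reference, a minimal sketch of the Avellaneda–Stoikov (2008) closed-form quoting rule; gamma (risk aversion), sigma (volatility), and k (order-flow decay) are exactly the parameters that need tuning, and every value below is a placeholder:

```python
import math

def as_quotes(mid: float, inventory: float, t: float, T: float,
              gamma: float = 0.1, sigma: float = 2.0, k: float = 1.5):
    """Closed-form Avellaneda-Stoikov bid/ask quotes.

    reservation price: r = mid - q * gamma * sigma^2 * (T - t)
    optimal spread:    delta = gamma * sigma^2 * (T - t) + (2/gamma) * ln(1 + gamma/k)
    """
    tau = T - t
    r = mid - inventory * gamma * sigma ** 2 * tau          # skew quotes against inventory
    spread = gamma * sigma ** 2 * tau + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return r - spread / 2.0, r + spread / 2.0               # (bid, ask)

bid, ask = as_quotes(mid=100.0, inventory=3.0, t=0.5, T=1.0)
print(f"bid={bid:.3f}, ask={ask:.3f}")
```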