This repository contains a production-ready, industry-standard machine learning pipeline for the NeurIPS Open Polymer Prediction Challenge 2025. The challenge involves predicting five key polymer properties (Tg, FFV, Tc, Density, Rg) from SMILES strings using Graph Neural Networks.
The NeurIPS Open Polymer Prediction 2025 challenge requires predicting polymer properties to accelerate sustainable materials research. The task involves:
- Input: Polymer SMILES strings
- Output: 5 properties - Glass transition temperature (Tg), Fractional free volume (FFV), Thermal conductivity (Tc), Density, and Radius of gyration (Rg)
- Evaluation: Weighted Mean Absolute Error (wMAE) with property-specific reweighting
- Data: 7,973 training samples with significant missing values (11.8% - 93.6% per property)
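The evaluation criterion above can be illustrated with a small NumPy sketch. It assumes one plausible form of "property-specific reweighting": each property's error is divided by that property's observed value range, and properties with fewer labels receive larger weights. That weighting scheme, the name `weighted_mae_sketch`, and the array layout are illustrative assumptions; the competition's evaluation page remains the authority on the exact weights.

```python
import numpy as np


def weighted_mae_sketch(preds, targets, masks):
    """Masked, per-property reweighted MAE under an assumed weighting scheme.

    preds, targets, masks: arrays of shape (n_samples, n_properties).
    masks is 1 where a target value is available and 0 where it is missing.
    """
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    masks = np.asarray(masks, dtype=bool)

    n_props = targets.shape[1]
    counts = masks.sum(axis=0)  # number of available labels per property
    if np.any(counts == 0):
        raise ValueError("every property needs at least one available target")

    # Value range of each property, computed over available labels only.
    ranges = np.array([
        targets[masks[:, j], j].max() - targets[masks[:, j], j].min()
        for j in range(n_props)
    ])
    ranges = np.where(ranges > 0, ranges, 1.0)  # avoid division by zero

    # Assumption: rarer properties get larger weights, normalized to sum to n_props.
    rarity = np.sqrt(1.0 / counts)
    weights = (n_props * rarity / rarity.sum()) / ranges

    # Missing entries contribute zero error, even if the stored target is NaN.
    abs_err = np.where(masks, np.abs(preds - targets), 0.0)
    per_property_mae = abs_err.sum(axis=0) / counts
    return float(np.sum(weights * per_property_mae))
```

Because the errors are scaled by range, a property measured in hundreds of units (such as Tg) does not dominate one measured in fractions (such as FFV).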
pip install torch torch-geometric rdkit pandas numpy scikit-learn tqdm
- Download the competition data from Kaggle and place it in the info/ folder:
  - info/train.csv - Training data with SMILES and target properties
  - info/test.csv - Test data with SMILES only
  - info/sample_submission.csv - Sample submission format
- Quick training and submission:
  python neurips_competition.py --epochs 50 --batch_size 32 --output submission.csv
- Optimized training with better hyperparameters:
  python train_final_model.py
- Custom hyperparameters:
  python neurips_competition.py --epochs 100 --batch_size 64 --hidden_channels 256 --num_layers 4 --lr 0.001
├── .github/                    # CI/CD workflows
├── .vscode/                    # VSCode settings
├── configs/                    # Hydra configuration files
├── data/
│   ├── processed/              # Processed data
│   └── raw/                    # Raw data
├── docs/                       # Documentation
├── models/                     # Trained models
├── notebooks/                  # Jupyter notebooks
├── reports/                    # Reports and figures
├── scripts/                    # Utility scripts
├── src/polymer_prediction/     # Main source code
├── tests/                      # Test suite
├── .dockerignore               # Docker ignore file
├── .gitignore                  # Git ignore file
├── .pre-commit-config.yaml     # Pre-commit hooks configuration
├── CHANGELOG.md                # Changelog
├── CONTRIBUTING.md             # Contribution guidelines
├── Dockerfile                  # Dockerfile
├── LICENSE                     # License
├── Makefile                    # Makefile
├── README.md                   # README
├── pyproject.toml              # Project metadata and dependencies
└── setup.cfg                   # Setup configuration
This diagram provides a high-level overview of the repository structure. For more details, refer to the respective directories.
- Multi-target Prediction: Simultaneous prediction of 5 polymer properties
- Missing Value Handling: Robust handling of sparse training data
- Competition Metrics: Implementation of weighted MAE evaluation metric
- SMILES Processing: Advanced molecular featurization with RDKit
- Graph Neural Networks: GCN-based architecture optimized for molecular data
- Modular Design: Clean separation of concerns with proper abstractions
- Error Handling: Comprehensive error handling for invalid SMILES
- Reproducibility: Deterministic training with seed management
- Scalable Architecture: Efficient batching and GPU support
- Industry Standards: Following Python best practices and conventions
- Flexible Configuration: Easy hyperparameter tuning
- Extensible Models: Simple to add new GNN architectures
- Comprehensive Metrics: Per-property and competition-specific evaluation
- Visualization Support: Training curves and prediction analysis
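The reproducibility item above usually comes down to seeding every random number generator the training loop touches. The sketch below shows one common way to do that with PyTorch; the function name `set_seed` and the default seed are assumptions for illustration, not necessarily what this repository uses.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Silently ignored when CUDA is not available.
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels over auto-tuned ones.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Calling `set_seed` once at the start of a run makes weight initialization and data shuffling repeatable on the same hardware and library versions. Some GPU operations can still be nondeterministic, so results across machines may differ slightly.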
The model predicts all 5 properties simultaneously using a shared GCN encoder:
# Model outputs 5 values: [Tg, FFV, Tc, Density, Rg]
predictions = model(molecular_graph) # Shape: (batch_size, 5)
Training data has significant missing values (11.8% - 93.6% per property). Our implementation:
- Uses binary masks to track missing values
- Computes loss only on available targets
- Handles sparse gradients efficiently
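The masking strategy listed above can be sketched in a few lines of PyTorch. The function reuses the name `masked_mse_loss`, but its body is an assumption about how such a loss might look, not a copy of the repository's implementation; it also assumes `masks` holds 1 for available targets and 0 for missing ones.

```python
import torch


def masked_mse_loss(predictions, targets, masks):
    """Mean squared error over available targets only (illustrative sketch).

    All three tensors have shape (batch_size, num_properties).
    """
    masks = masks.to(predictions.dtype)
    # Replace missing targets (possibly NaN) with 0 so they cannot poison the loss.
    safe_targets = torch.where(masks > 0, targets, torch.zeros_like(targets))
    # Multiplying by the mask zeroes both the error and its gradient
    # at positions where no label exists.
    squared_error = (predictions - safe_targets) ** 2 * masks
    # Clamp guards against a batch that contains no labels at all.
    return squared_error.sum() / masks.sum().clamp(min=1.0)
```

Dividing by the number of available labels, rather than by the batch size, keeps the loss scale comparable between batches with many and few labels.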
# Masked loss computation
loss = masked_mse_loss(predictions, targets, masks)
The pipeline also implements the official weighted MAE metric:
from polymer_prediction.utils.competition_metrics import weighted_mae
# Calculate competition score
wmae = weighted_mae(predictions, targets, masks)
# Quick training (10 epochs)
python neurips_competition.py --epochs 10 --batch_size 32
# Production training with optimized hyperparameters
python train_final_model.py
# Custom configuration
python neurips_competition.py \
--epochs 100 \
--batch_size 64 \
--hidden_channels 256 \
--num_layers 4 \
--lr 0.001 \
--output my_submission.csv
Our pipeline converts SMILES strings to rich graph representations:
- Atom Features: Atomic number, degree, hybridization, aromaticity, chirality
- Bond Features: Bond type, ring membership, conjugation
- Graph Structure: Molecular connectivity with PyTorch Geometric
- Validation: Automatic SMILES validation and error handling
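The featurization steps above can be sketched with RDKit alone. The function name `smiles_to_graph`, the choice of raw integer features, and the dictionary layout are assumptions for illustration; the repository's featurizer may encode these properties differently (for example, as one-hot vectors) and wrap the result in a PyTorch Geometric `Data` object.

```python
from rdkit import Chem


def smiles_to_graph(smiles):
    """Convert a SMILES string to plain-Python graph features.

    Returns None for SMILES that RDKit cannot parse (hypothetical helper).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid SMILES
        return None

    # One feature row per atom: atomic number, degree, hybridization,
    # aromaticity, chirality tag.
    node_features = [
        [
            atom.GetAtomicNum(),
            atom.GetDegree(),
            int(atom.GetHybridization()),
            int(atom.GetIsAromatic()),
            int(atom.GetChiralTag()),
        ]
        for atom in mol.GetAtoms()
    ]

    edge_index, edge_features = [], []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        feats = [
            bond.GetBondTypeAsDouble(),  # 1.0, 1.5 (aromatic), 2.0, 3.0
            int(bond.IsInRing()),
            int(bond.GetIsConjugated()),
        ]
        # Store each bond in both directions, as message passing expects.
        edge_index += [(i, j), (j, i)]
        edge_features += [feats, feats]

    return {
        "node_features": node_features,
        "edge_index": edge_index,
        "edge_features": edge_features,
    }
```

Polymer SMILES often mark repeat-unit attachment points with `*`; RDKit parses these as dummy atoms with atomic number 0, so they appear in the graph as ordinary nodes.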
- Graph Neural Networks: GCN, GAT, GraphSAGE support
- Molecular Pooling: Global mean/max/attention pooling
- Regularization: Dropout, batch normalization, weight decay
- Optimization: Adam, AdamW with learning rate scheduling
Comprehensive evaluation with:
- Regression: RMSE, MAE, R², MAPE, SMAPE
- Classification: Accuracy, Precision, Recall, F1, AUC
- Visualization: Prediction plots, training curves, molecular structures
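For reference, the regression metrics named above can be computed in a few lines of NumPy. This is a sketch with a hypothetical function name, `regression_metrics`; MAPE and SMAPE have several definitions in circulation, and the percentage forms used here are one common choice rather than necessarily the ones in this codebase.

```python
import numpy as np


def regression_metrics(y_true, y_pred, eps=1e-8):
    """Return RMSE, MAE, R², MAPE (%), and SMAPE (%) for 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares

    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        # R² is undefined when the targets are constant.
        "r2": float(1.0 - ss_res / ss_tot) if ss_tot > 0 else float("nan"),
        # eps avoids division by zero for targets equal to 0.
        "mape": float(100.0 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))),
        "smape": float(100.0 * np.mean(
            2.0 * np.abs(err) / np.maximum(np.abs(y_true) + np.abs(y_pred), eps)
        )),
    }
```

With sparse labels, these would be computed per property on the rows where that property is available.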
# Build and run with Docker Compose
docker-compose up jupyter # Jupyter Lab environment
docker-compose up tensorboard # TensorBoard visualization
# Or build manually
docker build -t polymer-prediction .
docker run -it --rm -v $(pwd):/workspace polymer-prediction
# Run all tests
make test
# Run specific test categories
pytest tests/test_models.py -v
pytest tests/ -k "not slow" # Skip slow tests
# Generate coverage report
make test && open htmlcov/index.html
make help # Show all available commands
make install # Install package
make install-dev # Install with dev dependencies
make test # Run tests with coverage
make lint # Run code quality checks
make format # Format code with black/isort
make type-check # Run mypy type checking
make security # Run security scans
make docs # Build documentation
make clean # Clean build artifacts
make docker-build # Build Docker image
make train # Train model with default config
make hyperparameter-sweep # Run parameter optimization
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Make changes with tests
- Run quality checks: make lint type-check security test
- Submit a pull request
- API Reference: https://[YOUR_USERNAME].github.io/[YOUR_REPOSITORY]/
- Examples: See docs/examples/ for detailed tutorials
- Configuration: See configs/ for configuration options
This project implements enterprise-grade standards:
- ✅ PEP 8 code style with Black formatting
- ✅ Type hints throughout the codebase
- ✅ Comprehensive testing with pytest
- ✅ Security scanning with Bandit
- ✅ Dependency management with pyproject.toml
- ✅ CI/CD pipeline with GitHub Actions
- ✅ Documentation with Sphinx
- ✅ Containerization with Docker
- ✅ Configuration management with Hydra
- ✅ Structured logging
- ✅ Error handling and validation
- ✅ Reproducibility with seed management
This project is licensed under the MIT License - see the LICENSE file for details.
- NeurIPS Open Polymer Prediction Challenge organizers
- PyTorch Geometric team for excellent graph ML tools
- RDKit developers for cheminformatics utilities
- Open source community for the amazing Python ecosystem