All notable changes to this project are documented here.
The format is based on Keep a Changelog.
- Added
  - `training.algorithm` values `dpo` and `grpo` with the same actor-critic wiring as PPO (`ppo_wiring`), shared rollout contract, and `weights1.torch` checkpoints.
  - New modules: `trackmania_rl/agents/policy_optimization/dpo.py`, `grpo.py`; learners `learner_dpo.py`, `learner_grpo.py`; shared `policy_rollout_batch.py` for PPO/DPO/GRPO tensor builds.
  - Config: `DPOConfig`/`GRPOConfig` on `RulkaConfig`, flat `ConfigView` access, example YAML `config_files/rl/config_dpo.yaml` and `config_grpo.yaml`.
  - Utilities: `POLICY_OPTIMIZATION_ALGORITHMS`/`is_policy_optimization_algorithm` for collector, `train.py`, HF save skip, BC inject into `save/` for the whole policy family.
  - Helper script `scripts/dpo_append_offline_pair.py` for offline DPO JSONL lines.
  - Docs: README, `config_files/README.md`, Sphinx `configuration_guide`/`project_structure` updated for algorithms and new sections.
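The offline DPO helper script appends preference pairs as JSONL lines. A minimal sketch of what such an append might look like; the field names (`state`, `chosen`, `rejected`) are illustrative assumptions, not the script's actual schema:

```python
import json
from pathlib import Path


def append_offline_pair(jsonl_path, prompt_state, chosen_actions, rejected_actions):
    """Append one offline DPO preference pair as a single JSONL line.

    Field names are hypothetical; the real script defines its own schema.
    """
    record = {
        "state": prompt_state,          # observation/context shared by both rollouts
        "chosen": chosen_actions,       # preferred action sequence
        "rejected": rejected_actions,   # dispreferred action sequence
    }
    with Path(jsonl_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

One JSON object per line keeps the file append-only and streamable, which suits incremental offline data collection.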
- Added algorithm wiring abstractions (`trackmania_rl/agents/algorithms/`) to decouple training loop orchestration from IQN-specific implementation details.
- Introduced PPO stack: PPO wiring, learner path, policy optimization modules, and reusable policy model builders under `trackmania_rl/agents/policy_models/` and `trackmania_rl/agents/policy_optimization/`.
- Added new RL configs for PPO and transformer/CNN variants (`config_files/rl/config_ppo*.yaml`, `config_files/rl/config_btr_post_concat_cnn_transformer.yaml`) and extended schema/loader support for NN topology selection.
- Expanded BC pretrain and policy-loading paths (`trackmania_rl/pretrain/train_bc.py`, `trackmania_rl/pretrain/rl_policy_factory.py`) to support RL policy initialization and PPO-compatible checkpoints.
- Added/updated preprocessing, dataset, model, and export logic for multi-modal inputs and topology-aware construction.
- Added `trackmania_rl/pretrain/lightning_compat.py` to stabilize Lightning integration boundaries.
- Updated collector/learner process integration and training entrypoints (`scripts/train.py`, `trackmania_rl/multiprocess/*`) to support algorithm pluggability and the new PPO flow.
- Added parameter-freeze helpers and soft-copy controls (`trackmania_rl/param_freeze.py`) for safer transfer/fine-tuning workflows.
- Added vectorized reward utilities (`trackmania_rl/reward_vectorized.py`) and utility refactors supporting new training modes.
- Added focused tests for PPO math/wiring, config migrations, NN schema, BC-to-RL policy loading, and parameter freeze behavior.
- Expanded architecture and experiment docs with a PPO baseline write-up (`docs/source/experiments/ppo_vs_iqn_baseline.rst`), a PPO architecture page, and an NN topology catalog.
- Refreshed configuration/troubleshooting/installation and experiment pages to document new training options and expected workflows.
- `analyze_experiment_by_relative_time.py`: `--time-axis auto` (default) uses `cumul_training_hours` when all runs log it, else wall minutes; explicit `cumul_training_hours`/`wall_minutes` still available.
- `audit_tensorboard_training_timeline.py`: compares merged TB wall span vs `cumul_training_hours` per run (detects misleading wall-axis docs).
- `generate_experiment_plots.py`: default `--time-axis auto` tries cumulative training hours and falls back to wall minutes if `cumul_training_hours` is missing from any run; regenerated `docs/source/_static/exp_*.jpg` accordingly.
- Experiment docs: time-axis methodology; new `docs/source/experiments/time_axis_conventions.rst` audit table (wall vs `cumul_training_hours`); `multi_action_prediction.rst` recomputed training-hour checkpoints for v2 vs v3.1_bc; `pretrain_bc.rst` run lengths corrected for full_iqn_bc / _2 / _3 and collapse narrative clarified (wall vs active training).
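The `--time-axis auto` selection described above can be sketched as a small helper; this is an illustrative reimplementation of the stated rule, not the scripts' actual code, and the `runs` mapping (run name to available TensorBoard scalar tags) is an assumed input shape:

```python
def resolve_time_axis(runs, requested="auto"):
    """Pick the time axis for cross-run comparison plots.

    Rule sketched from the changelog entry above: under 'auto', use
    'cumul_training_hours' only when *every* run logs that scalar,
    otherwise fall back to wall-clock minutes. An explicit request
    ('cumul_training_hours' or 'wall_minutes') is honored as-is.
    """
    if requested != "auto":
        return requested
    if runs and all("cumul_training_hours" in tags for tags in runs.values()):
        return "cumul_training_hours"
    return "wall_minutes"
```

Requiring the scalar in all runs avoids silently comparing one run on training hours against another on wall time.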
- Added a dedicated BTR A01 comparison write-up: `docs/source/experiments/models/btr.rst`.
- Generated and embedded BTR comparison plots in `docs/source/_static/` (prefixed `exp_btr_A01_v2_v4*`).
- Added the BTR config used for the experiment runs: `config_files/rl/config_btr.yaml`.
- Reorganized model documentation paths so architectures live under `docs/source/models/` (moved/added `models/iqn_architecture.rst` and `models/btr_architecture.rst`).
- Added optional CPU pinning for `TmForever.exe` via config (new fields in `config_files/config_schema.py` and `config_files/rl/config_default.yaml`).
- Hardened prioritized replay: when TorchRL prioritized trees are unavailable on Windows builds, the code falls back to a uniform `RandomSampler`.
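The replay fallback described above can be sketched as a factory that tries TorchRL's prioritized sampler and degrades to uniform sampling. This is a hedged illustration, not the project's actual code: the real fallback uses TorchRL's own `RandomSampler`, replaced here by a tiny stand-in so the sketch runs even where TorchRL is absent, and `make_sampler`/its capacity argument are hypothetical names:

```python
import random


class UniformSampler:
    """Minimal uniform-sampling stand-in for the fallback path."""

    def sample(self, buffer_len, batch_size):
        # Uniformly sample distinct indices from the buffer.
        return random.sample(range(buffer_len), k=min(batch_size, buffer_len))


def make_sampler(capacity, alpha=0.6, beta=0.4):
    """Prefer TorchRL's PrioritizedSampler; fall back to uniform sampling.

    Mirrors the hardening above: on builds where the prioritized
    sum-tree machinery cannot be loaded, training continues with
    uniform replay instead of crashing.
    """
    try:
        from torchrl.data.replay_buffers.samplers import PrioritizedSampler
        return PrioritizedSampler(max_capacity=capacity, alpha=alpha, beta=beta)
    except Exception:  # ImportError, or native tree extension missing
        return UniformSampler()
```

Catching the failure at sampler-construction time keeps the rest of the replay buffer wiring unchanged between the two modes.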
- Multi-action prediction pipeline finalized for RL + BC: multi-offset heads, decision-block training path, and related config/schema updates.
- Experiment validation tooling hardened to avoid partial conclusions:
  - TensorBoard run continuation chunks are now treated as one run (`run`, `run_2`, `run_3`, ...).
  - Analysis checks were expanded and documented so relative-time and by-step comparisons are consistently reproduced.
- Documentation command/rules for experiment write-ups were updated with stricter verification steps (suffix-merge checks, save-state cross-checks, direct pairwise checks for top runs).
- Added/updated experiment pages for:
  - Global schedule speed (`A01_as20_long_v2` series),
  - Multi-action prediction (`A01_as20_long_v3`/`v3.1`/`v3.1_pretrained_bc`),
  - BC multi-offset pretrain comparisons.
- Regenerated and committed comparison plots (relative-time and by-step) for updated experiment analyses.
- What worked
  - `global_schedule_speed = 4` was the strongest setting in the checked long A01 runs, with the best final saved time in the v2 series.
  - Multi-action RL + BC pretrain improved stability and peak time over non-pretrained multi-action runs.
  - Temporal/action alignment between BC pretrain and multi-action RL (5-action offset structure) improved transfer quality.
- What did not work / underperformed
  - Some v2 variants (`v2.2`, `v2.3`) did not converge to competitive A01 best times.
  - Multi-action without BC pretrain underperformed the strongest single-action baseline (`v2`) on final best A01.
- Pretrain BC — behavioral cloning pretraining (`trackmania_rl.pretrain`, `scripts/pretrain_bc.py`, configs in `config_files/pretrain/bc/`): single-frame and multi-offset action prediction, optional encoder + action head injection into IQN; `pretrain_visual` renamed to `pretrain`
- Full IQN from BC — option to load the full IQN (encoder + float head + A_head + V_head) from a BC checkpoint (`pretrain_bc_heads_path`) for RL warm start
- Config and schema updates for pretrain (BC, vis), RL defaults, and experiment variants (v3, v4 multi-offset, only_vis, etc.)
- Throughput — training pipeline accelerated from ~23K to ~58K samples/second (collector + learner optimizations, buffer and batch handling)
- BC pretrain: BC + A_head (bc_ah) matches vis-only on final A01 best time (24.47s); encoder-only BC gives fastest early first finish; freezing encoder+A_head hurts, unfreezing with lower lr/epsilon recovers
- Reward shaping: Engineered rewards (speedslide, neoslide) did not improve best time; baseline without them reached better A01 time (24.53s vs 24.94s)
- IQN without image head: Float-only IQN is viable but consistently ~0.2–0.3s slower than full IQN on A01; image head improves sample efficiency and final times
- Full IQN from BC: Documented runs (full_iqn_bc, best_ref, 4explo) and analysis in experiment docs
- IQN architecture documentation — `docs/source/experiments/models/iqn_architecture.rst` with high-level and per-block Graphviz diagrams (inputs/outputs, image head, float head, IQN quantile mixing, dueling heads); linked from main_objects; `sphinx.ext.graphviz` enabled
- Experiment docs and plots — pretrain BC (behavioral_cloning, per-action accuracy, multi-offset), reward shaping, IQN no image head, full IQN from BC; scripts for analysis and plot generation
- Misc — `data/` added to `.gitignore`; float inputs verification, dataset and HF tooling, cleanup scripts
- Doc build — optional dependency group `doc`; note: Graphviz (system) required for architecture diagrams; all architecture page text in English
- Config/docs — configuration guide, troubleshooting, tmnf_replays, experiment index and cross-links updated
- Pretrain visual backbone — optional visual pretraining (encoder injection into IQN), config and scripts
- TMNF replays — replay capture and related tooling
- Config updates and `.gitignore` for cache/output
- Replay capture — scripts and support for capturing replays (e.g. `--exclude-respawn-maps`), `replay_has_respawn` fixes
- Experiment plots and game_env_backend-related updates
- Config, docs, and scripts updates
- Experiment comparison plots — scripts `generate_experiment_plots.py`, `plot_experiment_comparison.py`, `experiment_plot_utils.py`; JPG graphs (one metric per graph, runs as lines) saved to `docs/source/_static/`, embedded in experiment RST with `:alt:` captions
- Relative-time comparison — `analyze_experiment_by_relative_time.py` with `--plot`, `compute_comparison_data()` for tables and plots; optional `--all-scalars`, `--metrics` for arbitrary TensorBoard tags
- CI: docs build and deploy — GitHub Actions workflow builds Sphinx docs and deploys to GitHub Pages (venv python, doc extra only)
- doc_exp (Create Experiment) — rule "Embedding plots in RST (quality)": place image after metric subsection, `:alt:` on every image, intro sentence in Detailed TensorBoard Metrics Analysis
- Config — migrated to YAML (`config_files/`); IQN experiments (uni_19, uni_20) documented
- Best-time plot Y-axis — scale from min(time) to mean(time) + 1s so the initial 300s spike is off-scale and improvement is readable; robust (percentile) scaling kept for loss, Q, finish rate
- Experiment docs — exploration (uni_12 vs uni_15), temporal_mini_race_duration, training_speed, models/iqn: plots for all experiments (incl. IQN Exp 2–5), intro sentence and alt text for every figure
- Universal scalar metrics — `load_run_metrics(tags_to_load)`, `get_available_scalar_tags()`, `use_all_scalars`/`extra_scalar_tags` in `compute_comparison_data()`; scalar tag → kind/unit inferred for any tag
- Experiment analysis scripts — `scripts/analyze_experiment.py`, `scripts/analyze_batch_experiment.py`, `scripts/extract_tensorboard_data.py` for extracting and comparing TensorBoard metrics across runs
- Experiments documentation — `docs/source/experiments/` section (training_speed and index) with toctree in main docs
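The best-time Y-axis convention noted above (scale from min(time) to mean(time) + 1s so the initial 300s spike sits off-scale) can be sketched as a small limit helper; the function name is illustrative, not taken from the plotting scripts:

```python
def best_time_ylim(times):
    """Y-axis limits for best-time plots, per the convention above.

    Lower bound: the best (minimum) time seen.
    Upper bound: mean(time) + 1 s, so early 300 s timeout spikes
    land above the visible range and late improvements stay readable.
    """
    lo = min(times)
    hi = sum(times) / len(times) + 1.0
    return lo, hi
```

A usage note: with run times like `[300.0, 30.0, 26.0, 24.5]`, the mean-based upper bound clips the 300 s outlier while keeping the 24-30 s band visible, which percentile scaling alone would not guarantee for a single extreme spike.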
- performance_config.py — parameter names aligned with code (e.g. `plot_race_time_left_curves`, `update_inference_network_every_n_actions`); `force_window_focus_on_input` deprecated (focus managed once per map load)
- training_config.py — extended `tensorboard_suffix_schedule`, `oversample_long_term_steps`/`oversample_maximum_term_steps`, `min_horizon_to_update_priority_actions`
tensorboard_suffix_schedule,oversample_long_term_steps/oversample_maximum_term_steps,min_horizon_to_update_priority_actions - docs/source/index.rst — added Experiments toctree
- Modern `pyproject.toml` with hatchling build backend
- Support for the `uv` package manager - single-command installation
- Comprehensive User FAQ with 30+ questions
- Comprehensive Dev FAQ for contributors
- Hot-reload configuration system documentation
- README.md - rewritten as concise English quick-start guide
- Documentation - significantly expanded and reorganized
- Installation guide with uv support
- Extended troubleshooting section
- Virtual checkpoint creation guide
- Modular configuration fully documented
- Dependencies - updated to PyTorch 2.7+, TorchRL 0.6+, CUDA 12.6
- Build system - migrated from setup.py to modern pyproject.toml
- Language - all documentation in English
- `setup.py` - replaced by pyproject.toml
- `requirements_pip.txt` - merged into pyproject.toml
- `requirements_conda.txt` - merged into pyproject.toml
- Original Linesight branding from README
- NumPy version pinned to 1.26.4 (2.0 breaks numba)
- Documentation reflects actual modular config structure
- Installation instructions updated for modern tools
| Aspect | Original Linesight | This Fork |
|---|---|---|
| Build System | setup.py + requirements.txt | pyproject.toml + uv |
| Installation | Multi-step conda/pip | uv sync |
| Dependencies | requirements files | pyproject.toml |
| Documentation | Basic | Comprehensive FAQ + guides |
| Language | Mixed | English |
| Focus | Research project | Personal experimentation |
This is a fork of the original Linesight project: