
Changelog

All notable changes to this project are documented here.

Format based on Keep a Changelog.

[1.8.0] - 2026-04-08

DPO / GRPO training

  • Added training.algorithm values dpo and grpo with the same actor-critic wiring as PPO (ppo_wiring), shared rollout contract, and weights1.torch checkpoints.
  • New modules: trackmania_rl/agents/policy_optimization/dpo.py, grpo.py; learners learner_dpo.py, learner_grpo.py; shared policy_rollout_batch.py for PPO/DPO/GRPO tensor builds.
  • Config: DPOConfig / GRPOConfig on RulkaConfig, flat ConfigView access, example YAML config_files/rl/config_dpo.yaml and config_grpo.yaml.
  • Utilities: POLICY_OPTIMIZATION_ALGORITHMS / is_policy_optimization_algorithm shared by the collector, train.py, the HF save skip, and BC injection into save/ for the whole policy family.
  • Helper script scripts/dpo_append_offline_pair.py for offline DPO JSONL lines.
  • Docs: README, config_files/README.md, Sphinx configuration_guide / project_structure updated for algorithms and new sections.
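
The offline DPO helper appends one preference pair per JSONL line. A minimal sketch of the idea, assuming a prompt/chosen/rejected record layout (these field names are illustrative, not the actual schema of scripts/dpo_append_offline_pair.py):

```python
import json
import tempfile
from pathlib import Path

def append_offline_pair(path: Path, prompt: str, chosen: str, rejected: str) -> None:
    """Append one DPO preference pair as a single JSON line (hypothetical schema)."""
    record = {"prompt": prompt, "chosen": chosen, "rejected": rejected}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# One line per pair keeps the dataset trivially appendable and streamable.
pairs_file = Path(tempfile.mkdtemp()) / "dpo_pairs.jsonl"
append_offline_pair(pairs_file, "state_0042", "accelerate+left", "brake")
append_offline_pair(pairs_file, "state_0043", "accelerate", "release")
```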

[1.7.0] - 2026-03-31

RL architecture and training pipeline

  • Added algorithm wiring abstractions (trackmania_rl/agents/algorithms/) to decouple training loop orchestration from IQN-specific implementation details.
  • Introduced PPO stack: PPO wiring, learner path, policy optimization modules, and reusable policy model builders under trackmania_rl/agents/policy_models/ and trackmania_rl/agents/policy_optimization/.
  • Added new RL configs for PPO and transformer/CNN variants (config_files/rl/config_ppo*.yaml, config_files/rl/config_btr_post_concat_cnn_transformer.yaml) and extended schema/loader support for NN topology selection.

Pretrain, model export, and compatibility

  • Expanded BC pretrain and policy-loading paths (trackmania_rl/pretrain/train_bc.py, trackmania_rl/pretrain/rl_policy_factory.py) to support RL policy initialization and PPO-compatible checkpoints.
  • Added/updated preprocessing, dataset, model, and export logic for multi-modal inputs and topology-aware construction.
  • Added trackmania_rl/pretrain/lightning_compat.py to stabilize Lightning integration boundaries.

Runtime and utility updates

  • Updated collector/learner process integration and training entrypoints (scripts/train.py, trackmania_rl/multiprocess/*) to support algorithm pluggability and new PPO flow.
  • Added parameter-freeze helpers and soft-copy controls (trackmania_rl/param_freeze.py) for safer transfer/fine-tuning workflows.
  • Added vectorized reward utilities (trackmania_rl/reward_vectorized.py) and utility refactors supporting new training modes.
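
A parameter-freeze helper of the kind described above can be sketched as matching parameter names against prefixes; this is an illustrative stand-in, not the actual API of trackmania_rl/param_freeze.py:

```python
import torch.nn as nn

def freeze_matching(model: nn.Module, prefixes: tuple[str, ...]) -> int:
    """Freeze every parameter whose name starts with one of the prefixes;
    returns how many parameters were frozen (illustrative sketch)."""
    frozen = 0
    for name, param in model.named_parameters():
        if name.startswith(prefixes):
            param.requires_grad = False
            frozen += 1
    return frozen

# Freeze the first Linear layer ("0.weight" / "0.bias") for fine-tuning.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
n_frozen = freeze_matching(net, ("0.",))
# → n_frozen == 2
```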

Validation and documentation

  • Added focused tests for PPO math/wiring, config migrations, NN schema, BC-to-RL policy loading, and parameter freeze behavior.
  • Expanded architecture and experiment docs with PPO baseline write-up (docs/source/experiments/ppo_vs_iqn_baseline.rst), PPO architecture page, and NN topology catalog.
  • Refreshed configuration/troubleshooting/installation and experiment pages to document new training options and expected workflows.

[1.6.1] - 2026-03-25

Analysis

  • analyze_experiment_by_relative_time.py: --time-axis auto (default) uses cumul_training_hours when all runs log it, else wall minutes; explicit cumul_training_hours / wall_minutes still available.
  • audit_tensorboard_training_timeline.py: compares merged TB wall span vs cumul_training_hours per run (detect misleading wall-axis docs).
  • generate_experiment_plots.py: default --time-axis auto tries cumulative training hours and falls back to wall minutes if cumul_training_hours is missing from any run; regenerated docs/source/_static/exp_*.jpg accordingly.
  • Experiment docs:
    • Documented the time-axis methodology; added docs/source/experiments/time_axis_conventions.rst with an audit table (wall vs cumul_training_hours).
    • multi_action_prediction.rst: recomputed training-hour checkpoints for v2 vs v3.1_bc.
    • pretrain_bc.rst: corrected run lengths for full_iqn_bc / _2 / _3 and clarified the collapse narrative (wall vs active training).
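
The `--time-axis auto` rule above amounts to a simple selection: use cumulative training hours only when every run logs that tag, otherwise fall back to wall minutes. A sketch of that logic (the function and its inputs are illustrative, not the scripts' actual internals):

```python
def choose_time_axis(runs: dict[str, set[str]], requested: str = "auto") -> str:
    """Pick the x-axis for cross-run comparison plots.

    runs maps run name -> set of scalar tags logged by that run.
    """
    if requested != "auto":
        return requested  # explicit cumul_training_hours / wall_minutes
    # Cumulative training hours are only comparable if every run logs them.
    if all("cumul_training_hours" in tags for tags in runs.values()):
        return "cumul_training_hours"
    return "wall_minutes"

runs = {"v2": {"cumul_training_hours", "loss"}, "v3.1_bc": {"loss"}}
axis = choose_time_axis(runs)
# → "wall_minutes", because v3.1_bc does not log cumul_training_hours
```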

BTR experiment + docs

  • Added a dedicated BTR A01 comparison write-up: docs/source/experiments/models/btr.rst.
  • Generated and embedded BTR comparison plots in docs/source/_static/ (prefixed exp_btr_A01_v2_v4*).
  • Added BTR config used for the experiment runs: config_files/rl/config_btr.yaml.
  • Reorganized model documentation paths so architectures live under docs/source/models/ (moved/added models/iqn_architecture.rst and models/btr_architecture.rst).

Runtime reliability

  • Added optional CPU pinning for TmForever.exe via config (new fields in config_files/config_schema.py and config_files/rl/config_default.yaml).
  • Hardened prioritized replay: when TorchRL prioritized trees are unavailable on Windows builds, the code falls back to uniform RandomSampler.
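
The prioritized-replay hardening is essentially a guarded construction: try the prioritized sampler, and fall back to a uniform one if it cannot be built. A simplified illustration (the real code uses TorchRL's PrioritizedSampler / RandomSampler; the builder callables here are stand-ins):

```python
def make_sampler(build_prioritized, build_uniform):
    """Prefer a prioritized sampler; fall back to uniform sampling when
    prioritized construction fails (e.g. sum-tree support missing on
    some Windows builds)."""
    try:
        return build_prioritized()
    except (ImportError, OSError, RuntimeError):
        return build_uniform()

def prioritized_unavailable():
    # Stand-in for PrioritizedSampler(...) raising on an unsupported build.
    raise ImportError("prioritized sum-trees unavailable")

sampler = make_sampler(prioritized_unavailable, lambda: "uniform-sampler")
# → "uniform-sampler"
```

Keeping the fallback at construction time means the rest of the replay-buffer code never needs to know which sampler it got.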

[1.6.0] - 2026-03-20

Training and Analysis

  • Multi-action prediction pipeline finalized for RL + BC: multi-offset heads, decision-block training path, and related config/schema updates.
  • Experiment validation tooling hardened to avoid partial conclusions:
    • TensorBoard run continuation chunks are now treated as one run (run, run_2, run_3, ...).
    • Analysis checks were expanded and documented so relative-time and by-step comparisons are consistently reproduced.
  • Documentation command/rules for experiment writeups were updated with stricter verification steps (suffix-merge checks, save-state cross-checks, direct pairwise checks for top runs).
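
Treating continuation chunks as one run boils down to grouping run names by a stripped numeric suffix. A sketch of that grouping, assuming base run names do not themselves end in `_<digits>` (the real suffix-merge logic may handle that case differently):

```python
import re

def merge_continuation_runs(run_names: list[str]) -> dict[str, list[str]]:
    """Group TensorBoard run chunks like run, run_2, run_3 under one base name."""
    groups: dict[str, list[str]] = {}
    for name in sorted(run_names):
        base = re.sub(r"_\d+$", "", name)  # strip a trailing _<number> chunk suffix
        groups.setdefault(base, []).append(name)
    return groups

groups = merge_continuation_runs(["run", "run_2", "run_3", "eval"])
# → {"eval": ["eval"], "run": ["run", "run_2", "run_3"]}
```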

Documentation and Plots

  • Added/updated experiment pages for:
    • Global schedule speed (A01_as20_long_v2 series),
    • Multi-action prediction (A01_as20_long_v3/v3.1/v3.1_pretrained_bc),
    • BC multi-offset pretrain comparisons.
  • Regenerated and committed comparison plots (relative-time and by-step) for updated experiment analyses.

Experimental Outcomes (technical takeaways)

  • What worked
    • global_schedule_speed = 4 was the strongest setting in the checked long A01 runs, with best final saved time in the v2 series.
    • Multi-action RL + BC pretrain improved stability and peak time over non-pretrained multi-action runs.
    • Temporal/action alignment between BC pretrain and multi-action RL (5-action offset structure) improved transfer quality.
  • What did not work / underperformed
    • Some v2 variants (v2.2, v2.3) did not converge to competitive A01 best times.
    • Multi-action without BC pretrain underperformed the strongest single-action baseline (v2) on final best A01.

[1.5.0] - 2026-03-10

Training

  • Pretrain BC — behavioral cloning pretraining (trackmania_rl.pretrain, scripts/pretrain_bc.py, configs in config_files/pretrain/bc/): single-frame and multi-offset action prediction, optional encoder + action head injection into IQN; pretrain_visual renamed to pretrain
  • Full IQN from BC — option to load full IQN (encoder + float head + A_head + V_head) from BC checkpoint (pretrain_bc_heads_path) for RL warm start
  • Config and schema updates for pretrain (BC, vis), RL defaults, and experiment variants (v3, v4 multi-offset, only_vis, etc.)

Performance

  • Throughput — training pipeline accelerated from ~23K to ~58K samples/second (collector + learner optimizations, buffer and batch handling)

Experiments (summary)

  • BC pretrain: BC + A_head (bc_ah) matches vis-only on final A01 best time (24.47s); encoder-only BC gives fastest early first finish; freezing encoder+A_head hurts, unfreezing with lower lr/epsilon recovers
  • Reward shaping: Engineered rewards (speedslide, neoslide) did not improve best time; baseline without them reached better A01 time (24.53s vs 24.94s)
  • IQN without image head: Float-only IQN is viable but consistently ~0.2–0.3s slower than full IQN on A01; image head improves sample efficiency and final times
  • Full IQN from BC: Documented runs (full_iqn_bc, best_ref, 4explo) and analysis in experiment docs

Added

  • IQN architecture documentation — docs/source/experiments/models/iqn_architecture.rst with high-level and per-block Graphviz diagrams (inputs/outputs, image head, float head, IQN quantile mixing, dueling heads); linked from main_objects; sphinx.ext.graphviz enabled
  • Experiment docs and plots — pretrain BC (behavioral_cloning, per-action accuracy, multi-offset), reward shaping, IQN no image head, full IQN from BC; scripts for analysis and plot generation
  • Misc — data/ added to .gitignore; float-inputs verification, dataset and HF tooling, cleanup scripts

Changed

  • Doc build — the optional doc dependency group now notes that Graphviz (a system package) is required for architecture diagrams; all architecture-page text is in English
  • Config/docs — configuration guide, troubleshooting, tmnf_replays, experiment index and cross-links updated

[1.4.0] - 2026-02-18

Added

  • Pretrain visual backbone — optional visual pretraining (encoder injection into IQN), config and scripts
  • TMNF replays — replay capture and related tooling
  • Config updates and .gitignore for cache/output

[1.3.0] - 2026-02-16

Added

  • Replay capture — scripts and support for capturing replays (e.g. --exclude-respawn-maps), replay_has_respawn fixes
  • Experiments plots and game_env_backend-related updates

Changed

  • Config, docs, and scripts updates

[1.2.0] - 2026-02-01

Added

  • Experiment comparison plots — scripts generate_experiment_plots.py, plot_experiment_comparison.py, experiment_plot_utils.py; JPG graphs (one metric per graph, runs as lines) saved to docs/source/_static/, embedded in experiment RST with :alt: captions
  • Relative-time comparison — analyze_experiment_by_relative_time.py with --plot, compute_comparison_data() for tables and plots; optional --all-scalars, --metrics for arbitrary TensorBoard tags
  • CI: docs build and deploy — GitHub Actions workflow builds Sphinx docs and deploys to GitHub Pages (venv python, doc extra only)
  • doc_exp (Create Experiment) — rule "Embedding plots in RST (quality)": place each image after its metric subsection, add :alt: to every image, and open Detailed TensorBoard Metrics Analysis with an intro sentence

Changed

  • Config — migrated to YAML (config_files/); IQN experiments (uni_19, uni_20) documented
  • Best-time plot Y-axis — scale from min(time) to mean(time) + 1s so initial 300s spike is off-scale and improvement is readable; robust (percentile) scaling kept for loss, Q, finish rate
  • Experiment docs — exploration (uni_12 vs uni_15), temporal_mini_race_duration, training_speed, models/iqn: plots for all experiments (incl. IQN Exp 2–5), intro sentence and alt text for every figure
  • Universal scalar metrics — load_run_metrics(tags_to_load), get_available_scalar_tags(), use_all_scalars / extra_scalar_tags in compute_comparison_data(); scalar tag → kind/unit inferred for any tag
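
The best-time Y-axis rule above (scale from the fastest time to the mean plus one second) can be sketched as a small limit computation; the function name is illustrative:

```python
from statistics import mean

def best_time_ylim(times: list[float]) -> tuple[float, float]:
    """Y-axis limits for best-time plots: from the fastest time to the mean
    plus one second, so the initial 300 s timeout spike sits off-scale."""
    return min(times), mean(times) + 1.0

lo, hi = best_time_ylim([300.0, 31.2, 26.8, 24.9, 24.5])
# lo is the fastest time (24.5); hi ≈ 82.48, so the 300 s spike is clipped
```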

[1.1.0] - 2026-01-27

Added

  • Experiment analysis scripts — scripts/analyze_experiment.py, scripts/analyze_batch_experiment.py, scripts/extract_tensorboard_data.py for extracting and comparing TensorBoard metrics across runs
  • Experiments documentation — docs/source/experiments/ section (training_speed and index) with toctree in main docs

Changed

  • performance_config.py — parameter names aligned with code (e.g. plot_race_time_left_curves, update_inference_network_every_n_actions); force_window_focus_on_input deprecated (focus managed once per map load)
  • training_config.py — extended tensorboard_suffix_schedule, oversample_long_term_steps / oversample_maximum_term_steps, min_horizon_to_update_priority_actions
  • docs/source/index.rst — added Experiments toctree

[1.0.0] - 2026-01-25

Added

  • Modern pyproject.toml with hatchling build backend
  • Support for uv package manager - single command installation
  • Comprehensive User FAQ with 30+ questions
  • Comprehensive Dev FAQ for contributors
  • Hot-reload configuration system documentation

Changed

  • README.md - rewritten as concise English quick-start guide
  • Documentation - significantly expanded and reorganized
    • Installation guide with uv support
    • Extended troubleshooting section
    • Virtual checkpoint creation guide
    • Modular configuration fully documented
  • Dependencies - updated to PyTorch 2.7+, TorchRL 0.6+, CUDA 12.6
  • Build system - migrated from setup.py to modern pyproject.toml
  • Language - all documentation in English

Removed

  • setup.py - replaced by pyproject.toml
  • requirements_pip.txt - merged into pyproject.toml
  • requirements_conda.txt - merged into pyproject.toml
  • Original Linesight branding from README

Fixed

  • NumPy version pinned to 1.26.4 (2.0 breaks numba)
  • Documentation reflects actual modular config structure
  • Installation instructions updated for modern tools

Comparison with Original Linesight

| Aspect        | Original Linesight           | This Fork                |
| ------------- | ---------------------------- | ------------------------ |
| Build System  | setup.py + requirements.txt  | pyproject.toml + uv      |
| Installation  | Multi-step conda/pip         | uv sync                  |
| Dependencies  | requirements files           | pyproject.toml           |
| Documentation | Basic                        | Comprehensive FAQ + guides |
| Language      | Mixed                        | English                  |
| Focus         | Research project             | Personal experimentation |

Original Linesight

This is a fork of the original Linesight project: