All notable changes to this project are documented here.
The format is based on Keep a Changelog.
- Added
  - `training.algorithm` values `dpo` and `grpo` with the same actor-critic wiring as PPO (`ppo_wiring`), shared rollout contract, and `weights1.torch` checkpoints.
  - New modules: `trackmania_rl/agents/policy_optimization/dpo.py`, `grpo.py`; learners `learner_dpo.py`, `learner_grpo.py`; shared `policy_rollout_batch.py` for PPO/DPO/GRPO tensor builds.
  - Config: `DPOConfig`/`GRPOConfig` on `RulkaConfig`, flat `ConfigView` access, example YAML `config_files/rl/config_dpo.yaml` and `config_grpo.yaml`.
  - Utilities: `POLICY_OPTIMIZATION_ALGORITHMS`/`is_policy_optimization_algorithm` for collector, `train.py`, HF save skip, BC inject into `save/` for the whole policy family.
  - Helper script `scripts/dpo_append_offline_pair.py` for offline DPO JSONL lines.
  - Docs: README, `config_files/README.md`, Sphinx `configuration_guide`/`project_structure` updated for algorithms and new sections.
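The offline DPO helper script appends preference pairs as JSONL lines. A minimal sketch of what such an append might look like; the field names (`state`, `chosen`, `rejected`) are illustrative assumptions, not the script's actual schema:

```python
import json
from pathlib import Path


def append_offline_pair(jsonl_path, prompt_state, chosen_actions, rejected_actions):
    """Append one offline DPO preference pair as a single JSONL line.

    Field names are hypothetical; the real script defines its own schema.
    """
    record = {
        "state": prompt_state,          # observation/context shared by both rollouts
        "chosen": chosen_actions,       # preferred action sequence
        "rejected": rejected_actions,   # dispreferred action sequence
    }
    with Path(jsonl_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

One JSON object per line keeps the file append-only and streamable, which suits incremental offline data collection.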
- Added algorithm wiring abstractions (`trackmania_rl/agents/algorithms/`) to decouple training loop orchestration from IQN-specific implementation details.
- Introduced PPO stack: PPO wiring, learner path, policy optimization modules, and reusable policy model builders under `trackmania_rl/agents/policy_models/` and `trackmania_rl/agents/policy_optimization/`.
- Added new RL configs for PPO and transformer/CNN variants (`config_files/rl/config_ppo*.yaml`, `config_files/rl/config_btr_post_concat_cnn_transformer.yaml`) and extended schema/loader support for NN topology selection.
- Expanded BC pretrain and policy-loading paths (`trackmania_rl/pretrain/train_bc.py`, `trackmania_rl/pretrain/rl_policy_factory.py`) to support RL policy initialization and PPO-compatible checkpoints.
- Added/updated preprocessing, dataset, model, and export logic for multi-modal inputs and topology-aware construction.
- Added `trackmania_rl/pretrain/lightning_compat.py` to stabilize Lightning integration boundaries.
- Updated collector/learner process integration and training entrypoints (`scripts/train.py`, `trackmania_rl/multiprocess/*`) to support algorithm pluggability and the new PPO flow.
- Added parameter-freeze helpers and soft-copy controls (`trackmania_rl/param_freeze.py`) for safer transfer/fine-tuning workflows.
- Added vectorized reward utilities (`trackmania_rl/reward_vectorized.py`) and utility refactors supporting new training modes.
- Added focused tests for PPO math/wiring, config migrations, NN schema, BC-to-RL policy loading, and parameter freeze behavior.
- Expanded architecture and experiment docs with a PPO baseline write-up (`docs/source/experiments/ppo_vs_iqn_baseline.rst`), a PPO architecture page, and an NN topology catalog.
- Refreshed configuration/troubleshooting/installation and experiment pages to document new training options and expected workflows.
- `analyze_experiment_by_relative_time.py`: `--time-axis auto` (default) uses `cumul_training_hours` when all runs log it, else wall minutes; explicit `cumul_training_hours`/`wall_minutes` still available.
- `audit_tensorboard_training_timeline.py`: compares merged TB wall span vs `cumul_training_hours` per run (detects misleading wall-axis docs).
- `generate_experiment_plots.py`: default `--time-axis auto` tries cumulative training hours and falls back to wall minutes if `cumul_training_hours` is missing from any run; regenerated `docs/source/_static/exp_*.jpg` accordingly.
- Experiment docs: time-axis methodology; new `docs/source/experiments/time_axis_conventions.rst` audit table (wall vs `cumul_training_hours`); `multi_action_prediction.rst` recomputed training-hour checkpoints for v2 vs v3.1_bc; `pretrain_bc.rst` run lengths corrected for full_iqn_bc / _2 / _3 and collapse narrative clarified (wall vs active training).
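The `--time-axis auto` selection described above can be sketched as a small helper; this is an illustrative reimplementation of the stated rule, not the scripts' actual code, and the `runs` mapping (run name to available TensorBoard scalar tags) is an assumed input shape:

```python
def resolve_time_axis(runs, requested="auto"):
    """Pick the time axis for cross-run comparison plots.

    Rule sketched from the changelog entry above: under 'auto', use
    'cumul_training_hours' only when *every* run logs that scalar,
    otherwise fall back to wall-clock minutes. An explicit request
    ('cumul_training_hours' or 'wall_minutes') is honored as-is.
    """
    if requested != "auto":
        return requested
    if runs and all("cumul_training_hours" in tags for tags in runs.values()):
        return "cumul_training_hours"
    return "wall_minutes"
```

Requiring the scalar in all runs avoids silently comparing one run on training hours against another on wall time.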
- Added a dedicated BTR A01 comparison write-up: `docs/source/experiments/models/btr.rst`.
- Generated and embedded BTR comparison plots in `docs/source/_static/` (prefixed `exp_btr_A01_v2_v4*`).
- Added the BTR config used for the experiment runs: `config_files/rl/config_btr.yaml`.
- Reorganized model documentation paths so architectures live under `docs/source/models/` (moved/added `models/iqn_architecture.rst` and `models/btr_architecture.rst`).
- Added optional CPU pinning for `TmForever.exe` via config (new fields in `config_files/config_schema.py` and `config_files/rl/config_default.yaml`).
- Hardened prioritized replay: when TorchRL prioritized trees are unavailable on Windows builds, the code falls back to a uniform `RandomSampler`.
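The replay fallback described above can be sketched as a factory that tries TorchRL's prioritized sampler and degrades to uniform sampling. This is a hedged illustration, not the project's actual code: the real fallback uses TorchRL's own `RandomSampler`, replaced here by a tiny stand-in so the sketch runs even where TorchRL is absent, and `make_sampler`/its capacity argument are hypothetical names:

```python
import random


class UniformSampler:
    """Minimal uniform-sampling stand-in for the fallback path."""

    def sample(self, buffer_len, batch_size):
        # Uniformly sample distinct indices from the buffer.
        return random.sample(range(buffer_len), k=min(batch_size, buffer_len))


def make_sampler(capacity, alpha=0.6, beta=0.4):
    """Prefer TorchRL's PrioritizedSampler; fall back to uniform sampling.

    Mirrors the hardening above: on builds where the prioritized
    sum-tree machinery cannot be loaded, training continues with
    uniform replay instead of crashing.
    """
    try:
        from torchrl.data.replay_buffers.samplers import PrioritizedSampler
        return PrioritizedSampler(max_capacity=capacity, alpha=alpha, beta=beta)
    except Exception:  # ImportError, or native tree extension missing
        return UniformSampler()
```

Catching the failure at sampler-construction time keeps the rest of the replay buffer wiring unchanged between the two modes.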
- Multi-action prediction pipeline finalized for RL + BC: multi-offset heads, decision-block training path, and related config/schema updates.
- Experiment validation tooling hardened to avoid partial conclusions:
  - TensorBoard run continuation chunks are now treated as one run (`run`, `run_2`, `run_3`, ...).
  - Analysis checks were expanded and documented so relative-time and by-step comparisons are consistently reproduced.
- Documentation command/rules for experiment write-ups were updated with stricter verification steps (suffix-merge checks, save-state cross-checks, direct pairwise checks for top runs).
- Added/updated experiment pages for:
  - Global schedule speed (`A01_as20_long_v2` series),
  - Multi-action prediction (`A01_as20_long_v3`/`v3.1`/`v3.1_pretrained_bc`),
  - BC multi-offset pretrain comparisons.
- Regenerated and committed comparison plots (relative-time and by-step) for updated experiment analyses.
- What worked
  - `global_schedule_speed = 4` was the strongest setting in the checked long A01 runs, with the best final saved time in the v2 series.
  - Multi-action RL + BC pretrain improved stability and peak time over non-pretrained multi-action runs.
  - Temporal/action alignment between BC pretrain and multi-action RL (5-action offset structure) improved transfer quality.
- What did not work / underperformed
  - Some v2 variants (`v2.2`, `v2.3`) did not converge to competitive A01 best times.
  - Multi-action without BC pretrain underperformed the strongest single-action baseline (`v2`) on final best A01.
- Pretrain BC — behavioral cloning pretraining (`trackmania_rl.pretrain`, `scripts/pretrain_bc.py`, configs in `config_files/pretrain/bc/`): single-frame and multi-offset action prediction, optional encoder + action head injection into IQN; `pretrain_visual` renamed to `pretrain`
- Full IQN from BC — option to load the full IQN (encoder + float head + A_head + V_head) from a BC checkpoint (`pretrain_bc_heads_path`) for RL warm start
- Config and schema updates for pretrain (BC, vis), RL defaults, and experiment variants (v3, v4 multi-offset, only_vis, etc.)
- Throughput — training pipeline accelerated from ~23K to ~58K samples/second (collector + learner optimizations, buffer and batch handling)
- BC pretrain: BC + A_head (bc_ah) matches vis-only on final A01 best time (24.47s); encoder-only BC gives fastest early first finish; freezing encoder+A_head hurts, unfreezing with lower lr/epsilon recovers
- Reward shaping: Engineered rewards (speedslide, neoslide) did not improve best time; baseline without them reached better A01 time (24.53s vs 24.94s)
- IQN without image head: Float-only IQN is viable but consistently ~0.2–0.3s slower than full IQN on A01; image head improves sample efficiency and final times
- Full IQN from BC: Documented runs (full_iqn_bc, best_ref, 4explo) and analysis in experiment docs
- IQN architecture documentation — `docs/source/experiments/models/iqn_architecture.rst` with high-level and per-block Graphviz diagrams (inputs/outputs, image head, float head, IQN quantile mixing, dueling heads); linked from main_objects; `sphinx.ext.graphviz` enabled
- Experiment docs and plots — pretrain BC (behavioral_cloning, per-action accuracy, multi-offset), reward shaping, IQN no image head, full IQN from BC; scripts for analysis and plot generation
- Misc — `data/` added to `.gitignore`; float inputs verification, dataset and HF tooling, cleanup scripts
- Doc build — optional dependency group `doc`; note: Graphviz (system) required for architecture diagrams; all architecture page text in English
- Config/docs — configuration guide, troubleshooting, tmnf_replays, experiment index and cross-links updated
- Pretrain visual backbone — optional visual pretraining (encoder injection into IQN), config and scripts
- TMNF replays — replay capture and related tooling
- Config updates and `.gitignore` for cache/output
- Replay capture — scripts and support for capturing replays (e.g. `--exclude-respawn-maps`), `replay_has_respawn` fixes
- Experiment plots and game_env_backend-related updates
- Config, docs, and scripts updates
- Experiment comparison plots — scripts `generate_experiment_plots.py`, `plot_experiment_comparison.py`, `experiment_plot_utils.py`; JPG graphs (one metric per graph, runs as lines) saved to `docs/source/_static/`, embedded in experiment RST with `:alt:` captions
- Relative-time comparison — `analyze_experiment_by_relative_time.py` with `--plot`, `compute_comparison_data()` for tables and plots; optional `--all-scalars`, `--metrics` for arbitrary TensorBoard tags
- CI: docs build and deploy — GitHub Actions workflow builds Sphinx docs and deploys to GitHub Pages (venv python, doc extra only)
- doc_exp (Create Experiment) — rule "Embedding plots in RST (quality)": place image after metric subsection, `:alt:` on every image, intro sentence in Detailed TensorBoard Metrics Analysis
- Config — migrated to YAML (`config_files/`); IQN experiments (uni_19, uni_20) documented
- Best-time plot Y-axis — scale from min(time) to mean(time) + 1s so the initial 300s spike is off-scale and improvement is readable; robust (percentile) scaling kept for loss, Q, finish rate
- Experiment docs — exploration (uni_12 vs uni_15), temporal_mini_race_duration, training_speed, models/iqn: plots for all experiments (incl. IQN Exp 2–5), intro sentence and alt text for every figure
- Universal scalar metrics — `load_run_metrics(tags_to_load)`, `get_available_scalar_tags()`, `use_all_scalars`/`extra_scalar_tags` in `compute_comparison_data()`; scalar tag → kind/unit inferred for any tag
- Experiment analysis scripts — `scripts/analyze_experiment.py`, `scripts/analyze_batch_experiment.py`, `scripts/extract_tensorboard_data.py` for extracting and comparing TensorBoard metrics across runs
- Experiments documentation — `docs/source/experiments/` section (training_speed and index) with toctree in main docs
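The best-time Y-axis convention noted above (scale from min(time) to mean(time) + 1s so the initial 300s spike sits off-scale) can be sketched as a small limit helper; the function name is illustrative, not taken from the plotting scripts:

```python
def best_time_ylim(times):
    """Y-axis limits for best-time plots, per the convention above.

    Lower bound: the best (minimum) time seen.
    Upper bound: mean(time) + 1 s, so early 300 s timeout spikes
    land above the visible range and late improvements stay readable.
    """
    lo = min(times)
    hi = sum(times) / len(times) + 1.0
    return lo, hi
```

A usage note: with run times like `[300.0, 30.0, 26.0, 24.5]`, the mean-based upper bound clips the 300 s outlier while keeping the 24-30 s band visible, which percentile scaling alone would not guarantee for a single extreme spike.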
- performance_config.py — parameter names aligned with code (e.g. `plot_race_time_left_curves`, `update_inference_network_every_n_actions`); `force_window_focus_on_input` deprecated (focus managed once per map load)
- training_config.py — extended `tensorboard_suffix_schedule`, `oversample_long_term_steps`/`oversample_maximum_term_steps`, `min_horizon_to_update_priority_actions`
tensorboard_suffix_schedule,oversample_long_term_steps/oversample_maximum_term_steps,min_horizon_to_update_priority_actions - docs/source/index.rst — added Experiments toctree
- Modern `pyproject.toml` with hatchling build backend
- Support for the `uv` package manager - single-command installation
- Comprehensive User FAQ with 30+ questions
- Comprehensive Dev FAQ for contributors
- Hot-reload configuration system documentation
- README.md - rewritten as concise English quick-start guide
- Documentation - significantly expanded and reorganized
- Installation guide with uv support
- Extended troubleshooting section
- Virtual checkpoint creation guide
- Modular configuration fully documented
- Dependencies - updated to PyTorch 2.7+, TorchRL 0.6+, CUDA 12.6
- Build system - migrated from setup.py to modern pyproject.toml
- Language - all documentation in English
- `setup.py` - replaced by pyproject.toml
- `requirements_pip.txt` - merged into pyproject.toml
- `requirements_conda.txt` - merged into pyproject.toml
- Original Linesight branding from README
- NumPy version pinned to 1.26.4 (2.0 breaks numba)
- Documentation reflects actual modular config structure
- Installation instructions updated for modern tools
| Aspect | Original Linesight | This Fork |
|---|---|---|
| Build System | setup.py + requirements.txt | pyproject.toml + uv |
| Installation | Multi-step conda/pip | uv sync |
| Dependencies | requirements files | pyproject.toml |
| Documentation | Basic | Comprehensive FAQ + guides |
| Language | Mixed | English |
| Focus | Research project | Personal experimentation |
This is a fork of the original Linesight project: