feat: add LeRobot imitation learning pipelines for OSMO and Azure ML #165

Merged
akzaidi merged 11 commits into main from feat/lerobot-il
Feb 12, 2026

Conversation

akzaidi (Contributor) commented Feb 12, 2026

PR Soundtrack: Gorillaz - White Flag

Summary

Adds end-to-end LeRobot imitation learning support across OSMO and Azure ML, covering training (multiple data sources), checkpoint management, and edge inference.

What Changed

Training Pipelines

  • OSMO workflows for ACT policy training from three data sources: HuggingFace Hub, Azure Blob Storage, and OSMO-managed datasets (lerobot-train.yaml, lerobot-train-dataset.yaml)
  • Azure ML workflow (lerobot-train.yaml) with MLflow experiment tracking, system metrics, and model registration
  • Consolidated workflow using base64-encoded zip payload pattern (matching IsaacLab workflow convention) instead of fragile git clone
  • Submission scripts (submit-osmo-lerobot-training.sh, submit-azureml-lerobot-training.sh) with full CLI for dataset, policy, and compute configuration
  • Pipeline script (run-lerobot-pipeline.sh) for train → evaluate → register flow
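
The base64-encoded zip payload pattern from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the function names encode_payload and extract_payload are hypothetical, and the actual submission script and workflow may package src/training/ differently.

```python
import base64
import io
import zipfile
from pathlib import Path


def encode_payload(source_dir: Path) -> str:
    """Zip source_dir in memory and return the archive as a base64 string."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so extraction works anywhere.
                archive.write(path, path.relative_to(source_dir).as_posix())
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def extract_payload(encoded: str, target_dir: Path) -> None:
    """Decode a base64 zip payload and unpack it into target_dir."""
    if not encoded:
        # Fail early instead of extracting an empty archive.
        raise ValueError("payload is empty")
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(base64.b64decode(encoded))) as archive:
        archive.extractall(target_dir)
```

Because the encoded string travels inside the workflow definition, a change to the training code only requires resubmitting the workflow, which is the benefit the description attributes to this pattern over cloning the repository at run time.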

Training Modules (src/training/scripts/lerobot/)

| Module | Purpose |
|---|---|
| train.py | Core training loop with MLflow logging and configurable hyperparameters |
| checkpoints.py | Checkpoint upload and model registration via Azure ML SDK |
| download_dataset.py | Dataset acquisition from HuggingFace Hub or Azure Blob Storage |
| bootstrap.py | Environment setup, dependency installation, payload extraction |

Inference

  • PolicyRunner — framework-agnostic wrapper for ACT policy inference with normalization stats
  • robot_types.py — observation and command data classes with UR10E joint mapping
  • act_inference_node.py — ROS2 node with dry-run safety gate and JointTrajectory publishing
  • OSMO inference workflow (lerobot-infer.yaml) and submission script
  • Offline test script (test-lerobot-inference.py) for validating policy output shape and normalization
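
To illustrate what "inference with normalization stats" involves, here is a minimal sketch of a runner that normalizes observations, calls a policy, and maps the output back to joint units. It is not the PolicyRunner from this PR: NormStats, SketchPolicyRunner, and the mean/std scheme are assumptions made for illustration, and the policy is any callable taking and returning a list of floats.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class NormStats:
    """Per-dimension mean and standard deviation (hypothetical container)."""
    mean: Sequence[float]
    std: Sequence[float]


def normalize(values: Sequence[float], stats: NormStats, eps: float = 1e-8) -> List[float]:
    """Shift and scale raw values into the range the policy was trained on."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, stats.mean, stats.std)]


def unnormalize(values: Sequence[float], stats: NormStats, eps: float = 1e-8) -> List[float]:
    """Invert normalize() to recover values in physical units."""
    return [v * (s + eps) + m for v, m, s in zip(values, stats.mean, stats.std)]


class SketchPolicyRunner:
    """Wraps a policy callable with input and output normalization."""

    def __init__(
        self,
        policy: Callable[[List[float]], List[float]],
        obs_stats: NormStats,
        action_stats: NormStats,
    ) -> None:
        self.policy = policy
        self.obs_stats = obs_stats
        self.action_stats = action_stats

    def step(self, joint_positions: Sequence[float]) -> List[float]:
        normalized = normalize(joint_positions, self.obs_stats)
        raw_action = self.policy(normalized)  # policy works in normalized space
        return unnormalize(raw_action, self.action_stats)
```

Keeping the runner free of ROS2 imports is what would let the same class be exercised by an offline test script and by a robot node, which matches the "framework-agnostic" framing above.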

Checkpoint Management

  • Replaced mlflow.register_model with MLClient.models.create_or_update to avoid azureml_artifacts_builder tracking URI bug
  • Shared _get_aml_client and _register_model_via_aml helpers for consistent registration
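
A rough sketch of registering a checkpoint directory through the Azure ML SDK rather than mlflow.register_model is shown below. The helper names echo the ones listed above, but their signatures and bodies here are assumptions, not the code in checkpoints.py. The Azure imports are deferred so the module can be loaded without the SDK installed.

```python
from typing import Any, Dict


def build_model_spec(name: str, checkpoint_dir: str, description: str = "") -> Dict[str, Any]:
    """Collect the fields passed to azure.ai.ml.entities.Model (hypothetical helper)."""
    return {
        "name": name,
        "path": checkpoint_dir,   # local folder or datastore URI holding the checkpoint
        "type": "custom_model",   # generic asset type for non-MLflow model folders
        "description": description,
    }


def get_aml_client(subscription_id: str, resource_group: str, workspace_name: str):
    """Create an MLClient using the default Azure credential chain."""
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace_name,
    )


def register_model_via_aml(ml_client, spec: Dict[str, Any]):
    """Register (or create a new version of) the model described by spec."""
    from azure.ai.ml.entities import Model

    return ml_client.models.create_or_update(Model(**spec))
```

Going through MLClient keeps registration independent of the MLflow tracking URI, which is the reason the description gives for the change.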

Documentation

  • docs/lerobot-inference.md — inference setup with AML and HuggingFace model pull instructions
  • Updated scripts/README.md, workflows/README.md, workflows/osmo/README.md, and workflows/azureml/README.md with LeRobot usage

Files Changed

 23 files changed, 4520 insertions(+), 79 deletions(-)

New files (19): Training modules, inference modules, workflows, submission scripts, docs
Modified files (4): READMEs, VS Code settings

Key Design Decisions

  • MLflow-only logging — removed WANDB support in favor of MLflow with system metrics for consistency with the existing Azure ML integration
  • Payload packaging — training source (src/training/) is base64-zip encoded into the workflow YAML, avoiding container image rebuilds for training logic changes
  • Azure ML SDK for registration — direct MLClient usage instead of MLflow's register_model to work around tracking URI limitations in OSMO environments
  • Dry-run gate on inference — ROS2 node requires explicit opt-in before publishing joint commands to physical hardware
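
The dry-run gate in the last item can be pictured with a small sketch like the one below. It deliberately avoids ROS2 so it runs anywhere. CommandGate and enable_publish are hypothetical names, and the real node may expose the opt-in differently (for example as a node parameter).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CommandGate:
    """Forwards joint commands only after an explicit opt-in."""

    publish: Callable[[Sequence[float]], None]
    enable_publish: bool = False  # dry-run is the default
    dropped: int = 0              # commands computed but not sent

    def send(self, joint_positions: Sequence[float]) -> bool:
        """Return True if the command was published, False if it was held back."""
        if not self.enable_publish:
            self.dropped += 1
            return False
        self.publish(joint_positions)
        return True
```

Defaulting to dry-run means a misconfigured launch computes and logs actions without moving the arm. Publishing has to be switched on deliberately.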

- add OSMO workflows for ACT training (HF Hub, Azure Blob, OSMO dataset sources) and inference
- add Azure ML workflow and submission script for LeRobot training with MLflow integration
- add end-to-end pipeline script for train → evaluate → register flow
- add offline inference test script with pre/post processor normalization
- update scripts and workflows README docs with LeRobot usage

🤖 - Generated by Copilot
- add robot observation and command data classes with UR10E joint mapping
- add framework-agnostic PolicyRunner wrapping ACT policy with normalization
- add ROS2 inference node with dry-run safety gate and JointTrajectory publishing
- add inference documentation with AML and HuggingFace model pull instructions

🤖 - Generated by Copilot
- add --from-blob, --storage-account, --blob-prefix to submit script
- add MLflow metric logging and checkpoint upload to azure-data workflow
- decouple blob dataset container from log storage container env var
- remove ad-hoc submit-lerobot-training.sh script

🚀 - Generated by Copilot
- remove azure.ai.ml and azure.identity dependencies from training workflows
- use mlflow.log_artifacts and mlflow.register_model for checkpoint registration
- pass active MLflow run to upload functions instead of standalone ML client
- update registration step to use MLflow tracking URI and experiment context
- remove unused threading and datetime imports

🔄 - Generated by Copilot
… packaging

- merge lerobot-train-azure-data.yaml into lerobot-train.yaml with conditional blob handling
- replace fragile git clone with base64-encoded zip payload pattern matching IsaacLab workflow
- remove WANDB support in favor of MLflow-only logging with system metrics
- delegate training logic to Python modules via packaged src/training payload

🔧 - Generated by Copilot
…er_model

- replace mlflow.register_model with MLClient.models.create_or_update to avoid azureml_artifacts_builder tracking_uri bug
- extract shared _get_aml_client and _register_model_via_aml helpers
- simplify upload_checkpoints_to_azure_ml to use shared helper

🐛 - Generated by Copilot
- accept main's markdownlint-cli2 config and chat location settings
- keep [json] editor settings from feature branch
- adopt main's table formatting style in workflows README

github-actions bot commented Feb 12, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

akzaidi marked this pull request as ready for review February 12, 2026 01:12

Copilot AI left a comment

Pull request overview

Adds end-to-end LeRobot imitation learning support across OSMO and Azure ML by introducing new workflow templates, submission/pipeline scripts, and Python modules for training, dataset acquisition, checkpoint registration, and ACT policy inference.

Changes:

  • Added OSMO workflows for LeRobot training (inline payload + dataset mount) and evaluation/optional AML model registration.
  • Added Azure ML command-job template + submission script for LeRobot training with environment registration.
  • Introduced new Python modules for LeRobot training orchestration, blob dataset download/prep, checkpoint upload/registration, and ACT inference (including a ROS2 node), plus updated docs/READMEs and VS Code settings.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 15 comments.

| File | Description |
|---|---|
| workflows/osmo/lerobot-train.yaml | OSMO LeRobot training workflow using inline base64 payload and MLflow logging. |
| workflows/osmo/lerobot-train-dataset.yaml | OSMO LeRobot training workflow using an OSMO dataset mount (includes WANDB/MLflow toggles). |
| workflows/osmo/lerobot-infer.yaml | OSMO evaluation workflow that downloads a policy and optionally registers it to AML. |
| workflows/osmo/README.md | Documents new OSMO LeRobot workflows and submission commands. |
| workflows/azureml/lerobot-train.yaml | AzureML command-job template for LeRobot training submission. |
| workflows/azureml/README.md | Documents AzureML LeRobot training template and usage. |
| workflows/README.md | Updates workflow directory overview + adds LeRobot examples/sections. |
| src/training/scripts/lerobot/train.py | MLflow-wrapping training orchestrator that parses logs and uploads/registers checkpoints. |
| src/training/scripts/lerobot/download_dataset.py | Azure Blob dataset download + dataset fixes (stats/timestamps). |
| src/training/scripts/lerobot/checkpoints.py | Checkpoint artifact upload and AML model registration helpers. |
| src/training/scripts/lerobot/bootstrap.py | AML MLflow bootstrap + HuggingFace authentication helpers. |
| src/training/scripts/lerobot/__init__.py | Package init for LeRobot training scripts. |
| src/inference/scripts/act_inference_node.py | ROS2 node for running ACT inference and optionally publishing joint commands. |
| src/inference/robot_types.py | Observation/command dataclasses for ACT inference integration. |
| src/inference/policy_runner.py | Framework-agnostic ACT policy runner that normalizes inputs and produces joint commands. |
| scripts/test-lerobot-inference.py | Offline ACT inference validation script against dataset observations. |
| scripts/submit-osmo-lerobot-training.sh | Submits OSMO LeRobot training workflow and packages training payload inline. |
| scripts/submit-osmo-lerobot-inference.sh | Submits OSMO LeRobot evaluation workflow with optional AML registration. |
| scripts/submit-azureml-lerobot-training.sh | Registers AzureML environment and submits LeRobot training job via az ml job create. |
| scripts/run-lerobot-pipeline.sh | Orchestrates train → wait/poll → evaluate → optional registration in OSMO. |
| scripts/README.md | Adds LeRobot scripts and pipeline usage documentation. |
| docs/lerobot-inference.md | New documentation for offline inference + ROS2 deployment. |
| .vscode/settings.json | Adds excludes for datasets/cache folders and tweaks JSON editor settings. |
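
The "wait/poll" step listed for scripts/run-lerobot-pipeline.sh could look roughly like the sketch below, written in Python for illustration even though the actual script is shell. The status strings and the get_status callable are assumptions. How the real script queries OSMO is not shown in this PR summary.

```python
import time
from typing import Callable, Iterable


def wait_for_terminal_status(
    get_status: Callable[[], str],
    terminal: Iterable[str] = ("COMPLETED", "FAILED"),  # assumed status names
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow still in status {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A pipeline script would start evaluation only when the returned status indicates success, and stop otherwise.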

- fix f-string quoting bug and add subscription param in azureml submission script
- replace empty string defaults with 'none' sentinels in azureml job template
- add ENCODED_ARCHIVE guard and jq dependency for osmo workflows
- fix DefaultAzureCredential to use workload identity in download_dataset.py
- remove WANDB references from docs, add front matter to lerobot-inference.md

🔧 - Generated by Copilot
- switch table separators to compact |---| style matching main
- restore OSMO inference parameters table rows
- remove lerobot-train-dataset.yaml from directory tree

📝 - Generated by Copilot
- remove extra space after ## in LeRobot Inference heading (MD019)
- rename duplicate Inference Parameters heading to OSMO Inference Parameters (MD024)

📝 - Generated by Copilot
akzaidi merged commit baef32d into main Feb 12, 2026
8 checks passed
akzaidi deleted the feat/lerobot-il branch February 12, 2026 23:05
WilliamBerryiii added this to the v0.3.0 milestone Mar 1, 2026
4 participants