
feat(scripts): Add TOML linting workflow #26

Merged
agreaves-ms merged 1 commit into main from feat/rsl_train_osmo on Dec 11, 2025

Conversation


@akzaidi akzaidi commented Dec 11, 2025

RSL_RL Backend for OSMO

PR Soundtrack: Little Brother - Altitudes

Summary

This pull request introduces support for selecting the training backend (between skrl and rsl_rl) for the OSMO training workflow scripts, and updates the RSL-RL integration to be compatible with RSL-RL 3.x, including proper handling of the new TensorDict observation format. It also updates dependencies and improves environment variable and argument handling for consistency.

Training backend selection:

  • Added a new --backend/-b CLI argument and TRAINING_BACKEND environment variable to scripts/submit-osmo-training.sh, allowing users to select the training backend (skrl or rsl_rl). This value is passed through the workflow and made available in training jobs.

RSL-RL 3.x compatibility:

  • Updated the RSL-RL dependency to version 3.1.2 and added tensordict as a required dependency.
  • Introduced a RslRl3xCompatWrapper in train.py to ensure all environment observations conform to the TensorDict format required by RSL-RL 3.x, supporting backward compatibility with older wrappers.
  • Updated the runner configuration and instantiation logic to ensure required fields (like obs_groups and class_name) are present and correct for RSL-RL 3.x.
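The coercion the wrapper performs can be sketched in a dependency-free form. Plain dicts and lists stand in for tensordict.TensorDict and torch.Tensor here, and the function name normalize_obs is illustrative, not the repository's actual method:

```python
# Dependency-free sketch of the observation normalization described above.
# Plain dicts/lists stand in for TensorDict/torch.Tensor so the control
# flow can be shown without torch; names are assumptions from the PR text.

def normalize_obs(obs):
    """Coerce the possible observation shapes into a single grouped form."""
    if isinstance(obs, dict):
        # Already grouped observations: pass through unchanged
        return dict(obs)
    if isinstance(obs, tuple):
        # (obs, extras) pairs from older wrappers: keep only the observations
        obs = obs[0]
        return dict(obs) if isinstance(obs, dict) else {"policy": obs}
    # Single "tensor": wrap as the default 'policy' observation group
    return {"policy": obs}
```

In the real wrapper the resulting mapping would additionally be packed into a TensorDict with the environment's batch size.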

Argument and environment improvements:

  • Added support for --checkpoint-mode and --register-checkpoint arguments in launch_rsl_rl.py for improved checkpoint handling.

🤖 - Generated by Copilot

…ackend selection

* add RslRl3xCompatWrapper for TensorDict observation format in RSL-RL 3.x
* add --backend/-b flag to select training backend (skrl, rsl_rl)
* upgrade rsl-rl-lib to 3.1.2 with tensordict dependency
* pass training_backend parameter through OSMO workflow template

🤖 - Generated by Copilot
Copilot AI review requested due to automatic review settings December 11, 2025 00:10
@agreaves-ms agreaves-ms merged commit 4986caa into main Dec 11, 2025
5 checks passed
@agreaves-ms agreaves-ms deleted the feat/rsl_train_osmo branch December 11, 2025 00:11

Copilot AI left a comment


Pull request overview

This pull request introduces support for selecting between skrl and rsl_rl training backends in the OSMO workflow and upgrades RSL-RL to version 3.1.2 with TensorDict observation format compatibility. The changes enable flexible backend selection while maintaining backward compatibility through a new compatibility wrapper that handles the TensorDict requirements of RSL-RL 3.x.

Key changes:

  • Added --backend CLI flag and TRAINING_BACKEND environment variable to select between skrl (default) and rsl_rl backends
  • Upgraded rsl-rl-lib from 3.0.1 to 3.1.2 and added tensordict dependency for RSL-RL 3.x compatibility
  • Implemented RslRl3xCompatWrapper to ensure observations conform to TensorDict format required by RSL-RL 3.x

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Summary per file:

  • workflows/osmo/train.yaml: Added training_backend parameter to workflow template with default value "skrl"
  • src/training/scripts/rsl_rl/train.py: Upgraded to RSL-RL 3.x with TensorDict import, RslRlOnPolicyRunnerCfg, and new RslRl3xCompatWrapper class for observation format compatibility
  • src/training/scripts/launch_rsl_rl.py: Added --checkpoint-mode and --register-checkpoint CLI arguments for improved checkpoint handling
  • src/training/requirements.txt: Upgraded rsl-rl-lib to 3.1.2 and added tensordict>=0.7.0 dependency
  • scripts/submit-osmo-training.sh: Added --backend/-b flag for training backend selection with environment variable support


# Convert config to dict and ensure RSL-RL 3.x required fields are present
agent_cfg_dict = agent_cfg.to_dict()
# RSL-RL 3.x requires 'obs_groups' with a 'policy' key mapping to observation group names

Copilot AI Dec 11, 2025


The comment states "RSL-RL 3.x requires 'obs_groups' with a 'policy' key mapping to observation group names" but the default value sets it to {"policy": ["policy"]}, which maps "policy" to a list containing "policy". The comment could be clearer about what this structure represents - specifically that the outer key is the policy name and the inner list contains the observation group names to use for that policy.

Suggested change
# RSL-RL 3.x requires 'obs_groups' with a 'policy' key mapping to observation group names
# RSL-RL 3.x requires 'obs_groups' to be a dict mapping each policy name (outer key)
# to a list of observation group names (inner list) to use for that policy.
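The structure the suggestion describes can be shown with a small illustrative value (the "critic" entry is a hypothetical asymmetric-critic example, not from this PR):

```python
# Illustrative obs_groups value matching the structure described above:
# each outer key names a network and the inner list names the observation
# groups routed to it. The commented "critic" entry is hypothetical.
obs_groups = {
    "policy": ["policy"],  # the policy network reads the 'policy' obs group
    # "critic": ["policy", "privileged"],  # e.g. an asymmetric critic
}

# Every entry maps a string key to a list of group-name strings.
for net, groups in obs_groups.items():
    assert isinstance(net, str)
    assert isinstance(groups, list) and all(isinstance(g, str) for g in groups)
```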

Comment on lines +172 to +175
-b|--backend)
    BACKEND_VALUE="$2"
    shift 2
    ;;

Copilot AI Dec 11, 2025


The --backend argument accepts any arbitrary string value without validation. Based on the documentation (line 38), only "skrl" and "rsl_rl" are valid backends. Consider adding validation after the argument parsing to ensure BACKEND_VALUE is one of the supported values, similar to how checkpoint_mode is validated via normalize_checkpoint_mode().

-rsl-rl-lib==3.0.1
+rsl-rl-lib==3.1.2
 skrl>=1.4.3
+tensordict>=0.7.0 # Required by RSL-RL 3.x for TensorDict observations

Copilot AI Dec 11, 2025


The tensordict dependency uses a minimum version constraint (>=0.7.0) while most other dependencies in this file use exact version pins (==). For consistency and reproducibility, consider using an exact version pin like tensordict==0.7.0 or tensordict==0.8.0, matching the pinning pattern used for other dependencies such as rsl-rl-lib, azure-core, and pyperclip.

Suggested change
tensordict>=0.7.0 # Required by RSL-RL 3.x for TensorDict observations
tensordict==0.7.0 # Required by RSL-RL 3.x for TensorDict observations

Comment on lines +212 to +221
    return TensorDict(obs, batch_size=[self._env.num_envs])
if isinstance(obs, torch.Tensor):
    # Single tensor - wrap as 'policy' observation group
    return TensorDict({"policy": obs}, batch_size=[self._env.num_envs])
if isinstance(obs, tuple):
    # Tuple (obs_tensor, extras) from older wrappers
    obs_data = obs[0]
    if isinstance(obs_data, dict):
        return TensorDict(obs_data, batch_size=[self._env.num_envs])
    return TensorDict({"policy": obs_data}, batch_size=[self._env.num_envs])

Copilot AI Dec 11, 2025


The TensorDict initialization uses batch_size=[self._env.num_envs] but TensorDict expects batch_size to be a torch.Size or tuple representing the batch dimensions, not a list. While lists are implicitly converted, it would be more explicit and correct to use batch_size=torch.Size([self._env.num_envs]) or batch_size=(self._env.num_envs,). This same issue appears in lines 212, 215, 220, and 221.

Suggested change
    return TensorDict(obs, batch_size=[self._env.num_envs])
if isinstance(obs, torch.Tensor):
    # Single tensor - wrap as 'policy' observation group
    return TensorDict({"policy": obs}, batch_size=[self._env.num_envs])
if isinstance(obs, tuple):
    # Tuple (obs_tensor, extras) from older wrappers
    obs_data = obs[0]
    if isinstance(obs_data, dict):
        return TensorDict(obs_data, batch_size=[self._env.num_envs])
    return TensorDict({"policy": obs_data}, batch_size=[self._env.num_envs])
    return TensorDict(obs, batch_size=(self._env.num_envs,))
if isinstance(obs, torch.Tensor):
    # Single tensor - wrap as 'policy' observation group
    return TensorDict({"policy": obs}, batch_size=(self._env.num_envs,))
if isinstance(obs, tuple):
    # Tuple (obs_tensor, extras) from older wrappers
    obs_data = obs[0]
    if isinstance(obs_data, dict):
        return TensorDict(obs_data, batch_size=(self._env.num_envs,))
    return TensorDict({"policy": obs_data}, batch_size=(self._env.num_envs,))
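The claim underlying this suggestion is easy to verify: torch.Size is a tuple subclass, so the tuple and torch.Size spellings are interchangeable, while a plain list merely gets converted implicitly.

```python
# Quick check that torch.Size is a tuple subclass, which is why
# batch_size=torch.Size([n]) and batch_size=(n,) are interchangeable.
import torch

num_envs = 4
assert torch.Size([num_envs]) == (num_envs,)      # compares equal as tuples
assert isinstance(torch.Size([num_envs]), tuple)  # Size is-a tuple
```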

Comment on lines +62 to +73
parser.add_argument(
    "--checkpoint-mode",
    type=_optional_str,
    default="from-scratch",
    help="Checkpoint handling mode (fresh is treated as from-scratch)",
)
parser.add_argument(
    "--register-checkpoint",
    type=_optional_str,
    default=None,
    help="Register the final checkpoint as this Azure ML model name",
)

Copilot AI Dec 11, 2025


The newly added arguments --checkpoint-mode and --register-checkpoint are parsed but never used in the code. The parsed values (args.checkpoint_mode and args.register_checkpoint) are not referenced anywhere in the function implementations or passed to the training script. Either these arguments should be utilized in the workflow or they should be removed if they're not needed yet.

Suggested change
parser.add_argument(
    "--checkpoint-mode",
    type=_optional_str,
    default="from-scratch",
    help="Checkpoint handling mode (fresh is treated as from-scratch)",
)
parser.add_argument(
    "--register-checkpoint",
    type=_optional_str,
    default=None,
    help="Register the final checkpoint as this Azure ML model name",
)
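If the arguments are kept rather than removed, one way to consume them is to forward them into the launch command. This is an illustrative sketch only: the "train.py" target, the command layout, and the _optional_str helper's behavior are assumptions, not the repository's actual code.

```python
# Illustrative sketch of consuming the parsed values rather than dropping
# them; the command layout and "train.py" target are assumptions.
import argparse


def _optional_str(value):
    # Assumed helper: treat empty strings as None (mirrors the flag types)
    return value or None


parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint-mode", type=_optional_str, default="from-scratch")
parser.add_argument("--register-checkpoint", type=_optional_str, default=None)
args = parser.parse_args(["--checkpoint-mode", "resume"])

# Forward the values into the training command instead of ignoring them
cmd = ["python", "train.py", "--checkpoint-mode", args.checkpoint_mode]
if args.register_checkpoint:
    cmd += ["--register-checkpoint", args.register_checkpoint]
```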

Comment on lines +196 to +202
# Proxy all attributes to the underlying env
for attr in dir(env):
    if not attr.startswith("_") and attr not in ("get_observations", "step", "reset"):
        try:
            setattr(self, attr, getattr(env, attr))
        except AttributeError:
            pass

Copilot AI Dec 11, 2025


The attribute proxying logic in the init method is redundant and potentially problematic. The loop copies attributes to self, but getattr already provides dynamic attribute access to self._env. This dual approach can lead to stale attributes if the wrapped environment's attributes change after initialization. Consider removing the loop (lines 196-202) and relying solely on getattr for attribute delegation, which is the standard pattern for wrapper classes.

Suggested change
# Proxy all attributes to the underlying env
for attr in dir(env):
    if not attr.startswith("_") and attr not in ("get_observations", "step", "reset"):
        try:
            setattr(self, attr, getattr(env, attr))
        except AttributeError:
            pass
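The delegation pattern the review recommends can be sketched as follows; the class and attribute names here are illustrative stand-ins, not the repository's wrapper:

```python
# Standard wrapper delegation via __getattr__: unknown attributes are
# looked up on the wrapped env at access time, so they never go stale.
# EnvWrapper/FakeEnv are illustrative stand-ins for the real classes.
class EnvWrapper:
    def __init__(self, env):
        self._env = env

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for attributes the
        # wrapper does not define itself; delegates to the wrapped env.
        return getattr(self._env, name)


class FakeEnv:
    num_envs = 8


w = EnvWrapper(FakeEnv())
assert w.num_envs == 8
FakeEnv.num_envs = 16
assert w.num_envs == 16  # reflects the live value, not a stale copy
```

Unlike the eager copy loop above, this keeps the wrapper in sync with any attribute the underlying environment gains or changes after construction.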

@WilliamBerryiii WilliamBerryiii changed the title Adds RSL_RL training backed for OSMO workflows feat(scripts): Add TOML linting workflow Feb 5, 2026

3 participants