feat(scripts): Add TOML linting workflow #26
Conversation
…ackend selection

* add RslRl3xCompatWrapper for TensorDict observation format in RSL-RL 3.x
* add --backend/-b flag to select training backend (skrl, rsl_rl)
* upgrade rsl-rl-lib to 3.1.2 with tensordict dependency
* pass training_backend parameter through OSMO workflow template

🤖 - Generated by Copilot
Pull request overview
This pull request introduces support for selecting between skrl and rsl_rl training backends in the OSMO workflow and upgrades RSL-RL to version 3.1.2 with TensorDict observation format compatibility. The changes enable flexible backend selection while maintaining backward compatibility through a new compatibility wrapper that handles the TensorDict requirements of RSL-RL 3.x.
Key changes:
- Added `--backend` CLI flag and `TRAINING_BACKEND` environment variable to select between `skrl` (default) and `rsl_rl` backends
- Upgraded rsl-rl-lib from 3.0.1 to 3.1.2 and added tensordict dependency for RSL-RL 3.x compatibility
- Implemented `RslRl3xCompatWrapper` to ensure observations conform to the TensorDict format required by RSL-RL 3.x
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| workflows/osmo/train.yaml | Added training_backend parameter to workflow template with default value "skrl" |
| src/training/scripts/rsl_rl/train.py | Upgraded to RSL-RL 3.x with TensorDict import, RslRlOnPolicyRunnerCfg, and new RslRl3xCompatWrapper class for observation format compatibility |
| src/training/scripts/launch_rsl_rl.py | Added --checkpoint-mode and --register-checkpoint CLI arguments for improved checkpoint handling |
| src/training/requirements.txt | Upgraded rsl-rl-lib to 3.1.2 and added tensordict>=0.7.0 dependency |
| scripts/submit-osmo-training.sh | Added --backend/-b flag for training backend selection with environment variable support |
```python
# Convert config to dict and ensure RSL-RL 3.x required fields are present
agent_cfg_dict = agent_cfg.to_dict()
# RSL-RL 3.x requires 'obs_groups' with a 'policy' key mapping to observation group names
```
The comment states "RSL-RL 3.x requires 'obs_groups' with a 'policy' key mapping to observation group names" but the default value sets it to {"policy": ["policy"]}, which maps "policy" to a list containing "policy". The comment could be clearer about what this structure represents - specifically that the outer key is the policy name and the inner list contains the observation group names to use for that policy.
Suggested change:

```diff
-# RSL-RL 3.x requires 'obs_groups' with a 'policy' key mapping to observation group names
+# RSL-RL 3.x requires 'obs_groups' to be a dict mapping each policy name (outer key)
+# to a list of observation group names (inner list) to use for that policy.
```
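The structure discussed above can be sketched with a plain dict and a small validation helper. This is an illustrative sketch, not code from the PR; `validate_obs_groups` is a hypothetical name introduced here to make the outer-key/inner-list shape concrete.

```python
# Shape RSL-RL 3.x expects, per the comment above:
# outer key = policy name, inner list = observation group names for that policy.
obs_groups = {"policy": ["policy"]}

def validate_obs_groups(obs_groups):
    """Check that obs_groups maps str policy names to lists of str group names."""
    if not isinstance(obs_groups, dict):
        raise TypeError("obs_groups must be a dict")
    for policy_name, group_names in obs_groups.items():
        if not isinstance(policy_name, str):
            raise TypeError("policy names (outer keys) must be strings")
        if not isinstance(group_names, list) or not all(
            isinstance(g, str) for g in group_names
        ):
            raise TypeError("each policy must map to a list of group-name strings")
    return True

print(validate_obs_groups(obs_groups))  # → True
```

The default `{"policy": ["policy"]}` is then just the degenerate case where one policy consumes the single observation group that happens to share its name.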
```bash
-b|--backend)
    BACKEND_VALUE="$2"
    shift 2
    ;;
```
The --backend argument accepts any arbitrary string value without validation. Based on the documentation (line 38), only "skrl" and "rsl_rl" are valid backends. Consider adding validation after the argument parsing to ensure BACKEND_VALUE is one of the supported values, similar to how checkpoint_mode is validated via normalize_checkpoint_mode().
```diff
-rsl-rl-lib==3.0.1
+rsl-rl-lib==3.1.2
 skrl>=1.4.3
+tensordict>=0.7.0  # Required by RSL-RL 3.x for TensorDict observations
```
The tensordict dependency uses a minimum version constraint (>=0.7.0) while most other dependencies in this file use exact version pins (==). For consistency and reproducibility, consider using an exact version pin like tensordict==0.7.0 or tensordict==0.8.0, matching the pinning pattern used for other dependencies such as rsl-rl-lib, azure-core, and pyperclip.
Suggested change:

```diff
-tensordict>=0.7.0  # Required by RSL-RL 3.x for TensorDict observations
+tensordict==0.7.0  # Required by RSL-RL 3.x for TensorDict observations
```
```python
        return TensorDict(obs, batch_size=[self._env.num_envs])
    if isinstance(obs, torch.Tensor):
        # Single tensor - wrap as 'policy' observation group
        return TensorDict({"policy": obs}, batch_size=[self._env.num_envs])
    if isinstance(obs, tuple):
        # Tuple (obs_tensor, extras) from older wrappers
        obs_data = obs[0]
        if isinstance(obs_data, dict):
            return TensorDict(obs_data, batch_size=[self._env.num_envs])
        return TensorDict({"policy": obs_data}, batch_size=[self._env.num_envs])
```
The TensorDict initialization uses batch_size=[self._env.num_envs] but TensorDict expects batch_size to be a torch.Size or tuple representing the batch dimensions, not a list. While lists are implicitly converted, it would be more explicit and correct to use batch_size=torch.Size([self._env.num_envs]) or batch_size=(self._env.num_envs,). This same issue appears in lines 212, 215, 220, and 221.
Suggested change:

```diff
-        return TensorDict(obs, batch_size=[self._env.num_envs])
+        return TensorDict(obs, batch_size=(self._env.num_envs,))
     if isinstance(obs, torch.Tensor):
         # Single tensor - wrap as 'policy' observation group
-        return TensorDict({"policy": obs}, batch_size=[self._env.num_envs])
+        return TensorDict({"policy": obs}, batch_size=(self._env.num_envs,))
     if isinstance(obs, tuple):
         # Tuple (obs_tensor, extras) from older wrappers
         obs_data = obs[0]
         if isinstance(obs_data, dict):
-            return TensorDict(obs_data, batch_size=[self._env.num_envs])
+            return TensorDict(obs_data, batch_size=(self._env.num_envs,))
-        return TensorDict({"policy": obs_data}, batch_size=[self._env.num_envs])
+        return TensorDict({"policy": obs_data}, batch_size=(self._env.num_envs,))
```
```python
parser.add_argument(
    "--checkpoint-mode",
    type=_optional_str,
    default="from-scratch",
    help="Checkpoint handling mode (fresh is treated as from-scratch)",
)
parser.add_argument(
    "--register-checkpoint",
    type=_optional_str,
    default=None,
    help="Register the final checkpoint as this Azure ML model name",
)
```
The newly added arguments --checkpoint-mode and --register-checkpoint are parsed but never used in the code. The parsed values (args.checkpoint_mode and args.register_checkpoint) are not referenced anywhere in the function implementations or passed to the training script. Either these arguments should be utilized in the workflow or they should be removed if they're not needed yet.
Suggested change:

```diff
-parser.add_argument(
-    "--checkpoint-mode",
-    type=_optional_str,
-    default="from-scratch",
-    help="Checkpoint handling mode (fresh is treated as from-scratch)",
-)
-parser.add_argument(
-    "--register-checkpoint",
-    type=_optional_str,
-    default=None,
-    help="Register the final checkpoint as this Azure ML model name",
-)
```
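The other resolution the comment mentions, actually consuming the arguments, could look roughly like the sketch below. This is a hedged illustration, not code from the PR: `build_train_command` is a hypothetical function name, and the forwarding target is assumed.

```python
import argparse

def build_train_command(args):
    """Forward the parsed checkpoint options to a downstream training command."""
    cmd = ["python", "train.py"]
    # Per the help text, "fresh" is treated as an alias for "from-scratch".
    mode = "from-scratch" if args.checkpoint_mode == "fresh" else args.checkpoint_mode
    cmd += ["--checkpoint-mode", mode]
    if args.register_checkpoint is not None:
        cmd += ["--register-checkpoint", args.register_checkpoint]
    return cmd

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint-mode", default="from-scratch")
parser.add_argument("--register-checkpoint", default=None)

args = parser.parse_args(["--checkpoint-mode", "fresh"])
print(build_train_command(args))
# → ['python', 'train.py', '--checkpoint-mode', 'from-scratch']
```

Either wiring the values through like this or deleting the arguments resolves the dead-code concern; leaving parsed-but-unused flags is the worst of both options.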
```python
# Proxy all attributes to the underlying env
for attr in dir(env):
    if not attr.startswith("_") and attr not in ("get_observations", "step", "reset"):
        try:
            setattr(self, attr, getattr(env, attr))
        except AttributeError:
            pass
```
The attribute proxying logic in the `__init__` method is redundant and potentially problematic. The loop copies attributes onto `self`, but `__getattr__` already provides dynamic attribute access to `self._env`. This dual approach can lead to stale attributes if the wrapped environment's attributes change after initialization. Consider removing the loop (lines 196-202) and relying solely on `__getattr__` for attribute delegation, which is the standard pattern for wrapper classes.
Suggested change:

```diff
-# Proxy all attributes to the underlying env
-for attr in dir(env):
-    if not attr.startswith("_") and attr not in ("get_observations", "step", "reset"):
-        try:
-            setattr(self, attr, getattr(env, attr))
-        except AttributeError:
-            pass
```
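The delegation pattern recommended in the comment above can be sketched in a few lines. The class and attribute names here are illustrative stand-ins, not the PR's actual wrapper.

```python
# Minimal sketch of __getattr__-based delegation: instead of copying attributes
# at construction time, forward unknown lookups to the wrapped env on demand.
class CompatWrapper:
    def __init__(self, env):
        self._env = env

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup on the wrapper fails,
        # so the wrapper's own methods take precedence and env attributes
        # stay live (no stale copies).
        return getattr(self._env, name)

class FakeEnv:
    num_envs = 16

env = FakeEnv()
wrapper = CompatWrapper(env)
print(wrapper.num_envs)  # → 16

env.num_envs = 32  # later changes remain visible through the wrapper
print(wrapper.num_envs)  # → 32
```

With the eager-copy loop, the second read would still return the stale value 16, which is exactly the hazard the review comment describes.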
RSL_RL Backend for OSMO
PR Soundtrack: Little Brother - Altitudes
Summary
This pull request introduces support for selecting the training backend (`skrl` or `rsl_rl`) in the OSMO training workflow scripts, and updates the RSL-RL integration to be compatible with RSL-RL 3.x, including proper handling of the new TensorDict observation format. It also updates dependencies and improves environment variable and argument handling for consistency.

Training backend selection:

- Added a `--backend`/`-b` CLI argument and `TRAINING_BACKEND` environment variable to `scripts/submit-osmo-training.sh`, allowing users to select the training backend (`skrl` or `rsl_rl`). This value is passed through the workflow and made available in training jobs. [1] [2] [3] [4] [5] [6] [7]

RSL-RL 3.x compatibility:

- Upgraded `rsl-rl-lib` and added `tensordict` as a required dependency.
- Added `RslRl3xCompatWrapper` in `train.py` to ensure all environment observations conform to the TensorDict format required by RSL-RL 3.x, supporting backward compatibility with older wrappers. [1] [2]
- Ensured required config fields (`obs_groups` and `class_name`) are present and correct for RSL-RL 3.x. [1] [2] [3]

Argument and environment improvements:

- Added `--checkpoint-mode` and `--register-checkpoint` arguments in `launch_rsl_rl.py` for improved checkpoint handling.

🤖 - Generated by Copilot