3 changes: 2 additions & 1 deletion holoscan_i4h/operators/no_op/no_op.py
@@ -19,7 +19,8 @@

class NoOp(Operator):
"""A sink operator that takes input and discards them."""
def __init__(self, fragment, input_ports = None, *args, **kwargs):

def __init__(self, fragment, input_ports=None, *args, **kwargs):
self.input_ports = input_ports or ["input"]
super().__init__(fragment, *args, **kwargs)

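For reference, a complete sink operator in Holoscan's Python API also declares its ports in `setup` and drains them in `compute`. The following is a minimal sketch of how the full class might look (the `setup`/`compute` bodies are illustrative, not taken from this PR):

```python
from holoscan.core import Operator, OperatorSpec


class NoOp(Operator):
    """A sink operator that takes inputs and discards them."""

    def __init__(self, fragment, input_ports=None, *args, **kwargs):
        self.input_ports = input_ports or ["input"]
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        # Declare one input port per configured name.
        for port in self.input_ports:
            spec.input(port)

    def compute(self, op_input, op_output, context):
        # Receive and discard whatever arrives on each port.
        for port in self.input_ports:
            op_input.receive(port)
```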
@@ -74,6 +74,7 @@ def compose(self):
noop = NoOp(self, ["color"])
self.add_flow(camera, noop)


def main():
"""Parse command-line arguments and run the application."""
parser = argparse.ArgumentParser(description="Run the RealSense camera application")
21 changes: 14 additions & 7 deletions workflows/robotic_ultrasound/scripts/simulation/README.md
@@ -45,9 +45,10 @@ Please refer to the [Environment Setup - Set environment variables before runnin

#### Run the script

Please move to the current [`simulation` folder](./) and execute:
Please `cd` to the current [`simulation` folder](./) and execute:

```sh
cd workflows/robotic_ultrasound/scripts/simulation
python imitation_learning/pi0_policy/eval.py --enable_camera
```

@@ -68,9 +69,10 @@ Please refer to the [Environment Setup - Set environment variables before runnin

#### Run the script

Please move to the current [`simulation` folder](./) and execute:
Please `cd` to the current [`simulation` folder](./) and execute:

```sh
cd ../simulation
python environments/sim_with_dds.py --enable_cameras
```

@@ -119,9 +121,10 @@ The state machine integrates multiple control modules:

Please refer to the [Environment Setup - Set environment variables before running the scripts](../../README.md#set-environment-variables-before-running-the-scripts) to set the `PYTHONPATH`.

Then please move to the current [`simulation` folder](./) and execute:
Then please `cd` to the current [`simulation` folder](./) and execute:

```sh
cd ../simulation
python environments/state_machine/liver_scan_sm.py --enable_cameras
```

@@ -207,23 +210,26 @@ Replace `/path/to/your/hdf5_data_directory` with the actual path to the director
[Cosmos-Transfer1](https://github.com/nvidia-cosmos/cosmos-transfer1) is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
We introduce a training-free guided generation method on top of Cosmos-Transfer1 to overcome unsatisfactory results on unseen healthcare simulation assets.
Directly applying Cosmos-Transfer1 with various control inputs results in unsatisfactory outputs for the human phantom and robotic arm (see the bottom of the figure below). In contrast, our guided generation method preserves the appearance of the phantom and robotic arm while generating diverse backgrounds.

<img src="../../../../docs/source/cosmos_transfer_result.png" width="512" height="600" />

This training-free guided generation approach works by encoding simulation videos into the latent space and applying spatial masking to guide the generation process. The trade-off between realism and faithfulness can be controlled by adjusting the number of guided denoising steps. In addition, our generation pipeline supports multi-view video generation: we first leverage the camera information to warp the generated room view to the wrist view, then use it as guidance for wrist-view generation.
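To make the masking idea concrete, here is a minimal sketch of guided denoising with a spatial mask (the `denoise_step`/`add_noise` helpers are hypothetical stand-ins for the actual Cosmos-Transfer1 internals, not the pipeline's real code):

```python
import torch

def guided_generation(model, sim_latent, mask, num_steps, num_guided_steps):
    """Denoise from pure noise, re-imposing the simulation latent where mask == 1.

    mask: 1 for regions whose appearance must be preserved (phantom, robot arm),
    0 for regions free to change (background). More guided steps means higher
    faithfulness to the simulation; fewer means more realism/diversity.
    """
    x = torch.randn_like(sim_latent)
    for t in reversed(range(num_steps)):
        x = model.denoise_step(x, t)  # hypothetical single reverse-diffusion step
        if t >= num_steps - num_guided_steps:
            # During the guided steps, overwrite masked regions with the
            # simulation latent noised to the current noise level.
            x = mask * model.add_noise(sim_latent, t) + (1 - mask) * x
    return x
```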

#### Download Cosmos-transfer1 Checkpoints
Please install cosmos-transfer1 dependency and move to the third party `cosmos-transfer1` folder. The following command downloads the checkpoints:
The cosmos-transfer1 dependency is already installed after completing the [Installation](#installation) section. Move to the third-party `cosmos-transfer1` folder and run the following command to download the checkpoints:
```sh
conda activate cosmos-transfer1
cd third_party/cosmos-transfer1
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_checkpoints.py --output_dir checkpoints/
```

#### Video Prompt Generation
We follow the idea in [lucidsim](https://github.com/lucidsim/lucidsim) to first generate batches of meta prompts, each containing a very concise description of the potential scene, and then instruct an LLM (e.g., [gemma-3-27b-it](https://build.nvidia.com/google/gemma-3-27b-it)) to upsample each meta prompt with detailed descriptions.
We provide example prompts in [`generated_prompts_two_seperate_views.json`](./environments/cosmos_transfer1/config/generated_prompts_two_seperate_views.json).
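As an illustration, prompt upsampling against an OpenAI-compatible endpoint might look like the sketch below (the endpoint URL, model id, and prompt wording are assumptions for illustration, not the repository's actual pipeline):

```python
from openai import OpenAI

# Hypothetical sketch: upsample concise meta prompts into detailed scene prompts,
# assuming an OpenAI-compatible endpoint and a valid API key.
client = OpenAI(base_url="https://integrate.api.nvidia.com/v1", api_key="...")

meta_prompts = ["an operating room with a surgical lamp", "a cluttered lab bench"]
detailed = []
for meta in meta_prompts:
    resp = client.chat.completions.create(
        model="google/gemma-3-27b-it",
        messages=[{
            "role": "user",
            "content": f"Expand this scene description into a detailed, "
                       f"photorealistic video prompt: {meta}",
        }],
    )
    detailed.append(resp.choices[0].message.content)
```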

#### Running Cosmos-transfer1 + Guided Generation
Please move to the current [`simulation` folder](./) and execute the following command to start the generation pipeline:
Please `cd` back to the current [`simulation` folder](./) and execute the following command to start the generation pipeline:
```sh
cd ../../workflows/robotic_ultrasound/scripts/simulation/
export CHECKPOINT_DIR="path to downloaded cosmos-transfer1 checkpoints"
# Set project root path
export PROJECT_ROOT="{your path}/i4h-workflows"
@@ -280,9 +286,10 @@ The teleoperation interface allows direct control of the robotic arm using vario

#### Running Teleoperation

Please move to the current [`simulation` folder](./) and execute the following command to start the teleoperation:
Please `cd` to the current [`simulation` folder](./) and execute the following command to start the teleoperation:

```sh
cd ../simulation
python environments/teleoperation/teleop_se3_agent.py --enable_cameras
```
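Conceptually, SE(3) teleoperation composes small device deltas onto the current end-effector pose at each control tick. A minimal sketch of that update follows (hypothetical and simplified; the actual agent also handles device I/O, gripper state, and frame conventions):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def apply_se3_delta(pos, quat_xyzw, delta):
    """Compose a small twist (dx, dy, dz, droll, dpitch, dyaw) onto a pose."""
    dpos, drot = np.asarray(delta[:3]), np.asarray(delta[3:])
    new_pos = pos + dpos
    # Apply the delta rotation in the world frame on top of the current rotation.
    new_rot = R.from_euler("xyz", drot) * R.from_quat(quat_xyzw)
    return new_pos, new_rot.as_quat()

pos, quat = np.zeros(3), np.array([0.0, 0.0, 0.0, 1.0])  # identity start pose
pos, quat = apply_se3_delta(pos, quat, [0.01, 0, 0, 0, 0, 0.05])
```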

@@ -112,51 +112,50 @@ The `evaluate_trajectories.py` script generates several outputs to help you asse

### Example Evaluation Table

In our experiments, we utilized the `liver_scan_sm.py` script to collect an initial dataset of 400 raw trajectories. This dataset was then augmented using the Cosmos-transfer1 model to generate an additional 400 trajectories with diverse visual appearances (1:1 ratio with raw data), effectively creating a combined dataset for training and evaluation. The following table presents a comparison of success rates (at a 0.01m radius) for different policy models (Pi0 and GR00T-N1 variants) evaluated under various texture conditions in the simulated environment.
In our experiments, we utilized the `liver_scan_sm.py` script to collect an initial dataset of 400 raw trajectories. This dataset was then augmented using the Cosmos-transfer1 model to generate an additional 400 trajectories with diverse visual appearances (a 1:1 ratio with the raw data), effectively creating a combined dataset for training and evaluation. The following table presents a comparison of success rates (at a 0.01m radius) for different policy models (Pi0 and GR00T-N1 variants) evaluated under various texture conditions in the simulated environment. Models with the **-rel** suffix use a **relative action space**, while models with the **-abs** suffix use an **absolute action space**. All models are trained using **full fine-tuning** (no LoRA).
Our model was tested on both the original texture and several unseen textures. To enable these additional textures for testing, uncomment the `table_texture_randomizer` setting within the [environment configuration file](../exts/robotic_us_ext/robotic_us_ext/tasks/ultrasound/approach/config/franka/franka_manager_rl_env_cfg.py).
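To make the action-space distinction concrete, here is a minimal illustration (hypothetical values; real actions also include orientation and are normalized by the training pipeline):

```python
import numpy as np

# An absolute action commands the target end-effector pose directly,
# while a relative action commands a delta from the current pose.
current_pos = np.array([0.40, 0.00, 0.30])   # current end-effector position (m)
target_pos = np.array([0.42, 0.01, 0.28])    # desired next position (m)

absolute_action = target_pos                 # what "-abs" models predict
relative_action = target_pos - current_pos   # what "-rel" models predict

# At execution time, a relative action is applied on top of the current pose:
next_pos = current_pos + relative_action
assert np.allclose(next_pos, absolute_action)
```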

**Evaluation Table: Success Rates (%) (@0.01m)**

| Model | Original Texture | Texture 1 (Stainless Steel) | Texture 2 (Bamboo Wood) | Texture 3 (Walnut Wood) |
|---------------------------------------|------------------|-----------------------------|-------------------------|-------------------------|
| Pi0-400 | 77.1 | 57.3 | 47.7 | 55.7 |
| Pi0-800 (w/ cosmos) | 77.0 | 71.7 | 72.4 | 70.5 |
| GR00T-N1-400 | 84.1 | 61.5 | 58.3 | 64.0 |
| GR00T-N1-800 (w/ cosmos) | 92.8 | 91.1 | 92.8 | 91.7 |
| Pi0-400-rel | 84.5 | 61.2 | 63.4 | 59.6 |
| GR00T-N1-400-rel | 84.1 | 61.5 | 58.3 | 64.0 |
| Pi0-800-rel (w/ cosmos) | 90.0 | 77.6 | 83.1 | 84.8 |
| GR00T-N1-800-rel (w/ cosmos) | 92.8 | 91.1 | 92.8 | 91.7 |
| Pi0-400-abs | 96.5 | 97.0 | 96.3 | 11.6 |
| GR00T-N1-400-abs | 99.3 | 10.6 | 19.1 | 20.4 |
| Pi0-800-abs (w/ cosmos) | 97.7 | 94.5 | 95.8 | 93.8 |
| GR00T-N1-800-abs (w/ cosmos) | 98.8 | 85.1 | 84.7 | 87.6 |

### Success Rate vs. Radius Plot
- A plot named by the `--saved_compare_name` argument (default: `comparison_success_rate_vs_radius.png`) is saved in the `data_root` directory.
- This plot shows the mean success rate (y-axis) as a function of the test radius (x-axis) for all configured prediction methods.
- It includes 95% confidence interval bands for each method.
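A plausible reading of the metric behind these curves is sketched below (illustrative only; the names and exact matching rule are assumptions, not `evaluate_trajectories.py` itself):

```python
import numpy as np

def success_rate_at_radius(pred, gt, radius):
    """Fraction of predicted points within `radius` of the ground-truth path.

    pred: (N, 3) predicted positions; gt: (M, 3) ground-truth scan positions.
    """
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M)
    min_dists = dists.min(axis=1)  # each prediction's distance to the path
    return float((min_dists <= radius).mean())

# Sweep radii to produce one success-rate-vs-radius curve per method.
pred = np.random.rand(50, 3)  # stand-in predicted trajectory
gt = np.random.rand(80, 3)    # stand-in ground-truth trajectory
radii = np.linspace(0.005, 0.05, 10)
curve = [success_rate_at_radius(pred, gt, r) for r in radii]
```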

| Original Texture | Texture 1 (Stainless Steel) | Texture 2 (Bamboo Wood) | Texture 3 (Walnut Wood) |
|------------------|-----------------------------|-------------------------|-------------------------|
| ![Original Texture](../../../../../docs/source/comparison_avg_success_rate_vs_radius_original.png) | ![Stainless Texture](../../../../../docs/source/comparison_avg_success_rate_vs_radius_metalic.png) | ![Bamboo Wood](../../../../../docs/source/comparison_avg_success_rate_vs_radius_bamboo.png) | ![Walnut Wood](../../../../../docs/source/comparison_avg_success_rate_vs_radius_walnut.png) |
**Example Success Rate vs. Radius Plot:**
![Success Rate vs Radius Example](../../../../../docs/source/comparison_avg_success_rate_vs_radius_original.png)

The plots visually represent these comparisons, where different models are typically color-coded (e.g., Green for the original Pi0 model, Red for Pi0 with Cosmos-transfer, Blue for the original GR00T-N1 model, and Yellow for GR00T-N1 with Cosmos-transfer). The x-axis represents the tolerance radius in meters, and the y-axis shows the corresponding mean success rate. The shaded areas around the lines indicate the 95% confidence intervals, providing a measure of result variability.
The example plot visually represents comparisons between different models, where each method is color-coded. The x-axis represents the tolerance radius in meters, and the y-axis shows the corresponding mean success rate. The shaded areas around the lines indicate the 95% confidence intervals, providing a measure of result variability.

### 3D Trajectory Plots
- For each episode and each prediction method, a 3D plot is generated and saved.
- The path for these plots is typically `data_root/METHOD_NAME/3d_trajectories-{episode_number}.png`.
- These plots visually compare the ground truth trajectory against the predicted trajectory.
- The title of each plot includes the episode number, method name, success rate at `radius_for_plots`, and average minimum distance.
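For example, such a plot could be produced with matplotlib along these lines (a hypothetical sketch with stand-in data, not the evaluation script's code):

```python
import numpy as np
import matplotlib.pyplot as plt

# Ground truth in black, prediction in color, in the style described above.
gt = np.cumsum(np.random.randn(100, 3) * 0.001, axis=0)  # stand-in scan path
pred = gt + np.random.randn(100, 3) * 0.002              # stand-in prediction

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(*gt.T, color="black", label="ground truth (scan)")
ax.plot(*pred.T, color="tab:red", label="prediction")
ax.set_xlabel("x (m)"); ax.set_ylabel("y (m)"); ax.set_zlabel("z (m)")
ax.legend()
fig.savefig("example_3d_trajectories.png")
```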

**Example 3D Trajectory Visualizations:**
To provide a qualitative view, example 3D trajectory visualizations from a selected episode (e.g., episode 14) are presented below for each model.
**Example 3D Trajectory Visualization:**

| Pi0-400 | Pi0-800 (w/ cosmos) | GR00T-N1-400 | GR00T-N1-800 (w/ cosmos) |
|------------------|-----------------------------|-------------------------|-------------------------|
| ![Pi0-400](../../../../../docs/source/3d_trajectories-5_Texture2-Pi0-wo.png) | ![Pi0-800](../../../../../docs/source/3d_trajectories-5_Texture2-Pi0-w.png) | ![GR00T-400](../../../../../docs/source/3d_trajectories-5_Texture2-GR00T-wo.png) | ![GR00T-800](../../../../../docs/source/3d_trajectories-5_Texture2-GR00T-w.png) |
![3D Trajectory Example](../../../../../docs/source/3d_trajectories-5_Texture2-Pi0-w.png)

In these visualizations, the ground truth trajectory (derived from the 'scan' state) is depicted in black, while the colored line represents the predicted trajectory from the model.
In this visualization, the ground truth trajectory (derived from the 'scan' state) is depicted in black, while the colored line represents the predicted trajectory from the model.

### Key Observations and Conclusion

The evaluation results highlight several important findings:
The evaluation results from our experiments offer several insights into model performance. Models are trained with either a relative action space (`-rel` suffix) or an absolute action space (`-abs` suffix), all using full fine-tuning.

* **Impact of Cosmos-transfer:** Augmenting the training dataset with Cosmos-transfer (as seen in Pi0-800 and GR00T-N1-800 models) consistently and significantly improves the policy's success rate and robustness to diverse visual textures compared to models trained on original data alone (Pi0-400 and GR00T-N1-400). For instance, GR00T-N1-800 (w/ cosmos) maintains a success rate above 90% across all tested textures, a substantial improvement over GR00T-N1-400 which sees a performance drop on some textures.
* **Model Comparison:** The GR00T-N1 architecture generally outperforms the Pi0 architecture. The GR00T-N1-800 model, benefiting from both the advanced architecture and cosmos augmented data, demonstrates the highest overall performance and consistency according to the provided data.
* **Performance under Texture Variation:** Models trained without sufficient diverse data (e.g., Pi0-400, GR00T-N1-400) exhibit a noticeable degradation in performance when encountering textures different from the original training environment. Cosmos-transfer effectively mitigates this issue.
* **Success Rate vs. Radius Insights:** The success rate vs. radius plots are expected to further substantiate these findings. Models enhanced by Cosmos-transfer (notably GR00T-N1-800, potentially depicted by a yellow line as per the convention mentioned) would likely maintain higher success rates even at stricter (smaller) radius, indicating greater precision. Their 95% confidence intervals also provide insight into the stability of these performance gains.
* **Effect of Cosmos-transfer Data Augmentation:** In our tests, augmenting the training dataset with Cosmos-transfer appeared to enhance policy success rates and robustness to the three unseen table textures tested, when compared to models trained solely on the original dataset. For example, the GR00T-N1-800-rel model showed more consistent performance across the tested textures. Data augmentation, while beneficial for diversity, does require additional computational resources for generating and processing the augmented samples.

These observations underscore the value of diverse, augmented datasets like those generated by Cosmos-transfer for training robust robotic policies, particularly for tasks involving visual perception in variable environments. The GR00T-N1 model, when combined with such data augmentation, shows promising results for reliable trajectory execution.
* **Reproducibility and Result Variability:** Users conducting their own evaluations might observe slightly different numerical results. This can be due to several factors, including inherent stochasticity in deep learning model training, variations in computational environments, and specific versions of software dependencies. For instance, initial explorations indicated that components like the `PaliGemma.llm` from OpenPI ([link](https://github.com/Physical-Intelligence/openpi/blob/main/src/openpi/models/pi0.py#L311)) could introduce variability. To ensure the stability and reliability of the findings presented here, the reported metrics for each model are an average of three independent evaluation runs.

These observations highlight the potential benefits of data augmentation strategies like Cosmos-transfer for developing robotic policies, especially for tasks involving visual perception in dynamic environments. The choices of model architecture, training duration, and training methodology (e.g., relative vs. absolute action space, whether to employ LoRA, and whether to fine-tune the LLM) are all important factors influencing final performance. Further investigation and testing across a wider range of scenarios are always encouraged.