isaac-for-healthcare · mingxin-zheng · May 30, 2025 · May 19, 2025 · May 19, 2025 · May 19, 2025
diff --git a/workflows/robotic_ultrasound/scripts/simulation/README.md b/workflows/robotic_ultrasound/scripts/simulation/README.md
@@ -10,6 +10,7 @@
   - [Cosmos-transfer1 Integration](#cosmos-transfer1-integration)
   - [Teleoperation](#teleoperation)
   - [Ultrasound Raytracing Simulation](#ultrasound-raytracing-simulation)
+  - [Trajectory Evaluation](#trajectory-evaluation)
 
 ## Installation
 
@@ -38,11 +39,11 @@ Currently there are these robot configurations that can be used in various tasks
 ### PI Zero Policy Evaluation
 Set up `openpi` referring to [PI0 runner](../policy_runner/README.md).
 
-### Ensure the PYTHONPATH Is Set
+#### Ensure the PYTHONPATH Is Set
 
 Please refer to the [Environment Setup - Set environment variables before running the scripts](../../README.md#set-environment-variables-before-running-the-scripts) instructions.
 
-### Run the script
+#### Run the script
 
 Please move to the current [`simulation` folder](./) and execute:
 
@@ -61,18 +62,36 @@ and the same `domain id` as this example in another terminal.
 
 When `run_policy.py` is launched and idle waiting for the data,
 
-### Ensure the PYTHONPATH Is Set
+#### Ensure the PYTHONPATH Is Set
 
 Please refer to the [Environment Setup - Set environment variables before running the scripts](../../README.md#set-environment-variables-before-running-the-scripts) instructions.
 
-### Run the script
+#### Run the script
 
 Please move to the current [`simulation` folder](./) and execute:
 
 ```sh
 python environments/sim_with_dds.py --enable_cameras
 ```
 
+#### Evaluating with Recorded Initial States and Saving Trajectories
+
+The `sim_with_dds.py` script can also be used for more controlled evaluations by resetting the environment to initial states from recorded HDF5 data. When doing so, it can save the resulting end-effector trajectories.
+
+- **`--hdf5_path /path/to/your/data.hdf5`**: Provide the path to an HDF5 file (or a directory containing HDF5 files for multiple episodes). The simulation will reset the environment to the initial state(s) found in this data for each episode.
+- **`--npz_prefix your_prefix_`**: When `--hdf5_path` is used, this argument specifies a prefix for the names of the `.npz` files where the simulated end-effector trajectories (robot observations) will be saved. Each saved file will be named like `your_prefix_robot_obs_{episode_idx}.npz` and stored in the same directory as the input HDF5 file (if `--hdf5_path` is a file) or within the `--hdf5_path` directory (if it's a directory).
+
+**Example:**
+
+```sh
+python environments/sim_with_dds.py \
+    --enable_cameras \
+    --hdf5_path /mnt/hdd/cosmos/heldout-test50/data_0.hdf5 \
+    --npz_prefix pi0-800_
+```
+
+This command loads the initial state from `data_0.hdf5`, runs the simulation with actions received from the policy over DDS, and saves the resulting trajectory to a file like `pi0-800_robot_obs_0.npz` in the `/mnt/hdd/cosmos/heldout-test50/` directory.
+
 ### Liver Scan State Machine
 
 The Liver Scan State Machine provides a structured approach to performing ultrasound scans on a simulated liver. It implements a state-based workflow that guides the robotic arm through the scanning procedure.
@@ -398,3 +417,62 @@ To see the ultrasound probe moving, please ensure the `topic_ultrasound_info` is
 | --topic_out | Topic name to publish generated ultrasound data | topic_ultrasound_data |
 | --config | Path to custom JSON configuration file with probe parameters and simulation parameters | None |
 | --period | Period of the simulation (in seconds) | 1/30.0 (30 Hz) |
+
+### Trajectory Evaluation
+
+After running simulations and collecting predicted trajectories (e.g., using `sim_with_dds.py` with the `--hdf5_path` and `--npz_prefix` arguments, or from other policy rollouts), you can use the `evaluate_trajectories.py` script to compare these predictions against ground truth trajectories.
+
+This script is located at `environments/evaluate_trajectories.py`.
+
+#### Overview
+
+The script performs the following main functions:
+
+1.  **Loads Data**: Reads ground truth trajectories from HDF5 files and predicted trajectories from `.npz` files based on configured file patterns.
+2.  **Computes Metrics**: For each episode and each prediction source, it calculates:
+    *   **Success Rate**: The percentage of ground truth points that are within a specified radius of any point in the predicted trajectory.
+    *   **Average Minimum Distance**: The average distance from each ground truth point to its nearest neighbor in the predicted trajectory.
+3.  **Generates Plots**:
+    *   Individual 3D trajectory plots comparing the ground truth and a specific prediction for each episode.
+    *   A summary plot of the mean success rate versus radius, with 95% confidence intervals, comparing all configured prediction methods.
+
+#### Usage
+
+Ensure your `PYTHONPATH` is set up correctly.
+
+Navigate to the `scripts/simulation/` folder and execute:
+
+```sh
+python environments/evaluate_trajectories.py
+```
+
+#### Configuration
+
+The primary configuration for this script is done by modifying the global variables at the top of the `environments/evaluate_trajectories.py` file:
+
+- **`episode`**: The number of episodes to process.
+- **`data_root`**: The root directory that contains your HDF5 ground truth files (e.g., `data_{e}.hdf5`) and the predicted `.npz` trajectory files, which may sit in subdirectories of it.
+- **`DEFAULT_RADIUS_FOR_PLOTS`**: The radius used for calculating the success rate reported in the titles of individual 3D trajectory plots.
+- **`saved_compare_name`**: The filename for the summary plot (success rate vs. radius).
+- **`PREDICTION_SOURCES`**: A dictionary defining the different prediction methods to evaluate. Each entry requires:
+    - `file_pattern`: A string pattern for the predicted trajectory `.npz` files (e.g., `"my_model_run1/pred_traj_{e}.npz"`). The `{e}` will be replaced with the episode number.
+    - `label`: A descriptive label for the legend in plots.
+    - `color`: A color for this method in the plots.
+
+**Example `PREDICTION_SOURCES` entry:**
+```python
+PREDICTION_SOURCES = {
+    "MyModelV1": {
+        "file_pattern": "model_v1_outputs/robot_obs_{e}.npz",
+        "label": "My Model Version 1",
+        "color": "blue"
+    },
+    "BaselineModel": {
+        "file_pattern": "baseline_outputs/robot_obs_{e}.npz",
+        "label": "Baseline",
+        "color": "orange"
+    }
+}
+```
+
+The script expects predicted trajectory files to be found at `data_root/file_pattern`.
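
Review note: the two metrics described in the README's Overview (success rate and average minimum distance) can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `evaluate_trajectories.py`: the function name `trajectory_metrics`, the use of `<=` for "within a specified radius", and the sample arrays are all hypothetical.

```python
import numpy as np


def trajectory_metrics(gt_points, pred_points, radius):
    """Return (success_rate_percent, avg_min_distance) for one episode.

    gt_points:   (N, 3) ground truth positions
    pred_points: (M, 3) predicted positions
    Hypothetical helper; the real script may differ in details.
    """
    gt = np.asarray(gt_points, dtype=float)
    pred = np.asarray(pred_points, dtype=float)
    # Pairwise Euclidean distances between every GT and predicted point, shape (N, M).
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    # Distance from each ground truth point to its nearest predicted point.
    min_dists = dists.min(axis=1)
    # Percentage of ground truth points with a predicted point within `radius`
    # (assumption: the boundary is inclusive).
    success_rate = 100.0 * float(np.mean(min_dists <= radius))
    return success_rate, float(min_dists.mean())


# Made-up sample data, for illustration only.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [2.0, 1.0, 0.0]])
rate, avg = trajectory_metrics(gt, pred, radius=0.5)
print(f"success rate: {rate:.1f}%, average minimum distance: {avg:.3f}")
```

Note that both metrics are computed from the ground truth side: a prediction that wanders far away after covering every ground truth point is not penalized. The full pairwise matrix is fine at 250 steps per episode; a KD-tree would be the usual swap for much longer trajectories.
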
diff --git a/workflows/robotic_ultrasound/scripts/simulation/environments/sim_with_dds.py b/workflows/robotic_ultrasound/scripts/simulation/environments/sim_with_dds.py
@@ -16,6 +16,7 @@
 import argparse
 import collections
 import os
+from pathlib import Path
 
 import gymnasium as gym
 import numpy as np
@@ -36,6 +37,8 @@
     get_joint_states,
     get_probe_pos_ori,
     get_robot_obs,
+    reset_scene_to_initial_state,
+    validate_hdf5_path,
 )
 
 # add argparse arguments
@@ -105,6 +108,18 @@
     "--scale", type=float, default=1000.0, help="Scale factor to convert from omniverse to organ coordinate system."
 )
 parser.add_argument("--chunk_length", type=int, default=50, help="Length of the action chunk inferred by the policy.")
+parser.add_argument(
+    "--hdf5_path",
+    type=str,
+    default=None,
+    help="Path to single .hdf5 file or directory containing recorded data for environment reset.",
+)
+parser.add_argument(
+    "--npz_prefix",
+    type=str,
+    default="",
+    help="Filename prefix for the saved end-effector trajectory .npz files; only used when --hdf5_path is provided.",
+)
 
 # append AppLauncher cli argruments
 AppLauncher.add_app_launcher_args(parser)
@@ -228,14 +243,26 @@ def main():
             raise ValueError("RTI license file must be an existing absolute path.")
         os.environ["RTI_LICENSE_FILE"] = args_cli.rti_license_file
 
-    # Recommended to use 40 steps to allow enough steps to reset the SETUP position of the robot
-    reset_steps = 40
     max_timesteps = 250
+    if args_cli.hdf5_path is not None:
+        reset_to_recorded_data = True
+        episode_idx = 0
+        torso_obs_key = "observations/torso_obs"
+        joint_state_key = "abs_joint_pos"
+        joint_vel_key = "observations/joint_vel"
+        if "Rel" in args_cli.task:
+            action_key = "action"
+        else:
+            action_key = "abs_action"
+    else:
+        reset_to_recorded_data = False
+        # Recommended to use 40 steps to allow enough steps to reset the SETUP position of the robot
+        reset_steps = 40
 
-    # allow environment to settle
-    for _ in range(reset_steps):
-        reset_tensor = get_reset_action(env)
-        obs, rew, terminated, truncated, info_ = env.step(reset_tensor)
+        # allow environment to settle
+        for _ in range(reset_steps):
+            reset_tensor = get_reset_action(env)
+            obs, rew, terminated, truncated, info_ = env.step(reset_tensor)
 
     infer_r_cam_writer = RoomCamPublisher(topic=args_cli.topic_in_room_camera, domain_id=args_cli.infer_domain_id)
     infer_w_cam_writer = WristCamPublisher(topic=args_cli.topic_in_wrist_camera, domain_id=args_cli.infer_domain_id)
@@ -256,68 +283,127 @@ def main():
     # Number of steps played before replanning
     replan_steps = 5
 
+    if reset_to_recorded_data:
+        total_episodes = validate_hdf5_path(args_cli.hdf5_path)
+        print(f"total_episodes: {total_episodes}")
+    else:
+        total_episodes = 1
+
     # simulate environment
     while simulation_app.is_running():
         global reset_flag
         with torch.inference_mode():
-            action_plan = collections.deque()
-
-            for t in range(max_timesteps):
-                # get and publish the current images and joint positions
-                rgb_images, depth_images = capture_camera_images(
-                    env, ["room_camera", "wrist_camera"], device=env.unwrapped.device
-                )
-                pub_data["room_cam"], pub_data["room_cam_depth"] = (
-                    rgb_images[0, 0, ...].cpu().numpy(),
-                    depth_images[0, 0, ...].cpu().numpy(),
-                )
-                pub_data["wrist_cam"], pub_data["wrist_cam_depth"] = (
-                    rgb_images[0, 1, ...].cpu().numpy(),
-                    depth_images[0, 1, ...].cpu().numpy(),
-                )
-                pub_data["joint_pos"] = get_joint_states(env)[0]
-                # Get the pose of the mesh objects (mesh)
-                # The mesh objects are aligned with the organ (organ) in the US image view (us)
-                # The US is attached to the end-effector (ee), so we have the following computation logics:
-                # Each frame-to-frame transformation is available in the scene
-                # mesh -> organ -> ee -> us
-                quat_mesh_to_us, pos_mesh_to_us = compute_transform_sequence(env, ["mesh", "organ", "ee", "us"])
-                pub_data["probe_pos"], pub_data["probe_ori"] = get_probe_pos_ori(
-                    quat_mesh_to_us, pos_mesh_to_us, scale=args_cli.scale, log=args_cli.log_probe_pos
-                )
-                viz_r_cam_writer.write()
-                viz_w_cam_writer.write()
-                viz_r_cam_depth_writer.write()
-                viz_w_cam_depth_writer.write()
-                viz_pos_writer.write()
-                viz_probe_pos_writer.write()
-                if not action_plan:
-                    # publish the images and joint positions when run policy inference
-                    infer_r_cam_writer.write()
-                    infer_w_cam_writer.write()
-                    infer_pos_writer.write()
-
-                    ret = None
-                    while ret is None:
-                        ret = infer_reader.read_data()
-                    o: FrankaCtrlInput = ret
-                    action_chunk = np.array(o.joint_positions, dtype=np.float32).reshape(args_cli.chunk_length, 6)
-                    action_plan.extend(action_chunk[:replan_steps])
-
-                action = action_plan.popleft()
-
-                action = action.astype(np.float32)
-
-                # convert to torch
-                action = torch.tensor(action, device=env.unwrapped.device).repeat(env.unwrapped.num_envs, 1)
-
-                # step the environment
-                obs, rew, terminated, truncated, info_ = env.step(action)
-
-            env.reset()
-            for _ in range(reset_steps):
-                reset_tensor = get_reset_action(env)
-                obs, rew, terminated, truncated, info_ = env.step(reset_tensor)
+            for episode_idx in range(total_episodes):
+                print(f"\nepisode_idx: {episode_idx}")
+                if reset_to_recorded_data:
+                    actions = reset_scene_to_initial_state(
+                        env,
+                        args_cli.hdf5_path,
+                        episode_idx,
+                        action_key,
+                        torso_obs_key,
+                        joint_state_key,
+                        joint_vel_key,
+                    )
+                    first_action = torch.tensor(actions[1], device=args_cli.device)
+                    first_action = first_action.unsqueeze(0)
+                    env.step(first_action)
+                    print(f"Reset to recorded data: {get_robot_obs(env)}")
+                action_plan = collections.deque()
+                robot_obs = []
+                for t in range(max_timesteps):
+                    # get and publish the current images and joint positions
+                    rgb_images, depth_images = capture_camera_images(
+                        env, ["room_camera", "wrist_camera"], device=env.unwrapped.device
+                    )
+                    pub_data["room_cam"], pub_data["room_cam_depth"] = (
+                        rgb_images[0, 0, ...].cpu().numpy(),
+                        depth_images[0, 0, ...].cpu().numpy(),
+                    )
+                    pub_data["wrist_cam"], pub_data["wrist_cam_depth"] = (
+                        rgb_images[0, 1, ...].cpu().numpy(),
+                        depth_images[0, 1, ...].cpu().numpy(),
+                    )
+                    pub_data["joint_pos"] = get_joint_states(env)[0]
+                    robot_obs.append(get_robot_obs(env))
+
+                    # Get the pose of the mesh objects (mesh)
+                    # The mesh objects are aligned with the organ (organ) in the US image view (us)
+                    # The US is attached to the end-effector (ee), so we have the following computation logics:
+                    # Each frame-to-frame transformation is available in the scene
+                    # mesh -> organ -> ee -> us
+                    quat_mesh_to_us, pos_mesh_to_us = compute_transform_sequence(env, ["mesh", "organ", "ee", "us"])
+                    pub_data["probe_pos"], pub_data["probe_ori"] = get_probe_pos_ori(
+                        quat_mesh_to_us, pos_mesh_to_us, scale=args_cli.scale, log=args_cli.log_probe_pos
+                    )
+                    viz_r_cam_writer.write()
+                    viz_w_cam_writer.write()
+                    viz_r_cam_depth_writer.write()
+                    viz_w_cam_depth_writer.write()
+                    viz_pos_writer.write()
+                    viz_probe_pos_writer.write()
+                    if not action_plan:
+                        # publish the images and joint positions when run policy inference
+                        infer_r_cam_writer.write()
+                        infer_w_cam_writer.write()
+                        infer_pos_writer.write()
+
+                        ret = None
+                        while ret is None:
+                            ret = infer_reader.read_data()
+                        o: FrankaCtrlInput = ret
+                        action_chunk = np.array(o.joint_positions, dtype=np.float32).reshape(args_cli.chunk_length, 6)
+                        action_plan.extend(action_chunk[:replan_steps])
+
+                    action = action_plan.popleft()
+
+                    action = action.astype(np.float32)
+
+                    # convert to torch
+                    action = torch.tensor(action, device=env.unwrapped.device).repeat(env.unwrapped.num_envs, 1)
+
+                    # step the environment
+                    obs, rew, terminated, truncated, info_ = env.step(action)
+
+                env.reset()
+                if reset_to_recorded_data:
+                    robot_obs = torch.stack(robot_obs, dim=0)
+                    # A single .hdf5 file saves next to itself; a directory saves inside itself.
+                    if args_cli.hdf5_path.endswith(".hdf5"):
+                        save_dir = Path(args_cli.hdf5_path).parent
+                    else:
+                        save_dir = Path(args_cli.hdf5_path)
+                    # npz_prefix is prepended as-is, matching the README:
+                    # e.g. "pi0-800_" gives "pi0-800_robot_obs_0.npz"
+                    save_path = os.path.join(
+                        save_dir, f"{args_cli.npz_prefix}robot_obs_{episode_idx}.npz"
+                    )
+                    np.savez(save_path, robot_obs=robot_obs.cpu().numpy())
+                    print(f"robot_obs shape: {robot_obs.shape}, saved to {save_path}")
+
+                    if episode_idx + 1 >= total_episodes:
+                        print(f"Completed all episodes ({total_episodes})")
+                        break
+
+                    actions = reset_scene_to_initial_state(
+                        env,
+                        args_cli.hdf5_path,
+                        episode_idx + 1,
+                        action_key,
+                        torso_obs_key,
+                        joint_state_key,
+                        joint_vel_key,
+                    )
+                    first_action = torch.tensor(actions[1], device=args_cli.device)
+                    first_action = first_action.unsqueeze(0)
+                    obs, rew, terminated, truncated, info_ = env.step(first_action)
+                else:
+                    for _ in range(reset_steps):
+                        reset_tensor = get_reset_action(env)
+                        obs, rew, terminated, truncated, info_ = env.step(reset_tensor)
+            if episode_idx >= total_episodes - 1 or actions is None:
+                print(f"Reached the end of available episodes ({episode_idx + 1}/{total_episodes})")
+                break
 
     infer_reader.stop()
     # close the simulator
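
Review note: the output location rule documented in the README (save next to the input when `--hdf5_path` is a file, inside it when it is a directory, with the name `{npz_prefix}robot_obs_{episode_idx}.npz`) could be isolated in a small helper so it can be tested without launching the simulator. This is a sketch only; `npz_save_path` is a hypothetical name and not part of this PR.

```python
from pathlib import Path


def npz_save_path(hdf5_path, npz_prefix, episode_idx):
    """Build the output path for one episode's trajectory file.

    Hypothetical helper mirroring the naming described in the README.
    """
    p = Path(hdf5_path)
    # A single .hdf5 file saves next to itself; a directory saves inside itself.
    out_dir = p.parent if p.suffix == ".hdf5" else p
    # The prefix is prepended as-is, so include any separator in the prefix.
    return out_dir / f"{npz_prefix}robot_obs_{episode_idx}.npz"


# The README example: a single file with the prefix "pi0-800_".
print(npz_save_path("/mnt/hdd/cosmos/heldout-test50/data_0.hdf5", "pi0-800_", 0))
# A directory input with the default empty prefix.
print(npz_save_path("/mnt/hdd/cosmos/heldout-test50", "", 3))
```

Building the path once and reusing it for both `np.savez` and the log message keeps the printed location and the actual file in agreement.
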