integrate cosmos-transfer1 + guided generation to I4H workflows #129
Merged
26 commits
690b717 add guided generation pipline (guopengf)
b06bf94 h5 batch inference support (guopengf)
59ddcb9 improve format (guopengf)
723be4b prepare test case (guopengf)
b53778e lint (guopengf)
dc66ae8 update test case (guopengf)
0941b69 format (guopengf)
1494470 update test cases (guopengf)
f5800f4 add readme (guopengf)
6d50d78 Merge branch 'main' into pengfeig/cosmos-transfer1-integrate (guopengf)
1d85d0c add readme figure (guopengf)
c980304 fix link (guopengf)
c757b45 update dependency install and fix issues (guopengf)
220535b remve change for install_deps.py (guopengf)
0ebe16f Merge branch 'main' into pengfeig/cosmos-transfer1-integrate (guopengf)
979e0ab Merge branch 'main' into pengfeig/cosmos-transfer1-integrate (KumoLiu)
389d52c fix minnor comments (guopengf)
c181a9d Update workflows/robotic_ultrasound/tests/test_simulation/test_integr… (guopengf)
3c299e7 Update workflows/robotic_ultrasound/tests/test_simulation/test_integr… (guopengf)
6976388 Merge branch 'main' into pengfeig/cosmos-transfer1-integrate (mingxin-zheng)
164aa5d Merge branch 'main' into pengfeig/cosmos-transfer1-integrate (mingxin-zheng)
67e1e0b fix (KumoLiu)
01128e3 increase timeout for cosmos-transfer test (KumoLiu)
6efb7c3 Merge remote-tracking branch 'origin/main' into pengfeig/cosmos-trans… (KumoLiu)
3bfc8e8 skip if not enough gpu (KumoLiu)
ad6cb9f increase pi0 eval timeout (KumoLiu)
14 changes: 14 additions & 0 deletions
...ws/robotic_ultrasound/scripts/simulation/environments/cosmos_transfer1/config/__init__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. |
14 changes: 14 additions & 0 deletions
...simulation/environments/cosmos_transfer1/config/generated_prompts_two_seperate_views.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| { | ||
| "Sterile lab, gleaming metal, focused robotic precision.": { | ||
| "top_view_prompt": "A meticulously organized laboratory environment. Gleaming stainless steel instruments rest on polished surfaces, reflecting the cool, static overhead lighting. A robotic arm, brushed aluminum and articulated with precision, methodically scans over a non-reflective, light-grey laminate tabletop. The scene is sterile and clinical, evoking a sense of advanced medical research or quality control. Cables are neatly managed, and background equipment \u2013 monitors displaying complex waveforms, and sealed containers \u2013 remain static and out of focus, suggesting a continuous, automated process. The overall atmosphere is one of quiet, focused efficiency.", | ||
| "bottom_view_prompt": "The close-up perspective that is focused directly on the clean surface of light-grey laminate table with uniform texture." | ||
| }, | ||
| "Bright office, scattered papers, automated scanning process.": { | ||
| "top_view_prompt": "A brightly lit, modern office space transitions into a laboratory setting. Papers are casually scattered across a large, non-reflective laminate tabletop, suggesting a busy workflow. A robotic arm, sleek and white, methodically scans an unseen object positioned centrally on the table. The scene is static, with consistent, even illumination. Focus is on the robotic arm's precise movements and the organized chaos of the workspace, hinting at automated data collection or analysis. The overall impression is one of efficient, clinical precision within a functional, lived-in environment.", | ||
| "bottom_view_prompt": "The close-up perspective that is focused directly on the clean surface of the laminate table with uniform texture." | ||
| }, | ||
| "Clinical white room, equipment hums, methodical search.": { | ||
| "top_view_prompt": "A sterile, clinical laboratory environment. A robotic arm, precise and deliberate in its movements, systematically scans across a non-reflective, matte grey table. The room is filled with the subtle hum of unseen machinery \u2013 diagnostic equipment, ventilation systems, and power supplies. Cables are neatly managed, running along the ceiling and walls. The overall aesthetic is minimalist and functional, emphasizing cleanliness and precision. The scene is static, with a consistent, diffused overhead lighting creating soft shadows. The focus is on the methodical nature of the robotic scan, suggesting a detailed analysis or quality control process.", | ||
| "bottom_view_prompt": "The close-up perspective that is focused directly on the clean surface of the matte grey table with uniform texture." | ||
| } | ||
| } |
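
The prompt file above maps a short scene summary to a pair of per-view prompts. The following is a minimal sketch of reading a file with that layout. It assumes only the key names visible in the diff (`top_view_prompt`, `bottom_view_prompt`); the helper name `load_view_prompts` and the shortened sample strings are hypothetical, and the loader actually used by the PR is not shown on this page.

```python
import json


def load_view_prompts(text: str) -> list[tuple[str, str, str]]:
    """Return (summary, top_view_prompt, bottom_view_prompt) triples.

    Hypothetical helper: it only mirrors the key layout of the JSON above.
    """
    data = json.loads(text)
    triples = []
    for summary, views in data.items():
        # Both keys appear in every entry of the file above; a missing key
        # raises KeyError instead of silently producing an empty prompt.
        triples.append((summary, views["top_view_prompt"], views["bottom_view_prompt"]))
    return triples


# Shortened stand-in for one entry of the real file (illustrative text only).
sample = json.dumps(
    {
        "Sterile lab, gleaming metal, focused robotic precision.": {
            "top_view_prompt": "A meticulously organized laboratory environment.",
            "bottom_view_prompt": "Close-up of the light-grey laminate table.",
        }
    }
)

prompts = load_view_prompts(sample)
for summary, top, bottom in prompts:
    print(summary, "|", top, "|", bottom)
```

Keeping the two views as separate strings means each camera view can be paired with its own prompt without parsing the summary key.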
14 changes: 14 additions & 0 deletions
..._ultrasound/scripts/simulation/environments/cosmos_transfer1/config/inference/__init__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. |
170 changes: 170 additions & 0 deletions
...lation/environments/cosmos_transfer1/config/inference/cosmos-1-diffusion-control2world.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,170 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| from cosmos_transfer1.checkpoints import BASE_7B_CHECKPOINT_AV_SAMPLE_PATH | ||
| from cosmos_transfer1.diffusion.config.transfer.conditioner import CTRL_HINT_KEYS_COMB | ||
| from cosmos_transfer1.diffusion.model.model_ctrl import VideoDiffusionT2VModelWithCtrl | ||
| from cosmos_transfer1.diffusion.networks.general_dit_video_conditioned import VideoExtendGeneralDIT | ||
| from cosmos_transfer1.utils.lazy_config import LazyCall as L | ||
| from cosmos_transfer1.utils.lazy_config import LazyDict | ||
| from hydra.core.config_store import ConfigStore | ||
| from simulation.environments.cosmos_transfer1.model.model_ctrl import VideoDiffusionModelWithCtrlAndGuidance | ||
| | ||
| cs = ConfigStore.instance() | ||
| | ||
| # Base configuration for 7B model | ||
| Base_7B_Config = LazyDict( | ||
| dict( | ||
| defaults=[ | ||
| {"override /net": "faditv2_7b"}, | ||
| {"override /conditioner": "add_fps_image_size_padding_mask"}, | ||
| {"override /tokenizer": "cosmos_diffusion_tokenizer_res720_comp8x8x8_t121_ver092624"}, | ||
| "_self_", | ||
| ], | ||
| model=dict( | ||
| latent_shape=[16, 16, 88, 160], | ||
| net=dict( | ||
| rope_h_extrapolation_ratio=1, | ||
| rope_w_extrapolation_ratio=1, | ||
| rope_t_extrapolation_ratio=2, | ||
| ), | ||
| ), | ||
| job=dict( | ||
| group="Control2World", | ||
| name="Base_7B_Config", | ||
| ), | ||
| ) | ||
| ) | ||
| | ||
| | ||
| def make_ctrlnet_config_7b( | ||
| hint_key: str = "control_input_seg", | ||
| num_control_blocks: int = 3, | ||
| ) -> LazyDict: | ||
| """ | ||
| Make a ControlNet config for 7B model | ||
| Args: | ||
| hint_key: The key to use for the control input. | ||
| num_control_blocks: The number of ViT blocks to use for the ControlNet. | ||
| Returns: | ||
| A LazyDict containing the control net config. | ||
| """ | ||
| hint_mask = [True] * len(CTRL_HINT_KEYS_COMB[hint_key]) | ||
| | ||
| return LazyDict( | ||
| dict( | ||
| defaults=[ | ||
| "/experiment/Base_7B_Config", | ||
| {"override /hint_key": hint_key}, | ||
| {"override /net_ctrl": "faditv2_7b"}, | ||
| {"override /conditioner": "ctrlnet_add_fps_image_size_padding_mask"}, | ||
| ], | ||
| job=dict( | ||
| group="CTRL_7Bv1_lvg", | ||
| name=f"CTRL_7Bv1pt3_lvg_tp_121frames_{hint_key}_block{num_control_blocks}", | ||
| project="cosmos_transfer1", | ||
| ), | ||
| model=dict( | ||
| hint_mask=hint_mask, | ||
| hint_dropout_rate=0.3, | ||
| conditioner=dict(video_cond_bool=dict()), | ||
| net=L(VideoExtendGeneralDIT)( | ||
| extra_per_block_abs_pos_emb=True, | ||
| pos_emb_learnable=True, | ||
| extra_per_block_abs_pos_emb_type="learnable", | ||
| ), | ||
| net_ctrl=dict( | ||
| in_channels=17, | ||
| hint_channels=128, | ||
| num_blocks=28, | ||
| layer_mask=[True if (i >= num_control_blocks) else False for i in range(28)], | ||
| extra_per_block_abs_pos_emb=True, | ||
| pos_emb_learnable=True, | ||
| extra_per_block_abs_pos_emb_type="learnable", | ||
| ), | ||
| ), | ||
| model_obj=L(VideoDiffusionModelWithCtrlAndGuidance)(), | ||
| ) | ||
| ) | ||
| | ||
| | ||
| def make_ctrlnet_config_7b_t2v( | ||
| hint_key: str = "control_input_seg", | ||
| num_control_blocks: int = 3, | ||
| ) -> LazyDict: | ||
| """ | ||
| Make a ControlNet config for 7B text-to-video model | ||
| Args: | ||
| hint_key: The key to use for the control input. | ||
| num_control_blocks: The number of ViT blocks to use for the ControlNet. | ||
| Returns: | ||
| A LazyDict containing the ControlNet config. | ||
| """ | ||
| hint_mask = [True] * len(CTRL_HINT_KEYS_COMB[hint_key]) | ||
| | ||
| return LazyDict( | ||
| dict( | ||
| defaults=[ | ||
| "/experiment/Base_7B_Config", | ||
| {"override /hint_key": hint_key}, | ||
| {"override /net_ctrl": "faditv2_7b"}, | ||
| {"override /conditioner": "ctrlnet_add_fps_image_size_padding_mask"}, | ||
| ], | ||
| job=dict( | ||
| group="CTRL_7Bv1_t2v", | ||
| name=f"CTRL_7Bv1pt3_t2v_121frames_{hint_key}_block{num_control_blocks}", | ||
| project="cosmos_ctrlnet1", | ||
| ), | ||
| model=dict( | ||
| base_load_from=dict( | ||
| load_path=f"checkpoints/{BASE_7B_CHECKPOINT_AV_SAMPLE_PATH}", | ||
| ), | ||
| hint_mask=hint_mask, | ||
| hint_dropout_rate=0.3, | ||
| net=dict( | ||
| extra_per_block_abs_pos_emb=True, | ||
| pos_emb_learnable=True, | ||
| extra_per_block_abs_pos_emb_type="learnable", | ||
| ), | ||
| net_ctrl=dict( | ||
| in_channels=16, | ||
| hint_channels=16, | ||
| num_blocks=28, | ||
| layer_mask=[True if (i >= num_control_blocks) else False for i in range(28)], | ||
| extra_per_block_abs_pos_emb=True, | ||
| pos_emb_learnable=True, | ||
| extra_per_block_abs_pos_emb_type="learnable", | ||
| ), | ||
| ), | ||
| model_obj=L(VideoDiffusionT2VModelWithCtrl)(), | ||
| ) | ||
| ) | ||
| | ||
| | ||
| # Register base configs | ||
| cs.store(group="experiment", package="_global_", name=Base_7B_Config["job"]["name"], node=Base_7B_Config) | ||
| # Register all control configurations | ||
| num_control_blocks = 3 | ||
| for key in CTRL_HINT_KEYS_COMB.keys(): | ||
| # Register 7B configurations | ||
| config_7b = make_ctrlnet_config_7b(hint_key=key, num_control_blocks=num_control_blocks) | ||
| cs.store(group="experiment", package="_global_", name=config_7b["job"]["name"], node=config_7b) | ||
| | ||
| # Register t2v based control net | ||
| num_control_blocks = 3 | ||
| for key in ["control_input_hdmap", "control_input_lidar"]: | ||
| # Register 7B configurations | ||
| config_7b = make_ctrlnet_config_7b_t2v(hint_key=key, num_control_blocks=num_control_blocks) | ||
| cs.store(group="experiment", package="_global_", name=config_7b["job"]["name"], node=config_7b) | ||
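
Two pieces of the config above are plain Python and can be checked in isolation: the `layer_mask` comprehension and the f-string that builds the registered job name. The sketch below reproduces both without importing `cosmos_transfer1` or Hydra. The helper names `layer_mask` and `job_name` are hypothetical, and how the model interprets a `True` entry in the mask is not shown in this diff, so the comments describe only the values produced.

```python
NUM_BLOCKS = 28  # matches num_blocks=28 in the net_ctrl config above


def layer_mask(num_control_blocks: int, num_blocks: int = NUM_BLOCKS) -> list[bool]:
    """Same values as `[True if (i >= n) else False for i in range(28)]`."""
    # The comparison already yields a bool, so the conditional expression
    # in the original comprehension is not needed to get the same list.
    return [i >= num_control_blocks for i in range(num_blocks)]


def job_name(hint_key: str, num_control_blocks: int) -> str:
    """Name pattern used by make_ctrlnet_config_7b for ConfigStore registration."""
    return f"CTRL_7Bv1pt3_lvg_tp_121frames_{hint_key}_block{num_control_blocks}"


mask = layer_mask(3)
name = job_name("control_input_seg", 3)

print(mask.count(False), mask.count(True))
print(name)
```

With the default `num_control_blocks=3`, the first three entries are `False` and the remaining 25 are `True`.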
15 changes: 15 additions & 0 deletions
...simulation/environments/cosmos_transfer1/config/inference_cosmos_transfer1_two_views.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| { | ||
| "prompt": "environments/cosmos_transfer1/config/generated_prompts_two_seperate_views.json", | ||
| "input_video_path" : "placeholder_not_needed.mp4", | ||
| "edge": { | ||
| "control_weight": 0.5 | ||
| }, | ||
| "depth": { | ||
| "control_weight": 0.75, | ||
| "input_control": "placeholder_not_needed.mp4" | ||
| }, | ||
| "seg": { | ||
| "control_weight": 0.75, | ||
| "input_control": "placeholder_not_needed.mp4" | ||
| } | ||
| } |
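
The inference spec above assigns one `control_weight` per control modality. As a minimal sketch, assuming only the keys visible in that JSON, the weights can be collected as follows. The helper name `control_weights` is hypothetical, and how the pipeline combines the weights or replaces the `placeholder_not_needed.mp4` paths is not shown in this diff.

```python
import json

# Copy of the spec shown above.
SPEC = """
{
  "prompt": "environments/cosmos_transfer1/config/generated_prompts_two_seperate_views.json",
  "input_video_path": "placeholder_not_needed.mp4",
  "edge": {"control_weight": 0.5},
  "depth": {"control_weight": 0.75, "input_control": "placeholder_not_needed.mp4"},
  "seg": {"control_weight": 0.75, "input_control": "placeholder_not_needed.mp4"}
}
"""

# Control modalities that appear in the spec above.
CONTROL_KEYS = ("edge", "depth", "seg")


def control_weights(spec: dict) -> dict[str, float]:
    """Collect the control_weight of every modality present in the spec."""
    return {key: float(spec[key]["control_weight"]) for key in CONTROL_KEYS if key in spec}


weights = control_weights(json.loads(SPEC))
print(weights)
```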