Cosmos Transfer 2.5 + FiftyOne - Integration - OSS#6564

Closed
paularamo wants to merge 0 commits into voxel51:main from paularamo:main

Conversation

@paularamo
Contributor

@paularamo paularamo commented Nov 11, 2025

What changes are proposed in this pull request?

This PR adds a new tutorial notebook:
cosmos-transfer-integration.ipynb

The tutorial demonstrates how to integrate NVIDIA Cosmos-Transfer 2.5 with FiftyOne. It provides a complete, reproducible pipeline for generating and visualizing multimodal data (RGB, edge, segmentation, and synthetic video outputs).

Key additions include:

  • End-to-end workflow for connecting Cosmos-Transfer with FiftyOne
  • Steps for dataset setup, image-to-video conversion, and grouping multimodal slices
  • Python-only batch inference pipeline (no external shell scripts)
  • Integration with FiftyOne Brain to compute embeddings and similarity indices using CLIP
  • Visualization of generated data and embeddings directly in the FiftyOne App
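The grouping workflow above pairs each RGB image with its derived video slice. A minimal sketch of that matching step, assuming the generated MP4s share the source image's filename stem (the helper name and directory layout here are illustrative, not the notebook's exact code):

```python
from pathlib import Path
from typing import Optional

def guess_video_path(image_path: str, videos_dir: str) -> Optional[str]:
    # Pair an RGB image with its derived MP4 by shared filename stem;
    # a sketch of the tutorial's matching step (layout is assumed)
    stem = Path(image_path).stem
    candidate = Path(videos_dir) / f"{stem}.mp4"
    return str(candidate) if candidate.exists() else None
```

Returning `None` for unmatched images lets the grouping loop skip samples whose video conversion failed instead of raising.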

How is this patch tested? If it is not, please explain why.

The notebook was executed end-to-end using a subset of the BioTrove dataset with Cosmos-Transfer 2.5.
Verified dataset grouping, edge map generation, JSON spec creation, inference execution, and FiftyOne visualization.
The pipeline was tested using both bash and Python code within the notebook.

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive tutorial notebook demonstrating an end-to-end FiftyOne and Cosmos-Transfer integration: setup and prerequisites, loading and grouping datasets, converting image sequences to short videos, generating edge-control videos, running external model inference, extracting result frames, adding visualization slices, and computing embeddings for similarity-driven multimodal exploration.


@paularamo paularamo requested a review from a team as a code owner November 11, 2025 20:26
@coderabbitai
Contributor

coderabbitai bot commented Nov 11, 2025

Walkthrough

Adds a new Jupyter notebook and index entries documenting an end-to-end integration between Cosmos-Transfer 2.5 and FiftyOne: dataset loading and grouping, image-sequence → 1s MP4 conversion, Canny edge control-map generation, JSON spec creation, subprocess inference, output video handling (last-frame extraction), and CLIP-based embeddings and visualization in FiftyOne.
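The JSON spec creation step can be sketched with the standard library alone. The schema below (keys like `input_video_path` and `control_overrides`) is an assumption for illustration; the real field names are defined by the Cosmos-Transfer 2.5 documentation:

```python
import json
from pathlib import Path

def write_spec(input_video: str, control_video: str, prompt: str, out_path: str) -> Path:
    # Per-sample inference spec; key names here are assumed, not the
    # official Cosmos-Transfer schema
    spec = {
        "prompt": prompt,
        "input_video_path": input_video,
        "control_overrides": {"edge": {"input_control": control_video}},
    }
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, indent=2))
    return path
```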

Changes

Cohort / File(s) Summary
Cosmos-Transfer Integration Tutorial
docs/source/tutorials/cosmos-transfer-integration.ipynb
New notebook implementing: FiftyOne dataset loading (BioTrove subset) and grouped samples (image, video, slices); image-sequence → 1s MP4 conversion; Canny edge video generation; JSON spec file creation for Cosmos-Transfer; subprocess-based inference invocation; output video handling and last-frame extraction; adding slices (edge, output, output_last) to FiftyOne; CLIP embedding computation via FiftyOne Brain, similarity indexing, and interactive visualization in the FiftyOne App.
Tutorials index
docs/source/tutorials/index.rst
Added a tutorial card and toctree entry for "Integrating Cosmos-Transfer 2.5 with FiftyOne"; minor whitespace adjustments around the new card and toctree entries.
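The notebook generates its control maps with OpenCV's Canny detector. As a library-light illustration of what an edge control map is, here is a rough Sobel-gradient stand-in (real Canny adds Gaussian smoothing and hysteresis thresholding on top of this):

```python
import numpy as np

def edge_map(frame: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    # Binary edge map from a grayscale frame via Sobel gradients; a rough
    # stand-in for the Canny control maps the notebook makes with OpenCV
    gray = frame.astype(np.float32)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], np.float32)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Valid-region convolution; border pixels stay zero
    for i in range(1, gray.shape[0] - 1):
        for j in range(1, gray.shape[1] - 1):
            patch = gray[i - 1 : i + 2, j - 1 : j + 2]
            gx[i, j] = float((patch * kx).sum())
            gy[i, j] = float((patch * kx.T).sum())
    mag = np.hypot(gx, gy)
    mag /= max(mag.max(), 1e-6)
    return (mag > thresh).astype(np.uint8) * 255
```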

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant FO as FiftyOne
    participant Pre as Preprocessor
    participant Spec as SpecWriter
    participant CT as Cosmos-Transfer (subprocess)
    participant Out as OutputProcessor
    participant Brain as FiftyOne Brain
    participant App as FiftyOne App

    User->>FO: load BioTrove subset
    FO->>FO: build grouped samples (image, sequence, slices)
    User->>Pre: run preprocessing
    Pre->>Pre: convert image sequences → 1s MP4
    Pre->>Pre: generate Canny edge videos (control maps)
    Pre->>Spec: write JSON spec files for inference
    Spec->>CT: invoke Cosmos-Transfer via subprocess
    CT->>Out: produce output videos
    Out->>Out: extract last frames (output_last)
    Out->>FO: add slices (edge, output, output_last) to samples
    User->>Brain: compute CLIP embeddings on output_last
    Brain->>FO: attach embeddings & indexes
    FO->>App: launch visualization
    App->>User: interactive exploration
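The "invoke Cosmos-Transfer via subprocess" step in the diagram can be sketched as a thin wrapper. The command string in the comment is hypothetical; the actual entry point and flags come from the Cosmos-Transfer repository, not from this sketch:

```python
import subprocess
from typing import List

def run_inference(cmd: List[str]) -> bool:
    # Thin wrapper over the notebook's subprocess invocation pattern; the
    # caller builds cmd, e.g. ["python", "transfer.py", "--spec", spec_path]
    # (script name and flags here are illustrative, not the real CLI)
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        print(f"[ERROR] inference failed: {e}")
        return False
```

Returning a bool (rather than raising) lets the batch loop record failures per sample and continue with the rest of the dataset.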

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect notebook cells that generate JSON specs and invoke Cosmos-Transfer via subprocess (arguments, error handling).
  • Review video processing steps (MP4 conversion, codecs, Canny parameters, frame extraction).
  • Verify FiftyOne grouping, slice additions, CLIP embedding and similarity indexing.
  • Check file/path handling and I/O robustness around external processes.
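Conceptually, the CLIP similarity indexing the reviewer is asked to verify reduces to cosine similarity over embedding vectors. A minimal numpy sketch of the retrieval step (FiftyOne Brain's compute_similarity builds and manages the real index):

```python
import numpy as np

def top_k_similar(embeddings: np.ndarray, query: int, k: int = 3) -> list:
    # Indices of the k samples most cosine-similar to embeddings[query];
    # a conceptual sketch of what a similarity index returns
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e[query]
    order = np.argsort(-sims)
    return [int(i) for i in order if i != query][:k]
```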

Poem

🐰
I hopped through frames both loud and slight,
drew edges, stitched a video flight,
Cosmos hummed and painted bright,
FiftyOne opened glowing sight,
A rabbit cheers — pipelines alight.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding a new tutorial integrating Cosmos Transfer 2.5 with FiftyOne, which aligns perfectly with the PR's primary objective of adding the integration notebook.
Description check ✅ Passed The description covers all critical template sections: proposed changes, testing approach, and release notes section. However, both release notes checkboxes remain unchecked and the affected areas checkboxes are also unchecked, which appear to be intentional omissions rather than oversights.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55e4f6a and 826b5b8.

📒 Files selected for processing (1)
  • docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
🧰 Additional context used
🪛 Pylint (4.0.2)
docs/source/tutorials/cosmos-transfer-integration.ipynb

[error] 1-1: Unrecognized option found: optimize-ast, files-output, function-name-hint, variable-name-hint, const-name-hint, attr-name-hint, argument-name-hint, class-attribute-name-hint, inlinevar-name-hint, class-name-hint, module-name-hint, method-name-hint, no-space-check

(E0015)


[refactor] 1-1: Useless option value for '--disable', 'bad-continuation' was removed from pylint, see pylint-dev/pylint#3571.

(R0022)

🪛 Ruff (0.14.4)
docs/source/tutorials/cosmos-transfer-integration.ipynb

98-98: Redefinition of unused os from line 28

Remove definition: os

(F811)


209-209: Multiple statements on one line (semicolon)

(E702)


223-223: subprocess call: check for execution of untrusted input

(S603)


224-224: Consider moving this statement to an else block

(TRY300)


312-315: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1)

175-176: Fix the shell variable assignment (issue not addressed from previous review).

Line 175 attempts to assign a shell variable using !input_dir = "/path/to/images", but this executes as a shell command and fails. The subsequent FFmpeg command on Line 176 cannot expand $input_dir properly, breaking the video conversion step.

Apply this diff to define input_dir as a Python variable:

-!input_dir = "/path/to/images"
-!cd "$input_dir" && for d in */ ; do ffmpeg -framerate 24 -pattern_type glob -i "$d/*.jpg" -c:v libx264 -pix_fmt yuv420p "${d%/}.mp4"; done
+input_dir = "/path/to/images"
+!cd "{input_dir}" && for d in */ ; do ffmpeg -framerate 24 -pattern_type glob -i "$d/*.jpg" -c:v libx264 -pix_fmt yuv420p "${{d%/}}.mp4"; done

Note: Use Python string interpolation {input_dir} instead of shell variable $input_dir.

🧹 Nitpick comments (3)
docs/source/tutorials/cosmos-transfer-integration.ipynb (3)

196-196: Consider backward-compatible type hint syntax.

Line 196 uses str | None syntax introduced in Python 3.10. While the notebook metadata indicates Python 3.10.12, users with older Python versions will encounter syntax errors.

Consider using Optional[str] from the typing module for broader compatibility:

+from typing import Optional
+
-def guess_video_path(image_path: str) -> str | None:
+def guess_video_path(image_path: str) -> Optional[str]:

Alternatively, document Python 3.10+ as a requirement in the setup section.


385-385: Consider splitting resource cleanup statements.

While functionally correct, combining cap.release(); out.release() on one line with a semicolon goes against typical Python style guidelines.

-    cap.release(); out.release()
+    cap.release()
+    out.release()

619-621: Fix commented code variable name.

Line 620 references session2.wait(), but the actual variable name is session (defined on Line 619).

 session = fo.launch_app(flattened_output_last, port=5151, auto=False)
-#session2.wait()
+# session.wait()
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 826b5b8 and 40b2b36.

📒 Files selected for processing (1)
  • docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
🧰 Additional context used
🪛 Pylint (4.0.2)
docs/source/tutorials/cosmos-transfer-integration.ipynb

[error] 1-1: Unrecognized option found: optimize-ast, files-output, function-name-hint, variable-name-hint, const-name-hint, attr-name-hint, argument-name-hint, class-attribute-name-hint, inlinevar-name-hint, class-name-hint, module-name-hint, method-name-hint, no-space-check

(E0015)


[refactor] 1-1: Useless option value for '--disable', 'bad-continuation' was removed from pylint, see pylint-dev/pylint#3571.

(R0022)

🪛 Ruff (0.14.4)
docs/source/tutorials/cosmos-transfer-integration.ipynb

98-98: Redefinition of unused os from line 28

Remove definition: os

(F811)


209-209: Multiple statements on one line (semicolon)

(E702)


223-223: subprocess call: check for execution of untrusted input

(S603)


224-224: Consider moving this statement to an else block

(TRY300)


312-315: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (6)
docs/source/tutorials/cosmos-transfer-integration.ipynb (6)

1-78: Clear introduction and setup instructions.

The tutorial provides a well-structured overview of the Cosmos-Transfer 2.5 and FiftyOne integration, with appropriate setup guidance for dependencies and environment preparation.


91-138: Dataset loading and grouping logic is correct.

The code properly loads the BioTrove dataset from Hugging Face and creates a grouped dataset structure with appropriate cleanup and persistence settings.


277-477: Batch pipeline implementation is well-structured.

The pipeline effectively orchestrates edge map generation, JSON spec creation, Cosmos-Transfer inference, and frame extraction. The code is decomposed into clear helper functions and includes appropriate error handling.


486-529: FiftyOne integration logic is correct.

The code properly loads the existing grouped dataset, indexes samples by key, creates new slices for edge maps and outputs, and launches the FiftyOne App for visualization.


554-576: Embeddings and similarity computation is correctly implemented.

The code properly computes CLIP embeddings for the output_last slice and builds a similarity index using the correct field name throughout.


628-642: Excellent tutorial summary.

The summary effectively recaps the entire workflow and clearly articulates the value proposition of combining Cosmos-Transfer 2.5 with FiftyOne for Physical AI applications.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
docs/source/tutorials/cosmos-transfer-integration.ipynb (4)

221-221: Optional: Prefix unused loop variable with underscore.

The dirnames variable from os.walk() is unpacked but never used in the loop body. While this doesn't affect functionality, prefixing it with an underscore signals intent.

Apply this diff:

-for dirpath, dirnames, filenames in os.walk(images_root):
+for dirpath, _dirnames, filenames in os.walk(images_root):

522-522: Optional: Split statements onto separate lines for better readability.

While functionally correct, splitting the resource cleanup onto separate lines improves readability.

Apply this diff:

-    cap.release(); out.release()
+    cap.release()
+    out.release()

442-450: Remove unnecessary f-string prefix.

The MOTH_PROMPT string doesn't contain any placeholders, so the f prefix is unnecessary.

Apply this diff:

-MOTH_PROMPT = os.environ.get("MOTH_PROMPT", f"""
+MOTH_PROMPT = os.environ.get("MOTH_PROMPT", """

763-775: Use batch operations for better performance.

Saving samples one-by-one in a loop creates many individual database writes, which can be slow for large datasets. FiftyOne's set_values() method performs batch updates much more efficiently.

Replace the entire cell with this more efficient batch approach:

-# Add a field to store the slice name
-dataset_grp.add_sample_field("slice_name", fo.StringField)
-
-# Iterate through each slice and set the slice_name field
-for slice_name in ["image", "output_last"]:
-    # Get samples from this slice
-    slice_view = dataset_grp.select_group_slices(slice_name)
-    
-    # Set the slice_name field for all samples in this slice
-    for sample in slice_view:
-        sample["slice_name"] = slice_name
-        sample.save()
+# Add a field to store the slice name
+dataset_grp.add_sample_field("slice_name", fo.StringField)
+
+# Use batch operations for better performance
+for slice_name in ["image", "output_last"]:
+    slice_view = dataset_grp.select_group_slices(slice_name)
+    slice_view.set_values("slice_name", [slice_name] * len(slice_view))
+
+print(f"[OK] Set slice_name field for {len(dataset_grp.select_group_slices(['image', 'output_last']))} samples")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40b2b36 and 0ca8a3d.

📒 Files selected for processing (1)
  • docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
🧰 Additional context used
🪛 Pylint (4.0.2)
docs/source/tutorials/cosmos-transfer-integration.ipynb

[error] 1-1: Unrecognized option found: optimize-ast, files-output, function-name-hint, variable-name-hint, const-name-hint, attr-name-hint, argument-name-hint, class-attribute-name-hint, inlinevar-name-hint, class-name-hint, module-name-hint, method-name-hint, no-space-check

(E0015)


[refactor] 1-1: Useless option value for '--disable', 'bad-continuation' was removed from pylint, see pylint-dev/pylint#3571.

(R0022)

🪛 Ruff (0.14.4)
docs/source/tutorials/cosmos-transfer-integration.ipynb

53-53: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


87-87: subprocess call: check for execution of untrusted input

(S603)


290-290: Multiple statements on one line (semicolon)

(E702)


304-304: subprocess call: check for execution of untrusted input

(S603)


305-305: Consider moving this statement to an else block

(TRY300)


392-395: Avoid specifying long messages outside the exception class

(TRY003)


447-447: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1)

803-822: Well-structured embeddings pipeline.

The embeddings computation and similarity indexing are correctly implemented with consistent field names and proper use of FiftyOne Brain APIs. The flattened view approach allows processing both slices together efficiently.

base_dir = images_root.parent  # This is ~/fiftyone/huggingface/hub/<username>/<dataset_name>

# Setup where the COSMOS-TRANSFER2.5 repo folder is
os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer-repo"

⚠️ Potential issue | 🔴 Critical

Users must update this placeholder path before execution.

The hardcoded path /path/to/cosmos-transfer-repo will cause the pipeline to fail. Users must set this to their actual Cosmos-Transfer repository location.

Consider adding a validation check immediately after this line:

 os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer-repo"
+
+# Validate COSMOS_DIR exists
+if not Path(os.environ["COSMOS_DIR"]).exists():
+    raise ValueError(
+        f"COSMOS_DIR does not exist: {os.environ['COSMOS_DIR']}. "
+        "Please update this path to your actual Cosmos-Transfer repository location."
+    )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/source/tutorials/cosmos-transfer-integration.ipynb around line 420, the
example sets os.environ["COSMOS_DIR"] to a hardcoded placeholder which will
break the pipeline; immediately after that line add a validation that expands
the path (os.path.expanduser), checks that it exists and is a directory
(os.path.exists and os.path.isdir), and if not, raise a clear error or exit with
a message instructing the user to set COSMOS_DIR to their actual Cosmos-Transfer
repo path so execution stops early with actionable feedback.

Comment on lines 695 to 696
last_frames_dir = OUT_DIR / "last_frames"
last_frames_dir.mkdir(parents=True, exist_ok=True)

⚠️ Potential issue | 🟡 Minor

Inconsistent directory variable naming.

This cell creates last_frames_dir = OUT_DIR / "last_frames", but the earlier configuration (line 427) defined LAST_FRAMES_DIR = base_dir / "last_frame" (singular). This creates output frames in a different location than configured.

For consistency, reuse the configuration variable:

-last_frames_dir = OUT_DIR / "last_frames"
-last_frames_dir.mkdir(parents=True, exist_ok=True)
+# Reuse the configured directory (already created by ensure_dirs if needed)

Then update line 718:

-    last_png = extract_last_frame(out_vid, last_frames_dir)
+    last_png = extract_last_frame(out_vid, LAST_FRAMES_DIR)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/source/tutorials/cosmos-transfer-integration.ipynb around lines 695–696
(and referencing the earlier config at line 427), the cell creates a new path
variable last_frames_dir = OUT_DIR / "last_frames" which conflicts with the
configured LAST_FRAMES_DIR = base_dir / "last_frame" (singular); change the cell
to reuse the existing configuration variable LAST_FRAMES_DIR instead of creating
last_frames_dir, call mkdir(parents=True, exist_ok=True) on LAST_FRAMES_DIR, and
update any later reference at line 718 to use LAST_FRAMES_DIR so all code
writes/reads to the same directory name.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
docs/source/tutorials/cosmos-transfer-integration.ipynb (2)

420-420: Critical: Placeholder path will cause pipeline failure.

The hardcoded placeholder /path/to/cosmos-transfer-repo must be updated before execution. Consider adding validation immediately after this line to fail fast with a clear error message.

Apply this diff to add validation:

 os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer-repo"
+
+# Validate COSMOS_DIR exists
+cosmos_path = Path(os.environ["COSMOS_DIR"]).expanduser()
+if not cosmos_path.exists() or not cosmos_path.is_dir():
+    raise ValueError(
+        f"COSMOS_DIR does not exist or is not a directory: {cosmos_path}\n"
+        "Please update line 420 to point to your actual Cosmos-Transfer repository."
+    )

Based on learnings


695-696: Inconsistent directory variable naming.

This cell creates last_frames_dir = OUT_DIR / "last_frames" (plural), but the configuration at line 427 defined LAST_FRAMES_DIR = base_dir / "last_frame" (singular). This creates output frames in a different location than configured, which could cause confusion.

For consistency, reuse the configured directory:

-last_frames_dir = OUT_DIR / "last_frames"
-last_frames_dir.mkdir(parents=True, exist_ok=True)
+LAST_FRAMES_DIR.mkdir(parents=True, exist_ok=True)

Then update the reference at line 718:

-    last_png = extract_last_frame(out_vid, last_frames_dir)
+    last_png = extract_last_frame(out_vid, LAST_FRAMES_DIR)

Based on learnings

🧹 Nitpick comments (2)
docs/source/tutorials/cosmos-transfer-integration.ipynb (2)

465-580: Well-designed utility functions with robust error handling.

The pipeline utilities are cleanly implemented with appropriate error handling for video processing, edge detection, and subprocess invocation. The fallback logic in extract_last_frame for videos with incorrect metadata is particularly thorough.

Note: Line 522 uses a semicolon to combine statements (cap.release(); out.release()), which works but could be split for style consistency if desired.


797-822: Correct embeddings computation with consistent field naming.

The code properly computes embeddings and builds the similarity index using the "embeddings" field consistently throughout. Both compute_embeddings() (line 811) and compute_similarity() (line 817) reference the same field.

Note: The markdown documentation at line 793 mentions an inconsistency that doesn't exist in the current code. Consider removing that outdated warning.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca8a3d and f5ee552.

📒 Files selected for processing (1)
  • docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
🧰 Additional context used
🪛 Pylint (4.0.2)
docs/source/tutorials/cosmos-transfer-integration.ipynb

[error] 1-1: Unrecognized option found: optimize-ast, files-output, function-name-hint, variable-name-hint, const-name-hint, attr-name-hint, argument-name-hint, class-attribute-name-hint, inlinevar-name-hint, class-name-hint, module-name-hint, method-name-hint, no-space-check

(E0015)


[refactor] 1-1: Useless option value for '--disable', 'bad-continuation' was removed from pylint, see pylint-dev/pylint#3571.

(R0022)

🪛 Ruff (0.14.4)
docs/source/tutorials/cosmos-transfer-integration.ipynb

53-53: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


87-87: subprocess call: check for execution of untrusted input

(S603)


290-290: Multiple statements on one line (semicolon)

(E702)


304-304: subprocess call: check for execution of untrusted input

(S603)


305-305: Consider moving this statement to an else block

(TRY300)


392-395: Avoid specifying long messages outside the exception class

(TRY003)


447-447: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (11)
docs/source/tutorials/cosmos-transfer-integration.ipynb (11)

1-80: Excellent introduction and setup documentation.

The tutorial provides clear context, motivation, and comprehensive setup instructions including system requirements, installation steps, and dependencies. This gives users a solid foundation before diving into the integration workflow.


93-136: Clean dataset loading implementation.

The code properly loads the dataset from Hugging Face Hub and extracts the directory path for subsequent operations. Using max_samples=2 is appropriate for demonstration purposes.


149-164: LGTM: Grouped dataset initialization.

The grouped dataset setup follows FiftyOne best practices with proper cleanup and group field configuration.


200-263: Robust image-to-video conversion pipeline.

The FFmpeg integration is well-implemented with proper error handling, individual file processing, and even-dimension padding for video codec compatibility. The subprocess execution is safe since all paths are internally constructed.

Note: The unused dirnames variable in the os.walk() unpacking (line 221) is a common pattern and can be safely ignored or prefixed with _ if desired.


281-344: Well-structured grouped dataset population.

The code cleanly maps images to their corresponding videos and creates grouped samples with proper field copying. The guess_video_path helper function provides good encapsulation.


581-613: Clean pipeline execution with good progress tracking.

The main execution loop properly orchestrates the utilities, handles errors gracefully, and provides clear progress feedback.


698-743: Solid FiftyOne integration with comprehensive error tracking.

The group matching and slice addition logic is well-implemented with proper error handling for missing files and unmatched groups. The dataset reload and app launch complete the integration cleanly.


757-775: LGTM: Slice identifier field addition.

The code properly adds and populates the slice_name field for filtering and visualization purposes.


846-862: LGTM: Embeddings visualization setup.

The UMAP visualization configuration correctly references the computed embeddings and will enable interactive exploration in the FiftyOne App.


865-872: LGTM: App launch.

Simple and correct FiftyOne App launch for the flattened view.


883-921: Comprehensive tutorial summary.

The summary effectively recaps the key learning points and provides a clear call-to-action for users interested in the commercial offering.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1)

409-409: Critical: Placeholder path will break execution.

The hardcoded placeholder path will cause the pipeline to fail. This was flagged in previous reviews but still shows the placeholder value.

Users must update this before running the notebook. Add validation immediately after:

 os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer2.5/folder"
+
+# Validate COSMOS_DIR exists
+_cosmos_path = Path(os.environ["COSMOS_DIR"]).expanduser()
+if not _cosmos_path.exists() or str(_cosmos_path) == "/path/to/cosmos-transfer2.5/folder":
+    raise ValueError(
+        f"COSMOS_DIR must be set to your actual Cosmos-Transfer repository location. "
+        f"Current value: {os.environ['COSMOS_DIR']}"
+    )
🧹 Nitpick comments (9)
docs/source/tutorials/cosmos-transfer-integration.ipynb (9)

103-107: Consider tutorial-friendly defaults for dataset loading.

The max_samples=2 is very limiting and overwrite=True forces re-download every time the cell runs. For a tutorial, consider:

  • Increasing max_samples to ~10-20 for meaningful results
  • Setting overwrite=False by default to avoid unnecessary re-downloads
     dataset_src = fouh.load_from_hub(
         "pjramg/moth_biotrove",
         persistent=True,
-        overwrite=True,   # set to False if you don't want to re-download
-        max_samples=2,
+        overwrite=False,  # set to True to force re-download
+        max_samples=10,   # adjust based on your needs
     )

150-151: Add confirmation for destructive dataset deletion.

Unconditionally deleting the dataset without warning could result in accidental data loss. Consider adding a prompt or moving this to a setup cell with clear warnings.

 dataset_grp_name = "moth_biotrove_grouped"
 if fo.dataset_exists(dataset_grp_name):
-    fo.delete_dataset(dataset_grp_name)
+    print(f"Warning: Deleting existing dataset '{dataset_grp_name}'")
+    # Uncomment the next line to proceed with deletion:
+    # fo.delete_dataset(dataset_grp_name)

217-221: Unused loop variable dirnames.

The dirnames variable from os.walk() is not used within the loop body.

-for dirpath, dirnames, filenames in os.walk(images_root):
+for dirpath, _dirnames, filenames in os.walk(images_root):
     jpgs = sorted([f for f in filenames if f.lower().endswith(".jpg")])

511-511: Poor style: multiple statements on one line.

Using semicolons to place multiple statements on one line reduces readability.

-    cap.release(); out.release()
+    cap.release()
+    out.release()

524-528: Simplify error handling in subprocess call.

The happy-path `return True` belongs in an `else` block so it reads separately from the error handling (Ruff TRY300):

     try:
         subprocess.run(cmd, check=True)
-        return True
     except subprocess.CalledProcessError as e:
         print(f"[ERROR] inference failed ({spec_json.name}): {e}")
         return False
+    else:
+        return True
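A runnable sketch of the try/except/else shape (the commands here are stand-ins for the real inference invocation):

```python
import subprocess
import sys

def run_step(cmd) -> bool:
    """Return True on success, False on failure; happy path lives in else."""
    try:
        subprocess.run(cmd, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        print(f"[ERROR] step failed: {e}")
        return False
    else:
        return True

print(run_step([sys.executable, "-c", "pass"]))  # → True
print(run_step([sys.executable, "-c", "import sys; sys.exit(1)"]))  # → False
```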

489-490: Remove unnecessary f-string prefix.

Ruff flags an f-string with no placeholders on this line (F541); drop the extraneous `f` prefix. While here, consider writing the JSON with json.dump() directly rather than json.dumps() + write_text(), which avoids building the full string in memory:

     spec_path.parent.mkdir(parents=True, exist_ok=True)
-    spec_path.write_text(json.dumps(obj, indent=2))
+    with spec_path.open("w") as f:
+        json.dump(obj, f, indent=2)
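A self-contained sketch of the json.dump pattern (the paths and spec contents are illustrative, not the tutorial's actual spec schema):

```python
import json
import tempfile
from pathlib import Path

obj = {"control": "edge", "sigma": 0.5}  # hypothetical spec contents
with tempfile.TemporaryDirectory() as tmp:
    spec_path = Path(tmp) / "specs" / "sample.json"
    spec_path.parent.mkdir(parents=True, exist_ok=True)
    with spec_path.open("w") as f:
        json.dump(obj, f, indent=2)  # streams to the file, no intermediate string
    round_trip = json.loads(spec_path.read_text())
print(round_trip == obj)  # → True
```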


671-672: Duplicate function definition.

The base_key() function is already defined in the main pipeline cell (line 566). Consider removing this redefinition or using the one from the pipeline.

-def base_key(path: Path) -> str:
-    return path.stem
+# base_key() is already defined in the pipeline cell above

692-695: Inconsistent directory naming with configuration.

New directories edge_last_dir and output_last_dir are created here, but the configuration cell defines LAST_FRAMES_DIR = base_dir / "last_frame" (line 416). This creates confusion about where files are actually stored.

Consider using consistent naming or referencing the configured directory:

-# Folder where last-frame PNGs will go
-edge_last_dir = OUT_DIR / "edge_last_frames"
-output_last_dir = OUT_DIR / "output_last_frames"
-edge_last_dir.mkdir(parents=True, exist_ok=True)
-output_last_dir.mkdir(parents=True, exist_ok=True)
+# Reuse configured directory or create subdirectories under it
+edge_last_dir = LAST_FRAMES_DIR / "edge"
+output_last_dir = LAST_FRAMES_DIR / "output"
+edge_last_dir.mkdir(parents=True, exist_ok=True)
+output_last_dir.mkdir(parents=True, exist_ok=True)

786-794: Inefficient: saving samples individually in a loop.

Calling sample.save() for each sample in a loop is slow for large datasets. Use FiftyOne's batch operations instead.

 # Iterate through each slice and set the slice_name field
 for slice_name in ["image", "output_last"]:
     # Get samples from this slice
     slice_view = dataset_grp.select_group_slices(slice_name)
     
-    # Set the slice_name field for all samples in this slice
-    for sample in slice_view:
-        sample["slice_name"] = slice_name
-        sample.save()
+    # Use batch update (more efficient)
+    slice_view.set_values("slice_name", [slice_name] * len(slice_view))
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5ee552 and d70be13.

📒 Files selected for processing (2)
  • docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
  • docs/source/tutorials/index.rst (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/source/tutorials/index.rst
🧰 Additional context used
🪛 Ruff (0.14.5)
docs/source/tutorials/cosmos-transfer-integration.ipynb

26-26: Redefinition of unused fo from line 2

Remove definition: fo

(F811)


39-39: Found useless expression. Either assign it to a variable or remove it.

(B018)


58-58: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


92-92: subprocess call: check for execution of untrusted input

(S603)


298-298: Multiple statements on one line (semicolon)

(E702)


312-312: subprocess call: check for execution of untrusted input

(S603)


313-313: Consider moving this statement to an else block

(TRY300)


410-413: Avoid specifying long messages outside the exception class

(TRY003)


488-488: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (4)
docs/source/tutorials/cosmos-transfer-integration.ipynb (4)

275-275: Type hint syntax requires Python 3.10+.

The union type syntax str | None requires Python 3.10 or later. Since the notebook metadata shows "version": "3.10.12", this is fine, but consider documenting this requirement in the setup section for users.


821-841: LGTM!

The embeddings computation and similarity index creation are properly configured. The field names are consistent between compute_embeddings() and compute_similarity().


871-880: LGTM!

The visualization computation is correctly configured and uses the embeddings field from the previous cell.


890-890: LGTM!

Standard FiftyOne app launch pattern.
