Cosmos Transfer 2.5 + FiftyOne - Integration - OSS#6564
paularamo wants to merge 0 commits into voxel51:main from
Conversation
Walkthrough

Adds a new Jupyter notebook and index entries documenting an end-to-end integration between Cosmos-Transfer 2.5 and FiftyOne: dataset loading and grouping, image-sequence → 1s MP4 conversion, Canny edge control-map generation, JSON spec creation, subprocess inference, output video handling (last-frame extraction), and CLIP-based embeddings and visualization in FiftyOne.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant FO as FiftyOne
    participant Pre as Preprocessor
    participant Spec as SpecWriter
    participant CT as Cosmos-Transfer (subprocess)
    participant Out as OutputProcessor
    participant Brain as FiftyOne Brain
    participant App as FiftyOne App
    User->>FO: load BioTrove subset
    FO->>FO: build grouped samples (image, sequence, slices)
    User->>Pre: run preprocessing
    Pre->>Pre: convert image sequences → 1s MP4
    Pre->>Pre: generate Canny edge videos (control maps)
    Pre->>Spec: write JSON spec files for inference
    Spec->>CT: invoke Cosmos-Transfer via subprocess
    CT->>Out: produce output videos
    Out->>Out: extract last frames (output_last)
    Out->>FO: add slices (edge, output, output_last) to samples
    User->>Brain: compute CLIP embeddings on output_last
    Brain->>FO: attach embeddings & indexes
    FO->>App: launch visualization
    App->>User: interactive exploration
```
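The "convert image sequences → 1s MP4" step above boils down to building one FFmpeg command per frame folder. Here is a minimal, hedged sketch of that command construction; the helper names (`build_ffmpeg_cmd`, `convert_sequences`) and the `frames/seq_001` layout are illustrative assumptions, not code from the notebook:

```python
from pathlib import Path


def build_ffmpeg_cmd(seq_dir: Path, fps: int = 24) -> list[str]:
    """Build an ffmpeg command that packs a folder of JPG frames into an
    MP4 named after the folder (e.g. seq_001/ -> seq_001.mp4).

    With 24 frames at -framerate 24, the result is a 1-second clip.
    """
    out_path = seq_dir.with_suffix(".mp4")
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-pattern_type", "glob",
        "-i", str(seq_dir / "*.jpg"),
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # widest player compatibility
        str(out_path),
    ]


def convert_sequences(images_root: Path) -> list[list[str]]:
    """Return one ffmpeg command per immediate subdirectory holding JPGs."""
    cmds = []
    for d in sorted(p for p in images_root.iterdir() if p.is_dir()):
        if any(f.suffix.lower() == ".jpg" for f in d.iterdir()):
            cmds.append(build_ffmpeg_cmd(d))
    return cmds
```

Each returned command can be executed with `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed.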
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
🧰 Additional context used
🪛 Pylint (4.0.2)
docs/source/tutorials/cosmos-transfer-integration.ipynb
[error] 1-1: Unrecognized option found: optimize-ast, files-output, function-name-hint, variable-name-hint, const-name-hint, attr-name-hint, argument-name-hint, class-attribute-name-hint, inlinevar-name-hint, class-name-hint, module-name-hint, method-name-hint, no-space-check
(E0015)
[refactor] 1-1: Useless option value for '--disable', 'bad-continuation' was removed from pylint, see pylint-dev/pylint#3571.
(R0022)
🪛 Ruff (0.14.4)
docs/source/tutorials/cosmos-transfer-integration.ipynb
98-98: Redefinition of unused os from line 28
Remove definition: os
(F811)
209-209: Multiple statements on one line (semicolon)
(E702)
223-223: subprocess call: check for execution of untrusted input
(S603)
224-224: Consider moving this statement to an else block
(TRY300)
312-315: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1)
175-176: Fix the shell variable assignment (issue not addressed from previous review).

Line 175 attempts to assign a shell variable using `!input_dir = "/path/to/images"`, but this executes as a shell command and fails. The subsequent FFmpeg command on line 176 cannot expand `$input_dir` properly, breaking the video conversion step.

Apply this diff to define `input_dir` as a Python variable:

```diff
-!input_dir = "/path/to/images"
-!cd "$input_dir" && for d in */ ; do ffmpeg -framerate 24 -pattern_type glob -i "$d/*.jpg" -c:v libx264 -pix_fmt yuv420p "${d%/}.mp4"; done
+input_dir = "/path/to/images"
+!cd "{input_dir}" && for d in */ ; do ffmpeg -framerate 24 -pattern_type glob -i "$d/*.jpg" -c:v libx264 -pix_fmt yuv420p "${{d%/}}.mp4"; done
```

Note: Use Python string interpolation `{input_dir}` instead of the shell variable `$input_dir`.
🧹 Nitpick comments (3)
docs/source/tutorials/cosmos-transfer-integration.ipynb (3)
196-196: Consider backward-compatible type hint syntax.

Line 196 uses the `str | None` syntax introduced in Python 3.10. While the notebook metadata indicates Python 3.10.12, users with older Python versions will encounter syntax errors.

Consider using `Optional[str]` from the `typing` module for broader compatibility:

```diff
+from typing import Optional
+
-def guess_video_path(image_path: str) -> str | None:
+def guess_video_path(image_path: str) -> Optional[str]:
```

Alternatively, document Python 3.10+ as a requirement in the setup section.
385-385: Consider splitting resource cleanup statements.

While functionally correct, combining `cap.release(); out.release()` on one line with a semicolon goes against typical Python style guidelines.

```diff
-    cap.release(); out.release()
+    cap.release()
+    out.release()
```
619-621: Fix commented code variable name.

Line 620 references `session2.wait()`, but the actual variable name is `session` (defined on line 619).

```diff
 session = fo.launch_app(flattened_output_last, port=5151, auto=False)
-#session2.wait()
+# session.wait()
```
🔇 Additional comments (6)
docs/source/tutorials/cosmos-transfer-integration.ipynb (6)
1-78: Clear introduction and setup instructions. The tutorial provides a well-structured overview of the Cosmos-Transfer 2.5 and FiftyOne integration, with appropriate setup guidance for dependencies and environment preparation.

91-138: Dataset loading and grouping logic is correct. The code properly loads the BioTrove dataset from Hugging Face and creates a grouped dataset structure with appropriate cleanup and persistence settings.

277-477: Batch pipeline implementation is well-structured. The pipeline effectively orchestrates edge map generation, JSON spec creation, Cosmos-Transfer inference, and frame extraction. The code is decomposed into clear helper functions and includes appropriate error handling.
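For intuition about what the edge-map stage of this pipeline encodes, here is a tiny pure-Python sketch. The notebook itself uses OpenCV's Canny detector; this stand-in is only a gradient-threshold approximation (no smoothing, non-maximum suppression, or hysteresis), and the `edge_map` helper is an illustrative assumption, not notebook code:

```python
def edge_map(img, thresh=50):
    """Rough stand-in for a Canny control map: mark pixels where the
    horizontal or vertical intensity difference exceeds `thresh`.

    `img` is a 2D list of grayscale values in [0, 255]; returns a
    same-sized 2D list of 0/255 values.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][x] - img[y][x - 1]) if x > 0 else 0
            dy = abs(img[y][x] - img[y - 1][x]) if y > 0 else 0
            if max(dx, dy) > thresh:
                out[y][x] = 255
    return out


# A 4x4 image with a dark left half and a bright right half: the "edge"
# fires exactly where the intensity jumps (column 2 of every row)
img = [[0, 0, 200, 200]] * 4
edges = edge_map(img)
```

A control map like this, rendered per frame, is what conditions Cosmos-Transfer to preserve scene structure while changing appearance.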
486-529: FiftyOne integration logic is correct. The code properly loads the existing grouped dataset, indexes samples by key, creates new slices for edge maps and outputs, and launches the FiftyOne App for visualization.

554-576: Embeddings and similarity computation is correctly implemented. The code properly computes CLIP embeddings for the `output_last` slice and builds a similarity index using the correct field name throughout.

628-642: Excellent tutorial summary. The summary effectively recaps the entire workflow and clearly articulates the value proposition of combining Cosmos-Transfer 2.5 with FiftyOne for Physical AI applications.
Actionable comments posted: 2
🧹 Nitpick comments (4)
docs/source/tutorials/cosmos-transfer-integration.ipynb (4)
221-221: Optional: Prefix unused loop variable with underscore.

The `dirnames` variable from `os.walk()` is unpacked but never used in the loop body. While this doesn't affect functionality, prefixing it with an underscore signals intent.

Apply this diff:

```diff
-for dirpath, dirnames, filenames in os.walk(images_root):
+for dirpath, _dirnames, filenames in os.walk(images_root):
```
522-522: Optional: Split statements onto separate lines for better readability.

While functionally correct, splitting the resource cleanup onto separate lines improves readability.

Apply this diff:

```diff
-    cap.release(); out.release()
+    cap.release()
+    out.release()
```
442-450: Remove unnecessary f-string prefix.

The `MOTH_PROMPT` string doesn't contain any placeholders, so the `f` prefix is unnecessary.

Apply this diff:

```diff
-MOTH_PROMPT = os.environ.get("MOTH_PROMPT", f"""
+MOTH_PROMPT = os.environ.get("MOTH_PROMPT", """
```
763-775: Use batch operations for better performance.

Saving samples one-by-one in a loop creates many individual database writes, which can be slow for large datasets. FiftyOne's `set_values()` method performs batch updates much more efficiently.

Replace the entire cell with this more efficient batch approach:

```diff
 # Add a field to store the slice name
 dataset_grp.add_sample_field("slice_name", fo.StringField)
 
-# Iterate through each slice and set the slice_name field
-for slice_name in ["image", "output_last"]:
-    # Get samples from this slice
-    slice_view = dataset_grp.select_group_slices(slice_name)
-
-    # Set the slice_name field for all samples in this slice
-    for sample in slice_view:
-        sample["slice_name"] = slice_name
-        sample.save()
+# Use batch operations for better performance
+for slice_name in ["image", "output_last"]:
+    slice_view = dataset_grp.select_group_slices(slice_name)
+    slice_view.set_values("slice_name", [slice_name] * len(slice_view))
+
+print(f"[OK] Set slice_name field for {len(dataset_grp.select_group_slices(['image', 'output_last']))} samples")
```
🪛 Ruff (0.14.4)
docs/source/tutorials/cosmos-transfer-integration.ipynb
53-53: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
87-87: subprocess call: check for execution of untrusted input
(S603)
290-290: Multiple statements on one line (semicolon)
(E702)
304-304: subprocess call: check for execution of untrusted input
(S603)
305-305: Consider moving this statement to an else block
(TRY300)
392-395: Avoid specifying long messages outside the exception class
(TRY003)
447-447: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1)
803-822: Well-structured embeddings pipeline. The embeddings computation and similarity indexing are correctly implemented with consistent field names and proper use of FiftyOne Brain APIs. The flattened view approach allows processing both slices together efficiently.
```python
base_dir = images_root.parent  # This is ~/fiftyone/huggingface/hub/<username>/<dataset_name>

#Setup where the COSMOS-TRANSFER2.5 REPO folder is
os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer-repo"
```
Users must update this placeholder path before execution.
The hardcoded path /path/to/cosmos-transfer-repo will cause the pipeline to fail. Users must set this to their actual Cosmos-Transfer repository location.
Consider adding a validation check immediately after this line:

```diff
 os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer-repo"
+
+# Validate COSMOS_DIR exists
+if not Path(os.environ["COSMOS_DIR"]).exists():
+    raise ValueError(
+        f"COSMOS_DIR does not exist: {os.environ['COSMOS_DIR']}. "
+        "Please update this path to your actual Cosmos-Transfer repository location."
+    )
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In docs/source/tutorials/cosmos-transfer-integration.ipynb around line 420, the
example sets os.environ["COSMOS_DIR"] to a hardcoded placeholder which will
break the pipeline; immediately after that line add a validation that expands
the path (os.path.expanduser), checks that it exists and is a directory
(os.path.exists and os.path.isdir), and if not, raise a clear error or exit with
a message instructing the user to set COSMOS_DIR to their actual Cosmos-Transfer
repo path so execution stops early with actionable feedback.
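The validation the prompt describes can be sketched as a small helper. The function name `validate_cosmos_dir` is a hypothetical choice for illustration; the notebook itself just sets the environment variable inline:

```python
import os
from pathlib import Path


def validate_cosmos_dir(env_var: str = "COSMOS_DIR") -> Path:
    """Fail fast with an actionable message if the configured
    Cosmos-Transfer repo path is missing or not a directory."""
    raw = os.environ.get(env_var, "")
    path = Path(raw).expanduser()
    if not raw or not path.is_dir():
        raise ValueError(
            f"{env_var} does not point to a directory: {raw!r}. "
            "Set it to your actual Cosmos-Transfer repository path."
        )
    return path
```

Calling this right after the `os.environ["COSMOS_DIR"] = ...` cell stops execution early with actionable feedback instead of a confusing failure deep in the pipeline.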
```python
last_frames_dir = OUT_DIR / "last_frames"
last_frames_dir.mkdir(parents=True, exist_ok=True)
```
Inconsistent directory variable naming.
This cell creates `last_frames_dir = OUT_DIR / "last_frames"`, but the earlier configuration (line 427) defined `LAST_FRAMES_DIR = base_dir / "last_frame"` (singular). This creates output frames in a different location than configured.

For consistency, reuse the configuration variable:

```diff
-last_frames_dir = OUT_DIR / "last_frames"
-last_frames_dir.mkdir(parents=True, exist_ok=True)
+# Reuse the configured directory (already created by ensure_dirs if needed)
```

Then update line 718:

```diff
-    last_png = extract_last_frame(out_vid, last_frames_dir)
+    last_png = extract_last_frame(out_vid, LAST_FRAMES_DIR)
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In docs/source/tutorials/cosmos-transfer-integration.ipynb around lines 695–696
(and referencing the earlier config at line 427), the cell creates a new path
variable last_frames_dir = OUT_DIR / "last_frames" which conflicts with the
configured LAST_FRAMES_DIR = base_dir / "last_frame" (singular); change the cell
to reuse the existing configuration variable LAST_FRAMES_DIR instead of creating
last_frames_dir, call mkdir(parents=True, exist_ok=True) on LAST_FRAMES_DIR, and
update any later reference at line 718 to use LAST_FRAMES_DIR so all code
writes/reads to the same directory name.
Actionable comments posted: 0
♻️ Duplicate comments (2)
docs/source/tutorials/cosmos-transfer-integration.ipynb (2)
420-420: Critical: Placeholder path will cause pipeline failure.

The hardcoded placeholder `/path/to/cosmos-transfer-repo` must be updated before execution. Consider adding validation immediately after this line to fail fast with a clear error message.

Apply this diff to add validation:

```diff
 os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer-repo"
+
+# Validate COSMOS_DIR exists
+cosmos_path = Path(os.environ["COSMOS_DIR"]).expanduser()
+if not cosmos_path.exists() or not cosmos_path.is_dir():
+    raise ValueError(
+        f"COSMOS_DIR does not exist or is not a directory: {cosmos_path}\n"
+        "Please update line 420 to point to your actual Cosmos-Transfer repository."
+    )
```

Based on learnings
695-696: Inconsistent directory variable naming.

This cell creates `last_frames_dir = OUT_DIR / "last_frames"` (plural), but the configuration at line 427 defined `LAST_FRAMES_DIR = base_dir / "last_frame"` (singular). This creates output frames in a different location than configured, which could cause confusion.

For consistency, reuse the configured directory:

```diff
-last_frames_dir = OUT_DIR / "last_frames"
-last_frames_dir.mkdir(parents=True, exist_ok=True)
+LAST_FRAMES_DIR.mkdir(parents=True, exist_ok=True)
```

Then update the reference at line 718:

```diff
-    last_png = extract_last_frame(out_vid, last_frames_dir)
+    last_png = extract_last_frame(out_vid, LAST_FRAMES_DIR)
```

Based on learnings
🧹 Nitpick comments (2)
docs/source/tutorials/cosmos-transfer-integration.ipynb (2)
465-580: Well-designed utility functions with robust error handling. The pipeline utilities are cleanly implemented with appropriate error handling for video processing, edge detection, and subprocess invocation. The fallback logic in `extract_last_frame` for videos with incorrect metadata is particularly thorough. Note: line 522 uses a semicolon to combine statements (`cap.release(); out.release()`), which works but could be split for style consistency if desired.
797-822: Correct embeddings computation with consistent field naming. The code properly computes embeddings and builds the similarity index using the `"embeddings"` field consistently throughout. Both `compute_embeddings()` (line 811) and `compute_similarity()` (line 817) reference the same field. Note: the markdown documentation at line 793 mentions an inconsistency that doesn't exist in the current code. Consider removing that outdated warning.
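FiftyOne Brain handles the similarity index internally, but the metric it typically boils down to is cosine similarity between embedding vectors. As a conceptual illustration only (the helper names and the three toy embeddings are made up, not notebook data):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def nearest(query, embeddings):
    """Return the id of the stored embedding most similar to `query`."""
    return max(embeddings, key=lambda k: cosine_similarity(query, embeddings[k]))


# Toy 3-dimensional "CLIP" embeddings for three samples
embeddings = {
    "real_moth": [0.9, 0.1, 0.0],
    "synthetic_moth": [0.8, 0.2, 0.1],
    "background": [0.0, 0.1, 0.9],
}
# A query pointing along the moth direction should match a moth sample
```

Real CLIP embeddings are 512-dimensional or larger, and production indexes use approximate nearest-neighbor structures rather than this linear scan, but the ranking principle is the same.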
🔇 Additional comments (11)
docs/source/tutorials/cosmos-transfer-integration.ipynb (11)
1-80: Excellent introduction and setup documentation. The tutorial provides clear context, motivation, and comprehensive setup instructions including system requirements, installation steps, and dependencies. This gives users a solid foundation before diving into the integration workflow.

93-136: Clean dataset loading implementation. The code properly loads the dataset from Hugging Face Hub and extracts the directory path for subsequent operations. Using `max_samples=2` is appropriate for demonstration purposes.

149-164: LGTM: Grouped dataset initialization. The grouped dataset setup follows FiftyOne best practices with proper cleanup and group field configuration.

200-263: Robust image-to-video conversion pipeline. The FFmpeg integration is well-implemented with proper error handling, individual file processing, and even-dimension padding for video codec compatibility. The subprocess execution is safe since all paths are internally constructed. Note: the unused `dirnames` variable in the `os.walk()` unpacking (line 221) is a common pattern and can be safely ignored or prefixed with `_` if desired.

281-344: Well-structured grouped dataset population. The code cleanly maps images to their corresponding videos and creates grouped samples with proper field copying. The `guess_video_path` helper function provides good encapsulation.

581-613: Clean pipeline execution with good progress tracking. The main execution loop properly orchestrates the utilities, handles errors gracefully, and provides clear progress feedback.

698-743: Solid FiftyOne integration with comprehensive error tracking. The group matching and slice addition logic is well-implemented with proper error handling for missing files and unmatched groups. The dataset reload and app launch complete the integration cleanly.

757-775: LGTM: Slice identifier field addition. The code properly adds and populates the `slice_name` field for filtering and visualization purposes.

846-862: LGTM: Embeddings visualization setup. The UMAP visualization configuration correctly references the computed embeddings and will enable interactive exploration in the FiftyOne App.

865-872: LGTM: App launch. Simple and correct FiftyOne App launch for the flattened view.

883-921: Comprehensive tutorial summary. The summary effectively recaps the key learning points and provides a clear call-to-action for users interested in the commercial offering.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1)
409-409: Critical: Placeholder path will break execution.The hardcoded placeholder path will cause the pipeline to fail. This was flagged in previous reviews but still shows the placeholder value.
Users must update this before running the notebook. Add validation immediately after:
os.environ["COSMOS_DIR"] = "/path/to/cosmos-transfer2.5/folder" + +# Validate COSMOS_DIR exists +_cosmos_path = Path(os.environ["COSMOS_DIR"]).expanduser() +if not _cosmos_path.exists() or str(_cosmos_path) == "/path/to/cosmos-transfer2.5/folder": + raise ValueError( + f"COSMOS_DIR must be set to your actual Cosmos-Transfer repository location. " + f"Current value: {os.environ['COSMOS_DIR']}" + )
🧹 Nitpick comments (9)
docs/source/tutorials/cosmos-transfer-integration.ipynb (9)
103-107: Consider tutorial-friendly defaults for dataset loading.

The `max_samples=2` setting is very limiting, and `overwrite=True` forces a re-download every time the cell runs. For a tutorial, consider increasing `max_samples` to ~10-20 for meaningful results and setting `overwrite=False` by default to avoid unnecessary re-downloads:

```diff
 dataset_src = fouh.load_from_hub(
     "pjramg/moth_biotrove",
     persistent=True,
-    overwrite=True,  # set to False if you don't want to re-download
-    max_samples=2,
+    overwrite=False,  # set to True to force re-download
+    max_samples=10,  # adjust based on your needs
 )
```
150-151: Add confirmation for destructive dataset deletion.

Unconditionally deleting the dataset without warning could result in accidental data loss. Consider adding a prompt or moving this to a setup cell with clear warnings.

```diff
 dataset_grp_name = "moth_biotrove_grouped"
 if fo.dataset_exists(dataset_grp_name):
-    fo.delete_dataset(dataset_grp_name)
+    print(f"Warning: Deleting existing dataset '{dataset_grp_name}'")
+    # Uncomment the next line to proceed with deletion:
+    # fo.delete_dataset(dataset_grp_name)
```
217-221: Unused loop variable `dirnames`.

The `dirnames` variable from `os.walk()` is not used within the loop body.

```diff
-for dirpath, dirnames, filenames in os.walk(images_root):
+for dirpath, _dirnames, filenames in os.walk(images_root):
     jpgs = sorted([f for f in filenames if f.lower().endswith(".jpg")])
```
511-511: Poor style: multiple statements on one line.

Using semicolons to place multiple statements on one line reduces readability.

```diff
-    cap.release(); out.release()
+    cap.release()
+    out.release()
```
524-528: Simplify error handling in subprocess call.

The try-except block can be streamlined; the return statement in the except clause is unnecessarily placed after the print.

```python
try:
    subprocess.run(cmd, check=True)
    return True
except subprocess.CalledProcessError as e:
    print(f"[ERROR] inference failed ({spec_json.name}): {e}")
    return False
```
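The suggested try/except shape can be packaged as a small wrapper and exercised with harmless stand-in commands. The name `run_inference` mirrors the notebook's helper, but the commands below use the current Python interpreter rather than the real Cosmos-Transfer CLI:

```python
import subprocess
import sys
from pathlib import Path


def run_inference(cmd: list[str], spec_json: Path) -> bool:
    """Run an inference command, returning True on success and
    False (after logging) on a non-zero exit code."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"[ERROR] inference failed ({spec_json.name}): {e}")
        return False
    return True


# Stand-in commands: one that succeeds, one that exits non-zero
ok = run_inference([sys.executable, "-c", "pass"], Path("spec_001.json"))
bad = run_inference([sys.executable, "-c", "raise SystemExit(1)"], Path("spec_002.json"))
```

Because `check=True` raises `CalledProcessError` on a non-zero exit, the success path needs no explicit status inspection.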
489-490: Remove unnecessary f-string prefix.

The f-string has no placeholders; use a regular string instead.

```diff
 spec_path.parent.mkdir(parents=True, exist_ok=True)
-spec_path.write_text(json.dumps(obj, indent=2))
+with spec_path.open("w") as f:
+    json.dump(obj, f, indent=2)
```

Note: Also consider using `json.dump()` directly instead of `dumps()` + `write_text()` for better performance.
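A self-contained sketch of the `json.dump()` approach follows. The spec fields shown are hypothetical placeholders for illustration, not the real Cosmos-Transfer spec schema:

```python
import json
from pathlib import Path


def write_spec(spec_path: Path, obj: dict) -> None:
    """Write an inference spec as pretty-printed JSON, streaming
    straight to the file instead of building an intermediate string."""
    spec_path.parent.mkdir(parents=True, exist_ok=True)
    with spec_path.open("w") as f:
        json.dump(obj, f, indent=2)


# Hypothetical spec shape; the real field names come from the notebook
spec = {
    "input_video": "videos/seq_001.mp4",
    "control_map": "edges/seq_001_edge.mp4",
    "prompt": "a moth on a leaf",
}
```

`json.dump()` writes directly to the file handle, avoiding the extra in-memory string that `write_text(json.dumps(...))` creates.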
671-672: Duplicate function definition.

The `base_key()` function is already defined in the main pipeline cell (line 566). Consider removing this redefinition or using the one from the pipeline.

```diff
-def base_key(path: Path) -> str:
-    return path.stem
+# base_key() is already defined in the pipeline cell above
```
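To see why a single shared `base_key` definition suffices, here is a tiny sketch of stem-based matching. `index_by_key` is an illustrative helper, not code from the notebook:

```python
from pathlib import Path


def base_key(path: Path) -> str:
    """Group-matching key: the filename without its extension, so
    images/seq_001.jpg and out/seq_001.mp4 share the key 'seq_001'."""
    return path.stem


def index_by_key(paths):
    """Map key -> path so pipeline outputs can be matched back to
    their originating FiftyOne groups."""
    return {base_key(p): p for p in paths}


outputs = [Path("out/seq_001.mp4"), Path("out/seq_002.mp4")]
by_key = index_by_key(outputs)
```

Since the key depends only on the stem, one definition serves both the pipeline cell and the later FiftyOne-integration cell; a second copy just risks the two drifting apart.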
692-695: Inconsistent directory naming with configuration.

New directories `edge_last_dir` and `output_last_dir` are created here, but the configuration cell defines `LAST_FRAMES_DIR = base_dir / "last_frame"` (line 416). This creates confusion about where files are actually stored.

Consider using consistent naming or referencing the configured directory:

```diff
-# Folder where last-frame PNGs will go
-edge_last_dir = OUT_DIR / "edge_last_frames"
-output_last_dir = OUT_DIR / "output_last_frames"
-edge_last_dir.mkdir(parents=True, exist_ok=True)
-output_last_dir.mkdir(parents=True, exist_ok=True)
+# Reuse configured directory or create subdirectories under it
+edge_last_dir = LAST_FRAMES_DIR / "edge"
+output_last_dir = LAST_FRAMES_DIR / "output"
+edge_last_dir.mkdir(parents=True, exist_ok=True)
+output_last_dir.mkdir(parents=True, exist_ok=True)
```
786-794: Inefficient: saving samples individually in a loop.

Calling `sample.save()` for each sample in a loop is slow for large datasets. Use FiftyOne's batch operations instead.

```diff
 # Iterate through each slice and set the slice_name field
 for slice_name in ["image", "output_last"]:
     # Get samples from this slice
     slice_view = dataset_grp.select_group_slices(slice_name)
 
-    # Set the slice_name field for all samples in this slice
-    for sample in slice_view:
-        sample["slice_name"] = slice_name
-        sample.save()
+    # Use batch update (more efficient)
+    slice_view.set_values("slice_name", [slice_name] * len(slice_view))
```
📒 Files selected for processing (2)
docs/source/tutorials/cosmos-transfer-integration.ipynb (1 hunks)
docs/source/tutorials/index.rst (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/source/tutorials/index.rst
🧰 Additional context used
🪛 Ruff (0.14.5)
docs/source/tutorials/cosmos-transfer-integration.ipynb
26-26: Redefinition of unused fo from line 2
Remove definition: fo
(F811)
39-39: Found useless expression. Either assign it to a variable or remove it.
(B018)
58-58: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
92-92: subprocess call: check for execution of untrusted input
(S603)
298-298: Multiple statements on one line (semicolon)
(E702)
312-312: subprocess call: check for execution of untrusted input
(S603)
313-313: Consider moving this statement to an else block
(TRY300)
410-413: Avoid specifying long messages outside the exception class
(TRY003)
488-488: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (4)
docs/source/tutorials/cosmos-transfer-integration.ipynb (4)
275-275: Type hint syntax requires Python 3.10+. The union type syntax `str | None` requires Python 3.10 or later. Since the notebook metadata shows `"version": "3.10.12"`, this is fine, but consider documenting this requirement in the setup section for users.
821-841: LGTM! The embeddings computation and similarity index creation are properly configured. The field names are consistent between `compute_embeddings()` and `compute_similarity()`.
871-880: LGTM! The visualization computation is correctly configured and uses the embeddings field from the previous cell.

890-890: LGTM! Standard FiftyOne app launch pattern.
What changes are proposed in this pull request?
This PR adds a new tutorial notebook: cosmos-transfer-integration.ipynb
The tutorial demonstrates how to integrate NVIDIA Cosmos-Transfer 2.5 with FiftyOne. It provides a complete, reproducible pipeline for generating and visualizing multimodal data (RGB, edge, segmentation, and synthetic video outputs).
Key additions include:
How is this patch tested? If it is not, please explain why.
The notebook was executed end-to-end using a subset of the BioTrove dataset with Cosmos-Transfer 2.5.
Verified dataset grouping, edge map generation, JSON spec creation, inference execution, and FiftyOne visualization.
The pipeline was exercised through both the bash scripts and the Python code they invoke.
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
What areas of FiftyOne does this PR affect?
fiftyone Python library changes

Summary by CodeRabbit