
Conversation


@JessieeeNotLi JessieeeNotLi commented Nov 6, 2025

Summary by CodeRabbit

  • New Features

    • Added NVENC GPU video benchmark and GPU video-filtering workflow; new GPU-enabled inference and GPU compute benchmarks (image-classification, BERT, recommendation, JAX/linear-algebra suites).
  • Documentation

    • Added benchmark run/readme and usage examples for GPU benchmarking.
  • Chores

    • Added benchmark configs, dependency manifests, init/package helpers, experiment data; moved CI linting to GitHub Actions; added pre-commit and editor settings.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Nov 6, 2025

Walkthrough

Adds a GPU-focused multimedia benchmark suite: NVENC/FFmpeg benchmarking tools, a Bash launcher, a Python GPU benchmark orchestrator, multiple new GPU-enabled benchmark modules (video, inference, linear algebra, JAX workloads), CI/lint config updates, and related configs/input generators and requirements.

Changes

  • Watermarking / NVENC bench
    Files: watermarking_bench/.gitkeep, watermarking_bench/gpu_bench.py, watermarking_bench/run_nvenc_bench.sh, watermarking_bench/read.me, watermarking_bench/watermarking_readme.md
    New NVENC benchmarking tooling: Python GPU benchmark orchestrator (FFmpeg detection, filter-graph planning, GPU/CPU paths, progress parsing, metrics), a Bash launcher that prepares/runs the bench and creates a sample video, plus a placeholder and docs.

  • GPU video-processing benchmark
    Files: benchmarks/200.multimedia/220.video-processing-gpu/config.json, benchmarks/200.multimedia/220.video-processing-gpu/init.sh, benchmarks/200.multimedia/220.video-processing-gpu/input.py, benchmarks/200.multimedia/220.video-processing-gpu/python/function.py, benchmarks/200.multimedia/220.video-processing-gpu/python/requirements.txt
    New benchmark: config, init, input generator, a FaaS handler performing download → CPU decode → GPU conv+ReLU processing → re-encode/upload, and Torch/OpenCV/NumPy deps.

  • Existing video processing updates
    Files: benchmarks/200.multimedia/220.video-processing/input.py, benchmarks/200.multimedia/220.video-processing/python/function.py
    Formatting, style, and minor API surface adjustments (client instantiation, operations mapping); explicit FFmpeg argument list and quoting normalization.

  • Many new GPU/JAX/linear-algebra benchmarks
    Files: benchmarks/... (multiple new dirs under benchmarks/500.*, benchmarks/600.*, benchmarks/400.inference/*.*/python/*, plus requirements/package scripts and imagenet_class_index.json)
    Adds many new benchmark suites (JAX channel flow, compute, deep-learning ResNet, ONNX/Tiny BERT, image-classification, recommendation, numerous linear-algebra CUDA workloads), with handlers, helper functions, input generators, configs, and requirements.

  • CI, tooling, and metadata
    Files: .github/workflows/lint.yml, .pre-commit-config.yaml, .mypy.ini, .vscode/settings.json, .gitmodules, .circleci/config.yml (removed)
    Replaces the CircleCI config with a GitHub Actions lint workflow; adds a pre-commit config, mypy section expansion, VSCode settings, and updates the submodule URL.

  • Small edits & data snapshots
    Files: experiments.json, multiple input.py formatting updates across microbenchmarks and webapps, plus watermarking_bench/.gitkeep
    Adds an experiments snapshot and many stylistic/formatting changes across input modules (quote/style normalization, wrapped signatures), plus a placeholder file.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Shell as run_nvenc_bench.sh
    participant Bench as gpu_bench.py
    participant FFmpeg
    participant GPU as NVIDIA
    participant FS as FileSystem

    User->>Shell: run benchmark
    Shell->>GPU: query (nvidia-smi)
    Shell->>FFmpeg: probe encoders/filters
    Shell->>FS: ensure sample video exists
    Shell->>Bench: invoke trials

    loop Trials
      Bench->>Bench: plan filter graph (gpu vs cpu path)
      Bench->>FFmpeg: execute transcode command
      FFmpeg->>GPU: use hw accel (if planned)
      FFmpeg-->>Bench: emit logs & progress
      Bench->>Bench: parse_progress & collect metrics
    end

    Bench-->>User: JSON/CSV summary
sequenceDiagram
    participant Invoker as FaaS Handler
    participant Storage
    participant CPU
    participant GPU
    participant Encoder

    Invoker->>Storage: download input
    Invoker->>CPU: decode frames
    CPU->>GPU: convert frames -> tensor batch
    GPU-->>GPU: run conv2d + ReLU loops
    GPU->>Invoker: processed tensors -> frames
    Invoker->>Encoder: re-encode video
    Invoker->>Storage: upload result
    Invoker-->>Invoker: return measurements
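
For orientation, a minimal sketch of the "conv2d + ReLU loops" step from the handler diagram above; the tensor shape, channel count, kernel size, and iteration count below are illustrative assumptions, not values taken from the benchmark:

import torch
import torch.nn.functional as F

# Hypothetical GPU workload: repeated conv2d + ReLU over a batch of frames.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
frames = torch.rand(8, 3, 360, 640, device=device)   # NCHW frame batch (assumed shape)
weight = torch.rand(3, 3, 3, 3, device=device)        # 3-in/3-out 3x3 kernel (assumed)
for _ in range(10):                                    # intensity knob, akin to num_iters
    frames = F.relu(F.conv2d(frames, weight, padding=1))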

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Heterogeneous, large-surface changes: many new benchmark modules with GPU/JAX/ONNX/Torch code needing correctness and safety review.
  • High-risk files to inspect:
    • watermarking_bench/gpu_bench.py — FFmpeg command assembly, filter-graph logic, encoder/filter detection, error handling, and parsing.
    • watermarking_bench/run_nvenc_bench.sh — environment checks, sample generation, and invocation correctness across systems.
    • benchmarks/200.multimedia/220.video-processing-gpu/python/function.py — video decode/encode and GPU tensor pipeline, VideoWriter config and resource management.
    • ONNX/torch initialization and CUDA provider selection in benchmarks/400.inference/*/python/function.py modules.
    • Multiple new requirements files and packaging scripts — ensure CI/build compatibility and size/packaging logic.
  • Also verify license/third-party tool expectations for bundling ffmpeg, NVIDIA tooling, and large model artifacts.

Possibly related PRs

Suggested reviewers

  • mcopik

Poem

🐰 I hopped into a bench so bright,

GPUs humming through the night.
Frames and tensors in a whirl,
Metrics sparkle, logs unfurl.
From warren to cloud — benchmarks take flight!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 5.71%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Jessie/video watermarking' is vague and uses a personal identifier prefix; while it references 'video watermarking', it is not descriptive of the actual changes. Resolution: use a more descriptive title like 'Add video watermarking GPU benchmark' that clearly conveys the primary change without personal identifiers.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@JessieeeNotLi
Author

added video watermarking benchmark


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
watermarking_bench/.gitkeep (1)

1-1: .gitkeep may be redundant if the directory contains other files.

Since the PR already adds multiple files to watermarking_bench/ (gpu_bench.py, documentation, etc.), the .gitkeep placeholder is unnecessary—the directory will be preserved by these files alone. You can safely remove this.

watermarking_bench/gpu_bench.py (2)

32-40: Consider splitting multiple statements per line.

Lines 37-39 use semicolons to place multiple statements on single lines, which slightly reduces readability.

Apply this diff for improved clarity:

-        m = re.search(r"fps=\s*([0-9]+(?:\.[0-9]+)?)", last);  fps = float(m.group(1)) if m else None
-        m = re.search(r"speed=\s*([0-9]+(?:\.[0-9]+)?)x", last); speed = float(m.group(1)) if m else None
-        m = re.search(r"frame=\s*([0-9]+)", last);              frames = int(m.group(1)) if m else None
+        m = re.search(r"fps=\s*([0-9]+(?:\.[0-9]+)?)", last)
+        fps = float(m.group(1)) if m else None
+        m = re.search(r"speed=\s*([0-9]+(?:\.[0-9]+)?)x", last)
+        speed = float(m.group(1)) if m else None
+        m = re.search(r"frame=\s*([0-9]+)", last)
+        frames = int(m.group(1)) if m else None

245-293: Consider splitting the cleanup statements for clarity.

Lines 275-276 compress the temporary file cleanup into a single line. While functional, separating these statements would improve readability.

Apply this diff:

             idx += 1
-            try: os.remove(outp)
-            except OSError: pass
+            try:
+                os.remove(outp)
+            except OSError:
+                pass
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b54b46c and aa3483f.

⛔ Files ignored due to path filters (4)
  • watermarking_bench/results.csv is excluded by !**/*.csv
  • watermarking_bench/results_long.csv is excluded by !**/*.csv
  • watermarking_bench/sample.mp4 is excluded by !**/*.mp4
  • watermarking_bench/watermark.png is excluded by !**/*.png
📒 Files selected for processing (5)
  • watermarking_bench/.gitkeep (1 hunks)
  • watermarking_bench/gpu_bench.py (1 hunks)
  • watermarking_bench/read.me (1 hunks)
  • watermarking_bench/run_nvenc_bench.sh (1 hunks)
  • watermarking_bench/watermarking_readme.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
watermarking_bench/gpu_bench.py (1)
benchmarks/200.multimedia/220.video-processing/python/function.py (1)
  • watermark (38-47)
🪛 Ruff (0.14.3)
watermarking_bench/gpu_bench.py

1-1: Shebang is present but file is not executable

(EXE001)


14-14: subprocess call: check for execution of untrusted input

(S603)


29-29: Do not catch blind exception: Exception

(BLE001)


37-37: Multiple statements on one line (semicolon)

(E702)


38-38: Multiple statements on one line (semicolon)

(E702)


39-39: Multiple statements on one line (semicolon)

(E702)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


275-275: Multiple statements on one line (colon)

(E701)


276-276: Multiple statements on one line (colon)

(E701)

🔇 Additional comments (10)
watermarking_bench/watermarking_readme.md (1)

1-9: LGTM! Clear and helpful documentation.

The instructions are concise and cover the common use cases effectively.

watermarking_bench/read.me (1)

1-3: LGTM! Concise usage instructions.

The commands are clear and cover the essential use cases.

watermarking_bench/gpu_bench.py (5)

7-22: LGTM! Well-designed helper functions.

The utility functions are clean and appropriate for the task. The subprocess call is safe since command arguments are controlled by the script logic, not untrusted user input.


24-30: LGTM! Appropriate error handling for optional GPU info.

The broad exception catch is acceptable here since nvidia-smi availability is optional, and the function gracefully degrades by returning None values.


44-130: LGTM! Comprehensive filter graph construction.

The function effectively handles multiple scenarios with proper GPU/CPU bridging. The explicit use of hwdownload and hwupload_cuda for format conversion is good practice and aligns with FFmpeg best practices for CUDA filters.
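
As a concrete illustration of that bridging, a hedged sketch of the kind of CPU-bridge graph build_vf_or_complex() emits (the scale and overlay values here are placeholder examples): decoded CUDA surfaces are downloaded to system memory, filtered on the CPU, and uploaded again before NVENC.

# Placeholder values; the real graph is assembled dynamically by build_vf_or_complex().
scale = "1920:1080"
overlay = "main_w/2-overlay_w/2:main_h/2-overlay_h/2"
cpu_bridge_graph = (
    f"[0:v]hwdownload,format=nv12,scale={scale}[v0cpu];"  # GPU surface -> system memory
    f"[v0cpu][1:v]overlay={overlay}[mix];"                # CPU overlay of the watermark
    f"[mix]hwupload_cuda[vout]"                           # back to CUDA memory for NVENC
)
print(cpu_bridge_graph)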


134-198: LGTM! Well-structured transcoding function.

The function properly orchestrates the FFmpeg command construction, execution, and metric collection. The detailed error message on line 149 (flagged by static analysis) is actually helpful for debugging, providing both the failure output and the command arguments.


200-244: LGTM! Comprehensive benchmark orchestration.

The main function effectively handles argument parsing, warmup execution, and trial iteration. The default trials include av1_nvenc, which will fail on systems without AV1 support, but this is caught by the encoder availability check in transcode_once(), so the error handling is appropriate.

watermarking_bench/run_nvenc_bench.sh (3)

1-26: LGTM! Robust script setup and validation.

The script properly validates prerequisites and provides helpful diagnostic output. The use of set -euo pipefail ensures failures are caught early.


269-278: LGTM! Helpful sample generation for testing.

The automatic generation of a 4K test video with audio is convenient for users who want to run the benchmark without providing their own input. The encoding parameters are reasonable for a test sample.


280-300: LGTM! Clear benchmark execution and results preview.

The benchmark execution is well-structured with timestamped CSV output and a helpful preview of results. The trial selection excludes av1_nvenc, which improves compatibility with systems that lack AV1 encoder support.

@@ -0,0 +1,293 @@
#!/usr/bin/env python3

⚠️ Potential issue | 🟡 Minor

Set executable permission on the file.

The shebang indicates this script should be directly executable, but the file permissions aren't set. Ensure chmod +x is applied to match the usage shown in the documentation.

Apply this fix:

chmod +x watermarking_bench/gpu_bench.py
🧰 Tools
🪛 Ruff (0.14.3)

1-1: Shebang is present but file is not executable

(EXE001)

🤖 Prompt for AI Agents
In watermarking_bench/gpu_bench.py at line 1 the file contains a shebang but
lacks executable permission; make the script executable by setting the execute
bit (e.g., run chmod +x watermarking_bench/gpu_bench.py) so it can be run
directly as indicated in the docs and CI.

Comment on lines +28 to +264
# --- Make or update a working gpu_bench.py (GPU-first, safe bridges, skip missing encoders) ---
cat > gpu_bench.py <<'PY'
#!/usr/bin/env python3
import argparse, datetime, json, os, re, shutil, subprocess, sys, tempfile, csv
from typing import List, Dict, Any, Optional, Tuple

def which_ffmpeg() -> str:
    p = shutil.which("ffmpeg")
    if not p:
        sys.exit("ffmpeg not found on PATH. Use Docker image with NVENC or install FFmpeg with NVENC.")
    return p

def run(cmd: List[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

def has_encoder(ffmpeg: str, enc: str) -> bool:
    out = run([ffmpeg, "-hide_banner", "-encoders"]).stdout
    return re.search(rf"\b{re.escape(enc)}\b", out) is not None

def has_filter(ffmpeg: str, name: str) -> bool:
    out = run([ffmpeg, "-hide_banner", "-filters"]).stdout
    return (f" {name} " in out)

def gpu_info() -> Dict[str, Any]:
    try:
        out = run(["nvidia-smi", "--query-gpu=name,memory.total,driver_version", "--format=csv,noheader,nounits"]).stdout.strip()
        name, mem, drv = [x.strip() for x in out.splitlines()[0].split(",")]
        return {"name": name, "memory_total_mb": int(mem), "driver_version": drv}
    except Exception:
        return {"name": None, "memory_total_mb": None, "driver_version": None}

def parse_progress(log: str) -> Dict[str, Any]:
    lines = [ln for ln in log.splitlines() if ("fps=" in ln or "speed=" in ln or "frame=" in ln)]
    fps = speed = frames = None
    if lines:
        last = lines[-1]
        m = re.search(r"fps=\s*([0-9]+(?:\.[0-9]+)?)", last); fps = float(m.group(1)) if m else None
        m = re.search(r"speed=\s*([0-9]+(?:\.[0-9]+)?)x", last); speed = float(m.group(1)) if m else None
        m = re.search(r"frame=\s*([0-9]+)", last); frames = int(m.group(1)) if m else None
    return {"fps": fps, "speed_x": speed, "frames": frames}

def build_vf_or_complex(ffmpeg: str, scale: Optional[str], wm_path: Optional[str], overlay: str, want_gpu_decode: bool) -> Tuple[List[str], str]:
    used = []
    vf_args: List[str] = []
    complex_graph = ""
    have_scale_npp = has_filter(ffmpeg, "scale_npp")
    have_scale_cuda = has_filter(ffmpeg, "scale_cuda")
    have_overlay_cuda = has_filter(ffmpeg, "overlay_cuda")
    if not wm_path:
        if scale:
            if want_gpu_decode and have_scale_npp:
                vf_args = ["-vf", f"scale_npp={scale}"]; used.append("scale_npp")
            elif want_gpu_decode and have_scale_cuda:
                vf_args = ["-vf", f"scale_cuda={scale}"]; used.append("scale_cuda")
            else:
                vf_args = ["-vf", f"hwdownload,format=nv12,scale={scale},hwupload_cuda"]
                used.append("scale(cpu)+hwdownload+hwupload_cuda")
        return (vf_args, "+".join(used))
    # watermark path
    if want_gpu_decode and have_overlay_cuda:
        if scale and have_scale_npp:
            complex_graph = f"[0:v]scale_npp={scale}[v0];[v0][1:v]overlay_cuda={overlay}[vout]"
            used += ["scale_npp","overlay_cuda"]
        elif scale and have_scale_cuda:
            complex_graph = f"[0:v]scale_cuda={scale}[v0];[v0][1:v]overlay_cuda={overlay}[vout]"
            used += ["scale_cuda","overlay_cuda"]
        elif scale:
            complex_graph = f"[0:v]hwdownload,format=nv12,scale={scale},hwupload_cuda[v0];[v0][1:v]overlay_cuda={overlay}[vout]"
            used += ["scale(cpu)+hwdownload+hwupload_cuda","overlay_cuda"]
        else:
            complex_graph = f"[0:v][1:v]overlay_cuda={overlay}[vout]"
            used += ["overlay_cuda"]
        return (["-filter_complex", complex_graph, "-map", "[vout]"], "+".join(used))
    # CPU overlay fallback (explicit bridges)
    if scale and want_gpu_decode and (have_scale_npp or have_scale_cuda):
        scaler = "scale_npp" if have_scale_npp else "scale_cuda"
        complex_graph = (
            f"[0:v]{scaler}={scale}[v0gpu];"
            f"[v0gpu]hwdownload,format=nv12[v0cpu];"
            f"[v0cpu][1:v]overlay={overlay}[mix];"
            f"[mix]hwupload_cuda[vout]"
        )
        used += [scaler, "hwdownload+overlay(cpu)+hwupload_cuda"]
    elif scale:
        complex_graph = (
            f"[0:v]hwdownload,format=nv12,scale={scale}[v0cpu];"
            f"[v0cpu][1:v]overlay={overlay}[mix];"
            f"[mix]hwupload_cuda[vout]"
        )
        used += ["scale(cpu)+overlay(cpu)+hwupload_cuda"]
    else:
        complex_graph = (
            f"[0:v]hwdownload,format=nv12[v0cpu];"
            f"[v0cpu][1:v]overlay={overlay}[mix];"
            f"[mix]hwupload_cuda[vout]"
        )
        used += ["overlay(cpu)+hwupload_cuda"]
    return (["-filter_complex", complex_graph, "-map", "[vout]"], "+".join(used))

def transcode_once(ffmpeg: str, inp: str, outp: str, codec: str, bitrate: str, preset: str,
                   duration: Optional[float], scale: Optional[str], wm_path: Optional[str],
                   overlay_pos: str, decode_mode: str = "gpu") -> Dict[str, Any]:
    if not has_encoder(ffmpeg, codec):
        raise RuntimeError(f"encoder '{codec}' not available; check your ffmpeg build (NVENC/AV1).")
    want_gpu_decode = (decode_mode == "gpu")
    args = [ffmpeg, "-hide_banner", "-y", "-vsync", "0"]
    if want_gpu_decode:
        args += ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda", "-extra_hw_frames", "16",
                 "-init_hw_device", "cuda=cuda", "-filter_hw_device", "cuda"]
    args += ["-i", inp]
    if wm_path:
        args += ["-loop", "1", "-i", wm_path]
    if duration:
        args += ["-t", str(duration)]
    filt_args, filter_used = build_vf_or_complex(ffmpeg, scale, wm_path, overlay_pos, want_gpu_decode)
    args += filt_args
    args += ["-c:v", codec, "-b:v", bitrate, "-preset", preset, "-rc", "vbr", "-movflags", "+faststart"]
    args += ["-c:a", "copy"]
    args += [outp]
    t0 = datetime.datetime.now()
    proc = run(args)
    t1 = datetime.datetime.now()
    if proc.returncode != 0:
        raise RuntimeError("ffmpeg failed:\n" + proc.stdout + f"\n\nARGS:\n{' '.join(args)}")
    parsed = parse_progress(proc.stdout)
    size = os.path.getsize(outp) if os.path.exists(outp) else 0
    return {
        "args": args,
        "filter_used": filter_used,
        "stdout_tail": "\n".join(proc.stdout.splitlines()[-15:]),
        "compute_time_us": (t1 - t0) / datetime.timedelta(microseconds=1),
        "fps": parsed["fps"],
        "speed_x": parsed["speed_x"],
        "frames": parsed["frames"],
        "output_size_bytes": size
    }

def main():
    ap = argparse.ArgumentParser(description="GPU NVENC benchmark.")
    ap.add_argument("--input", required=True)
    ap.add_argument("--duration", type=float, default=None)
    ap.add_argument("--repeat", type=int, default=1)
    ap.add_argument("--warmup", action="store_true")
    ap.add_argument("--csv", default=None)
    ap.add_argument("--watermark", default=None)
    ap.add_argument("--overlay", default="main_w/2-overlay_w/2:main_h/2-overlay_h/2")
    ap.add_argument("--decode", choices=["gpu","cpu"], default="gpu")
    ap.add_argument("--trials", nargs="+", default=[
        "codec=h264_nvenc,bitrate=5M,preset=p5",
        "codec=h264_nvenc,bitrate=12M,preset=p1,scale=1920:1080",
        "codec=hevc_nvenc,bitrate=6M,preset=p4",
        # "codec=av1_nvenc,bitrate=3M,preset=p5", # only if available
    ])
    args = ap.parse_args()
    ffmpeg = which_ffmpeg()
    gi = gpu_info()
    def parse_trial(s: str) -> Dict[str, str]:
        d: Dict[str, str] = {}
        for kv in s.split(","):
            k, v = kv.split("=", 1); d[k.strip()] = v.strip()
        return d
    trial_specs = [parse_trial(s) for s in args.trials]
    # Warmup with first available encoder
    if args.warmup:
        warm = next((t for t in trial_specs if has_encoder(ffmpeg, t.get("codec","h264_nvenc"))), None)
        if warm:
            with tempfile.NamedTemporaryFile(suffix=".mp4", delete=True) as tmp:
                _ = transcode_once(ffmpeg, args.input, tmp.name,
                                   warm.get("codec","h264_nvenc"),
                                   warm.get("bitrate","5M"),
                                   warm.get("preset","p5"),
                                   args.duration,
                                   warm.get("scale"),
                                   args.watermark,
                                   args.overlay,
                                   args.decode)
    results = []; idx = 0
    for spec in trial_specs:
        for _ in range(args.repeat):
            if not has_encoder(ffmpeg, spec.get("codec","h264_nvenc")):
                results.append({
                    "trial_index": idx, "codec": spec.get("codec"), "bitrate": spec.get("bitrate"),
                    "preset": spec.get("preset"), "scale_filter": "", "fps": None, "speed_x": None,
                    "frames": None, "compute_time_us": 0, "output_size_bytes": 0,
                    "stdout_tail": "SKIPPED: encoder not available", "argv": "", "status": "skipped"
                }); idx += 1; continue
            with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
                outp = tmp.name
            res = transcode_once(ffmpeg, args.input, outp,
                                 spec.get("codec","h264_nvenc"),
                                 spec.get("bitrate","5M"),
                                 spec.get("preset","p5"),
                                 args.duration,
                                 spec.get("scale"),
                                 args.watermark,
                                 args.overlay,
                                 args.decode)
            results.append({
                "trial_index": idx, "codec": spec.get("codec"), "bitrate": spec.get("bitrate"),
                "preset": spec.get("preset"), "scale_filter": res["filter_used"], "fps": res["fps"],
                "speed_x": res["speed_x"], "frames": res["frames"],
                "compute_time_us": res["compute_time_us"], "output_size_bytes": res["output_size_bytes"],
                "stdout_tail": res["stdout_tail"], "argv": " ".join(res["args"]), "status": "ok"
            })
            idx += 1
            try: os.remove(outp)
            except OSError: pass
    report = {"gpu": gi, "ffmpeg_path": ffmpeg, "trial_count": len(results), "results": results}
    print(json.dumps(report, indent=2))
    if args.csv and results:
        with open(args.csv, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=list(results[0].keys()))
            w.writeheader(); w.writerows(results)

if __name__ == "__main__":
    main()
PY

🛠️ Refactor suggestion | 🟠 Major

Eliminate code duplication between embedded and standalone scripts.

The Python benchmarking code is duplicated here (237 lines embedded) and in the standalone gpu_bench.py file. This creates maintenance burden and increases the risk of the two versions diverging—which has already begun (the embedded version has av1_nvenc commented out on line 192 and includes different skipping logic on lines 225-231).

Consider one of these approaches:

Option 1 (recommended): Ship gpu_bench.py as a standalone script and invoke it:

-cat > gpu_bench.py <<'PY'
-#!/usr/bin/env python3
-[... entire script ...]
-PY
-chmod +x gpu_bench.py
+# Ensure gpu_bench.py is in the same directory
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+GPU_BENCH="${SCRIPT_DIR}/gpu_bench.py"
+
+if [[ ! -f "$GPU_BENCH" ]]; then
+  echo "Error: gpu_bench.py not found in ${SCRIPT_DIR}"
+  exit 1
+fi
+chmod +x "$GPU_BENCH"

Then update the execution line:

-./gpu_bench.py \
+"$GPU_BENCH" \

Option 2: If embedding is required for distribution, generate it from the canonical source during build/release rather than maintaining two copies manually.

Committable suggestion skipped: line range outside the PR's diff.

McLavish and others added 28 commits November 13, 2025 10:30
Removed configuration details for MinIO and ScyllaDB.
Removed sensitive SSH private key from eval command.
Merge benchmarks by Russell and Yuxuan into Development

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
benchmarks/200.multimedia/220.video-processing/input.py (1)

19-43: LGTM! Well-implemented with good error handling.

The function correctly:

  • Tracks the last uploaded file with clear variable naming
  • Handles the case when no .mp4 files are found
  • Provides comprehensive documentation through comments
  • Configures the GPU filter operation with appropriate parameters

The unused size and nosql_func parameters are likely required by the benchmark framework interface. Based on learnings, input validation is not required since benchmarks are invoked by maintainers.

Note: This implementation is nearly identical to benchmarks/200.multimedia/220.video-processing-gpu/input.py (which has critical bugs). Consider whether both files are necessary or if they should share common logic to avoid drift.

benchmarks/200.multimedia/220.video-processing-gpu/python/function.py (4)

3-12: Tighten imports and ensure external GPU deps are provisioned

  • subprocess is no longer used anywhere; keeping it is noise and may confuse future readers.
  • numpy, cv2, and torch are hard runtime dependencies of this function. Make sure they are either in this benchmark’s requirements / Docker image or clearly documented, otherwise the handler will fail at import time in some environments.

Consider:

-import subprocess  # not strictly needed anymore, but you can remove it
-import stat        # can also be removed if you drop ffmpeg entirely
+import stat        # can also be removed if you drop ffmpeg entirely

…and verify that the environment used by this benchmark explicitly includes numpy, opencv-python, and torch (or their equivalent packages).


42-49: Rename unused loop index for clarity / lint compliance

i is not used in the loop body, which Ruff flags (B007) and mildly hurts readability.

-    for i in range(max_frames):
+    for _ in range(max_frames):

This communicates intent and silences the warning without behavior change.


52-57: Fix ambiguous dash in comment and unused shape components

  • The 0–255 comment uses an en dash instead of a hyphen (RUF003).
  • T and C are never used (RUF059); better to avoid binding them.
-    # Stack into (T, H, W, C)
-    video_np = np.stack(frames, axis=0)  # uint8, 0–255
-    T, H, W, C = video_np.shape
+    # Stack into (T, H, W, C)
+    video_np = np.stack(frames, axis=0)  # uint8, 0-255
+    _, H, W, _ = video_np.shape

This keeps the doc value of the comment while aligning with the linter and avoiding unused variables.


154-160: Consider dropping the ffmpeg chmod block if ffmpeg is no longer used

The handler no longer shells out to an embedded ffmpeg binary; this block now just does a stat/chmod in a try/except, usually on a non-existent file.

If you’re confident no path still relies on that embedded binary, simplifying by removing this block will cut a tiny bit of cold-start and per-invocation overhead and reduce cognitive load:

-    ffmpeg_binary = os.path.join(SCRIPT_DIR, "ffmpeg", "ffmpeg")
-    try:
-        st = os.stat(ffmpeg_binary)
-        os.chmod(ffmpeg_binary, st.st_mode | stat.S_IEXEC)
-    except OSError:
-        # Ignore if ffmpeg is not present or filesystem is read-only.
-        pass

If you do still need the binary for other operations, keeping this as-is is fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e53cfde and 703a05d.

⛔ Files ignored due to path filters (1)
  • benchmarks/200.multimedia/220.video-processing-gpu/resources/watermark.png is excluded by !**/*.png
📒 Files selected for processing (7)
  • benchmarks/200.multimedia/220.video-processing-gpu/config.json (1 hunks)
  • benchmarks/200.multimedia/220.video-processing-gpu/init.sh (1 hunks)
  • benchmarks/200.multimedia/220.video-processing-gpu/input.py (1 hunks)
  • benchmarks/200.multimedia/220.video-processing-gpu/python/function.py (1 hunks)
  • benchmarks/200.multimedia/220.video-processing-gpu/python/requirements.txt (1 hunks)
  • benchmarks/200.multimedia/220.video-processing/input.py (1 hunks)
  • experiments.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • benchmarks/200.multimedia/220.video-processing-gpu/config.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-19T16:15:01.811Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.

Applied to files:

  • benchmarks/200.multimedia/220.video-processing-gpu/init.sh
🧬 Code graph analysis (2)
benchmarks/200.multimedia/220.video-processing-gpu/input.py (1)
benchmarks/200.multimedia/220.video-processing/input.py (2)
  • buckets_count (4-6)
  • generate_input (19-43)
benchmarks/200.multimedia/220.video-processing/input.py (1)
benchmarks/200.multimedia/210.thumbnailer/input.py (1)
  • generate_input (15-29)
🪛 Gitleaks (8.29.0)
experiments.json

[high] 266-266: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 273-273: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 Ruff (0.14.5)
benchmarks/200.multimedia/220.video-processing-gpu/input.py

15-15: Unused function argument: size

(ARG001)


15-15: Unused function argument: nosql_func

(ARG001)

benchmarks/200.multimedia/220.video-processing/input.py

19-19: Unused function argument: size

(ARG001)


19-19: Unused function argument: nosql_func

(ARG001)


29-29: Avoid specifying long messages outside the exception class

(TRY003)

benchmarks/200.multimedia/220.video-processing-gpu/python/function.py

33-33: Avoid specifying long messages outside the exception class

(TRY003)


42-42: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


53-53: Avoid specifying long messages outside the exception class

(TRY003)


56-56: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)


57-57: Unpacked variable T is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


57-57: Unpacked variable C is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


88-88: Probable insecure usage of temporary file or directory: "/tmp/processed-"

(S108)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


146-148: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Probable insecure usage of temporary file or directory: "/tmp/"

(S108)

🪛 Shellcheck (0.11.0)
benchmarks/200.multimedia/220.video-processing-gpu/init.sh

[warning] 4-4: DIR appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 5-5: VERBOSE appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 6-6: TARGET_ARCHITECTURE appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (4)
benchmarks/200.multimedia/220.video-processing-gpu/init.sh (1)

1-10: LGTM!

The script correctly implements a no-op initialization that conforms to the SeBS framework interface. The unused variables are intentional as documented in the comment.

benchmarks/200.multimedia/220.video-processing-gpu/input.py (1)

3-4: LGTM!

The buckets_count() function correctly returns one input and one output bucket.

benchmarks/200.multimedia/220.video-processing/input.py (2)

1-6: LGTM!

The imports and buckets_count() function are correctly implemented with clear documentation.


9-18: LGTM!

The docstring accurately describes the function and its parameters.

Comment on lines +6 to +14
'''
    Generate test, small and large workload for thumbnailer.

    :param data_dir: directory where benchmark data is placed
    :param size: workload size
    :param input_buckets: input storage containers for this benchmark
    :param output_buckets:
    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
'''

⚠️ Potential issue | 🟡 Minor

Update the docstring to match this benchmark.

The docstring references "thumbnailer" and uses outdated parameter names that don't match the actual function signature.

Apply this diff to update the docstring:

 '''
-    Generate test, small and large workload for thumbnailer.
+    Generate test, small and large workload for the GPU video filter benchmark.
 
     :param data_dir: directory where benchmark data is placed
-    :param size: workload size
-    :param input_buckets: input storage containers for this benchmark
-    :param output_buckets:
-    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    :param size: workload size (e.g., "test", "small", "large")
+    :param benchmarks_bucket: name of the benchmark bucket
+    :param input_paths: list of input prefixes (one per input bucket)
+    :param output_paths: list of output prefixes (one per output bucket)
+    :param upload_func: upload function taking three params (bucket_idx, key, filepath)
+    :param nosql_func: not used here
 '''

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/input.py around lines 6 to
14, the module docstring incorrectly references "thumbnailer" and lists outdated
parameter names; update it to describe this benchmark (video-processing-gpu) and
its behavior (generate test, small and large workloads for video processing),
and replace the parameter list with the actual parameter names from the function
signature in this file (copy the exact names and types/meanings from the def
line into the docstring), ensuring each param description matches the current
parameters (e.g., data directory, size, input/output containers and the upload
callback) and remove any stale references to "thumbnailer."

Comment on lines +15 to +28
def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
    for file in glob.glob(os.path.join(data_dir, '*.mp4')):
        img = os.path.relpath(file, data_dir)
        upload_func(0, img, file)
    #TODO: multiple datasets
    input_config = {'object': {}, 'bucket': {}}
    input_config['object']['key'] = img
    # The function supports only "gpu-filter" at the moment.
    input_config['object']['op'] = 'gpu-filter'
    input_config['object']['duration'] = 1
    input_config['bucket']['bucket'] = benchmarks_bucket
    input_config['bucket']['input'] = input_paths[0]
    input_config['bucket']['output'] = output_paths[0]
    return input_config

⚠️ Potential issue | 🔴 Critical

Critical: Add error handling for missing files and improve variable naming.

This function has several issues:

  1. NameError if no .mp4 files exist: If the glob finds no files, the loop never executes and img is never defined, causing a NameError on line 21.
  2. Misleading variable name: img suggests an image file, but this processes video files.
  3. Missing parameter: The num_iters field present in the similar function at benchmarks/200.multimedia/220.video-processing/input.py (line 37) is missing here.

Apply this diff to fix these issues (matching the implementation in 220.video-processing/input.py):

 def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
+    last_key = None
+
+    # Upload all .mp4 files from data_dir to bucket 0
     for file in glob.glob(os.path.join(data_dir, '*.mp4')):
-        img = os.path.relpath(file, data_dir)
-        upload_func(0, img, file)
+        key = os.path.relpath(file, data_dir)
+        upload_func(0, key, file)
+        last_key = key
+
+    if last_key is None:
+        raise RuntimeError(f"No .mp4 files found in {data_dir}")
+
     #TODO: multiple datasets
     input_config = {'object': {}, 'bucket': {}}
-    input_config['object']['key'] = img
+    # Use the last uploaded file as the input object
+    input_config['object']['key'] = last_key
     # The function supports only "gpu-filter" at the moment.
-    input_config['object']['op'] = 'gpu-filter'
-    input_config['object']['duration'] = 1
+    input_config['object']['op'] = 'gpu-filter'     # must match your handler's operations dict
+    input_config['object']['duration'] = 1          # seconds of video to process
+    input_config['object']['num_iters'] = 10        # extra param for GPU workload intensity
+
     input_config['bucket']['bucket'] = benchmarks_bucket
🧰 Tools
🪛 Ruff (0.14.5)

15-15: Unused function argument: size

(ARG001)


15-15: Unused function argument: nosql_func

(ARG001)

🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/input.py around lines 15
to 28, handle the case where no .mp4 files are found (avoid NameError) by
checking glob results and raising/logging an error or returning None, rename the
variable from img to a clearer video_relpath or video_key, ensure you call
upload_func for the selected video(s) with the correct args, and add the missing
'num_iters' field (set to 1 like the non-GPU implementation) into input_config;
mirror the control flow and field names used in
benchmarks/200.multimedia/220.video-processing/input.py so input_config always
has object.key, object.op ('gpu-filter'), object.duration, object.num_iters, and
bucket.* set, and fail fast if no video files are available.

Comment on lines +35 to +40
    fps = cap.get(cv2.CAP_PROP_FPS)
    if not fps or fps <= 0:
        fps = 25.0  # fallback

    max_frames = int(fps * duration)
    frames = []

⚠️ Potential issue | 🟡 Minor

Guard against NaN/invalid FPS values before computing max_frames

If cap.get(cv2.CAP_PROP_FPS) returns NaN (which some containers/drivers do), the condition if not fps or fps <= 0 will not trigger, and int(fps * duration) will raise at runtime.

You can defensively handle this:

+import math
 ...
-    fps = cap.get(cv2.CAP_PROP_FPS)
-    if not fps or fps <= 0:
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    if not fps or fps <= 0 or math.isnan(fps):
         fps = 25.0  # fallback

This keeps the simple fallback while avoiding a crash on malformed metadata.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    fps = cap.get(cv2.CAP_PROP_FPS)
-    if not fps or fps <= 0:
-        fps = 25.0  # fallback
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    if not fps or fps <= 0 or math.isnan(fps):
+        fps = 25.0  # fallback
     max_frames = int(fps * duration)
     frames = []
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/function.py around
lines 35 to 40, fps obtained from cap.get(cv2.CAP_PROP_FPS) can be NaN or
non-finite so the current check (if not fps or fps <= 0) can miss it and cause
int(fps * duration) to crash; defend by validating fps with math.isfinite (or
math.isnan) and the existing <= 0 check, e.g. import math and set fps = fallback
when not math.isfinite(fps) or fps <= 0, then compute max_frames from the
validated fps.

Comment on lines +86 to +90
    # Encode processed video with OpenCV (CPU)
    base = os.path.splitext(os.path.basename(video_path))[0]
    out_path = f"/tmp/processed-{base}.mp4"

    fourcc = cv2.VideoWriter_fourcc(*"mp4v")

⚠️ Potential issue | 🟠 Major

Harden /tmp usage and sanitize user-controlled key

Two related issues here:

  1. download_path = f"/tmp/{key}":
    • If key contains path separators or .., this can escape /tmp or attempt to create nested directories that don’t exist, leading to failures or even overwriting unintended files.
  2. Static analysis (S108) is also concerned about the predictable paths /tmp/processed-{base}.mp4 and /tmp/{key}.

A minimal hardening that preserves current naming:

-    base = os.path.splitext(os.path.basename(video_path))[0]
-    out_path = f"/tmp/processed-{base}.mp4"
+    base = os.path.splitext(os.path.basename(video_path))[0]
+    out_path = os.path.join("/tmp", f"processed-{base}.mp4")
 ...
-    download_path = f"/tmp/{key}"
+    # Restrict local path to a basename to avoid directory traversal and
+    # missing parent directories under /tmp.
+    local_name = os.path.basename(key)
+    download_path = os.path.join("/tmp", local_name)

If you want to fully satisfy stricter security tooling, consider using tempfile.mkstemp / NamedTemporaryFile for one or both of these paths, but the basename sanitization above already removes the traversal risk while staying simple.
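
A minimal sketch of the tempfile-based variant, keeping the existing prefix/suffix naming as an assumption:

import os
import tempfile

# Create the output path securely instead of using a predictable /tmp name.
fd, out_path = tempfile.mkstemp(prefix="processed-", suffix=".mp4", dir="/tmp")
os.close(fd)  # cv2.VideoWriter reopens the path itself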

Also applies to: 150-177

🧰 Tools
🪛 Ruff (0.14.5)

88-88: Probable insecure usage of temporary file or directory: "/tmp/processed-"

(S108)

🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/function.py around
lines 86 to 90 (and similarly lines 150-177), the code constructs predictable
/tmp paths using user-controlled values; sanitize the inputs and avoid path
traversal by taking only the basename of user-controlled key (e.g., key_base =
os.path.basename(key)) and validating/stripping any remaining suspicious
segments like '..' or os.sep, then build paths with os.path.join("/tmp",
key_base); for the processed output prefer creating a temporary file via
tempfile.NamedTemporaryFile or tempfile.mkstemp (or at minimum sanitize base =
os.path.basename(os.path.splitext(os.path.basename(video_path))[0]) and join
with /tmp) to eliminate predictable filenames and ensure the directory is
correct and exists before writing.

Comment on lines +1 to +3
torch
opencv-python-headless
numpy

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Pin dependency versions for reproducibility and security.

The requirements file lacks version constraints, which poses reproducibility and security risks. Unpinned versions can produce non-deterministic benchmark results and may introduce breaking changes across CI/CD runs.

Additionally, torch is specified without explicit GPU support configuration—confirm whether CUDA/cuDNN dependencies are documented elsewhere in the benchmark setup, or consider pinning a CUDA-specific torch variant if GPU acceleration is required.

Consider applying version pins like this:

-torch
-opencv-python-headless
-numpy
+torch==2.0.1
+opencv-python-headless==4.8.1.78
+numpy==1.24.3

Once you've determined the appropriate versions, please verify:

  1. The versions are compatible with your target GPU/CUDA environment.
  2. Any external CUDA/cuDNN dependencies are documented in a setup guide for the benchmark.
  3. Run a test build to confirm all dependencies resolve without conflicts.





---

🏁 Script executed:

#!/bin/bash
# Search for all requirements files to understand version pinning patterns in codebase
echo "=== Finding all requirements.txt files ==="
find . -name "requirements*.txt" -type f | head -20

echo -e "\n=== Checking versions in other requirements files ==="
find . -name "requirements*.txt" -type f -exec sh -c 'echo "File: $1"; head -10 "$1"' _ {} \; | head -60

Length of output: 2594


🏁 Script executed:

#!/bin/bash
# Search for GPU, CUDA, cuDNN references in benchmark directory and related docs
echo "=== Searching for GPU/CUDA references in benchmark context ==="
rg -i "cuda|gpu|cudnn" benchmarks/200.multimedia/220.video-processing-gpu/ --type-list | head -5
rg -i "cuda|gpu|cudnn" benchmarks/200.multimedia/220.video-processing-gpu/ -A2 -B2 2>/dev/null | head -40

echo -e "\n=== Looking for setup documentation ==="
find benchmarks/200.multimedia/220.video-processing-gpu/ -type f \( -name "*.md" -o -name "*.txt" -o -name "setup*" -o -name "README*" \) 

Length of output: 4364


🏁 Script executed:

#!/bin/bash
# Check structure of the benchmark directory and look for related docs
echo "=== Contents of video-processing-gpu benchmark directory ==="
ls -la benchmarks/200.multimedia/220.video-processing-gpu/ 2>/dev/null || echo "Directory check"
find benchmarks/200.multimedia/220.video-processing-gpu/ -type f | head -20

echo -e "\n=== Parent directory context ==="
ls -la benchmarks/200.multimedia/

Length of output: 1214


🏁 Script executed:

#!/bin/bash
# Search for any documentation about dependencies or torch in the benchmark context
echo "=== Searching for torch, dependencies, or installation docs ==="
rg -i "torch|dependencies|install|setup" benchmarks/200.multimedia/ --type md --type txt -A2 | head -50

echo -e "\n=== Checking if there's a main setup guide ==="
find benchmarks/ -maxdepth 2 -name "README*" -o -name "setup*" -o -name "INSTALL*" | head -10
cat benchmarks/README.md 2>/dev/null | head -30

Length of output: 371


🌐 Web query:

torch 2.0.1 opencv-python-headless 4.8.1 numpy 1.24 compatibility

💡 Result:

Short answer: Yes — torch 2.0.1 works with NumPy 1.24, and opencv-python-headless 4.8.1 also supports NumPy 1.24. The main incompatibility to avoid is NumPy 2.x (PyTorch wheels built against NumPy 1.x can fail under NumPy 2.x). Test by importing torch and cv2 after installing.

Recommended pip pins:

  • pip install "numpy==1.24.*"
  • pip install "torch==2.0.1"
  • pip install "opencv-python-headless==4.8.1"

References:

  • PyTorch issues describing failures when NumPy 2.x is used with PyTorch built against NumPy 1.x. [1][2]
  • opencv-python-headless PyPI (wheels follow NumPy/Python compatibility). [3]

Sources:
[1] PyTorch issue: Numpy compatibility / DataLoader RuntimeError.
[2] PyTorch issue: modules compiled with NumPy 1.x incompatible with NumPy 2.x.
[3] opencv-python-headless PyPI project page.


🌐 Web query:

torch GPU setup requirements CUDA environment variables

💡 Result:

Summary — what you need for PyTorch GPU + relevant CUDA environment variables

  1. Requirements (high level)
  • Install a PyTorch binary built for the CUDA/ROCm version you intend to use, or build PyTorch from source against your local CUDA. The pip/conda wheel’s CUDA version must match the binary you select (or you must build). Verify with torch.cuda.is_available() after install. [1][3]

  • NVIDIA GPU + matching NVIDIA driver installed on host (driver must support the CUDA runtime version used by the PyTorch binary). The driver + GPU must be functional (check nvidia-smi). [3]

  • If building extensions or PyTorch from source you need a local CUDA toolkit (nvcc), cuDNN, NVML, etc., and to point environment variables to the toolkit and libs. [2][4]

  1. Important environment variables you’ll commonly set/check
  • CUDA_HOME (or CUDA_PATH on Windows): root of CUDA installation (e.g. /usr/local/cuda or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1). Used by build scripts and some tools. [2][4]
  • CUDACXX: path to nvcc (e.g. ${CUDA_HOME}/bin/nvcc) when building from source. [4]
  • PATH: include ${CUDA_HOME}/bin so nvcc and tools are found. [2][4]
  • LD_LIBRARY_PATH (Linux) / PATH (Windows) / DYLD_LIBRARY_PATH (macOS older): include CUDA lib directories (e.g. ${CUDA_HOME}/lib64) so shared libs (libcudart, cuDNN) are found at runtime. [2][4]
  • CUDA_VISIBLE_DEVICES: control which GPUs are visible to CUDA (comma-separated indices, or -1 to hide all). Useful for multi-GPU hosts. [2]
  • CUDA_LAUNCH_BLOCKING=1: make CUDA calls synchronous (debugging). [2]
  • PYTORCH_CUDA_ALLOC_CONF: control PyTorch CUDA allocator behavior (memory fragmentation, max split size, etc.). [2]
  • PYTORCH_NO_CUDA_MEMORY_CACHING: disable PyTorch caching allocator (useful for debugging). [2]
  • TORCH_CUDNN_V8_API_DISABLED / TORCH_CUDNN_V8_API_LRU_CACHE_LIMIT etc.: cuDNN-specific controls. [2]
  • NVML-related (NVML_LIB_PATH) when building packages that query NVML. [4]
  1. Typical quick checks & steps (Linux)
  • Ensure driver/GPU visible: nvidia-smi
  • Ensure CUDA bin/libs on PATH/LD_LIBRARY_PATH:
    export CUDA_HOME=/usr/local/cuda
    export CUDACXX="${CUDA_HOME}/bin/nvcc"
    export PATH="${CUDA_HOME}/bin:${PATH}"
    export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
    source ~/.bashrc
  • Install PyTorch GPU wheel matching desired CUDA version (or conda package with cudatoolkit). Then in Python:
    import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.cuda.device_count())
  1. Notes / gotchas
  • Prebuilt PyTorch wheels include CUDA runtimes for the CUDA versions they were built for; setting CUDA_HOME / local CUDA does not make a CPU-only wheel suddenly use your system CUDA — you must install a GPU-enabled PyTorch build or build from source. [3]
  • Mismatch between PyTorch’s bundled CUDA version and your GPU driver can cause torch.cuda.is_available() to be False; updating the NVIDIA driver or installing a matching PyTorch binary usually resolves this. [3]

Sources

  • PyTorch CUDA environment variables and PyTorch-specific vars. [1]
  • PyTorch Get Started / installation and verification notes (matching binaries, building from source). [3]
  • Example CUDA env var usage and build-time vars (CUDA_HOME, CUDACXX, LD_LIBRARY_PATH, NVML paths). [4]

References
[1] PyTorch — CUDA Environment Variables.
[2] PyTorch/XLA GPU environment setup (examples of PATH/LD_LIBRARY_PATH).
[3] PyTorch Get Started / Installation & verification.

(If you want, tell me your OS, PyTorch version and how you installed it and I’ll give the exact install / env var commands and a short checklist to get torch.cuda.is_available() → True.)


Pin dependency versions to prevent NumPy 2.x incompatibility and ensure reproducibility.

Version pinning is missing, which creates reproducibility risks. Most critically, NumPy 2.x is incompatible with PyTorch wheels built against NumPy 1.x, leading to runtime failures if versions drift across CI/CD runs.

Regarding GPU support: the code correctly uses torch.cuda.is_available() to detect and enable GPU at runtime, which is the proper pattern. GPU availability depends on the environment (CUDA toolkit, driver, PyTorch build selection) rather than the requirements.txt file itself. The standard torch package handles both CPU and GPU cases gracefully.
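
For reference, that runtime selection pattern is simply the following (the tensor shape is an illustrative placeholder):

import torch

# Use the GPU when one is visible to the runtime, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = torch.zeros(4, 3, 224, 224, device=device)  # hypothetical input batch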

Consider pinning versions with NumPy <2.0 constraint:

-torch
-opencv-python-headless
-numpy
+torch==2.0.1
+opencv-python-headless==4.8.1.78
+numpy<2.0

Verify compatibility with your target environment after applying pins.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-torch
-opencv-python-headless
-numpy
+torch==2.0.1
+opencv-python-headless==4.8.1.78
+numpy<2.0
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/requirements.txt
lines 1-3, dependency versions are not pinned which risks pulling NumPy 2.x
(incompatible with many PyTorch wheels) and breaks reproducibility; update the
file to pin specific, tested versions (at minimum constrain NumPy to <2.0), pin
opencv-python-headless to a known working release, and pin torch to the version
you intend to run (or document the exact wheel URL/extra-index for CUDA-enabled
builds); after changing, run CI or a local environment install to verify
compatibility with your target CUDA/toolchain and adjust pins accordingly.

"version": "6.0"
},
"storage": {
"access_key": "cStL2xQ1C1c2wpB_25M3QnnNO4aizEjhHEH203q_a7s",

⚠️ Potential issue | 🔴 Critical

Remove hardcoded credentials from version control.

Lines 266 and 273 contain hardcoded MinIO storage credentials (access_key and secret_key). Storing secrets in committed code is a critical security vulnerability. These must be removed, replaced with placeholder values, or sanitized before merging.

If this file contains actual test data with development credentials, replace the values with non-sensitive placeholders:

         "storage": {
-          "access_key": "cStL2xQ1C1c2wpB_25M3QnnNO4aizEjhHEH203q_a7s",
+          "access_key": "PLACEHOLDER_ACCESS_KEY",
           "address": "host.docker.internal:9011",
           "data_volume": "minio-volume",
           "input_buckets": [],
           "instance_id": "0571b3121a5e06773187dff7ffc30b4c2a9f63ae7c05ff592e3c45b9b3a6d1e9",
           "mapped_port": 9011,
           "output_buckets": [],
-          "secret_key": "22a1207b6c97d3669271e9896097e3d8fddd6160e3c75444fb0eabf8b9ef8681",
+          "secret_key": "PLACEHOLDER_SECRET_KEY",
           "type": "minio",
           "version": "RELEASE.2024-07-16T23-46-41Z"
         }

If credentials are needed for runtime, inject them via environment variables or a secure credential store instead.

Also applies to: 273-273

🧰 Tools
🪛 Gitleaks (8.29.0)

[high] 266-266: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
experiments.json lines 266 and 273: the file contains hardcoded MinIO
credentials (access_key and secret_key); remove these literal values and replace
them with non-sensitive placeholders or reference environment/config variables
(e.g., "${MINIO_ACCESS_KEY}" and "${MINIO_SECRET_KEY}") so the runtime reads
credentials from env vars or a secrets manager; update any README or deployment
docs to describe setting the required env vars and ensure the file is not
carrying real secrets (if these were real, revoke/rotate them immediately and
sanitize the repository history if necessary).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
benchmarks/000.microbenchmarks/040.server-reply/python/function.py (1)

8-12: Fix socket resource leak and add error handling.

The socket is never closed, causing a resource leak. Additionally, socket operations (connect, recv) can fail but are not wrapped in error handling.

Apply this diff to fix the resource leak and add error handling:

     socket.setdefaulttimeout(20)
     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-    s.connect(addr)
-    msg = s.recv(1024).decode()
-    return {"result": msg}
+    try:
+        s.connect(addr)
+        msg = s.recv(1024).decode()
+        return {"result": msg}
+    except (socket.error, socket.timeout) as e:
+        return {"error": f"Socket error: {str(e)}"}
+    finally:
+        s.close()
♻️ Duplicate comments (3)
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/requirements.txt (1)

1-1: Consider pinning the JAX version for reproducibility.

Same unpinned dependency issue as benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/requirements.txt.

benchmarks/600.linearalgebra/601.matmul/python/requirements.txt (1)

1-1: Critical security vulnerabilities in torch 2.4.1.

This file uses the same vulnerable torch version (2.4.1) as benchmarks/600.linearalgebra/606.spmv/python/requirements.txt, exposing the benchmark to the same critical security issues including RCE and DoS vulnerabilities.

benchmarks/400.inference/413.image-classification/python/requirements.txt (1)

3-3: Critical security vulnerabilities in torch 2.4.1.

This file uses the same vulnerable torch version (2.4.1) with critical security issues. This is particularly concerning for an inference benchmark that may use torch.load to load models.

🟠 Major comments (17)
benchmarks/000.microbenchmarks/040.server-reply/python/function.py-7-7 (1)

7-7: Add input validation for required event keys.

The event.get() calls will return None if keys are missing, causing socket.connect() to fail with a less-clear error. Validate that "ip-address" and "port" exist and are valid before attempting the connection.

Apply this diff to add validation:

 def handler(event):
 
     # start timing
+    ip_address = event.get("ip-address")
+    port = event.get("port")
+    
+    if not ip_address or not port:
+        return {"error": "Missing required fields: ip-address and port"}
+    
-    addr = (event.get("ip-address"), event.get("port"))
+    addr = (ip_address, port)
     socket.setdefaulttimeout(20)

Committable suggestion skipped: line range outside the PR's diff.

benchmarks/200.multimedia/220.video-processing/input.py-24-26 (1)

24-26: Fix loop logic: all .mp4 files are uploaded, but only the last filename is stored in the configuration.

Lines 24-26 iterate over all matching files, but the img variable gets overwritten each iteration. When line 29 sets input_config["object"]["key"] = img, only the last file's path is used, while all files were uploaded.

Additionally, if no .mp4 files match, img is undefined on line 29, causing a runtime error.

Decide the intended behavior: (1) process only one specific file (add break or filter), or (2) support multiple files (modify config structure). The TODO comment suggests incomplete multi-dataset support.
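
As a rough illustration of option (1), a hypothetical helper that uploads and records exactly one video (pick_single_video and its arguments are invented for this sketch; the real generate_input would inline the logic):

import glob
import os


def pick_single_video(data_dir, input_prefix, upload_func):
    videos = sorted(glob.glob(os.path.join(data_dir, "*.mp4")))
    if not videos:
        raise ValueError(f"No .mp4 files found in {data_dir}")
    video = videos[0]
    key = os.path.join(input_prefix, os.path.basename(video))
    upload_func(0, key, video)  # upload exactly one file
    return key  # the single key to store in input_config["object"]["key"]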

benchmarks/200.multimedia/220.video-processing/input.py-10-18 (1)

10-18: Move docstring inside the generate_input function.

The docstring is currently placed at module level, but it documents the generate_input function parameters. Python convention requires function docstrings to be the first statement inside the function body.

Apply this diff to move the docstring to the correct location:

-
-"""
-    Generate test, small and large workload for thumbnailer.
-
-    :param data_dir: directory where benchmark data is placed
-    :param size: workload size
-    :param input_buckets: input storage containers for this benchmark
-    :param output_buckets:
-    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
-"""
-
-
 def generate_input(
     data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
 ):
+    """
+    Generate test, small and large workload for video processing.
+
+    :param data_dir: directory where benchmark data is placed
+    :param size: workload size
+    :param benchmarks_bucket: benchmark storage bucket
+    :param input_paths: input storage paths for this benchmark
+    :param output_paths: output storage paths for this benchmark
+    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    :param nosql_func: nosql function for database operations
+    """
     for file in glob.glob(os.path.join(data_dir, "*.mp4")):
benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py-62-73 (1)

62-73: Inconsistent thresholds confirmed – review comment is accurate

The full function context confirms the review comment's analysis. The thresholds are indeed misaligned:

  • Loop break condition (line 44–45): breaks when consecutive_failures == 7 with "Can't setup the connection"
  • CSV upload condition (line 62): skips upload only when consecutive_failures == 5

This inverted logic causes a failed connection attempt (7 consecutive timeouts) to still upload a CSV file, while a run exiting with exactly 5 consecutive failures suppresses upload. The suggestion to align thresholds or use a named constant (e.g., MAX_CONSECUTIVE_FAILURES = 7) is appropriate.
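
A minimal sketch of the named-constant approach, with the connection loop reduced to a toy simulation (the benchmark's actual socket handling is omitted):

MAX_CONSECUTIVE_FAILURES = 7


def run_measurements(timeouts):
    # timeouts: iterable of booleans, True meaning the recv timed out
    consecutive_failures = 0
    times = []
    for i, timed_out in enumerate(timeouts):
        if timed_out:
            consecutive_failures += 1
            if consecutive_failures == MAX_CONSECUTIVE_FAILURES:
                print("Can't setup the connection")
                break
        else:
            consecutive_failures = 0
            times.append(i)
    # upload the CSV only when the run did not end in a failed connection setup
    should_upload = consecutive_failures < MAX_CONSECUTIVE_FAILURES
    return times, should_upload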

benchmarks/400.inference/413.image-classification/python/package.sh-16-17 (1)

16-17: Handle filenames with spaces or special characters.

Using xargs without -print0 can break on filenames containing spaces, newlines, or special characters.

Apply this diff:

 # cleaning
 # stripping some of the numpy libs - libgfortran-2e0d59d6.so.5.0.0 - causes issues on Azure
-find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
-find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
+find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -print0 | xargs -0 strip
+find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -print0 | xargs -0 strip
benchmarks/400.inference/413.image-classification/python/package.sh-25-25 (1)

25-25: Add error handling for directory change.

This cd command needs error handling.

Apply this diff:

 find . -name \*.pyc -delete
-cd ${CUR_DIR}
+cd "${CUR_DIR}" || exit 1
 echo "Stripped size $(du -sh $1 | cut -f1)"
benchmarks/400.inference/413.image-classification/python/package.sh-7-7 (1)

7-7: Add error handling for directory changes.

The cd command should include error handling to prevent subsequent commands from executing in an unexpected directory if the change fails.

Apply this diff:

 CUR_DIR=$(pwd)
-cd $1
+cd "$1" || exit 1
 # cleaning libs
benchmarks/400.inference/412.language-bert/python/package.sh-25-25 (1)

25-25: Add error handling for directory change.

Similar to Line 7, this cd command needs error handling.

Apply this diff:

 find . -name \*.pyc -delete
-cd ${CUR_DIR}
+cd "${CUR_DIR}" || exit 1
 echo "Stripped size $(du -sh $1 | cut -f1)"
benchmarks/400.inference/412.language-bert/python/package.sh-16-17 (1)

16-17: Handle filenames with spaces or special characters.

Using xargs without -print0 can break on filenames containing spaces, newlines, or special characters.

Apply this diff:

 # cleaning
 # stripping some of the numpy libs - libgfortran-2e0d59d6.so.5.0.0 - causes issues on Azure
-find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
-find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
+find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -print0 | xargs -0 strip
+find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -print0 | xargs -0 strip
benchmarks/400.inference/412.language-bert/python/package.sh-7-7 (1)

7-7: Add error handling for directory changes.

The cd command should include error handling to prevent subsequent commands from executing in an unexpected directory if the change fails.

Apply this diff:

 CUR_DIR=$(pwd)
-cd $1
+cd "$1" || exit 1
 # cleaning libs
benchmarks/400.inference/412.language-bert/python/package.sh-30-33 (1)

30-33: Add error handling for directory changes.

Both cd commands in the torch-zipping block need error handling.

Apply this diff:

 if [ -d "$1/${TORCH_DIR}" ]; then
-	cd $1
+	cd "$1" || exit 1
 	zip -qr torch.zip ${TORCH_DIR}
 	rm -rf ${TORCH_DIR}
-	cd ${CUR_DIR}
+	cd "${CUR_DIR}" || exit 1
 	echo "Torch-zipped size $(du -sh $1 | cut -f1)"
 fi
benchmarks/600.linearalgebra/603.jacobi2d/python/requirements.txt-1-1 (1)

1-1: Update torch 2.4.1 to address security vulnerabilities; upgrade to 2.6.0 minimum.

Torch 2.4.1 does contain vulnerabilities, but the original characterization overstates severity. PYSEC-2024-259 involves RemoteModule deserialization RCE affecting PyTorch <=2.4.1, but is marked withdrawn as disputed.

Verified vulnerabilities:

  • PYSEC-2025-41 (CVE-2025-32434): CRITICAL RCE via torch.load with weights_only=True; fixed in 2.6.0
  • GHSA-3749-ghw9-m3mg: Low severity DoS, requires upgrade to 2.7.1+
  • GHSA-887c-mr87-cxwp: Medium severity resource management, requires upgrade to 2.8.0+

Upgrade to version 2.6.0 or above to address the CRITICAL issue. Latest available is 2.9.1.

benchmarks/400.inference/413.recommendation/python/requirements.txt.3.8-1-1 (1)

1-1: Upgrade torch from 2.2.2 to 2.4.1 for Python 3.8 projects.

Verification confirms torch 2.2.2 contains two active CRITICAL vulnerabilities including an RCE via torch.load with weights_only=True. PyTorch 2.5+ requires Python 3.9+, so 2.4.1 is the latest version supporting Python 3.8 compatibility. While PyTorch 2.4.1 has a known deserialization RCE (CVE-2024-48063) affecting RemoteModule, it is still preferable to 2.2.2 due to fewer vulnerabilities.

Repository currently uses: 5× torch==2.2.2, 10× torch==2.4.1, 5× torch==2.5.1. Upgrade this file to torch==2.4.1 for consistency and security.

Alternatively, if Python 3.8 support can be dropped (EOL October 2024), upgrade to torch==2.5.1+ for better security posture across all benchmarks.
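
If both interpreter generations must be served from a single requirements file, PEP 508 environment markers are one option (a sketch only; the pins restate the versions discussed above and are not a tested combination, and this repo currently splits requirements per Python version instead):

torch==2.4.1; python_version < "3.9"
torch==2.5.1; python_version >= "3.9"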

benchmarks/400.inference/413.image-classification/python/init.sh-10-10 (1)

10-10: Quote variables to handle paths with spaces.

The cp command uses unquoted variables which will fail if paths contain spaces or special characters.

Apply this diff:

-cp ${path} ${DIR}
+cp "${path}" "${DIR}"
benchmarks/600.linearalgebra/601.matmul/python/function.py-48-62 (1)

48-62: Include gpu_time in the measurement result.

The gpu_ms variable is computed from kernel_gemm (line 48) but is not included in the returned measurement dictionary. All other linear algebra benchmarks (602.axpy, 603.jacobi2d, 604.cholesky, 605.lu) include gpu_time in their measurements.

Apply this diff to include GPU timing:

     return {
         # "result": result[0],
         "measurement": {
             "generating_time": matrix_generating_time,
             "compute_time": matmul_time,
+            "gpu_time": gpu_ms,
         },
     }

Also, consider prefixing C_out with an underscore if it's intentionally unused:

-    C_out, gpu_ms = kernel_gemm(alpha, beta, C, A, B, reps=1)
+    _C_out, gpu_ms = kernel_gemm(alpha, beta, C, A, B, reps=1)
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/function.py-1-8 (1)

1-8: Resolve the license TODO before merging

You’re explicitly copying/deriving from Barba & Forsyth’s CFD Python work and note CC‑BY 4.0 and BSD‑3 in the header, but there’s a # TODO: License comment.

Before merging, ensure:

  • The appropriate BSD‑3 license text is present in the repo (e.g., LICENSE file plus per‑file header, as required).
  • Attribution requirements from CC‑BY are satisfied.

Leaving this as a TODO is a compliance risk.

benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py-39-45 (1)

39-45: Batch norm is using std instead of var in the denominator

Canonical BN uses (x - mean) / sqrt(var + eps) where var is the variance of x. Here you compute std = jnp.std(...) and then divide by sqrt(std + eps), i.e. sqrt(sigma + eps) rather than sqrt(sigma² + eps), which changes the scaling, especially for large variances.

If you intend standard BN semantics:

-@jax.jit
-def batchnorm2d(x, eps=1e-5):
-    mean = jnp.mean(x, axis=0, keepdims=True)
-    std = jnp.std(x, axis=0, keepdims=True)
-    return (x - mean) / jnp.sqrt(std + eps)
+@jax.jit
+def batchnorm2d(x, eps=1e-5):
+    mean = jnp.mean(x, axis=0, keepdims=True)
+    var = jnp.var(x, axis=0, keepdims=True)
+    return (x - mean) / jnp.sqrt(var + eps)

If this is an intentional simplification for benchmarking, a short comment would help avoid future “fixes.”

🟡 Minor comments (3)
benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py-42-51 (1)

42-51: Verify counter name semantics and logging overhead

The latency loop is functional, but two valid concerns merit addressing:

  • Counter semantics mismatch: measurements_not_smaller suggests ≥ behavior, but the condition uses > (line 48). When cur_time == cur_min, the counter resets. If strictly-greater is intentional, the name should reflect that; if "not smaller" (≥) is the goal, change the condition to >=.

  • Per-iteration logging: The print statement (lines 45–47) runs inside the while loop and can emit up to ~1000 lines per invocation. Consider guarding it behind a debug flag or environment variable to prevent timing skew in benchmarking.
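
A minimal sketch of the debug-flag guard mentioned in the second bullet (SEBS_DEBUG is an illustrative variable name, not an existing flag in this repository):

import os

DEBUG = os.environ.get("SEBS_DEBUG", "0") == "1"


def log_iteration(i, cur_time, cur_min):
    # Emit per-iteration output only when explicitly requested, so the
    # benchmark's timing loop is not skewed by stdout writes.
    if DEBUG:
        print(f"iteration {i}: cur_time={cur_time} cur_min={cur_min}")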

benchmarks/400.inference/413.recommendation/python/package.sh-3-4 (1)

3-4: Address unused variable and quote path arguments.

The script has two issues:

  1. PACKAGE_DIR is assigned but never used (line 4 uses $1 directly).
  2. Unquoted $1 will fail on paths with spaces.

Apply this diff:

-PACKAGE_DIR=$1
-echo "DLRM GPU package size $(du -sh $1 | cut -f1)"
+PACKAGE_DIR="$1"
+echo "DLRM GPU package size $(du -sh "$PACKAGE_DIR" | cut -f1)"
benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py-82-123 (1)

82-123: Acknowledge broad exception handling is safer than narrowing to undocumented TypeError

The web search confirms that jax.device_get does not document a specific set of exception types—it only describes operation and inputs. Narrowing to TypeError based on implementation details is risky. The current broad Exception catch, while hiding potential bugs, is actually more robust than the suggested fix.

Instead of narrowing the exception, add logging/metrics to understand when this occurs without losing visibility:

     try:
         results = jax.device_get(results)
-    except Exception:
-        pass
+    except Exception as e:
+        # Not a JAX DeviceArray or already on host; leave as-is.
+        # Consider logging e or emitting a metric here for observability.
+        pass
🧹 Nitpick comments (39)
benchmarks/200.multimedia/210.thumbnailer/input.py (1)

20-22: Consider removing unused function parameters.

Function arguments size and nosql_func are not used within the function body. If these parameters are no longer needed, removing them would clean up the API surface and reduce confusion. If they are retained for backward compatibility, consider adding a comment to clarify intent.

benchmarks/100.webapps/120.uploader/input.py (1)

19-20: Unused function parameters.

The parameters data_dir, input_buckets, upload_func, and nosql_func are unused in the function body. While this may be acceptable (the benchmark may not require file uploads), consider whether these parameters should be removed to simplify the interface, or if they should be retained for API consistency with other benchmark modules.

benchmarks/200.multimedia/220.video-processing/input.py (2)

27-27: TODO: Multiple datasets support is not implemented.

The TODO comment indicates that multiple dataset support is planned but not yet implemented. This aligns with the loop logic issue identified above.

Do you want me to help design and implement support for multiple datasets, or open a new issue to track this enhancement?


21-23: Unused function parameters.

The size and nosql_func parameters are not used in the function body. If these are required by the interface contract for consistency with other benchmark input generators, consider adding a comment explaining why they're unused. Otherwise, they could be removed.

benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py (2)

11-16: Event field extraction looks correct; consider hardening bucket access

The extracted fields match the expected AWS wrapper contract and are straightforward. The nested event.get("bucket").get("bucket") / .get("output") will raise if "bucket" is missing or None, though, so if this benchmark might be invoked without a bucket configuration, consider something more defensive like:

bucket_cfg = event.get("bucket") or {}
output_bucket = bucket_cfg.get("bucket")
output_prefix = bucket_cfg.get("output", "")

or enforce presence early with a clear error.


63-65: Static /tmp/data.csv path: consider a safer temp‑file pattern

Ruff’s S108 warning about /tmp/data.csv is valid in a general Unix context: using a fixed filename in /tmp can open you up to symlink/race issues and, in non‑serverless environments, cross‑user interference.

In typical AWS Lambda‑style deployments the risk is lower (per‑container /tmp, single active invocation), but if this code might ever be reused elsewhere, it’s safer to use a unique temp file:

import tempfile

with tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False) as csvfile:
    tmp_path = csvfile.name
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerow(["id", "client_send", "client_rcv"])
    for row in times:
        writer.writerow(row)

key = client.upload(
    output_bucket,
    os.path.join(output_prefix, filename),
    tmp_path,
)

This also avoids accidental reuse of stale data if anything goes wrong between writes and uploads.

Also applies to: 70-71

benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/requirements.txt (1)

1-1: Consider pinning the JAX version for reproducibility.

The dependency is unpinned, which may lead to non-reproducible benchmark results across different environments as JAX updates. Consider pinning to a specific version if reproducibility is important for this benchmark suite. Note that all three JAX benchmarks in the project currently follow the same unpinned pattern, reflecting the project's mixed approach to dependency management across benchmarks.
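
A pinned sketch, if reproducibility is preferred (the version shown is illustrative, not a release validated against these benchmarks):

jax==0.4.30
jaxlib==0.4.30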

benchmarks/400.inference/412.language-bert/python/package.sh (1)

1-1: Add shebang for portability.

The script lacks a shebang line, which can lead to unexpected behavior depending on how it's invoked.

Apply this diff:

+#!/bin/bash
 # Stripping package code is based on https://github.com/ryfeus/lambda-packs repo
benchmarks/400.inference/413.image-classification/python/package.sh (1)

1-1: Add shebang for Bash-specific syntax.

The script uses [[ which is Bash-specific, but lacks a shebang line.

Apply this diff:

+#!/bin/bash
 # Stripping package code is based on https://github.com/ryfeus/lambda-packs repo
benchmarks/200.multimedia/220.video-processing/python/function.py (3)

17-23: Consider using iterable unpacking instead of list concatenation.

The list concatenation on line 18 can be simplified using iterable unpacking for better readability.

Apply this diff:

-    ret = subprocess.run(
-        [os.path.join(SCRIPT_DIR, "ffmpeg", "ffmpeg"), "-y"] + args,
+    ret = subprocess.run(
+        [os.path.join(SCRIPT_DIR, "ffmpeg", "ffmpeg"), "-y", *args],
         # subprocess might inherit Lambda's input for some reason
         stdin=subprocess.DEVNULL,
         stdout=subprocess.PIPE,
         stderr=subprocess.STDOUT,
     )

31-46: Unused parameter can be removed or prefixed.

The event parameter is not used in this function. If it's required by the operations interface, consider prefixing it with an underscore (_event) to indicate it's intentionally unused.

Apply this diff if the parameter is not needed:

-def to_gif(video, duration, event):
+def to_gif(video, duration, _event):
     output = "/tmp/processed-{}.gif".format(os.path.basename(video))
     call_ffmpeg(
         [
             "-i",
             video,
             "-t",
             "{0}".format(duration),
             "-vf",
             "fps=10,scale=320:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse",
             "-loop",
             "0",
             output,
         ]
     )
     return output

50-66: Unused parameter can be removed or prefixed.

The event parameter is not used in this function. If it's required by the operations interface, consider prefixing it with an underscore (_event) to indicate it's intentionally unused.

Apply this diff if the parameter is not needed:

-def watermark(video, duration, event):
+def watermark(video, duration, _event):
     output = "/tmp/processed-{}".format(os.path.basename(video))
     watermark_file = os.path.dirname(os.path.realpath(__file__))
     call_ffmpeg(
         [
             "-i",
             video,
             "-i",
             os.path.join(watermark_file, os.path.join("resources", "watermark.png")),
             "-t",
             "{0}".format(duration),
             "-filter_complex",
             "overlay=main_w/2-overlay_w/2:main_h/2-overlay_h/2",
             output,
         ]
     )
     return output
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/input.py (1)

1-17: generate_input contract and size presets look consistent; address Ruff ARG001 via config, not signature changes

The size_generators mapping and generate_input implementation match the pattern used in other JAX NPBench inputs (e.g., 5xx.compute_jax_npbench/input.py), so behavior is consistent. Ruff correctly notes that data_dir, benchmarks_bucket, input_paths, output_paths, upload_func, and nosql_func are unused, but these parameters are part of the common generate_input interface used across benchmarks. If this file is linted, consider suppressing ARG001 at the function or file level (e.g., # ruff: noqa: ARG001) rather than altering the signature.
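
A sketch of the file-level suppression (the size values are placeholders, not the actual npbench presets); the same effect can also be achieved centrally via Ruff's per-file-ignores configuration:

# ruff: noqa: ARG001  # unused arguments are part of the shared generate_input interface

size_generators = {"test": 1000, "small": 10000, "large": 100000}


def generate_input(
    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
):
    return {"size": size_generators[size]}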

benchmarks/500.scientific/503.graph-bfs/input.py (1)

1-7: Deterministic size/seed configuration is correct; handle unused-arg linting consistently

size_generators plus a constant "seed": 42 matches the pattern in the pagerank and MST inputs, so graph generation for BFS will be deterministic and comparable across the graph benchmarks. As elsewhere, the generate_input parameters are unused but required by the shared interface; if Ruff runs on benchmarks, it’s better to ignore ARG001 for this function/file (or this directory) than to change the signature and risk diverging from the orchestrator’s expectations.

benchmarks/600.linearalgebra/602.axpy/input.py (1)

1-7: Input generator logic is fine; consider minimal handling of unused-arg lint.

The size_generators mapping and generate_input return shape are consistent with the other linear-algebra benchmarks and look correct. The unused parameters in generate_input are only there to satisfy the common benchmark interface, but they trigger Ruff ARG001 warnings.

If Ruff is enforced in CI, you could cheaply acknowledge these without changing the public API, for example:

 def generate_input(
     data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
 ):
-    return {"size": size_generators[size], "seed": 42}
+    # Parameters are part of the common benchmark interface but not used here.
+    _ = (data_dir, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func)
+    return {"size": size_generators[size], "seed": 42}
benchmarks/500.scientific/5xx.compute_jax_npbench/input.py (1)

1-17: Consistent JAX size mapping; same note about unused-arg warnings.

The size_generators structure and generate_input behavior are aligned with the other npbench-style inputs and look good.

As with the other generators, Ruff flags the unused parameters in generate_input. If lint is enforced, you can keep the interface and quiet the warnings:

 def generate_input(
@@
-):
-    return {"size": size_generators[size]}
+):
+    # Unused parameters kept to match the common benchmark interface.
+    _ = (data_dir, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func)
+    return {"size": size_generators[size]}
benchmarks/600.linearalgebra/604.cholesky/input.py (1)

1-7: Matches the linear-algebra input pattern; optional lint-silencing for unused args.

This mirrors the other 600.linearalgebra input modules and is functionally fine.

If Ruff ARG001 is a concern, you can acknowledge the interface-only parameters:

 def generate_input(
     data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
 ):
-    return {"size": size_generators[size], "seed": 42}
+    # Unused, required by the shared benchmark interface.
+    _ = (data_dir, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func)
+    return {"size": size_generators[size], "seed": 42}
benchmarks/400.inference/411.image-recognition/input.py (1)

8-15: Minor polish: unused loop variable, unused args, and a small edge-case in generate_input.

  • Line 10: dirs is never used; to satisfy Ruff B007 you can rename it to _dirs (or _) to make intent explicit:
    for root, _dirs, files in os.walk(data_dir):
  • Lines 29–31: size, output_paths, and nosql_func are unused but required by the shared interface. If Ruff is enforced, consider the same _ = (...) pattern used in other generators to quiet ARG001 without changing behavior.
  • Lines 37–48: input_images[0][0] assumes that val_map.txt is non-empty. If there’s any chance of an empty file in some runs, a simple guard would prevent an IndexError, e.g.:
    if not input_images:
        raise ValueError("val_map.txt did not contain any images")

Also applies to: 29-51

benchmarks/600.linearalgebra/603.jacobi2d/input.py (1)

1-7: Linear-algebra input looks good; same optional unused-arg handling as siblings.

This module is consistent with the other 600.linearalgebra inputs (601.matmul, 602.axpy, 604.cholesky) and the return structure is fine.

If you decide to quiet Ruff ARG001 in one of these files, consider applying the same _ = (...) pattern here for the unused parameters to keep the suite consistent.

benchmarks/400.inference/413.image-classification/python/function.py (2)

17-21: Inference preprocessing and label mapping look correct; consider minor robustness tweaks.

Overall, this follows a standard ResNet-50 ImageNet pipeline and looks sound:

  • imagenet_class_index.json is loaded once and idx2label is built as expected.
  • _prepare_tensor does resize→center-crop→normalize→NCHW with ImageNet mean/std; that’s correct for typical ResNet-50.

A few low-priority cleanups you might consider:

  • Line 18: wrap the JSON load in a context manager so the file handle is explicitly closed:
    with open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r") as f:
        class_idx = json.load(f)
  • Lines 92, 95, 102, 103: the outer int() around round(...) is redundant (Ruff RUF046) and can be dropped without changing behavior.
  • _prepare_tensor: if you expect occasional bad image paths, you might want to let the Image.open error bubble up with a clearer message or catch/log it, but for a controlled benchmark dataset the current behavior is acceptable.

Also applies to: 30-31, 109-117


34-85: Secure tar extraction and temp handling — well-established patterns exist; consider applying them.

Yes, well-established patterns do exist for this. Tar path-traversal vulnerabilities are real (documented in CVEs) and Python's tarfile module has known risks with extractall() on untrusted archives.

Recommended hardening approach:

  1. Replace hardcoded /tmp with tempfile.TemporaryDirectory() (context manager for automatic cleanup) or tempfile.mkstemp() for per-file temps. This avoids TOCTOU issues and makes intent clearer.

  2. Validate tar members before extraction instead of calling extractall() directly:

    • Compute resolved target path: resolved = (Path(extract_dir) / member.name).resolve()
    • Verify it stays within extract_dir using Path.is_relative_to() or prefix comparison
    • Reject absolute paths, .. segments, symlinks (member.issym()), device nodes (member.isdev()), and other special members
    • Use tar.extractfile() and write validated content manually rather than extractall()
  3. Example pattern (lines 46–60):

    with tempfile.TemporaryDirectory() as temp_dir:
        archive_path = os.path.join(temp_dir, archive_basename)
        # ... download to archive_path ...
        with tarfile.open(archive_path, "r:gz") as tar:
            for member in tar.getmembers():
                resolved = (Path(MODEL_DIRECTORY) / member.name).resolve()
                if not resolved.is_relative_to(Path(MODEL_DIRECTORY).resolve()):
                    raise ValueError(f"Unsafe member path: {member.name}")
                if member.issym() or member.isdev() or member.islnk():
                    raise ValueError(f"Unsafe member type: {member.name}")
                tar.extract(member, MODEL_DIRECTORY)  # or extractfile + manual write

For a controlled benchmark, the current code is functionally safe, but applying these patterns would make it safer to reuse and demonstrates defense-in-depth.

benchmarks/600.linearalgebra/603.jacobi2d/python/function.py (2)

44-50: Remove unused seed handling code.

The seed handling logic extracts a seed variable (lines 44-50), but it's never used. The initialize_torch function at line 53 doesn't accept a seed parameter, and the seed isn't referenced elsewhere in the handler. This is dead code.

Apply this diff to remove the unused code:

 def handler(event):
 
     size = event.get("size")
-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)
 
     matrix_generating_begin = datetime.datetime.now()

57-57: Prefix unused variables with underscore.

The variables A_out and B_out are unpacked but never used. Apply the underscore prefix to indicate they are intentionally unused.

Apply this diff:

-    A_out, B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
+    _, _, gpu_ms = kernel_jacobi2d(A, B, iters=50)
benchmarks/600.linearalgebra/601.matmul/python/function.py (1)

35-41: Remove unused seed handling code.

The seed handling logic (lines 35-41) extracts a seed variable, but it's never used. The initialize_torch function doesn't accept a seed parameter.

Apply this diff to remove the unused code:

 def handler(event):
 
     size = event.get("size")
-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)
 
     matrix_generating_begin = datetime.datetime.now()
benchmarks/600.linearalgebra/605.lu/python/function.py (2)

53-59: Remove unused seed handling code.

The seed handling logic (lines 53-59) extracts a seed variable, but it's never used since initialize_torch doesn't accept a seed parameter.

Apply this diff to remove the unused code:

 def handler(event):
     size = event.get("size")
-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)
 
     gen_begin = datetime.datetime.now()

66-66: Prefix unused variable with underscore.

The variable B is unpacked but never used. Apply the underscore prefix to indicate it's intentionally unused.

Apply this diff:

-    B, gpu_ms = kernel(A)
+    _, gpu_ms = kernel(A)
benchmarks/600.linearalgebra/604.cholesky/python/function.py (2)

36-42: Remove unused seed handling code.

The seed handling logic (lines 36-42) extracts a seed variable, but it's never used since initialize_torch doesn't accept a seed parameter.

Apply this diff to remove the unused code:

 def handler(event):
     size = event.get("size")
-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)
 
     gen_begin = datetime.datetime.now()

49-49: Prefix unused variable with underscore.

The variable L is unpacked but never used. Apply the underscore prefix to indicate it's intentionally unused.

Apply this diff:

-    L, gpu_ms = kernel_cholesky(A)
+    _, gpu_ms = kernel_cholesky(A)
benchmarks/600.linearalgebra/606.spmv/python/function.py (1)

58-60: Clarify that the SpMV output is intentionally unused

y_out is never used after kernel_spmv returns, which triggers the Ruff warning and may confuse readers.

You can explicitly mark the result as unused:

-    y_out, gpu_ms = kernel_spmv(A, x, reps=100)
+    _, gpu_ms = kernel_spmv(A, x, reps=100)

This keeps behavior identical while silencing the unused-variable warning.

benchmarks/000.microbenchmarks/010.sleep/input.py (1)

1-11: Input generator matches project conventions; unused args are expected

size_generators and generate_input follow the common input.py pattern (fixed sizes, only size used; other parameters kept for a uniform interface). Unused-argument warnings here can be safely ignored since the benchmark framework supplies these arguments but this benchmark only needs size. Based on learnings, inputs are maintainer-controlled, so additional validation isn’t required.

benchmarks/400.inference/413.image-classification/input.py (1)

18-26: Update the outdated docstring to match image‑classification behavior

The docstring still mentions “compression test”, which no longer matches this benchmark’s purpose and can mislead future readers.

Consider updating it along these lines:

-"""
-    Generate test, small and large workload for compression test.
+"""
+    Generate test, small and large workloads for the image classification benchmark.

The rest of the logic (model/image uploads and input_config layout) looks consistent with other inference benchmarks.

benchmarks/400.inference/413.recommendation/python/function.py (1)

54-63: Use MODEL_CACHE consistently for temporary model files

MODEL_CACHE is created but the downloaded checkpoint is written directly under "/tmp" instead of that directory. Using the cache directory for tmp_path makes the intent clearer and keeps related files grouped.

-    os.makedirs(MODEL_CACHE, exist_ok=True)
-    tmp_path = os.path.join("/tmp", f"{uuid.uuid4()}-{MODEL_FILE}")
+    os.makedirs(MODEL_CACHE, exist_ok=True)
+    tmp_path = os.path.join(MODEL_CACHE, f"{uuid.uuid4()}-{MODEL_FILE}")

Behavior (download, load, remove) stays the same, but the cache directory is now actually used.

benchmarks/400.inference/412.language-bert/python/function.py (4)

31-45: Harden model archive handling and temp file usage

Current model download/extraction uses hard-coded /tmp paths and tarfile.extractall directly on an archive that comes from external storage. That opens you up to classic temp-file and tar path‑traversal issues (e.g., a malicious archive writing outside MODEL_DIRECTORY).

Consider:

  • Using tempfile.NamedTemporaryFile / TemporaryDirectory instead of manual /tmp paths.
  • Validating archive members before extraction and refusing entries that would escape MODEL_DIRECTORY.

For example:

-import tarfile
+import tarfile
+import tempfile
...
-            archive_path = os.path.join("/tmp", f"{uuid.uuid4()}-{MODEL_ARCHIVE}")
+            fd, archive_path = tempfile.mkstemp(suffix=f"-{MODEL_ARCHIVE}")
+            os.close(fd)
...
-        with tarfile.open(archive_path, "r:gz") as tar:
-            tar.extractall(MODEL_DIRECTORY)
+        with tarfile.open(archive_path, "r:gz") as tar:
+            base = os.path.realpath(MODEL_DIRECTORY)
+            for member in tar.getmembers():
+                dest = os.path.realpath(os.path.join(MODEL_DIRECTORY, member.name))
+                if not dest.startswith(base + os.sep):
+                    raise RuntimeError(f"Refusing to extract outside {MODEL_DIRECTORY}: {member.name}")
+            tar.extractall(MODEL_DIRECTORY)

72-79: Clarify model timing semantics

model_download_time and model_process_time are measured in microseconds and can legitimately be zero when the model is already initialized. That’s fine, but for consistency with other benchmarks (which generally report milliseconds), consider normalizing units or documenting that these are µs so downstream tooling doesn’t misinterpret values.
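
If milliseconds are preferred for consistency, a small helper is enough (a sketch; delta stands for the timedeltas already computed in the handler):

import datetime


def to_ms(delta: datetime.timedelta) -> float:
    return delta / datetime.timedelta(milliseconds=1)


# e.g. model_download_time = to_ms(download_end - download_begin)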


107-118: Validate required event fields before composing storage paths

bucket, model_prefix, text_prefix, and text_key are assumed present; if any are missing/None, os.path.join or client.download will raise confusing TypeErrors.

Consider an explicit validation up front:

 def handler(event):
-    bucket = event.get("bucket", {}).get("bucket")
-    model_prefix = event.get("bucket", {}).get("model")
-    text_prefix = event.get("bucket", {}).get("text")
-    text_key = event.get("object", {}).get("input")
+    bucket_info = event.get("bucket", {})
+    object_info = event.get("object", {})
+    bucket = bucket_info.get("bucket")
+    model_prefix = bucket_info.get("model")
+    text_prefix = bucket_info.get("text")
+    text_key = object_info.get("input")
+
+    if not all([bucket, model_prefix, text_prefix, text_key]):
+        raise ValueError(f"Missing required bucket/text fields in event: {event}")

This keeps failures deterministic and easier to debug when wiring new benchmarks.


133-144: Use zip(..., strict=True) to catch length mismatches early

Static analysis is right that an explicit strict helps detect subtle bugs if sentences and probabilities ever diverge in length.

-    for sentence, probs in zip(sentences, probabilities):
+    for sentence, probs in zip(sentences, probabilities, strict=True):

That will raise immediately instead of silently dropping or duplicating elements.

benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py (2)

13-36: Convolution logic is correct but could use lax.conv_general_dilated for performance

The hand‑rolled conv2d using nested lax.scan and dynamic_slice/dynamic_update_slice is functionally correct for stride‑1 valid convolution, but it will be significantly slower than using lax.conv_general_dilated on GPUs/TPUs.

If you care about maximizing throughput for this benchmark, consider switching to:

def conv2d(x, w):
    return lax.conv_general_dilated(
        x,
        w,
        window_strides=(1, 1),
        padding="VALID",
        dimension_numbers=("NHWC", "HWIO", "NHWC"),
    )

68-79: Prefer NumPy/JAX dtypes explicitly in RNG initialization

You’re passing dtype=jnp.float32 into numpy.random.default_rng().random. That’s usually convertible, but being explicit avoids surprises:

-    input = rng.random((N, H, W, C1), dtype=jnp.float32)
-    conv1 = rng.random((1, 1, C1, C2), dtype=jnp.float32)
-    conv2 = rng.random((3, 3, C2, C2), dtype=jnp.float32)
-    conv3 = rng.random((1, 1, C2, C1), dtype=jnp.float32)
+    import numpy as np
+    input = rng.random((N, H, W, C1), dtype=np.float32)
+    conv1 = rng.random((1, 1, C1, C2), dtype=np.float32)
+    conv2 = rng.random((3, 3, C2, C2), dtype=np.float32)
+    conv3 = rng.random((1, 1, C2, C1), dtype=np.float32)

JAX will still happily accept these arrays and move them to the device on first use.

benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/function.py (1)

237-279: Catch specific transfer exceptions instead of broad except Exception

The core concern is valid: swallowing all exceptions hides real issues. However, based on jax.device_get behavior, TypeError is not the primary exception you'd encounter. Transfer guard violations raise RuntimeError, and backend/XLA failures surface as runtime errors like XlaRuntimeError. Catching only TypeError would miss these cases. Consider:

-    try:
-        results = jax.device_get(results)
-    except Exception:
-        pass
+    try:
+        results = jax.device_get(results)
+    except (RuntimeError, TypeError):
+        # Transfer failed or __array__ not available; keep device array.
+        pass

Or remove the catch entirely if device/host transfer failure should propagate.

Comment on lines +35 to +37
      - name: Install Python dependencies (via install.py)
        run: |
          python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local

⚠️ Potential issue | 🔴 Critical

Activate virtualenv before installing dependencies.

The python3 install.py command runs without activating the virtualenv first. This causes dependencies to be installed in the system Python instead of the cached virtualenv, making the cache ineffective and causing subsequent steps to fail if they expect tools in the venv.

Activate the virtualenv before running install.py:

       - name: Install Python dependencies (via install.py)
         run: |
+          . python-venv/bin/activate
           python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local

Alternatively, use the Python binary from the venv directly:

       - name: Install Python dependencies (via install.py)
         run: |
-          python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local
+          python-venv/bin/python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      - name: Install Python dependencies (via install.py)
-        run: |
-          python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local
+      - name: Install Python dependencies (via install.py)
+        run: |
+          . python-venv/bin/activate
+          python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local

Suggested change

-      - name: Install Python dependencies (via install.py)
-        run: |
-          python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local
+      - name: Install Python dependencies (via install.py)
+        run: |
+          python-venv/bin/python3 install.py --no-aws --no-azure --no-gcp --no-openwhisk --no-local
🤖 Prompt for AI Agents
In .github/workflows/lint.yml around lines 35-37, the workflow runs python3
install.py without activating the cached virtualenv so packages are installed
into the system Python; before running install.py either activate the virtualenv
created/checked-out by the job (e.g. source the venv activation script for
Linux/macOS or the Scripts\Activate script on Windows) and then run install.py,
or invoke the Python binary from the virtualenv directly (e.g. use the venv's
python executable to run install.py) so dependencies are installed into and
cached with the venv.

Comment on lines +19 to +20
    data_dir, size, benchmarks_bucket, input_buckets, output_buckets, upload_func, nosql_func
):

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify: Breaking function signature change across call sites.

The function signature changed parameter names from input_paths, output_paths to input_buckets, output_buckets. This breaks any code calling this function with positional arguments or keyword arguments using the old names. Existing call sites throughout the codebase must be updated to match.

Additionally, this signature diverges from the standard pattern used in other benchmarks (e.g., 110.dynamic-html, 030.clock-synchronization, 411.image-recognition), which use input_paths and output_paths. Confirm this is intentional and not an incomplete refactor.


🏁 Script executed:

#!/bin/bash
# Description: Find and verify all call sites to generate_input in the uploader benchmark module

# Search for calls to generate_input that might still use old parameter names
rg -n 'generate_input' --type py -- benchmarks/100.webapps/120.uploader/ -A 3 -B 3

# Search for imports or references to this module's generate_input
rg -n 'from.*120.uploader.*import\|import.*120.uploader' --type py

Length of output: 237


🏁 Script executed:

#!/bin/bash
# Corrected: Find all call sites to generate_input with proper rg syntax

# Search for all calls to generate_input in the entire codebase
rg -n 'generate_input\s*\(' --type py -A 5 -B 2

# Search for imports from 120.uploader module
rg -n 'from.*120\.uploader|import.*120\.uploader' --type py

Length of output: 18273


🏁 Script executed:

#!/bin/bash
# Check if 130.crud-api is part of the same change
git log -1 --oneline -- benchmarks/100.webapps/130.crud-api/input.py

# Check whether abstract method signature matches the two changed implementations
grep -A 7 "def generate_input" sebs/benchmark.py | head -20

Length of output: 335


Critical: Abstract method signature mismatch with implementation.

The generate_input function signature in benchmarks/100.webapps/120.uploader/input.py:18-20 uses parameter names input_buckets, output_buckets, but the abstract method definition at sebs/benchmark.py:804-809 still declares input_paths, output_paths. This same inconsistency also exists in benchmarks/100.webapps/130.crud-api/input.py:8-10, while ~30 other benchmark implementations use the standard input_paths, output_paths signature.

Required fix: Update the abstract method signature in sebs/benchmark.py:804-809 to use input_buckets, output_buckets to match these implementations, or revert both 120.uploader and 130.crud-api to use the standard parameter names. The interface contract must be consistent across all implementations.

🧰 Tools
🪛 Ruff (0.14.5)

19-19: Unused function argument: data_dir

(ARG001)


19-19: Unused function argument: input_buckets

(ARG001)


19-19: Unused function argument: upload_func

(ARG001)


19-19: Unused function argument: nosql_func

(ARG001)

Comment on lines +29 to 31
class_idx = json.load(open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r"))
idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]
model = None

⚠️ Potential issue | 🔴 Critical

Fix Tensor‑index bug when mapping prediction to label (and tighten JSON loading)

torch.max returns a Tensor index; using it directly to index a Python list (idx2label[index]) will raise a TypeError. Convert the index to a plain integer first. While here, it’s also cleaner to load imagenet_class_index.json via a context manager.

-SCRIPT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))
-class_idx = json.load(open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r"))
+SCRIPT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))
+with open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r") as f:
+    class_idx = json.load(f)
@@
-    input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model
-    output = model(input_batch)
-    _, index = torch.max(output, 1)
-    ret = idx2label[index]
+    input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model
+    output = model(input_batch)
+    _, index = torch.max(output, 1)
+    idx = index.item()
+    ret = idx2label[idx]

This removes the runtime error and keeps behavior otherwise unchanged.

Also applies to: 78-79

🤖 Prompt for AI Agents
In benchmarks/400.inference/411.image-recognition/python/function.py around
lines 29-31 (and similarly lines 78-79) the code loads imagenet_class_index.json
without a context manager and later uses a torch tensor index directly to index
a Python list; open the JSON with a with-statement (using open(..., "r") as f
and json.load(f)) and when mapping a prediction to a label convert the tensor
index to an int (e.g., index = int(index_tensor) or index_tensor.item()) before
using idx2label[index]; keep all other logic unchanged.

Comment on lines +3 to +4
torch==2.5.1
torchvision==0.20.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify torch 2.5.1 is the latest secure version.

torch 2.5.1 is newer than the vulnerable 2.4.1 used elsewhere in this PR, which is good. However, please verify:

  • Whether 2.5.1 is the latest release or a newer patch release is available
  • Whether 2.5.1 addresses all the security vulnerabilities found in 2.4.1
  • Whether torchvision 0.20.1 is compatible and secure

Run the following script to verify:


🏁 Script executed:

#!/bin/bash
# Check latest torch and torchvision versions

echo "=== Latest torch version ==="
curl -s https://pypi.org/pypi/torch/json | jq '.info.version'

echo "=== Latest torchvision version ==="
curl -s https://pypi.org/pypi/torchvision/json | jq '.info.version'

echo "=== Security advisories for torch 2.5.x ==="
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "torch") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}' | jq '.data.securityVulnerabilities.nodes[] | select(.vulnerableVersionRange | contains("2.5"))'

Length of output: 124


I need to gather additional information to complete the verification. Let me check security advisories for torch 2.4.1 vs 2.5.1 and version compatibility.

Upgrade torch and torchvision to versions with critical security patch.

torch 2.5.1 contains a critical Remote Code Execution vulnerability (CVE-2025-32434) in the torch.load() function when used with weights_only=True. Update to PyTorch 2.6.0 or higher, and correspondingly upgrade torchvision to 0.21.0 or later. torch 2.5.1 is compatible with torchvision 0.20.1, so current compatibility is not the issue, but the security vulnerability is critical and requires immediate patching.
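
Concretely, the pairing discussed above would look like this in the requirements file (a sketch; confirm the exact torchvision release matching the chosen torch build before committing):

torch>=2.6.0
torchvision>=0.21.0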

@@ -0,0 +1,4 @@
numpy>=2.0
pillow>=10.0
torch==2.5.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify torch 2.5.1 is the latest secure version and standardize across files.

This file uses torch==2.5.1, which differs from torch==2.4.1 used in other requirements files in this PR. Ensure that 2.5.1 is the latest patched version and update all other requirements files to match this version for consistency.


🌐 Web query:

What is the latest stable version of PyTorch and does version 2.5.1 have any known security vulnerabilities?

💡 Result:

  • Latest stable PyTorch (as of Nov 23, 2025): PyTorch 2.9 (released Oct 2025). [1]

  • Does PyTorch 2.5.1 have known security vulnerabilities? Yes — PyTorch ≤ 2.5.1 is affected by a critical RCE vulnerability (CVE-2025-32434) in torch.load(...) when using weights_only=True; the issue was fixed in PyTorch 2.6.0. Update to ≥2.6.0 (or the current stable release) to mitigate. [2][3]

Sources:

  1. PyTorch 2.9 release announcement. [PyTorch blog]. [1]
  2. NVD / CVE-2025-32434 details (lists affected versions and fix in 2.6.0). [2]
  3. Additional vendor advisories summarizing the issue (e.g., INCIBE/IBM). [3]


Update torch to 2.6.0 or later immediately—torch 2.5.1 contains a critical RCE vulnerability.

PyTorch 2.5.1 is affected by a critical RCE vulnerability (CVE-2025-32434) in torch.load() when using weights_only=True, which was fixed in PyTorch 2.6.0. The latest stable version is PyTorch 2.9 (released Oct 2025). Update this file and all other requirements files using torch 2.4.1 to a secure version (≥2.6.0, preferably 2.9).

🤖 Prompt for AI Agents
In
benchmarks/400.inference/413.image-classification/python/requirements.txt.3.12
around line 3, the file pins torch to 2.5.1 which is vulnerable
(CVE-2025-32434); update the pin to a secure release (at minimum >=2.6.0,
preferably the latest stable such as 2.9.x) across this file and any other
requirements that reference torch 2.4.x/2.5.x, run dependency resolution and
tests to ensure compatibility, and if needed adjust any torch-specific code or
extra pins (CUDA builds, torchvision) to match the chosen secure torch version.

Comment on lines +77 to +110
@partial(jax.jit, static_argnums=(0,))
def pressure_poisson_periodic(nit, p, dx, dy, b):
    def body_func(p, q):
        pn = p.copy()
        p = p.at[1:-1, 1:-1].set(
            ((pn[1:-1, 2:] + pn[1:-1, 0:-2]) * dy**2 + (pn[2:, 1:-1] + pn[0:-2, 1:-1]) * dx**2)
            / (2 * (dx**2 + dy**2))
            - dx**2 * dy**2 / (2 * (dx**2 + dy**2)) * b[1:-1, 1:-1]
        )

        # Periodic BC Pressure @ x = 2
        p = p.at[1:-1, -1].set(
            ((pn[1:-1, 0] + pn[1:-1, -2]) * dy**2 + (pn[2:, -1] + pn[0:-2, -1]) * dx**2)
            / (2 * (dx**2 + dy**2))
            - dx**2 * dy**2 / (2 * (dx**2 + dy**2)) * b[1:-1, -1]
        )

        # Periodic BC Pressure @ x = 0
        p = p.at[1:-1, 0].set(
            (
                ((pn[1:-1, 1] + pn[1:-1, -1]) * dy**2 + (pn[2:, 0] + pn[0:-2, 0]) * dx**2)
                / (2 * (dx**2 + dy**2))
                - dx**2 * dy**2 / (2 * (dx**2 + dy**2)) * b[1:-1, 0]
            )
        )

        # Wall boundary conditions, pressure
        p = p.at[-1, :].set(p[-2, :])  # dp/dy = 0 at y = 2
        p = p.at[0, :].set(p[1, :])  # dp/dy = 0 at y = 0

        return p, None

    p, _ = lax.scan(body_func, p, jnp.arange(nit))


⚠️ Potential issue | 🔴 Critical

pressure_poisson_periodic never returns the updated pressure field

pressure_poisson_periodic runs a lax.scan over nit iterations and computes an updated p, but the function doesn’t return it. As written, it returns None, and channel_flow ignores the result:

b = build_up_b(...)
pressure_poisson_periodic(nit, p, dx, dy, b)  # result unused
# p is not mutated in-place; JAX arrays are immutable

So p in channel_flow is never updated by the Poisson solve, which is a correctness bug.

You likely want:

 @partial(jax.jit, static_argnums=(0,))
 def pressure_poisson_periodic(nit, p, dx, dy, b):
-    def body_func(p, q):
+    def body_func(p, _):
         pn = p.copy()
         ...
-        return p, None
+        return p, None
@@
-    p, _ = lax.scan(body_func, p, jnp.arange(nit))
-
+    p, _ = lax.scan(body_func, p, jnp.arange(nit))
+    return p

And in channel_flow:

-        b = build_up_b(rho, dt, dx, dy, u, v)
-        pressure_poisson_periodic(nit, p, dx, dy, b)
+        b = build_up_b(rho, dt, dx, dy, u, v)
+        p = pressure_poisson_periodic(nit, p, dx, dy, b)

This makes the pressure updates actually feed into the momentum equations.

🧰 Tools
🪛 Ruff (0.14.5)

79-79: Unused function argument: q

(ARG001)

🤖 Prompt for AI Agents
In benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/function.py
around lines 77-110, pressure_poisson_periodic runs lax.scan to compute the
updated pressure but never returns it, so callers keep the old p; change the
function to return the final p from the scan (e.g. capture the output p and
return it), and update channel_flow to assign its p to the returned value (p =
pressure_poisson_periodic(nit, p, dx, dy, b)) so the updated pressure feeds into
the momentum solve.

Comment on lines +32 to +59
def handler(event):
    size = event.get("size")
    if "seed" in event:
        import random

        random.seed(event["seed"])

        seed = event.get("seed", 42)
        seed = int(seed)

    gen_begin = datetime.datetime.now()
    alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)
    gen_end = datetime.datetime.now()

    comp_begin = datetime.datetime.now()
    y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
    comp_end = datetime.datetime.now()

    gen_us = (gen_end - gen_begin) / datetime.timedelta(microseconds=1)
    comp_us = (comp_end - comp_begin) / datetime.timedelta(microseconds=1)

    return {
        "measurement": {
            "generating_time": gen_us,
            "compute_time": comp_us,
            "gpu_time": gpu_ms,
        }
    }

⚠️ Potential issue | 🔴 Critical

Fix undefined variable when seed is not provided.

Line 43 references seed, but this variable is only defined inside the if "seed" in event block (lines 34-40). If the event does not contain a "seed" key, seed will be undefined, causing a NameError at runtime.

Apply this diff to fix the issue:

 def handler(event):
     size = event.get("size")
+    seed = 42  # default seed
     if "seed" in event:
         import random
 
         random.seed(event["seed"])
 
-        seed = event.get("seed", 42)
-        seed = int(seed)
+        seed = int(event["seed"])
 
     gen_begin = datetime.datetime.now()
     alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)

Note: The random.seed() call on line 37 appears redundant since PyTorch's RNG is seeded within initialize_torch.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def handler(event):
     size = event.get("size")
+    seed = 42  # default seed
     if "seed" in event:
         import random

         random.seed(event["seed"])

-        seed = event.get("seed", 42)
-        seed = int(seed)
+        seed = int(event["seed"])

     gen_begin = datetime.datetime.now()
     alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)
     gen_end = datetime.datetime.now()

     comp_begin = datetime.datetime.now()
     y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
     comp_end = datetime.datetime.now()

     gen_us = (gen_end - gen_begin) / datetime.timedelta(microseconds=1)
     comp_us = (comp_end - comp_begin) / datetime.timedelta(microseconds=1)

     return {
         "measurement": {
             "generating_time": gen_us,
             "compute_time": comp_us,
             "gpu_time": gpu_ms,
         }
     }
🧰 Tools
🪛 Ruff (0.14.5)

47-47: Unpacked variable y_out is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)
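For context on where y_out and gpu_ms come from, a hedged sketch of what kernel_axpy is assumed to do (its body is not part of this hunk; the CUDA-event timing shown is an assumption):

import torch


def kernel_axpy(alpha, x, y, reps=100):
    # time y <- alpha * x + y on the GPU with CUDA events, summed over `reps` runs
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(reps):
        y = alpha * x + y
    end.record()
    torch.cuda.synchronize()
    return y, start.elapsed_time(end)  # elapsed_time is reported in milliseconds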

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/602.axpy/python/function.py around lines 32 to
59, the variable `seed` can be undefined when the incoming event lacks a "seed"
key; to fix, initialize `seed` unconditionally at the top (e.g., seed =
int(event.get("seed", 42))) before any conditional, remove the redundant
import/random.seed usage inside the if-block (or only keep the import if needed
elsewhere), and ensure all references use this initialized `seed` so NameError
cannot occur when "seed" is omitted.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Critical security vulnerabilities in torch 2.4.1.

Static analysis has flagged 5 critical security vulnerabilities in torch 2.4.1, including remote code execution risks (GHSA-53q9-r3pm-6pq6), denial of service (GHSA-3749-ghw9-m3mg), and resource management issues (GHSA-887c-mr87-cxwp).

Consider upgrading to torch 2.5.1 or later, which appears to be used elsewhere in this PR (see benchmarks/400.inference/413.image-classification/python/requirements.txt.3.12). If torch 2.4.1 is required for specific compatibility reasons, document the rationale and ensure the vulnerable functionality is not exposed in the benchmark.


🌐 Web query:

What are the security vulnerabilities in PyTorch 2.4.1 and what is the latest patched version?

💡 Result:

Short answer

  • Known security issues affecting PyTorch 2.4.1:

    1. CVE-2024-48063 — deserialization (RemoteModule / torch.distributed.rpc) can lead to remote code execution (disputed as “intended” for distributed RPC, but tracked as a security issue). [1][2][3]
    2. Earlier heap buffer-overflow bug (CVE-2024-31580) affected versions before v2.2.0 and is not present in 2.4.1 (it was fixed in earlier 2.x releases). [4]
  • More recent related RCE (reported Apr 2025) — GHSA/CVE-2025-32434: torch.load(..., weights_only=True) RCE — affects <=2.5.1 and was patched in 2.6.0. That issue therefore also implies 2.4.1 is vulnerable to that finding. [5]

What to upgrade to

  • Minimum patched release for the GHSA/CVE-2025-32434 fix: 2.6.0. [5]
  • PyTorch’s current “stable” release (as shown on the PyTorch site) is 2.7.0 — upgrade to the latest stable (or at least ≥2.6.0) to obtain the patches and additional fixes. [6]

Sources

  • NVD / CVE-2024-48063 (deserialization RCE, affects <=2.4.1). [1]
  • Multiple vulnerability trackers / advisories summarizing CVE-2024-48063 (Wiz, Snyk, Rapid7, ResolvedSecurity). [2][3][7]
  • CVE-2024-31580 (heap buffer overflow — fixed before 2.4.1). [4]
  • PyTorch GitHub security advisory GHSA-53q9-r3pm-6pq6 / CVE-2025-32434 (torch.load weights_only=True RCE) — patched in 2.6.0. [5]
  • PyTorch official site (current stable release listing). [6]

If you want, I can:

  • give the exact pip/conda upgrade command for your environment, or
  • fetch the specific upstream commits/PRs that backport each fix.

I need to verify the specific CVEs mentioned in the review and check if torch 2.5.1 is actually used elsewhere in the codebase.

Based on the search results, I can now identify critical inaccuracies in the original review comment:

Key Findings:

  1. Incorrect CVE severity and count: The review claims "5 critical security vulnerabilities" but GHSA-3749-ghw9-m3mg and GHSA-887c-mr87-cxwp are both classified with LOW severity (CVSS 3.3), not critical.

  2. Wrong affected versions: GHSA-3749-ghw9-m3mg (CVE-2025-2953) is described as found in PyTorch 2.6.0+cu124 and GHSA-887c-mr87-cxwp (CVE-2025-3730) is found in PyTorch 2.6.0, suggesting they may not affect 2.4.1 at all.

  3. Critical security issue acknowledged: CVE-2024-48063 (the RemoteModule deserialization RCE) affects PyTorch ≤2.4.1 with CVSS 9.8 Critical. However, this is disputed by multiple parties because this is intended behavior in PyTorch distributed computing.

  4. Incorrect upgrade recommendation: CVE-2025-32434 (torch.load RCE) affects PyTorch versions 2.5.1 and earlier, with users advised to update to version 2.6.0 or later. Upgrading to 2.5.1 as suggested in the review would not resolve this vulnerability.


Upgrade to torch 2.6.0 or later to address the confirmed RCE vulnerabilities affecting 2.4.1 and 2.5.1. Note that CVE-2024-48063 (the primary RCE in 2.4.1) is disputed as intended behavior for distributed RPC—verify whether RemoteModule deserialization is actually used in this benchmark.
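A minimal sketch of that verification step (the search root and API names are illustrative; adjust them to the repository layout):

import pathlib

# APIs tied to the advisories discussed above
RISKY = ("torch.load", "RemoteModule", "torch.distributed.rpc")

for path in pathlib.Path("benchmarks").rglob("*.py"):
    text = path.read_text(errors="ignore")
    hits = [name for name in RISKY if name in text]
    if hits:
        print(f"{path}: {', '.join(hits)}")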

🧰 Tools
🪛 OSV Scanner (2.2.4)

[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2024-259)


[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2025-41)


[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service

(GHSA-3749-ghw9-m3mg)


[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution

(GHSA-53q9-r3pm-6pq6)


[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability

(GHSA-887c-mr87-cxwp)

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/604.cholesky/python/requirements.txt lines 1-1:
the review's upgrade advice and CVE counts are incorrect and the project may
still be vulnerable; verify CVE applicability and then either upgrade or
document mitigations. Search the repo for any occurrences of "torch==2.5.1" or
other pinned torch versions and for uses of torch.load, RemoteModule, RPC or
RemoteModule deserialization in this benchmark and related code; if any code
uses remote deserialization or torch.load on untrusted data, change the
requirement to torch>=2.6.0 (or unpinned >=2.6.0) to include fixes, run tests to
confirm compatibility, and add a comment in requirements or README explaining
the CVE mapping and why the chosen version was selected if you choose not to
upgrade.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

Critical security vulnerabilities in torch 2.4.1.

This file uses torch==2.4.1, which has the same critical security vulnerabilities flagged in other requirements files in this PR. Upgrade to a patched release (at least 2.6.0 for the RCE advisories, 2.8.0 or later to cover all flagged advisories) and keep the torch pin consistent across the requirements files in this PR.

🧰 Tools
🪛 OSV Scanner (2.2.4)

[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2024-259)


[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2025-41)


[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service

(GHSA-3749-ghw9-m3mg)


[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution

(GHSA-53q9-r3pm-6pq6)


[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability

(GHSA-887c-mr87-cxwp)

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/605.lu/python/requirements.txt around lines 1 to
1, the file pins torch==2.4.1 which contains critical security vulnerabilities;
update the requirement to a patched release (at least torch==2.6.0 for the RCE
advisories, or 2.8.0+ to cover all flagged advisories), keep the pin consistent
across the requirements files in this PR, and run the test suite/CI to verify no
compatibility regressions.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Critical security vulnerabilities in torch 2.4.1.

The static analysis tool has flagged five critical security vulnerabilities in torch 2.4.1, including remote code execution via torch.load and denial of service issues. These are legitimate security concerns that should be addressed before release.

Please verify the latest patched version of PyTorch and update this dependency:


🌐 Web query:

What is the latest stable version of PyTorch and are there security patches for torch 2.4.1 vulnerabilities GHSA-53q9-r3pm-6pq6, GHSA-3749-ghw9-m3mg, GHSA-887c-mr87-cxwp, PYSEC-2024-259, and PYSEC-2025-41?

💡 Result:

Short answer

  • Latest stable PyTorch (as of Nov 23, 2025): PyTorch 2.9.0. [1]

Status of the listed vulnerabilities (impacting torch 2.4.1)

  • GHSA-53q9-r3pm-6pq6 / PYSEC-2025-41 (CVE-2025-32434): RCE via torch.load(..., weights_only=True). Patched in 2.6.0 (i.e., fixed in releases after 2.4.1). [2][3]
  • PYSEC-2024-259: Deserialization RCE in RemoteModule affecting torch ≤ 2.4.1. Reported fixed in later releases (fixed in <2.5.0 / addressed in 2.5.x+). [4]
  • GHSA-3749-ghw9-m3mg (CVE-2025-2953): local DoS in torch.mkldnn_max_pool2d. Patched in the 2.7.1 release candidate / fixed in 2.7.1+. [5]
  • GHSA-887c-mr87-cxwp (CVE-2025-3730): improper resource shutdown (ctc_loss) — fixed in 2.8.0 (patched in the upstream commits and released in 2.8+). [6]

Recommendation

  • Upgrade to the current stable release (2.9.0) or at minimum to the first fixed release that addresses the specific advisory relevant to you (2.6.0+ for the RCE advisories; 2.7.1+ and 2.8.0+ for the DoS/resource-release issues). Verify your deployment’s exact wheel/CUDA variant and check PyTorch release notes/SECURITY.md for any platform-specific patching instructions. [1][2][5][6]

Sources
[1] PyTorch 2.9 release blog (Oct 15, 2025). (pytorch.org)
[2] OSV / PYSEC-2025-41 / GHSA-53q9-r3pm-6pq6 (shows fixed version 2.6.0). (osv.dev)
[3] Debian / security tracker for CVE-2025-32434 (notes fix in 2.6.0). (security-tracker.debian.org)
[4] PYSEC-2024-259 advisory / resolved-security summary (affects <=2.4.1; fixed in later releases). (resolvedsecurity.com)
[5] GitHub Advisory GHSA-3749-ghw9-m3mg (CVE-2025-2953) — patched in 2.7.1-rc1 / fixed in 2.7.1+. (github.com)
[6] GHSA-887c-mr87-cxwp / CVE-2025-3730 (ctc_loss) — fixed in 2.8.0 per advisory/patch. (vulnerability.circl.lu)

If you want, I can (a) give the exact pip install commands to upgrade to 2.9.0 for your OS/CUDA, or (b) scan a requirements.txt to show whether your environment includes any of the vulnerable versions.


Update torch from 2.4.1 to address five critical security vulnerabilities.

The five flagged vulnerabilities are confirmed as critical and affecting torch 2.4.1: GHSA-53q9-r3pm-6pq6 / PYSEC-2025-41 (RCE via torch.load; fixed in 2.6.0), PYSEC-2024-259 (deserialization RCE; fixed in 2.5.x+), GHSA-3749-ghw9-m3mg (local DoS; fixed in 2.7.1+), and GHSA-887c-mr87-cxwp (resource shutdown issue; fixed in 2.8.0+).

Update benchmarks/600.linearalgebra/606.spmv/python/requirements.txt to PyTorch 2.9.0 (the latest stable version) or at minimum 2.8.0 to address all five vulnerabilities in a single upgrade.
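If the pin is bumped, a hedged guard like the following can fail fast when an older wheel sneaks back in (the 2.8.0 floor mirrors the analysis above, and the packaging dependency is an assumption):

from importlib.metadata import version

from packaging.version import Version

# refuse to run the benchmark against a torch build older than the first
# release assumed to carry fixes for all five flagged advisories (2.8.0)
if Version(version("torch")) < Version("2.8.0"):
    raise RuntimeError("torch >= 2.8.0 required; see the security advisories above")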

🧰 Tools
🪛 OSV Scanner (2.2.4)

[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2024-259)


[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2025-41)


[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service

(GHSA-3749-ghw9-m3mg)


[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution

(GHSA-53q9-r3pm-6pq6)


[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability

(GHSA-887c-mr87-cxwp)

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/606.spmv/python/requirements.txt lines 1-1, the
pinned torch version 2.4.1 contains multiple critical security vulnerabilities;
update the torch entry to a secure release (preferably torch==2.9.0, or at
minimum torch==2.8.0) to obtain the fixes for the listed CVEs, then run
pip-compile / pip install -r requirements.txt or update your lockfile and CI
dependency checks accordingly to ensure the new version is used in builds.
