
Add parallel testing to unit test script #2531

Merged
kahyunnam merged 6 commits into flashinfer-ai:main from dierksen:parallel-testing
Feb 12, 2026
Conversation

@dierksen
Contributor

@dierksen dierksen commented Feb 9, 2026

  • Enable parallel testing on systems with multiple GPUs
  • Remove default for CUDA_VISIBLE_DEVICES

📌 Description

To make better use of available GPUs, instead of running all unit tests sequentially, this breaks
them up into chunks based on the number of available GPUs. This should help reduce issues with
timeouts on internal CI/CD and give us faster results from ci-bot as well.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • [x] I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • [x] I have installed the hooks with pre-commit install.
  • [x] I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features

    • GPU-aware parallel test execution enabled by default with automatic GPU detection and graceful fallback to single-GPU mode.
  • Tests

    • Added a dedicated GPU-detection test covering multiple visibility and fallback scenarios.
    • Improved test harness to run tests concurrently across GPUs, assign/reclaim GPUs dynamically, report per-test outcomes, and summarize pass/fail totals.

@gemini-code-assist
Contributor

Summary of Changes

Hello @dierksen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the efficiency of unit test execution, particularly on systems equipped with multiple GPUs. By enabling parallel testing, it aims to reduce CI/CD pipeline timeouts and accelerate feedback from automated testing, leading to a more streamlined development workflow.

Highlights

  • Parallel Test Execution: Implemented a new mechanism to run unit tests in parallel across multiple GPUs, significantly reducing execution time on multi-GPU systems.
  • GPU Detection Logic: Introduced a detect_gpus function that intelligently determines available GPUs based on CUDA_VISIBLE_DEVICES or nvidia-smi output.
  • Default CUDA_VISIBLE_DEVICES Removal: Removed the default setting for CUDA_VISIBLE_DEVICES in test_utils.sh, allowing the new detection logic to take precedence.
  • Parallel Tests Toggle: Added a PARALLEL_TESTS environment variable, defaulting to true, to enable or disable parallel test execution.
  • GPU Detection Test Script: Added a dedicated script (test_gpu_detection.sh) to verify the new GPU detection logic under various scenarios.
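The detection order summarized above (CUDA_VISIBLE_DEVICES first, then nvidia-smi, then a single-GPU fallback) can be sketched as follows. This is a hypothetical reconstruction from the summary, not the actual scripts/test_utils.sh code:

```shell
# Hypothetical sketch of the detection order: CUDA_VISIBLE_DEVICES takes
# precedence, then nvidia-smi, then a graceful single-GPU fallback.
detect_gpus() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && [ "${CUDA_VISIBLE_DEVICES:-}" != "-1" ]; then
        # Normalize "0,1,2,3" (or "0 1 2 3") to a space-separated list
        echo "${CUDA_VISIBLE_DEVICES}" | tr ',' ' '
        return
    fi
    if command -v nvidia-smi >/dev/null 2>&1; then
        local gpus
        gpus=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null | tr '\n' ' ' | sed 's/ *$//')
        if [ -n "$gpus" ]; then
            echo "$gpus"
            return
        fi
    fi
    echo "0"  # graceful fallback to single-GPU mode
}
```

With CUDA_VISIBLE_DEVICES="0,1,2,3" set, this sketch would report the GPU list "0 1 2 3".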


Changelog
  • scripts/test_gpu_detection.sh
    • Added a new shell script to test the detect_gpus function under different CUDA_VISIBLE_DEVICES configurations and nvidia-smi availability.
  • scripts/test_utils.sh
    • Removed the default assignment for CUDA_VISIBLE_DEVICES.
    • Added PARALLEL_TESTS environment variable with a default value of true.
    • Introduced the detect_gpus function to identify available GPUs.
    • Implemented the run_tests_parallel function to orchestrate parallel test execution across detected GPUs.
    • Modified the execute_tests function to conditionally invoke parallel or sequential test execution based on the PARALLEL_TESTS flag.

@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Walkthrough

Adds GPU detection and parallel test orchestration utilities, a new GPU-detection test script, and enables parallel test execution in the unit test task script.

Changes

Cohort / File(s) Summary
GPU Detection Test Script
scripts/test_gpu_detection.sh
New shell test validating GPU detection across five scenarios: comma/space CUDA_VISIBLE_DEVICES, single-GPU, PARALLEL_TESTS=false behavior, unset-device handling with optional nvidia-smi, and summary/pass-fail reporting.
Parallel Test Execution Framework
scripts/test_utils.sh
Added detect_gpus(), run_tests_parallel(), and run_single_test_background(); execute_tests() now delegates to parallel flow when PARALLEL_TESTS=true; includes GPU assignment, dynamic reclamation, per-test background execution, and result aggregation (logging/JUnit preserved).
Unit Test Task Invocation
scripts/task_run_unit_tests.sh
Exports PARALLEL_TESTS=true, enabling the parallel test execution path by default for unit test runs.

Sequence Diagram

sequenceDiagram
    participant Executor as Test Executor
    participant Detector as GPU Detector
    participant Orchestrator as Parallel Orchestrator
    participant Worker as Background Test Worker
    participant Aggregator as Result Aggregator

    Executor->>Detector: detect_gpus()
    Detector-->>Executor: gpu_list

    Executor->>Orchestrator: run_tests_parallel(tests, gpu_list)

    par parallel test runs
        Orchestrator->>Worker: run_single_test_background(test1, gpu=0)
        Worker-->>Orchestrator: exit code / log
    and
        Orchestrator->>Worker: run_single_test_background(test2, gpu=1)
        Worker-->>Orchestrator: exit code / log
    and
        Orchestrator->>Worker: run_single_test_background(testN, gpu=k)
        Worker-->>Orchestrator: exit code / log
    end

    Orchestrator->>Aggregator: collect per-test results & JUnit XML
    Aggregator-->>Executor: aggregated report

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • yzh119
  • nvmbreughe
  • kahyunnam
  • jimmyzho
  • yongwww

Poem

🐰 I sniff the GPUs from zero to three,
I spin up tests and set them free,
Logs in my burrow, results hop back,
I tally passes on the rabbit-track,
A happy hopper, reporting with glee.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
  • Description check (❓ Inconclusive): The PR description provides context for why parallel testing is needed and mentions key changes, but the Tests checklist is incomplete. Resolution: clarify test coverage by either adding and verifying tests for the new parallel testing logic, or explaining why existing tests cover the changes.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately summarizes the main change: enabling parallel testing in the unit test script.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which meets the required 80.00% threshold.

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
scripts/test_utils.sh (3)

454-527: Nested function run_single_test_background leaks into global scope.

In bash, functions defined inside other functions become globally visible. After run_tests_parallel runs, run_single_test_background remains callable from anywhere, and it captures variables like total_files, mode, and PARALLEL_TMP_DIR by dynamic scope—meaning it will silently use stale or unset values if invoked outside its intended context.

This is unlikely to cause issues in practice given the script's usage, but worth being aware of.
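The behavior described here can be demonstrated standalone; the function names below are illustrative, not from the script:

```shell
# Demonstrates that a bash function defined inside another function
# becomes globally visible once the outer function has run.
outer() {
    inner() { echo "inner called"; }
    inner
}

if type inner >/dev/null 2>&1; then
    echo "before: inner visible"
else
    echo "before: inner not visible"
fi

outer   # defines and calls inner

if type inner >/dev/null 2>&1; then
    echo "after: inner visible"   # inner has leaked into global scope
else
    echo "after: inner not visible"
fi
```

Before outer runs, inner is undefined; afterwards it is callable from anywhere in the script, which is exactly the leak the comment describes.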


610-629: Duplicate sanity-mode result parsing in PASSED and FAILED branches.

Lines 612–617 and 623–628 contain identical logic for extracting and accumulating total_in_file / sampled_in_file. Consider extracting the common parsing above the if/elif so it runs once for any non-SKIPPED result.

Sketch
             TOTAL_TESTS=$((TOTAL_TESTS + 1))
 
+            # Parse sanity-mode counts (common to PASSED and FAILED)
+            if [ "$mode" = "sanity" ] && [[ "$result" == PASSED:* || "$result" == FAILED:* ]]; then
+                local total_in_file sampled_in_file
+                IFS=':' read -r _ total_in_file sampled_in_file <<< "$result"
+                TOTAL_TEST_CASES=$((TOTAL_TEST_CASES + total_in_file))
+                SAMPLED_TEST_CASES=$((SAMPLED_TEST_CASES + sampled_in_file))
+            fi
+
             if [[ "$result" == PASSED* ]]; then
                 PASSED_TESTS=$((PASSED_TESTS + 1))
-                if [ "$mode" = "sanity" ]; then
-                    local total_in_file sampled_in_file
-                    IFS=':' read -r _ total_in_file sampled_in_file <<< "$result"
-                    TOTAL_TEST_CASES=$((TOTAL_TEST_CASES + total_in_file))
-                    SAMPLED_TEST_CASES=$((SAMPLED_TEST_CASES + sampled_in_file))
-                fi
             elif [[ "$result" == FAILED* ]]; then
                 FAILED_TESTS="$FAILED_TESTS\n  - $test_file"
                 EXIT_CODE=1
-                if [ "$mode" = "sanity" ]; then
-                    local total_in_file sampled_in_file
-                    IFS=':' read -r _ total_in_file sampled_in_file <<< "$result"
-                    TOTAL_TEST_CASES=$((TOTAL_TEST_CASES + total_in_file))
-                    SAMPLED_TEST_CASES=$((SAMPLED_TEST_CASES + sampled_in_file))
-                fi

434-435: Trap preservation can break if the existing EXIT trap body contains single quotes.

The sed extracts the inner body from trap -p EXIT, then splices it back via string concatenation into a new single-quoted trap string. If the original trap contained single quotes (which trap -p escapes as '\''), the re-wrapping produces malformed quoting.

A more robust alternative uses eval:

Suggested alternative
-    PREV_EXIT_TRAP=$(trap -p EXIT | sed -E "s/^trap -- '(.*)' EXIT$/\1/")
-    trap 'rm -rf "$PARALLEL_TMP_DIR"; '"$PREV_EXIT_TRAP" EXIT
+    PREV_EXIT_TRAP=$(trap -p EXIT)
+    # shellcheck disable=SC2064
+    trap "rm -rf '$PARALLEL_TMP_DIR'; ${PREV_EXIT_TRAP:+eval \"\$PREV_EXIT_TRAP\"}" EXIT

This is unlikely to matter in practice since the caller probably doesn't set an EXIT trap with quotes, but it's worth noting for robustness.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces parallel testing for unit tests, which is a great improvement for CI/CD efficiency. The implementation is solid, but I've identified a few areas for improvement regarding robustness and performance. Specifically, the GPU detection can be made more reliable, the parallel job scheduling logic has a flaw that underutilizes GPUs, and the method for passing job information is not robust against certain characters in file paths. I've provided detailed comments and suggestions to address these points.

Comment on lines +503 to +535
    for i in "${!test_files_array[@]}"; do
        local test_file="${test_files_array[$i]}"
        local gpu_index=$((i % NUM_GPUS))
        local gpu_id="${GPU_LIST[$gpu_index]}"
        local file_index=$((i + 1))

        # Launch test in background
        local job_info
        job_info=$(run_single_test_background "$test_file" "$gpu_id" "$file_index")

        # Parse job info
        local pid result_file log_file
        IFS=':' read -r pid test_file result_file log_file file_index <<< "$job_info"
        test_result_files[$pid]="$result_file:$test_file:$log_file:$file_index"
        test_pid_map[$pid]="$test_file"

        active_jobs=$((active_jobs + 1))

        # Limit concurrent jobs to NUM_GPUS
        while [ $active_jobs -ge $NUM_GPUS ]; do
            # Wait for any job to finish
            for pid in "${!test_pid_map[@]}"; do
                if ! kill -0 "$pid" 2>/dev/null; then
                    # Job finished, process result
                    wait "$pid" 2>/dev/null || true
                    active_jobs=$((active_jobs - 1))
                    unset "test_pid_map[$pid]"
                    break
                fi
            done
            sleep 0.1
        done
    done

Severity: high

The current job scheduling logic underutilizes the available GPUs. For example, with NUM_GPUS=4, it will only keep 3 tests running concurrently. This is because it launches a job and then checks if capacity is met, at which point it waits. This results in NUM_GPUS - 1 concurrency.

The correct approach is to wait for a slot to become available before launching a new job when at capacity. This can be fixed by restructuring the launch loop and removing the redundant active_jobs counter in favor of checking the size of test_pid_map directly.

    for i in "${!test_files_array[@]}"; do
        # Wait for a free slot if we are at capacity
        while [ "${#test_pid_map[@]}" -ge "$NUM_GPUS" ]; do
            local finished=false
            for pid in "${!test_pid_map[@]}"; do
                if ! kill -0 "$pid" 2>/dev/null; then
                    wait "$pid" 2>/dev/null || true
                    unset "test_pid_map[$pid]"
                    finished=true
                    break
                fi
            done
            if ! $finished; then
                sleep 0.1
            fi
        done

        local test_file="${test_files_array[$i]}"
        local gpu_index=$((i % NUM_GPUS))
        local gpu_id="${GPU_LIST[$gpu_index]}"
        local file_index=$((i + 1))

        # Launch test in background
        local job_info
        job_info=$(run_single_test_background "$test_file" "$gpu_id" "$file_index")

        # Parse job info
        local pid result_file log_file
        IFS=':' read -r pid test_file result_file log_file file_index <<< "$job_info"
        test_result_files[$pid]="$result_file:$test_file:$log_file:$file_index"
        test_pid_map[$pid]="$test_file"
    done

Comment on lines +377 to +381
    AVAILABLE_GPUS=$(nvidia-smi --list-gpus 2>/dev/null | awk '{print NR-1}' | tr '\n' ' ' | sed 's/ $//')
    if [ -n "$AVAILABLE_GPUS" ]; then
        echo "$AVAILABLE_GPUS"
        return
    fi

Severity: medium

The use of nvidia-smi --list-gpus is not ideal for scripting as its output format is intended for human consumption and may change between driver versions. A more robust approach is to use nvidia-smi --query-gpu, which provides a stable, script-friendly interface. This also allows for better error handling by checking the command's exit code.

Suggested change
-    AVAILABLE_GPUS=$(nvidia-smi --list-gpus 2>/dev/null | awk '{print NR-1}' | tr '\n' ' ' | sed 's/ $//')
-    if [ -n "$AVAILABLE_GPUS" ]; then
-        echo "$AVAILABLE_GPUS"
-        return
-    fi
+    gpus=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null)
+    if [ $? -eq 0 ] && [ -n "$gpus" ]; then
+        AVAILABLE_GPUS=$(echo "$gpus" | tr '\n' ' ')
+        echo "$AVAILABLE_GPUS"
+        return
+    fi


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Around line 8-10: Defaulting PARALLEL_TESTS to true changes behavior and can
cause GPU resource contention; change the default to false (or make
PARALLEL_TESTS opt-in) and/or implement auto-fallback logic that sets
PARALLEL_TESTS=false when only a single GPU is detected by your existing
detection logic (e.g., detect_gpus/nvidia-smi check) so parallel execution is
only enabled when multiple GPUs are available; update the assignment for
PARALLEL_TESTS and the startup logic that reads it to respect the
GPU-count-based fallback while leaving SAMPLE_RATE unchanged.
- Around line 408-410: The current trap 'rm -rf "$PARALLEL_TMP_DIR"' overwrites
any existing EXIT handler; capture the existing EXIT trap (e.g.,
PREV_EXIT_TRAP=$(trap -p EXIT)), then install a new EXIT trap that removes
PARALLEL_TMP_DIR and then invokes the previous trap (via eval "$PREV_EXIT_TRAP"
or calling the saved handler), ensuring you reference PARALLEL_TMP_DIR, mktemp,
and the trap invocation in the change so the temporary directory is cleaned
while preserving any caller-registered EXIT cleanup.
- Around line 501-535: The round-robin GPU assignment (using i % NUM_GPUS and
GPU_LIST) allows multiple concurrent tests to use the same GPU because the
throttle only tracks active_jobs; change this to a free-GPU queue: maintain an
available_gpus list initialized from GPU_LIST, pop a GPU id from available_gpus
when launching run_single_test_background (use run_single_test_background
"$test_file" "$gpu_id" "$file_index"), store a mapping pid->gpu (e.g.,
test_gpu_map[$pid]="$gpu_id") and still track test_result_files/test_pid_map by
pid, then in the wait loop detect finished pids, return their GPU to
available_gpus before launching new jobs, and only launch when available_gpus is
non-empty (instead of checking active_jobs >= NUM_GPUS); ensure all places
referencing test_pid_map, test_result_files, active_jobs are updated to use the
new pid->gpu mapping and the available_gpus queue.
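The free-GPU-queue scheduling described in the last item can be sketched as below. The stubs (GPU_LIST, test_files_array, run_single_test_background, the RESULTS file) are illustrative stand-ins, not the real script pieces:

```shell
# Sketch of free-GPU-queue scheduling: a GPU is popped from a pool at
# launch and only returned when the owning PID has exited, so no two
# concurrent tests ever share a GPU. Stub environment for the demo:
RESULTS=$(mktemp)
GPU_LIST=(0 1)
test_files_array=("test_a.py" "test_b.py" "test_c.py")
run_single_test_background() {  # dummy worker: record which GPU ran the test
    sleep 0.1
    echo "$1 gpu=$2" >> "$RESULTS"
}

available_gpus=("${GPU_LIST[@]}")   # pool of idle GPU IDs
declare -A test_gpu_map             # pid -> gpu_id

for test_file in "${test_files_array[@]}"; do
    # Block until a GPU is free, reclaiming GPUs from finished jobs
    while [ "${#available_gpus[@]}" -eq 0 ]; do
        for pid in "${!test_gpu_map[@]}"; do
            if ! kill -0 "$pid" 2>/dev/null; then
                wait "$pid" 2>/dev/null || true
                available_gpus+=("${test_gpu_map[$pid]}")   # return GPU to pool
                unset "test_gpu_map[$pid]"
            fi
        done
        sleep 0.05
    done

    # Pop one GPU from the pool and launch the test on it
    gpu_id="${available_gpus[0]}"
    available_gpus=("${available_gpus[@]:1}")
    run_single_test_background "$test_file" "$gpu_id" &
    test_gpu_map[$!]="$gpu_id"
done
wait   # drain remaining jobs
echo "completed $(wc -l < "$RESULTS") tests"
```

With two pool entries and three tests, the third launch blocks until one of the first two jobs finishes and its GPU is reclaimed.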
🧹 Nitpick comments (3)
scripts/test_gpu_detection.sh (1)

1-56: Consider adding assertions to make this a proper automated test.

This script prints detected values but never asserts expected outcomes, so regressions can only be caught by a human reading the output. Adding simple equality checks would make it usable in CI:

Example assertion pattern
 echo "Test 1: CUDA_VISIBLE_DEVICES='0,1,2,3'"
 export CUDA_VISIBLE_DEVICES="0,1,2,3"
 export PARALLEL_TESTS=true
 DETECTED=$(detect_gpus)
 echo "Detected GPUs: $DETECTED"
+if [ "$DETECTED" != "0 1 2 3" ]; then
+    echo "FAIL: Expected '0 1 2 3', got '$DETECTED'"
+    exit 1
+fi
 echo ""
scripts/test_utils.sh (2)

548-590: Test results are printed in non-deterministic order.

Iterating over the associative array test_result_files yields PIDs in arbitrary order, so logs and result lines will appear in a different (seemingly random) order each run. Consider sorting by file_index when printing to make output reproducible and easier to scan.
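One way to get the deterministic ordering suggested here is a numeric sort on the index field before printing; the colon-delimited record format below is illustrative, not the script's actual layout:

```shell
# Illustrative: sort result records numerically by a leading file_index
# field so output order is reproducible across runs.
records="3:test_c.py:PASSED
1:test_a.py:PASSED
2:test_b.py:FAILED"

sorted=$(printf '%s\n' "$records" | sort -t: -k1,1n)
printf '%s\n' "$sorted"
```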


412-416: Paths with spaces will break the array split.

IFS=' ' read -r -a test_files_array <<< "$test_files" mirrors the pre-existing for test_file in $test_files pattern, so this isn't a regression. Worth noting if test paths ever contain spaces in the future—passing an array (or newline-delimited list) would be more robust.
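A small standalone illustration of the hazard and the array-based alternative; the paths here are demo fixtures created under mktemp, not real test files:

```shell
# Word-splitting a flat string breaks paths containing spaces, while a
# glob-built array keeps each path whole. Demo paths are fixtures.
dir=$(mktemp -d)
mkdir -p "$dir/demo tests"
touch "$dir/demo tests/test one.py" "$dir/demo tests/test_two.py"

test_files="$dir/demo tests/test one.py $dir/demo tests/test_two.py"
IFS=' ' read -r -a broken <<< "$test_files"
echo "split entries: ${#broken[@]}"      # more than 2: paths got split

test_files_array=("$dir/demo tests"/*.py)
echo "array entries: ${#test_files_array[@]}"
```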

- Fix GPU queue management: use free-GPU queue instead of round-robin
  to ensure only one test runs per GPU at a time (critical fix)
- Preserve existing EXIT trap to avoid breaking caller cleanup
- Add auto-fallback to sequential execution when only 1 GPU detected
- Sort test results by file_index for deterministic output
- Add assertions to test_gpu_detection.sh for automated validation

Addresses all critical and nitpick issues from PR flashinfer-ai#2531 review.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@dierksen
Contributor Author

/bot run

@flashinfer-bot
Collaborator

@dierksen is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Around line 558-565: The colon-delimited parsing of the output from
run_single_test_background (the job_info variable) is unsafe because file paths
may contain ':'; update run_single_test_background to return fields using a safe
separator (e.g., NUL or newline) or, better, write each field to separate files
in PARALLEL_TMP_DIR keyed by PID, then change the consumer that currently does
IFS=':' read -r pid test_file result_file log_file file_index (and the other
result-processing reads that populate test_result_files, test_pid_map,
test_gpu_map) to read from those per-PID files (or use read -d '' for
NUL-separated input) and populate test_result_files[$pid], test_pid_map[$pid],
and test_gpu_map[$pid] accordingly so parsing cannot break on colons in paths.
- Around line 367-373: The unguarded use of CUDA_VISIBLE_DEVICES will fail under
set -u; update the conditional and uses to use the default-value syntax. Replace
occurrences in the if test and in the assignment with
"${CUDA_VISIBLE_DEVICES:-}" (e.g., change if [ -n "$CUDA_VISIBLE_DEVICES" ] && [
"$CUDA_VISIBLE_DEVICES" != "-1" ] to if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && [
"${CUDA_VISIBLE_DEVICES:-}" != "-1" ] and change AVAILABLE_GPUS=$(echo
"$CUDA_VISIBLE_DEVICES" | ...) to AVAILABLE_GPUS=$(echo
"${CUDA_VISIBLE_DEVICES:-}" | ...)) so the script won’t error when the variable
is unset.
- Around line 604-635: Add an explicit else branch for the existing if [ -f
"$result_file" ] check that logs a clear warning when the result file is missing
(include $test_file and $result_file in the message) and set EXIT_CODE=1 so CI
can surface the issue; modify the block around variables result_file,
TOTAL_TESTS and EXIT_CODE (the loop that reads and handles result) to echo or
printf a one-line warning like "WARNING: missing result file for $test_file
(expected $result_file)" and set EXIT_CODE=1 (do not change existing
passed/failed/skipped counting logic).
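The set -u hazard in the second item can be reproduced standalone; this snippet is a demonstration of the shell behavior, not the script itself:

```shell
# Under set -u, expanding an unset variable is fatal; the ${VAR:-}
# default-value form is safe. Standalone reproduction of the hazard.
set -u
unset CUDA_VISIBLE_DEVICES 2>/dev/null || true

if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && [ "${CUDA_VISIBLE_DEVICES:-}" != "-1" ]; then
    echo "devices: ${CUDA_VISIBLE_DEVICES:-}"
else
    echo "no visible devices"   # safe even though the variable is unset
fi

# Unguarded expansion fails; run in a subshell so the demo continues
if ( echo "$CUDA_VISIBLE_DEVICES" ) 2>/dev/null; then
    echo "unguarded: ok"
else
    echo "unguarded: failed under set -u"
fi
```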
🧹 Nitpick comments (3)
scripts/test_utils.sh (3)

433-435: Trap preservation: edge-case fragility with single quotes in existing trap.

The sed pattern on line 434 strips the outer '...' wrapping from trap -p EXIT, but if the existing trap body itself contains escaped single quotes (e.g., 'can'\''t'), the resulting string will be malformed. This is unlikely in practice but worth a defensive note.


454-527: Nested function run_single_test_background leaks into global scope.

In bash, functions defined inside other functions are still added to the global function namespace. After run_tests_parallel runs, run_single_test_background remains callable from anywhere. This is a bash quirk, not a bug, but worth knowing — especially since it silently captures mode, total_files, and PARALLEL_TMP_DIR from the caller's scope at fork time and won't work correctly if called standalone.


437-440: Test file paths with spaces will break the space-delimited splitting.

Both the sequential fallback (line 410) and the parallel path (line 439) split $test_files on whitespace. This is consistent with the pre-existing pattern in the script, but worth noting: if any test file path contains spaces, it will be incorrectly split into multiple entries.

@dierksen
Contributor Author

/bot run

@flashinfer-bot
Collaborator

@dierksen is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww

@yongwww
Member

yongwww commented Feb 10, 2026

Maybe it needs some time to propagate; could you try it again in a few minutes? @dierksen

@dierksen
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !305 has been created, and the CI pipeline #43649074 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #43649074: 10/20 passed

@yongwww
Member

yongwww commented Feb 12, 2026

Hi @dierksen, I am trying to read the multi-gpu / multi-node test log in the pipeline, for example in https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/265444028, I saw:

TEST SUMMARY
==========================================
Total test files executed: 0
Passed: 0
Failed: 0

It’d be good to double-check whether the tests are actually running, or if this is just a summary/reporting issue.


@dierksen dierksen left a comment


/bot run


@dierksen dierksen left a comment


Hi @dierksen, I am trying to read the multi-gpu / multi-node test log in the pipeline, for example in https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/265444028, I saw:

TEST SUMMARY
==========================================
Total test files executed: 0
Passed: 0
Failed: 0

It’d be good to double-check whether the tests are actually running, or if this is just a summary/reporting issue.

Good catch, I think the multi-GPU tests were being skipped because they were essentially falling back to single-GPU testing in parallel. I updated the logic for that and requested another ci-bot run.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Line 10: The comment for the PARALLEL_TESTS shell variable is inaccurate: it
says "Enable parallel test execution by default" while the variable is
initialized to false; update either the comment or the default so they match —
e.g., change the comment to "Disable parallel test execution by default" or set
PARALLEL_TESTS to true; modify the line that sets PARALLEL_TESTS (the :
"${PARALLEL_TESTS:=false}" assignment) and its inline comment to keep semantics
and documentation consistent.

@dierksen
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !305 has been updated with latest changes, and the CI pipeline #43843112 is currently running. I'll report back once the pipeline job completes.

@yongwww
Member

yongwww commented Feb 12, 2026

GitLab MR !305 has been updated with latest changes, and the CI pipeline #43843112 is currently running. I'll report back once the pipeline job completes.

Seems like the multi-gpu (4-gpu) test duration is not faster than before. For example, in run 266339217, multi-gpu cu130 takes ~12 min; however, the test is taking ~17 min with this change.

@dierksen
Contributor Author

dierksen commented Feb 12, 2026

GitLab MR !305 has been updated with latest changes, and the CI pipeline #43843112 is currently running. I'll report back once the pipeline job completes.

Seems like the multi-gpu (4-gpu) test duration is not faster than before. For example, in run 266339217, multi-gpu cu130 takes ~12 min; however, the test is taking ~17 min with this change.

I wouldn't expect the multi-gpu tests to be faster than before; parallel testing is disabled in those cases now.

In the cu130 case here, it seems like the 4 minute difference is just due to wait time for compute resources. The cu129 case was actually faster, likely for the same reason.

I had Claude compile a table for the two pipeline runs for a little better visibility:

Test Job                                   Before     After      Δ Time      % Change
multi_gpu_test_b300: [cu129]               11m 58s    8m 49s     -3m 8s      -26.3%
multi_gpu_test_b300: [cu130]               12m 51s    17m 13s    +4m 22s     +34.0%
multi_node_test_gb200: [cu129]             5m 32s     4m 41s     -0m 50s     -15.3%
multi_node_test_gb200: [cu130]             9m 42s     4m 41s     -5m 1s      -51.7%
multi_node_test_gb300: [cu129]             7m 28s     7m 46s     +0m 18s     +4.1%
multi_node_test_gb300: [cu130]             5m 51s     6m 47s     +0m 56s     +16.0%
unit_test_5090: [cu129]                    146m 24s   147m 26s   +1m 1s      +0.7%
unit_test_5090: [cu130]                    148m 50s   149m 21s   +0m 31s     +0.4%
unit_test_b200: [cu129]                    215m 19s   55m 7s     -160m 12s   -74.4%
unit_test_b200: [cu130]                    207m 6s    54m 24s    -152m 41s   -73.7%
unit_test_b300: [cu129]                    221m 36s   220m 9s    -1m 26s     -0.6%
unit_test_b300: [cu130]                    213m 49s   213m 58s   +0m 9s      +0.1%
unit_test_gb200: [cu129]                   314m 46s   68m 11s    -246m 34s   -78.3%
unit_test_gb200: [cu130]                   314m 45s   66m 42s    -248m 3s    -78.8%
unit_test_gb300: [cu129]                   360m 52s   78m 36s    -282m 15s   -78.2%
unit_test_gb300: [cu130]                   360m 52s   72m 50s    -288m 1s    -79.8%
unit_test_h100: [cu129]                    213m 32s   236m 9s    +22m 37s    +10.6%
unit_test_h100: [cu130]                    199m 28s   243m 39s   +44m 10s    +22.1%
unit_test_rtx-pro-6000-blackwell: [cu129]  126m 19s   126m 31s   +0m 12s     +0.2%
unit_test_rtx-pro-6000-blackwell: [cu130]  126m 5s    124m 44s   -1m 21s     -1.1%
unit_test_spark: [cu129]                   205m 30s   192m 15s   -13m 15s    -6.5%
unit_test_spark: [cu130]                   189m 57s   216m 49s   +26m 51s    +14.1%
unit_test_thor: [cu130]                    331m 8s    464m 14s   +133m 6s    +40.2%

B200, GB200, and GB300 see huge improvements since those are running in clusters. H100, Spark, and Thor are still going through some changes on the internal CI, so I wouldn't put too much stock in those. The multi-GPU numbers look pretty variable because the queue time can wind up being significant compared to the total runtime of the tests.

(edit: I should also note the GB200 and GB300 were timing out before, whereas now they're actually finishing)

@flashinfer-bot
Collaborator

[FAILED] Pipeline #43843112: 16/20 passed


@kahyunnam kahyunnam left a comment


lgtm

@kahyunnam kahyunnam merged commit 1e94c60 into flashinfer-ai:main Feb 12, 2026
25 checks passed
@dierksen dierksen deleted the parallel-testing branch February 12, 2026 22:07