Add parallel testing to unit test script #2531
Conversation
Summary of Changes (Gemini Code Assist): This pull request improves the efficiency of unit test execution, particularly on systems equipped with multiple GPUs. By enabling parallel testing, it aims to reduce CI/CD pipeline timeouts and accelerate feedback from automated testing, leading to a more streamlined development workflow.
Activity
📝 Walkthrough

Adds GPU detection and parallel test orchestration utilities, a new GPU-detection test script, and enables parallel test execution in the unit test task script.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Executor as Test Executor
    participant Detector as GPU Detector
    participant Orchestrator as Parallel Orchestrator
    participant Worker as Background Test Worker
    participant Aggregator as Result Aggregator

    Executor->>Detector: detect_gpus()
    Detector-->>Executor: gpu_list
    Executor->>Orchestrator: run_tests_parallel(tests, gpu_list)
    par parallel test runs
        Orchestrator->>Worker: run_single_test_background(test1, gpu=0)
        Worker-->>Orchestrator: exit code / log
    and
        Orchestrator->>Worker: run_single_test_background(test2, gpu=1)
        Worker-->>Orchestrator: exit code / log
    and
        Orchestrator->>Worker: run_single_test_background(testN, gpu=k)
        Worker-->>Orchestrator: exit code / log
    end
    Orchestrator->>Aggregator: collect per-test results & JUnit XML
    Aggregator-->>Executor: aggregated report
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
No actionable comments were generated in the recent review. 🎉
Code Review
This pull request introduces parallel testing for unit tests, which is a great improvement for CI/CD efficiency. The implementation is solid, but I've identified a few areas for improvement regarding robustness and performance. Specifically, the GPU detection can be made more reliable, the parallel job scheduling logic has a flaw that underutilizes GPUs, and the method for passing job information is not robust against certain characters in file paths. I've provided detailed comments and suggestions to address these points.
scripts/test_utils.sh (outdated)
```sh
for i in "${!test_files_array[@]}"; do
    local test_file="${test_files_array[$i]}"
    local gpu_index=$((i % NUM_GPUS))
    local gpu_id="${GPU_LIST[$gpu_index]}"
    local file_index=$((i + 1))

    # Launch test in background
    local job_info
    job_info=$(run_single_test_background "$test_file" "$gpu_id" "$file_index")

    # Parse job info
    local pid result_file log_file
    IFS=':' read -r pid test_file result_file log_file file_index <<< "$job_info"
    test_result_files[$pid]="$result_file:$test_file:$log_file:$file_index"
    test_pid_map[$pid]="$test_file"

    active_jobs=$((active_jobs + 1))

    # Limit concurrent jobs to NUM_GPUS
    while [ $active_jobs -ge $NUM_GPUS ]; do
        # Wait for any job to finish
        for pid in "${!test_pid_map[@]}"; do
            if ! kill -0 "$pid" 2>/dev/null; then
                # Job finished, process result
                wait "$pid" 2>/dev/null || true
                active_jobs=$((active_jobs - 1))
                unset "test_pid_map[$pid]"
                break
            fi
        done
        sleep 0.1
    done
done
```
The current job scheduling logic underutilizes the available GPUs. For example, with `NUM_GPUS=4`, it will only keep 3 tests running concurrently. This is because it launches a job and then checks whether capacity is met, at which point it waits. This results in `NUM_GPUS - 1` concurrency.
The correct approach is to wait for a slot to become available before launching a new job when at capacity. This can be fixed by restructuring the launch loop and removing the redundant `active_jobs` counter in favor of checking the size of `test_pid_map` directly.
```sh
for i in "${!test_files_array[@]}"; do
    # Wait for a free slot if we are at capacity
    while [ "${#test_pid_map[@]}" -ge "$NUM_GPUS" ]; do
        local finished=false
        for pid in "${!test_pid_map[@]}"; do
            if ! kill -0 "$pid" 2>/dev/null; then
                wait "$pid" 2>/dev/null || true
                unset "test_pid_map[$pid]"
                finished=true
                break
            fi
        done
        if ! $finished; then
            sleep 0.1
        fi
    done

    local test_file="${test_files_array[$i]}"
    local gpu_index=$((i % NUM_GPUS))
    local gpu_id="${GPU_LIST[$gpu_index]}"
    local file_index=$((i + 1))

    # Launch test in background
    local job_info
    job_info=$(run_single_test_background "$test_file" "$gpu_id" "$file_index")

    # Parse job info
    local pid result_file log_file
    IFS=':' read -r pid test_file result_file log_file file_index <<< "$job_info"
    test_result_files[$pid]="$result_file:$test_file:$log_file:$file_index"
    test_pid_map[$pid]="$test_file"
done
```

scripts/test_utils.sh

```sh
AVAILABLE_GPUS=$(nvidia-smi --list-gpus 2>/dev/null | awk '{print NR-1}' | tr '\n' ' ' | sed 's/ $//')
if [ -n "$AVAILABLE_GPUS" ]; then
    echo "$AVAILABLE_GPUS"
    return
fi
```
The use of `nvidia-smi --list-gpus` is not ideal for scripting, as its output format is intended for human consumption and may change between driver versions. A more robust approach is to use `nvidia-smi --query-gpu`, which provides a stable, script-friendly interface. This also allows for better error handling by checking the command's exit code.
```diff
-AVAILABLE_GPUS=$(nvidia-smi --list-gpus 2>/dev/null | awk '{print NR-1}' | tr '\n' ' ' | sed 's/ $//')
-if [ -n "$AVAILABLE_GPUS" ]; then
-    echo "$AVAILABLE_GPUS"
-    return
-fi
+gpus=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null)
+if [ $? -eq 0 ] && [ -n "$gpus" ]; then
+    AVAILABLE_GPUS=$(echo "$gpus" | tr '\n' ' ')
+    echo "$AVAILABLE_GPUS"
+    return
+fi
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Around line 8-10: Defaulting PARALLEL_TESTS to true changes behavior and can
cause GPU resource contention; change the default to false (or make
PARALLEL_TESTS opt-in) and/or implement auto-fallback logic that sets
PARALLEL_TESTS=false when only a single GPU is detected by your existing
detection logic (e.g., detect_gpus/nvidia-smi check) so parallel execution is
only enabled when multiple GPUs are available; update the assignment for
PARALLEL_TESTS and the startup logic that reads it to respect the
GPU-count-based fallback while leaving SAMPLE_RATE unchanged.
- Around line 408-410: The current trap 'rm -rf "$PARALLEL_TMP_DIR"' overwrites
any existing EXIT handler; capture the existing EXIT trap (e.g.,
PREV_EXIT_TRAP=$(trap -p EXIT)), then install a new EXIT trap that removes
PARALLEL_TMP_DIR and then invokes the previous trap (via eval "$PREV_EXIT_TRAP"
or calling the saved handler), ensuring you reference PARALLEL_TMP_DIR, mktemp,
and the trap invocation in the change so the temporary directory is cleaned
while preserving any caller-registered EXIT cleanup.
- Around line 501-535: The round-robin GPU assignment (using i % NUM_GPUS and
GPU_LIST) allows multiple concurrent tests to use the same GPU because the
throttle only tracks active_jobs; change this to a free-GPU queue: maintain an
available_gpus list initialized from GPU_LIST, pop a GPU id from available_gpus
when launching run_single_test_background (use run_single_test_background
"$test_file" "$gpu_id" "$file_index"), store a mapping pid->gpu (e.g.,
test_gpu_map[$pid]="$gpu_id") and still track test_result_files/test_pid_map by
pid, then in the wait loop detect finished pids, return their GPU to
available_gpus before launching new jobs, and only launch when available_gpus is
non-empty (instead of checking active_jobs >= NUM_GPUS); ensure all places
referencing test_pid_map, test_result_files, active_jobs are updated to use the
new pid->gpu mapping and the available_gpus queue.
🧹 Nitpick comments (3)
scripts/test_gpu_detection.sh (1)
1-56: Consider adding assertions to make this a proper automated test. This script prints detected values but never asserts expected outcomes, so regressions can only be caught by a human reading the output. Adding simple equality checks would make it usable in CI:
Example assertion pattern
echo "Test 1: CUDA_VISIBLE_DEVICES='0,1,2,3'" export CUDA_VISIBLE_DEVICES="0,1,2,3" export PARALLEL_TESTS=true DETECTED=$(detect_gpus) echo "Detected GPUs: $DETECTED" +if [ "$DETECTED" != "0 1 2 3" ]; then + echo "FAIL: Expected '0 1 2 3', got '$DETECTED'" + exit 1 +fi echo ""scripts/test_utils.sh (2)
548-590: Test results are printed in non-deterministic order. Iterating over the associative array `test_result_files` yields PIDs in arbitrary order, so logs and result lines will appear in a different (seemingly random) order each run. Consider sorting by `file_index` when printing to make output reproducible and easier to scan.
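One way to make the ordering deterministic, sketched here with hypothetical entries in the same `result_file:test_file:log_file:file_index` layout the script uses:

```shell
#!/usr/bin/env bash
# Sketch: sort result entries by file_index before printing, so output order
# is stable even though associative-array iteration order is arbitrary.
declare -A test_result_files   # pid -> "result_file:test_file:log_file:file_index"
test_result_files[4242]="r2:tests/b.py:l2:2"
test_result_files[1111]="r3:tests/c.py:l3:3"
test_result_files[9999]="r1:tests/a.py:l1:1"

ordered=$(
    for pid in "${!test_result_files[@]}"; do
        IFS=':' read -r _ test_file _ file_index <<< "${test_result_files[$pid]}"
        printf '%s %s\n' "$file_index" "$test_file"
    done | sort -n | cut -d' ' -f2 | tr '\n' ','
)
echo "$ordered"
```

Whatever order the PIDs come out in, the printed list is always sorted by `file_index`.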
412-416: Paths with spaces will break the array split. `IFS=' ' read -r -a test_files_array <<< "$test_files"` mirrors the pre-existing `for test_file in $test_files` pattern, so this isn't a regression. Worth noting if test paths ever contain spaces in the future; passing an array (or newline-delimited list) would be more robust.
- Fix GPU queue management: use free-GPU queue instead of round-robin to ensure only one test runs per GPU at a time (critical fix)
- Preserve existing EXIT trap to avoid breaking caller cleanup
- Add auto-fallback to sequential execution when only 1 GPU detected
- Sort test results by file_index for deterministic output
- Add assertions to test_gpu_detection.sh for automated validation

Addresses all critical and nitpick issues from PR flashinfer-ai#2531 review.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
/bot run
@dierksen is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Around line 558-565: The colon-delimited parsing of the output from
run_single_test_background (the job_info variable) is unsafe because file paths
may contain ':'; update run_single_test_background to return fields using a safe
separator (e.g., NUL or newline) or, better, write each field to separate files
in PARALLEL_TMP_DIR keyed by PID, then change the consumer that currently does
IFS=':' read -r pid test_file result_file log_file file_index (and the other
result-processing reads that populate test_result_files, test_pid_map,
test_gpu_map) to read from those per-PID files (or use read -d '' for
NUL-separated input) and populate test_result_files[$pid], test_pid_map[$pid],
and test_gpu_map[$pid] accordingly so parsing cannot break on colons in paths.
- Around line 367-373: The unguarded use of CUDA_VISIBLE_DEVICES will fail under
set -u; update the conditional and uses to use the default-value syntax. Replace
occurrences in the if test and in the assignment with
"${CUDA_VISIBLE_DEVICES:-}" (e.g., change if [ -n "$CUDA_VISIBLE_DEVICES" ] && [
"$CUDA_VISIBLE_DEVICES" != "-1" ] to if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && [
"${CUDA_VISIBLE_DEVICES:-}" != "-1" ] and change AVAILABLE_GPUS=$(echo
"$CUDA_VISIBLE_DEVICES" | ...) to AVAILABLE_GPUS=$(echo
"${CUDA_VISIBLE_DEVICES:-}" | ...)) so the script won’t error when the variable
is unset.
- Around line 604-635: Add an explicit else branch for the existing if [ -f
"$result_file" ] check that logs a clear warning when the result file is missing
(include $test_file and $result_file in the message) and set EXIT_CODE=1 so CI
can surface the issue; modify the block around variables result_file,
TOTAL_TESTS and EXIT_CODE (the loop that reads and handles result) to echo or
printf a one-line warning like "WARNING: missing result file for $test_file
(expected $result_file)" and set EXIT_CODE=1 (do not change existing
passed/failed/skipped counting logic).
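The per-PID metadata files suggested in the first item above can be sketched like this. Field names mirror the comment; the background "test" is just a sleep, and a colon is planted in the path to show it survives.

```shell
#!/usr/bin/env bash
# Sketch: one field per line in a per-PID job file, so a ':' in a test path
# cannot corrupt parsing the way colon-delimited strings can.
PARALLEL_TMP_DIR=$(mktemp -d)

run_single_test_background() {
    local test_file=$1 gpu_id=$2 file_index=$3
    ( sleep 0.1 ) &
    pid=$!   # set globally on purpose: echoing from $() would orphan the child
    printf '%s\n' "$test_file" \
        "$PARALLEL_TMP_DIR/$pid.result" \
        "$PARALLEL_TMP_DIR/$pid.log" \
        "$file_index" > "$PARALLEL_TMP_DIR/$pid.job"
}

run_single_test_background "tests/odd:name.py" 0 1
{
    read -r test_file
    read -r result_file
    read -r log_file
    read -r file_index
} < "$PARALLEL_TMP_DIR/$pid.job"

wait
echo "file=$test_file index=$file_index"
rm -rf "$PARALLEL_TMP_DIR"
```

Newlines are the only characters a POSIX path cannot contain, so line-per-field files need no escaping at all.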
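The `set -u` guard from the second item above, in a tiny self-contained form: the bare `$CUDA_VISIBLE_DEVICES` would abort the script here, while the `:-` default expansion substitutes an empty string.

```shell
#!/usr/bin/env bash
# Sketch: ${VAR:-} makes an unset variable safe under `set -u`.
set -u
unset CUDA_VISIBLE_DEVICES

if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] && [ "${CUDA_VISIBLE_DEVICES:-}" != "-1" ]; then
    msg="using CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-}"
else
    msg="CUDA_VISIBLE_DEVICES unset or disabled"
fi
echo "$msg"
```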
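The missing-result-file warning from the third item above could look roughly like this (hypothetical file names; `mktemp -u` yields a path that intentionally does not exist to simulate a worker that never wrote its result):

```shell
#!/usr/bin/env bash
# Sketch: warn and set EXIT_CODE=1 instead of silently skipping a test
# whose worker never produced a result file.
EXIT_CODE=0
test_file="tests/test_example.py"
result_file=$(mktemp -u)   # simulated missing result file

if [ -f "$result_file" ]; then
    : # existing passed/failed/skipped counting logic stays here unchanged
else
    echo "WARNING: missing result file for $test_file (expected $result_file)"
    EXIT_CODE=1
fi
echo "EXIT_CODE=$EXIT_CODE"
```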
🧹 Nitpick comments (3)
scripts/test_utils.sh (3)
433-435: Trap preservation: edge-case fragility with single quotes in the existing trap. The `sed` pattern on line 434 strips the outer `'...'` wrapping from `trap -p EXIT` output, but if the existing trap body itself contains escaped single quotes (e.g., `'can'\''t'`), the resulting string will be malformed. This is unlikely in practice but worth a defensive note.
454-527: Nested function `run_single_test_background` leaks into global scope. In bash, functions defined inside other functions are still added to the global function namespace. After `run_tests_parallel` runs, `run_single_test_background` remains callable from anywhere. This is a bash quirk, not a bug, but worth knowing, especially since it silently captures `mode`, `total_files`, and `PARALLEL_TMP_DIR` from the caller's scope at fork time and won't work correctly if called standalone.
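A tiny self-contained demonstration of the quirk: the inner function does not exist until the outer one runs, and afterwards it is globally visible.

```shell
#!/usr/bin/env bash
# Demo: functions defined inside functions land in the global namespace,
# but only once the enclosing function has executed.
outer() {
    inner() { echo "inner ran"; }
}

before=$(type -t inner || true)   # empty: inner not defined yet
outer
after=$(type -t inner)            # "function": inner is now global
echo "before='$before' after='$after'"
```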
437-440: Test file paths with spaces will break the space-delimited splitting. Both the sequential fallback (line 410) and the parallel path (line 439) split `$test_files` on whitespace. This is consistent with the pre-existing pattern in the script, but worth noting: if any test file path contains spaces, it will be incorrectly split into multiple entries.
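A sketch of the more robust newline-delimited alternative, with a hypothetical path containing a space:

```shell
#!/usr/bin/env bash
# Sketch: newline-delimited splitting keeps a path with a space intact,
# while the current space-delimited pattern breaks it apart.
test_files=$'tests/test one.py\ntests/test_two.py'

# Current pattern (space-delimited): "test one.py" splits into two entries.
IFS=' ' read -r -a broken <<< "${test_files//$'\n'/ }"

# Newline-delimited alternative: each line is one path.
mapfile -t test_files_array <<< "$test_files"

echo "broken=${#broken[@]} correct=${#test_files_array[@]}"
```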
/bot run
@dierksen is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww
Maybe it needs some time to propagate; could you try it again in a few minutes? @dierksen
/bot run
[FAILED] Pipeline #43649074: 10/20 passed
Hi @dierksen, I am trying to read the multi-gpu / multi-node test log in the pipeline, for example in https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/265444028, I saw:

```
TEST SUMMARY
==========================================
Total test files executed: 0
Passed: 0
Failed: 0
```

It’d be good to double-check whether the tests are actually running, or if this is just a summary/reporting issue.
dierksen
left a comment
> Hi @dierksen, I am trying to read the multi-gpu / multi-node test log in the pipeline, for example in https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/265444028, I saw:
>
> ```
> TEST SUMMARY
> ==========================================
> Total test files executed: 0
> Passed: 0
> Failed: 0
> ```
>
> It’d be good to double-check whether the tests are actually running, or if this is just a summary/reporting issue.
Good catch, I think the multi-GPU tests were being skipped because they were essentially falling back to single-GPU testing in parallel. I updated the logic for that and requested another ci-bot run.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Line 10: The comment for the PARALLEL_TESTS shell variable is inaccurate: it
says "Enable parallel test execution by default" while the variable is
initialized to false; update either the comment or the default so they match —
e.g., change the comment to "Disable parallel test execution by default" or set
PARALLEL_TESTS to true; modify the line that sets PARALLEL_TESTS (the :
"${PARALLEL_TESTS:=false}" assignment) and its inline comment to keep semantics
and documentation consistent.
/bot run
Seems like the multi-GPU (4-GPU) test duration is not faster than before. For example, in run 266339217, multi-GPU cu130 takes ~12 min; with this change, the test is taking ~17 min.
I wouldn't expect the multi-GPU test to be faster than before; parallel testing is disabled in those cases now. In the cu130 case here, the 4-minute difference seems to be just wait time for compute resources. The cu129 case was actually faster, likely for the same reason. I had Claude compile a table for the two pipeline runs for a little better visibility:
B200, GB200, and GB300 see huge improvements since those are running in clusters. H100, Spark, and Thor are still going through some changes on the internal CI, so I wouldn't put too much stock in those. The multi-GPU numbers look pretty variable because the queue time can wind up being significant compared to the total runtime of the tests. (Edit: I should also note the GB200 and GB300 were timing out before, whereas now they're actually finishing.)
[FAILED] Pipeline #43843112: 16/20 passed
📌 Description
To make better use of available GPUs, instead of running all unit tests sequentially, this breaks
them up into chunks based on the number of available GPUs. This should help reduce issues with
timeouts on internal CI/CD and give us faster results from ci-bot as well.
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit
New Features
Tests