Add job-specific timeouts to GHA test jobs #20730

Merged
rapids-bot[bot] merged 4 commits into rapidsai:main from bdice:add-ci-test-timeouts on Nov 26, 2025

@bdice bdice commented Nov 25, 2025

Summary

Adds timeout commands to all CI test scripts to prevent indefinite hangs, following the pattern established in rapidsai/cuml#7533.

Changes

Adds timeout commands to test execution in 13 CI scripts:

  • C++ tests: 30m (run_cudf_ctests, run_cudf_examples, run_cudf_kafka_ctests, run_cudf_benchmark_smoketests)
  • C++ memcheck: 2h (observed max: 70 min)
  • Java tests: 30m (observed max: 9 min)
  • Python cudf tests: 40m each (observed max: 14 min)
  • Python other tests: 30m each (observed max: 13 min)
  • Wheel cudf tests: 30m each (observed max: 14 min)
  • Wheel cudf-polars tests: 1h (observed max: 30 min)
  • Wheel dask-cudf tests: 15m (observed max: 6 min)
  • cudf-polars polars tests: 30m (observed max: 13 min)
  • cudf-polars with rapidsmpf: 15m (observed max: 6 min)
  • narwhals tests: 15m each (observed max: 4 min)
  • notebooks tests: 10m per notebook (observed max: 3 min)
  • third-party integration tests: 45m (observed max: 23 min)
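For readers unfamiliar with the pattern, here is a minimal sketch of how a test command can be wrapped with GNU coreutils `timeout`. The `run_with_timeout` helper is hypothetical and is not taken from the cudf CI scripts; the scripts in this PR may call `timeout` directly. The sketch relies only on `timeout` returning exit status 124 when the limit is reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the cudf CI scripts): run a command under
# GNU coreutils `timeout` and report when the time limit was hit.
run_with_timeout() {
  local limit="$1"
  shift
  # `timeout` accepts durations such as 30s, 30m, or 2h.
  timeout "${limit}" "$@"
  local status=$?
  # GNU `timeout` exits with status 124 when the command timed out.
  if [ "${status}" -eq 124 ]; then
    echo "timed out after ${limit}: $*" >&2
  fi
  return "${status}"
}

# A command that finishes within its limit.
run_with_timeout 5s true

# A command that exceeds its limit; the wrapper returns a non-zero status.
run_with_timeout 1s sleep 5 || echo "exit status: $?"
```

In a CI script, the same idea would apply to the real test entry point, for example a 30m limit around a C++ test runner, so a hung test fails the job instead of holding the runner indefinitely.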

Timeout Selection

Timeout values are set at roughly 2-3x the maximum runtimes observed in recent successful test runs (analyzed from run 19673854267 and others). This leaves a safety margin for slow runs while keeping hung tests from wasting CI resources.
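As a rough check of that rule of thumb, the ratio of each timeout to its observed maximum can be computed from the numbers listed above. The `ratio` helper below is a hypothetical illustration, not part of the CI scripts. The input pairs are copied from the list in this description, in minutes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of configured timeout to observed max runtime.
# $1 = timeout in minutes, $2 = observed maximum in minutes.
ratio() {
  # LC_ALL=C keeps the decimal separator as "." regardless of locale.
  LC_ALL=C awk -v t="$1" -v o="$2" 'BEGIN { printf "%.1f\n", t / o }'
}

ratio 120 70   # C++ memcheck: 2h timeout, 70 min observed
ratio 30 9     # Java tests: 30m timeout, 9 min observed
ratio 40 14    # Python cudf tests: 40m timeout, 14 min observed
```

Some listed values fall slightly outside the 2-3x band, so the rule reads best as an approximate guideline rather than a strict formula.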

@bdice bdice requested review from a team as code owners November 25, 2025 17:40
@bdice bdice requested review from Matt711, gforsyth, vyasr and wence- and removed request for a team November 25, 2025 17:40

copy-pr-bot Bot commented Nov 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@bdice bdice changed the base branch from branch-25.08 to main November 25, 2025 17:41
@github-actions github-actions Bot added the External (Issues or PRs created by external contributors) label Nov 25, 2025
@bdice bdice added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels and removed the External (Issues or PRs created by external contributors) label Nov 25, 2025
@bdice bdice removed request for a team, Matt711, vyasr and wence- November 25, 2025 17:43

bdice commented Nov 25, 2025

/merge

bdice commented Nov 26, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 33ab2a7 into rapidsai:main Nov 26, 2025
263 of 271 checks passed
TomAugspurger pushed a commit to TomAugspurger/pygdf that referenced this pull request Nov 26, 2025
Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Gil Forsyth (https://github.com/gforsyth)

URL: rapidsai#20730
