
Remove reliance on remote datasets in tests#7637

Merged
rapids-bot[bot] merged 12 commits into rapidsai:main from csadorf:fix/do-not-rely-on-remote-datasets on Dec 31, 2025


@csadorf
Contributor

@csadorf csadorf commented Dec 31, 2025

Replace fetched datasets with synthetic generated data to make tests more robust and eliminate network dependencies.

Closes #3161; Closes #5158; Closes #6558; Closes #7639

@csadorf csadorf requested a review from a team as a code owner December 31, 2025 18:33
@csadorf csadorf requested a review from divyegala December 31, 2025 18:33
@github-actions github-actions Bot added the Cython / Python Cython or Python issue label Dec 31, 2025
@csadorf csadorf added bug Something isn't working non-breaking Non-breaking change and removed Cython / Python Cython or Python issue labels Dec 31, 2025
The generator functions mimic the originally used datasets, including
the California housing dataset and the 20 newsgroups dataset. In
addition, we update some precision asserts to work with the new datasets.
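For illustration only, here is a minimal sketch of what such generators could look like. The function names `make_housing_like` and `make_newsgroups_like`, their parameters, and the vocabulary scheme are hypothetical and are not taken from this PR; the sketch assumes only NumPy and the standard library, and it copies the rough shape of the original datasets (8 numeric features for the housing data, labelled text documents for the newsgroups data) rather than their actual distributions.

```python
import random

import numpy as np


def make_housing_like(n_samples=1000, n_features=8, noise=0.5, seed=0):
    """Return (X, y) for a synthetic linear regression problem.

    Hypothetical helper: the default of 8 feature columns echoes the
    California housing dataset, but all values are random.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features)).astype(np.float32)
    coef = rng.uniform(-2.0, 2.0, size=n_features).astype(np.float32)
    # Linear target plus Gaussian noise, so regressors have signal to fit.
    y = X @ coef + noise * rng.standard_normal(n_samples)
    return X, y.astype(np.float32)


def make_newsgroups_like(n_docs=200, n_classes=4, doc_len=30, seed=0):
    """Return (docs, labels) for a synthetic text classification problem.

    Hypothetical helper: each class draws about half of its words from a
    class-specific vocabulary and the rest from a shared one, so that
    bag-of-words features carry label information.
    """
    rng = random.Random(seed)
    shared = [f"common{i}" for i in range(50)]
    topical = [
        [f"topic{c}_word{i}" for i in range(20)] for c in range(n_classes)
    ]
    docs, labels = [], []
    for i in range(n_docs):
        label = i % n_classes  # cycle through classes for a balanced set
        words = [
            rng.choice(topical[label]) if rng.random() < 0.5 else rng.choice(shared)
            for _ in range(doc_len)
        ]
        docs.append(" ".join(words))
        labels.append(label)
    return docs, labels
```

Because both helpers are seeded, repeated test runs see identical data without any network access. Thresholds tuned against the real datasets may still need adjusting, which is consistent with the precision assert updates mentioned above.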
@csadorf csadorf force-pushed the fix/do-not-rely-on-remote-datasets branch from d91469e to bf00358 Compare December 31, 2025 19:24
@github-actions github-actions Bot added the Cython / Python Cython or Python issue label Dec 31, 2025
@csadorf
Contributor Author

csadorf commented Dec 31, 2025

I am addressing the remaining dask test failures.

The fixture functions remain as thin wrappers.
It was a very thin wrapper around the make_regression synthetic dataset
generation function and was used in only one place, a test that actually
failed and was marked xfail.
To better reflect the actual fixture content and avoid misconceptions.
@csadorf
Contributor Author

csadorf commented Dec 31, 2025

/merge

1 similar comment
@csadorf
Contributor Author

csadorf commented Dec 31, 2025

/merge

@rapids-bot rapids-bot Bot merged commit d494d67 into rapidsai:main Dec 31, 2025
190 of 193 checks passed
@csadorf csadorf deleted the fix/do-not-rely-on-remote-datasets branch January 2, 2026 16:03
rapids-bot Bot pushed a commit that referenced this pull request Jan 2, 2026
To not rely on remote datasets.

Closes #7643.

Follow-up to #7637.

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #7644
rapids-bot Bot pushed a commit that referenced this pull request Jan 5, 2026
#7637 removed the last uses of `tenacity` here.

This PR removes that dependency from test environments, including the `[test]` extra for `cuml` wheels.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Bradley Dice (https://github.com/bdice)

URL: #7645
mani-builds pushed a commit to mani-builds/cuml that referenced this pull request Jan 11, 2026
Replace fetched datasets with synthetic generated data to make tests more robust and eliminate network dependencies.

Closes rapidsai#3161; Closes rapidsai#5158; Closes rapidsai#6558; Closes rapidsai#7639

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#7637

Labels

bug (Something isn't working), Cython / Python (Cython or Python issue), non-breaking (Non-breaking change)


3 participants