enforce wheel size limits, README formatting in CI #6136
rapids-bot[bot] merged 3 commits into rapidsai:branch-24.12 from jameslamb:wheel-validation
    # detect when package size grows significantly
    max_allowed_size_compressed = '1.5G'
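The setting above caps the compressed size of each wheel. To make the semantics concrete, here is a minimal Python sketch of an equivalent check. It is not the implementation used by the tool that reads this setting: `parse_size` and `check_wheel_size` are hypothetical helpers, and the sketch assumes 1024-based units (1G = 1024**3 bytes), which may not match the real tool's unit handling.

```python
import os
import re

# Hypothetical unit multipliers; ASSUMES 1024-based units, which may differ
# from how the real validation tool interprets '1.5G'.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(size: str) -> int:
    """Convert a string like '1.5G' or '600M' into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([KMG])\s*", size.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def check_wheel_size(path: str, max_allowed_size_compressed: str) -> bool:
    """Return True if the file at `path` is within the compressed-size limit."""
    # A .whl file is a zip archive, so its size on disk is its compressed size.
    return os.path.getsize(path) <= parse_size(max_allowed_size_compressed)
```

For example, `parse_size('1.5G')` returns 1610612736 under this sketch's 1024-based assumption.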
Is this value coming from the existing wheel size, or coming from somewhere else?
This seems too big. cuML wheels appear to be closer to 550MB. Source: https://anaconda.org/rapidsai-wheels-nightly/cuml-cu12/files
Maybe set the threshold at 600MB.
This is the existing wheel size + a buffer. It varies by CPU architecture, Python version (because of Cython stuff), and CUDA version (because, for example, we don't use CUDA math lib wheels for CUDA 11).
The largest one I've seen while clicking through the CI logs on this PR was CUDA 11.8.0, Python 3.10, amd64:
checking 'final_dist/cuml_cu11-24.12.0a38-cp310-cp310-manylinux_2_28_x86_64.whl'
----- package inspection summary -----
file size
* compressed size: 1.3G
* uncompressed size: 2.2G
* compression space saving: 42.2%
contents
* directories: 72
* files: 432 (86 compiled)
size by extension
* .so - 2.2G (99.9%)
* .py - 1.4M (0.1%)
* .pyx - 1.2M (0.1%)
* .0 - 0.2M (0.0%)
* .ipynb - 0.1M (0.0%)
* no-extension - 57.2K (0.0%)
* .png - 51.3K (0.0%)
* .pxd - 34.0K (0.0%)
* .txt - 25.7K (0.0%)
* .md - 10.3K (0.0%)
* .h - 2.1K (0.0%)
* .ini - 0.8K (0.0%)
largest files
* (2.2G) cuml/libcuml++.so
* (3.0M) cuml/experimental/fil/fil.cpython-310-x86_64-linux-gnu.so
* (2.9M) cuml/fil/fil.cpython-310-x86_64-linux-gnu.so
* (1.5M) cuml/cluster/hdbscan/hdbscan.cpython-310-x86_64-linux-gnu.so
* (1.5M) cuml/svm/linear.cpython-310-x86_64-linux-gnu.so
------------ check results -----------
errors found while checking: 0
So I'm proposing setting this to around 200MB above that size, so that we'd be notified if the binary size grew beyond that level.
There's nothing special about 1.5GB; it's already far too big to be on PyPI. But I'm proposing some limit so that we get automated feedback from CI about binary size growth and can make informed decisions about whether to do something about it, similar to setting a coverage threshold for tests.
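The threshold arithmetic described above (largest observed wheel plus roughly 200MB of headroom) can be sketched in a few lines of Python. This is a minimal illustration, not the CI implementation: sizes are approximate compressed sizes in MB taken from this thread (1.3G from the inspection log, about 550MB for the CUDA 12 wheels mentioned in review), 1G is treated as 1000MB for simplicity, and the helper names `size_threshold_mb` and `find_oversized` are hypothetical.

```python
def size_threshold_mb(observed_sizes_mb: dict[str, float], buffer_mb: float = 200) -> float:
    """Largest observed compressed wheel size plus a fixed buffer, in MB."""
    return max(observed_sizes_mb.values()) + buffer_mb


def find_oversized(sizes_mb: dict[str, float], threshold_mb: float) -> list[str]:
    """Names of builds whose wheel exceeds the threshold (i.e. would fail the check)."""
    return sorted(name for name, size in sizes_mb.items() if size > threshold_mb)


# Approximate compressed sizes mentioned in this thread (MB, 1G treated as 1000MB).
observed = {
    "cuml-cu11, CUDA 11.8.0, Python 3.10, x86_64": 1300,  # '1.3G' in the log above
    "cuml-cu12 (typical)": 550,  # "closer to 550MB" per the review comment
}

threshold = size_threshold_mb(observed)
print(threshold)  # 1500, matching the '1.5G' setting
print(find_oversized(observed, threshold))  # []
```

Running the same calculation on the CUDA 12 size alone (`size_threshold_mb({"cu12": 550})`) gives 750, which shows how much unused headroom a single CUDA-11-sized limit leaves for CUDA 12 wheels.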
Aaahhh, but CUDA 11 is huge. We only did the CUDA math library wheels work for CUDA 12. https://anaconda.org/rapidsai-wheels-nightly/cuml-cu11/files
Yeah exactly, see cuml/python/cuml/CMakeLists.txt, lines 96 to 104 in 56e5e62.
I’m indifferent as well. Let’s stick to the single definition for now.
Alright, sounds good, thanks for considering it. I'm glad these changes are helping to expose these differences and leading to these conversations 😊
I'd be a fan of having two different limits, mostly because for "not CUDA 11" the limit of 1.5GB might as well be "infinity". As in, if we ever reach it, it will be way too late to course-correct.
Should I make a PR that uses James' suggestion for two limits?
@betatim I've put up PRs in other repos following this suggestion, if you'd like something to copy from for cuml:
bdice left a comment:
Accepting the large threshold for now. Maybe we can make it depend on CUDA version? Or just wait until we drop CUDA 11.
/merge
`cuvs-cu11` wheels are significantly larger than `cuvs-cu12` wheels, because (among other reasons) they are not able to dynamically link to CUDA math library wheels.
In #464, I proposed a size limit for CI checks of "max CUDA 11 wheel size + a buffer". This PR proposes using different thresholds based on CUDA major version, following these discussions:
* rapidsai/cugraph#4754 (comment)
* rapidsai/cuml#6136 (comment)
Authors:
- James Lamb (https://github.com/jameslamb)
Approvers:
- Mike Sarahan (https://github.com/msarahan)
URL: #469
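Choosing the limit by CUDA major version, as described above, could look something like the following minimal Python sketch. The function name `max_allowed_size_compressed` used here as a helper and the CUDA 12 value are hypothetical (the CUDA 12 figure simply reuses the 600MB suggested earlier in this thread); the actual change in #469 may implement this differently.

```python
# Hypothetical per-CUDA-major limits for the compressed wheel size.
# '1.5G' is the value from this PR; '600M' reuses the reviewer's suggestion.
MAX_COMPRESSED_BY_CUDA_MAJOR = {
    11: "1.5G",
    12: "600M",
}


def max_allowed_size_compressed(cuda_version: str) -> str:
    """Pick the size limit from a CUDA version string such as '11.8.0'."""
    major = int(cuda_version.split(".")[0])
    try:
        return MAX_COMPRESSED_BY_CUDA_MAJOR[major]
    except KeyError:
        # Fail loudly so a new CUDA major version forces someone to pick a limit.
        raise ValueError(f"no size limit configured for CUDA {major}") from None
```

Raising on an unknown major version, rather than falling back to the CUDA 11 value, avoids the situation described in review where a new build silently inherits a limit so generous that it might as well be "infinity".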
Description
Contributes to rapidsai/build-planning#110
Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.