
Conversation

@jjacobelli
Contributor

@jjacobelli jjacobelli commented Apr 21, 2022

This PR converts the from-source build we are doing in the GPU test job to a conda build. This is done for the following reasons:

  • It is a required step toward improving the Ops CI/CD setup into a more convenient pipeline
  • It is required to start using the conda compilers and mamba to build RAPIDS packages
  • It removes the need to manually manage and install the dependencies in the GPU job
  • It ensures the packages can be installed
  • It ensures the tests run against the package contents rather than against the build tree. Currently the Python packages are not tested.

This may increase the overall pipeline time, but using mamba should offset that, since mamba is faster than conda at building packages.
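
For illustration, a minimal sketch of the intended GPU-job flow (build the conda package, install it, then run the tests against the installed package); the recipe path, environment variables, and exact commands are assumptions, and the real gpuCI scripts may differ:

  # Hypothetical excerpt of the GPU test job; paths and variables are assumptions.
  export CONDA_BLD_DIR="${WORKSPACE}/.conda-bld"

  # Build the dask-cuda conda package from the recipe instead of building from source.
  conda build conda/recipes/dask-cuda --croot "${CONDA_BLD_DIR}" --python="${PYTHON}"

  # Install the freshly built package from the local build directory.
  gpuci_mamba_retry install -c "${CONDA_BLD_DIR}" dask-cuda

  # Run the test suite against the installed package rather than the build tree.
  python -m pytest -v dask_cuda/tests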

@jjacobelli jjacobelli added DO NOT MERGE Hold off on merging; see PR for details improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 21, 2022
@jjacobelli jjacobelli self-assigned this Apr 21, 2022
@github-actions github-actions bot added the gpuCI gpuCI issue label Apr 21, 2022
@jjacobelli jjacobelli force-pushed the remove-from-sources branch from ec21c59 to c917ad7 Compare April 21, 2022 08:23
@pentschev
Member

Is there some advantage to doing this, or is it just to test conda packaging as well?

@jjacobelli
Contributor Author

jjacobelli commented Apr 21, 2022

Is there some advantage to doing this, or is it just to test conda packaging as well?

That's one of the advantages. We are also doing this to stop using the rapids-build-env package and to start using the conda compilers with mamba to build packages. I will update the PR body with all of this before merging.

@jjacobelli jjacobelli force-pushed the remove-from-sources branch 2 times, most recently from b6360f6 to a7e30b3 Compare April 25, 2022 09:33
@jjacobelli jjacobelli removed the DO NOT MERGE Hold off on merging; see PR for details label Apr 25, 2022
@jjacobelli jjacobelli marked this pull request as ready for review April 25, 2022 09:38
@jjacobelli jjacobelli requested a review from a team as a code owner April 25, 2022 09:38
@jjacobelli jjacobelli force-pushed the remove-from-sources branch from a7e30b3 to 09e49fc Compare April 25, 2022 10:20
@codecov-commenter

codecov-commenter commented Apr 25, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.06@c4d542b).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-22.06    #897   +/-   ##
==============================================
  Coverage                ?   0.00%           
==============================================
  Files                   ?      22           
  Lines                   ?    3075           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?    3075           
  Partials                ?       0           

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4d542b...0d5b7b6. Read the comment docs.

@pentschev
Member

rerun tests

@pentschev
Member

The timeout from dask_cuda.tests.test_spill.test_cudf_cluster_device_spill[params0] is being dealt with in #901.

@pentschev
Member

rerun tests

1 similar comment
@pentschev
Member

rerun tests

@jjacobelli jjacobelli force-pushed the remove-from-sources branch from 09e49fc to 5d1db0f Compare April 26, 2022 09:14
@pentschev
Member

This many timeouts doesn't look normal; even the timeout that was increased in #901 was hit here. @ethyling could you add conda list --show-channels after gpuci_mamba_retry install dask-cuda so I can compare the packages being picked up here vs. what was being picked up before?

@jjacobelli
Contributor Author

This many timeouts doesn't look normal; even the timeout that was increased in #901 was hit here. @ethyling could you add conda list --show-channels after gpuci_mamba_retry install dask-cuda so I can compare the packages being picked up here vs. what was being picked up before?

@pentschev PR updated with this
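
For reference, a minimal sketch of what the added debug step might look like in the GPU test script; the surrounding lines are assumptions, and only the two commands come from the discussion above:

  # Hypothetical excerpt from the GPU test script.
  gpuci_mamba_retry install dask-cuda

  # Dump the resolved environment with channel provenance so the packages picked
  # up here can be compared with what the from-source build used to pull in.
  conda list --show-channels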

@jjacobelli jjacobelli force-pushed the remove-from-sources branch 2 times, most recently from d65e32f to 0d5b7b6 Compare April 27, 2022 11:08
@pentschev
Member

I have been trying for days to reproduce the timeouts on CUDA 11.0, driver 450.80.02, just like in CI, but have been unsuccessful. At this point I can only imagine there is something about the CI machines that makes them just a tiny bit slower, in which case we may try to increase timeouts. I will run the tests a few more times in CI here to see how it behaves and which tests may need increased timeouts.
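
A sketch of the kind of local reproduction attempt described above, running the timing-sensitive spill test in isolation on a machine matching CI (CUDA 11.0, driver 450.80.02); the exact invocation is an assumption:

  # Hypothetical local reproduction of the CI timeout.
  python -m pytest -vs "dask_cuda/tests/test_spill.py::test_cudf_cluster_device_spill[params0]"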

@pentschev
Member

rerun tests

@pentschev
Member

Tests that failed in the latest run are now resolved by #905; rerunning.

@pentschev
Member

rerun tests

2 similar comments
@pentschev
Member

rerun tests

@pentschev
Member

rerun tests

@pentschev
Member

Three successful runs in a row. At this point I would assume we had a hiccup in CI when this was running before and would suggest we move forward as is.

@ethyling is there anything more that needs to be done here or should we consider reviewing/merging this PR?

@jjacobelli
Contributor Author

Three successful runs in a row. At this point I would assume we had a hiccup in CI when this was running before and would suggest we move forward as is.

Nice, thank you @pentschev!

@ethyling is there anything more that needs to be done here or should we consider reviewing/merging this PR?

I would prefer to wait until the corresponding PRs pass for all the RAPIDS projects before merging this one.

Member

@ajschmidt8 ajschmidt8 left a comment

Approved with a question.

@jjacobelli jjacobelli force-pushed the remove-from-sources branch 2 times, most recently from 772733a to b0024db Compare May 10, 2022 11:20
@github-actions github-actions bot added the conda conda issue label May 10, 2022

source:
-  path: ../../..
+  git_url: ../../..
Contributor Author

This change is required to prevent this issue:

AssertionError: Can't merge/copy source into subdirectory of itself.  Please create separate spaces for these things.
  src: /workspace
  dst: /workspace/.conda-bld/work
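
To illustrate the failure mode (a hypothetical reproduction; the checkout path and recipe location are assumptions):

  # Building from inside the checkout, with the build root nested in the same tree.
  cd /workspace
  conda build conda/recipes/dask-cuda --croot /workspace/.conda-bld
  # With `path: ../../..` conda-build copies the source (/workspace) into
  # /workspace/.conda-bld/work, which is a subdirectory of the source itself and
  # triggers the AssertionError above. `git_url: ../../..` makes conda-build
  # clone the repository into the work directory instead, avoiding the nesting.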

@jjacobelli
Contributor Author

rerun tests

1 similar comment
@jjacobelli
Contributor Author

rerun tests

@jjacobelli jjacobelli force-pushed the remove-from-sources branch from b0024db to 601f9a7 Compare May 10, 2022 14:00
@jjacobelli jjacobelli force-pushed the remove-from-sources branch from 601f9a7 to 80fd80a Compare May 10, 2022 14:18
@pentschev
Member

The failing test is due to dask/distributed#6320, which was caused by the dask/distributed#5910 merge.

@jjacobelli
Contributor Author

The failing test is due to dask/distributed#6320, which was caused by the dask/distributed#5910 merge.

Do you know if this will be fixed soon?

@pentschev
Member

Sorry @ethyling, I don't know if that is going to be fixed soon. I marked the test as xfail in #908 to unblock CI for now.

@pentschev
Member

rerun tests

Member

@pentschev pentschev left a comment

LGTM, thanks for the work here @ethyling.

@pentschev
Member

Feel free to merge if this is ready to go.

@jjacobelli
Contributor Author

Thank you for your help with this PR @pentschev!

@jjacobelli
Contributor Author

@gpucibot merge


Labels

conda (conda issue), gpuCI (gpuCI issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
