Skip to content

Conversation

@jdye64
Copy link
Collaborator

@jdye64 jdye64 commented Nov 3, 2021

Under certain circumstances importing dask_cuda after any dask_sql imports will cause an underlying segfault to be triggered in the JVM. In tests thus far adjusting that import order in context.py prevents that from happening. Subsequent PRs may be required after further testing.

@jdye64 jdye64 marked this pull request as draft November 3, 2021 16:42
@codecov-commenter
Copy link

codecov-commenter commented Nov 3, 2021

Codecov Report

Merging #294 (1ba9a26) into main (e48d9c1) will decrease coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #294      +/-   ##
==========================================
- Coverage   95.99%   95.89%   -0.11%     
==========================================
  Files          64       65       +1     
  Lines        2797     2800       +3     
  Branches      421      418       -3     
==========================================
  Hits         2685     2685              
- Misses         71       73       +2     
- Partials       41       42       +1     
Impacted Files Coverage Δ
dask_sql/context.py 99.09% <ø> (+<0.01%) ⬆️
dask_sql/physical/utils/groupby.py 100.00% <100.00%> (ø)
dask_sql/physical/utils/sort.py 83.33% <0.00%> (-7.06%) ⬇️
dask_sql/physical/rel/custom/__init__.py 100.00% <0.00%> (ø)
dask_sql/physical/rel/custom/distributeby.py 86.36% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77f1d87...1ba9a26. Read the comment docs.

@jdye64
Copy link
Collaborator Author

jdye64 commented Nov 3, 2021

I plan to add a pytest with some of the common occurrences where we were seeing the segfault occur so we can guard against those in the future. I'm just compiling that list offline but will post a pytest in the coming days.

@charlesbluca charlesbluca mentioned this pull request Nov 4, 2021
@jdye64
Copy link
Collaborator Author

jdye64 commented Nov 4, 2021

rerun tests

@charlesbluca
Copy link
Collaborator

Cool, looks like this triggers the segfault! Is there any way we can wrap this up in a GPU-only test?

@jdye64
Copy link
Collaborator Author

jdye64 commented Nov 4, 2021

Cool, looks like this triggers the segfault! Is there any way we can wrap this up in a GPU-only test?

I think it is actually failing to find dask_cuda still

@charlesbluca
Copy link
Collaborator

Checking the gpuCI run, it looks like dask-cuda is there in the conda list step, so I'm assuming it being imported after dask_sql.Context is what's causing the segfault here. However it is failing to find the package in the host tests since we don't install any GPU requirements there.

Would we need to add dask-cuda to the host test requirements, or is there a way to contain the conditions for this segfault (i.e. the dask-cuda import) in a test that can be disabled on host?

@jdye64
Copy link
Collaborator Author

jdye64 commented Nov 4, 2021

Ok good news. So I commented out the fix in commit and CI failed. This is actually good. Then I un-commented the fix out again in commit and it works. Therefore I am satisfied this at least solves the segfault issue we are interested in.

@jdye64 jdye64 marked this pull request as ready for review November 8, 2021 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants