Correctly align trees in experimental FIL#6397
Correctly align trees in experimental FIL#6397rapids-bot[bot] merged 37 commits intorapidsai:branch-25.04from
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
hcho3
left a comment
There was a problem hiding this comment.
LGTM. I really like how you defined a good set of abstractions and primitives for tree traversal.
|
cudf-pandas failures are unrelated. One is a recently-introduced issue that is being worked on separately. The other is a flaky issue that has been around for sometime, is related to original FIL, and will probably be eliminated by promoting new FIL to stable. |
jakirkham
left a comment
There was a problem hiding this comment.
Thanks Will! 🙏
Had a question on the OpenMP addition below
| # clang 15 required by libcudacxx. | ||
| - clang==15.0.7 | ||
| - clang-tools==15.0.7 | ||
| - llvm-openmp==15.0.7 |
There was a problem hiding this comment.
Is there more context on how OpenMP is used in the clang-tidy step?
Is OpenMP needed at build and runtime as well?
There was a problem hiding this comment.
Ah, my apologies! Should have addressed that in the PR description, but I forgot to update it. cuML should be compilable and runnable with or without OpenMP. The problem is that when we clang-tidy, we fairly naively forward our build flags on to clang-tidy itself, including -fopenmp, which sets the _OPENMP compilation definition.
For actual compilation, this is fine, since gcc compilation makes omp.h available by default. That's no longer true with llvm, so we need to either explicitly make that header available to clang-tidy or remove the -fopenmp flag from the clang-tidy invocation. Following offline discussion with @robertmaynard, I opted for providing the header. That will ensure that we're tidying OpenMP-only code paths.
This is the first PR that exposes this problem because we do not clang-tidy our .cu files, and this is the first time a .cpp file required OpenMP.
|
/merge |




Due to a bug in the import code, experimental FIL was previously not making use of the
align_bytesargument correctly. The effect was not just a failure to take advantage of cache line boundaries but a severe pessimization in which padding nodes were inserted in the forest structure at highly non-optimal places.This PR corrects this, resulting in a substantial performance improvement. It also introduces the
layeredlayout type, in which nodes of the same depth are stored together. This allows for a moderate performance improvement in some models. It also allows CPU FIL to intelligently set the number of threads rather than accepting the highly non-optimal default. This provides a significant performance improvement for small batch size.