Use cuco::static_set in the hash-based groupby#14813
Use cuco::static_set in the hash-based groupby#14813rapids-bot[bot] merged 27 commits intorapidsai:branch-24.04from
Conversation
6213d8e to
166ed49
Compare
…428) This PR fixes a bug in the OA base class where the erased key sentinel value should have been initialized by the empty key sentinel if not specified. Tests are updated to exercise this issue. Needed by rapidsai/cudf#14813
|
Question to reviewers: is it worth doing runtime dispatching based on whether nested types are involved or not? Benchmark results are shown below, TLDR:
Based on our past tuning experience, e.g.: cudf/cpp/src/search/contains_table.cu Lines 158 to 165 in 3ba63c3 CGSize == 1 for all types.
groupby_max[0] Quadro RTX 8000
groupby_struct_keys[0] Quadro RTX 8000
|
|
@vyasr Thanks for the review, have you reviewed both cpp and python or do I need another cpp/python approval to merge the PR? |
|
I reviewed both. I think the main question on the Python side would be whether we need to update the docs in https://github.com/rapidsai/cudf/blob/branch-24.04/docs/cudf/source/user_guide/pandas-comparison.md#result-ordering to also mention |
|
I think we should do that, yes (this PR is then a breaking change) |
| using probing_scheme_type = cuco::linear_probing< | ||
| 1, ///< Number of threads used to handle each input key |
There was a problem hiding this comment.
@PointKernel You mentioned:
Based on our past tuning experience, e.g.:
cudf/cpp/src/search/contains_table.cu
using a larger CG size (like 2 or 4) for nested types can bring significant speedups compared to the current CGSize == 1 for all types.
Do we need to adopt that here? Do we need an issue or TODO describing that optimization?
There was a problem hiding this comment.
Do we need to adopt that here?
Probably not. That requires nontrivial changes to the current code and the benefit is unclear, i.e., no users actually complained about the groupby performance with nested types. I'm inclined to look into it until we have the bandwidth or someone raises an issue about it. Does it sound good to you?
There was a problem hiding this comment.
I’d like some kind of note in the code or an issue to make sure that we are aware of this optimization potential for the future. Otherwise, no action needed in terms of implementation.
|
Also CC @mroeschke @galipremsagar we may have to introduce an extra sorting in pandas compatibility mode due to the ordering change in this PR for cudf.pandas to behave as expected. No need to make any changes yet, just something to keep on your radars. |
|
/merge |
…#428) This PR fixes a bug in the OA base class where the erased key sentinel value should have been initialized by the empty key sentinel if not specified. Tests are updated to exercise this issue. Needed by rapidsai/cudf#14813
Description
Depends on #14849
Contributes to #12261
This PR migrates hash groupby to use the new
cuco::static_setdata structure. It doesn't change any existing libcudf behavior but uncovers the fact that the cudf pythonvalue_countsdoesn't guarantee output orders thus the PR becomes a breaking change.Checklist