Labels
cub (for all items related to CUB)
Description
As a lower-level interface, CUB should optimize for flexibility and performance. As a result, CUB will not guarantee that a large input works by default. However, it should enable users to specify their desired offset type.
This means CUB should not perform any dynamic dispatch based on the input size. Instead, users should have a way to statically specify the offset type. In previous discussions, we favored making the type of num_items a template parameter and inferring the offset type from it.
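The idea of inferring the offset type from the type of num_items can be illustrated with a small host-only sketch. This is not CUB code: `offset_type_for` and `sum_items` are hypothetical names introduced only for illustration, and the sketch assumes the simplest possible rule, namely that the offset type is exactly the integer type the caller passed. The point is that the choice is made at compile time, with no runtime branch on the input size.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait (not a CUB API): the offset type is simply the integer
// type the caller used for num_items.
template <typename NumItemsT>
struct offset_type_for
{
  static_assert(std::is_integral<NumItemsT>::value,
                "num_items must be an integer type");
  using type = NumItemsT;
};

// Hypothetical algorithm entry point. All index arithmetic uses offset_t,
// which is fixed at compile time by the type of num_items.
template <typename InputIt, typename NumItemsT>
long long sum_items(InputIt in, NumItemsT num_items)
{
  using offset_t = typename offset_type_for<NumItemsT>::type;

  long long total = 0;
  for (offset_t i = 0; i < num_items; ++i)
  {
    total += in[i];
  }
  return total;
}
```

Under this assumption, a caller who passes a 32-bit num_items gets 32-bit index arithmetic, and a caller who needs more than 2^31 - 1 items opts in by passing a 64-bit num_items.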
Design-related research
- Identify CUB algorithms that do/do not support large inputs #1408
- Determine and finalize design for large input support in CUB #1454
Testing large number of items
- Add tests for large number of items & large number of segments to DeviceSegmentedRadixSort #2139
- Add tests for large number of items & large number of segments to DeviceSegmentedSort #2140
- Add tests for large number of items in DeviceScan::*ByKey
Enable large num_items in CUB algorithms that are sensitive to the choice of offset_t
- Add support for large num_items to device_scan.cuh #2062
- Add support for large num_items to DeviceScan::*ByKey #2458
- Add support for large num_items to device_select.cuh #1422
- Add support for large num_items to device_partition.cuh #1437
- Add support for large num_items to DevicePartition::ThreeWayPartition #2442
- Add support for large num_items to DeviceReduce::{ArgMin,ArgMax} #2515
- Add support for large num_items to cub::DeviceTransform #2714
- Add support for large number of items and large number of segments to device_segmented_sort.cuh #3132
- Add support for large num_items to device_merge.cuh #3134
- Add support for large num_items to device_segmented_radix_sort.cuh #3133
- Add support for large num_items to DeviceSegmentedReduce::{ArgMin,ArgMax} #2522
- Add support for large number of segments to DeviceSegmentedReduce #3242
- Add support for large num_items to DeviceRunLengthEncode::Encode #2521
- Add support for large num_items to DeviceRunLengthEncode::NonTrivialRuns #2520
- Add support for large number of ranges to Device{Copy,Memcpy}::Batched #3622
- Add support for large num_items to device_spmv.cuh
Limit the number of kernel template instantiations by reducing the set of offset types
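One way to picture this task is a compile-time mapping that collapses every integer num_items type onto a small set of offset types, so that a kernel template is instantiated at most once per member of that set. The sketch below is an assumption for illustration only: the names `reduced_offset`, `kernel_stand_in`, and `dispatch` are hypothetical, and the two-type policy (32-bit and 64-bit unsigned) is one possible choice, not necessarily the one CUB adopts.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical policy (an assumption, not CUB's actual rule): integer types of
// up to 4 bytes map to a 32-bit unsigned offset, wider types to a 64-bit one.
template <typename NumItemsT>
struct reduced_offset
{
  static_assert(std::is_integral<NumItemsT>::value,
                "num_items must be an integer type");
  using type = typename std::conditional<(sizeof(NumItemsT) <= 4),
                                         std::uint32_t,
                                         std::uint64_t>::type;
};

template <typename NumItemsT>
using reduced_offset_t = typename reduced_offset<NumItemsT>::type;

// Hypothetical stand-in for a kernel template. It only reports the width of
// the offset type it was instantiated with.
template <typename OffsetT>
std::size_t kernel_stand_in(OffsetT /*num_items*/)
{
  return sizeof(OffsetT);
}

// Hypothetical dispatch layer: whatever integer type the caller passes, only
// two instantiations of kernel_stand_in can ever be generated.
// Assumes num_items is non-negative, so the conversion preserves its value.
template <typename NumItemsT>
std::size_t dispatch(NumItemsT num_items)
{
  using offset_t = reduced_offset_t<NumItemsT>;
  return kernel_stand_in(static_cast<offset_t>(num_items));
}
```

With such a mapping, callers passing short, int, or unsigned all share one instantiation, while long long and std::uint64_t share the other.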
Clean up interim testing infrastructure
- Switch tests for large number of items to use Device* interface in DeviceSelect
- Switch tests for large number of items to use Device* interface in DeviceScan
Documentation
- We want to be explicit about supported and tested offset types in the Dispatch interface (see Port thrust::merge[_by_key] to CUB #1817 (comment))
Status
In Progress