Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
6f09ff5 to 123fc37
eb72c88 to c3432de
/ok to test
c3432de to 1f6f315
/ok to test
Current status:
Current APIs are based on the dispatch signature:
    cudaError_t Dispatch(
        void *temp_storage,
        size_t &temp_storage_bytes,
        InputIteratorT d_in,
        OutputIteratorT d_out,
        int64_t num_segments,
        BeginOffsetsIteratorInputT d_in_begin_offsets,
        EndOffsetsIteratorInputT d_in_end_offsets,
        BeginOffsetsIteratorOutputT d_out_begin_offsets,
        ScanOpT scan_op,
        InitialValueT init_value,
        cudaStream_t stream = 0);
A common situation is when
In general, when
To compute
We could provide an API where
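The dispatch signature takes separate begin and end offsets for the input but only begin offsets for the output. A sequential host-side sketch of the semantics we assume for it may help reviewers: an exclusive scan, each segment seeded with init_value, with each output segment placed at its d_out_begin_offsets entry. The function name reference_segmented_exclusive_scan is hypothetical; temp_storage and stream are omitted, and this is not the CUB implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sequential reference (not CUB code) for a three-offsets segmented
// exclusive scan. Segment i reads d_in[in_begin[i], in_end[i]) and writes its
// exclusive scan to d_out starting at d_out[out_begin[i]]. Every segment
// restarts from init_value.
template <typename InputIt, typename OutputIt, typename BeginInIt, typename EndInIt,
          typename BeginOutIt, typename ScanOp, typename T>
void reference_segmented_exclusive_scan(InputIt d_in, OutputIt d_out, std::int64_t num_segments,
                                        BeginInIt in_begin, EndInIt in_end,
                                        BeginOutIt out_begin, ScanOp scan_op, T init_value)
{
  for (std::int64_t seg = 0; seg < num_segments; ++seg)
  {
    T running = init_value;      // prefix of everything before the current element
    auto out  = out_begin[seg];  // output position of this segment's first element
    for (auto i = in_begin[seg]; i < in_end[seg]; ++i)
    {
      d_out[out++] = running;                // write the prefix before including d_in[i]
      running      = scan_op(running, d_in[i]);
    }
  }
}

// Two input segments with a gap between them, packed back to back in the output.
std::vector<int> example_three_offsets()
{
  std::vector<int> in{1, 2, 3, 100, 100, 4, 5};  // positions 3 and 4 belong to no segment
  std::vector<int> in_begin{0, 5};
  std::vector<int> in_end{3, 7};
  std::vector<int> out_begin{0, 3};
  std::vector<int> out(5, -1);
  reference_segmented_exclusive_scan(in.data(), out.data(), 2, in_begin.data(), in_end.data(),
                                     out_begin.data(), std::plus<int>{}, 0);
  return out;  // {0, 1, 3, 0, 4}
}
```

Because output placement is decoupled from input placement, the example packs two non-adjacent input segments into a contiguous output buffer.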
da9743a to 6082574
/ok to test
/ok to test
6b982d1 to ca360ee
826c40f to ba9ffc3
Add file test/catch2_test_segmented_scan_api.cu, which contains several examples of using the two-offsets and three-offsets APIs. Some of these examples are referenced from docstrings.
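A common way to drive a two-offsets segmented API, as in the documented examples for CUB's existing segmented reduce algorithms, is a single CSR-style offsets array of num_segments + 1 entries: offsets serves as the begin offsets and offsets + 1 as the end offsets. The sketch below is a hypothetical sequential reference (an inclusive sum where input and output share the same offsets). It is not one of the examples from test/catch2_test_segmented_scan_api.cu, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference for a two-offsets segmented inclusive sum:
// segment i covers [begin[i], end[i]) in both the input and the output.
template <typename T, typename BeginIt, typename EndIt>
void reference_two_offsets_inclusive_sum(const std::vector<T>& in, std::vector<T>& out,
                                         std::size_t num_segments, BeginIt begin, EndIt end)
{
  for (std::size_t seg = 0; seg < num_segments; ++seg)
  {
    T running{};  // inclusive sums restart at zero in every segment
    for (auto i = begin[seg]; i < end[seg]; ++i)
    {
      running += in[i];
      out[i] = running;  // output position equals input position
    }
  }
}

// CSR-style offsets: segment i is [offsets[i], offsets[i + 1]).
std::vector<int> example_two_offsets()
{
  std::vector<int> in{1, 2, 3, 4, 5, 6};
  std::vector<int> offsets{0, 2, 2, 6};  // three segments; the middle one is empty
  std::vector<int> out(in.size(), 0);
  // Begin offsets come from offsets, end offsets from offsets + 1.
  reference_two_offsets_inclusive_sum(in, out, offsets.size() - 1, offsets.data(),
                                      offsets.data() + 1);
  return out;  // {1, 3, 3, 7, 12, 18}
}
```

Empty segments need no special handling: their begin and end offsets coincide, so the inner loop simply does not run.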
Use examples from catch2_test_segmented_scan_api.cu via literalinclude in the docstrings of the DeviceSegmentedScan algorithm.
std::exclusive_scan is not available in the <numeric> header in GCC 7.
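The C++17 exclusive scan in <numeric> is std::exclusive_scan. Where a toolchain's standard library lacks it, as reported here for GCC 7, a small hand-written helper with the same sequential semantics can stand in for it in tests. The name portable_exclusive_scan is hypothetical. This is not necessarily how the commit resolved the issue.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Sequential exclusive scan with the semantics of C++17
// std::exclusive_scan(first, last, d_first, init, op). Hypothetical helper name.
template <typename InputIt, typename OutputIt, typename T, typename BinaryOp>
OutputIt portable_exclusive_scan(InputIt first, InputIt last, OutputIt d_first, T init, BinaryOp op)
{
  for (; first != last; ++first, ++d_first)
  {
    T next   = op(init, *first);  // read the input before writing, so in-place use stays correct
    *d_first = init;
    init     = next;
  }
  return d_first;
}

std::vector<int> example_exclusive_scan()
{
  std::vector<int> v{3, 1, 4, 1, 5};
  std::vector<int> out(v.size());
  portable_exclusive_scan(v.begin(), v.end(), out.begin(), 10, std::plus<int>{});
  return out;  // {10, 13, 14, 18, 19}
}
```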
This works around an issue involving CTK 13.0 support for extended lambdas and the CCCL testing harness.
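The message does not say how the issue was worked around. One general way to avoid depending on extended lambdas is to replace the lambda with a named function object whose call operator is marked for both host and device. The sketch below shows only that general pattern. The names max_op and SKETCH_HOST_DEVICE are hypothetical, and this is not necessarily what the commit did.

```cpp
#include <cassert>

// Expands to __host__ __device__ under nvcc and to nothing in a plain host compiler.
#if defined(__CUDACC__)
#  define SKETCH_HOST_DEVICE __host__ __device__
#else
#  define SKETCH_HOST_DEVICE
#endif

// Named function object standing in for an extended lambda such as
//   [] __host__ __device__ (int a, int b) { return a > b ? a : b; }
// A named type like this does not require nvcc's --extended-lambda flag.
struct max_op
{
  SKETCH_HOST_DEVICE int operator()(int a, int b) const
  {
    return a > b ? a : b;
  }
};
```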
Adhere to the C++ coding style. Renamed dispatch/kernels/segmented_scan.cuh to dispatch/kernels/kernel_segmented_scan.cuh to follow the corresponding upstream change.
Reimplemented tuning for segmented scan based on an excellent review suggestion: set the AgentSegmentedScanPolicyT template values from the AgentScanPolicyT instance extracted from the same node in the chain of policies. This assumes that the policy chain in detail::segmented_scan::policy_hub is the same as the chain in detail::scan::policy_hub.
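The idea of deriving the segmented-scan policy from the scan policy stored in the same node of the policy chain can be sketched at compile time. All type names below (AgentScanPolicySketch, segmented_from_scan, ScanPolicy500, and so on) are hypothetical and far simpler than CUB's real policy hubs. The sketch only shows each segmented node copying its tuning values from its scan counterpart.

```cpp
#include <cassert>

// Hypothetical stand-ins for CUB-style agent policies (names and fields are illustrative).
template <int BlockThreads, int ItemsPerThread>
struct AgentScanPolicySketch
{
  static constexpr int block_threads    = BlockThreads;
  static constexpr int items_per_thread = ItemsPerThread;
};

template <int BlockThreads, int ItemsPerThread>
struct AgentSegmentedScanPolicySketch
{
  static constexpr int block_threads    = BlockThreads;
  static constexpr int items_per_thread = ItemsPerThread;
};

// Build the segmented-scan policy from the scan policy of the same chain node.
template <typename ScanPolicyT>
using segmented_from_scan =
  AgentSegmentedScanPolicySketch<ScanPolicyT::block_threads, ScanPolicyT::items_per_thread>;

// Two nodes of a hypothetical scan policy chain (e.g. for two architectures).
struct ScanPolicy500 { using ScanPolicyT = AgentScanPolicySketch<128, 12>; };
struct ScanPolicy800 { using ScanPolicyT = AgentScanPolicySketch<256, 16>; };

// The segmented chain mirrors the scan chain node by node.
template <typename ScanNode>
struct SegmentedNode
{
  using SegmentedScanPolicyT = segmented_from_scan<typename ScanNode::ScanPolicyT>;
};

static_assert(SegmentedNode<ScanPolicy500>::SegmentedScanPolicyT::items_per_thread == 12,
              "segmented node reuses the scan node's items per thread");
static_assert(SegmentedNode<ScanPolicy800>::SegmentedScanPolicyT::block_threads == 256,
              "segmented node reuses the scan node's block size");
```

The design only holds if both chains have the same nodes, which is exactly the assumption stated in the commit message.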
When AgentSegmentedScanPolicyT is built from the parameters of scan_hub::PolicyXY::ScanPolicyT, which have already been mem-bound scaled, mem-bound scaling must not be applied again. Therefore cub::detail::segmented_scan::policy_hub implements a private NoScaling and applies it in the translate_agent helper type.
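A simplified, hypothetical version of mem-bound scaling that only adjusts items per thread shows why double scaling is a problem. CUB's real MemBoundScaling also adjusts block threads, and its exact formula may differ. Applying the sketch to values that were already scaled shrinks them a second time, while an identity policy in the spirit of NoScaling passes them through unchanged. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical mem-bound scaling: items per thread are tuned for 4-byte
// types and scaled by 4 / sizeof(T), clamped to [1, 2 * nominal].
template <int Nominal4BItemsPerThread, typename T>
struct MemBoundScalingSketch
{
  static constexpr int raw   = Nominal4BItemsPerThread * 4 / static_cast<int>(sizeof(T));
  static constexpr int upper = raw < 2 * Nominal4BItemsPerThread ? raw : 2 * Nominal4BItemsPerThread;
  static constexpr int items_per_thread = upper > 1 ? upper : 1;
};

// Identity "scaling": keeps values that were already scaled.
template <int ItemsPerThread, typename T>
struct NoScalingSketch
{
  static constexpr int items_per_thread = ItemsPerThread;
};

using Value = std::int64_t;  // 8-byte value type

constexpr int scaled_once  = MemBoundScalingSketch<16, Value>::items_per_thread;          // 16 * 4 / 8 = 8
constexpr int scaled_twice = MemBoundScalingSketch<scaled_once, Value>::items_per_thread; // 8 * 4 / 8 = 4 (unintended)
constexpr int reused       = NoScalingSketch<scaled_once, Value>::items_per_thread;       // stays 8
```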
Per PR review feedback, use a prefix operator to track the exclusive prefix value across iterations. Improved readability of the iteration loop by introducing the private methods ScanFirstTile and ScanLaterTile, which hide the if constexpr logic. The refactoring also uncovered a bug, now fixed, where the exclusive_prefix computed for ExclusiveScan did not account for init_value.
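A sequential sketch that scans one segment in fixed-size tiles illustrates the prefix-operator approach and why init_value matters. A hypothetical RunningPrefixOp, similar in spirit to the prefix callback that cub::BlockScan accepts, receives each tile's aggregate and returns that tile's exclusive prefix. Seeding it with init_value makes the first tile, and therefore every later tile, correct. Seeding it with the identity instead would drop init_value from every output, which is the kind of bug described above. The names are illustrative, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical running-prefix functor: returns the exclusive prefix for the current
// tile and folds that tile's aggregate into the running value for the next tile.
template <typename T, typename ScanOp>
struct RunningPrefixOp
{
  T running;
  ScanOp op;
  T operator()(T tile_aggregate)
  {
    T prefix = running;
    running  = op(running, tile_aggregate);
    return prefix;
  }
};

// Exclusive scan of one segment processed in tiles of tile_size (> 0) elements.
// The running prefix starts at init_value, so the first tile already includes it.
template <typename T, typename ScanOp>
std::vector<T> tiled_exclusive_scan(const std::vector<T>& in, std::size_t tile_size,
                                    ScanOp op, T init_value)
{
  assert(tile_size > 0);
  std::vector<T> out(in.size());
  RunningPrefixOp<T, ScanOp> prefix_op{init_value, op};
  for (std::size_t tile_begin = 0; tile_begin < in.size(); tile_begin += tile_size)
  {
    std::size_t tile_end = tile_begin + tile_size < in.size() ? tile_begin + tile_size : in.size();
    // 1. Aggregate of the (non-empty) tile.
    T aggregate = in[tile_begin];
    for (std::size_t i = tile_begin + 1; i < tile_end; ++i)
      aggregate = op(aggregate, in[i]);
    // 2. Exclusive prefix for this tile from the running prefix operator.
    T prefix = prefix_op(aggregate);
    // 3. Exclusive scan inside the tile, seeded with that prefix.
    for (std::size_t i = tile_begin; i < tile_end; ++i)
    {
      out[i] = prefix;
      prefix = op(prefix, in[i]);
    }
  }
  return out;
}
```

For example, scanning {1, 2, 3, 4, 5} in tiles of 2 with std::plus<int> and init_value 10 gives {10, 11, 13, 16, 20}, the same result as a single sequential exclusive scan.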
Instead, start with a default single-setting policy.
8cfc7d9 to c780fba
🥳 CI Workflow Results: 🟩 Finished in 4h 35m: Pass: 100%/81 | Total: 3d 06h | Max: 4h 35m | Hits: 79%/75140. See results here.
Description
closes #5598
Checklist