- [x] #3245
- [x] Fix temp storage computation when using non-double buffer invocation
- [x] Fix test failures for large segments
- [x] Add support for large number of segments
- [x] Add benchmarks for `DeviceSegmentedRadixSort`