Commit 233dad3
authored
Optimize partition_validity function used in sort kernels (#7937)
# Which issue does this PR close?
Optimize `partition_validity` function used in sort kernels
- Preallocate vectors based on known null counts
- Avoid dynamic dispatch by calling `NullBuffer::is_valid` instead of
`Array::is_valid`
- Avoid capacity checks inside loop by writing to `spare_capacity_mut`
instead of using `push`
- Closes #7936.
# Rationale for this change
Microbenchmark results for `sort_kernels` compared to `main`, only
looking at benchmarks matching "nulls to indices":
```
sort i32 nulls to indices 2^10
time: [4.9325 µs 4.9370 µs 4.9422 µs]
change: [−20.303% −20.133% −19.974%] (p = 0.00 < 0.05)
Performance has improved.
sort i32 nulls to indices 2^12
time: [20.096 µs 20.209 µs 20.327 µs]
change: [−26.819% −26.275% −25.697%] (p = 0.00 < 0.05)
Performance has improved.
sort f32 nulls to indices 2^12
time: [26.329 µs 26.366 µs 26.406 µs]
change: [−29.487% −29.331% −29.146%] (p = 0.00 < 0.05)
Performance has improved.
sort string[0-10] nulls to indices 2^12
time: [70.667 µs 70.762 µs 70.886 µs]
change: [−20.057% −19.935% −19.819%] (p = 0.00 < 0.05)
Performance has improved.
sort string[0-100] nulls to indices 2^12
time: [101.98 µs 102.44 µs 102.99 µs]
change: [−0.3501% +0.0835% +0.4918%] (p = 0.71 > 0.05)
No change in performance detected.
sort string[0-400] nulls to indices 2^12
time: [84.952 µs 85.024 µs 85.102 µs]
change: [−5.3969% −4.9827% −4.6421%] (p = 0.00 < 0.05)
Performance has improved.
sort string[10] nulls to indices 2^12
time: [72.486 µs 72.664 µs 72.893 µs]
change: [−14.937% −14.781% −14.599%] (p = 0.00 < 0.05)
Performance has improved.
sort string[100] nulls to indices 2^12
time: [71.354 µs 71.606 µs 71.902 µs]
change: [−17.207% −16.795% −16.373%] (p = 0.00 < 0.05)
Performance has improved.
sort string[1000] nulls to indices 2^12
time: [73.088 µs 73.195 µs 73.311 µs]
change: [−16.705% −16.599% −16.483%] (p = 0.00 < 0.05)
Performance has improved.
sort string_view[10] nulls to indices 2^12
time: [32.592 µs 32.654 µs 32.731 µs]
change: [−15.722% −15.512% −15.310%] (p = 0.00 < 0.05)
Performance has improved.
sort string_view[0-400] nulls to indices 2^12
time: [32.981 µs 33.074 µs 33.189 µs]
change: [−25.570% −25.132% −24.700%] (p = 0.00 < 0.05)
Performance has improved.
sort string_view_inlined[0-12] nulls to indices 2^12
time: [28.467 µs 28.496 µs 28.529 µs]
change: [−22.978% −22.786% −22.574%] (p = 0.00 < 0.05)
Performance has improved.
sort string[10] dict nulls to indices 2^12
time: [94.463 µs 94.503 µs 94.542 µs]
change: [−11.386% −11.165% −10.961%] (p = 0.00 < 0.05)
Performance has improved.
```
# Are these changes tested?
Covered by existing tests
# Are there any user-facing changes?
No, the method is internal to the sort kernels.1 parent d0fa24e commit 233dad3
1 file changed
+34
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
180 | 180 | | |
181 | 181 | | |
182 | 182 | | |
183 | | - | |
184 | | - | |
185 | | - | |
186 | | - | |
187 | | - | |
188 | | - | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
189 | 215 | | |
| 216 | + | |
| 217 | + | |
190 | 218 | | |
191 | 219 | | |
192 | 220 | | |
| |||
0 commit comments