Commit cfe012a
[SPARK-32629][SQL] Track metrics of BitSet/OpenHashSet in full outer SHJ
### What changes were proposed in this pull request?
This is a follow-up to #29342 that does two things:
* Per #29342 (comment), change from Java `HashSet` to Spark's in-house `OpenHashSet` to track matched rows for non-unique join keys. I checked the `OpenHashSet` implementation, which is built from a key index (`OpenHashSet._bitset`, a `BitSet`) and a key array (`OpenHashSet._data`, an `Array`). Java `HashSet` is built on `HashMap`, which stores values in a linked list of `Node` objects and in theory should take more memory than `OpenHashSet`. I reran the same benchmark query used in #29342 and verified that the query performs similarly with `HashSet` and `OpenHashSet`.
* Track metrics of the extra data structure (`BitSet`/`OpenHashSet`) for full outer SHJ. This depends on the change above, because there seems to be no easy way to get the memory size of a Java `HashSet`.
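
To illustrate why a set built from a bit index plus a flat key array is easy to size, here is a minimal sketch of an open-addressing set of `Long` keys. It is not Spark's `OpenHashSet`: the class name `SimpleOpenLongSet`, the 0.7 load factor, the linear probing scheme, and the `estimatedPayloadBytes` helper are all assumptions made for this example, and the estimate ignores JVM object headers.

```scala
// Hypothetical illustration only; this is NOT Spark's OpenHashSet.
// Layout: a BitSet marking occupied slots plus a flat array of keys.
final class SimpleOpenLongSet(initialCapacity: Int = 16) {
  require(
    initialCapacity > 0 && (initialCapacity & (initialCapacity - 1)) == 0,
    "capacity must be a power of two")

  private var capacity = initialCapacity
  private var occupied = new java.util.BitSet(capacity) // key index
  private var data = new Array[Long](capacity)          // key array
  private var count = 0

  def size: Int = count

  // Linear probing: returns the slot holding `key`, or the first free slot.
  private def slotFor(
      key: Long, bits: java.util.BitSet, arr: Array[Long], cap: Int): Int = {
    var pos = java.lang.Long.hashCode(key) & (cap - 1)
    while (bits.get(pos) && arr(pos) != key) {
      pos = (pos + 1) & (cap - 1)
    }
    pos
  }

  def add(key: Long): Unit = {
    val pos = slotFor(key, occupied, data, capacity)
    if (!occupied.get(pos)) {
      occupied.set(pos)
      data(pos) = key
      count += 1
      // Assumed load factor of 0.7; it guarantees a free slot always exists.
      if (count * 10 > capacity * 7) grow()
    }
  }

  def contains(key: Long): Boolean =
    occupied.get(slotFor(key, occupied, data, capacity))

  // Doubles the capacity and re-inserts every stored key.
  private def grow(): Unit = {
    val newCap = capacity * 2
    val newBits = new java.util.BitSet(newCap)
    val newData = new Array[Long](newCap)
    var i = occupied.nextSetBit(0)
    while (i >= 0) {
      val pos = slotFor(data(i), newBits, newData, newCap)
      newBits.set(pos)
      newData(pos) = data(i)
      i = occupied.nextSetBit(i + 1)
    }
    capacity = newCap
    occupied = newBits
    data = newData
  }

  // Rough payload size: 8 bytes per key slot plus 1 bit per slot.
  def estimatedPayloadBytes: Long =
    capacity.toLong * 8L + (capacity.toLong + 7L) / 8L
}
```

Because the whole structure is two flat buffers, its footprint follows directly from the capacity. A node-based `HashSet` allocates a separate object per entry, which is what makes its size hard to report as a metric.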
### Why are the changes needed?
To surface the memory usage of full outer SHJ more accurately.
This can help users/developers to debug/improve full outer SHJ.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added unit test in `SQLMetricsSuite.scala`.
Closes #29566 from c21/add-metrics.
Authored-by: Cheng Su <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
1 parent ccc0250 commit cfe012a
3 files changed
Lines changed: 73 additions & 25 deletions
File tree
- sql/core/src
- main/scala/org/apache/spark/sql/execution/joins
- test/scala/org/apache/spark/sql/execution/metric
Lines changed: 15 additions & 12 deletions
Lines changed: 39 additions & 12 deletions
Lines changed: 19 additions & 1 deletion