Commit fc324c5
Parquet exporter handle optional fields (#1024)
part of #863
Because some OTAP fields are optional, in a stream of record batches we
may receive subsequent batches with different schemas. Parquet doesn't
support having row groups with different sets of column chunks, which
means we need to know the schema a priori when the writer is created.
This PR adds code to normalize the schema of the record batch before
writing by:
- putting all the fields in the same order
- creating an all-null or all-default-value column for each missing column
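The two normalization steps above can be sketched with a simplified, std-only model. This is not the code from the PR and does not use the arrow API: `Column`, `FieldSpec`, and `normalize` are hypothetical names chosen for illustration, and columns are modeled as plain vectors of optional integers.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Arrow column: one optional value per row.
/// (Hypothetical type for illustration, not an arrow-rs type.)
type Column = Vec<Option<i64>>;

/// Hypothetical field description: a name plus the fill value for a missing column.
#[derive(Clone, Debug)]
struct FieldSpec {
    name: &'static str,
    /// `None` fills a missing column with nulls; `Some(v)` fills it with `v`.
    default: Option<i64>,
}

/// Returns the columns in the order given by `schema`, synthesizing an
/// all-null or all-default column for every field absent from `batch`.
fn normalize(
    schema: &[FieldSpec],
    mut batch: HashMap<&'static str, Column>,
    num_rows: usize,
) -> Vec<(&'static str, Column)> {
    schema
        .iter()
        .map(|field| {
            let column = batch
                .remove(field.name)
                // Missing column: materialize `num_rows` copies of the fill value.
                .unwrap_or_else(|| vec![field.default; num_rows]);
            (field.name, column)
        })
        .collect()
}
```

Because the output order is taken entirely from the schema, every normalized batch has the same column layout regardless of which optional fields the incoming batch carried.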
The missing columns should add only a small overhead when written to disk.
For an all-null column, parquet will write an entirely empty column chunk
(all null count, no data). For an all-default-value column, parquet will
use dictionary and RLE encoding by default, leading to a small column
chunk with a single value in the dictionary and a single run for the
keys.
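To illustrate why a constant column compresses to almost nothing, here is a minimal run-length encoding sketch. It is a toy model only: Parquet's real encoding is an RLE/bit-packing hybrid applied to dictionary keys, and `run_length_encode` is a hypothetical helper, not a parquet or arrow function.

```rust
/// Collapses consecutive equal values into `(value, run_length)` pairs.
/// Toy illustration only; not Parquet's actual on-disk encoding.
fn run_length_encode<T: PartialEq + Clone>(values: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for value in values {
        if let Some((last, count)) = runs.last_mut() {
            if *last == *value {
                // Same value as the current run: extend it.
                *count += 1;
                continue;
            }
        }
        // First value, or a value that differs from the current run: start a new run.
        runs.push((value.clone(), 1));
    }
    runs
}
```

Under this model, a column of 10,000 identical default values encodes to one `(value, 10_000)` pair, which mirrors the "single run for the key" described above.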
Unfortunately, we still materialize an all-null column with the full
length of the record batch before writing. This can be
optimized when run-end encoded arrays are supported in parquet, because
we could just create a run array with a single run of null/default
value. The arrow community is currently working on adding support (see
apache/arrow-rs#7713 &
apache/arrow-rs#8069).
---------
Co-authored-by: Laurent Quérel <[email protected]>
1 parent 0d3422f commit fc324c5
File tree
7 files changed (+1032, -15 lines)
- rust
  - otap-dataflow
    - crates/otap
      - src
        - parquet_exporter
  - otel-arrow-rust/src