Commit 7de8abd
[SPARK-11500][SQL] Not deterministic order of columns when using merging schemas.
https://issues.apache.org/jira/browse/SPARK-11500
As filed in SPARK-11500, if merging schemas is enabled, the order of files to touch is a matter which might affect the ordering of the output columns.
This was mostly because of the use of `Set` and `Map` so I replaced them to `LinkedHashSet` and `LinkedHashMap` to keep the insertion order.
Also, I changed `reduceOption` to `reduceLeftOption`, and replaced the order of `filesToTouch` from `metadataStatuses ++ commonMetadataStatuses ++ needMerged` to `needMerged ++ metadataStatuses ++ commonMetadataStatuses` in order to touch the part-files first which always have the schema in footers whereas the others might not exist.
One nit is, If merging schemas is not enabled, but when multiple files are given, there is no guarantee of the output order, since there might not be a summary file for the first file, which ends up putting ahead the columns of the other files.
However, I thought this should be okay since disabling merging schemas means (assumes) all the files have the same schemas.
In addition, in the test code for this, I only checked the names of fields.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #9517 from HyukjinKwon/SPARK-11500.
(cherry picked from commit 1bc4112)
Signed-off-by: Cheng Lian <lian@databricks.com>1 parent daa74be commit 7de8abd
3 files changed
Lines changed: 55 additions & 17 deletions
File tree
- sql
- core/src/main/scala/org/apache/spark/sql
- execution/datasources/parquet
- sources
- hive/src/test/scala/org/apache/spark/sql/sources
Lines changed: 23 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
383 | 383 | | |
384 | 384 | | |
385 | 385 | | |
386 | | - | |
| 386 | + | |
387 | 387 | | |
388 | 388 | | |
389 | 389 | | |
| |||
396 | 396 | | |
397 | 397 | | |
398 | 398 | | |
399 | | - | |
| 399 | + | |
400 | 400 | | |
401 | 401 | | |
402 | 402 | | |
403 | 403 | | |
404 | 404 | | |
405 | | - | |
| 405 | + | |
406 | 406 | | |
407 | 407 | | |
408 | 408 | | |
| |||
465 | 465 | | |
466 | 466 | | |
467 | 467 | | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
468 | 485 | | |
469 | 486 | | |
470 | 487 | | |
471 | 488 | | |
472 | 489 | | |
473 | 490 | | |
474 | | - | |
| 491 | + | |
475 | 492 | | |
476 | 493 | | |
477 | 494 | | |
| |||
768 | 785 | | |
769 | 786 | | |
770 | 787 | | |
771 | | - | |
| 788 | + | |
772 | 789 | | |
773 | 790 | | |
774 | | - | |
| 791 | + | |
775 | 792 | | |
776 | 793 | | |
777 | 794 | | |
| |||
Lines changed: 12 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
428 | 428 | | |
429 | 429 | | |
430 | 430 | | |
431 | | - | |
| 431 | + | |
432 | 432 | | |
433 | 433 | | |
434 | 434 | | |
435 | | - | |
| 435 | + | |
436 | 436 | | |
437 | 437 | | |
438 | 438 | | |
| |||
450 | 450 | | |
451 | 451 | | |
452 | 452 | | |
| 453 | + | |
453 | 454 | | |
454 | | - | |
| 455 | + | |
455 | 456 | | |
456 | | - | |
| 457 | + | |
457 | 458 | | |
458 | 459 | | |
459 | 460 | | |
| |||
464 | 465 | | |
465 | 466 | | |
466 | 467 | | |
467 | | - | |
| 468 | + | |
468 | 469 | | |
469 | 470 | | |
470 | 471 | | |
| |||
475 | 476 | | |
476 | 477 | | |
477 | 478 | | |
478 | | - | |
479 | | - | |
| 479 | + | |
| 480 | + | |
480 | 481 | | |
481 | 482 | | |
482 | 483 | | |
| |||
834 | 835 | | |
835 | 836 | | |
836 | 837 | | |
837 | | - | |
| 838 | + | |
838 | 839 | | |
839 | 840 | | |
840 | 841 | | |
| |||
854 | 855 | | |
855 | 856 | | |
856 | 857 | | |
857 | | - | |
| 858 | + | |
858 | 859 | | |
859 | 860 | | |
860 | | - | |
| 861 | + | |
| 862 | + | |
861 | 863 | | |
862 | 864 | | |
Lines changed: 20 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | | - | |
| 26 | + | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| |||
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
158 | 177 | | |
0 commit comments