Conversation

@JeffreySmith

No description provided.

HyukjinKwon and others added 30 commits August 27, 2024 17:38
…cumentation

Followup PR to change JRE version from 8 to 11

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 9fc1e05)
Signed-off-by: Hyukjin Kwon <[email protected]>
…cumentation

Followup PR to change JRE version from 8 to 17

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 9fc1e05)
Signed-off-by: Hyukjin Kwon <[email protected]>
…nnect notebook

This is a followup of apache#47883 that adds manual `source ~/.profile`.

Ever since we switched to `Dockerfile`, none of `~/.profile`, `~/.bashrc`, `~/.bash_profile`, etc. seems to work. There are a couple of related issues in Jupyter, but I cannot figure it out.

This is the only cell that needs the environment variable, so I decided to simply work around it.
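The workaround's effect can be sketched as follows (a hypothetical helper, not the notebook's actual cell): source the profile in a subshell and copy its environment into the current process, since Jupyter does not run shell startup files. The demo uses a throwaway file instead of the real `~/.profile`.

```python
import os
import subprocess
import tempfile

def load_profile(path):
    # Source the file in a bash subshell, then harvest the resulting
    # environment variables into this Python process.
    out = subprocess.check_output(
        ["bash", "-c", f"source '{path}' >/dev/null 2>&1; env -0"])
    for pair in out.split(b"\0"):
        if b"=" in pair:
            key, value = pair.split(b"=", 1)
            os.environ[key.decode()] = value.decode()

# Demo with a throwaway profile instead of the real ~/.profile
with tempfile.NamedTemporaryFile("w", suffix=".profile", delete=False) as f:
    f.write('export DEMO_VAR="hello"\n')
load_profile(f.name)
```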

No.

Manually tested.

No.

Closes apache#47902 from HyukjinKwon/SPARK-49402-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 1c9cde5)
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit df07aa7)
Signed-off-by: Hyukjin Kwon <[email protected]>
…without codegen

This is a re-submission of apache#43938 to fix a join correctness bug caused by apache#41398. Credits go to mcdull-zhang

correctness fix

Yes, the query result will be corrected.

new test

no

Closes apache#47905 from cloud-fan/join.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit af5e0a2)
Signed-off-by: Wenchen Fan <[email protected]>
…uffle corruption diagnose

#### What changes were proposed in this pull request?
port to 3.5 for [[SPARK-43242](https://issues.apache.org/jira/browse/SPARK-43242)][CORE] Fix throw 'Unexpected type of BlockId' in shuffle corruption diagnose

#### Why are the changes needed?
The 3.5 branch conflicts with the PR in master; see the end of the discussion in apache#40921

#### Does this PR introduce any user-facing change?
No

#### How was this patch tested?
Existing tests

Closes apache#47910 from CavemanIV/port3.5-SPARK-43242.

Authored-by: zhangliang <[email protected]>
Signed-off-by: Yi Wu <[email protected]>
### What changes were proposed in this pull request?
Add `artifacts` to `.gitignore`

### Why are the changes needed?
```
bin/spark-shell --remote "local[*]"
```

generates a lot of files in that directory:
```
(spark_dev_312) ➜  spark git:(master) ✗ git status
On branch master
Your branch is ahead of 'origin/master' by 1386 commits.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd0$.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd0$Helper.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd0.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd1$.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd1$Helper.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd1.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd2$.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd2$Helper.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd2.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd9999999$.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/85157252-6f8a-46b3-ab42-585c70184d08/classes/ammonite/$sess/cmd9999999$Helper.class
	new file:   artifacts/spark-37fc351b-0207-4957-ac39-5b23ae672c0c/8515
```

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
manually check

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47936 from zhengruifeng/infra_artifacts.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit df42568)
Signed-off-by: Kent Yao <[email protected]>
…er.isInternalError`

### What changes were proposed in this pull request?

Handle null input for `SparkThrowableHelper.isInternalError` method.

### Why are the changes needed?

The `SparkThrowableHelper.isInternalError` method doesn't handle null input, which can lead to a NullPointerException. This happens when `isInternalError` is invoked on a `SparkException` without an `errorClass`.
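A minimal null-safe sketch of the check (hypothetical Python; the real method is Scala in `SparkThrowableHelper`, and the exact prefix is an assumption):

```python
def is_internal_error(error_class):
    # Guard against a missing error class before the prefix check; without
    # it, a SparkException created with no errorClass would blow up here
    # (the Python analogue of the NullPointerException being fixed).
    return error_class is not None and error_class.startswith("INTERNAL_ERROR")
```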

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add 2 assertions to current test cases to cover this issue.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47946 from jshmchenxi/SPARK-49480/null-pointer-is-internal-error.

Authored-by: Xi Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit cef3c86)
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?

Fix the nullability of the `Base64` expression to be based on the child's nullability, and not always be nullable.

### Why are the changes needed?

apache#47303 had a side effect of changing the nullability through the switch to `StaticInvoke`. This was also backported to Spark 3.5.2 and caused schema mismatch errors for stateful streams when we upgraded. This restores the previous behavior, which `StaticInvoke` supports through its `returnNullable` argument: if the child is non-nullable, we know the result will be non-nullable.
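The idea of the fix can be sketched as follows (a hypothetical mini-model, not Spark's expression classes): the output's nullability mirrors the child's instead of being hard-coded to true.

```python
import base64

def base64_expr(child_value, child_nullable):
    # Mirrors StaticInvoke's returnNullable flag: the output can only be
    # null if the child can be null.
    out_nullable = child_nullable
    out_value = None if child_value is None else base64.b64encode(child_value).decode()
    return out_value, out_nullable
```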

### Does this PR introduce _any_ user-facing change?

Restores the nullability of the `Base64` expression to what it was in Spark 3.5.1 and earlier.

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47941 from Kimahriman/base64-nullability.

Lead-authored-by: Adam Binford <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit c274c5a)
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?

Fix a test that is failing from backporting apache#47941

### Why are the changes needed?

Fix test

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Fixed test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47964 from Kimahriman/base64-proto-test.

Authored-by: Adam Binford <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request?

This is a cherry-pick of apache#47796.

The `xpath` expression incorrectly marks its return type as an array of non-null strings. However, it can actually return an array containing nulls. This can cause an NPE in code generation, for example in the query `select coalesce(xpath(repeat('<a></a>', id), 'a')[0], '') from range(1, 2)`.
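A hypothetical analogue of the bug in plain Python (not Spark's type system): an array whose declared type claims "no nulls" but whose elements can be null. Declaring `containsNull` correctly lets downstream operators such as `coalesce` handle the null instead of failing.

```python
def coalesce(*values):
    # Return the first non-null argument, like SQL's coalesce
    return next((v for v in values if v is not None), None)

# xpath over '<a></a>' can yield an entry with no text; modeling that as a
# list containing None shows why the element type must admit nulls.
xpath_result = [None]
result = coalesce(xpath_result[0], "")
```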

### Why are the changes needed?

It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47959 from chenhao-db/fix_xpath_nullness_3.5.

Authored-by: Chenhao Li <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Fix the site.SPARK_VERSION pattern in the RDD Programming Guide. I found this when I was developing apache#47968.

doc fix

no

doc build

no

Closes apache#47985 from yaooqinn/version.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 90a236e)
Signed-off-by: Hyukjin Kwon <[email protected]>
…yteBuffer.allocateDirect`

This PR aims to use `Platform.allocateDirectBuffer` instead of `ByteBuffer.allocateDirect`.

apache#47733 (review)

Allocating off-heap memory should use the `allocateDirectBuffer` API provided by `Platform`.

No

GA

No

Closes apache#47987 from cxzl25/SPARK-49509.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 2ed6c3e)
Signed-off-by: Dongjoon Hyun <[email protected]>
In `Dataset#toJSON`, use the schema from `exprEnc`. This schema reflects any changes (e.g., decimal precision, column ordering) that `exprEnc` might make to input rows.

`Dataset#toJSON` currently uses the schema from the logical plan, but that schema does not necessarily describe the rows passed to `JacksonGenerator`: the function passed to `mapPartitions` uses `exprEnc` to serialize the input, and this could potentially change the precision on decimals or rearrange columns.

Here's an example that tricks `UnsafeRow#getDecimal` (called from `JacksonGenerator`) into mistakenly assuming the decimal is stored as a Long:
```
scala> case class Data(a: BigDecimal)
class Data

scala> sql("select 123.456bd as a").as[Data].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
val res0: Array[String] = Array({"a":68719476.745})

scala>
```
Here's an example that tricks `JacksonGenerator` into asking for a string from an array and an array from a string. This case actually crashes the JVM:
```
scala> case class Data(x: Array[Int], y: String)
class Data

scala> sql("select repeat('Hey there', 17) as y, array_repeat(22, 17) as x").as[Data].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$5(JacksonGenerator.scala:129) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$5$adapted(JacksonGenerator.scala:128) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.writeArrayData(JacksonGenerator.scala:258) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$23(JacksonGenerator.scala:201) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.writeArray(JacksonGenerator.scala:249) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
...
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

bash-3.2$
```
Both these cases work correctly without `toJSON`.
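The failure mode can be sketched outside Spark (a hypothetical Python analogue, not Spark code): bytes written under one schema and read back under another are silently misinterpreted, much like `UnsafeRow#getDecimal` assuming a compact long-backed layout.

```python
import struct

raw = struct.pack("<d", 123.456)       # serializer's view: a double
misread = struct.unpack("<q", raw)[0]  # reader applies the wrong schema (a long)
correct = struct.unpack("<d", raw)[0]  # reader applies the serializer's schema
```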

Before the PR, converting the dataframe to a dataset of Tuple would preserve the column names in the JSON strings:
```
scala> sql("select 123.456d as a, 12 as b").as[(Double, Int)].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
val res0: Array[String] = Array({"a":123.456,"b":12})

scala>
```
After the PR, the JSON strings use the field names from the Tuple class:
```
scala> sql("select 123.456d as a, 12 as b").as[(Double, Int)].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
val res1: Array[String] = Array({"_1":123.456,"_2":12})

scala>
```

New tests.

No.

Closes apache#47982 from bersprockets/to_json_issue.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5375ce2)
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
In `ProjectingInternalRow`, accessing `colOrdinals` causes poor performance. Replace `colOrdinals` with the `IndexedSeq` type.

### Why are the changes needed?
Accessing `colOrdinals` by position is O(n) on a linear `Seq`; an `IndexedSeq` provides constant-time indexed access.
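The access-cost difference can be sketched with a plain linked list versus an array-backed list (a Python analogue of Scala's linear `Seq` versus `IndexedSeq`; not Spark code):

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def nth(head, i):
    # Walks i links on every access: O(i) per lookup, like List.apply
    node = head
    for _ in range(i):
        node = node.next
    return node.value

head = None
for v in reversed(range(5)):
    head = Node(v, head)

indexed = list(range(5))  # O(1) random access, like IndexedSeq
```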

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No need to add UT

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47890 from wzx140/project-row-fix.

Lead-authored-by: wzx <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 37f2fa9)
Signed-off-by: Kent Yao <[email protected]>
…se V1 commands

### What changes were proposed in this pull request?

This is a followup of apache#47660 . If users override `spark_catalog` with
`DelegatingCatalogExtension`, we should still use v1 commands as `DelegatingCatalogExtension` forwards requests to HMS and there are still behavior differences between v1 and v2 commands targeting HMS.

This PR also forces the use of v1 commands for certain commands that do not have a v2 version.

### Why are the changes needed?

Avoid introducing behavior changes to Spark plugins that implement `DelegatingCatalogExtension` to override `spark_catalog`.
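A hypothetical sketch of the dispatch rule described above (illustrative names, not Spark's actual resolution code): commands targeting the session catalog keep the v1 path when the catalog is the built-in one, or an override that merely delegates to it.

```python
def use_v1_command(catalog_name, overridden, is_delegating_extension):
    if catalog_name != "spark_catalog":
        return False  # non-session catalogs always take the v2 path
    if not overridden:
        return True   # built-in session catalog keeps v1 behavior
    # An override that merely forwards to HMS should behave like the
    # built-in catalog, so it also keeps the v1 commands.
    return is_delegating_extension
```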

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test case

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47995 from amaliujia/fix_catalog_v2.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Rui Wang <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit f7cfeb5)
Signed-off-by: Wenchen Fan <[email protected]>
…be changed by falling back to v1 command

This is a followup of apache#47772. The behavior of `saveAsTable` should not be changed by switching from a v1 to a v2 command. This is similar to apache#47995: for the case of `DelegatingCatalogExtension`, we need it to go through v1 commands to be consistent with the previous behavior.

Behavior regression.

No

UT

No

Closes apache#48019 from amaliujia/regress_v2.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 37b39b4)
Signed-off-by: Wenchen Fan <[email protected]>
Change the implementation of `createTable` to avoid escaping of special chars in `UnresolvedTableSpec.location`. This field should contain the original user-provided `path` option and not the URI that is constructed by the `buildStorageFormatFromOptions()` call.

In addition this commit extends `SparkFunSuite` and `SQLTestUtils` to allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.

Bug fix. The following code would result in the creation of a table that is stored in `/tmp/test%20table` instead of `/tmp/test table`:
```
spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))
```

Note that this was not consistent with the SQL API, e.g. `create table testTable(id int) using parquet location '/tmp/test table'`
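The core of the bug can be sketched with standard URI percent-escaping (Python's `urllib.parse`; the real code builds a Hadoop-style URI, so details differ): storing the escaped URI string as the location makes the table land in a directory literally named with the escape sequence.

```python
from urllib.parse import quote, unquote

user_path = "/tmp/test table"     # the original user-provided path option
escaped = quote(user_path)        # what URI construction produces
restored = unquote(escaped)       # the fix keeps the original path instead
```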

Yes. The previous behaviour would result in the table path being escaped. After this change the path will not be escaped.

Updated existing test in `CatalogSuite`.

No

Closes apache#47976 from cstavr/location-double-escaping.

Authored-by: Christos Stavrakakis <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit dc3333b)
Signed-off-by: Wenchen Fan <[email protected]>
senthh and others added 28 commits September 2, 2025 15:56
…ent. (#40)

* ODP-2118: Hudi, DeltaLake, Iceberg version upgrade for open table clients.

* ODP-2118: Delta spark version fix

* ODP-2118: delta-spark and iceberg jars scala version fix.

* ODP-2118: iceberg jars scala version fix.
…ent.

### What changes were proposed in this pull request?
Update `kubernetes-client` from 6.10.0 to 6.11.0

### Why are the changes needed?

[Release notes for 6.11.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.11.0)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#45707 from bjornjorgensen/kub-client6.11.0.

Authored-by: Bjørn Jørgensen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 7b9b3cb)
(cherry picked from commit 06e2b2e)


### What changes were proposed in this pull request?

This PR aims to upgrade `Parquet` to 1.15.2.

### Why are the changes needed?

To bring the latest bug fixes.
- https://parquet.apache.org/blog/2025/05/01/1.15.2/
- https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.15.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50755 from dongjoon-hyun/SPARK-51950.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 15732fc)
…s-client from `6.x` to `7.x`"

This reverts commit 9b17fca
…ager based on Hadoop's Abortable interface"

This reverts commit b89d077
…ions to PartitionedFileUtil API to reduce memory requirements"

This reverts commit 23637fe.
@shubhluck shubhluck closed this Dec 9, 2025
@shubhluck shubhluck deleted the rel/ODP-3.2.3.4-2 branch December 9, 2025 19:01