@abstractdog
Contributor

No description provided.

@ayushtkn
Member

@abstractdog do you have any pointers to which ticket removed it in Hadoop? I was checking the history and found only tickets around the upgrade.

@abstractdog
Contributor Author

good point, I need to check this

@abstractdog
Contributor Author

abstractdog commented Jun 30, 2025

weird things: here is what I got from the verbose dependency trees (full examples are attached to the jira; I extracted only the bcprov parts for clarity)

hadoop 3.4.0 (hadoop-common)

[INFO] +- org.bouncycastle:bcprov-jdk15on:jar:1.70:compile

tez on hadoop 3.4.0 (tez-api)

[INFO] +- org.apache.hadoop:hadoop-common:jar:3.4.0:compile
[INFO] |  +- org.bouncycastle:bcprov-jdk15on:jar:1.70:compile

[INFO] +- org.apache.hadoop:hadoop-common:test-jar:tests:3.4.0:test
[INFO] |  +- (org.bouncycastle:bcprov-jdk15on:jar:1.70:test - omitted for duplicate)

[INFO] +- org.bouncycastle:bcprov-jdk18on:jar:1.78:test

hadoop 3.4.1 (hadoop-common)

[INFO] +- org.bouncycastle:bcprov-jdk18on:jar:1.78.1:compile

tez on hadoop 3.4.1 (tez-api)

[INFO] +- org.apache.hadoop:hadoop-common:jar:3.4.1:compile
[INFO] |  +- (org.bouncycastle:bcprov-jdk18on:jar:1.78:test - version managed from 1.78.1; scope managed from compile; omitted for duplicate)


[INFO] +- org.apache.hadoop:hadoop-common:test-jar:tests:3.4.1:test
[INFO] |  +- (org.bouncycastle:bcprov-jdk18on:jar:1.78:test - version managed from 1.78.1; scope managed from compile; omitted for duplicate)

[INFO] +- org.bouncycastle:bcprov-jdk18on:jar:1.78:test

so apparently, when depending on hadoop 3.4.1 (where the bcprov compile-scope dependency looks good in the hadoop project itself: org.bouncycastle:bcprov-jdk18on:jar:1.78.1:compile), the dependency, instead of being brought into tez as a compile-time dependency, gets omitted with a totally confusing message:

[INFO] |  +- (org.bouncycastle:bcprov-jdk18on:jar:1.78:test - version managed from 1.78.1; scope managed from compile; omitted for duplicate)

what does this mean? bcprov 1.78:test is omitted because the version is managed from 1.78.1 (which is the version defined in hadoop), but at the same time I cannot see a proper 1.78.1 compile-time dependency anywhere. This just doesn't make sense to me

I’ve put in 1–2 hours of investigation so far, and I'm not sure how much more time it's worth, so here’s what we can do:

  1. merge this change and accept tez bringing this dependency itself, at its own ${bouncycastle.version}, in compile scope
  2. understand what happened here and still let hadoop bring its ${bouncycastle.version}

maybe do 1) now, and follow up on 2) later
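
For illustration, option 1 might look roughly like the fragment below. This is only a sketch: placing it in the tez-api pom.xml is an assumption based on the dependency trees above, and the explicit `<scope>` is there because a scope written on the dependency itself takes precedence over the one inherited from the parent's dependencyManagement.

```xml
<!-- Hypothetical fragment for a module pom.xml (e.g. tez-api); not the merged change. -->
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk18on</artifactId>
  <!-- Tez's own version property, as mentioned in option 1 -->
  <version>${bouncycastle.version}</version>
  <!-- Explicit compile scope, overriding the test scope managed in the parent pom -->
  <scope>compile</scope>
</dependency>
```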

@abstractdog abstractdog changed the title from "TEZ-4635: Hadoop doesn't bring BouncyCastle to tez tar since 3.4" to "TEZ-4635: The bcprov JAR is no longer included in tez.tar.gz from Hadoop 3.4.1." Jul 1, 2025
@ayushtkn
Member

ayushtkn commented Jul 2, 2025

Thanks @abstractdog for the details. I think the problem may be that we are forcing the scope to test in the dependencyManagement of the parent pom.
I believe that if you just remove the scope from the parent pom, that should do it. The other modules can have it in test scope.

Moreover, the version of bouncycastle from hadoop seems to be 1.78.1, whereas in Tez it is 1.78. Maybe we should keep them in sync.

I believe something like this might just do it

diff --git a/pom.xml b/pom.xml
index 8dfdec9ec..d1031c6c4 100644
--- a/pom.xml
+++ b/pom.xml
@@ -58,7 +58,7 @@
 
     <!--dependency versions in alphabetical order-->
     <asynchttpclient.version>2.12.4</asynchttpclient.version>
-    <bouncycastle.version>1.78</bouncycastle.version>
+    <bouncycastle.version>1.78.1</bouncycastle.version>
     <build-helper-maven-plugin.version>1.8</build-helper-maven-plugin.version>
     <buildnumber-maven-plugin.version>1.1</buildnumber-maven-plugin.version>
     <checkstyle.version>8.35</checkstyle.version>
@@ -791,13 +791,11 @@
         <groupId>org.bouncycastle</groupId>
         <artifactId>bcprov-jdk18on</artifactId>
         <version>${bouncycastle.version}</version>
-        <scope>test</scope>
       </dependency>
       <dependency>
         <groupId>org.bouncycastle</groupId>
         <artifactId>bcpkix-jdk18on</artifactId>
         <version>${bouncycastle.version}</version>
-        <scope>test</scope>
       </dependency>
       <dependency>
         <groupId>org.fusesource.leveldbjni</groupId>

What changed, and why was it working earlier?

I have a theory related to HADOOP-19024, which was introduced in Hadoop 3.4.1. Previously, Hadoop depended on bcprov-jdk15on, whereas Tez declared bcprov-jdk18on in its dependencyManagement. Since Tez did not explicitly declare bcprov-jdk15on, it was being pulled transitively from Hadoop with its default (compile) scope, so it ended up being packaged correctly.

However, after HADOOP-19024, Hadoop itself now declares bcprov-jdk18on, which conflicts with Tez’s declaration that forces its scope to test. As a result, the dependency is omitted from the final package.
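
As a sketch of "the other modules can have it in test scope": once the parent's dependencyManagement only pins the version, a module that needs BouncyCastle purely for its tests could declare the scope locally. Which modules would do this is an assumption here and is not taken from the patch.

```xml
<!-- Hypothetical child-module pom.xml fragment; the version still comes from the
     parent's dependencyManagement, only the scope is set locally. -->
<dependencies>
  <dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk18on</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>
```

A direct declaration like this generally wins over the copy that arrives transitively from hadoop-common, so it presumably only suits modules that do not need the jar outside of tests.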

@abstractdog
Contributor Author

thanks @ayushtkn, that absolutely makes sense

what's weird is that in upstream tez the fix also works without the version harmonization, i.e. by changing only the root pom.xml
however, I need to consider 2 things:

  1. version harmonization makes sense, I'll most probably add it
  2. this fix doesn't work downstream unless I also change the tez-api pom.xml... I need to understand the difference before proceeding here

I'll keep you posted

@abstractdog
Contributor Author

okay, after 4-5 hours I just figured out that the downstream version was missing TEZ-4266, leading to very different plugin versions... TLDR: I did everything to synchronize the pom structure and the bcprov jar still didn't appear in the downstream dist package; after applying TEZ-4266 it magically started to work. I'm 99% sure that the old maven assembly plugin was the one to blame

so I believe this PR could stay as a simple scope change in the root pom.xml + version bump

@abstractdog abstractdog changed the title from "TEZ-4635: The bcprov JAR is no longer included in tez.tar.gz from Hadoop 3.4.1." to "TEZ-4635: The bcprov JAR is no longer included in tez.tar.gz from Hadoop 3.4.1" Jul 2, 2025
@tez-yetus
💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 0s | | Docker mode activated. |
| -1 ❌ | docker | 5m 51s | | Docker failed to build run-specific yetus/tez:tp-15228}. |

| Subsystem | Report/Notes |
|-----------|--------------|
| GITHUB PR | #419 |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/5/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

@tez-yetus
💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 0s | | Docker mode activated. |
| -1 ❌ | docker | 20m 10s | | Docker failed to build run-specific yetus/tez:tp-32760}. |

| Subsystem | Report/Notes |
|-----------|--------------|
| GITHUB PR | #419 |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/4/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

@tez-yetus
💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 28m 0s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 11m 11s | | master passed |
| +1 💚 | compile | 2m 21s | | master passed |
| +1 💚 | javadoc | 1m 27s | | master passed |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 41s | | the patch passed |
| +1 💚 | codespell | 1m 0s | | No new issues. |
| +1 💚 | compile | 2m 22s | | the patch passed |
| +1 💚 | javac | 2m 22s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | javadoc | 1m 11s | | the patch passed |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 68m 52s | | root in the patch passed. |
| +1 💚 | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 123m 16s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/6/artifact/out/Dockerfile |
| GITHUB PR | #419 |
| Optional Tests | dupname asflicense javac javadoc unit codespell detsecrets xmllint compile |
| uname | Linux 5749504c4ae8 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-agent/workspace/tez-multibranch_PR-419/src/.yetus/personality.sh |
| git revision | master / d162495 |
| Default Java | Ubuntu-21.0.7+6-Ubuntu-0ubuntu124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/6/testReport/ |
| Max. process+thread count | 1273 (vs. ulimit of 5500) |
| modules | C: . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/6/console |
| versions | git=2.43.0 maven=3.8.7 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

Member

@ayushtkn ayushtkn left a comment

LGTM

@abstractdog abstractdog merged commit 8fa0c37 into apache:master Jul 3, 2025
4 checks passed