@ramesh0201
Contributor

Tez input splits can be opened asynchronously. This reduces the time spent waiting for S3 to prepare the connection and open the object.

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 17m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 13m 43s master passed
+1 💚 compile 0m 32s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 31s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 3s master passed
+1 💚 javadoc 0m 40s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 28s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 14s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 18s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 18s the patch passed
-0 ⚠️ checkstyle 0m 12s tez-mapreduce: The patch generated 4 new + 33 unchanged - 0 fixed = 37 total (was 33)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 14s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 0m 42s the patch passed
_ Other Tests _
+1 💚 unit 1m 21s tez-mapreduce in the patch passed.
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
40m 39s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/1/artifact/out/Dockerfile
GITHUB PR #195
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 0f0ec404e815 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 132ea4c
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/1/artifact/out/diff-checkstyle-tez-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/1/testReport/
Max. process+thread count 238 (vs. ulimit of 5500)
modules C: tez-mapreduce U: tez-mapreduce
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/1/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

curReader = wrappedInputFormat.getRecordReader(
groupedSplit.wrappedSplits.get(idx), job, reporter);
curReader = initedReaders.poll().get();
submitInitReaders(1);
Contributor

From the next stage onwards (i.e., after the init part), this will execute sequentially, because it requests only 1 additional reader at a time. It would be good to init the next set of readers in parallel.
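
A minimal sketch of one way to keep several initializations in flight for the whole read, not only at startup. It is not the code from this patch: `PrefetchingOpener`, `window`, and `refill` are hypothetical names, and a generic `Callable<T>` stands in for the call to `wrappedInputFormat.getRecordReader(...)`. The idea is to top the queue back up to a fixed window every time a reader is taken, before blocking on the head of the queue.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper: keeps up to `window` open tasks in flight at a time. */
class PrefetchingOpener<T> {
  private final ExecutorService pool;
  private final List<Callable<T>> openers;            // one task per split, in split order
  private final Queue<Future<T>> inFlight = new ArrayDeque<>();
  private final int window;                            // assumed tuning knob
  private int nextToSubmit = 0;

  PrefetchingOpener(ExecutorService pool, List<Callable<T>> openers, int window) {
    this.pool = pool;
    this.openers = openers;
    this.window = window;
    refill();                                          // start the first batch right away
  }

  /** Submits tasks until the window is full or no splits remain. */
  private void refill() {
    while (inFlight.size() < window && nextToSubmit < openers.size()) {
      inFlight.add(pool.submit(openers.get(nextToSubmit++)));
    }
  }

  /** Returns the next opened resource in split order, or null when exhausted. */
  T next() throws Exception {
    Future<T> head = inFlight.poll();
    if (head == null) {
      return null;
    }
    refill();                                          // top up before blocking on the head
    return head.get();
  }
}
```

With a pool of at least `window` threads, the open for split `i + window - 1` overlaps with the caller consuming split `i`, while results are still handed out in split order.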

RecordReader<K, V> reader = wrappedInputFormat.getRecordReader(s, job, reporter);
LOG.debug("Init Thread processed reader number {} initialization", index);
return reader;
} catch(IOException e) {
Contributor

If the thread is interrupted, should it cancel the other pending tasks?

Contributor

This should be done for all exceptions as well.

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 10s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 13m 37s master passed
+1 💚 compile 0m 32s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 31s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 2s master passed
+1 💚 javadoc 0m 42s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 28s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 14s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 12s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 19s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 18s the patch passed
-0 ⚠️ checkstyle 0m 11s tez-mapreduce: The patch generated 6 new + 32 unchanged - 1 fixed = 38 total (was 33)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 14s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 0m 43s the patch passed
_ Other Tests _
+1 💚 unit 1m 22s tez-mapreduce in the patch passed.
+1 💚 asflicense 0m 14s The patch does not generate ASF License warnings.
24m 1s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/2/artifact/out/Dockerfile
GITHUB PR #195
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 65f01d64be1f 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 20873a3
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/2/artifact/out/diff-checkstyle-tez-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/2/testReport/
Max. process+thread count 233 (vs. ulimit of 5500)
modules C: tez-mapreduce U: tez-mapreduce
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/2/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@jfsii jfsii left a comment

Added two comments - I learned a bit reading about ThreadPools and garbage collection

throw new RuntimeException(e);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
for (Future<RecordReader<K,V>> f : initedReaders) {

We probably want to cancel on any exception, and interrupt the thread only on InterruptedException.
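
A rough sketch of that shape, under the assumption that pending initializations sit in a queue of futures like `initedReaders` in the hunk above. `ReaderFutures`, `takeOrCancelAll`, and `cancelAll` are hypothetical names, not part of the patch. Every failure path cancels whatever is still queued; only the `InterruptedException` path restores the interrupt flag.

```java
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper showing the error-handling shape only. */
class ReaderFutures {

  /** Takes the next result; on any failure, cancels everything still queued. */
  static <T> T takeOrCancelAll(Queue<Future<T>> pending) {
    Future<T> head = pending.poll();
    if (head == null) {
      throw new IllegalStateException("no pending initializations");
    }
    try {
      return head.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();   // restore the flag only for interrupts
      cancelAll(pending);
      throw new RuntimeException(e);
    } catch (ExecutionException | RuntimeException e) {
      cancelAll(pending);                   // same cleanup, but no interrupt
      throw new RuntimeException(e);
    }
  }

  /** Drains the queue and cancels each future. */
  static <T> void cancelAll(Queue<Future<T>> pending) {
    Future<T> f;
    while ((f = pending.poll()) != null) {
      f.cancel(true);                       // true: interrupt the task if it is already running
    }
  }
}
```

Cancelling with `true` matters mostly for tasks blocked on I/O that responds to interruption; a task that ignores interrupts will still run to completion, so any reader it produces would need to be closed separately.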

TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT);
this.numReaders = conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS,
TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS_DEFAULT);
this.initReaderExecService = Executors.newFixedThreadPool(numThreads,

I think we will want to either make this static (to share it between usages of TezGroupedSplitsRecordReader) or figure out how to call shutdown reliably. Otherwise, I think a long-lived process that uses multiple TezGroupedSplitsRecordReaders throughout its life will end up with a large number of unused threads, since the pools will not shut down automatically and thus will not be garbage collected.

https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html
See finalization section.
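
As an illustration of the static option, here is a minimal sketch of a process-wide pool whose idle threads are allowed to exit. It is an assumption-laden example, not the patch: `SharedInitPool`, the thread-name prefix, the fixed size of 4, and the 30-second keep-alive are all made up for illustration (the patch reads its thread count from configuration). It uses `allowCoreThreadTimeOut(true)`, which the linked Javadoc mentions as a way to let an unused pool release its threads, plus daemon threads so the pool never blocks JVM exit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for one pool shared by every reader in the process. */
final class SharedInitPool {
  private static final int THREADS = 4;                 // assumed size, not from the patch
  private static final AtomicInteger SEQ = new AtomicInteger();
  private static final ThreadPoolExecutor POOL = create();

  private SharedInitPool() {
  }

  private static ThreadPoolExecutor create() {
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, "split-init-" + SEQ.incrementAndGet());
      t.setDaemon(true);                                // daemon threads never block JVM exit
      return t;
    };
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        THREADS, THREADS, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), factory);
    pool.allowCoreThreadTimeOut(true);                  // idle core threads exit after 30 seconds
    return pool;
  }

  /** Returns the shared pool; callers must not shut it down. */
  static ExecutorService get() {
    return POOL;
  }
}
```

One trade-off of a static pool is that its size is fixed when the class is first loaded, so a per-reader thread-count setting can no longer take effect for later readers.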

Contributor

I'm not sure about the context, but if the goal is to properly shut down a global resource, it can be done by the shutdown handler, somewhere here: https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L954

@ramesh0201 ramesh0201 force-pushed the TEZ-4397 branch 2 times, most recently from 214e242 to 6229a16 Compare March 22, 2022 06:12
@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 12m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 14m 15s master passed
+1 💚 compile 0m 31s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 28s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 58s master passed
+1 💚 javadoc 0m 41s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 29s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 13s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 21s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 17s the patch passed
-0 ⚠️ checkstyle 0m 12s tez-mapreduce: The patch generated 6 new + 32 unchanged - 1 fixed = 38 total (was 33)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 15s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 0m 44s the patch passed
_ Other Tests _
+1 💚 unit 1m 22s tez-mapreduce in the patch passed.
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
36m 18s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/3/artifact/out/Dockerfile
GITHUB PR #195
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 77abc76b3997 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 20873a3
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/3/artifact/out/diff-checkstyle-tez-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/3/testReport/
Max. process+thread count 233 (vs. ulimit of 5500)
modules C: tez-mapreduce U: tez-mapreduce
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/3/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 13m 16s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 1s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 13m 55s master passed
+1 💚 compile 0m 33s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 32s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 2s master passed
+1 💚 javadoc 0m 41s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 29s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 11s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 10s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 18s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 17s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 17s the patch passed
-0 ⚠️ checkstyle 0m 11s tez-mapreduce: The patch generated 6 new + 32 unchanged - 1 fixed = 38 total (was 33)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 14s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 0m 40s the patch passed
_ Other Tests _
+1 💚 unit 1m 19s tez-mapreduce in the patch passed.
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
36m 20s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/4/artifact/out/Dockerfile
GITHUB PR #195
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 77a7605771ad 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 20873a3
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/4/artifact/out/diff-checkstyle-tez-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/4/testReport/
Max. process+thread count 238 (vs. ulimit of 5500)
modules C: tez-mapreduce U: tez-mapreduce
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/4/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@ramesh0201
Contributor Author

Thanks @rbalamohan @jfsii and @abstractdog for the reviews.

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 17m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 13m 37s master passed
+1 💚 compile 0m 33s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 32s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 4s master passed
+1 💚 javadoc 0m 41s master passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 29s master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+0 🆗 spotbugs 1m 12s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 18s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 18s the patch passed
+1 💚 compile 0m 17s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 17s the patch passed
-0 ⚠️ checkstyle 0m 12s tez-mapreduce: The patch generated 6 new + 32 unchanged - 1 fixed = 38 total (was 33)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 14s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 findbugs 0m 42s the patch passed
_ Other Tests _
+1 💚 unit 1m 21s tez-mapreduce in the patch passed.
+1 💚 asflicense 0m 14s The patch does not generate ASF License warnings.
40m 27s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/5/artifact/out/Dockerfile
GITHUB PR #195
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 48a183fc752d 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 20873a3
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/5/artifact/out/diff-checkstyle-tez-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/5/testReport/
Max. process+thread count 238 (vs. ulimit of 5500)
modules C: tez-mapreduce U: tez-mapreduce
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-195/5/console
versions git=2.25.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@rbalamohan
Contributor

Latest patch LGTM. +1

5 participants