Conversation

@abstractdog
Contributor

@abstractdog abstractdog commented Jul 4, 2023

All the changes follow the conversation on Jira and address the comments on this PR so far.
I also added comments myself to clarify which changes are unrelated.

Member

@ayushtkn ayushtkn left a comment

Thanx @abstractdog for the PR. Sounds cool, minor stuff.
Do you plan to extend a test to cover this?

@abstractdog
Contributor Author

> Thanx @abstractdog for the PR. Sounds cool, minor stuff. Do you plan to extend a test to cover this?

Yes, absolutely, I'm trying to add a unit test (UT).

@tez-yetus

This comment was marked as outdated.


@abstractdog abstractdog force-pushed the TEZ-2119 branch 2 times, most recently from 1eb42e0 to fcc8a54 July 5, 2023 14:12

@abstractdog
Contributor Author

abstractdog commented Jul 6, 2023

Finally, a working version of this patch. Unit tests are included, and it was tested on a cluster as shown below:

  1. Ran a DAG with resources available for 2 task containers (after the first 2 allocations/launches, the remaining 6 attempts were all reuses, as expected):
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 8
INFO  :    TOTAL_LAUNCHED_TASKS: 8
INFO  :    OTHER_LOCAL_TASKS: 5
INFO  :    AM_CPU_MILLISECONDS: 5130
INFO  :    AM_GC_TIME_MILLIS: 141
INFO  :    INITIAL_HELD_CONTAINERS: 0
INFO  :    TOTAL_CONTAINERS_USED: 2
INFO  :    TOTAL_CONTAINER_ALLOCATION_COUNT: 2
INFO  :    TOTAL_CONTAINER_LAUNCH_COUNT: 2
INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 6
  2. Ran the DAG again right away: the same number of tasks, but it already had 2 containers, so all 8 attempts ran on reused containers (the same 2 containers, as shown in TOTAL_CONTAINERS_USED):
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 8
INFO  :    TOTAL_LAUNCHED_TASKS: 8
INFO  :    OTHER_LOCAL_TASKS: 5
INFO  :    AM_CPU_MILLISECONDS: 1130
INFO  :    AM_GC_TIME_MILLIS: 83
INFO  :    INITIAL_HELD_CONTAINERS: 2
INFO  :    TOTAL_CONTAINERS_USED: 2
INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 8
  3. Killed another YARN application to free up resources (actually, I got 1 more container :) ). This is the same as 1), just with 3 containers (3 launches/allocations, then 5 reuses):
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 8
INFO  :    TOTAL_LAUNCHED_TASKS: 8
INFO  :    OTHER_LOCAL_TASKS: 5
INFO  :    AM_CPU_MILLISECONDS: 5790
INFO  :    AM_GC_TIME_MILLIS: 184
INFO  :    INITIAL_HELD_CONTAINERS: 0
INFO  :    TOTAL_CONTAINERS_USED: 3
INFO  :    TOTAL_CONTAINER_ALLOCATION_COUNT: 3
INFO  :    TOTAL_CONTAINER_LAUNCH_COUNT: 3
INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 5
  4. Ran the DAG again; the same as 2), but with 3 containers:
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 8
INFO  :    TOTAL_LAUNCHED_TASKS: 8
INFO  :    OTHER_LOCAL_TASKS: 5
INFO  :    AM_CPU_MILLISECONDS: 1360
INFO  :    AM_GC_TIME_MILLIS: 77
INFO  :    INITIAL_HELD_CONTAINERS: 3
INFO  :    TOTAL_CONTAINERS_USED: 3
INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 8
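
In all four counter dumps above, TOTAL_LAUNCHED_TASKS equals TOTAL_CONTAINER_LAUNCH_COUNT plus TOTAL_CONTAINER_REUSE_COUNT (2 + 6, 0 + 8, 3 + 5, 0 + 8). A minimal sketch of checking that arithmetic against such log output could look like the following. `ReuseCounterCheck` and its methods are hypothetical helpers, not part of Tez. Treating a counter that is missing from the log as 0 is an assumption, as is the idea that this relation holds beyond these four runs (for example, with failed or killed attempts).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Tez): parses DAGCounter log lines and checks the reuse arithmetic. */
final class ReuseCounterCheck {
  // Matches counter lines such as "INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 6".
  // The group header line (org.apache.tez...DAGCounter:) does not match and is skipped.
  private static final Pattern COUNTER_LINE =
      Pattern.compile("^INFO\\s*:\\s+([A-Z_]+):\\s*(\\d+)\\s*$");

  /** Collects counter name -> value pairs from a multi-line log excerpt. */
  static Map<String, Long> parse(String log) {
    Map<String, Long> counters = new LinkedHashMap<>();
    for (String line : log.split("\\R")) {
      Matcher m = COUNTER_LINE.matcher(line);
      if (m.matches()) {
        counters.put(m.group(1), Long.parseLong(m.group(2)));
      }
    }
    return counters;
  }

  /** True when launched tasks == container launches + container reuses (assumption: absent counters are 0). */
  static boolean launchesPlusReusesMatchTasks(Map<String, Long> counters) {
    long tasks = counters.getOrDefault("TOTAL_LAUNCHED_TASKS", 0L);
    long launches = counters.getOrDefault("TOTAL_CONTAINER_LAUNCH_COUNT", 0L);
    long reuses = counters.getOrDefault("TOTAL_CONTAINER_REUSE_COUNT", 0L);
    return tasks == launches + reuses;
  }

  public static void main(String[] args) {
    // Excerpt of the first run above: 2 launches, then 6 reuses.
    String firstRun = String.join("\n",
        "INFO  : org.apache.tez.common.counters.DAGCounter:",
        "INFO  :    TOTAL_LAUNCHED_TASKS: 8",
        "INFO  :    TOTAL_CONTAINER_LAUNCH_COUNT: 2",
        "INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 6");
    // Excerpt of the second run above: no launch counter printed, 8 reuses.
    String secondRun = String.join("\n",
        "INFO  : org.apache.tez.common.counters.DAGCounter:",
        "INFO  :    TOTAL_LAUNCHED_TASKS: 8",
        "INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 8");
    System.out.println(launchesPlusReusesMatchTasks(parse(firstRun)));  // prints: true
    System.out.println(launchesPlusReusesMatchTasks(parse(secondRun))); // prints: true
  }
}
```

The check is done on parsed log text only so that it stays independent of any Tez classes; inside a unit test, the same comparison could presumably be made directly on the DAG's counters.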

cc: @r0hini, can you please review the code? It has been tested on a cluster and with unit tests.

@abstractdog abstractdog requested a review from rbalamohan July 6, 2023 07:21

TaskSchedulerWithDrainableContext taskScheduler = (TaskSchedulerWithDrainableContext) ((TaskSchedulerManagerForTest) taskSchedulerManager)
.getSpyTaskScheduler();
TaskSchedulerWithDrainableContext taskScheduler =
Contributor Author

only a reformat

DAGClientServer mockClientService;
TestEventHandler mockEventHandler;
ContainerSignatureMatcher mockSigMatcher;
MockTaskSchedulerManager schedulerHandler;
Contributor Author

The name schedulerHandler was incorrect, so I simply renamed it.

@Override
TaskScheduler createUberTaskScheduler(TaskSchedulerContext taskSchedulerContext, int schedulerId) {
taskSchedulerContexts.add(taskSchedulerContext);
testTaskSchedulers.add(uberTaskScheduler);
Contributor Author

This doesn't have anything to do with this patch; it's just fixed for clarity's sake.

/**
* This container launcher simply implements ContainerLauncher methods with the proper context callbacks.
*/
public static class ContainerLauncherForTest extends ContainerLauncher {
Contributor Author

@abstractdog abstractdog Jul 6, 2023

Originally, ContainerLauncherForTest was a mock launcher for reporting errors. I simply renamed it to FailureReporterContainerLauncher and turned ContainerLauncherForTest into a NO-OP-ish container launcher with simple callbacks.

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| +0 🆗 | reexec | 0m 36s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 6m 10s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 12m 3s | master passed |
| +1 💚 | compile | 2m 2s | master passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu122.04.1 |
| +1 💚 | compile | 1m 58s | master passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu122.04-b09 |
| +1 💚 | checkstyle | 2m 12s | master passed |
| +1 💚 | javadoc | 1m 51s | master passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu122.04.1 |
| +1 💚 | javadoc | 1m 34s | master passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu122.04-b09 |
| +0 🆗 | spotbugs | 0m 49s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 3m 46s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 8s | the patch passed |
| +1 💚 | compile | 1m 14s | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu122.04.1 |
| +1 💚 | javac | 1m 14s | the patch passed |
| +1 💚 | compile | 1m 7s | the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu122.04-b09 |
| +1 💚 | javac | 1m 7s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 36s | tez-dag: The patch generated 5 new + 554 unchanged - 7 fixed = 559 total (was 561) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 46s | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu122.04.1 |
| +1 💚 | javadoc | 0m 47s | the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu122.04-b09 |
| +1 💚 | findbugs | 2m 50s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 2m 18s | tez-api in the patch passed. |
| +1 💚 | unit | 5m 12s | tez-dag in the patch passed. |
| +1 💚 | unit | 4m 11s | tez-ext-service-tests in the patch passed. |
| +1 💚 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 55m 27s | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-301/6/artifact/out/Dockerfile |
| GITHUB PR | #301 |
| JIRA Issue | TEZ-2119 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux a5ab820d4d95 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 5038075 |
| Default Java | Private Build-1.8.0_362-8u372-gaus1-0ubuntu122.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu122.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu122.04-b09 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-301/6/artifact/out/diff-checkstyle-tez-dag.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-301/6/testReport/ |
| Max. process+thread count | 1015 (vs. ulimit of 5500) |
| modules | C: tez-api tez-dag tez-ext-service-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-301/6/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

Contributor

@jteagles jteagles left a comment

+1. Looks good and will be a great feature. Can you take care of committing this?

@abstractdog abstractdog merged commit b643f9b into apache:master Aug 24, 2023