
Conversation


@igreenfield igreenfield commented Nov 21, 2019

What changes were proposed in this pull request?

Added MDC support in all thread pools.
ThreadUtils creates new pools that propagate the MDC.

Why are the changes needed?

In many cases it is very hard to tell which action an executor log line came from, especially when you do multi-threaded work in the driver and submit actions in parallel.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No tests were added because no new functionality is added; it is a thread-pool change, and all current tests pass.
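
For illustration, a minimal sketch of the propagation technique (not the PR's actual code; `wrapRunnable` is an illustrative name), using only the standard SLF4J API:

```
import java.util.{Map => JMap}
import org.slf4j.MDC

// Capture the submitter's MDC at submission time and install it inside the
// worker thread for the duration of the task, restoring the old MDC afterwards.
def wrapRunnable(body: Runnable): Runnable = {
  val captured: JMap[String, String] = MDC.getCopyOfContextMap // may be null
  new Runnable {
    override def run(): Unit = {
      val previous = MDC.getCopyOfContextMap
      if (captured != null) MDC.setContextMap(captured) else MDC.clear()
      try body.run()
      finally if (previous != null) MDC.setContextMap(previous) else MDC.clear()
    }
  }
}
```

A pool built from runnables wrapped this way makes log lines emitted on worker threads carry the MDC of the thread that submitted the work.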

@igreenfield
Author

@dongjoon-hyun Could you review this, please?

@igreenfield
Author

@srowen Hi, could you please look at this? It is very important to me that it goes into Spark 3.0.0.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 12, 2020
@github-actions github-actions bot closed this Apr 13, 2020
@igreenfield
Author

@marmbrus @mojodna @dongjoon-hyun could one of you please look at this PR? I think it will help all users.

@yairogen

Yes, please approve. We need this as well.

@shaleo

shaleo commented Apr 20, 2020

+1, a welcome addition for adding more context to our current logs.

@igreenfield
Author

@cloud-fan Could you please look at this?

@cloud-fan
Contributor

cc @Ngone51

Member

@Ngone51 Ngone51 left a comment


This looks useful to me. But could we reduce the scope at the beginning, e.g. to TaskRunner only? Changing ThreadUtils seems to have a wide impact.

And it seems we also need to add a pattern configuration for MDC? Otherwise it doesn't work (I tried it locally).

@igreenfield
Author

Hi @Ngone51, first, thanks for reviewing!
About the pattern: it should be added, but I think each user will add what they need, formatted how they want, since the change also supports adding local properties whose keys start with `mdc.`.
About ThreadUtils: without that change the MDC will not propagate everywhere, only within the scope of a single thread, and from my tests that is not enough. We have been using this code internally for more than a year.

@Ngone51
Member

Ngone51 commented Apr 27, 2020

About the pattern: it should be added, but I think each user will add what they need, formatted how they want, since the change also supports adding local properties whose keys start with `mdc.`.

Could you give a pattern configuration template to make MDC work with Spark?

IIUC, the configuration is related to the keys put into the MDC. If you've already hard-coded the keys (appId & appName) in the code, how do people add their custom configurations?

About ThreadUtils: without that change the MDC will not propagate everywhere, only within the scope of a single thread, and from my tests that is not enough. We have been using this code internally for more than a year.

So you internally use the MDC to track logs at the application level? (IIUC, appId and appName will be prepended to each log line via MDC, right?) But is it possible for more than one application to log to the same file? (I assume that's the problem you're trying to get rid of, if any.)

@igreenfield
Author

```
log4j.appender.console.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %p [%X{appId}] [%X{appName}] %c{3} - %m%n
```

```
// Copy every driver-side local property whose key starts with "mdc."
// into the SLF4J MDC, stripping the four-character "mdc." prefix.
properties.asScala.filter(_._1.startsWith("mdc.")).foreach { item =>
  val key = item._1.substring(4)
  org.slf4j.MDC.put(key, item._2)
}
```

In that code we iterate over the local properties, and every property whose key starts with `mdc.` is added to the MDC (with the prefix stripped), so you can reference it in your log pattern; see the sketch below.
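
For example, a driver-side usage sketch (`mdc.requestId` is an illustrative key, used the same way as `mdc.my` in the follow-up commit message further down):

```
// Set before submitting a job on the driver; the snippet above copies it
// into the executors' MDC with the "mdc." prefix stripped, so the
// pattern can reference it as %X{requestId}.
sc.setLocalProperty("mdc.requestId", "req-42")
sc.parallelize(1 to 100).count()
```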

We are using Spark under a Spark server, so we have long-running sessions containing many tasks that are not related to each other, and we add properties so we can see in the logs which log lines belong to which request.

@gatorsmile gatorsmile reopened this Apr 28, 2020
@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Apr 28, 2020

Test build #121949 has finished for PR 26624 at commit 6c2d27d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot closed this Apr 29, 2020
@cloud-fan cloud-fan removed the Stale label Apr 29, 2020
@cloud-fan cloud-fan reopened this Apr 29, 2020
@cloud-fan
Contributor

This looks like a good feature.

Can we split it into several smaller PRs? e.g. the first PR can just focus on the TaskRunner.

@igreenfield
Author

Hi @cloud-fan, I think that if we don't merge it all as one piece it will not have the full benefit, as I answered in an earlier comment.

@Ngone51
Member

Ngone51 commented Apr 30, 2020

We are using Spark under a Spark server, so we have long-running sessions containing many tasks that are not related to each other, and we add properties so we can see in the logs which log lines belong to which request.

Sorry, I can't understand your use case here. From the PR implementation, do you want to know some relationship between tasks from different applications?

Can you explain more about your use case? @igreenfield

@igreenfield
Author

We are running one app but submit many tasks to it using the Spark server, so it has many tasks that belong to different requests. So in our case we mostly use the `mdc.` local properties to add info to the logs.
And from the original Jira:

It would be nice to have, because it's good to have logs in one file when using log agents (like Logentries) in standalone mode. It also allows configuring a rolling file appender without a mess when multiple applications are running.

@Ngone51
Member

Ngone51 commented Apr 30, 2020

We are running one app but submit many tasks to it using the Spark server, so it has many tasks that belong to different requests.

So you're running one application, but tasks run on distributed nodes because of the Spark server and log separately. And you want to correlate logs for the application by attaching the appId to the task logs, right?

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 19, 2020

Test build #122834 has finished for PR 26624 at commit d5c1aa9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan why did the build fail?

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 19, 2020

Test build #122841 has finished for PR 26624 at commit d5c1aa9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan it seems like the failing tests are not connected to this change...

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 19, 2020

Test build #122849 has finished for PR 26624 at commit d5c1aa9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan What is the problem with these tests?

@cloud-fan
Contributor

I don't know what's going on; let me retest it one more time.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 20, 2020

Test build #122870 has finished for PR 26624 at commit d5c1aa9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan now all tests pass. What's next?

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in eaf7a2a May 20, 2020
cloud-fan pushed a commit that referenced this pull request Jun 11, 2020
… task

### What changes were proposed in this pull request?

This PR is a follow-up of #26624. It cleans up MDC properties if the original value is empty.
Besides, this PR adds a warning and ignores the value when the user tries to override the value of `taskName`.

### Why are the changes needed?

Before this PR, running the following jobs:

```
sc.setLocalProperty("mdc.my", "ABC")
sc.parallelize(1 to 100).count()
sc.setLocalProperty("mdc.my", null)
sc.parallelize(1 to 100).count()
```

the MDC value "ABC" still appears in the logs of the second count job even though we've unset it.

### Does this PR introduce _any_ user-facing change?

Yes. Users will 1) no longer see the MDC values after unsetting them; 2) see a warning if they try to override the value of `taskName`.

### How was this patch tested?

Tested manually.

Closes #28756 from Ngone51/followup-8981.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
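
A minimal sketch of the cleanup behavior that follow-up describes (illustrative, not the actual patch; `key` and `value` stand for one entry of the driver-side local properties): a null value should remove the key from the MDC instead of leaving a stale entry behind.

```
// If a local property was unset (null), remove the stale MDC entry;
// otherwise install the new value.
if (value == null) org.slf4j.MDC.remove(key) else org.slf4j.MDC.put(key, value)
```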
Review comment on a docs diff hunk that reads:

… `log4j.properties.template` located there.

By default, Spark adds 1 record to the MDC (Mapped Diagnostic Context): `taskName`, which shows something like `task 1.0 in stage 0.0`. You can add `%X{taskName}` to your patternLayout in order to see it in the logs.

Member

patternLayout -> PatternLayout.

Could you give an example in this document showing how to use it? For example, show how to specify your application names/identifiers.


@gatorsmile were you able to figure this out? My MDC values are not propagating in my logs after following this same procedure.


@alefischer13 Just a guess, but it looks like this was changed in 54e702c so that the MDC key still includes the `mdc.` prefix.

@alefischer13

alefischer13 commented Feb 1, 2021

@igreenfield this does not seem to be working for me. I'm trying to log the Spark application ID by setting `mdc.applicationId` to the SparkContext's applicationId and adding `%X{applicationId}` to my patternLayout, but no applicationId shows up on either the driver or the executors. For reference, `%X{taskName}` doesn't work either. Setting the MDC value explicitly (`MDC.put(...)`) does provide the applicationId value, but only on the driver. Is there anything else we have to change?

@igreenfield
Author

Hi @alefischer13, please look at this PR, which was merged later and changed the way you need to configure log4j.properties: #28801
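
If, as noted in the earlier comment about 54e702c, the MDC key now keeps the `mdc.` prefix, the pattern from earlier in this thread would become something like the following (an assumption to verify against your Spark version, not confirmed in this thread):

```
# Assumes post-#28801 MDC keys keep the "mdc." prefix:
log4j.appender.console.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %p [%X{mdc.taskName}] [%X{mdc.applicationId}] %c{3} - %m%n
```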

@alefischer13

Hi @igreenfield, thanks for the reply. I tried this as well, but it didn't work either. `setLocalProperty` is working correctly, since I'm able to access the value through `getLocalProperty`, but `%X{mdc.applicationId}` is not producing any values in the logs. Any other suggestions?

@melin

melin commented Jan 8, 2025

Currently Spark only adds taskName to the MDC; could executorId be added to the MDC as well?
We plan to write logs to Kafka via a Kafka appender, and then periodically write the Kafka data to S3 for consumption.

https://aws.github.io/aws-emr-containers-best-practices/troubleshooting/docs/where-to-look-for-spark-logs/
Executor Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-executor-id}/(stderr.gz/stdout.gz)

@cloud-fan @igreenfield

