
Conversation


@igreenfield igreenfield commented Nov 21, 2019

What changes were proposed in this pull request?

Added MDC support in all thread pools.
ThreadUtils creates new pools that propagate the MDC.

Why are the changes needed?

In many cases it is very hard to tell which action an executor log line came from, especially when you do multi-threaded work in the driver and submit actions in parallel.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No tests were added because no new functionality is added; it is a thread-pool change, and all current tests pass.
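
For illustration, a minimal sketch of the propagation technique (not the PR's actual code; `wrapRunnable` is an illustrative name), using only the standard SLF4J API:

```
import java.util.{Map => JMap}
import org.slf4j.MDC

// Capture the submitter's MDC at submission time and install it inside the
// worker thread for the duration of the task, restoring the old MDC afterwards.
def wrapRunnable(body: Runnable): Runnable = {
  val captured: JMap[String, String] = MDC.getCopyOfContextMap // may be null
  new Runnable {
    override def run(): Unit = {
      val previous = MDC.getCopyOfContextMap
      if (captured != null) MDC.setContextMap(captured) else MDC.clear()
      try body.run()
      finally if (previous != null) MDC.setContextMap(previous) else MDC.clear()
    }
  }
}
```

A pool built from runnables wrapped this way makes log lines emitted on worker threads carry the MDC of the thread that submitted the work.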

@igreenfield
Author

@dongjoon-hyun Could you review this, please?

@igreenfield
Author

@srowen Hi, could you please look at this? It is very important to me that it goes into Spark 3.0.0.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 12, 2020
@github-actions github-actions bot closed this Apr 13, 2020
@igreenfield
Author

@marmbrus @mojodna @dongjoon-hyun could one of you please look at this PR? I think it will help all users.

@yairogen

Yes, please approve. We need this as well.

@shaleo

shaleo commented Apr 20, 2020

+1, a welcome addition for adding more context to our current logs.

@igreenfield
Author

@cloud-fan Could you please look at this?

@cloud-fan
Contributor

cc @Ngone51

Member

@Ngone51 Ngone51 left a comment


This looks useful to me. But could we reduce the scope at the beginning, e.g. to TaskRunner only? Changing ThreadUtils seems to have a wide impact.

And it seems we also need to add a pattern configuration for MDC? Otherwise it doesn't work (I tried it locally).

@igreenfield
Author

Hi @Ngone51, first, thanks for reviewing!
About the pattern: it should be added, but I think each user will add what they need, formatted how they want, since the change also supports adding local properties whose keys start with `mdc.`.
About ThreadUtils: without that change the MDC will not propagate everywhere, only within the scope of a single thread, and from my tests that is not enough. We have been using this code internally for more than a year.

@Ngone51
Member

Ngone51 commented Apr 27, 2020

About the pattern: it should be added, but I think each user will add what they need, formatted how they want, since the change also supports adding local properties whose keys start with `mdc.`.

Could you give a pattern configuration template to make MDC work with Spark?

IIUC, the configuration is related to the keys put into the MDC. If you've already hard-coded the keys (appId & appName) in the code, how do people add their custom configurations?

About ThreadUtils: without that change the MDC will not propagate everywhere, only within the scope of a single thread, and from my tests that is not enough. We have been using this code internally for more than a year.

So you internally use the MDC to track logs at the application level? (IIUC, appId and appName will be prepended to each log line via MDC, right?) But is it possible for more than one application to log to the same file? (I assume that's the problem you're trying to get rid of, if any.)

@igreenfield
Author

```
log4j.appender.console.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %p [%X{appId}] [%X{appName}] %c{3} - %m%n
```

```
// Copy every driver-side local property whose key starts with "mdc."
// into the SLF4J MDC, stripping the four-character "mdc." prefix.
properties.asScala.filter(_._1.startsWith("mdc.")).foreach { item =>
  val key = item._1.substring(4)
  org.slf4j.MDC.put(key, item._2)
}
```

In that code we iterate over the local properties, and every property whose key starts with `mdc.` is added to the MDC (with the prefix stripped), so you can reference it in your log pattern; see the sketch below.
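
For example, a driver-side usage sketch (`mdc.requestId` is an illustrative key, used the same way as `mdc.my` in the follow-up commit message further down):

```
// Set before submitting a job on the driver; the snippet above copies it
// into the executors' MDC with the "mdc." prefix stripped, so the
// pattern can reference it as %X{requestId}.
sc.setLocalProperty("mdc.requestId", "req-42")
sc.parallelize(1 to 100).count()
```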

We are using Spark under a Spark server, so we have long-running sessions containing many tasks that are not related to each other, and we add properties so we can see in the logs which log lines belong to which request.

@gatorsmile gatorsmile reopened this Apr 28, 2020
@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Apr 28, 2020

Test build #121949 has finished for PR 26624 at commit 6c2d27d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot closed this Apr 29, 2020
@cloud-fan cloud-fan removed the Stale label Apr 29, 2020
@cloud-fan cloud-fan reopened this Apr 29, 2020
@cloud-fan
Contributor

This looks like a good feature.

Can we split it into several smaller PRs? e.g. the first PR can just focus on the TaskRunner.

@igreenfield
Author

Hi @cloud-fan, I think that if we don't merge it all as one piece it will not have the full benefit, as I answered in an earlier comment.

@Ngone51
Member

Ngone51 commented Apr 30, 2020

We are using Spark under a Spark server, so we have long-running sessions containing many tasks that are not related to each other, and we add properties so we can see in the logs which log lines belong to which request.

Sorry, I can't understand your use case here. From the PR implementation, do you want to know some relationship between tasks from different applications?

Can you explain more about your use case? @igreenfield

@igreenfield
Author

We are running one app but submit many tasks to it using the Spark server, so it has many tasks that belong to different requests. So in our case we mostly use the `mdc.` local properties to add info to the logs.
And from the original Jira:

It would be nice to have, because it's good to have logs in one file when using log agents (like Logentries) in standalone mode. It also allows configuring a rolling file appender without a mess when multiple applications are running.

@Ngone51
Member

Ngone51 commented Apr 30, 2020

We are running one app but submit many tasks to it using the Spark server, so it has many tasks that belong to different requests.

So you're running one application, but tasks run on distributed nodes because of the Spark server and log separately. And you want to correlate logs for the application by attaching the appId to the task logs, right?

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 19, 2020

Test build #122834 has finished for PR 26624 at commit d5c1aa9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan why did the build fail?

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 19, 2020

Test build #122841 has finished for PR 26624 at commit d5c1aa9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan it seems like the failing tests are not connected to this change...

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 19, 2020

Test build #122849 has finished for PR 26624 at commit d5c1aa9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan What is the problem with these tests?

@cloud-fan
Contributor

I don't know what's going on; let me retest it one more time.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented May 20, 2020

Test build #122870 has finished for PR 26624 at commit d5c1aa9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@igreenfield
Author

@cloud-fan now all tests pass. What's next?

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in eaf7a2a May 20, 2020
cloud-fan pushed a commit that referenced this pull request Jun 11, 2020
… task

### What changes were proposed in this pull request?

This PR is a follow-up of #26624. It cleans up MDC properties if the original value is empty.
Besides, this PR adds a warning and ignores the value when the user tries to override the value of `taskName`.

### Why are the changes needed?

Before this PR, running the following jobs:

```
sc.setLocalProperty("mdc.my", "ABC")
sc.parallelize(1 to 100).count()
sc.setLocalProperty("mdc.my", null)
sc.parallelize(1 to 100).count()
```

the MDC value "ABC" still appears in the logs of the second count job even though we've unset it.

### Does this PR introduce _any_ user-facing change?

Yes. Users will 1) no longer see the MDC values after unsetting them; 2) see a warning if they try to override the value of `taskName`.

### How was this patch tested?

Tested manually.

Closes #28756 from Ngone51/followup-8981.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
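
A minimal sketch of the cleanup behavior that follow-up describes (illustrative, not the actual patch; `key` and `value` stand for one entry of the driver-side local properties): a null value should remove the key from the MDC instead of leaving a stale entry behind.

```
// If a local property was unset (null), remove the stale MDC entry;
// otherwise install the new value.
if (value == null) org.slf4j.MDC.remove(key) else org.slf4j.MDC.put(key, value)
```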
Review comment on a docs diff hunk that reads:

… `log4j.properties.template` located there.

By default, Spark adds 1 record to the MDC (Mapped Diagnostic Context): `taskName`, which shows something like `task 1.0 in stage 0.0`. You can add `%X{taskName}` to your patternLayout in order to see it in the logs.

Member

patternLayout -> PatternLayout.

Could you give an example in this document showing how to use it? For example, show how to specify your application names/identifiers.


@gatorsmile were you able to figure this out? My MDC values are not propagating in my logs after following this same procedure.


@alefischer13 Just a guess, but it looks like this was changed in 54e702c so that the MDC key still includes the `mdc.` prefix.

@alefischer13

alefischer13 commented Feb 1, 2021

@igreenfield this does not seem to be working for me. I'm trying to log the Spark application ID by setting `mdc.applicationId` to the SparkContext's applicationId and adding `%X{applicationId}` to my patternLayout, but no applicationId shows up on either the driver or the executors. For reference, `%X{taskName}` doesn't work either. Setting the MDC value explicitly (`MDC.put(...)`) does provide the applicationId value, but only on the driver. Is there anything else we have to change?

@igreenfield
Author

Hi @alefischer13, please look at this PR, which was merged later and changed the way you need to configure log4j.properties: #28801
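
If, as noted in the earlier comment about 54e702c, the MDC key now keeps the `mdc.` prefix, the pattern from earlier in this thread would become something like the following (an assumption to verify against your Spark version, not confirmed in this thread):

```
# Assumes post-#28801 MDC keys keep the "mdc." prefix:
log4j.appender.console.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %p [%X{mdc.taskName}] [%X{mdc.applicationId}] %c{3} - %m%n
```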

@alefischer13

Hi @igreenfield, thanks for the reply. I tried this as well, but it didn't work either. `setLocalProperty` is working correctly, since I'm able to access the value through `getLocalProperty`, but `%X{mdc.applicationId}` is not producing any values in the logs. Any other suggestions?

@melin

melin commented Jan 8, 2025

Currently Spark only adds taskName to the MDC; could executorId be added to the MDC as well?
We plan to write logs to Kafka via a Kafka appender, and then periodically write the Kafka data to S3 for consumption.

https://aws.github.io/aws-emr-containers-best-practices/troubleshooting/docs/where-to-look-for-spark-logs/
Executor Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-executor-id}/(stderr.gz/stdout.gz)

@cloud-fan @igreenfield

