
Conversation

@Ngone51
Member

@Ngone51 Ngone51 commented Jun 11, 2020

What changes were proposed in this pull request?

This PR proposes to use "mdc.XXX" as the consistent key for both sc.setLocalProperty and log4j.properties when setting up configurations for MDC.

Why are the changes needed?

It's inconsistent that we use "mdc.XXX" as the key to set an MDC value via sc.setLocalProperty while we use "XXX" as the key to set the MDC pattern in log4j.properties. The mismatch also puts an extra burden on the user.
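For illustration, a minimal log4j.properties fragment showing the unified key after this change (the appender names and layout are illustrative, not taken from this PR):

```properties
# Hedged sketch: with this PR, the SAME "mdc.taskName" key is used both
# when setting the value (sc.setLocalProperty("mdc.taskName", ...)) and
# when referencing it in the log4j pattern via the %X conversion.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %p %X{mdc.taskName} %c: %m%n
```

Before this PR, the pattern would have referenced `%X{taskName}` while the property was still set with the `mdc.` prefix, which is the inconsistency being fixed.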

Does this PR introduce any user-facing change?

No, as the MDC feature was added in version 3.1, which hasn't been released yet.

How was this patch tested?

Tested manually.

@Ngone51 Ngone51 changed the title [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties Jun 11, 2020
@Ngone51
Member Author

Ngone51 commented Jun 11, 2020

cc @cloud-fan @igreenfield Please take a look, thanks!

```scala
// task is fully deserialized. When possible, the TaskContext.getLocalProperty call should be
// used instead.
val taskDeserializationProps: ThreadLocal[Properties] = new ThreadLocal[Properties]
val MDC = "mdc."
```
Contributor


this is too short... I'm ok to just hardcode it in the code.

Member Author


fine.

@cloud-fan
Contributor

I don't have a strong preference on it; seems OK. cc @jiangxb1987

@igreenfield

seems ok to me

Contributor

@jiangxb1987 jiangxb1987 left a comment


LGTM

@dongjoon-hyun
Member

Hi, all. Sorry, but how can we disable this MDC feature and force it to generate the same result as Spark 2.4.6?

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123858 has finished for PR 28801 at commit b8affa3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Member Author

Ngone51 commented Jun 12, 2020

Hi, all. Sorry, but how can we disable this MDC feature and force it to generate the same result as Spark 2.4.6?

@dongjoon-hyun

As for end users, they will not see the MDC properties in the log if they use the default log4j.properties (since MDC requires extra pattern configuration). So it's still the same compared to Spark 2.4.6.

But internally, yes, Spark will always add at least one MDC property (taskName) to the MDC, even if nothing else actually uses it. But considering the data is small, it's probably fine to do it without a controllable flag.
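To illustrate the mechanism being discussed (a hedged sketch, not Spark's actual Executor code; `MdcSetup` and `mdcProperties` are hypothetical names): the executor can scan task-local properties for the "mdc." prefix and mirror the matching entries into the logging MDC, which is why at least the task name is always present.

```scala
// Hypothetical sketch of prefix-based MDC selection; names are
// illustrative and do not reflect Spark's real implementation.
object MdcSetup {
  val MdcPrefix = "mdc."

  // Keep only the task-local properties that should be mirrored
  // into the SLF4J/log4j MDC, i.e. those with the "mdc." prefix.
  def mdcProperties(localProps: Map[String, String]): Map[String, String] =
    localProps.filter { case (key, _) => key.startsWith(MdcPrefix) }
}

// Usage: Spark always sets at least the task-name property, so even
// with no user-defined keys the selected map has one entry.
val props = Map(
  "mdc.taskName" -> "task 19.0 in stage 0.0 (TID 19)",
  "spark.job.description" -> "not an MDC key"
)
val forMdc = MdcSetup.mdcProperties(props)
// forMdc contains only the "mdc.taskName" entry
```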

@cloud-fan
Contributor

Yeah, the log4j properties file is effectively the config that controls this MDC feature, and it's off by default.

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123884 has finished for PR 28801 at commit 2c77bbb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123916 has finished for PR 28801 at commit 2c77bbb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123930 has finished for PR 28801 at commit 2c77bbb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Member Author

Ngone51 commented Jun 12, 2020

retest this please.

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123939 has finished for PR 28801 at commit 2c77bbb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

retest this please

@SparkQA

SparkQA commented Jun 13, 2020

Test build #123947 has finished for PR 28801 at commit 2c77bbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @Ngone51 , @cloud-fan , @jiangxb1987 .
Merged to master.

@Ngone51
Member Author

Ngone51 commented Jun 15, 2020

thanks all!!

dongjoon-hyun pushed a commit that referenced this pull request May 5, 2024
### What changes were proposed in this pull request?

Currently there are two MDC keys for task name:
* `mdc.taskName`, which was introduced in #28801. Before that change, it was `taskName`.
* `task_name`, introduced by the structured logging framework project.

To unify the MDC keys, this PR renames `mdc.taskName` to `task_name`. This MDC key appears frequently in logs when running a Spark application.
Before change:
```
"context":{"mdc.taskName":"task 19.0 in stage 0.0 (TID 19)"}
```
After change:
```
"context":{"task_name":"task 19.0 in stage 0.0 (TID 19)"}
```

### Why are the changes needed?

1. Makes the MDC names consistent.
2. Minor upside: users can query task names with `SELECT * FROM logs WHERE context.task_name = ...`. Otherwise, querying with `context.mdc.task_name` results in an analysis exception, and users have to query with `context['mdc.task_name']`.

### Does this PR introduce _any_ user-facing change?

Not really. The MDC key is used by developers for debugging purposes.

### How was this patch tested?

Manual test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46386 from gengliangwang/unify.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
sinaiamonkar-sai pushed a commit to sinaiamonkar-sai/spark that referenced this pull request May 5, 2024
