[SPARK-22606][Streaming]Add threadId to the CachedKafkaConsumer key #19819
What changes were proposed in this pull request?
If 'spark.streaming.concurrentJobs' is greater than one and 'spark.executor.cores' is greater than one, two or more tasks in the same executor may use the same Kafka consumer at the same time, which throws: "KafkaConsumer is not safe for multi-threaded access".
For example, with:
spark.streaming.concurrentJobs=2
spark.executor.cores=2
spark.cores.max=2
if there is only one topic with a single partition ('topic1', 0) to consume, two jobs run at the same time, and both compute the same cache key ('groupid', 'topic1', 0) when looking up the CachedKafkaConsumer in 'private var cache: ju.LinkedHashMap[CacheKey, CachedKafkaConsumer[_, _]]', so they get the same CachedKafkaConsumer instance.
This PR adds the thread id to the CachedKafkaConsumer cache key so that two threads can never use the same consumer at the same time.
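A minimal sketch of the idea, outside of Spark: the class and field names below (DummyConsumer, ConsumerCacheSketch) are illustrative stand-ins, not the actual Spark code; only the key shape (group id, topic, partition, plus thread id) mirrors the change described above.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Stand-in for the real CachedKafkaConsumer; the id field is just for illustration.
case class DummyConsumer(ownerThreadId: Long)

// Before the fix the key was (groupId, topic, partition); adding the thread id
// means two threads reading the same partition get two distinct cache entries.
case class CacheKey(groupId: String, topic: String, partition: Int, threadId: Long)

object ConsumerCacheSketch {
  private val cache = new JLinkedHashMap[CacheKey, DummyConsumer]()

  def get(groupId: String, topic: String, partition: Int): DummyConsumer = synchronized {
    val key = CacheKey(groupId, topic, partition, Thread.currentThread().getId)
    var consumer = cache.get(key)
    if (consumer == null) {
      consumer = DummyConsumer(key.threadId)
      cache.put(key, consumer)
    }
    consumer
  }
}
```

With this keying, repeated lookups from the same thread still reuse one consumer, while concurrent jobs on different threads no longer share one, avoiding the "KafkaConsumer is not safe for multi-threaded access" error.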
How was this patch tested?
Existing unit tests.