[SPARK-28464][Doc][SS] Document Kafka source minPartitions option #25219
Conversation
ok to test
<tr>
  <td>minPartitions</td>
  <td>int</td>
  <td>0 (disabled)</td>
Thank you for your first contribution, @arunpandianp.
However, this is wrong because it will mislead users into setting 0 and hitting an IllegalArgumentException.
Technically, the default value is None. Just leave this cell blank, like <td></td>.
@dongjoon-hyun thanks for checking, changed it.
Test build #107962 has finished for PR 25219 at commit
  <td></td>
  <td>streaming and batch</td>
  <td>Minimum number of partitions to read from Kafka.
  You can configure Spark to use an arbitrary minimum of partitions to read from Kafka using the minPartitions option.
Let's remove this line because we don't allow an arbitrary number.
  <td>streaming and batch</td>
  <td>Minimum number of partitions to read from Kafka.
  You can configure Spark to use an arbitrary minimum of partitions to read from Kafka using the minPartitions option.
  Normally Spark has a 1-1 mapping of Kafka TopicPartitions to Spark partitions consuming from Kafka.
Normally -> By default, ?
  If you set the minPartitions option to a value greater than your Kafka TopicPartitions,
  Spark will divvy up large Kafka partitions to smaller pieces.
  This option can be set at times of peak loads, data skew, and as your stream is falling behind to increase processing rate.
  It comes at a cost of initializing Kafka consumers at each trigger, which may impact performance if you use SSL when connecting to Kafka.</td>
Let's remove line 401~402, too.
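The documented behavior (splitting large Kafka TopicPartitions into smaller pieces when minPartitions exceeds their count) can be sketched as a simplified model. This is an illustrative sketch only, not Spark's actual implementation; the function name and the proportional-split strategy are assumptions for explanation:

```python
def split_offset_ranges(ranges, min_partitions):
    """Split per-TopicPartition offset ranges into roughly min_partitions
    pieces, proportionally to each range's size.

    ranges: list of (topic_partition, start_offset, end_offset) tuples.
    Returns a list of smaller (topic_partition, start, end) pieces that
    together cover exactly the same offsets.
    """
    total = sum(end - start for _, start, end in ranges)
    pieces = []
    for tp, start, end in ranges:
        size = end - start
        # Give each TopicPartition a share of pieces proportional to its size,
        # but never fewer than one piece (the original 1-1 mapping).
        n = max(1, round(min_partitions * size / total)) if total > 0 else 1
        base, rem = divmod(size, n)
        offset = start
        for i in range(n):
            length = base + (1 if i < rem else 0)
            pieces.append((tp, offset, offset + length))
            offset += length
    return pieces
```

With minPartitions no larger than the number of TopicPartitions, the mapping stays 1-1; with a larger value, each partition's offset range is divided into contiguous sub-ranges.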
Test build #107963 has finished for PR 25219 at commit

Test build #107964 has finished for PR 25219 at commit

@dongjoon-hyun pushed suggested changes.
dongjoon-hyun left a comment
+1, LGTM. Thank you, @arunpandianp .
Merged to master/branch-2.4.
  <td>Minimum number of partitions to read from Kafka.
  By default, Spark has a 1-1 mapping of Kafka TopicPartitions to Spark partitions consuming from Kafka.
  If you set the minPartitions option to a value greater than your Kafka TopicPartitions,
  Spark will divvy up large Kafka partitions to smaller pieces.
The closing </td> is missing. I'll fix that during merging.
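For reference, minPartitions is set like any other Kafka source option. A minimal sketch, assuming an existing SparkSession `spark` and a reachable Kafka broker; "host1:port1" and "topic1" are placeholder values:

```python
# Sketch only: requires a running SparkSession and Kafka broker.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1")  # placeholder broker
      .option("subscribe", "topic1")                     # placeholder topic
      .option("minPartitions", "10")  # ask for at least 10 Spark partitions
      .load())
```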
Adding doc for the kafka source minPartitions option to "Structured Streaming + Kafka Integration Guide". The text is based on the content in https://docs.databricks.com/spark/latest/structured-streaming/kafka.html#configuration

Closes #25219 from arunpandianp/SPARK-28464.

Authored-by: Arun Pandian <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit a0a58cf)
Signed-off-by: Dongjoon Hyun <[email protected]>
Welcome to the Apache Spark community, @arunpandianp.
What changes were proposed in this pull request?
Adding doc for the kafka source minPartitions option to "Structured Streaming + Kafka Integration Guide"