@arunpandianp arunpandianp commented Jul 21, 2019

What changes were proposed in this pull request?

Adding doc for the Kafka source minPartitions option to "Structured Streaming + Kafka Integration Guide"

screenshot_doc

@dongjoon-hyun
Member

ok to test

<tr>
<td>minPartitions</td>
<td>int</td>
<td>0 (disabled)</td>
Member

Thank you for your first contribution, @arunpandianp.
However, this is wrong: it will mislead users into setting 0, which throws an IllegalArgumentException.
Technically, the default value is None. Just leave this cell blank, like <td></td>.

@arunpandianp
Author

@dongjoon-hyun thanks for checking, changed it to <td></td>

SparkQA commented Jul 21, 2019

Test build #107962 has finished for PR 25219 at commit 69683eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<td></td>
<td>streaming and batch</td>
<td>Minimum number of partitions to read from Kafka.
You can configure Spark to use an arbitrary minimum of partitions to read from Kafka using the minPartitions option.
Member

Let's remove this line because we don't allow an arbitrary number.

<td>streaming and batch</td>
<td>Minimum number of partitions to read from Kafka.
You can configure Spark to use an arbitrary minimum of partitions to read from Kafka using the minPartitions option.
Normally Spark has a 1-1 mapping of Kafka TopicPartitions to Spark partitions consuming from Kafka.
Member

"Normally" -> "By default"?

If you set the minPartitions option to a value greater than your Kafka TopicPartitions,
Spark will divvy up large Kafka partitions to smaller pieces.
This option can be set at times of peak loads, data skew, and as your stream is falling behind to increase processing rate.
It comes at a cost of initializing Kafka consumers at each trigger, which may impact performance if you use SSL when connecting to Kafka.</td>
Member

Let's remove lines 401~402, too.

SparkQA commented Jul 21, 2019

Test build #107963 has finished for PR 25219 at commit 6081c50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 21, 2019

Test build #107964 has finished for PR 25219 at commit 7c0e448.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@arunpandianp
Author

@dongjoon-hyun pushed suggested changes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @arunpandianp.
Merged to master/branch-2.4.

<td>Minimum number of partitions to read from Kafka.
By default, Spark has a 1-1 mapping of Kafka TopicPartitions to Spark partitions consuming from Kafka.
If you set the minPartitions option to a value greater than your Kafka TopicPartitions,
Spark will divvy up large Kafka partitions to smaller pieces.
Member

The closing </td> is missing. I'll fix that while merging.
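
For readers following the thread, the behavior the merged text describes (a 1-1 mapping by default, with large Kafka partitions split into smaller pieces once minPartitions exceeds the number of TopicPartitions) can be illustrated with a small standalone sketch. This is not Spark's implementation. The function name `plan_ranges`, the proportional-split rule, and the `ValueError` (standing in for the IllegalArgumentException mentioned in review) are all assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types: (topic, partition) -> (start_offset, end_offset), end exclusive.
TopicPartition = Tuple[str, int]
OffsetRange = Tuple[int, int]
Chunk = Tuple[str, int, int, int]


def plan_ranges(
    ranges: Dict[TopicPartition, OffsetRange],
    min_partitions: Optional[int] = None,
) -> List[Chunk]:
    """Illustrative only: map Kafka offset ranges to Spark-side chunks."""
    if min_partitions is not None and min_partitions <= 0:
        # Assumption: stands in for the IllegalArgumentException mentioned in review.
        raise ValueError("minPartitions must be a positive integer")

    ordered = sorted(ranges.items())
    # Unset, or not greater than the number of TopicPartitions: keep the 1-1 mapping.
    if min_partitions is None or min_partitions <= len(ordered):
        return [(t, p, start, end) for (t, p), (start, end) in ordered]

    total = sum(end - start for _, (start, end) in ordered)
    chunks: List[Chunk] = []
    for (t, p), (start, end) in ordered:
        size = end - start
        # Assumed rule: pieces proportional to the partition's share of records,
        # at least one piece, and never more pieces than records.
        share = round(min_partitions * size / total) if total > 0 else 1
        parts = max(1, min(size, share))
        for i in range(parts):
            lo = start + size * i // parts
            hi = start + size * (i + 1) // parts
            chunks.append((t, p, lo, hi))
    return chunks


if __name__ == "__main__":
    # A skewed topic: partition 0 holds 900 pending records, partition 1 holds 100.
    skewed = {("events", 0): (0, 900), ("events", 1): (0, 100)}
    print(len(plan_ranges(skewed)))
    print(len(plan_ranges(skewed, min_partitions=10)))
```

With the option unset, the skewed example keeps one chunk per TopicPartition. With `min_partitions=10`, the large partition is cut into nine pieces of 100 records and the small one stays whole. Because of rounding, this sketch does not guarantee that the chunk count equals the requested value exactly.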

dongjoon-hyun pushed a commit that referenced this pull request Jul 21, 2019
Adding doc for the kafka source minPartitions option to "Structured Streaming + Kafka Integration Guide"

The text is based on the content in  https://docs.databricks.com/spark/latest/structured-streaming/kafka.html#configuration

Closes #25219 from arunpandianp/SPARK-28464.

Authored-by: Arun Pandian <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit a0a58cf)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

Welcome to the Apache Spark community, @arunpandianp.
You've been added to the Apache Spark contributor group, and SPARK-28464 is assigned to you.

@arunpandianp arunpandianp deleted the SPARK-28464 branch July 22, 2019 04:44
yiheng pushed a commit to yiheng/spark that referenced this pull request Jul 24, 2019
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019