

@maropu commented Dec 10, 2019

### What changes were proposed in this pull request?

This PR adds ExplainMode for explaining a Dataset/DataFrame in a given format. Like the SQL EXPLAIN command, ExplainMode has five modes: Simple, Extended, Codegen, Cost, and Formatted.
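
The correspondence between the five modes and the optional keyword of the SQL EXPLAIN command can be sketched roughly as follows. This is a hypothetical Java model, not Spark's ExplainMode source, and it does not use Spark's parser. The enum name `ExplainModeSketch` and the method `fromStatement` are illustrative. The sketch also assumes that a bare `EXPLAIN <query>` corresponds to Simple.

```java
// Hypothetical model of the five explain modes; NOT Spark's ExplainMode source.
enum ExplainModeSketch {
  Simple, Extended, Codegen, Cost, Formatted;

  // Maps "EXPLAIN [EXTENDED | CODEGEN | COST | FORMATTED] <query>" to a mode.
  // Assumption: a bare "EXPLAIN <query>" (no recognized keyword) means Simple.
  static ExplainModeSketch fromStatement(String sql) {
    String[] tokens = sql.trim().split("\\s+");
    if (!tokens[0].equalsIgnoreCase("EXPLAIN")) {
      throw new IllegalArgumentException("not an EXPLAIN statement: " + sql);
    }
    if (tokens.length > 1) {
      // Compare the token after EXPLAIN with each mode name, ignoring case.
      for (ExplainModeSketch mode : values()) {
        if (mode.name().equalsIgnoreCase(tokens[1])) {
          return mode;
        }
      }
    }
    return Simple;
  }
}
```

For instance, `fromStatement("EXPLAIN FORMATTED SELECT * FROM t")` yields Formatted, and `fromStatement("explain select 1")` falls back to Simple.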

For example, this PR enables users to explain a DataFrame/Dataset with the FORMATTED format implemented in #24759:

```
scala> spark.range(10).groupBy("id").count().explain(ExplainMode.Formatted)
== Physical Plan ==
* HashAggregate (3)
+- * HashAggregate (2)
   +- * Range (1)

(1) Range [codegen id : 1]
Output: [id#0L]

(2) HashAggregate [codegen id : 1]
Input: [id#0L]

(3) HashAggregate [codegen id : 1]
Input: [id#0L, count#8L]
```

This idea comes from a suggestion by @cloud-fan.

### Why are the changes needed?

To support in the Dataset/DataFrame API the same output modes as the SQL EXPLAIN command.

### Does this PR introduce any user-facing change?

No; this only adds a new API to Dataset.

### How was this patch tested?

Added tests in ExplainSuite.

* @since 3.0.0
*/
def explain(extended: Boolean): Unit = {
def explain(mode: ExplainMode): Unit = {
Contributor

How about retaining the old API and adding a new one?

Member


@ulysses-you, @maropu already did. Please see line 564.

Member Author


Yeah, we should keep it. Thanks for the comment, @dongjoon-hyun!

Contributor


Oh, I see it.
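
The thread above settles on keeping `explain(extended: Boolean)` alongside the new `explain(mode: ExplainMode)`. One natural way to keep both without duplicating logic is to have the Boolean overload delegate to the mode-based one. The Java sketch below is hypothetical. `ExplainableSketch` and its string-returning methods are illustrative stand-ins for `Dataset.explain`, which prints rather than returns. The sketch assumes `extended = true` maps to Extended and `false` maps to Simple.

```java
// Hypothetical stand-in for Dataset; NOT Spark code.
class ExplainableSketch {
  enum Mode { Simple, Extended, Codegen, Cost, Formatted }

  // New API: the mode selects what gets rendered (here, just a label).
  String explain(Mode mode) {
    return "explain output in " + mode + " mode";
  }

  // Old API, retained for compatibility: delegates to the mode-based overload.
  String explain(boolean extended) {
    return explain(extended ? Mode.Extended : Mode.Simple);
  }

  // Old no-argument API: behaves like explain(false).
  String explain() {
    return explain(false);
  }
}
```

Routing the old overloads through the new one keeps a single code path, so the Boolean API cannot drift from the mode-based behavior.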

SparkQA commented Dec 10, 2019

Test build #115091 has finished for PR 26829 at commit e8c4af1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*
* {{{
* EXPLAIN (EXTENDED | CODEGEN) SELECT * FROM ...
* EXPLAIN (EXTENDED | CODEGEN | COST | FORMATTED) SELECT * FROM ...
Member


Thank you for fixing this together.

Member

@dongjoon-hyun left a comment


+1, LGTM. Merged to master.

@maropu commented Dec 11, 2019

Thanks for the check, @dongjoon-hyun!

*/
Extended,
/**
* Extended mode means that when printing explain for a DataFrame, if generated codes are
Contributor


Codegen mode?

Member Author


Oh... I'll do a follow-up, thanks!

*/
Codegen,
/**
* Extended mode means that when printing explain for a DataFrame, if plan node statistics are
Contributor


The same.
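
The review above points out that the Codegen and Cost Javadoc both begin with "Extended mode means", although each constant should describe its own mode. Here is a hypothetical Java sketch of what each mode contributes to the output. The section labels paraphrase the mode descriptions and are illustrative, not the exact headers Spark prints. `ExplainSections` and `sectionsFor` are made-up names.

```java
import java.util.List;

// Hypothetical sketch: which parts of the explain output each mode covers.
// Labels paraphrase the ExplainMode Javadoc; they are not Spark's real headers.
class ExplainSections {
  enum Mode { Simple, Extended, Codegen, Cost, Formatted }

  static List<String> sectionsFor(Mode mode) {
    switch (mode) {
      case Simple:    return List.of("physical plan");
      case Extended:  return List.of("logical plans", "physical plan");
      case Codegen:   return List.of("physical plan", "generated code (if available)");
      case Cost:      return List.of("logical plan", "statistics (if available)");
      case Formatted: return List.of("physical plan outline", "node details");
      default:        throw new IllegalArgumentException("unknown mode: " + mode);
    }
  }
}
```

An exhaustive switch like this keeps each mode's behavior next to its name, and that is exactly what the per-constant Javadoc is meant to document.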

dongjoon-hyun pushed a commit that referenced this pull request Dec 11, 2019
### What changes were proposed in this pull request?

This PR is a follow-up to #26829 that fixes typos in ExplainMode.

### Why are the changes needed?

For better docs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #26851 from maropu/SPARK-30200-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Dec 13, 2019
### What changes were proposed in this pull request?

This PR adds PySpark support for the explain modes implemented in #26829.

### Why are the changes needed?

To provide better debugging information for PySpark DataFrames.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added unit tests.

Closes #26861 from maropu/ExplainModeInPython.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
