Commit 84918e7

xuanyuanking authored and fqaiser94 committed
[SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes
### What changes were proposed in this pull request?
Update the Spark SQL document menu and join strategy hints.

### Why are the changes needed?
- Several new changes in the Spark SQL document didn't change the menu-sql.yaml correspondingly.
- Update the demo code for join strategy hints.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Document change only.

Closes apache#26917 from xuanyuanking/SPARK-30278.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent aa0a3b6 commit 84918e7

3 files changed

Lines changed: 26 additions & 10 deletions

docs/_data/menu-sql.yaml

Lines changed: 6 additions & 2 deletions
@@ -15,6 +15,8 @@
       url: sql-getting-started.html#creating-datasets
     - text: Interoperating with RDDs
       url: sql-getting-started.html#interoperating-with-rdds
+    - text: Scalar Functions
+      url: sql-getting-started.html#scalar-functions
     - text: Aggregations
       url: sql-getting-started.html#aggregations
 - text: Data Sources
@@ -34,6 +36,8 @@
       url: sql-data-sources-jdbc.html
     - text: Avro Files
       url: sql-data-sources-avro.html
+    - text: Whole Binary Files
+      url: sql-data-sources-binaryFile.html
     - text: Troubleshooting
       url: sql-data-sources-troubleshooting.html
 - text: Performance Tuning
@@ -43,8 +47,8 @@
       url: sql-performance-tuning.html#caching-data-in-memory
     - text: Other Configuration Options
       url: sql-performance-tuning.html#other-configuration-options
-    - text: Broadcast Hint for SQL Queries
-      url: sql-performance-tuning.html#broadcast-hint-for-sql-queries
+    - text: Join Strategy Hints for SQL Queries
+      url: sql-performance-tuning.html#join-strategy-hints-for-sql-queries
 - text: Distributed SQL Engine
   url: sql-distributed-sql-engine.html
   subitems:

docs/sql-migration-guide.md

Lines changed: 1 addition & 1 deletion
@@ -525,7 +525,7 @@ license: |

   Note that, for <b>DecimalType(38,0)*</b>, the table above intentionally does not cover all other combinations of scales and precisions because currently we only infer decimal type like `BigInteger`/`BigInt`. For example, 1.1 is inferred as double type.

-  - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](sql-performance-tuning.html#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).
+  - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Join Strategy Hints for SQL Queries](sql-performance-tuning.html#join-strategy-hints-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).

   - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.concatBinaryAsString` to `true`.

docs/sql-performance-tuning.md

Lines changed: 19 additions & 7 deletions
@@ -129,26 +129,23 @@ a specific strategy may not support all join types.
 <div data-lang="scala" markdown="1">

 {% highlight scala %}
-import org.apache.spark.sql.functions.broadcast
-broadcast(spark.table("src")).join(spark.table("records"), "key").show()
+spark.table("src").join(spark.table("records").hint("broadcast"), "key").show()
 {% endhighlight %}

 </div>

 <div data-lang="java" markdown="1">

 {% highlight java %}
-import static org.apache.spark.sql.functions.broadcast;
-broadcast(spark.table("src")).join(spark.table("records"), "key").show();
+spark.table("src").join(spark.table("records").hint("broadcast"), "key").show();
 {% endhighlight %}

 </div>

 <div data-lang="python" markdown="1">

 {% highlight python %}
-from pyspark.sql.functions import broadcast
-broadcast(spark.table("src")).join(spark.table("records"), "key").show()
+spark.table("src").join(spark.table("records").hint("broadcast"), "key").show()
 {% endhighlight %}

 </div>
@@ -158,7 +155,7 @@ broadcast(spark.table("src")).join(spark.table("records"), "key").show()
 {% highlight r %}
 src <- sql("SELECT * FROM src")
 records <- sql("SELECT * FROM records")
-head(join(broadcast(src), records, src$key == records$key))
+head(join(src, hint(records, "broadcast"), src$key == records$key))
 {% endhighlight %}

 </div>
@@ -172,3 +169,18 @@ SELECT /*+ BROADCAST(r) */ * FROM records r JOIN src s ON r.key = s.key

 </div>
 </div>
+
+## Coalesce Hints for SQL Queries
+
+Coalesce hints allows the Spark SQL users to control the number of output files just like the
+`coalesce`, `repartition` and `repartitionByRange` in Dataset API, they can be used for performance
+tuning and reducing the number of output files. The "COALESCE" hint only has a partition number as a
+parameter. The "REPARTITION" hint has a partition number, columns, or both of them as parameters.
+The "REPARTITION_BY_RANGE" hint must have column names and a partition number is optional.
+
+    SELECT /*+ COALESCE(3) */ * FROM t
+    SELECT /*+ REPARTITION(3) */ * FROM t
+    SELECT /*+ REPARTITION(c) */ * FROM t
+    SELECT /*+ REPARTITION(3, c) */ * FROM t
+    SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
+    SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
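
The parameter rules that the new "Coalesce Hints for SQL Queries" section states in prose can be restated as a small sketch. The helpers `partition_hint` and `select_all` below are hypothetical names introduced only for illustration. They are not part of Spark or PySpark, they only build SQL strings, and the validation reflects the rules as worded in the added documentation rather than Spark's actual parser behavior.

```python
from typing import Optional, Sequence


def partition_hint(name: str,
                   num_partitions: Optional[int] = None,
                   columns: Sequence[str] = ()) -> str:
    """Build a /*+ ... */ hint comment (hypothetical helper, not a Spark API)."""
    name = name.upper()
    has_num = num_partitions is not None
    has_cols = len(columns) > 0

    if name == "COALESCE":
        # COALESCE only has a partition number as a parameter.
        if not has_num or has_cols:
            raise ValueError("COALESCE takes exactly one partition number")
    elif name == "REPARTITION":
        # REPARTITION has a partition number, columns, or both.
        if not (has_num or has_cols):
            raise ValueError("REPARTITION needs a partition number or columns")
    elif name == "REPARTITION_BY_RANGE":
        # REPARTITION_BY_RANGE must have columns; the number is optional.
        if not has_cols:
            raise ValueError("REPARTITION_BY_RANGE requires column names")
    else:
        raise ValueError(f"unknown hint: {name}")

    # The partition number, when present, comes before the column names.
    args = ([str(num_partitions)] if has_num else []) + list(columns)
    return f"/*+ {name}({', '.join(args)}) */"


def select_all(table: str, hint: str) -> str:
    """Place the hint directly after SELECT, as in the documented examples."""
    return f"SELECT {hint} * FROM {table}"


print(select_all("t", partition_hint("REPARTITION", 3, ["c"])))
```

A string built this way could then be passed to `spark.sql(...)` in a session where table `t` exists. That step is left out here so the sketch runs without a Spark installation.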
