Closed
Changes from 10 commits
Commits
75 commits
985d53c
[SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
caneGuy Jan 11, 2019
51a6ba0
[SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
srowen Jan 11, 2019
d9e4cf6
[SPARK-26482][CORE] Use ConfigEntry for hardcoded configs for ui cate…
HeartSaVioR Jan 11, 2019
50ebf3a
[SPARK-26551][SQL] Fix schema pruning error when selecting one comple…
viirya Jan 11, 2019
ae382c9
[SPARK-26586][SS] Fix race condition that causes streams to run with …
mukulmurthy Jan 11, 2019
19e17ac
[SPARK-25692][TEST] Increase timeout in fetchBothChunks test
dongjoon-hyun Jan 12, 2019
e00ebd5
[SPARK-26482][K8S][TEST][FOLLOWUP] Fix compile failure
dongjoon-hyun Jan 12, 2019
3587a9a
[SPARK-26607][SQL][TEST] Remove Spark 2.2.x testing from HiveExternal…
dongjoon-hyun Jan 12, 2019
5b37092
[SPARK-26538][SQL] Set default precision and scale for elements of po…
a-shkarupin Jan 12, 2019
3bd77aa
[SPARK-26564] Fix wrong assertions and error messages for parameter c…
sekikn Jan 12, 2019
4ff2b94
[SPARK-26503][CORE][DOC][FOLLOWUP] Get rid of spark.sql.legacy.timePa…
MaxGekk Jan 13, 2019
c01152d
[SPARK-23182][CORE] Allow enabling TCP keep alive on the RPC connections
peshopetrov Jan 13, 2019
09b0548
[SPARK-26450][SQL] Avoid rebuilding map of schema for every column in…
bersprockets Jan 13, 2019
985f966
[SPARK-26065][FOLLOW-UP][SQL] Revert hint behavior in join reordering
maryannxue Jan 13, 2019
3f80071
[SPARK-26576][SQL] Broadcast hint not applied to partitioned table
jzhuge Jan 13, 2019
115fecf
[SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFor…
MaxGekk Jan 14, 2019
27759b7
Refine comment
caneGuy Jan 14, 2019
9669569
Update
caneGuy Jan 14, 2019
ac2ec82
Update
caneGuy Jan 14, 2019
bafc7ac
[SPARK-26350][SS] Allow to override group id of the Kafka consumer
zsxwing Jan 14, 2019
abc937b
[MINOR][BUILD] Remove binary license/notice files in a source release…
maropu Jan 15, 2019
33b5039
[SPARK-25935][SQL] Allow null rows for bad records from JSON/CSV parsers
MaxGekk Jan 15, 2019
a77505d
[CORE][MINOR] Fix some typos about MemoryMode
SongYadong Jan 15, 2019
b45ff02
[SPARK-26203][SQL][TEST] Benchmark performance of In and InSet expres…
aokolnychyi Jan 15, 2019
5ca45e8
[SPARK-26592][SS] Throw exception when kafka delegation token tried t…
gaborgsomogyi Jan 15, 2019
7296999
[SPARK-26462][CORE] Use ConfigEntry for hardcoded configs for executi…
pralabhkumar Jan 15, 2019
8a54492
[SPARK-25857][CORE] Add developer documentation regarding delegation …
Jan 15, 2019
1b75f3b
[SPARK-17928][MESOS] No driver.memoryOverhead setting for mesos clust…
Jan 15, 2019
954ef96
[SPARK-25530][SQL] data source v2 API refactor (batch write)
cloud-fan Jan 15, 2019
2ebb79b
[SPARK-26350][FOLLOWUP] Add actual verification on new UT introduced …
HeartSaVioR Jan 15, 2019
cf133e6
[SPARK-26604][CORE] Clean up channel registration for StreamManager
viirya Jan 16, 2019
819e5ea
[SPARK-26615][CORE] Fixing transport server/client resource leaks in …
attilapiros Jan 16, 2019
e92088d
[MINOR][PYTHON] Fix SQLContext to SparkSession in Python API main page
HyukjinKwon Jan 16, 2019
670bc55
[SPARK-25992][PYTHON] Document SparkContext cannot be shared for mult…
HyukjinKwon Jan 16, 2019
06d5b17
[SPARK-26629][SS] Fixed error with multiple file stream in a query + …
tdas Jan 16, 2019
190814e
[SPARK-26550][SQL] New built-in datasource - noop
MaxGekk Jan 16, 2019
8f17078
[SPARK-26619][SQL] Prune the unused serializers from SerializeFromObject
viirya Jan 16, 2019
01301d0
[SPARK-26625] Add oauthToken to spark.redaction.regex
Jan 16, 2019
dc3b35c
[SPARK-26633][REPL] Add ExecutorClassLoader.getResourceAsStream
rednaxelafx Jan 16, 2019
272428d
[SPARK-26600] Update spark-submit usage message
LucaCanali Jan 17, 2019
38f0307
[SPARK-26466][CORE] Use ConfigEntry for hardcoded configs for submit …
HeartSaVioR Jan 17, 2019
4915cb3
[MINOR][BUILD] ensure call to translate_component has correct number …
Jan 17, 2019
06af625
Refine comment
caneGuy Jan 17, 2019
47fbe49
fix code style
caneGuy Jan 17, 2019
b08805e
Refine comment
caneGuy Jan 17, 2019
650b879
[SPARK-26457] Show hadoop configurations in HistoryServer environment…
Jan 17, 2019
d89aa38
Update
caneGuy Jan 17, 2019
c0632ce
[SPARK-23817][SQL] Create file source V2 framework and migrate ORC re…
gengliangwang Jan 17, 2019
6f8c0e5
[SPARK-26593][SQL] Use Proleptic Gregorian calendar in casting UTF8St…
MaxGekk Jan 17, 2019
1b575ef
[SPARK-26621][CORE] Use ConfigEntry for hardcoded configs for shuffle…
10110346 Jan 17, 2019
ede35c8
[SPARK-26622][SQL] Revise SQL Metrics labels
juliuszsompolski Jan 17, 2019
0b3abef
[SPARK-26638][PYSPARK][ML] Pyspark vector classes always return error…
srowen Jan 17, 2019
c2d0d70
[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lg…
srowen Jan 18, 2019
e341864
[SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output o…
rednaxelafx Jan 18, 2019
30d94ff
fix session error
caneGuy Jan 18, 2019
34db5f5
[SPARK-26618][SQL] Make typed Timestamp/Date literals consistent to c…
MaxGekk Jan 18, 2019
8503aa3
[SPARK-26646][TEST][PYSPARK] Fix flaky test: pyspark.mllib.tests.test…
viirya Jan 18, 2019
64cc9e5
[SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe …
kiszk Jan 19, 2019
ace2364
[MINOR][TEST] Correct some unit test mistakes
10110346 Jan 19, 2019
6d9c54b
[SPARK-26645][PYTHON] Support decimals with negative scale when parsi…
mgaido91 Jan 20, 2019
6c18d8d
[SPARK-26642][K8S] Add --num-executors option to spark-submit for Spa…
LucaCanali Jan 20, 2019
9a30e23
[SPARK-26351][MLLIB] Update doc and minor correction in the mllib eva…
shahidki31 Jan 21, 2019
421227a
Fix unit test failure
caneGuy Jan 21, 2019
00d144f
[SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
caneGuy Jan 11, 2019
0b6e954
Refine comment
caneGuy Jan 14, 2019
df5c075
Update
caneGuy Jan 14, 2019
dcaaebf
Update
caneGuy Jan 14, 2019
06b857c
Refine comment
caneGuy Jan 17, 2019
057c46e
fix code style
caneGuy Jan 17, 2019
b0c16d2
Refine comment
caneGuy Jan 17, 2019
121def8
Update
caneGuy Jan 17, 2019
869cd14
fix session error
caneGuy Jan 18, 2019
708b248
Fix unit test failure
caneGuy Jan 21, 2019
bbeffc1
Refine comment
caneGuy Jan 28, 2019
ad4f649
refine
caneGuy Jan 28, 2019
@@ -126,4 +126,13 @@ object StaticSQLConf {
       .intConf
       .createWithDefault(1000)

+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
Member

How about BROADCAST_EXCHANGE_MAX_THREAD_THRESHOLD?

Contributor Author

OK @maropu, thanks.

+    buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+      .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " +
Member

extra space -> table. If

Contributor Author

Sorry about the style, @HyukjinKwon. Actually, I found that some other code has the same problem. Can I open a PR to fix that?

Member

If it's near this code and it's the only one, yeah, let's do that. If there are multiple instances across these files, let's not include them.

+        "encounter memory issue when broadcast table we can decrease this number." +
Member

memory issue: can you elaborate which memory issue here?

Member

this number in order to ...

+        "Notice the number should be carefully chosen since decrease parallelism will " +
Member

decrease -> decreasing

Member

will -> might

+        "cause longer waiting for other broadcasting.And increase parallelism may " +
Member
HyukjinKwon, Jan 21, 2019


broadcasting.And -> broadcasting. Also, increasing

+        "cause memory problem.")
+      .intConf
Member

Please add a check: .checkValue(thres => thres > 0, ...

+      .createWithDefault(128)
 }
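
The request for `.checkValue(thres => thres > 0, ...` amounts to rejecting non-positive thread counts when the setting is read. A minimal, self-contained sketch of that idea follows. `IntSetting`, its `checkValue` and `read` methods, and `SettingDemo` are hypothetical stand-ins written for illustration; they are not Spark's `ConfigBuilder` API, and the error message text is an assumption.

```scala
// Hypothetical stand-in for a validated integer setting; not Spark's ConfigBuilder.
final case class IntSetting(
    key: String,
    default: Int,
    check: Int => Boolean = _ => true,
    errorMsg: String = "invalid value") {

  // Returns a copy that also enforces the given predicate.
  def checkValue(p: Int => Boolean, msg: String): IntSetting =
    copy(check = v => check(v) && p(v), errorMsg = msg)

  // Parses the raw string if present, falls back to the default, then validates.
  def read(raw: Option[String]): Int = {
    val value = raw.map(_.trim.toInt).getOrElse(default)
    require(check(value), s"$key: $errorMsg (got $value)")
    value
  }
}

object SettingDemo {
  // Key and default taken from the diff above; the message is illustrative.
  val maxThreads: IntSetting =
    IntSetting("spark.sql.broadcastExchange.maxThreadNumber", default = 128)
      .checkValue(_ > 0, "the thread number must be positive")
}
```

With a check like this, a value such as `0` fails fast with an `IllegalArgumentException` instead of reaching the thread-pool constructor.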
@@ -32,7 +32,7 @@ import org.apache.spark.sql.catalyst.plans.physical.{BroadcastMode, BroadcastPar
 import org.apache.spark.sql.execution.{SparkPlan, SQLExecution}
 import org.apache.spark.sql.execution.joins.HashedRelation
 import org.apache.spark.sql.execution.metric.SQLMetrics
-import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
 import org.apache.spark.util.{SparkFatalException, ThreadUtils}

 /**
@@ -157,5 +157,6 @@ case class BroadcastExchangeExec(

 object BroadcastExchangeExec {
   private[execution] val executionContext = ExecutionContext.fromExecutorService(
-    ThreadUtils.newDaemonCachedThreadPool("broadcast-exchange", 128))
+    ThreadUtils.newDaemonCachedThreadPool("broadcast-exchange",
+      SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER)))
 }
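
The change above sizes the `broadcast-exchange` pool from a setting instead of the literal `128`. The sketch below shows the general shape using only the JDK and the Scala standard library. `boundedDaemonPool` is a hypothetical helper that is only assumed to resemble what `ThreadUtils.newDaemonCachedThreadPool` does, and `configuredMax` stands in for the value read from `StaticSQLConf`.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadFactory, ThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object BoundedPoolSketch {
  // Hypothetical helper: at most `maxThreads` daemon threads; idle threads time out.
  def boundedDaemonPool(prefix: String, maxThreads: Int): ThreadPoolExecutor = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
        t.setDaemon(true)
        t
      }
    }
    // With an unbounded queue, the core size is what actually caps the thread count.
    val pool = new ThreadPoolExecutor(
      maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
      new LinkedBlockingQueue[Runnable](), factory)
    pool.allowCoreThreadTimeOut(true)
    pool
  }

  // Stand-in for the configured value (an assumption for this sketch).
  val configuredMax: Int = 4

  val pool: ThreadPoolExecutor = boundedDaemonPool("broadcast-exchange", configuredMax)

  val executionContext: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(pool)

  // Runs one trivial task on the pool and waits for its result.
  def runOne(): Int = Await.result(Future(21 * 2)(executionContext), 10.seconds)
}
```

Keeping a reference to the underlying `ThreadPoolExecutor` is a design choice made here so that its size can be inspected directly; the wrapped `ExecutionContext` does not expose it.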
@@ -17,15 +17,18 @@

 package org.apache.spark.sql.execution

+import scala.concurrent.{Future, TimeoutException}
+import scala.concurrent.duration._
 import scala.util.Random

-import org.apache.spark.sql.{Dataset, Row}
+import org.apache.spark.sql.{Dataset, Row, SparkSession}
 import org.apache.spark.sql.catalyst.expressions.{Alias, Literal}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, IdentityBroadcastMode, SinglePartition}
 import org.apache.spark.sql.execution.exchange.{BroadcastExchangeExec, ReusedExchangeExec, ShuffleExchangeExec}
 import org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode
-import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
 import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.util.ThreadUtils

 class ExchangeSuite extends SparkPlanTest with SharedSQLContext {
   import testImplicits._
@@ -132,4 +135,33 @@ class ExchangeSuite extends SparkPlanTest with SharedSQLContext {
     val projection2 = cached.select("_1", "_3").queryExecution.executedPlan
     assert(!projection1.sameResult(projection2))
   }
+
+  test("SPARK-26601: Make broadcast-exchange thread pool configurable") {
+    val previousNumber = SparkSession.getActiveSession.get.sparkContext.conf
+      .get(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER)
+
+    SparkSession.getActiveSession.get.sparkContext.conf.
+      set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER, 1)
+    assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) === 1)
+
+    Future {
+      Thread.sleep(5*1000)
+    } (BroadcastExchangeExec.executionContext)
+
+    val f = Future {} (BroadcastExchangeExec.executionContext)
Member

You don't have to test Java's thread executors. Can you just check if BroadcastExchangeExec.executionContext.getMaximumPoolSize is as configured?

+    intercept[TimeoutException] {
+      ThreadUtils.awaitResult(f, 3 seconds)
+    }
+
+    var executed = false
+    val ef = Future {
+      executed = true
+    } (BroadcastExchangeExec.executionContext)
+    ThreadUtils.awaitResult(ef, 3 seconds)
+    assert(executed)
+
+    // for other test
+    SparkSession.getActiveSession.get.sparkContext.conf.
+      set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER, previousNumber)
+  }
 }
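
The review comment on this test suggests asserting on the pool's configured size rather than timing futures. A related point any such test has to contend with: a `val` inside a Scala `object` is evaluated once, on first access, so changing a setting afterwards does not resize a pool that already exists. The sketch below illustrates both points with hypothetical stand-ins (`FakeConf`, `PoolHolder`, `InitOnceDemo`); it does not use Spark's classes, and whether this applies to the test above depends on when `BroadcastExchangeExec` is first touched in the JVM.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Hypothetical mutable setting, standing in for a configuration lookup.
object FakeConf {
  @volatile var maxThreads: Int = 2
}

// The pool is created once, when PoolHolder is first accessed.
object PoolHolder {
  val pool: ThreadPoolExecutor = new ThreadPoolExecutor(
    FakeConf.maxThreads, FakeConf.maxThreads, 0L, TimeUnit.MILLISECONDS,
    new LinkedBlockingQueue[Runnable]())
}

object InitOnceDemo {
  // Returns the pool's maximum size before and after the setting is changed.
  def sizes(): (Int, Int) = {
    val before = PoolHolder.pool.getMaximumPoolSize // forces initialization
    FakeConf.maxThreads = 8                         // too late to affect the existing pool
    val after = PoolHolder.pool.getMaximumPoolSize
    (before, after)
  }
}
```

A size assertion of this kind is deterministic and needs no sleeps or timeouts, which is the main appeal of the reviewer's suggestion.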