Closed
Commits
1175 commits
d71be73
[SPARK-32428][EXAMPLES] Make BinaryClassificationMetricsExample cons…
titsuki Jul 26, 2020
4b8761e
[SPARK-32448][K8S][TESTS] Use single version for exec-maven-plugin/sc…
dongjoon-hyun Jul 27, 2020
6ed93c3
[SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
GuoPhilipse Jul 28, 2020
b35b3eb
[MINOR][PYTHON] Fix spacing in error message
hauntsaninja Jul 28, 2020
f349a78
[SPARK-32424][SQL][3.0] Fix silent data change for timestamp parsing …
yaooqinn Jul 28, 2020
8cfb718
[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
xwu-intel Jul 28, 2020
9f18d54
[SPARK-32283][CORE] Kryo should support multiple user registrators
LantaoJin Jul 29, 2020
e5b5b7e
[SPARK-32175][CORE] Fix the order between initialization for Executor…
sarutak Jul 29, 2020
d00e104
[SPARK-32397][BUILD] Allow specifying of time for build to keep time …
holdenk Jul 29, 2020
235552a
[SPARK-32478][R][SQL] Error message to show the schema mismatch in ga…
HyukjinKwon Jul 30, 2020
b40df01
[SPARK-32227] Fix regression bug in load-spark-env.cmd with Spark 3.0.0
Jul 30, 2020
2a38090
[SPARK-32175][SPARK-32175][FOLLOWUP] Remove flaky test added in
sarutak Jul 31, 2020
7c91b15
[SPARK-32332][SQL][3.0] Support columnar exchanges
andygrove Jul 31, 2020
ea4b288
[SPARK-32467][UI] Avoid encoding URL twice on https redirect
gengliangwang Aug 1, 2020
d8d3e87
[SPARK-32509][SQL] Ignore unused DPP True Filter in Canonicalization
prakharjain09 Aug 3, 2020
64b3b56
[SPARK-32083][SQL][3.0] AQE should not coalesce partitions for Single…
cloud-fan Aug 3, 2020
c148a98
[SPARK-32160][CORE][PYSPARK][3.0] Add a config to switch allow/disall…
ueshin Aug 3, 2020
6d7ae4a
[SPARK-32160][CORE][PYSPARK][3.0][FOLLOWUP] Change the config name to…
ueshin Aug 4, 2020
fd445cb
[SPARK-32003][CORE][3.0] When external shuffle service is used, unreg…
wypoon Aug 4, 2020
ab5034f
[SPARK-32529][CORE] Fix Historyserver log scan aborted by application…
yanxiaole Aug 5, 2020
d3eea05
[SPARK-32546][SQL][3.0] Get table names directly from Hive tables
MaxGekk Aug 6, 2020
30c3a50
[SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests
huaxingao Aug 6, 2020
c7af0be
[SPARK-32538][CORE][TEST] Use local time zone for the timestamp logge…
sarutak Aug 7, 2020
4d8642f
[SPARK-32560][SQL] Improve exception message at InsertIntoHiveTable.p…
GuoPhilipse Aug 7, 2020
17ce605
[SPARK-32556][INFRA] Fix release script to have urlencoded passwords …
ScrapCodes Aug 7, 2020
cfe62fc
[SPARK-32564][SQL][TEST][3.0] Inject data statistics to simulate plan…
maropu Aug 8, 2020
7caecae
[MINOR][DOCS] Fix typos at ExecutorAllocationManager.scala
kimminw00 Aug 8, 2020
1f1fc8b
[SPARK-32564][SQL][TEST][FOLLOWUP] Re-enable TPCDSQuerySuite with emp…
maropu Aug 8, 2020
9391705
[SPARK-32559][SQL][3.0] Fix the trim logic in UTF8String.toInt/toLong…
WangGuangxin Aug 9, 2020
0f4989c
[SPARK-32393][SQL][TEST] Add tests for all the character types in Pos…
maropu Aug 10, 2020
e4c6ebf
[SPARK-32576][SQL] Support PostgreSQL `bpchar` type and array of char…
kujon Aug 10, 2020
8ff615d
[SPARK-32456][SS] Check the Distinct by assuming it as Aggregate for …
xuanyuanking Aug 10, 2020
eaae91b
[MINOR] add test_createDataFrame_empty_partition in pyspark arrow tests
WeichenXu123 Aug 10, 2020
843ff03
[SPARK-32576][SQL][TEST][FOLLOWUP] Add tests for all the character ar…
maropu Aug 10, 2020
6749ad8
[SPARK-32409][DOC] Document dependency between spark.metrics.staticSo…
LucaCanali Aug 10, 2020
93eb567
[SPARK-32528][SQL][TEST][3.0] The analyze method should make sure the…
cloud-fan Aug 10, 2020
bfe9489
[SPARK-32543][R] Remove arrow::as_tibble usage in SparkR
HyukjinKwon Aug 5, 2020
292bfc3
[SPARK-32586][SQL] Fix NumberFormatException error message when ansi …
wangyum Aug 12, 2020
e7d45f8
[SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
MaxGekk Aug 12, 2020
9a3811d
[SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on b…
tinhto-000 Aug 12, 2020
ecc2997
[SPARK-32599][SQL][TESTS] Check the TEXTFILE file format in `HiveSerD…
MaxGekk Aug 12, 2020
742c4df
[SPARK-32250][SPARK-27510][CORE][TEST] Fix flaky MasterSuite.test(...…
Ngone51 Aug 12, 2020
21c2fa4
[MINOR] Update URL of the parquet project in code comment
izchen Aug 13, 2020
89765f5
[SPARK-32018][SQL][FOLLOWUP][3.0] Throw exception on decimal value ov…
gengliangwang Aug 13, 2020
81d7747
[MINOR][SQL] Fixed approx_count_distinct rsd param description
Aug 14, 2020
05144a5
Preparing Spark release v3.0.1-rc1
zhengruifeng Aug 15, 2020
38ab936
Preparing development version 3.0.2-SNAPSHOT
zhengruifeng Aug 15, 2020
6a88924
[SPARK-32625][SQL] Log error message when falling back to interpreter…
wangyum Aug 15, 2020
c4807ce
[SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitori…
sarutak Aug 16, 2020
ee12374
[3.0][SQL] Revert SPARK-32018
gengliangwang Aug 17, 2020
6cdc32f
[SPARK-32622][SQL][TEST] Add case-sensitivity test for ORC predicate …
viirya Aug 17, 2020
a36514e
[3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurre…
Ngone51 Aug 18, 2020
753d414
[SPARK-32652][SQL] ObjectSerializerPruning fails for RowEncoder
cloud-fan Aug 19, 2020
b3a971a
[SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
HyukjinKwon Aug 18, 2020
6dc7457
[SPARK-32624][SQL] Use getCanonicalName to fix byte[] compile issue
wangyum Aug 19, 2020
56ec5dd
[SPARK-32249][INFRA][3.0] Run Github Actions builds in branch-3.0
HyukjinKwon Aug 19, 2020
2b41bc9
[SPARK-32451][R][3.0] Support Apache Arrow 1.0.0
HyukjinKwon Aug 19, 2020
c4a12f2
[SPARK-28863][SQL] Introduce AlreadyPlanned to prevent reanalysis of …
brkyvz Aug 19, 2020
b87ec5d
[SPARK-32658][CORE] Fix `PartitionWriterStream` partition length over…
jiangxb1987 Aug 20, 2020
87d7ab6
[SPARK-32608][SQL][3.0] Script Transform ROW FORMAT DELIMIT value sho…
AngersZhuuuu Aug 20, 2020
29a10a4
[SPARK-28863][SQL][FOLLOWUP] Do not reuse the physical plan
cloud-fan Aug 20, 2020
8755e3f
[SPARK-32621][SQL][3.0] path' option can cause issues while inferring…
imback82 Aug 20, 2020
2932926
[SPARK-32660][SQL][DOC] Show Avro related API in documentation
gengliangwang Aug 21, 2020
f73e6ca
[SPARK-32663][CORE] Avoid individual closing of pooled TransportClien…
attilapiros Aug 21, 2020
a5f4230
[SPARK-32674][DOC] Add suggestion for parallel directory listing in t…
sunchao Aug 21, 2020
9ccc790
[MINOR][DOCS] backport PR#29443 to fix typo in doc,log messages and c…
brandonJY Aug 22, 2020
87f1d51
[SPARK-32672][SQL] Fix data corruption in boolean bit set compression
revans2 Aug 22, 2020
a6df16b
[SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some o…
xuanyuanking Aug 22, 2020
85c9e8c
[SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossVa…
Aug 22, 2020
f5d5422
[SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Scri…
AngersZhuuuu Aug 23, 2020
f088c28
[SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `…
MaxGekk Aug 23, 2020
898211b
[SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataS…
mingjialiu Aug 24, 2020
da60de5
[SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued …
huaxingao Aug 24, 2020
8aa644e
[SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
Aug 24, 2020
4a67f1e
[SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests
mundaym Aug 24, 2020
007acba
[SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
huaxingao Aug 24, 2020
82aef3e
[MINOR][SQL] Add missing documentation for LongType mapping
yeshengm Aug 25, 2020
6c88d7c
[SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate p…
viirya Aug 25, 2020
21ac7e2
[SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for da…
yaooqinn Aug 25, 2020
68ff809
[SPARK-32614][SQL] Don't apply comment processing if 'comment' unset …
srowen Aug 25, 2020
9e937c5
[SPARK-32620][SQL] Reset the numPartitions metric when DPP is enabled
wangyum Aug 26, 2020
24552b6
[SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in Gi…
HyukjinKwon Aug 26, 2020
9e8fb48
[SPARK-32659][SQL] Fix the data issue when pruning DPP on non-atomic …
wangyum Aug 26, 2020
60f4856
[SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.ver…
waleedfateem Aug 27, 2020
2b147c4
Preparing Spark release v3.0.1-rc3
zhengruifeng Aug 28, 2020
ed7be8c
Preparing development version 3.0.2-SNAPSHOT
zhengruifeng Aug 28, 2020
87327a2
[SPARK-28612][SQL][FOLLOWUP] Correct method doc of DataFrameWriterV2.…
HeartSaVioR Aug 28, 2020
0dff99b
[SPARK-32693][SQL][3.0] Compare two dataframes with same schema excep…
viirya Aug 29, 2020
70c477e
[SPARK-31511][SQL] Make BytesToBytesMap iterators thread-safe
hvanhovell Apr 22, 2020
985c593
[SPARK-32747][R][TESTS] Deduplicate configuration set/unset in test_s…
HyukjinKwon Aug 31, 2020
3a516ae
[MINOR][R] Fix a R style in try and finally at DataFrame.R
lu-wang-dl Sep 1, 2020
98a1247
[SPARK-32659][SQL][FOLLOWUP] Improve test for pruning DPP on non-atom…
wangyum Sep 1, 2020
d1c6a00
[SPARK-32624][SQL][FOLLOWUP] Fix regression in CodegenContext.addRefe…
rednaxelafx Sep 1, 2020
88bc08e
[SPARK-32774][BUILD] Don't track docs/.jekyll-cache
sarutak Sep 2, 2020
83ade58
[SPARK-32771][DOCS] The example of expressions.Aggregator in Javadoc …
sarutak Sep 2, 2020
8506728
[SPARK-32776][SS] Limit in streaming should not be optimized away by …
liwensun Sep 2, 2020
5288fb4
[SPARK-32788][SQL] non-partitioned table scan should not have partiti…
cloud-fan Sep 3, 2020
0b5266b
[SPARK-32767][SQL][3.0] Bucket join should work if spark.sql.shuffle.…
wangyum Sep 4, 2020
7320db2
[SPARK-32786][SQL][TEST] Improve performance for some slow DPP tests
wzhfy Sep 4, 2020
0c7616b
[SPARK-32791][SQL] Non-partitioned table metric should not have dynam…
wangyum Sep 5, 2020
c2c7c9e
[SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in …
sandeep-katta Sep 7, 2020
a22e1c5
[SPARK-32810][SQL][3.0] CSV/JSON data sources should avoid globbing p…
MaxGekk Sep 8, 2020
6b47abd
[SPARK-32812][PYTHON][TESTS] Avoid initiating a process during the ma…
itholic Sep 8, 2020
f42b56c
[SPARK-32764][SQL] -0.0 should be equal to 0.0
cloud-fan Sep 8, 2020
3b32ddf
[SPARK-32785][SQL][3.0] Interval with dangling parts should not resul…
yaooqinn Sep 8, 2020
4656ee5
[SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap iterators th…
cxzl25 Sep 8, 2020
9b39e4b
[SPARK-32753][SQL][3.0] Only copy tags to node with no tags
manuzhang Sep 8, 2020
8c0b9cb
[SPARK-32815][ML][3.0] Fix LibSVM data source loading error on file p…
MaxGekk Sep 8, 2020
3f20f14
[SPARK-32638][SQL][3.0] Corrects references when adding aliases in Wi…
cloud-fan Sep 8, 2020
e86d90b
[SPARK-32824][CORE] Improve the error message when the user forgets t…
tgravescs Sep 9, 2020
86b9dd9
[SPARK-32823][WEB UI] Fix the master ui resources reporting
tgravescs Sep 9, 2020
4c0f9d8
[SPARK-32813][SQL] Get default config of ParquetSource vectorized rea…
viirya Sep 9, 2020
837843b
[SPARK-32810][SQL][TESTS][FOLLOWUP][3.0] Check path globbing in JSON/…
MaxGekk Sep 9, 2020
e632e7c
[SPARK-32794][SS] Fixed rare corner case error in micro-batch engine …
tdas Sep 9, 2020
5a81f60
[SPARK-32836][SS][TESTS] Fix DataStreamReaderWriterSuite to check wri…
dongjoon-hyun Sep 10, 2020
44acb5a
[SPARK-32832][SS] Use CaseInsensitiveMap for DataStreamReader/Writer …
dongjoon-hyun Sep 10, 2020
5708045
[SPARK-32819][SQL][3.0] ignoreNullability parameter should be effecti…
viirya Sep 10, 2020
4fdd818
[SPARK-32840][SQL][3.0] Invalid interval value can happen to be just …
yaooqinn Sep 11, 2020
cf14897
[SPARK-32677][SQL][DOCS][MINOR] Improve code comment in CreateFunctio…
cloud-fan Sep 11, 2020
ec45d10
[SPARK-32845][SS][TESTS] Add sinkParameter to check sink options robu…
dongjoon-hyun Sep 11, 2020
2e04689
[SPARK-32779][SQL][FOLLOW-UP] Delete Unused code
sandeep-katta Sep 12, 2020
d4d2f5c
[SPARK-32865][DOC] python section in quickstart page doesn't display …
bowenli86 Sep 13, 2020
828603d
[SPARK-32876][SQL] Change default fallback versions to 3.0.1 and 2.4.…
HyukjinKwon Sep 14, 2020
990d49a
[SPARK-32872][CORE] Prevent BytesToBytesMap at MAX_CAPACITY from exce…
ankurdave Sep 14, 2020
fe6ff15
[SPARK-32715][CORE] Fix memory leak when failed to store pieces of br…
LantaoJin Sep 15, 2020
cb6a0d0
[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for f…
tanelk Sep 16, 2020
75a225e
[SPARK-32888][DOCS] Add user document about header flag and RDD as pa…
viirya Sep 16, 2020
aa9563e
[SPARK-32897][PYTHON] Don't show a deprecation warning at SparkSessio…
HyukjinKwon Sep 16, 2020
2e94d9a
[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there ar…
tomvanbussel Sep 17, 2020
b3b6f38
[SPARK-32887][DOC] Correct the typo for SHOW TABLE
Udbhav30 Sep 17, 2020
17a5195
[SPARK-32738][CORE][3.0] Should reduce the number of active threads i…
wzhfy Sep 17, 2020
ecc2f5d
[SPARK-32635][SQL] Fix foldable propagation
peter-toth Sep 17, 2020
5581a92
[SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`
MaxGekk Sep 18, 2020
2d55de5
[SPARK-32906][SQL] Struct field names should not change after normali…
maropu Sep 18, 2020
ffcd757
[SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDe…
yaooqinn Sep 18, 2020
20cd7bb
[SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun Sep 18, 2020
7746c20
[SPARK-32635][SQL][FOLLOW-UP] Add a new test case in catalyst module
peter-toth Sep 18, 2020
03fb144
[SPARK-32898][CORE] Fix wrong executorRunTime when task killed before…
Ngone51 Sep 18, 2020
0a4b668
[SPARK-32886][WEBUI] fix 'undefined' link in event timeline view
zhli1142015 Sep 21, 2020
b27bbbb
[SPARK-32718][SQL][3.0] Remove unnecessary keywords for interval units
cloud-fan Sep 21, 2020
8a481d8
[SPARK-32659][SQL][FOLLOWUP][3.0] Broadcast Array instead of Set in I…
cloud-fan Sep 22, 2020
58124bd
[MINOR][SQL][3.0] Improve examples for `percentile_approx()`
MaxGekk Sep 23, 2020
542dc97
[SPARK-32306][SQL][DOCS][3.0] Clarify the result of `percentile_appro…
MaxGekk Sep 23, 2020
21b6b69
[SPARK-32977][SQL][DOCS] Fix JavaDoc on Default Save Mode
RussellSpitzer Sep 24, 2020
4b84e57
[SPARK-32877][SQL][TEST] Add test for Hive UDF complex decimal type
ulysses-you Sep 25, 2020
4425c3a
[SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed…
rednaxelafx Sep 26, 2020
424f16e
[SPARK-33015][SQL] Compute the current date only once
MaxGekk Sep 29, 2020
118de10
[MINOR][DOCS] Document when `current_date` and `current_timestamp` ar…
MaxGekk Sep 29, 2020
97d8634
[SPARK-33021][PYTHON][TESTS] Move functions related test cases into t…
HyukjinKwon Sep 29, 2020
2160dc5
[SPARK-33015][SQL][FOLLOWUP][3.0] Use millisToDays() in the ComputeCu…
MaxGekk Sep 29, 2020
d3cc564
[SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExter…
tomvanbussel Sep 29, 2020
39bfae2
[MINOR][DOCS] Fixing log message for better clarity
akshatb1 Sep 29, 2020
ae8b35a
[SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes
wangyum Sep 29, 2020
f3b80f8
[SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.al…
dongjoon-hyun Sep 29, 2020
db6ba04
[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
GuoPhilipse Sep 30, 2020
bc29602
[SQL][DOC][MINOR] Corrects input table names in the examples of CREAT…
iRakson Oct 1, 2020
41e1919
[SPARK-32996][WEB-UI][3.0] Handle empty ExecutorMetrics in ExecutorMe…
shrutig Oct 2, 2020
31684d6
[SPARK-33051][INFRA][R] Uses setup-r to install R in GitHub Actions b…
HyukjinKwon Oct 2, 2020
c9b6271
[SPARK-33043][ML] Handle spark.driver.maxResultSize=0 in RowMatrix he…
srowen Oct 3, 2020
75003fc
[SPARK-33065][TESTS] Expand the stack size of a thread in a test in L…
sarutak Oct 4, 2020
46a62ca
[SPARK-33069][INFRA] Skip test result report if no JUnit XML files ar…
HyukjinKwon Oct 6, 2020
4f71231
[SPARK-33073][PYTHON] Improve error handling on Pandas to Arrow conve…
BryanCutler Oct 6, 2020
d51b8d6
[SPARK-27428][CORE][TEST] Increase receive buffer size used in Statsd…
mundaym Oct 6, 2020
2076abc
Revert "[SPARK-33073][PYTHON] Improve error handling on Pandas to Arr…
HyukjinKwon Oct 7, 2020
23207fc
[SPARK-33035][SQL][3.0] Updates the obsoleted entries of attribute ma…
maropu Oct 7, 2020
7981f67
[SPARK-33073][PYTHON][3.0] Improve error handling on Pandas to Arrow …
BryanCutler Oct 7, 2020
45475af
[SPARK-32067][K8S] Use unique ConfigMap name for executor pod template
stijndehaes Oct 7, 2020
a7e4318
[SPARK-33089][SQL] make avro format propagate Hadoop config from DS o…
yuningzh-db Oct 8, 2020
782ab8e
[SPARK-33091][SQL] Avoid using map instead of foreach to avoid potent…
HyukjinKwon Oct 8, 2020
c1b660e
[SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedE…
dongjoon-hyun Oct 8, 2020
dcffa56
[SPARK-33101][ML][3.0] Make LibSVM format propagate Hadoop config fro…
MaxGekk Oct 9, 2020
9892b3e
[SPARK-33094][SQL][3.0] Make ORC format propagate Hadoop config from …
MaxGekk Oct 9, 2020
0601fc7
[SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location
pablolanga-stratio Oct 12, 2020
9430ae6
[SPARK-33115][BUILD][DOCS] Fix javadoc errors in `kvstore` and `unsaf…
gemelen Oct 13, 2020
205b65e
[SPARK-33134][SQL][3.0] Return partial results only for root JSON obj…
MaxGekk Oct 14, 2020
2ebea13
[SPARK-33136][SQL] Fix mistakenly swapped parameter in V2WriteCommand…
HeartSaVioR Oct 14, 2020
d9669bd
[SPARK-33146][CORE] Check for non-fatal errors when loading new appli…
Oct 15, 2020
0b7b811
[SPARK-33153][SQL][TESTS] Ignore Spark 2.4 in HiveExternalCatalogVers…
dongjoon-hyun Oct 15, 2020
e40c147
Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading n…
dongjoon-hyun Oct 15, 2020
d0f1120
[SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.le…
MaxGekk Oct 16, 2020
160f458
[SPARK-33165][SQL][TEST] Remove dependencies(scalatest,scalactic) fro…
maropu Oct 16, 2020
37d6b3c
[SPARK-32761][SQL][3.0] Allow aggregating multiple foldable distinct …
linhongliu-db Oct 16, 2020
698ac6a
[SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead
HyukjinKwon Oct 16, 2020
b66bd79
[SPARK-33171][INFRA] Mark ParquetV*FilterSuite/ParquetV*SchemaPruning…
dongjoon-hyun Oct 16, 2020
1bec8a3
[SPARK-32436][CORE] Initialize numNonEmptyBlocks in HighlyCompressedM…
dongjoon-hyun Jul 25, 2020
fab10f0
[SPARK-33131][SQL][3.0] Fix grouping sets with having clause can not …
ulysses-you Oct 17, 2020
56a60ca
[SPARK-33170][SQL] Add SQL config to control fast-fail behavior in Fi…
viirya Oct 18, 2020
7e65b12
[MINOR][DOCS][EXAMPLE] Fix the Python manual_load_options_csv example
kjmrknsn Oct 18, 2020
05fbbb1
[SPARK-33176][K8S] Use 11-jre-slim as default in K8s Dockerfile
dongjoon-hyun Oct 18, 2020
0bff1f6
[SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build
williamhyun Oct 19, 2020
15ed312
[SPARK-32557][CORE] Logging and swallowing the exception per entry in…
yanxiaole Aug 9, 2020
02f80cf
Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when l…
HeartSaVioR Oct 15, 2020
b1d5a08
Revert "[SPARK-33069][INFRA] Skip test result report if no JUnit XML …
HyukjinKwon Oct 19, 2020
c3af7c6
[SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQ…
liaoaoyuan97 Oct 20, 2020
3b5b533
[SPARK-33190][INFRA][TESTS] Set upper bound of PyArrow version in Git…
HyukjinKwon Oct 20, 2020
4373c71
[MINOR][DOCS] Fix the description about to_avro and from_avro functions
kjmrknsn Oct 20, 2020
5e33155
[SPARK-33189][PYTHON][TESTS] Add env var to tests for legacy nested t…
BryanCutler Oct 21, 2020
a36b3c4
[SPARK-32785][SQL][DOCS][FOLLOWUP][3.0] Update migration guide for in…
yaooqinn Oct 21, 2020
e31fe6c
[SPARK-33189][FOLLOWUP][3.0] Fix syntax error in python/run-tests.py
dongjoon-hyun Oct 22, 2020
933dc6c
[SPARK-32247][INFRA] Install and test scipy with PyPy in GitHub Actions
HyukjinKwon Oct 15, 2020
f7c7f4f
[SPARK-30821][K8S] Handle executor failure with multiple containers
huskysun Oct 24, 2020
80716d1
[SPARK-33228][SQL] Don't uncache data when replacing a view having th…
maropu Oct 25, 2020
590ccb3
[SPARK-33197][SQL] Make changes to spark.sql.analyzer.maxIterations t…
yuningzh-db Oct 26, 2020
22392be
[SPARK-33230][SQL] Hadoop committers to get unique job ID in "spark.s…
steveloughran Oct 26, 2020
c95d925
[SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder…
ankurdave Oct 27, 2020
e37859a
[SPARK-33246][SQL][DOCS] Correct documentation for null semantics of …
Oct 27, 2020
737a850
[SPARK-32090][SQL] Improve UserDefinedType.equal() to make it be symm…
Ngone51 Jun 29, 2020
ba2a113
[SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL …
maropu Oct 28, 2020
f6c72e6
[SPARK-33208][SQL] Update the document of SparkSession#sql
waitinfuture Oct 28, 2020
3ce335d
[SPARK-33267][SQL] Fix NPE issue on 'In' filter when one of values co…
HeartSaVioR Oct 28, 2020
f5dc06e
[SPARK-32119][CORE][3.0] ExecutorPlugin doesn't work with Standalone …
sarutak Oct 28, 2020
f03bca8
[SQL][MINOR] Update from_unixtime doc
Obbay2 Oct 29, 2020
563a678
[SPARK-33292][SQL] Make Literal ArrayBasedMapData string representati…
dongjoon-hyun Oct 30, 2020
8f57603
[SPARK-33268][SQL][PYTHON][3.0] Fix bugs for casting data from/to Pyt…
maropu Oct 30, 2020
83f259f
[SPARK-33183][SQL][3.0] Fix Optimizer rule EliminateSorts and add a p…
allisonwang-db Oct 30, 2020
fc10531
[SPARK-33290][SQL] REFRESH TABLE should invalidate cache even though …
sunchao Oct 31, 2020
49e9575
[SPARK-33306][SQL] Timezone is needed when cast date to string
WangGuangxin Oct 31, 2020
92ba08d
[SPARK-33277][PYSPARK][SQL][3.0] Use ContextAwareIterator to stop con…
ueshin Nov 2, 2020
131179a
[SPARK-33313][TESTS][R][3.0][2.4] Add testthat 3.x support
HyukjinKwon Nov 2, 2020
71ef48e
[SPARK-33156][INFRA][3.0] Upgrade GithubAction image from 18.04 to 20.04
dongjoon-hyun Nov 3, 2020
d99ff20
[SPARK-24266][K8S][3.0] Restart the watcher when we receive a version…
stijndehaes Nov 3, 2020
55105a0
[SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to s…
echohlne Nov 3, 2020
5dd36f3
[SPARK-33333][BUILD][3.0] Upgrade Jetty to 9.4.28.v20200408
dongjoon-hyun Nov 4, 2020
e7a6211
[SPARK-33338][SQL] GROUP BY using literal map should not fail
dongjoon-hyun Nov 4, 2020
b43572e
[SPARK-33162][INFRA][3.0] Use pre-built image at GitHub Action PySpar…
dongjoon-hyun Nov 5, 2020
14eb8b1
[SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR…
dongjoon-hyun Nov 5, 2020
74d8eac
Revert "[SPARK-33277][PYSPARK][SQL][3.0] Use ContextAwareIterator to …
HyukjinKwon Nov 5, 2020
c43231c
[MINOR][SS][DOCS] Update join type in stream static joins code examples
sarveshdave1 Nov 5, 2020
6da60bf
[SPARK-33362][SQL] skipSchemaResolution should still require query to…
cloud-fan Nov 5, 2020
3223e3e
[SPARK-32860][DOCS][SQL] Updating documentation about map support in …
Nov 8, 2020
808dd8f
[SPARK-33371][PYTHON][3.0] Update setup.py and tests for Python 3.9
HyukjinKwon Nov 9, 2020
c157fa3
[SPARK-33372][SQL] Fix InSet bucket pruning
wangyum Nov 9, 2020
a418495
[SPARK-33397][YARN][DOC] Fix generating md to html for available-patt…
yaooqinn Nov 10, 2020
1aa8f4f
[SPARK-33405][BUILD][3.0] Upgrade commons-compress to 1.20
dongjoon-hyun Nov 10, 2020
b905d65
[SPARK-33391][SQL] element_at with CreateArray not respect one based …
leanken-zz Nov 10, 2020
4a1c143
[SPARK-33339][PYTHON] Pyspark application will hang due to non Except…
Nov 10, 2020
577dbb9
[SPARK-33417][SQL][TEST] Correct the behaviour of query filters in TP…
maropu Nov 11, 2020
1e2984b
[SPARK-33412][SQL][3.0] OverwriteByExpression should resolve its dele…
cloud-fan Nov 11, 2020
3edec10
[SPARK-33402][CORE] Jobs launched in same second have duplicate MapRe…
steveloughran Nov 11, 2020
00be83a
[SPARK-33404][SQL][3.0] Fix incorrect results in `date_trunc` expression
utkarsh39 Nov 11, 2020
2eadedc
[SPARK-33408][K8S][R][3.0] Use R 3.6.3 in K8s R image
dongjoon-hyun Nov 12, 2020
5ee76e6
[MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only
yaooqinn Nov 12, 2020
e684720
[SPARK-33435][SQL][3.0] DSv2: REFRESH TABLE should invalidate caches …
sunchao Nov 13, 2020
921daa8
[SPARK-33439][INFRA] Use SERIAL_SBT_TESTS=1 for SQL modules
dongjoon-hyun Nov 13, 2020
45bdb58
[SPARK-33358][SQL] Return code when command process failed
artiship Nov 16, 2020
265363d
[SPARK-33451][DOCS] Change to 'spark.sql.adaptive.skewJoin.skewedPart…
Southwest16 Nov 16, 2020
26c0404
[MINOR][GRAPHX][3.0] Correct typos in the sub-modules: graphx, extern…
jsoref Nov 17, 2020
c301d9c
[SPARK-33464][INFRA][3.0] Add/remove (un)necessary cache and restruct…
HyukjinKwon Nov 19, 2020
1101938
[SPARK-27421][SQL] Fix filter for int column and value class java.lan…
wangyum Nov 19, 2020
8ac37df
[SPARK-33483][INFRA][TESTS][3.0] Fix rat exclusion patterns and add a…
dongjoon-hyun Nov 19, 2020
378 changes: 378 additions & 0 deletions .github/workflows/build_and_test.yml
@@ -0,0 +1,378 @@
name: Build and test

on:
  push:
    branches:
    - branch-3.0
  pull_request:
    branches:
    - branch-3.0

jobs:
  # Build: build Spark and run the tests for specified modules.
  build:
    name: "Build modules: ${{ matrix.modules }} ${{ matrix.comment }} (JDK ${{ matrix.java }}, ${{ matrix.hadoop }}, ${{ matrix.hive }})"
    # Ubuntu 20.04 is the latest LTS. The next LTS is 22.04.
    runs-on: ubuntu-20.04
    strategy:
      fail-fast: false
      matrix:
        java:
          - 8
        hadoop:
          - hadoop2.7
        hive:
          - hive2.3
        # TODO(SPARK-32246): We don't test 'streaming-kinesis-asl' for now.
        # Kinesis tests depend on the external Amazon Kinesis service.
        # Note that the modules below are from sparktestsupport/modules.py.
        modules:
          - >-
            core, unsafe, kvstore, avro,
            network-common, network-shuffle, repl, launcher,
            examples, sketch, graphx
          - >-
            catalyst, hive-thriftserver
          - >-
            streaming, sql-kafka-0-10, streaming-kafka-0-10,
            mllib-local, mllib,
            yarn, mesos, kubernetes, hadoop-cloud, spark-ganglia-lgpl
        # Here, we split the Hive and SQL tests into the slow ones and the rest.
        included-tags: [""]
        # Some tests are disabled in GitHub Actions. Ideally, we should remove this tag
        # and run all tests.
        excluded-tags: ["org.apache.spark.tags.GitHubActionsUnstableTest"]
        comment: [""]
        include:
          # Hive tests
          - modules: hive
            java: 8
            hadoop: hadoop2.7
            hive: hive2.3
            included-tags: org.apache.spark.tags.SlowHiveTest
            comment: "- slow tests"
          - modules: hive
            java: 8
            hadoop: hadoop2.7
            hive: hive2.3
            excluded-tags: org.apache.spark.tags.SlowHiveTest,org.apache.spark.tags.GitHubActionsUnstableTest
            comment: "- other tests"
          # SQL tests
          - modules: sql
            java: 8
            hadoop: hadoop2.7
            hive: hive2.3
            included-tags: org.apache.spark.tags.ExtendedSQLTest
            comment: "- slow tests"
          - modules: sql
            java: 8
            hadoop: hadoop2.7
            hive: hive2.3
            excluded-tags: org.apache.spark.tags.ExtendedSQLTest,org.apache.spark.tags.GitHubActionsUnstableTest
            comment: "- other tests"
    env:
      MODULES_TO_TEST: ${{ matrix.modules }}
      EXCLUDED_TAGS: ${{ matrix.excluded-tags }}
      INCLUDED_TAGS: ${{ matrix.included-tags }}
      HADOOP_PROFILE: ${{ matrix.hadoop }}
      HIVE_PROFILE: ${{ matrix.hive }}
      # GitHub Actions' default miniconda to use in pip packaging test.
      CONDA_PREFIX: /usr/share/miniconda
      GITHUB_PREV_SHA: ${{ github.event.before }}
    steps:
    - name: Checkout Spark repository
      uses: actions/checkout@v2
      # In order to fetch changed files
      with:
        fetch-depth: 0
    # Cache local repositories. Note that GitHub Actions cache has a 2G limit.
    - name: Cache Scala, SBT, Maven and Zinc
      uses: actions/cache@v2
      with:
        path: |
          build/apache-maven-*
          build/zinc-*
          build/scala-*
          build/*.jar
          ~/.sbt
        key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
        restore-keys: |
          build-
    - name: Cache Ivy local repository
      uses: actions/cache@v2
      with:
        path: ~/.ivy2/cache
        key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
        restore-keys: |
          ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-
    - name: Install Java ${{ matrix.java }}
      uses: actions/setup-java@v1
      with:
        java-version: ${{ matrix.java }}
    - name: Install Python 3.8
      uses: actions/setup-python@v2
      # We should install one Python that is 3+ for SQL and Yarn because:
      # - SQL component also has Python related tests, for example, IntegratedUDFTestUtils.
      # - Yarn has a Python specific test too, for example, YarnClusterSuite.
      if: contains(matrix.modules, 'yarn') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
      with:
        python-version: 3.8
        architecture: x64
    - name: Install Python packages (Python 3.8)
      if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
      run: |
        python3.8 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
        python3.8 -m pip list
    # Run the tests.
    - name: Run tests
      run: |
        # Hive and SQL tests become flaky when running in parallel as it's too intensive.
        if [[ "$MODULES_TO_TEST" == "hive" ]] || [[ "$MODULES_TO_TEST" == "sql" ]]; then export SERIAL_SBT_TESTS=1; fi
        ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS"
    - name: Upload test results to report
      if: always()
      uses: actions/upload-artifact@v2
      with:
        name: test-results-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java }}-${{ matrix.hadoop }}-${{ matrix.hive }}
        path: "**/target/test-reports/*.xml"
    - name: Upload unit tests log files
      if: failure()
      uses: actions/upload-artifact@v2
      with:
        name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java }}-${{ matrix.hadoop }}-${{ matrix.hive }}
        path: "**/target/unit-tests.log"
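To make the size of the `build` matrix concrete, here is a minimal Python sketch of how the axes and the `include` entries above could combine. It assumes the rule described in GitHub's workflow syntax documentation: an `include` entry extends an existing combination only if it does not overwrite an original matrix value, and otherwise becomes a new combination. The names `expand_matrix`, `AXES` and `INCLUDE` are hypothetical and exist only for this illustration; this is not how GitHub Actions is implemented.

```python
from itertools import product

UNSTABLE = "org.apache.spark.tags.GitHubActionsUnstableTest"

# Matrix axes copied from the workflow (folded scalars joined with spaces).
AXES = {
    "java": [8],
    "hadoop": ["hadoop2.7"],
    "hive": ["hive2.3"],
    "modules": [
        "core, unsafe, kvstore, avro, network-common, network-shuffle, "
        "repl, launcher, examples, sketch, graphx",
        "catalyst, hive-thriftserver",
        "streaming, sql-kafka-0-10, streaming-kafka-0-10, mllib-local, mllib, "
        "yarn, mesos, kubernetes, hadoop-cloud, spark-ganglia-lgpl",
    ],
    "included-tags": [""],
    "excluded-tags": [UNSTABLE],
    "comment": [""],
}

BASE = {"java": 8, "hadoop": "hadoop2.7", "hive": "hive2.3"}
INCLUDE = [
    {**BASE, "modules": "hive",
     "included-tags": "org.apache.spark.tags.SlowHiveTest",
     "comment": "- slow tests"},
    {**BASE, "modules": "hive",
     "excluded-tags": "org.apache.spark.tags.SlowHiveTest," + UNSTABLE,
     "comment": "- other tests"},
    {**BASE, "modules": "sql",
     "included-tags": "org.apache.spark.tags.ExtendedSQLTest",
     "comment": "- slow tests"},
    {**BASE, "modules": "sql",
     "excluded-tags": "org.apache.spark.tags.ExtendedSQLTest," + UNSTABLE,
     "comment": "- other tests"},
]


def expand_matrix(axes, include):
    """Return the list of job configurations (assumed expansion rule)."""
    keys = list(axes)
    combos = [dict(zip(keys, values))
              for values in product(*(axes[k] for k in keys))]
    # Snapshot of the original values, which include entries may not overwrite.
    original = [dict(c) for c in combos]
    for entry in include:
        matched = False
        for combo, orig in zip(combos, original):
            # The entry fits if every key it shares with the original
            # combination carries the same value.
            if all(orig.get(k, v) == v for k, v in entry.items()):
                combo.update(entry)
                matched = True
        if not matched:
            # Otherwise the entry becomes an additional combination.
            combos.append(dict(entry))
    return combos


if __name__ == "__main__":
    for combo in expand_matrix(AXES, INCLUDE):
        print(combo["modules"][:40], "|", combo.get("comment", ""))
```

Under these assumptions the three module groups yield three jobs, and each `include` entry adds one more because its `modules` value differs from every original combination, giving seven jobs in total. Note that the "slow tests" entries set no `excluded-tags`, so that key is simply absent from those combinations in this sketch.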

  pyspark:
    name: "Build modules: ${{ matrix.modules }}"
    runs-on: ubuntu-20.04
    container:
      image: dongjoon/apache-spark-github-action-image:20201025
    strategy:
      fail-fast: false
      matrix:
        modules:
          - >-
            pyspark-sql, pyspark-mllib
          - >-
            pyspark-core, pyspark-streaming, pyspark-ml
    env:
      MODULES_TO_TEST: ${{ matrix.modules }}
      HADOOP_PROFILE: hadoop2.7
      HIVE_PROFILE: hive2.3
      # GitHub Actions' default miniconda to use in pip packaging test.
      CONDA_PREFIX: /usr/share/miniconda
      GITHUB_PREV_SHA: ${{ github.event.before }}
    steps:
    - name: Checkout Spark repository
      uses: actions/checkout@v2
      # In order to fetch changed files
      with:
        fetch-depth: 0
    # Cache local repositories. Note that GitHub Actions cache has a 2G limit.
    - name: Cache Scala, SBT, Maven and Zinc
      uses: actions/cache@v2
      with:
        path: |
          build/apache-maven-*
          build/zinc-*
          build/scala-*
          build/*.jar
          ~/.sbt
        key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
        restore-keys: |
          build-
    - name: Cache Ivy local repository
      uses: actions/cache@v2
      with:
        path: ~/.ivy2/cache
        key: pyspark-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
        restore-keys: |
          pyspark-ivy-
    - name: Install Python 2.7
      uses: actions/setup-python@v2
      with:
        python-version: 2.7
        architecture: x64
    - name: Install Python packages (Python 2.7)
      run: |
        python2.7 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
        python2.7 -m pip list
    # Run the tests.
    - name: Run tests
      run: |
        ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST"
    - name: Upload test results to report
      if: always()
      uses: actions/upload-artifact@v2
      with:
        name: test-results-${{ matrix.modules }}--8-hadoop2.7-hive2.3
        path: "**/target/test-reports/*.xml"
    - name: Upload unit tests log files
      if: failure()
      uses: actions/upload-artifact@v2
      with:
        name: unit-tests-log-${{ matrix.modules }}--8-hadoop2.7-hive2.3
        path: "**/target/unit-tests.log"

sparkr:
name: "Build modules: sparkr"
runs-on: ubuntu-20.04
container:
image: dongjoon/apache-spark-github-action-image:20201025
env:
HADOOP_PROFILE: hadoop2.7
HIVE_PROFILE: hive2.3
GITHUB_PREV_SHA: ${{ github.event.before }}
steps:
- name: Checkout Spark repository
uses: actions/checkout@v2
# In order to fetch changed files
with:
fetch-depth: 0
# Cache local repositories. Note that GitHub Actions cache has a 2G limit.
- name: Cache Scala, SBT, Maven and Zinc
uses: actions/cache@v2
with:
path: |
build/apache-maven-*
build/zinc-*
build/scala-*
build/*.jar
~/.sbt
key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
restore-keys: |
build-
- name: Cache Ivy local repository
uses: actions/cache@v2
with:
path: ~/.ivy2/cache
key: sparkr-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
restore-keys: |
sparkr-ivy-
- name: Run tests
run: |
# The following settings are also used by `r-lib/actions/setup-r` to avoid
# R issues in the Docker environment.
export TZ=UTC
export _R_CHECK_SYSTEM_CLOCK_=FALSE
./dev/run-tests --parallelism 2 --modules sparkr
- name: Upload test results to report
if: always()
uses: actions/upload-artifact@v2
with:
name: test-results-sparkr--8-hadoop2.7-hive2.3
path: "**/target/test-reports/*.xml"
- name: Upload unit tests log files
if: failure()
uses: actions/upload-artifact@v2
with:
name: unit-tests-log-sparkr--8-hadoop2.7-hive2.3
path: "**/target/unit-tests.log"

# Static analysis, and documentation build
lint:
name: Linters, licenses, dependencies and documentation generation
runs-on: ubuntu-20.04
steps:
- name: Checkout Spark repository
uses: actions/checkout@v2
# Cache local repositories. Note that GitHub Actions cache has a 2G limit.
- name: Cache Scala, SBT, Maven and Zinc
uses: actions/cache@v2
with:
path: |
build/apache-maven-*
build/zinc-*
build/scala-*
build/*.jar
~/.sbt
key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
restore-keys: |
build-
- name: Cache Ivy local repository
uses: actions/cache@v2
with:
path: ~/.ivy2/cache
key: docs-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
restore-keys: |
docs-ivy-
- name: Cache Maven local repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: docs-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
docs-maven-
- name: Install Java 8
uses: actions/setup-java@v1
with:
java-version: 8
- name: Install Python 3.6
uses: actions/setup-python@v2
with:
python-version: 3.6
architecture: x64
- name: Install Python linter dependencies
run: |
pip3 install flake8 sphinx numpy
- name: Install R 4.0
uses: r-lib/actions/setup-r@v1
with:
r-version: 4.0
- name: Install R linter dependencies and SparkR
run: |
sudo apt-get install -y libcurl4-openssl-dev
sudo Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
sudo Rscript -e "devtools::install_github('jimhester/[email protected]')"
./R/install-dev.sh
- name: Install Ruby 2.7 for documentation generation
uses: actions/setup-ruby@v1
with:
ruby-version: 2.7
- name: Install dependencies for documentation generation
run: |
sudo apt-get install -y libcurl4-openssl-dev pandoc
pip install sphinx mkdocs numpy
gem install jekyll jekyll-redirect-from rouge
sudo Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
- name: Scala linter
run: ./dev/lint-scala
- name: Java linter
run: ./dev/lint-java
- name: Python linter
run: ./dev/lint-python
- name: R linter
run: ./dev/lint-r
- name: License test
run: ./dev/check-license
- name: Dependencies test
run: ./dev/test-dependencies.sh
- name: Run documentation build
run: |
cd docs
jekyll build

java-11:
name: Java 11 build with Maven
runs-on: ubuntu-20.04
steps:
- name: Checkout Spark repository
uses: actions/checkout@v2
- name: Cache Maven local repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: java11-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
java11-maven-
- name: Install Java 11
uses: actions/setup-java@v1
with:
java-version: 11
- name: Build with Maven
run: |
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
export MAVEN_CLI_OPTS="--no-transfer-progress"
# It uses Maven's 'install' intentionally, see https://github.com/apache/spark/pull/26414.
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=11 install
rm -rf ~/.m2/repository/org/apache/spark