
Commit 85e0744

Merge remote-tracking branch 'apache/master' into SPARK-16406

2 parents: a1e5312 + 0dd97f6

4,101 files changed: 383,276 additions & 116,854 deletions

.github/PULL_REQUEST_TEMPLATE

Lines changed: 1 addition & 3 deletions

@@ -2,11 +2,9 @@
 
 (Please fill in changes proposed in this fix)
 
-
 ## How was this patch tested?
 
 (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
-
-
 (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
 
+Please review http://spark.apache.org/contributing.html before opening a pull request.

.gitignore

Lines changed: 10 additions & 0 deletions

@@ -24,6 +24,8 @@
 R-unit-tests.log
 R/unit-tests.out
 R/cran-check.out
+R/pkg/vignettes/sparkr-vignettes.html
+R/pkg/tests/fulltests/Rplots.pdf
 build/*.jar
 build/apache-maven*
 build/scala*
@@ -41,9 +43,12 @@ dependency-reduced-pom.xml
 derby.log
 dev/create-release/*final
 dev/create-release/*txt
+dev/pr-deps/
 dist/
 docs/_site
 docs/api
+sql/docs
+sql/site
 lib_managed/
 lint-r-report.log
 log/
@@ -56,6 +61,10 @@ project/plugins/project/build.properties
 project/plugins/src_managed/
 project/plugins/target/
 python/lib/pyspark.zip
+python/deps
+python/test_coverage/coverage_data
+python/test_coverage/htmlcov
+python/pyspark/python
 reports/
 scalastyle-on-compile.generated.xml
 scalastyle-output.xml
@@ -67,6 +76,7 @@ streaming-tests.log
 target/
 unit-tests.log
 work/
+docs/.jekyll-metadata
 
 # For Hive
 TempStatsStore/

.travis.yml

Lines changed: 1 addition & 2 deletions

@@ -28,7 +28,6 @@ dist: trusty
 # 2. Choose language and target JDKs for parallel builds.
 language: java
 jdk:
-  - oraclejdk7
   - oraclejdk8
 
 # 3. Setup cache directory for SBT and Maven.
@@ -44,7 +43,7 @@ notifications:
 # 5. Run maven install before running lint-java.
 install:
   - export MAVEN_SKIP_RC=1
-  - build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
+  - build/mvn -T 4 -q -DskipTests -Pkubernetes -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver install
 
 # 6. Run lint-java.
 script:
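
For contributors reproducing this CI step locally, a hedged sketch of the updated `install` command is shown below; it assumes a Spark checkout with the bundled `build/mvn` wrapper and a JDK 8 on the PATH, and the profile list simply mirrors the changed line above:

```sh
# Same Maven invocation the updated Travis "install" step runs,
# executed from the Spark source root.
export MAVEN_SKIP_RC=1
build/mvn -T 4 -q -DskipTests -Pkubernetes -Pmesos -Pyarn -Pkinesis-asl \
  -Phive -Phive-thriftserver install
```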

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions

@@ -1,12 +1,12 @@
 ## Contributing to Spark
 
 *Before opening a pull request*, review the
-[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
+[Contributing to Spark guide](http://spark.apache.org/contributing.html).
 It lists steps that are required before creating a PR. In particular, consider:
 
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
 - Is the change being proposed clearly explained and motivated?
 
 When you contribute code, you affirm that the contribution is your original work and that you

LICENSE

Lines changed: 9 additions & 6 deletions

@@ -249,11 +249,11 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (Interpreter classes (all .scala files in repl/src/main/scala
 except for Main.Scala, SparkHelper.scala and ExecutorClassLoader.scala),
 and for SerializableMapWrapper in JavaUtils.scala)
-(BSD-like) Scala Actors library (org.scala-lang:scala-actors:2.11.7 - http://www.scala-lang.org/)
-(BSD-like) Scala Compiler (org.scala-lang:scala-compiler:2.11.7 - http://www.scala-lang.org/)
-(BSD-like) Scala Compiler (org.scala-lang:scala-reflect:2.11.7 - http://www.scala-lang.org/)
-(BSD-like) Scala Library (org.scala-lang:scala-library:2.11.7 - http://www.scala-lang.org/)
-(BSD-like) Scalap (org.scala-lang:scalap:2.11.7 - http://www.scala-lang.org/)
+(BSD-like) Scala Actors library (org.scala-lang:scala-actors:2.11.8 - http://www.scala-lang.org/)
+(BSD-like) Scala Compiler (org.scala-lang:scala-compiler:2.11.8 - http://www.scala-lang.org/)
+(BSD-like) Scala Compiler (org.scala-lang:scala-reflect:2.11.8 - http://www.scala-lang.org/)
+(BSD-like) Scala Library (org.scala-lang:scala-library:2.11.8 - http://www.scala-lang.org/)
+(BSD-like) Scalap (org.scala-lang:scalap:2.11.8 - http://www.scala-lang.org/)
 (BSD-style) scalacheck (org.scalacheck:scalacheck_2.11:1.10.0 - http://www.scalacheck.org)
 (BSD-style) spire (org.spire-math:spire_2.11:0.7.1 - http://spire-math.org)
 (BSD-style) spire-macros (org.spire-math:spire-macros_2.11:0.7.1 - http://spire-math.org)
@@ -263,12 +263,14 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.3 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.6 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
 (BSD 3 Clause) DPark (https://github.com/douban/dpark/blob/master/LICENSE)
 (BSD 3 Clause) CloudPickle (https://github.com/cloudpipe/cloudpickle/blob/master/LICENSE)
+(BSD 2 Clause) Zstd-jni (https://github.com/luben/zstd-jni/blob/master/LICENSE)
+(BSD license) Zstd (https://github.com/facebook/zstd/blob/v1.3.1/LICENSE)
 
 ========================================================================
 MIT licenses
@@ -297,3 +299,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (MIT License) RowsGroup (http://datatables.net/license/mit)
 (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
 (MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)
+(MIT License) machinist (https://github.com/typelevel/machinist)

NOTICE

Lines changed: 6 additions & 3 deletions

@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
 This product includes/uses ASM (http://asm.ow2.org/),
 Copyright (c) 2000-2007 INRIA, France Telecom.
 
-This product includes/uses org.json (http://www.json.org/java/index.html),
-Copyright (c) 2002 JSON.org
-
 This product includes/uses JLine (http://jline.sourceforge.net/),
 Copyright (c) 2002-2006, Marc Prud'hommeaux <mwp1@cornell.edu>.
 
@@ -451,6 +448,12 @@ Copyright (C) 2011 Google Inc.
 Apache Commons Pool
 Copyright 1999-2009 The Apache Software Foundation
 
+This product includes/uses Kubernetes & OpenShift 3 Java Client (https://github.com/fabric8io/kubernetes-client)
+Copyright (C) 2015 Red Hat, Inc.
+
+This product includes/uses OkHttp (https://github.com/square/okhttp)
+Copyright (C) 2012 The Android Open Source Project
+
 =========================================================================
 == NOTICE file corresponding to section 4(d) of the Apache License, ==
 == Version 2.0, in this case for the DataNucleus distribution. ==

R/CRAN_RELEASE.md

Lines changed: 91 additions & 0 deletions

@@ -0,0 +1,91 @@ (new file)

# SparkR CRAN Release

To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
`dev@spark.apache.org` community and the R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks; it is therefore preferable to run `R CMD check` on a manually built source package before uploading a release. Also note that for the CRAN checks of PDF vignettes to succeed, the `qpdf` tool must be installed (e.g. `yum -q -y install qpdf`).

To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of any `WARNING` (there should not be any) or `NOTE` items. As part of `check-cran.sh` and the release process, the vignettes are built; make sure `SPARK_HOME` is set and the Spark jars are accessible.

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information, please refer to http://r-pkgs.had.co.nz/release.html#release-check.

### Testing: build package manually

To build the package manually, for example to inspect the resulting `.tar.gz` file content, we also use the `devtools` package.

The source package is what gets released to CRAN; CRAN then builds platform-specific binary packages from it.

#### Build source package

To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION   R       inst    tests
NAMESPACE     build   man     vignettes

inst/doc/
  sparkr-vignettes.html
  sparkr-vignettes.Rmd
  sparkr-vignettes.Rman

build/
  vignette.rds

man/
  *.Rd files...

vignettes/
  sparkr-vignettes.Rmd
```

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

with "2.1.0" replaced by the version of SparkR being released.

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION   Meta       R      html      tests
INDEX         NAMESPACE  help   profile   worker
```
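
The file above recommends checking a manually built source package before uploading; a minimal sketch of that workflow is shown below (the version number is illustrative, and `qpdf`, a built Spark distribution, and a correctly set `SPARK_HOME` are assumed):

```sh
# Build the source package with vignettes, as check-cran.sh does internally,
# then run the full CRAN check without the --no-manual/--no-vignettes shortcuts.
cd "$SPARK_HOME/R"
R CMD build pkg
R CMD check --as-cran SparkR_2.1.0.tar.gz
```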

R/README.md

Lines changed: 6 additions & 10 deletions

@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R
 
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
-Example: 
+Example:
 ```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
@@ -46,31 +46,27 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
 .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
 library(SparkR)
-sc <- sparkR.init(master="local")
+sparkR.session()
 ```
 
 #### Making changes to SparkR
 
-The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
+The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
 Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
-
+
 #### Generating documentation
 
 The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`
-
+
 ### Examples, Unit tests
 
 SparkR comes with several sample programs in the `examples/src/main/r` directory.
 To run one of them, use `./bin/spark-submit <filename> <args>`. For example:
 ```bash
 ./bin/spark-submit examples/src/main/r/dataframe.R
 ```
-You can also run the unit tests for SparkR by running. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
-```bash
-R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
-./R/run-tests.sh
-```
+You can run R unit tests by following the instructions under [Running R Tests](http://spark.apache.org/docs/latest/building-spark.html#running-r-tests).
 
 ### Running on YARN
 

R/WINDOWS.md

Lines changed: 3 additions & 4 deletions

@@ -6,7 +6,7 @@ To build SparkR on Windows, the following steps are required
 include Rtools and R in `PATH`.
 
 2. Install
-[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
+[JDK8](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and set
 `JAVA_HOME` in the system environment variables.
 
 3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
@@ -34,10 +34,9 @@ To run the SparkR unit tests on Windows, the following steps are required —ass
 
 4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.
 
-5. Run unit tests for SparkR by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
+5. Run unit tests for SparkR by running the command below. You need to install the needed packages following the instructions under [Running R Tests](http://spark.apache.org/docs/latest/building-spark.html#running-r-tests) first:
 
 ```
-R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
-.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
+.\bin\spark-submit2.cmd --conf spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R
 ```
 

R/check-cran.sh

Lines changed: 32 additions & 20 deletions

@@ -20,30 +20,36 @@
 set -o pipefail
 set -e
 
-FWDIR="$(cd `dirname $0`; pwd)"
-pushd $FWDIR > /dev/null
+FWDIR="$(cd "`dirname "${BASH_SOURCE[0]}"`"; pwd)"
+pushd "$FWDIR" > /dev/null
 
-if [ ! -z "$R_HOME" ]
-then
-  R_SCRIPT_PATH="$R_HOME/bin"
-else
-  # if system wide R_HOME is not found, then exit
-  if [ ! `command -v R` ]; then
-    echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
-    exit 1
-  fi
-  R_SCRIPT_PATH="$(dirname $(which R))"
+. "$FWDIR/find-r.sh"
+
+# Install the package (this is required for code in vignettes to run when building it later)
+# Build the latest docs, but not vignettes, which is built with the package next
+. "$FWDIR/install-dev.sh"
+
+# Build source package with vignettes
+SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
+. "${SPARK_HOME}/bin/load-spark-env.sh"
+if [ -f "${SPARK_HOME}/RELEASE" ]; then
+  SPARK_JARS_DIR="${SPARK_HOME}/jars"
+else
+  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
 fi
-echo "USING R_HOME = $R_HOME"
 
-# Build the latest docs
-$FWDIR/create-docs.sh
+if [ -d "$SPARK_JARS_DIR" ]; then
+  # Build a zip file containing the source package with vignettes
+  SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/R" CMD build "$FWDIR/pkg"
 
-# Build a zip file containing the source package
-"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+  find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
+else
+  echo "Error Spark JARs not found in '$SPARK_HOME'"
+  exit 1
+fi
 
 # Run check as-cran.
-VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
+VERSION=`grep Version "$FWDIR/pkg/DESCRIPTION" | awk '{print $NF}'`
 
 CRAN_CHECK_OPTIONS="--as-cran"
 
@@ -54,11 +60,17 @@ fi
 
 if [ -n "$NO_MANUAL" ]
 then
-  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
 fi
 
 echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
 
-"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
+then
+  "$R_SCRIPT_PATH/R" CMD check $CRAN_CHECK_OPTIONS "SparkR_$VERSION.tar.gz"
+else
+  # This will run tests and/or build vignettes, and require SPARK_HOME
+  SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/R" CMD check $CRAN_CHECK_OPTIONS "SparkR_$VERSION.tar.gz"
+fi
 
 popd > /dev/null
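
A hedged usage sketch of the updated script follows; it is run from the Spark checkout root, `NO_TESTS` and `NO_MANUAL` are the switches the new conditional above checks, and a built Spark with its jars under `assembly/target` (or a `RELEASE` layout) is assumed:

```sh
# Full --as-cran check: builds the source package with vignettes, then runs
# the tests, manual, and vignette checks (needs the Spark jars to be present).
./R/check-cran.sh

# Faster check: --no-manual --no-vignettes is appended and tests are skipped.
NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh
```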
