Commit 7f1c9f6

Devaraj K committed
Merge branch 'master' into SPARK-15288, and resolving conflicts
2 parents 2f306a7 + 28ab0ec commit 7f1c9f6

3,048 files changed

Lines changed: 170680 additions & 58722 deletions

.github/PULL_REQUEST_TEMPLATE

Lines changed: 1 addition & 3 deletions
@@ -2,11 +2,9 @@

 (Please fill in changes proposed in this fix)

-
 ## How was this patch tested?

 (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
-
-
 (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

+Please review http://spark.apache.org/contributing.html before opening a pull request.

.gitignore

Lines changed: 10 additions & 0 deletions
@@ -17,11 +17,14 @@
 .idea/
 .idea_modules/
 .project
+.pydevproject
 .scala_dependencies
 .settings
 /lib/
 R-unit-tests.log
 R/unit-tests.out
+R/cran-check.out
+R/pkg/vignettes/sparkr-vignettes.html
 build/*.jar
 build/apache-maven*
 build/scala*
@@ -54,6 +57,8 @@ project/plugins/project/build.properties
 project/plugins/src_managed/
 project/plugins/target/
 python/lib/pyspark.zip
+python/deps
+python/pyspark/python
 reports/
 scalastyle-on-compile.generated.xml
 scalastyle-output.xml
@@ -77,3 +82,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
+*.Rproj
+*.Rproj.*
+
+.Rproj.user

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ notifications:
 # 5. Run maven install before running lint-java.
 install:
 - export MAVEN_SKIP_RC=1
-- build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
+- build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install

 # 6. Run lint-java.
 script:

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
 ## Contributing to Spark

 *Before opening a pull request*, review the
-[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
+[Contributing to Spark guide](http://spark.apache.org/contributing.html).
 It lists steps that are required before creating a PR. In particular, consider:

 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
 - Is the change being proposed clearly explained and motivated?

 When you contribute code, you affirm that the contribution is your original work and that you

LICENSE

Lines changed: 2 additions & 1 deletion
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.4 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (MIT License) blockUI (http://jquery.malsup.com/block/)
 (MIT License) RowsGroup (http://datatables.net/license/mit)
 (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

NOTICE

Lines changed: 1 addition & 4 deletions
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
 This product includes/uses ASM (http://asm.ow2.org/),
 Copyright (c) 2000-2007 INRIA, France Telecom.

-This product includes/uses org.json (http://www.json.org/java/index.html),
-Copyright (c) 2002 JSON.org
-
 This product includes/uses JLine (http://jline.sourceforge.net/),
 Copyright (c) 2002-2006, Marc Prud'hommeaux <mwp1@cornell.edu>.

R/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@
 lib
 pkg/man
 pkg/html
+SparkR.Rcheck/
+SparkR_*.tar.gz

R/CRAN_RELEASE.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# SparkR CRAN Release
+
+To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
+`dev@spark.apache.org` community and R package maintainer on this.
+
+### Release
+
+First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
+
+Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips some vignette and PDF checks. It is therefore preferable to run `R CMD check` on the manually built source package before uploading a release. Also note that for the CRAN checks of PDF vignettes to succeed, the `qpdf` tool must be installed (e.g. `yum -q -y install qpdf`).
+
+To upload a release, we would need to update the `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on the status of every `WARNING` (there should not be any) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built, so make sure `SPARK_HOME` is set and the Spark jars are accessible.
+
+Once everything is in place, run in R under the `SPARK_HOME/R` directory:
+
+```R
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
+```
+
+For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check
+
+### Testing: build package manually
+
+To build the package manually, for example to inspect the contents of the resulting `.tar.gz` file, we would also use the `devtools` package.
+
+The source package is what gets released to CRAN. CRAN then builds platform-specific binary packages from the source package.
+
+#### Build source package
+
+To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:
+
+```R
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
+```
+
+(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)
+
+Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.
+
+For example, this should be the content of the source package:
+
+```sh
+DESCRIPTION R inst tests
+NAMESPACE build man vignettes
+
+inst/doc/
+sparkr-vignettes.html
+sparkr-vignettes.Rmd
+sparkr-vignettes.Rman
+
+build/
+vignette.rds
+
+man/
+*.Rd files...
+
+vignettes/
+sparkr-vignettes.Rmd
+```
+
+#### Test source package
+
+To install, run this:
+
+```sh
+R CMD INSTALL SparkR_2.1.0.tar.gz
+```
+
+Replace "2.1.0" with the version of SparkR.
+
+This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:
+
+```R
+library(SparkR)
+vignette("sparkr-vignettes", package="SparkR")
+```
+
+#### Build binary package
+
+To build the binary package locally, run in R under the `SPARK_HOME/R` directory:
+
+```R
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
+```
+
+For example, this should be the content of the binary package:
+
+```sh
+DESCRIPTION Meta R html tests
+INDEX NAMESPACE help profile worker
+```
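
Reviewer note on `R/CRAN_RELEASE.md`: the install step in this file hard-codes `SparkR_2.1.0.tar.gz` and asks the reader to substitute the real version. A minimal sketch of deriving the tarball name from the `Version:` field of `pkg/DESCRIPTION` might look like the following. The function name `sparkr_tarball_name` is hypothetical and not part of Spark, and the sketch assumes the usual one-field-per-line `Field: value` layout of an R `DESCRIPTION` file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): read the Version: field from an
# R package DESCRIPTION file and print the matching source tarball name.
sparkr_tarball_name() {
  local description_file="$1"
  local version
  # Assumes a "Version: x.y.z" line; only the first match is used.
  version="$(sed -n 's/^Version:[[:space:]]*//p' "$description_file" | head -n 1)"
  if [ -z "$version" ]; then
    echo "no Version: field in $description_file" >&2
    return 1
  fi
  printf 'SparkR_%s.tar.gz\n' "$version"
}
```

Run from `SPARK_HOME/R`, something like `R CMD INSTALL "$(sparkr_tarball_name pkg/DESCRIPTION)"` would then pick up whichever version was just built, assuming the tarball sits in the current directory.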

R/DOCUMENTATION.md

Lines changed: 6 additions & 6 deletions
@@ -1,12 +1,12 @@
 # SparkR Documentation

-SparkR documentation is generated using in-source comments annotated using using
-`roxygen2`. After making changes to the documentation, to generate man pages,
+SparkR documentation is generated from in-source comments annotated with
+[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation, to generate man pages,
 you can run the following from an R console in the SparkR home directory
-
-library(devtools)
-devtools::document(pkg="./pkg", roclets=c("rd"))
-
+```R
+library(devtools)
+devtools::document(pkg="./pkg", roclets=c("rd"))
+```
 You can verify if your changes are good by running

 R CMD check pkg/

R/README.md

Lines changed: 23 additions & 19 deletions
@@ -1,12 +1,13 @@
 # R on Spark

 SparkR is an R package that provides a light-weight frontend to use Spark from R.
+
 ### Installing sparkR

 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
-Example:
-```
+Example:
+```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and Rscript
 export R_HOME=/home/username/R
 ./install-dev.sh
@@ -17,8 +18,9 @@ export R_HOME=/home/username/R
 #### Build Spark

 Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
-```
-build/mvn -DskipTests -Psparkr package
+
+```bash
+build/mvn -DskipTests -Psparkr package
 ```

 #### Running sparkR
@@ -37,41 +39,43 @@ To set other options like driver memory, executor memory etc. you can pass in th

 #### Using SparkR from RStudio

-If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
-```
+If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
+```R
 # Set this to where Spark is installed
 Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
 .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
 library(SparkR)
-sc <- sparkR.init(master="local")
+sparkR.session()
 ```

 #### Making changes to SparkR

-The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
+The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
 Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
-
+
 #### Generating documentation

-The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
-
+The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`
+
 ### Examples, Unit tests

 SparkR comes with several sample programs in the `examples/src/main/r` directory.
 To run one of them, use `./bin/spark-submit <filename> <args>`. For example:
-
-./bin/spark-submit examples/src/main/r/dataframe.R
-
-You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
-
-R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
-./R/run-tests.sh
+```bash
+./bin/spark-submit examples/src/main/r/dataframe.R
+```
+You can also run the unit tests for SparkR. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
+```bash
+R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
+./R/run-tests.sh
+```

 ### Running on YARN
+
 The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
-```
+```bash
 export YARN_CONF_DIR=/etc/hadoop/conf
 ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
 ```
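
Reviewer note on `R/README.md`: the `R_HOME` example in this file assumes that `$R_HOME/bin` contains the `R` executable. A small pre-flight check, sketched below, could fail early with a clear message before `install-dev.sh` runs. `check_r_home` is a hypothetical helper, not something the Spark scripts provide, and it checks only for `bin/R`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): verify that a directory looks like
# an R installation root, i.e. that <dir>/bin/R exists and is executable.
check_r_home() {
  local r_home="$1"
  if [ -z "$r_home" ]; then
    echo "R_HOME is empty" >&2
    return 1
  fi
  if [ ! -x "$r_home/bin/R" ]; then
    echo "no executable found at $r_home/bin/R" >&2
    return 1
  fi
  return 0
}
```

A possible use is `check_r_home "$R_HOME" && ./install-dev.sh`, so the install script runs only when the check passes.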
