16 changes: 16 additions & 0 deletions R/WINDOWS.md
Expand Up @@ -11,3 +11,19 @@ include Rtools and R in `PATH`.
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
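
Taken together, steps 4 and 5 can be sketched as a `cmd` session. The `MAVEN_OPTS` value below is an illustrative assumption (see [Building Spark](http://spark.apache.org/docs/latest/building-spark.html) for the recommended settings), and `C:\spark` stands in for your Spark checkout:

```
:: Illustrative values; adjust for your machine.
set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m
cd C:\spark
mvn -DskipTests -Psparkr package
```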

## Unit tests

To run the existing SparkR unit tests on Windows, the following steps are required. The steps below assume you are in the Spark root directory.
Member

Parenthetical is fine as a sentence by itself. "in the Spark root ..."

Member Author

Oh. Thanks!


1. Set `HADOOP_HOME`.
2. Download `winutils.exe` and locate this in `$HADOOP_HOME/bin`.

Installing Hadoop does not seem to be required; only this `winutils.exe` is needed. It does not seem to be included in the official Hadoop binary releases, so it would have to be built from source, but it appears to be downloadable from the community (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
Member

CC @steveloughran for comment. I think the paragraph should start with "It is not included in the Hadoop binary releases, so .... However it is downloadable from, for example [...]"

Member Author

Thank you. I will wait for the comment and will fix.

Contributor

@steveloughran May 20, 2016

I wouldn't recommend putting it under the root of the project, as that only complicates the source tree and path cleanup; an adjacent directory works. And I think you may find that HADOOP.DLL is needed in places, as there are some JNI calls related to local file access and permissions/ACLs

At some point we (the Hadoop team) may start releasing the Windows binaries direct. It's only avoided as it complicates the release process somewhat, though if it encourages testing it can only be encouraged.

I'd suggest the following text:


To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not have Apache Hadoop installed already:

  1. `cd ..`
  2. `mkdir hadoop`
  3. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM.
  4. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.
  5. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.
  6. For further reading, consult Windows Problems on the Hadoop wiki.
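
The suggested steps might look like the following `cmd` session. This is a sketch under the assumptions above; the winutils download itself is omitted, since the right package depends on the Hadoop version in use:

```
:: From the Spark root directory (per the suggestion above)
cd ..
:: mkdir with command extensions creates intermediate directories too
mkdir hadoop\bin
:: Copy the downloaded winutils.exe and hadoop.dll into hadoop\bin, then:
set HADOOP_HOME=%CD%\hadoop
```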


3. Run the SparkR unit tests by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
Member

"unit tests" and "by running the command below". Again parenthetical can be a sentence. This step is already documented in R docs though.

Member Author

Thank you. I will fix. I wrote it just in case because the commands are a little bit different (I am not used to Windows and it took me a while to find out the equivalent commands).


```
R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
```
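
For readers more used to a Unix shell, a rough non-Windows equivalent of the same two commands might look like the following. This is a sketch for comparison only; on those platforms the repository's `R/run-tests.sh` script is the usual entry point:

```
R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
./bin/spark-submit R/pkg/tests/run-all.R
```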