Added Java v8 reference and install code for MacOS #1209
Conversation
SETUP.md
Outdated
<details>
<summary><strong><em>Java version for PySpark</em></strong></summary>

**Note.** By default we use pyspark v2.4.3. It doesn't work on Java versions >8.
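(A quick sanity check of that pairing, as a minimal sketch; this is not part of the diff and assumes pyspark is already installed in the active environment:)

java -version                                            # should report 1.8.x for pyspark 2.4.x
python -c "import pyspark; print(pyspark.__version__)"   # e.g. 2.4.3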
@yegorkryukov thanks for the contribution. On Linux we are using 2.4.5, see this. Does 2.4.5 work with Java 8 on Mac?
@miguelgfierro I'm not sure. My reco_pyspark environment uses Spark 2.4.3 and it works with Java 8.
I just installed 2.4.5 and now pyspark won't work.
wow, not very stable :-)
would 2.4.5 work if you install java 9?
Nope. Didn't work with Java 9 either.
@yegorkryukov I just tested on my mac (Catalina 10.15.7) with Spark 2.4.5 + Java 8 and I could run als_movielens notebook w/o any issues.
A few things we can improve in the installation steps in this PR:
- Java 8 installation on MacOS is relevant to readers only when they install pyspark, so we can move that section under the PySpark environment installation section, right before the Set PySpark environment variables on Linux or MacOS section.
- At the end of the Java 8 installation instructions, let readers verify the installation via `java -version`. Also, one may still use bash, so we can simply add a comment at the end of the instruction script: `# in bash, run: . ~/.asdf/plugins/java/set-java-home.bash`.
- @miguelgfierro I think we should remove SPARK_HOME (maybe we back up the path at env activation and restore it on deactivation; I recall we had that code before, see the sketch after this list). If we set SPARK_HOME to a Spark installation path, pyspark uses that installed version instead of the version we installed (2.4.5).
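For reference, the SPARK_HOME backup/restore idea could look roughly like the following, using conda's activate.d/deactivate.d hooks. This is a minimal sketch, not the code from the repo; the file name spark_home.sh is arbitrary, and the commands should be run once while the target env is active:

mkdir -p "$CONDA_PREFIX/etc/conda/activate.d" "$CONDA_PREFIX/etc/conda/deactivate.d"

# on activation: save any externally set SPARK_HOME and unset it, so the
# pip-installed pyspark (2.4.5) is used instead of a system Spark install
cat > "$CONDA_PREFIX/etc/conda/activate.d/spark_home.sh" <<'EOF'
export SPARK_HOME_BACKUP="$SPARK_HOME"
unset SPARK_HOME
EOF

# on deactivation: restore the original value
cat > "$CONDA_PREFIX/etc/conda/deactivate.d/spark_home.sh" <<'EOF'
export SPARK_HOME="$SPARK_HOME_BACKUP"
unset SPARK_HOME_BACKUP
EOF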
brew install asdf
asdf plugin add java
asdf install java adoptopenjdk-8.0.265+1
asdf global java adoptopenjdk-8.0.265+1
. ~/.asdf/plugins/java/set-java-home.zsh
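After running the snippet, the active version can be confirmed with java -version; it should report a 1.8 build, something like the following (the exact build string may differ):

openjdk version "1.8.0_265"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_265-b01)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.265-b01, mixed mode)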
I haven't used a mac in a bit, but curious if asdf is needed here vs more direct use of adoptopenjdk?
brew tap AdoptOpenJDK/openjdk
brew cask install adoptopenjdk8
I've tried this one more time just now, and when I call `java -version` I get this:
openjdk version "14.0.1" 2020-04-14
OpenJDK Runtime Environment (build 14.0.1+14)
OpenJDK 64-Bit Server VM (build 14.0.1+14, mixed mode, sharing)
Which stops pyspark from working, since it picks up Java 14 instead of Java 8.
So I chose asdf the first time because it set the environment variables correctly (or so I think).
I'm happy to use this solution though. What do you think I should do to fix this?
I think typically you would run something like this (and add it to .bashrc or .bash_profile)
export JAVA_HOME=$(/usr/libexec/java_home -v1.8);
but if asdf handles installation without extra setup it's fine. Does this require use of z-shell?
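For completeness, the usual flow would be something like this (assuming bash; the single quotes keep the command unexpanded when written to the profile):

echo 'export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)' >> ~/.bash_profile
source ~/.bash_profile
java -version   # should now pick up the 1.8 JDK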
@loomlike can you test this out on mac, including the 2.4.3 / 2.4.5 versioning issues?
Does this require use of z-shell?
zsh is the default shell on Catalina now I believe
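You can check which shell is active:

echo $SHELL   # /bin/zsh on a default Catalina install

That is why the asdf snippet above sources set-java-home.zsh; bash users would source ~/.asdf/plugins/java/set-java-home.bash instead.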
oh wow, i'm really falling behind the times in os/x world =)
@gramhagen I tested and everything looks good. Please see my comment in the previous thread.
SETUP.md
Outdated
You can specify the environment name as well with the flag `-n`.

<details>
What are your thoughts about putting this in the section for PySpark setup? Do you think that hides it too much? If so, perhaps keeping the note here but referring to the PySpark environment setup for details on installing Java dependencies would be more organized.
Yeah, I thought about this one for some time. Initially I placed this under the PySpark environment setup, but then I thought I would need to duplicate this piece and place it under the Full (PySpark & Python GPU) environment as well, hence I placed it above those two. If you think having this piece twice under those two sections makes more sense, I'll move it.
I think it would be great to put the note that using PySpark 2.4.x requires Java 8 underneath the note about xlearn, so it's visible at the top level. We can add a reference there to the PySpark environment section for more details.
We should add a section in the PySpark environment setup about installing Java for Mac.
The Full environment section already references PySpark, so no need to repeat it there.
Will this work?
loomlike
left a comment
Thank you for your contribution, and we are really sorry for the delayed response! The PR looks great. I left a few comments. Please check them, try to address them if possible, and make sure you merge the latest staging branch before completing this PR.
gramhagen
left a comment
This is good, we can tweak it later if we decide to move anything around. Thanks a lot for testing this out!
Description
Added info about Java v8 needed for PySpark to work. Added MacOS install instructions.
Related Issues
#1207
Checklist:
Merging into `staging` and not `master`.