Closed
Changes from 7 commits
Commits
34 commits
4e96c01
Add YARN/Stable compiled classes to the CLASSPATH.
berngp Apr 15, 2014
1342886
The `spark-class` shell now ignores non jar files in the assembly dir…
berngp Apr 15, 2014
ddf2547
The `spark-shell` option `--log-conf` also enables the SPARK_PRINT_LA…
berngp Apr 15, 2014
2204539
Root is now Spark and qualify the assembly if it was built with YARN.
berngp Apr 15, 2014
889bf4e
Upgrade the Maven Build to YARN 2.3.0.
berngp Apr 16, 2014
460510a
merge https://github.com/berngp/spark/commits/feature/small-shell-cha…
witgo Apr 29, 2014
f1c7535
Improved build configuration Ⅱ
witgo Apr 29, 2014
8540e83
review commit
witgo Apr 30, 2014
c4c6e45
review commit
witgo Apr 30, 2014
9f08e80
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 1, 2014
e1a7e00
improve travis tests coverage
witgo May 1, 2014
effe79c
missing ","
witgo May 1, 2014
9ea1af9
add the dependency of commons-lang
witgo May 1, 2014
0ed124d
SPARK-1693: Most of the tests throw a java.lang.SecurityException whe…
witgo May 1, 2014
03b136f
revert .travis.yml
witgo May 1, 2014
d3488c6
Add the missing yarn dependencies
witgo May 1, 2014
779ae5d
Fix SPARK-1693: Dependent on multiple versions of servlet-api jars le…
witgo May 1, 2014
27bd426
review commit
witgo May 1, 2014
54a86b0
review commit
witgo May 2, 2014
882e35d
review commit
witgo May 2, 2014
31451df
Compile hive optional
witgo May 3, 2014
5fb961f
revert exclusion org.eclipse.jetty.orbit:javax.servlet
witgo May 3, 2014
ea53549
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 3, 2014
a5ff7d1
revert exclusion org.eclipse.jetty.orbit:javax.servlet
witgo May 3, 2014
17f6e7d
merge master
witgo May 4, 2014
3218d3b
merge master
witgo May 5, 2014
e788690
merge master
witgo May 7, 2014
8b0c63f
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 12, 2014
427d499
merge master
witgo May 12, 2014
f1eb268
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 12, 2014
4cc0c90
revert profile hive
witgo May 12, 2014
4277fed
review commit
witgo May 12, 2014
31c6409
review commit
witgo May 12, 2014
7d8cabf
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 14, 2014
1 change: 1 addition & 0 deletions bin/compute-classpath.sh
@@ -44,6 +44,7 @@ if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"

DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar`
CLASSPATH="$CLASSPATH:$DEPS_ASSEMBLY_JAR"
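The added line appends the `yarn/stable` classes directory unconditionally, like the neighbouring entries. As a rough sketch only, a helper could append an entry just when the directory exists, so modules that were not built are skipped. `append_if_dir` is a hypothetical name and is not part of `compute-classpath.sh`:

```shell
# Hypothetical helper (not in compute-classpath.sh): append a directory to
# CLASSPATH only when it exists on disk.
append_if_dir() {
  local dir="$1"
  if [ -d "$dir" ]; then
    # Insert the ":" separator only when CLASSPATH is already non-empty.
    CLASSPATH="${CLASSPATH:+$CLASSPATH:}$dir"
  fi
}
```

It would be called as `append_if_dir "$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"`, assuming `FWDIR` and `SCALA_VERSION` are set as in the script above.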
4 changes: 2 additions & 2 deletions bin/spark-class
@@ -110,8 +110,8 @@ export JAVA_OPTS

if [ ! -f "$FWDIR/RELEASE" ]; then
# Exit if the user hasn't compiled Spark
num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep -E "spark-assembly.*hadoop.*.jar$" | wc -l)
jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep -E "spark-assembly.*hadoop.*.jar$")
if [ "$num_jars" -eq "0" ]; then
echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
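The point of the `$` anchor is that files such as `*.jar.bak` sitting in the assembly directory no longer count as assembly jars. A minimal sketch of that check follows; `count_assembly_jars` is a hypothetical function name, and the escaped `\.` before `jar` is an assumption of this sketch (the diff itself leaves that dot unescaped, so it matches any character):

```shell
# Count directory entries whose names end in ".jar" and look like a Spark
# assembly. The "$" anchor rejects names that merely contain ".jar".
count_assembly_jars() {
  ls "$1" | grep -E 'spark-assembly.*hadoop.*\.jar$' | wc -l | tr -d ' '
}
```

With an unanchored pattern, a stray `spark-assembly-...-hadoop2.2.0.jar.bak` would be counted as a second jar.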
9 changes: 6 additions & 3 deletions docs/building-with-maven.md
@@ -45,17 +45,20 @@ For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions wit
For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you can enable the "yarn-alpha" or "yarn" profile and set the "hadoop.version" and "yarn.version" properties. Note that Hadoop 0.23.X requires a special `-Phadoop-0.23` profile:

# Apache Hadoop 2.0.5-alpha
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -Dyarn.version=2.0.5-alpha -DskipTests clean package
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests clean package

# Cloudera CDH 4.2.0 with MapReduce v2
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -Dyarn.version=2.0.0-cdh4.2.0 -DskipTests clean package
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package

# Apache Hadoop 2.2.X (e.g. 2.2.0 as below) and newer
$ mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
$ mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package

# Apache Hadoop 0.23.x
$ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -Dyarn.version=0.23.7 -DskipTests clean package

# Different versions of HDFS vs YARN.
Contributor

Nit, but could this say:
Differing versions of HDFS and YARN (no period)

$ mvn -Pyarn-alpha -Dhadoop.version=2.3.0 -Dyarn.version= 0.23.7 -DskipTests clean package
Contributor

-Dyarn.version=0.23.7
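Since this PR makes `yarn.version` default to `${hadoop.version}` in `pom.xml`, the same fallback can be sketched in shell when assembling the Maven command line. This is only an illustration: `build_mvn_args` is a hypothetical helper, not something in the Spark build scripts. It also avoids the stray space after `=` flagged above:

```shell
# Hypothetical helper: print the Maven command for a YARN build.
# Arguments: profile, Hadoop version, optional YARN version.
# When the YARN version is omitted it falls back to the Hadoop version,
# mirroring <yarn.version>${hadoop.version}</yarn.version>.
build_mvn_args() {
  local profile="$1"
  local hadoop_version="$2"
  local yarn_version="${3:-$hadoop_version}"
  echo "mvn -P$profile -Dhadoop.version=$hadoop_version -Dyarn.version=$yarn_version -DskipTests clean package"
}
```

For example, `build_mvn_args yarn 2.2.0` uses 2.2.0 for both properties, while `build_mvn_args yarn-alpha 2.3.0 0.23.7` keeps the two versions distinct.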


## Spark Tests in Maven ##

Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin). Some of them require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. You can then run the tests with `mvn -Dhadoop.version=... test`.
134 changes: 67 additions & 67 deletions pom.xml
@@ -16,7 +16,8 @@
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache</groupId>
@@ -119,7 +120,7 @@
<log4j.version>1.2.17</log4j.version>
<hadoop.version>1.0.4</hadoop.version>
<protobuf.version>2.4.1</protobuf.version>
<yarn.version>0.23.7</yarn.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.94.6</hbase.version>
<hive.version>0.12.0</hive.version>
<parquet.version>1.3.2</parquet.version>
@@ -135,7 +136,8 @@

<repositories>
<repository>
<id>maven-repo</id> <!-- This should be at top, it makes maven try the central repo first and then others and hence faster dep resolution -->
<id>maven-repo</id>
<!-- This should be at top, it makes maven try the central repo first and then others and hence faster dep resolution -->
<name>Maven Repository</name>
<!-- HTTPS is unavailable for Maven Central -->
<url>http://repo.maven.apache.org/maven2</url>
@@ -558,64 +560,7 @@
<artifactId>jets3t</artifactId>
<version>0.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<!-- Matches the version of jackson-core-asl pulled in by avro -->
<groupId>org.codehaus.jackson</groupId>
@@ -850,12 +795,6 @@
<modules>
<module>yarn</module>
</modules>
<dependencies>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
</dependency>
</dependencies>
</profile>

<!-- Ganglia integration is not included by default due to LGPL-licensed code -->
@@ -895,13 +834,74 @@
<id>yarn</id>
<properties>
<hadoop.major.version>2</hadoop.major.version>
<hadoop.version>2.2.0</hadoop.version>
<hadoop.version>2.3.0</hadoop.version>
Contributor

I don't think we want to change this so late in the game. Could you revert this to 2.2?

<protobuf.version>2.5.0</protobuf.version>
</properties>
<modules>
<module>yarn</module>
</modules>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</dependencyManagement>
</profile>

<!-- Build without Hadoop dependencies that are included in some runtime environments. -->
4 changes: 2 additions & 2 deletions project/SparkBuild.scala
@@ -55,7 +55,7 @@ object SparkBuild extends Build {
val SCALAC_JVM_VERSION = "jvm-1.6"
val JAVAC_JVM_VERSION = "1.6"

lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
lazy val root = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
Contributor

Just wondering - what is the benefit of this change?

Contributor Author

Just to increase readability.


lazy val core = Project("core", file("core"), settings = coreSettings)

@@ -569,7 +569,7 @@ object SparkBuild extends Build {
libraryDependencies += "net.sf.py4j" % "py4j" % "0.8.1",
name := "spark-assembly",
assembleDeps in Compile <<= (packageProjects.map(packageBin in Compile in _) ++ Seq(packageDependency in Compile)).dependOn,
jarName in assembly <<= version map { v => "spark-assembly-" + v + "-hadoop" + hadoopVersion + ".jar" },
jarName in assembly <<= version map { v => s"spark-assembly-${v}-hadoop${hadoopVersion}${if (isYarnEnabled) "-yarn" else ""}.jar" },
Contributor

I'd prefer to keep this as-is for now. We have many parameters in the build (hadoop version, hive support, ganglia, yarn etc) and I don't think it will scale to add new suffixes for each one.

Contributor

Also this doesn't seem consistent with the maven build.

Contributor

I added this because it was getting confusing whether some of the assembled artifacts were built with YARN or not. If we can't qualify the name, how about creating a manifest of some sort for the deployment?

Contributor

For now I'd just prefer to keep it simple and use uniform naming.

One thing we should probably do is add other parameters to the RELEASE file that ./make-distribution.sh creates. But that can be in a separate PR.

Contributor

The RELEASE file already logs the SPARK_HADOOP_VERSION. This might suffice for the general use case.

echo "Spark $VERSION built for Hadoop $SPARK_HADOOP_VERSION" > "$DISTDIR/RELEASE"
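
The suggestion in this thread, recording more build parameters in the RELEASE file instead of in the jar name, could look roughly like the sketch below. Everything beyond the quoted `echo` message is an assumption: `write_release`, its argument order, and the " with YARN" suffix are hypothetical and not taken from `make-distribution.sh`:

```shell
# Hypothetical sketch: write a RELEASE file that also notes YARN support.
# Arguments: Spark version, Hadoop version, "true"/"false" for YARN, dist dir.
write_release() {
  local version="$1"
  local hadoop_version="$2"
  local with_yarn="$3"
  local distdir="$4"
  local suffix=""
  if [ "$with_yarn" = "true" ]; then
    suffix=" with YARN"
  fi
  echo "Spark $version built for Hadoop $hadoop_version$suffix" > "$distdir/RELEASE"
}
```

This keeps the assembly jar name uniform, as requested in the review, while still letting a deployer see how the distribution was built.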

jarName in packageDependency <<= version map { v => "spark-assembly-" + v + "-hadoop" + hadoopVersion + "-deps.jar" }
) ++ assemblySettings ++ extraAssemblySettings

1 change: 0 additions & 1 deletion yarn/alpha/pom.xml
@@ -21,7 +21,6 @@
<groupId>org.apache.spark</groupId>
<artifactId>yarn-parent_2.10</artifactId>
<version>1.0.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

<groupId>org.apache.spark</groupId>
4 changes: 1 addition & 3 deletions yarn/pom.xml
@@ -21,14 +21,13 @@
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent</artifactId>
<version>1.0.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

<groupId>org.apache.spark</groupId>
<artifactId>yarn-parent_2.10</artifactId>
<packaging>pom</packaging>
<name>Spark Project YARN Parent POM</name>

<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
@@ -50,7 +49,6 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${yarn.version}</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
1 change: 0 additions & 1 deletion yarn/stable/pom.xml
@@ -21,7 +21,6 @@
<groupId>org.apache.spark</groupId>
<artifactId>yarn-parent_2.10</artifactId>
<version>1.0.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
Contributor

I think these might be necessary for users who link against this artifact. In general the yarn module is not something people really link against in spark, but we do publish it, so I think it might be good to include these.


<groupId>org.apache.spark</groupId>