
Commit f2d76a0

Merge branch 'master' into pyspark-inputformats

Conflicts:
	project/SparkBuild.scala

2 parents: 41856a5 + a18ea00

291 files changed: 8527 additions & 2983 deletions


LICENSE

Lines changed: 32 additions & 0 deletions
@@ -396,3 +396,35 @@ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
+
+
+========================================================================
+For sbt and sbt-launch-lib.bash in sbt/:
+========================================================================
+
+// Generated from http://www.opensource.org/licenses/bsd-license.php
+Copyright (c) 2011, Paul Phillips.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+  * Redistributions of source code must retain the above copyright notice,
+    this list of conditions and the following disclaimer.
+  * Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+  * Neither the name of the author nor the names of its contributors may be
+    used to endorse or promote products derived from this software without
+    specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

NOTICE

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2013 The Apache Software Foundation.
+Copyright 2014 The Apache Software Foundation.
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

README.md

Lines changed: 3 additions & 14 deletions
@@ -1,12 +1,12 @@
 # Apache Spark
 
-Lightning-Fast Cluster Computing - <http://spark.incubator.apache.org/>
+Lightning-Fast Cluster Computing - <http://spark.apache.org/>
 
 
 ## Online Documentation
 
 You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.incubator.apache.org/documentation.html>.
+guide, on the project webpage at <http://spark.apache.org/documentation.html>.
 This README file only contains basic setup instructions.
 
 
@@ -92,21 +92,10 @@ If your project is built with Maven, add this to your POM file's `<dependencies>
 
 ## Configuration
 
-Please refer to the [Configuration guide](http://spark.incubator.apache.org/docs/latest/configuration.html)
+Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
 
 
-## Apache Incubator Notice
-
-Apache Spark is an effort undergoing incubation at The Apache Software
-Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of
-all newly accepted projects until a further review indicates that the
-infrastructure, communications, and decision making process have stabilized in
-a manner consistent with other successful ASF projects. While incubation status
-is not necessarily a reflection of the completeness or stability of the code,
-it does indicate that the project has yet to be fully endorsed by the ASF.
-
 ## Contributing to Spark
 
 Contributions via GitHub pull requests are gladly accepted from their original

assembly/pom.xml

Lines changed: 16 additions & 3 deletions
@@ -21,17 +21,20 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.0.0-incubating-SNAPSHOT</version>
+    <version>1.0.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-assembly_2.10</artifactId>
   <name>Spark Project Assembly</name>
-  <url>http://spark.incubator.apache.org/</url>
+  <url>http://spark.apache.org/</url>
+  <packaging>pom</packaging>
 
   <properties>
-    <spark.jar>${project.build.directory}/scala-${scala.binary.version}/${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</spark.jar>
+    <spark.jar.dir>scala-${scala.binary.version}</spark.jar.dir>
+    <spark.jar.basename>${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</spark.jar.basename>
+    <spark.jar>${project.build.directory}/${spark.jar.dir}/${spark.jar.basename}</spark.jar>
     <deb.pkg.name>spark</deb.pkg.name>
     <deb.install.path>/usr/share/spark</deb.install.path>
     <deb.user>root</deb.user>
@@ -155,6 +158,16 @@
       </dependency>
     </dependencies>
   </profile>
+  <profile>
+    <id>spark-ganglia-lgpl</id>
+    <dependencies>
+      <dependency>
+        <groupId>org.apache.spark</groupId>
+        <artifactId>spark-ganglia-lgpl_${scala.binary.version}</artifactId>
+        <version>${project.version}</version>
+      </dependency>
+    </dependencies>
+  </profile>
   <profile>
     <id>bigtop-dist</id>
     <!-- This profile uses the assembly plugin to create a special "dist" package for BigTop

assembly/src/main/assembly/assembly.xml

Lines changed: 11 additions & 0 deletions
@@ -55,6 +55,15 @@
       <include>**/*</include>
     </includes>
   </fileSet>
+  <fileSet>
+    <directory>
+      ${project.parent.basedir}/assembly/target/${spark.jar.dir}
+    </directory>
+    <outputDirectory>/</outputDirectory>
+    <includes>
+      <include>${spark.jar.basename}</include>
+    </includes>
+  </fileSet>
 </fileSets>
 
 <dependencySets>
@@ -75,6 +84,8 @@
     <excludes>
      <exclude>org.apache.hadoop:*:jar</exclude>
      <exclude>org.apache.spark:*:jar</exclude>
+      <exclude>org.apache.zookeeper:*:jar</exclude>
+      <exclude>org.apache.avro:*:jar</exclude>
    </excludes>
  </dependencySet>
</dependencySets>

bagel/pom.xml

Lines changed: 16 additions & 2 deletions
@@ -21,15 +21,29 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.0.0-incubating-SNAPSHOT</version>
+    <version>1.0.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-bagel_2.10</artifactId>
   <packaging>jar</packaging>
   <name>Spark Project Bagel</name>
-  <url>http://spark.incubator.apache.org/</url>
+  <url>http://spark.apache.org/</url>
+
+  <profiles>
+    <profile>
+      <!-- SPARK-1121: Adds an explicit dependency on Avro to work around
+           a Hadoop 0.23.X issue -->
+      <id>yarn-alpha</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.avro</groupId>
+          <artifactId>avro</artifactId>
+        </dependency>
+      </dependencies>
+    </profile>
+  </profiles>
 
   <dependencies>
     <dependency>

bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala

Lines changed: 7 additions & 7 deletions
@@ -27,7 +27,7 @@ object Bagel extends Logging {
 
   /**
    * Runs a Bagel program.
-   * @param sc [[org.apache.spark.SparkContext]] to use for the program.
+   * @param sc org.apache.spark.SparkContext to use for the program.
    * @param vertices vertices of the graph represented as an RDD of (Key, Vertex) pairs. Often the
    *                 Key will be the vertex id.
    * @param messages initial set of messages represented as an RDD of (Key, Message) pairs. Often
@@ -38,10 +38,10 @@ object Bagel extends Logging {
    * @param aggregator [[org.apache.spark.bagel.Aggregator]] performs a reduce across all vertices
    *                   after each superstep and provides the result to each vertex in the next
    *                   superstep.
-   * @param partitioner [[org.apache.spark.Partitioner]] partitions values by key
+   * @param partitioner org.apache.spark.Partitioner partitions values by key
    * @param numPartitions number of partitions across which to split the graph.
    *                      Default is the default parallelism of the SparkContext
-   * @param storageLevel [[org.apache.spark.storage.StorageLevel]] to use for caching of
+   * @param storageLevel org.apache.spark.storage.StorageLevel to use for caching of
    *                     intermediate RDDs in each superstep. Defaults to caching in memory.
    * @param compute function that takes a Vertex, optional set of (possibly combined) messages to
    *                the Vertex, optional Aggregator and the current superstep,
@@ -131,7 +131,7 @@ object Bagel extends Logging {
 
   /**
    * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]], default
-   * [[org.apache.spark.HashPartitioner]] and default storage level
+   * org.apache.spark.HashPartitioner and default storage level
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest, C: Manifest](
       sc: SparkContext,
@@ -146,7 +146,7 @@ object Bagel extends Logging {
 
   /**
    * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]] and the
-   * default [[org.apache.spark.HashPartitioner]]
+   * default org.apache.spark.HashPartitioner
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest, C: Manifest](
       sc: SparkContext,
@@ -166,7 +166,7 @@ object Bagel extends Logging {
 
   /**
    * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]],
-   * default [[org.apache.spark.HashPartitioner]],
+   * default org.apache.spark.HashPartitioner,
    * [[org.apache.spark.bagel.DefaultCombiner]] and the default storage level
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest](
@@ -180,7 +180,7 @@ object Bagel extends Logging {
 
   /**
    * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]],
-   * the default [[org.apache.spark.HashPartitioner]]
+   * the default org.apache.spark.HashPartitioner
    * and [[org.apache.spark.bagel.DefaultCombiner]]
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest](

bin/spark-class

Lines changed: 28 additions & 20 deletions
@@ -40,34 +40,46 @@ if [ -z "$1" ]; then
   exit 1
 fi
 
-# If this is a standalone cluster daemon, reset SPARK_JAVA_OPTS and SPARK_MEM to reasonable
-# values for that; it doesn't need a lot
-if [ "$1" = "org.apache.spark.deploy.master.Master" -o "$1" = "org.apache.spark.deploy.worker.Worker" ]; then
-  SPARK_MEM=${SPARK_DAEMON_MEMORY:-512m}
-  SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.akka.logLifecycleEvents=true"
-  # Do not overwrite SPARK_JAVA_OPTS environment variable in this script
-  OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS"   # Empty by default
-else
-  OUR_JAVA_OPTS="$SPARK_JAVA_OPTS"
+if [ -n "$SPARK_MEM" ]; then
+  echo "Warning: SPARK_MEM is deprecated, please use a more specific config option"
+  echo "(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY)."
 fi
 
+# Use SPARK_MEM or 512m as the default memory, to be overridden by specific options
+DEFAULT_MEM=${SPARK_MEM:-512m}
+
+SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.akka.logLifecycleEvents=true"
 
-# Add java opts for master, worker, executor. The opts maybe null
+# Add java opts and memory settings for master, worker, executors, and repl.
 case "$1" in
+  # Master and Worker use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
   'org.apache.spark.deploy.master.Master')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_MASTER_OPTS"
+    OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_MASTER_OPTS"
+    OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
     ;;
   'org.apache.spark.deploy.worker.Worker')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_WORKER_OPTS"
+    OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_WORKER_OPTS"
+    OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
     ;;
+
+  # Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
   'org.apache.spark.executor.CoarseGrainedExecutorBackend')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
     ;;
   'org.apache.spark.executor.MesosExecutorBackend')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
     ;;
+
+  # All drivers use SPARK_JAVA_OPTS + SPARK_DRIVER_MEMORY. The repl also uses SPARK_REPL_OPTS.
   'org.apache.spark.repl.Main')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_REPL_OPTS"
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_REPL_OPTS"
+    OUR_JAVA_MEM=${SPARK_DRIVER_MEMORY:-$DEFAULT_MEM}
+    ;;
+  *)
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS"
+    OUR_JAVA_MEM=${SPARK_DRIVER_MEMORY:-$DEFAULT_MEM}
     ;;
 esac
 
@@ -83,14 +95,10 @@ else
   fi
 fi
 
-# Set SPARK_MEM if it isn't already set since we also use it for this process
-SPARK_MEM=${SPARK_MEM:-512m}
-export SPARK_MEM
-
 # Set JAVA_OPTS to be able to load native libraries and to set heap size
 JAVA_OPTS="$OUR_JAVA_OPTS"
 JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
-JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
+JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
 # Load extra JAVA_OPTS from conf/java-opts, if it exists
 if [ -e "$FWDIR/conf/java-opts" ] ; then
   JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
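The per-role memory selection above leans entirely on the shell's `${VAR:-default}` expansion: each role reads its own memory variable and falls back to `DEFAULT_MEM`, which is itself `SPARK_MEM` or 512m. A minimal standalone sketch of that pattern (the variable values here are hypothetical, not part of the diff):

```shell
# Assume SPARK_MEM is unset for this demo, so DEFAULT_MEM becomes 512m.
unset SPARK_MEM
DEFAULT_MEM=${SPARK_MEM:-512m}

# When the role-specific variable is set, it wins over the default.
SPARK_EXECUTOR_MEMORY=2g
OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
echo "$OUR_JAVA_MEM"   # prints 2g

# When it is unset, the expansion falls back to DEFAULT_MEM.
unset SPARK_EXECUTOR_MEMORY
OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
echo "$OUR_JAVA_MEM"   # prints 512m
```

This is why exporting `SPARK_MEM` is no longer needed: the chosen value flows into `-Xms`/`-Xmx` through `OUR_JAVA_MEM` instead of a process-wide environment variable.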

bin/spark-class2.cmd

Lines changed: 35 additions & 12 deletions
@@ -34,22 +34,45 @@ if not "x%1"=="x" goto arg_given
 goto exit
 :arg_given
 
-set RUNNING_DAEMON=0
-if "%1"=="spark.deploy.master.Master" set RUNNING_DAEMON=1
-if "%1"=="spark.deploy.worker.Worker" set RUNNING_DAEMON=1
-if "x%SPARK_DAEMON_MEMORY%" == "x" set SPARK_DAEMON_MEMORY=512m
+if not "x%SPARK_MEM%"=="x" (
+  echo Warning: SPARK_MEM is deprecated, please use a more specific config option
+  echo e.g., spark.executor.memory or SPARK_DRIVER_MEMORY.
+)
+
+rem Use SPARK_MEM or 512m as the default memory, to be overridden by specific options
+set OUR_JAVA_MEM=%SPARK_MEM%
+if "x%OUR_JAVA_MEM%"=="x" set OUR_JAVA_MEM=512m
+
 set SPARK_DAEMON_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% -Dspark.akka.logLifecycleEvents=true
-if "%RUNNING_DAEMON%"=="1" set SPARK_MEM=%SPARK_DAEMON_MEMORY%
-rem Do not overwrite SPARK_JAVA_OPTS environment variable in this script
-if "%RUNNING_DAEMON%"=="0" set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS%
-if "%RUNNING_DAEMON%"=="1" set OUR_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS%
 
-rem Figure out how much memory to use per executor and set it as an environment
-rem variable so that our process sees it and can report it to Mesos
-if "x%SPARK_MEM%"=="x" set SPARK_MEM=512m
+rem Add java opts and memory settings for master, worker, executors, and repl.
+rem Master and Worker use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
+if "%1"=="org.apache.spark.deploy.master.Master" (
+  set OUR_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% %SPARK_MASTER_OPTS%
+  if not "x%SPARK_DAEMON_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DAEMON_MEMORY%
+) else if "%1"=="org.apache.spark.deploy.worker.Worker" (
+  set OUR_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% %SPARK_WORKER_OPTS%
+  if not "x%SPARK_DAEMON_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DAEMON_MEMORY%
+
+rem Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
+) else if "%1"=="org.apache.spark.executor.CoarseGrainedExecutorBackend" (
+  set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS% %SPARK_EXECUTOR_OPTS%
+  if not "x%SPARK_EXECUTOR_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_EXECUTOR_MEMORY%
+) else if "%1"=="org.apache.spark.executor.MesosExecutorBackend" (
+  set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS% %SPARK_EXECUTOR_OPTS%
+  if not "x%SPARK_EXECUTOR_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_EXECUTOR_MEMORY%
+
+rem All drivers use SPARK_JAVA_OPTS + SPARK_DRIVER_MEMORY. The repl also uses SPARK_REPL_OPTS.
+) else if "%1"=="org.apache.spark.repl.Main" (
+  set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS% %SPARK_REPL_OPTS%
+  if not "x%SPARK_DRIVER_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DRIVER_MEMORY%
+) else (
+  set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS%
+  if not "x%SPARK_DRIVER_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DRIVER_MEMORY%
+)
 
 rem Set JAVA_OPTS to be able to load native libraries and to set heap size
-set JAVA_OPTS=%OUR_JAVA_OPTS% -Djava.library.path=%SPARK_LIBRARY_PATH% -Xms%SPARK_MEM% -Xmx%SPARK_MEM%
+set JAVA_OPTS=%OUR_JAVA_OPTS% -Djava.library.path=%SPARK_LIBRARY_PATH% -Xms%OUR_JAVA_MEM% -Xmx%OUR_JAVA_MEM%
 rem Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in ExecutorRunner.scala!
 
 rem Test whether the user has built Spark
