[SPARK-21267][DOCS][MINOR] Follow up to avoid referencing programming-guide redirector
## What changes were proposed in this pull request?
Update internal references from `programming-guide` to `rdd-programming-guide`.
See apache/spark-website@5ddf243 and apache#18485 (comment)
Let's keep the redirector even if it's problematic to build, but not rely on it internally.
## How was this patch tested?
(Doc build)
Author: Sean Owen <sowen@cloudera.com>
Closes apache#18625 from srowen/SPARK-21267.2.
**`docs/index.md`** (+1, −1)

```diff
@@ -87,7 +87,7 @@ options for deployment:
 **Programming Guides:**
 
 * [Quick Start](quick-start.html): a quick introduction to the Spark API; start here!
-* [RDD Programming Guide](programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
+* [RDD Programming Guide](rdd-programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
 * [Spark SQL, Datasets, and DataFrames](sql-programming-guide.html): processing structured data with relational queries (newer API than RDDs)
 * [Structured Streaming](structured-streaming-programming-guide.html): processing structured data streams with relation queries (using Datasets and DataFrames, newer API than DStreams)
 * [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
```
**`docs/ml-guide.md`** (+1, −1)

```diff
@@ -18,7 +18,7 @@ At a high level, it provides tools such as:
 
 **The MLlib RDD-based API is now in maintenance mode.**
 
-As of Spark 2.0, the [RDD](programming-guide.html#resilient-distributed-datasets-rdds)-based APIs in the `spark.mllib` package have entered maintenance mode.
+As of Spark 2.0, the [RDD](rdd-programming-guide.html#resilient-distributed-datasets-rdds)-based APIs in the `spark.mllib` package have entered maintenance mode.
 The primary Machine Learning API for Spark is now the [DataFrame](sql-programming-guide.html)-based API in the `spark.ml` package.
```
**`docs/streaming-programming-guide.md`** (+10, −4)

```diff
@@ -535,7 +535,7 @@ After a context is defined, you have to do the following.
 It represents a continuous stream of data, either the input data stream received from source,
 or the processed data stream generated by transforming the input stream. Internally,
 a DStream is represented by a continuous series of RDDs, which is Spark's abstraction of an immutable,
-distributed dataset (see [Spark Programming Guide](programming-guide.html#resilient-distributed-datasets-rdds) for more details). Each RDD in a DStream contains data from a certain interval,
+distributed dataset (see [Spark Programming Guide](rdd-programming-guide.html#resilient-distributed-datasets-rdds) for more details). Each RDD in a DStream contains data from a certain interval,
 as shown in the following figure.
 
 <p style="text-align: center;">
```
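As background for the passage this hunk touches: a DStream really is just a continuous series of RDDs, one per batch interval. A minimal Scala sketch making that concrete via `foreachRDD` (the host, port, and batch interval here are illustrative, not part of the diff):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamAsRdds {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamAsRdds")
    val ssc = new StreamingContext(conf, Seconds(1))

    // A DStream[String]: internally, one RDD of lines per 1-second batch.
    val lines = ssc.socketTextStream("localhost", 9999)

    // foreachRDD exposes the underlying per-batch RDDs directly.
    lines.foreachRDD { rdd =>
      println(s"batch RDD: ${rdd.getNumPartitions} partitions, ${rdd.count()} records")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```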
```diff
@@ -1531,7 +1531,7 @@ default persistence level is set to replicate the data to two nodes for fault-tolerance.
 
 Note that, unlike RDDs, the default persistence level of DStreams keeps the data serialized in
 memory. This is further discussed in the [Performance Tuning](#memory-tuning) section. More
-information on different persistence levels can be found in the [Spark Programming Guide](programming-guide.html#rdd-persistence).
+information on different persistence levels can be found in the [Spark Programming Guide](rdd-programming-guide.html#rdd-persistence).
 
 ***
 
```
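As a refresher on the persistence passage above: a DStream's default storage level keeps data serialized in memory, and it can be overridden explicitly. A minimal sketch, assuming an upstream DStream named `wordCounts` (the name and element type are illustrative):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.DStream

// Sketch: `wordCounts` stands in for any DStream produced upstream.
def cacheExplicitly(wordCounts: DStream[(String, Long)]): Unit = {
  // persist() with no arguments uses the DStream default (serialized in memory);
  // passing an explicit StorageLevel overrides it, e.g. to spill to disk.
  wordCounts.persist(StorageLevel.MEMORY_AND_DISK_SER)
}
```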
```diff
@@ -1720,7 +1720,13 @@ batch interval that is at least 10 seconds. It can be set by using
 
 ## Accumulators, Broadcast Variables, and Checkpoints
 
-[Accumulators](programming-guide.html#accumulators) and [Broadcast variables](programming-guide.html#broadcast-variables) cannot be recovered from checkpoint in Spark Streaming. If you enable checkpointing and use [Accumulators](programming-guide.html#accumulators) or [Broadcast variables](programming-guide.html#broadcast-variables) as well, you'll have to create lazily instantiated singleton instances for [Accumulators](programming-guide.html#accumulators) and [Broadcast variables](programming-guide.html#broadcast-variables) so that they can be re-instantiated after the driver restarts on failure. This is shown in the following example.
+[Accumulators](rdd-programming-guide.html#accumulators) and [Broadcast variables](rdd-programming-guide.html#broadcast-variables)
+cannot be recovered from checkpoint in Spark Streaming. If you enable checkpointing and use
+[Accumulators](rdd-programming-guide.html#accumulators) or [Broadcast variables](rdd-programming-guide.html#broadcast-variables)
+as well, you'll have to create lazily instantiated singleton instances for
+[Accumulators](rdd-programming-guide.html#accumulators) and [Broadcast variables](rdd-programming-guide.html#broadcast-variables)
+so that they can be re-instantiated after the driver restarts on failure.
+This is shown in the following example.
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
```
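The "following example" the re-wrapped paragraph refers to is the guide's lazily instantiated singleton pattern. For context, it looks roughly like this sketch (the object name and broadcast contents are illustrative): the broadcast variable lives outside the checkpointed graph, so it is rebuilt on first use after a driver restart.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Lazily instantiated singleton: the broadcast variable is not recovered from
// checkpoint, so it is re-created on first access after the driver restarts.
object WordFilter {
  @volatile private var instance: Broadcast[Seq[String]] = _

  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "b", "c"))
        }
      }
    }
    instance
  }
}
```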
```diff
@@ -2182,7 +2188,7 @@ overall processing throughput of the system, its use is still recommended to achieve
 consistent batch processing times. Make sure you set the CMS GC on both the driver (using `--driver-java-options` in `spark-submit`) and the executors (using [Spark configuration](configuration.html#runtime-environment) `spark.executor.extraJavaOptions`).
 
 * **Other tips**: To further reduce GC overheads, here are some more tips to try.
-  - Persist RDDs using the `OFF_HEAP` storage level. See more detail in the [Spark Programming Guide](programming-guide.html#rdd-persistence).
+  - Persist RDDs using the `OFF_HEAP` storage level. See more detail in the [Spark Programming Guide](rdd-programming-guide.html#rdd-persistence).
 
   - Use more executors with smaller heap sizes. This will reduce the GC pressure within each JVM heap.
```
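For the `OFF_HEAP` tip in this hunk, a minimal sketch; it assumes (as the code comments note) that off-heap storage requires `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size` to be configured, and the sizing is illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object OffHeapPersist {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("OffHeapPersist")
      // Assumption: OFF_HEAP storage needs off-heap memory enabled and sized.
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "256m")

    val sc = new SparkContext(conf)
    // Persisting off-heap keeps cached blocks out of the JVM heap, reducing GC pressure.
    val nums = sc.parallelize(1 to 1000000)
    nums.persist(StorageLevel.OFF_HEAP)
    println(nums.count())
    sc.stop()
  }
}
```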