Commit f746951
Mentioned performance problem with WAL
1 parent 7787209 commit f746951

2 files changed: +18 −10 lines changed


docs/streaming-kafka-integration.md

Lines changed: 5 additions & 4 deletions
@@ -52,7 +52,8 @@ data loss on failures. This receiver is automatically used when the write ahead
 may reduce the receiving throughput of individual Kafka receivers compared to the unreliable
 receivers, but this can be corrected by running
 [more receivers in parallel](streaming-programming-guide.html#level-of-parallelism-in-data-receiving)
-to increase aggregate throughput. Also it is strongly recommended that the replication in the
-storage level be disabled when the write ahead log is enabled because the log is already stored
-in a replicated storage system. This is done using `KafkaUtils.createStream(...,
-StorageLevel.MEMORY_AND_DISK_SER)`.
+to increase aggregate throughput. Additionally, it is recommended that the replication of the
+received data within Spark be disabled when the write ahead log is enabled as the log is already stored
+in a replicated storage system. This can be done by setting the storage level for the input
+stream to `StorageLevel.MEMORY_AND_DISK_SER` (that is, use
+`KafkaUtils.createStream(..., StorageLevel.MEMORY_AND_DISK_SER)`).
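The change above can be sketched in Scala. This is a sketch only, assuming the `spark-streaming-kafka` artifact of this era is on the classpath; the ZooKeeper quorum, group id, topic map, app name, and batch interval below are all illustrative, not from the commit:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("WALKafkaExample")  // illustrative name
val ssc = new StreamingContext(conf, Seconds(10))         // illustrative interval

// With the write ahead log enabled, use a non-replicated storage level
// (MEMORY_AND_DISK_SER rather than a *_2 level): the WAL already lives in a
// replicated storage system, so replicating received data again within Spark
// is redundant.
val lines = KafkaUtils.createStream(
  ssc,
  "zk1:2181",                    // illustrative ZooKeeper quorum
  "my-consumer-group",           // illustrative consumer group id
  Map("events" -> 1),            // illustrative topic -> receiver-thread count
  StorageLevel.MEMORY_AND_DISK_SER)
```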

docs/streaming-programming-guide.md

Lines changed: 13 additions & 6 deletions
@@ -1568,13 +1568,20 @@ To run a Spark Streaming applications, you need to have the following.
 with Mesos.


-- *Configuring write ahead logs (Spark 1.2+)* - Starting for Spark 1.2, we have introduced a new
-feature of write ahead logs. If enabled, all the data received from a receiver gets written into
+- *[Experimental in Spark 1.2] Configuring write ahead logs* - In Spark 1.2,
+we have introduced a new experimental feature of write ahead logs for achieving strong
+fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into
 a write ahead log in the configuration checkpoint directory. This prevents data loss on driver
-recovery, thus allowing zero data loss guarantees which is discussed in detail in the
-[Fault-tolerance Semantics](#fault-tolerance-semantics) section. Enable this by setting the
-[configuration parameter](configuration.html#spark-streaming)
-`spark.streaming.receiver.writeAheadLogs.enable` to `true`.
+recovery, thus ensuring zero data loss (discussed in detail in the
+[Fault-tolerance Semantics](#fault-tolerance-semantics) section). This can be enabled by setting
+the [configuration parameter](configuration.html#spark-streaming)
+`spark.streaming.receiver.writeAheadLogs.enable` to `true`. However, this stronger semantics may
+come at the cost of the receiving throughput of individual receivers. This can be corrected by running
+[more receivers in parallel](#level-of-parallelism-in-data-receiving)
+to increase aggregate throughput. Additionally, it is recommended that the replication of the
+received data within Spark be disabled when the write ahead log is enabled as the log is already
+stored in a replicated storage system. This can be done by setting the storage level for the
+input stream to `StorageLevel.MEMORY_AND_DISK_SER`.

 ### Upgrading Application Code
 {:.no_toc}
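The configuration step this hunk documents can be sketched in Scala. A sketch only: the app name, batch interval, and checkpoint path are illustrative, and only the `spark.streaming.receiver.writeAheadLogs.enable` key comes from the commit:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("WALExample")  // illustrative name
  // Enable the experimental write ahead log (Spark 1.2): received data is
  // logged to the checkpoint directory before the receiver acknowledges it.
  .set("spark.streaming.receiver.writeAheadLogs.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))  // illustrative interval

// The checkpoint directory should sit on a fault-tolerant, replicated file
// system such as HDFS, since that is where the write ahead log is stored.
ssc.checkpoint("hdfs:///checkpoints/my-app")  // illustrative path
```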
