[SPARK-16963] [STREAMING] [SQL] Changes to Source trait and related implementation classes #14553
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -25,21 +25,42 @@ import org.apache.spark.sql.types.StructType |
| | | * monotonically increasing notion of progress that can be represented as an [[Offset]]. Spark |
| | | * will regularly query each [[Source]] to see if any more data is available. |
| | | */ |
| | | - trait Source { |
| | | + trait Source { |
| | | |
| | | /** Returns the schema of the data from this source */ |
| | | def schema: StructType |
| | | |
| | | - /** Returns the maximum available offset for this source. */ |
| | | - def getOffset: Option[Offset] |
| | | + /** |
| | | + * Returns the highest offset that this source has <b>removed</b> from its internal buffer |
| | | + * in response to a call to `commit`. |
| | | + * Returns `None` if this source has not removed any data. |
| | | + */ |
| | | + def getMinOffset: Option[Offset] |
| | | |
| | | + /** |
| | | + * Returns the maximum available offset for this source. |
| | | + * Returns `None` if this source has never received any data. |
| | | + */ |
| | | + def getMaxOffset: Option[Offset] |
| | | |
| | | /** |
| | | - * Returns the data that is between the offsets (`start`, `end`]. When `start` is `None` then |
| | | - * the batch should begin with the first available record. This method must always return the |
| | | - * same data for a particular `start` and `end` pair. |
| | | + * Returns the data that is between the offsets (`start`, `end`]. When `start` is `None`, |
| | | + * then the batch should begin with the first record. This method must always return the |
| | | + * same data for a particular `start` and `end` pair; even after the Source has been restarted |
| | | + * on a different node. |
| | | + * <p> |
| | | + * Higher layers will always call this method with a value of `start` greater than or equal |
| | | + * to the last value passed to `commit` and a value of `end` less than or equal to the |
| | | + * last value returned by `getMaxOffset` |
| | | */ |
| | | def getBatch(start: Option[Offset], end: Offset): DataFrame |
| | | |
| | | + /** |
| | | + * Informs the source that Spark has completed processing all data for offsets less than or |
| | | + * equal to `end` and will only request offsets greater than `end` in the future. |
| | | + */ |
| | | + def commit(end: Offset) |
| | | |
| | | /** Stop this source and free any resources it has allocated. */ |
| | | def stop(): Unit |
| | | } |
Extra blank line
Fixed in my local copy.
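
To make the contract in the diff concrete, here is a minimal sketch of how a buffering source might honor `getMinOffset`, `getMaxOffset`, `getBatch` and `commit`. It is not code from this PR. `LongOffset`, `SimpleSource`, `BufferedSource`, `add` and `SourceSketchDemo` are hypothetical names introduced for illustration, `Seq[String]` stands in for `DataFrame`, and `schema` is left out so that the sketch runs without Spark on the classpath.

```scala
// Hypothetical stand-in for Spark's Offset: a plain, monotonically increasing Long.
final case class LongOffset(value: Long)

// Simplified mirror of the proposed trait. Seq[String] replaces DataFrame and
// `schema` is omitted (both are assumptions made to keep the sketch standalone).
trait SimpleSource {
  def getMinOffset: Option[LongOffset]
  def getMaxOffset: Option[LongOffset]
  def getBatch(start: Option[LongOffset], end: LongOffset): Seq[String]
  def commit(end: LongOffset): Unit
  def stop(): Unit
}

// Hypothetical in-memory source: the i-th record added is stored at offset i.
final class BufferedSource extends SimpleSource {
  private var buffer = Vector.empty[(Long, String)]
  private var nextOffset = 0L
  private var lastRemoved: Option[Long] = None

  // Helper that is not part of the trait: append one record at the next offset.
  def add(record: String): Unit = synchronized {
    buffer = buffer :+ (nextOffset -> record)
    nextOffset += 1
  }

  // Highest offset dropped by `commit`, or None if nothing has been dropped yet.
  def getMinOffset: Option[LongOffset] = synchronized(lastRemoved.map(LongOffset(_)))

  // Highest offset received so far, or None if no record has ever arrived.
  def getMaxOffset: Option[LongOffset] = synchronized {
    if (nextOffset == 0L) None else Some(LongOffset(nextOffset - 1))
  }

  // Returns the records in (start, end]: `start` is exclusive, `end` is inclusive.
  def getBatch(start: Option[LongOffset], end: LongOffset): Seq[String] = synchronized {
    val from = start.map(_.value + 1).getOrElse(0L)
    require(lastRemoved.forall(_ < from), s"offsets up to $lastRemoved are already committed")
    require(end.value < nextOffset, s"$end is beyond the last received offset")
    buffer.collect { case (offset, record) if offset >= from && offset <= end.value => record }
  }

  // Drops every buffered record at or below `end` and remembers the highest one dropped.
  def commit(end: LongOffset): Unit = synchronized {
    val (dropped, kept) = buffer.partition(_._1 <= end.value)
    buffer = kept
    if (dropped.nonEmpty) lastRemoved = Some(dropped.last._1)
  }

  def stop(): Unit = synchronized { buffer = Vector.empty }
}

// Call order a higher layer might follow: getMaxOffset, then getBatch, then commit.
object SourceSketchDemo {
  def main(args: Array[String]): Unit = {
    val source = new BufferedSource
    Seq("a", "b", "c").foreach(source.add)

    val end1 = source.getMaxOffset.get
    println(source.getBatch(None, end1))       // first batch starts at the first record
    source.commit(end1)                        // records up to end1 may now be dropped

    source.add("d")
    val end2 = source.getMaxOffset.get
    println(source.getBatch(Some(end1), end2)) // next batch resumes just after end1
    source.stop()
  }
}
```

In this sketch, a `getBatch` call whose `start` is below the last committed offset fails with an `IllegalArgumentException`. The doc comment in the diff only says that higher layers will not make such a call, so rejecting it is one possible design choice rather than something the PR prescribes.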