This repository was archived by the owner on Nov 11, 2022. It is now read-only.
Version 2.0.0
The Dataflow SDK for Java 2.0.0 is the first stable 2.x release of the Dataflow SDK for Java, based on a subset of Apache Beam 2.0.0. See the Apache Beam 2.0.0 release notes for additional change information.
Note for users upgrading from version 1.x
This is a new major version, and therefore comes with the following caveats:
- Breaking Changes: The Dataflow SDK 2.x for Java has a number of breaking changes from the 1.x series of releases.
- Update Incompatibility: The Dataflow SDK 2.x for Java is update-incompatible with Dataflow 1.x. Streaming jobs using a Dataflow 1.x SDK cannot be updated to use a Dataflow 2.x SDK. Dataflow 2.x pipelines may only be updated across versions starting with SDK version 2.0.0.
Updates and improvements since 2.0.0-beta3
Version 2.0.0 is based on a subset of Apache Beam 2.0.0. The most relevant changes in this release for Cloud Dataflow customers include:
- Added new API in BigQueryIO for writing into multiple tables, possibly with different schemas, based on data. See BigQueryIO.Write.to(SerializableFunction) and BigQueryIO.Write.to(DynamicDestinations).
- Added new API for writing windowed and unbounded collections to TextIO and AvroIO. For example, see TextIO.Write.withWindowedWrites() and TextIO.Write.withFilenamePolicy(FilenamePolicy).
- Added TFRecordIO to read and write TensorFlow TFRecord files.
- Added the ability to automatically register CoderProviders in the default CoderRegistry. CoderProviders are registered by a ServiceLoader via concrete implementations of a CoderProviderRegistrar.
- Changed order of parameters for ParDo with side inputs and outputs.
- Changed order of parameters for MapElements and FlatMapElements transforms when specifying an output type.
- Changed the pattern for reading and writing custom types to PubsubIO and KafkaIO.
- Changed the syntax for reading from and writing to TextIO, AvroIO, TFRecordIO, KinesisIO, BigQueryIO.
- Changed syntax for configuring windowing parameters other than the WindowFn itself using the Window transform.
- Consolidated XmlSource and XmlSink into XmlIO.
- Renamed CountingInput to GenerateSequence and unified the syntax for producing bounded and unbounded sequences.
- Renamed BoundedSource#splitIntoBundles to #split.
- Renamed UnboundedSource#generateInitialSplits to #split.
- Output from @StartBundle is no longer possible. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type StartBundleContext to access PipelineOptions.
- Output from @FinishBundle now always requires an explicit timestamp and window. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type FinishBundleContext to access PipelineOptions and emit output to specific windows.
- XmlIO is no longer part of the SDK core. It must be added manually using the new xml-io package.
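
The first item in the list above, writing to multiple tables based on data, rests on computing a destination from each element. The following is a minimal plain-Java sketch of that idea only. It does not use the SDK, and the class name DynamicRoutingSketch, the method routeByDestination, and the events_ table naming scheme are all hypothetical names chosen for illustration, not part of BigQueryIO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustration only: none of these names belong to the Dataflow or Beam SDK.
public class DynamicRoutingSketch {

    // Groups each element under the destination name computed from the element itself.
    // A LinkedHashMap keeps destinations in the order they are first seen.
    public static <T> Map<String, List<T>> routeByDestination(
            List<T> elements, Function<T, String> destinationFn) {
        Map<String, List<T>> byDestination = new LinkedHashMap<>();
        for (T element : elements) {
            String destination = destinationFn.apply(element);
            byDestination.computeIfAbsent(destination, k -> new ArrayList<>()).add(element);
        }
        return byDestination;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("click:home", "view:cart", "click:cart");
        // Hypothetical naming scheme: one table per event type (the text before ':').
        Map<String, List<String>> routed =
                routeByDestination(events, e -> "events_" + e.substring(0, e.indexOf(':')));
        // Prints {events_click=[click:home, click:cart], events_view=[view:cart]}
        System.out.println(routed);
    }
}
```

In a real pipeline the destination function would be handed to the write transform rather than applied to an in-memory list. The sketch only shows why a per-element function is enough to fan records out to several tables.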
More information
Please see Cloud Dataflow documentation and release notes for version 2.0.