AWS: Add Retries to Analytics Stream #13739
Closed
Context
We have integrated analytics-accelerator-s3 into Iceberg in PR #12299. Currently, Iceberg customers need to enable the s3.analytics-accelerator.enabled flag in S3FileIOProperties to use the library. There is a proposal to make Analytics stream the default input stream: https://docs.google.com/document/d/13shy0RWotwfWC_qQksb95PXdi-vSUCKQyDzjoExQEN0
Description of Change
Starting with version 1.2.2, AAL lets consumers pass in a retry strategy that is executed on the input stream from S3 to AAL. (awslabs/analytics-accelerator-s3#340)
With this change we use that retry strategy so that the Analytics Stream has retry parity with S3SeekableInputStream in Iceberg.
Unlike the current retries, where the stream has to be re-opened from the last-read position, these retries need no handling at the stream level, because AAL ensures that reads are idempotent.
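To illustrate why idempotent reads remove the need for stream-level bookkeeping, here is a minimal, self-contained sketch. It is not the AAL or Iceberg API: `RangeReader`, `isRetriable`, `readWithRetries`, and `flaky` are hypothetical names, and treating `SocketTimeoutException` as the only retriable error is an assumption made purely for the example.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotentReadRetrySketch {

  /** Hypothetical positional read: the same (position, length) always yields the same bytes. */
  interface RangeReader {
    byte[] read(long position, int length) throws IOException;
  }

  /** Assumption for this sketch only: timeouts are retriable, everything else is not. */
  static boolean isRetriable(IOException e) {
    return e instanceof SocketTimeoutException;
  }

  /**
   * Retries an idempotent read. No position tracking or re-open is needed,
   * because every attempt repeats the same (position, length) request.
   */
  static byte[] readWithRetries(RangeReader reader, long position, int length, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return reader.read(position, length);
      } catch (IOException e) {
        if (!isRetriable(e)) {
          throw e; // non-retriable errors surface immediately
        }
        last = e;
      }
    }
    throw last; // all attempts failed with retriable errors
  }

  /** Test helper: fails the first {@code failures} calls with {@code error}, then serves data. */
  static RangeReader flaky(byte[] data, int failures, IOException error, AtomicInteger calls) {
    return (position, length) -> {
      if (calls.incrementAndGet() <= failures) {
        throw error;
      }
      return Arrays.copyOfRange(data, (int) position, (int) position + length);
    };
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
    AtomicInteger calls = new AtomicInteger();
    RangeReader reader = flaky(data, 2, new SocketTimeoutException("simulated timeout"), calls);
    byte[] out = readWithRetries(reader, 3, 4, 5);
    System.out.println(new String(out, StandardCharsets.US_ASCII) + " after " + calls.get() + " calls");
  }
}
```

In this sketch the caller never re-opens anything or tracks how many bytes were consumed before the failure, which is the contrast with a sequential stream that must resume from the last-read position.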
Testing
Extended the FlakyInputStream tests to cover AAL and confirmed that all tests pass when exceptions are injected and that non-retriable exceptions are not retried.