-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-19795: ABFS. GetPathStatus Optimization on OpenFileForRead #8212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
Test Results============================================================
|
|
💔 -1 overall
This message was automatically generated. |
| getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT | ||
| || ((VersionedFileStatus) fileStatus).getEncryptionContext() | ||
| != null)) { | ||
| getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional space changes can be reverted
| contextEncryptionAdapter = new ContextProviderEncryptionAdapter( | ||
| getClient().getEncryptionContextProvider(), getRelativePath(path), | ||
| fileEncryptionContext.getBytes(StandardCharsets.UTF_8)); | ||
| getClient().getEncryptionContextProvider(), getRelativePath(path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
| encryptionContext.getBytes(StandardCharsets.UTF_8)); | ||
| } | ||
| } else { | ||
| if (parseIsDirectory(resourceType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be moved to common part as is getting checked in both the cases
| * - restrictGpsOnOpenFile config is enabled with null FileStatus and encryptionType not as ENCRYPTION_CONTEXT | ||
| * In this case, we don't need to call GetPathStatus API. | ||
| */ | ||
| else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this lead to going ahead and opening the stream without checks? Do we fail later for this case ?
| INVALID_APPEND_OPERATION("InvalidAppendOperation", HttpURLConnection.HTTP_CONFLICT, null), | ||
| UNAUTHORIZED_BLOB_OVERWRITE("UnauthorizedBlobOverwrite", HttpURLConnection.HTTP_FORBIDDEN, | ||
| "This request is not authorized to perform blob overwrites."), | ||
| INVALID_RANGE("InvalidRange", 416, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
416 should come from a constant defined in HttpURLConnection class
| // Reset Read Type back to normal and set again based on code flow. | ||
| getTracingContext().setReadType(ReadType.NORMAL_READ); | ||
| if (shouldAlwaysReadBufferSize()) { | ||
| if(shouldRestrictGpsOnOpenFile() && isFirstRead()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after if
| // Reset Read Type back to normal and set again based on code flow. | ||
| getTracingContext().setReadType(ReadType.NORMAL_READ); | ||
| if (shouldAlwaysReadBufferSize()) { | ||
| if(shouldRestrictGpsOnOpenFile() && isFirstRead()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment for this condition as well
| private final int footerReadSize; // default buffer size to read when reading footer | ||
| private final int readAheadQueueDepth; // initialized in constructor | ||
| private final String eTag; // eTag of the path when InputStream are created | ||
| private String eTag; // eTag of the path when InputStream are created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: InputStream is created
| } | ||
| } | ||
|
|
||
| String getRelativePath(final Path path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc missing
| tracingContext, | ||
| contextEncryptionAdapter).getResult(); | ||
|
|
||
| String resourceType = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use client.checkIsDir method which handles null case as well
| } | ||
| contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE). | ||
| split(AbfsHttpConstants.FORWARD_SLASH)[1]); | ||
| eTag = op.getResult().getResponseHeader("ETag"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use extractEtagHeader method
| if (Objects.equals(resourceType, DIRECTORY)) { | ||
| throw directoryReadException(); | ||
| } | ||
| contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same use extractContentLen method
| if (ere.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) { | ||
| throw new FileNotFoundException(ere.getMessage()); | ||
| int status = ere.getStatusCode(); | ||
| if(ere.getErrorMessage().contains(readOnDirectoryErrorMsg)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after if
|
|
||
| } catch (AzureBlobFileSystemException gpsEx) { | ||
| AbfsRestOperationException gpsEre = (AbfsRestOperationException) gpsEx; | ||
| if(gpsEre.getErrorMessage().contains(readOnDirectoryErrorMsg)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after if
| } | ||
|
|
||
| // Default: propagate original error | ||
| throw new IOException(ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be done once only on line 715
| This is required since contentLength is not available yet to determine prefetch block size. | ||
| */ | ||
| bytesRead = readInternal(getFCursor(), getBuffer(), 0, getBufferSize(), false); | ||
| if(shouldRestrictGpsOnOpenFile() && isFirstRead()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after if
|
Description of PR
JIRA: https://issues.apache.org/jira/browse/HADOOP-19795
We do a getPathStatus call during file open for read. This call is primarily used to fetch the file’s metadata properties before the actual read begins.
We are now introducing an optional, config-driven read flow that avoids the getPathStatus call during open and instead derives required metadata from the read response itself.
How was this patch tested?
New tests were added and test suite was run. Adding the results in comments below