@GuoPhilipse

Description of PR

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

monthonk and others added 30 commits June 8, 2022 19:05
#3877)


Adds a new option fs.s3a.create.storage.class which can
be used to set the storage class for files created in AWS S3.
Consult the documentation for details and instructions on how
to disable the relevant tests when testing against third-party
stores.

Contributed by Monthon Klongklaew
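
As a rough illustration (not taken from the patch itself), the option can be set on the configuration used to create the S3A filesystem; the bucket name, path and storage class value below are assumptions for the example:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StorageClassExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Storage class applied to objects created through this S3A filesystem instance.
    conf.set("fs.s3a.create.storage.class", "reduced_redundancy");
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
      fs.create(new Path("/data/example.txt")).close();
    }
  }
}
```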
…, updateApplicationTimeouts API's for Federation (#4396)
…-csi (#4417)


This is a follow-up to HADOOP-18275 and its upgrade of os-maven-plugin.version.
When that change is merged in, this MUST follow it.

Contributed by Steve Loughran
Co-authored-by: Ashutosh Gupta <[email protected]>
Reviewed-by: Tao Li <[email protected]>
Signed-off-by: Akira Ajisaka <[email protected]>
* jnihelper.c in HDFS native client uses
  dirent.h. This header file isn't available
  on Windows.
* This PR provides a cross platform
  compatible implementation for dirent
  under the XPlatform library.
* HDFS-16623. Avoid IllegalArgumentException in LifelineSender

Co-authored-by: zengqiang.xu <[email protected]>
…4366). Contributed by ZanderXu.

Reviewed-by: Mingxiang Li <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
* We use the TARGET_FILE CMake
  generator expression to get
  the location of the
  protoc-gen-hrpc CMake target.
* Maven runs the ember build script.
  The environment variable TMPDIR was
  set as per bash syntax.
* This failed on Windows since the
  Windows command prompt doesn't
  support bash syntax.
* We're now detecting the OS and
  setting a Maven property
  "emberBuildScript" in a cross
  platform compatible way.
…ros login user. (#4424). Contributed by Xiping Zhang.
…ist.txt (#4444)


Bump cos_api-bundle to 5.6.69

All copies of httpclient, including shaded ones in libraries used
by the s3a, gs and cos cloud connectors, turn out to load their
TLD list from the same resource mozilla/public-suffix-list.txt 

Updating the hadoop-cos dependency ensures that its version
of public-suffix-list.txt is up to date, and so the s3a connector
is able to talk to S3 resources when the cos-api-bundle JAR is where
the resource is loaded from.

Contributed by André Fonseca
Co-authored-by: slfan1989 <louj1988@@>
…ode. (#4367). Contributed by ZanderXu.

Reviewed-by: Mingxiang Li <[email protected]>
Reviewed-by: Ayush Saxena <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
Speed up the magic committer, with the key changes being:

* Writes under __magic always retain directory markers.

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
  created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning.
Use of createFile and openFile with parameters which allow
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename. 

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.


Contributed by Steve Loughran.
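
A hedged sketch of how application code might pass the createFile() options named above through the builder API; the path and the header name used here are invented for illustration only:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateFileOptionsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("s3a://example-bucket/output/part-00000");
    FileSystem fs = path.getFileSystem(conf);
    try (FSDataOutputStream out = fs.createFile(path)
        // Skip the overwrite/existence safety checks; use with care.
        .opt("fs.s3a.create.performance", true)
        // Custom object header added through the fs.s3a.create.header. prefix;
        // the header name here is only an example.
        .opt("fs.s3a.create.header.content-encoding", "gzip")
        .build()) {
      out.write(new byte[] {1, 2, 3});
    }
  }
}
```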
…DFS-16563. (#4408)


Regression caused by HDFS-16563; the hdfs exception text was changed, but because it was
a YARN test doing the check, Yetus didn't notice.

Contributed by zhengchenyu
…mber of usable replicas (#4410)

Co-authored-by: Kevin Wikant <[email protected]>
Signed-off-by: Akira Ajisaka <[email protected]>
Co-authored-by: Ashutosh Gupta <[email protected]>

Reviewed by Akira Ajisaka.
Samrat002 and others added 29 commits June 22, 2022 10:17
…-16202 (#4472)


Fixing a mockito-based test which broke when HADOOP-16202
changed the methods being invoked.

Contributed by Steve Loughran
…d even if multiple log aggregation file controllers are configured. Contributed by Szilard Nemeth.
Part of HADOOP-18103.
Add support for a multi-range vectored read API in PositionedReadable.
The default implementation iterates through the ranges, reading each synchronously,
but the intent is that FSDataInputStream subclasses can provide more
efficient readers, especially in object store implementations.

Also adds an implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranges are
fetched from S3 asynchronously.

Contributed By: Owen O'Malley and Mukund Thakur
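
As a rough usage sketch (the file path, offsets and lengths are illustrative assumptions), the new API can be exercised along these lines:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VectoredReadExample {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://example-bucket/data/file.parquet");
    FileSystem fs = path.getFileSystem(new Configuration());
    List<FileRange> ranges = Arrays.asList(
        FileRange.createFileRange(0, 1024),                 // first kilobyte
        FileRange.createFileRange(4 * 1024 * 1024, 8192));  // a later block
    try (FSDataInputStream in = fs.open(path)) {
      // Ranges are fetched asynchronously; S3A may merge nearby ranges.
      in.readVectored(ranges, ByteBuffer::allocate);
      for (FileRange range : ranges) {
        ByteBuffer data = range.getData().get();  // block until that range completes
        System.out.println("read " + data.remaining() + " bytes at " + range.getOffset());
      }
    }
  }
}
```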
… maxReadSizeForVectorReads (#3964)

Part of HADOOP-18103.
Introduces fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure the minimum seek and maximum read size during a vectored IO operation in the S3A connector.
These properties define how the ranges will be merged. To completely
disable merging, set fs.s3a.max.readsize.vectored.read to 0.

Contributed By: Mukund Thakur
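
A hedged example of tuning these properties on the job configuration; the byte values below are arbitrary, not recommended defaults:

```java
import org.apache.hadoop.conf.Configuration;

public class VectoredReadTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Ranges closer together than this may be merged into a single request.
    conf.set("fs.s3a.vectored.read.min.seek.size", "4096");
    // Upper bound on the size of a merged range.
    conf.set("fs.s3a.vectored.read.max.merged.size", "1048576");
  }
}
```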
Part of HADOOP-18103.
Required for the vectored IO feature. None of the current buffer pool
implementations is complete: ElasticByteBufferPool doesn't use
weak references and could lead to memory leaks, while
DirectBufferPool doesn't support caller preferences for direct
or heap buffers and only offers fixed-length buffers.

Contributed By: Mukund Thakur
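
A simplified sketch of the idea, not the class added by the patch: pooled buffers are held via weak references so the garbage collector can reclaim them under memory pressure, and callers can ask for either direct or heap buffers.

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

public class WeakBufferPoolSketch {
  // One tree per buffer kind, keyed by capacity, holding weak references.
  // (Only one buffer per capacity is kept, to keep the sketch small.)
  private final TreeMap<Integer, WeakReference<ByteBuffer>> direct = new TreeMap<>();
  private final TreeMap<Integer, WeakReference<ByteBuffer>> heap = new TreeMap<>();

  public synchronized ByteBuffer getBuffer(boolean isDirect, int length) {
    TreeMap<Integer, WeakReference<ByteBuffer>> pool = isDirect ? direct : heap;
    // Find the smallest pooled buffer at least as large as requested.
    Map.Entry<Integer, WeakReference<ByteBuffer>> entry = pool.ceilingEntry(length);
    if (entry != null) {
      pool.remove(entry.getKey());
      ByteBuffer cached = entry.getValue().get();
      if (cached != null) {        // may have been collected already
        cached.clear();
        return cached;
      }
    }
    return isDirect ? ByteBuffer.allocateDirect(length) : ByteBuffer.allocate(length);
  }

  public synchronized void putBuffer(ByteBuffer buffer) {
    TreeMap<Integer, WeakReference<ByteBuffer>> pool = buffer.isDirect() ? direct : heap;
    pool.put(buffer.capacity(), new WeakReference<>(buffer));
  }
}
```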
Part of HADOOP-18103.
Handles memory fragmentation in the S3A vectored IO implementation by
allocating buffers of the smaller, user-requested range sizes, directly
filling them from the remote S3 stream, and skipping undesired
data in between ranges.
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.

Contributed By: Mukund Thakur
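
Roughly, for one merged request the reader allocates a buffer per original child range and discards the gap bytes in between. A simplified sketch with invented helper types, not the S3A code itself:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class RangeDrainSketch {
  /** A child range inside a merged request; the offset is absolute. */
  static final class ChildRange {
    final long offset;
    final int length;
    ChildRange(long offset, int length) { this.offset = offset; this.length = length; }
  }

  /**
   * Read the child ranges of one merged request from a stream positioned
   * at mergedStart, allocating only child-sized buffers and skipping the
   * unwanted bytes between ranges.
   */
  static void readChildRanges(InputStream stream, long mergedStart,
                              List<ChildRange> children) throws IOException {
    DataInputStream in = new DataInputStream(stream);
    long position = mergedStart;
    for (ChildRange child : children) {
      long toSkip = child.offset - position;          // gap before this range
      while (toSkip > 0) {
        long skipped = in.skip(toSkip);
        if (skipped <= 0) {
          throw new EOFException("Unable to skip to offset " + child.offset);
        }
        toSkip -= skipped;
      }
      byte[] data = new byte[child.length];           // exactly the requested size
      in.readFully(data);
      position = child.offset + child.length;
      // hand `data` back to the caller's future here (omitted in this sketch)
    }
  }
}
```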
This feature adds methods for ranged vectored read operations
in PositionedReadable.

All streams which implement that interface support the new API.

The default implementation reads each range in the vector
sequentially.

However, specific implementations may provide higher-performance
versions. This is done in two places:

* Local FileSystem/Checksum FileSystem
* The S3A client.

The S3A client first coalesces adjacent and "nearby" ranges
together, then fetches each range in separate HTTP GET requests,
executed in parallel. As such it delivers significant speedups
to applications reading separate blocks of data from the same
file, columnar data format libraries in particular.

This is the merge commit of the feature branch; the work is in

HADOOP-11867. Add a high-performance vectored read API.
HADOOP-18104. S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads.
HADOOP-18107. Adding scale test for vectored reads for large file
HADOOP-18105. Implement buffer pooling with weak references.
HADOOP-18106. Handle memory fragmentation in S3A Vectored IO.

Contributed By: Owen O'Malley and Mukund Thakur
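
The coalescing step can be pictured as below; this is a simplified sketch rather than the S3A implementation, and minSeek/maxMerged stand in for the configuration options described earlier:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.fs.FileRange;

public class CoalesceSketch {
  /** Merge sorted ranges whose gap is below minSeek, capping the merged size. */
  static List<long[]> coalesce(List<? extends FileRange> ranges,
                               long minSeek, long maxMerged) {
    List<FileRange> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong(FileRange::getOffset));
    List<long[]> merged = new ArrayList<>();    // each entry: {start, end}
    long start = -1, end = -1;
    for (FileRange r : sorted) {
      long rStart = r.getOffset();
      long rEnd = rStart + r.getLength();
      if (start >= 0 && rStart - end <= minSeek && rEnd - start <= maxMerged) {
        end = Math.max(end, rEnd);              // close enough: extend current group
      } else {
        if (start >= 0) {
          merged.add(new long[] {start, end});  // flush the previous group
        }
        start = rStart;
        end = rEnd;
      }
    }
    if (start >= 0) {
      merged.add(new long[] {start, end});
    }
    return merged;
  }
}
```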
…. Contributed by fanshilun.

Signed-off-by: Ayush Saxena <[email protected]>
…gation (#4486)

* YARN-10320. Replace FSDataInputStream#read with readFully in Log Aggregation

Co-authored-by: Ashutosh Gupta <[email protected]>
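
The distinction motivating the change, in a hedged sketch (the method below is illustrative, not the log aggregation code): a single read() may return fewer bytes than requested, while readFully() keeps reading until the buffer is filled or throws EOFException.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class ReadFullyExample {
  static void copyBlock(FSDataInputStream in, long offset, byte[] buffer)
      throws IOException {
    // A plain in.read(buffer) may return fewer bytes than buffer.length,
    // silently truncating the data being copied. readFully() either fills
    // the whole buffer or throws EOFException.
    in.readFully(offset, buffer, 0, buffer.length);
  }
}
```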
…onStore#confirmMutation (#4487)

Co-authored-by: Ashutosh Gupta <[email protected]>
…n some cases (#4452)

* HDFS-16633. Reserved Space For Replicas is not released on some cases

Co-authored-by: Ashutosh Gupta <[email protected]>
Update the dependencies of the LDAP libraries used for testing:

ldap-api.version = 2.0.0
apacheds.version = 2.0.0.AM26

Contributed by Colm O hEigeartaigh.
…omplete state (#4331)


ABFS rename fails intermittently when the Storage-blob tracking
metadata is in an incomplete state. This surfaces as the error code
404 and an error message of "RenameDestinationParentPathNotFound".

To mitigate this issue, when a request fails with this response,
the ABFS client issues a HEAD call on the source file
and then retries the rename operation.

ABFS filesystem statistics track when this occurs with new counters:
  rename_recovery
  metadata_incomplete_rename_failures
  rename_path_attempts

This is a very rare occurrence and appears to be triggered under certain
heavy load conditions, just as with HADOOP-18163.

Contributed by Mehakmeet Singh.
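
A hedged sketch of the recovery flow; the exception type caught and the probe shown are assumptions for illustration, not the ABFS driver's actual API:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameRecoverySketch {
  /**
   * Rename with one recovery attempt when the store reports a 404 such as
   * RenameDestinationParentPathNotFound while the source still exists.
   */
  static boolean renameWithRecovery(FileSystem fs, Path src, Path dst)
      throws IOException {
    try {
      return fs.rename(src, dst);
    } catch (FileNotFoundException e) {
      // HEAD-equivalent probe of the source: if it is still there, the 404
      // was likely caused by incomplete tracking metadata, so retry once.
      if (fs.exists(src)) {
        return fs.rename(src, dst);
      }
      throw e;
    }
  }
}
```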
…user not present on client (#4474). Contributed by swamirishi.
…etionService.stopRMClient. Contributed by Szilard Nemeth.
…Tamas Domok.

Change-Id: I55ddb46fd0e4cdb644747d6d43083215f10861b5
…emoving queue which is referred in queue mapping (#4515)

* YARN-10287. Update scheduler-conf corrupts the CS configuration when removing queue which is referred in queue mapping

Co-authored-by: Ashutosh Gupta <[email protected]>
…ation table instead of entity table (#4516)

Co-authored-by: Ashutosh Gupta <[email protected]>
GuoPhilipse merged commit b5b8482 into GuoPhilipse:trunk on Jul 3, 2022