HBASE-26353 Support loadable dictionaries in hbase-compression-zstd #3748
Conversation
Haven't read the code yet, but is it possible to copy the dict into the HBase storage so it is controlled by us?
    return size > 0 ? size : 256 * 1024; // Don't change this default
  }

  static LoadingCache<Configuration,byte[]> CACHE = CacheBuilder.newBuilder()
Using Configuration as the key makes me a bit nervous. After checking the code, there are no hashCode and equals methods in Configuration, so it will perform like an IdentityHashMap...
So is it possible to use the file name as the map key here? I suppose different tables could use the same dict.
This is definitely a concern.
In the latest version of the patch I override hashCode in CompoundConfiguration so we are doing something better than object identity when caching the dictionaries for the store writer case. It is kind of expensive to compute the hashCode given how CompoundConfiguration works but at least we do not do it that often, and not in performance critical code. Once a compressor or decompressor is created it is reused for the lifetime of the reader or writer. Otherwise we are using object identity. That is not the worst thing, at least. The cache is capped at 100 and will also expire entries if they are not used for one hour.
Let me try your suggestion. I was thinking we could avoid doing two lookups into the Configuration -- to get the boolean, and then the path, for the key -- but that hashCode calculation is pretty expensive. Getting the path from the configuration object and using that as the key would be less expensive.
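For illustration, a minimal sketch of the path-keyed cache being suggested here. The property name (hbase.io.compress.zstd.dictionary), the class names, and the readDictionary helper (sketched later in this thread next to the size-limit discussion) are assumptions for the example, not necessarily what the patch merged; HBase itself would use its shaded thirdparty Guava rather than plain Guava.

  import java.io.IOException;
  import java.util.concurrent.ExecutionException;
  import java.util.concurrent.TimeUnit;
  import com.google.common.cache.CacheBuilder;
  import com.google.common.cache.CacheLoader;
  import com.google.common.cache.LoadingCache;
  import org.apache.hadoop.conf.Configuration;

  public final class DictionaryCacheSketch {
    // Hypothetical property naming the dictionary file for a column family.
    static final String ZSTD_DICTIONARY_KEY = "hbase.io.compress.zstd.dictionary";

    // Keyed by the path string, so identical dictionaries shared by different tables
    // (and different Configuration instances) resolve to a single cached entry.
    static final LoadingCache<String, byte[]> DICTIONARY_CACHE = CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterAccess(1, TimeUnit.HOURS)
        .build(new CacheLoader<String, byte[]>() {
          @Override
          public byte[] load(String path) throws Exception {
            // readDictionary is sketched in the size-limit example further down.
            // A real implementation would thread the relevant Configuration through.
            return DictionaryLoader.readDictionary(new Configuration(), path);
          }
        });

    static byte[] getDictionary(final Configuration conf) throws IOException {
      final String path = conf.get(ZSTD_DICTIONARY_KEY);
      if (path == null) {
        return null; // no dictionary configured for this family
      }
      try {
        return DICTIONARY_CACHE.get(path);
      } catch (ExecutionException e) {
        throw new IOException("Unable to load dictionary " + path, e);
      }
    }
  }

Keying by path also sidesteps the relatively expensive CompoundConfiguration hashCode computation mentioned above.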
      final Path p = new Path(s);
      final ByteArrayOutputStream baos = new ByteArrayOutputStream();
      final byte[] buffer = new byte[8192];
      try (final FSDataInputStream in = FileSystem.get(p.toUri(), conf).open(p)) {
Do we need to limit the max dict size here? If a user creates a table with a very large dict file, it could bring down the whole cluster if we do not truncate it here.
Yes. If there is a size limit and it is exceeded, the codec load should probably be rejected by throwing a RuntimeException.
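A minimal sketch of such a check, assuming a hypothetical MAX_DICTIONARY_SIZE cap and helper class name; the real limit, and whether it is configurable, would be up to the patch:

  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public final class DictionaryLoader {
    // Hypothetical cap on dictionary size; the real value is up to the patch.
    static final int MAX_DICTIONARY_SIZE = 10 * 1024 * 1024; // 10 MB

    static byte[] readDictionary(final Configuration conf, final String s) throws IOException {
      final Path p = new Path(s);
      final ByteArrayOutputStream baos = new ByteArrayOutputStream();
      final byte[] buffer = new byte[8192];
      try (FSDataInputStream in = FileSystem.get(p.toUri(), conf).open(p)) {
        int n;
        while ((n = in.read(buffer)) > 0) {
          baos.write(buffer, 0, n);
          if (baos.size() > MAX_DICTIONARY_SIZE) {
            // Reject oversized dictionaries rather than buffering them into heap.
            throw new IllegalArgumentException("Dictionary " + s + " is larger than "
                + MAX_DICTIONARY_SIZE + " bytes");
          }
        }
      }
      return baos.toByteArray();
    }
  }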
Let me re-run the performance evaluation from HBASE-26259, but with synthetic small-value data, and compare speed and efficiency with a precomputed dictionary vs. without. Gains are expected, but I'd like to present some hard comparison data here.
@Apache9 I was thinking about writing the dictionary used to compress values in an HFile or WAL into the HFile or WAL itself, in the metadata section, but there would need to be format extensions to the WAL (perhaps just an extra field in the header and/or trailer PB). Hopefully there can be some re-use of meta blocks for HFiles. But this raises questions. There should be some way for a codec to read and write metadata into the container of the thing it is processing, but we don't have API support for that. I would consider it future work, but definitely of interest. The interest is in ensuring that HFiles have all of the information they need to read themselves added at write time.

Otherwise I think the current scheme is ok. The operator is already in charge of their table schema and compression codec dependencies (like deployment of native link libraries). This is an incremental responsibility... if you put a compression dictionary attribute into your schema, don't lose the dictionary.

Mostly it is already true that HFiles carry, within their trailer or meta blocks, all of the information a reader requires to process them. I can think of one exception, that being encryption, where the data encryption key (DEK) is stored in the HFile, but the master encryption key (MEK) used to encrypt the DEK is by design kept in a trust store or HSM, and if the MEK is lost the data is not decryptable. There are some parallels between external MEK data and external compression dictionary data. One could claim the same general rules for managing them apply. The difference is that the dictionary is not sensitive and can be copied into the file, whereas the master encryption key must be carefully guarded and not written colocated with the data.
Apache9 left a comment:
Overall LGTM.
Just some simple nits, and please fix the checkstyle and javac issues if possible.
      if (DICTIONARY_CACHE == null) {
        synchronized (ZstdCodec.class) {
          if (DICTIONARY_CACHE == null) {
            DICTIONARY_CACHE = CacheBuilder.newBuilder()
nits: better to abstract the creation code into a separate method? It could make the code easier to read.
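For illustration, a sketch of the suggested refactor as it might look inside ZstdCodec, reusing the imports from the cache sketch above; getDictionaryCache and createDictionaryCache are assumed names, not necessarily the patch's:

  // Lazily create the dictionary cache; volatile so double-checked locking is safe.
  private static volatile LoadingCache<String, byte[]> DICTIONARY_CACHE;

  static LoadingCache<String, byte[]> getDictionaryCache() {
    if (DICTIONARY_CACHE == null) {
      synchronized (ZstdCodec.class) {
        if (DICTIONARY_CACHE == null) {
          DICTIONARY_CACHE = createDictionaryCache();
        }
      }
    }
    return DICTIONARY_CACHE;
  }

  // Construction extracted into a helper so the locking code above stays readable.
  private static LoadingCache<String, byte[]> createDictionaryCache() {
    return CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterAccess(1, TimeUnit.HOURS)
        .build(new CacheLoader<String, byte[]>() {
          @Override
          public byte[] load(String path) throws Exception {
            return DictionaryLoader.readDictionary(new Configuration(), path);
          }
        });
  }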
        n = in.read(buffer);
        if (n > 0) {
          baos.write(buffer, 0, n);
        }
nits: indent
Here is the performance test result. I wrote an integration test that simulates a location data tracking use case. It writes 10 million rows; each row has a 64-bit random row key (not important) and one column family with four qualifiers, one each for: first name, last name, latitude (encoded as an integer with a scale of 3), and longitude (also encoded as an integer with a scale of 3). Details aren't really important except to say the character strings are short, corresponding with typical lengths for English first and last names, and there are two 32-bit integer values. The 32-bit integer values are generated with a zipfian distribution to reduce entropy and allow for potentially successful dictionary compression, but they are also short. When creating the table the IT specified a block size of 1K -- perhaps not unreasonable for a heavily indexed use case with short values. I could have achieved a higher compression ratio if the row keys were sequential instead of completely random, but this is not really important.

I also wrote a simple utility that iterates over an HFile and saves each DATA or ENCODED_DATA block as a separate file somewhere else, just the block data. These files were used as the training set for the dictionary trainer.

The results demonstrate compression speed improvements as expected (a 22-33% improvement), as described by the ZStandard documentation. They also demonstrate efficiency gains (a modest 6-8%), especially in combination with higher levels, where even modest gains are meaningful at scale. Specifying higher levels is more affordable because of the relative speedups at each level. There is a demonstration of meaningful gains in just this simple case, with potential for more benefits when applied by someone with expert knowledge. It seems reasonable to support this feature.

No Dictionary
With Dictionary
Let me clean up checkstyle, address other review feedback, and merge this after merging the prerequisite PR for HBASE-26316 first.
Just to double check, I re-ran the test described earlier, except when generating the test data it only emitted the integer values (the zipfian latitude and longitude columns).
When training the dictionary I gave the trainer the parameters k=32 (bit width to enter into the dictionary) and d=8 (stride for walking over content, in bits). This is a good approximation of designing these parameters with intent in a real use case. The result demonstrates significant speedups in compression, as advertised, and allows for achieving better overall compression by enabling higher compression levels within a time budget equivalent to the no-dictionary case.

Integers Only, No Dictionary
Integers Only, With Dictionary (k=32,d=8)
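For reference, a training run with explicit cover parameters looks roughly like this with the zstd command line tool; the input and output file names are placeholders, and the exact --train-cover option syntax should be checked against the zstd manual for your version:

  zstd --train-cover=k=32,d=8 extracted-blocks/* -o integers-only.dict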
This will cause a small merge conflict between apache#3730 and apache#3748 because we need CanReinit here too.
Rebase
ZStandard supports initialization of compressors and decompressors with a precomputed dictionary, which can dramatically improve and speed up compression of tables with small values. For more details, please see The Case For Small Data Compression https://github.com/facebook/zstd#the-case-for-small-data-compression
Rebase to resolve expected conflicts after #3730
@Apache9 I merged based on your prior approval. If you disagree with this action, please let me know and I will revert/restart this PR.
HBASE-26353 Support loadable dictionaries in hbase-compression-zstd (#3748)

ZStandard supports initialization of compressors and decompressors with a precomputed dictionary, which can dramatically improve and speed up compression of tables with small values. For more details, please see The Case For Small Data Compression: https://github.com/facebook/zstd#the-case-for-small-data-compression

Signed-off-by: Duo Zhang <[email protected]>
[ Requires #3730 ]
ZStandard supports initialization of compressors and decompressors with a precomputed dictionary, which can dramatically improve and speed up compression of tables with small values. For more details, please see The Case For Small Data Compression
Example:
Training:
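A sketch of the training step using the zstd CLI; the sample files (for example, data blocks extracted from representative HFiles) and the output dictionary name are placeholders:

  zstd --train sample-blocks/* -o mytable.dict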
Deploy the dictionary file to HDFS, or S3, etc.
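For example, to copy the trained dictionary into HDFS (paths are placeholders):

  hdfs dfs -mkdir -p /compression/dictionaries
  hdfs dfs -put mytable.dict /compression/dictionaries/mytable.dict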
Create the table:
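A hedged HBase shell sketch; the per-family configuration key shown (hbase.io.compress.zstd.dictionary) is assumed from the codec's property naming and should be verified against the merged code:

  create 'mytable', { NAME => 'f', COMPRESSION => 'ZSTD',
    CONFIGURATION => { 'hbase.io.compress.zstd.dictionary' =>
      'hdfs://namenode:8020/compression/dictionaries/mytable.dict' } }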
Now start storing data. Compression results even for small values will be excellent.
Note: Beware, if the dictionary is lost, the data will not be decompressible.