@apurtell
Contributor

@apurtell apurtell commented Oct 13, 2021

[ Requires #3730 ]

Zstandard supports initializing compressors and decompressors with a precomputed dictionary, which can dramatically improve both the compression ratio and the compression speed for tables with small values. For more details, please see The Case For Small Data Compression.

Example:

Training:

$ zstd --maxdict=1126400 --train-fastcover=shrink \
    -o mytable.dict training_files/*
Trying 82 different sets of parameters
...
k=674                                      
d=8
f=20
steps=40
split=75
accel=1
Save dictionary of size 1126400 into file mytable.dict

Deploy the dictionary file to HDFS, S3, or similar storage.

Create the table:

hbase> create "mytable", 
  ... ,
  CONFIGURATION => {
    'hbase.io.compress.zstd.level' => '6',
    'hbase.io.compress.zstd.dictionary' => true,
    'hbase.io.compress.zstd.dictionary.file' => 'hdfs://nn/zdicts/mytable.dict'
  }

Now start storing data. Compression results should be excellent, even for small values.

Note: Beware that if the dictionary is lost, the data will not be decompressible.
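To make the dictionary idea concrete without depending on a Zstandard binding, here is a minimal sketch that swaps in the JDK's DEFLATE preset-dictionary support (`java.util.zip.Deflater.setDictionary` and `Inflater.setDictionary`). It is not the ZStandard codec from this PR; it only illustrates the two properties described above: a dictionary primed with substrings common to small values can shrink their compressed size, and data compressed with a dictionary cannot be decompressed without it. The class name, sample dictionary, and sample value are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: DEFLATE preset dictionaries stand in for Zstandard dictionaries.
final class PresetDictionaryDemo {

  /** Compresses input; if dict is non-null, primes the compressor with it first. */
  static byte[] compress(byte[] input, byte[] dict) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      if (dict != null) {
        deflater.setDictionary(dict); // must happen before the first deflate() call
      }
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib resources
    }
  }

  /** Decompresses data; throws IllegalStateException if a dictionary is required but dict is null. */
  static byte[] decompress(byte[] compressed, byte[] dict, int originalLength)
      throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] out = new byte[originalLength];
      int total = 0;
      while (!inflater.finished()) {
        int n = inflater.inflate(out, total, out.length - total);
        if (n == 0) {
          if (!inflater.needsDictionary()) {
            break; // out of input or output space
          }
          if (dict == null) {
            throw new IllegalStateException("data was compressed with a dictionary that is not available");
          }
          inflater.setDictionary(dict); // the stream header tells us a dictionary is needed
        }
        total += n;
      }
      return Arrays.copyOf(out, total);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) throws DataFormatException {
    // Hypothetical dictionary: substrings that recur across many small values.
    byte[] dict = ("{\"user_id\":,\"status\":\"active\",\"region\":\"us-west-2\"}"
        + "{\"user_id\":,\"status\":\"inactive\",\"region\":\"eu-central-1\"}")
        .getBytes(StandardCharsets.UTF_8);
    byte[] value = "{\"user_id\":12345,\"status\":\"active\",\"region\":\"us-west-2\"}"
        .getBytes(StandardCharsets.UTF_8);

    byte[] plain = compress(value, null);
    byte[] primed = compress(value, dict);
    System.out.println("without dictionary: " + plain.length + " bytes");
    System.out.println("with dictionary:    " + primed.length + " bytes");
    System.out.println("round trip ok: " + Arrays.equals(value, decompress(primed, dict, value.length)));
    try {
      decompress(primed, null, value.length);
    } catch (IllegalStateException e) {
      System.out.println("lost dictionary: " + e.getMessage());
    }
  }
}
```

Zstandard's trained dictionaries are far more effective than a raw DEFLATE preset dictionary, but the failure mode is the same: without the exact dictionary bytes, the reader cannot reconstruct the data.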

@Apache9
Contributor

Apache9 commented Oct 13, 2021

Note: Beware that if the dictionary is lost, the data will not be decompressible.

Haven't read the code yet, but is it possible to copy the dict into the hbase storage so it is controlled by us?

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 3m 52s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 10s master passed
+1 💚 compile 0m 21s master passed
+1 💚 shadedjars 8m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 19s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 52s the patch passed
+1 💚 compile 0m 21s the patch passed
+1 💚 javac 0m 21s the patch passed
+1 💚 shadedjars 8m 21s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 16s the patch passed
_ Other Tests _
+1 💚 unit 0m 50s hbase-compression-zstd in the patch passed.
31m 49s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 208c330d222f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/testReport/
Max. process+thread count 286 (vs. ulimit of 30000)
modules C: hbase-compression/hbase-compression-zstd U: hbase-compression/hbase-compression-zstd
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

return size > 0 ? size : 256 * 1024; // Don't change this default
}

static LoadingCache<Configuration,byte[]> CACHE = CacheBuilder.newBuilder()
Contributor

Using Configuration as the key makes me a bit nervous. After checking the code, there are no hashCode and equals methods in Configuration, so the cache will behave like an IdentityHashMap...

So is it possible to use the file name as the map key here? I suppose different tables could use the same dict.

Contributor Author

@apurtell apurtell Oct 13, 2021

This is definitely a concern.

In the latest version of the patch I override hashCode in CompoundConfiguration, so we do something better than object identity when caching the dictionaries for the store writer case. Computing the hashCode is somewhat expensive given how CompoundConfiguration works, but at least we do not do it often, and not in performance-critical code. Once a compressor or decompressor is created, it is reused for the lifetime of the reader or writer. Otherwise we are using object identity. That is not the worst thing, at least. The cache is capped at 100 entries and will also expire entries that have not been used for one hour.

Let me try your suggestion. I was hoping to avoid doing two lookups into the Configuration (to get the boolean, and then the path, for the key), but that hashCode calculation is pretty expensive. Getting the path from the configuration object and using it as the key would be cheaper.
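A minimal sketch of the suggestion to key the cache by dictionary path instead of by `Configuration`, assuming a JDK-only LRU map in place of the Guava `LoadingCache` the patch uses. The 100-entry cap and one-hour expire-after-access window mirror the limits described above; `DictionaryCache`, the `loader` function, and the injectable clock are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: dictionary bytes cached by path string, LRU-capped, expire-after-access.
final class DictionaryCache {
  static final int MAX_ENTRIES = 100;
  static final long EXPIRE_AFTER_ACCESS_MS = 60L * 60 * 1000; // one hour

  private static final class CachedDictionary {
    final byte[] bytes;
    long lastAccessMs;

    CachedDictionary(byte[] bytes, long nowMs) {
      this.bytes = bytes;
      this.lastAccessMs = nowMs;
    }
  }

  private final Function<String, byte[]> loader;
  private final LongSupplier clock;

  // accessOrder=true iterates least-recently-used first, so the eldest entry is evicted.
  private final LinkedHashMap<String, CachedDictionary> map =
      new LinkedHashMap<String, CachedDictionary>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, CachedDictionary> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  DictionaryCache(Function<String, byte[]> loader, LongSupplier clock) {
    this.loader = loader;
    this.clock = clock;
  }

  /** Returns the dictionary for a path, loading it on a miss or when the entry has expired. */
  synchronized byte[] get(String path) {
    long now = clock.getAsLong();
    CachedDictionary cached = map.get(path);
    if (cached != null && now - cached.lastAccessMs < EXPIRE_AFTER_ACCESS_MS) {
      cached.lastAccessMs = now;
      return cached.bytes;
    }
    CachedDictionary fresh = new CachedDictionary(loader.apply(path), now);
    map.put(path, fresh);
    return fresh.bytes;
  }

  synchronized int size() {
    return map.size();
  }
}
```

In the codec, the loader would read the file at the path (for example through `FileSystem.get(p.toUri(), conf).open(p)`), and `System::currentTimeMillis` can serve as the clock. Keying by path also lets several tables that share a dictionary file share one cache entry.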

final Path p = new Path(s);
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
final byte[] buffer = new byte[8192];
try (final FSDataInputStream in = FileSystem.get(p.toUri(), conf).open(p)) {
Contributor

Do we need to limit the max dictionary size here? If a user creates a table with a very large dictionary file, it could bring down the whole cluster if we do not cap the size here.

Contributor Author

@apurtell apurtell Oct 13, 2021

Yes. If there is a size limit and it is exceeded, the codec load should probably be rejected by throwing a RuntimeException.
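A sketch of a bounded dictionary read along the lines agreed above, assuming a hypothetical `maxSize` limit (the thread does not settle on a value or a configuration key). It reuses the 8 KB buffer loop from the snippet under review and rejects oversized input with an `IllegalArgumentException`, which is a `RuntimeException`. It takes a plain `InputStream`, so in HBase the caller would pass the stream opened from the `FileSystem`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: read a dictionary, failing fast once it exceeds maxSize bytes.
final class DictionaryLoader {

  static byte[] readDictionary(InputStream in, int maxSize) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      total += n;
      if (total > maxSize) {
        // Stop reading immediately instead of buffering an arbitrarily large file.
        throw new IllegalArgumentException(
            "Dictionary exceeds the maximum allowed size of " + maxSize + " bytes");
      }
      baos.write(buffer, 0, n);
    }
    return baos.toByteArray();
  }
}
```

Throwing rather than truncating matters here: a truncated dictionary would silently differ from the one used at write time, so reads would fail later instead of at codec load.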

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 33s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 6m 21s master passed
+1 💚 compile 0m 22s master passed
+1 💚 shadedjars 9m 49s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 19s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 2s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 shadedjars 9m 6s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 16s the patch passed
_ Other Tests _
+1 💚 unit 0m 47s hbase-compression-zstd in the patch passed.
35m 25s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 62718614cfda 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/testReport/
Max. process+thread count 276 (vs. ulimit of 30000)
modules C: hbase-compression/hbase-compression-zstd U: hbase-compression/hbase-compression-zstd
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 9s master passed
+1 💚 compile 0m 30s master passed
+1 💚 checkstyle 0m 15s master passed
+1 💚 spotbugs 0m 33s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 52s the patch passed
+1 💚 compile 0m 26s the patch passed
-0 ⚠️ javac 0m 26s hbase-compression_hbase-compression-zstd generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-0 ⚠️ checkstyle 0m 12s hbase-compression/hbase-compression-zstd: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 19m 30s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 0m 41s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
39m 5s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 9548123745c5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/artifact/yetus-general-check/output/diff-compile-javac-hbase-compression_hbase-compression-zstd.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
Max. process+thread count 95 (vs. ulimit of 30000)
modules C: hbase-compression/hbase-compression-zstd U: hbase-compression/hbase-compression-zstd
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 44s master passed
+1 💚 compile 0m 45s master passed
+1 💚 shadedjars 8m 17s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 3m 51s the patch passed
+1 💚 compile 0m 45s the patch passed
+1 💚 javac 0m 45s the patch passed
+1 💚 shadedjars 8m 15s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s the patch passed
_ Other Tests _
-1 ❌ unit 1m 5s hbase-common in the patch failed.
+1 💚 unit 0m 48s hbase-compression-zstd in the patch passed.
31m 23s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 7a564a7fca1a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-common.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/testReport/
Max. process+thread count 341 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 5s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for branch
+1 💚 mvninstall 5m 9s master passed
+1 💚 compile 0m 46s master passed
+1 💚 shadedjars 9m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 5m 3s the patch passed
+1 💚 compile 0m 45s the patch passed
+1 💚 javac 0m 45s the patch passed
+1 💚 shadedjars 9m 10s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s the patch passed
_ Other Tests _
-1 ❌ unit 1m 36s hbase-common in the patch failed.
+1 💚 unit 0m 47s hbase-compression-zstd in the patch passed.
36m 49s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4c1685836c68 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-common.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/testReport/
Max. process+thread count 271 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 4m 20s master passed
+1 💚 compile 1m 25s master passed
+1 💚 checkstyle 0m 41s master passed
+1 💚 spotbugs 1m 21s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 4m 10s the patch passed
+1 💚 compile 1m 21s the patch passed
-0 ⚠️ javac 0m 27s hbase-compression_hbase-compression-zstd generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-0 ⚠️ checkstyle 0m 26s hbase-common: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
-0 ⚠️ checkstyle 0m 12s hbase-compression/hbase-compression-zstd: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 22m 0s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 42s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 27s The patch does not generate ASF License warnings.
48m 51s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 61c6a7457421 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-general-check/output/diff-compile-javac-hbase-compression_hbase-compression-zstd.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 3m 48s master passed
+1 💚 compile 0m 45s master passed
+1 💚 shadedjars 8m 14s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 3m 55s the patch passed
+1 💚 compile 0m 46s the patch passed
+1 💚 javac 0m 46s the patch passed
+1 💚 shadedjars 8m 16s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s the patch passed
_ Other Tests _
+1 💚 unit 1m 52s hbase-common in the patch passed.
+1 💚 unit 0m 48s hbase-compression-zstd in the patch passed.
32m 12s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 3c0e0fe83539 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/testReport/
Max. process+thread count 341 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 3s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 5m 4s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 9m 7s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 5m 5s the patch passed
+1 💚 compile 0m 47s the patch passed
+1 💚 javac 0m 47s the patch passed
+1 💚 shadedjars 9m 9s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s the patch passed
_ Other Tests _
+1 💚 unit 2m 41s hbase-common in the patch passed.
+1 💚 unit 0m 46s hbase-compression-zstd in the patch passed.
37m 44s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1fc6430da8a9 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/testReport/
Max. process+thread count 257 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for branch
+1 💚 mvninstall 4m 2s master passed
+1 💚 compile 1m 17s master passed
+1 💚 checkstyle 0m 38s master passed
+1 💚 spotbugs 1m 15s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 3m 50s the patch passed
+1 💚 compile 1m 16s the patch passed
-0 ⚠️ javac 0m 27s hbase-compression_hbase-compression-zstd generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-0 ⚠️ checkstyle 0m 25s hbase-common: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
-0 ⚠️ checkstyle 0m 12s hbase-compression/hbase-compression-zstd: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 19m 22s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 33s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 27s The patch does not generate ASF License warnings.
44m 0s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 6422a3dc0aa9 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ede4d27
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/artifact/yetus-general-check/output/diff-compile-javac-hbase-compression_hbase-compression-zstd.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/3/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@apurtell
Contributor Author

Let me re-run the performance evaluation from HBASE-26259, but with synthetic small-value data, and compare speed and efficiency with and without a precomputed dictionary. Gains are expected, but I'd like to present some hard comparison data here.

@apurtell
Contributor Author

apurtell commented Oct 13, 2021

Haven't read the code yet, but is it possible to copy the dict into the hbase storage so it is controlled by us?

@Apache9 I was thinking about writing the dictionary used to compress values in an HFile or WAL into the metadata section of that HFile or WAL, but the WAL format would need extensions (perhaps just an extra field in the header and/or trailer PB). Hopefully HFiles can reuse meta blocks. But this raises questions. There should be some way for a codec to read and write metadata into the container of the thing it is processing, but we don't have API support for that. I would consider it future work, but it is definitely of interest. The goal is to ensure that, at write time, HFiles are given all of the information they need to read themselves.
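If the dictionary, or at least its identity, were eventually recorded in HFile or WAL metadata, a reader could verify that it holds the matching dictionary before decoding. As a small building block: dictionaries produced by `zstd --train` use the formatted dictionary layout from RFC 8878, which begins with the magic number `0xEC30A437` followed by a 4-byte little-endian dictionary ID. The sketch below extracts that ID. The class and method names are hypothetical, and how the ID would be stored in HFile meta blocks or the WAL header is left open, as in the discussion.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.OptionalLong;

// Hypothetical helper: read the dictionary ID from a formatted Zstandard dictionary header.
final class ZstdDictionaryHeader {
  /** Magic number that starts a formatted Zstandard dictionary (RFC 8878, section 5). */
  static final int DICTIONARY_MAGIC = 0xEC30A437;

  /**
   * Returns the dictionary ID, or empty if the bytes do not start with the dictionary
   * magic number (for example, a raw-content dictionary).
   */
  static OptionalLong dictionaryId(byte[] dict) {
    if (dict.length < 8) {
      return OptionalLong.empty();
    }
    ByteBuffer buf = ByteBuffer.wrap(dict).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.getInt(0) != DICTIONARY_MAGIC) {
      return OptionalLong.empty();
    }
    // The ID is an unsigned 32-bit little-endian value.
    return OptionalLong.of(Integer.toUnsignedLong(buf.getInt(4)));
  }
}
```

An ID only identifies a dictionary; it does not replace it. Readers would still need the dictionary bytes, either embedded in the file or fetched from the configured location.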

Otherwise I think the current scheme is OK. The operator is already in charge of their table schema and compression codec dependencies (such as deploying native libraries). This is an incremental responsibility: if you put a compression dictionary attribute into your schema, don't lose the dictionary.

For the most part it is already true that HFiles carry, within their trailer or meta blocks, all of the information a reader requires to process them. I can think of one exception: encryption. The data encryption key (DEK) is stored in the HFile, but the master encryption key (MEK) used to encrypt the DEK is by design kept in a trust store or HSM, and if the MEK is lost, none of the data can be decrypted. There are some parallels between external MEK data and external compression dictionary data, and one could argue the same general management rules apply. The difference is that the dictionary is not sensitive and can be copied into the file, whereas the master encryption key must be carefully guarded and never written alongside the data.

Contributor

@Apache9 Apache9 left a comment

Overall LGTM.

Just some simple nits, and please fix the checkstyle and javac issues if possible.

if (DICTIONARY_CACHE == null) {
synchronized (ZstdCodec.class) {
if (DICTIONARY_CACHE == null) {
DICTIONARY_CACHE = CacheBuilder.newBuilder()
Contributor

nit: perhaps extract the creation code into a separate method? It would make the code easier to read.
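One way to apply this nit, sketched under assumptions: the lazy-initialization check stays in a short getter, and all construction details move into a hypothetical `createDictionaryCache()` method. A `ConcurrentHashMap` stands in for the Guava cache so the example is self-contained. Note that double-checked locking is only safe when the field is declared `volatile`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of lazy initialization with the construction logic extracted.
final class LazyDictionaryCacheHolder {
  // volatile is required for double-checked locking to be correct under the Java Memory Model.
  private static volatile ConcurrentMap<String, byte[]> dictionaryCache;

  static ConcurrentMap<String, byte[]> getDictionaryCache() {
    ConcurrentMap<String, byte[]> cache = dictionaryCache; // single volatile read on the fast path
    if (cache == null) {
      synchronized (LazyDictionaryCacheHolder.class) {
        cache = dictionaryCache;
        if (cache == null) {
          cache = createDictionaryCache();
          dictionaryCache = cache;
        }
      }
    }
    return cache;
  }

  /** All cache construction details (size caps, expiry, loaders) would live here. */
  private static ConcurrentMap<String, byte[]> createDictionaryCache() {
    return new ConcurrentHashMap<>();
  }
}
```

An alternative with the same laziness and no explicit locking is the initialization-on-demand holder idiom, where a private static nested class holds the instance and the JVM's class initialization provides the thread safety.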

n = in.read(buffer);
if (n > 0) {
baos.write(buffer, 0, n);
}
Contributor

nit: indent

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 9s master passed
+1 💚 compile 0m 21s master passed
+1 💚 shadedjars 8m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 19s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 53s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 shadedjars 8m 9s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 17s the patch passed
_ Other Tests _
+1 💚 unit 0m 49s hbase-compression-zstd in the patch passed.
28m 12s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux e587a4ac0486 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 4d27c47
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/testReport/
Max. process+thread count 277 (vs. ulimit of 30000)
modules C: hbase-compression/hbase-compression-zstd U: hbase-compression/hbase-compression-zstd
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 2s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 18s master passed
+1 💚 compile 0m 20s master passed
+1 💚 shadedjars 9m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 18s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 4s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 shadedjars 9m 6s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 16s the patch passed
_ Other Tests _
+1 💚 unit 0m 46s hbase-compression-zstd in the patch passed.
32m 49s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux c0328d761d07 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 4d27c47
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/testReport/
Max. process+thread count 262 (vs. ulimit of 30000)
modules C: hbase-compression/hbase-compression-zstd U: hbase-compression/hbase-compression-zstd
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 48s master passed
+1 💚 compile 0m 25s master passed
+1 💚 checkstyle 0m 12s master passed
+1 💚 spotbugs 0m 27s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 8s the patch passed
+1 💚 compile 0m 26s the patch passed
-0 ⚠️ javac 0m 26s hbase-compression_hbase-compression-zstd generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-0 ⚠️ checkstyle 0m 13s hbase-compression/hbase-compression-zstd: The patch generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 18s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 0m 41s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 12s The patch does not generate ASF License warnings.
40m 19s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 3e331d211b0f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 4d27c47
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/artifact/yetus-general-check/output/diff-compile-javac-hbase-compression_hbase-compression-zstd.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
Max. process+thread count 95 (vs. ulimit of 30000)
modules C: hbase-compression/hbase-compression-zstd U: hbase-compression/hbase-compression-zstd
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/4/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for branch
+1 💚 mvninstall 4m 8s master passed
+1 💚 compile 6m 38s master passed
+1 💚 checkstyle 2m 38s master passed
+1 💚 spotbugs 5m 39s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 3m 48s the patch passed
+1 💚 compile 6m 35s the patch passed
-0 ⚠️ javac 0m 26s hbase-compression_hbase-compression-zstd generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-0 ⚠️ checkstyle 0m 28s hbase-common: The patch generated 1 new + 1 unchanged - 2 fixed = 2 total (was 3)
-0 ⚠️ checkstyle 1m 7s hbase-server: The patch generated 1 new + 85 unchanged - 2 fixed = 86 total (was 87)
-0 ⚠️ checkstyle 0m 13s hbase-compression/hbase-compression-zstd: The patch generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 32m 43s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 8m 9s the patch passed
_ Other Tests _
+1 💚 asflicense 1m 26s The patch does not generate ASF License warnings.
88m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 3bc458b0d1b9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ad7d698
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-general-check/output/diff-compile-javac-hbase-compression_hbase-compression-zstd.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-common hbase-server hbase-mapreduce hbase-compression/hbase-compression-aircompressor hbase-compression/hbase-compression-lz4 hbase-compression/hbase-compression-snappy hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 28s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for branch
+1 💚 mvninstall 3m 48s master passed
+1 💚 compile 3m 14s master passed
+1 💚 shadedjars 8m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 24s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 3m 49s the patch passed
+1 💚 compile 3m 16s the patch passed
+1 💚 javac 3m 16s the patch passed
+1 💚 shadedjars 8m 13s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 38s hbase-server generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21)
_ Other Tests _
+1 💚 unit 1m 53s hbase-common in the patch passed.
+1 💚 unit 151m 26s hbase-server in the patch passed.
+1 💚 unit 11m 43s hbase-mapreduce in the patch passed.
+1 💚 unit 1m 19s hbase-compression-aircompressor in the patch passed.
+1 💚 unit 1m 5s hbase-compression-lz4 in the patch passed.
+1 💚 unit 1m 5s hbase-compression-snappy in the patch passed.
+1 💚 unit 1m 5s hbase-compression-zstd in the patch passed.
210m 0s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 39f7b60bc3f1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ad7d698
Default Java AdoptOpenJDK-1.8.0_282-b08
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-jdk8-hadoop3-check/output/diff-javadoc-javadoc-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/testReport/
Max. process+thread count 4586 (vs. ulimit of 30000)
modules C: hbase-common hbase-server hbase-mapreduce hbase-compression/hbase-compression-aircompressor hbase-compression/hbase-compression-lz4 hbase-compression/hbase-compression-snappy hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 3s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 5m 18s master passed
+1 💚 compile 3m 40s master passed
+1 💚 shadedjars 9m 11s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 33s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 4m 58s the patch passed
+1 💚 compile 3m 37s the patch passed
+1 💚 javac 3m 37s the patch passed
+1 💚 shadedjars 9m 11s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 42s hbase-server generated 1 new + 86 unchanged - 0 fixed = 87 total (was 86)
_ Other Tests _
+1 💚 unit 2m 39s hbase-common in the patch passed.
-1 ❌ unit 209m 4s hbase-server in the patch failed.
+1 💚 unit 15m 28s hbase-mapreduce in the patch passed.
+1 💚 unit 1m 23s hbase-compression-aircompressor in the patch passed.
+1 💚 unit 0m 55s hbase-compression-lz4 in the patch passed.
+1 💚 unit 0m 57s hbase-compression-snappy in the patch passed.
+1 💚 unit 0m 57s hbase-compression-zstd in the patch passed.
277m 8s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1492b053507d 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ad7d698
Default Java AdoptOpenJDK-11.0.10+9
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-jdk11-hadoop3-check/output/diff-javadoc-javadoc-hbase-server.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/testReport/
Max. process+thread count 3078 (vs. ulimit of 30000)
modules C: hbase-common hbase-server hbase-mapreduce hbase-compression/hbase-compression-aircompressor hbase-compression/hbase-compression-lz4 hbase-compression/hbase-compression-snappy hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/5/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@apurtell
Contributor Author

apurtell commented Oct 15, 2021

Here are the performance test results.

I wrote an integration test that simulates a location-tracking use case. It writes 10 million rows. Each row has a random 64-bit row key (not important) and one column family with four qualifiers: first name, last name, latitude (encoded as an integer with a scale of 3), and longitude (also encoded as an integer with a scale of 3). The details matter less than the shape of the data: the character strings are short, matching typical lengths of English first and last names, and there are two 32-bit integer values. The integers are generated with a Zipfian distribution to reduce entropy and give dictionary compression a chance to succeed, but they are also short. When creating the table, the IT specified a block size of 1K, which is perhaps not unreasonable for a heavily indexed use case with short values. I could have achieved a higher compression ratio with sequential row keys instead of completely random ones, but this is not really important.
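The integration test's coordinate encoding is not shown in this thread. A minimal sketch, assuming "scale of 3" means three decimal places of a degree stored as a fixed-point `int`, might look like this; `FixedPointCoords` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: fixed-point encoding of coordinates with a scale of 3 (three decimal places). */
final class FixedPointCoords {
  private static final int SCALE_FACTOR = 1_000; // 10^3 for a scale of 3

  private FixedPointCoords() {}

  /** Encodes degrees as an int, rounding to the nearest thousandth of a degree. */
  static int encode(double degrees) {
    return Math.toIntExact(Math.round(degrees * SCALE_FACTOR));
  }

  /** Decodes an encoded value back to degrees; precision is limited to 0.001 degree. */
  static double decode(int encoded) {
    return encoded / (double) SCALE_FACTOR;
  }

  /** Serializes the value as 4 big-endian bytes (ByteBuffer's default byte order). */
  static byte[] toBytes(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }
}
```

For example, 37.7749 encodes to 37775 and decodes back to 37.775, so each stored value is a short 4-byte integer with a precision of about 111 m of latitude.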

I also wrote a simple utility that iterates over an HFile and writes the payload of each DATA or ENCODED_DATA block, and nothing else, to a separate file. These files were used as the zstd training set. I extracted 20,000 blocks to train a 1MB dictionary. The training parameters I used were basic and not especially tuned. I am not an expert in this aspect of ZStandard, so I can't estimate how much additional gain is possible.
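The extraction utility itself is not included here. The sketch below covers only its last step, writing each block payload to its own file so the directory can be handed to the zstd trainer. It assumes the payloads have already been read out of the HFile (that part is out of scope), and `TrainingSetWriter` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: writes each block payload to its own file to form a training set. */
final class TrainingSetWriter {
  private TrainingSetWriter() {}

  /**
   * Writes up to maxBlocks payloads as block-00000.bin, block-00001.bin, ... under outDir
   * and returns how many files were written. Reading blocks from an HFile is not shown.
   */
  static int write(Iterable<byte[]> blocks, Path outDir, int maxBlocks) throws IOException {
    Files.createDirectories(outDir);
    int written = 0;
    for (byte[] block : blocks) {
      if (written >= maxBlocks) {
        break; // e.g. stop at 20,000 blocks
      }
      Path file = outDir.resolve(String.format("block-%05d.bin", written));
      Files.write(file, block); // raw block bytes only, no extra framing
      written++;
    }
    return written;
  }
}
```

One file per block matters because the trainer treats each input file as a separate sample.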

The results demonstrate the compression speed improvements described in the ZStandard documentation: compaction time drops by about 22-33% at levels 1, 5, 6, and 12 (16% at level 3; level 7 was slightly slower). They also demonstrate modest efficiency gains, roughly 5-7 percentage points of additional space savings, especially in combination with higher levels, where even modest gains are meaningful at scale. Higher levels also become more affordable because of the relative speedup at each level. Even this simple case shows meaningful gains, with potential for more when applied by someone with expert knowledge. It seems reasonable to support this feature.

No Dictionary

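| Level | On disk size (bytes) | Compression (space savings) | Compaction time (sec) |
|---|---|---|---|
| none (uncompressed) | 1,686,075,803 | - | - |
| 1 | 767,926,618 | 54.5% | 42 |
| 3 | 756,427,617 | 55.1% | 37 |
| 5 | 746,302,550 | 55.7% | 48 |
| 6 | 744,741,449 | 55.8% | 50 |
| 7 | 744,701,778 | 55.8% | 54 |
| 12 | 731,150,341 | 56.6% | 115 |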

With Dictionary

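| Level | On disk size (bytes) | Compression (space savings) | Compaction time (sec) |
|---|---|---|---|
| 1 | 679,408,139 | 59.7% | 28 |
| 3 | 652,587,956 | 61.3% | 31 |
| 5 | 630,927,508 | 62.6% | 37 |
| 6 | 632,251,996 | 62.5% | 39 |
| 7 | 625,972,642 | 62.9% | 56 |
| 12 | 626,293,580 | 62.9% | 89 |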
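The Compression column is consistent with space savings relative to the uncompressed size, that is 1 - compressed / uncompressed. A small sketch that reproduces it from the on disk sizes above; `SpaceSavings` is a hypothetical helper.

```java
/** Hypothetical helper reproducing the "Compression" column as space savings. */
final class SpaceSavings {
  // Uncompressed on disk size from the "No Dictionary" table.
  static final long UNCOMPRESSED_BYTES = 1_686_075_803L;

  private SpaceSavings() {}

  /** Fraction of bytes saved: 1 - compressed / uncompressed. */
  static double savings(long compressedBytes, long uncompressedBytes) {
    return 1.0 - (double) compressedBytes / uncompressedBytes;
  }

  /** How much smaller the dictionary result is than the no-dictionary result at the same level. */
  static double relativeReduction(long withoutDictBytes, long withDictBytes) {
    return 1.0 - (double) withDictBytes / withoutDictBytes;
  }
}
```

At level 1, `savings(767_926_618L, UNCOMPRESSED_BYTES)` is about 0.545 and `savings(679_408_139L, UNCOMPRESSED_BYTES)` is about 0.597, matching the 54.5% and 59.7% in the tables. `relativeReduction` shows the dictionary-compressed files at that level are about 11.5% smaller.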

Let me clean up the checkstyle warnings and address other review feedback, then merge this after first merging the prerequisite PR for HBASE-26316.

@apurtell
Contributor Author

To double-check, I re-ran the test described earlier, except that the generated test data contained only:

  • 10 million rows
  • A 64-bit monotonically increasing row key
  • Two values, both 32-bit integers, generated by random number generators following a Zipfian distribution (using our RandomDistribution.Zipf with a sigma of 1.2)
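The test used HBase's own RandomDistribution.Zipf, whose code is not shown here. For reproducing similar data elsewhere, this is a separate, hypothetical inverse-CDF sampler over ranks 1..n; treating the sigma of 1.2 as the Zipf exponent s is an assumption.

```java
import java.util.Random;

/** Hypothetical Zipf sampler over ranks 1..n with exponent s (inverse CDF plus binary search). */
final class ZipfSampler {
  private final double[] cdf; // cdf[i] = P(rank <= i + 1)
  private final Random random;

  ZipfSampler(int n, double s, long seed) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be >= 1");
    }
    cdf = new double[n];
    double total = 0;
    for (int i = 0; i < n; i++) {
      total += 1.0 / Math.pow(i + 1, s); // unnormalized weight of rank i + 1
      cdf[i] = total;
    }
    for (int i = 0; i < n; i++) {
      cdf[i] /= total; // normalize so that cdf[n - 1] == 1.0
    }
    random = new Random(seed);
  }

  /** Returns a rank in [1, n]; small ranks are far more likely than large ones. */
  int next() {
    double u = random.nextDouble(); // uniform in [0, 1)
    int lo = 0;
    int hi = cdf.length - 1;
    while (lo < hi) { // find the first index whose cdf is >= u
      int mid = (lo + hi) >>> 1;
      if (cdf[mid] < u) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo + 1;
  }
}
```

The skew is what makes these integers compressible: a handful of small ranks account for a large share of all values.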

When training the dictionary I gave the trainer the parameters k=32 and d=8. In zstd's COVER-family trainers, k is the segment size and d is the dmer size, both measured in bytes. Choosing them deliberately for the shape of the data is a good approximation of how these parameters would be designed in a real use case. The results show significant compression speedups, as advertised, and better overall compression, because higher compression levels fit within the same time budget as the no-dictionary case.
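To make k and d concrete, here is a much-simplified, hypothetical sketch of the greedy idea behind zstd's COVER trainer: score each k-byte segment by the summed frequency of the distinct d-byte substrings (dmers) it contains, keep the best segment, then zero out those dmers so the next pick favors new content. zstd's real COVER and FASTCOVER trainers differ in many details (epochs, hashing, parameter search, segment placement), so treat this as an illustration only; `CoverSketch` is not zstd code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, much-simplified COVER-style dictionary builder. */
final class CoverSketch {
  private CoverSketch() {}

  // ISO-8859-1 maps each byte to exactly one char, so this is a lossless key for a byte run.
  private static String dmer(byte[] sample, int off, int d) {
    return new String(sample, off, d, StandardCharsets.ISO_8859_1);
  }

  static byte[] train(List<byte[]> samples, int k, int d, int dictSize) {
    if (d < 1 || k < d) {
      throw new IllegalArgumentException("require 1 <= d <= k");
    }
    // Count how often each dmer occurs across all samples.
    Map<String, Integer> freq = new HashMap<>();
    for (byte[] s : samples) {
      for (int i = 0; i + d <= s.length; i++) {
        freq.merge(dmer(s, i, d), 1, Integer::sum);
      }
    }
    ByteArrayOutputStream dict = new ByteArrayOutputStream();
    while (dict.size() + k <= dictSize) {
      int bestScore = 0;
      byte[] bestSample = null;
      int bestOff = -1;
      for (byte[] s : samples) {
        for (int off = 0; off + k <= s.length; off++) {
          // Score = sum of frequencies of the distinct dmers inside this k-byte segment.
          Set<String> seen = new HashSet<>();
          int score = 0;
          for (int i = off; i + d <= off + k; i++) {
            String m = dmer(s, i, d);
            if (seen.add(m)) {
              score += freq.get(m);
            }
          }
          if (score > bestScore) {
            bestScore = score;
            bestSample = s;
            bestOff = off;
          }
        }
      }
      if (bestSample == null) {
        break; // every remaining segment scores zero
      }
      dict.write(bestSample, bestOff, k);
      // Covered dmers no longer count, so later picks favor different content.
      for (int i = bestOff; i + d <= bestOff + k; i++) {
        freq.put(dmer(bestSample, i, d), 0);
      }
    }
    return dict.toByteArray();
  }
}
```

With the samples `latitude=1;`, `latitude=2;`, `latitude=3;` and k=8, d=4, the first segment chosen is `latitude`, the run every sample shares. Larger k keeps longer runs per pick; d controls how finely shared content is detected.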

Integers Only, No Dictionary

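| Level | On disk size (bytes) | Compression (space savings) | Compaction time (sec) |
|---|---|---|---|
| 1 | 261,658,729 | 68.3% | 21 |
| 3 | 251,343,431 | 69.6% | 22 |
| 5 | 251,968,603 | 69.5% | 25 |
| 6 | 251,467,677 | 69.5% | 26 |
| 7 | 251,509,580 | 69.5% | 27 |
| 12 | 235,410,126 | 71.5% | 51 |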

Integers Only, With Dictionary (k=32,d=8)

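| Level | On disk size (bytes) | Compression (space savings) | Compaction time (sec) |
|---|---|---|---|
| 1 | 248,971,553 | 69.8% | 13 |
| 3 | 248,528,035 | 69.9% | 14 |
| 5 | 245,846,087 | 70.2% | 16 |
| 6 | 245,705,224 | 70.2% | 17 |
| 7 | 226,998,954 | 72.5% | 25 |
| 12 | 226,796,109 | 72.5% | 39 |
| 15 | 226,553,944 | 72.6% | 44 |
| 18 | 216,373,878 | 73.8% | 153 |
| 22 | 216,373,736 | 73.8% | 165 |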

apurtell added a commit to apurtell/hbase that referenced this pull request Oct 18, 2021
This will cause a small merge conflict between apache#3730 and apache#3748 because we need
CanReinit here too.
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for branch
+1 💚 mvninstall 3m 46s master passed
+1 💚 compile 0m 45s master passed
+1 💚 shadedjars 8m 9s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 3m 50s the patch passed
+1 💚 compile 0m 46s the patch passed
+1 💚 javac 0m 46s the patch passed
+1 💚 shadedjars 8m 16s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s the patch passed
_ Other Tests _
+1 💚 unit 1m 52s hbase-common in the patch passed.
+1 💚 unit 0m 48s hbase-compression-zstd in the patch passed.
32m 17s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux a227284481a1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cadac18
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/testReport/
Max. process+thread count 340 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 2s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 4m 59s master passed
+1 💚 compile 0m 46s master passed
+1 💚 shadedjars 9m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 5m 5s the patch passed
+1 💚 compile 0m 47s the patch passed
+1 💚 javac 0m 47s the patch passed
+1 💚 shadedjars 9m 8s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s the patch passed
_ Other Tests _
+1 💚 unit 2m 38s hbase-common in the patch passed.
+1 💚 unit 0m 45s hbase-compression-zstd in the patch passed.
37m 33s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 76087b21734a 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cadac18
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/testReport/
Max. process+thread count 258 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 3m 46s master passed
+1 💚 compile 1m 13s master passed
+1 💚 checkstyle 0m 36s master passed
+1 💚 spotbugs 1m 14s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 3m 45s the patch passed
+1 💚 compile 1m 16s the patch passed
+1 💚 javac 1m 16s the patch passed
-0 ⚠️ checkstyle 0m 24s hbase-common: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
-0 ⚠️ checkstyle 0m 12s hbase-compression/hbase-compression-zstd: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
-0 ⚠️ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 18m 57s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 35s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
42m 46s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 6c2f90885f4c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cadac18
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 95 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/9/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

apurtell added a commit to apurtell/hbase that referenced this pull request Oct 19, 2021
This will cause a small merge conflict between apache#3730 and apache#3748 because we need
CanReinit here too.
@apurtell
Contributor Author

Rebase

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for branch
+1 💚 mvninstall 3m 45s master passed
+1 💚 compile 0m 45s master passed
+1 💚 shadedjars 8m 22s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for patch
+1 💚 mvninstall 3m 47s the patch passed
+1 💚 compile 0m 45s the patch passed
+1 💚 javac 0m 45s the patch passed
+1 💚 shadedjars 8m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s the patch passed
_ Other Tests _
+1 💚 unit 1m 52s hbase-common in the patch passed.
+1 💚 unit 0m 48s hbase-compression-zstd in the patch passed.
32m 21s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1f7b3429b63e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 26ab9d0
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/testReport/
Max. process+thread count 357 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for branch
+1 💚 mvninstall 4m 24s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 8m 19s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 43s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 4m 30s the patch passed
+1 💚 compile 0m 48s the patch passed
+1 💚 javac 0m 48s the patch passed
+1 💚 shadedjars 8m 20s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 43s the patch passed
_ Other Tests _
+1 💚 unit 1m 54s hbase-common in the patch passed.
+1 💚 unit 0m 44s hbase-compression-zstd in the patch passed.
33m 59s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux bc64dc5f4b2f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 26ab9d0
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/testReport/
Max. process+thread count 290 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for branch
+1 💚 mvninstall 5m 33s master passed
+1 💚 compile 1m 43s master passed
+1 💚 checkstyle 0m 44s master passed
+1 💚 spotbugs 1m 33s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 4m 41s the patch passed
+1 💚 compile 1m 23s the patch passed
+1 💚 javac 1m 23s the patch passed
-0 ⚠️ checkstyle 0m 27s hbase-common: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
-0 ⚠️ checkstyle 0m 12s hbase-compression/hbase-compression-zstd: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
-0 ⚠️ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 23m 45s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 34s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
56m 28s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux aa026114d7d2 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 26ab9d0
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/10/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

ZStandard supports initialization of compressors and decompressors with a
precomputed dictionary, which can dramatically improve and speed up compression
of tables with small values. For more details, please see

  The Case For Small Data Compression
  https://github.com/facebook/zstd#the-case-for-small-data-compression
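The JDK has no ZStandard binding, but `java.util.zip`'s DEFLATE supports a preset dictionary through `Deflater.setDictionary` and `Inflater.setDictionary`, which makes the same idea easy to try. This is an analogy only: it does not use ZStandard or HBase's codec, and `PresetDictionaryDemo` and the sample records are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo of preset-dictionary compression using the JDK's DEFLATE, not ZStandard. */
final class PresetDictionaryDemo {
  private PresetDictionaryDemo() {}

  /** Compresses input, optionally priming the compressor with a preset dictionary. */
  static byte[] compress(byte[] input, byte[] dict) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      if (dict != null) {
        deflater.setDictionary(dict); // must be set before the first deflate() call
      }
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[1024];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib memory
    }
  }

  /** Decompresses, supplying the dictionary when the stream asks for it. */
  static byte[] decompress(byte[] compressed, byte[] dict) throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[1024];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n > 0) {
          out.write(buf, 0, n);
        } else if (inflater.needsDictionary()) {
          if (dict == null) {
            // Without the dictionary the stream cannot be decoded at all.
            throw new IllegalStateException("stream requires a preset dictionary");
          }
          inflater.setDictionary(dict); // rejects a dictionary with the wrong checksum
        } else if (inflater.needsInput()) {
          throw new DataFormatException("truncated stream");
        }
      }
      return out.toByteArray();
    } finally {
      inflater.end();
    }
  }
}
```

With a dictionary built from one representative record, a similar small record compresses smaller than it does without one. Decompressing without the dictionary fails outright because the inflater stops and asks for it, which is why losing the dictionary file means the compressed data can no longer be read.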
@apurtell
Contributor Author

Rebase to resolve expected conflicts after #3730

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 4m 39s master passed
+1 💚 compile 0m 49s master passed
+1 💚 shadedjars 8m 17s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 44s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 4m 31s the patch passed
+1 💚 compile 0m 48s the patch passed
+1 💚 javac 0m 48s the patch passed
+1 💚 shadedjars 8m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 44s the patch passed
_ Other Tests _
+1 💚 unit 1m 53s hbase-common in the patch passed.
+1 💚 unit 0m 46s hbase-compression-zstd in the patch passed.
33m 56s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 81e753d11848 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8a6fed7
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/testReport/
Max. process+thread count 289 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 40s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for branch
+1 💚 mvninstall 5m 37s master passed
+1 💚 compile 0m 57s master passed
+1 💚 shadedjars 10m 59s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 42s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 4m 58s the patch passed
+1 💚 compile 0m 46s the patch passed
+1 💚 javac 0m 46s the patch passed
+1 💚 shadedjars 10m 25s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 44s the patch passed
_ Other Tests _
+1 💚 unit 2m 37s hbase-common in the patch passed.
+1 💚 unit 0m 52s hbase-compression-zstd in the patch passed.
42m 26s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux ed667c039c4e 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8a6fed7
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/testReport/
Max. process+thread count 270 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for branch
+1 💚 mvninstall 3m 59s master passed
+1 💚 compile 1m 19s master passed
+1 💚 checkstyle 0m 39s master passed
+1 💚 spotbugs 1m 14s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 3m 47s the patch passed
+1 💚 compile 1m 16s the patch passed
+1 💚 javac 1m 16s the patch passed
-0 ⚠️ checkstyle 0m 24s hbase-common: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
-0 ⚠️ checkstyle 0m 12s hbase-compression/hbase-compression-zstd: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
-0 ⚠️ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 24m 51s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 57s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 24s The patch does not generate ASF License warnings.
52m 8s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 639523a3a9c2 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8a6fed7
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/artifact/yetus-general-check/output/diff-checkstyle-hbase-compression_hbase-compression-zstd.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/11/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

apurtell merged commit bfa4584 into apache:master on Oct 19, 2021
apurtell deleted the HBASE-26353 branch on October 19, 2021 at 20:37
@apurtell
Contributor Author

@Apache9 I merged based on your prior approval. If you disagree with this action, please let me know and I will revert or restart this PR.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 5s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 5m 11s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 9m 16s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 5m 6s the patch passed
+1 💚 compile 0m 47s the patch passed
+1 💚 javac 0m 47s the patch passed
+1 💚 shadedjars 9m 8s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s the patch passed
_ Other Tests _
+1 💚 unit 2m 39s hbase-common in the patch passed.
+1 💚 unit 0m 45s hbase-compression-zstd in the patch passed.
38m 4s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux a6628fa6ad4c 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8a6fed7
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/testReport/
Max. process+thread count 224 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 42s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 5m 10s master passed
+1 💚 compile 0m 49s master passed
+1 💚 shadedjars 10m 53s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for patch
+1 💚 mvninstall 5m 26s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 shadedjars 10m 37s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s the patch passed
_ Other Tests _
+1 💚 unit 2m 34s hbase-common in the patch passed.
+1 💚 unit 0m 51s hbase-compression-zstd in the patch passed.
42m 25s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3748
Optional Tests javac javadoc unit shadedjars compile
uname Linux 223533be0fac 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8a6fed7
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/testReport/
Max. process+thread count 273 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 1s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 5m 2s master passed
+1 💚 compile 1m 30s master passed
+1 💚 checkstyle 0m 43s master passed
+1 💚 spotbugs 1m 24s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 4m 56s the patch passed
+1 💚 compile 1m 30s the patch passed
+1 💚 javac 1m 30s the patch passed
-0 ⚠️ checkstyle 0m 28s hbase-common: The patch generated 3 new + 0 unchanged - 2 fixed = 3 total (was 2)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 25m 17s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 1m 54s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
55m 3s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3748
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 8ca7e4b2b0d3 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8a6fed7
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-common hbase-compression/hbase-compression-zstd U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3748/12/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

asfgit pushed a commit that referenced this pull request Oct 19, 2021
…3748)

ZStandard supports initialization of compressors and decompressors with a
precomputed dictionary, which can dramatically improve and speed up compression
of tables with small values. For more details, please see

  The Case For Small Data Compression
  https://github.com/facebook/zstd#the-case-for-small-data-compression

Signed-off-by: Duo Zhang <[email protected]>
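The commit message describes priming compressors and decompressors with a precomputed dictionary. Below is a minimal, self-contained sketch of that idea. It uses the DEFLATE preset-dictionary support in `java.util.zip` (`Deflater.setDictionary`, `Inflater.setDictionary`, `Inflater.needsDictionary`) as a stand-in for zstd. It is not the HBase codec or the zstd API. The dictionary contents and the sample value are made up for illustration. The sketch compares the compressed size of one small value with and without the dictionary, and shows that decompression cannot proceed when the dictionary is unavailable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch only: DEFLATE preset dictionaries (java.util.zip) stand in for zstd dictionaries here.
class PresetDictionaryDemo {

  // Hypothetical "trained" dictionary: strings that recur across many small values.
  static final String DICTIONARY =
      "{\"user_id\":\"u-\",\"country\":\"\",\"status\":\"active\"}";
  // Hypothetical small cell value.
  static final String VALUE =
      "{\"user_id\":\"u-10492\",\"country\":\"NZ\",\"status\":\"active\"}";

  /** Compresses input, priming the compressor with dict when dict is non-null. */
  static byte[] compress(byte[] input, byte[] dict) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      if (dict != null) {
        deflater.setDictionary(dict); // must precede the first deflate() call
      }
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] chunk = new byte[256];
      while (!deflater.finished()) {
        int n = deflater.deflate(chunk);
        out.write(chunk, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib memory
    }
  }

  /** Decompresses; throws IllegalStateException if the stream needs a dictionary we lack. */
  static byte[] decompress(byte[] compressed, byte[] dict) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] chunk = new byte[256];
      while (!inflater.finished()) {
        int n = inflater.inflate(chunk);
        if (n == 0) {
          if (inflater.needsDictionary()) {
            if (dict == null) {
              throw new IllegalStateException("stream was compressed with a dictionary");
            }
            inflater.setDictionary(dict); // IllegalArgumentException if it is the wrong one
          } else if (inflater.needsInput()) {
            throw new IllegalArgumentException("truncated input");
          }
        }
        out.write(chunk, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt input", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] value = VALUE.getBytes(StandardCharsets.UTF_8);
    byte[] dict = DICTIONARY.getBytes(StandardCharsets.UTF_8);
    System.out.println("raw bytes:          " + value.length);
    System.out.println("without dictionary: " + compress(value, null).length);
    System.out.println("with dictionary:    " + compress(value, dict).length);
    byte[] restored = decompress(compress(value, dict), dict);
    System.out.println("round trip ok:      " + VALUE.equals(new String(restored, StandardCharsets.UTF_8)));
  }
}
```

Decompression needs exactly the dictionary used at compression time. This is the same property behind the warning in the PR description that losing the dictionary makes the data unreadable.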
asfgit pushed a commit that referenced this pull request Oct 22, 2021
…n-zstd (#3748)"

This reverts commit 8ac0b5e.

This is not ready yet. There are some code paths remaining where store
configuration (CompoundConfiguration) is not passed into the block decoding
context. Found with additional integration tests.
asfgit pushed a commit that referenced this pull request Oct 22, 2021
…n-zstd (#3748)"

This reverts commit bfa4584.

This is not ready yet. There are some code paths remaining where store
configuration (CompoundConfiguration) is not passed into the block decoding
context. Found with additional integration tests.
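The revert message says that some code paths build the block decoding context without the store configuration (`CompoundConfiguration`), so per-table settings never reach the decoder. The hypothetical Java sketch below models only that failure mode. It sets up a tiny layered configuration where table settings take precedence over site settings, plus a decoding-context check for the dictionary key. `LayeredConf`, `storeConf`, `siteOnlyConf` and `decoderUsesDictionary` are invented for illustration and are not HBase APIs. The key names and the values `true` and `6` come from the PR description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the failure mode only; this is NOT HBase's CompoundConfiguration.
class ConfigPropagationSketch {

  // Key names taken from the PR description.
  static final String DICT_KEY = "hbase.io.compress.zstd.dictionary";
  static final String LEVEL_KEY = "hbase.io.compress.zstd.level";

  /** Minimal layered configuration: the first layer that defines a key wins. */
  static final class LayeredConf {
    private final List<Map<String, String>> layers = new ArrayList<>();

    LayeredConf withLayer(Map<String, String> layer) {
      layers.add(layer);
      return this;
    }

    String get(String key) {
      for (Map<String, String> layer : layers) {
        String value = layer.get(key);
        if (value != null) {
          return value;
        }
      }
      return null;
    }
  }

  /** Site-wide settings: no dictionary configured at this level. */
  static Map<String, String> siteSettings() {
    return new HashMap<>();
  }

  /** Per-table settings, as set with CONFIGURATION => {...} in the shell. */
  static Map<String, String> tableSettings() {
    Map<String, String> table = new HashMap<>();
    table.put(LEVEL_KEY, "6");
    table.put(DICT_KEY, "true");
    return table;
  }

  /** Store view: table settings layered over site settings. */
  static LayeredConf storeConf() {
    return new LayeredConf().withLayer(tableSettings()).withLayer(siteSettings());
  }

  /** What a code path sees if it builds its context from site settings only. */
  static LayeredConf siteOnlyConf() {
    return new LayeredConf().withLayer(siteSettings());
  }

  /** Hypothetical check: will the block decoder be primed with a dictionary? */
  static boolean decoderUsesDictionary(LayeredConf conf) {
    return Boolean.parseBoolean(conf.get(DICT_KEY)); // parseBoolean(null) is false
  }

  public static void main(String[] args) {
    System.out.println("store conf -> dictionary: " + decoderUsesDictionary(storeConf()));
    System.out.println("site conf  -> dictionary: " + decoderUsesDictionary(siteOnlyConf()));
  }
}
```

In this model, a context built from `siteOnlyConf()` reports no dictionary even though the table enabled one. A decoder on such a path would therefore not be primed with the dictionary the encoder used.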