
@ndimiduk
Member

@ndimiduk ndimiduk commented Jul 7, 2022

This branch starts with #4574 and then adds a translation of ndimiduk/hbase@93a6fed into otel span events.

How I think you might use this, @apurtell, is to set up tracing with the Logging Exporter, run your test bench, and then parse the logs to summon them into metrics.
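Here is a minimal Java sketch of the "parse the logs" step. It assumes, hypothetically, that each span event shows up in the exported log lines as a token of the form `event=<name>`. That format is an illustrative assumption and not what the Logging Exporter actually prints, so the pattern would need adapting to real output. The class name `SpanEventLogCounter` is also made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: aggregate span events found in exported log lines into per-event counts.
 * ASSUMPTION: each event appears as a token "event=<name>". This format is
 * hypothetical; adapt EVENT_TOKEN to whatever your exporter really prints.
 */
final class SpanEventLogCounter {
  // Hypothetical token format, not the actual Logging Exporter output.
  private static final Pattern EVENT_TOKEN = Pattern.compile("event=([A-Za-z0-9_.-]+)");

  private SpanEventLogCounter() {}

  /** Counts each event name across all lines; a single line may carry several events. */
  static Map<String, Long> countEvents(List<String> logLines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : logLines) {
      Matcher m = EVENT_TOKEN.matcher(line);
      while (m.find()) {
        counts.merge(m.group(1), 1L, Long::sum);
      }
    }
    return counts;
  }
}
```

Running this over the logs of each benchmark run yields per-run totals that can be compared across builds, which is the "events into metrics" conversion described above.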

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 15s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2.5 Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 18s branch-2.5 passed
+1 💚 compile 4m 33s branch-2.5 passed
+1 💚 checkstyle 1m 13s branch-2.5 passed
-1 ❌ spotless 0m 25s branch has 21 errors when running spotless:check, run spotless:apply to fix.
+1 💚 spotbugs 3m 27s branch-2.5 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 2m 41s the patch passed
+1 💚 compile 4m 16s the patch passed
+1 💚 javac 4m 16s the patch passed
-0 ⚠️ checkstyle 0m 37s hbase-server: The patch generated 3 new + 11 unchanged - 3 fixed = 14 total (was 14)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 14m 1s Patch does not cause any errors with Hadoop 2.10.0 or 3.1.2 3.2.1.
-1 ❌ spotless 0m 27s patch has 21 errors when running spotless:check, run spotless:apply to fix.
+1 💚 spotbugs 3m 40s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 30s The patch does not generate ASF License warnings.
Total runtime: 48m 37s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4601
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 92ca957238cc 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 3ca8484
Default Java AdoptOpenJDK-1.8.0_282-b08
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/artifact/yetus-general-check/output/branch-spotless.txt
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 64 (vs. ulimit of 12500)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 53s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.5 Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for branch
+1 💚 mvninstall 2m 51s branch-2.5 passed
+1 💚 compile 1m 15s branch-2.5 passed
+1 💚 shadedjars 4m 2s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 57s branch-2.5 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 2m 29s the patch passed
+1 💚 compile 1m 14s the patch passed
+1 💚 javac 1m 14s the patch passed
+1 💚 shadedjars 3m 57s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 51s the patch passed
_ Other Tests _
+1 💚 unit 1m 40s hbase-common in the patch passed.
+1 💚 unit 2m 46s hbase-client in the patch passed.
+1 💚 unit 175m 4s hbase-server in the patch passed.
Total runtime: 201m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4601
Optional Tests javac javadoc unit shadedjars compile
uname Linux ccd56a4aa119 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 3ca8484
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/testReport/
Max. process+thread count 2564 (vs. ulimit of 12500)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org


@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 53s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.5 Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for branch
+1 💚 mvninstall 2m 25s branch-2.5 passed
+1 💚 compile 1m 7s branch-2.5 passed
+1 💚 shadedjars 4m 1s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 51s branch-2.5 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 2m 16s the patch passed
+1 💚 compile 1m 7s the patch passed
+1 💚 javac 1m 7s the patch passed
+1 💚 shadedjars 3m 56s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 49s the patch passed
_ Other Tests _
+1 💚 unit 1m 25s hbase-common in the patch passed.
+1 💚 unit 2m 31s hbase-client in the patch passed.
+1 💚 unit 180m 35s hbase-server in the patch passed.
Total runtime: 204m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #4601
Optional Tests javac javadoc unit shadedjars compile
uname Linux 5a69f9a5efe8 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 3ca8484
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/testReport/
Max. process+thread count 2424 (vs. ulimit of 12500)
modules C: hbase-common hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4601/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org


@apurtell
Contributor

apurtell commented Jul 7, 2022

@ndimiduk It does make sense to me that we would decorate the spans with events, and not try to maintain counters, as did the original patch. I only did that because aggregate counts at the end of the RPC were sufficient for my needs then. If I had needed a log of events I would have had to do something more like this. It is nice to have a timeline of events recorded into the span(s) instead, in typical form for otel.

> How I think you might use this, @apurtell, is to set up tracing with the Logging Exporter, run your test bench, and then parse the logs to summon them into metrics.

Agreed, converting events into metrics by logging and then counting occurrences over the logs is quite typical here. The events and the times at which they occur are the raw data that can serve several different approaches to analyzing the timeline.
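To illustrate why a timeline of events is the more general raw data, here is a small Java sketch. From one list of timestamped events, it derives both the per-name totals that a counter-based approach would keep and the gaps between consecutive events, which totals alone cannot recover. `TimedEvent` and `TimelineAnalysis` are hypothetical stand-ins, not OpenTelemetry types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal stand-in for a recorded span event: a name plus a timestamp. */
final class TimedEvent {
  final String name;
  final long epochNanos;

  TimedEvent(String name, long epochNanos) {
    this.name = name;
    this.epochNanos = epochNanos;
  }
}

/** Two different analyses derived from the same event timeline. */
final class TimelineAnalysis {
  private TimelineAnalysis() {}

  /** Totals per event name: the aggregate view a counter-based patch records. */
  static Map<String, Integer> countsByName(List<TimedEvent> timeline) {
    Map<String, Integer> counts = new TreeMap<>();
    for (TimedEvent e : timeline) {
      counts.merge(e.name, 1, Integer::sum);
    }
    return counts;
  }

  /**
   * Nanoseconds between each pair of consecutive events. Assumes the list is
   * already in time order. Counters alone cannot recover this information.
   */
  static List<Long> gapsNanos(List<TimedEvent> timeline) {
    List<Long> gaps = new ArrayList<>();
    for (int i = 1; i < timeline.size(); i++) {
      gaps.add(timeline.get(i).epochNanos - timeline.get(i - 1).epochNanos);
    }
    return gaps;
  }
}
```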

I would approve this draft fwiw

@ndimiduk
Member Author

Okay, after too long a delay, I have collected some data in which I have some amount of confidence. The raw data and summaries are in this Google Sheet for your examination, and the charts are pasted here for your reference.

[Charts: Runtime (mins); Read Latency, 95th Percentile (µs); Read Latency, 99th Percentile (µs)]

Test Methodology

I tested three different builds: a baseline (ecf758b), that baseline + HBASE-27153, and that baseline + HBASE-27153 + HBASE-27155. In all cases, the test was run with tracing disabled; all that's measured here is the impact of the code changes each patch makes to facilitate manual instrumentation. The test used a YCSB workload that I happened to have handy, with 20% writes and 80% random reads, and each run took a little over 30 minutes. The data collected are the total test runtime and the read latencies reported by YCSB at p95 and p99.
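For reference, a latency percentile such as p95 is the value below which that share of requests falls. The Java sketch below uses the nearest-rank definition over raw samples. YCSB may compute its reported percentiles differently, so treat this as an illustration of the metric, not of how the numbers above were produced. `LatencyPercentiles` is a hypothetical helper.

```java
import java.util.Arrays;

/** Nearest-rank percentile over latency samples (µs). Illustrative only; YCSB may differ. */
final class LatencyPercentiles {
  private LatencyPercentiles() {}

  static long percentile(long[] samplesMicros, double p) {
    if (samplesMicros.length == 0) throw new IllegalArgumentException("no samples");
    if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
    long[] sorted = samplesMicros.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    // Nearest rank: the smallest sample such that at least p% of samples are <= it.
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[rank - 1];
  }
}
```

With 100 samples of 1..100 µs, p95 is 95 µs and p99 is 99 µs.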

The test methodology was to first prepare a dataset: populate a pre-split table using the YCSB load feature, flush and major compact the table, and then snapshot it. Each test iteration involved dropping the table, cloning the snapshot back into place, and then applying the test workload.
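The structure of this methodology (prepare once, then reset the table before every iteration) can be sketched in Java against a hypothetical `BenchCluster` interface. In HBase 2.x the table steps would correspond to `Admin` calls such as `flush`, `majorCompact`, `snapshot`, `disableTable`/`deleteTable` and `cloneSnapshot`, and the load/run steps to YCSB's `load` and `run` commands. The interface, its method names and the `DryRunCluster` recorder are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the benchmark's cluster operations. */
interface BenchCluster {
  void ycsbLoad(String table);
  void flush(String table);
  void majorCompact(String table);
  void snapshot(String snapshot, String table);
  void dropTable(String table); // in HBase: disable, then delete
  void cloneSnapshot(String snapshot, String table);
  void ycsbRun(String table);
}

final class BenchPlan {
  private BenchPlan() {}

  /** One-time preparation: load, flush, major compact, snapshot. */
  static void prepare(BenchCluster c, String table, String snapshot) {
    c.ycsbLoad(table);
    c.flush(table);
    c.majorCompact(table);
    c.snapshot(snapshot, table);
  }

  /** Every iteration restarts from the same on-disk state, discarding pending compaction work. */
  static void iterate(BenchCluster c, String table, String snapshot, int iterations) {
    for (int i = 0; i < iterations; i++) {
      c.dropTable(table);
      c.cloneSnapshot(snapshot, table);
      c.ycsbRun(table);
    }
  }
}

/** Dry-run implementation that only records the steps, useful for checking their order. */
final class DryRunCluster implements BenchCluster {
  final List<String> steps = new ArrayList<>();

  public void ycsbLoad(String t) { steps.add("load " + t); }
  public void flush(String t) { steps.add("flush " + t); }
  public void majorCompact(String t) { steps.add("majorCompact " + t); }
  public void snapshot(String s, String t) { steps.add("snapshot " + s + " " + t); }
  public void dropTable(String t) { steps.add("drop " + t); }
  public void cloneSnapshot(String s, String t) { steps.add("clone " + s + " " + t); }
  public void ycsbRun(String t) { steps.add("run " + t); }
}
```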

I kept an eye on cluster metrics as things ran. Client and server generally agreed on the number of requests served per second. In each test, it took about 20 minutes for the block cache hit rate to climb from a starting point of 50% to a steady state of around 70%. I tried to exclude the impact of compactions from the results as much as possible: ASYNC_WAL was used, and dropping the table after each test run discarded the pending compaction work that had accumulated from the write portion of the workload.

My Interpretation of Results

The changes introduced with HBASE-27153 appear to have an overall positive impact on read throughput and latency, although the latency results in particular show a disturbingly large amount of internal variance. HBASE-27155 appears to undo all of that improvement and then do additional harm.
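One way to judge whether a latency difference between two builds stands out from that variance is Welch's t statistic over the per-run values of a metric (for example, p99 per iteration). This is a swapped-in technique, not something the analysis here says it used. `RunStats` is a hypothetical helper, and the sketch stops at the statistic rather than computing a p-value.

```java
/** Summary of one metric (e.g. p99 read latency in µs) across repeated runs of one build. */
final class RunStats {
  final int runs;
  final double mean;
  final double stdDev; // sample standard deviation (n - 1 denominator)

  private RunStats(int runs, double mean, double stdDev) {
    this.runs = runs;
    this.mean = mean;
    this.stdDev = stdDev;
  }

  static RunStats of(double[] perRun) {
    if (perRun.length < 2) throw new IllegalArgumentException("need at least two runs");
    double sum = 0;
    for (double v : perRun) sum += v;
    double mean = sum / perRun.length;
    double squares = 0;
    for (double v : perRun) squares += (v - mean) * (v - mean);
    return new RunStats(perRun.length, mean, Math.sqrt(squares / (perRun.length - 1)));
  }

  /**
   * Welch's t statistic for the difference in means between two builds.
   * A large |t| means the gap is big relative to run-to-run spread. If both
   * builds have zero spread the result is infinite or NaN.
   */
  static double welchT(double[] buildA, double[] buildB) {
    RunStats a = of(buildA);
    RunStats b = of(buildB);
    double standardError =
        Math.sqrt(a.stdDev * a.stdDev / a.runs + b.stdDev * b.stdDev / b.runs);
    return (a.mean - b.mean) / standardError;
  }
}
```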

In my opinion, the regression of 2 ms at p95 and 1 ms at p99 is too high a price to pay for including HBASE-27155.

Next Steps

We can conclude the analysis here and decide whether to commit one or both patches, or we can attempt further analysis. The next step I would suggest is collecting and comparing flame graphs at several points during the test period.

Please advise.

@apurtell
Contributor

Thanks for doing this analysis, @ndimiduk. I will note this on HBASE-27155 and resolve that issue. So I would propose the following next steps:

  1. Proceed with HBASE-27153
  2. Do not proceed with HBASE-27155.

@apurtell
Contributor

I resolved HBASE-27155 as WontFix, so I will close this PR too. Everyone, please feel free to reopen/unresolve if you'd like to pursue this at some later time.

@apurtell apurtell closed this Jul 26, 2022
@ndimiduk
Member Author

@apurtell Just out of curiosity, and because I already have the test rig and had earlier forward-ported your counter-based patch, I'm running it through the same cycles. While I'm at it, I'll also run rel/2.4.13.

@ndimiduk
Member Author

@apurtell Results from the counter-based patch and rel/2.4.13 are in the spreadsheet and charted, FYI.
