16 changes: 16 additions & 0 deletions sql/core/benchmarks/ByteArrayBenchmark-jdk11-results.txt
@@ -0,0 +1,16 @@
================================================================================================
byte array comparisons
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 11.0.12+8-LTS-237 on Mac OS X 11.5
Contributor

Can you run the benchmarks in GitHub Actions according to the instructions at https://spark.apache.org/developer-tools.html#github-workflow-benchmarks and then include those results in this PR in place of these ones? This helps to ensure that checked-in benchmark results come from a consistent environment.

Contributor Author

Thank you @JoshRosen, that is a really useful tool. I have updated the benchmark results with the ones from GitHub Actions (GA).

Contributor Author

I also added the benchmark tool guide to the pull request template in #34349.

Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Byte Array compareTo:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                            405            427          36        161.6           6.2       1.0X
8-16 byte                                           754            784          17         87.0          11.5       0.5X
16-32 byte                                          770            808          30         85.1          11.7       0.5X
512-1024 byte                                      1044           1415         NaN         62.8          15.9       0.4X
512 byte slow                                      3203           3537         387         20.5          48.9       0.1X
2-7 byte                                            451            481          36        145.4           6.9       0.9X


16 changes: 16 additions & 0 deletions sql/core/benchmarks/ByteArrayBenchmark-results.txt
@@ -0,0 +1,16 @@
================================================================================================
Member

Do we have 'before' numbers for these? You don't need to include them; I just want to verify that these also show an improvement, like your local laptop run did.

Contributor Author

Here are the benchmark results for the old code path:

JDK8

================================================================================================
byte array comparisons
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.8.0-1042-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Byte Array compareTo:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                            799            836          24         82.0          12.2       1.0X
8-16 byte                                           832            906          32         78.8          12.7       1.0X
16-32 byte                                          812            854          28         80.7          12.4       1.0X
512-1024 byte                                      1057           1088          20         62.0          16.1       0.8X
512 byte slow                                     24628          26054         NaN          2.7         375.8       0.0X
2-7 byte                                            811            849          23         80.8          12.4       1.0X

JDK11

================================================================================================
byte array comparisons
================================================================================================

OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Byte Array compareTo:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                            771            812          28         85.0          11.8       1.0X
8-16 byte                                           839            857          13         78.1          12.8       0.9X
16-32 byte                                          898            926          17         73.0          13.7       0.9X
512-1024 byte                                      1141           1189          23         57.4          17.4       0.7X
512 byte slow                                     40124          40689         495          1.6         612.2       0.0X
2-7 byte                                            827            847          14         79.3          12.6       0.9X
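
For reading these tables, the derived columns appear to follow from the best time and the number of rows processed per case. The sketch below is an assumption about that relationship, not Spark's `Benchmark` code: it takes the row count to be `count * iters` = 16 * 1000 * 4096 from `ByteArrayBenchmark`, and the helper `derive` is hypothetical. Small rounding differences against the printed values are possible because the tables show best times rounded to whole milliseconds.

```scala
object DerivedColumns {
  // Rows processed per case: count * iters in the benchmark source (16 * 1000 * 4096).
  val rows: Long = 16L * 1000 * 4096

  final case class Derived(rateMPerSec: Double, perRowNs: Double, relative: Double)

  // Hypothetical helper: derives the three columns from best times given in ms.
  def derive(bestMs: Double, firstBestMs: Double): Derived = {
    val rate = rows / (bestMs / 1000.0) / 1e6 // million rows per second
    val perRow = bestMs * 1e6 / rows          // nanoseconds per row
    Derived(rate, perRow, firstBestMs / bestMs)
  }

  def main(args: Array[String]): Unit = {
    // Best times (ms) taken from the JDK8 old-code-path table above.
    val cases = Seq("2-7 byte" -> 799.0, "8-16 byte" -> 832.0, "512 byte slow" -> 24628.0)
    val first = cases.head._2
    cases.foreach { case (name, best) =>
      val d = derive(best, first)
      println(f"$name%-15s ${d.rateMPerSec}%6.1f M/s ${d.perRowNs}%7.1f ns ${d.relative}%4.1fX")
    }
  }
}
```

Under these assumptions, Relative is simply the first case's best time divided by the current case's best time, which is why the first row always reads 1.0X.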

Contributor Author

This shows that the benefit still holds in the GA environment.

Contributor Author

Ah, I just noticed that the GA environments still differ. The two benchmark results were produced on:

Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz

Member

I'm inclined to believe it is a win based on your first benchmark. Is there any easy way to run before/after on these Xeons, or is that hard?

Contributor Author

I compared the two code paths within a single patch; here are the results.

JDK8:

================================================================================================
byte array comparisons
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.8.0-1042-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Byte Array compare offHeap:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                            636            661          14        103.0           9.7       1.0X
8-16 byte                                          1067           1112          21         61.4          16.3       0.6X
16-32 byte                                         1226           1352          98         53.4          18.7       0.5X
512-1024 byte                                      1803           1916          46         36.3          27.5       0.4X
512 byte slow                                      4343           4662         171         15.1          66.3       0.1X
2-7 byte                                           1075           1119          26         61.0          16.4       0.6X

OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.8.0-1042-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Byte Array compare onHeap:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                           1511           1570          30         43.4          23.1       1.0X
8-16 byte                                          1522           1564          27         43.1          23.2       1.0X
16-32 byte                                         1426           1554          36         46.0          21.8       1.1X
512-1024 byte                                      2080           2198          86         31.5          31.7       0.7X
512 byte slow                                     28498          29222         410          2.3         434.9       0.1X
2-7 byte                                           1382           1485          61         47.4          21.1       1.1X

JDK11

================================================================================================
byte array comparisons
================================================================================================

OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Byte Array compare offHeap:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                            720            777          21         91.0          11.0       1.0X
8-16 byte                                          1077           1138          32         60.8          16.4       0.7X
16-32 byte                                         1347           1463          84         48.7          20.5       0.5X
512-1024 byte                                      1898           1989          40         34.5          29.0       0.4X
512 byte slow                                      4621           4878         168         14.2          70.5       0.2X
2-7 byte                                           1062           1133          28         61.7          16.2       0.7X

OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Byte Array compare onHeap:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                           1377           1471          37         47.6          21.0       1.0X
8-16 byte                                          1398           1475          38         46.9          21.3       1.0X
16-32 byte                                         1452           1547          47         45.2          22.1       0.9X
512-1024 byte                                      1826           1953          55         35.9          27.9       0.8X
512 byte slow                                     45883          47146         NaN          1.4         700.1       0.0X
2-7 byte                                           1401           1484          39         46.8          21.4       1.0X
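
To make the side-by-side comparison easier to read, the per-case ratio of the two runs can be computed directly from the Best Time column. A minimal sketch using the JDK8 numbers above (both JDK8 tables report the same CPU); the object and method names are hypothetical, and the sketch makes no claim about which of the two labels corresponds to the new code path:

```scala
object CodePathRatio {
  // Best Time (ms) per case, copied from the two JDK8 tables above.
  val cases = Seq(
    "2-7 byte", "8-16 byte", "16-32 byte", "512-1024 byte", "512 byte slow", "2-7 byte (repeat)")
  val offHeapMs = Seq(636.0, 1067.0, 1226.0, 1803.0, 4343.0, 1075.0)
  val onHeapMs = Seq(1511.0, 1522.0, 1426.0, 2080.0, 28498.0, 1382.0)

  // A ratio above 1 means the offHeap run finished faster than the onHeap run.
  def ratios: Seq[(String, Double)] =
    cases.zip(onHeapMs.zip(offHeapMs)).map { case (name, (on, off)) => name -> on / off }

  def main(args: Array[String]): Unit =
    ratios.foreach { case (name, r) => println(f"$name%-18s $r%.2fx") }
}
```

The same calculation can be repeated with the JDK11 tables, which also share one CPU model between the two runs.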

byte array comparisons
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_271-b09 on Mac OS X 10.16
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Byte Array compareTo:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                                            425            471          24        154.2           6.5       1.0X
8-16 byte                                           751            814          40         87.2          11.5       0.5X
16-32 byte                                          789            842          42         83.1          12.0       0.5X
512-1024 byte                                      1038           1175         193         63.1          15.8       0.4X
512 byte slow                                      3419           3924         NaN         19.2          52.2       0.1X
2-7 byte                                            421            424           2        155.6           6.4       1.0X


@@ -0,0 +1,86 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import scala.util.Random

import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.unsafe.types.ByteArray

/**
 * Benchmark to measure performance for byte array comparisons.
 * {{{
 *   To run this benchmark:
 *   1. without sbt:
 *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
 *   2. build/sbt "sql/test:runMain <this class>"
 *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
 *      Results will be written to "benchmarks/<this class>-results.txt".
 * }}}
 */
object ByteArrayBenchmark extends BenchmarkBase {

  def byteArrayComparisons(iters: Long): Unit = {
    val chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    val random = new Random(0)
    def randomBytes(min: Int, max: Int): Array[Byte] = {
      val len = random.nextInt(max - min) + min
      val bytes = new Array[Byte](len)
      var i = 0
      while (i < len) {
        bytes(i) = chars.charAt(random.nextInt(chars.length())).toByte
        i += 1
      }
      bytes
    }

    val count = 16 * 1000
    val dataTiny = Seq.fill(count)(randomBytes(2, 7)).toArray
    val dataSmall = Seq.fill(count)(randomBytes(8, 16)).toArray
    val dataMedium = Seq.fill(count)(randomBytes(16, 32)).toArray
    val dataLarge = Seq.fill(count)(randomBytes(512, 1024)).toArray
    val dataLargeSlow = Seq.fill(count)(
      Array.tabulate(512) {i => if (i < 511) 0.toByte else 1.toByte}).toArray

    def compareBinary(data: Array[Array[Byte]]) = { _: Int =>
      var sum = 0L
      for (_ <- 0L until iters) {
        var i = 0
        while (i < count) {
          sum += ByteArray.compareBinary(data(i), data((i + 1) % count))
          i += 1
        }
      }
    }

    val benchmark = new Benchmark("Byte Array compareTo", count * iters, 25, output = output)
    benchmark.addCase("2-7 byte")(compareBinary(dataTiny))
    benchmark.addCase("8-16 byte")(compareBinary(dataSmall))
    benchmark.addCase("16-32 byte")(compareBinary(dataMedium))
    benchmark.addCase("512-1024 byte")(compareBinary(dataLarge))
    benchmark.addCase("512 byte slow")(compareBinary(dataLargeSlow))
    benchmark.addCase("2-7 byte")(compareBinary(dataTiny))
Contributor

It looks like this case is listed twice. Maybe drop this line?

Contributor Author

The first benchmark case can run slower than later ones because the JIT has not yet warmed up, and this case is small enough to finish quickly, so it is the one most likely to be affected.

So I run it a second time at the end to check for that effect.
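
The idea of repeating the first case can be sketched outside Spark's `Benchmark` class. This is only an illustration under simple assumptions: `timeMs` and `runCases` are hypothetical helpers, a single wall-clock measurement stands in for the real benchmark's repeated runs, and the workloads are placeholders rather than byte array comparisons.

```scala
object WarmupCheck {
  // Stores results so that the measured work has a visible effect.
  private var sink = 0L

  // Hypothetical helper: wall-clock time of one run of `body`, in milliseconds.
  def timeMs(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e6
  }

  // Runs every case in order, then re-runs the first case with a "(repeat)" label
  // so its timing after warm-up can be compared with the initial run.
  def runCases(cases: Seq[(String, () => Unit)]): Seq[(String, Double)] = {
    val repeat = (cases.head._1 + " (repeat)") -> cases.head._2
    (cases :+ repeat).map { case (name, f) => name -> timeMs(f()) }
  }

  def main(args: Array[String]): Unit = {
    val small = Array.tabulate(1 << 10)(i => (i % 7).toByte)
    val large = Array.tabulate(1 << 20)(i => (i % 7).toByte)
    val cases = Seq[(String, () => Unit)](
      "small" -> (() => sink += small.foldLeft(0L)(_ + _)),
      "large" -> (() => sink += large.foldLeft(0L)(_ + _)))
    runCases(cases).foreach { case (name, ms) => println(f"$name%-16s $ms%.3f ms") }
  }
}
```

If the repeated case is noticeably faster than the first run, that suggests the first measurement was dominated by warm-up rather than by the code under test.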

    benchmark.run()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("byte array comparisons") {
      byteArrayComparisons(1024 * 4)
    }
  }
}
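
For context on what the cases exercise, here is a minimal reference sketch of a lexicographic byte array comparison that treats bytes as unsigned values. It is not Spark's `ByteArray.compareBinary` implementation, and `compareUnsigned` is a hypothetical name; it only illustrates why the "512 byte slow" rows (511 zero bytes followed by a single 1, identical in every row) force a scan of the whole array, whereas random rows usually differ within the first few bytes.

```scala
object ByteCompareSketch {
  // Hypothetical reference: compares byte by byte as unsigned values,
  // falling back to the length difference when one array is a prefix of the other.
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val x = a(i) & 0xff
      val y = b(i) & 0xff
      if (x != y) return x - y
      i += 1
    }
    a.length - b.length
  }

  def main(args: Array[String]): Unit = {
    // Same shape as the "512 byte slow" rows in the benchmark.
    val slow = Array.tabulate(512)(i => if (i < 511) 0.toByte else 1.toByte)
    println(compareUnsigned(slow, slow.clone()))
    println(compareUnsigned("AB".getBytes("UTF-8"), "AC".getBytes("UTF-8")))
  }
}
```

A byte-at-a-time loop like this is the simplest baseline; faster variants typically compare several bytes per step, but that is outside what this sketch shows.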