Closed
Changes from 2 commits
34 changes: 17 additions & 17 deletions sql/core/benchmarks/JSONBenchmark-results.txt
@@ -3,35 +3,35 @@ Benchmark for performance of JSON parsing
================================================================================================

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
 JSON schema inferring:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)    Relative
 ------------------------------------------------------------------------------------------------
-No encoding                                 62946 / 63310          1.6         629.5        1.0X
-UTF-8 is set                              112814 / 112866          0.9        1128.1        0.6X
+No encoding                                 52255 / 52438          1.9         522.5        1.0X
+UTF-8 is set                                76641 / 77110          1.3         766.4        0.7X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
 JSON per-line parsing:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)    Relative
 ------------------------------------------------------------------------------------------------
-No encoding                                 16468 / 16553          6.1         164.7        1.0X
-UTF-8 is set                                16420 / 16441          6.1         164.2        1.0X
+No encoding                                 58243 / 58613          1.7         582.4        1.0X
+UTF-8 is set                                81752 / 83249          1.2         817.5        0.7X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
 JSON parsing of wide lines:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)    Relative
 ------------------------------------------------------------------------------------------------
-No encoding                                 39789 / 40053          0.3        3978.9        1.0X
-UTF-8 is set                                39505 / 39584          0.3        3950.5        1.0X
+No encoding                               117087 / 117211          0.1       11708.7        1.0X
+UTF-8 is set                              142492 / 143970          0.1       14249.2        0.8X

-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
 Count a dataset with 10 columns:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)    Relative
 ------------------------------------------------------------------------------------------------
-Select 10 columns + count()                 15997 / 16015          0.6        1599.7        1.0X
-Select 1 column + count()                   13280 / 13326          0.8        1328.0        1.2X
-count()                                       3006 / 3021          3.3         300.6        5.3X
+Select 10 columns + count()                 14592 / 14811          0.7        1459.2        1.0X
+Select 1 column + count()                   10885 / 10994          0.9        1088.5        1.3X
+count()                                       2283 / 2300          4.4         228.3        6.4X
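
For readers comparing the old and new rows, the Rate, Per Row and Relative columns can be re-derived from the best time alone. The following is a minimal sketch, not Spark's benchmark code: `BenchmarkColumns` and `derive` are hypothetical names, and the row count of 100 million is an assumption inferred from the figures themselves (52255 ms at 522.5 ns per row).

```scala
// Hypothetical helper for re-deriving the result columns; not Spark's Benchmark class.
object BenchmarkColumns {
  final case class Derived(rateMPerSec: Double, perRowNs: Double, relative: Double)

  // bestMs: best wall-clock time of the case, in milliseconds
  // baselineBestMs: best time of the first case in the same table
  // rows: number of rows processed (assumed, not stated in the results file)
  def derive(bestMs: Double, baselineBestMs: Double, rows: Long): Derived = {
    val rate = rows / (bestMs * 1000.0)    // rows per microsecond = million rows per second
    val perRowNs = bestMs * 1e6 / rows     // nanoseconds spent per row
    val relative = baselineBestMs / bestMs // below 1.0 means slower than the baseline
    Derived(rate, perRowNs, relative)
  }

  def main(args: Array[String]): Unit = {
    val rows = 100L * 1000 * 1000 // assumption inferred from 52255 ms / 522.5 ns
    // New "UTF-8 is set" row of the schema-inferring table, baseline "No encoding".
    val utf8 = derive(76641, 52255, rows)
    // Rounds to 1.3 M/s, 766.4 ns and 0.7X, the values in that row.
    println(f"${utf8.rateMPerSec}%.1f ${utf8.perRowNs}%.1f ${utf8.relative}%.1fX")
  }
}
```

Because Relative is computed against the first case of each table, the 1.0X baselines of the old and new runs are not comparable with each other: the two runs were taken on different machines and JVMs.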


@@ -86,6 +86,7 @@ object JSONBenchmark extends SqlBasedBenchmark {
         spark.read
           .schema(schema)
           .json(path.getAbsolutePath)
+          .filter((_: Row) => true)
Member

@MaxGekk, this is another benchmark case, isn't it?
We should have different benchmark cases for these.

Member

This is not a follow-up. Please create another JIRA to add these test cases.

MaxGekk (Member Author), Nov 1, 2018

> This is another benchmark case, isn't it?

Originally I added the benchmark to check how specifying the encoding impacts performance (see #20937). This worked well until #21909. Currently the benchmark just tests how fast the JSON datasource can create empty rows (in the case of count()), which is already checked by another benchmark.

I believe this PR is just a follow-up to #21909, which should have included the changes proposed in this PR.
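
The two measurements under discussion can be put side by side. The sketch below is an illustration under stated assumptions, not code from this PR: `CountVsFilterCount`, the temporary path, the single `id` column and the row count are all made up for the example; only the `.filter((_: Row) => true)` call is taken from the diff. It needs Spark on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructType}

// Hypothetical object name; this is not part of JSONBenchmark.
object CountVsFilterCount {
  // Writes `rows` JSON records to a temporary directory and counts them twice.
  def counts(spark: SparkSession, rows: Long): (Long, Long) = {
    // The target directory must not exist yet, so resolve a fresh child path.
    val path = Files.createTempDirectory("json-sketch").resolve("data").toString
    spark.range(rows).write.json(path)

    val schema = new StructType().add("id", LongType)

    // Case 1: count() alone does not need any column values. Per the comment
    // above, this is the case that only creates empty rows after #21909.
    val plain = spark.read.schema(schema).json(path).count()

    // Case 2: the always-true typed filter receives a Row, which is the
    // change this PR uses so that row contents are needed before counting.
    val filtered = spark.read.schema(schema).json(path)
      .filter((_: Row) => true)
      .count()

    (plain, filtered)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("count-vs-filter") // hypothetical application name
      .getOrCreate()
    val (plain, filtered) = counts(spark, 1000L)
    println(s"plain=$plain filtered=$filtered")
    spark.stop()
  }
}
```

Both calls return the same count; what differs is the work behind it, which is why the two variants are candidates for separate benchmark cases.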

dongjoon-hyun (Member), Nov 1, 2018

@MaxGekk, in your PR (#21909), you already showed the effect via a benchmark.

What I mean is that both test cases are meaningful and worth having. :) And we need to compare both results in future releases.

In any case, please create new, separate benchmark cases for this PR.

Member

BTW, what do you want to call the old and new test cases?

  • For the old case, we can give a new name.
  • For the new case, JSON per-line parsing: looks a little inaccurate because we have filters now.

           .count()
       }

@@ -94,6 +95,7 @@ object JSONBenchmark extends SqlBasedBenchmark {
           .option("encoding", "UTF-8")
           .schema(schema)
           .json(path.getAbsolutePath)
+          .filter((_: Row) => true)
           .count()
       }
}

@@ -126,6 +128,7 @@ object JSONBenchmark extends SqlBasedBenchmark {
         spark.read
           .schema(schema)
           .json(path.getAbsolutePath)
+          .filter((_: Row) => true)
           .count()
       }

@@ -134,6 +137,7 @@ object JSONBenchmark extends SqlBasedBenchmark {
           .option("encoding", "UTF-8")
           .schema(schema)
           .json(path.getAbsolutePath)
+          .filter((_: Row) => true)
           .count()
       }
