@XiangpengHao
Contributor

This was originally found by @MikeWalrus

Basically the ChunkReader for the async reader is ColumnChunkData:

impl ChunkReader for ColumnChunkData {
    type T = bytes::buf::Reader<Bytes>;

    fn get_read(&self, start: u64) -> crate::errors::Result<Self::T> {
        Ok(self.get(start)?.reader())
    }

    fn get_bytes(&self, start: u64, length: usize) -> crate::errors::Result<Bytes> {
        Ok(self.get(start)?.slice(..length))
    }
}

ColumnChunkData is itself backed by Bytes. The original implementation copied the data out of it, only to wrap the copy in a new Bytes later.
This PR removes that copy.
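
To make the difference concrete, here is a minimal std-only sketch. `SharedBytes` is a hypothetical stand-in for `bytes::Bytes` (it is not the real type, and `copy_out` is an illustrative name, not a parquet API); it contrasts slicing a shared buffer with copying a range out and wrapping it again.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a cheaply cloneable view into a
/// shared, immutable buffer.
#[derive(Clone)]
struct SharedBytes {
    buf: Arc<[u8]>,
    start: usize,
    len: usize,
}

impl SharedBytes {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Self { buf: Arc::from(data), start: 0, len }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.start..self.start + self.len]
    }

    /// Zero-copy path: bumps the reference count and narrows the view.
    fn slice(&self, start: usize, len: usize) -> Self {
        assert!(start + len <= self.len, "slice out of bounds");
        Self { buf: Arc::clone(&self.buf), start: self.start + start, len }
    }

    /// Copying path: reads the range into a fresh allocation, then wraps it again.
    fn copy_out(&self, start: usize, len: usize) -> Self {
        let copied: Vec<u8> = self.as_slice()[start..start + len].to_vec();
        Self::new(copied)
    }
}

fn main() {
    let chunk = SharedBytes::new((0u8..32).collect());
    let zero_copy = chunk.slice(8, 4);
    let copied = chunk.copy_out(8, 4);

    // Both paths yield the same bytes...
    assert_eq!(zero_copy.as_slice(), copied.as_slice());
    // ...but only the slice still points into the original allocation.
    assert!(Arc::ptr_eq(&chunk.buf, &zero_copy.buf));
    assert!(!Arc::ptr_eq(&chunk.buf, &copied.buf));
    println!("slice shares the buffer; copy_out allocated a new one");
}
```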

Normally this should mean performance improvements across the board, but there are a few nuances:

  1. Zero-copy means we need to hold on to the underlying buffer for longer.
  2. The original implementation "accidentally" (I'm not sure) gc'ed the buffer, releasing it early.
  3. To show a meaningful performance difference, we need to use a proper allocator, i.e., mimalloc.
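
Nuances 1 and 2 can be seen in miniature with plain reference counting. This is only a sketch: the 1 MiB chunk, the 16-byte page, and the function name are assumptions for illustration, and `Arc<Vec<u8>>` stands in for a `Bytes`-backed column chunk.

```rust
use std::sync::{Arc, Weak};

/// Returns whether the 1 MiB chunk is still allocated after the reader drops
/// its own handle, depending on how the page was extracted.
fn chunk_outlives_reader(zero_copy: bool) -> bool {
    let chunk: Arc<Vec<u8>> = Arc::new(vec![0u8; 1 << 20]);
    // A weak handle lets us observe the allocation without keeping it alive.
    let observer: Weak<Vec<u8>> = Arc::downgrade(&chunk);

    // Either keep a handle to the same allocation, or copy 16 bytes out.
    let (_view, _copy): (Option<Arc<Vec<u8>>>, Option<Vec<u8>>) = if zero_copy {
        (Some(Arc::clone(&chunk)), None)
    } else {
        (None, Some(chunk[..16].to_vec()))
    };

    // The reader is done with the chunk.
    drop(chunk);
    observer.upgrade().is_some()
}

fn main() {
    // The zero-copy page pins the whole chunk in memory.
    assert!(chunk_outlives_reader(true));
    // The copied page lets the chunk be freed as soon as the reader drops it.
    assert!(!chunk_outlives_reader(false));
    println!("zero-copy retains the chunk; copying releases it early");
}
```

A tiny zero-copy page therefore keeps the entire chunk resident, which is where allocator behavior starts to matter.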

tl;dr: with mimalloc, this change always improves performance, or is at least as fast as the original implementation (tested locally with arrow_reader_clickbench).

cc @tustvold and @alamb who might know this better

@github-actions github-actions bot added the parquet label (Changes to the parquet crate) on Oct 30, 2025
Contributor

@alamb alamb left a comment

Thank you @XiangpengHao and @MikeWalrus ❤️

I don't think the parquet crate should be setting a global allocator, but otherwise this PR looks great to me

I think this change is justified on code simplification grounds alone even if we can't measure a performance difference

I am also going to explore the potential decompressor optimization in a follow-on PR

rand = { version = "0.9", default-features = false, features = ["std", "std_rng", "thread_rng"] }
object_store = { version = "0.12.0", default-features = false, features = ["azure", "fs"] }
sysinfo = { version = "0.37.1", default-features = false, features = ["system"] }
mimalloc = { version = "*" }
Contributor

I don't think we should add this in the parquet crate as it will conflict with downstream crates that want to use a different allocator
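
For context on why the allocator is a per-binary choice: Rust allows only one `#[global_allocator]` in a crate and its recursive dependencies, so whichever crate declares it decides for the whole program. The sketch below uses only std; `CountingAlloc` and `allocations_during` are hypothetical names, and the wrapper forwards to the system allocator rather than mimalloc.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper: counts allocations and forwards to the system allocator.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Process-wide: every allocation in this binary now goes through CountingAlloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations observed while `f` runs (other threads may add noise).
fn allocations_during(f: impl FnOnce()) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

fn main() {
    let shared: Arc<[u8]> = Arc::from(vec![7u8; 4096]);

    // Sharing the buffer only bumps a reference count.
    let by_view = allocations_during(|| {
        black_box(Arc::clone(&shared));
    });
    // Copying the bytes needs a fresh allocation.
    let by_copy = allocations_during(|| {
        black_box(shared.to_vec());
    });
    println!("Arc::clone: {by_view} allocation(s), to_vec: {by_copy} allocation(s)");
}
```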

Contributor Author

This is only a dev dependency, used to make the benchmarks more accurate

Contributor

Let's get some benchmark results with the different allocator

However, I still don't think it is a good idea to use a non-system allocator for dev/benchmarks as it will make the benchmarks potentially farther from what the (average) user actually experiences

Contributor Author

Makes sense, I've removed mimalloc

  let decompressed_size = uncompressed_page_size - offset;
  let mut decompressed = Vec::with_capacity(uncompressed_page_size);
- decompressed.extend_from_slice(&buffer.as_ref()[..offset]);
+ decompressed.extend_from_slice(&buffer[..offset]);
Contributor

This seems like an unrelated (but nice) cleanup

While looking at this code, it seems like it always copies the compressed bytes, even when it then decompresses them immediately. I'll make a small PR to see if I can remove that unnecessary copy
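
As a rough illustration of the shape of this code path (not the crate's actual decompressor), the sketch below copies the uncompressed prefix verbatim and expands the compressed tail straight from a borrowed slice, so the compressed bytes themselves are never copied. `toy_decompress` is a hypothetical run-length codec, and the `decode_page` signature here is an assumption made for the example.

```rust
/// Hypothetical toy codec standing in for a real decompressor: every
/// `(count, byte)` pair in `input` expands to `count` copies of `byte`.
fn toy_decompress(input: &[u8], output: &mut Vec<u8>) {
    for pair in input.chunks_exact(2) {
        let (count, byte) = (pair[0] as usize, pair[1]);
        output.extend(std::iter::repeat(byte).take(count));
    }
}

/// Rebuilds a page whose first `offset` bytes are stored uncompressed and
/// whose remainder is compressed.
fn decode_page(buffer: &[u8], offset: usize, uncompressed_page_size: usize) -> Vec<u8> {
    let mut decompressed = Vec::with_capacity(uncompressed_page_size);
    // The uncompressed prefix is copied verbatim into the output.
    decompressed.extend_from_slice(&buffer[..offset]);
    // The compressed tail is only borrowed and expanded directly into the output.
    toy_decompress(&buffer[offset..], &mut decompressed);
    debug_assert_eq!(decompressed.len(), uncompressed_page_size);
    decompressed
}

fn main() {
    // Two prefix bytes, then "3 x" and "2 y" in the toy encoding.
    let buffer = [0xAA, 0xBB, 3, b'x', 2, b'y'];
    let page = decode_page(&buffer, 2, 7);
    assert_eq!(page, [0xAA, 0xBB, b'x', b'x', b'x', b'y', b'y']);
    println!("decoded {} bytes", page.len());
}
```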

read
));
}
let buffer = self.reader.get_bytes(data_start, data_len)?;
Contributor

I confirmed on review that this can avoid a copy when the underlying reader is already backed by Bytes

@alamb alamb changed the title from "Reduce one copy in SerializedPageReader" to "[Parquet] Reduce one copy in SerializedPageReader" on Oct 31, 2025
@alamb
Contributor

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reduce-copy (55e9d82) to 2eabb59 diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=reduce-copy
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 31, 2025

🤖: Benchmark completed
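
A note on reading the results table: the first number in each column appears to be that run's time divided by the faster of the two runs, so 1.00 marks the faster side. A small sketch of that arithmetic, with `relative_factors` as a hypothetical helper name and inputs taken from rows of the table:

```rust
/// Relative factors as shown in the table: each time divided by the faster one.
fn relative_factors(main_us: f64, branch_us: f64) -> (f64, f64) {
    let fastest = main_us.min(branch_us);
    (main_us / fastest, branch_us / fastest)
}

fn main() {
    // BinaryViewArray/plain encoded, mandatory, no NULLs: 362.2µs vs 291.0µs
    let (main_f, branch_f) = relative_factors(362.2, 291.0);
    println!("main {main_f:.2}, reduce-copy {branch_f:.2}");

    // Float16Array/plain encoded, mandatory, no NULLs: 77.0µs vs 119.0µs
    let (main_f, branch_f) = relative_factors(77.0, 119.0);
    println!("main {main_f:.2}, reduce-copy {branch_f:.2}");
}
```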

group                                                                                                      main                                   reduce-copy
-----                                                                                                      ----                                   -----------
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                           1.01  1281.8±10.77µs        ? ?/sec    1.00  1274.0±11.88µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                          1.00   1281.9±9.87µs        ? ?/sec    1.00   1282.4±9.04µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                            1.00   1285.8±9.84µs        ? ?/sec    1.00   1280.1±3.01µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                     1.00    479.5±3.26µs        ? ?/sec    1.03    492.5±2.66µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                    1.01    650.1±5.90µs        ? ?/sec    1.00    646.1±1.42µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                      1.01    491.7±5.60µs        ? ?/sec    1.00    485.3±3.52µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                          1.00    560.1±2.04µs        ? ?/sec    1.05    588.9±3.92µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                         1.01    720.2±3.47µs        ? ?/sec    1.00    712.9±2.73µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                           1.00    583.4±7.68µs        ? ?/sec    1.00    585.2±3.90µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    258.6±2.31µs        ? ?/sec    1.08    278.9±2.86µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                                1.09    255.2±0.53µs        ? ?/sec    1.00    233.3±0.90µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                  1.00    255.9±1.94µs        ? ?/sec    1.07    274.4±2.85µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                      1.24    362.2±2.11µs        ? ?/sec    1.00    291.0±1.64µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                        1.23    347.9±8.01µs        ? ?/sec    1.00    283.5±1.20µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                     1.26    326.0±1.51µs        ? ?/sec    1.00    259.7±1.23µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                       1.24    371.9±1.94µs        ? ?/sec    1.00    299.3±2.37µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.05  1125.9±24.13µs        ? ?/sec    1.00   1077.3±6.00µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.07   973.2±14.94µs        ? ?/sec    1.00    912.9±5.27µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.04  1131.3±17.87µs        ? ?/sec    1.00   1084.5±5.86µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                 1.12    461.3±3.35µs        ? ?/sec    1.00    411.2±5.22µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                1.12   647.8±21.73µs        ? ?/sec    1.00    579.8±2.49µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                  1.12    465.2±3.77µs        ? ?/sec    1.00    414.4±1.78µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00    160.7±0.39µs        ? ?/sec    1.26    202.1±0.37µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs       1.00    303.2±1.08µs        ? ?/sec    1.14    344.2±1.07µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00    166.4±0.49µs        ? ?/sec    1.25    207.7±0.37µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs                    1.00     77.0±0.89µs        ? ?/sec    1.55    119.0±0.28µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs                   1.00    259.2±7.24µs        ? ?/sec    1.16    301.2±0.46µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs                     1.00     81.4±0.26µs        ? ?/sec    1.53    124.8±0.55µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs                    1.01    740.5±8.77µs        ? ?/sec    1.00    735.8±1.34µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs                   1.03   597.5±22.81µs        ? ?/sec    1.00    581.6±1.45µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs                     1.01    747.3±7.63µs        ? ?/sec    1.00    741.1±6.62µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs                                1.19     67.9±1.62µs        ? ?/sec    1.00     57.1±5.56µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs                               1.12   270.9±19.66µs        ? ?/sec    1.00    242.9±2.08µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs                                 1.15     75.2±1.63µs        ? ?/sec    1.00     65.6±6.09µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs                     1.00     94.4±0.15µs        ? ?/sec    1.00     94.3±0.28µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs                    1.00    233.0±0.84µs        ? ?/sec    1.00    233.7±1.35µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs                      1.00     99.8±0.37µs        ? ?/sec    1.00     99.6±0.33µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs                                 1.00      9.5±0.12µs        ? ?/sec    1.01      9.6±0.15µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs                                1.00    190.4±0.53µs        ? ?/sec    1.00    191.2±0.82µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs                                  1.00     14.9±0.23µs        ? ?/sec    1.01     15.0±0.33µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs                     1.00    184.4±0.63µs        ? ?/sec    1.00    184.4±0.71µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs                    1.00    344.5±1.83µs        ? ?/sec    1.00    344.9±0.74µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs                      1.00    189.7±0.47µs        ? ?/sec    1.00    190.5±0.76µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs                                 1.00     13.6±0.10µs        ? ?/sec    1.05     14.3±0.11µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs                                1.00    259.4±2.30µs        ? ?/sec    1.00    260.5±1.67µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs                                  1.00     20.1±0.40µs        ? ?/sec    1.01     20.2±0.52µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs                     1.00    366.1±0.59µs        ? ?/sec    1.00    367.6±3.68µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs                    1.00    382.1±1.53µs        ? ?/sec    1.01    385.6±3.68µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs                      1.00    373.6±0.77µs        ? ?/sec    1.00    373.2±0.80µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs                                 1.07     27.2±0.47µs        ? ?/sec    1.00     25.5±0.66µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs                                1.02    217.8±0.79µs        ? ?/sec    1.00    214.5±2.71µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs                                  1.04     34.0±0.27µs        ? ?/sec    1.00     32.6±0.26µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    122.6±0.31µs        ? ?/sec    1.02    125.0±0.30µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    124.0±0.39µs        ? ?/sec    1.11    137.7±0.50µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    126.0±0.33µs        ? ?/sec    1.02    128.2±0.23µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    178.0±0.55µs        ? ?/sec    1.02    180.9±0.64µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs                               1.00    207.9±0.56µs        ? ?/sec    1.12    233.0±1.02µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs                                 1.00    183.6±0.52µs        ? ?/sec    1.01    186.3±0.56µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.03     77.0±0.29µs        ? ?/sec    1.00     75.0±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    154.3±1.18µs        ? ?/sec    1.15    178.0±0.44µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.02     82.3±0.23µs        ? ?/sec    1.00     80.7±0.20µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.02    136.9±1.20µs        ? ?/sec    1.00    134.8±0.55µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    186.8±1.66µs        ? ?/sec    1.13    210.5±0.96µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    140.8±1.13µs        ? ?/sec    1.00    140.5±0.92µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     74.0±0.30µs        ? ?/sec    1.00     74.1±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs                               1.00    153.2±0.60µs        ? ?/sec    1.15    176.1±0.49µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs                                 1.00     78.3±0.41µs        ? ?/sec    1.00     78.4±0.45µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.02    114.4±0.15µs        ? ?/sec    1.00    112.2±0.57µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs                          1.15    135.3±0.44µs        ? ?/sec    1.00    117.7±0.56µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs                            1.04    118.1±0.22µs        ? ?/sec    1.00    113.5±0.73µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs                                1.04    174.3±0.38µs        ? ?/sec    1.00    167.0±0.84µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs                               1.16    241.1±1.50µs        ? ?/sec    1.00    207.1±0.62µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs                                 1.05    180.4±0.62µs        ? ?/sec    1.00    172.0±0.55µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00    201.8±0.67µs        ? ?/sec    1.00    200.9±0.59µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.13    251.1±0.67µs        ? ?/sec    1.00    221.5±0.55µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00    208.3±0.31µs        ? ?/sec    1.00    207.5±1.20µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    142.1±0.26µs        ? ?/sec    1.00    142.2±0.32µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs                          1.15    218.6±1.54µs        ? ?/sec    1.00    190.1±1.75µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs                            1.01    148.5±0.36µs        ? ?/sec    1.00    147.8±0.24µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs                                1.04    105.3±1.09µs        ? ?/sec    1.00    101.2±0.90µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs                               1.16    199.8±1.03µs        ? ?/sec    1.00    171.9±1.58µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs                                 1.00    110.1±1.57µs        ? ?/sec    1.02    112.0±1.56µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs                                      1.00     99.1±0.28µs        ? ?/sec    1.01    100.2±0.43µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs                                     1.00    102.8±0.29µs        ? ?/sec    1.14    116.7±0.38µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs                                       1.00    102.3±0.25µs        ? ?/sec    1.01    103.2±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                                           1.00    134.4±1.29µs        ? ?/sec    1.02    137.4±2.52µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                                          1.00    165.0±0.51µs        ? ?/sec    1.17    193.6±0.42µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                                            1.00    138.7±0.26µs        ? ?/sec    1.03    142.6±2.29µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, mandatory, no NULLs                               1.02     44.5±0.51µs        ? ?/sec    1.00     43.6±0.15µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, half NULLs                              1.00    118.1±0.20µs        ? ?/sec    1.23    144.7±2.73µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, no NULLs                                1.02     49.3±0.11µs        ? ?/sec    1.00     48.5±0.09µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, mandatory, no NULLs                                      1.01    102.2±0.44µs        ? ?/sec    1.00    101.1±0.58µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, half NULLs                                     1.00    151.4±0.51µs        ? ?/sec    1.17    177.3±0.53µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, no NULLs                                       1.01    107.8±0.84µs        ? ?/sec    1.00    106.9±0.22µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, mandatory, no NULLs                                           1.01     38.2±0.11µs        ? ?/sec    1.00     37.9±0.19µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, half NULLs                                          1.00    114.5±0.45µs        ? ?/sec    1.24    142.4±0.96µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, no NULLs                                            1.01     43.8±0.24µs        ? ?/sec    1.00     43.4±0.20µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                                      1.00     95.6±0.17µs        ? ?/sec    1.03     98.5±0.56µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs                                     1.00     95.7±0.27µs        ? ?/sec    1.16    111.1±0.37µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                                       1.00     98.5±0.28µs        ? ?/sec    1.03    101.5±0.38µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                           1.00    125.0±0.42µs        ? ?/sec    1.03    129.1±2.04µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                          1.00    151.3±0.32µs        ? ?/sec    1.20    181.2±1.45µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                            1.00    129.8±0.31µs        ? ?/sec    1.03    133.2±0.51µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs                               1.05     25.9±0.27µs        ? ?/sec    1.00     24.8±0.39µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs                              1.00     99.9±4.17µs        ? ?/sec    1.27    127.0±0.60µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs                                1.00     31.1±0.34µs        ? ?/sec    1.01     31.4±0.17µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs                                      1.00     83.1±0.18µs        ? ?/sec    1.01     83.7±0.23µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs                                     1.00    132.6±0.29µs        ? ?/sec    1.20    158.8±2.66µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs                                       1.01     87.7±0.19µs        ? ?/sec    1.00     87.2±0.28µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                                           1.00     17.7±0.30µs        ? ?/sec    1.00     17.7±0.46µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                                          1.00     93.0±0.47µs        ? ?/sec    1.33    124.1±1.59µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                                            1.00     25.3±0.34µs        ? ?/sec    1.02     25.9±0.63µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                                      1.06     88.2±0.44µs        ? ?/sec    1.00     83.5±0.31µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs                                     1.17    107.1±0.75µs        ? ?/sec    1.00     91.2±0.31µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                                       1.05     91.1±0.36µs        ? ?/sec    1.00     86.5±0.37µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                           1.07    121.1±0.63µs        ? ?/sec    1.00    113.6±0.89µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                          1.20    177.2±0.79µs        ? ?/sec    1.00    147.4±0.34µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                            1.08    126.5±1.27µs        ? ?/sec    1.00    117.6±0.58µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs                               1.01    150.9±0.52µs        ? ?/sec    1.00    148.8±0.40µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs                              1.16    195.3±0.55µs        ? ?/sec    1.00    168.2±0.88µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs                                1.02    157.0±0.32µs        ? ?/sec    1.00    154.7±0.65µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs                                      1.00     91.1±0.68µs        ? ?/sec    1.00     91.0±0.50µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs                                     1.24    170.1±2.27µs        ? ?/sec    1.00    137.2±1.24µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs                                       1.00     96.3±0.51µs        ? ?/sec    1.00     96.4±0.64µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                                           1.00     40.7±0.74µs        ? ?/sec    1.06     43.1±0.60µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                                          1.23    137.7±0.77µs        ? ?/sec    1.00    111.8±0.37µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                                            1.09     53.4±1.64µs        ? ?/sec    1.00     48.9±0.57µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs                                       1.00     98.0±1.82µs        ? ?/sec    1.03    101.0±0.28µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs                                      1.00     99.0±0.21µs        ? ?/sec    1.16    114.8±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs                                        1.00     99.8±0.31µs        ? ?/sec    1.04    103.8±0.34µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                                            1.00    130.5±1.21µs        ? ?/sec    1.04    135.5±4.94µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                                           1.00    159.0±1.79µs        ? ?/sec    1.18    188.1±1.19µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                                             1.00    134.4±1.11µs        ? ?/sec    1.04    140.1±0.74µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, mandatory, no NULLs                                1.00     35.1±0.14µs        ? ?/sec    1.03     36.2±0.21µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, half NULLs                               1.00    110.7±1.23µs        ? ?/sec    1.23    136.2±0.42µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, no NULLs                                 1.00     40.1±0.20µs        ? ?/sec    1.04     41.5±0.17µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, mandatory, no NULLs                                       1.00     94.0±0.22µs        ? ?/sec    1.00     94.1±0.42µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, half NULLs                                      1.00    141.7±0.34µs        ? ?/sec    1.20    170.1±0.55µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, no NULLs                                        1.00     99.9±0.27µs        ? ?/sec    1.00     99.9±0.74µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, mandatory, no NULLs                                            1.00     30.4±0.07µs        ? ?/sec    1.01     30.6±0.14µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, half NULLs                                           1.00    108.0±0.46µs        ? ?/sec    1.25    134.5±0.37µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, no NULLs                                             1.01     36.1±0.19µs        ? ?/sec    1.00     35.7±0.12µs        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs                                     1.00      6.9±0.04ms        ? ?/sec    1.05      7.3±0.04ms        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs                                       1.00     12.7±0.12ms        ? ?/sec    1.07     13.6±0.27ms        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                     1.02    495.7±3.90µs        ? ?/sec    1.00    484.0±4.05µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                    1.00    650.8±2.22µs        ? ?/sec    1.00    652.2±2.22µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                      1.00    498.5±2.99µs        ? ?/sec    1.00    499.5±4.25µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                          1.01    683.3±2.55µs        ? ?/sec    1.00    679.4±3.24µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                         1.00    769.9±2.80µs        ? ?/sec    1.00    766.1±3.16µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                           1.01    692.6±2.27µs        ? ?/sec    1.00    688.4±3.07µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                1.00    295.2±0.63µs        ? ?/sec    1.01    297.4±6.45µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                               1.06    376.4±1.07µs        ? ?/sec    1.00    356.2±1.10µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                 1.00    300.9±1.14µs        ? ?/sec    1.01    302.7±1.72µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    251.3±2.55µs        ? ?/sec    1.09    273.7±4.12µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs                                1.12    263.4±3.30µs        ? ?/sec    1.00    235.9±0.76µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs                                  1.00    266.5±3.59µs        ? ?/sec    1.05    278.6±2.87µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                                      1.00    454.7±1.99µs        ? ?/sec    1.10    499.3±1.46µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs                                     1.03    362.9±1.34µs        ? ?/sec    1.00    351.9±1.39µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                                       1.00    462.5±2.13µs        ? ?/sec    1.11    511.8±9.48µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs                                     1.00    107.4±0.55µs        ? ?/sec    1.04    111.5±0.50µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs                                    1.00    106.5±0.19µs        ? ?/sec    1.15    122.5±0.17µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs                                      1.00    110.1±0.21µs        ? ?/sec    1.04    114.4±0.30µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                                          1.00    146.2±0.27µs        ? ?/sec    1.02    148.8±0.50µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                                         1.00    171.9±0.29µs        ? ?/sec    1.18    202.0±0.77µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                                           1.00    151.0±0.67µs        ? ?/sec    1.02    154.0±0.39µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, mandatory, no NULLs                              1.00     44.1±0.29µs        ? ?/sec    1.00     44.2±0.40µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, half NULLs                             1.00    118.3±0.29µs        ? ?/sec    1.22    144.7±1.20µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, no NULLs                               1.00     49.2±0.19µs        ? ?/sec    1.00     49.1±0.12µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, mandatory, no NULLs                                     1.00    102.1±0.22µs        ? ?/sec    1.00    102.1±0.25µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, half NULLs                                    1.00    151.3±0.52µs        ? ?/sec    1.17    177.1±1.83µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, no NULLs                                      1.00    107.5±1.98µs        ? ?/sec    1.00    107.9±0.38µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, mandatory, no NULLs                                          1.00     38.1±0.10µs        ? ?/sec    1.00     38.1±0.14µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, half NULLs                                         1.00    115.6±0.19µs        ? ?/sec    1.23    141.6±0.49µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, no NULLs                                           1.00     43.8±0.09µs        ? ?/sec    1.00     43.8±0.22µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs                                     1.00     96.8±0.17µs        ? ?/sec    1.03     99.8±0.31µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs                                    1.00     97.1±0.32µs        ? ?/sec    1.16    112.6±0.39µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs                                      1.00    100.7±0.22µs        ? ?/sec    1.02    102.8±0.34µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                                          1.00    126.3±0.51µs        ? ?/sec    1.03    129.7±0.36µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                                         1.00    153.4±0.45µs        ? ?/sec    1.19    182.1±2.12µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                                           1.00    131.6±0.27µs        ? ?/sec    1.03    135.4±2.18µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, mandatory, no NULLs                              1.00     26.1±0.18µs        ? ?/sec    1.00     26.2±0.22µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, half NULLs                             1.00     99.7±0.75µs        ? ?/sec    1.26    125.5±1.00µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, no NULLs                               1.00     31.0±0.27µs        ? ?/sec    1.01     31.3±0.26µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, mandatory, no NULLs                                     1.01     85.3±0.50µs        ? ?/sec    1.00     84.4±0.26µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, half NULLs                                    1.00    133.2±0.46µs        ? ?/sec    1.20    159.4±1.02µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, no NULLs                                      1.01     90.3±0.39µs        ? ?/sec    1.00     89.6±0.29µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, mandatory, no NULLs                                          1.02     21.0±0.33µs        ? ?/sec    1.00     20.6±0.38µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, half NULLs                                         1.00     97.5±0.36µs        ? ?/sec    1.27    123.5±0.28µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, no NULLs                                           1.01     26.1±0.34µs        ? ?/sec    1.00     25.9±0.39µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs                                     1.05     87.9±0.34µs        ? ?/sec    1.00     83.5±0.34µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs                                    1.18    107.4±0.33µs        ? ?/sec    1.00     91.1±0.29µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs                                      1.06     91.2±0.23µs        ? ?/sec    1.00     86.1±0.26µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                                          1.08    122.4±1.06µs        ? ?/sec    1.00    113.6±0.58µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                                         1.21    177.3±0.81µs        ? ?/sec    1.00    146.9±0.41µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                                           1.08    125.9±1.86µs        ? ?/sec    1.00    117.1±0.61µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, mandatory, no NULLs                              1.00    149.7±1.39µs        ? ?/sec    1.01    151.4±0.83µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, half NULLs                             1.16    196.2±0.57µs        ? ?/sec    1.00    169.6±0.51µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, no NULLs                               1.00    155.6±0.33µs        ? ?/sec    1.01    156.9±0.55µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, mandatory, no NULLs                                     1.00     90.1±0.46µs        ? ?/sec    1.01     90.7±0.37µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, half NULLs                                    1.12    162.4±0.40µs        ? ?/sec    1.00    144.4±0.97µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, no NULLs                                      1.00     96.1±0.59µs        ? ?/sec    1.00     96.0±1.58µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, mandatory, no NULLs                                          1.07     47.4±1.86µs        ? ?/sec    1.00     44.1±0.83µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, half NULLs                                         1.22    137.2±0.41µs        ? ?/sec    1.00    112.9±0.44µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, no NULLs                                           1.09     54.3±2.11µs        ? ?/sec    1.00     50.0±0.80µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs                                      1.00    101.4±1.04µs        ? ?/sec    1.03    104.7±0.41µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs                                     1.00    100.9±0.54µs        ? ?/sec    1.16    117.0±0.19µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs                                       1.00    104.1±0.91µs        ? ?/sec    1.03    107.8±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                                           1.00    135.5±0.27µs        ? ?/sec    1.03    139.4±0.41µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                                          1.00    161.5±0.41µs        ? ?/sec    1.19    192.2±0.59µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                                            1.00    140.3±0.47µs        ? ?/sec    1.03    145.2±2.79µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     34.3±0.08µs        ? ?/sec    1.06     36.2±0.12µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, half NULLs                              1.00    109.4±0.45µs        ? ?/sec    1.25    136.5±0.37µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, no NULLs                                1.00     40.0±0.25µs        ? ?/sec    1.03     41.1±0.12µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, mandatory, no NULLs                                      1.00     94.1±0.23µs        ? ?/sec    1.00     93.9±0.10µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, half NULLs                                     1.00    141.7±0.45µs        ? ?/sec    1.20    169.7±0.82µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, no NULLs                                       1.00     99.7±0.39µs        ? ?/sec    1.00     99.9±0.17µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs                                           1.01     30.4±0.12µs        ? ?/sec    1.00     30.2±0.19µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs                                          1.00    108.1±0.64µs        ? ?/sec    1.24    134.2±0.31µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs                                            1.00     35.5±0.09µs        ? ?/sec    1.01     35.8±0.15µs        ? ?/sec

@alamb
Contributor

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reduce-copy (55e9d82) to 2eabb59 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=reduce-copy
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 31, 2025

🤖: Benchmark completed

group                                main                                   reduce-copy
-----                                ----                                   -----------
arrow_reader_clickbench/async/Q1     1.03      2.4±0.01ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     14.0±0.50ms        ? ?/sec    1.06     14.9±1.31ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     16.5±0.68ms        ? ?/sec    1.00     16.4±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.02     28.7±0.41ms        ? ?/sec    1.00     28.2±0.45ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.02     40.1±0.51ms        ? ?/sec    1.00     39.2±0.67ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.03     38.9±0.51ms        ? ?/sec    1.00     37.7±0.65ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.03      6.0±0.29ms        ? ?/sec    1.00      5.8±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.21    137.7±4.77ms        ? ?/sec    1.00    114.3±1.52ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.34   177.8±14.43ms        ? ?/sec    1.00    133.0±1.20ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.16    284.8±8.14ms        ? ?/sec    1.00    244.6±3.72ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.13    434.5±3.66ms        ? ?/sec    1.00    384.3±6.56ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     45.0±0.94ms        ? ?/sec    1.00     44.8±0.90ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.05    106.0±0.91ms        ? ?/sec    1.00    100.6±1.01ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.05    106.8±0.91ms        ? ?/sec    1.00    101.3±1.25ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     55.9±0.53ms        ? ?/sec    1.00     55.8±0.80ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.02    125.7±1.09ms        ? ?/sec    1.00    123.5±2.49ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.03    100.7±0.83ms        ? ?/sec    1.00     97.6±2.42ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.02     38.2±0.42ms        ? ?/sec    1.00     37.6±0.45ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.05     50.4±0.91ms        ? ?/sec    1.00     47.9±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.02     47.4±0.87ms        ? ?/sec    1.00     46.7±1.07ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.03     37.5±0.45ms        ? ?/sec    1.00     36.4±0.94ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.02     14.1±0.23ms        ? ?/sec    1.00     13.8±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.01      2.1±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00      9.6±0.19ms        ? ?/sec    1.07     10.3±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.2±0.20ms        ? ?/sec    1.05     11.8±0.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.12     39.4±0.49ms        ? ?/sec    1.00     35.2±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.12     50.6±0.67ms        ? ?/sec    1.00     45.2±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.09     48.2±0.56ms        ? ?/sec    1.00     44.1±0.27ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.02      4.3±0.07ms        ? ?/sec    1.00      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.18    179.3±1.67ms        ? ?/sec    1.00    151.9±0.71ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.19    238.7±2.29ms        ? ?/sec    1.00    200.1±1.43ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.27    505.9±5.78ms        ? ?/sec    1.00    398.9±8.00ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.12   439.9±15.02ms        ? ?/sec    1.00    392.5±6.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     52.2±1.09ms        ? ?/sec    1.03     53.9±0.59ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    156.2±1.55ms        ? ?/sec    1.02    159.9±3.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    154.8±1.61ms        ? ?/sec    1.00    154.4±0.97ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     52.7±0.52ms        ? ?/sec    1.01     53.5±0.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    158.1±2.49ms        ? ?/sec    1.02    160.6±2.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     91.6±1.33ms        ? ?/sec    1.03     94.3±0.98ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     30.2±0.26ms        ? ?/sec    1.01     30.4±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     35.5±0.77ms        ? ?/sec    1.01     36.0±0.58ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     45.5±0.79ms        ? ?/sec    1.03     46.7±1.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     34.3±0.61ms        ? ?/sec    1.03     35.5±1.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.9±0.21ms        ? ?/sec    1.01     13.1±0.28ms        ? ?/sec

@alamb
Contributor

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reduce-copy (55e9d82) to 2eabb59 diff
BENCH_NAME=arrow_reader_row_filter
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_row_filter
BENCH_FILTER=
BENCH_BRANCH_NAME=reduce-copy
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 31, 2025

🤖: Benchmark completed

group                                                                                main                                   reduce-copy
-----                                                                                ----                                   -----------
arrow_reader_row_filter/float64 <= 99.0/all_columns/async                            1.00  1749.9±14.97µs        ? ?/sec    1.00  1747.5±20.09µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/all_columns/sync                             1.10      2.1±0.03ms        ? ?/sec    1.00  1868.1±21.35µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/async                  1.00   1588.7±6.67µs        ? ?/sec    1.01  1610.6±16.96µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/sync                   1.07  1693.5±13.75µs        ? ?/sec    1.00  1577.6±19.79µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/async              1.01  1554.1±15.19µs        ? ?/sec    1.00  1544.3±23.80µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/sync               1.10  1892.6±20.95µs        ? ?/sec    1.00  1727.9±25.35µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/async    1.00  1379.2±10.84µs        ? ?/sec    1.00  1373.7±11.02µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/sync     1.08  1487.1±13.62µs        ? ?/sec    1.00   1373.6±8.02µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/async                             1.00  1725.4±18.15µs        ? ?/sec    1.01  1749.4±16.22µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/sync                              1.10      2.1±0.02ms        ? ?/sec    1.00  1872.0±27.19µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/async                   1.00  1602.1±12.67µs        ? ?/sec    1.00  1596.1±13.29µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/sync                    1.07  1676.2±19.61µs        ? ?/sec    1.00  1563.0±15.28µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/async                              1.01   958.0±11.64µs        ? ?/sec    1.00    949.0±6.65µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/sync                               1.10    992.4±7.11µs        ? ?/sec    1.00    899.2±4.48µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/async                    1.00   844.1±11.15µs        ? ?/sec    1.03    869.4±6.86µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/sync                     1.10    980.4±5.60µs        ? ?/sec    1.00    888.9±8.03µs        ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/async                                 1.00      4.2±0.08ms        ? ?/sec    1.00      4.2±0.05ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/sync                                  1.01      4.2±0.07ms        ? ?/sec    1.00      4.1±0.08ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/async                       1.00      3.7±0.05ms        ? ?/sec    1.02      3.7±0.06ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/sync                        1.04      3.6±0.04ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/async                                  1.04      2.0±0.03ms        ? ?/sec    1.00  1959.6±19.24µs        ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/sync                                   1.07      2.2±0.03ms        ? ?/sec    1.00      2.1±0.03ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/async                        1.02  1843.7±17.81µs        ? ?/sec    1.00  1812.4±24.43µs        ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/sync                         1.08  1963.2±19.20µs        ? ?/sec    1.00  1820.3±20.10µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/async                                 1.02  1299.6±22.33µs        ? ?/sec    1.00  1276.4±13.94µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/sync                                  1.10  1440.6±12.99µs        ? ?/sec    1.00  1312.7±14.66µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/async                       1.00  1162.7±11.46µs        ? ?/sec    1.01  1178.8±20.53µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/sync                        1.10  1295.1±16.59µs        ? ?/sec    1.00  1173.5±10.62µs        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/async                             1.01      4.4±0.11ms        ? ?/sec    1.00      4.3±0.09ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/sync                              1.03      5.0±0.05ms        ? ?/sec    1.00      4.9±0.08ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/async                   1.00      3.6±0.03ms        ? ?/sec    1.03      3.7±0.05ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/sync                    1.02      3.5±0.04ms        ? ?/sec    1.00      3.4±0.08ms        ? ?/sec

@XiangpengHao
Contributor Author

Looks like we have some meaningful perf difference here:

arrow_reader_clickbench/async/Q20    1.21    137.7±4.77ms        ? ?/sec    1.00    114.3±1.52ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.34   177.8±14.43ms        ? ?/sec    1.00    133.0±1.20ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.16    284.8±8.14ms        ? ?/sec    1.00    244.6±3.72ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.13    434.5±3.66ms        ? ?/sec    1.00    384.3±6.56ms        ? ?/sec

arrow_reader_clickbench/sync/Q20     1.18    179.3±1.67ms        ? ?/sec    1.00    151.9±0.71ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.19    238.7±2.29ms        ? ?/sec    1.00    200.1±1.43ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.27    505.9±5.78ms        ? ?/sec    1.00    398.9±8.00ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.12   439.9±15.02ms        ? ?/sec    1.00    392.5±6.07ms        ? ?/sec

But I'm not sure whether this comes from the reduced copy or from mimalloc. Should we run arrow_reader_clickbench again now that I've removed mimalloc?
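
For reading the rows quoted above, here is a minimal sketch of how the ratio column relates to the reported mean times, plus a rough check of whether two `mean±spread` entries overlap. The function names `slowdown` and `intervals_disjoint` are hypothetical helpers, not part of the benchmark tooling, and the overlap check is only a heuristic, not a statistical test. The printed ratios are presumably computed from unrounded means, so recomputing them from the rounded values in the table can differ in the last digit.

```rust
/// Ratio of the slower mean time to the faster one (same unit for both).
/// Hypothetical helper: mirrors how the 1.00 / 1.xx column reads.
fn slowdown(slower_mean: f64, faster_mean: f64) -> f64 {
    slower_mean / faster_mean
}

/// Rough heuristic: true when the two `mean ± spread` ranges do not overlap.
/// This is an assumption for eyeballing results, not a significance test.
fn intervals_disjoint(a_mean: f64, a_spread: f64, b_mean: f64, b_spread: f64) -> bool {
    (a_mean - b_mean).abs() > a_spread + b_spread
}
```

For example, `slowdown(137.7, 114.3)` for async/Q20 gives roughly 1.20, and `intervals_disjoint(137.7, 4.77, 114.3, 1.52)` is true, whereas the async/Q11 row (16.5±0.68 vs 16.4±0.60) overlaps and is better read as noise.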

@alamb
Contributor

alamb commented Oct 31, 2025

will rerun

Contributor

@alamb alamb left a comment

Thank you again @XiangpengHao and @MikeWalrus

@alamb
Contributor

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reduce-copy (afa2d1c) to 2eabb59 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=reduce-copy
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 31, 2025

🤖: Benchmark completed

group                                main                                   reduce-copy
-----                                ----                                   -----------
arrow_reader_clickbench/async/Q1     1.00      2.4±0.03ms        ? ?/sec    1.01      2.4±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.02     14.6±0.37ms        ? ?/sec    1.00     14.3±0.50ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     16.3±0.35ms        ? ?/sec    1.00     16.3±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.02     29.2±0.29ms        ? ?/sec    1.00     28.7±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.01     40.2±0.28ms        ? ?/sec    1.00     40.0±2.51ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     37.9±0.22ms        ? ?/sec    1.00     37.9±0.45ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.09      6.3±0.20ms        ? ?/sec    1.00      5.7±0.23ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    119.1±0.65ms        ? ?/sec    1.12    133.1±1.07ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    139.5±0.64ms        ? ?/sec    1.17    163.0±1.41ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    289.7±8.89ms        ? ?/sec    1.08   313.2±23.16ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.03    434.4±4.53ms        ? ?/sec    1.00    420.3±2.78ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     44.7±0.52ms        ? ?/sec    1.00     44.5±0.69ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.02    104.7±0.52ms        ? ?/sec    1.00    102.8±0.62ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.01    104.9±0.52ms        ? ?/sec    1.00    103.8±0.79ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     54.2±0.46ms        ? ?/sec    1.01     54.5±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.01    124.3±0.54ms        ? ?/sec    1.00    122.9±0.64ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.01     99.9±0.94ms        ? ?/sec    1.00     98.9±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.04     38.4±0.28ms        ? ?/sec    1.00     37.1±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.04     50.6±0.52ms        ? ?/sec    1.00     48.5±0.58ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.09     49.4±0.36ms        ? ?/sec    1.00     45.2±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.10     38.6±0.38ms        ? ?/sec    1.00     35.2±0.32ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.07     14.2±0.17ms        ? ?/sec    1.00     13.4±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.01ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00      9.4±0.08ms        ? ?/sec    1.07     10.1±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.0±0.08ms        ? ?/sec    1.07     11.7±0.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     38.8±0.32ms        ? ?/sec    1.05     40.6±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     49.6±0.30ms        ? ?/sec    1.04     51.6±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     47.7±0.30ms        ? ?/sec    1.04     49.8±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.3±0.03ms        ? ?/sec    1.04      4.5±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    176.9±0.83ms        ? ?/sec    1.05    185.4±1.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    240.0±1.92ms        ? ?/sec    1.04    249.0±2.83ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    484.3±2.10ms        ? ?/sec    1.07    516.7±6.82ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00    425.8±9.53ms        ? ?/sec    1.08   459.7±14.66ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     50.5±0.57ms        ? ?/sec    1.08     54.7±1.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    153.7±0.79ms        ? ?/sec    1.06    162.3±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    153.1±1.70ms        ? ?/sec    1.03    158.0±0.88ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     51.8±0.82ms        ? ?/sec    1.03     53.2±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    153.7±0.91ms        ? ?/sec    1.07    164.7±1.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     90.3±0.52ms        ? ?/sec    1.05     94.7±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     29.8±0.17ms        ? ?/sec    1.03     30.7±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     34.6±0.50ms        ? ?/sec    1.06     36.5±0.43ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     45.8±0.25ms        ? ?/sec    1.01     46.1±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     34.4±0.33ms        ? ?/sec    1.02     35.2±0.32ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.9±0.09ms        ? ?/sec    1.02     13.2±0.20ms        ? ?/sec

@alamb
Contributor

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reduce-copy (afa2d1c) to 2eabb59 diff
BENCH_NAME=arrow_reader_row_filter
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_row_filter
BENCH_FILTER=
BENCH_BRANCH_NAME=reduce-copy
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 31, 2025

🤖: Benchmark completed

Details

group                                                                                main                                   reduce-copy
-----                                                                                ----                                   -----------
arrow_reader_row_filter/float64 <= 99.0/all_columns/async                            1.04  1768.8±23.93µs        ? ?/sec    1.00   1699.5±7.78µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/all_columns/sync                             1.15      2.1±0.03ms        ? ?/sec    1.00  1839.7±13.25µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/async                  1.01   1578.2±8.86µs        ? ?/sec    1.00   1568.5±6.41µs        ? ?/sec
arrow_reader_row_filter/float64 <= 99.0/exclude_filter_column/sync                   1.09  1676.9±10.22µs        ? ?/sec    1.00   1542.5±6.74µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/async              1.00   1517.6±4.89µs        ? ?/sec    1.01  1531.7±11.99µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/all_columns/sync               1.10  1886.6±12.57µs        ? ?/sec    1.00  1713.3±15.70µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/async    1.00  1357.3±12.34µs        ? ?/sec    1.01  1368.7±13.58µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0 AND ts >= 9000/exclude_filter_column/sync     1.08  1496.1±10.25µs        ? ?/sec    1.00  1382.7±17.43µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/async                             1.00  1724.4±10.68µs        ? ?/sec    1.00  1730.7±10.46µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/all_columns/sync                              1.12      2.0±0.02ms        ? ?/sec    1.00  1815.8±10.90µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/async                   1.02   1585.4±9.35µs        ? ?/sec    1.00   1557.7±6.13µs        ? ?/sec
arrow_reader_row_filter/float64 > 99.0/exclude_filter_column/sync                    1.07   1661.6±9.37µs        ? ?/sec    1.00   1545.9±6.34µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/async                              1.00    957.3±8.79µs        ? ?/sec    1.00    961.2±6.71µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/all_columns/sync                               1.12  1006.2±15.31µs        ? ?/sec    1.00    900.8±4.84µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/async                    1.00    856.4±6.83µs        ? ?/sec    1.02    872.6±4.37µs        ? ?/sec
arrow_reader_row_filter/int64 == 9999/exclude_filter_column/sync                     1.13    996.1±4.92µs        ? ?/sec    1.00    884.7±3.49µs        ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/async                                 1.02      4.2±0.06ms        ? ?/sec    1.00      4.1±0.03ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/all_columns/sync                                  1.06      4.2±0.05ms        ? ?/sec    1.00      4.0±0.03ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/async                       1.04      3.7±0.06ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/sync                        1.08      3.6±0.05ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/async                                  1.01      2.0±0.02ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/all_columns/sync                                   1.07      2.3±0.03ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/async                        1.00  1861.9±33.91µs        ? ?/sec    1.00  1867.7±18.73µs        ? ?/sec
arrow_reader_row_filter/ts < 9000/exclude_filter_column/sync                         1.04  1961.0±27.33µs        ? ?/sec    1.00  1883.6±18.13µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/async                                 1.02  1293.9±13.68µs        ? ?/sec    1.00  1263.4±13.21µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/all_columns/sync                                  1.10  1452.5±12.27µs        ? ?/sec    1.00   1317.2±8.40µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/async                       1.02  1182.6±10.23µs        ? ?/sec    1.00  1158.9±12.49µs        ? ?/sec
arrow_reader_row_filter/ts >= 9000/exclude_filter_column/sync                        1.12   1310.4±9.89µs        ? ?/sec    1.00   1170.9±6.46µs        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/async                             1.00      4.2±0.02ms        ? ?/sec    1.04      4.4±0.07ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/all_columns/sync                              1.00      4.9±0.02ms        ? ?/sec    1.00      4.9±0.05ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/async                   1.00      3.6±0.01ms        ? ?/sec    1.01      3.6±0.03ms        ? ?/sec
arrow_reader_row_filter/utf8View <> ''/exclude_filter_column/sync                    1.04      3.5±0.07ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
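For anyone reading the table, the two ratio columns appear to be each side's time divided by the faster of the two, so 1.00 marks the winner. A minimal sketch of that arithmetic, assuming this interpretation is right; `relative` is a made-up helper name, not part of the benchmark script:

```rust
/// Relative time of each side versus the faster one
/// (1.00 marks the faster side, larger means slower).
fn relative(main_us: f64, branch_us: f64) -> (f64, f64) {
    let best = main_us.min(branch_us);
    (main_us / best, branch_us / best)
}

fn main() {
    // "float64 <= 99.0/exclude_filter_column/sync" row, times in microseconds.
    let (main_ratio, branch_ratio) = relative(1676.9, 1542.5);
    // prints "main 1.09  reduce-copy 1.00"
    println!("main {main_ratio:.2}  reduce-copy {branch_ratio:.2}");
}
```

Rows where one side is printed in ms (e.g. 2.1±0.03ms) are rounded in the table, so recomputing the ratio from the printed values will only roughly match.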

@MikeWalrus

MikeWalrus commented Nov 1, 2025

> arrow_reader_clickbench/sync/Q20     1.00    176.9±0.83ms        ? ?/sec    1.05    185.4±1.24ms        ? ?/sec

> Zero-copy means we need to hold the underlying buffer longer

The regression is indeed related to memory allocation (brk, page faults, etc.), but I'm not sure if it's due to the read buffers being held. Those buffers are small and freed quickly.
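To make the "holding the buffer longer" point concrete, here is a rough model using only `std`. It uses `Arc<[u8]>` as a stand-in for `Bytes` (an assumption for illustration, not the parquet crate's code), and `View`, `copy_out`, and `view` are hypothetical names:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a zero-copy slice: a range into a shared buffer.
struct View {
    buf: Arc<[u8]>,
    start: usize,
    len: usize,
}

impl View {
    fn as_slice(&self) -> &[u8] {
        &self.buf[self.start..self.start + self.len]
    }
}

/// Copying path: the result owns its bytes and does not reference `buf`.
fn copy_out(buf: &Arc<[u8]>, start: usize, len: usize) -> Vec<u8> {
    buf[start..start + len].to_vec()
}

/// Zero-copy path: the result shares `buf`, so the whole allocation
/// stays alive for as long as the view does.
fn view(buf: &Arc<[u8]>, start: usize, len: usize) -> View {
    View { buf: Arc::clone(buf), start, len }
}

fn main() {
    let buf: Arc<[u8]> = Arc::from(vec![7u8; 1024]);

    let copied = copy_out(&buf, 0, 16);
    // The copy holds no reference to the original buffer.
    assert_eq!(Arc::strong_count(&buf), 1);

    let v = view(&buf, 0, 16);
    // The view pins the full 1024-byte allocation, not just 16 bytes.
    assert_eq!(Arc::strong_count(&buf), 2);
    assert_eq!(v.as_slice(), copied.as_slice());
}
```

Whether this retention explains the allocator behaviour in the trace is exactly the open question; the sketch only shows the mechanism.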

I traced that query and made a visualization. Interactive version. Script and raw traces.
image

@XiangpengHao @alamb Any thoughts?

@XiangpengHao
Contributor Author

I'm pretty sure this has to do with things not under our control, and I wouldn't call it a regression, but rather a system benchmark artifact.

image

⬆️ Both binaries are the same, as evidenced by the identical hash, but their performance is very different, and it seems to have something to do with the filename length...

===========

One more thing about page-faults:

slow one:

 5,290,025      page-faults                      #  323.566 K/sec

fast one:

33,683      page-faults                      #    2.473 K/sec

=====

I have disabled ASLR, so it must come from some weird things in the glibc allocator...
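If it helps to narrow this down from inside the process instead of via perf, page-fault counters can be sampled around a suspect region. A minimal Linux-only sketch, assuming the `/proc/<pid>/stat` layout documented in proc(5) (minflt is field 10, majflt is field 12); `parse_faults` and `current_faults` are hypothetical helper names:

```rust
use std::fs;

/// Parse (minor, major) page-fault counts from the contents of a Linux
/// `/proc/<pid>/stat` file. Returns None if the layout is unexpected.
fn parse_faults(stat: &str) -> Option<(u64, u64)> {
    // The command name (field 2) is parenthesized and may contain spaces
    // or parentheses, so split after the last ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // `rest` starts at field 3 (state), so field 10 is index 7
    // and field 12 is index 9.
    let minflt = fields.get(7)?.parse().ok()?;
    let majflt = fields.get(9)?.parse().ok()?;
    Some((minflt, majflt))
}

/// Counters for the current process, or None when /proc is unavailable.
fn current_faults() -> Option<(u64, u64)> {
    parse_faults(&fs::read_to_string("/proc/self/stat").ok()?)
}

fn main() {
    let before = current_faults();
    // Touch 64 MiB of fresh memory, one byte per 4 KiB page.
    let mut v = vec![0u8; 64 << 20];
    for i in (0..v.len()).step_by(4096) {
        v[i] = 1;
    }
    let touched: u64 = v.iter().map(|&b| b as u64).sum();
    let after = current_faults();
    println!("touched {touched} pages, faults before={before:?} after={after:?}");
}
```

Sampling before and after the decode loop on both branches would show whether the extra faults come from that region or from somewhere else in the harness.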

@MikeWalrus

> and it seems to have something to do with the filename length...

FYI, that difference goes away if there's no target/criterion directory (or without --bench).
image

So without that weird artifact, there's still something going on that makes this branch slower than main.
image

Also, the memory allocation trace I got was without --bench, i.e.
image

@alamb
Contributor

alamb commented Nov 1, 2025

> The regression is indeed related to memory allocation (brk, page faults, etc.), but I'm not sure if it's due to the read buffers being held. Those buffers are small and freed quickly.

So in my mind the most recent benchmark results show a performance improvement for this branch

In an ideal world, benchmark results would be 100% reproducible and free of noise. However, in the real world, especially on the "machine" I am using to benchmark (a VM) there are many sources of noise in the measurements:

  1. Random other processes (apt-get update for example) deciding to do their work during a run
  2. State of the kernel VM
  3. Hardware thermal state
  4. Other tenants running on the same hardware
  5. etc
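One crude way to separate noise from signal in tables like the ones above is to check whether the two `mean±spread` ranges overlap. This is only a rough heuristic, not a statistical test, and it assumes the ± figure is the spread reported by the benchmark tool; `Measurement` and `intervals_overlap` are made-up names:

```rust
/// A benchmark result as printed in the tables: mean and ± spread, in ms.
#[derive(Clone, Copy)]
struct Measurement {
    mean: f64,
    spread: f64,
}

/// True when the two `mean ± spread` ranges overlap, i.e. the difference
/// between the means is no larger than the combined spread.
fn intervals_overlap(a: Measurement, b: Measurement) -> bool {
    (a.mean - b.mean).abs() <= a.spread + b.spread
}

fn main() {
    // sync/Q40: 45.8±0.25ms vs 46.1±0.30ms
    let q40 = intervals_overlap(
        Measurement { mean: 45.8, spread: 0.25 },
        Measurement { mean: 46.1, spread: 0.30 },
    );
    // sync/Q22: 484.3±2.10ms vs 516.7±6.82ms
    let q22 = intervals_overlap(
        Measurement { mean: 484.3, spread: 2.10 },
        Measurement { mean: 516.7, spread: 6.82 },
    );
    // prints "Q40 overlap: true, Q22 overlap: false"
    println!("Q40 overlap: {q40}, Q22 overlap: {q22}");
}
```

The spread only captures run-to-run variation within one invocation, so the sources of noise listed above (other tenants, thermal state) can still shift a whole run in one direction.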

So while I applaud our efforts here to be scientific, I also think it has passed the level of scrutiny needed

From my perspective, the code after this PR is clearly doing less work, and shows improvements in the benchmarks (even if there is some noise), thus it is a net improvement over what is on main

@alamb
Contributor

alamb commented Nov 1, 2025

> I traced that query and made a visualization. Interactive version. Script and raw traces.

The memory usage over time is quite neat
image

@alamb
Contributor

alamb commented Nov 1, 2025

> I'm pretty sure this has to do with things not under our control, and I wouldn't call it a regression, but rather a system benchmark artifact.

I agree 100% with this

@alamb
Contributor

alamb commented Nov 1, 2025

Let's merge this one in and keep making the code better going forward

@alamb merged commit 3f3feed into apache:main on Nov 1, 2025
16 checks passed
@MikeWalrus

@alamb I agree. Thanks!

@alamb
Contributor

alamb commented Nov 3, 2025

> @alamb I agree. Thanks!

Thank you for the (great!) find

Labels

parquet (Changes to the parquet crate), performance

3 participants