[Parquet] Reduce one copy in SerializedPageReader
#8745
Conversation
alamb
left a comment
Thank you @XiangpengHao and @MikeWalrus ❤️
I don't think the parquet crate should be setting a global allocator, but otherwise this PR looks great to me
I think this change is justified on code simplification grounds alone even if we can't measure a performance difference
I am also going to explore the potential decompressor optimization in a follow-on PR
parquet/Cargo.toml
Outdated
rand = { version = "0.9", default-features = false, features = ["std", "std_rng", "thread_rng"] }
object_store = { version = "0.12.0", default-features = false, features = ["azure", "fs"] }
sysinfo = { version = "0.37.1", default-features = false, features = ["system"] }
mimalloc = { version = "*" }
I don't think we should add this in the parquet crate as it will conflict with downstream crates that want to use a different allocator
This is only a dev dependency, used to make the benchmarks more accurate
Let's get some benchmark results with the different allocator
However, I still don't think it is a good idea to use a non-system allocator for dev/benchmarks as it will make the benchmarks potentially farther from what the (average) user actually experiences
Makes sense, I've removed mimalloc
  let decompressed_size = uncompressed_page_size - offset;
  let mut decompressed = Vec::with_capacity(uncompressed_page_size);
- decompressed.extend_from_slice(&buffer.as_ref()[..offset]);
+ decompressed.extend_from_slice(&buffer[..offset]);
This seems an unrelated (but nice) cleanup
While looking at this code, it seems like it always copies the compressed bytes, even when it then decompresses them immediately. I'll make a small PR to see if I can remove that unnecessary copy
        read
    ));
}
let buffer = self.reader.get_bytes(data_start, data_len)?;
I confirm on review this can potentially avoid a copy if the underlying reader is already Bytes
🤖: Benchmark completed (Details)

🤖: Benchmark completed (Details)

🤖: Benchmark completed (Details)
Looks like we have some meaningful perf difference here. But I'm not sure if this is because of the reduced copy or because of mimalloc; should we run it again?

Will rerun
alamb
left a comment
Thank you again @XiangpengHao and @MikeWalrus
🤖: Benchmark completed (Details)

🤖: Benchmark completed (Details)
The regression is indeed related to memory allocation (brk, page faults, etc.), but I'm not sure if it's due to the read buffers being held. Those buffers are small and freed quickly. I traced that query and made a visualization. Interactive version. Script and raw traces. @XiangpengHao @alamb Any thoughts?
So in my mind the most recent benchmark results show a performance improvement for this branch.

In an ideal world, benchmark results would be 100% reproducible and free of noise. However, in the real world, especially on the "machine" I am using to benchmark (a VM), there are many sources of noise in the measurements.

So while I applaud our efforts here to be scientific, I also think this has passed the level of scrutiny needed. From my perspective, the code after this PR is clearly doing less work and shows improvements in the benchmarks (even if there is some noise), thus it is a net improvement over what is on main
I agree 100% with this

Let's merge this one in and keep making the code better going forward

@alamb I agree. Thanks!
Thank you for the (great!) find |






This was originally found by @MikeWalrus
Basically the ChunkReader for the async reader is ColumnChunkData:

arrow-rs/parquet/src/arrow/in_memory_row_group.rs, lines 282 to 292 in 2eabb59

which is itself backed by Bytes. The original implementation copied the data out of it, only to later wrap the copy in a new Bytes. This PR removes that copy.

Normally this should mean performance improvements across the board, but here are the nuances:

tldr: with mimalloc, it always improves performance, or is at least as fast as the original implementation, tested locally with arrow_reader_clickbench.

cc @tustvold and @alamb who might know this better