@Dandandan Dandandan commented Jan 28, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

Speeds up PrimitiveArray::from_iter.

This speeds up array creation for statistics when all values are present (no nulls), which is the common case:

Extract row group statistics for Int64/extract_statistics/Int64
                        time:   [392.26 ns 394.25 ns 397.06 ns]
                        change: [−44.865% −44.674% −44.456%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Extract data page statistics for Int64/extract_statistics/Int64
                        time:   [8.8307 µs 8.8472 µs 8.8641 µs]
                        change: [−22.701% −22.399% −22.099%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Extract row group statistics for UInt64/extract_statistics/UInt64
                        time:   [391.21 ns 393.46 ns 396.43 ns]
                        change: [−44.227% −43.085% −41.444%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Extract data page statistics for UInt64/extract_statistics/UInt64
                        time:   [7.9090 µs 8.0075 µs 8.1958 µs]
                        change: [−48.323% −46.584% −44.593%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Extract row group statistics for F64/extract_statistics/F64
                        time:   [395.12 ns 395.86 ns 396.64 ns]
                        change: [−58.982% −57.663% −56.236%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Extract data page statistics for F64/extract_statistics/F64
                        time:   [8.9134 µs 8.9925 µs 9.1393 µs]
                        change: [−29.078% −25.866% −22.853%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
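
As a rough illustration of why the all-values-present case can be cheaper, here is a minimal standalone sketch. It is not the arrow-rs implementation: `SketchArray` and `collect_sketch` are hypothetical names, plain `Vec`s stand in for Arrow buffers, and the assumption is that the gain comes from not materializing a validity mask until the first null appears.

```rust
/// Hypothetical stand-in for a primitive array: values plus optional validity.
/// (Illustration only; real Arrow arrays use packed buffers, not `Vec<bool>`.)
#[derive(Debug, PartialEq)]
pub struct SketchArray<T> {
    pub values: Vec<T>,
    /// `None` means every slot is valid; otherwise one flag per slot.
    pub validity: Option<Vec<bool>>,
}

/// Collects `Option<T>` items, allocating the validity mask lazily.
pub fn collect_sketch<T, I>(iter: I) -> SketchArray<T>
where
    T: Default,
    I: IntoIterator<Item = Option<T>>,
{
    let iter = iter.into_iter();
    // Reserve space up front using the iterator's lower size bound.
    let (lower, _) = iter.size_hint();
    let mut values = Vec::with_capacity(lower);
    let mut validity: Option<Vec<bool>> = None;

    for item in iter {
        match item {
            Some(v) => {
                // Only touch the mask if a null was seen earlier.
                if let Some(bits) = validity.as_mut() {
                    bits.push(true);
                }
                values.push(v);
            }
            None => {
                // First null: backfill `true` for all earlier slots.
                let bits = validity.get_or_insert_with(|| vec![true; values.len()]);
                bits.push(false);
                // Null slots still occupy a value slot; use a placeholder.
                values.push(T::default());
            }
        }
    }

    SketchArray { values, validity }
}
```

With this shape, `collect_sketch(vec![Some(1i64), Some(2), Some(3)])` returns no validity mask at all, while an input containing a `None` builds the mask lazily at the first null.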

What changes are included in this PR?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 28, 2026
@Dandandan Dandandan changed the title Improve PrimitiveArray::from_iter Improve PrimitiveArray::from_iter perf Jan 28, 2026
@alamb alamb left a comment

Looks like a good improvement to me.

@Dandandan
Contributor Author

run benchmark arrow_statistics

@Dandandan Dandandan merged commit 6c54276 into apache:main Jan 29, 2026
27 checks passed

alamb commented Jan 29, 2026

run benchmark arrow_statistics

Sorry, I think the VM runner got rebooted / wasn't working. I restarted it and now the queue is good.


alamb commented Jan 29, 2026

show benchmark queue

@alamb-ghbot

🤖 Hi @alamb, you asked to view the benchmark queue (#9294 (comment)).

Job User Benchmarks Comment
20055_3815459655.sh Dandandan default https://github.com/apache/datafusion/pull/20055#issuecomment-3815459655
20055_3815478475.sh Dandandan default https://github.com/apache/datafusion/pull/20055#issuecomment-3815478475
arrow-9294-3817591918.sh Dandandan arrow_statistics https://github.com/apache/arrow-rs/pull/9294#issuecomment-3817591918

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing from_iter_speed (2a4ab06) to bd76edd diff
BENCH_NAME=arrow_statistics
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_statistics
BENCH_FILTER=
BENCH_BRANCH_NAME=from_iter_speed
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                                                                      from_iter_speed                        main
-----                                                                                                      ---------------                        ----
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.12     82.1±0.98µs        ? ?/sec    1.00     73.5±0.73µs        ? ?/sec
Extract data page statistics for F64/extract_statistics/F64                                                1.00     12.2±0.18µs        ? ?/sec    1.27     15.6±0.08µs        ? ?/sec
Extract data page statistics for Int64/extract_statistics/Int64                                            1.00     14.0±0.24µs        ? ?/sec    1.26     17.6±0.52µs        ? ?/sec
Extract data page statistics for String/extract_statistics/String                                          1.12     81.5±0.48µs        ? ?/sec    1.00     72.8±0.31µs        ? ?/sec
Extract data page statistics for UInt64/extract_statistics/UInt64                                          1.00     12.3±0.04µs        ? ?/sec    1.25     15.5±0.13µs        ? ?/sec
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00  1015.3±12.19ns        ? ?/sec    1.15  1167.6±15.76ns        ? ?/sec
Extract row group statistics for F64/extract_statistics/F64                                                1.00    535.3±4.27ns        ? ?/sec    1.81   967.6±10.35ns        ? ?/sec
Extract row group statistics for Int64/extract_statistics/Int64                                            1.00    537.1±7.38ns        ? ?/sec    1.80   966.8±24.59ns        ? ?/sec
Extract row group statistics for String/extract_statistics/String                                          1.00  1006.9±32.83ns        ? ?/sec    1.12  1123.8±16.19ns        ? ?/sec
Extract row group statistics for UInt64/extract_statistics/UInt64                                          1.00   535.2±10.28ns        ? ?/sec    1.81    969.5±8.96ns        ? ?/sec


alamb commented Jan 29, 2026

run benchmark arrow_statistics

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing from_iter_speed (2a4ab06) to bd76edd diff
BENCH_NAME=arrow_statistics
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_statistics
BENCH_FILTER=
BENCH_BRANCH_NAME=from_iter_speed
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                                                                      from_iter_speed                        main
-----                                                                                                      ---------------                        ----
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00     70.8±0.41µs        ? ?/sec    1.03     73.1±3.77µs        ? ?/sec
Extract data page statistics for F64/extract_statistics/F64                                                1.00     11.6±0.15µs        ? ?/sec    1.33     15.4±0.30µs        ? ?/sec
Extract data page statistics for Int64/extract_statistics/Int64                                            1.00     13.2±0.17µs        ? ?/sec    1.35     17.7±0.16µs        ? ?/sec
Extract data page statistics for String/extract_statistics/String                                          1.00     70.5±0.40µs        ? ?/sec    1.02     72.2±0.60µs        ? ?/sec
Extract data page statistics for UInt64/extract_statistics/UInt64                                          1.00     11.8±0.41µs        ? ?/sec    1.34     15.8±1.05µs        ? ?/sec
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00   1054.8±8.28ns        ? ?/sec    1.14  1205.3±20.01ns        ? ?/sec
Extract row group statistics for F64/extract_statistics/F64                                                1.00    538.6±7.15ns        ? ?/sec    1.84   989.6±10.53ns        ? ?/sec
Extract row group statistics for Int64/extract_statistics/Int64                                            1.00    542.8±5.83ns        ? ?/sec    1.81   979.8±12.40ns        ? ?/sec
Extract row group statistics for String/extract_statistics/String                                          1.00   1050.5±8.53ns        ? ?/sec    1.11  1163.6±17.81ns        ? ?/sec
Extract row group statistics for UInt64/extract_statistics/UInt64                                          1.00    540.4±6.24ns        ? ?/sec    1.82   985.5±11.18ns        ? ?/sec


alamb commented Jan 29, 2026

Those are some pretty sweet results

Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00     70.8±0.41µs        ? ?/sec    1.03     73.1±3.77µs        ? ?/sec
Extract data page statistics for F64/extract_statistics/F64                                                1.00     11.6±0.15µs        ? ?/sec    1.33     15.4±0.30µs        ? ?/sec
Extract data page statistics for Int64/extract_statistics/Int64                                            1.00     13.2±0.17µs        ? ?/sec    1.35     17.7±0.16µs        ? ?/sec
Extract data page statistics for String/extract_statistics/String                                          1.00     70.5±0.40µs        ? ?/sec    1.02     72.2±0.60µs        ? ?/sec
Extract data page statistics for UInt64/extract_statistics/UInt64                                          1.00     11.8±0.41µs        ? ?/sec    1.34     15.8±1.05µs        ? ?/sec
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00   1054.8±8.28ns        ? ?/sec    1.14  1205.3±20.01ns        ? ?/sec
Extract row group statistics for F64/extract_statistics/F64                                                1.00    538.6±7.15ns        ? ?/sec    1.84   989.6±10.53ns        ? ?/sec
Extract row group statistics for Int64/extract_statistics/Int64                                            1.00    542.8±5.83ns        ? ?/sec    1.81   979.8±12.40ns        ? ?/sec
Extract row group statistics for String/extract_statistics/String                                          1.00   1050.5±8.53ns        ? ?/sec    1.11  1163.6±17.81ns        ? ?/sec
Extract row group statistics for UInt64/extract_statistics/UInt64                                          1.00    540.4±6.24ns        ? ?/sec    1.82   985.5±11.18ns        ? ?/sec
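
To read the relative columns in the table above: each row's faster side is normalized to 1.00 and the other side shows how many times slower it is. A small sketch of that arithmetic follows, using the Int64 row group numbers from the table. The function names are hypothetical, and the formulas are an assumption about how the comparison is computed rather than the tool's actual code.

```rust
/// How many times slower the baseline is than the candidate.
/// (Hypothetical helper; assumes the relative column is baseline / candidate.)
pub fn slowdown_ratio(candidate_ns: f64, baseline_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// Percentage of the baseline time that the candidate saves.
pub fn percent_time_saved(candidate_ns: f64, baseline_ns: f64) -> f64 {
    (1.0 - candidate_ns / baseline_ns) * 100.0
}
```

For the Int64 row group row, `slowdown_ratio(542.8, 979.8)` is about 1.805, which rounds to the 1.81 shown, and `percent_time_saved(542.8, 979.8)` is about 44.6%.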

alamb added a commit that referenced this pull request Jan 30, 2026
…om Vec and `from_trusted_len_iter` (#9299)

# Which issue does this PR close?

- part of #9298


# Rationale for this change

While reviewing #9294 from @Dandandan, I noticed some other places where we
can avoid making `ArrayData` and thus save some allocations (and `unsafe`).

I don't expect this to make a huge performance difference, but every
little allocation helps, and I think the change is justified simply from
the perspective of avoiding some more `unsafe`.


# What changes are included in this PR?

Construct primitive arrays directly

# Are these changes tested?

By existing CI

# Are there any user-facing changes?
