Improve `PrimitiveArray::from_iter` perf #9294

Dandandan · 2026-01-28T21:08:21Z

Which issue does this PR close?

Closes #NNN.

Rationale for this change

Speeds up `PrimitiveArray::from_iter`.

This speeds up array creation during statistics extraction when all values are present (the common case):

Extract row group statistics for Int64/extract_statistics/Int64
                        time:   [392.26 ns 394.25 ns 397.06 ns]
                        change: [−44.865% −44.674% −44.456%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Extract data page statistics for Int64/extract_statistics/Int64
                        time:   [8.8307 µs 8.8472 µs 8.8641 µs]
                        change: [−22.701% −22.399% −22.099%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Extract row group statistics for UInt64/extract_statistics/UInt64
                        time:   [391.21 ns 393.46 ns 396.43 ns]
                        change: [−44.227% −43.085% −41.444%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Extract data page statistics for UInt64/extract_statistics/UInt64
                        time:   [7.9090 µs 8.0075 µs 8.1958 µs]
                        change: [−48.323% −46.584% −44.593%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Extract row group statistics for F64/extract_statistics/F64
                        time:   [395.12 ns 395.86 ns 396.64 ns]
                        change: [−58.982% −57.663% −56.236%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Extract data page statistics for F64/extract_statistics/F64
                        time:   [8.9134 µs 8.9925 µs 9.1393 µs]
                        change: [−29.078% −25.866% −22.853%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
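The diff itself is not reproduced in this thread, so the following is only a std-only sketch of why the "all values present" case can be made cheap when collecting `Option` values. It is not the code from this PR: `SketchArray` and `from_options` are hypothetical names, and a `Vec<bool>` stands in for a packed null bitmap. The idea shown is a single pass that defers allocating the validity mask until the first `None` is seen, so a fully populated input never pays for it.

```rust
/// Hypothetical stand-in for a primitive array: values plus an optional validity mask.
/// (Assumption for illustration only; the real type is `PrimitiveArray`.)
#[derive(Debug, PartialEq)]
struct SketchArray {
    values: Vec<i64>,
    /// `None` means every slot is valid, so no mask was ever allocated.
    validity: Option<Vec<bool>>,
}

/// Collects optional values in one pass, allocating the mask lazily.
fn from_options<I>(iter: I) -> SketchArray
where
    I: IntoIterator<Item = Option<i64>>,
{
    let iter = iter.into_iter();
    // Reserve once, using the iterator's lower size bound.
    let (lower, _) = iter.size_hint();
    let mut values = Vec::with_capacity(lower);
    let mut validity: Option<Vec<bool>> = None;

    for item in iter {
        match item {
            Some(v) => {
                values.push(v);
                // Only touch the mask if an earlier null forced us to create it.
                if let Some(mask) = validity.as_mut() {
                    mask.push(true);
                }
            }
            None => {
                // First null: back-fill the mask with `true` for all prior slots.
                let len = values.len();
                let mask = validity.get_or_insert_with(|| vec![true; len]);
                mask.push(false);
                values.push(0); // placeholder value for the null slot
            }
        }
    }

    SketchArray { values, validity }
}

fn main() {
    let dense = from_options(vec![Some(1), Some(2), Some(3)]);
    assert!(dense.validity.is_none());

    let sparse = from_options(vec![Some(1), None, Some(3)]);
    assert_eq!(sparse.values, vec![1, 0, 3]);
    assert_eq!(sparse.validity, Some(vec![true, false, true]));
}
```

In the dense case the loop only pushes values, which is the shape of workload the statistics benchmarks above exercise.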

What changes are included in this PR?

alamb

Looks like a good improvement to me.

arrow-array/src/array/primitive_array.rs

Dandandan · 2026-01-29T13:08:52Z

run benchmark arrow_statistics

alamb · 2026-01-29T14:11:56Z

run benchmark arrow_statistics

Sorry, I think the VM runner got rebooted / wasn't working. I restarted it and now the queue is good.

alamb · 2026-01-29T14:12:03Z

show benchmark queue

alamb-ghbot · 2026-01-29T14:12:04Z

🤖 Hi @alamb, you asked to view the benchmark queue (#9294 (comment)).

Job	User	Benchmarks	Comment
`20055_3815459655.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20055#issuecomment-3815459655`
`20055_3815478475.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20055#issuecomment-3815478475`
`arrow-9294-3817591918.sh`	Dandandan	arrow_statistics	`https://github.com/apache/arrow-rs/pull/9294#issuecomment-3817591918`

alamb-ghbot · 2026-01-29T15:14:39Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing from_iter_speed (2a4ab06) to bd76edd diff
BENCH_NAME=arrow_statistics
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_statistics
BENCH_FILTER=
BENCH_BRANCH_NAME=from_iter_speed
Results will be posted here when complete

alamb-ghbot · 2026-01-29T15:20:36Z

🤖: Benchmark completed

Details

group                                                                                                      from_iter_speed                        main
-----                                                                                                      ---------------                        ----
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.12     82.1±0.98µs        ? ?/sec    1.00     73.5±0.73µs        ? ?/sec
Extract data page statistics for F64/extract_statistics/F64                                                1.00     12.2±0.18µs        ? ?/sec    1.27     15.6±0.08µs        ? ?/sec
Extract data page statistics for Int64/extract_statistics/Int64                                            1.00     14.0±0.24µs        ? ?/sec    1.26     17.6±0.52µs        ? ?/sec
Extract data page statistics for String/extract_statistics/String                                          1.12     81.5±0.48µs        ? ?/sec    1.00     72.8±0.31µs        ? ?/sec
Extract data page statistics for UInt64/extract_statistics/UInt64                                          1.00     12.3±0.04µs        ? ?/sec    1.25     15.5±0.13µs        ? ?/sec
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00  1015.3±12.19ns        ? ?/sec    1.15  1167.6±15.76ns        ? ?/sec
Extract row group statistics for F64/extract_statistics/F64                                                1.00    535.3±4.27ns        ? ?/sec    1.81   967.6±10.35ns        ? ?/sec
Extract row group statistics for Int64/extract_statistics/Int64                                            1.00    537.1±7.38ns        ? ?/sec    1.80   966.8±24.59ns        ? ?/sec
Extract row group statistics for String/extract_statistics/String                                          1.00  1006.9±32.83ns        ? ?/sec    1.12  1123.8±16.19ns        ? ?/sec
Extract row group statistics for UInt64/extract_statistics/UInt64                                          1.00   535.2±10.28ns        ? ?/sec    1.81    969.5±8.96ns        ? ?/sec

alamb · 2026-01-29T16:35:49Z

run benchmark arrow_statistics

alamb-ghbot · 2026-01-29T17:17:45Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing from_iter_speed (2a4ab06) to bd76edd diff
BENCH_NAME=arrow_statistics
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_statistics
BENCH_FILTER=
BENCH_BRANCH_NAME=from_iter_speed
Results will be posted here when complete

alamb-ghbot · 2026-01-29T17:23:43Z

🤖: Benchmark completed

Details

group                                                                                                      from_iter_speed                        main
-----                                                                                                      ---------------                        ----
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00     70.8±0.41µs        ? ?/sec    1.03     73.1±3.77µs        ? ?/sec
Extract data page statistics for F64/extract_statistics/F64                                                1.00     11.6±0.15µs        ? ?/sec    1.33     15.4±0.30µs        ? ?/sec
Extract data page statistics for Int64/extract_statistics/Int64                                            1.00     13.2±0.17µs        ? ?/sec    1.35     17.7±0.16µs        ? ?/sec
Extract data page statistics for String/extract_statistics/String                                          1.00     70.5±0.40µs        ? ?/sec    1.02     72.2±0.60µs        ? ?/sec
Extract data page statistics for UInt64/extract_statistics/UInt64                                          1.00     11.8±0.41µs        ? ?/sec    1.34     15.8±1.05µs        ? ?/sec
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00   1054.8±8.28ns        ? ?/sec    1.14  1205.3±20.01ns        ? ?/sec
Extract row group statistics for F64/extract_statistics/F64                                                1.00    538.6±7.15ns        ? ?/sec    1.84   989.6±10.53ns        ? ?/sec
Extract row group statistics for Int64/extract_statistics/Int64                                            1.00    542.8±5.83ns        ? ?/sec    1.81   979.8±12.40ns        ? ?/sec
Extract row group statistics for String/extract_statistics/String                                          1.00   1050.5±8.53ns        ? ?/sec    1.11  1163.6±17.81ns        ? ?/sec
Extract row group statistics for UInt64/extract_statistics/UInt64                                          1.00    540.4±6.24ns        ? ?/sec    1.82   985.5±11.18ns        ? ?/sec

alamb · 2026-01-29T19:38:50Z

Those are some pretty sweet results

Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00     70.8±0.41µs        ? ?/sec    1.03     73.1±3.77µs        ? ?/sec
Extract data page statistics for F64/extract_statistics/F64                                                1.00     11.6±0.15µs        ? ?/sec    1.33     15.4±0.30µs        ? ?/sec
Extract data page statistics for Int64/extract_statistics/Int64                                            1.00     13.2±0.17µs        ? ?/sec    1.35     17.7±0.16µs        ? ?/sec
Extract data page statistics for String/extract_statistics/String                                          1.00     70.5±0.40µs        ? ?/sec    1.02     72.2±0.60µs        ? ?/sec
Extract data page statistics for UInt64/extract_statistics/UInt64                                          1.00     11.8±0.41µs        ? ?/sec    1.34     15.8±1.05µs        ? ?/sec
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, String)    1.00   1054.8±8.28ns        ? ?/sec    1.14  1205.3±20.01ns        ? ?/sec
Extract row group statistics for F64/extract_statistics/F64                                                1.00    538.6±7.15ns        ? ?/sec    1.84   989.6±10.53ns        ? ?/sec
Extract row group statistics for Int64/extract_statistics/Int64                                            1.00    542.8±5.83ns        ? ?/sec    1.81   979.8±12.40ns        ? ?/sec
Extract row group statistics for String/extract_statistics/String                                          1.00   1050.5±8.53ns        ? ?/sec    1.11  1163.6±17.81ns        ? ?/sec
Extract row group statistics for UInt64/extract_statistics/UInt64                                          1.00    540.4±6.24ns        ? ?/sec    1.82   985.5±11.18ns        ? ?/sec
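To relate the relative column in the table to the percentage changes in the PR description, here is a small worked calculation. It assumes the relative column marks the faster side as 1.00 and reports the other side as a multiple of it; the helper names are hypothetical. The inputs are the Int64 row group means from the second run (979.8 ns on `main`, 542.8 ns on `from_iter_speed`). Because the table shows rounded means, ratios recomputed this way can differ slightly from the printed column for other rows.

```rust
/// How many times slower the baseline is than the candidate.
fn slowdown_ratio(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// Percentage change of the candidate relative to the baseline (negative = faster).
fn percent_change(baseline_ns: f64, candidate_ns: f64) -> f64 {
    (candidate_ns - baseline_ns) / baseline_ns * 100.0
}

fn main() {
    // Int64 row group statistics, second benchmark run in this thread.
    let main_ns = 979.8;
    let branch_ns = 542.8;

    let ratio = slowdown_ratio(main_ns, branch_ns);
    let change = percent_change(main_ns, branch_ns);

    // The ratio lands at about 1.81 and the change at about -44.6%.
    println!("main is {:.2}x the branch time", ratio);
    println!("branch time changed by {:.1}%", change);
}
```

A ratio of about 1.81 is therefore the same result as a reduction of roughly 44.6%, which is in line with the locally measured change for the Int64 row group benchmark in the PR description.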

@Dandandan

…om Vec and `from_trusted_len_iter` (#9299)

Which issue does this PR close?

- part of #9298

Rationale for this change

While reviewing #9294 from @Dandandan I noticed some other places where we can avoid making ArrayData and thus save some allocations (and `unsafe`).

I don't expect this to make a huge performance difference, but every little allocation helps, and I think the change is justified simply from the perspective of avoiding some more `unsafe`.

What changes are included in this PR?

Construct primitive arrays directly.

Are these changes tested?

By existing CI.

Are there any user-facing changes?

Improve from_iter

48bfd1f

github-actions bot added the arrow Changes to the arrow crate label Jan 28, 2026

Dandandan changed the title ~~Improve PrimitiveArray::from_iter~~ Improve PrimitiveArray::from_iter perf Jan 28, 2026

Improve from_iter

1f2bdf7

alamb approved these changes Jan 29, 2026

arrow-array/src/array/primitive_array.rs (outdated, resolved)

alamb mentioned this pull request Jan 29, 2026

Remove some unsafe and allocations when creating PrimitiveArrays from Vec and from_trusted_len_iter #9299

Merged

Avoid second allocation

2a4ab06

Dandandan merged commit 6c54276 into apache:main Jan 29, 2026
27 checks passed
