Optimised the Decode::decode for [T; N] #299
Conversation
Robbepop
left a comment
Thanks a lot for this optimization!
Some benchmarks comparing before and after this PR would be really nice.
It would be cool to have some benchmark tests. I tried to build some heavy contracts in ink!; if you want, I can create a PR in the ink! repository to check the output.
```rust
let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
let typed: [T; N] = unsafe { ptr::read(ref_typed) };
Ok(typed)
```
Suggested change:

```diff
-let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
-let typed: [T; N] = unsafe { ptr::read(ref_typed) };
-Ok(typed)
+Ok(unsafe { mem::transmute(array.into_inner()) })
```
Why can we not do this?
Because we can't do that conversion due to this issue: rust-lang/rust#61956
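For context, a minimal sketch of the limitation and the workaround (the `cast` function below is hypothetical, not the PR's code): a direct `mem::transmute` between two arrays whose length is a const generic parameter is rejected, because the compiler cannot prove the sizes are equal (rust-lang/rust#61956), so the conversion has to go through a pointer read instead.

```rust
use core::{mem::ManuallyDrop, ptr};

// This direct version does NOT compile on stable Rust (error[E0512],
// "cannot transmute between types of different sizes"), even though the
// sizes are obviously equal for every N:
//
//     fn cast<const N: usize>(a: [u8; N]) -> [i8; N] {
//         unsafe { mem::transmute(a) }
//     }
//
// Workaround: prevent the source from being dropped, then reinterpret the
// bytes via a raw-pointer cast and `ptr::read`.
fn cast<const N: usize>(a: [u8; N]) -> [i8; N] {
    let a = ManuallyDrop::new(a);
    unsafe { ptr::read(&*a as *const [u8; N] as *const [i8; N]) }
}
```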
Ahh okay. Why don't we need to drop `array`? I assume this will then be done when dropping `typed`?
`array` and `typed` use the same memory. We return `typed` from the function, so the compiler will not call drop on that variable. But it would drop `array`. So, to avoid dropping the shared memory twice, we must not drop `array` =)
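The ownership trick described above can be sketched in isolation (the `duplicate_and_keep_one` function is a hypothetical illustration, not the PR's code): the `ManuallyDrop` wrapper and the `ptr::read` copy alias the same payload, and only the returned value ever runs destructors.

```rust
use core::{mem::ManuallyDrop, ptr};

// `array` and `typed` share one set of heap allocations. `array` is wrapped
// in ManuallyDrop, so its destructors never run; the returned `typed` is the
// sole owner, and dropping it frees the Strings exactly once.
fn duplicate_and_keep_one() -> [String; 2] {
    let array: ManuallyDrop<[String; 2]> =
        ManuallyDrop::new(["a".to_string(), "b".to_string()]);
    // Bitwise copy of the payload out of the ManuallyDrop wrapper.
    let typed: [String; 2] = unsafe { ptr::read(&*array) };
    typed
}
```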
Maybe you could add some comments explaining this to people looking at this code in the future.
> Maybe you could add some comments explaining this to people looking at this code in the future.
Sounds like a good idea :)
```rust
macro_rules! decode {
    ( u8 ) => {{
        let mut array: ManuallyDrop<[u8; N]> = ManuallyDrop::new([0; N]);
        input.read(&mut array[..])?;
```
This is a memory leak if read returns an error.
Good catch! =) @Robbepop, I brought back the usage of `forget`.
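The leak-free shape of the fix can be sketched as follows (hedged: `decode_bytes` and the `read` helper below are hypothetical stand-ins, not the crate's actual `Input` API). Because `array` is a plain value rather than `ManuallyDrop`, an early return via `?` drops it normally; `mem::forget` is only called after the duplicating `ptr::read`, so the shared bytes are never dropped twice and never leaked on the error path.

```rust
use core::{mem, ptr};

fn decode_bytes<const N: usize>(input: &mut &[u8]) -> Result<[u8; N], &'static str> {
    let mut array = [0u8; N];
    read(input, &mut array)?; // early return drops `array` normally -- no leak
    let typed: [u8; N] = unsafe { ptr::read(&array) };
    mem::forget(array); // avoid dropping the duplicated memory twice
    Ok(typed)
}

// Hypothetical stand-in for `Input::read`: copy bytes and advance the slice.
fn read(input: &mut &[u8], into: &mut [u8]) -> Result<(), &'static str> {
    if input.len() < into.len() {
        return Err("Not enough data to fill buffer");
    }
    let (head, tail) = input.split_at(into.len());
    into.copy_from_slice(head);
    *input = tail;
    Ok(())
}
```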
```rust
    ( i8 ) => {{
        let mut array: ManuallyDrop<[i8; N]> = ManuallyDrop::new([0; N]);
        let bytes = unsafe { mem::transmute::<&mut [i8], &mut [u8]>(&mut array[..]) };
        input.read(bytes)?;
```
I'd like to see some benchmarks for this approach before we merge this. Some code I hacked together that should not be cargo-culted into this PR:

```rust
use core::{any::type_name, time::Duration};
use criterion::{black_box, criterion_group, criterion_main, BatchSize, Criterion};
use parity_scale_codec::{Codec, Decode, Encode};

fn encode_decode_array<T: Codec + Default + Copy, const N: usize>(c: &mut Criterion) {
    c.bench_function(
        &format!("array_encode_{}", type_name::<[T; N]>()),
        |bencher| {
            bencher.iter_batched_ref(
                || [T::default(); N],
                |array| {
                    let _ = black_box(array.encode());
                },
                BatchSize::SmallInput,
            )
        },
    );
    c.bench_function(
        &format!("array_decode_{}", type_name::<[T; N]>()),
        |bencher| {
            let input = vec![0x00; N * core::mem::size_of::<T>()];
            bencher.iter_batched_ref(
                || &input[..],
                |input| {
                    let _ = black_box(<[T; N]>::decode(input));
                },
                BatchSize::SmallInput,
            )
        },
    );
}

criterion_group! {
    name = benches;
    config = Criterion::default().warm_up_time(Duration::from_millis(500)).without_plots();
    targets =
        encode_decode_array::<u8, 8192>,
        encode_decode_array::<u16, 8192>,
        encode_decode_array::<u32, 8192>,
        encode_decode_array::<u64, 8192>,
        encode_decode_array::<i8, 8192>,
        encode_decode_array::<i16, 8192>,
        encode_decode_array::<i32, 8192>,
        encode_decode_array::<i64, 8192>,
}

criterion_main!(benches);
```
@xgreenx if you fix the merge conflicts, we can merge this.
```
# Conflicts:
#	tests/max_encoded_len_ui/crate_str.stderr
#	tests/max_encoded_len_ui/incomplete_attr.stderr
#	tests/max_encoded_len_ui/missing_crate_specifier.stderr
#	tests/max_encoded_len_ui/not_encode.stderr
```
I turned off all my heavy programs. After that, I ran the bench for an old version and for this PR:

Gnuplot not found, using plotters backend
array_encode_[u8; 8192] time: [2.0362 us 2.0728 us 2.1075 us]
change: [-2.5540% +3.7079% +11.535%] (p = 0.29 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
array_decode_[u8; 8192] time: [4.9179 us 4.9317 us 4.9457 us]
change: [-89.768% -89.711% -89.652%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
array_encode_[u16; 8192]
time: [2.0247 us 2.0612 us 2.0911 us]
change: [-3.3047% +2.5467% +8.6157%] (p = 0.40 > 0.05)
No change in performance detected.
array_decode_[u16; 8192]
time: [6.3075 us 6.3515 us 6.4100 us]
change: [-90.291% -90.178% -90.071%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
array_encode_[u32; 8192]
time: [2.0151 us 2.0496 us 2.0765 us]
change: [-6.2318% -0.2958% +6.1413%] (p = 0.92 > 0.05)
No change in performance detected.
array_decode_[u32; 8192]
time: [9.4854 us 9.5448 us 9.6006 us]
change: [-86.855% -86.700% -86.573%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) low severe
13 (13.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
array_encode_[u64; 8192]
time: [2.0011 us 2.0363 us 2.0634 us]
change: [-7.4003% -1.2854% +4.8868%] (p = 0.70 > 0.05)
No change in performance detected.
array_decode_[u64; 8192]
time: [26.624 us 26.735 us 26.877 us]
change: [-61.303% -61.145% -60.962%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
array_encode_[u128; 8192]
time: [2.0079 us 2.0423 us 2.0688 us]
change: [-6.2668% -0.4866% +5.6703%] (p = 0.87 > 0.05)
No change in performance detected.
array_decode_[u128; 8192]
time: [55.277 us 55.359 us 55.448 us]
change: [-44.229% -43.987% -43.754%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
array_encode_[i8; 8192] time: [1.9977 us 2.0339 us 2.0622 us]
change: [-8.6504% -2.4910% +4.5759%] (p = 0.46 > 0.05)
No change in performance detected.
array_decode_[i8; 8192] time: [4.9660 us 4.9858 us 5.0075 us]
change: [-88.336% -88.153% -87.990%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
array_encode_[i16; 8192]
time: [1.9990 us 2.0354 us 2.0635 us]
change: [-6.6755% -0.7210% +5.7634%] (p = 0.82 > 0.05)
No change in performance detected.
array_decode_[i16; 8192]
time: [6.3097 us 6.4007 us 6.5550 us]
change: [-90.154% -90.079% -89.972%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
array_encode_[i32; 8192]
time: [2.0158 us 2.0499 us 2.0766 us]
change: [-5.8196% +0.3042% +7.0575%] (p = 0.93 > 0.05)
No change in performance detected.
array_decode_[i32; 8192]
time: [9.3346 us 9.4024 us 9.4697 us]
change: [-86.940% -86.847% -86.754%] (p = 0.00 < 0.05)
Performance has improved.
array_encode_[i64; 8192]
time: [2.0141 us 2.0470 us 2.0723 us]
change: [-7.0847% -1.0581% +5.5361%] (p = 0.74 > 0.05)
No change in performance detected.
array_decode_[i64; 8192]
time: [26.394 us 26.441 us 26.504 us]
change: [-62.515% -62.179% -61.891%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
array_encode_[i128; 8192]
time: [2.0118 us 2.0505 us 2.0828 us]
change: [-6.4327% -0.4668% +6.0002%] (p = 0.88 > 0.05)
No change in performance detected.
array_decode_[i128; 8192]
time: [55.349 us 55.422 us 55.492 us]
change: [-44.835% -44.371% -43.987%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe

My system is little-endian =)
Great! Thanks for the benchmark results!
And so is WebAssembly. ;)
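For readers wondering why endianness comes up at all: SCALE encodes fixed-width integers in little-endian byte order, so on little-endian targets (x86-64, and WebAssembly as noted above) the wire bytes already match the in-memory representation, and decoding an integer array can be a plain byte copy with no per-element byte swapping. A minimal sketch:

```rust
// SCALE integers are little-endian on the wire; on a little-endian target
// `from_le_bytes` compiles to a plain copy, with no byte swap.
fn decode_u32(wire: [u8; 4]) -> u32 {
    u32::from_le_bytes(wire)
}
```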
/tip medium
We will fix the tip bot to work in this repo, but in the meantime @xgreenx, can you please update your first message in this PR to include your Polkadot address? Then we can send a tip to you when this PR is merged.
/tip medium
Please fix the following problems before calling the tip bot again:
/tip medium
A medium tip was successfully submitted for xgreenx (1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe on polkadot).
Added more comments. Implemented traits for `[T; N]` arrays. It has many usages of `unsafe`, but I made the same change in this PR: paritytech/parity-scale-codec#299
This change appears after this PR in ink!.
The initial `decode` is complex and takes a lot of space in a binary file. `general_array_decode` in this PR reduces the size of the function 2 times. But the native implementation for integers reduces it even more (the same idea is used in `encode_slice_no_len`). It saves 518 bytes in the `Erc20` example =)

polkadot address: 1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe
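As an aside, the generic per-element path that the integer specialization avoids can be sketched like this (a hedged, hypothetical reconstruction; the crate's actual `general_array_decode` may differ): each element is decoded through `MaybeUninit`, with already-initialized elements dropped on failure, which is exactly the bookkeeping that makes the generic version large compared with a single byte copy.

```rust
use core::mem::MaybeUninit;

// Decode N elements one at a time into an uninitialized array. On error,
// drop the elements initialized so far before propagating the failure.
fn general_array_decode<T, E, const N: usize>(
    mut decode_one: impl FnMut() -> Result<T, E>,
) -> Result<[T; N], E> {
    // An array of MaybeUninit is valid to "assume init" while still uninit.
    let mut array: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..N {
        match decode_one() {
            Ok(value) => {
                array[i].write(value);
            }
            Err(err) => {
                // Drop only the prefix that was actually initialized.
                for slot in &mut array[..i] {
                    unsafe { slot.assume_init_drop() };
                }
                return Err(err);
            }
        }
    }
    // All N slots are initialized; move the values out.
    Ok(array.map(|slot| unsafe { slot.assume_init() }))
}
```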