shared: attempt a faster date <-> rata die algorithm #449
Conversation
cc @benjoffe - I hope you don't mind the ping. :-)
This comes from: https://www.benjoffe.com/fast-date-64

More specifically, the code is here: https://github.com/benjoffe/fast-date-benchmarks/blob/7fcf82b07d340ddbec866e15cfe333485439ab7f/algorithms/benjoffe_fast64.hpp

But the benchmarks don't show an improvement:

```
$ critcmp base x01
group                                                 base                          x01
-----                                                 ----                          ---
civil_datetime/to_timestamp_static/bundled/jiff       1.00    10.5±0.09ns  ? ?/sec  1.03    10.9±0.07ns  ? ?/sec
civil_datetime/to_timestamp_static/zoneinfo/jiff      1.00    10.4±0.09ns  ? ?/sec  1.03    10.8±0.07ns  ? ?/sec
timestamp/to_civil_datetime_offset_conversion/jiff    1.00     4.4±0.05ns  ? ?/sec  1.03     4.6±0.03ns  ? ?/sec
```

I ran the benchmarks like this:

```
cd bench
```

Before the change:

```
cargo bench -- '(civil_datetime/to_timestamp_static|timestamp/to_civil_datetime_offset_conversion).*jiff' --save-baseline base
```

And then after the change:

```
cargo bench -- '(civil_datetime/to_timestamp_static|timestamp/to_civil_datetime_offset_conversion).*jiff' --save-baseline x01
```

Then I used [`critcmp`] to compare them:

```
critcmp base x01
```

It's very possible I didn't port it correctly. I haven't scrutinized the codegen. It's also possible that there is an improvement, but that it's hard to write a benchmark using Jiff APIs to observe it.

(Note that I left out the ARM-specific bits. I'm testing this on x86-64. I wanted to test there first before digging into the platform-specific optimizations.)

[`critcmp`]: https://github.com/BurntSushi/critcmp
Hi, thanks for trying out the algorithm! I'm very keen to help out; it'll be great to see this used successfully in the wild. I should note up front that I have never programmed in Rust before, though it's been high on my list to try, and this is the perfect push. At first glance the port looks right, and if it passes the tests I assume it's correct.
And finally, perhaps you are testing this on an x64 Mac? There is a note at the end of my blog post that x64 Mac does not surface the same speed gains in the benchmarks (it's still faster, but by less), which I presume is due to power management features of the OS or chip activating when 64-bit math runs in a tight loop.

If it turns out that Rust is unable to compile the 128-bit arithmetic down to direct register access, then the 32-bit friendly alternative may end up faster in this language. It would be unfortunate to have to fall back to that, as it only achieves about 3/4 of the speed gains over Neri-Schneider, but it could be an interesting thing to test if you run out of ideas.

Finally, I note that the inverse function can be one cycle faster if you only need to support 32-bit inputs but are only targeting speed on 64-bit machines, by using 64-bit internal arithmetic. I'll be able to have a deeper dive later, hopefully in the next week or two.
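For reference, the 128-bit arithmetic in question boils down to a high-half multiply. A minimal sketch of the pattern (an illustration of the technique being discussed, not code from this PR):

```rust
/// Returns the high 64 bits of the 128-bit product `a * b`.
///
/// On x86-64, LLVM is expected to lower this to a single `mul`
/// instruction, with the high half of the product read from a
/// register; no real 128-bit arithmetic should be materialized.
#[inline(always)]
fn umulh(a: u64, b: u64) -> u64 {
    (((a as u128) * (b as u128)) >> 64) as u64
}
```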
I don't have hands on a keyboard, so excuse the short reply. But thank you for responding! rustc uses LLVM, so I would be very surprised if rudimentary constant folding or simple branch avoidance weren't happening. The 128-bit handling I'm less sure about, though. For your benchmarks, did you use gcc or clang? Either way, when I get hands on a keyboard, I'll post the codegen here (and show you how I got it). The previous Neri-Schneider implementation used similar techniques, and I had looked at its codegen and at least verified that there were no branches or div instructions. My benchmarks above were on a quiet Linux desktop with the CPU governor set to performance mode. Thanks again for taking a look!
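(For the curious, one common way to dump the codegen with stock cargo — a sketch; exact artifact names vary by crate and hash:)

```
# From the crate root: emit assembly for the library crate.
# The .s files land in target/release/deps/.
cargo rustc --release -- --emit asm

# Then search the generated assembly for the relevant symbols.
grep -n 'mul' target/release/deps/*.s | less
```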
Also, Jiff only requires 32-bit integers to represent rata die. Namely, Jiff only supports years -9999 to 9999.
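(A quick back-of-the-envelope check of that claim — my sketch, not code from Jiff:)

```rust
// ~20,000 years at no more than 366 days each is under 8 million days,
// which is far inside i32's range of roughly +/- 2.1 billion.
const MAX_ABS_RATA_DIE: i64 = 20_000 * 366;
const _: () = assert!(MAX_ABS_RATA_DIE < i32::MAX as i64);
```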
The benchmark table at the bottom of the blog post specifies the compilers used: for Windows it was MSVC 19.44, and on the Apple M4 Pro it was Apple clang 17.0.0. Since the date range is restricted to just ±9999, you don't need to hoist to 64-bit to safely use the more compact and faster code in the inverse:
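(What follows is a hypothetical sketch of the idea rather than the exact snippet from the blog post: with a 32-bit input, a fixed-point multiply by a 32-bit constant fits entirely in 64 bits, so no 128-bit widening is needed.)

```rust
// Hypothetical sketch of the trick: with `n` known to fit in 32 bits,
// floor(n * k / 2^32) can be computed entirely in 64-bit arithmetic,
// with no 128-bit widening multiply at all.
#[inline(always)]
fn mul_shift_32(n: u32, k: u32) -> u32 {
    (((n as u64) * (k as u64)) >> 32) as u32
}
```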
Upon further investigation, perhaps the issue is the usage of `(A as u128) * (B as u128)`.
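One way to test that theory (a suggestion, not something verified here) is to isolate the pattern so it's easy to find in the assembly:

```rust
// If this lowers to a single `mul` with the high half read from a
// register, the u128 widening multiply is not the culprit; a call into
// a runtime helper such as `__multi3` would point the other way.
#[inline(never)] // kept out-of-line on purpose so it's easy to spot
pub fn probe_mul_hi(a: u64, b: u64) -> u64 {
    (((a as u128) * (b as u128)) >> 64) as u64
}
```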