Fix cross-platform determinism on ARM64 NEON #1042

Closed
halx99 wants to merge 1 commit into erincatto:main from halx99:patch-1

@halx99

@halx99 halx99 commented Mar 23, 2026

CHANGES

  1. Add Windows ARM64 CI.
  2. Fix the samples build failure on Windows ARM64.
  3. Fix DeterminismTest.CrossPlatformTest by disabling NEON FMA on native MSVC. This avoids codegen rounding differences between MSVC and other toolchains.
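
Item 3 could be sketched roughly as follows. This is an illustrative sketch, not the patch itself: the `EXAMPLE_*` macros, `example_float4`, and `example_mul_add` are hypothetical names, and excluding clang-cl through `__clang__` is an assumption about what "native MSVC" means here. `_MSC_VER`, `_M_ARM64`, and `__ARM_NEON` are the compilers' own predefined macros. The sketch also assumes MSVC leaves a separate `vmulq_f32` + `vaddq_f32` pair unfused.

```c
#include <assert.h>

#if defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>
#define EXAMPLE_HAS_NEON 1
#else
#define EXAMPLE_HAS_NEON 0
#endif

/* Set only for native MSVC targeting ARM64. clang-cl also defines _MSC_VER,
   so it is excluded here (an assumption, see the lead-in). */
#if defined(_MSC_VER) && !defined(__clang__) && defined(_M_ARM64)
#define EXAMPLE_AVOID_NEON_FMA 1
#else
#define EXAMPLE_AVOID_NEON_FMA 0
#endif

/* Hypothetical 4-wide float type used only for this sketch. */
typedef struct { float v[4]; } example_float4;

/* Returns a + b * c for each of the four lanes. */
static example_float4 example_mul_add(example_float4 a, example_float4 b, example_float4 c)
{
    example_float4 r;
#if EXAMPLE_HAS_NEON
    float32x4_t va = vld1q_f32(a.v);
    float32x4_t vb = vld1q_f32(b.v);
    float32x4_t vc = vld1q_f32(c.v);
#if EXAMPLE_AVOID_NEON_FMA
    /* Explicit multiply followed by add instead of the multiply-accumulate intrinsic. */
    float32x4_t vr = vaddq_f32(va, vmulq_f32(vb, vc));
#else
    /* Other toolchains keep the intrinsic and rely on -ffp-contract=off. */
    float32x4_t vr = vmlaq_f32(va, vb, vc);
#endif
    vst1q_f32(r.v, vr);
#else
    /* Scalar fallback so the sketch also builds on non-ARM hosts. */
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i] * c.v[i];
#endif
    return r;
}

/* Usage example with inputs whose result is exact on every code path. */
static void example_mul_add_self_check(void)
{
    example_float4 a = {{0.5f, 0.5f, 0.5f, 0.5f}};
    example_float4 b = {{4.0f, 4.0f, 4.0f, 4.0f}};
    example_float4 c = {{0.25f, 0.25f, 0.25f, 0.25f}};
    example_float4 r = example_mul_add(a, b, c);
    for (int i = 0; i < 4; ++i)
        assert(r.v[i] == 1.5f);
}
```

Keeping the guard in one macro means the fused path can be re-enabled later in a single place if MSVC gains a way to turn contraction off.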

Related issues

@halx99 halx99 marked this pull request as draft March 23, 2026 17:27
@erincatto
Owner

I will only support a platform that runs with GitHub actions and passes unit tests. Just building it is not enough.

@halx99 halx99 marked this pull request as ready for review March 25, 2026 04:29
@halx99
Author

halx99 commented Mar 25, 2026

Finally, I fixed the DeterminismTest by disabling NEON intrinsics in the Windows ARM64 build. See the CI check: https://github.com/halx99/box2d/actions/runs/23525079273/job/68476658553?pr=3

@halx99 halx99 marked this pull request as draft March 25, 2026 06:07
@halx99 halx99 changed the title Fix #1012 windows arm64 build Fix #1012 windows arm64 support Mar 25, 2026
@halx99 halx99 changed the title Fix #1012 windows arm64 support Fix windows arm64 support Mar 25, 2026
@halx99
Author

halx99 commented Mar 25, 2026

One more update: not using NEON FMA instructions is enough to solve the DeterminismTest issue.

@halx99 halx99 marked this pull request as ready for review March 25, 2026 06:33
@halx99 halx99 force-pushed the patch-1 branch 2 times, most recently from 4f47271 to 45682ec March 25, 2026 07:52
@halx99
Author

halx99 commented Mar 25, 2026

Why does Linux ARM64 work with the FMA intrinsic?

A: On Linux ARM64, the compiler lowers vmlaq_f32 to FMUL + FADD, but on Windows ARM64 the MSVC compiler emits it as-is, i.e. as FMLA.

Verified by disassembling the same C source.

C source using the NEON FMA intrinsic:

b2FloatW b2MulAddW( b2FloatW a, b2FloatW b, b2FloatW c )
{
	return vmlaq_f32( a, b, c );
}

Windows ARM64:

; Exported entry 249. b2MulAddW

EXPORT b2MulAddW
b2MulAddW
FMLA            V0.4S, V1.4S, V2.4S
RET
; End of function b2MulAddW

Linux ARM64:

EXPORT b2MulAddW
b2MulAddW
FMUL            V1.4S, V1.4S, V2.4S
FADD            V0.4S, V0.4S, V1.4S
RET
; End of function b2MulAddW

Android ARM64:

EXPORT b2MulAddW
b2MulAddW
FMUL            V1.4S, V1.4S, V2.4S
FADD            V0.4S, V1.4S, V0.4S
RET
; End of function b2MulAddW

macOS ARM64:

_b2MulAddW
FMUL            V1.4S, V1.4S, V2.4S
FADD            V0.4S, V1.4S, V0.4S
RET
; End of function _b2MulAddW

iOS ARM64:

_b2MulAddW
FMUL            V1.4S, V1.4S, V2.4S
FADD            V0.4S, V1.4S, V0.4S
RET
; End of function _b2MulAddW

Tested on Android ARM64: when -ffp-contract=off is removed, the generated code is:

EXPORT b2MulAddW
b2MulAddW
FMLA            V0.4S, V2.4S, V1.4S
RET
; End of function b2MulAddW
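
To illustrate why FMLA and FMUL + FADD can disagree, here is a minimal scalar sketch in C. It is not Box2D code and the `example_*` names are hypothetical. The unfused version rounds the product to float before the add. The single-rounding version stands in for FMLA and is emulated in double precision, which happens to be exact for the inputs shown but is not a general replacement for a true fused multiply-add.

```c
#include <assert.h>

/* Unfused a + b * c: the product is rounded to float first, like FMUL + FADD.
   The volatile store keeps the compiler from contracting the two steps. */
static float example_mul_add_unfused(float a, float b, float c)
{
    volatile float product = b * c;
    return a + product;
}

/* a + b * c with one rounding at the end, like FMLA. The product of two
   floats is exact in double (24 + 24 bits fit in 53), so for the inputs
   below the double sum is exact too. */
static float example_mul_add_single_rounding(float a, float b, float c)
{
    double exact_product = (double)b * (double)c;
    return (float)((double)a + exact_product);
}

/* Usage example: b * c = 1 + 2^-11 + 2^-24 exactly, which is a tie in float
   and rounds to 1 + 2^-11, so the unfused path loses the 2^-24 term. */
static void example_show_rounding_difference(void)
{
    float b = 1.0f + 1.0f / 4096.0f;    /* 1 + 2^-12 */
    float c = b;
    float a = -(1.0f + 1.0f / 2048.0f); /* -(1 + 2^-11) */

    float unfused = example_mul_add_unfused(a, b, c);
    float single = example_mul_add_single_rounding(a, b, c);

    assert(unfused == 0.0f);               /* product was rounded before the add */
    assert(single == 1.0f / 16777216.0f);  /* 2^-24 survives a single rounding */
}
```

A physics step repeats operations like this many times, so a last-bit difference of this kind in one multiply-add is enough to make two platforms diverge.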

@halx99
Author

halx99 commented Mar 25, 2026

Finally:

Root Cause of CrossPlatformTest Failure

  • On GCC/Clang (Linux/Android/iOS/macOS), the compiler behavior can be controlled with
    -ffp-contract=off. This forces vmlaq_f32/vmlsq_f32 to be lowered into
    separate FMUL + FADD instructions, matching the execution order used on
    Intel SSE/AVX.

  • On MSVC (Windows ARM64), the compiler always emits a fused FMLA instruction
    for these intrinsics. Even with /fp:strict, /fp:precise, or /fp:contract-,
    there is no way to disable contraction.

Conclusion:
The determinism test fails because non‑MSVC compilers can disable FMA contraction
to align with Intel’s implementation, while MSVC cannot. This difference in
compiler behavior leads to cross‑platform result mismatches.
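
A cross-platform determinism check has to compare results bit for bit, so even a one-bit rounding difference counts as a failure. The sketch below shows that kind of comparison under stated assumptions: it is not the actual DeterminismTest, and `example_float_bits` and `example_count_mismatched_lanes` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit IEEE 754 pattern. memcpy avoids
   the aliasing problems of a pointer cast. */
static uint32_t example_float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Count the lanes whose bit patterns differ between two result arrays,
   for example one produced with FMLA and one with FMUL + FADD. */
static int example_count_mismatched_lanes(const float *lhs, const float *rhs, int count)
{
    assert(lhs != NULL && rhs != NULL && count >= 0);
    int mismatches = 0;
    for (int i = 0; i < count; ++i)
    {
        if (example_float_bits(lhs[i]) != example_float_bits(rhs[i]))
            ++mismatches;
    }
    return mismatches;
}
```

Comparing bit patterns rather than using `==` on floats treats +0 and -0 as different and identical NaN payloads as equal, which is the stricter notion of "same result" that a determinism test needs.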

@halx99 halx99 changed the title Fix windows arm64 support Fix cross-platform determinism on ARM64 NEON Mar 25, 2026
@halx99
Author

halx99 commented Mar 26, 2026

I have no further changes on this PR, it’s ready for review.

halx99 added a commit to axmolengine/axmol that referenced this pull request Mar 27, 2026
- Update box2d to 3.2-04b0e92 with PR erincatto/box2d#1042
- Remove Rigidbody2D::setTag
- Make CollisionFilter work with box2d
- Make moving a rigidbody2d between nodes work
- Add API: Rigidbody2D::setMass and Rigidbody2D::setAutoMass
- Make rigidbody transform stable when changing between nodes
- Rigidbody2D: setRotationOffset when attach to world not necessary.
- Rename API: Rigidbody2D::setRotationEnable -> Rigidbody2D::setRotationEnabled
- Fix gravity units
- Disable commonBox rotation for Test: 'PhysicsSetGravityEnableTest'
- Fix TargetJoint2D not work when world gravity is 0, tests: Pump, Position/Rotation Test
- Fix Contact Test yellow leaving the screen
- Remove API: Collider2D::setContactMaskBits/getContactMaskBits
- Add API PhysicsWorld2D::rayCastClosest
- Simplify Physics 2d query callback
- Rename PhysicsRayCastInfo -> RayCastHit2D
- Rename RayCastHit2D::contact -> RayCastHit2D::point
- Remove start and end members of RayCastHit2D
- Re-struct 2d/3d physics folder
- Implement all box2d supported contact events
- Update luabindings and lua-tests
@erincatto
Owner

It is very cool you got this working. Thanks for looking into the determinism problem. I'll change the Neon code regardless of other factors.

You added 3 CI jobs for a platform I'm not sure anyone uses with Box2D. Can you justify this?

What is the real world use case for Box2D supporting this platform?

For this platform I would only want to support one CI job and it would be for unit tests, not samples. Samples take longer to build and are lower priority than unit tests.

- Add Windows ARM64 CI
- Fix the samples build failure on Windows ARM64
- Fix DeterminismTest.CrossPlatformTest by disabling NEON FMA on native MSVC. This avoids codegen rounding differences between MSVC and other toolchains
@halx99
Author

halx99 commented Mar 28, 2026

> You added 3 CI jobs for a platform I'm not sure anyone uses with Box2D. Can you justify this?

Done.

> What is the real world use case for Box2D supporting this platform?

For example, the Microsoft Surface Pro.

erincatto added a commit that referenced this pull request Mar 28, 2026
clean up
@erincatto erincatto mentioned this pull request Mar 28, 2026
erincatto added a commit that referenced this pull request Mar 28, 2026
Remove some unhelpful macros.
Removed Neon FMA usage.
#1033
Windows ARM64 support #1042
@erincatto
Owner

Implemented in #1045. Unit tests are passing.

@erincatto erincatto closed this Mar 28, 2026
@halx99 halx99 deleted the patch-1 branch March 29, 2026 03:57