arch/arm: New arch_switch() based context layer for Cortex M #85248

Merged
henrikbrixandersen merged 35 commits into zephyrproject-rtos:main from andyross:arm-m-switch
Mar 10, 2026

andyross commented Feb 5, 2025

  1. Mostly complete. Supports MPU, userspace, PSPLIM-based stack guards, and FPU/DSP features. ARMv8-M secure mode "should" work but I don't know how to test it.

  2. Designed with an eye to uncompromising/best-in-industry cooperative context switch performance. No PendSV exception nor hardware stacking/unstacking, just a traditional "musical chairs" switch. Context gets saved on process stacks only instead of split between there and the thread struct. No branches in the core integer switch code (and just one in the FPU bits that can't be avoided).

  3. Minimal assembly use; arch_switch() itself is ALWAYS_INLINE, there is an assembly stub for exception exit, and that's it beyond one/two instruction inlines elsewhere.

  4. Selectable at build time, interoperable with existing code. Just use the pre-existing CONFIG_USE_SWITCH=y flag to enable it. Or turn it off to evade regressions as this stabilizes.

  5. Exception/interrupt returns in the common case need only a single C function to be called at the tail, and then return naturally. Effectively "all interrupts are direct now". This isn't a benefit currently because the existing stubs haven't been removed (see item 4), but in the long term we can look at exploiting this. The boilerplate previously required is now (mostly) empty.

  6. No support for ARMv6-M (Cortex-M0 et al.) Thumb code. The expanded instruction encodings in ARMv7-M are a big (big) win, so the older cores really need a separate port to avoid impacting newer hardware. Thankfully there isn't that much code to port (see item 3), so this should be doable.

Fixes #79069

andyross commented Feb 5, 2025

This is finally looking good enough to submit, let's see how it runs in CI. First, it's important to note that @ithinuel has an entirely different arch_switch() implementation in #85080 that everyone should review too. That one is a relatively straight-line evolution of the current PendSV implementation. This one is (as I'm sure surprises no one) more of a rewrite, using a "normal" context switch. Really I don't see any reason why both shouldn't be able to merge: this will likely take some time to stabilize and we'd want to be maintaining the old stuff in parallel anyway.

The big advantages to this one over that one:

  1. Smaller. I worked really hard to limit code size for performance reasons. And there's more fruit to pick: the thread struct can lose all the still-present slots for the callee-saved registers that now live on the stack, and lots of the legacy fault handlers have boilerplate that now duplicates the exit code that runs out of a regular C handler.

  2. Bigger, heh. Well, more complete. This works with the PSPLIM stack guard feature (which btw: we have very poor test coverage of!) and with FPU hardware (which we have almost no coverage of: there's only one in-tree qemu FPU platform and it doesn't run in CI). And as I understand the architecture, secure mode should ("should") work too, but I don't have a system to test with.

  3. It's actually kinda scary fast, which is what I was hoping to see. The microbenchmark at the end of the series is showing about 60% improvement in z_swap() on my FRDM-K64F vs. the current tree (just z_swap though, not all the other stuff!). It's tuned heavily for the common case of cooperative switching, using a custom entirely-on-process-stack frame format for suspended threads and not the one the hardware emits (there's a conversion step when threads switch on interrupt/exception exit).

  4. Legacy-free. No more ARCH_HAS_CUSTOM_SWAP_TO_MAIN or ARCH_HAS_THREAD_ABORT (and especially no more SWAP_NONATOMIC!), nor a custom arch_thread_return_value_set(). ARM Cortex M as of this patch looks like a "standard" Zephyr platform without any magic.

  5. Minimal impact on existing code. The new context layer is in two new files with only ~130 lines of changes to existing code.

andyross and others added 11 commits March 3, 2026 10:59
Some toolchains don't support an __asm__(...) block at the top level
of a file and require that they live within function scope.  That's
not a hardship as these two blocks were defining callable functions
anyway.  Exploit the "naked" attribute to avoid wasted bytes in unused
entry/exit code.

Signed-off-by: Andy Ross <andyross@google.com>
I'm at a loss here.  The feature this test case wants to see (on
ARMv7M, not ARMv6M) is the ability to take an irq_lock() inside a
system call and then see that future system calls from the same thread
continue to hold the lock.

That's not documented AFAICT.  It's also just a terrible idea because
either:

1. The obvious denial of service implications if user code is allowed
   to run in an unpreemptible mode, or:

2. The broken locking promise if this is implemented to release the
   lock and reacquire it in an attempt to avoid #1.

(FWIW: my read of the code is that #1 is the current implementation.
But hilariously the test isn't able to tell the difference!)

And in any case it's not how any of our other platforms work (or can
work, in some cases), making this a non-portable system call
API/feature at best.

Leave it in place for now out of conservatism, but disable with the
new arch_switch() code, whose behavior matches that of other Zephyr
userspaces.

Signed-off-by: Andy Ross <andyross@google.com>
The exit from the SVC exception used for syscalls back into the
calling thread is done without locking.  This means that the
intermediate states can be interrupted while the kernel-mode code is
still managing thread state like the mode bit, leading to mismatches.
This seems mostly robust when used with PendSV (though I'm a little
dubious), but the new arch_switch() code needs to be able to suspend
such an interrupted thread and restore it without going through a full
interrupt entry/exit again, so it needs locking for sure.

Take the lock unconditionally before exiting the call, and release it
in the thread once the magic is finished, just before calling the
handler.  Then take it again before swapping stacks and dropping
privilege.

Even then there is a one-cycle race where the interrupted thread has
dropped the lock but still has privilege (the nPRIV bit is clear in
CONTROL).  This thread will be resumed later WITHOUT privilege, which
means that trying to set CONTROL will fail.  So there's detection of
this 1-instruction race that will skip over it.

Signed-off-by: Andy Ross <andyross@google.com>
The ARM Ltd. FVP emulator (at least the variants run in Zephyr CI)
appears to have a bug with the stack alignment bit in xPSR.  It's
common (it fails in the first 4-6 timer interrupts in
tests.syscalls.timeslicing) that we'll take an interrupt from a
seemingly aligned (!) stack with the bit set.  If we then switch and
resume the thread from a different context later, popping the stack
goes wrong (more so than just a misalignment of four bytes: I usually
see it too low by 20 bytes) in a way that it doesn't if we return
synchronously.  Presumably legacy PendSV didn't see this because it
used the unmodified exception frame.

Work around this by simply assuming all interrupted stacks were
aligned and clearing the bit.  That is NOT correct in the general
case, but in practice it's enough to get tests to pass.

Signed-off-by: Andy Ross <andyross@google.com>
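The workaround in the commit message above can be pictured with a small, host-runnable sketch. It is not the code from this PR: the macro and function names are hypothetical, and it only models the bit manipulation. The one architectural fact it relies on is that bit 9 of the xPSR word stacked on exception entry records whether the hardware inserted 4 bytes of padding to align the stack.

```c
#include <stdint.h>

/* Bit 9 of the *stacked* xPSR: set by hardware when it added 4 bytes of
 * padding to 8-byte-align the stack on exception entry.
 */
#define STACKED_XPSR_ALIGN_PAD (UINT32_C(1) << 9)

/* Hypothetical helper (not from the PR): treat every interrupted stack as
 * already aligned by clearing the padding bit, so a later unstack of this
 * frame will not skip a padding word.
 */
static inline uint32_t xpsr_assume_aligned(uint32_t stacked_xpsr)
{
	return stacked_xpsr & ~STACKED_XPSR_ALIGN_PAD;
}
```

As the commit message itself says, this is not correct in the general case: a frame that really was padded would be unstacked 4 bytes off.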
…tion

A nested exception can occur in arm_m_exc_exit after interrupts are
re-enabled but before branching to the EXC_RETURN value. In that case,
the nested exception stacks an exception stack frame (ESF) on MSP.

arm_m_exc_tail() unconditionally rewrites the LR slot on the active
stack to redirect execution to arm_m_exc_exit. If a nested exception
has stacked an ESF on MSP, this rewrite corrupts the stacked xPSR
field, leading to a UsageFault ("Illegal use of EPSR") on exception
return.

Guard the LR rewrite so that it is only performed when the exception
is returning to Thread mode using PSP. This ensures that the rewrite
does not interfere with ESFs stacked on MSP during nested exceptions.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
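A minimal sketch of the guard condition described in the commit message above, assuming the check is made on the EXC_RETURN value (the real arm_m_exc_tail() may test this differently). In EXC_RETURN, bit 3 selects a return to Thread mode and bit 2 selects PSP as the return stack; the macro and function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 3: 1 = return to Thread mode, 0 = return to Handler mode */
#define EXC_RETURN_MODE_THREAD (UINT32_C(1) << 3)
/* EXC_RETURN bit 2: 1 = unstack from PSP, 0 = unstack from MSP */
#define EXC_RETURN_SPSEL_PSP   (UINT32_C(1) << 2)

/* Hypothetical predicate: true only when the exception returns to Thread
 * mode on the process stack, i.e. the only case in which rewriting the
 * stacked LR slot is assumed to be safe.
 */
static inline bool exc_return_is_thread_psp(uint32_t exc_return)
{
	const uint32_t mask = EXC_RETURN_MODE_THREAD | EXC_RETURN_SPSEL_PSP;

	return (exc_return & mask) == mask;
}
```

With this predicate, 0xFFFFFFFD (Thread mode, PSP) passes, while 0xFFFFFFF9 (Thread mode, MSP) and 0xFFFFFFF1 (Handler mode, MSP) do not, so a nested exception frame stacked on MSP is left untouched.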
USE_SWITCH code unconditionally applied interrupt locking, which altered
BASEPRI handling and broke expected interrupt behavior on both
Baseline and Mainline CPUs when USE_SWITCH was disabled.

This commit restores the original behavior with USE_SWITCH disabled and
fixes tests/arch/arm/arm_interrupt failures.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
The z_arm_pendsv vector doesn't exist on USE_SWITCH=y builds

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Change the MPU alignment to fix the CI failure below for
nrf52840dk/nrf52840:

```
padding_section' will not fit in region `RAM'
arm-zephyr-eabi/bin/ld.bfd: region `RAM' overflowed by 54928 bytes
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
```

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Documented Cortex-M switch helper interfaces in arm-m-switch.h with
Doxygen blocks.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
orr fix is as reported in review:
```
The add causes a crash with IAR tools as the address loaded to r8
already has the lowest bit set, and the add causes it to be set to ARM
mode. The orr instruction works fine with both scenarios
```

`UDF 0` seems to break on IAR but `UDF #0` works for all.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
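The add-versus-orr issue in the commit message above comes down to plain integer arithmetic on the Thumb bit (bit 0 of a branch target). The following host-runnable sketch models it; the function names and the example addresses are illustrative assumptions, not code from the PR.

```c
#include <stdint.h>

/* Bit 0 of an interworking branch target selects Thumb state.
 * Cortex-M cores only execute Thumb code, so it must end up set.
 */
#define THUMB_BIT UINT32_C(1)

/* Models "add rX, #1": only correct if bit 0 was clear beforehand */
static inline uint32_t thumb_target_add(uint32_t addr)
{
	return addr + THUMB_BIT;
}

/* Models "orr rX, #1": correct whether or not bit 0 was already set */
static inline uint32_t thumb_target_orr(uint32_t addr)
{
	return addr | THUMB_BIT;
}
```

For an address such as 0x08000100 both variants give 0x08000101. If the toolchain already set the bit (0x08000101, as reported for IAR), the add yields 0x08000102 with bit 0 clear, while the orr leaves the value unchanged.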
Fix the issues below when building hello world with armclang:
```
Error: L6218E: Undefined symbol z_arm_exc_exit (referred from reset.o).
Error: L6218E: Undefined symbol z_arm_int_exit (referred from reset.o).
Error: L6218E: Undefined symbol z_arm_pendsv (referred from reset.o).
```

Signed-off-by: Sudan Landge <sudan.landge@arm.com>

teburd commented Mar 6, 2026

Posting some files here to hopefully show what is wrong

With PR
twister.json
twister.log

Without PR
twister.json
twister.log

wearyzen commented Mar 6, 2026

> I see some additional faults on my boards with this PR when running kernel tests which is probably a good canary that something isn't still quite right somewhere.

Hi @teburd, could you share more details about the failures? I can try to reproduce it with qemu/fvp. BTW, non-secure boards are expected to fail.

teburd commented Mar 6, 2026

> > I see some additional faults on my boards with this PR when running kernel tests which is probably a good canary that something isn't still quite right somewhere.
>
> Hi @teburd, could you share more details about the failures? I can try to reproduce it with qemu/fvp. BTW, non-secure boards are expected to fail.

I don't know what a non-secure board is exactly. Do you mean an ARMv8-M device without TrustZone? If so, yes, that's exactly what I see: a Cortex-M55 without TrustZone failing.

wearyzen commented Mar 8, 2026

> I don't know what a non-secure board is exactly, do you mean an armv8m device without trustzone? yes that's exactly what I see. A cortex-m55 without trustzone failing.

In Zephyr terms, a non-secure board is the _ns variant of a board, which has CONFIG_TRUSTED_EXECUTION_NONSECURE set to y.

Could you provide more specifics on which tests failed and how to reproduce them?
The upstream board mps3/corstone300/fvp also has a Cortex-M55 and has passed all the existing tests in CI, so we could try to reproduce the issue with this board if we know what failed.

teburd commented Mar 8, 2026

> > I don't know what a non-secure board is exactly, do you mean an armv8m device without trustzone? yes that's exactly what I see. A cortex-m55 without trustzone failing.
>
> In Zephyr terms non-secure board is the _ns variant of the board having CONFIG_TRUSTED_EXECUTION_NONSECURE set to y.
>
> Could you provide more specific of which test failed and how to reproduce it in? The upstream board mps3/corstone300/fvp also has Cortex-M55 and has passed all the existing tests in ci, we could try to reproduce the issue with this board if we know what failed.

I linked the twister logs and reports above. There are about 15 or so additional test failures with this PR. Some are failing on hardware exceptions.

USE_SWITCH is a new feature and needs more testing before it is enabled
by default. While all tests in upstream Zephyr CI passed, keeping this
config disabled helps get the majority of the work in without causing
regressions on upstream boards that are not tested in CI.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>

wearyzen commented Mar 9, 2026

Note for reviewers: CONFIG_USE_SWITCH is disabled by default to avoid any conflicts for the release and to get the majority of the work merged. Please drop the last commit if you plan to test this feature locally.

cfriedt left a comment

Thanks for preserving history and making commits on top of the original ones. That makes reviewing much easier. Disabling CONFIG_USE_SWITCH by default is a good idea as well: it allows people to easily test the feature while also ensuring there are no unintended side effects.

I think it would make sense to try to get this feature in before 4.4.0.

wearyzen commented Mar 9, 2026

@teburd I found test failures in both sets of logs (with and without the PR). I'm not sure if that's expected, but the failures below do not appear in the "without_pr" logs, so I'm focusing only on them:

kernel.events on kit_pse84_eval/pse846gps2dbzc4a/m55 failed (Testsuite mismatch)
kernel.cache.api on kit_pse84_eval/pse846gps2dbzc4a/m55 failed (Testsuite mismatch)
kernel.tickless.concept on kit_psc3m5_evk/psc3m5fds2afq1 failed (Testsuite failed)
kernel.tickless.concept on kit_pse84_eval/pse846gps2dbzc4a/m55 failed (Testsuite failed)
kernel.tickless.concept on kit_pse84_eval/pse846gps2dbzc4a/m33 failed (Testsuite failed)
kernel.scheduler.slice_perthread on kit_pse84_eval/pse846gps2dbzc4a/m33 failed (Unknown Error)
kernel.common.timing on kit_psc3m5_evk/psc3m5fds2afq1 failed (Testsuite failed)
kernel.common.tls on kit_pse84_eval/pse846gps2dbzc4a/m55 failed (Testsuite mismatch)

The twister logs don't clearly explain the cause of the failures (some are just "Testsuite mismatch") and I couldn't find a crash associated with these tests.
I tried to reproduce the issue on mps2/an521/cpu0, which is a QEMU board with a Cortex-M33, and on mps3/corstone300/fvp, which is an Arm FVP simulation board with a Cortex-M55, but was not able to reproduce it.
Would it be possible for you to run only the above tests individually with CONFIG_USE_SWITCH enabled and provide separate failure logs for each?

wearyzen commented Mar 9, 2026

I removed myself as assignee because I think one of the release maintainers would be the right assignee in this case.

@@ -0,0 +1,5 @@
CONFIG_ZTEST=y
CONFIG_MULTITHREADING=n

@andyross note that this test fails to configure for targets whose default-configured components depend on CONFIG_MULTITHREADING.
We got at least 28 failures in main due to this: https://github.com/zephyrproject-rtos/zephyr/actions/runs/23102089038
warning: EVENTS (defined at kernel/Kconfig:758) has direct dependencies MULTITHREADING with value n, but is currently being y-selected by the following symbols:
It can be reproduced, for example, with:

mkdir build && cd build 
cmake -GNinja -DBOARD=nrf52840_mdk_usb_dongle/nrf52840 ../tests/arch/arm/arm_switch/

Maybe the test should be limited to some platforms with platform_allow or have tighter filtering?

CC: @wearyzen

Pushed a change to fix this: #105588

Projects

Status: 4.4

Development

Successfully merging this pull request may close these issues.

Improve Arm (And other Arch) Context Swap Performance