Andrea Parri spotted an issue with the amocas operation with the current seq_cst amo<op> mappings.
The issue stems from this sentence in the ratified zacas extension:
The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.
https://github.com/riscvarchive/riscv-zacas/releases/tag/v1.0
The issue can be demonstrated using this litmus test:
C c-cmpxchg-fail
{ [z] = 1; }
P0(atomic_int *x, atomic_int *y, int *z)
{
atomic_store_explicit(x, 1, memory_order_seq_cst);
int r0 = atomic_compare_exchange_strong_explicit(y, z, 2, memory_order_seq_cst, memory_order_seq_cst);
}
P1(atomic_int *x, atomic_int *y)
{
atomic_store_explicit(y, 1, memory_order_seq_cst);
int r1 = atomic_load_explicit(x, memory_order_seq_cst);
}
exists (0:r0=0 /\ 1:r1=0)
The atomic_compare_exchange_strong_explicit fails (r0 == 0) meaning it read that y != z (value of 1) aka y = 0.
r1 == 0 ensures that P1 also reads 0 aka x = 0.
herd7 results:
Test c-cmpxchg-fail Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No
As shown by herd7, the (0:r0=0; 1:r1=0;) behavior is impossible with the c11 memory model.
This is a variant of another common memory-model testcase
This testcase is essentially a variant of:
seq_cst store a, 1 | seq_cst store b, 1
seq_cst load 0x7, b | seq_cst load 0x8, a
exists (0x7 = 0 /\ 0x8 = 0)
We can approximate this mapping into RISC-V asm using the memory model mappings.
Since the mappings are compatible with A.6, we'll use an A.6 seq_cst store.
Following that will be amocas.aqrl, however since it fails it follows the text in the spec:
The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.
We'll model it using lw.aq - a hypothetical insn that is present in herd7.
RISCV rv-cmpxchg-fail
{
0:x1=x; 0:x2=1; 0:x3=y;
1:x1=y; 1:x2=1; 1:x3=x;
}
P0 | P1 ;
fence rw,w | sw x2,0(x1) ;
sw x2,0(x1) | fence rw,rw ;
lw.aq x4,0(x3) | lw x4,0(x3) ;
exists (0:x4=0 /\ 1:x4=0)
P1's fences are minimized since it's acting as an observer thread. We know the lw/sw can't be reordered with a full fence between them so the interesting behavior is in the P0 hart.
herd7 results:
Test rv-cmpxchg-fail Allowed
States 4
0:x4=0; 1:x4=0;
0:x4=0; 1:x4=1;
0:x4=1; 1:x4=0;
0:x4=1; 1:x4=1;
Ok
This shows the issue: 0:x4=0; 1:x4=0;. This is the same case that causes us to require lr.aqrl for atomic_(memory_order_seq_cst).
Thanks again to Andrea Parri for spotting this and making the litmus tests.
To resolve this for atomic_compare_exchange ops that have the failure argument set to memory_order_seq_cst we may need a full fence before those amocas insns in order to maintain the ordering guarantees in case the cas fails.
Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.
cc'ing a few atomics related people, please add anyone I missed:
@hboehm @kito-cheng @aparri @daniellustig @preames @ilovepi @asb
Andrea Parri spotted an issue with the
amocasoperation with the current seq_cstamo<op>mappings.The issue stems from this sentence in the ratified zacas extension:
https://github.com/riscvarchive/riscv-zacas/releases/tag/v1.0
The issue can be demonstrated using this litmus test:
The atomic_compare_exchange_strong_explicit fails (
r0 == 0) meaning it read thaty != z (value of 1)akay = 0.r1 == 0ensures thatP1also reads 0 akax = 0.herd7 results:
As shown by herd7, the (
0:r0=0; 1:r1=0;) behavior is impossible with the c11 memory model.This is a variant of another common memory-model testcase
This testcase is essentially a variant of:
We can approximate this mapping into RISC-V asm using the memory model mappings.
Since the mappings are compatible with A.6, we'll use an A.6 seq_cst store.
Following that will be amocas.aqrl, however since it fails it follows the text in the spec:
We'll model it using lw.aq - a hypothetical insn that is present in herd7.
P1's fences are minimized since it's acting as an observer thread. We know the lw/sw can't be reordered with a full fence between them so the interesting behavior is in the P0 hart.
herd7 results:
This shows the issue:
0:x4=0; 1:x4=0;. This is the same case that causes us to requirelr.aqrlfor atomic_(memory_order_seq_cst).Thanks again to Andrea Parri for spotting this and making the litmus tests.
To resolve this for atomic_compare_exchange ops that have the
failureargument set tomemory_order_seq_cstwe may need a full fence before those amocas insns in order to maintain the ordering guarantees in case the cas fails.Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.
cc'ing a few atomics related people, please add anyone I missed:
@hboehm @kito-cheng @aparri @daniellustig @preames @ilovepi @asb