AMOCAS recommended compatibility mappings are too weak for the c11 memory model

Andrea Parri spotted an issue with the `amocas` operation with the current seq_cst `amo<op>` mappings.

The issue stems from this sentence in the ratified zacas extension:
> The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

https://github.com/riscvarchive/riscv-zacas/releases/tag/v1.0

The issue can be demonstrated using this litmus test:
```herd7
C c-cmpxchg-fail
{ [z] = 1; }

P0(atomic_int *x, atomic_int *y, int *z)
{
        atomic_store_explicit(x, 1, memory_order_seq_cst);
        int r0 = atomic_compare_exchange_strong_explicit(y, z, 2, memory_order_seq_cst, memory_order_seq_cst);
}

P1(atomic_int *x, atomic_int *y)
{
        atomic_store_explicit(y, 1, memory_order_seq_cst);
        int r1 = atomic_load_explicit(x, memory_order_seq_cst);
}

exists (0:r0=0 /\ 1:r1=0)
```
The atomic_compare_exchange_strong_explicit fails (`r0 == 0`) meaning it read that `y != z (value of 1)` aka `y = 0`.
`r1 == 0` ensures that `P1` also reads 0 aka `x = 0`.

herd7 results:
```
Test c-cmpxchg-fail Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No
```
As shown by herd7, the (`0:r0=0; 1:r1=0;`) behavior is impossible with the c11 memory model.

<details>
  <summary>This is a variant of another common memory-model testcase</summary>

This testcase is essentially a variant of:
```
seq_cst store a, 1   | seq_cst store b, 1
seq_cst load  0x7, b | seq_cst load 0x8, a
exists (0x7 = 0 /\ 0x8 = 0)
```  
</details>


We can approximate this mapping into RISC-V asm using the memory model mappings.
Since the mappings are compatible with A.6, we'll use an A.6 seq_cst store.
Following that will be amocas.aqrl, however since it fails it follows the text in the spec:
> The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

We'll model it using lw.aq - a hypothetical insn that is present in herd7.
```
RISCV rv-cmpxchg-fail

{
0:x1=x; 0:x2=1; 0:x3=y;
1:x1=y; 1:x2=1; 1:x3=x;
}

  P0             |  P1           ;
  fence rw,w     |  sw x2,0(x1)  ;
  sw x2,0(x1)    |  fence rw,rw  ;
  lw.aq x4,0(x3) |  lw x4,0(x3)  ;

exists (0:x4=0 /\ 1:x4=0)
```
P1's fences are minimized since it's acting as an observer thread. We know the lw/sw can't be reordered with a full fence between them so the interesting behavior is in the P0 hart.

herd7 results:
```
Test rv-cmpxchg-fail Allowed
States 4
0:x4=0; 1:x4=0;
0:x4=0; 1:x4=1;
0:x4=1; 1:x4=0;
0:x4=1; 1:x4=1;
Ok
```
This shows the issue: `0:x4=0; 1:x4=0;`. This is the same case that causes us to require `lr.aqrl` for atomic_<op>(memory_order_seq_cst).

Thanks again to Andrea Parri for spotting this and making the litmus tests.

To resolve this for atomic_compare_exchange ops that have the `failure` argument set to `memory_order_seq_cst` we may need a full fence before those amocas insns in order to maintain the ordering guarantees in case the cas fails.

Thankfully LLVM 18 does not contain zacas but [trunk](https://godbolt.org/z/K4hr7WTq7) uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

cc'ing a few atomics related people, please add anyone I missed:
@hboehm @kito-cheng @aparri @daniellustig @preames @ilovepi @asb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMOCAS recommended compatibility mappings are too weak for the c11 memory model #444

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AMOCAS recommended compatibility mappings are too weak for the c11 memory model #444

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions