Skip to content

AMOCAS recommended compatibility mappings are too weak for the c11 memory model #444

@patrick-rivos

Description

@patrick-rivos

Andrea Parri spotted an issue with the amocas operation with the current seq_cst amo<op> mappings.

The issue stems from this sentence in the ratified zacas extension:

The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

https://github.com/riscvarchive/riscv-zacas/releases/tag/v1.0

The issue can be demonstrated using this litmus test:

C c-cmpxchg-fail
{ [z] = 1; }

P0(atomic_int *x, atomic_int *y, int *z)
{
        atomic_store_explicit(x, 1, memory_order_seq_cst);
        int r0 = atomic_compare_exchange_strong_explicit(y, z, 2, memory_order_seq_cst, memory_order_seq_cst);
}

P1(atomic_int *x, atomic_int *y)
{
        atomic_store_explicit(y, 1, memory_order_seq_cst);
        int r1 = atomic_load_explicit(x, memory_order_seq_cst);
}

exists (0:r0=0 /\ 1:r1=0)

The atomic_compare_exchange_strong_explicit fails (r0 == 0) meaning it read that y != z (value of 1) aka y = 0.
r1 == 0 ensures that P1 also reads 0 aka x = 0.

herd7 results:

Test c-cmpxchg-fail Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No

As shown by herd7, the (0:r0=0; 1:r1=0;) behavior is impossible with the c11 memory model.

This is a variant of another common memory-model testcase

This testcase is essentially a variant of:

seq_cst store a, 1   | seq_cst store b, 1
seq_cst load  0x7, b | seq_cst load 0x8, a
exists (0x7 = 0 /\ 0x8 = 0)

We can approximate this mapping into RISC-V asm using the memory model mappings.
Since the mappings are compatible with A.6, we'll use an A.6 seq_cst store.
Following that will be amocas.aqrl, however since it fails it follows the text in the spec:

The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

We'll model it using lw.aq - a hypothetical insn that is present in herd7.

RISCV rv-cmpxchg-fail

{
0:x1=x; 0:x2=1; 0:x3=y;
1:x1=y; 1:x2=1; 1:x3=x;
}

  P0             |  P1           ;
  fence rw,w     |  sw x2,0(x1)  ;
  sw x2,0(x1)    |  fence rw,rw  ;
  lw.aq x4,0(x3) |  lw x4,0(x3)  ;

exists (0:x4=0 /\ 1:x4=0)

P1's fences are minimized since it's acting as an observer thread. We know the lw/sw can't be reordered with a full fence between them so the interesting behavior is in the P0 hart.

herd7 results:

Test rv-cmpxchg-fail Allowed
States 4
0:x4=0; 1:x4=0;
0:x4=0; 1:x4=1;
0:x4=1; 1:x4=0;
0:x4=1; 1:x4=1;
Ok

This shows the issue: 0:x4=0; 1:x4=0;. This is the same case that causes us to require lr.aqrl for atomic_(memory_order_seq_cst).

Thanks again to Andrea Parri for spotting this and making the litmus tests.

To resolve this for atomic_compare_exchange ops that have the failure argument set to memory_order_seq_cst we may need a full fence before those amocas insns in order to maintain the ordering guarantees in case the cas fails.

Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

cc'ing a few atomics related people, please add anyone I missed:
@hboehm @kito-cheng @aparri @daniellustig @preames @ilovepi @asb

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions