Commit 6b8cda0

[Arm64] Vector Load/Store structure instructions (dotnet#33461)
This adds support in the JIT emitter for Vector Load/Store structure instructions (C3.2.10 - Arm Architecture Reference Manual):
- LD1 (1-4 registers)
- LD2
- LD3
- LD4
- LD1R
- LD2R
- LD3R
- LD4R
- ST1 (1-4 registers)
- ST2
- ST3
- ST4

in the following addressing modes:
- Base register only
- Post-indexed by a 64-bit register
- Post-indexed by an immediate, equal to the number of bytes transferred

Also adds support in JitDump for printing of
* A SIMD vector register list. For example, ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9]
* A SIMD vector element list. For example, st1 {v0.b}[3], [x1],#1
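The immediate post-index rule and the register list format described above can be illustrated with a minimal sketch. It is not taken from the JIT sources: the function names `multipleStructuresImm`, `singleStructureImm`, and `formatVectorRegList` are hypothetical, and the sketch assumes the architectural rules that a multiple-structures access transfers every listed register in full, that a single-structure or replicate access transfers one element per register, and that register numbers in a list are consecutive modulo 32.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only; the emitter in this commit
// may structure these computations differently.

// LDn/STn (multiple structures): each of the regCount registers is
// transferred in full, 8 bytes for a 64-bit arrangement (e.g. 8b) or
// 16 bytes for a 128-bit arrangement (e.g. 16b).
unsigned multipleStructuresImm(unsigned regCount, bool is128Bit)
{
    assert(regCount >= 1 && regCount <= 4);
    return regCount * (is128Bit ? 16u : 8u);
}

// LDn/STn (single structure) and LDnR: one element per register is
// transferred, so the immediate is regCount times the element size.
unsigned singleStructureImm(unsigned regCount, unsigned elemSizeBytes)
{
    assert(regCount >= 1 && regCount <= 4);
    assert(elemSizeBytes == 1 || elemSizeBytes == 2 || elemSizeBytes == 4 || elemSizeBytes == 8);
    return regCount * elemSizeBytes;
}

// Builds a vector register list such as "{v5.16b, v6.16b}".
// Register numbers wrap around after v31.
std::string formatVectorRegList(unsigned firstReg, unsigned regCount, const std::string& arrangement)
{
    assert(firstReg < 32);
    assert(regCount >= 1 && regCount <= 4);

    std::string text = "{";
    for (unsigned i = 0; i < regCount; i++)
    {
        if (i != 0)
        {
            text += ", ";
        }
        text += "v" + std::to_string((firstReg + i) % 32) + "." + arrangement;
    }
    return text + "}";
}
```

Under these assumptions, `singleStructureImm(1, 1)` gives the `#1` seen in `st1 {v0.b}[3], [x1],#1`, and a post-indexed form of the four-register `ld1` example would use `multipleStructuresImm(4, true)`, that is `#64`.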
1 parent a1af0f2 commit 6b8cda0

6 files changed: +1727 -181 lines changed

src/coreclr/src/jit/codegenarm64.cpp

Lines changed: 720 additions & 0 deletions
Large diffs are not rendered by default.

src/coreclr/src/jit/emit.h

Lines changed: 2 additions & 0 deletions
@@ -1233,6 +1233,8 @@ class emitter
 #define PERFSCORE_THROUGHPUT_4C 4.0f // slower - 4 cycles
 #define PERFSCORE_THROUGHPUT_5C 5.0f // slower - 5 cycles
 #define PERFSCORE_THROUGHPUT_6C 6.0f // slower - 6 cycles
+#define PERFSCORE_THROUGHPUT_7C 7.0f // slower - 7 cycles
+#define PERFSCORE_THROUGHPUT_8C 8.0f // slower - 8 cycles
 #define PERFSCORE_THROUGHPUT_9C 9.0f // slower - 9 cycles
 #define PERFSCORE_THROUGHPUT_10C 10.0f // slower - 10 cycles
 #define PERFSCORE_THROUGHPUT_13C 13.0f // slower - 13 cycles
