Create proposal

LucasSte · LucasSte · commit fca89dacb982 · 2025-10-09T17:36:05.000-03:00
diff --git a/proposals/0377-ebpf-isa-compatibility.md b/proposals/0377-ebpf-isa-compatibility.md
@@ -0,0 +1,259 @@
+---
+simd: '0377'
+title: eBPF ISA compatibility
+authors:
+  - Lucas Steuernagel (Anza)
+  - Alexander Meißner (Anza)
+category: Standard
+type: Core
+status: Review
+created: 2025-10-09
+feature: (fill in with feature key and github tracking issues once accepted)
+---
+
+## Summary
+
+This SIMD introduces instruction set architecture (ISA) changes to make the 
+sBPF virtual machine compatible with the latest version of the eBPF ISA 
+emitted by the LLVM eBPF backend.
+
+It reverts past ISA changes, modifies the encoding of existing instructions 
+and brings new instructions to the Solana virtual machine.
+
+## Motivation
+
+The eBPF target on the Rust compiler emits code by default for eBPFv1, whose 
+only incompatibility with the Solana virtual machine is the `callx` 
+instruction. To prioritize Solana programs and decrease their CU 
+consumption, we want to be compatible with at least the current eBPF version 
+(v3), which brings in new instructions. For that to be possible, we must 
+modify our virtual machine to fully support the eBPF ISA.
+
+Additionally, we adopt ahead of time some eBPF v4 instructions that are 
+beneficial for Solana. This reduces the update burden should v4 become the 
+default configuration for the upstream LLVM eBPF target.
+
+## New Terminology
+
+A program that uses the instruction set defined in this SIMD is called an 
+sBPFv3 program.
+
+## Detailed Design
+
+### ELF Identification
+
+Programs containing the instructions mentioned in this SIMD must have the 
+`0x03` value in the `e_flags` field of their header.
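+
+As a rough illustration of the check above, the sketch below reads `e_flags` 
+from a raw header. It assumes a little-endian 64-bit ELF header, where 
+`e_flags` is the 4-byte field at byte offset 48. The helper name `is_sbpf_v3` 
+and both constants are hypothetical and not part of this proposal.
+
+```rust
+/// Byte offset of `e_flags` in a 64-bit ELF header (it follows e_ident,
+/// e_type, e_machine, e_version, e_entry, e_phoff and e_shoff).
+const E_FLAGS_OFFSET: usize = 48;
+/// `e_flags` value that identifies an sBPFv3 program in this proposal.
+const SBPF_V3_FLAGS: u32 = 0x03;
+
+/// Hypothetical helper: true if the header declares sBPFv3.
+/// Returns false when the input is too short to hold `e_flags`.
+fn is_sbpf_v3(elf: &[u8]) -> bool {
+    match elf.get(E_FLAGS_OFFSET..E_FLAGS_OFFSET + 4) {
+        Some(b) => {
+            u32::from_le_bytes([b[0], b[1], b[2], b[3]]) == SBPF_V3_FLAGS
+        }
+        None => false,
+    }
+}
+```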
+
+### Revert SIMD-0166
+
+SIMD-0166 must be reverted beginning with sBPFv3, since we will introduce a 
+new design for dynamic stack frames that is closer to the eBPF code generation.
+
+### Revert SIMD-0173
+
+All changes proposed in SIMD-0173 will no longer take effect in sBPFv3. 
+Consequently, the verifier must accept the following opcodes:
+
+- `0x18`, `0x00` (`LDDW`)
+- `0x72`, `0x71`, `0x73` (`STB`, `LDXB`, `STXB`)
+- `0x6A`, `0x69`, `0x6B` (`STH`, `LDXH`, `STXH`)
+- `0x62`, `0x61`, `0x63` (`STW`, `LDXW`, `STXW`)
+- `0x7A`, `0x79`, `0x7B` (`STDW`, `LDXDW`, `STXDW`)
+- `0xD4` (`LE`)
+
+The new opcodes introduced in SIMD-0173 must be rejected in the verifier with 
+`VerifierError::UnknownOpCode`:
+
+- the `HOR64` instruction (opcode `0xF7`)
+- the moved opcodes:
+  - `0x27`, `0x2C`, `0x2F` (`STB`, `LDXB`, `STXB`)
+  - `0x37`, `0x3C`, `0x3F` (`STH`, `LDXH`, `STXH`)
+  - `0x87`, `0x8C`, `0x8F` (`STW`, `LDXW`, `STXW`)
+  - `0x97`, `0x9C`, `0x9F` (`STDW`, `LDXDW`, `STXDW`)
+
+### Revert SIMD-0174
+
+All changes proposed in SIMD-0174 will no longer take effect in sBPFv3. 
+Consequently, the verifier must accept the following opcodes:
+
+- the `MUL` instruction (opcodes `0x24`, `0x2C`, `0x27` and `0x2F`)
+- the `DIV` instruction (opcodes `0x34`, `0x3C`, `0x37` and `0x3F`)
+- the `MOD` instruction (opcodes `0x94`, `0x9C`, `0x97` and `0x9F`)
+- the `NEG` instruction (opcodes `0x84` and `0x87`)
+
+The verifier must reject programs that contain any of the following opcodes 
+and throw `VerifierError::UnknownOpCode`:
+
+- the `UHMUL64` instruction (opcodes `0x36` and `0x3E`)
+- the `UDIV32` instruction (opcodes `0x46` and `0x4E`)
+- the `UDIV64` instruction (opcodes `0x56` and `0x5E`)
+- the `UREM32` instruction (opcodes `0x66` and `0x6E`)
+- the `UREM64` instruction (opcodes `0x76` and `0x7E`)
+- the `LMUL32` instruction (opcodes `0x86` and `0x8E`)
+- the `LMUL64` instruction (opcodes `0x96` and `0x9E`)
+- the `SHMUL64` instruction (opcodes `0xB6` and `0xBE`)
+- the `SDIV32` instruction (opcodes `0xC6` and `0xCE`)
+- the `SDIV64` instruction (opcodes `0xD6` and `0xDE`)
+- the `SREM32` instruction (opcodes `0xE6` and `0xEE`)
+- the `SREM64` instruction (opcodes `0xF6` and `0xFE`)
+
+### Execution changes
+
+MOV32_REG (opcode `0xbc`) must NOT perform sign extension.
+
+SUB32_IMM and SUB64_IMM must perform the operation `dst = dst - imm`.
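+
+A minimal sketch of these semantics follows. It reads the register operand of 
+the `SUB` immediates as the instruction's destination register, and it assumes 
+(not stated in this proposal) that the 32-bit immediate is sign extended for 
+the 64-bit form, that arithmetic wraps on overflow, and that 32-bit results 
+are zero extended into the 64-bit register. The function names are 
+illustrative only.
+
+```rust
+/// MOV32_REG: keep the low 32 bits of `src` and clear the upper 32 bits,
+/// i.e. no sign extension.
+fn mov32_reg(src: u64) -> u64 {
+    src as u32 as u64
+}
+
+/// SUB64_IMM: register minus immediate.
+/// Assumption: the immediate is sign extended to 64 bits and the
+/// subtraction wraps.
+fn sub64_imm(reg: u64, imm: i32) -> u64 {
+    reg.wrapping_sub(imm as i64 as u64)
+}
+
+/// SUB32_IMM: register minus immediate on the low 32 bits.
+/// Assumption: the result is zero extended into the 64-bit register.
+fn sub32_imm(reg: u64, imm: i32) -> u64 {
+    (reg as u32).wrapping_sub(imm as u32) as u64
+}
+```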
+
+### Dynamic stack frames
+
+To be more closely compatible with eBPF, the implementation of dynamic stack 
+frames is going to change.
+
+The R10 register must continue to be the frame pointer, i.e. pointing to the 
+highest address accessible in a function. As such, the stack will grow upwards.
+
+At the prologue of each function, there may be one ADD64 (opcode 0x07) 
+instruction to adjust the frame pointer to its new position with a positive 
+offset (`add64 R10, +imm`). When a function returns, the virtual machine will 
+automatically restore the frame pointer register with the value used in the 
+caller, so programs do not need to emit any instruction to adjust the frame 
+pointer in the epilogue of each function.
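+
+The frame pointer handling described above could be modeled roughly as 
+follows. The `Vm` struct, its fields and its methods are hypothetical names 
+used only for this sketch. It assumes the virtual machine saves the caller's 
+frame pointer when a call is made, which is one way to restore it 
+automatically on return.
+
+```rust
+/// Hypothetical VM state, reduced to what the frame pointer rule needs.
+struct Vm {
+    /// R10, the frame pointer.
+    r10: u64,
+    /// Frame pointers of the callers, saved by the VM on each call.
+    saved_frame_pointers: Vec<u64>,
+}
+
+impl Vm {
+    /// On a call, remember the caller's frame pointer.
+    fn call(&mut self) {
+        self.saved_frame_pointers.push(self.r10);
+    }
+
+    /// Prologue instruction `add64 r10, +imm`: move the frame pointer to
+    /// its new position using a positive offset.
+    fn add64_r10(&mut self, imm: u32) {
+        self.r10 = self.r10.wrapping_add(imm as u64);
+    }
+
+    /// On return, restore the caller's frame pointer. Returns false when
+    /// there is no caller left to return to.
+    fn ret(&mut self) -> bool {
+        match self.saved_frame_pointers.pop() {
+            Some(fp) => {
+                self.r10 = fp;
+                true
+            }
+            None => false,
+        }
+    }
+}
+```
+
+In this model the callee never emits an epilogue adjustment: `ret` alone puts 
+R10 back to the value the caller was using.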
+
+### JMP32 instruction class
+
+The JMP32 instruction class uses 32-bit wide operands for the same 
+operations as the JMP class.
+
+The following opcodes must be allowed in the verifier and the virtual machine 
+must implement the behavior described below for each one of them.
+
+- `JEQ32_IMM`  -> opcode = `0x16` -> `pc += offset if dst as u32 == IMM as u32`
+- `JGT32_IMM`  -> opcode = `0x26` -> `pc += offset if dst as u32 >  IMM as u32`
+- `JGE32_IMM`  -> opcode = `0x36` -> `pc += offset if dst as u32 >= IMM as u32`
+- `JSET32_IMM` -> opcode = `0x46` -> 
+                               `pc += offset if (dst as u32 & IMM as u32) != 0`
+- `JNE32_IMM`  -> opcode = `0x56` -> `pc += offset if dst as u32 != IMM as u32`
+- `JSGT32_IMM` -> opcode = `0x66` -> `pc += offset if dst as i32 >  IMM as i32`
+- `JSGE32_IMM` -> opcode = `0x76` -> `pc += offset if dst as i32 >= IMM as i32`
+- `JLT32_IMM`  -> opcode = `0xa6` -> `pc += offset if dst as u32 <  IMM as u32`
+- `JLE32_IMM`  -> opcode = `0xb6` -> `pc += offset if dst as u32 <= IMM as u32`
+- `JSLT32_IMM` -> opcode = `0xc6` -> `pc += offset if dst as i32 <  IMM as i32`
+- `JSLE32_IMM` -> opcode = `0xd6` -> `pc += offset if dst as i32 <= IMM as i32`
+
+- `JEQ32_REG`  -> opcode = `0x1e` -> `pc += offset if dst as u32 == src as u32`
+- `JGT32_REG`  -> opcode = `0x2e` -> `pc += offset if dst as u32 >  src as u32`
+- `JGE32_REG`  -> opcode = `0x3e` -> `pc += offset if dst as u32 >= src as u32`
+- `JSET32_REG` -> opcode = `0x4e` -> 
+                               `pc += offset if (dst as u32 & src as u32) != 0`
+- `JNE32_REG`  -> opcode = `0x5e` -> `pc += offset if dst as u32 != src as u32`
+- `JSGT32_REG` -> opcode = `0x6e` -> `pc += offset if dst as i32 >  src as i32`
+- `JSGE32_REG` -> opcode = `0x7e` -> `pc += offset if dst as i32 >= src as i32`
+- `JLT32_REG`  -> opcode = `0xae` -> `pc += offset if dst as u32 <  src as u32`
+- `JLE32_REG`  -> opcode = `0xbe` -> `pc += offset if dst as u32 <= src as u32`
+- `JSLT32_REG` -> opcode = `0xce` -> `pc += offset if dst as i32 <  src as i32`
+- `JSLE32_REG` -> opcode = `0xde` -> `pc += offset if dst as i32 <= src as i32`
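+
+The comparisons listed above can be sketched as one function over the low 32 
+bits of both operands. The `Cond32` enum and `jmp32_taken` are hypothetical 
+names for this illustration. The second operand stands for either the 
+immediate or the source register, and the signed greater-or-equal case uses 
+`>=` as in the JMP class.
+
+```rust
+/// Hypothetical condition selector, one variant per JMP32 comparison.
+#[derive(Clone, Copy, Debug)]
+enum Cond32 { Eq, Gt, Ge, Set, Ne, Sgt, Sge, Lt, Le, Slt, Sle }
+
+/// Returns true when the jump is taken. Only the low 32 bits of `dst` and
+/// `operand` take part in the comparison.
+fn jmp32_taken(cond: Cond32, dst: u64, operand: u64) -> bool {
+    let (a, b) = (dst as u32, operand as u32);
+    let (sa, sb) = (a as i32, b as i32);
+    match cond {
+        Cond32::Eq => a == b,
+        Cond32::Gt => a > b,
+        Cond32::Ge => a >= b,
+        Cond32::Set => (a & b) != 0,
+        Cond32::Ne => a != b,
+        Cond32::Sgt => sa > sb,
+        Cond32::Sge => sa >= sb,
+        Cond32::Lt => a < b,
+        Cond32::Le => a <= b,
+        Cond32::Slt => sa < sb,
+        Cond32::Sle => sa <= sb,
+    }
+}
+```
+
+For example, a register holding `0xFFFF_FFFF` compares greater than zero under 
+`Gt` but less than zero under `Slt`, because the signed variants reinterpret 
+the same 32 bits as `i32`.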
+
+### SMOD and SDIV instructions
+
+The following opcodes must be allowed in the verifier for an sBPFv3 program 
+and the following behavior must occur in the virtual machine.
+In the eBPF ISA, these instructions share their opcodes with the unsigned 
+`DIV` and `MOD` instructions and are distinguished by an offset field of `1`.
+
+- `SMOD64_IMM` -> opcode = `0x97` -> `dst = dst as i64 % imm as i64`
+- `SMOD64_REG` -> opcode = `0x9f` -> `dst = dst as i64 % src as i64`
+- `SMOD32_IMM` -> opcode = `0x94` -> `dst = dst as i32 % imm as i32`
+- `SMOD32_REG` -> opcode = `0x9c` -> `dst = dst as i32 % src as i32`
+- `SDIV64_IMM` -> opcode = `0x37` -> `dst = dst as i64 / imm as i64`
+- `SDIV64_REG` -> opcode = `0x3f` -> `dst = dst as i64 / src as i64`
+- `SDIV32_IMM` -> opcode = `0x34` -> `dst = dst as i32 / imm as i32`
+- `SDIV32_REG` -> opcode = `0x3c` -> `dst = dst as i32 / src as i32`
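+
+A small sketch of the signed semantics above, reading `/` and `%` as Rust's 
+truncating division and remainder. This proposal does not state what happens 
+on a zero divisor or on overflow (for example `i64::MIN / -1`), so the sketch 
+reports those cases as `None` instead of choosing a behavior. The function 
+names are illustrative only.
+
+```rust
+/// SDIV64: signed 64-bit division, truncating toward zero.
+fn sdiv64(dst: u64, operand: u64) -> Option<u64> {
+    (dst as i64).checked_div(operand as i64).map(|q| q as u64)
+}
+
+/// SMOD64: signed 64-bit remainder; the result takes the sign of `dst`.
+fn smod64(dst: u64, operand: u64) -> Option<u64> {
+    (dst as i64).checked_rem(operand as i64).map(|r| r as u64)
+}
+
+/// SDIV32: signed division on the low 32 bits of both operands.
+fn sdiv32(dst: u64, operand: u64) -> Option<i32> {
+    (dst as u32 as i32).checked_div(operand as u32 as i32)
+}
+
+/// SMOD32: signed remainder on the low 32 bits of both operands.
+fn smod32(dst: u64, operand: u64) -> Option<i32> {
+    (dst as u32 as i32).checked_rem(operand as u32 as i32)
+}
+```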
+
+### Sign extended mov and sign extended load
+
+The verifier must accept the following instruction encodings for sign extended 
+`mov` operations, and the virtual machine must implement the behavior detailed 
+below for them.
+
+The existing `MOV64` and `MOV32` instructions were included in the list below 
+only for comparison.
+
+- `MOV64`    -> opcode = `0xbf`, offset = `0` -> `dst = src as i64` 
+                     (existing instruction - only here for comparison)
+- `MOV64S8`  -> opcode = `0xbf`, offset = `8`  -> `dst = src as i8 as i64`
+- `MOV64S16` -> opcode = `0xbf`, offset = `16` -> `dst = src as i16 as i64`
+- `MOV64S32` -> opcode = `0xbf`, offset = `32` -> `dst = src as i32 as i64`
+
+- `MOV32`    -> opcode = `0xbc`, offset = `0`  -> `dst = src as u32` 
+                     (existing instruction - only here for comparison)
+- `MOV32S8`  -> opcode = `0xbc`, offset = `8`  -> `dst = src as i8 as i32`
+- `MOV32S16` -> opcode = `0xbc`, offset = `16` -> `dst = src as i16 as i32`
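+
+The table above can be sketched as two functions keyed on the offset field. 
+The names `mov64_sx` and `mov32_sx` are hypothetical. The sketch assumes that 
+the 32-bit variants zero extend their result into the 64-bit register, and it 
+returns `None` for offsets not listed above. Neither detail is stated in this 
+proposal.
+
+```rust
+/// 64-bit mov (opcode 0xbf): the offset field selects the source width.
+fn mov64_sx(src: u64, offset: i16) -> Option<u64> {
+    match offset {
+        0 => Some(src),
+        8 => Some(src as i8 as i64 as u64),
+        16 => Some(src as i16 as i64 as u64),
+        32 => Some(src as i32 as i64 as u64),
+        _ => None, // not listed in the proposal
+    }
+}
+
+/// 32-bit mov (opcode 0xbc). Assumption: the 32-bit result is zero
+/// extended into the 64-bit destination register.
+fn mov32_sx(src: u64, offset: i16) -> Option<u64> {
+    match offset {
+        0 => Some(src as u32 as u64),
+        8 => Some(src as i8 as i32 as u32 as u64),
+        16 => Some(src as i16 as i32 as u32 as u64),
+        _ => None, // not listed in the proposal
+    }
+}
+```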
+
+
+### Indirect jump
+
+The indirect jump instruction `jx` jumps to the instruction pointed to by the 
+address in the source register. The verifier must allow the following opcode 
+and the runtime must implement the following behavior.
+
+- `jx` -> opcode = `0x0d` -> pc = `src`
+
+### callx encoding
+
+The encoding of `callx` must change so that the register containing the 
+address to jump to is encoded in the destination register field.
+
+- `callx` -> opcode = `0x9d` -> pc = `dst`
+
+
+### Reinterpretation of LDDW as MOV and HOR
+
+The LDDW instruction consists of two 8-byte instruction frames, but consumes 
+only one CU in the virtual machine.
+
+The virtual machine must now interpret the first half of LDDW as the opcode 
+`0x18`, with the same behavior as `mov32 reg, imm`, zero extending the 
+immediate value.
+
+Likewise, the second half of LDDW must be interpreted as the opcode `0x00`, 
+a bitwise OR of the immediate into the 32 most significant bits of the 
+destination register. This instruction must be called `hor64 reg, imm`.
+
+The HOR instruction, however, must encode a destination register on which to 
+operate. The encoding is as follows.
+
+- `HOR64` -> opcode = `0x00` -> `dst = dst | (imm << 32)`
+
+Consequently, we must charge two CUs for an LDDW execution.
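+
+To illustrate why the two halves together still load a full 64-bit constant, 
+the sketch below splits a value into its low and high words and applies the 
+two operations in order. The function names are hypothetical, and the 
+immediate is treated as an unsigned 32-bit value before shifting, which is an 
+assumption of this sketch.
+
+```rust
+/// First half (opcode 0x18): behaves like `mov32 reg, imm`, zero extending
+/// the immediate.
+fn mov32_imm(imm: u32) -> u64 {
+    imm as u64
+}
+
+/// Second half (opcode 0x00): `dst = dst | (imm << 32)`.
+fn hor64_imm(dst: u64, imm: u32) -> u64 {
+    dst | ((imm as u64) << 32)
+}
+
+/// Loading `value` through both halves, as an LDDW pair would.
+fn lddw(value: u64) -> u64 {
+    let low = value as u32;
+    let high = (value >> 32) as u32;
+    hor64_imm(mov32_imm(low), high)
+}
+```
+
+Because each half is now an instruction of its own, executing both is what 
+leads to the two CUs mentioned above.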
+
+## Alternatives Considered
+
+We have considered diverging from the eBPF standard by introducing new opcodes 
+and creating instructions specific to the Solana environment. We discarded 
+that approach to remain compatible with the existing LLVM eBPF code generation.
+
+We are not adding `may_goto`, since it has an implicit condition implemented 
+in the kernel, and does not yet have any path for code generation. We are also 
+not supporting any of the eBPF atomic instructions, since the Solana virtual 
+machine is single threaded.
+
+In eBPFv4, there are two other instructions that could be part of the Solana 
+vendored virtual machine, but are not included in the proposal: `gotol` 
+(opcode `0x06`) and `bswap` (opcode `0xd7`).
+
+`gotol` is an unconditional jump with a 32-bit offset. It does not replace the 
+existing JA instruction (opcode 0x05), and is used only when the 16-bit offset 
+from JA cannot reach the jump target. This situation appears only in very 
+large functions, or in environments with aggressive inlining, and has so far 
+not been a problem for smart contracts.
+
+`bswap` supersedes LE (opcode 0xd4) and BE (opcode 0xdc) in eBPFv4, but 
+otherwise behaves similarly. Byte swap is rarely used as an instruction in 
+smart contracts, and so far the existing opcodes already cover those needs.
+
+## Impact
+
+With these changes, and a patch to the aya bpf-linker, developers will be able 
+to install the bpf-linker and use the existing rustup/cargo/rustc 
+infrastructure to build their programs.
+
+## Security Considerations
+
+None
+