solana-foundation · Benhawkins18 · Jul 10, 2025 · Sep 5, 2024 · Sep 13, 2024 · Nov 22, 2024
diff --git a/proposals/0173-sbpf-instruction-encoding-improvements.md b/proposals/0173-sbpf-instruction-encoding-improvements.md
@@ -0,0 +1,123 @@
+---
+simd: '0173'
+title: SBPF instruction encoding improvements
+authors:
+  - Alexander Meißner
+category: Standard
+type: Core
+status: Review
+created: 2024-09-05
+feature: F6UVKh1ujTEFK3en2SyAL3cdVnqko1FVEXWhmdLRu6WP
+extends: SIMD-0161
+---
+
+## Summary
+
+Some SBPF instructions have questionable encodings. Slightly adjusting them
+would significantly simplify verification and execution of programs.
+
+## Motivation
+
+The instruction `lddw dst, imm` is currently the only instruction which takes
+two instruction slots. This proposal splits it into a sequence of two one-slot
+instructions: `mov32 dst, imm` and a newly introduced `hor64 dst, imm`. This
+way all instructions will be exactly one slot long, which simplifies the
+following:
+
+- Calculating the number of instructions in a program will no longer require a
+full linear scan. A division of the length of the text section by the
+instruction slot size will suffice.
+- The instruction meter will no longer have to skip one instruction slot when
+counting a `LDDW` instruction.
+- Jump and call instructions will no longer have to verify that the destination
+is not the second half of a `LDDW` instruction.
+- The verifier will no longer have to check that `LDDW` instructions are
+complete, that is, that neither the first nor the second half occurs without
+the other.
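+
+As a rough illustration of the first point and of the `lddw` split, here is a
+minimal Rust sketch. The function names are hypothetical and not taken from
+any implementation. The 8-byte slot size is the standard eBPF instruction
+size, and the alignment check is an addition of this sketch.
+
+```rust
+/// Size of one instruction slot in bytes (standard eBPF encoding).
+const INSN_SLOT_SIZE: usize = 8;
+
+/// With one-slot instructions only, counting is a plain division.
+/// Returns `None` if the section is not a whole number of slots
+/// (this check is an assumption of the sketch, not part of the proposal).
+fn instruction_count(text_section: &[u8]) -> Option<usize> {
+    if text_section.len() % INSN_SLOT_SIZE != 0 {
+        return None;
+    }
+    Some(text_section.len() / INSN_SLOT_SIZE)
+}
+
+/// Splits a 64-bit constant into the two 32-bit immediates a toolchain
+/// could place in `mov32 dst, lo` and `hor64 dst, hi`.
+fn split_imm64(value: u64) -> (u32, u32) {
+    let lo = value as u32; // low half, for mov32
+    let hi = (value >> 32) as u32; // high half, for hor64
+    (lo, hi)
+}
+```
+
+The split only reproduces `lddw` if `mov32 dst, imm` leaves the upper 32 bits
+of `dst` cleared, which this sketch assumes.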
+
+The `LE` instruction is essentially useless as only `BE` performs a byte-swap.
+Its runtime behavior is close to no-op and can be replicated by other
+instructions:
+
+- `le dst, 16` behaves the same as `and32 dst, 0xFFFF`
+- `le dst, 32` behaves the same as `and32 dst, 0xFFFFFFFF`
+- `le dst, 64` behaves the same as `mov64 dst, dst` (a no-op)
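+
+The equivalences above can be checked with a small Rust model. This is a
+sketch under two assumptions: the target is little-endian, so `le` only
+truncates, and `and32` zero-extends its 32-bit result into the upper half of
+the register. Both function names are hypothetical.
+
+```rust
+/// Models `le dst, width` on a little-endian target: no byte swap,
+/// only truncation to the given width. Unsupported widths yield `None`.
+fn le(dst: u64, width: u32) -> Option<u64> {
+    match width {
+        16 => Some(u64::from(dst as u16)),
+        32 => Some(u64::from(dst as u32)),
+        64 => Some(dst),
+        _ => None,
+    }
+}
+
+/// Models `and32 dst, imm`, assuming the result is zero-extended to 64 bits.
+fn and32(dst: u64, imm: u32) -> u64 {
+    u64::from((dst as u32) & imm)
+}
+```
+
+For any register value `x`, `le(x, 16)` matches `and32(x, 0xFFFF)`,
+`le(x, 32)` matches `and32(x, 0xFFFFFFFF)`, and `le(x, 64)` returns `x`
+unchanged.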
+
+The `CALLX` instruction encodes its source register in the immediate field.
+This makes the instruction decoder more complex, because it is the only case
+in which a register is encoded in the immediate field, and nothing requires it.
+
+With all of the above changes and the ones defined in SIMD-0174, the memory
+related instructions can be moved into the ALU instruction classes. Doing so
+would free up 8 instruction classes completely, giving us back three bits of
+instruction encoding.
+
+## Alternatives Considered
+
+None.
+
+## New Terminology
+
+None.
+
+## Detailed Design
+
+The following must go into effect if and only if a program indicates the
+SBPF-version v2 or higher in its program header (see SIMD-0161). Some now
+unreachable verification and execution checks around `LDDW` can be safely
+removed (see motivation).
+
+### Changes to the Bytecode Verifier
+
+A program containing one of the following instructions must throw
+`VerifierError::UnknownOpCode` during verification:
+
+- the `LDDW` instruction (opcodes `0x18` and `0x00`)
+- the `LE` instruction (opcode `0xD4`)
+- the moved opcodes:
+  - `0x72`, `0x71`, `0x73` (`STB`, `LDXB`, `STXB`)
+  - `0x6A`, `0x69`, `0x6B` (`STH`, `LDXH`, `STXH`)
+  - `0x62`, `0x61`, `0x63` (`STW`, `LDXW`, `STXW`)
+  - `0x7A`, `0x79`, `0x7B` (`STDW`, `LDXDW`, `STXDW`)
+
+A program containing one of the following instructions must **not** throw
+`VerifierError::UnknownOpCode` during verification anymore:
+
+- the `HOR64` instruction (opcode `0xF7`)
+- the moved opcodes:
+  - `0x27`, `0x2C`, `0x2F` (`STB`, `LDXB`, `STXB`)
+  - `0x37`, `0x3C`, `0x3F` (`STH`, `LDXH`, `STXH`)
+  - `0x87`, `0x8C`, `0x8F` (`STW`, `LDXW`, `STXW`)
+  - `0x97`, `0x9C`, `0x9F` (`STDW`, `LDXDW`, `STXDW`)
+
+When a `CALLX` instruction (opcode `0x8D`) is encountered during verification,
+the `src` register field must be verified instead of the `imm` immediate field.
+Otherwise, the verification rule stays the same: the `src` register must be in
+the inclusive range from R0 to R9.
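+
+The verifier rules in this section could be sketched in Rust as follows. This
+is not the actual verifier: the error enum is a simplified stand-in, the
+function name is hypothetical, and every opcode not mentioned in this section
+is passed through, whereas a real verifier checks far more. The position of
+the `src` field (upper four bits of the second byte) follows the standard eBPF
+instruction layout.
+
+```rust
+/// Simplified stand-in for the verifier's error type.
+#[derive(Debug, PartialEq)]
+enum VerifierError {
+    UnknownOpCode(u8),
+    InvalidRegister(u8),
+}
+
+/// Opcodes a v2 program must no longer contain: both halves of LDDW,
+/// LE, and the old encodings of the moved memory instructions.
+const REMOVED_IN_V2: [u8; 15] = [
+    0x18, 0x00, 0xD4, // LDDW (first and second slot), LE
+    0x72, 0x71, 0x73, // STB, LDXB, STXB
+    0x6A, 0x69, 0x6B, // STH, LDXH, STXH
+    0x62, 0x61, 0x63, // STW, LDXW, STXW
+    0x7A, 0x79, 0x7B, // STDW, LDXDW, STXDW
+];
+
+const CALLX: u8 = 0x8D;
+
+/// Applies only the v2 rules from this section to one 8-byte slot.
+fn verify_slot_v2(slot: &[u8; 8]) -> Result<(), VerifierError> {
+    let opcode = slot[0];
+    if REMOVED_IN_V2.contains(&opcode) {
+        return Err(VerifierError::UnknownOpCode(opcode));
+    }
+    if opcode == CALLX {
+        // The register now comes from the src field, not from imm.
+        let src = slot[1] >> 4;
+        if src > 9 {
+            return Err(VerifierError::InvalidRegister(src));
+        }
+    }
+    Ok(())
+}
+```
+
+Note that the immediate bytes of a `CALLX` slot play no role in the check
+anymore.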
+
+### Changes to Execution
+
+The introduced `HOR64` instruction (opcode `0xF7`) must take its immediate
+value, shift it 32 bits towards the MSBs (multiplication-like left shift) and
+then bitwise OR it into the given `dst` register.
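+
+A minimal Rust sketch of that rule, treating the immediate as the raw 32-bit
+field of the instruction (the function name is hypothetical):
+
+```rust
+/// Models `hor64 dst, imm`: shift the 32-bit immediate into the upper
+/// half and OR it into the register. The low 32 bits of `dst` are kept.
+fn hor64(dst: u64, imm: u32) -> u64 {
+    dst | (u64::from(imm) << 32)
+}
+```
+
+Because the operation is an OR, bits already set in the upper half of `dst`
+are not cleared. For example, `hor64(0x89AB_CDEF, 0x0123_4567)` yields
+`0x0123_4567_89AB_CDEF`.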
+
+For the `CALLX` instruction (opcode `0x8D`) the jump destination must be read
+from the `src` register field instead of the `imm` immediate field.
+
+The execution behavior of the moved instructions is transferred to their new
+opcodes:
+
+- `0x72` => `0x27`, `0x71` => `0x2C`, `0x73` => `0x2F`
+- `0x6A` => `0x37`, `0x69` => `0x3C`, `0x6B` => `0x3F`
+- `0x62` => `0x87`, `0x61` => `0x8C`, `0x63` => `0x8F`
+- `0x7A` => `0x97`, `0x79` => `0x9C`, `0x7B` => `0x9F`
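+
+The mapping above can be written as a lookup, sketched here in Rust with
+hypothetical function names. The helper `instruction_class` relies on the
+standard eBPF convention that the low three bits of an opcode select the
+instruction class.
+
+```rust
+/// Maps a pre-v2 memory opcode to its v2 replacement.
+/// Returns `None` for opcodes that were not moved.
+fn remap_memory_opcode(old: u8) -> Option<u8> {
+    match old {
+        // byte-sized: STB, LDXB, STXB
+        0x72 => Some(0x27),
+        0x71 => Some(0x2C),
+        0x73 => Some(0x2F),
+        // half-word: STH, LDXH, STXH
+        0x6A => Some(0x37),
+        0x69 => Some(0x3C),
+        0x6B => Some(0x3F),
+        // word: STW, LDXW, STXW
+        0x62 => Some(0x87),
+        0x61 => Some(0x8C),
+        0x63 => Some(0x8F),
+        // double-word: STDW, LDXDW, STXDW
+        0x7A => Some(0x97),
+        0x79 => Some(0x9C),
+        0x7B => Some(0x9F),
+        _ => None,
+    }
+}
+
+/// Instruction class: the low three bits of the opcode.
+fn instruction_class(opcode: u8) -> u8 {
+    opcode & 0x07
+}
+```
+
+Under that convention the old opcodes sit in classes `0x1`, `0x2` and `0x3`,
+while every new opcode lands in class `0x4` (loads) or `0x7` (stores), which
+are the two ALU classes.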
+
+## Impact
+
+The toolchain will emit machine code according to the selected SBPF version.
+As most proposed changes affect only the encoding, and not the functionality,
+we expect to see no impact on dApp developers. The only exception is that
+64-bit immediate loads will now cost 2 CU instead of 1 CU.
+
+## Security Considerations
+
+None.