
Conversation

@jargh
Contributor

@jargh jargh commented Nov 24, 2024

This involves a number of general infrastructure additions such as new versions of lemmas and tools for 256-bit and 512-bit sizes, as well as extra utility functions to handle decoding of x86 SIMD instructions.

So far, we just decode the single instruction VPXOR, but adding more instructions should now be relatively easy. The "decode_aux" function is slightly uglified by using conditional branches instead of pattern matching, to work around limitations of the evaluation machinery. We plan to replace this soon anyway with simpler code based on the HOL Light Compute module, so we won't fix it now.
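For reference, VPXOR itself is semantically simple: a bitwise XOR over the full 256-bit YMM operands. A minimal Python sketch of that semantics (illustrative only; the function name is made up and this is not the HOL Light specification):

```python
MASK256 = (1 << 256) - 1  # width of a YMM register

def vpxor_256(src1: int, src2: int) -> int:
    """VEX.256 VPXOR: bitwise XOR of two 256-bit YMM values."""
    return (src1 ^ src2) & MASK256

# XOR-ing a register with itself is the classic zeroing idiom.
assert vpxor_256(0xDEADBEEF, 0xDEADBEEF) == 0
```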

Although the register model extends up to ZMMs for possible future extension with AVX512, all the practical simulation infrastructure treats the YMM registers as basic and records updates to those registers. The treatment of XMMs within YMMs within ZMMs performs zero-extension on write (analogous to the treatment of EAX within RAX on general-purpose registers). This is in conformance with the specified behavior when the operation is VEX-encoded, which is all our decoder handles. See, e.g., Section 15.5 of the Intel combined ISA manual:

"The lower 128 bits of a YMM register is aliased to the
corresponding XMM register. Legacy SSE instructions (i.e., SIMD
instructions operating on XMM state but not using the VEX prefix,
also referred to non-VEX encoded SIMD instructions) will not
access the upper bits (MAXVL-1:128) of the YMM registers. AVX and
FMA instructions with a VEX prefix and vector length of 128-bits
zeroes the upper 128 bits of the YMM register."
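The zero-extension-on-write behavior quoted above can be sketched concretely (a toy model in Python; the helper name is invented, and the legacy-SSE branch is shown only for contrast, since the decoder here only handles the VEX-encoded case):

```python
LOW128 = (1 << 128) - 1  # mask for the XMM lane

def write_xmm_lane(zmm: int, val128: int, vex_encoded: bool) -> int:
    """Write a 128-bit result into the low lane of a 512-bit register.

    A VEX-encoded 128-bit write zeroes bits 511:128, mirroring how a
    32-bit write to EAX zeroes the upper half of RAX. A legacy (non-VEX)
    SSE write would instead leave the upper bits untouched.
    """
    val128 &= LOW128
    if vex_encoded:
        return val128                    # zero-extend: upper bits cleared
    return (zmm & ~LOW128) | val128      # legacy SSE: upper bits preserved
```

For example, `write_xmm_lane(1 << 200, 7, vex_encoded=True)` yields 7, while with `vex_encoded=False` the old bit at position 200 survives.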

The formal specifications of the standard x86 ABI and the Microsoft x64 ABI (usually called the "Windows ABI") are both updated to include the possible modification of the SIMD register state. The specs for both ABIs are taken directly from Table 4 here:

https://www.agner.org/optimize/calling_conventions.pdf

For the standard ABI all the ZMM registers are treated as scratch. The fact that the Windows ABI is less liberal (requiring XMM6..XMM15 to be preserved) necessitates some trivial but extensive changes to existing proofs, since we can no longer simulate through subroutines that use MAYCHANGE_REGS_AND_FLAGS_PERMITTED_BY_ABI to abbreviate the possible state updates, since such a subroutine might in principle modify XMM6..XMM15 based on its spec. We work around this using a new function X86_SIMD_SHARPEN_RULE which in essence reruns the relevant subroutine wrapper boilerplate on a theorem to create a sharper one.
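Concretely, the ABI difference that forces these proof changes comes down to which SIMD registers a callee may clobber (per Table 4 of the linked document). A simplified Python sketch — the helper is hypothetical, and it glosses over the fact that Windows strictly preserves only the low 128 bits of XMM6..XMM15:

```python
def simd_scratch_registers(abi: str) -> set:
    """SIMD registers a callee may freely modify under each ABI.

    Standard (System V) x86-64 ABI: every ZMM register is scratch.
    Windows x64 ABI: XMM6..XMM15 are callee-saved (strictly, only
    their low 128 bits; this sketch treats whole registers).
    """
    all_simd = {f"ZMM{i}" for i in range(16)}
    if abi == "sysv":
        return all_simd
    if abi == "windows":
        return all_simd - {f"ZMM{i}" for i in range(6, 16)}
    raise ValueError(f"unknown ABI: {abi!r}")
```

A subroutine proved only against the ABI-level MAYCHANGE set could in principle touch anything in the "sysv" set, which is exactly why the Windows proofs need the sharpened theorems.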


@jargh jargh marked this pull request as ready for review November 24, 2024 04:08
@ctz
Contributor

ctz commented Dec 2, 2024

Thanks for starting this! I will try to work on it over my Christmas and New Year break.

jargh and others added 8 commits December 3, 2024 19:46
This enables general pattern-matching over the VEXM type in the
"evaluate" function. (It's a bit ad hoc to do it this way for the
different types rather than via general code, but we will probably
make more substantial changes here soon.) Taking advantage of
this improvement, we rewrite decode_aux with pattern-matching
instead of the nested conditional branches for the VEXM part,
which seems more natural.
This extends the state considered from just the flags and integer
registers (other than RSP) to include YMM0..YMM15 too. It also
explicitly adds tests for the new VPXOR instruction which would
otherwise be unexercised since it's not yet used in the code.
@jargh jargh requested a review from aqjune-aws January 13, 2025 23:52
Collaborator

@aqjune-aws aqjune-aws left a comment


I read the code and the updates looked reasonable to me. I also ran the simulator more extensively, leaving only the VPXOR cases enabled and running it for 20 minutes, and it passed.

@jargh jargh merged commit 72a34a3 into awslabs:main Jan 15, 2025
6 checks passed