Skip to content

crucible-llvm: Fully align getelementptr attribute semantics with LLVM Language Reference #1605

@RyanGlScott

Description

@RyanGlScott

While crucible-llvm generally does a good job of implementing the semantics of LLVM's getelementptr instruction, there are some corner cases where crucible-llvm's behavior differs from the specification in the LLVM Language Reference. In particular, getelementptr has a number of optional attributes that enable additional validity checks, both during the calculation of the address that getelementptr returns, as well as in subsequent memory loads from or stores to that address. At present, crucible-llvm implements its own approach to getelementptr that implements some subset of what of the attributes check for, but if we want to be fully compliant with the LLVM Language Reference, then we will need to tighten up crucible-llvm's implementation in various ways.

Here is the list of getelementptr attributes, along with my understanding of how crucible-llvm currently implements (or doesn't implement) each one:

nusw

The LLVM Language Reference says:

For nusw (no unsigned signed wrap):

  • If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the signed value (trunc nsw).
  • The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (mul nsw).
  • The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a signed sense (add nsw).
  • The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the pointer index type.

crucible-llvms getelementptr semantics do not check whether the nusw flag is enabled or not. In practice, the semantics already perform some validity checks that involved signed integer overflow, and these likely overlap with the checks described above. Figuring out if all of the checks are implemented will require some auditing, as we do not literally implement the getelementptr semantics in terms of the semantics of the {add,mul,trunc} nsw instructions, as suggested in the LLVM Language Reference. The following parts of the code will need to be audited:

  • -- Calculate the size of the element memtype and check that it fits
    -- in the pointer width
    let dl = llvmDataLayout ?lc
    let isz = G.bytesToInteger $ memTypeSize dl typ
    unless (isz <= maxSigned PtrWidth)
    (fail $ unwords ["Type size too large for pointer width:", show typ])
  • -- Compute safe upper and lower bounds for the index value to
    -- prevent multiplication overflow. Note that `minidx <= idx <=
    -- maxidx` iff `MININT <= (isz * idx) <= MAXINT` when `isz` and
    -- `idx` are considered as infinite precision integers. This
    -- property holds only if we use `quot` (which rounds toward 0)
    -- for the divisions in the following definitions.
    -- maximum and minimum indices to prevent multiplication overflow
    maxidx = maxSigned PtrWidth `quot` (max isz 1)
    minidx = minSigned PtrWidth `quot` (max isz 1)
    poison = Poison.GEPOutOfBounds base idx'
    cond =
    (app $ BVSle PtrWidth (app $ BVLit PtrWidth (BV.mkBV PtrWidth minidx)) idx') .&&
    (app $ BVSle PtrWidth idx' (app $ BVLit PtrWidth (BV.mkBV PtrWidth maxidx)))
    in
    -- Multiplication overflow will result in a pointer which is not "in
    -- bounds" for the given allocation. We translate all GEP
    -- instructions as if they had the `inbounds` flag set, so the
    -- result would be a poison value.
    poisonSideCondition mvar (BVRepr PtrWidth) poison off0 cond
  • do -- Get the field offset and check that it fits
    -- in the pointer width
    let ioff = G.bytesToInteger $ fiOffset fi
    unless (ioff <= maxSigned PtrWidth)
    (fail $ unwords ["Field offset too large for pointer width in structure:", show ioff])

There are also other places where validity checks may need to be applied where they aren't currently. For instance, it's not obvious to me where the getelementptr semantics truncate the pointer index type.

nuw

The LLVM Language Reference says:

For nuw (no unsigned wrap):

  • If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the unsigned value (trunc nuw).
  • The multiplication of an index by the type size does not wrap the pointer index type in an unsigned sense (mul nuw).
  • The successive addition of each offset (without adding the base address) does not wrap the pointer index type in an unsigned sense (add nuw).
  • The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, also interpreted as an unsigned number, does not wrap the pointer index type (add nuw).

In other words, perform all of the same checks that are used to power nusw, but using unsigned arithmetic instead of signed arithmetic. In addition to covering all of the checks used in nusw, we will also need to audit parts of the code where unsigned arithmetic is being performed, such as in this code:

asOffset mt (IntConst _ x) =
do let x' = BV.asUnsigned x * bytesToInteger (memTypeSize dl mt)
unless (x' <= maxUnsigned ?ptrWidth)
(throwError "Computed offset overflow in constant GEP")
return x'

inbounds

The LLVM Language Reference says:

For inbounds all rules of the nusw attribute apply. Additionally, if the getelementptr has any non-zero indices, the following rules apply:

  • The base pointer has an in bounds address of the allocated object that it is based on. This means that it points into that allocated object, or to its end. Note that the object does not have to be live anymore; being in-bounds of a deallocated object is sufficient.
  • During the successive addition of offsets to the address, the resulting pointer must remain in bounds of the allocated object at each step.

Note that getelementptr with all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. As a corollary, the only pointer in bounds of the null pointer in the default address space is the null pointer itself.

These rules are based on the assumption that no allocated object may cross the unsigned address space boundary, and no allocated object may be larger than half the pointer index type space.

If inbounds is present on a getelementptr instruction, the nusw attribute will be automatically set as well. For this reason, the nusw will also not be printed in textual IR if inbounds is already present.

As noted in the comments here, the crucible-llvm semantics mostly assume that every getelementptr instruction already has the inbounds attribute enabled. Whether this is actually true or not deserves to be investigated. In practice, I suspect that there may be some discrepancies. For instance, the LLVM Language Reference states that inbounds should imply nusw, but as noted above, it is unlikely that crucible-llvm's semantics actually cover everything that nusw needs to cover.

inrange

The LLVM Language Reference says:

If the inrange(Start, End) attribute is present, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside the half-open range [Start, End) from the getelementptr expression result. The result of a pointer comparison or ptrtoint (including ptrtoint-like operations involving memory) involving a pointer derived from a getelementptr with the inrange keyword is undefined, with the exception of comparisons in the case where both operands are in the closed range [Start, End]. Note that the inrange keyword is currently only allowed in constant getelementptr expressions.

This is by far the trickiest attribute to implement in crucible-llvm, as it involves validity checks that are not performed at the time that the getelementptr address is calculated, but rather whenever the memory that the address points to is involved in a memory operation. As a result, these checks are inherently non-local, and in order to support them, we'd likely need to augment crucible-llvm's memory model to store getelementptr ranges somewhere.

As the comments here indicate, crucible-llvm currently does not make any effort to implement some sort of range check. Given the difficulties in doing so, I can see why, and we may want to defer implementing range checks.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions