-
Notifications
You must be signed in to change notification settings - Fork 46
Description
While crucible-llvm generally does a good job of implementing the semantics of LLVM's getelementptr instruction, there are some corner cases where crucible-llvm's behavior differs from the specification in the LLVM Language Reference. In particular, getelementptr has a number of optional attributes that enable additional validity checks, both during the calculation of the address that getelementptr returns, as well as in subsequent memory loads from or stores to that address. At present, crucible-llvm implements its own approach to getelementptr that implements some subset of what of the attributes check for, but if we want to be fully compliant with the LLVM Language Reference, then we will need to tighten up crucible-llvm's implementation in various ways.
Here is the list of getelementptr attributes, along with my understanding of how crucible-llvm currently implements (or doesn't implement) each one:
nusw
The LLVM Language Reference says:
For
nusw(no unsigned signed wrap):
- If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the signed value (
trunc nsw).- The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (
mul nsw).- The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a signed sense (
add nsw).- The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the pointer index type.
crucible-llvms getelementptr semantics do not check whether the nusw flag is enabled or not. In practice, the semantics already perform some validity checks that involved signed integer overflow, and these likely overlap with the checks described above. Figuring out if all of the checks are implemented will require some auditing, as we do not literally implement the getelementptr semantics in terms of the semantics of the {add,mul,trunc} nsw instructions, as suggested in the LLVM Language Reference. The following parts of the code will need to be audited:
crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 506 to 511 in ff139fe
-- Calculate the size of the element memtype and check that it fits -- in the pointer width let dl = llvmDataLayout ?lc let isz = G.bytesToInteger $ memTypeSize dl typ unless (isz <= maxSigned PtrWidth) (fail $ unwords ["Type size too large for pointer width:", show typ]) crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 521 to 540 in ff139fe
-- Compute safe upper and lower bounds for the index value to -- prevent multiplication overflow. Note that `minidx <= idx <= -- maxidx` iff `MININT <= (isz * idx) <= MAXINT` when `isz` and -- `idx` are considered as infinite precision integers. This -- property holds only if we use `quot` (which rounds toward 0) -- for the divisions in the following definitions. -- maximum and minimum indices to prevent multiplication overflow maxidx = maxSigned PtrWidth `quot` (max isz 1) minidx = minSigned PtrWidth `quot` (max isz 1) poison = Poison.GEPOutOfBounds base idx' cond = (app $ BVSle PtrWidth (app $ BVLit PtrWidth (BV.mkBV PtrWidth minidx)) idx') .&& (app $ BVSle PtrWidth idx' (app $ BVLit PtrWidth (BV.mkBV PtrWidth maxidx))) in -- Multiplication overflow will result in a pointer which is not "in -- bounds" for the given allocation. We translate all GEP -- instructions as if they had the `inbounds` flag set, so the -- result would be a poison value. poisonSideCondition mvar (BVRepr PtrWidth) poison off0 cond crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 552 to 556 in ff139fe
do -- Get the field offset and check that it fits -- in the pointer width let ioff = G.bytesToInteger $ fiOffset fi unless (ioff <= maxSigned PtrWidth) (fail $ unwords ["Field offset too large for pointer width in structure:", show ioff])
There are also other places where validity checks may need to be applied where they aren't currently. For instance, it's not obvious to me where the getelementptr semantics truncate the pointer index type.
nuw
The LLVM Language Reference says:
For
nuw(no unsigned wrap):
- If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the unsigned value (
trunc nuw).- The multiplication of an index by the type size does not wrap the pointer index type in an unsigned sense (
mul nuw).- The successive addition of each offset (without adding the base address) does not wrap the pointer index type in an unsigned sense (
add nuw).- The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, also interpreted as an unsigned number, does not wrap the pointer index type (
add nuw).
In other words, perform all of the same checks that are used to power nusw, but using unsigned arithmetic instead of signed arithmetic. In addition to covering all of the checks used in nusw, we will also need to audit parts of the code where unsigned arithmetic is being performed, such as in this code:
crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Constant.hs
Lines 496 to 500 in ff139fe
| asOffset mt (IntConst _ x) = | |
| do let x' = BV.asUnsigned x * bytesToInteger (memTypeSize dl mt) | |
| unless (x' <= maxUnsigned ?ptrWidth) | |
| (throwError "Computed offset overflow in constant GEP") | |
| return x' |
inbounds
The LLVM Language Reference says:
For
inboundsall rules of the nusw attribute apply. Additionally, if thegetelementptrhas any non-zero indices, the following rules apply:
- The base pointer has an in bounds address of the allocated object that it is based on. This means that it points into that allocated object, or to its end. Note that the object does not have to be live anymore; being in-bounds of a deallocated object is sufficient.
- During the successive addition of offsets to the address, the resulting pointer must remain in bounds of the allocated object at each step.
Note that
getelementptrwith all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. As a corollary, the only pointer in bounds of the null pointer in the default address space is the null pointer itself.These rules are based on the assumption that no allocated object may cross the unsigned address space boundary, and no allocated object may be larger than half the pointer index type space.
If
inboundsis present on a getelementptr instruction, thenuswattribute will be automatically set as well. For this reason, thenuswwill also not be printed in textual IR if inbounds is already present.
As noted in the comments here, the crucible-llvm semantics mostly assume that every getelementptr instruction already has the inbounds attribute enabled. Whether this is actually true or not deserves to be investigated. In practice, I suspect that there may be some discrepancies. For instance, the LLVM Language Reference states that inbounds should imply nusw, but as noted above, it is unlikely that crucible-llvm's semantics actually cover everything that nusw needs to cover.
inrange
The LLVM Language Reference says:
If the
inrange(Start, End)attribute is present, loading from or storing to any pointer derived from thegetelementptrhas undefined behavior if the load or store would access memory outside the half-open range[Start, End)from thegetelementptrexpression result. The result of a pointer comparison orptrtoint(includingptrtoint-like operations involving memory) involving a pointer derived from agetelementptrwith theinrangekeyword is undefined, with the exception of comparisons in the case where both operands are in the closed range[Start, End]. Note that theinrangekeyword is currently only allowed in constantgetelementptrexpressions.
This is by far the trickiest attribute to implement in crucible-llvm, as it involves validity checks that are not performed at the time that the getelementptr address is calculated, but rather whenever the memory that the address points to is involved in a memory operation. As a result, these checks are inherently non-local, and in order to support them, we'd likely need to augment crucible-llvm's memory model to store getelementptr ranges somewhere.
As the comments here indicate, crucible-llvm currently does not make any effort to implement some sort of range check. Given the difficulties in doing so, I can see why, and we may want to defer implementing range checks.