Description
Currently Biohazrd's LinkImportsTransformation uses LibObjectFile when parsing .so shared objects on Linux and does not support parsing .a library archives. (If you attempt to load a .a library archive it actually combusts violently since their header magic is the same as Windows .lib files but the format is slightly different.)
.a and .lib files are both !<arch>-magic archive files. This format isn't actually standardized and barely even has a name. (The Microsoft documentation literally just calls them archive files, which is decidedly the most infuriatingly impossible thing to search ever.)
I've been unable thus far to find a good documentation source on the format of .a files, but Wikipedia has a nicely detailed section (Archived PDF) on it that seems to be correct. (I suspect the closest thing to official "reference" documentation is the source code for ar.) As the Wikipedia article notes, the format was never properly standardized and has many variants. One major downside of this lack of standardization is there's no great way to differentiate the different types. (Which kinda makes sense, it's basically a no compression archive format that has a bunch of non-standard extensions.)
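For orientation, here is a minimal sketch of the part the variants appear to share, going by the layout described in the Wikipedia section: an 8-byte global magic followed by 60-byte fixed-width ASCII member headers, with members aligned to even offsets. The names `parse_member_header` and `iter_members` are made up for this illustration and are not Kaisa or LibObjectFile APIs. It deliberately does not interpret special members (symbol tables, longnames), since that is where the variants diverge.

```python
AR_MAGIC = b"!<arch>\n"   # global header shared by .a and .lib files
HEADER_SIZE = 60          # each member header is 60 bytes of ASCII fields


def parse_member_header(raw):
    """Decode the name and size fields of one 60-byte member header."""
    if len(raw) != HEADER_SIZE or raw[58:60] != b"`\n":
        raise ValueError("not a valid archive member header")
    return {
        # Bytes 0..16: member name, space padded (left raw, e.g. "foo.o/").
        "name": raw[0:16].decode("ascii").rstrip(" "),
        # Bytes 48..58: member size in decimal ASCII, space padded.
        "size": int(raw[48:58].decode("ascii").strip()),
    }


def iter_members(data):
    """Yield (raw_name, payload) for every member in an archive blob."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("missing !<arch> magic")
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = parse_member_header(data[offset:offset + HEADER_SIZE])
        start = offset + HEADER_SIZE
        yield header["name"], data[start:start + header["size"]]
        offset = start + header["size"]
        offset += offset % 2  # members start on even offsets
```

Note that nothing in this shared layout says which variant produced the file, which is the differentiation problem in a nutshell.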
As such we can't easily have Biohazrd peek at the .a/.lib file and try to decide whether it's a Windows-style archive or a GNU-style archive. If we could, in theory we could've deferred to LibObjectFile for the GNU-style ones and kept using Kaisa for the Windows-style ones. We could also just go off of the file extension, but I'm not crazy about doing this.
However, even if we did differentiate (since it is possible -- see differences below) LibObjectFile doesn't seem to properly support (modern? LLVM-generated?) .a files. I briefly tested it with libPhysXCharacterKinematic_static_64.a and it combusts violently regardless of which ArArchiveKind I specify. However however, parsing the actual archive isn't the problematic part, it's the object files within. I also briefly tested modifying Kaisa to parse the .a files and defer to LibObjectFile for parsing the ELF objects within and that combusted violently too. (Some object files parsed fine, but others complained about sections overlapping or invalid section info.)
So this leaves us with a decision: Fix LibObjectFile or parse the ELF files ourselves. While I'd love to upstream fixes to LibObjectFile, I'm actually inclined to have us parse the ELF files ourselves:
- LibObjectFile is actually a little too flexible for our use-case and ends up putting a lot of unneeded ELF-specific logic down into LinkImportsTransformation and others who consume its output.
- Because we need the ELF-specific logic, I've actually been meaning to read the ELF specification closer to understand if we're interpreting symbol tables as intended.
- By the time I understand the ELF specification well enough to fix LibObjectFile I probably could've written a good-enough parser for Kaisa.
As such I think I'm going to make an ELF parser in Kaisa. We can use this for both GNU-style archives and for parsing shared object files. I do plan to look at LibObjectFile as I go through the spec and see if the issue immediately jumps out at me. (Based on the error messages though, I think it's some failure to follow the letter of the spec so it's probably very subtle.)
Differences between .a and .lib
Luckily it seems the GNU and Microsoft variants are very closely related. These are the three main differences I've identified:
- The longnames file (//) is delimited by \n instead of \0
- Longnames have a / suffix just like shortnames do
- The actual object files are in ELF instead of COFF
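To make the two longnames differences concrete, here is a small sketch of splitting the contents of the // member in either style. The function name `split_longnames` and the `gnu_style` flag are hypothetical, invented for this example, and the sketch assumes the caller already knows which variant it is looking at. Names are keyed by their byte offset into the table, since that is how member headers refer to them.

```python
def split_longnames(table, gnu_style):
    """Map each name's byte offset in the longnames table to the name."""
    # GNU-style tables separate entries with "\n"; Windows-style with "\0".
    delimiter = b"\n" if gnu_style else b"\0"
    names = {}
    offset = 0
    for chunk in table.split(delimiter):
        name = chunk
        # GNU-style longnames carry a trailing "/" just like shortnames do.
        if gnu_style and name.endswith(b"/"):
            name = name[:-1]
        if name:  # skip empty chunks (trailing delimiter or padding)
            names[offset] = name.decode("utf-8", errors="replace")
        offset += len(chunk) + 1  # +1 accounts for the delimiter byte
    return names
```

For example, `split_longnames(b"first_long_name.o/\nsecond_long_name.o/\n", gnu_style=True)` and `split_longnames(b"first_long_name.o\0second_long_name.o\0", gnu_style=False)` yield the same two names, just at different offsets.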
The first two are pretty easy to resolve. The last one slightly less so because COFF files aren't really identifiable. Luckily ELF files are since they have the header magic 0x7F, 'E', 'L', 'F'. There's already a precedent for parsing the first 32 bits of the file to determine its type thanks to import archive members, so I think it's pretty reasonable to put a check for ELF files here too. (In pedantic land this means we don't support a COFF member for the 0x457F machine type (processor architecture) with exactly 0x464C sections, but that's probably fine since that's absurd levels of pedantry.) (In fact I think I might add logic to skip parsing a COFF member if the machine type is invalid to avoid crashing when we interpret something that isn't a COFF member as a COFF member.)
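A rough sketch of that first-32-bits check, including the pedantic collision: the same four bytes that spell the ELF magic decode as a little-endian COFF machine type of 0x457F with 0x464C sections. `classify_member` and `coff_view` are illustrative names rather than Kaisa's actual API, and the import-member signature (a 16-bit 0x0000 followed by 0xFFFF) is my reading of the PE/COFF import header, so treat it as an assumption.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# Assumed import-member signature: Sig1 = 0x0000, Sig2 = 0xFFFF.
IMPORT_SIGNATURE = b"\x00\x00\xff\xff"


def classify_member(data):
    """Guess a member's type from its first 32 bits."""
    prefix = data[:4]
    if prefix == ELF_MAGIC:
        return "elf"
    if prefix == IMPORT_SIGNATURE:
        return "import"
    # COFF has no magic of its own, so it ends up as the fallback.
    return "coff-or-unknown"


def coff_view(data):
    """Read the first 4 bytes as COFF would: (Machine, NumberOfSections)."""
    return struct.unpack_from("<HH", data, 0)
```

Checking the machine type returned by `coff_view` against the known set before parsing is one way the "skip invalid COFF members" idea could be done.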
After that the only issue is parsing the ELF files...
ELF file spec
I found what is probably the most canonical ELF file spec on the Linux Foundation's reference specifications page. There are a few different specifications linked with no clear winner. The "best" ones appear to be the TIS 1.2 spec from 1995 (Archive) (only contains ELF), the 1997 System-V ABI (Archive), or the draft spec from 2001.
There's also the AMD64-specific extensions to ELF. These aren't critical but are worth keeping in mind. The Linux refspecs link up to v0.99 (Archive) and I also have in my personal documentation folder a version 1.0 PDF (which is derived from https://gitlab.com/x86-psABIs/x86-64-ABI which seems to be the canonical spec).
I'll probably end up basing things on the 2001 draft ELF spec (since presumably the fact that it's linked from this page means it's the version the Kernel developers use) along with the 1.0 PDF of the AMD64 extensions. The draft specs are also presented as HTML which makes them much easier to link to in comments.