I've read the previous, similar issues about fuzzing (#529) and corrupted ELF files (#482), and I've followed the rule of thumb laid out by sevaa there:
In general, our preferred rule of thumb is - can the GNU or the LLVM tools (e. g. readelf) parse the binary with no errors? If they do, and pyelftools throws an error, it's issue with pyelftools. Else, it's the issue with the binary.
Using the fuzzer Atheris, I found 19 uncaught exceptions. I logged the offending ELF files, but only those for which llvm-readelf exited with return code 0.
To be precise, I performed the check as follows:
result = subprocess.run(['llvm-readelf', '--addrsig', '--arch-specific', '--bb-addr-map', '--demangle', '--dependent-libraries', '--dyn-relocations', '--dyn-symbols', '--dynamic-table', '--cg-profile', '--histogram', '--elf-linker-options', '--section-groups', '--expand-relocs', '--file-header', '--gnu-hash-table', '--hash-symbols', '--elf-output-style=JSON', '--pretty-print', '--hash-table', '--headers', '--needed-libs', '--notes', '--program-headers', '--relocations', '--sections', '--section-data', '--section-mapping', '--section-relocations', '--section-symbols', '--stackmap', '--stack-sizes', '--symbols', '--unwind', '--version-info', temp_file_path], capture_output=True, text=False)
if result.returncode == 0:
    # Log the crash only if llvm-readelf succeeded
    log_crash(e, data, CRASH_DIR)
else:
    pass
    # print(f"llvm-readelf failed to parse the file: {temp_file_path}")
I passed all of these flags so that llvm-readelf parses as much of the ELF file as possible, to make the comparison as fair as possible.
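For reference, the filtering step can be sketched as a self-contained function. This is a minimal sketch, not my exact harness: `oracle_accepts` and `exception_to_log` are hypothetical names, the parser is passed in as a plain callable, and the oracle command is a parameter so that the llvm-readelf invocation above (or any other reference tool) can be plugged in.

```python
import os
import subprocess
import tempfile


def oracle_accepts(data, oracle_cmd):
    """Write `data` to a temporary file and run the reference tool on it.

    `oracle_cmd` is the command without the file path, e.g.
    ['llvm-readelf', '--file-header', ...]; the path is appended here.
    Returns True only if the tool exits with return code 0.
    """
    with tempfile.NamedTemporaryFile(suffix=".elf", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        result = subprocess.run(oracle_cmd + [path], capture_output=True, text=False)
        return result.returncode == 0
    finally:
        os.unlink(path)


def exception_to_log(data, parse, oracle_cmd):
    """Return the exception worth logging for `data`, or None.

    `parse` stands in for the code under test (hypothetical callable).
    An exception is only reported when the reference tool accepts the
    same input, following the rule of thumb quoted above.
    """
    try:
        parse(data)
    except Exception as e:
        return e if oracle_accepts(data, oracle_cmd) else None
    return None
```

Keeping the oracle command as a parameter also makes it easy to repeat the comparison with GNU readelf, if that is preferred.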
I logged the crashing ELF files, along with a JSON record containing more information (such as the stack trace) for each crash, for example:
{
"exception_type": "<class 'UnicodeDecodeError'>",
"exception_message": "'utf-8' codec can't decode byte 0xff in position 4: invalid start byte",
"traceback": "Traceback (most recent call last):\n File \"/root/pyelftools/fuzzing.py\", line 111, in TestOneInput\n readelf.main()\n File \"/root/pyelftools/scripts/readelf.py\", line 1955, in main\n readelf.display_arch_specific()\n File \"/root/pyelftools/scripts/readelf.py\", line 800, in display_arch_specific\n self._display_arch_specific_arm()\n File \"/root/pyelftools/scripts/readelf.py\", line 1830, in _display_arch_specific_arm\n self._display_attributes(attr_sec, describe_attr_tag_arm)\n File \"/root/pyelftools/scripts/readelf.py\", line 1821, in _display_attributes\n for attr in ss.iter_attributes():\n File \"/root/pyelftools/elftools/elf/sections.py\", line 335, in iter_attributes\n for attribute in self._make_attributes():\n File \"/root/pyelftools/elftools/elf/sections.py\", line 360, in _make_attributes\n yield self.attribute(self.structs, self.stream)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/root/pyelftools/elftools/elf/sections.py\", line 495, in __init__\n self.value = struct_parse(structs.Elf_ntbs('value',\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/root/pyelftools/elftools/common/utils.py\", line 36, in struct_parse\n return struct.parse_stream(stream)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/root/pyelftools/elftools/construct/core.py\", line 190, in parse_stream\n return self._parse(stream, Container())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/root/pyelftools/elftools/construct/core.py\", line 261, in _parse\n return self.subcon._parse(stream, context)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/root/pyelftools/elftools/construct/core.py\", line 276, in _parse\n return self._decode(self.subcon._parse(stream, context), context)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/root/pyelftools/elftools/construct/adapters.py\", line 238, in _decode\n return StringAdapter._decode(self, b''.join(obj[:-1]), context)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/root/pyelftools/elftools/construct/adapters.py\", line 153, in _decode\n obj = obj.decode(self.encoding)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 4: invalid start byte\n",
"input_data": "7f454c46010101000000000000000000010028000100000000000000000000003c010000000000053400000000002800080007004174000000616561626900016a00000043342e30380040000472616e64ffffffffffffff3f6370750006010753080109010a010b010c010d010e010f0110011101120213011401150116011701180119011a011b011c011d011e011f012001676e75002201240126012a012c0141060b0042014403000000000000000000000000000000000000000000000000000000000000000300010000000000000000000000000003000200000000000000000000000000030003000000000000000000000000000300040000002e73796d746162002e737472746162002e7368737472746162002e74657874002e64617461002e627373002e41524d2e6174747269627574657300000000000000000000000000000000000000000000000000000000000000000000000000000000000000001b00000001000000060000000000000034000000000000000000000000000000010000000000000021000000010000000300000000000000340000000000000000000000000000000100000000000000270000000800000003000000000000003400000000000000000000000000000001000000000000002c00000003000070000000000000000034000000750000000000000000000000010000000000000001000000020000000000000000000000ac000000500000000600000005000000040000001000000009000000030000000000000000000000fc000000010000000000000000000000010000000000000011000000030000000000000000000000fd0000003c00000000000000000000000e00000000000000",
"crashing_file_path": "elftools/construct/adapters.py"
}
Now I wonder: given that llvm-readelf seems to parse them without errors, are you interested in investigating these crashes further? If so, how can I best provide them in a way that is convenient for you?
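One option, assuming each JSON record keeps the `input_data` hex field shown above, is to turn the records back into standalone ELF files that can be attached to the issue. A minimal sketch (`export_crash` and the output naming are hypothetical, not part of my current setup):

```python
import json
from pathlib import Path


def export_crash(record_path, out_dir):
    """Rebuild the crashing ELF file from one JSON crash record.

    Assumes the record has an "input_data" key holding the file
    contents as a hex string, as in the example record above.
    Returns the path of the written file.
    """
    record = json.loads(Path(record_path).read_text())
    data = bytes.fromhex(record["input_data"])

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Name the ELF file after the record so the two stay paired.
    out_path = out_dir / (Path(record_path).stem + ".elf")
    out_path.write_bytes(data)
    return out_path
```

Each exported file could then be checked by running `scripts/readelf.py` from the traceback on it.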
I don't mean to simply drop the dirty work on you here. I've looked into the crashes, but I lack a thorough enough understanding of the ELF format to make a well-informed decision. I hope that comparing with llvm-readelf makes sense. If you'd like me to do any other comparison before storing an input as a crash, let me know! Also, if you're interested in this fuzzing setup, I'd be happy to help you set it up.