Skip to content

Conversation

@sud0woodo
Copy link

First version for the dissect PE format parser. Currently the parser supports standard PE files, as well as some patching of executables and building an executable from scratch. It is expected there will be some issues that will be found during usage and it will need quite some refactoring. The API as it currently exists might need some revising too :)

@codecov
Copy link

codecov bot commented Mar 9, 2024

Codecov Report

Attention: Patch coverage is 87.37864% with 143 lines in your changes are missing coverage. Please review.

Project coverage is 88.60%. Comparing base (4f08029) to head (137d17b).

Files Patch % Lines
dissect/executable/pe/helpers/resources.py 73.65% 44 Missing ⚠️
dissect/executable/pe/pe.py 82.88% 38 Missing ⚠️
dissect/executable/pe/helpers/sections.py 85.08% 17 Missing ⚠️
dissect/executable/pe/helpers/tls.py 68.62% 16 Missing ⚠️
dissect/executable/pe/helpers/imports.py 94.02% 8 Missing ⚠️
dissect/executable/pe/helpers/patcher.py 95.69% 8 Missing ⚠️
dissect/executable/pe/helpers/exports.py 87.23% 6 Missing ⚠️
dissect/executable/pe/helpers/builder.py 98.03% 3 Missing ⚠️
dissect/executable/pe/helpers/relocations.py 89.65% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            main      #10       +/-   ##
==========================================
+ Coverage   0.00%   88.60%   +88.60%     
==========================================
  Files          5       17       +12     
  Lines        306     1439     +1133     
==========================================
+ Hits           0     1275     +1275     
+ Misses       306      164      -142     
Flag Coverage Δ
unittests 88.60% <87.37%> (+88.60%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Miauwkeru Miauwkeru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done an initial review of the code, mostly just commented on some typehint weirdness and the like. I'll go deeper into it at a later time.

self.patched_pe = BytesIO()
self.functions = []

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being a property/attribute doesn't make much sense. I wouldn't expect assigning the attribute to some other variable would build the whole patched binary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just a design preference, happy to discuss other options for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on what you expect of a property, most properies just return a value of some internal state. This goes a bit beyond that IMHO, where it produces the whole binary. Feels like it should be the responsibility of a method instead.

Comment on lines 85 to 94
self.raw_resources.append(
{
"offset": entry.OffsetToDirectory,
"entry": data_entry,
"data": data,
"data_offset": raw_offset,
"resource": rsrc,
}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having an additional raw_resources list, why not add these offset and information to Resource itself? Where you can have the same functionality as a dict if you add def raw() to Resource

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kept for patching resources, adding them as a dict like proposed would need some more extensive refactoring of the code. We might opt for that indeed as it would be cleaner

Comment on lines 376 to 409
for rsrc_entry in sorted(self.pe.raw_resources, key=lambda rsrc: rsrc["data_offset"]):
entry_offset = rsrc_entry["offset"]
entry = rsrc_entry["entry"]

if entry._type.name == "IMAGE_RESOURCE_DATA_ENTRY":
rsrc_obj = rsrc_entry["resource"]
data_offset = rsrc_entry["data_offset"]

# Normally the data is separated by a null byte, increment the new offset by 1
new_data_offset = prev_offset + prev_size
# if new_data_offset and (new_data_offset > data_offset or new_data_offset < data_offset):
if new_data_offset and new_data_offset != data_offset:
data_offset = new_data_offset
rsrc_entry["data_offset"] = data_offset
rsrc_obj.offset = self.section.virtual_address + data_offset

data = rsrc_obj.data

# Write the resource entry data into the section
section_data.seek(data_offset)
section_data.write(data)

# Take note of the offset and size so we can update any of these values after changing the data within
# the resource
prev_offset = data_offset
prev_size = rsrc_obj.size

# Write the resource entry into the section
section_data.seek(entry_offset)
section_data.write(entry.dumps())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels out of place in the Resource class. I wouldn't imagine that everything gets immediately changed when assigning data to a resource. I would expect either the manager to take care of that (which holds the raw resources) or a seperate utility that does so during patching.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to move those things around if it doesn't make sense in the place where it is at right now!

from dissect.executable.pe.pe import PE


class PESection:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also a random thought, in line with the other managers and reduce a bit of the complexity of this class.
Is a SectionManager an idea that does the patching and keeping account of the patched and unpatched sections?

aka, something like:

class PESectionManager(Manager):
    def __init__(self, pe, section, ...):
       self.patched_items = {}

    def patch(self, name, data):
       # get data from self.items
       # Copy data
       # insert it in self.patched_items

    def __getitem__(self, name: str):
        if section := self.patched_items.get(name):
            return section
        else:
            return super().__getitem__(name)

    def parse():
        ... # Do something

this is just a rough draft of such a manager class

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like the idea, initially that is kind of what I had in mind as well but I lacked (still do) some understanding of the language to do it in a nice manner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want me to look into that a bit?

Comment on lines 186 to 202
for name, section in self.pe.patched_sections.items():
if section.virtual_address == prev_va:
continue

pointer_to_raw_data = utils.align_int(integer=prev_ptr + prev_size, blocksize=self.pe.file_alignment)
virtual_address = utils.align_int(integer=prev_va + prev_vsize, blocksize=self.pe.section_alignment)

if section.virtual_address < virtual_address:
"""Set the virtual address and raw pointer of the section to the new values, but only do so if the
section virtual address is lower than the previous section. We want to prevent messing up RVA's as
much as possible, this could lead to binaries that are a bit larger than they need to be but that
doesn't really matter."""
self.pe.patched_sections[name].virtual_address = virtual_address
self.pe.patched_sections[name].pointer_to_raw_data = pointer_to_raw_data

prev_ptr = pointer_to_raw_data
prev_size = section.size_of_raw_data
prev_va = virtual_address
prev_vsize = section.virtual_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the Resource class, it doesn't feel like this class should be responsible for patching all the other sections.

@Miauwkeru
Copy link
Contributor

to help with a lot of these style issues, you could check out the new linting/formatting we plan to use. It will probably also warn you about stuff I missed: fox-it/dissect.util#52

Copy link
Contributor

@Miauwkeru Miauwkeru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered your comments a bit, if you need help with any of the changes I requested do not hesitate to ask!

sud0woodo and others added 5 commits February 17, 2025 08:57
First version for the dissect PE format parser. Currently the parser
supports standard PE files, as well as some patching of executables and
building an executable from scratch. It is expected there will be some
issues that will be found during usage and it will need quite some
refactoring. The API as it currently exists might need some revising too
:)
* Refactored based on comments given
* Ran `ruff` for formatting
* Applied style and formatting on `dissect.executable.elf` as well (hope you don't mind)
@Miauwkeru Miauwkeru requested a review from Schamper April 9, 2025 09:27
@Miauwkeru
Copy link
Contributor

The only gripe I still have is that some data is still getting overwritten. But i couldn't currently think of a better structure for that part.

@Schamper Schamper mentioned this pull request Jul 14, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Initial commit for Dissect PE format parser PR #10

2 participants