dynamic: cape: support non-PE files #2636
base: master
Hello @williballenthin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request enhances the capa tool's ability to analyze CAPE reports for non-PE files, addressing issue #2620. The changes involve modifying the CAPE extractor to handle reports where static PE analysis is missing, such as when the target file is not a PE file. This includes updating the logic for extracting features like base address, imports, exports, and sections to gracefully handle cases where PE data is unavailable. Additionally, the code now correctly parses process parent IDs in CAPE for Linux reports and handles status codes returned by CAPE for Linux.
Highlights
- Non-PE File Support: The PR removes the assumption that CAPE reports always contain PE analysis data, allowing capa to analyze reports for other file types like ELF binaries and shellcode.
- Base Address Handling: The `get_base_address` function now returns `NO_ADDRESS` when the CAPE report lacks static or PE analysis data, preventing errors when processing non-PE files.
- Import/Export/Section Extraction: The import, export, and section extraction functions now check for the presence of static and PE data before attempting to extract features, avoiding errors when analyzing non-PE files.
- Process Parent ID Handling: The code now correctly parses process parent IDs in CAPE for Linux reports, where the root process may have no parent ID.
- Status Code Validation: The code now validates and parses status codes returned by CAPE for Linux, including handling cases where the status code is unknown.
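The parent-ID case can be handled by treating a missing value as "no parent". A minimal sketch, assuming a simplified hypothetical `RawProcess` record and a stand-in `ProcessAddress` tuple (neither is capa's actual class), and assuming 0 is an acceptable placeholder for the root process's parent:

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class ProcessAddress(NamedTuple):
    # hypothetical stand-in for a (pid, ppid) address; not capa's real class
    pid: int
    ppid: int


@dataclass
class RawProcess:
    # hypothetical, simplified CAPE process record
    process_id: int
    parent_id: Optional[int] = None  # may be absent for the root process in CAPE for Linux


def process_address(proc: RawProcess) -> ProcessAddress:
    # fall back to 0 when the sandbox did not record a parent
    ppid = proc.parent_id if proc.parent_id is not None else 0
    return ProcessAddress(pid=proc.process_id, ppid=ppid)


print(process_address(RawProcess(process_id=1234)))
print(process_address(RawProcess(process_id=1235, parent_id=1234)))
```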
Changelog
- capa/features/extractors/cape/extractor.py
  - Removed the `UnsupportedFormatError` import.
  - Modified `get_base_address` to handle missing static or PE data by returning `NO_ADDRESS`.
  - Removed checks that raise `UnsupportedFormatError` when static or PE analysis is missing.
  - Added a default value of `'?'` when a call's return value is missing.
- capa/features/extractors/cape/file.py
  - Modified `extract_import_names`, `extract_export_names`, and `extract_section_names` to return early when static or PE data is missing.
  - Handled cases where process parent IDs are missing in CAPE for Linux reports.
- capa/features/extractors/cape/global_.py
  - Removed the `ValueError` exceptions raised when encountering an unrecognized architecture or file format.
  - Added support for extracting OS information from CAPE reports for Linux.
  - Added support for extracting OS information from CAPE reports for Windows.
- capa/features/extractors/cape/models.py
  - Added a `validate_status_code` function to handle status codes returned by CAPE for Linux.
  - Added a `StatusCode` type annotation for call return values.
  - Added a `Machine` model to represent machine information in CAPE reports.
  - Added a `machine` field to the `Info` model; its `platform` attribute records the platform of the analysis machine (read as `report.info.machine.platform`).
  - Added a `validate_argument` function to handle arguments that are strings with JSON-like content.
  - Made `thread_id` and `threads` optional in the `Call` and `Process` models, respectively, to accommodate CAPE for Linux reports.
  - Set the default value of `environ` to an empty dictionary in the `Process` model.
- capa/features/extractors/cape/process.py
  - Handled cases where CAPE for Linux reports do not record threads.
  - Handled cases where process environment variables are missing.
- capa/features/extractors/cape/thread.py
  - Handled cases where CAPE for Linux reports do not record threads.
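Several of the `models.py` changes boil down to making fields optional or giving them safe defaults. The sketch below illustrates that idea with standard-library dataclasses rather than the pydantic models capa actually uses; the class and field names are simplified, hypothetical stand-ins:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Call:
    # hypothetical, trimmed-down call record
    api: str
    thread_id: Optional[int] = None  # absent in CAPE for Linux reports


@dataclass
class Process:
    # hypothetical, trimmed-down process record
    process_id: int
    calls: list[Call] = field(default_factory=list)
    threads: Optional[list[int]] = None  # absent in CAPE for Linux reports
    environ: dict[str, str] = field(default_factory=dict)  # empty when not recorded


# a Linux-style record with no threads and no environment
proc = Process(process_id=42, calls=[Call(api="open")])
print(proc.threads, proc.environ, proc.calls[0].thread_id)
```

Using `field(default_factory=dict)` gives each record its own empty dictionary, so one process's environment can never leak into another's.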
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions are available in the Gemini Code Assist documentation.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Trivia time!
What is the magic number that identifies an ELF file?
The magic number that identifies an ELF file is 0x7F454C46 (or 0x7F 'E' 'L' 'F' in ASCII).
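In practice this means the first four bytes of an ELF file are `7F 45 4C 46`. A standalone sketch (not part of the PR) that checks for them:

```python
ELF_MAGIC = b"\x7fELF"  # 0x7F 'E' 'L' 'F'


def is_elf(data: bytes) -> bool:
    """Return True if the buffer starts with the 4-byte ELF magic number."""
    return data[:4] == ELF_MAGIC


# the same four bytes, read as a big-endian 32-bit integer
print(hex(int.from_bytes(ELF_MAGIC, "big")))  # 0x7f454c46
print(is_elf(b"\x7fELF\x02\x01\x01"), is_elf(b"MZ\x90\x00"))
```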
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Code Review
This pull request enhances the capa tool by adding support for non-PE files in CAPE reports. The changes include modifications to handle ELF files and other file types, improving the tool's versatility. The code is generally well-structured, but there are a few areas that could be improved for clarity and robustness.
Summary of Findings
- Error Handling: The removal of `UnsupportedFormatError` raises concerns about how unsupported file formats are handled. Ensure that the application gracefully handles these cases and provides informative feedback to the user.
- TODO Comments: The presence of `TODO` comments indicates areas where further work is needed. Address these comments to ensure the code is complete and maintainable.
- Assert Statements: The removal of assert statements without replacement with proper error handling could lead to unexpected behavior. Ensure that these assertions are replaced with appropriate checks and error handling mechanisms.
Merge Readiness
The pull request introduces important functionality by supporting non-PE files. However, the removal of error handling and assert statements, along with the presence of TODO comments, suggests that further work is needed before merging. I recommend addressing the review comments, particularly those related to error handling and TODO comments, before merging this pull request. I am unable to approve this pull request, and recommend that others review and approve this code before merging.
```python
def extract_format(report: CapeReport) -> Iterator[tuple[Feature, Address]]:
```
this is no longer a critical error, because we want to support ELF files that the underlying sandbox supports.
```python
logger.warning("unknown file format, file command output: %s", report.target.file.type)
raise ValueError(
    f"unrecognized file format from the CAPE report; output of file command: {report.target.file.type}"
)
```
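The effect of downgrading this error can be sketched as follows. This is a simplified, hypothetical version that takes the `file` command output as a plain string and yields format names instead of capa's `(Feature, Address)` pairs; the substring checks and format names are illustrative assumptions, not capa's actual matching rules:

```python
import logging
from typing import Iterator

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


def extract_format(file_type: str) -> Iterator[str]:
    """Yield a format name for known file types; warn (don't raise) otherwise."""
    if "PE32" in file_type:  # also matches "PE32+"
        yield "pe"
    elif file_type.startswith("ELF"):
        yield "elf"
    else:
        # no longer fatal: log it and emit no format feature
        logger.warning("unknown file format, file command output: %s", file_type)


print(list(extract_format("PE32 executable (GUI) Intel 80386, for MS Windows")))
print(list(extract_format("ELF 64-bit LSB executable, x86-64")))
print(list(extract_format("Zip archive data, at least v2.0 to extract")))
```

An unknown format now produces a warning and an empty result instead of aborting the whole analysis.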
this is no longer a critical error, because we want to support ELF files (and even ZIP files!) that the underlying sandbox supports.
```python
logger.warning("unrecognized OS: %s", file_output)
raise ValueError(f"unrecognized OS from the CAPE report; output of file command: {file_output}")
elif report.info.machine and report.info.machine.platform == "windows":
```
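One way to read the new `report.info.machine.platform` check: prefer the platform recorded for the analysis machine, fall back to the `file` output, and return nothing rather than raising when neither matches. A hypothetical sketch with simplified string inputs (the fallback heuristic is an assumption, not the PR's exact logic):

```python
from typing import Optional


def detect_os(platform: Optional[str], file_output: str) -> Optional[str]:
    """Return "windows", "linux", or None when the OS cannot be determined."""
    # 1. trust the analysis machine's platform when the report records one
    if platform in ("windows", "linux"):
        return platform
    # 2. fall back to a heuristic on the `file` command output
    lowered = file_output.lower()
    if "windows" in lowered:
        return "windows"
    if "linux" in lowered:
        return "linux"
    # 3. unknown (e.g. a ZIP archive): no OS feature, but no error either
    return None


print(detect_os("linux", "Zip archive data, at least v2.0 to extract"))
print(detect_os(None, "ELF 64-bit LSB executable, x86-64, for GNU/Linux 3.2.0"))
print(detect_os(None, "data"))
```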
this is no longer a critical error, because we want to support ELF files that the underlying sandbox supports.
```python
return NO_ADDRESS
```
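A minimal sketch of the base-address fallback, assuming hypothetical, trimmed-down report classes and a local `NO_ADDRESS` sentinel in place of capa's own; it returns a plain integer rather than one of capa's address types:

```python
from dataclasses import dataclass
from typing import Optional, Union


class _NoAddress:
    # hypothetical sentinel standing in for capa's NO_ADDRESS
    def __repr__(self) -> str:
        return "NO_ADDRESS"


NO_ADDRESS = _NoAddress()


@dataclass
class PE:
    imagebase: int


@dataclass
class Static:
    pe: Optional[PE] = None


@dataclass
class Report:
    static: Optional[Static] = None


def get_base_address(report: Report) -> Union[int, _NoAddress]:
    # non-PE targets (ELF, shellcode, ...) have no static PE analysis
    if report.static is None or report.static.pe is None:
        return NO_ADDRESS
    return report.static.pe.imagebase


print(get_base_address(Report()))
print(hex(get_base_address(Report(Static(PE(imagebase=0x400000))))))
```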
```python
# TODO: elf
return
```
```python
# TODO: elf
return
```
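The early `return` works because these extractors are generators: returning before the first `yield` simply produces no features. A sketch over the raw report JSON, assuming (not verified against CAPE's schema) that `static.pe.imports` is a list of `{"dll": ..., "imports": [{"name": ...}]}` entries; the function is a hypothetical simplification of the PR's extractor:

```python
from typing import Any, Iterator


def extract_import_names(report: dict[str, Any]) -> Iterator[str]:
    """Yield "dll.name" strings, or nothing when there is no static PE analysis."""
    pe = (report.get("static") or {}).get("pe")
    if not pe:
        # non-PE target (ELF, shellcode, ...): end the generator without yielding
        return
    for library in pe.get("imports", []):
        for imp in library.get("imports", []):
            yield f'{library["dll"]}.{imp["name"]}'


pe_report = {"static": {"pe": {"imports": [{"dll": "kernel32.dll", "imports": [{"name": "CreateFileA"}]}]}}}
print(list(extract_import_names(pe_report)))
print(list(extract_import_names({"static": None})))
```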
```python
# TODO: check for this in the return handling
return None
```
```python
# like: -1 EINVAL (Invalid argument)
# like: 0 (Timeout)
# like: 0x8002 (flags O_RDWR|O_LARGEFILE)
assert value.endswith(")")
```
```python
# CAPE for linux doesn't record threads
# so this must be the 0 value
# and we'll enumerate all the calls in this process
assert tid == 0
```
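A sketch of the single-placeholder-thread approach, using hypothetical simplified models rather than capa's. It also shows one possible answer to the review's concern about `assert`: an explicit check that raises `ValueError`, which, unlike `assert`, is not stripped when Python runs with `-O`:

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Call:
    # hypothetical, simplified call record
    api: str
    thread_id: Optional[int] = None


@dataclass
class Process:
    # hypothetical, simplified process record
    calls: list[Call] = field(default_factory=list)
    threads: Optional[list[int]] = None  # None in CAPE for Linux reports


def get_threads(proc: Process) -> list[int]:
    # no recorded threads: expose one placeholder thread with ID 0
    if proc.threads is None:
        return [0]
    return proc.threads


def get_calls(proc: Process, tid: int) -> Iterator[Call]:
    if proc.threads is None:
        # explicit check instead of `assert tid == 0`
        if tid != 0:
            raise ValueError(f"unexpected thread ID {tid} for a process without threads")
        # the placeholder thread owns every call in the process
        yield from proc.calls
        return
    for call in proc.calls:
        if call.thread_id == tid:
            yield call


linux_proc = Process(calls=[Call("open"), Call("read")])
for tid in get_threads(linux_proc):
    print(tid, [c.api for c in get_calls(linux_proc, tid)])
```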
```python
def validate_status_code(value):
    if isinstance(value, str):
        if value == "?":
            # TODO: check for this in the return handling
```
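Taken together, the status-code formats quoted in these snippets could be parsed as in the following sketch. It is a hypothetical standalone function, not the PR's implementation: integers pass through, `"?"` maps to `None` (which callers must then handle), and otherwise the leading token is parsed with `int(token, 0)` so both decimal (`-1`) and hexadecimal (`0x8002`) values work:

```python
from typing import Optional, Union


def validate_status_code(value: Union[int, str]) -> Optional[int]:
    """Normalize a CAPE call return value to an int, or None if unknown."""
    if isinstance(value, int):
        return value
    if value == "?":
        # unknown return value: downstream code must check for None
        return None
    # e.g. "-1 EINVAL (Invalid argument)", "0 (Timeout)",
    #      "0x8002 (flags O_RDWR|O_LARGEFILE)"
    parts = value.split(maxsplit=1)
    if not parts:
        raise ValueError("empty status code")
    # base 0 accepts decimal ("-1") and hex ("0x8002"); int() raises ValueError otherwise
    return int(parts[0], 0)


for raw in [5, "?", "-1 EINVAL (Invalid argument)", "0 (Timeout)", "0x8002 (flags O_RDWR|O_LARGEFILE)"]:
    print(repr(raw), "->", validate_status_code(raw))
```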

closes #2620
Checklist