dynamic: cape: support non-PE files #2636

williballenthin · 2025-03-19T16:10:03Z

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

gemini-code-assist

Hello @williballenthin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request enhances the capa tool's ability to analyze CAPE reports for non-PE files, addressing issue #2620. The changes involve modifying the CAPE extractor to handle reports where static PE analysis is missing, such as when the target file is not a PE file. This includes updating the logic for extracting features like base address, imports, exports, and sections to gracefully handle cases where PE data is unavailable. Additionally, the code now correctly parses process parent IDs in CAPE for Linux reports and handles status codes returned by CAPE for Linux.

Highlights

Non-PE File Support: The PR removes the assumption that CAPE reports always contain PE analysis data, allowing capa to analyze reports for other file types like ELF binaries and shellcode.
Base Address Handling: The get_base_address function now returns NO_ADDRESS when the CAPE report lacks static or PE analysis data, preventing errors when processing non-PE files.
Import/Export/Section Extraction: The import, export, and section extraction functions now check for the presence of static and PE data before attempting to extract features, avoiding errors when analyzing non-PE files.
Process Parent ID Handling: The code now correctly parses process parent IDs in CAPE for Linux reports, where the root process may have no parent ID.
Status Code Validation: The code now validates and parses status codes returned by CAPE for Linux, including handling cases where the status code is unknown.
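
The base address handling described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `Report`, `Static`, and `PE` classes and the `_NoAddress` sentinel are hypothetical stand-ins for capa's CAPE models and its `NO_ADDRESS` value, and the `imagebase` field name is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


class _NoAddress:
    """Hypothetical stand-in for capa's NO_ADDRESS sentinel."""

    def __repr__(self) -> str:
        return "NO_ADDRESS"


NO_ADDRESS = _NoAddress()


@dataclass
class PE:
    imagebase: int  # assumed field name, for illustration only


@dataclass
class Static:
    pe: Optional[PE] = None


@dataclass
class Report:
    static: Optional[Static] = None


def get_base_address(report: Report) -> Union[int, _NoAddress]:
    # Non-PE targets (ELF binaries, shellcode, ...) may carry no static
    # section at all, or a static section without PE data.
    if report.static is None or report.static.pe is None:
        return NO_ADDRESS
    return report.static.pe.imagebase
```

Returning a sentinel rather than raising lets the rest of the extractor keep running on reports that only contain dynamic data.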

Changelog

capa/features/extractors/cape/extractor.py
- Removed UnsupportedFormatError import.
- Modified get_base_address to handle missing static or PE data by returning NO_ADDRESS.
- Removed checks that raise UnsupportedFormatError when static or PE analysis is missing.
- Added a default value of '?' when a call's return value is missing.
capa/features/extractors/cape/file.py
- Modified extract_import_names, extract_export_names, and extract_section_names to handle missing static or PE data by returning early.
- Handled cases where process parent IDs are missing in CAPE for Linux reports.
capa/features/extractors/cape/global_.py
- Removed ValueError exceptions when encountering unrecognized architecture or file format.
- Added support for extracting OS information from CAPE reports for Linux.
- Added support for extracting OS information from CAPE reports for Windows.
capa/features/extractors/cape/models.py
- Added validate_status_code function to handle status codes returned by CAPE for Linux.
- Added StatusCode type annotation for call return values.
- Added Machine model to represent machine information in CAPE reports.
- Added platform field to the Info model to represent the platform of the analysis machine.
- Added validate_argument function to handle arguments that are strings with JSON-like content.
- Made thread_id and threads optional in the Call and Process models, respectively, to accommodate CAPE for Linux reports.
- Set default value for environ to an empty dictionary in the Process model.
capa/features/extractors/cape/process.py
- Handled cases where CAPE for Linux reports do not record threads.
- Handled cases where process environment variables are missing.
capa/features/extractors/cape/thread.py
- Handled cases where CAPE for Linux reports do not record threads.

Trivia time!

What is the magic number that identifies an ELF file?

The magic number that identifies an ELF file is 0x7F454C46 (or 0x7F 'E' 'L' 'F' in ASCII).
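
As a small illustration of that answer, a buffer can be checked for the ELF magic like this. The helper name `is_elf` is hypothetical and is not part of capa.

```python
ELF_MAGIC = b"\x7fELF"  # bytes 0x7F 0x45 0x4C 0x46


def is_elf(buf: bytes) -> bool:
    """Return True if the buffer starts with the four-byte ELF magic."""
    return buf[:4] == ELF_MAGIC
```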

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

williballenthin · 2025-03-19T16:10:19Z

here's the output from CAPE for Linux:

gemini-code-assist

Code Review

This pull request enhances the capa tool by adding support for non-PE files in CAPE reports. The changes include modifications to handle ELF files and other file types, improving the tool's versatility. The code is generally well-structured, but there are a few areas that could be improved for clarity and robustness.

Summary of Findings

Error Handling: The removal of UnsupportedFormatError raises concerns about how unsupported file formats are handled. Ensure that the application gracefully handles these cases and provides informative feedback to the user.
TODO Comments: The presence of TODO comments indicates areas where further work is needed. Address these comments to ensure the code is complete and maintainable.
Assert Statements: The removal of assert statements without replacement with proper error handling could lead to unexpected behavior. Ensure that these assertions are replaced with appropriate checks and error handling mechanisms.

Merge Readiness

The pull request introduces important functionality by supporting non-PE files. However, the removal of error handling and assert statements, along with the presence of TODO comments, suggests that further work is needed before merging. I recommend addressing the review comments, particularly those related to error handling and TODO comments, before merging this pull request. I am unable to approve this pull request, and recommend that others review and approve this code before merging.

capa/features/extractors/cape/extractor.py

gemini-code-assist · 2025-03-19T16:10:30Z

capa/features/extractors/cape/global_.py



 def extract_format(report: CapeReport) -> Iterator[tuple[Feature, Address]]:


The ValueError exception is removed here. How will the program handle unrecognized architectures now? Is there alternative error handling in place?

this is no longer a critical error, because we want to support ELF files that the underlying sandbox supports.

gemini-code-assist · 2025-03-19T16:10:30Z

capa/features/extractors/cape/global_.py

        logger.warning("unknown file format, file command output: %s", report.target.file.type)
-        raise ValueError(
-            f"unrecognized file format from the CAPE report; output of file command: {report.target.file.type}"
-        )




The ValueError exception is removed here. How will the program handle unknown file formats now? Is there alternative error handling in place?

this is no longer a critical error, because we want to support ELF files (and even ZIP files!) that the underlying sandbox supports.
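
A minimal sketch of the "warn instead of raise" behavior discussed in this thread, assuming the format is inferred from the output of the `file` command. The function below is a simplified stand-in: it takes the string directly and yields plain format names, whereas the real extractor works on a `CapeReport` and yields capa features.

```python
import logging
from typing import Iterator

logger = logging.getLogger(__name__)


def extract_format(file_type: str) -> Iterator[str]:
    """Yield a format name for a `file` command output, or nothing if unknown."""
    if "PE" in file_type:
        yield "pe"
    elif "ELF" in file_type:
        yield "elf"
    else:
        # Formerly a ValueError. Now only a warning is logged and no format
        # feature is emitted, so analysis of the dynamic data can continue.
        logger.warning("unknown file format, file command output: %s", file_type)
```

With this shape, a report for a ZIP archive produces no format feature but does not abort the analysis.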

gemini-code-assist · 2025-03-19T16:10:30Z

capa/features/extractors/cape/global_.py

            logger.warning("unrecognized OS: %s", file_output)
-            raise ValueError(f"unrecognized OS from the CAPE report; output of file command: {file_output}")
+    elif report.info.machine and report.info.machine.platform == "windows":


The ValueError exception is removed here. How will the program handle unrecognized OS now? Is there alternative error handling in place?

this is no longer a critical error, because we want to support ELF files that the underlying sandbox supports.

gemini-code-assist · 2025-03-19T16:10:31Z

capa/features/extractors/cape/extractor.py

+            return NO_ADDRESS
+


Consider logging a warning message here to indicate that the report is missing static information, which might affect analysis results.

Suggested change

-            return NO_ADDRESS
+            logger.warning("CAPE report missing static analysis")
+            return NO_ADDRESS

gemini-code-assist · 2025-03-19T16:10:31Z

capa/features/extractors/cape/file.py

+        # TODO: elf
+        return


There is a TODO here to handle ELF files. It would be beneficial to provide more details about the expected implementation or link to a specific issue tracking this work.

gemini-code-assist · 2025-03-19T16:10:31Z

capa/features/extractors/cape/file.py

+        # TODO: elf
+        return


There is a TODO here to handle ELF files. It would be beneficial to provide more details about the expected implementation or link to a specific issue tracking this work.

gemini-code-assist · 2025-03-19T16:10:31Z

capa/features/extractors/cape/models.py

+            # TODO: check for this in the return handling
+            return None


There is a TODO here to check for this in the return handling. It would be beneficial to provide more details about the expected implementation or link to a specific issue tracking this work.

gemini-code-assist · 2025-03-19T16:10:31Z

capa/features/extractors/cape/models.py

+        # like: -1 EINVAL (Invalid argument)
+        # like: 0 (Timeout)
+        # like: 0x8002 (flags O_RDWR|O_LARGEFILE)
+        assert value.endswith(")")


This assert statement checks if the value ends with ')'. Consider adding a more descriptive error message or handling the case where the value does not end with ')' to prevent unexpected behavior.
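
One way to act on this comment is to parse the leading numeric token and raise a descriptive error on anything unexpected, rather than asserting on the trailing parenthesis. The sketch below is an assumption about how such parsing could look, based only on the three example strings in the hunk; `parse_status_code` is a hypothetical name, not the `validate_status_code` function from the PR.

```python
from typing import Optional


def parse_status_code(value: str) -> Optional[int]:
    """Parse strings like "-1 EINVAL (Invalid argument)" or "0x8002 (flags ...)"."""
    if value == "?":
        # Unknown return value: there is nothing numeric to report.
        return None
    # The numeric part is the first space-delimited token.
    head = value.split(" ", 1)[0]
    try:
        # base=0 accepts both decimal ("-1") and prefixed hex ("0x8002") literals.
        return int(head, 0)
    except ValueError as e:
        raise ValueError(f"unexpected status code format: {value!r}") from e
```

Unlike an `assert`, the explicit `ValueError` still fires when Python runs with `-O`, and the message includes the offending value.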

gemini-code-assist · 2025-03-19T16:10:32Z

capa/features/extractors/cape/thread.py

+            # CAPE for linux doesn't record threads
+            # so this must be the 0 value
+            # and we'll enumerate all the calls in this process
+            assert tid == 0


This assert statement checks if tid == 0. Consider adding a more descriptive error message or handling the case where tid != 0 to prevent unexpected behavior.
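
A sketch of what an explicit check could look like here, under the assumption stated in the code comments above: when a process records no threads, the only valid thread id is the placeholder 0, and every call in the process belongs to it. The function name, the dict-based call records, and the `thread_id` key are hypothetical simplifications of the real models.

```python
from typing import Iterator


def calls_for_thread(calls: list[dict], threads: list[int], tid: int) -> Iterator[dict]:
    """Yield the calls that belong to thread `tid` of one process."""
    if not threads:
        # No threads recorded (as with CAPE for Linux): only the placeholder
        # thread id 0 is meaningful, and it owns every call in the process.
        if tid != 0:
            raise ValueError(f"process has no recorded threads, but got thread id {tid}")
        yield from calls
        return
    for call in calls:
        if call.get("thread_id") == tid:
            yield call
```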

williballenthin · 2025-03-19T16:15:04Z

capa/features/extractors/cape/models.py

+def validate_status_code(value):
+    if isinstance(value, str):
+        if value == "?":
+            # TODO: check for this in the return handling


Suggested change

# TODO: check for this in the return handling

williballenthin added 8 commits March 19, 2025 15:15

cape: detect OS via info.machine.platform

3acc0fe

cape: don't require pe baseaddress

02405e2

cape: relax requirement for PE input files

c057a3b

cape: linux: support no parent pid

7d28cf8

cape: linux: handle no thread ids

4008775

cape: linux: handle no environ

664a6d8

cape: linux: handle status code return types

81419db

cape: linux: handle weird argument edge case type

c8131bd

williballenthin added enhancement New feature or request dynamic related to dynamic analysis flavor cape related to CAPE sandbox report analysis labels Mar 19, 2025

gemini-code-assist bot reviewed Mar 19, 2025

github-actions bot requested changes Mar 19, 2025

gemini-code-assist bot suggested changes Mar 19, 2025

williballenthin commented Mar 19, 2025


2 participants