-
Notifications
You must be signed in to change notification settings - Fork 626
Add Superblock Scope #2665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add Superblock Scope #2665
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @yelhamer, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello everyone, gemini-code-assist here to provide a summary of this pull request. This PR introduces the superblock scope to capa, addressing the suggestion in issue #2657. The goal is to enable the identification of capabilities that manifest across a contiguous sequence of basic blocks within a function, often referred to as a superblock. This allows for more granular and potentially more accurate rule matching compared to just function or basic block scopes, by considering the control flow structure.
Highlights
- New Superblock Scope: Introduces a new
SUPERBLOCKscope incapa.rulesto represent contiguous sequences of basic blocks within a function. Rules can now be defined and matched at this scope. - Superblock Matching Logic: Adds a
SuperblockMatcherclass incapa.capabilities.static.pyresponsible for building a basic block flow graph for a function, collecting features from all basic blocks, performing initial rule matching at the superblock scope, and then pruning matches to ensure they only span contiguous basic blocks (superblocks) of length 2 or more. - Superblock Address Type: Defines a new
SuperblockAddressclass incapa.features.address.pyto represent the location of a matched superblock, storing a list of the addresses of the basic blocks it contains. - Extractor Updates: Adds abstract methods
get_next_basic_blocksandget_basic_block_sizeto theStaticFeatureExtractorbase class, and implements them for the Binja, dnfile, and Vivisect extractors. This provides the necessary control flow and size information for theSuperblockMatcher. - Integration into Analysis and Rendering: Integrates the superblock matching process into the static analysis pipeline (
find_code_capabilities,find_static_capabilities), updates the result document structure (capa.render.result_document.py) and freezing logic (capa.features.freeze/__init__.py) to include and handle superblock matches and addresses, and updates the verbose renderer (capa.render.vverbose.py,capa.render.verbose.py) to display superblock locations. - Rule Parsing and Validation: Updates the rule parsing logic (
capa.rules.__init__.py) to support thesuperblocksubscope within function scope rules and updates feature inheritance to reflect the new scope hierarchy (instruction -> basic block -> superblock -> function).
Changelog
Click here to see the changelog
- capa/capabilities/static.py
- Added
superblock_matchestoCodeCapabilities. - Introduced
FlowGraphNodeandSuperblockMatcherclasses for superblock identification and matching. - Modified
find_code_capabilitiesto useSuperblockMatcherand include superblock results. - Modified
find_static_capabilitiesto aggregate and process superblock matches.
- Added
- capa/features/address.py
- Added
SuperblockAddressclass to represent a sequence of basic block addresses.
- Added
- capa/features/extractors/base_extractor.py
- Added abstract methods
get_next_basic_blocksandget_basic_block_size.
- Added abstract methods
- capa/features/extractors/binja/extractor.py
- Implemented
get_next_basic_blocksandget_basic_block_sizefor the Binja extractor.
- Implemented
- capa/features/extractors/dnfile/extractor.py
- Implemented
get_next_basic_blocksandget_basic_block_sizefor the dnfile extractor.
- Implemented
- capa/features/extractors/viv/extractor.py
- Imported
envi. - Implemented
get_next_basic_blocksandget_basic_block_sizefor the Vivisect extractor.
- Imported
- capa/features/freeze/init.py
- Added
SUPERBLOCKtoAddressTypeenum. - Updated
Addressmodel to handleSuperblockAddressvalues. - Added
from_capaandto_capalogic forSuperblockAddress.
- Added
- capa/loader.py
- Added logic to track and include matched superblocks in the static layout computation.
- Updated
FunctionLayoutstructure to includematched_superblocks.
- capa/render/result_document.py
- Added
SuperblockLayoutmodel. - Added
matched_superblocksfield toFunctionLayout.
- Added
- capa/render/verbose.py
- Added formatting for
SuperblockAddress.
- Added formatting for
- capa/render/vverbose.py
- Modified rendering functions (
render_statement,render_node,render_match) to handle theis_superblockcontext for potentially rendering basic block locations within a superblock. - Added a debug log and relaxed a check in
render_rulesfor file scope matches (potentially unrelated).
- Modified rendering functions (
- capa/rules/init.py
- Added
SUPERBLOCKto theScopeenum. - Included
SUPERBLOCKinSTATIC_SCOPESandSTATIC_SCOPE_ORDER. - Updated
SUPPORTED_FEATURESinheritance: basic block features are available in superblock scope, and superblock features are available in function scope. - Added logic to
build_statementsto support thesuperblocksubscope within function rules. - Updated the error message for
instructionsubscope to includesuperblockas a valid parent. - Added
SUPERBLOCKtoSTATIC_SCOPES_BY_DEPTH. - Added
superblock_rulesproperty toRuleSet.
- Added
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Basic blocks in a line,
A superblock, quite fine.
Features matched with care,
What malware is there?
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the superblock scope, which is a significant new feature for capa. The overall structure and integration across different modules seem well-thought-out.
However, there are a few critical concerns:
- Testing: The PR description states "No new tests needed". For a feature of this magnitude, comprehensive unit tests for
SuperblockMatcher,SuperblockAddress, and integration tests for rules utilizing the newsuperblockscope are essential to ensure correctness and prevent regressions. This is a critical omission. - Documentation: Similarly, "No documentation update needed" is indicated. The new
superblockscope, its behavior, and how to write rules for it must be documented for users and rule developers.
Addressing these points, along with the specific code comments below, will greatly improve the quality and reliability of this new feature. The introduction of SuperblockMatcher and the associated logic in _prune is quite complex and warrants careful review and thorough testing.
Summary of Findings
- Missing Tests: The pull request lacks tests for the new superblock scope functionality. Unit tests for
SuperblockMatcherandSuperblockAddress, as well as integration tests for rules using this scope, are crucial. - Missing Documentation: The new superblock scope is not documented. Users and rule developers will need guidance on how to use this feature and write rules for it.
- Potential Runtime Error in Superblock Pruning: In
capa/capabilities/static.py, theget_superblock_headfunction uses an undefined variablelocations, which will likely cause aNameError. - Incorrect Handling of File-Scope Rule Non-Matches in Renderer: In
capa/render/vverbose.py, the logic for rendering file-scope rules appears to crash if a rule doesn't match, due to anIndexErrorwhen accessingmatches[0]. - Limited Superblock Support for .NET (dnfile): The
dnfileextractor stubsget_next_basic_blocks, meaning superblock analysis will not work for .NET files using this extractor. This limitation should be clarified. - Code Complexity: The
_prunemethod inSuperblockMatcheris complex and could benefit from simplification or more detailed comments for better maintainability.
Merge Readiness
This pull request introduces a valuable new feature. However, due to the critical issues identified (missing tests, missing documentation, and potential runtime errors), I recommend that these changes not be merged until these issues are thoroughly addressed. The complexity of the new matching logic also warrants careful testing. I am unable to approve this pull request; please ensure these concerns are resolved and further reviews are conducted before merging.
This PR is a suggested implementation for the superblock scope suggested in #2657 .
Checklist