Add Frida log to capa analysis workflow #2

xukunzh · 2025-06-04T22:14:11Z

Files Added

scripts/frida/java_monitor.js - Frida hook script for Java API monitoring
scripts/frida/log_converter.py - Converts Frida logs to capa JSON format
scripts/frida/README.md - Documents the complete workflow process

Files Changed

features/extractors/frida/extractor.py - FridaExtractor class
features/extractors/frida/models.py - Data models

java_monitor.js

Currently implements basic Java file operation monitoring as a proof of concept.
Q: Should I go ahead and implement the selective API list idea from the proposal? It would automatically extract APIs from test_rules to build the monitoring list, and then generate the monitoring script from that list.
Q: How many APIs do capa's sandbox integrations typically monitor? I think hooking a few hundred Java APIs with Frida is probably the upper limit. Am I right?
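For discussion, here is a minimal sketch of the "selective API list" step. It assumes the rules under test_rules are capa-style YAML files whose API features appear on lines of the form `- api: <name>`. The helper names `apis_in_rule_text` and `collect_apis` are hypothetical and not part of this PR, and a regex scan is used instead of a real YAML parser to keep the example dependency-free.

```python
import re
from pathlib import Path
from typing import Set

# Matches rule lines such as "    - api: java.io.File.delete".
# Assumption: API names are unquoted and contain no spaces.
API_FEATURE = re.compile(r"^\s*-\s*api:\s*(?P<name>[^\s#]+)", re.MULTILINE)


def apis_in_rule_text(text: str) -> Set[str]:
    """Return the set of API names referenced by one rule file's text."""
    return {match.group("name") for match in API_FEATURE.finditer(text)}


def collect_apis(rule_dir: Path) -> Set[str]:
    """Union the API names found in every .yml file below rule_dir."""
    apis: Set[str] = set()
    for path in sorted(rule_dir.rglob("*.yml")):
        apis |= apis_in_rule_text(path.read_text(encoding="utf-8"))
    return apis
```

The resulting set could then drive generation of the hook script, so only APIs that some rule actually references get hooked.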

log_converter.py

Converts raw Frida logs to capa-compatible JSON format
Current JSON fields: api, arguments, thread_id, timestamp, caller, return_value
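To make the conversion step concrete, here is a rough sketch. It assumes the hook script writes one JSON object per log line, which may not match the actual raw log format. The function names `normalize_record` and `convert_lines` are hypothetical; only the six field names come from the list above.

```python
import json
from typing import Any, Dict, Iterable, List

# Field names taken from the PR description.
FIELDS = ("api", "arguments", "thread_id", "timestamp", "caller", "return_value")


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the known fields, filling missing ones with None."""
    record = {name: raw.get(name) for name in FIELDS}
    if record["arguments"] is None:
        record["arguments"] = []  # a call without logged arguments
    return record


def convert_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Convert raw log lines into a list of normalized call records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise such as Frida banner output
        if isinstance(raw, dict) and "api" in raw:
            records.append(normalize_record(raw))
    return records
```

Silently dropping unparseable lines is a design choice for the sketch; a real converter might prefer to log or count them.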

extractor.py & models.py

Only updated to use the from_json_file() method instead of from_frida_log()
Now follows capa's standard model: JSON file → capa engine → feature extraction
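As an illustration of the JSON file → model step, the sketch below loads a converted log into plain dataclasses. It assumes the file holds a JSON array of call objects with the fields listed earlier. `ApiCall` and `FridaReportSketch` are hypothetical stand-ins, so the real models.py in this PR may be structured differently.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, List, Optional


@dataclass
class ApiCall:
    """One hooked API call (hypothetical model, not the PR's class)."""
    api: str
    arguments: List[Any] = field(default_factory=list)
    thread_id: Optional[int] = None
    timestamp: Optional[str] = None
    caller: Optional[str] = None
    return_value: Any = None


@dataclass
class FridaReportSketch:
    """Container for all calls in one converted log file."""
    calls: List[ApiCall]

    @classmethod
    def from_json_file(cls, path: Path) -> "FridaReportSketch":
        # Assumption: the file is a JSON array of call objects.
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        # Unknown keys raise TypeError, which surfaces schema drift early.
        return cls(calls=[ApiCall(**entry) for entry in data])
```

An extractor built this way would receive the already-parsed report, keeping log parsing out of the feature extraction code.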

larchchen · 2025-06-05T08:05:49Z

capa/features/extractors/frida/extractor.py

-        yield from self.global_features
+        """Basic global features"""
+        yield OS("android"), NO_ADDRESS
+        yield Arch("aarch64"), NO_ADDRESS 


There is no guarantee that the target is aarch64. Better to remove this.

capa/features/extractors/frida/extractor.py

scripts/frida/test_rules/file_operation.yml

capa/features/extractors/frida/models.py

mike-hunhoff

Great first commit, thank you! See my comments

scripts/frida/log_converter.py

scripts/frida/java_monitor.js

scripts/frida/log_converter.py

mike-hunhoff · 2025-06-05T16:28:54Z

capa/features/extractors/frida/extractor.py

+    """
+    def __init__(self, report: FridaReport):
+        super().__init__(
+            hashes=SampleHashes(md5="", sha1="", sha256="")


Is it possible for Frida to log these? If not, we may need to require users to provide both the Frida-generated log file and the original file to capa, like we do with other extractors, e.g. BinExport and VMRay.

From what I’ve found, Frida cannot access the original APK file to compute hashes at runtime. Marked this as a TODO. Will revisit it. :)

I think it would be possible for Frida to access the local storage and find the APK, then compute the digest.

But I agree it is not very important at this stage, you can revisit this part later. For the time being, maybe give the digests via command line?
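If the original APK is available on the analyst's machine, the three digests could be computed offline and then passed in. The sketch below assumes exactly that; `compute_digests` is a hypothetical helper, and how the values reach the extractor (command-line option or otherwise) is left open.

```python
import hashlib
from pathlib import Path
from typing import Dict


def compute_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Return hex md5, sha1 and sha256 digests of the file at path."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with path.open("rb") as f:
        # Read in chunks so large APKs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The returned keys match the md5, sha1 and sha256 arguments that SampleHashes takes in the snippet under review, so the result could replace the empty strings there.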

capa/features/extractors/frida/extractor.py

scripts/frida/README.md

xukunzh added 2 commits May 29, 2025 15:05

add basic Android dynamic extractor framework

cc06df8

Add Frida log to capa analysis workflow

5415459