Skip to content

Conversation

@twiggler
Copy link
Contributor

@twiggler twiggler commented Mar 28, 2025

Support for single files, take two.

Implementation details:

  • When running in single file mode (--single-file), a minimal target is created with a default OS, and a virtual file system loaded with the specified (log) files.
  • Added a get_files method (template method pattern) to the plugin super class, which depending on the target mode, delegates to _get_files in case of an ordinary target, and the _get_minimal_files in case of a minimal target. _get_files should be overridden by plugins, _get_minimal_files typically not (but see the IIS plugin 🙄). Note that use of get_files is optional.
  • A similar mechanism is added for check_compatible, because when running against a minimal target, compatibility checks should typically be disabled.

@twiggler twiggler marked this pull request as draft March 28, 2025 08:40
@twiggler twiggler force-pushed the single-file-reloaded branch from 2619f8c to 7f10bcd Compare March 28, 2025 08:40
@codecov
Copy link

codecov bot commented Mar 28, 2025

Codecov Report

Attention: Patch coverage is 90.78947% with 14 lines in your changes missing coverage. Please review.

Project coverage is 77.87%. Comparing base (bc77a4f) to head (58c3a77).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
dissect/target/plugins/apps/webserver/iis.py 89.47% 8 Missing ⚠️
dissect/target/loaders/direct.py 82.60% 4 Missing ⚠️
dissect/target/plugin.py 90.90% 1 Missing ⚠️
dissect/target/plugins/os/windows/log/evt.py 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1082      +/-   ##
==========================================
+ Coverage   77.83%   77.87%   +0.04%     
==========================================
  Files         357      358       +1     
  Lines       32521    32558      +37     
==========================================
+ Hits        25314    25356      +42     
+ Misses       7207     7202       -5     
Flag Coverage Δ
unittests 77.87% <90.78%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@twiggler twiggler changed the title WIP Single (log) files, take two Mar 28, 2025
@twiggler twiggler marked this pull request as ready for review March 28, 2025 08:57
@Schamper Schamper self-requested a review March 28, 2025 14:10
@Schamper Schamper force-pushed the single-file-reloaded branch from ec4c396 to 9c8556d Compare April 24, 2025 13:52
Copy link
Member

@Schamper Schamper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've proposed some changes in 054c9c3

One big change is to treat paths as-is as passed by the user. Globs should be expanded by the shell in my opinion, and there should be a clearer distinction between directories and files. Most plugins will work directly on files, however something like a browser plugin may restrict itself to profile directories instead.

Let me know what you think.

As I was looking at this and fiddling with it a bit myself, I was thinking about how this could work for all types of plugins. Some idea sketches:

  • For user bound paths, we could just add a get_user_paths(self) -> Iterator[tuple[UserDetails, Path]] and do the same as here
  • Some plugins have complicated file requirements, e.g.:
class Foo(Plugin):
    @export
    def bar(self): ...  # Uses a.txt

    @export
    def baz(self): ...  # Uses b.txt

It's not really possible to cover these with the current proposed solution. However, #1122 may provide an answer here? Perhaps you could bind something like:

@export
def bar(self): ...
@bar.paths
def bar(self): return [self.target.fs.path("a.txt")]

That approach for plugins that are structured like this could also work for a possible detect(fh) feature as described in #789

  • This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?

@twiggler
Copy link
Contributor Author

twiggler commented Apr 25, 2025

One big change is to treat paths as-is as passed by the user. Globs should be expanded by the shell in my opinion, and there should be a clearer distinction between directories and files. Most plugins will work directly on files, however something like a browser plugin may restrict itself to profile directories instead.

Let me know what you think.

As discussed in private communication, I implemented the glob expansion to support shells which do not auto-expand, most notably the windows shell, although powershell might do expansion.

However, as pointed out we don´t do manual expansion of command line arguments anywhere else. In the long term, manual expansion, when supported, is perhaps best performed when parsing command line arguments, instead of the "business logic" layer.

yield from self.parse_autodetect_format_log(autoFile)
# We don't implement _get_paths() in the IIS plugin because there's little use for it for the way the plugin
# is currently implemented. So handle direct files here.
if self.target.is_direct:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, the variadic signature of get_files wasn't elegant, and required some rework of the logic, but files could be retrieved through a common interface. The escape hatch here feels like we went full circle :). That being said, it is hard to say which one is better. It might come down on how it would work out with other plugins which do require direct / single file support.

Copy link
Contributor Author

@twiggler twiggler Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick inventory of how this would pan out for other plugins which might benefit from single file support:

nginx

Multiple categories of files discovered by parsing of config file and scanning of directories
Requires get_files to somehow differentiate between categories of files

caddy

works

apache:

Multiple types of files discovered by parsing of config file and scanning of directories
Requires get_files to somehow differentiate between categories of files

citrix:

similar to apache

mcaffee:

works

sophos:

Multiple categories of log_files . Requires get_files to somehow differentiate between categories of files

symantec:

Multiple categories of log files. Requires get_files to somehow differentiate between categories of files

trendmicro:

Multiple categories of log files. Requires get_files to somehow differentiate between categories of files

teamviewer:

works

audit:

works

lastlog:

works

messages:

works

mssql:

works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agree that while the current solution is adequate for the short-to mid term to satisfy user demands for single file capability, it is not flexible enough to accommodate other plugins which process multiple types of files.

Therefore, we have decided to make architectural changes, such as moving the parsers out of the plugin, before supporting single files in the general case. The explicit switch in the plugin on "direct-mode" is acceptable for now.

@twiggler
Copy link
Contributor Author

twiggler commented Apr 25, 2025

Let me know what you think.

As I was looking at this and fiddling with it a bit myself, I was thinking about how this could work for all types of plugins.
Some idea sketches:

Great that you bring this up. I am having doubts if this is the right path. Basically, every solution feels a bit wrong to me, although I also think they are acceptable for the short- to mid term.

We had this discussion before, but at first sight there is a tension between making it easy to write plugins, and separating concerns so that individual components are straightforward to compose.

Or perhaps this is a false dichotomy. It might be possible to offer a simple facade to help people on their way, which abstract a collection of more fine-grained api's, while more seasoned developers can program directly against those more complex api's.

Ultimately, it is much easier to compose indiviual components, then to break up an aggregate after the fact.

Now there is the constraint that we are programming in Python, and certain abstractions which might get compiled away in other languages, might result in performance problems. I don´t know.

Further, there is the more practical concern if navigating to a position where things are easier composable is worth the cost investment. I don´t know yet.

I will try to get more concrete in a later comment

@twiggler
Copy link
Contributor Author

Let me know what you think.
As I was looking at this and fiddling with it a bit myself, I was thinking about how this could work for all types of plugins.
Some idea sketches:

Great that you bring this up. I am having doubts if this is the right path. Basically, every solution feels a bit wrong to me, although I also think they are acceptable for the short- to mid term.

We had this discussion before, but at first sight there is a tension between making it easy to write plugins, and separating concerns so that individual components are straightforward to compose.

Or perhaps this is a false dichotomy. It might be possible to offer a simple facade to help people on their way, which abstract a collection of more fine-grained api's, while more seasoned developers can program directly against those more complex api's.

Ultimately, it is much easier to compose indiviual components, then to break up an aggregate after the fact.

Now there is the constraint that we are programming in Python, and certain abstractions which might get compiled away in other languages, might result in performance problems. I don´t know.

Further, there is the more practical concern if navigating to a position where things are easier composable is worth the cost investment. I don´t know yet.

I will try to get more concrete in a later comment

We are continuing this discussion in other channels

@twiggler twiggler force-pushed the single-file-reloaded branch 3 times, most recently from f417797 to ec6c33e Compare May 1, 2025 07:48
@twiggler twiggler force-pushed the single-file-reloaded branch from ec6c33e to 58c3a77 Compare May 1, 2025 09:32
@twiggler twiggler merged commit e8bad72 into main May 1, 2025
21 of 24 checks passed
@twiggler twiggler deleted the single-file-reloaded branch May 1, 2025 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants