Add thumbcache parser #1
Merged
Changes from 16 commits
Commits
25 commits
864817b DIS-1352 Add a thumbcache idx parser code (Miauwkeru)
6832a26 DIS-1352 Add testdata to git lfs (Miauwkeru)
3a4aafb Add tools to extract thumbcache (Miauwkeru)
47030a9 Apply suggestions from code review (Miauwkeru)
b336264 Use a string instead of binary string (Miauwkeru)
73c050b Add constants instead of magic numbers (Miauwkeru)
b8ff8ad Add typehints (Miauwkeru)
ca4fc94 Add the missed constants and typehints (Miauwkeru)
45778e2 DIS-1352 Add correct readme (Miauwkeru)
09a1d4a Add fix for test_thumbcache assertion (Miauwkeru)
d1e15c8 Add some user information to the tools (Miauwkeru)
bb3fcfd Improve error handling (Miauwkeru)
4f400a4 Resolve an issue of unknown entries (Miauwkeru)
6c7ab88 Add an additional relation with the INDEX_HEADER_V2 (Miauwkeru)
c2f49fe Made the pytest results a bit more clear (Miauwkeru)
cbec47a Apply suggestions from code review (Miauwkeru)
e64b193 Remove data argument from __init__ (Miauwkeru)
a5fab7d Rename instances of file to fh (Miauwkeru)
e94b03f Apply suggestions from code review (Miauwkeru)
def18c5 Rename cstruct variables (Miauwkeru)
158c9b7 Use comments consistently in the c_thumbcache headers (Miauwkeru)
efc9caa Use argparse.exit for error conditions (Miauwkeru)
b9a87ce Add docs link to documentation (Miauwkeru)
90608c0 Define python_requires (Miauwkeru)
fedb801 Put important classes in __init__.py (Miauwkeru)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| *.db filter=lfs diff=lfs merge=lfs -text |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,53 @@ | ||
| # dissect.thumbcache | ||
|
|
||
| This is a project to parse windows thumbcache. | ||
| A Dissect module implementing parsers for the thumbcache of Windows systems. | ||
| This is commonly used to see which files were opened on a system. | ||
|
|
||
| ## Windows Vista+ | ||
|
|
||
| The project currently only supports the Windows Vista+ indexed thumbcache. The Windows XP format is currently not implemented. | ||
|
|
||
| ## Installation | ||
|
|
||
| `dissect.thumbcache` is available on [PyPI](https://pypi.org/project/dissect.thumbcache/). | ||
|
|
||
| ```bash | ||
| pip install dissect.thumbcache | ||
| ``` | ||
|
|
||
| This module is also automatically installed if you install the `dissect` package. | ||
|
|
||
| ## Build and test instructions | ||
|
|
||
| This project uses `tox` to build source and wheel distributions. Run the following command from the root folder to build | ||
| these: | ||
|
|
||
| ```bash | ||
| tox -e build | ||
| ``` | ||
|
|
||
| The build artifacts can be found in the `dist/` directory. | ||
|
|
||
| `tox` is also used to run linting and unit tests in a self-contained environment. To run both linting and unit tests | ||
| using the default installed Python version, run: | ||
|
|
||
| ```bash | ||
| tox | ||
| ``` | ||
|
|
||
| For a more elaborate explanation on how to build and test the project, please see [the | ||
| documentation](https://docs.dissect.tools/en/latest/contributing/developing.html#building-testing). | ||
|
|
||
| ## Contributing | ||
|
|
||
| The Dissect project encourages any contribution to the codebase. To make your contribution fit into the project, please | ||
| refer to [the style guide](https://docs.dissect.tools/en/latest/contributing/style-guide.html). | ||
|
|
||
| ## Copyright and license | ||
|
|
||
| Dissect is released as open source by Fox-IT (<https://www.fox-it.com>) part of NCC Group Plc | ||
| (<https://www.nccgroup.com>). | ||
|
|
||
| Developed by the Dissect Team (<[email protected]>) and made available at <https://github.com/fox-it/dissect>. | ||
|
|
||
| License terms: AGPL3 (<https://www.gnu.org/licenses/agpl-3.0.html>). For more information, see the LICENSE file. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| from dissect.cstruct import cstruct | ||
|
|
||
| c_thumbcache_index_def = """ | ||
| struct INDEX_HEADER_V1 { | ||
| char signature[4]; | ||
| uint32 version; | ||
| uint32 unknown1; | ||
| uint32 used_entries; | ||
| uint32 total_entries; | ||
| uint32 unknown2; | ||
| }; | ||
| struct INDEX_HEADER_V2 { | ||
| char signature[8]; // 0x00 | ||
| uint64 version; // 0x08 | ||
| uint32 unknown1; // 0x10 | ||
| uint32 used_entries; // 0x14 | ||
| uint32 total_entries; // 0x18 | ||
| uint32 unknown2; // 0x1C | ||
| }; // 0x20 | ||
| struct VISTA_ENTRY { | ||
| char hash[8]; | ||
| uint64 last_modified; | ||
| uint32 flags; | ||
| }; | ||
| struct WINDOWS7_ENTRY { | ||
| char hash[8]; | ||
| uint32 flags; | ||
| }; | ||
| struct WINDOWS8_ENTRY { | ||
| char hash[8]; | ||
| uint32 flags; | ||
| uint32 unknown; // Is sometimes filled with information, couldn't figure out what it meant yet though. | ||
| }; | ||
| struct CACHE_HEADER { | ||
| char signature[4]; | ||
| uint32 version; | ||
| uint32 type; | ||
| uint32 size; | ||
| uint32 offset; | ||
| uint32 entries; | ||
| }; | ||
| struct CACHE_HEADER_VISTA { | ||
| char signature[4]; | ||
| uint32 version; | ||
| uint32 type; | ||
| uint32 offset; | ||
| uint32 size; | ||
| uint32 entries; | ||
| }; | ||
| struct CACHE_ENTRY { | ||
| char signature[4]; | ||
| uint32 size; | ||
| char hash[8]; | ||
| uint32 identifier_size; | ||
| uint32 padding_size; | ||
| uint32 data_size; | ||
| uint32 _unknown3; | ||
| }; | ||
| struct CACHE_ENTRY_VISTA { | ||
| char signature[4]; | ||
| uint32 size; | ||
| char hash[8]; | ||
| wchar extension[4]; | ||
| uint32 identifier_size; | ||
| uint32 padding_size; | ||
| uint32 data_size; | ||
| uint32 _unknown3; | ||
| }; | ||
| """ | ||
| c_thumbcache_index = cstruct() | ||
| c_thumbcache_index.load(c_thumbcache_index_def) | ||
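To make the `INDEX_HEADER_V1` layout above concrete, here is a minimal sketch that unpacks the same six fields with the standard library `struct` module instead of `dissect.cstruct`. Little-endian byte order is an assumption, `IndexHeaderV1` and `parse_index_header_v1` are hypothetical names that are not part of this PR, and the sample bytes are synthetic.

```python
import struct
from typing import NamedTuple

# INDEX_HEADER_V1 as defined above: a 4-byte signature followed by five
# uint32 fields. Little-endian byte order is an assumption of this sketch.
INDEX_HEADER_V1 = struct.Struct("<4s5I")


class IndexHeaderV1(NamedTuple):
    """Hypothetical container mirroring the INDEX_HEADER_V1 fields."""

    signature: bytes
    version: int
    unknown1: int
    used_entries: int
    total_entries: int
    unknown2: int


def parse_index_header_v1(data: bytes) -> IndexHeaderV1:
    """Unpack the first 24 bytes of ``data`` as an INDEX_HEADER_V1."""
    return IndexHeaderV1(*INDEX_HEADER_V1.unpack_from(data))


# Synthetic header for illustration only; the field values are arbitrary
# and were not taken from a real thumbcache index file.
sample = struct.pack("<4s5I", b"IMMM", 21, 0, 3, 16, 0)
header = parse_index_header_v1(sample)
print(header.signature, header.used_entries, header.total_entries)
```

The struct size works out to 24 bytes, which is also the number of bytes `_find_header` reads when it searches for the signature.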
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| class Error(Exception): | ||
| """A generic exception for the thumbcache module.""" | ||
|
|
||
| pass | ||
|
|
||
|
|
||
| class NotAnIndexFileError(Error): | ||
| """Raised if a thumbnail index signature could not be found.""" | ||
|
|
||
| pass | ||
|
|
||
|
|
||
| class InvalidSignatureError(Error): | ||
| """Raised if the signature does not match the expected value.""" | ||
|
|
||
| pass | ||
|
|
||
|
|
||
| class UnknownThumbnailTypeError(Error): | ||
| """Raised if an unknown thumbnail type was found.""" | ||
|
|
||
| pass |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,172 @@ | ||
| from __future__ import annotations | ||
|
|
||
| from datetime import datetime | ||
| from typing import BinaryIO, Iterator | ||
|
|
||
| from dissect.cstruct import Structure | ||
| from dissect.util import ts | ||
|
|
||
| from dissect.thumbcache.c_thumbcache import c_thumbcache_index | ||
| from dissect.thumbcache.exceptions import NotAnIndexFileError | ||
| from dissect.thumbcache.util import ThumbnailType | ||
|
|
||
| INDEX_ENTRIES = { | ||
| ThumbnailType.WINDOWS_7: 5, | ||
| ThumbnailType.WINDOWS_81: 11, | ||
| ThumbnailType.WINDOWS_10: 14, | ||
| ThumbnailType.WINDOWS_VISTA: 5, | ||
| } | ||
|
|
||
| MAX_IMM_OFFSET = 4 | ||
| BYTES_IN_NUMBER = 4 | ||
| IDENTIFIER_BYTES = 8 | ||
|
|
||
|
|
||
| class ThumbnailIndex: | ||
| _signature = b"IMMM" | ||
|
|
||
| def __init__(self, file: BinaryIO) -> None: | ||
| self.file = file | ||
| self._header = None | ||
|
|
||
| @property | ||
| def header(self) -> Structure: | ||
| if self._header is None: | ||
| self._header = self._find_header(self.file) | ||
| return self._header | ||
|
|
||
| def _find_header(self, file: BinaryIO) -> Structure: | ||
| """Searches for the header signature, and puts ``file`` at the correct position. | ||
|
|
||
| From Windows 8.1 onward, the two fields seem to use a 64-bit format field | ||
| inside the header with the value ``0C 00 30 20``. | ||
|
|
||
| Args: | ||
| file: The file to read the header and indexes from. | ||
|
|
||
| Returns: | ||
| A ``c_thumbcache_index.INDEX_HEADER_V1`` or ``c_thumbcache_index.INDEX_HEADER_V2`` structure. | ||
|
|
||
| Raises: | ||
| NotAnIndexFileError: If the ``IMMM`` signature could not be found. | ||
| """ | ||
| position = file.tell() | ||
| buffer = file.read(len(c_thumbcache_index.INDEX_HEADER_V1)) | ||
| offset = buffer.find(self._signature) | ||
|
|
||
| if offset == MAX_IMM_OFFSET: | ||
| file.seek(position) | ||
|
|
||
| header = c_thumbcache_index.INDEX_HEADER_V2(file) | ||
| # From looking at the index files, it has a specific amount of information. | ||
| # It is aligned in the following way: | ||
| # INDEX_HEADER_V2 | ||
| # length of an INDEX_ENTRY of a specific type including the cache_offsets | ||
| # After that point, there are only zero bytes, which seem to have the following relation: | ||
| # length of the same INDEX_ENTRY - length of INDEX_HEADER_V2 | ||
| # | ||
| # TODO: see if the data contains any useful information. | ||
|
|
||
| # Read one index entry from the file till only zero bytes | ||
| entry = IndexEntry(file, header.version) | ||
| entry.header | ||
| entry.cache_offsets | ||
|
|
||
| # Read offset to first entry | ||
| zero_bytes = len(entry.header) + INDEX_ENTRIES.get(header.version) * BYTES_IN_NUMBER - len(header) | ||
| file.read(zero_bytes) | ||
| return header | ||
| elif offset == 0: | ||
| return c_thumbcache_index.INDEX_HEADER_V1(buffer) | ||
| else: | ||
| raise NotAnIndexFileError( | ||
| f"The index file signature {self._signature!r} could not be found at the expected location." | ||
| ) | ||
|
|
||
| @property | ||
| def version(self) -> int: | ||
| return self.header.version | ||
|
|
||
| @property | ||
| def type(self) -> ThumbnailType: | ||
| return ThumbnailType(self.version) | ||
|
|
||
| @property | ||
| def total_entries(self) -> int: | ||
| return self.header.total_entries | ||
|
|
||
| @property | ||
| def used_entries(self) -> int: | ||
| return self.header.used_entries | ||
|
|
||
| def entries(self) -> Iterator[IndexEntry]: | ||
| """Returns all index entries that are actually used.""" | ||
| for _ in range(self.total_entries): | ||
| entry = IndexEntry(self.file, self.type) | ||
| entry.header | ||
| entry.cache_offsets | ||
|
|
||
| if entry.in_use(): | ||
| yield entry | ||
|
|
||
|
|
||
| class IndexEntry: | ||
| def __init__(self, file: BinaryIO, type: ThumbnailType) -> None: | ||
| self.file = file | ||
| self.type = type | ||
| self._header = None | ||
| self._data = None | ||
|
|
||
| @property | ||
| def header(self) -> Structure: | ||
| if not self._header: | ||
| self._header = self._select_header() | ||
| return self._header | ||
|
|
||
| def _select_header(self) -> Structure: | ||
| """Selects header version according to the thumbnailtype.""" | ||
| if self.type == ThumbnailType.WINDOWS_VISTA: | ||
| return c_thumbcache_index.VISTA_ENTRY(self.file) | ||
| elif self.type == ThumbnailType.WINDOWS_7: | ||
| return c_thumbcache_index.WINDOWS7_ENTRY(self.file) | ||
| else: | ||
| return c_thumbcache_index.WINDOWS8_ENTRY(self.file) | ||
|
|
||
| def in_use(self) -> bool: | ||
| return self.identifier != b"\x00" * IDENTIFIER_BYTES | ||
|
|
||
| @property | ||
| def identifier(self) -> bytes: | ||
| return self.header.hash | ||
|
|
||
| @property | ||
| def flags(self) -> int: | ||
| return self.header.flags | ||
|
|
||
| @property | ||
| def cache_offsets(self) -> list[int]: | ||
| """Retrieves the index data entries. | ||
|
|
||
| These are offsets into the thumbcache files; the position of an offset in the list determines which file it refers to. | ||
| More information about the order can be found in :class:`Thumbcache`. | ||
|
|
||
| """ | ||
| if not self._data: | ||
| size = INDEX_ENTRIES.get(self.type) | ||
| self._data = c_thumbcache_index.uint32[size](self.file) | ||
| if self.type > ThumbnailType.WINDOWS_7: | ||
| # Alignment step | ||
| self.file.read((size % 2) * BYTES_IN_NUMBER) | ||
| return self._data | ||
|
|
||
| @property | ||
| def last_modified(self) -> datetime | None: | ||
| if self.type == ThumbnailType.WINDOWS_VISTA: | ||
Review comment: Just out of curiosity: does only Vista contain a "last modified" entry in the header?
Author reply: Yes, it was only present in the initial release on Windows Vista.
| return ts.wintimestamp(self.header.last_modified) | ||
| return None | ||
|
|
||
| def __repr__(self) -> str: | ||
| return ( | ||
| f"identifier={self.identifier.hex()} flags={hex(self.flags)} " | ||
| f"cache_offsets={[hex(x) for x in self.cache_offsets]}" | ||
| ) | ||
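The two layout rules in this file, the signature position that selects the header version and the alignment step in `cache_offsets`, can be checked with a small standalone sketch. It uses only the standard library. The names `header_variant` and `entry_size` are hypothetical, and the `aligned` flag stands in for the `self.type > ThumbnailType.WINDOWS_7` check, because the enum values are not shown in this diff.

```python
SIGNATURE = b"IMMM"
UINT32_SIZE = 4

# Size in bytes of the fixed part of each entry, derived from the struct
# definitions in this PR (VISTA_ENTRY, WINDOWS7_ENTRY, WINDOWS8_ENTRY).
ENTRY_HEADER_SIZE = {
    "vista": 8 + 8 + 4,     # hash, last_modified, flags
    "windows7": 8 + 4,      # hash, flags
    "windows8": 8 + 4 + 4,  # hash, flags, unknown
}


def header_variant(buffer: bytes) -> str:
    """Return the header layout implied by the position of the signature."""
    offset = buffer.find(SIGNATURE)
    if offset == 0:
        return "INDEX_HEADER_V1"
    if offset == 4:
        return "INDEX_HEADER_V2"
    raise ValueError(f"signature {SIGNATURE!r} not found at offset 0 or 4")


def entry_size(layout: str, offset_count: int, aligned: bool) -> int:
    """Total bytes consumed by one index entry, including its cache offsets.

    When ``aligned`` is true, an odd number of offsets is followed by one
    extra uint32 of padding, mirroring the alignment step in cache_offsets.
    """
    size = ENTRY_HEADER_SIZE[layout] + offset_count * UINT32_SIZE
    if aligned:
        size += (offset_count % 2) * UINT32_SIZE
    return size


# Synthetic buffers for illustration only.
print(header_variant(b"IMMM" + b"\x00" * 20))
print(header_variant(b"\x0c\x00\x30\x20" + b"IMMM" + b"\x00" * 16))
print(entry_size("windows7", 5, aligned=False))
print(entry_size("windows8", 11, aligned=True))
```

With the offset counts from `INDEX_ENTRIES`, this gives 32 bytes for a Windows 7 entry, 64 bytes for a Windows 8.1 entry (11 offsets plus 4 bytes of padding) and 72 bytes for a Windows 10 entry, assuming the structures are packed without extra padding.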
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| from pathlib import Path | ||
| from typing import Iterator | ||
|
|
||
| from dissect.thumbcache.index import IndexEntry, ThumbnailIndex | ||
| from dissect.thumbcache.thumbcache_file import ThumbcacheEntry, ThumbcacheFile | ||
|
|
||
|
|
||
| class Thumbcache: | ||
| """Combines a ``ThumbnailIndex`` with the matching ``ThumbcacheFile`` files. | ||
|
|
||
| The class looks up all files inside ``path`` that have the same ``prefix``. | ||
|
|
||
| Args: | ||
| path: The directory that contains the thumbcache files. | ||
| prefix: The start of the name to search for. | ||
| """ | ||
|
|
||
| def __init__(self, path: Path, prefix: str = "thumbcache") -> None: | ||
| self._mapping: dict[int, Path] = {} | ||
| self.index_file, self.cache_files = self._populate_files(path, prefix) | ||
|
|
||
| def _populate_files(self, path: Path, prefix: str) -> tuple[Path, list[Path]]: | ||
| cache_files = [] | ||
| index_file = None | ||
| for file in path.glob(f"{prefix}*"): | ||
| if file.name.endswith("_idx.db"): | ||
| index_file = file | ||
| else: | ||
| cache_files.append(file) | ||
| return index_file, cache_files | ||
|
|
||
| @property | ||
| def mapping(self) -> dict[int, Path]: | ||
| """Looks at the version field in the cache file header.""" | ||
| if not self._mapping: | ||
| for file in self.cache_files: | ||
| with file.open("rb") as cache_file: | ||
| t_file = ThumbcacheFile(cache_file) | ||
| key = t_file.type | ||
| self._mapping.update({key: file}) | ||
| return self._mapping | ||
|
|
||
| def entries(self) -> Iterator[tuple[Path, ThumbcacheEntry]]: | ||
| """Iterates through all the specific entries from the thumbcache files.""" | ||
| used_entries = list(self.index_entries()) | ||
|
|
||
| for entry in used_entries: | ||
| yield from self._entries_from_offsets(entry.cache_offsets) | ||
|
|
||
| def index_entries(self) -> Iterator[IndexEntry]: | ||
| """Iterates through all the index entries that are in use.""" | ||
| with self.index_file.open("rb") as i_file: | ||
| for entry in ThumbnailIndex(i_file).entries(): | ||
| yield entry | ||
|
|
||
| def _entries_from_offsets(self, offsets: list[int]) -> Iterator[tuple[Path, ThumbcacheEntry]]: | ||
| """Retrieves Thumbcache entries from a ThumbcacheFile using offsets.""" | ||
| for idx, offset in enumerate(offsets): | ||
| if offset == 0xFFFFFFFF: | ||
| continue | ||
|
|
||
| cache_path = self.mapping.get(idx) | ||
|
|
||
| with cache_path.open("rb") as cache_file: | ||
| yield cache_path, ThumbcacheFile(cache_file)[offset] |
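The lookup in `_entries_from_offsets` can be sketched in isolation: the position of each offset selects a cache file through the mapping, and `0xFFFFFFFF` marks a slot with no thumbnail. The sketch below only resolves `(path, offset)` pairs and does not open any file. `resolve_offsets` and the file names are hypothetical, and skipping positions that have no mapped file is a design choice of this sketch rather than the behaviour of the PR, where `self.mapping.get(idx)` would return `None` in that case.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator

UNUSED_OFFSET = 0xFFFFFFFF  # sentinel used in the index for "no thumbnail"


def resolve_offsets(offsets: list[int], mapping: dict[int, Path]) -> Iterator[tuple[Path, int]]:
    """Yield ``(cache file, offset)`` pairs for every used offset.

    ``mapping`` maps the position of an offset in ``offsets`` to the cache
    file that the offset points into.
    """
    for idx, offset in enumerate(offsets):
        if offset == UNUSED_OFFSET:
            continue

        cache_path = mapping.get(idx)
        if cache_path is None:
            # No cache file is known for this position; skip it instead of
            # failing. This differs from the code in the PR (assumption).
            continue

        yield cache_path, offset


# Hypothetical file names and offsets, for illustration only.
mapping = {0: Path("thumbcache_32.db"), 2: Path("thumbcache_256.db")}
offsets = [0x18, UNUSED_OFFSET, 0x200, 0x40]
resolved = list(resolve_offsets(offsets, mapping))
print(resolved)
```

Separating the lookup from the file access in this way also makes the mapping logic testable without real thumbcache files.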
Use the default template for the readme. See dissect.target for example (but you know ;))