Skip to content

Extend zip parser to be less reliant on the central directory#1460

Merged
val-ms merged 4 commits intoCisco-Talos:mainfrom
frsell1:update_zip_parser
Mar 28, 2025
Merged

Extend zip parser to be less reliant on the central directory#1460
val-ms merged 4 commits intoCisco-Talos:mainfrom
frsell1:update_zip_parser

Conversation

@frsell1
Copy link
Contributor

@frsell1 frsell1 commented Feb 27, 2025

Zip file concatenation is an evasion method used by threat actors in the wild: https://perception-point.io/blog/evasive-concatenated-zip-trojan-targets-windows-users/. ClamAV currently relies on the last central directory it can find and will therefore miss the first zip in the concatenation.

The proposed addition to the zip parser searches for local file headers between and around the files already found from the central directory header. It then adds these found files to the list of zip records found by the central directory. The idea is that scanning for a local file header is time intensive, so we should minimize the number of bytes we need to scan. At the same time, we need to scan for local file headers since we cannot trust the central directory header as a source of truth.

There was already functionality that would search for local file headers after parsing the central directory (at the end of the cli_unzip function), but this required the appended local file headers to occur consecutively.

@frsell1 frsell1 changed the title Extend zip parser to be less reliant on the Extend zip parser to be less reliant on the central directory Feb 27, 2025
@val-ms val-ms mentioned this pull request Mar 3, 2025
@val-ms
Copy link
Contributor

val-ms commented Mar 21, 2025

Fixes #1014

Copy link
Contributor

@val-ms val-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created some feature/regression tests to verify that this fixes a few scenarios and ran into 3 bugs. I've created fixes for those, and then added some opinionated code cleanup on top of that.

It's too much to add in code review comments here, so I'm linking to my branch and if you're good with those changes, you or I can pull them onto your branch.

main...val-ms:clamav:CLAM-2722-update_zip_parser-tests

@val-ms
Copy link
Contributor

val-ms commented Mar 23, 2025

I should add that I LOVE the direction you went for solving this problem, and greatly appreciate your effort to make the code match our existing style.

@frsell1
Copy link
Contributor Author

frsell1 commented Mar 25, 2025

Thanks for the review and fixes. Just merged your changes to this branch

frsell1 and others added 2 commits March 25, 2025 12:15
Add logic to search for additional file entries between those that have
been found using the central directory.
@val-ms val-ms force-pushed the update_zip_parser branch from 05886f0 to 3cde5ee Compare March 25, 2025 16:22
Fixes:

- We need to look at the local headers if no central directory headers are
  found. Restructured the main `cli_unzip()` function to allocate an empty
  zip catalogue when we can't use a central directory at all.

- In `index_local_file_headers_within_bounds()`, we must decrement the
  `coff` variable after adding the size of a file entry using
  `parse_local_file_header()`, to account for the increment when it loops
  around. If we don't, the next entry won't be at 'PK\x03\x04', it will be
  at 'K\0x03\x04'.

- Attempt to unzip when encrypted if we don't have a valid password.
  This may enable extraction for files where a header lies about encryption.

- The `fmap_need_off()` call to get the `compressed_data` pointer used the
  wrong size, checking if there was enough data for a header instead of
  for the compressed data that follows the header. I stumbled across this
  older bug when testing extraction of a zip where the file entries are
  tiny and I'd stripped off the central directory. As a result, there
  wasn't enough data for a whole file header and my test failed.

Cleanup:

- Initialize status variables as CL_ERROR and only assign to CL_SUCCESS if
  successful. This is to protect against future changes in case someone
  accidentally goes-to-done without setting the status.

- Remove legacy use of CL_CLEAN. Not a functional change.
  This mostly a stylistic preference.

- Use calloc instead of malloc + memset in a couple places.
  Make use of the new allocation macros with goto-done error handling.

- Some opinionated format changes such as shifting some longer function
  arguments all to a new line so they're no so far to the right.

- Auto-format with clang-format.
@val-ms val-ms force-pushed the update_zip_parser branch from 3cde5ee to 96c00b6 Compare March 25, 2025 16:52
@frsell1
Copy link
Contributor Author

frsell1 commented Mar 25, 2025

Fixed a bug in index_local_file_headers where the end_offset would be set to the end of the next file in the zip. This caused index_local_file_headers to extract duplicate files that were already extracted in index_the_central_directory

@frsell1 frsell1 force-pushed the update_zip_parser branch from 4b3035c to fab5065 Compare March 25, 2025 20:54
@val-ms
Copy link
Contributor

val-ms commented Mar 27, 2025

This PR is currently causing two of our internal test suite tests to fail:
image

I'll investigate

@Cisco-Talos Cisco-Talos deleted a comment from DailyInvestors Mar 27, 2025
Fix bug in end_offset calculation when searching for additional
file entries.

Fix size of memset after realloc.
@val-ms val-ms force-pushed the update_zip_parser branch from 281a38e to 1199c89 Compare March 28, 2025 13:46
@val-ms
Copy link
Contributor

val-ms commented Mar 28, 2025

Squashed the last two commits together to tidy up, and so the commits will be signed.

@val-ms val-ms merged commit 1638ce2 into Cisco-Talos:main Mar 28, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants