Extend zip parser to be less reliant on the central directory by frsell1 · Pull Request #1460 · Cisco-Talos/clamav

frsell1 · 2025-02-27T02:55:56Z

Zip file concatenation is an evasion method used by threat actors in the wild: https://perception-point.io/blog/evasive-concatenated-zip-trojan-targets-windows-users/. ClamAV currently relies on the last central directory it can find and will therefore miss the first zip in the concatenation.

The proposed addition to the zip parser searches for local file headers between and around the files already found from the central directory header. It then adds these found files to the list of zip records found by the central directory. The idea is that scanning for a local file header is time intensive, so we should minimize the number of bytes we need to scan. At the same time, we need to scan for local file headers since we cannot trust the central directory header as a source of truth.

There was already functionality that would search for local file headers after parsing the central directory (at the end of the cli_unzip function), but this required the appended local file headers to occur consecutively.

val-ms · 2025-03-21T15:31:11Z

Fixes #1014

val-ms

I created some feature/regression tests to verify that this fixes a few scenarios and ran into 3 bugs. I've created fixes for those, and then added some opinionated code cleanup on top of that.

It's too much to add in code review comments here, so I'm linking to my branch and if you're good with those changes, you or I can pull them onto your branch.

main...val-ms:clamav:CLAM-2722-update_zip_parser-tests

libclamav/unzip.c

val-ms · 2025-03-23T03:12:07Z

I should add that I LOVE the direction you went for solving this problem, and greatly appreciate your effort to make the code match our existing style.

frsell1 · 2025-03-25T15:46:20Z

Thanks for the review and fixes. Just merged your changes to this branch

Add logic to search for additional file entries between those that have been found using the central directory.

Fixes: - We need to look at the local headers if no central directory headers are found. Restructured the main `cli_unzip()` function to allocate an empty zip catalogue when we can't use a central directory at all. - In `index_local_file_headers_within_bounds()`, we must decrement the `coff` variable after adding the size of a file entry using `parse_local_file_header()`, to account for the increment when it loops around. If we don't, the next entry won't be at 'PK\x03\x04', it will be at 'K\0x03\x04'. - Attempt to unzip when encrypted if we don't have a valid password. This may enable extraction for files where a header lies about encryption. - The `fmap_need_off()` call to get the `compressed_data` pointer used the wrong size, checking if there was enough data for a header instead of for the compressed data that follows the header. I stumbled across this older bug when testing extraction of a zip where the file entries are tiny and I'd stripped off the central directory. As a result, there wasn't enough data for a whole file header and my test failed. Cleanup: - Initialize status variables as CL_ERROR and only assign to CL_SUCCESS if successful. This is to protect against future changes in case someone accidentally goes-to-done without setting the status. - Remove legacy use of CL_CLEAN. Not a functional change. This mostly a stylistic preference. - Use calloc instead of malloc + memset in a couple places. Make use of the new allocation macros with goto-done error handling. - Some opinionated format changes such as shifting some longer function arguments all to a new line so they're no so far to the right. - Auto-format with clang-format.

frsell1 · 2025-03-25T20:14:40Z

Fixed a bug in index_local_file_headers where the end_offset would be set to the end of the next file in the zip. This caused index_local_file_headers to extract duplicate files that were already extracted in index_the_central_directory

val-ms · 2025-03-27T14:18:44Z

This PR is currently causing two of our internal test suite tests to fail:

I'll investigate

Fix bug in end_offset calculation when searching for additional file entries. Fix size of memset after realloc.

val-ms · 2025-03-28T13:47:04Z

Squashed the last two commits together to tidy up, and so the commits will be signed.

frsell1 changed the title ~~Extend zip parser to be less reliant on the~~ Extend zip parser to be less reliant on the central directory Feb 27, 2025

val-ms mentioned this pull request Mar 3, 2025

updated unzip.c #1463

Closed

val-ms mentioned this pull request Mar 21, 2025

Forged ZIP archive manages to hide the Trojan from clamav #1014

Closed

val-ms requested changes Mar 23, 2025

View reviewed changes

libclamav/unzip.c Outdated Show resolved Hide resolved

frsell1 and others added 2 commits March 25, 2025 12:15

Zip: Extend parsing to rely less on central directory

8d8433b

Add logic to search for additional file entries between those that have been found using the central directory.

Tests: add tests for malformed and concatenated zips

9e96b63

val-ms force-pushed the update_zip_parser branch from 05886f0 to 3cde5ee Compare March 25, 2025 16:22

val-ms force-pushed the update_zip_parser branch from 3cde5ee to 96c00b6 Compare March 25, 2025 16:52

frsell1 force-pushed the update_zip_parser branch from 4b3035c to fab5065 Compare March 25, 2025 20:54

Cisco-Talos deleted a comment from DailyInvestors Mar 27, 2025

Zip: Fix additional bugs found in PR review

1199c89

Fix bug in end_offset calculation when searching for additional file entries. Fix size of memset after realloc.

val-ms force-pushed the update_zip_parser branch from 281a38e to 1199c89 Compare March 28, 2025 13:46

val-ms approved these changes Mar 28, 2025

View reviewed changes

val-ms merged commit 1638ce2 into Cisco-Talos:main Mar 28, 2025
23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend zip parser to be less reliant on the central directory#1460

Extend zip parser to be less reliant on the central directory#1460
val-ms merged 4 commits intoCisco-Talos:mainfrom
frsell1:update_zip_parser

frsell1 commented Feb 27, 2025 •

edited

Loading

Uh oh!

val-ms commented Mar 21, 2025

Uh oh!

val-ms left a comment

Uh oh!

Uh oh!

val-ms commented Mar 23, 2025

Uh oh!

frsell1 commented Mar 25, 2025

Uh oh!

frsell1 commented Mar 25, 2025

Uh oh!

val-ms commented Mar 27, 2025

Uh oh!

val-ms commented Mar 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

frsell1 commented Feb 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

val-ms commented Mar 21, 2025

Uh oh!

val-ms left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

val-ms commented Mar 23, 2025

Uh oh!

frsell1 commented Mar 25, 2025

Uh oh!

frsell1 commented Mar 25, 2025

Uh oh!

val-ms commented Mar 27, 2025

Uh oh!

val-ms commented Mar 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

frsell1 commented Feb 27, 2025 •

edited

Loading