Extend zip parser to be less reliant on the central directory #1460
val-ms merged 4 commits into Cisco-Talos:main from
Conversation
Fixes #1014
val-ms
left a comment
I created some feature/regression tests to verify that this fixes a few scenarios and ran into 3 bugs. I've created fixes for those, and then added some opinionated code cleanup on top of that.
It's too much to add in code review comments here, so I'm linking to my branch and if you're good with those changes, you or I can pull them onto your branch.
I should add that I LOVE the direction you went for solving this problem, and greatly appreciate your effort to make the code match our existing style.

Thanks for the review and fixes. Just merged your changes to this branch.
Add logic to search for additional file entries between those that have been found using the central directory.
Force-pushed from 05886f0 to 3cde5ee
Fixes:
- We need to look at the local headers if no central directory headers are found. Restructured the main `cli_unzip()` function to allocate an empty zip catalogue when we can't use a central directory at all.
- In `index_local_file_headers_within_bounds()`, we must decrement the `coff` variable after adding the size of a file entry using `parse_local_file_header()`, to account for the increment when it loops around. If we don't, the next entry won't be at 'PK\x03\x04', it will be at 'K\x03\x04'.
- Attempt to unzip encrypted entries even if we don't have a valid password. This may enable extraction for files where a header lies about encryption.
- The `fmap_need_off()` call to get the `compressed_data` pointer used the wrong size, checking whether there was enough data for a header instead of for the compressed data that follows the header. I stumbled across this older bug when testing extraction of a zip whose file entries are tiny and whose central directory I'd stripped off. As a result, there wasn't enough data for a whole file header and my test failed.

Cleanup:
- Initialize status variables to CL_ERROR and only assign CL_SUCCESS if successful. This protects against future changes in case someone accidentally goes-to-done without setting the status.
- Remove legacy use of CL_CLEAN. Not a functional change; this is mostly a stylistic preference.
- Use calloc instead of malloc + memset in a couple of places. Make use of the new allocation macros with goto-done error handling.
- Some opinionated formatting changes, such as shifting longer function arguments onto a new line so they're not so far to the right.
- Auto-format with clang-format.
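The `coff` decrement described in the second fix can be illustrated with a minimal, self-contained sketch. This is not the actual ClamAV code: `parse_entry_size` is a hypothetical stand-in for `parse_local_file_header()` that simply reports a fixed entry size, so the loop mechanics are visible in isolation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZIP_LOCAL_MAGIC "PK\x03\x04"

/* Hypothetical stand-in for parse_local_file_header(): returns the
 * total number of bytes the entry at `data + coff` occupies, or 0 on
 * failure. For demonstration, every entry is a fixed 30-byte header
 * plus a 4-byte payload. */
static size_t parse_entry_size(const uint8_t *data, size_t coff, size_t avail)
{
    (void)data;
    (void)coff;
    (void)avail;
    return 34;
}

/* Scan [start, end) for local file header signatures and count the
 * entries found, skipping past each entry's body. */
static int count_entries(const uint8_t *data, size_t start, size_t end)
{
    int found = 0;
    size_t coff;

    for (coff = start; coff + 4 <= end; coff++) {
        if (0 == memcmp(data + coff, ZIP_LOCAL_MAGIC, 4)) {
            size_t entry_size = parse_entry_size(data, coff, end - coff);
            if (0 == entry_size)
                break;
            found++;
            /* Advance past the entry, minus 1 to offset the loop's
             * coff++. Without the -1, the next comparison would begin
             * one byte late, at 'K\x03\x04...', and a back-to-back
             * entry would be missed. */
            coff += entry_size - 1;
        }
    }
    return found;
}
```

With the `- 1` removed, the second of two adjacent entries is skipped, which is exactly the symptom the fix describes.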
Force-pushed from 3cde5ee to 96c00b6
Fixed a bug in
Force-pushed from 4b3035c to fab5065
Fix bug in `end_offset` calculation when searching for additional file entries. Fix size of `memset` after `realloc`.
Force-pushed from 281a38e to 1199c89
Squashed the last two commits together to tidy up, and so the commits will be signed.

Zip file concatenation is an evasion method used by threat actors in the wild: https://perception-point.io/blog/evasive-concatenated-zip-trojan-targets-windows-users/. ClamAV currently relies on the last central directory it can find and will therefore miss the first zip in the concatenation.
The proposed addition to the zip parser searches for local file headers between and around the files already found from the central directory, then adds any files it finds to the list of zip records built from the central directory. The idea is that scanning for local file headers is time-intensive, so we should minimize the number of bytes we need to scan. At the same time, we must scan for local file headers because we cannot trust the central directory as a source of truth.
There was already functionality that would search for local file headers after parsing the central directory (at the end of the `cli_unzip` function), but it required the appended local file headers to occur consecutively.
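The approach above can be sketched as a gap scan. This is a minimal illustration under stated assumptions, not the actual ClamAV implementation: `zip_record` and `scan_gaps` are hypothetical names, and real code would also parse and validate each candidate header. Given the records recovered from the central directory, only the bytes between and around them need the expensive signature search.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record describing one entry already located via the
 * central directory. Records are assumed sorted and non-overlapping. */
typedef struct {
    size_t offset; /* start of the entry's local file header */
    size_t size;   /* total bytes the entry occupies */
} zip_record;

/* Count local file header signatures ("PK\x03\x04") that fall in the
 * gaps before, between, and after the known records. Bytes already
 * claimed by central-directory entries are skipped entirely, which is
 * what keeps the scan cheap. */
static int scan_gaps(const uint8_t *map, size_t map_len,
                     const zip_record *recs, size_t nrecs)
{
    int found = 0;
    size_t cursor = 0; /* start of the current gap */
    size_t i;

    for (i = 0; i <= nrecs; i++) {
        /* The gap ends at the next known record, or at end of file. */
        size_t gap_end = (i < nrecs) ? recs[i].offset : map_len;
        size_t off;

        for (off = cursor; off + 4 <= gap_end; off++) {
            if (0 == memcmp(map + off, "PK\x03\x04", 4))
                found++;
        }
        if (i < nrecs)
            cursor = recs[i].offset + recs[i].size;
    }
    return found;
}
```

A concatenated zip shows up here as a signature in the gap before the first known record, which the old consecutive-only search at the end of `cli_unzip` could not reach.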