Skip to content

Clam 1931 Add support for Beginning Extended Area Descriptor files#921

Closed
ragusaa wants to merge 1 commit intoCisco-Talos:mainfrom
ragusaa:CLAM-1931-FixISOParsing
Closed

Clam 1931 Add support for Beginning Extended Area Descriptor files#921
ragusaa wants to merge 1 commit intoCisco-Talos:mainfrom
ragusaa:CLAM-1931-FixISOParsing

Conversation

@ragusaa
Copy link
Contributor

@ragusaa ragusaa commented May 16, 2023

No description provided.

@ragusaa ragusaa changed the title Clam 1931 fix iso parsing Clam 1931 fix iso parsing NOT ready for code review May 16, 2023
@ragusaa ragusaa force-pushed the CLAM-1931-FixISOParsing branch from ff29fd6 to c8c7427 Compare May 24, 2023 14:09
@ragusaa ragusaa changed the title Clam 1931 fix iso parsing NOT ready for code review Clam 1931 Add support for Beginning Extended Area Descriptor files May 24, 2023
@ragusaa ragusaa force-pushed the CLAM-1931-FixISOParsing branch from 130ff9a to 9cc8683 Compare May 24, 2023 15:12
Add support for Beginning Extended Area Descriptor (BEA01) iso files.
@ragusaa ragusaa force-pushed the CLAM-1931-FixISOParsing branch from 9f2c30a to d9dd6d5 Compare May 24, 2023 21:24
@val-ms
Copy link
Contributor

val-ms commented May 30, 2023

@ragusaa the new title for this PR doesn't make any sense to me. I have no idea what Beginning Extended Area Descriptor files are. A PR description would also be really helpful.

My understanding is that this PR attempts to identify UDF sessions in a CD-ROM/DVD image, in the same way we do for ISO 9660 sessions. The goal being to use the file system to find and scan the files.

I did some reading to try to understand the concepts. I came across this PDF which is helpful, and a bit overwhelming http://www.osta.org/specs/pdf/udf260.pdf

This Microsoft document describing IMAPI (UDF) Multisession Layout has a nice graphic showing multiple UDF sessions in an image https://learn.microsoft.com/en-us/windows/win32/imapi/imapi-multisession-layout

image

I tried stripping down your PR to just the functional code (removing all dead code and extra printfs), and then squashing that down to 1 commit so I could easily see what is new code and what the new code does. It seems to me that the new code really isn't modifications of existing ISO code so much as a completely new UDF file parser, that starts with this line in the ISO parser:

    next = (uint8_t *)cli_memstr((char *)privol + 2049, 2448 + 6 - 2049, "CD001", 5);
    if (!next)
        // Parse UDF
        return testrefactor(ctx, offset);
    
    // Parse ISO9660

It looks like you have a good start figuring out the UDF file system structure and parsing it in C to find the file entries. Outside of the very first line of code, this seems to be fully independent from the ISO9660 parser logic.

I also tried doing some scan testing with the files from the Jira issue. For the new UDF files attached in the comments, I found that it identified the file system as CL_TYPE_ISO9660 but that it didn't extract/scan the file entries. I haven't debugged in or really followed the code too much to see how far it gets.

I am very tempted to say that the UDF stuff should be separated out into a completely separate cli_scanudf() function, and a new CL_TYPE_UDF type created.

The code you've added seems like a new module trying to process an entirely different structure than our old ISO9660 code. So I'm thinking once you have the foundation of this parser, you should rewrite it in Rust.

Regardless if it is in C or Rust, I would also recommend a model where you make an object/handle for processing UDF file systems. In Rust it would be a struct (like a C++ class). You could have it open the file to do initial processing. And then have a function (method) along the lines of get_next_file() that would return the filename, file data, and any other important metadata (I don't know if UDF file systems have comments or something like some other archive formats). You would call that repeatedly until all files have been processed. For each extracted file, then you could call cli_magic_scan_buff()`. We have a process kind of like that for EGG archives, https://github.com/Cisco-Talos/clamav/blob/main/libclamav/scanners.c#L884

@ragusaa
Copy link
Contributor Author

ragusaa commented Jun 2, 2023

PR replaced by #941

@ragusaa ragusaa closed this Jun 2, 2023
@ragusaa
Copy link
Contributor Author

ragusaa commented Jun 2, 2023

PR replaced by 941

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants