Clam 2196 html style image extract #813
Convert integer bools to bool bools.
This commit adds a feature to find, decode, and scan each image found within HTML <style> tags where the image data is embedded in `url()` function parameters as a base64 blob. In C, during the HTML normalization process, we extract the style tag contents to a new buffer for processing. We then call into a new feature in the Rust code to find and decode each image (there may be more than one). Once extracted, the images are scanned as contained files of unknown type, and file type identification will determine the actual type.
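The find-and-decode step described above can be sketched roughly as follows. This is not the code from the PR: the function names (`sextet`, `decode_base64`, `extract_images`) are hypothetical, the hand-rolled decoder stands in for the `base64` crate so the sketch depends only on the standard library, and it assumes a lowercase `url(` and a literal `base64,` marker.

```rust
/// Map one character of the standard base64 alphabet to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode standard base64. Returns None on any invalid character.
/// Trailing '=' padding is stripped before decoding.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let trimmed = input.trim().trim_end_matches('=');
    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut acc: u32 = 0; // bits collected so far
    let mut bits: u32 = 0; // number of valid bits in `acc`
    for &c in trimmed.as_bytes() {
        acc = (acc << 6) | sextet(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1u32 << bits) - 1; // keep only the leftover bits
        }
    }
    Some(out)
}

/// Find every `url(...;base64,...)` argument in `css` and decode it.
/// Arguments that are not base64 data, or that fail to decode, are skipped.
fn extract_images(css: &str) -> Vec<Vec<u8>> {
    let mut images = Vec::new();
    let mut rest = css;
    while let Some(start) = rest.find("url(") {
        rest = &rest[start + 4..];
        let Some(end) = rest.find(')') else { break };
        // Drop surrounding whitespace and optional quotes.
        let arg = rest[..end].trim().trim_matches(|c| c == '"' || c == '\'');
        if let Some((_, payload)) = arg.split_once("base64,") {
            if let Some(bytes) = decode_base64(payload) {
                images.push(bytes);
            }
        }
        rest = &rest[end + 1..];
    }
    images
}
```

Each returned buffer would then be handed to the scanner as a contained file of unknown type, leaving file type identification to work out what it actually is.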
shutton
left a comment
Definitely need to address the discarded error.
// Decode the base64 encoded image
match base64::decode(base64_image)
match general_purpose::STANDARD.decode(base64_image)
.map_err(|e| CssExtractError::Base64Decode(format!("{}", e)))
You seem to be throwing away this error, since matching on it just evaluates to None.
If this is what you intended, this is a simpler construct (no match):
general_purpose::STANDARD.decode(base64_image).ok()
That discards the error, converting the result to Some(image) upon success, or None upon error.
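To illustrate the suggestion, here is a small sketch showing that a `match` which maps every error to `None` behaves the same as calling `.ok()` on the `Result`. It uses `str::parse` from the standard library as a stand-in for the base64 decode call, and both function names are hypothetical.

```rust
/// Verbose form: the error is converted to a String and then thrown away.
fn parse_with_match(text: &str) -> Option<u32> {
    match text.parse::<u32>().map_err(|e| format!("{}", e)) {
        Ok(value) => Some(value),
        Err(_) => None,
    }
}

/// Short form: `.ok()` turns Ok(v) into Some(v) and any Err into None.
fn parse_with_ok(text: &str) -> Option<u32> {
    text.parse::<u32>().ok()
}
```

If the error should not be discarded, the function would instead return a `Result` and pass the mapped error up to the caller.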
Oh I like this. Thank you!
I just ran with memory leak sanitizer, and got the following output.
aragusa@ubuntu:~/PR-813$ ~/install.PR.SANS/bin/clamscan -d ~/sigs.downloaded/ --allmatch --bytecode-unsigned Clean_HTML_samples
/home/aragusa/PR-813/Clean_HTML_samples/imgtag_background.html: OK
----------- SCAN SUMMARY -----------
=================================================================
Direct leak of 8 byte(s) in 1 object(s) allocated from:
Indirect leak of 56 byte(s) in 1 object(s) allocated from:
Indirect leak of 16 byte(s) in 1 object(s) allocated from:
Indirect leak of 8 byte(s) in 1 object(s) allocated from:
SUMMARY: AddressSanitizer: 88 byte(s) leaked in 4 allocation(s).
Do not worry about the 'index -1 out of bounds for type'. That is a false positive.
I just ran with main with sanitizers enabled and got the same leaks, so they are not new.
Interesting. Yeah it doesn't look related. Pretty small, but we should fix them fr. Let's make a Jira.
I found that the `url(data:` type does not matter to a browser. In addition, whitespace may be placed in a few locations and the browser will ignore it. This commit accounts for this, and updates the test accordingly.
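A rough sketch of what tolerating these variations could look like is below. It is not the PR's code: `base64_payload` is a hypothetical helper, and the places where whitespace is accepted (around the whole argument, around the `;` and `,` separators, and inside the payload) are assumptions, since the commit message does not list the exact locations. The media type before `;base64` is ignored entirely.

```rust
/// Given the text between `url(` and `)`, return the base64 payload with all
/// whitespace removed, or None if the argument is not a base64 data URI.
fn base64_payload(url_arg: &str) -> Option<String> {
    // Strip outer whitespace, optional quotes, then whitespace again.
    let arg = url_arg
        .trim()
        .trim_matches(|c| c == '"' || c == '\'')
        .trim();
    let rest = arg.strip_prefix("data:")?;
    // Everything before the first comma is the header, e.g. "image/png ; base64".
    let (header, payload) = rest.split_once(',')?;
    // Only the encoding token matters; the media type is deliberately unused.
    let (_media_type, encoding) = header.rsplit_once(';')?;
    if encoding.trim() != "base64" {
        return None;
    }
    Some(payload.chars().filter(|c| !c.is_ascii_whitespace()).collect())
}
```

The cleaned payload can then be passed to whatever base64 decoder is in use.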
2ffb039 to a4e372f
Squashed the review fixups.
Add support for extracting and scanning base64'd images embedded in HTML <style> CSS `url()` function parameters.
Tests included in the PR. For manual testing, see the samples listed in Jira.