val-ms commented Jan 21, 2023

Add support for extracting and scanning base64-encoded images embedded in CSS url() function parameters inside HTML <style> blocks.

Tests included in the PR. For manual testing, see the samples listed in Jira.

Convert integer bools to bool bools.
This commit adds a feature to find, decode, and scan each image found
within HTML <style> tags where the image data is embedded in `url()`
function parameters as a base64 blob.

In C, during the HTML normalization process, we extract the style tag
contents to a new buffer for processing. We then call into a new feature in
the Rust code to find and decode each image (there may be several).

Once extracted, the images are scanned as contained files of unknown
type, and file type identification will determine the actual type.
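
To illustrate the find-and-decode step described above, here is a minimal std-only sketch. It is not the code from this PR: the names `decode_base64` and `extract_css_images` are hypothetical, the base64 decoder is hand-rolled so the example has no crate dependencies, and the matching rules (a `data:` prefix plus a `;base64,` marker) are assumptions made for illustration.

```rust
/// Hypothetical helper: decode standard-alphabet base64 using only std.
/// Decoding stops at the first `=` padding character.
/// Returns None if any other character is outside the alphabet.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently held in `acc`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Hypothetical sketch: collect every decodable base64 image found in
/// `url(...)` parameters of the given <style> contents.
fn extract_css_images(css: &str) -> Vec<Vec<u8>> {
    const MARKER: &str = ";base64,";
    let mut images = Vec::new();
    let mut rest = css;
    while let Some(start) = rest.find("url(") {
        rest = &rest[start + 4..];
        let end = match rest.find(')') {
            Some(end) => end,
            None => break, // unterminated url(; nothing more to extract
        };
        // Assumption: the parameter may be wrapped in single or double quotes.
        let param = rest[..end]
            .trim()
            .trim_matches(|c: char| c == '"' || c == '\'');
        if param.starts_with("data:") {
            if let Some(pos) = param.find(MARKER) {
                // Parameters that fail to decode are skipped, not reported.
                if let Some(bytes) = decode_base64(&param[pos + MARKER.len()..]) {
                    images.push(bytes);
                }
            }
        }
        rest = &rest[end..];
    }
    images
}
```

Each returned buffer would then be handed to the scanner as a contained file of unknown type; that hand-off is outside the scope of this sketch.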

shutton left a comment

Definitely need to address the discarded error.

// Decode the base64 encoded image
match base64::decode(base64_image)
match general_purpose::STANDARD.decode(base64_image)
.map_err(|e| CssExtractError::Base64Decode(format!("{}", e)))

You seem to be throwing away this error, since matching on it just evaluates to None.

If this is what you intended, this is a simpler construct (no match):

general_purpose::STANDARD.decode(base64_image).ok()

That discards the error, converting the result to Some(image) upon success, or None upon error.
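
For readers less familiar with `Result::ok`, the equivalence can be shown with a small std-only sketch. It uses `str::parse` as a stand-in for the base64 decode call, and the function names are hypothetical, not taken from this PR.

```rust
/// Verbose form: match on the Result and drop the error by hand.
fn parse_port_match(text: &str) -> Option<u16> {
    match text.parse::<u16>() {
        Ok(port) => Some(port),
        Err(_) => None,
    }
}

/// Equivalent form: `Result::ok` maps Ok(v) to Some(v) and Err(_) to None.
fn parse_port_ok(text: &str) -> Option<u16> {
    text.parse::<u16>().ok()
}
```

Both functions return the same value for every input. The second just makes it explicit, in one call, that the error is being discarded on purpose.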

val-ms replied:

Oh I like this. Thank you!

ragusaa commented Jan 26, 2023

I just ran with memory leak sanitizer, and got the following output.

`aragusa@ubuntu:~/PR-813$ ~/install.PR.SANS/bin/clamscan -d ~/sigs.downloaded/ --allmatch --bytecode-unsigned Clean_HTML_samples
LibClamAV Warning: **************************************************
LibClamAV Warning: *** The virus database is older than 7 days! ***
LibClamAV Warning: *** Please update it as soon as possible. ***
LibClamAV Warning: **************************************************
../libclamav/tomsfastmath/bit/fp_div_2d.c:55:18: runtime error: index -1 out of bounds for type 'fp_digit [136]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../libclamav/tomsfastmath/bit/fp_div_2d.c:55:18 in
Loading: 5h 06m, ETA: 0s [========================>] 8.65M/8.65M sigs
Compiling: 23s, ETA: 0s [========================>] 41/41 tasks ks

/home/aragusa/PR-813/Clean_HTML_samples/imgtag_background.html: OK
/home/aragusa/PR-813/Clean_HTML_samples/css_background.html: OK

----------- SCAN SUMMARY -----------
Known viruses: 8649349
Engine version: 1.1.0-devel-20230125
Scanned directories: 1
Scanned files: 2
Infected files: 0
Data scanned: 0.02 MB
Data read: 0.02 MB (ratio 1.50:1)
Time: 18433.614 sec (307 m 13 s)
Start Date: 2023:01:25 18:24:08
End Date: 2023:01:25 23:31:22

=================================================================
==161059==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x498ca9 in realloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498ca9)
#1 0x7fb9718d3478 in cli_realloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:272:13
#2 0x7fb97195960e in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:425:46
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x498b02 in calloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498b02)
#1 0x7fb9718d33ee in cli_calloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:251:13
#2 0x7fb971956cf3 in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:90:38
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x498b02 in calloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498b02)
#1 0x7fb9718d33ee in cli_calloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:251:13
#2 0x7fb971958dcc in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:362:52
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x498b02 in calloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498b02)
#1 0x7fb9718d33ee in cli_calloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:251:13
#2 0x7fb971958cd7 in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:350:46
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: 88 byte(s) leaked in 4 allocation(s).
`

Do not worry about the 'index -1 out of bounds for type' error. That is a false positive.

ragusaa commented Jan 27, 2023

I just ran main with sanitizers enabled and got the same leaks, so they are not new.

val-ms commented Jan 27, 2023

Interesting. Yeah it doesn't look related. Pretty small, but we should fix them fr. Let's make a Jira.

I found that the media type given in a `url(data:` parameter does not
matter to a browser. In addition, whitespace may be placed in a few
locations and the browser will ignore it.

This commit accounts for this, and updates the test accordingly.
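
A tolerant parser along these lines might look like the sketch below. This is not the code from the commit: `base64_payload` is a hypothetical name, and the exact places where whitespace is accepted (around the parameter and inside the payload) are assumptions, since the message above does not list them.

```rust
/// Hypothetical sketch: return the base64 payload of a CSS `url()` parameter,
/// or None if the parameter is not a base64 data URL.
fn base64_payload(param: &str) -> Option<String> {
    // Assumption: whitespace and quotes around the parameter are ignored.
    let param = param
        .trim()
        .trim_matches(|c: char| c == '"' || c == '\'')
        .trim();
    let rest = param.strip_prefix("data:")?;
    // The header before the first comma holds the media type and encoding.
    let (header, payload) = rest.split_once(',')?;
    // The media type itself is not inspected; only the base64 marker matters.
    if !header.trim_end().ends_with("base64") {
        return None;
    }
    // Assumption: whitespace inside the payload is dropped before decoding.
    Some(payload.chars().filter(|c| !c.is_ascii_whitespace()).collect())
}
```

Because the media type is never checked, a parameter such as `data:bogus/type;base64,AQID` yields the same payload as one declaring `image/png`.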

val-ms force-pushed the CLAM-2196-html-style-image-extract branch from 2ffb039 to a4e372f on February 8, 2023 04:01

val-ms commented Feb 8, 2023

Squashed the review fixups.

val-ms merged commit dcaaf86 into Cisco-Talos:main on Feb 8, 2023
val-ms deleted the CLAM-2196-html-style-image-extract branch on February 8, 2023 04:02