Clam 2196 html style image extract #813

val-ms · 2023-01-21T00:26:15Z

Add support for extracting and scanning base64'd images embedded in HTML <style> CSS url() function parameters.

Tests included in the PR. For manual testing, see the samples listed in Jira.

Convert integer bools to bool bools.

This commit adds a feature to find, decode, and scan each image found within HTML <style> tags where the image data is embedded in `url()` function parameters as a base64 blob. In C, during the HTML normalization process, we extract the style tag contents to a new buffer for processing. We then call into a new feature in the Rust code to find and decode each image (there may be more than one). Once extracted, the images are scanned as contained files of unknown type, and file type identification will determine the actual type.
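To make the find-and-decode step concrete, here is a minimal, std-only sketch. It is not the code in `css_image_extract.rs`: the function names `extract_images` and `decode_base64` are hypothetical, the hand-rolled decoder stands in for the `base64` crate the PR uses, and the parsing assumes a simple `url(data:<type>;base64,<payload>)` shape.

```rust
/// Decode standard-alphabet base64. Returns None on an invalid character.
/// Hypothetical helper; the PR itself relies on the `base64` crate.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently held in `acc`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding marks the end of the data
            _ => return None,
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Find every `url(...)` parameter in `css` that carries a base64 data URI
/// and return the decoded bytes of each one, in order of appearance.
/// Parameters that are not base64 data URIs, or fail to decode, are skipped.
fn extract_images(css: &str) -> Vec<Vec<u8>> {
    const MARKER: &str = ";base64,";
    let mut images = Vec::new();
    let mut rest = css;
    while let Some(start) = rest.find("url(") {
        rest = &rest[start + 4..];
        let end = match rest.find(')') {
            Some(end) => end,
            None => break, // unterminated url( ... nothing more to extract
        };
        // Drop surrounding whitespace and optional quotes around the parameter.
        let param = rest[..end].trim().trim_matches(|c| c == '"' || c == '\'');
        if let Some(pos) = param.find(MARKER) {
            if let Some(bytes) = decode_base64(&param[pos + MARKER.len()..]) {
                images.push(bytes);
            }
        }
        rest = &rest[end + 1..];
    }
    images
}
```

Calling `extract_images` on the contents of a style block returns one byte buffer per embedded image; in the design described above, each buffer would then be handed back to the scanner as a contained file of unknown type.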

shutton

Definitely need to address the discarded error.

libclamav/htmlnorm.c

shutton · 2023-01-25T17:09:16Z

libclamav_rust/src/css_image_extract.rs

            // Decode the base64 encoded image
-            match base64::decode(base64_image)
+            match general_purpose::STANDARD.decode(base64_image)
                .map_err(|e| CssExtractError::Base64Decode(format!("{}", e)))


You seem to be throwing away this error, since matching on it just evaluates to None.

If this is what you intended, this is a simpler construct (no match):

general_purpose::STANDARD.decode(base64_image).ok()

That discards the error, converting the result to Some(image) upon success, or None upon error.
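For readers less familiar with `Result::ok()`, a small std-only illustration of the equivalence follows. It uses `str::parse` as a stand-in for the base64 decode call, since the `base64` crate is an external dependency; the function names are hypothetical. The third variant is one possible way to surface the error before discarding it, not necessarily what the PR ended up doing.

```rust
/// Verbose form: a match whose error arm just evaluates to None.
fn parse_with_match(s: &str) -> Option<u32> {
    match s.parse::<u32>() {
        Ok(n) => Some(n),
        Err(_) => None,
    }
}

/// Concise form: `.ok()` maps Ok(n) to Some(n) and any Err to None.
fn parse_with_ok(s: &str) -> Option<u32> {
    s.parse::<u32>().ok()
}

/// Variant that reports the error on stderr before discarding it.
fn parse_and_report(s: &str) -> Option<u32> {
    s.parse::<u32>()
        .map_err(|e| eprintln!("failed to parse {:?}: {}", s, e))
        .ok()
}
```

All three return the same `Option` for the same input; they differ only in whether the error is silently dropped or reported first.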

Oh I like this. Thank you!

libclamav_rust/src/css_image_extract.rs

ragusaa · 2023-01-26T18:13:18Z

I just ran with memory leak sanitizer, and got the following output.

`aragusa@ubuntu:~/PR-813$ ~/install.PR.SANS/bin/clamscan -d ~/sigs.downloaded/ --allmatch --bytecode-unsigned Clean_HTML_samples
LibClamAV Warning: **************************************************
LibClamAV Warning: *** The virus database is older than 7 days! ***
LibClamAV Warning: *** Please update it as soon as possible. ***
LibClamAV Warning: **************************************************
../libclamav/tomsfastmath/bit/fp_div_2d.c:55:18: runtime error: index -1 out of bounds for type 'fp_digit [136]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../libclamav/tomsfastmath/bit/fp_div_2d.c:55:18 in
Loading: 5h 06m, ETA: 0s [========================>] 8.65M/8.65M sigs
Compiling: 23s, ETA: 0s [========================>] 41/41 tasks ks

/home/aragusa/PR-813/Clean_HTML_samples/imgtag_background.html: OK
/home/aragusa/PR-813/Clean_HTML_samples/css_background.html: OK

----------- SCAN SUMMARY -----------
Known viruses: 8649349
Engine version: 1.1.0-devel-20230125
Scanned directories: 1
Scanned files: 2
Infected files: 0
Data scanned: 0.02 MB
Data read: 0.02 MB (ratio 1.50:1)
Time: 18433.614 sec (307 m 13 s)
Start Date: 2023:01:25 18:24:08
End Date: 2023:01:25 23:31:22

=================================================================
==161059==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x498ca9 in realloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498ca9)
#1 0x7fb9718d3478 in cli_realloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:272:13
#2 0x7fb97195960e in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:425:46
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x498b02 in calloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498b02)
#1 0x7fb9718d33ee in cli_calloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:251:13
#2 0x7fb971956cf3 in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:90:38
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x498b02 in calloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498b02)
#1 0x7fb9718d33ee in cli_calloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:251:13
#2 0x7fb971958dcc in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:362:52
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x498b02 in calloc (/home/aragusa/install.PR.SANS/bin/clamscan+0x498b02)
#1 0x7fb9718d33ee in cli_calloc /home/aragusa/clamav-upstream/build.sans/../libclamav/others_common.c:251:13
#2 0x7fb971958cd7 in cli_bcomp_addpatt /home/aragusa/clamav-upstream/build.sans/../libclamav/matcher-byte-comp.c:350:46
#3 0x7fb971993e4f in readdb_parse_ldb_subsignature /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:529:34
#4 0x7fb9719d5bd9 in load_oneldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2186:15
#5 0x7fb9719a4dd6 in cli_loadldb /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:2257:15
#6 0x7fb97199c7da in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#7 0x7fb9719874cc in cli_tgzload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:343:19
#8 0x7fb9719852db in cli_cvdload /home/aragusa/clamav-upstream/build.sans/../libclamav/cvd.c:711:11
#9 0x7fb97199b0b7 in cli_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c
#10 0x7fb9719b4787 in cli_loaddbdir /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5207:15
#11 0x7fb9719b4787 in cl_load /home/aragusa/clamav-upstream/build.sans/../libclamav/readdb.c:5318:19
#12 0x4ccf87 in scanmanager /home/aragusa/clamav-upstream/build.sans/../clamscan/manager.c:1261:24
#13 0x4ca27c in main /home/aragusa/clamav-upstream/build.sans/../clamscan/clamscan.c:171:11
#14 0x7fb970779082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: 88 byte(s) leaked in 4 allocation(s).
`

Do not worry about the 'index -1 out of bounds for type' error. That is a false positive.

ragusaa · 2023-01-27T17:22:51Z

I just ran main with sanitizers enabled and got the same leaks, so they are not new.

val-ms · 2023-01-27T18:15:37Z

Interesting. Yeah it doesn't look related. Pretty small, but we should fix them fr. Let's make a Jira.

I found that the `url(data:` type does not matter to a browser. In addition, whitespace may be placed in a few locations and the browser will ignore it. This commit accounts for this, and updates the test accordingly.
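A rough sketch of what this tolerance could look like is shown below. It is an illustration only: `base64_payload` is a hypothetical name, and the specific places where whitespace is ignored here (around the whole parameter, around the `base64` token, and inside the payload) are assumptions, since the exact locations are not listed in this thread.

```rust
/// Given the text between `url(` and `)`, return the base64 payload with
/// whitespace removed, or None if the parameter is not a base64 data URI.
/// The media type (e.g. `image/png`) is deliberately ignored.
fn base64_payload(url_param: &str) -> Option<String> {
    // Assumption: whitespace and optional quotes may surround the parameter.
    let trimmed = url_param
        .trim()
        .trim_matches(|c| c == '"' || c == '\'')
        .trim();
    let rest = trimmed.strip_prefix("data:")?;
    // Everything before the first comma is the header, the rest is the data.
    let (header, payload) = rest.split_once(',')?;
    // Only the last header field matters; any media type before it is accepted.
    let encoding = header.rsplit(';').next()?.trim();
    if !encoding.eq_ignore_ascii_case("base64") {
        return None;
    }
    // Assumption: whitespace inside the payload is dropped before decoding.
    Some(payload.chars().filter(|c| !c.is_ascii_whitespace()).collect())
}
```

The returned string would then go to the base64 decoder; keeping the media type out of the decision means file type identification, not the declared type, determines how the decoded bytes are handled.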

val-ms · 2023-02-08T04:01:51Z

Squashed the review fixups.

val-ms added 3 commits December 14, 2022 16:00

Minor code cleanup

f40f5b7

Convert integer bools to bool bools.

Test: verify clamscan detecting 2 images from same HTML style block

00c8df0

val-ms assigned shutton, jimmy-sonny and ragusaa Jan 21, 2023

shutton suggested changes Jan 25, 2023

ragusaa approved these changes Feb 6, 2023

HTML <style> image extraction improvement

a4e372f

val-ms force-pushed the CLAM-2196-html-style-image-extract branch from 2ffb039 to a4e372f Compare February 8, 2023 04:01

val-ms merged commit dcaaf86 into Cisco-Talos:main Feb 8, 2023

val-ms deleted the CLAM-2196-html-style-image-extract branch February 8, 2023 04:02
