-
Notifications
You must be signed in to change notification settings - Fork 816
Allow clamdscan to have a separate config file; use clamd's if not found #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
While the idea and work here are sound, and this may be a direction we wish to move in the future, it will likely be as part of a larger configuration re-work as we move closer to a 1.0 release. For those reasons, and given the time that has passed (which requires a revisiting of clamdscan options to ensure completeness), it is unlikely we will be including this PR in upstream. |
val-ms
pushed a commit
that referenced
this pull request
Jun 28, 2020
Looking through the list of issues, I spotted some easy ones and submitted
some fixes:
- 225229 - In cli_rarload: Leak of memory or pointers to system resources.
If finding the necessary libunrar functions fails (should be rare),we now
dlclose libunrar.
225224 - In main (freshclam.c): A copied piece of code is inconsistent with
the original (CWE-398). A minor copy-paste error was present, and optOutList
could be cleaned up in one of the failure edge cases.
225228 - In decodecdb: Out-of-bounds access to a buffer (CWE-119). Off by one
error when tokenizing certain CDB sig fields for printing with sigtool. Ex:
$ cat test.cdb
a:CL_TYPE_7Z:1-2-3:/.*/:1-2-3:1-2-3:0:1-2-3::
$ cat test.cdb | ../installed/bin/sigtool --decode
VIRUS NAME: a
CONTAINER TYPE: CL_TYPE_7Z
CONTAINER SIZE: WITHIN RANGE 1 to 2
FILENAME REGEX: /.*/
COMPRESSED FILESIZE: WITHIN RANGE 1 to 2
UNCOMPRESSED FILESIZE: WITHIN RANGE 1 to 2
ENCRYPTION: NO
FILE POSITION: =================================================================
==17245==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffe3136d10 at pc 0x7f0f31c3f414 bp 0x7fffe3136c70 sp 0x7fffe3136c60
WRITE of size 8 at 0x7fffe3136d10 thread T0
#0 0x7f0f31c3f413 in cli_strtokenize ../../libclamav/str.c:524
#1 0x559e9797dc91 in decodecdb ../../sigtool/sigtool.c:2929
#2 0x559e9797ea66 in decodesig ../../sigtool/sigtool.c:3058
#3 0x559e9797f31e in decodesigs ../../sigtool/sigtool.c:3162
#4 0x559e97981fbc in main ../../sigtool/sigtool.c:3638
#5 0x7f0f3100fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#6 0x559e9795a1d9 in _start (/home/zelda/workspace/clamav-devel/installed/bin/sigtool+0x381d9)
Address 0x7fffe3136d10 is located in stack of thread T0 at offset 48 in frame
#0 0x559e9797d113 in decodecdb ../../sigtool/sigtool.c:2840
This frame has 1 object(s):
[32, 48) 'range' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ../../libclamav/str.c:524 in cli_strtokenize
- 225223 - In cli_egg_deflate_decompress: Reads an uninitialized pointer or
its target (CWE-457). Certain fail cases would call inflateEnd on an
uninitialized stream. Now it’s only called after initialization occurs.
- 225220 - In buildcld: Use of an uninitialized variable (CWE-457). Certain
fail cases would result in oldDir being used before initialization. It now
gets zeroed before the first fail case.
- 225219 - In cli_egg_open: Leak of memory or pointers to system resources
(CWE-404). If certain realloc’s failed, several structures would not be cleaned up
- 225218 - In cli_scanhwpml: Code block is unreachable because of the syntactic
structure of the code (CWE-561). With certain macros set, there could be two
consecutive return statements.
shutton
added a commit
to shutton/clamav
that referenced
this pull request
Jan 8, 2024
Correct homepage URL
shutton
added a commit
to shutton/clamav
that referenced
this pull request
Jan 9, 2024
…epage-url Correct homepage URL
val-ms
added a commit
that referenced
this pull request
Sep 4, 2024
…objstm-1.0.7 Fix possible out of bounds read in PDF parser (1.0.7)
val-ms
added a commit
that referenced
this pull request
Oct 11, 2025
By limiting the embedded file recognition in embedded files, we detect fewer embedded files overall. For example, imagine a PE with a structure of embedded files like so: outer pe: emb. file #1: valid pe #1 emb. file #2: valid pe #2 emb. file #3: valid pe #3 emb. file #4: false positive for pe emb. file #5: false positive for pe emb. file #6: false positive for pe emb. file #7: false positive for pe emb. file #8: false positive for pe emb. file #9: false positive for pe emb. file #10: false positive for pe emb. file #10: valid pe #4 With an embedded objects limit of 10, we won't extract that 4th valid PE file. However, previous we allowed detection of embedded files within embedded files, so ClamAV mistook the above structure for something like this: outer pe: emb. file #1: valid pe #1 emb. file #1: valid pe #2 emb. file #1: valid pe #3 emb. file #1: false positive for pe emb. file #2: false positive for pe emb. file #3: false positive for pe emb. file #4: false positive for pe emb. file #5: false positive for pe emb. file #6: false positive for pe emb. file #7: false positive for pe emb. file #8: valid pe #4 As you can see, this is able to find and scan that 4th PE file without exceeding an embedded object limit of 10. The old way of detecting embedded files within embedded files has other drawbacks and is obviously inaccurate in terms of the actual file structure. But it did have that going for it. Anyways, to improve detection, this PR bumps the embedded objects limit to 16. I think that's okay since we've added header checks for several types like PE's, and have also removed the need to drop embedded PE files to a temp file for each scan. CLAM-2897
val-ms
added a commit
that referenced
this pull request
Oct 12, 2025
I am seeing missed detections since we changed to prohibit embedded file type identification when inside an embedded file. In particular, I'm seeing this issue with PE files that contain multiple other MSEXE as well as a variety of false positives for PE file headers. For example, imagine a PE with four concatenated DLL's, like so: ``` [ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ] ``` And note that false positives for embedded MSEXE files are fairly common. So there may be a few mixed in there. Before limiting embedded file identification we might interpret the file structure something like this: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1: { embedded MSEXE #1: false positive, embedded MSEXE #2: DLL #2: { embedded MSEXE #1: DLL #3: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: DLL #4 } embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #4 } embedded MSEXE #3: DLL #3, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: false positive, embedded MSEXE #7: false positive, embedded MSEXE #8: DLL #4 } } ``` This is obviously terrible, which is why why we don't allow detecting embedded files within other embedded files. So after we enforce that limit, the same file may be interpreted like this instead: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #2, embedded MSEXE #7: DLL #3, embedded MSEXE #8: false positive, embedded MSEXE #9: false positive, embedded MSEXE #10: false positive, embedded MSEXE #11: false positive, embedded MSEXE #12: DLL #4 } ``` That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit for embedded type matches (limit 10, but 12 found). That means we won't see or extract the 4th DLL anymore. My solution is to lift the limit when adding an matched MSEXE type. We already do this for matched ZIPSFX types. While doing this, I've significantly tidied up the limits checks to make it more readble, and removed duplicate checks from within the `ac_addtype()` function. CLAM-2897
val-ms
added a commit
that referenced
this pull request
Oct 14, 2025
I am seeing missed detections since we changed to prohibit embedded file type identification when inside an embedded file. In particular, I'm seeing this issue with PE files that contain multiple other MSEXE as well as a variety of false positives for PE file headers. For example, imagine a PE with four concatenated DLL's, like so: ``` [ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ] ``` And note that false positives for embedded MSEXE files are fairly common. So there may be a few mixed in there. Before limiting embedded file identification we might interpret the file structure something like this: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1: { embedded MSEXE #1: false positive, embedded MSEXE #2: DLL #2: { embedded MSEXE #1: DLL #3: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: DLL #4 } embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #4 } embedded MSEXE #3: DLL #3, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: false positive, embedded MSEXE #7: false positive, embedded MSEXE #8: DLL #4 } } ``` This is obviously terrible, which is why why we don't allow detecting embedded files within other embedded files. So after we enforce that limit, the same file may be interpreted like this instead: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #2, embedded MSEXE #7: DLL #3, embedded MSEXE #8: false positive, embedded MSEXE #9: false positive, embedded MSEXE #10: false positive, embedded MSEXE #11: false positive, embedded MSEXE #12: DLL #4 } ``` That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit for embedded type matches (limit 10, but 12 found). That means we won't see or extract the 4th DLL anymore. My solution is to lift the limit when adding an matched MSEXE type. We already do this for matched ZIPSFX types. While doing this, I've significantly tidied up the limits checks to make it more readble, and removed duplicate checks from within the `ac_addtype()` function. CLAM-2897
val-ms
added a commit
that referenced
this pull request
Oct 14, 2025
I am seeing missed detections since we changed to prohibit embedded file type identification when inside an embedded file. In particular, I'm seeing this issue with PE files that contain multiple other MSEXE as well as a variety of false positives for PE file headers. For example, imagine a PE with four concatenated DLL's, like so: ``` [ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ] ``` And note that false positives for embedded MSEXE files are fairly common. So there may be a few mixed in there. Before limiting embedded file identification we might interpret the file structure something like this: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1: { embedded MSEXE #1: false positive, embedded MSEXE #2: DLL #2: { embedded MSEXE #1: DLL #3: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: DLL #4 } embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #4 } embedded MSEXE #3: DLL #3, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: false positive, embedded MSEXE #7: false positive, embedded MSEXE #8: DLL #4 } } ``` This is obviously terrible, which is why why we don't allow detecting embedded files within other embedded files. So after we enforce that limit, the same file may be interpreted like this instead: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #2, embedded MSEXE #7: DLL #3, embedded MSEXE #8: false positive, embedded MSEXE #9: false positive, embedded MSEXE #10: false positive, embedded MSEXE #11: false positive, embedded MSEXE #12: DLL #4 } ``` That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit for embedded type matches (limit 10, but 12 found). That means we won't see or extract the 4th DLL anymore. My solution is to lift the limit when adding an matched MSEXE type. We already do this for matched ZIPSFX types. While doing this, I've significantly tidied up the limits checks to make it more readble, and removed duplicate checks from within the `ac_addtype()` function. CLAM-2897
val-ms
added a commit
that referenced
this pull request
Oct 15, 2025
I am seeing missed detections since we changed to prohibit embedded file type identification when inside an embedded file. In particular, I'm seeing this issue with PE files that contain multiple other MSEXE as well as a variety of false positives for PE file headers. For example, imagine a PE with four concatenated DLL's, like so: ``` [ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ] ``` And note that false positives for embedded MSEXE files are fairly common. So there may be a few mixed in there. Before limiting embedded file identification we might interpret the file structure something like this: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1: { embedded MSEXE #1: false positive, embedded MSEXE #2: DLL #2: { embedded MSEXE #1: DLL #3: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: DLL #4 } embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #4 } embedded MSEXE #3: DLL #3, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: false positive, embedded MSEXE #7: false positive, embedded MSEXE #8: DLL #4 } } ``` This is obviously terrible, which is why why we don't allow detecting embedded files within other embedded files. So after we enforce that limit, the same file may be interpreted like this instead: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #2, embedded MSEXE #7: DLL #3, embedded MSEXE #8: false positive, embedded MSEXE #9: false positive, embedded MSEXE #10: false positive, embedded MSEXE #11: false positive, embedded MSEXE #12: DLL #4 } ``` That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit for embedded type matches (limit 10, but 12 found). That means we won't see or extract the 4th DLL anymore. My solution is to lift the limit when adding an matched MSEXE type. We already do this for matched ZIPSFX types. While doing this, I've significantly tidied up the limits checks to make it more readble, and removed duplicate checks from within the `ac_addtype()` function. CLAM-2897
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Very simply, clamdscan needs the possibility of its own configuration file and with a fallback to clamd.conf. Although users can specify this on the command line, it's much nicer for packaging and deployment that it has its own configuration file that can be used out-of-the-box so to speak.
With this relatively simple code change, we manage that. There is a caveat: I cannot glean from parseopts how to determine if the configuration option was set by the provided command-line option or if the setting comes from the default setting. The work around is commented in the code.