
@otheus otheus commented Mar 20, 2014

Very simply, clamdscan needs the option of its own configuration file, with a fallback to clamd.conf. Although users can specify a configuration file on the command line, it's much nicer for packaging and deployment if clamdscan has its own configuration file that can be used out of the box, so to speak.

With this relatively simple code change, we manage that. There is a caveat: I cannot glean from parseopts how to determine whether the configuration option was set on the command line or comes from the default setting. The workaround is commented in the code.
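
A minimal sketch of that lookup order, assuming a NULL `cli_path` means no option was given on the command line (which is exactly the distinction the caveat says is hard to get from the option parser). The function name and its parameters are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Illustrative lookup order only; names are not from the patch.
 *   cli_path      - path given on the command line, or NULL if none
 *   own_conf      - clamdscan-specific config file to try first
 *   fallback_conf - clamd.conf, used when own_conf is not readable
 */
const char *pick_config_path(const char *cli_path,
                             const char *own_conf,
                             const char *fallback_conf)
{
    assert(own_conf != NULL && fallback_conf != NULL);

    if (cli_path != NULL)
        return cli_path;      /* an explicit option always wins */
    if (access(own_conf, R_OK) == 0)
        return own_conf;      /* dedicated file exists and is readable */
    return fallback_conf;     /* otherwise fall back to clamd.conf */
}
```

Returning the fallback unconditionally leaves error reporting for a missing clamd.conf to whatever code opens the file.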


micksola commented Aug 8, 2019

While the idea and work here are sound, and this may be a direction we wish to move in the future, it will likely be as part of a larger configuration re-work as we move closer to a 1.0 release. For those reasons, and given the time that has passed (which requires a revisiting of clamdscan options to ensure completeness), it is unlikely we will be including this PR in upstream.

@m-sola m-sola closed this Sep 13, 2019
val-ms pushed a commit that referenced this pull request Jun 28, 2020
Looking through the list of issues, I spotted some easy ones and submitted
some fixes:

- 225229 - In cli_rarload: Leak of memory or pointers to system resources.
If finding the necessary libunrar functions fails (should be rare), we now
dlclose libunrar.

- 225224 - In main (freshclam.c): A copied piece of code is inconsistent with
the original (CWE-398). A minor copy-paste error was present, and optOutList
could be cleaned up in one of the failure edge cases.

- 225228 - In decodecdb: Out-of-bounds access to a buffer (CWE-119). Off-by-one
error when tokenizing certain CDB sig fields for printing with sigtool. Ex:

$ cat test.cdb
a:CL_TYPE_7Z:1-2-3:/.*/:1-2-3:1-2-3:0:1-2-3::

$ cat test.cdb | ../installed/bin/sigtool --decode
VIRUS NAME: a
CONTAINER TYPE: CL_TYPE_7Z
CONTAINER SIZE: WITHIN RANGE 1 to 2
FILENAME REGEX: /.*/
COMPRESSED FILESIZE: WITHIN RANGE 1 to 2
UNCOMPRESSED FILESIZE: WITHIN RANGE 1 to 2
ENCRYPTION: NO
FILE POSITION: =================================================================
==17245==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffe3136d10 at pc 0x7f0f31c3f414 bp 0x7fffe3136c70 sp 0x7fffe3136c60
WRITE of size 8 at 0x7fffe3136d10 thread T0
    #0 0x7f0f31c3f413 in cli_strtokenize ../../libclamav/str.c:524
    #1 0x559e9797dc91 in decodecdb ../../sigtool/sigtool.c:2929
    #2 0x559e9797ea66 in decodesig ../../sigtool/sigtool.c:3058
    #3 0x559e9797f31e in decodesigs ../../sigtool/sigtool.c:3162
    #4 0x559e97981fbc in main ../../sigtool/sigtool.c:3638
    #5 0x7f0f3100fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #6 0x559e9795a1d9 in _start (/home/zelda/workspace/clamav-devel/installed/bin/sigtool+0x381d9)

Address 0x7fffe3136d10 is located in stack of thread T0 at offset 48 in frame
    #0 0x559e9797d113 in decodecdb ../../sigtool/sigtool.c:2840

  This frame has 1 object(s):
    [32, 48) 'range' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ../../libclamav/str.c:524 in cli_strtokenize

- 225223 - In cli_egg_deflate_decompress: Reads an uninitialized pointer or
its target (CWE-457). Certain fail cases would call inflateEnd on an
uninitialized stream. Now it’s only called after initialization occurs.

- 225220 - In buildcld: Use of an uninitialized variable (CWE-457). Certain
fail cases would result in oldDir being used before initialization. It now
gets zeroed before the first fail case.

- 225219 - In cli_egg_open: Leak of memory or pointers to system resources
(CWE-404). If certain realloc’s failed, several structures would not be cleaned up.

- 225218 - In cli_scanhwpml: Code block is unreachable because of the syntactic
structure of the code (CWE-561).  With certain macros set, there could be two
consecutive return statements.
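
For 225228, the AddressSanitizer report above shows an 8-byte write at offset 48, just past the 16-byte `range` object, which presumably holds two pointers. A minimal sketch of a tokenizer that never writes beyond the capacity the caller passes in follows. `split_bounded` is a hypothetical helper written for illustration; it is not ClamAV's `cli_strtokenize` and does not reproduce the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split buf in place on delim, storing at most max_tokens pointers.
 * Text after the last stored token is cut off rather than written
 * past the end of the tokens array. Returns the number of tokens stored.
 */
size_t split_bounded(char *buf, char delim, const char **tokens, size_t max_tokens)
{
    size_t count = 0;

    assert(buf != NULL && tokens != NULL);

    while (count < max_tokens) {
        tokens[count++] = buf;        /* index is always < max_tokens here */
        buf = strchr(buf, delim);
        if (buf == NULL)
            break;                    /* no more delimiters */
        *buf++ = '\0';                /* terminate the token just stored */
    }
    return count;
}
```

With a two-element array and the field `1-2-3` from the example signature, this stores `1` and `2` and drops the rest, which matches the "WITHIN RANGE 1 to 2" lines in the sigtool output.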
shutton added a commit to shutton/clamav that referenced this pull request Jan 8, 2024
shutton added a commit to shutton/clamav that referenced this pull request Jan 9, 2024
val-ms added a commit that referenced this pull request Sep 4, 2024
…objstm-1.0.7

Fix possible out of bounds read in PDF parser (1.0.7)
val-ms added a commit that referenced this pull request Oct 11, 2025
By limiting the embedded file recognition in embedded files, we detect
fewer embedded files overall.

For example, imagine a PE with a structure of embedded files like so:

outer pe:
 emb. file #1: valid pe #1
 emb. file #2: valid pe #2
 emb. file #3: valid pe #3
 emb. file #4: false positive for pe
 emb. file #5: false positive for pe
 emb. file #6: false positive for pe
 emb. file #7: false positive for pe
 emb. file #8: false positive for pe
 emb. file #9: false positive for pe
 emb. file #10: false positive for pe
 emb. file #11: valid pe #4

With an embedded objects limit of 10, we won't extract that 4th valid PE
file.

However, previously we allowed detection of embedded files within embedded
files, so ClamAV mistook the above structure for something like this:

outer pe:
 emb. file #1: valid pe #1
   emb. file #1: valid pe #2
     emb. file #1: valid pe #3
       emb. file #1: false positive for pe
       emb. file #2: false positive for pe
       emb. file #3: false positive for pe
       emb. file #4: false positive for pe
       emb. file #5: false positive for pe
       emb. file #6: false positive for pe
       emb. file #7: false positive for pe
       emb. file #8: valid pe #4

As you can see, this is able to find and scan that 4th PE file without
exceeding an embedded object limit of 10.

The old way of detecting embedded files within embedded files has other
drawbacks and is obviously inaccurate in terms of the actual file
structure. But it did have that going for it.

Anyways, to improve detection, this PR bumps the embedded objects limit
to 16. I think that's okay since we've added header checks for several
types like PE's, and have also removed the need to drop embedded PE
files to a temp file for each scan.
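
The arithmetic for the flat layout can be sketched as follows, assuming every candidate (valid or false positive) counts against the limit and anything past the limit is skipped. The function name and the boolean-array representation are illustrative only and do not reflect how ClamAV stores matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count how many valid embedded files fall within the first `limit`
 * candidates of a flat list. is_valid[i] is true for a real PE and
 * false for a false positive.
 */
size_t valid_within_limit(const bool *is_valid, size_t n, size_t limit)
{
    size_t found = 0;

    assert(is_valid != NULL);

    for (size_t i = 0; i < n && i < limit; i++) {
        if (is_valid[i])
            found++;
    }
    return found;
}
```

For the eleven candidates listed in the flat example (three valid PEs, seven false positives, then the fourth valid PE), a limit of 10 reaches three valid PEs and a limit of 16 reaches all four.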

CLAM-2897
val-ms added a commit that referenced this pull request Oct 12, 2025
I am seeing missed detections since we changed to prohibit embedded
file type identification when inside an embedded file.
In particular, I'm seeing this issue with PE files that contain multiple
other MSEXE as well as a variety of false positives for PE file headers.

For example, imagine a PE with four concatenated DLL's, like so:
```
  [ EXE file   | DLL #1  | DLL #2  | DLL #3  | DLL #4 ]
```

And note that false positives for embedded MSEXE files are fairly common.
So there may be a few mixed in there.

Before limiting embedded file identification we might interpret the file
structure something like this:
```
MSEXE: {
  embedded MSEXE #1: false positive,
  embedded MSEXE #2: false positive,
  embedded MSEXE #3: false positive,
  embedded MSEXE #4: DLL #1: {
    embedded MSEXE #1: false positive,
    embedded MSEXE #2: DLL #2: {
      embedded MSEXE #1: DLL #3: {
        embedded MSEXE #1: false positive,
        embedded MSEXE #2: false positive,
        embedded MSEXE #3: false positive,
        embedded MSEXE #4: false positive,
        embedded MSEXE #5: DLL #4
      }
      embedded MSEXE #2: false positive,
      embedded MSEXE #3: false positive,
      embedded MSEXE #4: false positive,
      embedded MSEXE #5: false positive,
      embedded MSEXE #6: DLL #4
    }
    embedded MSEXE #3: DLL #3,
    embedded MSEXE #4: false positive,
    embedded MSEXE #5: false positive,
    embedded MSEXE #6: false positive,
    embedded MSEXE #7: false positive,
    embedded MSEXE #8: DLL #4
  }
}
```

This is obviously terrible, which is why we don't allow detecting
embedded files within other embedded files.
So after we enforce that limit, the same file may be interpreted like
this instead:
```
MSEXE: {
  embedded MSEXE #1:  false positive,
  embedded MSEXE #2:  false positive,
  embedded MSEXE #3:  false positive,
  embedded MSEXE #4:  DLL #1,
  embedded MSEXE #5:  false positive,
  embedded MSEXE #6:  DLL #2,
  embedded MSEXE #7:  DLL #3,
  embedded MSEXE #8:  false positive,
  embedded MSEXE #9:  false positive,
  embedded MSEXE #10: false positive,
  embedded MSEXE #11: false positive,
  embedded MSEXE #12: DLL #4
}
```

That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit
for embedded type matches (limit 10, but 12 found). That means we won't
see or extract the 4th DLL anymore.

My solution is to lift the limit when adding a matched MSEXE type.
We already do this for matched ZIPSFX types.
While doing this, I've significantly tidied up the limits checks to
make them more readable, and removed duplicate checks from within the
`ac_addtype()` function.
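
A rough sketch of the exemption, under the assumption that exempt matches still count toward the running total while only non-exempt types are refused once the limit is reached. The enum, the helper names, and the counting model are hypothetical and are not the actual `ac_addtype()` logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the matched file types. */
enum match_type { MATCH_MSEXE, MATCH_ZIPSFX, MATCH_OTHER };

/* MSEXE and ZIPSFX matches are not subject to the limit. */
static bool limit_exempt(enum match_type t)
{
    return t == MATCH_MSEXE || t == MATCH_ZIPSFX;
}

/* Return how many of the n matches would be recorded under `limit`. */
size_t count_recorded(const enum match_type *matches, size_t n, size_t limit)
{
    size_t recorded = 0;

    assert(matches != NULL);

    for (size_t i = 0; i < n; i++) {
        if (recorded >= limit && !limit_exempt(matches[i]))
            continue;   /* over the limit and not exempt: skip */
        recorded++;
    }
    return recorded;
}
```

With the twelve MSEXE matches from the flat example and a limit of 10, all twelve are recorded, so the 4th DLL is reached. Twelve matches of a non-exempt type would stop at ten.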

CLAM-2897
val-ms added a commit that referenced this pull request Oct 14, 2025
val-ms added a commit that referenced this pull request Oct 14, 2025
val-ms added a commit that referenced this pull request Oct 15, 2025