@mgjani mgjani commented Sep 5, 2014

Modifies two existing configuration options and adds a third to allow
the user to specify whether file access should be prevented based on
on-access scan results.

Contributor

lattera commented Oct 5, 2014

Why did you rename 200+ files? Why did you delete inflate/deflate for Windows?

Author

mgjani commented Oct 5, 2014

I didn't do either of those things. My initial push was for modifications to 2 files, and neither of those files was anywhere near the Windows code. The only thing I can think of that might be causing problems with a pull from my GitHub repo is that I've been fetching and merging into my local copy, then pushing that back to the GitHub repo. What you're seeing are likely changes that were made upstream, which I've merely merged and pushed back up.

@vrtadmin

This has been fixed in our internal Git repository.

@vrtadmin vrtadmin closed this Oct 18, 2016
val-ms added a commit that referenced this pull request Sep 4, 2024
…4.1-changes-with-CVE-fixes

Clam 2638 clam 2627 clam 2634 1.4.1 changes with CVE fixes
val-ms added a commit that referenced this pull request Oct 11, 2025
By limiting the embedded file recognition in embedded files, we detect
fewer embedded files overall.

For example, imagine a PE with a structure of embedded files like so:

outer pe:
 emb. file #1: valid pe #1
 emb. file #2: valid pe #2
 emb. file #3: valid pe #3
 emb. file #4: false positive for pe
 emb. file #5: false positive for pe
 emb. file #6: false positive for pe
 emb. file #7: false positive for pe
 emb. file #8: false positive for pe
 emb. file #9: false positive for pe
 emb. file #10: false positive for pe
 emb. file #11: valid pe #4

With an embedded objects limit of 10, we won't extract that 4th valid PE
file.
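
To make the cutoff concrete, here is a minimal Python model of the flat layout above. It is not ClamAV code: `record_matches` and the entry labels are hypothetical, and the only assumption is that matches past the limit are dropped in order.

```python
def record_matches(candidates, limit):
    """Keep embedded-type matches in scan order until the limit is reached."""
    return candidates[:limit]


# Flat layout from the example: 3 valid PEs, 7 false positives, then a 4th valid PE.
candidates = ["valid pe #1", "valid pe #2", "valid pe #3"]
candidates += ["false positive"] * 7
candidates.append("valid pe #4")

recorded = record_matches(candidates, limit=10)
valid_found = [c for c in recorded if c.startswith("valid")]
print(valid_found)
```

With `limit=10` the 4th valid PE falls just outside the recorded matches; raising the limit to 11 or more brings it back in this model.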

However, previously we allowed detection of embedded files within embedded
files, so ClamAV mistook the above structure for something like this:

outer pe:
 emb. file #1: valid pe #1
   emb. file #1: valid pe #2
     emb. file #1: valid pe #3
       emb. file #1: false positive for pe
       emb. file #2: false positive for pe
       emb. file #3: false positive for pe
       emb. file #4: false positive for pe
       emb. file #5: false positive for pe
       emb. file #6: false positive for pe
       emb. file #7: false positive for pe
       emb. file #8: valid pe #4

As you can see, this is able to find and scan that 4th PE file without
exceeding an embedded object limit of 10.
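
A rough Python model of that nested interpretation follows. It is only a sketch: `scan` and the tuple-based tree are hypothetical, and it assumes the limit is counted separately for each file rather than across the whole scan.

```python
def scan(node, limit):
    """Return the names of all files reached when each file records
    at most `limit` embedded children."""
    name, children = node
    reached = [name]
    for child in children[:limit]:
        reached.extend(scan(child, limit))
    return reached


# Nested layout from the example: each valid PE appears inside the previous one.
false_positives = [("false positive", []) for _ in range(7)]
pe3 = ("valid pe #3", false_positives + [("valid pe #4", [])])
pe2 = ("valid pe #2", [pe3])
pe1 = ("valid pe #1", [pe2])
outer = ("outer pe", [pe1])

reached = scan(outer, limit=10)
print("valid pe #4" in reached)
```

No single level holds more than 8 children here, so a per-file limit of 10 never triggers and the 4th PE is reached.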

The old way of detecting embedded files within embedded files has other
drawbacks and is obviously inaccurate in terms of the actual file
structure. But it did have that going for it.

Anyway, to improve detection, this PR bumps the embedded objects limit
to 16. I think that's okay since we've added header checks for several
types, such as PEs, and have also removed the need to drop embedded PE
files to a temp file for each scan.

CLAM-2897
val-ms added a commit that referenced this pull request Oct 12, 2025
I am seeing missed detections since we changed to prohibit embedded
file type identification when inside an embedded file.
In particular, I'm seeing this issue with PE files that contain multiple
other MSEXE files as well as a variety of false positives for PE file headers.

For example, imagine a PE with four concatenated DLLs, like so:
```
  [ EXE file   | DLL #1  | DLL #2  | DLL #3  | DLL #4 ]
```

And note that false positives for embedded MSEXE files are fairly common.
So there may be a few mixed in there.

Before limiting embedded file identification we might interpret the file
structure something like this:
```
MSEXE: {
  embedded MSEXE #1: false positive,
  embedded MSEXE #2: false positive,
  embedded MSEXE #3: false positive,
  embedded MSEXE #4: DLL #1: {
    embedded MSEXE #1: false positive,
    embedded MSEXE #2: DLL #2: {
      embedded MSEXE #1: DLL #3: {
        embedded MSEXE #1: false positive,
        embedded MSEXE #2: false positive,
        embedded MSEXE #3: false positive,
        embedded MSEXE #4: false positive,
        embedded MSEXE #5: DLL #4
      }
      embedded MSEXE #2: false positive,
      embedded MSEXE #3: false positive,
      embedded MSEXE #4: false positive,
      embedded MSEXE #5: false positive,
      embedded MSEXE #6: DLL #4
    }
    embedded MSEXE #3: DLL #3,
    embedded MSEXE #4: false positive,
    embedded MSEXE #5: false positive,
    embedded MSEXE #6: false positive,
    embedded MSEXE #7: false positive,
    embedded MSEXE #8: DLL #4
  }
}
```

This is obviously terrible, which is why we don't allow detecting
embedded files within other embedded files.
So after we enforce that limit, the same file may be interpreted like
this instead:
```
MSEXE: {
  embedded MSEXE #1:  false positive,
  embedded MSEXE #2:  false positive,
  embedded MSEXE #3:  false positive,
  embedded MSEXE #4:  DLL #1,
  embedded MSEXE #5:  false positive,
  embedded MSEXE #6:  DLL #2,
  embedded MSEXE #7:  DLL #3,
  embedded MSEXE #8:  false positive,
  embedded MSEXE #9:  false positive,
  embedded MSEXE #10: false positive,
  embedded MSEXE #11: false positive,
  embedded MSEXE #12: DLL #4
}
```

That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit
for embedded type matches (limit 10, but 12 found). That means we won't
see or extract the 4th DLL anymore.

My solution is to lift the limit when adding a matched MSEXE type.
We already do this for matched ZIPSFX types.
While doing this, I've significantly tidied up the limit checks to
make them more readable, and removed duplicate checks from within the
`ac_addtype()` function.
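
The idea of lifting the limit for specific types can be sketched in Python as below. This is an illustrative model only, not the `ac_addtype()` logic: `add_type`, `EXEMPT_TYPES`, and the `"OTHER"` placeholder type are hypothetical, and it assumes exempt matches still count toward the total seen by other types.

```python
MAX_EMBEDDED_OBJ = 10               # limit value quoted above
EXEMPT_TYPES = {"MSEXE", "ZIPSFX"}  # types described above as bypassing the limit


def add_type(matches, ftype, offset):
    """Record a (type, offset) match unless the limit applies and is reached.

    Returns True if the match was recorded.
    """
    if ftype not in EXEMPT_TYPES and len(matches) >= MAX_EMBEDDED_OBJ:
        return False
    matches.append((ftype, offset))
    return True


# Twelve MSEXE matches, as in the flat interpretation above, are all kept.
matches = []
results = [add_type(matches, "MSEXE", off) for off in range(12)]

# A non-exempt type arriving afterwards is still subject to the limit.
other = add_type(matches, "OTHER", 12)
print(len(matches), other)
```

Keeping the exemption in a single check before the append mirrors the stated goal of having the limit logic in one readable place.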

CLAM-2897
val-ms added a commit that referenced this pull request Oct 14, 2025
val-ms added a commit to val-ms/clamav that referenced this pull request Oct 14, 2025
val-ms added a commit to val-ms/clamav that referenced this pull request Oct 14, 2025
val-ms added a commit that referenced this pull request Oct 14, 2025