Initial work to provide better-usable fanotify for on-access scanning under Linux.. #8
Closed
mgjani wants to merge 14 commits into
Modifies two existing configuration options and adds a third to allow the user to specify whether file access should be prevented based on on-access scan results.
OnAccessIncludeDirectory
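The pull request's code is not shown in this conversation, so the following is only a minimal sketch of how a "prevent access based on scan result" choice could map onto a fanotify permission response. The helper `make_response` and its parameters `infected` and `prevention_enabled` are hypothetical names used for illustration; only `struct fanotify_response`, `FAN_ALLOW`, and `FAN_DENY` come from the Linux fanotify API.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/fanotify.h> /* struct fanotify_response, FAN_ALLOW, FAN_DENY (Linux only) */

/* Hypothetical helper: build the answer to a fanotify permission event.
 * `infected` stands in for the on-access scan result and
 * `prevention_enabled` for the user's configuration choice.
 * Neither name is taken from the pull request. */
static struct fanotify_response make_response(int event_fd, bool infected,
                                              bool prevention_enabled)
{
    struct fanotify_response resp;

    assert(event_fd >= 0);
    resp.fd = event_fd;
    /* Deny only when the file is flagged AND the user asked for blocking;
     * in every other case the access is allowed to proceed. */
    resp.response = (infected && prevention_enabled) ? FAN_DENY : FAN_ALLOW;
    return resp;
}
```

In a real listener, the returned structure would be written to the fanotify file descriptor with `write()` to release the blocked open.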
Contributor
Why did you rename 200+ files? Why did you delete inflate/deflate for Windows?
Author
I didn't do either of those things. My initial push was for modifications to 2 files, and neither of those files was anywhere near the Windows code. The only thing I can think of that might be causing problems with a pull from my GitHub repo is that I've been doing a fetch and merge into my local copy, then pushing that back to the GitHub repo. What you're seeing are likely changes that were made upstream, which I've merely merged and pushed back up.
This has been fixed in our internal Git repository.
val-ms added a commit that referenced this pull request Sep 4, 2024
…4.1-changes-with-CVE-fixes Clam 2638 clam 2627 clam 2634 1.4.1 changes with CVE fixes
val-ms added a commit that referenced this pull request Oct 11, 2025
By limiting the embedded file recognition in embedded files, we detect fewer embedded files overall. For example, imagine a PE with a structure of embedded files like so:

    outer pe:
      emb. file #1: valid pe #1
      emb. file #2: valid pe #2
      emb. file #3: valid pe #3
      emb. file #4: false positive for pe
      emb. file #5: false positive for pe
      emb. file #6: false positive for pe
      emb. file #7: false positive for pe
      emb. file #8: false positive for pe
      emb. file #9: false positive for pe
      emb. file #10: false positive for pe
      emb. file #11: valid pe #4

With an embedded objects limit of 10, we won't extract that 4th valid PE file. However, previously we allowed detection of embedded files within embedded files, so ClamAV mistook the above structure for something like this:

    outer pe:
      emb. file #1: valid pe #1
        emb. file #1: valid pe #2
          emb. file #1: valid pe #3
            emb. file #1: false positive for pe
            emb. file #2: false positive for pe
            emb. file #3: false positive for pe
            emb. file #4: false positive for pe
            emb. file #5: false positive for pe
            emb. file #6: false positive for pe
            emb. file #7: false positive for pe
            emb. file #8: valid pe #4

As you can see, this is able to find and scan that 4th PE file without exceeding an embedded object limit of 10. The old way of detecting embedded files within embedded files has other drawbacks and is obviously inaccurate in terms of the actual file structure. But it did have that going for it.

Anyway, to improve detection, this PR bumps the embedded objects limit to 16. I think that's okay since we've added header checks for several types like PEs, and have also removed the need to drop embedded PE files to a temp file for each scan.

CLAM-2897
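The effect of the limit on the flat layout above can be checked with a small model. This is a sketch under stated assumptions, not ClamAV's scanning code: `count_extracted` and `demo_flat_layout` are hypothetical names, and each embedded-type match is reduced to a boolean saying whether it is a real PE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: walk the embedded-type matches in file order and stop
 * once `limit` matches have been considered. Returns how many real PEs were
 * reached before the limit cut the walk short. */
static size_t count_extracted(const bool *is_valid, size_t n_matches, size_t limit)
{
    size_t extracted = 0;

    assert(is_valid != NULL);
    for (size_t i = 0; i < n_matches && i < limit; i++) {
        if (is_valid[i]) {
            extracted++;
        }
    }
    return extracted;
}

/* The flat layout from the commit message: three valid PEs, seven false
 * positives, then the fourth valid PE as the eleventh match. */
static size_t demo_flat_layout(size_t limit)
{
    const bool matches[] = {
        true, true, true,
        false, false, false, false, false, false, false,
        true
    };

    return count_extracted(matches, sizeof(matches) / sizeof(matches[0]), limit);
}
```

Under this model, `demo_flat_layout(10)` reaches only three of the valid PEs, while `demo_flat_layout(16)` reaches all four, which matches the reasoning for raising the limit.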
val-ms added a commit that referenced this pull request Oct 12, 2025
I am seeing missed detections since we changed to prohibit embedded file type identification when inside an embedded file. In particular, I'm seeing this issue with PE files that contain multiple other MSEXE as well as a variety of false positives for PE file headers.

For example, imagine a PE with four concatenated DLLs, like so:
```
[ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ]
```

And note that false positives for embedded MSEXE files are fairly common. So there may be a few mixed in there.

Before limiting embedded file identification we might interpret the file structure something like this:
```
MSEXE: {
    embedded MSEXE #1: false positive,
    embedded MSEXE #2: false positive,
    embedded MSEXE #3: false positive,
    embedded MSEXE #4: DLL #1: {
        embedded MSEXE #1: false positive,
        embedded MSEXE #2: DLL #2: {
            embedded MSEXE #1: DLL #3: {
                embedded MSEXE #1: false positive,
                embedded MSEXE #2: false positive,
                embedded MSEXE #3: false positive,
                embedded MSEXE #4: false positive,
                embedded MSEXE #5: DLL #4
            }
            embedded MSEXE #2: false positive,
            embedded MSEXE #3: false positive,
            embedded MSEXE #4: false positive,
            embedded MSEXE #5: false positive,
            embedded MSEXE #6: DLL #4
        }
        embedded MSEXE #3: DLL #3,
        embedded MSEXE #4: false positive,
        embedded MSEXE #5: false positive,
        embedded MSEXE #6: false positive,
        embedded MSEXE #7: false positive,
        embedded MSEXE #8: DLL #4
    }
}
```

This is obviously terrible, which is why we don't allow detecting embedded files within other embedded files.

So after we enforce that limit, the same file may be interpreted like this instead:
```
MSEXE: {
    embedded MSEXE #1: false positive,
    embedded MSEXE #2: false positive,
    embedded MSEXE #3: false positive,
    embedded MSEXE #4: DLL #1,
    embedded MSEXE #5: false positive,
    embedded MSEXE #6: DLL #2,
    embedded MSEXE #7: DLL #3,
    embedded MSEXE #8: false positive,
    embedded MSEXE #9: false positive,
    embedded MSEXE #10: false positive,
    embedded MSEXE #11: false positive,
    embedded MSEXE #12: DLL #4
}
```

That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit for embedded type matches (limit 10, but 12 found). That means we won't see or extract the 4th DLL anymore.

My solution is to lift the limit when adding a matched MSEXE type. We already do this for matched ZIPSFX types.

While doing this, I've significantly tidied up the limits checks to make them more readable, and removed duplicate checks from within the `ac_addtype()` function.

CLAM-2897
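The idea of lifting the limit for specific match types can be sketched as follows. This is an illustrative model only and not the actual `ac_addtype()` logic: the enum `match_type`, the helpers `limit_exempt`, `count_recorded`, and `demo_twelve_msexe`, and the `exemption_enabled` switch are all hypothetical. Only the limit value of 10 and the MSEXE and ZIPSFX exemptions come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EMBEDDED_OBJ 10 /* limit value quoted in the commit message */

/* Hypothetical type tags; real ClamAV file type constants are not used here. */
typedef enum { TYPE_OTHER, TYPE_MSEXE, TYPE_ZIPSFX } match_type;

/* Types that are allowed past the limit, per the commit message. */
static bool limit_exempt(match_type t)
{
    return t == TYPE_MSEXE || t == TYPE_ZIPSFX;
}

/* Count how many matches get recorded. Once MAX_EMBEDDED_OBJ matches are
 * recorded, further matches are dropped unless the exemption is enabled
 * and the match type is exempt. */
static size_t count_recorded(const match_type *matches, size_t n, bool exemption_enabled)
{
    size_t recorded = 0;

    assert(matches != NULL);
    for (size_t i = 0; i < n; i++) {
        bool over_limit = recorded >= MAX_EMBEDDED_OBJ;
        if (over_limit && !(exemption_enabled && limit_exempt(matches[i]))) {
            continue; /* dropped: limit reached and no exemption applies */
        }
        recorded++;
    }
    return recorded;
}

/* The twelve MSEXE matches from the example, real DLLs and false positives alike. */
static size_t demo_twelve_msexe(bool exemption_enabled)
{
    match_type matches[12];

    for (size_t i = 0; i < 12; i++) {
        matches[i] = TYPE_MSEXE;
    }
    return count_recorded(matches, 12, exemption_enabled);
}
```

Without the exemption only the first ten matches are recorded, so match #12 (DLL #4) is lost; with it, all twelve are kept. Non-exempt types are still capped in this model.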
val-ms added a commit that referenced this pull request Oct 14, 2025
val-ms added a commit to val-ms/clamav that referenced this pull request Oct 14, 2025
val-ms added a commit to val-ms/clamav that referenced this pull request Oct 14, 2025
val-ms added a commit that referenced this pull request Oct 14, 2025