feat(fs): use git commit hash as cache key for clean repositories #8278
Merged
When scanning a git repository that is in a clean state, use the commit hash as the cache key instead of calculating it from blob info. This allows for better cache reuse and prevents unnecessary cache deletion.
tamirkiviti13 approved these changes on Jan 22, 2025
- Introduced a new fixture in `internal/gittest/testdata/fixture.go` to clone a Git repository for unit tests.
- Updated `magefiles/magefile.go` to include the new fixture in the unit test dependencies.
- Added the test repository path to `.gitignore` to prevent it from being tracked.
- Added a `NewServerWithRepository` function that creates a Git server from an existing repository, cloning it and fetching all branches and tags.
- Updated `NewTestServer` to check that the test repository exists and to report a clear error message if it is not found.
- Modified the cloning process in `fixture.go` to include all branches and tags.
- Removed unused test files.
Signed-off-by: knqyf263 <knqyf263@gmail.com>
Add debug logging when using the latest git commit hash for calculating the artifact cache key, providing more visibility into the cache key generation process.
Signed-off-by: knqyf263 <knqyf263@gmail.com>
DmitriyLewen (Contributor) left a comment:
LGTM
But I think we need to update the docs:
- `fs` may save/extract cache for clean repositories
  - by default we don't save cache for repositories (`--cache-backend memory` is the default for `fs`/`repo` mode)
  - it works by default in client/server mode
- the cache only works if the repository directory is the target of the Trivy scan (but maybe that's redundant)
```go
		art.logger.Debug("Using the latest commit hash for calculating cache key", log.String("commit_hash", hash))
		art.commitHash = hash
	} else {
		art.logger.Debug("Random cache key will be used", log.Err(err))
```
Contributor
It looks like we don't need to show this log unless the target is a Git repository directory, because otherwise the following log is shown for every fs scan:
`[fs] Random cache key will be used err="failed to open git repository: repository does not exist"`
Contributor
Also, it might make sense to include the path in this log.
Collaborator
Author
It looks like there is a bug 😄 I'll fix it.
Signed-off-by: knqyf263 <knqyf263@gmail.com>
DmitriyLewen approved these changes on Jan 27, 2025
DmitriyLewen (Contributor) left a comment:
LGTM.
Can we merge this PR?
RingoDev referenced this pull request in RingoDev/trivy on Feb 26, 2025
…278) Signed-off-by: knqyf263 <knqyf263@gmail.com>
Overview
Improve cache efficiency for Git repositories by using commit hash as cache key when the repository is in a clean state.
Description
When scanning a Git repository that is in a clean state, this change uses the commit hash as the cache key instead of deriving it from a random UUID. This allows for better cache reuse and avoids unnecessary cache deletion.
Currently, the filesystem scanner derives the cache key from a random UUID, so the key is never reused in practice, and the cached data is deleted after the scan with `DeleteBlobs`. If the directory is a clean Git repository, however, the commit hash can serve as the cache key, eliminating the need to delete the cache after scanning.
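A minimal sketch of the key-selection strategy, assuming the caller has already determined the HEAD commit hash and whether the worktree is clean (for example with a Git library). The function name `cacheKey`, the `sha256:`/`random:` prefixes, and the optional salt values standing in for scanner configuration such as analyzer versions are illustrative assumptions, not Trivy's actual API.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey returns a cache key and whether that key can be reused across
// scans. For a clean repository the key is derived deterministically from the
// commit hash (plus hypothetical salt values), so scanning the same commit
// again hits the same cache entry. Otherwise a random, single-use key is
// returned, whose cache entry should be deleted after the scan.
func cacheKey(commitHash string, clean bool, salt ...string) (string, bool, error) {
	if clean && commitHash != "" {
		h := sha256.New()
		h.Write([]byte(commitHash))
		for _, s := range salt {
			h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") differ
			h.Write([]byte(s))
		}
		return "sha256:" + hex.EncodeToString(h.Sum(nil)), true, nil
	}
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", false, fmt.Errorf("generating random cache key: %w", err)
	}
	return "random:" + hex.EncodeToString(buf), false, nil
}

func main() {
	const head = "0123456789abcdef0123456789abcdef01234567" // hypothetical HEAD commit
	k1, _, _ := cacheKey(head, true, "analyzers=v1")
	k2, _, _ := cacheKey(head, true, "analyzers=v1")
	fmt.Println("clean repository, same commit, same key:", k1 == k2)

	k3, reusable, _ := cacheKey(head, false, "analyzers=v1")
	fmt.Println("dirty worktree, reusable:", reusable, "differs from clean key:", k3 != k1)
}
```

Mixing configuration into the hash is a design choice worth considering: with the commit hash alone, upgrading the scanner would keep returning results cached by an older version.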
Key changes:
Related Issues
Checklist