-
Notifications
You must be signed in to change notification settings - Fork 473
feat: replace requirementsenhanceable extractor with transitive enricher
#2294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
09bea1c to
df28fb3
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2294 +/- ##
==========================================
- Coverage 68.64% 68.60% -0.05%
==========================================
Files 169 168 -1
Lines 12646 12629 -17
==========================================
- Hits 8681 8664 -17
- Misses 3294 3295 +1
+ Partials 671 670 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| ) | ||
|
|
||
| if len(plugins) == 0 { | ||
| if countNotEnrichers(plugins) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to make sure at least one extractor is enabled, shall we count the number of extractors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just preserving our current logic which technically doesn't account for detectors, and I think that's ok for now because the long-run goal is to be running detectors, it's just right now we don't have enough stable ones to advertise them or justify making us less "extractor-first" 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean we are not currently running detectors and we would like to run them in the future?
also can you add a comment for this? the inconsistency between the function name and logging confuses me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we support running detectors via --experimental-plugins but we don't have any enabled by default, and the scanner was built with a focus around extractor-type plugins and vulnerabilities-type findings even though now we have stuff like detectors-type plugins and secrets-type findings, which is why we've got oddities like this messaging...
My understanding is that we're mostly ignoring this for now because there's still a lot of discussion going on how these sorts of things should be supported.
It's probably worth checking what @another-rex thinks too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to leave it like this for now. Though probably a good idea to add a quick comment about why the error message mentions extractors if we are also counting detectors.
Eventually we'll want to support at least 1 enabled of: filesystem extractors, standalone extractors, detectors.
Then in addition you can add Annotators and Enrichers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have added a comment about this
| Scanned <rootdir>/testdata/locks-requirements/my-requirements.txt file and found 1 package | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements-dev.txt file and found 1 package | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements-transitive.txt file and found 4 packages | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements.prod.txt file and found 1 package | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm it will be nice that we have some logging to indicate the enircher has been run even if we don't have the number of extra packages extracted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any thoughts on how that should look and how to do it? scalibr currently doesn't look to provide an AfterEnricher hook like it does for extractors and detectors, and enrichers look to be a lot more conditional so I'm not sure if we can reliable report on enrichers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with how post-run hooks run but I do see value adding an AfterEnricher hook - whether the stats is reliable sounds another problem.
Or alternative, we can add more logging to this transitive dependency enricher to provide more information. I will take a look at this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AfterEnricher stats hook is likely the best way to do this, we should avoid logging directly in the transitive extractor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering AfterEnricher may not happen immediately, I think we can add some logging in the enricher.
I don't see why we should avoid logging in transitive extractor for extra information? we already have the logging about failed resolution.
| Scanned <rootdir>/testdata/locks-requirements/my-requirements.txt file and found 1 package | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements-dev.txt file and found 1 package | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements-transitive.txt file and found 4 packages | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements.prod.txt file and found 1 package | ||
| Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AfterEnricher stats hook is likely the best way to do this, we should avoid logging directly in the transitive extractor.
| ) | ||
|
|
||
| if len(plugins) == 0 { | ||
| if countNotEnrichers(plugins) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to leave it like this for now. Though probably a good idea to add a quick comment about why the error message mentions extractors if we are also counting detectors.
Eventually we'll want to support at least 1 enabled of: filesystem extractors, standalone extractors, detectors.
Then in addition you can add Annotators and Enrichers.
| plugins := scalibrplugin.Resolve(actions.PluginsEnabled, actions.PluginsDisabled) | ||
|
|
||
| // todo: use Enricher.RequiredPlugins to check this generically | ||
| if accessors.DependencyClients[osvschema.EcosystemPyPI] != nil && isRequirementsExtractorEnabled(plugins) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... We are now doing this in two different ways, and we should probably standardise on a specific way here.
Here we are depending on the accessors system, where we initialise the correct clients based on the input flags.
However, for base images, I tried just initialising the base image enricher no matter what, and let it be filtered out by Capabilities in osv-scalibr. Since no actual network requests are made during the initiation, it should appear to be identical to the end user.
I can kind of see some pros and cons of both approaches?
Using accessors:
- Pro: Lazy initiazation, we don't have to do as much work if disabled by the flags.
- More control in osv-scanner about what is enabled/disabled
- Con: We'll be replicating the logic of capabilities to filter out plugins
- It's hard to figure out how to initialize custom enrichers if the accessors are disabled by the flag.
Any opinions/suggestions on which way to go here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per offline discussion, considering we are moving towards using enrichers for job that requires network access, I am leaning to getting rid of the accessors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think that's the right direction if "no actual network requests are made during the initiation" holds (+ "initialization isn't expensive")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear though, that change is relating to the --no-resolve flag, where we either remove it entirely or change the implication to be "we don't add enrichers for resolving transitive dependencies"
df28fb3 to
f81a361
Compare
|
@cuixq fwiw it seems like the enricher might be flakier than the existing extractor? it doesn't seem to be too common and might be a third-party thing, but afaik we've never had any flakey issues with the existing extractor. |
|
this error seems not like a flaky error: both resolution failure should be deterministic if the artifacts don't change - not sure if related to the artifacts from registry. and it only failed for windows 🤔 maybe don't worry too much about it now - see if this is going to happen again. |
|
yeah at this point I think we should just keep an eye on it, but fwiw its not just a Windows thing. |
|
Huh reran this and it failed again immediately. |
|
@cuixq Is there some retry logic that's missing in the osv-scalibr impl? |
|
I missed the comments from last week - I will take a look later. |
On the surface this was pretty much a drop-in replacement which is really exciting, but there are a couple of quirks that we may or may not want to deal with before landing this.
The first one is very obvious which is that our "found packages" count is very different - it's technically correct, but I think ideally it would be good if we could express that we found x more packages via the enricher.
The second one is more interesting and I suspect one we'll probably ignore for the time being, but right now
scalibrimplicitly adds extractors that are required by enrichers as part of doing the scan which is at odds with our (experimental) abilities to enable and disable plugins - for now I've made it so that the enricher is only enabled if therequirementsextractor is enabled, since we don't have that many enrichers with plugin requirements right now.Resolves #2289