feat: replace `requirementsenhanceable` extractor with transitive enricher #2294

G-Rath · 2025-10-22T01:52:57Z

On the surface this was pretty much a drop-in replacement which is really exciting, but there are a couple of quirks that we may or may not want to deal with before landing this.

The first one is very obvious which is that our "found packages" count is very different - it's technically correct, but I think ideally it would be good if we could express that we found x more packages via the enricher.

The second one is more interesting and I suspect one we'll probably ignore for the time being, but right now scalibr implicitly adds extractors that are required by enrichers as part of doing the scan which is at odds with our (experimental) abilities to enable and disable plugins - for now I've made it so that the enricher is only enabled if the requirements extractor is enabled, since we don't have that many enrichers with plugin requirements right now.

Resolves #2289

codecov-commenter · 2025-10-22T02:35:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.60%. Comparing base (280ac6a) to head (a5d4f0d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2294      +/-   ##
==========================================
- Coverage   68.64%   68.60%   -0.05%     
==========================================
  Files         169      168       -1     
  Lines       12646    12629      -17     
==========================================
- Hits         8681     8664      -17     
- Misses       3294     3295       +1     
+ Partials      671      670       -1


cuixq · 2025-10-22T22:32:59Z

pkg/osvscanner/scan.go

 	)

-	if len(plugins) == 0 {
+	if countNotEnrichers(plugins) == 0 {


if we want to make sure at least one extractor is enabled, shall we count the number of extractors?

this is just preserving our current logic which technically doesn't account for detectors, and I think that's ok for now because the long-run goal is to be running detectors, it's just right now we don't have enough stable ones to advertise them or justify making us less "extractor-first" 🤷

do you mean we are not currently running detectors and we would like to run them in the future?

also can you add a comment for this? the inconsistency between the function name and logging confuses me.

Currently we support running detectors via --experimental-plugins but we don't have any enabled by default, and the scanner was built with a focus around extractor-type plugins and vulnerabilities-type findings even though now we have stuff like detectors-type plugins and secrets-type findings, which is why we've got oddities like this messaging...

My understanding is that we're mostly ignoring this for now because there's still a lot of discussion going on how these sorts of things should be supported.

It's probably worth checking what @another-rex thinks too

Happy to leave it like this for now. Though probably a good idea to add a quick comment about why the error message mentions extractors if we are also counting detectors.

Eventually we'll want to support at least 1 enabled of: filesystem extractors, standalone extractors, detectors.

Then in addition you can add Annotators and Enrichers.

have added a comment about this
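
For readers following the thread, here is a minimal sketch of what a `countNotEnrichers`-style check could look like. The `Plugin` and `Enricher` interfaces and the two example types are hypothetical stand-ins, not the real osv-scalibr types, and the actual helper in this PR may be implemented differently. The point is only that extractors and detectors both count towards the "at least one plugin" check, while enrichers do not.

```go
package main

import "fmt"

// Plugin and Enricher are hypothetical, simplified stand-ins for the scalibr
// plugin interfaces; the real ones live in osv-scalibr and have more methods.
type Plugin interface {
	Name() string
}

type Enricher interface {
	Plugin
	Enrich() error
}

// exampleExtractor is an illustrative non-enricher plugin.
type exampleExtractor struct{ name string }

func (e exampleExtractor) Name() string { return e.name }

// exampleEnricher is an illustrative enricher plugin.
type exampleEnricher struct{ name string }

func (e exampleEnricher) Name() string  { return e.name }
func (e exampleEnricher) Enrich() error { return nil }

// countNotEnrichers returns how many plugins do not implement Enricher.
// Anything that is not an enricher is counted, so detectors would be
// included alongside extractors.
func countNotEnrichers(plugins []Plugin) int {
	count := 0
	for _, p := range plugins {
		if _, isEnricher := p.(Enricher); !isEnricher {
			count++
		}
	}
	return count
}

func main() {
	plugins := []Plugin{
		exampleExtractor{name: "example-extractor"},
		exampleEnricher{name: "example-enricher"},
	}

	fmt.Println(countNotEnrichers(plugins))     // prints 1
	fmt.Println(countNotEnrichers(plugins[1:])) // prints 0
}
```

With only an enricher enabled the count is zero, which is the case the scan treats the same as having no plugins at all.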

cuixq · 2025-10-22T22:34:28Z

cmd/osv-scanner/scan/source/__snapshots__/command_test.snap

+Scanned <rootdir>/testdata/locks-requirements/my-requirements.txt file and found 1 package
+Scanned <rootdir>/testdata/locks-requirements/requirements-dev.txt file and found 1 package
+Scanned <rootdir>/testdata/locks-requirements/requirements-transitive.txt file and found 4 packages
+Scanned <rootdir>/testdata/locks-requirements/requirements.prod.txt file and found 1 package
+Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages


hmmm it would be nice to have some logging to indicate the enricher has been run, even if we don't have the number of extra packages extracted.

Do you have any thoughts on how that should look and how to do it? scalibr currently doesn't look to provide an AfterEnricher hook like it does for extractors and detectors, and enrichers look to be a lot more conditional so I'm not sure if we can reliably report on enrichers

I am not familiar with how post-run hooks run but I do see value in adding an AfterEnricher hook - whether the stats are reliable sounds like another problem.

Or alternatively, we can add more logging to this transitive dependency enricher to provide more information. I will take a look at this later.

AfterEnricher stats hook is likely the best way to do this, we should avoid logging directly in the transitive extractor.

considering AfterEnricher may not happen immediately, I think we can add some logging in the enricher.

I don't see why we should avoid logging in transitive extractor for extra information? we already have the logging about failed resolution.

another-rex · 2025-10-23T01:56:53Z

cmd/osv-scanner/scan/source/__snapshots__/command_test.snap

+Scanned <rootdir>/testdata/locks-requirements/my-requirements.txt file and found 1 package
+Scanned <rootdir>/testdata/locks-requirements/requirements-dev.txt file and found 1 package
+Scanned <rootdir>/testdata/locks-requirements/requirements-transitive.txt file and found 4 packages
+Scanned <rootdir>/testdata/locks-requirements/requirements.prod.txt file and found 1 package
+Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages


AfterEnricher stats hook is likely the best way to do this, we should avoid logging directly in the transitive extractor.

another-rex · 2025-10-23T02:07:03Z

pkg/osvscanner/scan.go

 	)

-	if len(plugins) == 0 {
+	if countNotEnrichers(plugins) == 0 {


Happy to leave it like this for now. Though probably a good idea to add a quick comment about why the error message mentions extractors if we are also counting detectors.

Eventually we'll want to support at least 1 enabled of: filesystem extractors, standalone extractors, detectors.

Then in addition you can add Annotators and Enrichers.

another-rex · 2025-10-23T02:41:24Z

pkg/osvscanner/scan.go

 	plugins := scalibrplugin.Resolve(actions.PluginsEnabled, actions.PluginsDisabled)

+	// todo: use Enricher.RequiredPlugins to check this generically
+	if accessors.DependencyClients[osvschema.EcosystemPyPI] != nil && isRequirementsExtractorEnabled(plugins) {


Hmm... We are now doing this in two different ways, and we should probably standardise on a specific way here.

Here we are depending on the accessors system, where we initialise the correct clients based on the input flags.

However, for base images, I tried just initialising the base image enricher no matter what, and let it be filtered out by Capabilities in osv-scalibr. Since no actual network requests are made during the initiation, it should appear to be identical to the end user.

I can kind of see some pros and cons of both approaches?

Using accessors:

Pros:

- Lazy initialization: we don't have to do as much work if disabled by the flags.
- More control in osv-scanner over what is enabled/disabled.

Cons:

- We'll be replicating the logic of capabilities to filter out plugins.
- It's hard to figure out how to initialize custom enrichers if the accessors are disabled by the flag.

Any opinions/suggestions on which way to go here?

as per offline discussion, considering we are moving towards using enrichers for jobs that require network access, I am leaning towards getting rid of the accessors.

yeah I think that's the right direction if "no actual network requests are made during the initiation" holds (+ "initialization isn't expensive")

Just to be clear though, that change is relating to the --no-resolve flag, where we either remove it entirely or change the implication to be "we don't add enrichers for resolving transitive dependencies"
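
To make the "always initialise, then filter by capabilities" approach concrete, here is a simplified sketch. The `Capabilities` and `Plugin` types and the `filterByCapabilities` function are hypothetical and reduced to a single network requirement; they are not the osv-scalibr definitions, which carry more fields. The sketch assumes, as discussed above, that constructing a plugin is cheap and makes no network requests.

```go
package main

import "fmt"

// Capabilities is a hypothetical, simplified description of what a plugin
// needs or what the scan environment offers.
type Capabilities struct {
	Network bool
}

// Plugin is a hypothetical, simplified plugin record for this sketch.
type Plugin struct {
	Name     string
	Requires Capabilities
}

// filterByCapabilities keeps only the plugins whose requirements are met by
// the available capabilities. Plugins that need the network are dropped when
// the scan runs without network access.
func filterByCapabilities(plugins []Plugin, available Capabilities) []Plugin {
	kept := make([]Plugin, 0, len(plugins))
	for _, p := range plugins {
		if p.Requires.Network && !available.Network {
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	// Both plugins are always constructed, regardless of flags.
	plugins := []Plugin{
		{Name: "example-extractor"},
		{Name: "example-network-enricher", Requires: Capabilities{Network: true}},
	}

	offline := filterByCapabilities(plugins, Capabilities{Network: false})
	online := filterByCapabilities(plugins, Capabilities{Network: true})

	fmt.Println(len(offline), len(online)) // prints 1 2
}
```

Under this approach the scanner would not need its own flag-driven gate for each enricher, at the cost of having less direct control over which plugins end up enabled.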


G-Rath · 2025-10-23T21:37:11Z

@cuixq fwiw it seems like the enricher might be flakier than the existing extractor? it doesn't seem to be too common and might be a third-party thing, but afaik we've never had any flakey issues with the existing extractor.

See this run for an example:

--- FAIL: TestCommand (0.00s)
    --- FAIL: TestCommand/requirements.txt_can_have_all_kinds_of_names (6.94s)
        command_test.go:331: 
            - Snapshot - 2
            + Received + 2
            
            @@ -6,8 +6,8 @@
            
              Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages
              Scanned <rootdir>/testdata/locks-requirements/the_requirements_for_test.txt file and found 1 package
              Scanned <rootdir>/testdata/locks-requirements/unresolvable-requirements.txt file and found 3 packages
            - failed resolution: no file can be used for parsing requirements for package flask-cors version 1.0
            - failed to parse metadata for file Flask-Cors-1.0.tar.gz: sdist: dependencies in setup.py, not in PKG-INFO
            + failed resolution: resolution impossible:
            + no candidates at all for: pytz ">=2011k"
              Total 15 packages affected by 53 known vulnerabilities (2 Critical, 20 High, 28 Medium, 0 Low, 3 Unknown) from 1 ecosystem.
              53 vulnerabilities can be fixed.
              ↵
            
            at __snapshots__\command_test.snap:1349

cuixq · 2025-10-23T22:13:50Z

this doesn't look like a flaky error: both resolution failures should be deterministic if the artifacts don't change - not sure if it's related to the artifacts from the registry. and it only failed for windows 🤔

maybe don't worry too much about it now - see if this is going to happen again.

G-Rath · 2025-10-23T22:24:13Z

yeah at this point I think we should just keep an eye on it, but fwiw it's not just a Windows thing.

another-rex · 2025-10-29T05:18:39Z

Huh reran this and it failed again immediately.

another-rex · 2025-10-29T05:18:57Z

@cuixq Is there some retry logic that's missing in the osv-scalibr impl?

cuixq · 2025-11-02T23:53:39Z

I missed the comments from last week - I will take a look later.

G-Rath force-pushed the use-requirements-enricher branch 2 times, most recently from 09bea1c to df28fb3 Compare October 22, 2025 02:27

G-Rath marked this pull request as ready for review October 22, 2025 02:46

G-Rath requested review from another-rex and cuixq October 22, 2025 02:47

cuixq reviewed Oct 22, 2025


another-rex reviewed Oct 23, 2025


G-Rath mentioned this pull request Oct 23, 2025

feat: update osv-scalibr #2297

Merged

G-Rath added 2 commits October 24, 2025 08:47

feat: replace requirementsenhanceable extractor with transitive enr…

727b139

…icher

fix: only enable enricher when required extractor is enabled

f81a361

G-Rath force-pushed the use-requirements-enricher branch from df28fb3 to f81a361 Compare October 23, 2025 19:47

G-Rath added 2 commits October 24, 2025 08:50

feat: enable explicit plugins

2bd0e13

chore: add comment about why we only mention extractors

a5d4f0d

G-Rath requested review from another-rex and cuixq October 23, 2025 21:41

cuixq self-assigned this Nov 2, 2025
