Skip to content

Conversation

@G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Oct 22, 2025

On the surface this was pretty much a drop-in replacement which is really exciting, but there are a couple of quirks that we may or may not want to deal with before landing this.

The first one is very obvious which is that our "found packages" count is very different - it's technically correct, but I think ideally it would be good if we could express that we found x more packages via the enricher.

The second one is more interesting and I suspect one we'll probably ignore for the time being, but right now scalibr implicitly adds extractors that are required by enrichers as part of doing the scan which is at odds with our (experimental) abilities to enable and disable plugins - for now I've made it so that the enricher is only enabled if the requirements extractor is enabled, since we don't have that many enrichers with plugin requirements right now.

Resolves #2289

@G-Rath G-Rath force-pushed the use-requirements-enricher branch 2 times, most recently from 09bea1c to df28fb3 Compare October 22, 2025 02:27
@codecov-commenter
Copy link

codecov-commenter commented Oct 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.60%. Comparing base (280ac6a) to head (a5d4f0d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2294      +/-   ##
==========================================
- Coverage   68.64%   68.60%   -0.05%     
==========================================
  Files         169      168       -1     
  Lines       12646    12629      -17     
==========================================
- Hits         8681     8664      -17     
- Misses       3294     3295       +1     
+ Partials      671      670       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@G-Rath G-Rath marked this pull request as ready for review October 22, 2025 02:46
@G-Rath G-Rath requested review from another-rex and cuixq October 22, 2025 02:47
)

if len(plugins) == 0 {
if countNotEnrichers(plugins) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to make sure at least one extractor is enabled, shall we count the number of extractors?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just preserving our current logic which technically doesn't account for detectors, and I think that's ok for now because the long-run goal is to be running detectors, it's just right now we don't have enough stable ones to advertise them or justify making us less "extractor-first" 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean we are not currently running detectors and we would like to run them in the future?

also can you add a comment for this? the inconsistency between the function name and logging confuses me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we support running detectors via --experimental-plugins but we don't have any enabled by default, and the scanner was built with a focus around extractor-type plugins and vulnerabilities-type findings even though now we have stuff like detectors-type plugins and secrets-type findings, which is why we've got oddities like this messaging...

My understanding is that we're mostly ignoring this for now because there's still a lot of discussion going on how these sorts of things should be supported.

It's probably worth checking what @another-rex thinks too

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to leave it like this for now. Though probably a good idea to add a quick comment about why the error message mentions extractors if we are also counting detectors.

Eventually we'll want to support at least 1 enabled of: filesystem extractors, standalone extractors, detectors.

Then in addition you can add Annotators and Enrichers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have added a comment about this

Comment on lines +1332 to +1355
Scanned <rootdir>/testdata/locks-requirements/my-requirements.txt file and found 1 package
Scanned <rootdir>/testdata/locks-requirements/requirements-dev.txt file and found 1 package
Scanned <rootdir>/testdata/locks-requirements/requirements-transitive.txt file and found 4 packages
Scanned <rootdir>/testdata/locks-requirements/requirements.prod.txt file and found 1 package
Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm it will be nice that we have some logging to indicate the enircher has been run even if we don't have the number of extra packages extracted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any thoughts on how that should look and how to do it? scalibr currently doesn't look to provide an AfterEnricher hook like it does for extractors and detectors, and enrichers look to be a lot more conditional so I'm not sure if we can reliable report on enrichers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with how post-run hooks run but I do see value adding an AfterEnricher hook - whether the stats is reliable sounds another problem.

Or alternative, we can add more logging to this transitive dependency enricher to provide more information. I will take a look at this later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AfterEnricher stats hook is likely the best way to do this, we should avoid logging directly in the transitive extractor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering AfterEnricher may not happen immediately, I think we can add some logging in the enricher.

I don't see why we should avoid logging in transitive extractor for extra information? we already have the logging about failed resolution.

Comment on lines +1332 to +1355
Scanned <rootdir>/testdata/locks-requirements/my-requirements.txt file and found 1 package
Scanned <rootdir>/testdata/locks-requirements/requirements-dev.txt file and found 1 package
Scanned <rootdir>/testdata/locks-requirements/requirements-transitive.txt file and found 4 packages
Scanned <rootdir>/testdata/locks-requirements/requirements.prod.txt file and found 1 package
Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AfterEnricher stats hook is likely the best way to do this, we should avoid logging directly in the transitive extractor.

)

if len(plugins) == 0 {
if countNotEnrichers(plugins) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to leave it like this for now. Though probably a good idea to add a quick comment about why the error message mentions extractors if we are also counting detectors.

Eventually we'll want to support at least 1 enabled of: filesystem extractors, standalone extractors, detectors.

Then in addition you can add Annotators and Enrichers.

plugins := scalibrplugin.Resolve(actions.PluginsEnabled, actions.PluginsDisabled)

// todo: use Enricher.RequiredPlugins to check this generically
if accessors.DependencyClients[osvschema.EcosystemPyPI] != nil && isRequirementsExtractorEnabled(plugins) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... We are now doing this in two different ways, and we should probably standardise on a specific way here.

Here we are depending on the accessors system, where we initialise the correct clients based on the input flags.

However, for base images, I tried just initialising the base image enricher no matter what, and let it be filtered out by Capabilities in osv-scalibr. Since no actual network requests are made during the initiation, it should appear to be identical to the end user.

I can kind of see some pros and cons of both approaches?

Using accessors:

  • Pro: Lazy initiazation, we don't have to do as much work if disabled by the flags.
  • More control in osv-scanner about what is enabled/disabled
  • Con: We'll be replicating the logic of capabilities to filter out plugins
  • It's hard to figure out how to initialize custom enrichers if the accessors are disabled by the flag.

Any opinions/suggestions on which way to go here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per offline discussion, considering we are moving towards using enrichers for job that requires network access, I am leaning to getting rid of the accessors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think that's the right direction if "no actual network requests are made during the initiation" holds (+ "initialization isn't expensive")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear though, that change is relating to the --no-resolve flag, where we either remove it entirely or change the implication to be "we don't add enrichers for resolving transitive dependencies"

@G-Rath G-Rath mentioned this pull request Oct 23, 2025
@G-Rath G-Rath force-pushed the use-requirements-enricher branch from df28fb3 to f81a361 Compare October 23, 2025 19:47
@G-Rath
Copy link
Collaborator Author

G-Rath commented Oct 23, 2025

@cuixq fwiw it seems like the enricher might be flakier than the existing extractor? it doesn't seem to be too common and might be a third-party thing, but afaik we've never had any flakey issues with the existing extractor.

See this run for an example:

--- FAIL: TestCommand (0.00s)
    --- FAIL: TestCommand/requirements.txt_can_have_all_kinds_of_names (6.94s)
        command_test.go:331: 
            - Snapshot - 2
            + Received + 2
            
            @@ -6,8 +6,8 @@
            
              Scanned <rootdir>/testdata/locks-requirements/requirements.txt file and found 3 packages
              Scanned <rootdir>/testdata/locks-requirements/the_requirements_for_test.txt file and found 1 package
              Scanned <rootdir>/testdata/locks-requirements/unresolvable-requirements.txt file and found 3 packages
            - failed resolution: no file can be used for parsing requirements for package flask-cors version 1.0
            - failed to parse metadata for file Flask-Cors-1.0.tar.gz: sdist: dependencies in setup.py, not in PKG-INFO
            + failed resolution: resolution impossible:
            + no candidates at all for: pytz ">=2011k"
              Total 15 packages affected by 53 known vulnerabilities (2 Critical, 20 High, 28 Medium, 0 Low, 3 Unknown) from 1 ecosystem.
              53 vulnerabilities can be fixed.
              ↵
            
            at __snapshots__\command_test.snap:1349

@G-Rath G-Rath requested review from another-rex and cuixq October 23, 2025 21:41
@cuixq
Copy link
Contributor

cuixq commented Oct 23, 2025

this error seems not like a flaky error: both resolution failure should be deterministic if the artifacts don't change - not sure if related to the artifacts from registry. and it only failed for windows 🤔

maybe don't worry too much about it now - see if this is going to happen again.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Oct 23, 2025

yeah at this point I think we should just keep an eye on it, but fwiw its not just a Windows thing.

@another-rex
Copy link
Collaborator

Huh reran this and it failed again immediately.

@another-rex
Copy link
Collaborator

@cuixq Is there some retry logic that's missing in the osv-scalibr impl?

@cuixq
Copy link
Contributor

cuixq commented Nov 2, 2025

I missed the comments from last week - I will take a look later.

@cuixq cuixq self-assigned this Nov 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use transitivedependency enricher for transitive extraction for requirements.txt

4 participants