Add GSoC 2025 Project Report #209
Merged: AyanSinhaMahapatra merged 7 commits into aboutcode-org:main from michaelehab:gsoc2025-report on Aug 31, 2025
e94108c Add GSoC 2025 Project Report (michaelehab)
5fb8713 Update GSoC 2025 report (michaelehab)
be2ee66 Fix failing tests and update report (michaelehab)
64f1bd3 Merge branch 'aboutcode-org:main' into gsoc2025-report (michaelehab)
ebfaeb1 Add images to static folder and fix bullet points (michaelehab)
ce25ecf Merge branch 'gsoc2025-report' of https://github.com/michaelehab/abou… (michaelehab)
b7f46c9 Add Importers Registries Diagram (michaelehab)
Binary file added
BIN
+1.91 MB
docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif
Binary file added
BIN
+90.3 KB
docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png
230 changes: 230 additions & 0 deletions
docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,230 @@ | ||
| VulnerableCode: On-demand live evaluation of packages | ||
| ===================================================== | ||
|
|
||
| Organization - `AboutCode <https://www.aboutcode.org>`_ | ||
| ----------------------------------------------------------- | ||
| | **Michael Ehab Mikhail** | ||
| | GitHub: `michaelehab <https://github.com/michaelehab>`_ | ||
| | LinkedIn: `@michaelehab16 <https://www.linkedin.com/in/michaelehab16/>`_ | ||
| | Project: `VulnerableCode | ||
| <https://github.com/aboutcode-org/vulnerablecode>`_ | ||
| | Official GSoC project page: `Project Link | ||
| <https://summerofcode.withgoogle.com/programs/2025/projects/uF0kzMAg>`_ | ||
| | GSoC Proposal: `Proposal Link | ||
| <https://docs.google.com/document/d/1Tkk4MoPWXFj9r_U5cp3E4AhJW6QlHxTElyzpII_f4LM/edit?usp=sharing>`_ | ||
|
|
||
| Overview | ||
| -------- | ||
|
|
||
| VulnerableCode traditionally relied on **batch importers** to fetch | ||
| and store all advisories from a source at once. While effective for | ||
| building complete databases, batch importers are slow and | ||
| resource-heavy for developers who only need vulnerability | ||
| data for a **single package**. | ||
|
|
||
| This project introduces **live importers**, a new class of | ||
| importers that operate in a *package-first* mode. Instead of | ||
| pulling all advisories, they run against a single | ||
| PackageURL (PURL), returning only the advisories affecting | ||
| that package. This makes vulnerability evaluation | ||
| **faster, more efficient, and more personalized**, since the | ||
| database is gradually filled with only the advisories | ||
| that matter to each user. | ||
|
|
||
| To support this, I added: | ||
|
|
||
| * A new **LIVE_IMPORTERS_REGISTRY** that tracks available live importers. | ||
| * A new **API endpoint** that accepts a **PURL**, enqueues compatible | ||
| live importer pipelines into a Redis queue, and executes them asynchronously | ||
| via workers. | ||
| * Integration with **VulnTotal** and its **browser extension**, enabling users | ||
| to evaluate packages in real time through a seamless interface. | ||
|
|
||
| This work bridges the gap between **batch-first databases** and | ||
| **package-first queries**, improving VulnerableCode's flexibility and enabling | ||
| better integration with developer workflows. | ||
|
|
||
| .. note:: | ||
| A PURL (Package URL) is a universal way to identify and locate software | ||
| packages. `More on PURL <https://github.com/package-url/purl-spec>`_ | ||
|
|
||
|
|
||
| Project Design and Architecture | ||
| ------------------------------- | ||
|
|
||
| The new live importers system builds on existing batch importers, while introducing | ||
| a parallel registry and asynchronous execution model for package-first runs. | ||
|
|
||
| Importer Registries | ||
| ^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| * ``IMPORTERS_REGISTRY`` continues to hold batch importers (V1/V2). | ||
| * ``LIVE_IMPORTERS_REGISTRY`` holds live importers. | ||
|
|
||
| Each live importer: | ||
|
|
||
| * Inherits from its batch importer (when logic can be reused), or directly | ||
| from ``VulnerableCodeBaseImporterPipelineV2`` when a separate | ||
| implementation is needed. | ||
| * Declares a ``supported_types`` array, defining compatible package | ||
| ecosystems (``"pypi"``, ``"npm"``, ``"maven"``, ``"generic"``, etc.). | ||
| * Implements a package-first ``collect_advisories()`` method, which | ||
| restricts results to advisories relevant to the given PURL. | ||
|
|
||
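The three properties listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ``ExampleLiveImporter``, ``Advisory``, ``purl_type`` and the in-memory ``_UPSTREAM`` list are hypothetical names invented for this sketch, not VulnerableCode classes, and the PURL handling is deliberately simplified (no qualifiers or subpaths).

```python
from dataclasses import dataclass
from typing import ClassVar, Iterator


def purl_type(purl: str) -> str:
    """Return the type component of a PURL such as 'pkg:pypi/django@4.2'."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    return purl[len("pkg:"):].split("/", 1)[0].lower()


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    affected_purl: str  # versionless PURL of the affected package


class ExampleLiveImporter:
    """Hypothetical live importer; not an actual VulnerableCode pipeline."""

    pipeline_id: ClassVar[str] = "example_live_importer"
    # Package ecosystems this importer can evaluate.
    supported_types: ClassVar[list[str]] = ["pypi"]

    # Stand-in for the upstream advisory source (assumption for the sketch).
    _UPSTREAM: ClassVar[list[Advisory]] = [
        Advisory("EXAMPLE-0001", "pkg:pypi/django"),
        Advisory("EXAMPLE-0002", "pkg:pypi/flask"),
    ]

    def __init__(self, purl: str):
        if purl_type(purl) not in self.supported_types:
            raise ValueError(f"unsupported package type for {purl!r}")
        self.purl = purl

    def collect_advisories(self) -> Iterator[Advisory]:
        # Package-first: yield only advisories affecting the requested package.
        package = self.purl.split("@", 1)[0]
        for advisory in self._UPSTREAM:
            if advisory.affected_purl == package:
                yield advisory


# A registry keyed by pipeline id, mirroring the idea of LIVE_IMPORTERS_REGISTRY.
LIVE_IMPORTERS_REGISTRY = {ExampleLiveImporter.pipeline_id: ExampleLiveImporter}
```

Calling ``list(ExampleLiveImporter("pkg:pypi/django@4.2").collect_advisories())`` in this sketch yields only the advisory recorded for ``pkg:pypi/django``; a batch importer would instead walk the whole upstream source.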
| Live importer executions are asynchronous: once triggered, they are placed in | ||
| a Redis-backed job queue and processed by dedicated workers. This prevents | ||
| blocking the main API thread and allows multiple evaluations to run safely | ||
| in parallel. | ||
|
|
||
| .. figure:: /_static/gsoc2025/vulnerablecode_michael/registries.png | ||
| :alt: Class architecture of importers registries | ||
| :align: center | ||
| :width: 70% | ||
|
|
||
| Class architecture showing relationship between ``IMPORTERS_REGISTRY`` and | ||
| ``LIVE_IMPORTERS_REGISTRY``. | ||
|
|
||
| API Endpoint | ||
| ^^^^^^^^^^^^ | ||
|
|
||
| The new API endpoint is responsible for handling live evaluation requests. | ||
|
|
||
| * Input: | ||
|
|
||
| * ``purl`` (required) | ||
| * Execution: | ||
|
|
||
| * Checks ``LIVE_IMPORTERS_REGISTRY`` for importers whose ``supported_types`` | ||
| match the PURL. | ||
| * Enqueues the pipeline runs of these live importers in a ``live`` RQ queue. | ||
| * Returns the **Live Run ID**, information about the pipelines to | ||
| run, and the status URL. | ||
| * The status URL shows the current state of a live evaluation run | ||
| and its individual pipeline runs. | ||
|
|
||
| * Output: | ||
|
|
||
| * Once workers complete execution, the resulting advisories are imported | ||
| into the database and exposed as JSON through the status endpoint. | ||
|
|
||
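The request flow above (select compatible importers, enqueue one run per importer, report status once workers finish) can be approximated with the standard library alone. This is a hedged sketch, not the endpoint's code: ``queue.Queue`` and a thread stand in for the Redis-backed ``live`` queue and its workers, and the registry contents, ``request_live_evaluation`` and ``run_status`` are hypothetical names chosen for the example.

```python
import queue
import threading
import uuid

# Hypothetical registry: pipeline id -> supported PURL types.
LIVE_IMPORTERS_REGISTRY = {
    "example_pypi_live": ["pypi"],
    "example_npm_live": ["npm"],
    "example_multi_live": ["pypi", "npm", "maven"],
}

live_queue = queue.Queue()  # stand-in for the Redis-backed ``live`` queue
run_status = {}  # live run id -> {pipeline id: state}


def purl_type(purl):
    """Return the type component of a PURL such as 'pkg:pypi/django@4.2'."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    return purl[len("pkg:"):].split("/", 1)[0].lower()


def request_live_evaluation(purl):
    """Select compatible pipelines, enqueue them, and return immediately."""
    ptype = purl_type(purl)
    pipelines = [
        pipeline_id
        for pipeline_id, supported_types in LIVE_IMPORTERS_REGISTRY.items()
        if ptype in supported_types
    ]
    live_run_id = str(uuid.uuid4())
    run_status[live_run_id] = {pipeline_id: "queued" for pipeline_id in pipelines}
    for pipeline_id in pipelines:
        live_queue.put((live_run_id, pipeline_id, purl))
    return {"live_run_id": live_run_id, "pipelines": pipelines}


def worker():
    """Process queued pipeline runs; a real worker would execute the importer."""
    while True:
        live_run_id, pipeline_id, purl = live_queue.get()
        run_status[live_run_id][pipeline_id] = "finished"
        live_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

response = request_live_evaluation("pkg:pypi/django@4.2")
live_queue.join()  # wait for the worker, as a status poll eventually would
print(response["pipelines"], run_status[response["live_run_id"]])
```

The point of the sketch is the split between the request handler, which only records and enqueues work, and the worker, which updates per-pipeline state that a status lookup can read later.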
| .. figure:: /_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png | ||
| :alt: Live Pipeline Run Class | ||
| :align: center | ||
| :width: 70% | ||
|
|
||
| Live Pipeline Run Class and how it groups multiple PipelineRuns. | ||
|
|
||
| .. figure:: /_static/gsoc2025/vulnerablecode_michael/api.png | ||
| :alt: Live Importers API request flow | ||
| :align: center | ||
| :width: 70% | ||
|
|
||
| Flow of API endpoint: selecting compatible live importers and executing | ||
| them in parallel. | ||
|
|
||
| Integration with VulnTotal | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The new API was integrated into VulnTotal as an optional datasource: | ||
|
|
||
| * VulnTotal now checks the local environment for | ||
| ``VCIO_HOST``, ``VCIO_PORT``, and ``ENABLE_LIVE_EVAL`` flags in ``.env``. | ||
| * If enabled, VulnTotal queries VulnerableCode in package-first mode. | ||
| * This allows VulnTotal to use both its proprietary datasources **and** | ||
| the user's gradually built local database, improving coverage and | ||
| personalization. | ||
|
|
||
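As a rough illustration of the environment check described above, the following sketch reads the three settings from a mapping and derives a base URL. It is an assumption-laden example rather than VulnTotal's code: ``LocalVCConfig`` and ``load_local_vc_config`` are hypothetical names, the ``http`` scheme and the set of accepted truthy values are guesses, and no endpoint path is shown because the report does not give one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalVCConfig:
    base_url: str
    live_eval: bool


def load_local_vc_config(env: Mapping[str, str] = os.environ) -> Optional[LocalVCConfig]:
    """Return the local VulnerableCode settings, or None if host/port are unset."""
    host = env.get("VCIO_HOST", "").strip()
    port = env.get("VCIO_PORT", "").strip()
    if not host or not port:
        return None
    # Assumption: these strings count as "enabled"; the real parsing may differ.
    live_eval = env.get("ENABLE_LIVE_EVAL", "").strip().lower() in {"1", "true", "yes"}
    # Assumption: a local instance is reached over plain HTTP.
    return LocalVCConfig(base_url=f"http://{host}:{port}", live_eval=live_eval)


config = load_local_vc_config(
    {"VCIO_HOST": "localhost", "VCIO_PORT": "8000", "ENABLE_LIVE_EVAL": "true"}
)
print(config)
```

Passing the environment as a mapping keeps the function easy to test without modifying the real process environment.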
| Integration with VulnTotal Browser Extension | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The VulnTotal browser extension was updated to support live importers: | ||
|
|
||
| * Users can enable the "Local VulnerableCode" datasource and live evaluation option. | ||
| * When enabled, package lookups are forwarded to the new API, retrieving | ||
| advisories in real time. | ||
| * This reduces setup effort: developers can get live vulnerability checks | ||
| directly in their browser, provided they have a local VulnerableCode instance. | ||
|
|
||
| .. figure:: /_static/gsoc2025/vulnerablecode_michael/extension_demo.gif | ||
| :alt: Live evaluation demo in VulnTotal browser extension | ||
| :align: center | ||
| :width: 70% | ||
|
|
||
| VulnTotal and its browser extension consuming the new live evaluation API. | ||
|
|
||
| Linked Pull Requests | ||
| -------------------- | ||
|
|
||
| .. list-table:: | ||
| :widths: 10 40 20 | ||
| :header-rows: 1 | ||
|
|
||
| * - Sr. no | ||
| - Name | ||
| - Link | ||
| * - 1 | ||
| - Add Live Evaluation API endpoint and PyPa live pipeline importer | ||
| - `aboutcode-org/vulnerablecode#1969 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1969>`_ | ||
| * - 2 | ||
| - Add Gitlab Live V2 Importer | ||
| - `aboutcode-org/vulnerablecode#1910 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1910>`_ | ||
| * - 3 | ||
| - Add Curl Live Importer V2 | ||
| - `aboutcode-org/vulnerablecode#1923 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1923>`_ | ||
| * - 4 | ||
| - Add Elixir Security Live V2 Importer | ||
| - `aboutcode-org/vulnerablecode#1935 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1935>`_ | ||
| * - 5 | ||
| - Add NPM Live Importer V2 | ||
| - `aboutcode-org/vulnerablecode#1941 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1941>`_ | ||
| * - 6 | ||
| - Add GitHub OSV Live V2 Importer Pipeline | ||
| - `aboutcode-org/vulnerablecode#1977 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1977>`_ | ||
| * - 7 | ||
| - Add Postgres Live V2 Importer Pipeline | ||
| - `aboutcode-org/vulnerablecode#1982 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1982>`_ | ||
| * - 8 | ||
| - Add PySec Live V2 Importer Pipeline | ||
| - `aboutcode-org/vulnerablecode#1983 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1983>`_ | ||
| * - 9 | ||
| - Add Local VulnerableCode Datasource in VulnTotal and allow live evaluation | ||
| - `aboutcode-org/vulnerablecode#1985 | ||
| <https://github.com/aboutcode-org/vulnerablecode/pull/1985>`_ | ||
| * - 10 | ||
| - Integrate Local VulnerableCode datasource and live evaluation | ||
| - `aboutcode-org/vulntotal-extension#17 | ||
| <https://github.com/aboutcode-org/vulntotal-extension/pull/17>`_ | ||
|
|
||
|
|
||
| Closing Thoughts | ||
| ------------------- | ||
|
|
||
| This project was an exciting step forward from my 2024 GSoC work. By moving | ||
| from batch importers to package-first live importers, we enabled a faster, | ||
| more personalized, and more flexible way of building vulnerability databases. | ||
|
|
||
| I especially enjoyed designing the **registry + API architecture** and | ||
| integrating Redis queues and workers for asynchronous execution. This improved | ||
| scalability, responsiveness, and fault tolerance: the API does not block on | ||
| importer runs, and multiple live evaluations can run in parallel. I also | ||
| appreciated discussing the design with mentors and integrating it seamlessly across | ||
| **VulnerableCode, VulnTotal, and the browser extension**. | ||
|
|
||
| This work lays the foundation for even richer interactivity | ||
| in the ecosystem and brings vulnerability evaluation closer | ||
| to developers' workflows. | ||
|
|
||
| I appreciated the weekly status calls and the feedback I received from my | ||
| mentors and the amazing team, who were really helpful and supportive: | ||
| `Philippe Ombredanne <https://github.com/pombredanne>`_, | ||
| `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_, | ||
| `Tushar Goel <https://github.com/TG1999>`_, and | ||
| `Keshav Priyadarshi <https://github.com/keshav-space>`_. | ||
Maybe link your previous GSoC 2024 page here. It's in the same Sphinx docs, so use the `:ref:` role: https://www.sphinx-doc.org/en/master/usage/referencing.html#ref-role