Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
8 changes: 8 additions & 0 deletions docs/source/archive/gsoc-toc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,14 @@ designed to encourage university student participation in open source
software development. It was started by Google in 2005. More about GSoC -
`<https://summerofcode.withgoogle.com/about/>`_

GSoC 2025
---------

.. toctree::
:maxdepth: 2

gsoc/reports/2025/vulnerablecode_michael

GSoC 2024
---------

Expand Down
230 changes: 230 additions & 0 deletions docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,230 @@
VulnerableCode: On-demand live evaluation of packages
=====================================================

Organization - `AboutCode <https://www.aboutcode.org>`_
-----------------------------------------------------------
| **Michael Ehab Mikhail**
| GitHub: `michaelehab <https://github.com/michaelehab>`_
| LinkedIn: `@michaelehab16 <https://www.linkedin.com/in/michaelehab16/>`_
| Project: `VulnerableCode
<https://github.com/aboutcode-org/vulnerablecode>`_
| Official GSoC project page: `Project Link
<https://summerofcode.withgoogle.com/programs/2025/projects/uF0kzMAg>`_
| GSoC Proposal: `Proposal Link
<https://docs.google.com/document/d/1Tkk4MoPWXFj9r_U5cp3E4AhJW6QlHxTElyzpII_f4LM/edit?usp=sharing>`_

Overview
--------

VulnerableCode traditionally relied on **batch importers** to fetch
and store all advisories from a source at once. While effective for
building complete databases, batch importers are slow and
resource-heavy for developers who only need vulnerability
data for a **single package**.

This project introduces **live importers**, a new class of
importers that operate in a *package-first* mode. Instead of
pulling all advisories, they run against a single
PackageURL (PURL), returning only the advisories affecting
that package. This makes vulnerability evaluation
**faster, more efficient, and more personalized**, since the
database is gradually filled with only the advisories
that matter to each user.

To support this, I added:

* A new **LIVE_IMPORTERS_REGISTRY** that tracks available live importers.
* A new **API endpoint** that accepts a **PURL**, enqueues compatible
live importer pipelines into a Redis queue, and executes them asynchronously
via workers.
* Integration with **VulnTotal** and its **browser extension**, enabling users
to evaluate packages in real-time through a seamless interface.

This work bridges the gap between **batch-first databases** and
**package-first queries**, improving VulnerableCode's flexibility and enabling
better integration with developer workflows.

.. note::
A PURL (Package URL) is a universal way to identify and locate software
packages. `More on PURL <https://github.com/package-url/purl-spec>`_


Project Design and Architecture
-------------------------------

The new live importers system builds on existing batch importers, while introducing
a parallel registry and asynchronous execution model for package-first runs.

Importer Registries
^^^^^^^^^^^^^^^^^^^

* ``IMPORTERS_REGISTRY`` continues to hold batch importers (V1/V2).
* ``LIVE_IMPORTERS_REGISTRY`` holds live importers.

Each live importer:

* Inherits from its batch importer (when logic can be reused), or directly
from ``VulnerableCodeBaseImporterPipelineV2`` when a separate
implementation is needed.
* Declares a ``supported_types`` array, defining compatible package
ecosystems (``"pypi"``, ``"npm"``, ``"maven"``, ``"generic"``, etc).
* Implements a package-first ``collect_advisories()`` method, which
restricts results to advisories relevant to the given PURL.

Live importer executions are asynchronous: once triggered, they are placed in
a Redis-backed job queue and processed by dedicated workers. This prevents
blocking the main API thread and allows multiple evaluations to run safely
in parallel.

.. figure:: /_static/gsoc2025/vulnerablecode_michael/registries.png
:alt: Class architecture of importers registries
:align: center
:width: 70%

Class architecture showing relationship between ``IMPORTERS_REGISTRY`` and
``LIVE_IMPORTERS_REGISTRY``.

API Endpoint
^^^^^^^^^^^^

The new API endpoint is responsible for handling live evaluation requests.

* Input:

* ``purl`` (required)
* Execution:

* Checks ``LIVE_IMPORTERS_REGISTRY`` for importers whose ``supported_types``
match the PURL.
* Enqueues the pipelines runs of these live importers in a ``live`` rq.
* Returns the **Live Run ID**, information about the pipelines to
run, and the status url.
* The status URL shows the current state of a live evaluation run
and its individual pipeline runs.

* Output:

* Once workers complete execution, the resulting advisories are imported
into the database and exposed as JSON through the status endpoint.

.. figure:: /_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png
:alt: Live Pipeline Run Class
:align: center
:width: 70%

Live Pipeline Run Class and how it groups multiple PipelineRuns.

.. figure:: /_static/gsoc2025/vulnerablecode_michael/api.png
:alt: Live Importers API request flow
:align: center
:width: 70%

Flow of API endpoint: selecting compatible live importers and executing
them in parallel.

Integration with VulnTotal
^^^^^^^^^^^^^^^^^^^^^^^^^^

The new API was integrated into VulnTotal as an optional datasource:

* VulnTotal now checks the local environment for
``VCIO_HOST``, ``VCIO_PORT``, and ``ENABLE_LIVE_EVAL`` flags in ``.env``.
* If enabled, VulnTotal queries VulnerableCode in package-first mode.
* This allows VulnTotal to use both its proprietary datasources **and**
the user's gradually built local database, improving coverage and
personalization.

Integration with VulnTotal Browser Extension
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The VulnTotal browser extension was updated to support live importers:

* Users can enable the "Local VulnerableCode" datasource and live evaluation option.
* When enabled, package lookups are forwarded to the new API, retrieving
advisories in real-time.
* This reduces setup effort—developers can get live vulnerability checks
directly in their browser, provided they have a local VC instance.

.. figure:: /_static/gsoc2025/vulnerablecode_michael/extension_demo.gif
:alt: Live evaluation demo in VulnTotal browser extension
:align: center
:width: 70%

VulnTotal and its browser extension consuming the new live evaluation API.

Linked Pull Requests
--------------------

.. list-table::
:widths: 10 40 20
:header-rows: 1

* - Sr. no
- Name
- Link
* - 1
- Add Live Evaluation API endpoint and PyPa live pipeline importer
- `aboutcode-org/vulnerablecode#1969
<https://github.com/aboutcode-org/vulnerablecode/pull/1969>`_
* - 2
- Add Gitlab Live V2 Importer
- `aboutcode-org/vulnerablecode#1910
<https://github.com/aboutcode-org/vulnerablecode/pull/1910>`_
* - 3
- Add Curl Live Importer V2
- `aboutcode-org/vulnerablecode#1923
<https://github.com/aboutcode-org/vulnerablecode/pull/1923>`_
* - 4
- Add Elixir Security Live V2 Importer
- `aboutcode-org/vulnerablecode#1935
<https://github.com/aboutcode-org/vulnerablecode/pull/1935>`_
* - 5
- Add NPM Live Importer V2
- `aboutcode-org/vulnerablecode#1941
<https://github.com/aboutcode-org/vulnerablecode/pull/1941>`_
* - 6
- Add GitHub OSV Live V2 Importer Pipeline
- `aboutcode-org/vulnerablecode#1977
<https://github.com/aboutcode-org/vulnerablecode/pull/1977>`_
* - 7
- Add Postgres Live V2 Importer Pipeline
- `aboutcode-org/vulnerablecode#1982
<https://github.com/aboutcode-org/vulnerablecode/pull/1982>`_
* - 8
- Add PySec Live V2 Importer Pipeline
- `aboutcode-org/vulnerablecode#1983
<https://github.com/aboutcode-org/vulnerablecode/pull/1983>`_
* - 9
- Add Local VulnerableCode Datasource in VulnTotal and allow live evaluation
- `aboutcode-org/vulnerablecode#1985
<https://github.com/aboutcode-org/vulnerablecode/pull/1985>`_
* - 10
- Integrate Local VulnerableCode datasource and live evaluation
- `aboutcode-org/vulntotal-extension#17
<https://github.com/aboutcode-org/vulntotal-extension/pull/17>`_


Closing Thoughts
-------------------

This project was an exciting step forward from my 2024 GSoC work. By moving
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link your previous GSoC 2024 page here, it's in the same sphinx docs so use https://www.sphinx-doc.org/en/master/usage/referencing.html#ref-role

from batch importers to package-first live importers, We enabled a faster,
more personalized, and more flexible way of building vulnerability databases.

I especially enjoyed designing the **registry + API architecture** and
integrating Redis queues and workers for asynchronous execution. This improved
scalability, responsiveness, and fault tolerance, ensuring the API never blocks
and multiple live evaluations can run in parallel. I also appreciated discussing
it with mentors and integrating it seamlessly across
**VulnerableCode, VulnTotal, and the browser extension**.

This work lays the foundation for even richer interactivity
in the ecosystem and brings vulnerability evaluation closer
to developers' workflows.

I appreciated the weekly status calls and the feedback I received from my
mentors and the amazing team. They were really helpful and supportive.
`Philippe Ombredanne <https://github.com/pombredanne>`_,
`Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_,
`Tushar Goel <https://github.com/TG1999>`_,
`Keshav Priyadarshi <https://github.com/keshav-space>`_