diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/api.png b/docs/source/_static/gsoc2025/vulnerablecode_michael/api.png new file mode 100644 index 0000000..266d2e7 Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/api.png differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif b/docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif new file mode 100644 index 0000000..874eb53 Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png b/docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png new file mode 100644 index 0000000..111eaba Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/registries.png b/docs/source/_static/gsoc2025/vulnerablecode_michael/registries.png new file mode 100644 index 0000000..fae1fa7 Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/registries.png differ diff --git a/docs/source/archive/gsoc-toc.rst b/docs/source/archive/gsoc-toc.rst index 421be09..5cf915b 100755 --- a/docs/source/archive/gsoc-toc.rst +++ b/docs/source/archive/gsoc-toc.rst @@ -8,6 +8,14 @@ designed to encourage university student participation in open source software development. It was started by Google in 2005. More about GSoC - ``_ +GSoC 2025 +--------- + +.. 
toctree:: + :maxdepth: 2 + + gsoc/reports/2025/vulnerablecode_michael + GSoC 2024 --------- diff --git a/docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst b/docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst new file mode 100644 index 0000000..1e43e60 --- /dev/null +++ b/docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst @@ -0,0 +1,230 @@ +VulnerableCode: On-demand live evaluation of packages +===================================================== + +Organization - `AboutCode `_ +----------------------------------------------------------- +| **Michael Ehab Mikhail** +| GitHub: `michaelehab `_ +| LinkedIn: `@michaelehab16 `_ +| Project: `VulnerableCode + `_ +| Official GSoC project page: `Project Link + `_ +| GSoC Proposal: `Proposal Link + `_ + +Overview +-------- + +VulnerableCode traditionally relied on **batch importers** to fetch +and store all advisories from a source at once. While effective for +building complete databases, batch importers are slow and +resource-heavy for developers who only need vulnerability +data for a **single package**. + +This project introduces **live importers**, a new class of +importers that operate in a *package-first* mode. Instead of +pulling all advisories, they run against a single +PackageURL (PURL), returning only the advisories affecting +that package. This makes vulnerability evaluation +**faster, more efficient, and more personalized**, since the +database is gradually filled with only the advisories +that matter to each user. + +To support this, I added: + +* A new **LIVE_IMPORTERS_REGISTRY** that tracks available live importers. +* A new **API endpoint** that accepts a **PURL**, enqueues compatible + live importer pipelines into a Redis queue, and executes them asynchronously + via workers. +* Integration with **VulnTotal** and its **browser extension**, enabling users + to evaluate packages in real-time through a seamless interface. 
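As a rough illustration of the package-first selection described above, the following minimal sketch shows how a registry of live importers might be filtered by the type of an incoming PURL. The class names, the ``pipeline_id`` values, the dict-shaped registry, and the helpers ``purl_type`` and ``select_live_importers`` are hypothetical and are not taken from the VulnerableCode codebase. The PURL parsing is deliberately simplified and only extracts the package type.

```python
from typing import Dict, List, Type


class LiveImporter:
    """Hypothetical stand-in for a live importer pipeline class."""

    pipeline_id: str = ""
    # Package types (PURL types) this importer can evaluate.
    supported_types: List[str] = []


class PyPaLiveImporter(LiveImporter):
    pipeline_id = "pypa_live_importer"  # hypothetical identifier
    supported_types = ["pypi"]


class NpmLiveImporter(LiveImporter):
    pipeline_id = "npm_live_importer"  # hypothetical identifier
    supported_types = ["npm"]


class CurlLiveImporter(LiveImporter):
    pipeline_id = "curl_live_importer"  # hypothetical identifier
    supported_types = ["generic"]


# Assumed shape: a mapping from pipeline id to importer class.
LIVE_IMPORTERS_REGISTRY: Dict[str, Type[LiveImporter]] = {
    cls.pipeline_id: cls
    for cls in (PyPaLiveImporter, NpmLiveImporter, CurlLiveImporter)
}


def purl_type(purl: str) -> str:
    """Return the type component of a PURL such as 'pkg:pypi/django@4.2'.

    Simplified parsing for illustration only; it does not validate the
    rest of the PURL.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PURL: {purl!r}")
    type_part, separator, _ = purl[len("pkg:"):].partition("/")
    if not separator or not type_part:
        raise ValueError(f"PURL has no type or name: {purl!r}")
    return type_part.lower()


def select_live_importers(purl: str) -> List[Type[LiveImporter]]:
    """Return the live importers whose supported_types include the PURL type."""
    package_type = purl_type(purl)
    return [
        importer
        for importer in LIVE_IMPORTERS_REGISTRY.values()
        if package_type in importer.supported_types
    ]


if __name__ == "__main__":
    for importer in select_live_importers("pkg:pypi/django@4.2"):
        print(importer.pipeline_id)
```

The sketch stops at selection. In the project itself, the selected pipelines are then enqueued in a Redis-backed queue and executed asynchronously by workers.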
+ +This work bridges the gap between **batch-first databases** and +**package-first queries**, improving VulnerableCode's flexibility and enabling +better integration with developer workflows. + +.. note:: + A PURL (Package URL) is a universal way to identify and locate software + packages. `More on PURL `_ + + +Project Design and Architecture +------------------------------- + +The new live importers system builds on existing batch importers, while introducing +a parallel registry and asynchronous execution model for package-first runs. + +Importer Registries +^^^^^^^^^^^^^^^^^^^ + +* ``IMPORTERS_REGISTRY`` continues to hold batch importers (V1/V2). +* ``LIVE_IMPORTERS_REGISTRY`` holds live importers. + +Each live importer: + +* Inherits from its batch importer (when logic can be reused), or directly + from ``VulnerableCodeBaseImporterPipelineV2`` when a separate + implementation is needed. +* Declares a ``supported_types`` array, defining compatible package + ecosystems (``"pypi"``, ``"npm"``, ``"maven"``, ``"generic"``, etc). +* Implements a package-first ``collect_advisories()`` method, which + restricts results to advisories relevant to the given PURL. + +Live importer executions are asynchronous: once triggered, they are placed in +a Redis-backed job queue and processed by dedicated workers. This prevents +blocking the main API thread and allows multiple evaluations to run safely +in parallel. + +.. figure:: /_static/gsoc2025/vulnerablecode_michael/registries.png + :alt: Class architecture of importers registries + :align: center + :width: 70% + + Class architecture showing relationship between ``IMPORTERS_REGISTRY`` and + ``LIVE_IMPORTERS_REGISTRY``. + +API Endpoint +^^^^^^^^^^^^ + +The new API endpoint is responsible for handling live evaluation requests. + +* Input: + + * ``purl`` (required) +* Execution: + + * Checks ``LIVE_IMPORTERS_REGISTRY`` for importers whose ``supported_types`` + match the PURL. 
+ * Enqueues the pipeline runs of these live importers in a ``live`` RQ queue. + * Returns the **Live Run ID**, information about the pipelines to + run, and the status URL. + * The status URL shows the current state of a live evaluation run + and its individual pipeline runs. + +* Output: + + * Once workers complete execution, the resulting advisories are imported + into the database and exposed as JSON through the status endpoint. + +.. figure:: /_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png + :alt: Live Pipeline Run Class + :align: center + :width: 70% + + Live Pipeline Run Class and how it groups multiple PipelineRuns. + +.. figure:: /_static/gsoc2025/vulnerablecode_michael/api.png + :alt: Live Importers API request flow + :align: center + :width: 70% + + Flow of the API endpoint: selecting compatible live importers and executing + them in parallel. + +Integration with VulnTotal +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The new API was integrated into VulnTotal as an optional datasource: + +* VulnTotal now checks the local environment for + ``VCIO_HOST``, ``VCIO_PORT``, and ``ENABLE_LIVE_EVAL`` flags in ``.env``. +* If enabled, VulnTotal queries VulnerableCode in package-first mode. +* This allows VulnTotal to use both its proprietary datasources **and** + the user's gradually built local database, improving coverage and + personalization. + +Integration with VulnTotal Browser Extension +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The VulnTotal browser extension was updated to support live importers: + +* Users can enable the "Local VulnerableCode" datasource and live evaluation option. +* When enabled, package lookups are forwarded to the new API, retrieving + advisories in real time. +* This reduces setup effort: developers can get live vulnerability checks + directly in their browser, provided they have a local VulnerableCode instance. + +.. 
figure:: /_static/gsoc2025/vulnerablecode_michael/extension_demo.gif + :alt: Live evaluation demo in VulnTotal browser extension + :align: center + :width: 70% + + VulnTotal and its browser extension consuming the new live evaluation API. + +Linked Pull Requests +-------------------- + +.. list-table:: + :widths: 10 40 20 + :header-rows: 1 + + * - Sr. no + - Name + - Link + * - 1 + - Add Live Evaluation API endpoint and PyPa live pipeline importer + - `aboutcode-org/vulnerablecode#1969 + `_ + * - 2 + - Add Gitlab Live V2 Importer + - `aboutcode-org/vulnerablecode#1910 + `_ + * - 3 + - Add Curl Live Importer V2 + - `aboutcode-org/vulnerablecode#1923 + `_ + * - 4 + - Add Elixir Security Live V2 Importer + - `aboutcode-org/vulnerablecode#1935 + `_ + * - 5 + - Add NPM Live Importer V2 + - `aboutcode-org/vulnerablecode#1941 + `_ + * - 6 + - Add GitHub OSV Live V2 Importer Pipeline + - `aboutcode-org/vulnerablecode#1977 + `_ + * - 7 + - Add Postgres Live V2 Importer Pipeline + - `aboutcode-org/vulnerablecode#1982 + `_ + * - 8 + - Add PySec Live V2 Importer Pipeline + - `aboutcode-org/vulnerablecode#1983 + `_ + * - 9 + - Add Local VulnerableCode Datasource in VulnTotal and allow live evaluation + - `aboutcode-org/vulnerablecode#1985 + `_ + * - 10 + - Integrate Local VulnerableCode datasource and live evaluation + - `aboutcode-org/vulntotal-extension#17 + `_ + + +Closing Thoughts +------------------- + +This project was an exciting step forward from my 2024 GSoC work. By moving +from batch importers to package-first live importers, we enabled a faster, +more personalized, and more flexible way of building vulnerability databases. + +I especially enjoyed designing the **registry + API architecture** and +integrating Redis queues and workers for asynchronous execution. This improved +scalability, responsiveness, and fault tolerance, ensuring the API never blocks +and multiple live evaluations can run in parallel. 
I also appreciated discussing +the design with my mentors and integrating it seamlessly across +**VulnerableCode, VulnTotal, and the browser extension**. + +This work lays the foundation for even richer interactivity +in the ecosystem and brings vulnerability evaluation closer +to developers' workflows. + +I appreciated the weekly status calls and the feedback I received from my +mentors and the amazing team, who were really helpful and supportive: +`Philippe Ombredanne `_, +`Ayan Sinha Mahapatra `_, +`Tushar Goel `_, and +`Keshav Priyadarshi `_.