Standalone C++ Multimodal Ingestion Pipeline PoC for AIPC (GSoC 2026 Idea 2) #34421
Lagmator22 started this conversation in Google Summer of Code
Hi @zhaohb and @18582088138,
I'm a sophomore Computer Science student with extensive C++ and ML pipeline experience. I initially expressed my interest in the "OpenVINO Deep Search AI Assistant on Multimodal Personal Database" (Idea 2) project back on January 21st (Discussion #33730), and since then, I've been actively developing a solution.
I've closely followed the discussions here regarding the need for strict resource control and the limitations of Python wrappers for AIPC deployment. To prove my readiness for this project, I went ahead and built OvaSearch: a completely native C++ multimodal RAG engine PoC.
Repository: https://github.com/Lagmator22/OvaSearch
Demo Video: https://drive.google.com/file/d/1lrPoPA8nCbHOSsjEkVldNYA3m0ZxI8St/view?usp=drivesdk
What I have currently implemented:
Native C++ Inference Pipeline: Uses the OpenVINO C++ API for all language and vision execution, avoiding the Python GIL entirely for heavy operations and keeping memory overhead tightly controlled.
Multimodal Ingestion: Dynamically vectorizes both text and images. (PDF/DOCX files dropped into the data folder are auto-converted by an external Python script, and the C++ engine then indexes the output immediately.)
OpenVINO Models: Successfully runs Qwen2-VL-2B (Vision), Llama-3.2-3B (Language), and bge-small-en-v1.5 (Embeddings), all locally.
Vector Search: Uses USearch HNSW (384-dim cosine) for efficient, lightweight retrieval.
Granular Caching: Per-file caching that detects file modifications and updates only those specific chunks without requiring a full knowledge base rebuild.
Production Routing: Built-in filename-aware retrieval to isolate sources and prevent cross-document LLM hallucinations.
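To make the retrieval layer concrete: the cosine metric that the USearch HNSW index accelerates can always be cross-checked against a brute-force scan on a small corpus. The sketch below is illustrative only (function names are mine, not from the repository), and uses tiny vectors where real bge-small-en-v1.5 embeddings would be 384-dimensional:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equally sized embedding vectors.
// This is the metric configured on the USearch index; a brute-force
// scan over it is a useful correctness baseline for small corpora.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

// Return the index of the most similar stored embedding (top-1 retrieval).
std::size_t nearest(const std::vector<std::vector<float>>& db,
                    const std::vector<float>& query) {
    std::size_t best = 0;
    float best_score = -2.f;  // cosine similarity is always >= -1
    for (std::size_t i = 0; i < db.size(); ++i) {
        float s = cosine_similarity(db[i], query);
        if (s > best_score) { best_score = s; best = i; }
    }
    return best;
}
```

Identical top-k results from the brute-force scan and the HNSW index (modulo approximate-recall misses) is a quick regression test for the vector store.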
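One common way to realize the per-file staleness check described above is to compare std::filesystem write timestamps against those recorded at index time. This is a generic sketch under that assumption (the repository may use content hashing instead; MtimeCache and needs_reindex are hypothetical names):

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <unordered_map>
#include <cassert>
#include <chrono>

namespace fs = std::filesystem;

// Map from file path to the last-write timestamp seen at index time.
using MtimeCache = std::unordered_map<std::string, fs::file_time_type>;

// True if the file is new or has changed since it was last indexed,
// in which case only that file's chunks need re-embedding.
bool needs_reindex(MtimeCache& cache, const fs::path& file) {
    auto mtime = fs::last_write_time(file);
    auto it = cache.find(file.string());
    if (it != cache.end() && it->second == mtime)
        return false;              // unchanged: reuse cached chunks
    cache[file.string()] = mtime;  // record the new timestamp
    return true;
}
```

Persisting the timestamp map alongside the vector index is what lets a restart skip the full knowledge-base rebuild.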
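The filename-aware routing can be pictured as a post-retrieval filter: every chunk carries its source document, and only chunks from the targeted file reach the LLM context. The Chunk struct and route_by_source below are hypothetical names for illustration, not the repository's API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// A retrieved chunk tagged with the document it came from.
struct Chunk {
    std::string source;  // originating filename, e.g. "report.pdf"
    std::string text;    // the chunk's text content
};

// Keep only chunks whose source matches the file the query targets, so
// the prompt never mixes passages from unrelated documents.
std::vector<Chunk> route_by_source(const std::vector<Chunk>& hits,
                                   const std::string& wanted_source) {
    std::vector<Chunk> filtered;
    for (const auto& c : hits)
        if (c.source == wanted_source)
            filtered.push_back(c);
    return filtered;
}
```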
Proposed GSoC Extensions:
Building on this robust C++ backend, my goals for the summer timeline would be:
Video Ingestion Pipeline: Integrating OpenCV with the Vision model to sample and analyze video frames.
Hardware Device Routing: Adding explicit OpenVINO device selection flags to map the LLM to the GPU and Embeddings to the NPU on Intel AI PCs.
Graphical Interface: Building a standalone native Qt/ImGui desktop UI on top of the current CLI.
Advanced Memory Management: Implementing LRU cache eviction for scaled personal databases.
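The standard structure behind the proposed LRU eviction is a doubly linked list plus a hash map, giving O(1) touch and evict. This is a generic sketch of that pattern, not code from the repository (keying on chunk IDs is an assumption):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Minimal LRU tracker over chunk IDs. When capacity is exceeded, the
// least recently used entry is evicted, bounding resident memory for
// large personal databases.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Insert or refresh a key; returns the evicted key, or "" if none.
    std::string put(const std::string& key) {
        std::string victim;
        auto it = index_.find(key);
        if (it != index_.end()) {
            order_.erase(it->second);       // refresh: unlink old position
        } else if (order_.size() == capacity_) {
            victim = order_.back();         // evict least recently used
            index_.erase(victim);
            order_.pop_back();
        }
        order_.push_front(key);             // front = most recently used
        index_[key] = order_.begin();
        return victim;
    }

    bool contains(const std::string& key) const {
        return index_.count(key) != 0;
    }

private:
    std::size_t capacity_;
    std::list<std::string> order_;  // MRU at front, LRU at back
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

The evicted key tells the caller which embeddings can be dropped from memory (and reloaded on demand from the on-disk index).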
I believe this PoC directly addresses the core requirements for handling Word, PDF, images, and videos on AI PCs with strict resource control.
I would love your feedback on the architecture and whether this aligns with your vision for the project!
Best, Gurman Singh (GitHub: lagmator22)