Standalone C++ Multimodal Ingestion Pipeline PoC for AIPC (GSoC 2026 Idea 2) #34421
Lagmator22 started this conversation in Google Summer of Code
Hi @zhaohb and @18582088138,
I'm a sophomore Computer Science student with extensive C++ and ML pipeline experience. I initially expressed my interest in the "OpenVINO Deep Search AI Assistant on Multimodal Personal Database" (Idea 2) project back on January 21st (Discussion #33730), and since then, I've been actively developing a solution.
I've closely followed the discussions here regarding the need for strict resource control and the limitations of Python wrappers for AIPC deployment. To prove my readiness for this project, I went ahead and built OvaSearch: a completely native C++ multimodal RAG engine PoC.
Repository: https://github.com/Lagmator22/OvaSearch
Demo Video: https://drive.google.com/file/d/1lrPoPA8nCbHOSsjEkVldNYA3m0ZxI8St/view?usp=drivesdk
What I have currently implemented:
Native C++ Inference Pipeline: Uses the OpenVINO C++ API for all language and vision execution, avoiding the Python GIL entirely for heavy operations and keeping memory overhead tightly controlled.
Multimodal Ingestion: Dynamically vectorizes both text and images. (PDF/DOCX files dropped into the data folder are auto-converted by an external Python script, and the C++ engine then indexes the output immediately.)
OpenVINO Models: Successfully runs Qwen2-VL-2B (Vision), Llama-3.2-3B (Language), and bge-small-en-v1.5 (Embeddings), all locally.
Vector Search: Uses USearch HNSW (384-dim cosine) for efficient, lightweight retrieval.
Granular Caching: Per-file caching that detects file modifications and updates only those specific chunks without requiring a full knowledge base rebuild.
Production Routing: Built-in filename-aware retrieval to isolate sources and prevent cross-document LLM hallucinations.
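To make the retrieval layer concrete: the cosine metric that the USearch HNSW index accelerates can always be cross-checked against a brute-force scan on a small corpus. The sketch below is illustrative only (function names are mine, not from the repository), and uses tiny vectors where real bge-small-en-v1.5 embeddings would be 384-dimensional:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equally sized embedding vectors.
// This is the metric configured on the USearch index; a brute-force
// scan over it is a useful correctness baseline for small corpora.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

// Return the index of the most similar stored embedding (top-1 retrieval).
std::size_t nearest(const std::vector<std::vector<float>>& db,
                    const std::vector<float>& query) {
    std::size_t best = 0;
    float best_score = -2.f;  // cosine similarity is always >= -1
    for (std::size_t i = 0; i < db.size(); ++i) {
        float s = cosine_similarity(db[i], query);
        if (s > best_score) { best_score = s; best = i; }
    }
    return best;
}
```

Identical top-k results from the brute-force scan and the HNSW index (modulo approximate-recall misses) is a quick regression test for the vector store.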
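One common way to realize the per-file staleness check described above is to compare std::filesystem write timestamps against those recorded at index time. This is a generic sketch under that assumption (the repository may use content hashing instead; MtimeCache and needs_reindex are hypothetical names):

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <unordered_map>
#include <cassert>
#include <chrono>

namespace fs = std::filesystem;

// Map from file path to the last-write timestamp seen at index time.
using MtimeCache = std::unordered_map<std::string, fs::file_time_type>;

// True if the file is new or has changed since it was last indexed,
// in which case only that file's chunks need re-embedding.
bool needs_reindex(MtimeCache& cache, const fs::path& file) {
    auto mtime = fs::last_write_time(file);
    auto it = cache.find(file.string());
    if (it != cache.end() && it->second == mtime)
        return false;              // unchanged: reuse cached chunks
    cache[file.string()] = mtime;  // record the new timestamp
    return true;
}
```

Persisting the timestamp map alongside the vector index is what lets a restart skip the full knowledge-base rebuild.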
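The filename-aware routing can be pictured as a post-retrieval filter: every chunk carries its source document, and only chunks from the targeted file reach the LLM context. The Chunk struct and route_by_source below are hypothetical names for illustration, not the repository's API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// A retrieved chunk tagged with the document it came from.
struct Chunk {
    std::string source;  // originating filename, e.g. "report.pdf"
    std::string text;    // the chunk's text content
};

// Keep only chunks whose source matches the file the query targets, so
// the prompt never mixes passages from unrelated documents.
std::vector<Chunk> route_by_source(const std::vector<Chunk>& hits,
                                   const std::string& wanted_source) {
    std::vector<Chunk> filtered;
    for (const auto& c : hits)
        if (c.source == wanted_source)
            filtered.push_back(c);
    return filtered;
}
```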
Proposed GSoC Extensions:
Building on this robust C++ backend, my goals for the summer timeline would be:
Video Ingestion Pipeline: Integrating OpenCV with the Vision model to sample and analyze video frames.
Hardware Device Routing: Adding explicit OpenVINO device selection flags to map the LLM to the GPU and Embeddings to the NPU on Intel AI PCs.
Graphical Interface: Building a standalone native Qt/ImGui desktop UI on top of the current CLI.
Advanced Memory Management: Implementing LRU cache eviction for scaled personal databases.
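The standard structure behind the proposed LRU eviction is a doubly linked list plus a hash map, giving O(1) touch and evict. This is a generic sketch of that pattern, not code from the repository (keying on chunk IDs is an assumption):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Minimal LRU tracker over chunk IDs. When capacity is exceeded, the
// least recently used entry is evicted, bounding resident memory for
// large personal databases.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Insert or refresh a key; returns the evicted key, or "" if none.
    std::string put(const std::string& key) {
        std::string victim;
        auto it = index_.find(key);
        if (it != index_.end()) {
            order_.erase(it->second);       // refresh: unlink old position
        } else if (order_.size() == capacity_) {
            victim = order_.back();         // evict least recently used
            index_.erase(victim);
            order_.pop_back();
        }
        order_.push_front(key);             // front = most recently used
        index_[key] = order_.begin();
        return victim;
    }

    bool contains(const std::string& key) const {
        return index_.count(key) != 0;
    }

private:
    std::size_t capacity_;
    std::list<std::string> order_;  // MRU at front, LRU at back
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

The evicted key tells the caller which embeddings can be dropped from memory (and reloaded on demand from the on-disk index).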
I believe this PoC directly addresses the core requirements for handling Word, PDF, images, and videos on AI PCs with strict resource control.
I would love your feedback on the architecture and whether this aligns with your vision for the project!
Best, Gurman Singh (GitHub: lagmator22)