Reaching out to mentors for “AI-Powered Diagnostic Agent for Edge Devices” #34406
Hi @Singh-Gurparas, please find the replies below, thanks.
The project aims to answer one question: how do we go from 10,000+ raw logs to a few useful insights without burning the CPU? There is a trade-off between accuracy and completeness.
We have internal datasets, but they aren't publicly accessible since they contain private information.
In the initial phase, the plan is for AI to perform the "dirty work" (repetitive, dull tasks) while humans focus on high-level decision making.
Thanks for the clarification @andy-vm — that helps. Would it make sense for the pipeline to prioritize log reduction first (e.g., filtering or clustering events) so the LLM only processes a smaller set of high-signal logs? Also, since internal datasets aren’t available, would it be reasonable to experiment with synthetic failure logs or public Linux/kernel log samples to evaluate the diagnostic workflow? Appreciate the guidance.
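One way to sketch the log-reduction idea above is template clustering: mask the variable parts of each line (numbers, hex IDs) so repeated events collapse into one representative, then rank templates by rarity, since rare templates are often the high-signal ones. This is only an illustrative Python sketch, not part of the project; the masking rules and the rarity heuristic are my assumptions.

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Mask variable parts (hex ids, then digits) so similar logs share a template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line

def reduce_logs(lines):
    """Collapse raw logs to one representative per template, rarest templates first."""
    counts = Counter(template(l) for l in lines)
    seen = {}
    for l in lines:
        seen.setdefault(template(l), l)  # keep the first representative per template
    ranked = sorted(seen.items(), key=lambda kv: counts[kv[0]])
    return [(counts[t], rep) for t, rep in ranked]

logs = [
    "usb 1-2: device descriptor read/64, error -71",
    "CPU0: temperature above threshold, cpu clock throttled",
    "systemd[1]: Started Session 12 of user root.",
    "systemd[1]: Started Session 13 of user root.",
    "systemd[1]: Started Session 14 of user root.",
]
for count, rep in reduce_logs(logs):
    print(count, rep)
```

On this toy input, five lines collapse to three templates, and the three near-identical session logs are ranked last, so an LLM would only need to see one of them.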
Hi @liulis-sg and @andy-vm,
Hope you're doing well.
I’m highly intrigued by the “AI-Powered Diagnostic Agent for Edge Devices” project and look forward to applying to GSoC 2026. The project aligns closely with my background in building tooling that helps monitor, detect, analyze, and respond to system-level issues.
From studying the project description, my understanding is that the key challenge is not just log summarization, but building a reliable diagnostic workflow that can:
• Collect heterogeneous signals (system logs, kernel messages, hardware metrics)
• Normalize them into a structured diagnostic context
• Use local models to reason about failure patterns under resource constraints
• Provide actionable recommendations instead of generic summaries
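To make the "structured diagnostic context" in the list above concrete, here is a minimal shape it could take. The names `Signal` and `DiagnosticContext` are hypothetical (not EMT APIs), and the severity scale assumes the syslog convention (0 = emerg through 7 = debug):

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    source: str       # e.g. "journalctl", "dmesg", "sensors" (illustrative sources)
    timestamp: float  # seconds since epoch
    severity: int     # syslog-style: 0 (emerg) .. 7 (debug)
    message: str

@dataclass
class DiagnosticContext:
    host: str
    signals: list = field(default_factory=list)

    def high_severity(self, threshold: int = 3):
        """Signals at 'err' level or worse under the syslog convention."""
        return [s for s in self.signals if s.severity <= threshold]

ctx = DiagnosticContext(host="edge-node-01")
ctx.signals.append(Signal("dmesg", 1700000000.0, 2, "EXT4-fs error (device mmcblk0p2)"))
ctx.signals.append(Signal("journalctl", 1700000050.0, 6, "Started Daily apt upgrade."))
print(len(ctx.high_severity()))  # only the EXT4 error qualifies
```

A normalized container like this lets collectors stay independent while downstream reasoning works over one uniform record type.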
I’ve started sketching a potential architecture with three layers:
• Signal Collection Layer – modular collectors for journalctl, dmesg, hardware telemetry, and service logs
• Analysis Layer – structured parsing + rule-based triage combined with local LLM reasoning for correlation and explanation
• Knowledge Layer – retrieval from curated issue patterns and documentation to map symptoms to fixes
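To illustrate how the Analysis and Knowledge layers could interact, a deterministic first pass over curated symptom patterns might look like the following. The pattern table here is entirely made up for illustration; in this design, unmatched messages would fall through to local LLM reasoning:

```python
import re

# Hypothetical curated knowledge table: symptom pattern -> (root cause, suggested fix)
PATTERNS = [
    (r"error -71", "USB device enumeration failure",
     "check cabling/power; try a different port"),
    (r"temperature above threshold", "thermal throttling",
     "inspect cooling; reduce sustained load"),
    (r"EXT4-fs error", "filesystem corruption",
     "run fsck on the affected partition"),
]

def triage(message):
    """Deterministic first pass: match known symptoms before invoking an LLM."""
    for pattern, cause, fix in PATTERNS:
        if re.search(pattern, message):
            return {"cause": cause, "fix": fix, "matched": pattern}
    return None  # unmatched -> escalate to local LLM reasoning

result = triage("usb 1-2: device descriptor read/64, error -71")
print(result["cause"])
```

Because every match records which pattern fired, the agent can always justify a recommendation, which supports the explainability goal below.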
My focus would be on making the agent dependable and explainable rather than purely generative — ensuring it can justify why it suggests a particular root cause or remediation step.
I would appreciate your guidance on a few points:
• Should the project prioritize robustness of log interpretation or breadth of integrations in the initial phase?
• Are there existing EMT debugging workflows or datasets that would be helpful to study before designing the pipeline?
• Would mentors prefer the system to lean more toward deterministic diagnostics with AI assistance, or a more AI-first reasoning approach?
I’m planning to begin experimenting with a small prototype around structured log parsing and local summarization to better understand EMT constraints, and I would value any direction on where early contributions could be most useful.
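For the structured-log-parsing prototype, one starting point is `journalctl -o json`, which emits one JSON object per line with fields such as `MESSAGE`, `PRIORITY`, and `__REALTIME_TIMESTAMP`. A minimal parser sketch, with sample data inlined rather than invoking journalctl:

```python
import json

# Two sample records in `journalctl -o json` line format (values invented for illustration)
SAMPLE = "\n".join([
    '{"PRIORITY": "3", "MESSAGE": "mmc0: timeout waiting for hardware interrupt", '
    '"__REALTIME_TIMESTAMP": "1700000000000000"}',
    '{"PRIORITY": "6", "MESSAGE": "Reached target Timers.", '
    '"__REALTIME_TIMESTAMP": "1700000001000000"}',
])

def parse_journal(text):
    """Parse `journalctl -o json` output: one JSON object per line."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        records.append({
            "priority": int(obj.get("PRIORITY", 6)),   # default to 'info' if absent
            "message": obj.get("MESSAGE", ""),
            "ts_us": int(obj.get("__REALTIME_TIMESTAMP", 0)),
        })
    return records

errors = [r for r in parse_journal(SAMPLE) if r["priority"] <= 3]
print(len(errors))  # only the mmc0 timeout is at 'err' or worse
```

This would feed naturally into the reduction and triage steps discussed earlier in the thread.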
Looking forward to hearing from you. Thanks for your time.