[OpenVINO] Add agentic skill for adding new model support #1616
rkazants wants to merge 13 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull request overview
This PR adds an agentic skill documentation file that provides comprehensive guidance for adding support for new model architectures from HuggingFace transformers and diffusers libraries to the optimum-intel project. The skill enables model export to OpenVINO IR format and inference through the optimum-intel API.
Changes:
- Added a detailed skill documentation file (skills/SKILL.md) containing workflows, code examples, and reference materials for implementing new model support in optimum-intel
- Included practical examples for model architecture analysis and patching patterns (particularly Mixture of Experts)
- Provided references to test files, documentation locations, and external resources
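Since the skill centers on patching patterns for tricky architectures such as Mixture of Experts, the following is a minimal, library-free sketch of the forward-patching idea it documents. The class and function names here are illustrative stand-ins, not the actual optimum-intel patcher classes (the real patchers in optimum-intel follow a context-manager style and handle real torch modules); this only shows the swap-and-restore mechanics.

```python
# Illustrative sketch of the forward-patching pattern (hypothetical names,
# not the real optimum-intel ModelPatcher implementation).

class ToyMoEBlock:
    """Stand-in for a Mixture-of-Experts block whose forward uses
    data-dependent control flow, which typically breaks tracing."""

    def forward(self, x):
        # Data-dependent branching: only "selected experts" contribute,
        # so the output length varies with the input values.
        return [v * 2 for v in x if v > 0]


def patched_forward(self, x):
    # Trace-friendly variant: run every "expert" and mask with zeros,
    # so the output shape is fixed regardless of the input values.
    return [v * 2 if v > 0 else 0 for v in x]


class Patcher:
    """Context manager that swaps in the patched forward for export,
    then restores the original on exit."""

    def __init__(self, block):
        self.block = block

    def __enter__(self):
        self._orig = self.block.forward
        # Bind the plain function to this instance as a method.
        self.block.forward = patched_forward.__get__(self.block)
        return self.block

    def __exit__(self, *exc):
        self.block.forward = self._orig


block = ToyMoEBlock()
with Patcher(block) as patched:
    out = patched.forward([1, -2, 3])
print(out)  # fixed-length, masked output: [2, 0, 6]
```

Outside the `with` block the original data-dependent forward is restored, which mirrors how export-time patches are scoped so inference through the regular API is unaffected.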
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add quantization test requirement in SKILL.md.
@rkazants @popovaan @IlyasMoutawwakil @echarlaix @ljaljushkin This skill is a fantastic addition to optimum. OpenVINO repositories have good user documentation, but for development and contribution it remains difficult for new contributors to find a place to start. Though I haven't contributed yet, I am an advanced OpenVINO user and still struggle to see exactly how optimum fits together. Even with source-code study and tools like deepwiki, reading PRs from your team's OpenVINO repositories demands a high level of skill just to get started. That said, this skill takes an approach to documentation that could be a healthy direction: the maintainer's perspective on the procedure for adding a model is valuable, and largely missing from this repository. See this discussion case;
Look, I run OpenArc, and we have a Discord server with a ton of users; we often discuss the Intel ecosystem. One tension we keep identifying is the excellent performance of OpenVINO versus its lagging support for SOTA open-source models, contrasted in turn with the difficulty of adding that support. I think more documentation like this skill could encourage a healthy contribution environment. Plus, I learned a ton about optimum from reading this skill, applied some of its lessons to my from-scratch adventures with qwen-asr/qwen-tts, and gained some insight into why my attempt to patch GLM-4.7 Flash failed. Overall, OpenVINO documentation lacks hands-on discussion, and I encourage your team to devote more development time to this sort of addition. Learning OpenVINO needs to be easier! Thanks for your work; I am always learning so much from everyone who contributes!
@rkazants Used this skill to add support for https://huggingface.co/zai-org/GLM-4.7-Flash. Perf with GenAI: the model is too big to run the transformers ground truth with WWB; it gets OOM-killed (128 GB RAM). Proposal for improving the skill: instruct the model to clone the appropriate transformers version into the workspace. Currently the agent tries to read the transformers source with custom bash commands or Python scripts; I believe it would be more efficient to clone the sources and use tool calls on the local checkout.
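To make the clone proposal concrete, here is a small sketch of a helper the skill could tell the agent to run once per session. The function name, workspace layout, and the assumption that transformers release tags follow the `v<version>` convention are mine, not from the skill; the repository URL is the real upstream one.

```python
def transformers_clone_cmd(version: str, workspace: str = "workspace") -> list[str]:
    """Build a shallow git clone command for a pinned transformers release,
    so the agent can browse modeling sources with ordinary file tools.
    Hypothetical helper; assumes release tags are named v<version>."""
    dest = f"{workspace}/transformers-{version}"
    return [
        "git", "clone",
        "--depth", "1",             # shallow clone: only the pinned snapshot
        "--branch", f"v{version}",  # release tag, e.g. v4.46.0
        "https://github.com/huggingface/transformers.git",
        dest,
    ]

print(transformers_clone_cmd("4.46.0"))
```

The agent would execute the returned command once (e.g. via `subprocess.run(cmd, check=True)`) and then read the checked-out modeling files directly with its file tools instead of ad-hoc scripts.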
What does this PR do?
Fixes # (issue)
Before submitting