
[OpenVINO] Add agentic skill for adding new model support #1616

Draft
rkazants wants to merge 13 commits into huggingface:main from rkazants:agentic_model_adding

Conversation

@rkazants
Collaborator

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds an agentic skill documentation file that provides comprehensive guidance for adding support for new model architectures from HuggingFace transformers and diffusers libraries to the optimum-intel project. The skill enables model export to OpenVINO IR format and inference through the optimum-intel API.

Changes:

  • Added a detailed skill documentation file (skills/SKILL.md) containing workflows, code examples, and reference materials for implementing new model support in optimum-intel
  • Included practical examples for model architecture analysis and patching patterns (particularly Mixture of Experts)
  • Provided references to test files, documentation locations, and external resources


rkazants and others added 6 commits February 19, 2026 16:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@SearchSavior

@rkazants @popovaan @IlyasMoutawwakil @echarlaix @ljaljushkin
Hey guys!

This skill is a fantastic addition to optimum. OpenVINO repositories have good user documentation, but for development and contribution it remains difficult for new contributors to find somewhere to start. Though I haven't contributed yet, I am an advanced OpenVINO user and still struggle to see how exactly optimum fits together. Even with source-code study, tools like DeepWiki, and reading PRs from your team's OpenVINO repositories, just getting started demands a high level of skill from developers.

That said, this skill takes an approach to documentation that could be a healthy direction. The maintainer's perspective on the procedure of adding a model is valuable, and largely missing from this repository. See this discussion case:

> The original code contains a conditional branch inside a Python for-loop. For certain example inputs, this branch may be skipped during tracing, resulting in an incorrect or incomplete final graph. Additionally, the non-vectorized implementation produces a very large OpenVINO graph with excessive nodes, which is expensive for graph transformations and significantly increases model conversion time. So here is the patch that provides a vectorized form of MoE....
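The loop-vs-vectorized distinction described above can be sketched in NumPy. This is a hypothetical illustration of the pattern, not the actual optimum-intel patch; the function names and shapes are made up:

```python
import numpy as np

def moe_loop(hidden, top_expert, expert_weights):
    """Per-expert loop with a data-dependent branch, as in some MoE
    reference implementations.
    hidden: (tokens, dim); top_expert: (tokens,); expert_weights: (experts, dim, dim)
    """
    out = np.zeros_like(hidden)
    for e in range(expert_weights.shape[0]):
        mask = top_expert == e
        if not mask.any():   # this branch may be skipped during tracing,
            continue         # baking an incomplete graph for other inputs
        out[mask] = hidden[mask] @ expert_weights[e]
    return out

def moe_vectorized(hidden, top_expert, expert_weights):
    """Vectorized form: every expert contributes through a mask, so the
    traced graph is input-independent and much smaller."""
    n_experts = expert_weights.shape[0]
    onehot = (top_expert[:, None] == np.arange(n_experts)[None, :]).astype(hidden.dtype)
    # (tokens, dim) x (experts, dim, dim) -> per-expert outputs (tokens, experts, dim)
    all_out = np.einsum("td,edh->teh", hidden, expert_weights)
    # select each token's routed expert via the one-hot routing matrix
    return np.einsum("te,teh->th", onehot, all_out)
```

During tracing, the masked form records the same operations regardless of which experts the example input happens to route to, which is what keeps the exported graph complete and compact.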

Look, I run OpenArc, and we have a Discord server with a ton of users where we often discuss the Intel ecosystem. One tension we keep identifying is the excellent performance of OpenVINO versus its lagging support for SOTA open-source models, compounded by the difficulty of adding that support.

I think more documentation like this skill could encourage a healthy contribution environment. Plus, I learned a ton about optimum from reading this skill, applied some of its lessons to my adventures implementing qwen-asr/qwen-tts from scratch, and gained some insight into why my attempt to patch glm-4.7 flash failed.

Overall, OpenVINO documentation lacks hands-on discussion, and I encourage your team to devote more development time to this sort of addition. Learning OpenVINO needs to be easier!

Thanks for your work, I am always learning so much from everyone who contributes!!

@as-suvorov

@rkazants Used this skill to support https://huggingface.co/zai-org/GLM-4.7-Flash
PR: as-suvorov#1

Perf with GenAI:

| Metric | Value |
| --- | --- |
| 1st token latency | 2339.79 ms |
| 2nd token latency | 425.86 ms/token |
| Throughput | 2.35 tokens/s |

The model is too big to run the transformers ground truth with WWB; it gets OOM-killed (128 GB RAM).
WWB similarity for the int4 model, optimum-intel vs GenAI: 0.9742136

Proposal for a skill improvement: instruct the model to clone the appropriate transformers version into the workspace. The agent currently tries to read the transformers source with custom bash commands or Python scripts; I believe it would be more efficient to clone the sources and use tool calls.
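The proposed step might look something like this (the version lookup, tag naming scheme, and workspace path are illustrative assumptions, not part of the skill today):

```shell
# Pin the checkout to the transformers version installed in the environment,
# so the agent reads exactly the sources the export path will execute.
TRANSFORMERS_VERSION=$(python -c "import transformers; print(transformers.__version__)")
git clone --depth 1 --branch "v${TRANSFORMERS_VERSION}" \
    https://github.com/huggingface/transformers.git workspace/transformers
```

A shallow, tag-pinned clone keeps the download small while still letting the agent navigate the modeling files with ordinary file-reading tool calls.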
