[OpenVINO] support ai-sage/GigaChat3-10B-A1.8B-bf16 #1626
Mohamed-Ashraf273 wants to merge 41 commits into huggingface:main
Conversation
Hi @popovaan,

Thanks for the PR! Please add tests for this model. For now, use a locally generated tiny model. I'm currently investigating whether we're allowed to invite GSoC contributors to the

Got it, thanks!

Hi @popovaan, @rkazants,
rkazants left a comment
Please also add export tests: the same test set that you added for the previous model.
Update documentation.
Pull request overview
This PR aims to add OpenVINO export/inference support coverage for the ai-sage/GigaChat3-10B-A1.8B-bf16 family by extending OpenVINO test fixtures and adjusting DeepSeek patching logic used during export.
Changes:
- Add a `gigachat3` tiny-random model fixture and include it in OpenVINO decoder integration coverage.
- Update decoder tests for `gigachat3` (expected SDPA count, relaxed logits tolerance, and skip conditions for incompatible Transformers versions).
- Refactor DeepSeek attention patching to use a versioned factory function and extend MoE patching to handle MLP blocks exposing `experts` but not `moe_infer`.
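For illustration, a "versioned factory" of the kind described in the last bullet could look like the sketch below. Everything here is hypothetical (the function name `make_attention_forward`, the version cutoff, and the toy attention math); it only shows the dispatch pattern, not the actual optimum-intel patcher code:

```python
# Hypothetical sketch of a versioned attention-forward factory: pick a
# forward implementation based on the installed transformers version.
def make_attention_forward(transformers_version: str):
    """Return an attention forward suited to the given transformers version."""
    major, minor = (int(x) for x in transformers_version.split(".")[:2])

    def forward_legacy(query, key, value):
        # Older-transformers path (toy math, stands in for the real forward).
        return [q * k + v for q, k, v in zip(query, key, value)]

    def forward_new(query, key, value):
        # Newer-transformers path; same toy math, distinct callable per version.
        return [q * k + v for q, k, v in zip(query, key, value)]

    # The 4.48 cutoff is an illustrative placeholder, not the real boundary.
    return forward_new if (major, minor) >= (4, 48) else forward_legacy
```

The factory returns a fresh closure per version, so the export patcher can bind the right forward once instead of branching on the version inside every call.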
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/openvino/utils_tests.py | Adds the gigachat3 test model mapping; adjusts which models are treated as remote-code in tests. |
| tests/openvino/test_decoder.py | Adds gigachat3 to tested architectures and config expectations; tweaks tolerance/skip logic; adds debug output. |
| optimum/exporters/openvino/model_patcher.py | Updates DeepSeek patcher to use a unified attention forward factory and broadens MoE patching behavior. |
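The broadened MoE patching condition can be sketched as below. The class names and the `needs_moe_patch` helper are illustrative stand-ins, not the real model_patcher code; the point is only the attribute check:

```python
# Illustrative sketch (not the actual optimum-intel code): the broadened
# rule patches any MLP block that exposes `experts`, whether or not it
# also defines `moe_infer`.
class MoEWithInfer:
    experts = ["expert_0", "expert_1"]

    def moe_infer(self, x):
        return x


class MoEWithoutInfer:
    experts = ["expert_0", "expert_1"]  # no moe_infer method


class DenseMLP:
    pass  # ordinary MLP block, nothing to patch


def needs_moe_patch(module) -> bool:
    # Previously the patcher effectively required moe_infer as well;
    # now exposing `experts` alone is enough to trigger patching.
    return hasattr(module, "experts")
```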
@rkazants
Hi @popovaan, I've finished adding the tests and temporarily published Would it be possible to invite me to the group so I can publish it there, or would you prefer to handle the publishing? Please let me know if any changes are needed.

Hi. Can I help test the model?
Hi!

https://huggingface.co/ai-sage/GigaChat3.1-10B-A1.8B-bf16 I hope they didn't change the architecture. 🥺
Hi @rkazants, I've completed all the requested revisions. Could you please take a look?
@Mohamed-Ashraf273, please confirm that you have tested a real model (not the tiny one) and that the generation results match the reference. I am asking because I am not sure if past_key_values are updated.
@Mohamed-Ashraf273, btw, just yesterday a new GigaChat 3.1 was released: ai-sage/GigaChat3.1-10B-A1.8B-bf16
@rkazants
Got it, gonna check it after solving the issues. Thanks!
@rkazants, @popovaan
and for fast-test (4 samples):
Test script:

```python
import argparse
from importlib.resources import files
from pathlib import Path

import torch
import whowhatbench
import yaml
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

DEFAULT_MODEL_ID = "ai-sage/GigaChat3-10B-A1.8B-bf16"
DEFAULT_MODEL_DIR = "./output_dir"
DEFAULT_MAX_NEW_TOKENS = 128

FAST_PROMPTS = [
    "Who is the most famous programmer?",
    "Who is Leo Tolstoy?",
    "Explain what artificial intelligence is.",
    "What is deep learning?",
]


def load_full_prompts():
    prompt_path = files("whowhatbench.prompts").joinpath("text_prompts.yaml")
    prompt_data = yaml.safe_load(prompt_path.read_text(encoding="utf-8"))
    return prompt_data["en"]["prompts"]


def build_generation_kwargs(tokenizer, max_new_tokens: int):
    eos_token_id = tokenizer.eos_token_id
    pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else eos_token_id
    bos_token_id = tokenizer.bos_token_id
    generation_kwargs = {
        "do_sample": False,
        "num_beams": 1,
        "max_new_tokens": max_new_tokens,
        "use_cache": True,
        "use_model_defaults": False,
    }
    if eos_token_id is not None:
        generation_kwargs["eos_token_id"] = eos_token_id
    if pad_token_id is not None:
        generation_kwargs["pad_token_id"] = pad_token_id
    if bos_token_id is not None:
        generation_kwargs["bos_token_id"] = bos_token_id
    return generation_kwargs


def prepare_inputs(model, tokenizer, prompt: str):
    device = getattr(model, "device", "cpu")
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    inputs.pop("token_type_ids", None)
    return inputs


def generate_answer(
    model,
    tokenizer,
    prompt,
    max_new_tokens,
    crop_question,
    use_chat_template=False,
    empty_adapters=False,
    num_assistant_tokens=0,
    assistant_confidence_threshold=0.0,
):
    del crop_question, use_chat_template, empty_adapters, num_assistant_tokens, assistant_confidence_threshold
    inputs = prepare_inputs(model, tokenizer, prompt)
    tokens = model.generate(**inputs, **build_generation_kwargs(tokenizer, max_new_tokens))
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(tokens[0, prompt_len:], skip_special_tokens=True)


def load_models(model_id: str, model_dir: str):
    model_dir_path = Path(model_dir).expanduser().resolve()
    if not model_dir_path.exists():
        raise FileNotFoundError(
            f"OpenVINO model directory was not found: {model_dir_path}. "
            "Run `python demo.py export` first or pass the correct exported model path."
        )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    base_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    optimized_model = OVModelForCausalLM.from_pretrained(
        str(model_dir_path),
        use_cache=True,
        load_in_8bit=False,
        quantization_config=None,
    )
    return tokenizer, base_model, optimized_model


def export_model(model_id: str, model_dir: str):
    model_dir_path = Path(model_dir).expanduser().resolve()
    model_dir_path.mkdir(parents=True, exist_ok=True)
    optimized_model = OVModelForCausalLM.from_pretrained(
        model_id,
        export=True,
        use_cache=True,
        load_in_8bit=False,
        quantization_config=None,
    )
    optimized_model.save_pretrained(str(model_dir_path))
    print(f"Saved OpenVINO model to {model_dir_path}")


def run_test(model_id: str, model_dir: str, prompts, max_new_tokens: int, top_k: int):
    tokenizer, base_model, optimized_model = load_models(model_id, model_dir)
    evaluator = whowhatbench.TextEvaluator(
        base_model=base_model,
        tokenizer=tokenizer,
        test_data=prompts,
        max_new_tokens=max_new_tokens,
        use_chat_template=False,
        gen_answer_fn=generate_answer,
    )
    _, metrics = evaluator.score(optimized_model, gen_answer_fn=generate_answer)
    print("similarity:", metrics["similarity"][0])
    print()
    print("Worst examples:")
    for example in evaluator.worst_examples(top_k=top_k, metric="similarity"):
        print("=========================")
        print("Prompt:", example["prompt"])
        print("Baseline:", example["source_model"])
        print("Optimized:", example["optimized_model"])
        print()


def build_parser():
    parser = argparse.ArgumentParser(description="Simple GigaChat3 OpenVINO demo.")
    subparsers = parser.add_subparsers(dest="command", required=True)

    export_parser = subparsers.add_parser("export")
    export_parser.add_argument("--model-id", default=DEFAULT_MODEL_ID)
    export_parser.add_argument("--model-dir", default=DEFAULT_MODEL_DIR)

    test_parser = subparsers.add_parser("test")
    test_parser.add_argument("--model-id", default=DEFAULT_MODEL_ID)
    test_parser.add_argument("--model-dir", default=DEFAULT_MODEL_DIR)
    test_parser.add_argument("--max-new-tokens", type=int, default=DEFAULT_MAX_NEW_TOKENS)
    test_parser.add_argument("--top-k", type=int, default=5)

    fast_test_parser = subparsers.add_parser("fast-test")
    fast_test_parser.add_argument("--model-id", default=DEFAULT_MODEL_ID)
    fast_test_parser.add_argument("--model-dir", default=DEFAULT_MODEL_DIR)
    fast_test_parser.add_argument("--max-new-tokens", type=int, default=DEFAULT_MAX_NEW_TOKENS)
    fast_test_parser.add_argument("--top-k", type=int, default=5)
    return parser


def main():
    args = build_parser().parse_args()
    if args.command == "export":
        export_model(args.model_id, args.model_dir)
        return
    if args.command == "test":
        run_test(
            model_id=args.model_id,
            model_dir=args.model_dir,
            prompts=load_full_prompts(),
            max_new_tokens=args.max_new_tokens,
            top_k=args.top_k,
        )
        return
    run_test(
        model_id=args.model_id,
        model_dir=args.model_dir,
        prompts=FAST_PROMPTS,
        max_new_tokens=args.max_new_tokens,
        top_k=args.top_k,
    )


if __name__ == "__main__":
    main()
```

How to reproduce the tests:

```
python demo.py export
python demo.py test
python demo.py fast-test
```
@rkazants,

@rkazants, @popovaan Format test:

Hello. Could you please tell me at what point I can start converting the model and no longer need to update the converted model? At what point is everything considered stable? Sorry for the inconvenience.

@savvadesogle



What does this PR do?
Conversion command line for ai-sage/GigaChat3-10B-A1.8B-bf16:

```
optimum-cli export openvino -m ai-sage/GigaChat3-10B-A1.8B-bf16 ./output_dir --task text-generation-with-past
```

Inference of ai-sage/GigaChat3-10B-A1.8B-bf16 using the OpenVINO backend:

Solving issue: #1608
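A minimal inference sketch for the exported model, assuming the export command above has already been run and `./output_dir` holds the OpenVINO IR (the prompt and generation settings are illustrative):

```python
# Minimal sketch: greedy generation with the exported OpenVINO model.
# Assumes ./output_dir contains the model produced by the optimum-cli
# export command above; running the real 10B model requires the download.
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-sage/GigaChat3-10B-A1.8B-bf16")
model = OVModelForCausalLM.from_pretrained("./output_dir", use_cache=True)

inputs = tokenizer("Who is Leo Tolstoy?", return_tensors="pt")
inputs.pop("token_type_ids", None)  # the model forward does not accept this input
tokens = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```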
Before submitting