Conversation
@IlyasMoutawwakil IlyasMoutawwakil commented Nov 3, 2025

What does this PR do?

This is an attempt to standardize native transformers support for export backends (dynamo, onnx).

For now it works with encoder models (bert, vit, etc.), which are the easiest, and with decoder models (gpt2, llama), which require creating a past-key-values (pkv) instance with real tensors. This step could be done from the model's config, but for simplicity I'm running a forward pass and retrieving the pkv from the outputs. Dynamic shapes can be passed by the user or generated automatically by creating a dict with Dim.AUTO and letting torch infer which axes are dynamic.

from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-BertForMaskedLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))

exported_bert = AutoModelForMaskedLM.from_pretrained(
    model_id,
    export_config={
        "export_format": "onnx",
        "sample_inputs": sample_inputs,
        "dynamic": True,
        "f": "bert.onnx",
    },
)

# testing with different-sized inputs
new_inputs = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
onnx_outputs = exported_bert.exported_model.call_reference(**new_inputs)  # uses numpy under the hood
ort_outputs = exported_bert.exported_model(**new_inputs)  # uses onnxruntime under the hood
print(onnx_outputs)
print(ort_outputs)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))

exported_llama = AutoModelForCausalLM.from_pretrained(
    model_id,
    export_config={
        "export_format": "onnx",
        "sample_inputs": sample_inputs,
        "dynamic": True,
    },
)

# testing with different-sized inputs
new_inputs = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
new_inputs["past_key_values"] = exported_llama(**new_inputs).past_key_values  # we can't pass pkv with empty tensors
onnx_outputs = exported_llama.exported_model.call_reference(**new_inputs)  # uses numpy under the hood
ort_outputs = exported_llama.exported_model(**new_inputs)  # uses onnxruntime under the hood
print(onnx_outputs)
print(ort_outputs)
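The automatic dynamic-shape generation described above (marking every axis with Dim.AUTO and letting torch decide which ones actually vary) can be sketched roughly as follows. Note this is an illustrative stand-in, not the PR's actual implementation: `auto_dynamic_shapes` is a hypothetical helper, and the string marker stands in for `torch.export.Dim.AUTO`.

```python
# Illustrative sketch: build a dynamic-shapes spec by marking every axis
# of every sample input as a dynamic-shape candidate. In real code the
# marker would be torch.export.Dim.AUTO, so torch.export can infer which
# axes (batch, sequence length, ...) are actually dynamic.
def auto_dynamic_shapes(sample_shapes, auto_marker="Dim.AUTO"):
    # sample_shapes: mapping of input name -> shape tuple of the sample tensor
    return {
        name: {axis: auto_marker for axis in range(len(shape))}
        for name, shape in sample_shapes.items()
    }

# For a batch of 2 sequences of length 8, every axis gets marked:
spec = auto_dynamic_shapes({"input_ids": (2, 8), "attention_mask": (2, 8)})
print(spec)
# {'input_ids': {0: 'Dim.AUTO', 1: 'Dim.AUTO'}, 'attention_mask': {0: 'Dim.AUTO', 1: 'Dim.AUTO'}}
```

The nested-dict shape of the spec mirrors what `torch.export.export(..., dynamic_shapes=...)` expects: one entry per input, keyed by axis index.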

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as draft November 3, 2025 14:29
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


IlyasMoutawwakil commented Nov 5, 2025

Currently all models (except a select few) are tested and pass the tests successfully!

389 passed, 87 skipped, 413 warnings in 143.73s (0:02:23)

Skipped tests are either:

  • explicitly skipped with test_torch_exportable = False; this covers custom-cache models and some MoEs (15).
  • errors with an informative torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode (67).
  • errors with a cryptic Expected cond to be True, but got False. (16).

@IlyasMoutawwakil IlyasMoutawwakil changed the title Hf exporters [PoC] HF exporters Nov 6, 2025

github-actions bot commented Nov 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, aya_vision, bamba, bart, bigbird_pegasus, biogpt, chameleon, cohere2_vision, colqwen2, ctrl, deepseek_vl, deepseek_vl_hybrid, emu3, eomt, evolla
