Conversation
@IlyasMoutawwakil IlyasMoutawwakil commented Nov 3, 2025

What does this PR do?

This is an attempt to standardize native transformers support for export backends (dynamo, onnx).

For now it works with encoder models (bert, vit, etc.), which are the easiest, and with decoder models (gpt2, llama), which require creating a past-key-values (pkv) instance with real tensors. This step could be done from the model's config, but for simplicity I'm running a forward pass and retrieving the pkv from the outputs. Dynamic shapes can be passed by the user or generated automatically by creating a dict with Dim.AUTO and letting torch infer which axes are dynamic.

from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-BertForMaskedLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))

exported_bert = AutoModelForMaskedLM.from_pretrained(
    model_id,
    export_config={
        "export_format": "onnx",
        "sample_inputs": sample_inputs,
        "dynamic": True,
        "f": "bert.onnx",
    },
)

# testing with different-sized inputs
new_inputs = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
onnx_outputs = exported_bert.exported_model.call_reference(**new_inputs)  # uses numpy under the hood
ort_outputs = exported_bert.exported_model(**new_inputs)  # uses onnxruntime under the hood
print(onnx_outputs)
print(ort_outputs)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))

exported_llama = AutoModelForCausalLM.from_pretrained(
    model_id,
    export_config={
        "export_format": "onnx",
        "sample_inputs": sample_inputs,
        "dynamic": True,
    },
)

# testing with different-sized inputs
new_inputs = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
new_inputs["past_key_values"] = exported_llama(**new_inputs).past_key_values  # we can't pass pkv with empty tensors
onnx_outputs = exported_llama.exported_model.call_reference(**new_inputs)  # uses numpy under the hood
ort_outputs = exported_llama.exported_model(**new_inputs)  # uses onnxruntime under the hood
print(onnx_outputs)
print(ort_outputs)
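The automatic dynamic-shape generation described above (marking every axis with Dim.AUTO and letting torch decide which ones actually vary) can be sketched roughly as follows. Note this is an illustrative stand-in, not the PR's actual implementation: `auto_dynamic_shapes` is a hypothetical helper, and the string marker stands in for `torch.export.Dim.AUTO`.

```python
# Illustrative sketch: build a dynamic-shapes spec by marking every axis
# of every sample input as a dynamic-shape candidate. In real code the
# marker would be torch.export.Dim.AUTO, so torch.export can infer which
# axes (batch, sequence length, ...) are actually dynamic.
def auto_dynamic_shapes(sample_shapes, auto_marker="Dim.AUTO"):
    # sample_shapes: mapping of input name -> shape tuple of the sample tensor
    return {
        name: {axis: auto_marker for axis in range(len(shape))}
        for name, shape in sample_shapes.items()
    }

# For a batch of 2 sequences of length 8, every axis gets marked:
spec = auto_dynamic_shapes({"input_ids": (2, 8), "attention_mask": (2, 8)})
print(spec)
# {'input_ids': {0: 'Dim.AUTO', 1: 'Dim.AUTO'}, 'attention_mask': {0: 'Dim.AUTO', 1: 'Dim.AUTO'}}
```

The nested-dict shape of the spec mirrors what `torch.export.export(..., dynamic_shapes=...)` expects: one entry per input, keyed by axis index.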

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as draft November 3, 2025 14:29
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


IlyasMoutawwakil commented Nov 5, 2025

Currently all models (except a select few) are tested and pass the tests successfully!

389 passed, 87 skipped, 413 warnings in 143.73s (0:02:23)

Skipped tests are either:

  • explicitly skipped with test_torch_exportable = False; this covers custom-cache models and some MoEs (15).
  • errors with an informative torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode (67).
  • errors with a cryptic Expected cond to be True, but got False. (16).

@IlyasMoutawwakil IlyasMoutawwakil changed the title Hf exporters [PoC] HF exporters Nov 6, 2025

github-actions bot commented Nov 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, aya_vision, bamba, bart, bigbird_pegasus, biogpt, chameleon, cohere2_vision, colqwen2, ctrl, deepseek_vl, deepseek_vl_hybrid, emu3, eomt, evolla
