omni entrypoint support tokenizer arg#572

Merged
Gaohan123 merged 9 commits into vllm-project:main from divyanshsinghvi:support_base_engine_args_omni_entrypoint
Jan 8, 2026

Conversation

@divyanshsinghvi
Contributor

@divyanshsinghvi divyanshsinghvi commented Jan 1, 2026

Purpose

Fixes #571

For context, see #498 (comment) and #571 (comment).

Essentially: the issue arises for non-standard models that don't have a config.json, and even if I manually add a config.json, a tokenizer that sits in a subfolder can't be specified in the auto_map.

The structure of the repo: https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512/tree/main

Here the tokenizer lives in the subfolder CosyBlank-EN.

So, based on my understanding, AutoTokenizer currently can't find the path, which means one has to specify it in the stage_config .yaml. But those configs don't support relative paths, and although I could add a local absolute path to the config, it would be different for each user. A better way is to pass the tokenizer directly from the entrypoint stage.
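To illustrate the relative-path problem mentioned above (a minimal stdlib sketch, not vllm-omni code): a relative path written into a yaml file resolves against the current working directory, so it points somewhere different for every user and checkout.

```python
from pathlib import Path

# A relative tokenizer path as it might appear in a stage_config yaml;
# the name "CosyBlank-EN" is taken from the repo layout above.
relative = Path("CosyBlank-EN")

# Resolution happens against the process's current working directory,
# so two users running from different checkouts get different results.
absolute = relative.resolve()
print(absolute)  # e.g. /home/<user>/<checkout>/CosyBlank-EN
```

This is why only an absolute, per-user path would work in the yaml, and why passing the path explicitly at the entrypoint is more portable.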

Separately, this might also allow users to change engine parameters like gpu_memory_utilization without editing the yaml every time.

It will allow passing the tokenizer like this:

    omni = Omni(
        model=args.model,
        stage_configs_path=args.stage_config,
        trust_remote_code=True,
        log_file=args.log_file,
        tokenizer=args.tokenizer,
    )

Test Plan

#498 requires passing the tokenizer through an argument, to avoid any hardcoded paths in configs.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@divyanshsinghvi divyanshsinghvi marked this pull request as ready for review January 1, 2026 10:49
@hsliuustc0106
Collaborator

@Gaohan123 @ywang96 PTAL

Collaborator

@Gaohan123 Gaohan123 left a comment


Could you please provide a typical usage example that needs this modification in PR description?

@divyanshsinghvi
Contributor Author

divyanshsinghvi commented Jan 2, 2026

Could you please provide a typical usage example that needs this modification in PR description?

Updated; I hope it helps. I had also marked the relevant comments in the description, if that helps clarify the context.

Maybe the question to ask is: why should we not allow passing arguments like this? Reasons I can think of:
a. If one wants a single point of update for params, rather than multiple ways whose priority must be set up correctly, this approach won't fit that scenario.
b. One may inadvertently expose params that are ideally not user-facing.

Any other reason?

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 4, 2026
@Gaohan123
Collaborator

After thinking about it more, this is more complicated than I expected. Actually, in PR #206 I already supported passing vLLM CLI args to AsyncOmni. Now that the AR module and the diffusion module have been unified into the stage expression, that feature needs to be adapted. Ideally, we'd better not add an additional base_engine_args argument, which would be inconsistent with traditional vLLM usage. Instead, we should filter the args needed by LLMEngine and DiffusionEngine respectively from all the kwargs, just like EngineArgs.from_cli_args() does. Thanks a lot for your effort. Please check whether you understand my concern and modify the PR accordingly.

@divyanshsinghvi
Contributor Author

divyanshsinghvi commented Jan 6, 2026

After thinking about it more, this is more complicated than I expected. Actually, in PR #206 I already supported passing vLLM CLI args to AsyncOmni. Now that the AR module and the diffusion module have been unified into the stage expression, that feature needs to be adapted. Ideally, we'd better not add an additional base_engine_args argument, which would be inconsistent with traditional vLLM usage. Instead, we should filter the args needed by LLMEngine and DiffusionEngine respectively from all the kwargs, just like EngineArgs.from_cli_args() does. Thanks a lot for your effort. Please check whether you understand my concern and modify the PR accordingly.

@Gaohan123 This makes sense, but just to confirm the implementation you want:

When I call Omni, I should make this call:

    omni = Omni(
        model=args.model,
        stage_configs_path=args.stage_config,
        trust_remote_code=True,
        log_file=args.log_file,
        tokenizer=args.tokenizer,
    )

instead of

    omni = Omni(
        model=args.model,
        stage_configs_path=args.stage_config,
        trust_remote_code=True,
        log_file=args.log_file,
        base_engine_args={"tokenizer": args.tokenizer},
    )

And internally, where required, I pass the tokenizer by extracting it from the kwargs.
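The kwargs-filtering approach discussed above can be sketched with plain dataclass introspection. This is a hypothetical illustration: the class names and fields below are stand-ins, not the real vLLM / vllm-omni arg containers, and the real EngineArgs.from_cli_args() works on a parsed CLI namespace rather than a dict.

```python
from dataclasses import dataclass, fields

# Hypothetical stand-ins for the per-engine arg containers; the real
# classes live in vLLM / vllm-omni and have many more fields.
@dataclass
class LLMEngineArgs:
    model: str = ""
    tokenizer: str = ""
    gpu_memory_utilization: float = 0.9

@dataclass
class DiffusionEngineArgs:
    model: str = ""
    num_inference_steps: int = 50

def split_kwargs(cls, kwargs: dict) -> dict:
    """Keep only the kwargs that the given engine-arg dataclass accepts."""
    accepted = {f.name for f in fields(cls)}
    return {k: v for k, v in kwargs.items() if k in accepted}

# One flat kwargs dict at the entrypoint; each engine takes what it needs,
# and unrelated keys such as log_file are simply ignored by both.
kwargs = {"model": "m", "tokenizer": "tok",
          "num_inference_steps": 20, "log_file": "x.log"}
llm_args = LLMEngineArgs(**split_kwargs(LLMEngineArgs, kwargs))
diff_args = DiffusionEngineArgs(**split_kwargs(DiffusionEngineArgs, kwargs))
```

With this pattern, `tokenizer=args.tokenizer` can be accepted at the Omni entrypoint and routed only to the engines whose arg container declares that field.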

@divyanshsinghvi
Contributor Author

One issue with this is that the base_engine_args are common across all stages, so right now we can't specify stage-wise configuration.
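One way to reconcile shared defaults with per-stage settings (a sketch under the assumption that both are plain dicts; the function name is illustrative, not vllm-omni API) is to treat the base engine args as defaults and let each stage's yaml entry override them:

```python
# Hypothetical merge: base engine args act as shared defaults,
# per-stage config wins on conflict.
def merge_stage_args(base_engine_args: dict, stage_config: dict) -> dict:
    merged = dict(base_engine_args)  # start from the shared defaults
    merged.update(stage_config)      # stage-specific values take priority
    return merged

base = {"tokenizer": "/path/to/tok", "gpu_memory_utilization": 0.9}
stage = {"gpu_memory_utilization": 0.5}
resolved = merge_stage_args(base, stage)
```

Here the stage keeps the shared tokenizer but overrides gpu_memory_utilization, which matches the "CLI args as defaults, stage config overrides" idea discussed later in this thread.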

@divyanshsinghvi
Contributor Author

Right now I am adding support only for the tokenizer, as that's the common arg I found. If there are any others that can be supported through base args, I can add those.

@divyanshsinghvi divyanshsinghvi changed the title omni entrypoint support base_engine_args omni entrypoint support tokenizer arg Jan 6, 2026
Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks!
Actually, I think the idea should be to use the CLI args as the default settings for all stages, and let the user override the corresponding args for a certain stage in the stage config. Besides, it's not only the tokenizer that's needed here, but all the other args as well. Anyway, I will fix this in a new PR.

@Gaohan123 Gaohan123 merged commit 8c12593 into vllm-project:main Jan 8, 2026
7 checks passed
@divyanshsinghvi
Contributor Author

LGTM. Thanks! Actually, I think the idea should be to use the CLI args as the default settings for all stages, and let the user override the corresponding args for a certain stage in the stage config. Besides, it's not only the tokenizer that's needed here, but all the other args as well. Anyway, I will fix this in a new PR.

Got it. Yes, I can do that for the other common args, but I wasn't sure which ones. I can send a separate PR for that, and we can go over which args actually need to be exposed.

Shirley125 pushed a commit to Shirley125/vllm-omni that referenced this pull request Jan 9, 2026
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
sniper35 pushed a commit to sniper35/vllm-omni that referenced this pull request Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Support passing base_engine_args to be passed through Omni entrypoint to overwrite engine_args provided with yaml.

3 participants