2.0.4: Support Qwen3-ASR and Qwen3-TTS w/ streaming#73

Merged
SearchSavior merged 27 commits intomainfrom
2.0.4
Apr 5, 2026

Conversation

@SearchSavior
Owner

Since January I have been working on a full OpenVINO implementation of Qwen3-ASR and Qwen3-TTS. Now, support has arrived.

Instead of using the official qwen-tts repo, I rebuilt the model in PyTorch from scratch so I could go really deep into optimizing with OpenVINO. This required an entire separate codebase that I still need to clean up. We'll need it as a reference to improve this implementation, because I did not use transformers anywhere in the pipeline, save AutoTokenizer.

OpenArc now supports:

- Base: voice cloning
- VoiceDesign: using only text to describe a voice
- CustomVoice: using voices trained/implemented by Qwen

https://huggingface.co/collections/Echo9Zulu/qwen3-tts-openvino

https://huggingface.co/Echo9Zulu/Qwen3-ASR-0.6B-INT8_ASYM-OpenVINO


The workflow for qwen3-asr and qwen3-tts follows everything else in OpenArc and integrates seamlessly into the existing user flow. `openarc add` has some new options for each model_type. However, qwen3-tts has many knobs I haven't yet worked out how to expose as easily as everything else.

Let's look at an example of voice cloning using the openai Python library:

```python
import base64
import os
from pathlib import Path

from openai import OpenAI

API_KEY = os.environ["OPENARC_API_KEY"]

BASE_URL = "http://localhost:8003/v1"
MODEL = "voice_clone"

REF_WAV = Path("reference.wav")

text = "Echo9Zulu is an insane person"

ref_audio_b64 = base64.b64encode(REF_WAV.read_bytes()).decode("ascii")
ref_text = "Transcript of what is spoken in the reference WAV, for ICL."

qwen3_tts = {
    "input": text,
    "ref_audio_b64": ref_audio_b64,  # audio we want to clone
    "ref_text": ref_text,  # transcription of the audio we want to clone
    "x_vector_only": False,  # use if you can't (or don't want to) provide a transcription, or are testing something wild
    "language": "english",
    "max_new_tokens": 2048,
    "do_sample": True,
    "top_k": 50,
    "top_p": 1.0,
    "temperature": 0.9,
    "repetition_penalty": 1.05,
    "subtalker_do_sample": True,
    "subtalker_top_k": 50,
    "subtalker_top_p": 1.0,
    "subtalker_temperature": 0.9,
    "stream": True,
    "stream_chunk_frames": 300,  # frames streamed per chunk; 300 comes from the official impl
    "stream_left_context": 25,  # frames kept from the previous chunk
}

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)

response = client.audio.speech.create(
    model=MODEL,
    input=text,
    voice=MODEL,
    response_format="wav",
    extra_body={"openarc_tts": {"qwen3_tts": qwen3_tts}},
)

Path("out_speech.wav").write_bytes(response.content)
print("Wrote out_speech.wav")
```
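For intuition, `stream_chunk_frames` and `stream_left_context` describe a sliding window over codec frames: each new chunk re-includes a little context from the previous one. A minimal sketch of that windowing arithmetic, reconstructed from the comments above (this is our illustration, not OpenArc source):

```python
def chunk_frames(total_frames: int, chunk: int = 300, left_context: int = 25):
    """Yield (start, end) frame windows for streaming.

    Each window emits up to `chunk` new frames and re-includes up to
    `left_context` frames from the previous chunk to smooth boundaries.
    """
    start = 0
    while start < total_frames:
        end = min(start + chunk, total_frames)
        ctx_start = max(0, start - left_context)  # overlap with prior chunk
        yield (ctx_start, end)
        start = end  # only `chunk` new frames advance per iteration


# For 650 total frames this yields (0, 300), (275, 600), (575, 650).
```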
Everything is set at request time: the engine is stateless, one copy of the model sits in memory, and the API design builds around the shape of its inputs. Audio language models may do magic to model data other than text, but in the machine room everything still flows in and out of requests, which makes the code we use to control model behavior quite dynamic.

It is VERY MUCH set it once and forget it, leaving all management to a downstream application. I have some ideas about how to make this easier to configure, but for now this is a very good initial commit.
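Since every knob travels in the request body, a downstream application can centralize its defaults and pass only per-request overrides. A small sketch of that pattern, using the knob names from the example above (the helper name and validation are ours, not part of OpenArc):

```python
# Defaults mirror the qwen3_tts example payload; adjust to taste.
QWEN3_TTS_DEFAULTS = {
    "language": "english",
    "max_new_tokens": 2048,
    "do_sample": True,
    "top_k": 50,
    "top_p": 1.0,
    "temperature": 0.9,
    "repetition_penalty": 1.05,
    "stream": True,
    "stream_chunk_frames": 300,
    "stream_left_context": 25,
}


def build_tts_body(text: str, **overrides) -> dict:
    """Merge per-request overrides onto the defaults; reject unknown knobs."""
    unknown = set(overrides) - set(QWEN3_TTS_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown qwen3_tts option(s): {sorted(unknown)}")
    body = {**QWEN3_TTS_DEFAULTS, **overrides, "input": text}
    return {"openarc_tts": {"qwen3_tts": body}}
```

The returned dict can be passed straight to `extra_body=` in the openai call shown earlier.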

The other example I suggest trying is demos/talk_to_llm.py, which supports voice-clone streaming with an LLM in the loop. It's a little cumbersome compared to most of the other tools, but it checks all the "does everything work" boxes.
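At its core, a demo like talk_to_llm.py is an ASR → LLM → TTS loop. A loose skeleton of that shape (our paraphrase for illustration; the demo's actual structure may differ):

```python
def talk_loop(record, transcribe, chat, speak):
    """Run turns until record() returns None; each turn flows through all stages."""
    while True:
        audio = record()           # capture user speech
        if audio is None:
            break
        user_text = transcribe(audio)  # Qwen3-ASR
        reply = chat(user_text)        # any LLM served in the loop
        speak(reply)                   # Qwen3-TTS, streamed back as audio
```

Each stage is just a callable, so the same loop works whether the stages hit OpenArc endpoints or local stubs.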


Hardware Requirements

The OpenArc implementation of qwen3-tts makes heavy use of dynamic shapes and cannot support NPU yet. Once I get the source repo in order it should be possible. HOWEVER, after studying this, our community should put effort into a different TTS solution for NPU. For low-powered devices, even the OpenVINO optimizations are not fast enough for real time due to the computational complexity of predicting codebooks: it can't be parallelized and is data dependent. I have more notes on this that I have yet to synthesize into a writeup, so that's next ;)
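A toy illustration of that data dependence (not OpenArc code): each residual codebook is chosen conditioned on the state produced by the previous level, so the levels form a sequential chain that cannot fan out across cores.

```python
def predict_codebooks(hidden, levels):
    """Each level consumes the state produced by the previous one,
    so the loop cannot be parallelized across levels."""
    codes = []
    for predict in levels:
        code, hidden = predict(hidden)  # data-dependent: needs the prior result
        codes.append(code)
    return codes
```

This is the same structural obstacle that makes autoregressive token decoding hard to parallelize; residual codebook prediction just repeats it per frame.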

Right now, the entire model does not run on the GPU device. Instead, I found through testing that OpenVINO provides better CPU kernels than GPU for some of the sub-model ops, which mostly works to limit how much time gets spent predicting new audio codebooks.


Performance

I'll update this soon

@SearchSavior SearchSavior merged commit 9cae2c2 into main Apr 5, 2026