Qwen tts #172

Open

knyazer wants to merge 44 commits into main from qwen-tts-new

Conversation

@knyazer (Contributor) commented Mar 17, 2026

No description provided.

@knyazer mentioned this pull request Mar 17, 2026
@knyazer marked this pull request as ready for review March 18, 2026 17:06

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 401b3177da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@knyazer marked this pull request as draft March 19, 2026 16:56
@knyazer marked this pull request as ready for review March 24, 2026 00:47

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e915f6e0a2


Comment on lines +69 to +70
@abstractmethod
def default_generation_config(self) -> LatentTTSGenerationConfig: ...
Contributor Author

I think that's a nice abstraction; maybe let's have the same for normal TTS too?

Comment on lines +72 to +73
def create_tokenizer(self, model_path: Path | str) -> Tokenizer:
return Tokenizer.from_file(str(Path(model_path) / "tokenizer.json"))
Contributor Author

This is a nontrivial decision; add a comment explaining why we need it (some private models need to override create_tokenizer, I think?).

Comment on lines +75 to +76
def create_message_processor(self, config: TTSMessageProcessorConfig, tokenizer: Tokenizer) -> TTSMessageProcessor:
return TTSMessageProcessor(config, tokenizer)
Contributor Author

No, wait, this is what needs to be overridden. Do we really need a create_tokenizer method?

Current class is reserved for future usage of audio prompts
to condition style of generated audio
"""
waveform: Float[Array, "*"]
Contributor Author

waveform: Float[Array, " audio_samples"]

def import_weights(self, weights: ParameterTree[Array]) -> Self:
weights = require_mapping(weights)
block_weights = weights["decoder_blocks"]
assert isinstance(block_weights, Sequence)
Contributor Author

Useless assert?

d_out = spatial_params.d_out

num_keys = 2 + len(rates)
keys = jax.random.split(key, num_keys)
Contributor Author

Let's maybe do this as: first_conv_key, final_conv_key, *decoder_keys = jax.random.split(key, num_keys)
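The suggested pattern is plain Python starred unpacking; a minimal sketch, using a list of ints as a stand-in for the key array returned by jax.random.split (variable names assumed from the snippet above):

```python
rates = [1, 2, 3]             # hypothetical decoder rates
num_keys = 2 + len(rates)
keys = list(range(num_keys))  # stand-in for jax.random.split(key, num_keys)

# Starred unpacking names the two dedicated keys and collects the
# remaining per-decoder keys in a single, readable statement:
first_conv_key, final_conv_key, *decoder_keys = keys
```

This removes the need to index keys[0], keys[1], keys[2:] by hand.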

input_channel = spatial_params.input_channel
channels = spatial_params.channels
rates = spatial_params.rates
d_out = spatial_params.d_out
Contributor Author

@knyazer Apr 7, 2026

One-letter variable names are meh.

Comment on lines +838 to +840
mlp_ratio: float = 4.0
kernel_size: int = 7
dilation: int = 1
Contributor Author

@knyazer Apr 7, 2026

remove the defaults

y = self.act2(y)
y = self.conv2(y)

pad = x.shape[1] - y.shape[1]
Contributor Author

please use shape unpacking instead of direct indexing
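The shape-unpacking suggestion can be sketched like this (hypothetical shapes; plain tuples stand in for the arrays' .shape attributes):

```python
x_shape = (4, 100, 8)  # hypothetical (batch, length, channels)
y_shape = (4, 93, 8)

# Instead of pad = x.shape[1] - y.shape[1], unpack the shape tuples
# so each dimension gets a name and axis-index mistakes are caught early:
_, x_len, _ = x_shape
_, y_len, _ = y_shape
pad = x_len - y_len
```

Unpacking also fails loudly if an array ever arrives with the wrong rank, whereas direct indexing silently grabs whatever sits at axis 1.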

class SnakeBetaConfig:
precision: DTypeLike
alpha_init: float = 1.0
no_div_by_zero: float = 1e-9
Contributor Author

@knyazer Apr 7, 2026

That's a weird variable name; maybe call it eps or something like that?

output = self._call_causal(x)
case Conv1dPadding.SYMMETRIC:
output = self._call_symmetric(x)

Contributor Author

Hard raise on anything else (add a default case that raises).

) -> Float[Array, "batch sequence_out out_channels"]:
length = x.shape[1] # sequence dimension is axis 1
pad = self.padding
length = x.shape[1]
Contributor Author

let's not use direct indexing, unpack the shape tuple

@@ -1 +1,13 @@
# TODO @peter.glushkov: think carefully what to export once audio submodule is more stable
Contributor Author

@knyazer Apr 7, 2026

Reassign the TODO to knyazer.

key: PRNGKeyArray,
) -> Int[Array, " batch"]:
processed_logits = vmap(sampling_policy.process_logits)(logits)
sample_keys = jax.random.split(key, logits.shape[0])
Contributor Author

please don't index into shapes directly, use tuple unpacking

Comment on lines +158 to +160
@classmethod
def format_instruction(cls, style: str) -> str:
return f"<|im_start|>user\n{style}<|im_end|>\n"
Contributor Author

Errr, doesn't this function contradict the default definition of how formatting is supposed to work? The hardcoded prompt, no?

dtype=jnp.int32,
)
special_hidden = self._project_text_embeddings(special_text_tokens)
tts_bos_embed, tts_eos_embed, tts_pad_embed = jnp.split(special_hidden, 3, axis=1)
Contributor Author

@knyazer Apr 7, 2026

This seems very fragile; can't we just tokenize the whole formatted prompt directly instead of doing this surgery on raw embeddings?

key: PRNGKeyArray,
) -> "VectorQuantization":
key_codebook, key_project = jax.random.split(key)
codebook_dim = dim if codebook_dim is None else codebook_dim
Contributor Author

Isn't codebook_dim never None? Check; if so, update this line and the type annotation.

def export_weights(self) -> ParameterTree[Array]:
project_out_weights: ParameterTree[Array]
if self.project_out is None:
project_out_weights = {}
Contributor Author

Follow the convention in the other files: first assert with require_mapping(project_out_weights), then use project_out_weights or None when exporting.
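The suggested export convention, as a sketch with a plain-dict stand-in for the repo's require_mapping helper and ParameterTree type (names assumed from the surrounding review, not verified against the codebase):

```python
def require_mapping(tree):
    # Stand-in for the repo's require_mapping helper: validate that the
    # subtree is a mapping before using it.
    assert isinstance(tree, dict), "expected a mapping of parameters"
    return tree

project_out_weights = require_mapping({})  # empty when project_out is None

# Export the optional submodule as None rather than an empty dict,
# so absent and present submodules are distinguishable on re-import:
exported = {"project_out": project_out_weights or None}
```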

Comment on lines +192 to +194
weights_paths = origin.resolve_weights(progress_callback)
config_path = origin.resolve_file(model_spec.configs.model_config, progress_callback)
extra_config_paths = tuple(origin.resolve_file(ec, progress_callback) for ec in model_spec.configs.extra_configs)
Contributor Author

should we move configs under Origin?

)


class Origin(RegistryABC):
Contributor Author

registry abc?

@property
@abstractmethod
def description(self) -> str: ...

Contributor Author

uv run lalamo convert mars8nano --path blahblah

ModelSpec(
origin: Mars8NanoOrigin("../weights.pth", TORCH)
)

uv run lalamo convert mars8nano --custom-origin {origin: local, blahblah}

uv run lalamo convert mars8nano

each origin wants a list of keys that it wants to get to be resolved, okay

cli args back/forth with arbitrary

import models gets stuff from kwargs

cli origin? ENV origin?
