…build_stack for padding and review
02f6e66 to 86339b0
You probably need to rebase to fix server CIs. |
CISC
left a comment
EditorConfig error is unrelated.
Hi, I think this PR is ready to be merged. It seems the failures in the ggml-ci-x64-cpu-low-perf and ggml-ci-x64-cpu-high-perf CI jobs are unrelated to my changes. |
CISC
left a comment
Tested conversion, it was not a pleasant experience. :)
This works far more smoothly...
Gave this a try and it's working for me; however, the transcription quality drops rapidly the longer the audio gets. A 10s clip works perfectly fine, but a 1 min clip just gets interpreted as random words. Also, since I couldn't find quants online, I made some: https://huggingface.co/concedo/GLM-ASR-Nano-2512-GGUF
Sounds like a rope issue? |
|
does this use linear rope or 1d mrope? |
So it looks like linear rope, but I don't think it's related to rope at all, though it may be an attention issue. Anyway, you can test on mtmd-cli. Here are some test files: this one works fine; the exact same file, but longer, breaks down or frequently returns empty output. I suspect it's worse once the audio preprocessor splits the audio into multiple chunks. I'm willing to help with troubleshooting, but only if someone actually plans to look into it.
Here is the complete audio source file. If you use this, it will just output garbage.
|
@piDack I think this implementation needs to be updated after these files were merged to be hf compatible: https://huggingface.co/zai-org/GLM-ASR-Nano-2512/commit/8172188f128059b57f09686dda5b36edff1606dd Currently it's outputting this:
|
* [model] add glm-asr support
* fix format for ci
* fix convert format for ci
* update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review
* check root architecture for convert hf script
* fix conficlt with upstream
* fix convert script for glm asr & format clip-impl
* format
* restore hparams text
* improved conversion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
This PR adds support for the GLM-ASR architecture, specifically validating with the zai-org/GLM-ASR-Nano-2512 model.
Key Changes:
convert_hf_to_gguf.py now handles dynamic configuration keys (GLM-ASR uses "lm_config" instead of "text_config"). It identifies the config section by checking:
llm_config_key = "lm_config" if "lm_config" in self.hparams else "text_config"
Result