Hello,
thanks for your work!
I took a look at your code, and it seems to be based on the original ace-step release from the first days.
I'm no longer actively supporting ace-step, as the developer deleted almost all of my helpful messages on Discord, and I'm not happy with how the release went and how it is now, erm, "fixed".
Over the last weeks I fixed the day-one release code and improved it a lot.
I have now tested your LoRA code, as I like the way it looks and works.
But I found some problems, and maybe this information will help you, if you have the time, to make it even better.
In the early days of the release I posted a detailed guide on Discord on how to make LoRA training 71.6% faster.
But the dev just ignored the info, and by now it is deleted; maybe he did not understand it, as the release code seems to be purely vibe-coded.
Maybe you can use it now for your trainer:
EXECUTIVE_SUMMARY.md
performance_comparison.md
and here is everything you need to implement:
IMPLEMENTATION_GUIDE.md
│ BEFORE (original Side-Step):
│ ├─ 1 training step: 3-4 seconds
│ ├─ 100 steps (1 epoch): 5-7 minutes
│ └─ 10 epochs: 50-70 minutes
│ AFTER (with all 5 optimizations):
│ ├─ 1 training step: 0.8-1.2 seconds (-75%)
│ ├─ 100 steps (1 epoch): 1.5-2 minutes
│ └─ 10 epochs: 15-20 minutes
│ SPEEDUP FACTOR: ~3-4x
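As a sanity check (my own arithmetic, not from the guide): a 71.6% reduction in per-step time corresponds to roughly a 3.5x speedup, which sits inside the ~3-4x factor quoted above.

```python
# Relation between a percentage time reduction and a speedup factor.
reduction = 0.716                  # steps are 71.6% faster
speedup = 1 / (1 - reduction)      # speedup = time_old / time_new
print(f"speedup: {speedup:.2f}x")  # ~3.52x, inside the ~3-4x range
```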
The documents are in German; you can translate them into your own language or feed them to the AI you code with.
I used Claude Code to make this comparison; it is an analysis of my LoRA training code vs. your training code.
So you can see directly what to do to get the 71.6% speedup too. It is not much work, as I did most of it when I fixed the ace-step day-one code.
Which really was (and is) a mess... but that's another topic.
As I do not host any repos here, and will not in the future, I now use your trainer plus my speed fix.
But I won't upload it, as I do not want the stress of "issues" other people have with code that runs fine for me.
No offence; I am happy about everyone doing open source, but I got out of that a long time ago.
But I never stop sharing knowledge or helping others, so here I am posting the info for you and your users.
Feel free to ask questions if you have any. Happy coding and faster training times :)
EDIT:
I also tested the ace-step transcriber/captioner: 44 GB of wasted space on my SSD...
For me, both were not useful for the music styles I train, like hardstyle, tekk, phonk... and my native language.
I spent about a week finding a better way to prepare the training data.
You should take a look at "Music Flamingo" from NVIDIA.
Do not use the 16 GB safetensors file; it needs around 30 GB of VRAM when working, and with bitsandbytes it becomes very slow.
Around 5 minutes per song...
I made a little tool that works like this:
mp3 -> wav
original song wav -> music-flamingo-hf-GGUF (via llama_cpp.exe, not the lib) = all needed caption data: BPM, key, etc.
original song wav -> mel_band_roformer_vocals (HF: becruily) -> vocal stem wav -> music-flamingo-hf-GGUF (via llama_cpp.exe, not the lib) = song lyrics with Suno-style tags
Gemini via free API key -> turn the Music Flamingo caption text into a LoRA training caption
Gemini via free API key -> turn the transcribed lyrics text into the LoRA training lyrics format
In this step I also insert the LoRA trigger keyword I want to use to activate the trained LoRA.
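The first two steps of that pipeline could be sketched roughly like this. Everything here is a placeholder built from the description above: the binary names, model file, and CLI flags are assumptions, not verified invocations, so adapt them to your setup.

```python
# Sketch of the data-prep pipeline described above. All file names, model
# paths, and CLI flags are placeholders -- adapt them to your setup.
import subprocess
from pathlib import Path

def to_wav_cmd(mp3: Path) -> list[str]:
    # step 1: mp3 -> wav via ffmpeg
    return ["ffmpeg", "-y", "-i", str(mp3), str(mp3.with_suffix(".wav"))]

def caption_cmd(wav: Path) -> list[str]:
    # step 2: wav -> Music Flamingo GGUF via the llama.cpp executable
    # (not the Python lib); binary name and flags are assumptions
    return ["llama_cpp.exe", "-m", "music-flamingo.gguf", "--audio", str(wav)]

def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    song = Path("song.mp3")
    run(to_wav_cmd(song))                                 # 1. mp3 -> wav
    caption = run(caption_cmd(song.with_suffix(".wav")))  # 2. caption, BPM, key
    # 3. vocal stem via mel_band_roformer_vocals, then 4./5. the two Gemini
    # rewrite steps would follow the same subprocess pattern (omitted here).
```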
The result is a file like the one below, and it takes about 1 minute per 4-5 minute song:
caption: Dark, hypnotic Minimal Techno track defined by a stark, industrial-infused rhythmic evolution. Opening with a crisp, punchy four-on-the-floor kick and resonant synth bass, the piece establishes a cold, meditative foundation. Deep, authoritative male spoken-word vocals in German enter, delivering philosophical reflections with intimate, dry processing. As the arrangement progresses, sparse arpeggiated synth lines and subtle percussive clicks emerge, building a hypnotic tension through granular filter sweeps and layered noise textures. The track maintains a steady, relentless momentum, alternating between rhythmic intensity and transient harmonic shifts. The experience concludes with an atmospheric thinning of the electronic layers, allowing the resonant synth pads to drift into a final, hollow silence.
genre: Minimal Techno
tags: Minimal Techno, Industrial, Hypnotic, Introspective, Cold, Spoken-Word, Deep Male Vocal, Punchy Kick, Resonant Bass, Evolving Synth Textures, Precise Production, Dark Atmosphere, German Monologue, Rhythmic Loops, Berlin Style
bpm: 130
key: B Minor
signature: 4
custom_tag: TEKK
lyrics:
[Intro]
Ey, bleib mal stehen
Warte mal
Bleib einfach mal stehen
Schließ deine Augen
Und hör mir zu
Und hör mir zu
etc.
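A trainer could ingest this file format with a tiny parser. A sketch, assuming the simple `key: value` header plus multi-line lyrics block shown above (the function name and the `sample` string are mine, not part of the tool):

```python
# Minimal parser for the "key: value" header plus multi-line lyrics block
# shown above. Field names follow the example file; the helper is hypothetical.
def parse_prep_file(text: str) -> dict:
    meta, lyrics, in_lyrics = {}, [], False
    for line in text.splitlines():
        if in_lyrics:
            lyrics.append(line)                 # everything after "lyrics:"
        elif line.startswith("lyrics:"):
            in_lyrics = True
        elif ":" in line:
            key, _, value = line.partition(":")  # split at the first colon
            meta[key.strip()] = value.strip()
    meta["lyrics"] = "\n".join(lyrics).strip()
    return meta

sample = """caption: Dark, hypnotic Minimal Techno track.
bpm: 130
key: B Minor
custom_tag: TEKK
lyrics:
[Intro]
Ey, bleib mal stehen"""
print(parse_prep_file(sample)["bpm"])  # 130
```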
By using the GGUF version, the LoRA data prep tool needs no more than 12 GB of VRAM.
So it should work for most users. I have 24 GB myself, but I like it when code does not demand the maximum and is, let's say, clearly built from multiple tools when one tool alone does not do the job perfectly.
Maybe that is another idea for you, if you want to rework your (I believe Qwen-Omni based) caption and data prep function ;)
So, enough spam here. Have fun :)