use repeat_interleave for input_features in kd_trainer #121
Open
zhuxiaoxuhit wants to merge 1 commit into wenet-e2e:main from
Conversation
yuekaizhang approved these changes Mar 9, 2026

Contributor
@zhuxiaoxuhit Thanks.

Contributor
@robin1001 Would you mind helping merge this one also? Thanks.
Same issue as #119 but in KnowledgeDistillationTrainer._prepare_logprob_inputs.
input_features uses .repeat(num_generations, 1, 1) while all other tensors in the same function use .repeat_interleave(num_generations, dim=0). These two produce different element orderings when batch size > 1:
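For illustration, a minimal PyTorch sketch of the ordering difference (the shapes and values here are made up; num_generations follows the PR description):

```python
import torch

# Illustrative example: batch of 2 samples a and b, feature dim 3,
# num_generations = 2. Shapes are hypothetical, not taken from the trainer.
num_generations = 2
x = torch.tensor([[[1., 1., 1.]],
                  [[2., 2., 2.]]])        # shape (batch=2, seq=1, feat=3)

# .repeat(num_generations, 1, 1) tiles the whole batch: ordering [a, b, a, b]
print(x.repeat(num_generations, 1, 1)[:, 0, 0])
# tensor([1., 2., 1., 2.])

# .repeat_interleave(num_generations, dim=0) duplicates each sample in place:
# ordering [a, a, b, b], which is what the other tensors in the function use
print(x.repeat_interleave(num_generations, dim=0)[:, 0, 0])
# tensor([1., 1., 2., 2.])
```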
So when batch size > 1, input_features ends up paired with the wrong completions during log probability computation, making both the student and teacher logprob calculations incorrect and corrupting the KD training signal.
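Presumably the change is the one-liner named in the PR title; a sketch (the exact surrounding code in _prepare_logprob_inputs may differ):

```python
# Before (ordering [a, b, a, b, ...], mismatched when batch size > 1):
# input_features = input_features.repeat(num_generations, 1, 1)

# After (ordering [a, a, b, b, ...], matching the other repeat_interleave'd tensors):
input_features = input_features.repeat_interleave(num_generations, dim=0)
```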