Generation in full_finetune_distributed recipe #2786

@FSadrieh

Description

Hi, I want to run generation as a validation step in the full_finetune_distributed recipe. I tried copying the code from the generate recipe, but the output is nonsensical (probably because the model is sharded across multiple GPUs).

I am using Llama 3.2 1B with your example prompt "Tell me a joke."
When using the generate recipe, I get: Tell me a joke.The joke is that I'm going to the bathroom
When using the copied generate function in the full_finetune_distributed recipe (before training!), I get: Tell me a joke.rg check generure check hasure hasure check
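One difference worth checking between the two code paths: the standalone generate recipe enables KV caches on the model before generating, while the copied code below does not. A minimal sketch of that setup, assuming torchtune's TransformerDecoder.setup_caches signature (the exact arguments may differ by version):

    # Sketch of the cache setup the generate recipe performs (assumed API;
    # len(tokens) is the prompt length from convert_prompt_to_tokens below):
    with self._device:
        self._model.setup_caches(
            batch_size=1,
            dtype=self._dtype,
            decoder_max_seq_len=len(tokens) + cfg["max_new_tokens"],
        )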

Reproduction:

  1. Copy the full_finetune_distributed recipe.
  2. Add the following methods to the recipe class:
   # Imports needed at the top of the recipe file (time and torch are
   # already imported there):
   #   from torchtune import generation
   #   from torchtune.data import Message, Role

    def convert_prompt_to_tokens(
        self,
        prompt: dict[Role, str],
    ) -> list[int]:
        """
        Convert the prompt string to a user message with an optional system
        message and tokenize using the prompt template defined on the tokenizer.
        """
        messages = []
        if "system" in prompt and prompt["system"] is not None:
            messages.append(Message(role="system", content=prompt["system"]))
        messages.extend(
            [
                Message(role="user", content=prompt["user"]),
                # Empty assistant message to kick-start generation
                Message(role="assistant", content=""),
            ]
        )
        return self._tokenizer({"messages": messages}, inference=True)["tokens"]

    @torch.inference_mode()
    def generate(self):
        cfg = {
            "prompt": {"system": None, "user": "Tell me a joke."},
            "max_new_tokens": 10,
            "temperature": 0.6,
            "top_k": 300,
        }
        tokens = self.convert_prompt_to_tokens(cfg["prompt"])
        prompt = torch.tensor(tokens, dtype=torch.int, device=self._device)

        custom_generate_next_token = None

        t0 = time.perf_counter()
        generated_tokens, _ = generation.generate(
            model=self._model,
            prompt=prompt,
            max_generated_tokens=cfg["max_new_tokens"],
            pad_id=self._tokenizer.pad_id,
            temperature=cfg["temperature"],
            top_k=cfg["top_k"],
            stop_tokens=self._tokenizer.stop_tokens,
            custom_generate_next_token=custom_generate_next_token,
        )
        generated_tokens = generated_tokens.tolist()
        t = time.perf_counter() - t0  # elapsed generation time

        print(self._tokenizer.decode(generated_tokens[0]))
  3. Call self.generate() directly before the training loop.
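
Since the model is sharded with FSDP, every rank presumably has to execute generation.generate collectively, while only rank 0 should decode and print. A minimal sketch of such a guard (print_on_rank_zero is a hypothetical helper, not part of torchtune):

    import torch.distributed as dist

    def print_on_rank_zero(tokenizer, generated_tokens: list[list[int]]) -> None:
        # All ranks reach this point after the collective generate() call;
        # only rank 0 decodes and prints, then everyone syncs up.
        rank = dist.get_rank() if dist.is_initialized() else 0
        if rank == 0:
            print(tokenizer.decode(generated_tokens[0]))
        if dist.is_initialized():
            dist.barrier()

With this, the final line of the generate method above would become print_on_rank_zero(self._tokenizer, generated_tokens).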

What should I do to get sensible generation results?

Torchtune version: 0.7.0.dev20250521+cpu
