Generation in full_finetune_distributed recipe #2786

@FSadrieh

Description

Hi, I want to run generation as a validation step in the full_finetune_distributed recipe. I tried copying the code from the generate recipe, but the output is nonsensical (probably because the model is sharded across multiple GPUs).

I am using Llama 3.2 1B with your example prompt "Tell me a joke."
When using the generate recipe, I get: Tell me a joke.The joke is that I'm going to the bathroom
When using the copied generate function in the full_finetune_distributed recipe (before training!), I get: Tell me a joke.rg check generure check hasure hasure check
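One difference worth checking between the two code paths: the standalone generate recipe enables KV caches on the model before generating, while the copied code below does not. A minimal sketch of that setup, assuming torchtune's TransformerDecoder.setup_caches signature (the exact arguments may differ by version):

    # Sketch of the cache setup the generate recipe performs (assumed API;
    # len(tokens) is the prompt length from convert_prompt_to_tokens below):
    with self._device:
        self._model.setup_caches(
            batch_size=1,
            dtype=self._dtype,
            decoder_max_seq_len=len(tokens) + cfg["max_new_tokens"],
        )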

Reproduction:

  1. Copy the full_finetune_distributed recipe.
  2. Add the following methods to the recipe class:
   # Imports needed at the top of the recipe file (time and torch are
   # already imported there):
   #   from torchtune import generation
   #   from torchtune.data import Message, Role

    def convert_prompt_to_tokens(
        self,
        prompt: dict[Role, str],
    ) -> list[int]:
        """
        Convert the prompt string to a user message with an optional system
        message and tokenize using the prompt template defined on the tokenizer.
        """
        messages = []
        if "system" in prompt and prompt["system"] is not None:
            messages.append(Message(role="system", content=prompt["system"]))
        messages.extend(
            [
                Message(role="user", content=prompt["user"]),
                # Empty assistant message to kick-start generation
                Message(role="assistant", content=""),
            ]
        )
        return self._tokenizer({"messages": messages}, inference=True)["tokens"]

    @torch.inference_mode()
    def generate(self):
        cfg = {
            "prompt": {"system": None, "user": "Tell me a joke."},
            "max_new_tokens": 10,
            "temperature": 0.6,
            "top_k": 300,
        }
        tokens = self.convert_prompt_to_tokens(cfg["prompt"])
        prompt = torch.tensor(tokens, dtype=torch.int, device=self._device)

        custom_generate_next_token = None

        t0 = time.perf_counter()
        generated_tokens, _ = generation.generate(
            model=self._model,
            prompt=prompt,
            max_generated_tokens=cfg["max_new_tokens"],
            pad_id=self._tokenizer.pad_id,
            temperature=cfg["temperature"],
            top_k=cfg["top_k"],
            stop_tokens=self._tokenizer.stop_tokens,
            custom_generate_next_token=custom_generate_next_token,
        )
        generated_tokens = generated_tokens.tolist()
        t = time.perf_counter() - t0  # elapsed generation time

        print(self._tokenizer.decode(generated_tokens[0]))
  3. Call self.generate() directly before the training loop.
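
Since the model is sharded with FSDP, every rank presumably has to execute generation.generate collectively, while only rank 0 should decode and print. A minimal sketch of such a guard (print_on_rank_zero is a hypothetical helper, not part of torchtune):

    import torch.distributed as dist

    def print_on_rank_zero(tokenizer, generated_tokens: list[list[int]]) -> None:
        # All ranks reach this point after the collective generate() call;
        # only rank 0 decodes and prints, then everyone syncs up.
        rank = dist.get_rank() if dist.is_initialized() else 0
        if rank == 0:
            print(tokenizer.decode(generated_tokens[0]))
        if dist.is_initialized():
            dist.barrier()

With this, the final line of the generate method above would become print_on_rank_zero(self._tokenizer, generated_tokens).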

What should I do to get sensible generation results?

Torchtune version: 0.7.0.dev20250521+cpu
