Hi, I want to run generation as a validation step in the `full_finetune_distributed` recipe. I tried copying the code from the `generate` recipe, but the output is nonsensical (probably because the model is distributed across multiple GPUs).

I am using Llama 3.2 1B and your example prompt `Tell me a joke.`

With the `generate` recipe, I get: `Tell me a joke.The joke is that I'm going to the bathroom`

With the copied generate function in the `full_finetune_distributed` recipe (run before training!), I get: `Tell me a joke.rg check generure check hasure hasure check`
Reproduction:
- Copy the `full_finetune_distributed` recipe
- Add the following two methods to the recipe class. The first is the prompt-tokenization helper copied from the `generate` recipe (the import notes at the top are my addition, since the distributed recipe does not necessarily import these already):

```python
# Likely needed on top of the recipe's existing imports (my assumption):
# from torchtune import generation
# from torchtune.data import Message, Role

def convert_prompt_to_tokens(
    self,
    prompt: dict[Role, str],
) -> list[int]:
    """
    Convert the prompt string to a user message with optional system messages
    and tokenize using the prompt template defined on the tokenizer.
    """
    messages = []
    if "system" in prompt and prompt["system"] is not None:
        messages.append(Message(role="system", content=prompt["system"]))
    messages.extend(
        [
            Message(role="user", content=prompt["user"]),
            # Empty assistant message to kick-start generation
            Message(role="assistant", content=""),
        ]
    )
    return self._tokenizer({"messages": messages}, inference=True)["tokens"]
```
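The second method drives generation itself via `torchtune.generation.generate`, hard-coding the prompt and sampling parameters that would normally come from the recipe config: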
```python
@torch.inference_mode()
def generate(self):
    cfg = {
        "prompt": {"system": None, "user": "Tell me a joke."},
        "max_new_tokens": 10,
        "temperature": 0.6,
        "top_k": 300,
    }
    tokens = self.convert_prompt_to_tokens(cfg["prompt"])
    prompt = torch.tensor(tokens, dtype=torch.int, device=self._device)
    custom_generate_next_token = None
    t0 = time.perf_counter()
    generated_tokens, _ = generation.generate(
        model=self._model,
        prompt=prompt,
        max_generated_tokens=cfg["max_new_tokens"],
        pad_id=self._tokenizer.pad_id,
        temperature=cfg["temperature"],
        top_k=cfg["top_k"],
        stop_tokens=self._tokenizer.stop_tokens,
        custom_generate_next_token=custom_generate_next_token,
    )
    generated_tokens = generated_tokens.tolist()
    t = time.perf_counter() - t0  # elapsed generation time (currently unused)
    print(self._tokenizer.decode(generated_tokens[0]))
```
- Add a call to `self.generate()` directly before the training loop (see the sketch after this list).
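For concreteness, the call site looks roughly like this (a sketch; the `train()` skeleton follows torchtune's `full_finetune_distributed` recipe, and `self.generate()` is the helper added above):

```python
def train(self) -> None:
    # ... existing recipe setup (profiler, dataloader, etc.) ...

    # One-off sanity generation before any optimizer step
    self.generate()

    for curr_epoch in range(self.epochs_run, self.total_epochs):
        # ... existing training loop, unchanged ...
        ...
```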
What should I do to get sensible generation results?
Torchtune version: `0.7.0.dev20250521+cpu`