Add llama_chat_apply_template() #5538
Conversation
Update: seems like I completely missed the discussion about this subject since 11/2023: #4216 (comment). Many developers expect that we have some kind of "jinja parser" for the template. I need to add a clarification to the inline docs, so it will be clear that this function only uses heuristic checks and is not a jinja parser.
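To make that concrete, here is a minimal sketch of what such a heuristic check could look like: detect the template family from characteristic substrings of the Jinja source instead of evaluating it. The marker strings and helper name below are illustrative assumptions, not necessarily the exact checks used in this PR.

```cpp
#include <string>

enum chat_template_kind { TMPL_CHATML, TMPL_LLAMA2, TMPL_UNKNOWN };

// Sketch only: guess the template family from characteristic markers in the
// Jinja source rather than parsing it.
static chat_template_kind detect_chat_template(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return TMPL_CHATML; // chatml-style templates contain <|im_start|> tokens
    }
    if (tmpl.find("[INST]") != std::string::npos) {
        return TMPL_LLAMA2; // llama2-style templates wrap user turns in [INST] ... [/INST]
    }
    return TMPL_UNKNOWN;    // caller should report the template as unsupported
}
```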
llama.cpp (Outdated)
```cpp
// Simple version of "llama_apply_chat_template" that only works with strings
// This function uses heuristic checks to determine commonly used template. It is not a jinja parser.
int32_t llama_chat_apply_template_internal(std::string &dest, std::string chat_template, std::vector<const llama_chat_message *> conversation, bool add_ass) {
```
Put the const args in front and pass by ref:
```cpp
int32_t llama_chat_apply_template_internal(
        const std::string & chat_template,
        const std::vector<const llama_chat_message *> & chat,
        std::string & dest, bool add_ass) {
```
The terms chat and conversation seem conflated. Propose to use chat universally (apply to other places too)
I changed all the occurrences of conversation and msg to chat in this commit: 73fbd67
Nice - good job!
```cpp
// load template from model
std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), model_template.data(), curr_tmpl.size());
```
I noticed that I made a mistake in this line: it should be model_template.size(), not curr_tmpl.size(). I'm fixing it in the next PR (using this function in server)
P.S. It took me almost an hour to figure out this error. Sorry if I accidentally put somebody else in the same situation as me.
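For reference, a sketch of the corrected call, reusing the variables from the snippet above:

```cpp
// pass the size of the buffer actually being written to (model_template), not curr_tmpl
int32_t res = llama_model_meta_val_str(model, template_key.c_str(),
                                       model_template.data(), model_template.size());
```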
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"
Closes #5527
Since most gguf models already have a chat template saved in their metadata, developers can use the llama_chat_apply_template function to format the chat:

```cpp
llama_chat_message conversation[] = {
    {"system", "You are a helpful assistant"},
    {"user", "Hello"},
    {"assistant", "Hi there"},
    {"user", "Who are you"},
    {"assistant", " I am an assistant "},
    {"user", "Another question"},
};
size_t message_count = 6;

// ideally, size of buffer should be 2 * (total number of characters of all messages)
std::vector<char> formatted_chat(1024);

int32_t res = llama_chat_apply_template(
    model,      // by default, template is taken from model metadata
    nullptr,    // alternatively, you can supply your own template as a string
    conversation,
    message_count,
    true,       // add a trailing "assistant" prompt, only use on chatml for now
    formatted_chat.data(),
    formatted_chat.size()
);
formatted_chat.resize(res);
std::cout << std::string(formatted_chat.data(), formatted_chat.size());
```

Result (chatml for example):
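A sketch of what that result should look like, assuming the model's metadata contains a standard chatml template and given the trailing "assistant" prompt enabled above:

```
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
Who are you<|im_end|>
<|im_start|>assistant
 I am an assistant <|im_end|>
<|im_start|>user
Another question<|im_end|>
<|im_start|>assistant
```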
CC @ggerganov and @cebtenzzre for review