Conversation
I think I'll pull from dottxt-ai/outlines#781, which will probably solve 1 and 3.
Thanks, looking good so far... it's nice that outlines already supports exl2.
@edk208 Some notes:

So in summary, I think these are all the changes that can work from the main branch of outlines so far. Happy to get feedback!
I'll do the streaming idea tonight.
What do you mean by the "logic of first doing preprocess and then generating tokens"? Do you mean the first model.forward with preprocess_only=True?
@edk208 Sorry for the confusion, and yes. To my understanding, the process is:

I think step 1 is technically not possible in outlines, but steps 2 and 3 might be possible in the above PR. Let me try it tomorrow.
@isamu-isozaki Yes, that's correct. The preprocess pass runs the prompts through the model and sets up the KV cache; then you can round-robin through them and generate one token at a time. Interesting that outlines doesn't like step 1. I would imagine it would have to do that anyway. I can take a look too in the next few days.
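For reference, here is roughly what that preprocess-then-step flow looks like against exllamav2's low-level API. This is just a sketch, not this PR's code: the model path and token budget are placeholders, and the greedy argmax stands in for a real sampler.

```python
import torch
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-model"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
model.load()
cache = ExLlamaV2Cache(model)
tokenizer = ExLlamaV2Tokenizer(config)

ids = tokenizer.encode("Who is more impressive? Bob or Fred?")

# Preprocess: run all but the last prompt token through the model so the
# KV cache gets populated without producing logits.
model.forward(ids[:, :-1], cache=cache, preprocess_only=True)

# Then generate one token per forward call, feeding only the newest token.
# With one cache per request, this loop is where several in-flight requests
# could be round-robined.
for _ in range(32):  # arbitrary token budget for the sketch
    logits = model.forward(ids[:, -1:], cache=cache)
    next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True).to(ids.device)
    ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```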
Hi! I think the main logic is done. For the test I used config.ini with the model from here. Then on the client side, I did:

```python
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage, AIMessage
import json
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    temperature=1.0,
    openai_api_base="http://localhost:5000/v1",
    openai_api_key="Test",
    streaming=True,
    max_tokens=1024,
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Who is more impressive? Bob or Fred?"),
]

choices = ["Bob", "Fred"]
for chunk in llm.stream(
    messages,
    extra_body={"stop_at": "done", "outlines_type": "choices", "choices": choices},
):
    print(chunk.content, end="", flush=True)
```

This got me Bob. I can do more tests if you want, but I think it's working. One main point here is that, for adding new parameters to the OpenAI API, we use extra_body rather than function calling/tool calling, since I couldn't think of an easy way to translate it.
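On the server side, the extra_body fields just need to be read off the incoming request and routed to the matching outlines generator. A rough sketch of that mapping (not this PR's actual handler; run_constrained and the request dict shape are made up for illustration, and model is assumed to already be an outlines-wrapped model):

```python
import outlines

def run_constrained(model, prompt: str, request: dict) -> str:
    """Hypothetical helper: route extra_body fields to an outlines generator."""
    if request.get("outlines_type") == "choices":
        # Constrain the completion to exactly one of the supplied choices
        generator = outlines.generate.choice(model, request["choices"])
        return generator(prompt)
    # Otherwise fall back to unconstrained text, honoring stop_at if given
    generator = outlines.generate.text(model)
    return generator(prompt, stop_at=request.get("stop_at"))
```

The same dispatch would presumably extend to outlines' other generators (regex, JSON schema) by adding more outlines_type values.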
This is a draft PR. Currently, the 3 main parts left to do to make this work are: