
Recreated settings changes - Adds several options for llamacpp and ollama #1703

Merged
imartinez merged 5 commits into zylon-ai:main from icsy7867:llama-settings
Mar 11, 2024

Conversation

@icsy7867
Contributor

Original PR here:
#1677

llama-cpp - https://llama-cpp-python.readthedocs.io/en/latest/api-reference/
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html#

ollama - https://github.com/run-llama/llama_index/blob/eeb2a60387b8ae1994005ad0eebb672ee02074ff/llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py

openailike - https://docs.llamaindex.ai/en/stable/examples/llm/localai.html#localai
No configurable changes.

Not sure about the model_kwargs. The value is referenced for openai, but I could not find documentation on what values are allowed.
openai - https://github.com/run-llama/llama_index/blob/eeb2a60387b8ae1994005ad0eebb672ee02074ff/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py
https://docs.llamaindex.ai/en/stable/examples/llm/openai.html

For the text/description I used the values found here:
https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

Where LlamaCPP uses the same K/V parameters, I used the same values. However, my setup currently uses ollama, so the LlamaCPP side still needs some testing.

I also added temperature under the main llm settings. This should allow the value to be edited/changed for models that support it.
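
For anyone reviewing, here is a rough sketch of how these values flow through llama-index (this is not the actual private-gpt wiring, and the parameter names/values below are only illustrative placeholders):

```python
# Sketch only: assumes the llama-index-llms-ollama and llama-index-llms-llama-cpp
# integration packages; the values shown are placeholders, not the PR defaults.
from llama_index.llms.ollama import Ollama
from llama_index.llms.llama_cpp import LlamaCPP

# ollama: extra sampling options go in additional_kwargs, which llama-index
# forwards to the ollama API's "options" payload.
ollama_llm = Ollama(
    model="mistral",
    temperature=0.1,          # the temperature now exposed under the main llm settings
    context_window=3900,
    request_timeout=120.0,
    additional_kwargs={"top_k": 40, "top_p": 0.9, "repeat_penalty": 1.1},
)

# llama-cpp: the equivalent knobs are split between generate_kwargs (sampling)
# and model_kwargs (model load options).
llamacpp_llm = LlamaCPP(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=512,
    context_window=3900,
    generate_kwargs={"top_k": 40, "top_p": 0.9},
    model_kwargs={"n_gpu_layers": -1},
)
```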

@icsy7867
Contributor Author

Hmm, small bug... num_predict: 128 doesn't do what I thought it did. It tells llama-index the maximum size of the response, so this should probably be set to -1 or -2 by default.

It is odd, though, that the default says "128", yet if you don't set that kwarg you get a larger response.
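
If we do default it to -1, the pass-through would look something like this (sketch only; per the ollama modelfile docs, -1 means generate until the model stops and -2 means fill the context):

```python
from llama_index.llms.ollama import Ollama

# num_predict: 128 is ollama's documented default; -1 = unlimited, -2 = fill context.
llm = Ollama(
    model="mistral",
    additional_kwargs={"num_predict": -1},
)
```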

@icsy7867
Contributor Author

Looking at the ollama code:
https://github.com/ollama/ollama/blob/f878e91070af750709f1b3195eeb9fbdcaad2bef/openai/openai.go#L174

	if r.MaxTokens != nil {
		options["num_predict"] = *r.MaxTokens
	}

It looks like the default is 128 unless you have max tokens set; in that case it just sets num_predict to the same value as max tokens. Alternatively, setting this to "Max new tokens" might make more sense.
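
Mirroring that behaviour on our side would look roughly like this (a sketch only; `max_new_tokens` stands in for whatever the setting ends up being called):

```python
def build_ollama_options(sampling_kwargs: dict, max_new_tokens: int | None) -> dict:
    """Only override num_predict when a max-tokens value is actually configured."""
    options = dict(sampling_kwargs)
    if max_new_tokens is not None:
        options["num_predict"] = max_new_tokens
    return options

print(build_ollama_options({"top_k": 40}, 512))   # {'top_k': 40, 'num_predict': 512}
print(build_ollama_options({"top_k": 40}, None))  # {'top_k': 40} -> ollama default (128)
```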

imartinez merged commit 02dc83e into zylon-ai:main on Mar 11, 2024