|
6 | 6 | "source": [ |
7 | 7 | "# Llama.cpp\n", |
8 | 8 | "\n", |
9 | | - "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n", |
| 9 | + "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp).\n", |
10 | 10 | "\n", |
11 | | - "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n", |
| 11 | + "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp#description) models, which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n", |
12 | 12 | "\n", |
13 | 13 | "This notebook goes over how to run `llama-cpp-python` within LangChain.\n", |
14 | 14 | "\n", |
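As a quick illustration of what the notebook covers, here is a minimal sketch of calling a local model through LangChain's `LlamaCpp` wrapper. The model path is a placeholder for a GGUF file you have already downloaded, and the parameter values are illustrative, not recommendations; the import path may differ between LangChain versions.

```python
# Minimal sketch: running a local GGUF model through LangChain's LlamaCpp wrapper.
# The model path is a placeholder -- point it at a GGUF file on your machine.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/path/to/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,        # context window size
    temperature=0.75,  # sampling temperature
)

# LangChain LLMs accept a plain prompt string and return the completion text.
print(llm("Q: Name the planets in the solar system. A:"))
```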
|
54 | 54 | "source": [ |
55 | 55 | "### Installation with OpenBLAS / cuBLAS / CLBlast\n", |
56 | 56 | "\n", |
57 | | - "`lama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n", |
| 57 | + "`llama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n", |
58 | 58 | "\n", |
59 | 59 | "Example installation with cuBLAS backend:" |
60 | 60 | ] |
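For reference, the cuBLAS variant documented in the llama-cpp-python README looks like the notebook-style shell cell below. The CMake flag name has changed across llama-cpp-python releases, so treat this as a sketch and check the installation docs linked above.

```python
# Notebook shell cell: build llama-cpp-python against the cuBLAS backend.
# Flag names follow the llama-cpp-python README and may differ between releases.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```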
|
177 | 177 | "\n", |
178 | 178 | "You don't need an `API_TOKEN` as you will run the LLM locally.\n", |
179 | 179 | "\n", |
180 | | - "It is worth understanding which models are suitable to be used on the desired machine." |
| 180 | + "It is worth understanding which models are suitable to be used on the desired machine.\n", |
| 181 | + "\n", |
| 182 | + "[TheBloke's](https://huggingface.co/TheBloke) Hugging Face models have a `Provided files` section that exposes the RAM required to run models of different quantisation sizes and methods (eg: [Llama2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF#provided-files)).\n", |
| 183 | + "\n", |
| 184 | + "This [github issue](https://github.com/facebookresearch/llama/issues/425) is also relevant to find the right model for your machine." |
181 | 185 | ] |
182 | 186 | }, |
183 | 187 | { |
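As a sketch of how a specific quantisation can be fetched from one of TheBloke's repositories, the `huggingface_hub` helper below downloads a single GGUF file. The `repo_id` matches the model card linked above, but the `filename` is an example; pick the quantisation whose RAM figure in the `Provided files` table fits your machine.

```python
# Sketch: download one quantised GGUF file from the Hugging Face Hub.
# repo_id and filename are examples -- choose the quantisation that fits
# the RAM figures in the model card's "Provided files" table.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7b-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",  # example quantisation
)
print(model_path)  # local path to pass to LlamaCpp(model_path=...)
```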
|