
Commit b03ed5e

feat: Add Xorbits Inference for local deployment (#7151)

1 parent 31a4f53 commit b03ed5e

7 files changed (+625, −1 lines)

CHANGELOG.md

Lines changed: 6 additions & 1 deletion
```diff
@@ -1,6 +1,11 @@
 # ChangeLog
 
-## [0.7.22]
+## Unreleased
+
+### New Features
+- Added Xorbits inference for local deployments (#7151)
+
+## [0.7.22] - 2023-08-08
 
 ### New Features
 - add ensemble retriever notebook (#7190)
```

docs/core_modules/model_modules/llms/modules.md

Lines changed: 8 additions & 0 deletions
````diff
@@ -81,3 +81,11 @@ maxdepth: 1
 ---
 /examples/llm/llama_api.ipynb
 ```
+
+## Xorbits Inference
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/llm/XinferenceLocalDeployment.ipynb
+```
````
docs/examples/llm/XinferenceLocalDeployment.ipynb

Lines changed: 209 additions & 0 deletions

New file (Jupyter notebook); its cells are rendered below.
# Using Xorbits Inference to Deploy Local LLMs - in 3 steps!

## 🤖 Installing and Running Xorbits Inference (1/3)

i. Run `pip install "xinference[all]"` in a terminal window.

ii. After the installation completes, restart this Jupyter notebook.

iii. Run `xinference` in a new terminal window.

iv. You should see output similar to the following:

```
INFO:xinference:Xinference successfully started. Endpoint: http://127.0.0.1:9997
INFO:xinference.core.service:Worker 127.0.0.1:21561 has been added successfully
INFO:xinference.deploy.worker:Xinference worker successfully started.
```

v. In the endpoint description, locate the port number after the colon. In the case above it is `9997`.

vi. Paste the endpoint port number into the following cell.
```python
port = 9997  # replace with your endpoint port number
```
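Before launching a model, it can help to confirm that the server is actually reachable. This is a minimal sketch, not part of the original notebook; it assumes the xinference server exposes its RESTful model-listing route at `/v1/models`.

```python
# Optional sanity check (not in the original notebook): verify the
# xinference server responds before launching a model.
import requests

endpoint = f"http://localhost:{port}"
resp = requests.get(f"{endpoint}/v1/models")  # assumed model-listing route
print(resp.status_code)  # 200 means the server is up
```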
## 🚀 Downloading and Launching Local Models (2/3)

In this step, simply run the following code blocks. Feel free to change the model configuration for different experiences! The latest list of supported models can be found on Xorbits Inference's [official GitHub page](https://github.com/xorbitsai/inference/blob/main/README.md).

Here are the parameter options for vicuna-v1.3, ordered from the least space-consuming to the most resource-intensive but best-performing:

model_size_in_billions: `7`, `13`, `33`

quantization: `q2_K`, `q3_K_L`, `q3_K_M`, `q3_K_S`, `q4_0`, `q4_1`, `q4_K_M`, `q4_K_S`, `q5_0`, `q5_1`, `q5_K_M`, `q5_K_S`, `q6_K`, `q8_0`

Here are a few of the supported models:

| Name          | Type             | Language | Format | Size (in billions) | Quantization                            |
|---------------|------------------|----------|--------|--------------------|-----------------------------------------|
| baichuan      | Foundation Model | en, zh   | ggmlv3 | 7                  | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0'  |
| llama-2-chat  | RLHF Model       | en       | ggmlv3 | 7, 13, 70          | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0'  |
| chatglm       | SFT Model        | en, zh   | ggmlv3 | 6                  | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'  |
| chatglm2      | SFT Model        | en, zh   | ggmlv3 | 6                  | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'  |
| wizardlm-v1.0 | SFT Model        | en       | ggmlv3 | 7, 13, 33          | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0'  |
| wizardlm-v1.1 | SFT Model        | en       | ggmlv3 | 13                 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0'  |
| vicuna-v1.3   | SFT Model        | en       | ggmlv3 | 7, 13              | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0'  |

To achieve satisfactory results, models of at least 13 billion parameters are recommended.
```python
# If Xinference cannot be imported, you may need to restart this Jupyter notebook
from llama_index import (
    ListIndex,
    TreeIndex,
    VectorStoreIndex,
    KeywordTableIndex,
    KnowledgeGraphIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.llms import Xinference
from xinference.client import RESTfulClient
from IPython.display import Markdown, display
```
```python
# Define a client to send commands to xinference
client = RESTfulClient(f"http://localhost:{port}")

# Download and launch a model; this may take a while the first time
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_size_in_billions=7,
    model_format="ggmlv3",
    quantization="q2_K",
    n_ctx=4096,
)

llm = Xinference(endpoint=f"http://localhost:{port}", model_uid=model_uid)
service_context = ServiceContext.from_defaults(llm=llm)
```
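If more memory is available, the same call can launch one of the larger configurations recommended above. A hypothetical variant, using values from the vicuna-v1.3 parameter options listed earlier:

```python
# Hypothetical alternative: a 13B vicuna-v1.3 model with a mid-range
# quantization, per the parameter options listed above.
model_uid = client.launch_model(
    model_name="vicuna-v1.3",
    model_size_in_billions=13,
    model_format="ggmlv3",
    quantization="q4_K_M",
    n_ctx=4096,
)
```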
## 🕺 Index the Data and Start Chatting! (3/3)

In this step, simply run the following code blocks. Feel free to change the index that is used for different experiences. A list of all available indexes can be found in LlamaIndex's [official docs](https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/index/modules.html).

Here are some of the available indexes that were imported above:

`ListIndex`, `TreeIndex`, `VectorStoreIndex`, `KeywordTableIndex`, `KnowledgeGraphIndex`

The following code uses `VectorStoreIndex`. To change the index, simply replace its name with that of another index (a sketch of such a swap follows the next code block).
```python
# create the index from the data
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

# change the index name in the following line
index = VectorStoreIndex.from_documents(
    documents=documents, service_context=service_context
)
```
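As noted above, swapping the index type is just a matter of replacing the class name. A sketch, reusing the same `documents` and `service_context`:

```python
# Sketch: build a summary-oriented ListIndex over the same documents
# instead of a VectorStoreIndex.
index = ListIndex.from_documents(
    documents=documents, service_context=service_context
)
```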
```python
# ask a question and display the answer
query_engine = index.as_query_engine()

question = "What did the author do after his time at Y Combinator?"

response = query_engine.query(question)
display(Markdown(f"<b>{response}</b>"))
```
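The same query engine can serve follow-up questions. A small sketch (the questions are illustrative, not from the notebook):

```python
# Ask a few more questions with the same query engine.
for q in [
    "What did the author work on before college?",
    "Why did the author start Y Combinator?",
]:
    display(Markdown(f"<b>{q}</b>"))
    display(Markdown(str(query_engine.query(q))))
```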

llama_index/llms/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -19,6 +19,7 @@
 from llama_index.llms.palm import PaLM
 from llama_index.llms.predibase import PredibaseLLM
 from llama_index.llms.replicate import Replicate
+from llama_index.llms.xinference import Xinference
 
 __all__ = [
     "OpenAI",
@@ -40,4 +41,5 @@
     "CompletionResponseGen",
     "CompletionResponseAsyncGen",
     "LLMMetadata",
+    "Xinference",
 ]
```
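With this export in place, the new class is importable directly from `llama_index.llms`. A minimal sketch, assuming a local xinference server and an already-launched model (the `model_uid` value below is a placeholder, not from the commit):

```python
from llama_index.llms import Xinference

# endpoint and model_uid come from your running xinference server;
# the UID here is a placeholder.
llm = Xinference(endpoint="http://localhost:9997", model_uid="<your-model-uid>")
print(llm.complete("Hello!").text)
```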
