Commit 313ebec

Merge pull request sgl-project#101 from sgl-project/97-doc-update-readme

update README.md, make it more friendly

2 parents 2f3f35b + 95eb383

File tree: 1 file changed (+11, −75 lines)

README.md

Lines changed: 11 additions & 75 deletions
@@ -2,7 +2,9 @@
 
 SGL-JAX is a high-performance, JAX-based inference engine for Large Language Models (LLMs), specifically optimized for Google TPUs. It is engineered from the ground up to deliver exceptional throughput and low latency for the most demanding LLM serving workloads.
 
-The engine integrates state-of-the-art techniques to maximize hardware utilization and serving efficiency, making it an ideal solution for deploying large-scale models in production with TPU.
+The engine incorporates state-of-the-art techniques to maximize hardware utilization and serving efficiency, making it ideal for deploying large-scale models in production on TPUs.
+
+[![Pypi](https://img.shields.io/badge/pypi-sglang--jax-orange.svg)](https://pypi.org/project/sglang-jax) [![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](https://github.com/sgl-project/sglang-jax?tab=Apache-2.0-1-ov-file#readme)
 
 ## Key Features
 
@@ -23,84 +25,18 @@ SGL-JAX operates on a distributed architecture designed for scalability and perf
 4. **Model Runner**: Manages the actual JAX-based model execution, including the forward pass, attention computation, and KV cache operations.
 5. **Radix Cache**: A global, memory-efficient KV cache that is shared across all requests, enabling prefix reuse and reducing the memory footprint.
 
-## Quick Start
-
-Follow these steps to get a model server up and running.
-
-### 1. Installation
-
-First, clone the repository and install the necessary dependencies. It is recommended to do this in a virtual environment.
-
-```bash
-git clone https://github.com/your-org/sgl-jax.git
-cd sgl-jax/python
-pip install -e .
-```
-
-### 2. Launch the Server
-
-You can launch the OpenAI-compatible API server using the `sgl_jax.launch_server` module.
-
-```bash
-# Example: Launching a server for Qwen1.5-7B-Chat
-python -m sgl_jax.launch_server \
-    --model-path Qwen/Qwen1.5-7B-Chat \
-    --tp-size 4 \
-    --port 8000 \
-    --host 0.0.0.0
-```
+---
 
-**Key Arguments**:
-* `--model-path`: The path to the model on the Hugging Face Hub or a local directory.
-* `--tp-size`: The number of TPU devices to use for tensor parallelism.
-* `--port`: The port for the API server.
-* `--host`: The host address to bind the server to.
+## Getting Started
 
-### 3. Send a Request
-
-Once the server is running, you can interact with it using any OpenAI-compatible client, such as `curl` or the `openai` Python library.
-
-#### Using `curl`:
-
-```bash
-curl http://localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "Qwen/Qwen1.5-7B-Chat",
-    "messages": [
-      {"role": "system", "content": "You are a helpful assistant."},
-      {"role": "user", "content": "Hello, what is JAX?"}
-    ],
-    "max_tokens": 100,
-    "temperature": 0.7
-  }'
-```
-
-#### Using the `openai` Python client:
-
-```python
-import openai
-
-# Point the client to the local server
-client = openai.OpenAI(
-    api_key="your-api-key",  # Can be any string
-    base_url="http://localhost:8000/v1"
-)
-
-response = client.chat.completions.create(
-    model="Qwen/Qwen1.5-7B-Chat",
-    messages=[
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": "Hello, what is JAX?"}
-    ]
-)
-
-print(response.choices[0].message.content)
-```
+- [Install SGL-JAX](https://github.com/sgl-project/sglang-jax/blob/main/docs/get_started/install.md)
+- [Quick Start](https://github.com/sgl-project/sglang-jax/blob/main/docs/basic_usage/qwen.md)
+- [Benchmark and Profiling](https://github.com/sgl-project/sglang-jax/blob/main/docs/developer_guide/benchmark_and_profiling.md)
+- [Contribution Guide](https://github.com/sgl-project/sglang-jax/blob/main/docs/developer_guide/contribution_guide.md)
 
 ## Documentation
 
-For more features and usage details, please read the documents in the [`docs`](./docs/) directory.
+For more features and usage details, please read the documents in the [`docs`](https://github.com/sgl-project/sglang-jax/tree/main/docs) directory.
 
 ## Supported Models
 
@@ -112,7 +48,7 @@ SGL-JAX is designed for easy extension to new model architectures. It currently
 
 ## Performance and Benchmarking
 
-Performance is a core focus of SGL-JAX. The engine is continuously benchmarked to ensure high throughput and low latency. For detailed performance evaluation and to run the benchmarks yourself, please see the scripts located in the `benchmark/` and `python/sgl_jax/` directories (e.g., `bench_serving.py`).
+For detailed performance evaluation and to run the benchmarks yourself, please see the scripts located in the `benchmark/` and `python/sgl_jax/` directories (e.g., `bench_serving.py`).
 
 ## Testing
 
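The prefix reuse that the README's "Radix Cache" bullet describes can be illustrated with a minimal sketch. This is a toy Python model under stated assumptions, not SGL-JAX's actual implementation: the names `RadixCache`, `match_prefix`, and `insert` are hypothetical, and a plain per-token trie stands in for the real compressed radix tree and KV memory management.

```python
# Toy model of prefix reuse in a radix-style KV cache (illustrative only,
# not the SGL-JAX implementation): requests that share a token prefix
# share trie nodes, so only the unmatched suffix needs a fresh prefill.

class RadixNode:
    def __init__(self):
        self.children = {}   # token id -> RadixNode
        self.kv_slot = None  # stand-in for this token's cached KV entry

class RadixCache:
    def __init__(self):
        self.root = RadixNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Insert a sequence, reusing existing nodes for shared prefixes."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            if node.kv_slot is None:
                node.kv_slot = object()  # placeholder for real KV memory
        return node

cache = RadixCache()
cache.insert([1, 2, 3, 4])                 # first request: full prefill
reused = cache.match_prefix([1, 2, 3, 9])  # second request shares [1, 2, 3]
print(reused)  # -> 3: only the final token needs a new KV entry
```

A real implementation additionally handles eviction and reference counting so shared prefixes are not freed while in use; this sketch shows only the lookup structure.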