docs: update server streaming mode documentation #9519
ngxson merged 1 commit into ggml-org:master from CentricStorm:patch-2
Conversation
Updated the streaming mode example script with split data handling, which has been tested with these unit tests: Avoided using Node.js
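For context, "split data handling" means that a single network read from the streaming endpoint may end in the middle of an SSE event, so incomplete data must be buffered until the blank-line separator arrives. The sketch below illustrates the general technique in Python; it is not the script from this PR, and the function and variable names are illustrative.

```python
def parse_sse_chunks(chunks):
    """Yield complete SSE `data:` payloads from possibly-split chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # SSE events are separated by a blank line. Anything after the
        # last separator may still be incomplete, so it stays buffered
        # until a later chunk completes it.
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            for line in event.split("\n"):
                if line.startswith("data: "):
                    yield line[len("data: "):]

# Example: the first event is split across two reads.
chunks = ['data: {"content": "Hel', 'lo"}\n\ndata: [DONE]\n\n']
print(list(parse_sse_chunks(chunks)))
# → ['{"content": "Hello"}', '[DONE]']
```

A naive parser that treats each chunk as a whole event would mis-handle the first read above; buffering across reads is what makes the client robust.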
Btw, have you been able to test it with the latest version on
It seems like #9459 only added the |
Example script still works with b4291 (ce8784b), but changed |
On second thought, I think it's not a good idea to add this to our documentation. Because we already follow the SSE standard (except for the POST method), client code should be trivial to implement.
The documentation should be reserved for things that can only be found in llama.cpp and not on the internet.
In this case, the code you provided is the same as the OpenAI implementation (because they also use the SSE + POST method), and there are many libraries on npm that can handle this (for example, this). So adding it here brings no additional info to the docs, while adding maintenance cost in the future.
Removed example code.
Suggestions implemented.
Provide more documentation for streaming mode.
Server documentation:
n_predict in existing non-streamed example script (because on some computers, 512 tokens can take a long time)