docs: update server streaming mode documentation #9519
ngxson merged 1 commit into ggml-org:master from CentricStorm:patch-2
Conversation
Updated the streaming mode example script with split data handling, which has been tested with these unit tests: Avoided using Node.js
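For context, "split data handling" means that a single network read from the streaming endpoint may end in the middle of an SSE event, so incomplete data must be buffered until the blank-line separator arrives. The sketch below illustrates the general technique in Python; it is not the script from this PR, and the function and variable names are illustrative.

```python
def parse_sse_chunks(chunks):
    """Yield complete SSE `data:` payloads from possibly-split chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # SSE events are separated by a blank line. Anything after the
        # last separator may still be incomplete, so it stays buffered
        # until a later chunk completes it.
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            for line in event.split("\n"):
                if line.startswith("data: "):
                    yield line[len("data: "):]

# Example: the first event is split across two reads.
chunks = ['data: {"content": "Hel', 'lo"}\n\ndata: [DONE]\n\n']
print(list(parse_sse_chunks(chunks)))
# → ['{"content": "Hello"}', '[DONE]']
```

A naive parser that treats each chunk as a whole event would mis-handle the first read above; buffering across reads is what makes the client robust.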
Btw, have you been able to test it with the latest version on
It seems like #9459 only added the |
Example script still works with b4291 (ce8784b), but changed |
On second thought, I think it's not a good idea to add this to our documentation. Because we already follow the SSE standard (except for the POST method), client code should be trivial to implement.
The documentation should be reserved for things that can only be found in llama.cpp and not on the internet.
In this case, the code you provided is the same as the OpenAI implementation (because they also use the SSE + POST method), and there are many libraries on npm that can handle this (for example, this). So adding it here brings no additional info to the docs, while adding maintenance cost in the future.
Removed example code.
Suggestions implemented.
Provide more documentation for streaming mode.
Server documentation:
n_predict in existing non-streamed example script (because on some computers, 512 tokens can take a long time)