Commit d6ec04d

XinyaoWa authored and pre-commit-ci[bot] committed
Add timeout param for DocSum and FaqGen to deal with long context (opea-project#1329)
* Add timeout param for DocSum and FaqGen to deal with long context

  Make timeout param configurable; solves issue opea-project/GenAIExamples#1481

  Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  For more information, see https://pre-commit.ci

---------

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>
1 parent 523eca1 commit d6ec04d

File tree

10 files changed, +17 -10 lines changed

comps/cores/proto/api_protocol.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -195,6 +195,7 @@ class ChatCompletionRequest(BaseModel):
     # top_p: Optional[float] = None # Priority use openai
     typical_p: Optional[float] = None
     # repetition_penalty: Optional[float] = None
+    timeout: Optional[int] = None

     # doc: begin-chat-completion-extra-params
     echo: Optional[bool] = Field(
```
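The new field follows the existing optional-parameter pattern on `ChatCompletionRequest`: omitted means `None`, so existing clients are unaffected and each backend supplies its own fallback. A standalone sketch of that shape (a toy stand-in, not the actual Pydantic class):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestSketch:
    """Toy stand-in for ChatCompletionRequest's optional timeout field."""
    messages: str
    timeout: Optional[int] = None  # seconds; None means "use the backend default"

# Omitting the field leaves it None, so older request bodies still validate.
print(RequestSketch(messages="hi").timeout)               # None
print(RequestSketch(messages="hi", timeout=200).timeout)  # 200
```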

comps/llms/src/doc-summarization/README.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -133,6 +133,8 @@ curl http://${your_ip}:9000/v1/docsum \

 "summary_type" is set to be "auto" by default, in this mode we will check input token length, if it exceed `MAX_INPUT_TOKENS`, `summary_type` will automatically be set to `refine` mode, otherwise will be set to `stuff` mode.

+With long contexts, request may get canceled due to its generation taking longer than the default `timeout` value (120s for TGI). Increase it as needed.
+
 **summary_type=stuff**

 In this mode LLM generate summary based on complete input text. In this case please carefully set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` according to your model and device memory, otherwise it may exceed LLM context limit and raise error when meet long context.
@@ -157,7 +159,7 @@ In this mode, default `chunk_size` is set to be `min(MAX_TOTAL_TOKENS - input.ma
 ```bash
 curl http://${your_ip}:9000/v1/docsum \
   -X POST \
-  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false}' \
+  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false, "timeout":200}' \
   -H 'Content-Type: application/json'
 ```
@@ -170,6 +172,6 @@ In this mode, default `chunk_size` is set to be `min(MAX_TOTAL_TOKENS - 2 * inpu
 ```bash
 curl http://${your_ip}:9000/v1/docsum \
   -X POST \
-  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000}' \
+  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000, "timeout":200}' \
   -H 'Content-Type: application/json'
 ```
````
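The curl calls documented above can equally be issued from Python. A minimal sketch using only the standard library (the endpoint URL and the `docsum_payload` helper are illustrative, not part of the repo):

```python
import json
from urllib import request

def docsum_payload(text: str, timeout_s: int = 200) -> dict:
    """Build a DocSum request body with an explicit generation timeout."""
    return {
        "messages": text,
        "max_tokens": 32,
        "language": "en",
        "summary_type": "refine",
        "chunk_size": 2000,
        "timeout": timeout_s,  # forwarded to the serving backend
    }

def post_docsum(url: str, payload: dict) -> bytes:
    """POST the payload as JSON; requires a running DocSum service."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()

# Example (needs a live service):
# post_docsum("http://localhost:9000/v1/docsum", docsum_payload("long document text..."))
```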

comps/llms/src/doc-summarization/integrations/tgi.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -70,6 +70,7 @@ async def invoke(self, input: DocSumChatCompletionRequest):
             temperature=input.temperature if input.temperature else 0.01,
             repetition_penalty=input.repetition_penalty if input.repetition_penalty else 1.03,
             streaming=input.stream,
+            timeout=input.timeout if input.timeout is not None else 120,
             server_kwargs=server_kwargs,
             task="text-generation",
         )
```

comps/llms/src/doc-summarization/integrations/vllm.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -63,6 +63,7 @@ async def invoke(self, input: DocSumChatCompletionRequest):
             top_p=input.top_p if input.top_p else 0.95,
             streaming=input.stream,
             temperature=input.temperature if input.temperature else 0.01,
+            request_timeout=float(input.timeout) if input.timeout is not None else None,
         )
         result = await self.generate(input, self.client)
```
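The TGI and vLLM integrations above resolve the optional field the same way, differing only in the fallback: the TGI path substitutes 120 seconds, while the vLLM path passes `None` through (deferring to the client library's default). A standalone sketch of the two resolutions, mirroring the one-line conditionals in the diffs:

```python
from typing import Optional

def tgi_timeout(timeout: Optional[int]) -> int:
    # Mirrors: timeout=input.timeout if input.timeout is not None else 120
    return timeout if timeout is not None else 120

def vllm_request_timeout(timeout: Optional[int]) -> Optional[float]:
    # Mirrors: request_timeout=float(input.timeout) if input.timeout is not None else None
    return float(timeout) if timeout is not None else None

print(tgi_timeout(None), tgi_timeout(200))                     # 120 200
print(vllm_request_timeout(None), vllm_request_timeout(200))   # None 200.0
```

Note that `timeout=0` is falsy but not `None`, which is why both sites test `is not None` rather than truthiness.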

comps/llms/src/faq-generation/integrations/tgi.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -67,6 +67,7 @@ async def invoke(self, input: ChatCompletionRequest):
             temperature=input.temperature if input.temperature else 0.01,
             repetition_penalty=input.repetition_penalty if input.repetition_penalty else 1.03,
             streaming=input.stream,
+            timeout=input.timeout if input.timeout is not None else 120,
             server_kwargs=server_kwargs,
         )
         result = await self.generate(input, self.client)
```

comps/llms/src/faq-generation/integrations/vllm.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -60,6 +60,7 @@ async def invoke(self, input: ChatCompletionRequest):
             top_p=input.top_p if input.top_p else 0.95,
             streaming=input.stream,
             temperature=input.temperature if input.temperature else 0.01,
+            request_timeout=float(input.timeout) if input.timeout is not None else None,
         )
         result = await self.generate(input, self.client)
```

tests/llms/test_llms_doc-summarization_tgi.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -125,15 +125,15 @@ function validate_microservices() {
         'text' \
         "docsum-tgi" \
         "docsum-tgi" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false, "timeout":200}'

     echo "Validate refine mode..."
     validate_services \
         "$URL" \
         'text' \
         "docsum-tgi" \
         "docsum-tgi" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000, "timeout":200}'
 }

 function stop_docker() {
```

tests/llms/test_llms_doc-summarization_tgi_on_intel_hpu.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -126,15 +126,15 @@ function validate_microservices() {
         'text' \
         "docsum-tgi-gaudi" \
         "docsum-tgi-gaudi" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false, "timeout":200}'

     echo "Validate refine mode..."
     validate_services \
         "$URL" \
         'text' \
         "docsum-tgi-gaudi" \
         "docsum-tgi-gaudi" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000, "timeout":200}'
 }

 function stop_docker() {
```

tests/llms/test_llms_doc-summarization_vllm.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -140,15 +140,15 @@ function validate_microservices() {
         'text' \
         "docsum-vllm" \
         "docsum-vllm" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false, "timeout":200}'

     echo "Validate refine mode..."
     validate_services \
         "$URL" \
         'text' \
         "docsum-vllm" \
         "docsum-vllm" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000, "timeout":200}'
 }

 function stop_docker() {
```

tests/llms/test_llms_doc-summarization_vllm_on_intel_hpu.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -139,15 +139,15 @@ function validate_microservices() {
         'text' \
         "docsum-vllm-gaudi" \
         "docsum-vllm-gaudi" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "map_reduce", "chunk_size": 2000, "stream":false, "timeout":200}'

     echo "Validate refine mode..."
     validate_services \
         "$URL" \
         'text' \
         "docsum-vllm-gaudi" \
         "docsum-vllm-gaudi" \
-        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000}'
+        '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "summary_type": "refine", "chunk_size": 2000, "timeout":200}'
 }

 function stop_docker() {
```
