Resolves #2905 openai compatible model provider add llama.cpp rerank support #2906

Merged
KevinHuSh merged 1 commit into infiniflow:main from ziyu4huang:issue-2905-resolve
Oct 21, 2024

Conversation


@ziyu4huang (Contributor) commented Oct 20, 2024

What problem does this PR solve?

Resolve #2905

Due to inconsistent token sizes, I limit the input to 500 tokens in the code to be safe, since there is no config parameter to control it.

My llama.cpp server is launched with `-ub` set to 1024:

```
${llama_path}/bin/llama-server --host 0.0.0.0 --port 9901 -ub 1024 -ngl 99 -m $gguf_file --reranking "$@"
```
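The 500-token cap described above can be sketched as follows. This is a minimal, hypothetical illustration: `truncate_tokens` and the whitespace tokenizer are stand-ins I made up, not the actual code in this PR (which works inside RAGFlow's OpenAI-compatible rerank model class and uses a real tokenizer):

```python
# Hypothetical sketch: cap each passage at a fixed token budget before
# sending it to the rerank endpoint, since the server's batch size (-ub)
# is not known to the client and oversized prompts can fail.

MAX_RERANK_TOKENS = 500  # hard-coded safety limit; no config param exists


def truncate_tokens(text: str, limit: int = MAX_RERANK_TOKENS) -> str:
    """Crude whitespace split as a stand-in for the real tokenizer."""
    tokens = text.split()
    if len(tokens) <= limit:
        return text
    return " ".join(tokens[:limit])
```

With this in place, every document passed to the reranker is guaranteed to fit well under the server-side batch size, at the cost of ignoring the tail of very long passages.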

Type of change

  • New Feature (non-breaking change which adds functionality)

Here is my test of RAGFlow using llama.cpp:

```
slot update_slots: id  0 | task 458 | prompt done, n_past = 416, n_tokens = 416
slot      release: id  0 | task 458 | stop processing: n_past = 416, truncated = 0
slot launch_slot_: id  0 | task 459 | processing task
slot update_slots: id  0 | task 459 | tokenizing prompt, len = 2
slot update_slots: id  0 | task 459 | prompt tokenized, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 111
slot update_slots: id  0 | task 459 | kv cache rm [0, end)
slot update_slots: id  0 | task 459 | prompt processing progress, n_past = 111, n_tokens = 111, progress = 1.000000
slot update_slots: id  0 | task 459 | prompt done, n_past = 111, n_tokens = 111
slot      release: id  0 | task 459 | stop processing: n_past = 111, truncated = 0
srv  update_slots: all slots are idle
request: POST /rerank 172.23.0.4 200
```
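The `POST /rerank` request in the log above can be exercised by hand. A minimal sketch of building the request payload and picking the winner from the response, assuming llama-server's Jina-style rerank API (a JSON body with `query` and `documents`, and a response whose `results` entries carry `index` and `relevance_score`); the helper names are mine, not RAGFlow's:

```python
import json


def build_rerank_request(query: str, documents: list[str]) -> bytes:
    """JSON payload for llama-server's POST /rerank endpoint (assumed Jina-style schema)."""
    return json.dumps({"query": query, "documents": documents}).encode("utf-8")


def top_document(response: dict, documents: list[str]) -> str:
    """Return the highest-scoring document from a rerank response."""
    best = max(response["results"], key=lambda r: r["relevance_score"])
    return documents[best["index"]]
```

In practice the payload would be sent to `http://host:9901/rerank` with `Content-Type: application/json`, matching the `-ub 1024` server invocation shown earlier.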

@ziyu4huang ziyu4huang changed the title Resolves #2905 Resolves #2905 openai compatible model provider add llama.cpp rerank support Oct 20, 2024
@yingfeng yingfeng added the ci Continue Integration label Oct 21, 2024
@KevinHuSh KevinHuSh merged commit e5f7733 into infiniflow:main Oct 21, 2024
Halfknow pushed a commit to Halfknow/ragflow that referenced this pull request Nov 11, 2024
…pp rerank support (infiniflow#2906)

### What problem does this PR solve?
Resolve infiniflow#2905 



due to the in-consistent of token size, I make it safe to limit 500 in
code, since there is no config param to control

my llama.cpp run set -ub to 1024:

${llama_path}/bin/llama-server --host 0.0.0.0 --port 9901 -ub 1024 -ngl
99 -m $gguf_file --reranking "$@"





### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Here is my test Ragflow use llama.cpp

```
lot update_slots: id  0 | task 458 | prompt done, n_past = 416, n_tokens = 416
slot      release: id  0 | task 458 | stop processing: n_past = 416, truncated = 0
slot launch_slot_: id  0 | task 459 | processing task
slot update_slots: id  0 | task 459 | tokenizing prompt, len = 2
slot update_slots: id  0 | task 459 | prompt tokenized, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 111
slot update_slots: id  0 | task 459 | kv cache rm [0, end)
slot update_slots: id  0 | task 459 | prompt processing progress, n_past = 111, n_tokens = 111, progress = 1.000000
slot update_slots: id  0 | task 459 | prompt done, n_past = 111, n_tokens = 111
slot      release: id  0 | task 459 | stop processing: n_past = 111, truncated = 0
srv  update_slots: all slots are idle
request: POST /rerank 172.23.0.4 200

```

Labels

ci Continue Integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request]: add rerank support to llama.cpp rerank

3 participants