Conversation

@njbrake (Contributor) commented Oct 17, 2025

Description

Works with our existing logic!

To set up llama.cpp I ran:

  brew install llama.cpp
  llama-server -hf ggml-org/Qwen3-1.7B-GGUF --jinja --port 8090

Make sure you don't have LLAMA_API_KEY set; otherwise it will enable auth on the llama.cpp server, which we don't want.
Annoyingly, the Meta Llama API and llama.cpp share the same API key environment variable name.
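To illustrate the auth caveat, here is a minimal sketch of how a client might build a request against llama-server's OpenAI-compatible /v1/chat/completions endpoint, attaching a Bearer token only when LLAMA_API_KEY is actually set. The base URL, model name, and helper function are illustrative assumptions, not the actual any-llm implementation.

```python
import json
import os
import urllib.request


def build_chat_request(
    base_url: str = "http://localhost:8090",   # assumed llama-server address
    model: str = "ggml-org/Qwen3-1.7B-GGUF",   # model used in the setup above
) -> urllib.request.Request:
    """Build a chat-completions request for a local llama-server instance."""
    headers = {"Content-Type": "application/json"}

    # llama-server only enforces auth if it was started with an API key,
    # so only send an Authorization header when LLAMA_API_KEY is set.
    api_key = os.environ.get("LLAMA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Hello"}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

With LLAMA_API_KEY unset, the request carries no Authorization header, matching the plain `llama-server` invocation above; setting the variable would make the client send a token the server never expects.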

PR Type

🆕 New Feature

Relevant issues

Checklist

  • I have added unit tests that prove my fix/feature works
  • New and existing tests pass locally
  • Documentation was updated where necessary
  • I have read and followed the contribution guidelines

@njbrake njbrake linked an issue Oct 17, 2025 that may be closed by this pull request
@njbrake njbrake merged commit c6e5ee1 into main Oct 17, 2025
9 checks passed
@njbrake njbrake deleted the 287-llamacpp-completion-api-reasoning-support branch October 17, 2025 15:25
@codecov

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines | Coverage | Δ
src/any_llm/providers/llamacpp/llamacpp.py | 100.00% <100.00%> | (ø)

... and 34 files with indirect coverage changes




Development

Successfully merging this pull request may close these issues.

Llama.cpp Completion API Reasoning Support
