Skip to content

Add a FastAPI app #113

Merged
juliendenize merged 36 commits intomainfrom
improve_llama_cpp_integration
Jul 25, 2025
Merged

Add a FastAPI app #113
juliendenize merged 36 commits intomainfrom
improve_llama_cpp_integration

Conversation

@juliendenize
Copy link
Copy Markdown
Contributor

@juliendenize juliendenize commented Jul 1, 2025

This PR adds a FastAPI app server to mistral-common.

The features include:

  • tokenizing a prompt, list of messages, or chat completion requests (ours and openai)
  • detokenize a list of tokens
  • apply a "chat template". We actually don't have chat templates in mistral-common but a dedicated route will return the same string as a chat template would. Not gonna do that as discussed on Slack because might be prone to issues (we should tokenize always as ints)

This should improve integrations with some LLM inference backends such as llama.cpp.

Edit:
Now also supports tool call parsing via the detokenize route when passing to the request as_message=True

@juliendenize juliendenize self-assigned this Jul 1, 2025
@juliendenize juliendenize force-pushed the improve_llama_cpp_integration branch from e85ebee to ec07ed7 Compare July 23, 2025 14:35
Copy link
Copy Markdown
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits for docs

@juliendenize juliendenize merged commit 10b44c0 into main Jul 25, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants