Add vllm support #202
Conversation
@meanderingstream AMAZING WORK!!! Thanks so much Scott!
Pull Request Test Coverage Report for Build 5c2674cd99687ab3f3fd03b17cb48c3983e38f8b-PR-202

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

💛 - Coveralls
Is there something I need to do to fix the failing CI? The log looks like it was an infrastructure issue.

@meanderingstream there's a small test flakiness issue - don't worry about it
@mikehostetler It looks like this PR is going to be dead code. The reason I wrote the code this way is that when there are two or more vLLM models running on the same node, they run on different ports, something like localhost:8000 and localhost:8001. I think the key decision is whether the LLMDB.Model will get a base_url like the code in this PR, or whether we are just going to have an LLMDB configuration with multiple entries, one for each port, something like local1: ...., base_url: localhost:8000 ... If you decide to add base_url to Model, then it will probably look like the following:

```elixir
%LLMDB.Model{
  id: "gpt-4o-mini",
  provider: :openai,
  name: "GPT-4o mini",
  family: "gpt-4o",
  limits: %{context: 128_000, output: 16_384},
  cost: %{input: 0.15, output: 0.60},
  base_url: "http://localhost:8002",
  capabilities: %{
    chat: true,
    tools: %{enabled: true, streaming: true},
    json: %{native: true, schema: true},
    streaming: %{text: true, tool_calls: true}
  },
  tags: [],
  deprecated?: false,
  aliases: [],
  extra: %{}
}
```

If so, then ReqLLM.Provider.Options#inject_base_url_from_registry/3 would need to first use the model.base_url and then continue with the existing code. Something like this is shown in the Options#inject_base_url_from_registry of this PR. Note: I didn't see that Options#effective_base_url did anything useful. When I tried to make the changes in effective_base_url and then call effective_base_url in the inject_base...., I broke a whole bunch of model tests. I'm not sure why there are two methods with different implementations. Based on your follow-up comments, I would be happy to make these changes in the two libraries in order to get vLLM support to work. Please let me know if you want me to create the two PRs.
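As a rough sketch of that precedence (the module and function below are illustrative, not the PR's actual implementation; only the "model base_url first, then the existing lookup" idea comes from the discussion above):

```elixir
defmodule BaseUrlPrecedenceSketch do
  @moduledoc false

  # Hypothetical helper: prefer a base_url carried by the model itself,
  # otherwise fall back to whatever the existing registry/provider lookup
  # would have produced.
  def resolve_base_url(%{base_url: url}, _registry_default) when is_binary(url) and url != "" do
    url
  end

  def resolve_base_url(_model, registry_default), do: registry_default
end

# Usage with the struct shown above (other fields elided):
# BaseUrlPrecedenceSketch.resolve_base_url(
#   %{base_url: "http://localhost:8002"},
#   "https://api.openai.com/v1"
# )
# #=> "http://localhost:8002"
```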
@meanderingstream Even with the conflicts, we can probably salvage this PR - the changes to integrate LLMDB were pretty straightforward. I am happy to help with the transition. Regarding adding
Pull Request
Adds a vLLM provider that supports self-hosted models.
Because the models are self-hosted, the user must configure and start a vLLM service before making ReqLLM calls against the service URL. In ReqLLM there are some differences between publicly hosted models and self-hosted models: self-hosted models need a way to configure the location of the provider's base_url. Since the model service is not publicly hosted, the ReqLLM Catalog capability was added to support run-time configuration of providers like vLLM.
However, vLLM doesn't host multiple models on the same port, so this PR also adds a Model base_url field that overrides the Provider base_url (and port) when setting up the HTTP request.
The capability was tested against locally hosted models. The following small models were used to verify chat generation, chat streaming, and chat with an image: HuggingFaceTB/SmolLM2-135M-Instruct and HuggingFaceTB/SmolVLM-256M-Instruct.
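For context, a hedged sketch of how such a model might be exercised once configured (the "vllm:smollm2-135m" spec is a hypothetical example, and generate_text is assumed to be ReqLLM's text-generation entry point; this is not taken from the PR's tests):

```elixir
# Assumes a locally running vLLM service and a catalog entry for the
# hypothetical "vllm:smollm2-135m" model spec.
{:ok, response} = ReqLLM.generate_text("vllm:smollm2-135m", "Say hello in one sentence.")
IO.inspect(response)
```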
By default, vLLM uses the OPENAI_API_KEY environment variable. A value must be present, but by default it isn't validated.
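A minimal sketch of this, assuming a placeholder value is sufficient (the key string below is arbitrary):

```elixir
# Placeholder value: by default vLLM only requires that the variable is set,
# not that it is a valid OpenAI key.
System.put_env("OPENAI_API_KEY", "local-vllm-placeholder")
```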
For ReqLLM users who want to try other models, I've used the following approach to deploy local models.
Set up vLLM:
I have direct experience running models on a Linux server with a small, 10 GB VRAM RTX 3080 GPU. In addition to installing vLLM as a Python program, it is possible to deploy vLLM as a Docker service; there is also a Docker configuration for CPU serving in the vLLM codebase. Please refer to the vLLM documentation for further details on how to set up a service.
I found it useful to specify the --served-model-name model_name_you_choose parameter. The model_name_you_choose becomes the model_id in a ReqLLM catalog configuration.
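For illustration (the names below are hypothetical, and the "vllm:" spec prefix assumes the provider id added by this PR):

```elixir
# Hypothetical mapping: if vLLM was started with --served-model-name smollm2-135m,
# then "smollm2-135m" is the model_id used on the ReqLLM side, typically as the
# model part of a "provider:model" spec string.
served_model_name = "smollm2-135m"
model_spec = "vllm:" <> served_model_name
IO.puts(model_spec)
# => vllm:smollm2-135m
```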
Set up ReqLLM:
In a client Elixir application, configure the models in the environment configuration files, e.g. dev.exs.
Here is an example configuration:
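A sketch of what such a configuration might look like; the config keys below are assumptions for illustration, not the PR's actual schema (the field names follow the LLMDB.Model struct shown earlier):

```elixir
# config/dev.exs -- illustrative only
import Config

config :req_llm,
  vllm_models: [
    %{
      id: "smollm2-135m",                # the --served-model-name value
      provider: :vllm,
      base_url: "http://localhost:8000"  # where the local vLLM service listens
    }
  ]
```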
Testing a local vLLM model using a local clone of the ReqLLM project is currently a little tricky. From the project root folder, the lib/examples/scripts or iex -S mix will work when using the following steps. The catalog_allow.exs needs to have the locally hosted models listed in the vllm_models configuration.
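For illustration, that might look something like the following (hypothetical entries; the actual file format in the repository may differ):

```elixir
# catalog_allow.exs -- hypothetical sketch: list the locally served model specs
# under the vllm_models configuration referenced above.
[
  vllm_models: [
    "vllm:smollm2-135m",
    "vllm:smolvlm-256m"
  ]
]
```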
Additional notes:
I've found that vllm_test.exs is a little flaky when running locally. It might be because the test models are defined and then excluded in the model JSON files. Running the tests multiple times will demonstrate that they pass.
Type of Contribution
Checklist
- mix test
- mix quality

If Provider Changes
- mix mc "provider:*" --record
- mix mc "provider:*"

Model Compatibility Output:
Related Issues
Closes #