
Conversation


@meanderingstream meanderingstream commented Nov 9, 2025

Pull Request

Adds a vLLM provider that supports self-hosted models.

Because the models are self-hosted, the user must configure and start a vLLM service before making ReqLLM calls against the service URL. In ReqLLM, there are some differences between publicly hosted models and self-hosted models: self-hosted models need a way to configure the Provider base_url of the self-hosted service. Since the model service is not publicly hosted, the ReqLLM Catalog capability was added to support run-time configuration of providers like vLLM.

However, vLLM doesn't serve multiple models on the same port. This PR therefore also adds a Model base_url field that overrides the Provider base_url (and port) when the HTTP request is set up.
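
For illustration, here is a hedged sketch of what a model entry with its own base_url might look like in the catalog (the full catalog format is shown further below; the model-level field name is assumed from this description):

%{
  id: "SmolVLM-256M-Instruct",
  name: "SmolVLM 256M Instruct",
  api: "chat",
  # Assumed field: points at a second vLLM instance on port 8001 and
  # overrides the provider-level base_url (http://localhost:8000/v1).
  base_url: "http://localhost:8001/v1"
}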

The capability was tested against locally hosted models. The following small models were used to verify chat generation, chat streaming, and chat with an image: HuggingFaceTB/SmolLM2-135M-Instruct and HuggingFaceTB/SmolVLM-256M-Instruct.

By default, vLLM uses the OPENAI_API_KEY environment variable. A value must be present, but by default it isn't validated.
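
Since the value isn't validated by default, any placeholder works for local testing. For example (a dev-only convenience sketch, assuming the variable isn't already exported in your shell):

# vLLM only requires OPENAI_API_KEY to be present, not valid, by default.
System.put_env("OPENAI_API_KEY", "not-used-by-local-vllm")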

For ReqLLM users who want to try other models, here is how I've deployed models locally.

Set up vLLM:
I have direct experience running models on a Linux server with a small 10 GB VRAM RTX 3080 GPU. In addition to installing vLLM as a Python package, it is possible to deploy vLLM as a Docker service, and the vLLM codebase includes a Docker configuration for CPU serving. Please refer to the vLLM documentation for details on how to set up a service.

I found it useful to specify the --served-model-name model_name_you_choose parameter. The model_name_you_choose becomes the model_id in a ReqLLM catalog configuration.

Set up ReqLLM:
In a client Elixir application, configure the models in the environment config files (dev.exs, etc.).

Here is an example configuration:

import Config

config :req_llm, :catalog,
  allow: %{
    vllm: ["SmolLM2-135M-Instruct"]
  },
  overrides: [],
  custom: [
    %{
      provider: %{
        id: "vllm",
        name: "vLLM Local",
        base_url: "http://localhost:8000/v1",
        env: ["OPENAI_API_KEY"]
      },
      models: [
        %{
          id: "SmolLM2-135M-Instruct",
          name: "SmolLM2 135M Instruct",
          api: "chat",
          modalities: %{
            "input" => ["text"],
            "output" => ["text"]
          },
          limit: %{
            "context" => 8192
          },
          cost: %{
            "input" => 0.0,
            "output" => 0.0
          }
        }
      ]
    }
  ]
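
With that catalog loaded and the vLLM server running, a call looks roughly like the following. This is a sketch; check the exact ReqLLM function names (e.g. generate_text/2 and text/1) against the version you are using:

# Assumes the catalog config above is compiled in and vLLM is listening
# on http://localhost:8000/v1.
{:ok, response} = ReqLLM.generate_text("vllm:SmolLM2-135M-Instruct", "Say hello in one sentence.")
IO.puts(ReqLLM.text(response))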

Testing a local vLLM model from a local clone of the ReqLLM project is currently a little tricky. From the project root folder, the lib/examples/scripts examples or iex -S mix will work with the following step: catalog_allow.exs needs to list the locally hosted models in the vllm_models configuration, i.e.

vllm_models = ~w(
  SmolLM2-135M-Instruct
  SmolVLM-256M-Instruct
)

Additional notes:
I've found that vllm_test.exs is a little flaky when run locally. It might be because the test models are defined and then excluded in the model JSON files. Running the tests multiple times will demonstrate that they pass.

Type of Contribution

  • Core Library - Changes to core modules or data structures
  • New Provider - Adding a new LLM provider
  • Provider Feature - Adding capabilities to existing provider
  • Bug Fix - Fixing existing functionality
  • Documentation - Docs/guides only

Checklist

  • Tests pass (mix test)
  • Quality checks pass (mix quality)
  • Documentation updated

If Provider Changes

  • Fixtures generated (mix mc "provider:*" --record)
  • Model compatibility passes (mix mc "provider:*")

Model Compatibility Output:

# Paste output if provider changes

Related Issues

Closes #

@mikehostetler
Contributor

@meanderingstream AMAZING WORK!!! Thanks so much Scott!

@coveralls

Pull Request Test Coverage Report for Build 5c2674cd99687ab3f3fd03b17cb48c3983e38f8b-PR-202

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 5 of 5 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 52.821%

Totals:
  • Change from base Build 6fe32b91cd07861e771cebc4b8f3115bcb8219a9: 0.06%
  • Covered Lines: 3998
  • Relevant Lines: 7569

💛 - Coveralls

@meanderingstream
Author

Is there something I need to do to fix the failing CI? The log looks like it was an infrastructure issue.

@mikehostetler
Contributor

@meanderingstream there's a small test flakiness issue - don't worry about it

@meanderingstream
Author

meanderingstream commented Nov 18, 2025

@mikehostetler It looks like this PR is going to be dead code. The reason I wrote the code this way is that when two or more vLLM models run on the same node, they run on different ports, something like localhost:8000 and localhost:8001.

I think the key decision is whether LLMDB.Model will get a base_url like the code in this PR, or whether we are just going to have an LLMDB configuration with multiple entries, one for each port, something like local1: ...., base_url: localhost:8000 .... and local2: ...., base_url: localhost:8001 ...., roughly as sketched below.
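
To illustrate that second option only (reusing the ReqLLM catalog shape from the PR description above; the actual LLMDB configuration format may differ), the per-port entries would look roughly like:

# Hypothetical: one provider entry per vLLM instance/port, no model-level base_url.
custom: [
  %{
    provider: %{id: "vllm_local1", name: "vLLM :8000", base_url: "http://localhost:8000/v1", env: ["OPENAI_API_KEY"]},
    models: [%{id: "SmolLM2-135M-Instruct", api: "chat"}]
  },
  %{
    provider: %{id: "vllm_local2", name: "vLLM :8001", base_url: "http://localhost:8001/v1", env: ["OPENAI_API_KEY"]},
    models: [%{id: "SmolVLM-256M-Instruct", api: "chat"}]
  }
]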

If you decide to add base_url to Model, then it will probably look like the following:

%LLMDB.Model{
  id: "gpt-4o-mini",
  provider: :openai,
  name: "GPT-4o mini",
  family: "gpt-4o",
  limits: %{context: 128_000, output: 16_384},
  cost: %{input: 0.15, output: 0.60},
  base_url: "http://localhost:8002",
  capabilities: %{
    chat: true,
    tools: %{enabled: true, streaming: true},
    json: %{native: true, schema: true},
    streaming: %{text: true, tool_calls: true}
  },
  tags: [],
  deprecated?: false,
  aliases: [],
  extra: %{}
}

If so, then ReqLLM.Provider.Options#inject_base_url_from_registry/3 would need to use model.base_url first and then continue with the existing code. Something like this is shown in Options#inject_base_url_from_registry in this PR.
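
For reference, the precedence I have in mind is just the following (a hypothetical helper for illustration; the real inject_base_url_from_registry/3 has a different signature and also merges user-supplied options):

defmodule BaseUrlPrecedence do
  # A non-empty model-level base_url wins...
  def resolve(%{base_url: url}, _provider_base_url) when is_binary(url) and url != "",
    do: url

  # ...otherwise fall back to the provider-level base_url.
  def resolve(_model, provider_base_url), do: provider_base_url
end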

Note: I didn't see that Options#effective_base_url did anything useful. When I tried making the changes in effective_base_url and then calling effective_base_url from inject_base...., I broke a whole bunch of model tests. I'm not sure why there are two functions with different implementations.

Based upon your follow-up comments, I would be happy to make these changes in the two libraries in order to get vLLM support to work.

Please let me know if you want me to create the two PRs.

@mikehostetler
Contributor

@meanderingstream Even with the conflicts, we can probably salvage this PR - the changes to integrate LLMDB were pretty straightforward. I am happy to help with the transition.

Regarding adding base_url to %Model - I'm fine adding it as an optional field in LLMDB; when it's not defined or is null, the Provider base URL takes over.
