Releases: mostlygeek/llama-swap

v171

29 Oct 07:12
a89b803

This release adds a feature that shows model loading progress in the response's reasoning content. When enabled in the config, llama-swap streams a bit of data so there is no silence while waiting for a model to swap and load.

  • Add a new global config setting: sendLoadingState: true
  • Add a new model override setting: model.sendLoadingState: true to control it on a per-model basis (see the sketch below)
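
A minimal config sketch, assuming a YAML config with a models section and that the per-model override sits under the model entry; the model name and command are placeholders, not real defaults:

sendLoadingState: true                                # global: stream loading state for all models

models:
  big-model:                                          # placeholder model name
    cmd: llama-server --port ${PORT} -m /models/big-model.gguf
    sendLoadingState: false                           # per-model override, disables it for this model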

Demo:

llama-swap-issue-366.mp4

Thanks to @ServeurpersoCom for the very cool idea!

Changelog

  • a89b803 Stream loading state when swapping models (#371)

v170

26 Oct 03:44
f852689

Fixes a bug where a panic() could cause llama-swap to lock up or exit. Updating is recommended.

Changelog

  • f852689 proxy: add panic recovery to Process.ProxyRequest (#363)

v169

26 Oct 00:41
e250e71

This update adds usage tracking for API calls made to POST /upstream/{model}/{api}. Chats in the llama-server UI now show up in the Activities tab, and any request to this endpoint that includes usage or timing info will appear there (infill, embeddings, etc.).

Changelog

  • e250e71 Include metrics from upstream chat requests (#361)
  • d18dc26 cmd/wol-proxy: tweak logs to show what is causing wake ups (#356)

v168

24 Oct 05:25
8357714

Changelog

  • 8357714 ui: fix avg token/sec calculation on models page (#357)

Averages were replaced with percentiles and a histogram.

v167

21 Oct 03:57
c07179d

This release adds cmd/wol-proxy, a Wake-on-LAN proxy for llama-swap. If llama-swap runs on a server with high idle power draw that suspends after a period of inactivity, wol-proxy will automatically wake that server and then reverse proxy requests to it.

It's a niche use case, but it should save a lot of energy otherwise wasted by idle GPUs.

Changelog

  • c07179d cmd/wol-proxy: add wol-proxy (#352)
  • 7ff5063 Update README for setup instructions clarity [skip ci]
  • 9fc0431 Clean up and Documentation (#347) [skip ci]

v166

16 Oct 02:35
6516532

This release includes support for TLS certificates from contributor @dwrz!

To use it:

./llama-swap --tls-cert-file /path/to/cert.pem --tls-key-file /path/to/key.pem ...

To generate a self-signed certificate:

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

Changelog

v165

11 Oct 19:19
5392783

Changelog

v164

07 Oct 06:00
00b738c

Changelog

v163

05 Oct 03:06
70930e4

This release includes two new features:

  • model macros (#330): macros can now be defined as part of a model's configuration. These take precedence over macros defined at the global level.
  • model metadata (#333): metadata can now be defined in a model's configuration. This is a schema-less object that supports integers, floats, booleans, strings, arrays, and child objects. Metadata fields also support macro substitution. Metadata is only exposed through the v1/models endpoint under a new JSON key: meta.llamaswap.

Other smaller changes:

  • macro values can now be integers, floats, booleans, or strings; previously they could only be strings. This makes JSON encoding of metadata that uses macros behave as expected (see the config sketch below).
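
A minimal sketch of how these two features might look together in a YAML config, assuming the macros, metadata, models, and cmd keys from llama-swap's config; the model name, command, macro name, and metadata fields are made up for illustration:

macros:
  ctx: 8192                     # global macro

models:
  qwen-7b:                      # placeholder model name
    cmd: llama-server --port ${PORT} -m /models/qwen-7b.gguf -c ${ctx}
    macros:
      ctx: 32768                # model-level macro, takes precedence over the global value
    metadata:                   # schema-less object, surfaced under meta.llamaswap
      family: qwen
      context_length: ${ctx}    # macro substitution inside a metadata field
      tags: ["chat", "local"]

With a config like this, the model's entry in the v1/models response would carry the metadata object under the meta.llamaswap key.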

Changelog

  • 70930e4 proxy: add support for user defined metadata in model configs (#333)
  • 1f61791 proxy/config: add model level macros (#330)
  • 216c40b proxy/config: create config package and migrate configuration (#329)

v162

25 Sep 23:50
9e3d491

Changelog

  • 9e3d491 proxyToUpstream: add redirect with trailing slash to upstream endpoint (#322)