[Bug] Unsloth Studio Context Length Resets on Model Load/Startup #6854

Description

@TheAlex25

Bug Report: Unsloth Studio Context Length Resets on Model Load/Startup

Status: New
Severity: Medium (Affects usability for long-context tasks)
Component: UI / Configuration Management


Summary

The "Run settings" in Unsloth Studio do not persist across sessions or model reloads. Specifically, the Context Length resets to the default value of 4096 tokens every time a model is loaded (manually or automatically), even if it was previously configured and saved at a higher value (e.g., 32768). Interestingly, "Sampling settings" persist correctly across sessions, suggesting the issue is isolated to the Run settings configuration logic.

Environment

  • OS: Ubuntu 26.04
  • Unsloth Studio Version: v0.1.471-beta
  • Package Version: 2026.6.9
  • llama.cpp Version: b9860-mix-c19e218

Steps to Reproduce

  1. Launch Unsloth Studio.
  2. Load a model.
  3. Navigate to Run settings.
  4. Change the Context Length from 4096 to 32768.
  5. Save the configuration, then restart the application or reload the model.
  6. Observe the Context Length setting upon reload.

Expected Behavior

The Context Length should remain at the user-defined value (32768) when a model is loaded manually or via auto-load, matching the behavior of the "Sampling settings."

Actual Behavior

The Context Length reverts to 4096 tokens regardless of previous manual adjustments.

Note: The UI correctly identifies that higher context lengths exceed VRAM and suggests using system RAM for KV cache, but it fails to retain the setting itself.
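For context on why the VRAM warning appears at 32768 but not at 4096, here is a rough back-of-the-envelope estimate of KV cache size. This is a sketch of the standard formula for a transformer with grouped-query attention, not Unsloth Studio's or llama.cpp's actual accounting. The model dimensions (32 layers, 8 KV heads, head dimension 128, 2-byte elements) are assumed purely for illustration.

```python
def kv_cache_bytes(context_length, n_layers, n_kv_heads, head_dim, bytes_per_element=2):
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    All model dimensions passed in are hypothetical example values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_length * bytes_per_element


# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, f16 cache.
for ctx in (4096, 32768):
    size_mib = kv_cache_bytes(ctx, n_layers=32, n_kv_heads=8, head_dim=128) / 1024**2
    print(f"context {ctx}: ~{size_mib:.0f} MiB KV cache")
```

Under these assumed dimensions the cache grows linearly with context length, from roughly 512 MiB at 4096 tokens to roughly 4 GiB at 32768, which is consistent with the UI suggesting system RAM for the KV cache at the higher value.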


Feature Requests & Suggestions

  1. Persistence for Run Settings: Ensure that all "Run settings" (Context Length, KV Cache Dtype, etc.) are saved and restored alongside Sampling settings when a model is initialized.
  2. Per-Model Configuration Profiles: Implement the ability to save specific configurations on a per-model basis. Currently, it appears settings are either global or failing to bind to specific model instances upon reload. This would allow users to have high context for large models and low context/high speed for smaller ones without manual readjustment every time.
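To illustrate what both requests could look like together, here is a minimal sketch of per-model Run settings persistence backed by a JSON file. Everything in it is hypothetical: the `RunSettings` fields, the `RunSettingsStore` class, and the file layout are my own illustration and do not reflect how Unsloth Studio actually stores its configuration.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RunSettings:
    # Hypothetical field names; defaults mirror the 4096 default seen in the UI.
    context_length: int = 4096
    kv_cache_dtype: str = "f16"


class RunSettingsStore:
    """Hypothetical store that keeps one RunSettings profile per model ID."""

    def __init__(self, path):
        self.path = Path(path)

    def _read_all(self):
        # A missing file simply means no profiles have been saved yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, model_id, settings):
        profiles = self._read_all()
        profiles[model_id] = asdict(settings)
        self.path.write_text(json.dumps(profiles, indent=2), encoding="utf-8")

    def load(self, model_id):
        # Fall back to defaults only when no profile exists for this model.
        saved = self._read_all().get(model_id)
        return RunSettings(**saved) if saved else RunSettings()
```

With a store like this, the model-load path would call `load(model_id)` before initializing the backend instead of applying defaults unconditionally, so a model saved at 32768 would come back at 32768 while other models keep their own values.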

Attachments

  • (Refer to provided screenshots showing the discrepancy between the 4096 default state and the intended 32768 state with VRAM warning.)
