
epic: Introduce smart model management #6000

@qnixsynapse

Description


Problem Statement

Currently, Jan defaults to a context length of 8192 tokens. llama.cpp preallocates its KV cache for the full context length when the model is loaded, and because the cache grows linearly with context length, this fixed default often consumes a significant portion of the system's available memory. The operating system can then be left with too little memory, which leads to excessive swapping, an unresponsive system, and eventually hangs. This severely degrades the user experience, especially on devices with limited RAM or when running larger models.
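For a sense of scale: a standard transformer KV cache stores one key and one value vector per layer, per KV head, per token, so its size is linear in the context length. The sketch below estimates it. The helper name `kv_cache_bytes` is hypothetical, and the model shape (32 layers, 8 KV heads, head dimension 128, roughly a Llama-3-8B-style configuration) and the 2-byte f16 cache element are assumptions for illustration.

```rust
/// Hypothetical helper: estimated KV cache size in bytes.
/// Two tensors (K and V) per layer, each holding n_ctx * n_kv_heads * head_dim elements.
fn kv_cache_bytes(n_layers: u64, n_ctx: u64, n_kv_heads: u64, head_dim: u64, bytes_per_elem: u64) -> u64 {
    2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
}

fn main() {
    // Assumed Llama-3-8B-like shape with an f16 (2-byte) cache.
    let bytes = kv_cache_bytes(32, 8192, 8, 128, 2);
    let gib = bytes as f64 / (1u64 << 30) as f64;
    println!("KV cache at 8192 tokens: {bytes} bytes ({gib:.2} GiB)");
}
```

This figure excludes the model weights and compute buffers. Doubling the context doubles it, and a model without grouped-query attention (KV heads equal to attention heads, e.g. 32 instead of 8 here) needs four times as much for the same context.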


Feature Idea

The goal is to introduce smart model management that dynamically adjusts the context length to the backend device's available memory, so that enough memory remains for the OS even after llama.cpp preallocates its KV cache. This prevents system hangs and improves overall stability.
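One way to state the sizing rule precisely, as a sketch under assumptions: let $f$ be the fraction of available memory $M_{avail}$ allowed by the selected preset (listed below), $W$ the model weight size, $L$ the number of layers, $H_{kv}$ the number of KV heads, $d_{head}$ the head dimension, and $b$ the bytes per cache element. The issue does not say whether the weights count against the budget; this formulation assumes they do.

```math
s = 2\,L\,H_{kv}\,d_{head}\,b,
\qquad
n_{ctx}^{\max} = \left\lfloor \frac{f\,M_{avail} - W}{s} \right\rfloor
```

Any $n_{ctx} \le n_{ctx}^{\max}$ keeps $W + n_{ctx}\,s \le f\,M_{avail}$, so at least $(1-f)\,M_{avail}$ stays free for the OS. If $f\,M_{avail} < W$, no context fits and the model cannot be loaded.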

The proposed solution involves:

  1. Memory Detection: Accurately determine the total and available memory of the backend device.
  2. Dynamic Context Length Calculation: Based on the detected memory, calculate the largest context length that fits within the memory usage threshold selected by the user.
  3. Configurable Memory Usage Settings: Provide users with pre-defined settings to control memory allocation:
    • Balanced: Utilizes approximately 60% of the available memory for the model's context.
    • High: Pushes memory usage up to approximately 90% of the available memory, suitable for users prioritizing maximum context length.
    • Low: Restricts memory usage to less than 50% of the available memory, ideal for systems with limited resources or users running other demanding applications.
  4. Model Loading Error Handling: If a model is too large to fit within the calculated memory constraints even with the "Low" setting, the system should display a clear "Model cannot be loaded" error message instead of attempting the load, preventing potential crashes.
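Steps 2 to 4 could fit together roughly as follows. This is a minimal sketch, not Jan's implementation. It assumes available memory has already been detected (on Linux, for example, from the `MemAvailable` field of `/proc/meminfo`) and that the model weights count against the preset budget. All names (`MemoryMode`, `ModelShape`, `plan_context`, `MIN_CTX`) are hypothetical, and the Low fraction of 45% is an assumption, since the issue only says "less than 50%".

```rust
use std::fmt;

/// Memory usage presets. Balanced (~60%) and High (~90%) follow the issue text;
/// Low = 45% is an assumption for "less than 50%".
#[derive(Clone, Copy, Debug)]
enum MemoryMode {
    Low,
    Balanced,
    High,
}

impl MemoryMode {
    fn fraction(self) -> f64 {
        match self {
            MemoryMode::Low => 0.45,
            MemoryMode::Balanced => 0.60,
            MemoryMode::High => 0.90,
        }
    }
}

/// Hypothetical model description; field names are illustrative.
struct ModelShape {
    weights_bytes: u64,
    n_layers: u64,
    n_kv_heads: u64,
    head_dim: u64,
    kv_bytes_per_elem: u64,
}

#[derive(Debug, PartialEq)]
enum PlanError {
    ModelCannotBeLoaded,
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Model cannot be loaded")
    }
}

/// Hypothetical minimum useful context; anything smaller is reported as an error.
const MIN_CTX: u64 = 512;

/// Largest context length whose KV cache, plus the weights, fits in the preset's
/// share of available memory (assumes weights count against the budget).
fn plan_context(available_bytes: u64, model: &ModelShape, mode: MemoryMode) -> Result<u64, PlanError> {
    let budget = (available_bytes as f64 * mode.fraction()) as u64;
    // K and V bytes per token, summed over all layers.
    let per_token = 2 * model.n_layers * model.n_kv_heads * model.head_dim * model.kv_bytes_per_elem;
    // Weights alone exceed the budget: nothing is left for the KV cache.
    let left_for_kv = budget
        .checked_sub(model.weights_bytes)
        .ok_or(PlanError::ModelCannotBeLoaded)?;
    let n_ctx = left_for_kv / per_token;
    if n_ctx < MIN_CTX {
        return Err(PlanError::ModelCannotBeLoaded);
    }
    // Round down to a multiple of 256 (an arbitrary, hypothetical granularity).
    Ok(n_ctx / 256 * 256)
}

fn main() {
    // Assumed ~8B model at 4-bit quantization (~4.7 GB of weights), 16 GiB available.
    let model = ModelShape {
        weights_bytes: 4_700_000_000,
        n_layers: 32,
        n_kv_heads: 8,
        head_dim: 128,
        kv_bytes_per_elem: 2,
    };
    let available = 16 * (1u64 << 30);
    for mode in [MemoryMode::Low, MemoryMode::Balanced, MemoryMode::High] {
        match plan_context(available, &model, mode) {
            Ok(n_ctx) => println!("{mode:?}: n_ctx = {n_ctx}"),
            Err(e) => println!("{mode:?}: {e}"),
        }
    }
}
```

Because the context length is derived from memory rather than fixed, the same model gets a shorter context on a smaller machine instead of exhausting its memory, and the error path only triggers when even the Low preset cannot hold the weights plus a minimal context.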

This feature will enhance stability, provide a smoother user experience, and allow for more efficient utilization of system resources.


Labels

needs: eng specs, os: linux, os: mac, os: windows, platform: desktop, rust


Status

Done


Development

No branches or pull requests
