Problem Statement
Currently, Jan defaults to a context length of 8192 tokens. Because llama.cpp preallocates its KV cache for the full context length, this fixed default often consumes a large share of the system's available memory. The operating system can then run short of memory, which leads to heavy swapping, an unresponsive system, and eventually hangs. This severely degrades the user experience, especially on devices with limited RAM or when running larger models.
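To make the memory pressure concrete, the sketch below estimates how KV cache size scales with context length. It assumes the common layout of one K and one V tensor per layer and an f16 cache (2 bytes per element, llama.cpp's default cache type). It ignores compute buffers and quantized cache types, so treat it as a rough lower bound. The function name `kv_cache_bytes` is hypothetical, and the dimensions in the usage line are those of a Llama-2-7B-shaped model, used only as an illustration.

```python
def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes (hypothetical helper).

    Assumes one K and one V tensor per layer, each holding
    n_ctx * n_kv_heads * head_dim elements of bytes_per_elem bytes
    (2 bytes = f16). Compute buffers are not included.
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# Llama-2-7B-shaped model: 32 layers, 32 KV heads, head_dim 128, f16 cache.
size = kv_cache_bytes(n_ctx=8192, n_layers=32, n_kv_heads=32, head_dim=128)
print(size / 1024**3, "GiB")  # prints 4.0 GiB
```

Under these assumptions, the 8192-token default reserves about 4 GiB for the KV cache alone, on top of the model weights, which illustrates how quickly a fixed context length can exhaust a machine with 8 or 16 GB of RAM.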
Feature Idea
The goal is to introduce smart model management that dynamically adjusts the context length based on the backend device's available memory. This will ensure that even after llama.cpp preallocates its KV cache, there's enough memory remaining for the OS, preventing system hangs and improving overall stability.
The proposed solution involves:
- Memory Detection: Accurately determine the total and available memory of the backend device.
- Dynamic Context Length Calculation: Based on the detected memory, calculate the largest context length that fits within a user-selected memory usage threshold.
- Configurable Memory Usage Settings: Provide users with pre-defined settings to control memory allocation:
  - Balanced: Uses approximately 60% of the available memory for the model's context.
  - High: Pushes memory usage up to approximately 90% of the available memory, suitable for users who prioritize maximum context length.
  - Low: Restricts memory usage to less than 50% of the available memory, ideal for systems with limited resources or for users running other demanding applications.
- Model Loading Error Handling: If a model is too large to fit within the calculated memory constraints even with the "Low" setting, the system should display a clear "Model cannot be loaded" error message instead of attempting the load and risking a crash.
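The calculation and error handling described above could look roughly like the following sketch. Everything here is an assumption for illustration, not Jan's implementation: the names `MEMORY_PROFILES_PERCENT`, `ModelCannotBeLoaded`, and `max_context_length` are hypothetical; the budget is assumed to cover model weights plus KV cache (pass `model_bytes=0` to budget the KV cache alone); 45% is an assumed value for "less than 50%"; and the 512-token floor is an arbitrary minimum usable context. Available memory is taken as an input because detecting it is platform-specific (for system RAM, `psutil.virtual_memory().available` is one option; GPU backends would need free VRAM instead).

```python
from typing import Dict, Optional

# Hypothetical profile table, as integer percentages of available memory.
# Balanced and High follow the issue; 45 for Low is an assumed value
# satisfying "less than 50%".
MEMORY_PROFILES_PERCENT: Dict[str, int] = {"low": 45, "balanced": 60, "high": 90}


class ModelCannotBeLoaded(Exception):
    """Raised when not even a minimal context fits in the memory budget."""


def max_context_length(available_bytes: int, model_bytes: int,
                       kv_bytes_per_token: int, profile: str = "balanced",
                       min_ctx: int = 512,
                       max_ctx: Optional[int] = None) -> int:
    """Largest context length whose weights + KV cache fit the profile's budget."""
    budget = available_bytes * MEMORY_PROFILES_PERCENT[profile] // 100
    remaining = budget - model_bytes  # memory left for the KV cache
    n_ctx = max(remaining, 0) // kv_bytes_per_token
    if max_ctx is not None:
        n_ctx = min(n_ctx, max_ctx)  # never exceed the model's trained context
    if n_ctx < min_ctx:
        raise ModelCannotBeLoaded(
            f"Model cannot be loaded: only {n_ctx} tokens fit in the "
            f"'{profile}' budget of {budget} bytes")
    return n_ctx


GiB = 1024**3
kv_per_token = 2 * 32 * 32 * 128 * 2  # Llama-2-7B-shaped, f16 (512 KiB/token)
print(max_context_length(16 * GiB, 4 * GiB, kv_per_token, "balanced"))  # 11468
```

With 16 GiB available and 4 GiB of weights, this sketch yields 11468 tokens under Balanced, fewer under Low and more under High. Clamping to `max_ctx` keeps the result from exceeding what the model supports, and raising an exception gives the UI a single place to show the "Model cannot be loaded" message.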
This feature will enhance stability, provide a smoother user experience, and allow for more efficient utilization of system resources.