Replies: 1 comment
Hello @MikeyBeez, thanks for starting this discussion! When dealing with AI/LLM integrations, vector DBs, or agent frameworks, quirks like this can usually be traced back to a few specific moving parts.
If you are still blocked, providing a minimal reproducible snippet or logging the raw request/response payload (scrubbed of secrets) usually helps pinpoint the exact failure layer much faster. Hope this points you in the right direction. Let me know if you make any progress!
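To make the "scrubbed of secrets" suggestion concrete, here is a minimal sketch of redacting sensitive keys from a request payload before logging it. The key names and payload shape are assumptions for illustration, not any particular library's API:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)

# Hypothetical list of secret-bearing keys; extend for your provider.
SENSITIVE_KEYS = {"api_key", "authorization", "token"}

def scrub(payload: dict) -> dict:
    """Return a copy of the payload with sensitive values masked."""
    return {
        k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else v
        for k, v in payload.items()
    }

# Example request payload (field names are made up for the sketch)
request = {"model": "some-model", "api_key": "sk-secret", "prompt": "hello"}
logging.info("raw request: %s", json.dumps(scrub(request)))
```

Logging the scrubbed payload at each layer (client, retriever, LLM call) makes it much easier to see which hop mangles the data.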
-
Sharing a design paper that proposes treating context management as an operating system concern — directly relevant to how LlamaIndex handles retrieval and context assembly.
Core argument: context selection, not context length, is the dominant factor in reasoning quality. The paper proposes a two-agent architecture.
The thread-based retrieval approach is a direct contrast to embedding-based chunk retrieval: instead of finding semantically similar fragments, you retrieve the full reasoning chain within a topic. The paper argues this preserves context that chunk retrieval loses, such as earlier decisions in the same thread that score low on similarity to the current query.
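The contrast above can be sketched in a few lines. This is a toy illustration of the two retrieval styles, not the paper's implementation; the data model, field names, and scores are all assumptions:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    thread_id: str   # topic / reasoning-chain identifier
    text: str
    score: float     # stand-in for embedding similarity to the query

store = [
    Entry("threads/auth", "User asked about token refresh", 0.9),
    Entry("threads/auth", "Decided to rotate refresh tokens", 0.4),
    Entry("threads/ui",   "Button color changed to blue",    0.8),
]

def chunk_retrieval(k: int = 2) -> list[Entry]:
    # Embedding-style: top-k fragments by similarity, regardless of thread.
    return sorted(store, key=lambda e: e.score, reverse=True)[:k]

def thread_retrieval(query_thread: str) -> list[Entry]:
    # Thread-style: return the full reasoning chain for the matched topic.
    return [e for e in store if e.thread_id == query_thread]

print([e.text for e in chunk_retrieval()])
print([e.text for e in thread_retrieval("threads/auth")])
```

Here chunk retrieval mixes fragments from unrelated threads and drops the low-similarity "decided to rotate refresh tokens" entry, while thread retrieval keeps the whole auth reasoning chain intact.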
Paper and PDF: github.com/MikeyBeez/fuzzyOS
DOI: 10.5281/zenodo.18571717
Interested in thoughts from people building retrieval and context assembly systems.