Hey everyone,
We're developing an AI-based legal platform using a RAG (Retrieval-Augmented Generation) architecture. Our stack includes LangChain, an OpenAI model, and FAISS for vector retrieval from a large, static database of legal documents.
The platform is working well for single-turn or short conversations. However, we're hitting a significant challenge with maintaining context in long-running user chats.
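For context, the core of our pipeline looks roughly like this (a minimal sketch using the legacy LangChain API; the index path, model, and parameters are illustrative):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS

# Pre-built FAISS index over the static legal corpus (path is illustrative).
vectorstore = FAISS.load_local("legal_docs_index", OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Verbatim chat history -- this is exactly where the trouble starts.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=retriever,
    memory=memory,
)

result = chain({"question": "How does the recent precedent affect this clause?"})
```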
## The Core Challenge: Context Window vs. Complex Legal Dialogue 🤔
Legal conversations are rarely simple. A user might start with a general question about contract law, then ask about a specific clause, then ask how a recent court precedent affects that clause. Each new question builds upon the previous context.
Our main problems are:

- **Context Window Limitations:** As the conversation grows, the chat history quickly exceeds the token limit of the LLM.
- **Loss of Nuance:** Simply truncating the history (e.g., keeping only the last 4 turns) is not viable. A critical detail mentioned in the first message could be essential for answering the tenth.
- **Inefficiency and Cost:** Passing an ever-growing chat history with every API call is inefficient and dramatically increases operational costs.
- **RAG Pollution:** The user's entire chat history can "pollute" the query sent to our legal vector database, leading to less relevant document retrievals (a concrete illustration follows this list).
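To make the "RAG pollution" point concrete: reusing the `retriever` from the sketch above, embedding the whole transcript as one query lets the early turns drown out the latest question (the example turns are made up):

```python
chat_history = [
    "What are the basics of contract law?",
    "What exactly does a severability clause do?",
    "Does the recent appellate ruling change how that clause is read?",
]

# Naive approach: the entire transcript becomes the retrieval query.
# The embedding is dominated by the early, general contract-law turns,
# so the index returns introductory material instead of documents about
# the precedent the user actually asked about in the final turn.
polluted_query = "\n".join(chat_history)
docs = retriever.get_relevant_documents(polluted_query)
```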
## Initial Approaches & Their Limitations
We've explored some standard LangChain memory types, but they fall short for our specific legal use case (a side-by-side sketch follows this list):

- **Standard Buffer Memory (`ConversationBufferMemory`):** This is the root of the problem. It works great until it hits the token limit.
- **Windowed Memory (`ConversationBufferWindowMemory`):** Better, but still risky. We might cut off the crucial, foundational part of the legal query.
- **Summarization Memory (`ConversationSummaryMemory`):** This seems promising, but automatic summarization can be a double-edged sword in law. A summary like "user asked about inheritance" loses the critical context that "the user is an undeclared heir and the will is being contested." Precision is everything in law.
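For anyone who wants to reproduce the comparison, this is roughly how we wired up the three memory types (a minimal sketch; the `memory_key` and window size are just what we happened to use):

```python
from langchain.chat_models import ChatOpenAI
from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
)

llm = ChatOpenAI(temperature=0)

# 1. Full verbatim history: faithful, but grows without bound until the
#    context window overflows.
buffer_memory = ConversationBufferMemory(memory_key="chat_history")

# 2. Sliding window: keeps only the last k=4 exchanges, so foundational
#    facts from the start of the conversation silently fall out.
window_memory = ConversationBufferWindowMemory(k=4, memory_key="chat_history")

# 3. Rolling summary: an LLM compresses older turns into prose; compact,
#    but the compression step can drop legally decisive details.
summary_memory = ConversationSummaryMemory(llm=llm, memory_key="chat_history")
```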
## 🚀 Seeking Community Wisdom: How Do We Solve This?
We believe this is a common challenge for anyone building sophisticated, stateful AI agents. We're opening this discussion to ask for your insights, experiences, and suggestions. How are you tackling long-term memory in your RAG applications?