Description
Currently, the LCEL retriever in dialog-lib forces the document content to be the question and the content joined together.
However, the user already defines which fields should be embedded in `load_csv.py`, so this retriever should respect that choice with a simple return like:
        return [
            Document(
                page_content=content.content,
                metadata={
                    "title": content.question,
                    "category": content.category,
                    "subcategory": content.subcategory,
                    "dataset": content.dataset,
                    "link": content.link,
                },
            )
            for content in relevant_contents
        ]
Moreover, langchain's CSVLoader by default already embeds each field name prefixed to its value, e.g. category: cat1\nsubcategory: subcat1\ncontent: content1 (see this test), so it already achieves what the current implementation does, but in a generic way.
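To make the layout concrete, here is a minimal stdlib-only sketch that approximates the "field: value" per-line format described above. It does not call langchain; `row_to_page_content` and `SAMPLE_CSV` are hypothetical names used only for illustration, and the real CSVLoader may differ in details.

```python
import csv
import io


def row_to_page_content(row: dict) -> str:
    # Approximation of the layout described above (an assumption, not
    # langchain's actual code): one "field: value" pair per line.
    return "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())


# Hypothetical sample data mirroring the example in this issue.
SAMPLE_CSV = "category,subcategory,content\ncat1,subcat1,content1\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
page_contents = [row_to_page_content(row) for row in rows]
print(page_contents[0])
```

Since the field names are already part of the embedded text, the retriever has no need to join question and content again.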
This proposal works as-is with the default project chains, while giving flexibility to users who want to implement their own prompt design. For example, the project's default RAG chain has this format_docs:
dialog/src/dialog/llm/agents/lcel.py, lines 60 to 61 in fbb13af:
        def format_docs(docs):
            return "\n\n".join([d.page_content for d in docs])
and users can customize this as they wish to achieve their ideas. Later, when we implement metadata saving to the vectorstore, we could even return other metadata dynamically as well.
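As one possible customization, the sketch below prefixes each document with its title when the metadata carries one. It assumes the retriever returns the metadata proposed in this issue (`title` holding the question). The `Document` dataclass is a stand-in so the snippet runs without langchain, and `format_docs_with_titles` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Stand-in for langchain's Document (assumption: only these two
    # attributes are needed here).
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_titles(docs):
    # Hypothetical variant of format_docs: put the title (the question)
    # on its own line before the content when it is available.
    blocks = []
    for d in docs:
        title = d.metadata.get("title")
        blocks.append(f"{title}\n{d.page_content}" if title else d.page_content)
    return "\n\n".join(blocks)


docs = [
    Document("content1", {"title": "question1"}),
    Document("content2"),  # no title: falls back to the plain content
]
formatted = format_docs_with_titles(docs)
print(formatted)
```

With this approach the question/content join becomes a prompt-design choice in the chain rather than something the retriever enforces.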