This is an app that lets you ask questions about any data source by leveraging embeddings, vector databases, large language models and, last but not least, langchains.
- Upload any `file(s)` or enter any `path` or `url` to create Knowledge Bases, which can contain multiple files of any type, format and content, and create Smart FAQs, which are lists of curated numbered Q&As.
- The data source or files are loaded and split into text document chunks
- The text document chunks are embedded using OpenAI or Hugging Face embeddings
- The embeddings are stored as a vector dataset in Activeloop's database hub
- A langchain is created consisting of a custom selection of an LLM (`gpt-3.5-turbo` by default), multiple vector stores as knowledge bases and a single special smart FAQ vector store
- When you ask the app a question, the chain embeds the input prompt, runs a similarity search in the provided vector stores and uses the best results as context for the LLM to generate an appropriate response
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
- The app only runs on `py>=3.10`!
- To run locally or deploy somewhere, execute `cp .env.template .env` and set credentials in the newly created `.env` file. Other options are manually setting system environment variables, or storing them in `.streamlit/secrets.toml` when hosted via streamlit.
- If you have credentials set as explained above, you can just hit `submit` in the authentication without reentering your credentials in the app.
- If you run the app, consider modifying the configuration in `datachad/backend/constants.py`, e.g. enabling advanced options
- Your data won't load? Feel free to open an Issue or PR and contribute!
- Use previous releases like V1 or V2 for original functionality and UI
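
The load, embed and search steps described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the app's actual code: the bag-of-words `embed` function stands in for OpenAI or Hugging Face embeddings, `InMemoryVectorStore` stands in for the Activeloop-hosted vector store, and every name used here (`split_into_chunks`, `build_prompt`, and so on) is hypothetical. The final LLM call is left out.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # assumes chunk_size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a hosted vector store."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        # Store each chunk together with its embedding.
        for chunk in chunks:
            self.entries.append((chunk, embed(chunk)))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query and return the k most similar chunks.
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the retrieved chunks into the context handed to an LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add(split_into_chunks(
        "Deep Lake stores embeddings as vector datasets. "
        "Streamlit renders the chat interface in the browser."
    ))
    question = "Where are embeddings stored?"
    print(build_prompt(question, store.similarity_search(question, k=1)))
```

In a real setup the word-count vectors would be replaced by dense embeddings and the sorted list by an approximate nearest-neighbour index, but the flow of chunk, embed, search and prompt assembly stays the same.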
If you would like to contribute, feel free to grab any of these tasks:
- Refactor utils, especially the loaders
- Add option to choose model and embeddings
- Enable fully local / private mode
- Add option to upload multiple files to a single dataset
- Decouple datachad modules from streamlit
- Remove all local mode and other V1 stuff
- Load existing knowledge bases
- Delete existing knowledge bases
- Enable streaming responses
- Show retrieved context
- Refactor UI
- Introduce smart FAQs
- Exchange downloaded file storage with tempfile
- Add user creation and login
- Add chat history per user
- Make all I/O asynchronous
- Implement FastAPI routes and backend app
- Implement a proper frontend (React or whatever)
- Containerize the app
