This project is a web application designed to generate high-quality articles based on user-provided prompts. It leverages LangChain, OpenAI, and FAISS to create an efficient Retrieval-Augmented Generation (RAG) pipeline. This ensures that the AI generates content grounded in a custom dataset provided by the user.
- Custom Data Learning: Uses a documents.txt file to provide the AI with domain-specific knowledge.
- RAG Implementation: Ensures accurate and contextually relevant articles by combining document retrieval with OpenAI’s powerful LLMs.
- User-Friendly Interface: Provides an easy-to-use web interface built with Flask.
- Efficient Storage and Retrieval: Utilizes FAISS for fast and scalable vector-based information retrieval.
- Customizable Workflow: Allows users to fine-tune the AI’s behavior, data, and underlying components.
- Python 3.10
- pip (Python package installer)
- OpenAI API key
- LangChain
Clone this project to your local machine:
git clone https://github.com/your-repository-link.git
cd AI-Article-Generator
Run the following command to install all required Python packages:
pip install -r requirements.txt
Create a documents.txt file inside a data directory in your project root:
mkdir data
nano data/documents.txt
Populate this file with the content you want the AI to learn from. This could be blog posts, articles, or any other text data.
To enable RAG, create a FAISS index from your documents.txt file by running:
python prepare_data.py
This will generate a FAISS index named faiss_index in your project directory. The FAISS index is used to store and retrieve vectorized representations of your data.
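The source does not show how prepare_data.py splits documents.txt before embedding it. As a minimal sketch, assuming fixed-size character chunks with a small overlap (the function name split_into_chunks and the default sizes are hypothetical), the step that precedes indexing might look like this:

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks (illustrative, not the project's code)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "FAISS stores vectors. LangChain wires the retriever to the LLM. " * 20
    pieces = split_into_chunks(sample, chunk_size=200, overlap=40)
    print(len(pieces), "chunks")
```

Each chunk would then be embedded and added to the index. The overlap reduces the chance that a sentence is cut in half at a chunk boundary and lost to retrieval.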
- Create a .env file in the project root:
  touch .env
  nano .env
- Add your OpenAI API key to the .env file:
  OPENAI_API_KEY=YOUR_OPENAI_API_KEY
- Ensure the python-dotenv package is installed:
  pip install python-dotenv
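Once the key is in .env, python-dotenv's load_dotenv() copies it into the process environment. The sketch below shows one way to fail fast when the key is missing or still set to the placeholder. The helper name get_openai_api_key is hypothetical and is not taken from main.py.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise if it is missing or a placeholder.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "YOUR_OPENAI_API_KEY":
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it to .env in the project root."
        )
    return key


if __name__ == "__main__":
    # In the real application, load_dotenv() from python-dotenv would run first
    # so that values in .env are visible through os.environ.
    try:
        get_openai_api_key()
        print("API key found.")
    except RuntimeError as exc:
        print(exc)
```

Checking at startup surfaces a configuration mistake immediately, rather than on the first article request.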
Start the Flask server by running:
python main.py
This will launch the server on http://0.0.0.0:81.
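- Open your browser and navigate to http://localhost:81 (0.0.0.0 is the address the server binds to; in a browser, use localhost or the machine’s IP address).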
- Enter the desired title or topic for your article in the provided textarea.
- Click the "Submit" button.
- The generated article will appear in the "Your content will show up here" section.
- If the topic is not found in your dataset, an error will be displayed.
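The source does not say how the application decides that a topic is missing from the dataset. One plausible approach, sketched here under the assumption that retrieval yields (text, similarity) pairs where a higher score means a closer match, is to reject prompts whose best matches all fall below a cutoff. TopicNotFoundError, select_context, and the 0.2 threshold are illustrative.

```python
class TopicNotFoundError(Exception):
    """Raised when no retrieved chunk is relevant enough to the prompt."""


def select_context(scored_chunks: list[tuple[str, float]], min_score: float = 0.2) -> list[str]:
    """Keep chunks whose similarity meets the cutoff; raise if none qualify.

    Assumes higher scores mean more relevant. Some retrievers return
    distances instead, in which case the comparison would be reversed.
    """
    relevant = [text for text, score in scored_chunks if score >= min_score]
    if not relevant:
        raise TopicNotFoundError("The topic was not found in the dataset.")
    return relevant


if __name__ == "__main__":
    try:
        select_context([("Unrelated text about baking.", 0.05)])
    except TopicNotFoundError as exc:
        print(f"Error: {exc}")
```

The web layer could catch this exception and render its message in place of the article.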
This application implements Retrieval-Augmented Generation (RAG) to generate accurate, relevant articles:
- Data Ingestion and Indexing:
  - The prepare_data.py script processes the documents.txt file and vectorizes its contents using OpenAI embeddings.
  - These embeddings are stored in a FAISS index for efficient retrieval.
- Prompt Handling:
  - When a user submits a prompt (title), the application queries the FAISS index to retrieve the most relevant chunks of text.
- Generation with OpenAI:
  - The retrieved data is used as context to guide the OpenAI LLM in generating the article.
  - This ensures the output is both accurate and grounded in the provided dataset.
- Web Interface:
  - The Flask-based interface simplifies interaction with the AI, allowing users to generate articles seamlessly.
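The retrieval step described above can be illustrated without any external services. This sketch replaces OpenAI embeddings with simple word-count vectors and FAISS with a linear scan, so it shows the shape of the pipeline rather than the project's actual code. The names embed, cosine, and retrieve are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(chunk, cosine(query_vec, embed(chunk))) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        "FAISS stores dense vectors for similarity search.",
        "Flask serves the web interface.",
        "Sourdough bread needs a starter.",
    ]
    for text, score in retrieve("How does FAISS search vectors?", corpus, k=2):
        print(f"{score:.2f}  {text}")
```

A real embedding model captures meaning rather than shared words, and FAISS avoids scanning every vector, but the flow is the same: embed the query, rank the stored chunks, and pass the top results to the LLM.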
 
The project integrates Retrieval-Augmented Generation (RAG) using the following components:
- FAISS index:
  - Purpose: Efficiently stores and retrieves vectorized text data.
  - Creation: The prepare_data.py script converts the documents.txt content into embeddings and stores them in FAISS.
  - Usage: At runtime, the FAISS index returns the text chunks most relevant to the user’s query.
- LangChain:
  - Purpose: Orchestrates the interaction between the FAISS index, the OpenAI LLM, and user input.
  - Retrieval Chain: Combines the FAISS retriever with OpenAI’s text generation capabilities.
- OpenAI LLM:
  - Purpose: Generates the article from the retrieved context and the user prompt.
  - Customization: The prompt can be adjusted to improve the quality of the generated content.
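How the retrieved context and the user's title are combined is not shown in the source. A minimal sketch, assuming a plain string template (build_prompt and the wording of the instructions are illustrative):

```python
def build_prompt(title: str, context_chunks: list[str]) -> str:
    """Assemble an LLM prompt from retrieved chunks and the article title."""
    # Number each chunk so the model (and a human reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Write an article using only the context below.\n"
        "If the context does not cover the topic, say so instead of guessing.\n\n"
        f"Context:\n{context}\n\n"
        f"Article title: {title}\n"
    )


if __name__ == "__main__":
    print(build_prompt(
        "Vector search in practice",
        ["FAISS stores dense vectors.", "Retrieval narrows what the LLM sees."],
    ))
```

Keeping the template in one function makes it straightforward to experiment with different instructions without touching the retrieval code.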
Modify the documents.txt file to change the knowledge the AI draws on, then re-run python prepare_data.py so the FAISS index reflects the new content.
Change the OpenAI model (e.g., gpt-4) by adjusting the llm configuration in main.py. Note that text-davinci-003 has since been retired by OpenAI, so choose a model that is currently available.
Replace FAISS with other vector stores like Chroma or Pinecone by updating the relevant code in prepare_data.py and main.py.
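Swapping vector stores is simpler when the rest of the code depends on a small interface rather than on FAISS directly. The sketch below is one way to express that, not a description of how main.py is organized. VectorStore, InMemoryStore, and top_context are hypothetical names, and the in-memory store uses a brute-force squared-distance scan purely for illustration.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Minimal interface any backing store would need to satisfy."""

    def add(self, text: str, vector: list[float]) -> None: ...

    def search(self, vector: list[float], k: int) -> list[str]: ...


class InMemoryStore:
    """Brute-force store; a FAISS, Chroma, or Pinecone adapter could replace it."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def search(self, vector: list[float], k: int) -> list[str]:
        # Rank by squared Euclidean distance, closest first.
        ranked = sorted(
            self._items,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], vector)),
        )
        return [text for text, _ in ranked[:k]]


def top_context(store: VectorStore, query_vector: list[float], k: int = 3) -> list[str]:
    """Application code sees only the interface, not the concrete store."""
    return store.search(query_vector, k)


if __name__ == "__main__":
    store = InMemoryStore()
    store.add("near", [0.0, 0.0])
    store.add("far", [5.0, 5.0])
    print(top_context(store, [0.1, 0.0], k=1))
```

With this shape, replacing the store means writing one adapter class with the same two methods.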
Enhance the prompts in main.py to guide the AI toward better article generation.
- Sophisticated Prompts: Implement advanced prompt engineering techniques.
- Fine-Tuning: Train the LLM with specific datasets for improved accuracy.
- Add features for editing, saving, and sharing generated articles.
- Include a progress bar for longer article generation tasks.
- Integrate tools to filter and moderate generated content, ensuring ethical and compliant outputs.
This project provides a robust foundation for building an AI-powered article generator. By leveraging RAG, LangChain, and OpenAI, it ensures that the generated content is both relevant and high-quality. With additional customization and improvements, it can be adapted to various use cases, from blogging to enterprise content generation.