This repository and all associated open-source packages have been transferred to a new GitHub organization and are no longer actively maintained in this location.
New Location: https://github.com/ponderedw
- ✅ Active development continues at the new location
- ✅ Latest updates and releases are published there
- ✅ Issues and pull requests should be submitted to the new repository
⚠️ This repository will no longer receive updates
Please visit https://github.com/ponderedw to:
- Access the latest version of this package
- Report issues or contribute
- View updated documentation
- Get support from the maintainers
Thank you for your understanding during this transition.
A demonstration of using AWS S3 Vector Buckets as a vector store for RAG (Retrieval-Augmented Generation) systems with LangChain integration.
This project shows how to:
- Store document embeddings in AWS S3 Vector Buckets
- Search vectors using semantic similarity
- Integrate with LangChain for RAG workflows
- Build a Streamlit interface for document management and chat
- AWS account with appropriate permissions
- Python 3.8+
- Docker and Docker Compose
- just command runner (optional but recommended)
- Navigate to AWS S3 console
- Select "Vector Buckets" from the left menu
- Create a new vector bucket
- Note the bucket name for configuration
- In your vector bucket, create a new index
- Set dimensions to `1024` (for Amazon Titan embeddings)
- Choose `Cosine Similarity` as the similarity measure
- Note the index name for configuration
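The console steps above can also be scripted. The sketch below builds a request matching the console settings (dimension 1024, cosine distance); the `s3vectors` boto3 client name and its parameter names are assumptions based on recent boto3 releases, so verify them against your installed version before running the commented-out call.

```python
def create_index_params(bucket_name, index_name):
    """Build a create_index request mirroring the console settings above.

    Parameter names are assumptions about the boto3 "s3vectors" client;
    check your boto3 version's documentation.
    """
    return {
        "vectorBucketName": bucket_name,
        "indexName": index_name,
        "dataType": "float32",
        "dimension": 1024,           # matches Amazon Titan embeddings
        "distanceMetric": "cosine",  # Cosine Similarity
    }

params = create_index_params("your-vector-bucket-name", "your-vector-index-name")
# import boto3
# boto3.client("s3vectors").create_index(**params)  # requires AWS credentials
```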
git clone https://github.com/ponderedw/s3-vector-bucket
cd s3-vector-bucket
cp template.env .env

Edit the `.env` file with your settings:
DEPLOY_ENV='local'
# Postgres
POSTGRES_USER='postgres'
POSTGRES_PASSWORD='postgres'
POSTGRES_DB='postgres'
# LLM Model (choose one)
LLM_MODEL_ID='bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0' # For Bedrock
# LLM_MODEL_ID='anthropic:claude-3-5-sonnet-20241022' # For Anthropic API
# LLM_MODEL_ID='openai:gpt-4' # For OpenAI API
# Secrets (replace in production)
SECRET_KEY='ThisIsATempSecretForLocalEnvs.ReplaceInProd.'
FAST_API_ACCESS_SECRET_TOKEN='ThisIsATempAccessTokenForLocalEnvs.ReplaceInProd'
# AWS Credentials
AWS_ACCESS_KEY_ID='your-access-key'
AWS_SECRET_ACCESS_KEY='your-secret-key'
AWS_DEFAULT_REGION='us-east-1'
# API Keys (if not using Bedrock)
# ANTHROPIC_API_KEY='your-anthropic-key'
# OPENAI_API_KEY='your-openai-key'
# Vector Store Configuration
bucket='your-vector-bucket-name'
index='your-vector-index-name'
# Optional: Custom embedding model
# embedding_model='amazon.titan-embed-text-v2:0'

# Download NYC Planning dataset
wget https://ponder-public-assets.s3.us-east-1.amazonaws.com/newsletter-assets/NYC+Planning.zip
unzip "NYC+Planning.zip" -d data/

# Using just (recommended)
just all

Open your browser and navigate to http://localhost:8501
- Main - Chat interface to query your vector store
- Load File - Upload PDF documents to populate your vector index
- See Vectors - View and manage stored embeddings
- Go to the Load File tab
- Upload PDF files from the `data/` folder
- Wait for processing to complete
- Check the See Vectors tab to see your uploaded data
- Switch to the Main tab and try queries like:
- "What are the changing conditions of the Special Hudson Yards District?"
- "Tell me about NYC planning regulations"
- Search: Semantic similarity search using embeddings
- Upload: Process and store document embeddings
- List: View all stored vectors with metadata
- Delete: Remove individual vectors or entire documents
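The Search feature ranks stored vectors by cosine similarity to the query embedding. As a toy illustration of that ranking (the real system computes embeddings with Bedrock and queries the S3 vector index; the vectors and document keys below are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query, stored, top_k=2):
    """Return the keys of the top_k stored vectors most similar to query."""
    ranked = sorted(stored, key=lambda k: cosine_similarity(query, stored[k]),
                    reverse=True)
    return ranked[:top_k]

docs = {
    "hudson_yards": [0.9, 0.1, 0.0],
    "zoning":       [0.7, 0.7, 0.1],
    "unrelated":    [0.0, 0.1, 0.9],
}
print(search([1.0, 0.0, 0.0], docs))  # most similar documents first
```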
retriever = VectorDB()
tool = Tool(
    name="Data_Retriever",
    func=retriever.search,
    description="Searches S3 vector bucket for similar embeddings and returns matching results",
)

- Default: `amazon.titan-embed-text-v2:0` (1024 dimensions)
- Custom models can be specified in the `.env` file
- Bedrock: `bedrock:model-id`
- Anthropic: `anthropic:model-name`
- OpenAI: `openai:model-name`
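One subtlety of this format: Bedrock model IDs can themselves contain colons (e.g. the `-v2:0` suffix in the sample `.env`), so only the first colon separates the provider from the model. A hypothetical helper illustrating the split:

```python
def parse_model_id(model_id):
    """Split an LLM_MODEL_ID value into (provider, model).

    Only the first colon counts, because Bedrock model IDs may contain
    colons of their own.
    """
    provider, _, model = model_id.partition(":")
    return provider, model

print(parse_model_id("bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0"))
```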