Separate tables for collections and embeddings in PGVectorStore #266

@ezequiel-tcmrio

Description

Currently, PGVectorStore creates a single table to store embeddings and their metadata.
In some scenarios, it would be useful to have a more explicit relational structure, where:

  • A collections table defines the collections (name, description, id, global metadata);
  • An embeddings table stores the vectors and references the collection via a foreign key (collection_id).

This would make it easier to manage and organize multiple embedding collections in the same PostgreSQL database, especially in multi-user or multi-application contexts.

Question

Is there currently any native way in langchain_postgres to configure PGVectorStore to:

  • Automatically create a table for collections (collections) and
  • Associate each embedding (embeddings) to a collection via collection_id?

If not, is there any recommended workaround to achieve this separation?

Example desired schema

CREATE TABLE collections (
    id SERIAL PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT
);

CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    collection_id INT REFERENCES collections(id),
    embedding VECTOR(1536),
    metadata JSONB,
    document TEXT
);
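
To make the intended relationship concrete, here is a rough sketch of the same two-table layout. It uses an in-memory SQLite database purely as a stand-in so it can run anywhere: SQLite has no VECTOR or JSONB type, so the embedding and metadata are stored as JSON text, and the helper names (documents_in, insert_orphan) are only illustrative. It shows the foreign key rejecting an embedding that points at a non-existent collection, and a lookup filtered by collection name.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL; this is an illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE collections (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL,
        description TEXT
    );
    CREATE TABLE embeddings (
        id INTEGER PRIMARY KEY,
        collection_id INTEGER REFERENCES collections(id),
        embedding TEXT,   -- JSON text here; VECTOR(1536) in PostgreSQL
        metadata TEXT,    -- JSON text here; JSONB in PostgreSQL
        document TEXT
    );
    """
)

# Register one collection and attach one embedding to it.
cur = conn.execute(
    "INSERT INTO collections (name, description) VALUES (?, ?)",
    ("client_a", "Documents for client A"),
)
collection_id = cur.lastrowid
conn.execute(
    "INSERT INTO embeddings (collection_id, embedding, metadata, document) "
    "VALUES (?, ?, ?, ?)",
    (collection_id, json.dumps([0.1, 0.2, 0.3]), json.dumps({"source": "a.pdf"}), "hello"),
)


def documents_in(collection_name):
    """Return the documents that belong to the named collection."""
    rows = conn.execute(
        "SELECT e.document FROM embeddings AS e "
        "JOIN collections AS c ON c.id = e.collection_id "
        "WHERE c.name = ? ORDER BY e.id",
        (collection_name,),
    ).fetchall()
    return [row[0] for row in rows]


def insert_orphan():
    """Try to insert an embedding whose collection_id does not exist."""
    conn.execute(
        "INSERT INTO embeddings (collection_id, document) VALUES (?, ?)",
        (999, "orphan"),
    )
```

Calling insert_orphan() raises sqlite3.IntegrityError, which is the kind of guarantee I would like the PostgreSQL schema to provide.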

Context

I’m building a document ingestion system where each client has its own collection.
I want to avoid creating multiple embeddings tables. Instead, I'd like to keep a single embeddings table and a separate collections table to simplify filtering and access control.
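
As an illustration of the per-client filtering I have in mind, the sketch below builds a parameterized similarity query against the example schema. The function names are hypothetical and nothing here is part of langchain_postgres; it assumes pgvector's cosine-distance operator (<=>) and a driver that accepts named placeholders in the %(name)s style, such as psycopg.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as pgvector's text form, e.g. "[1.0,0.5]"."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_collection_search_sql(k=4):
    """Return a nearest-neighbour query restricted to a single collection.

    The query expects two named parameters:
      query      - the query embedding in pgvector text form
      collection - the collection (client) name to search within
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (
        "SELECT e.document, e.metadata,\n"
        "       e.embedding <=> %(query)s::vector AS distance\n"
        "FROM embeddings AS e\n"
        "JOIN collections AS c ON c.id = e.collection_id\n"
        "WHERE c.name = %(collection)s\n"
        "ORDER BY distance\n"
        f"LIMIT {int(k)}"
    )


# Example parameters for one client; executing the query is left to the driver.
sql = build_collection_search_sql(k=3)
params = {"query": to_vector_literal([0.1, 0.2, 0.3]), "collection": "client_a"}
```

With this layout, restricting a search to one client is a single WHERE clause on the collection rather than a choice of table.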

Environment

langchain_postgres version: 0.0.16
PostgreSQL version: 15
pgvector extension: installed ✅
Python: 3.11

Optional suggestion

A possible implementation could allow something like:

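from langchain_postgres import PGVector

# Proposed API only: collection_table, embedding_table, and create_schema
# are the parameters being suggested in this issue.
vectorstore = PGVector(
    connection_string="postgresql://user:pass@localhost/db",
    collection_table="collections",   # table that defines the collections
    embedding_table="embeddings",     # table that stores vectors + collection_id
    create_schema=True,               # create both tables if they do not exist
)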

This way, LangChain would create (or use) the tables as needed and maintain the relationship via collection_id.
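
To sketch what a create_schema=True option might do under the hood, the helper below generates the DDL for configurable table names. This is only my assumption of one possible approach, not existing library behaviour: build_schema_ddl and its parameters are hypothetical names, and the identifier check is there because table names cannot be passed as bound query parameters.

```python
import re

# Accept only plain identifiers, since table names are interpolated into the DDL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(name):
    """Return name unchanged if it is a plain SQL identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name


def build_schema_ddl(collection_table="collections",
                     embedding_table="embeddings",
                     vector_size=1536):
    """Return the statements that would create both tables if they are missing."""
    collections = _check_identifier(collection_table)
    embeddings = _check_identifier(embedding_table)
    size = int(vector_size)
    if size < 1:
        raise ValueError("vector_size must be positive")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f'CREATE TABLE IF NOT EXISTS "{collections}" (\n'
        "    id SERIAL PRIMARY KEY,\n"
        "    name TEXT UNIQUE NOT NULL,\n"
        "    description TEXT\n"
        ")",
        f'CREATE TABLE IF NOT EXISTS "{embeddings}" (\n'
        "    id SERIAL PRIMARY KEY,\n"
        f'    collection_id INT REFERENCES "{collections}"(id),\n'
        f"    embedding VECTOR({size}),\n"
        "    metadata JSONB,\n"
        "    document TEXT\n"
        ")",
    ]


ddl = build_schema_ddl()
```

Each statement could then be executed in order on an existing connection; IF NOT EXISTS keeps the operation safe to repeat when the tables are already there.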

If this feature already exists, I would appreciate guidance on how to configure it.
