diff --git a/samples/README.md b/samples/README.md new file mode 100644 index 0000000..244527a --- /dev/null +++ b/samples/README.md @@ -0,0 +1,10 @@ +# Cloud SQL for LlamaIndex Resources + +This directory provides code samples to help you get started with LlamaIndex and Cloud SQL. + +## Guides & Samples + +| Sample | Description | +| ------ | ----------- | +| [Get Started: `PostgresVectorStore`](./llama_index_vector_store.ipynb) | This notebook goes over how to use Cloud SQL to store vector embeddings with the `PostgresVectorStore` class. | +| [Get Started: `PostgresDocumentStore` & `PostgresIndexStore`](./llama_index_doc_store.ipynb) | This notebook goes over how to use Cloud SQL to store documents and indexes. | diff --git a/samples/llama_index_doc_store.ipynb b/samples/llama_index_doc_store.ipynb new file mode 100644 index 0000000..b7b02e5 --- /dev/null +++ b/samples/llama_index_doc_store.ipynb @@ -0,0 +1,557 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "lykrZV4kIH27" + }, + "source": [ + "# Google Cloud SQL for PostgreSQL - `PostgresDocumentStore` & `PostgresIndexStore`\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers PostgreSQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for PostgreSQL` to store documents and indexes with the `PostgresDocumentStore` and `PostgresIndexStore` classes.\n", + "\n", + "Learn more about the package on [GitHub](https://github.com/googleapis/llama-index-cloud-sql-pg-python/).\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/llama-index-cloud-sql-pg-python/blob/main/samples/llama_index_doc_store.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FMMRtzxKIH28" + }, + "source": [ + "## Before you begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com)\n", + " * [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/postgres/connect-instance-auth-proxy#create-instance)\n", + " * [Create a Cloud SQL database.](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n", + " * [Add a User to the database.](https://cloud.google.com/sql/docs/postgres/create-manage-users)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IR54BmgvdHT_" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "Install the integration library, `llama-index-cloud-sql-pg`, and the library for the embedding service, `llama-index-embeddings-vertex`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0ZITIDE160OD" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet llama-index-cloud-sql-pg llama-index-llms-vertex llama-index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v40bB_GMcr9f" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-primary\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"document_store\" # @param {type: \"string\"}\n", + "USER = \"postgres\" # @param {type: \"string\"}\n", + "PASSWORD = \"my-password\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### PostgresEngine Connection Pool\n", + "\n", + "One of the requirements and arguments to establish Cloud SQL as a vector store is a `PostgresEngine` object. The `PostgresEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `PostgresEngine` using `PostgresEngine.from_instance()` you need to provide only 4 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n", + "\n", + "For more informatin on IAM database authentication please see:\n", + "\n", + "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/postgres/create-edit-iam-instances)\n", + "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users)\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgresEngine.from_instance()`:\n", + "\n", + "* `user` : Database user to use for built-in database authentication and login\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4cDLSGF8IH2-" + }, + "source": [ + "**Note:** This tutorial demonstrates the async interface. All async methods have corresponding sync methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7tminGFMIH2-" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg import PostgresEngine\n", + "\n", + "engine = await PostgresEngine.afrom_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + " user=USER,\n", + " password=PASSWORD,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D9Xs2qhm6X56" + }, + "source": [ + "### Initialize a table\n", + "The `PostgresDocumentStore` class requires a database table. The `PostgresEngine` engine has a helper method `init_doc_store_table()` that can be used to create a table with the proper schema for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "avlyHEMn6gzU" + }, + "outputs": [], + "source": [ + "await engine.ainit_doc_store_table(\n", + " table_name=TABLE_NAME,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sq-9hD0hIH2-" + }, + "source": [ + "#### Optional Tip: 💡\n", + "You can also specify a schema name by passing `schema_name` wherever you pass `table_name`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jkeaWGQZJSgj" + }, + "outputs": [], + "source": [ + "SCHEMA_NAME = \"my_schema\"\n", + "\n", + "await engine.ainit_doc_store_table(\n", + " table_name=TABLE_NAME,\n", + " schema_name=SCHEMA_NAME,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1tl0aNx7SWy" + }, + "source": [ + "### Initialize a default PostgresDocumentStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z-AZyzAQ7bsf" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg import PostgresDocumentStore\n", + "\n", + "doc_store = await PostgresDocumentStore.create(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "75d_9E0DXfGD" + }, + "source": [ + "### Download data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Hkrau3GvXlvF" + }, + "outputs": [], + "source": [ + "!mkdir -p 'data/paul_graham/'\n", + "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c9YFj3NkXwxX" + }, + "source": [ + "### Load documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lgeYUT6DXqL-" + }, + "outputs": [], + "source": [ + "from llama_index.core import SimpleDirectoryReader\n", + "\n", + "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n", + "print(\"Document ID:\", documents[0].doc_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0gDjKVx7IH2-" + }, + "source": [ + "### Parse into nodes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mYotcg_CIH2-" + }, + "outputs": [], + "source": [ + "from llama_index.core.node_parser import SentenceSplitter\n", + "\n", + "nodes = SentenceSplitter().get_nodes_from_documents(documents)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Lq1-OCC1J5vf" + }, + "source": [ + "### Set up an IndexStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-UUY5TrUJ5AX" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg import PostgresIndexStore\n", + "\n", + "\n", + "INDEX_TABLE_NAME = \"index_store\"\n", + "await engine.ainit_index_store_table(\n", + " table_name=INDEX_TABLE_NAME,\n", + ")\n", + "\n", + "index_store = await PostgresIndexStore.create(\n", + " engine=engine,\n", + " table_name=INDEX_TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dLcoAcuAJwhA" + }, + "source": [ + "### Add to Docstore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tZScknluJysE" + }, + "outputs": [], + "source": [ + "from llama_index.core import StorageContext\n", + "\n", + "storage_context = StorageContext.from_defaults(\n", + " docstore=doc_store, index_store=index_store\n", + ")\n", + "\n", + "storage_context.docstore.add_documents(nodes)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mw6FBcMlVyCS" + }, + "source": [ + "## Use with Indexes\n", + "\n", + "The Document Store can be used with multiple indexes. Each index uses the same underlying nodes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gC19fnskVuZN" + }, + "outputs": [], + "source": [ + "from llama_index.core import Settings, SimpleKeywordTableIndex, SummaryIndex\n", + "from llama_index.llms.vertex import Vertex\n", + "\n", + "Settings.llm = Vertex(model=\"gemini-1.5-flash\", project=PROJECT_ID)\n", + "summary_index = SummaryIndex(nodes, storage_context=storage_context)\n", + "keyword_table_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TVsGwZwfIH2_" + }, + "source": [ + "### Query the index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZYYTHeXXIH2_" + }, + "outputs": [], + "source": [ + "query_engine = summary_index.as_query_engine()\n", + "response = query_engine.query(\"What did the author do?\")\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dwCu9A6yLWeZ" + }, + "source": [ + "## Load existing indexes\n", + "\n", + "The Document Store can be used with multiple indexes. Each index uses the same underlying nodes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QL7fubw6LnQc" + }, + "outputs": [], + "source": [ + "# note down index IDs\n", + "list_id = summary_index.index_id\n", + "keyword_id = keyword_table_index.index_id" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MWgAhUVeLdqI" + }, + "outputs": [], + "source": [ + "from llama_index.core import load_index_from_storage\n", + "\n", + "# re-create storage context\n", + "storage_context = StorageContext.from_defaults(\n", + " docstore=doc_store, index_store=index_store\n", + ")\n", + "\n", + "# load indices\n", + "summary_index = load_index_from_storage(\n", + " storage_context=storage_context, index_id=list_id\n", + ")\n", + "keyword_table_index = load_index_from_storage(\n", + " storage_context=storage_context, index_id=keyword_id\n", + ")" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/samples/llama_index_vector_store.ipynb b/samples/llama_index_vector_store.ipynb new file mode 100644 index 0000000..a8efe40 --- /dev/null +++ b/samples/llama_index_vector_store.ipynb @@ -0,0 +1,661 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "lykrZV4kIH27" + }, + "source": [ + "# Google Cloud SQL for PostgreSQL - `PostgresVectorStore`\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers PostgreSQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for PostgreSQL` to store vector embeddings with the `PostgresVectorStore` class.\n", + "\n", + "Learn more about the package on [GitHub](https://github.com/googleapis/llama-index-cloud-sql-pg-python/).\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/llama-index-cloud-sql-pg-python/blob/main/docs/vector_store.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FMMRtzxKIH28" + }, + "source": [ + "## Before you begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com)\n", + " * [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/postgres/connect-instance-auth-proxy#create-instance)\n", + " * [Create a Cloud SQL database.](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n", + " * [Add a User to the database.](https://cloud.google.com/sql/docs/postgres/create-manage-users)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IR54BmgvdHT_" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "Install the integration library, `llama-index-cloud-sql-pg`, and the library for the embedding service, `llama-index-embeddings-vertex`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0ZITIDE160OD" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet llama-index-cloud-sql-pg llama-index-embeddings-vertex llama-index-llms-vertex llama-index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v40bB_GMcr9f" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-primary\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"vector_store\" # @param {type: \"string\"}\n", + "USER = \"postgres\" # @param {type: \"string\"}\n", + "PASSWORD = \"my-password\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### PostgresEngine Connection Pool\n", + "\n", + "One of the requirements and arguments to establish Cloud SQL as a vector store is a `PostgresEngine` object. The `PostgresEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `PostgresEngine` using `PostgresEngine.from_instance()` you need to provide only 4 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n", + "\n", + "For more informatin on IAM database authentication please see:\n", + "\n", + "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/postgres/create-edit-iam-instances)\n", + "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users)\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgresEngine.from_instance()`:\n", + "\n", + "* `user` : Database user to use for built-in database authentication and login\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4cDLSGF8IH2-" + }, + "source": [ + "**Note:** This tutorial demonstrates the async interface. All async methods have corresponding sync methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7tminGFMIH2-" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg import PostgresEngine\n", + "\n", + "engine = await PostgresEngine.afrom_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + " user=USER,\n", + " password=PASSWORD,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D9Xs2qhm6X56" + }, + "source": [ + "### Initialize a table\n", + "The `PostgresVectorStore` class requires a database table. The `PostgresEngine` engine has a helper method `init_vector_store_table()` that can be used to create a table with the proper schema for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "avlyHEMn6gzU" + }, + "outputs": [], + "source": [ + "await engine.ainit_vector_store_table(\n", + " table_name=TABLE_NAME,\n", + " vector_size=768, # Vector size for VertexAI model(textembedding-gecko@latest)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sq-9hD0hIH2-" + }, + "source": [ + "#### Optional Tip: 💡\n", + "You can also specify a schema name by passing `schema_name` wherever you pass `table_name`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jkeaWGQZJSgj" + }, + "outputs": [], + "source": [ + "SCHEMA_NAME = \"my_schema\"\n", + "\n", + "await engine.ainit_vector_store_table(\n", + " table_name=TABLE_NAME,\n", + " schema_name=SCHEMA_NAME,\n", + " vector_size=768,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jqz0lSePIH2-" + }, + "source": [ + "### Create an embedding class instance\n", + "\n", + "You can use any [Llama Index embeddings model](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/).\n", + "You may need to enable Vertex AI API to use `VertexTextEmbeddings`. We recommend setting the embedding model's version for production, learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Vb2RJocV9_LQ" + }, + "outputs": [], + "source": [ + "from llama_index.core import Settings\n", + "from llama_index.embeddings.vertex import VertexTextEmbedding\n", + "from llama_index.llms.vertex import Vertex\n", + "import google.auth\n", + "\n", + "credentials, project_id = google.auth.default()\n", + "Settings.embed_model = VertexTextEmbedding(\n", + " model_name=\"textembedding-gecko@003\", project=PROJECT_ID, credentials=credentials\n", + ")\n", + "\n", + "Settings.llm = Vertex(model=\"gemini-1.5-flash-002\", project=PROJECT_ID)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1tl0aNx7SWy" + }, + "source": [ + "### Initialize a default PostgresVectorStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z-AZyzAQ7bsf" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg import PostgresVectorStore\n", + "\n", + "vector_store = await PostgresVectorStore.create(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "75d_9E0DXfGD" + }, + "source": [ + "### Download data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Hkrau3GvXlvF" + }, + "outputs": [], + "source": [ + "!mkdir -p 'data/paul_graham/'\n", + "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c9YFj3NkXwxX" + }, + "source": [ + "### Load documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lgeYUT6DXqL-" + }, + "outputs": [], + "source": [ + "from llama_index.core import SimpleDirectoryReader\n", + "\n", + "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n", + "print(\"Document ID:\", documents[0].doc_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mw6FBcMlVyCS" + }, + "source": [ + "## Use with VectorStoreIndex\n", + "\n", + "Create an index from the vector store by using [`VectorStoreIndex`](https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index/)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "11ZSXU56IH2-" + }, + "source": [ + "#### Initialize Vector Store with documents\n", + "\n", + "The simplest way to use a Vector Store is to load a set of documents and build an index from them using `from_documents`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Hl0UyxN6IH2-" + }, + "outputs": [], + "source": [ + "from llama_index.core import StorageContext, VectorStoreIndex\n", + "\n", + "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n", + "index = VectorStoreIndex.from_documents(\n", + " documents, storage_context=storage_context, show_progress=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TVsGwZwfIH2_" + }, + "source": [ + "### Query the index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZYYTHeXXIH2_" + }, + "outputs": [], + "source": [ + "query_engine = index.as_query_engine()\n", + "response = query_engine.query(\"What did the author do?\")\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6x7gAIdjIH3C" + }, + "source": [ + "## Create a custom Vector Store\n", + "A Vector Store can take advantage of relational data to filter similarity searches.\n", + "\n", + "Create a new table with custom metadata columns.\n", + "You can also re-use an existing table which already has custom columns for a Document's id, content, embedding, and/or metadata." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_5coN1ViIH3C" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg import Column\n", + "\n", + "# Set table name\n", + "TABLE_NAME = \"vectorstore_custom\"\n", + "# SCHEMA_NAME = \"my_schema\"\n", + "\n", + "await engine.ainit_vector_store_table(\n", + " table_name=TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME,\n", + " vector_size=768, # VertexAI model: textembedding-gecko@003\n", + " metadata_columns=[Column(\"len\", \"INTEGER\")],\n", + ")\n", + "\n", + "\n", + "# Initialize PostgresVectorStore\n", + "custom_store = await PostgresVectorStore.create(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME,\n", + " metadata_columns=[\"len\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iNgjeeW-8d_D" + }, + "source": [ + "### Add documents with metadata\n", + "\n", + "[Document `metadata`](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents/) can provide the LLM and retrieval process with more information. Learn more about different approaches for [extracting and adding metadata](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_metadata_extractor/)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ypPDoT3M8dlw" + }, + "outputs": [], + "source": [ + "from llama_index.core import Document\n", + "\n", + "fruits = [\"apple\", \"pear\", \"orange\", \"strawberry\", \"banana\", \"kiwi\"]\n", + "documents = [Document(text=fruit, metadata={\"len\": len(fruit)}) for fruit in fruits]\n", + "\n", + "storage_context = StorageContext.from_defaults(vector_store=custom_store)\n", + "custom_doc_index = VectorStoreIndex.from_documents(\n", + " documents, storage_context=storage_context, show_progress=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nipC1GwWIH3D" + }, + "source": [ + "### Search for documents with metadata filter\n", + "\n", + "You can apply pre-filtering to the search results by specifying a `filters` argument" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PGjJC5YAIH3D" + }, + "outputs": [], + "source": [ + "from llama_index.core.vector_stores.types import (\n", + " MetadataFilter,\n", + " MetadataFilters,\n", + " FilterOperator,\n", + ")\n", + "\n", + "filters = MetadataFilters(\n", + " filters=[\n", + " MetadataFilter(key=\"len\", operator=FilterOperator.GT, value=\"5\"),\n", + " ],\n", + ")\n", + "\n", + "query_engine = custom_doc_index.as_query_engine(filters=filters)\n", + "res = query_engine.query(\"List some fruits\")\n", + "print(str(res.source_nodes[0].text))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UTNCR-KJIH3C" + }, + "source": [ + "## Add a Index\n", + "\n", + "Speed up vector search queries by applying a vector index. Learn more about [vector indexes](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6649S-8gIH3C" + }, + "outputs": [], + "source": [ + "from llama_index_cloud_sql_pg.indexes import IVFFlatIndex\n", + "\n", + "index = IVFFlatIndex()\n", + "await vector_store.aapply_vector_index(index)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "73ZH_UygIH3C" + }, + "source": [ + "### Re-index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9VHPp6HQIH3C" + }, + "outputs": [], + "source": [ + "await vector_store.areindex() # Re-index using default index name" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GWhhJclaIH3C" + }, + "source": [ + "### Remove an index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "y595k5o5IH3C" + }, + "outputs": [], + "source": [ + "await vector_store.adrop_vector_index() # Delete index using default name" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}