144 changes: 122 additions & 22 deletions docs/content/advanced/api_examples/oci_embed.md
@@ -10,14 +10,120 @@ Licensed under the Universal Permissive License v1.0 as shown at http://oss.orac

## Overview

Creating a vector store from documents stored in OCI Object Storage is a two-step API workflow:
There are two API workflows for creating a vector store from documents in OCI Object Storage:

1. **Download** objects from an OCI bucket to the server's temporary staging area.
2. **Embed** the downloaded files into a new vector store.
1. **Single-call** — `POST /v1/embed/oci/store` downloads and embeds in one request. Recommended when the only source is an OCI bucket.
2. **Two-step** — `POST /v1/oci/objects/download` followed by `POST /v1/embed/`. Use this when you need to combine OCI objects with other sources (local uploads, web URLs, SQL query results) before embedding.

This separation is intentional — you can accumulate files from multiple downloads (or mix in files from other sources like local uploads) before triggering the embed step.
## Single-call Workflow

## Step 1: Download Objects from OCI Object Storage
Download and embed in one request.

**Endpoint:** `POST /v1/embed/oci/store`

| Parameter | Location | Description |
|---|---|---|
| `rate_limit` | Query | Embedding API rate limit in requests per minute (default: `0` for unlimited) |
| `client` | Header | Client identifier for scoping temp storage (default: `server`) |
| Request body | Body | `OciEmbedRequest` JSON object (see below) |

### OciEmbedRequest Fields

| Field | Type | Description |
|---|---|---|
| `bucket_name` | string | Name of the OCI Object Storage bucket |
| `auth_profile` | string | OCI profile name (case-insensitive). Default: `DEFAULT` |
| `objects` | array of strings | Object keys to embed. Omit or pass an empty list to embed every supported object in the bucket |
| `alias` | string | Identifiable alias for the vector store |
| `description` | string | Human-readable description of the vector store contents |
| `embedding_model` | object | `{"provider": "...", "id": "..."}` — the embedding model to use |
| `chunk_size` | integer | Maximum chunk size in characters (0 for default) |
| `chunk_overlap` | integer | Overlap between chunks in characters (0 for default) |
| `distance_strategy` | string | One of: `COSINE`, `EUCLIDEAN_DISTANCE`, `DOT_PRODUCT` |
| `index_type` | string | Vector index type: `HNSW`, `IVF`, or `HYB` |
| `parsing_mode` | string | Document parsing mode: `fast` or `deep` |

**Response:** `202 Accepted` with an `EmbedJobAccepted` body — poll `GET /v1/embed/jobs/{job_id}` for the terminal `EmbedProcessingResult`.

| Field | Type | Description |
|---|---|---|
| `job_id` | string | Identifier of the scheduled embed job |
| `status` | string | Initial status (`queued` or `running`) |
| `location` | string | Path to the job-status endpoint |

### Example — embed specific objects

```bash
curl -X POST "http://localhost:8000/v1/embed/oci/store?rate_limit=60" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "client: my-session" \
-d '{
"bucket_name": "rag-source-docs",
"auth_profile": "DEFAULT",
"objects": ["product-catalog.pdf", "release-notes/2026-q2.md"],
"alias": "product-docs",
"description": "Product documentation embedded for RAG",
"embedding_model": {
"provider": "oci",
"id": "cohere.embed-english-v3.0"
},
"chunk_size": 1000,
"chunk_overlap": 100,
"distance_strategy": "COSINE",
"index_type": "HNSW",
"parsing_mode": "fast"
}'
```

### Example — embed every supported object in the bucket

Omit `objects` (or pass `[]`) to embed every object whose extension is supported (`.pdf`, `.html`, `.md`, `.txt`, `.csv`, `.docx`, `.pptx`, `.xlsx`, `.png`, `.jpg`, `.jpeg`):

```bash
curl -X POST "http://localhost:8000/v1/embed/oci/store" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "client: my-session" \
-d '{
"bucket_name": "rag-source-docs",
"auth_profile": "DEFAULT",
"alias": "all-docs",
"embedding_model": {
"provider": "oci",
"id": "cohere.embed-english-v3.0"
},
"chunk_size": 1000,
"chunk_overlap": 100,
"distance_strategy": "COSINE",
"index_type": "HNSW"
}'
```

### Polling for completion

The single-call endpoint is asynchronous — the 202 response carries the `job_id`. Poll the job-status endpoint until it reaches a terminal state:

```bash
curl "http://localhost:8000/v1/embed/jobs/$JOB_ID" \
-H "x-api-key: YOUR_API_KEY" \
-H "client: my-session"
```

A successful job's `result` field carries the `EmbedProcessingResult`:

| Field | Type | Description |
|---|---|---|
| `message` | string | Status message |
| `total_chunks` | integer | Number of chunks created |
| `processed_files` | array | List of successfully processed files |
| `skipped_files` | array | List of files that were skipped |
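The poll-until-terminal loop can be sketched in Python. The status fetch is injected as a callable so the sketch stays transport-agnostic; the terminal status names (`completed`, `failed`) are assumptions, since this document only names `queued` and `running`:

```python
import time

# Assumed terminal statuses; the document only names "queued" and "running"
# as initial states, so adjust this set to match the server's actual values.
TERMINAL_STATUSES = {"completed", "failed"}

def poll_embed_job(fetch_status, interval=2.0, max_wait=600.0):
    """Poll until the embed job reaches a terminal state.

    ``fetch_status`` is any zero-argument callable returning the job-status
    JSON, e.g. a thin wrapper around ``GET /v1/embed/jobs/{job_id}``.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval)
    raise TimeoutError("embed job did not reach a terminal state in time")
```

On success, the returned job's `result` field carries the `EmbedProcessingResult` fields listed above.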

## Two-step Workflow

Use this flow when you need to combine OCI objects with other sources (local uploads, web URLs, SQL query results) before embedding. Files from each source endpoint accumulate in the same per-client staging area; the embed call consumes everything that has been staged.

### Step 1: Download Objects from OCI Object Storage

Download one or more objects from an OCI Object Storage bucket to the server's staging directory.

@@ -32,7 +138,7 @@ Download one or more objects from an OCI Object Storage bucket to the server's s

**Response:** JSON array of downloaded filenames.

### Example
#### Example

```bash
curl -X POST "http://localhost:8000/v1/oci/objects/download/my-documents/DEFAULT" \
@@ -44,7 +150,7 @@ curl -X POST "http://localhost:8000/v1/oci/objects/download/my-documents/DEFAULT

You can call this endpoint multiple times to accumulate files from the same or different buckets before proceeding to Step 2.

## Step 2: Create and Populate the Vector Store
### Step 2: Create and Populate the Vector Store

Process all staged files — splitting them into chunks, generating embeddings, and populating the vector store.

@@ -56,7 +162,7 @@ Process all staged files — splitting them into chunks, generating embeddings,
| `client` | Header | Must match the `client` value used in Step 1 |
| Request body | Body | `VectorStoreConfig` JSON object (see below) |

### VectorStoreConfig Fields
#### VectorStoreConfig Fields

| Field | Type | Description |
|---|---|---|
@@ -69,16 +175,9 @@ Process all staged files — splitting them into chunks, generating embeddings,
| `index_type` | string | Vector index type: `HNSW`, `IVF`, or `HYB` |
| `parsing_mode` | string | Document parsing mode: `fast` or `deep` |

**Response:** `EmbedProcessingResult` JSON object:

| Field | Type | Description |
|---|---|---|
| `message` | string | Status message |
| `total_chunks` | integer | Number of chunks created |
| `processed_files` | array | List of successfully processed files |
| `skipped_files` | array | List of files that were skipped |
**Response:** `202 Accepted` with an `EmbedJobAccepted` body — same polling contract as the single-call workflow above.

### Example
#### Example

```bash
curl -X POST "http://localhost:8000/v1/embed?rate_limit=60" \
@@ -89,7 +188,7 @@ curl -X POST "http://localhost:8000/v1/embed?rate_limit=60" \
"alias": "quarterly-reports",
"description": "Q4 quarterly review documents and metrics",
"embedding_model": {
"provider": "ocigenai",
"provider": "oci",
"id": "cohere.embed-english-v3.0"
},
"chunk_size": 1000,
@@ -100,7 +199,7 @@ curl -X POST "http://localhost:8000/v1/embed?rate_limit=60" \
}'
```

## Complete Example
### Complete Example

A full end-to-end workflow downloading from two buckets and embedding:

@@ -132,7 +231,7 @@ curl -X POST "$API_URL/v1/embed?rate_limit=60" \
"alias": "q4-knowledge-base",
"description": "Q4 2024 reports and supporting data",
"embedding_model": {
"provider": "ocigenai",
"provider": "oci",
"id": "cohere.embed-english-v3.0"
},
"chunk_size": 1000,
@@ -145,6 +244,7 @@

## Notes

- **File cleanup**: Staged files are automatically cleaned up after the embed endpoint completes, whether it succeeds or fails.
- **Mixing sources**: Files from multiple sources can be accumulated before embedding. In addition to OCI Object Storage downloads, you can upload local files via `POST /v1/embed/local/store` or scrape web content — all files are staged in the same directory scoped by the `client` header.
- **Single-call vs two-step**: The single-call endpoint downloads directly into a per-request work directory, so it only embeds the objects from the named bucket — files staged via `/v1/embed/local/store`, `/v1/embed/web/store`, or `/v1/embed/sql/store` are *not* pulled into a single-call job. The two-step flow embeds every file currently staged for the client.
- **File cleanup**: In both workflows, staged files are automatically cleaned up after the embed job completes, whether it succeeds or fails.
- **Mixing sources**: Files from multiple sources can be accumulated before embedding via the two-step flow. In addition to OCI Object Storage downloads, you can upload local files via `POST /v1/embed/local/store` or scrape web content — all files are staged in the same directory scoped by the `client` header.
- **Client scoping**: The `client` header isolates temporary storage between different sessions. Use a consistent value across your download and embed calls within a single workflow.
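As a sketch of that scoping rule, the stdlib-only Python below builds the Step 1 and Step 2 requests from one shared header set. The endpoint paths come from this document; the names and values are illustrative:

```python
import json
import urllib.request

def scoped_request(base_url, path, body, api_key, client_id):
    """Return an (unsent) POST request carrying the shared headers."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(body).encode(),
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
            "client": client_id,  # must match across download and embed
        },
        method="POST",
    )

# Both calls carry the same ``client`` header, so the embed consumes
# exactly what the download staged.
download = scoped_request(
    "http://localhost:8000", "/v1/oci/objects/download/my-documents/DEFAULT",
    ["report.pdf"], "YOUR_API_KEY", "my-session",
)
embed = scoped_request(
    "http://localhost:8000", "/v1/embed",
    {"alias": "quarterly-reports"}, "YOUR_API_KEY", "my-session",
)
```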
79 changes: 64 additions & 15 deletions src/client/app/content/tools/tabs/split_embed.py
@@ -160,6 +160,7 @@ class FileSourceData:
web_url: Optional[str] = None
oci_bucket: Optional[str] = None
oci_files_selected: Optional[pd.DataFrame] = None
oci_all_files: bool = False
sql_query: Optional[str] = None
sql_db_alias: Optional[str] = None

@@ -172,11 +173,18 @@ def is_valid(self) -> bool:
if self.file_source == "SQL":
return bool(self.sql_query and self.sql_query.strip() and self.sql_db_alias)
if self.file_source == "OCI":
return bool(self.oci_files_selected is not None and self.oci_files_selected["Process"].sum() > 0)
if not self.oci_bucket:
return False
return bool(
self.oci_all_files
or (self.oci_files_selected is not None and self.oci_files_selected["Process"].sum() > 0)
)
return False

def get_button_help(self) -> str:
"""Get help text for the populate button based on file source."""
if self.file_source == "OCI" and self.oci_all_files:
return "This button is disabled if no source bucket is selected."
help_map = {
"Local": "This button is disabled if no local files have been provided.",
"Web": "This button is disabled if the URL was unable to be validated. Please check the URL.",
@@ -206,6 +214,7 @@ def _get_buckets(compartment_ocid: str, auth_profile: str) -> list:
return ["No Access to Buckets in this Compartment"]


@st.cache_data(ttl=60, show_spinner="Listing bucket objects")
def _get_bucket_objects(bucket_name: str, auth_profile: str) -> list:
"""Get object names from an OCI bucket."""
return api_get(f"oci/objects/{bucket_name}/{auth_profile}")
@@ -456,9 +465,25 @@ def _render_load_kb_section(file_sources: list, oci_setup: dict | None) -> FileS
disabled=not bucket_compartment,
)

src_objects = _get_bucket_objects(data.oci_bucket, auth_profile) if data.oci_bucket else []
src_files = _files_data_frame(src_objects)
data.oci_files_selected = _files_data_editor(src_files, "source")
data.oci_all_files = st.toggle(
"Embed all supported files in bucket",
value=True,
key="runtime_oci_all_files",
disabled=not data.oci_bucket,
help=(
"When enabled, every supported file in the selected bucket is embedded "
"without per-file selection. Disable to pick individual files."
),
)

if data.oci_bucket:
st.caption(state.optimizer_help.get("embed_supported_file_types", ""))
if data.oci_all_files:
st.caption(f"All supported files in `{data.oci_bucket}` will be embedded.")
else:
src_objects = _get_bucket_objects(data.oci_bucket, auth_profile)
src_files = _files_data_frame(src_objects)
data.oci_files_selected = _files_data_editor(src_files, "source")

return data

@@ -652,6 +677,41 @@ def _process_populate_request(
client_header = {"client": state.optimizer_client}
auth_profile = state["settings"]["client_settings"].get("oci", {}).get("auth_profile", "")

if source_data.file_source == "OCI":
payload = _build_embed_payload(embed_config)
payload["bucket_name"] = source_data.oci_bucket or ""
payload["auth_profile"] = auth_profile or "DEFAULT"
if not source_data.oci_all_files:
oci_selected = source_data.oci_files_selected
if oci_selected is None:
return None, {}
process_list = oci_selected[oci_selected["Process"]].reset_index(drop=True)
object_names = process_list["File"].tolist()
# An empty ``objects`` list is server-equivalent to omitting
# it — i.e. "embed every supported file in the bucket".
# Reject zero-selection here so a TOCTOU race past the
# disabled-button gate cannot silently embed the whole bucket.
if not object_names:
return None, {}
payload["objects"] = object_names
# 7200s mirrors ``/embed/refresh`` (same synchronous-download
# shape); /embed/oci/store downloads bucket objects before the
# 202, so a ReadTimeout would lose the job_id mid-flight.
accepted = api_post(
"embed/oci/store",
json=payload,
params={"rate_limit": rate_limit or 0},
extra_headers=client_header,
timeout=7200,
)
job_id = accepted["job_id"]
mark_embed_job_started(job_id)
try:
return job_id, _poll_embed_job(job_id, client_header)
except httpx.HTTPStatusError as ex:
ex.job_id = job_id # type: ignore[attr-defined]
raise

# Step 1: Store source files on server
if source_data.file_source == "Local":
files = helpers.unique_file_payload(state.runtime_local_file_uploader)
@@ -664,17 +724,6 @@
json={"query": source_data.sql_query, "db_alias": source_data.sql_db_alias},
extra_headers=client_header,
)
else: # OCI
oci_selected = source_data.oci_files_selected
if oci_selected is None:
return None, {}
process_list = oci_selected[oci_selected["Process"]].reset_index(drop=True)
file_names = process_list["File"].tolist()
api_post(
f"oci/objects/download/{source_data.oci_bucket or ''}/{auth_profile}",
json=file_names,
extra_headers=client_header,
)

# Step 2: Split and embed — schedule the job and poll for terminal state.
# 300s acceptance timeout outlasts pre-202 latency (``_settings_lock``