Skip to content
Merged
63 changes: 46 additions & 17 deletions docs/src/user-docs/guides-using-presto.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,27 +33,52 @@ Using Presto with CLP requires:

1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package,
but don't start the package just yet.
2. Before starting the package, update the package's config as follows:
2. Before starting the package, update the config (`etc/clp-config.yml`) as follows:

* Open `etc/clp-config.yml` located within the package.
* Uncomment the `database` section.
* Change `database.host` value to a non-localhost hostname/IP.
* After the change, the `database` section should look something like this:
* Set the `package.query_engine` key to `"presto"`.

```yaml
database:
type: "mariadb" # "mariadb" or "mysql"
host: "<new-IP-address>"
port: 3306
name: "clp-db"
package:
storage_engine: "clp-s"
query_engine: "presto"
```

:::{note}
This change is necessary since the Presto containers run on a Docker network, whereas CLP's
database runs on the host network. So `localhost` refers to two different entities in those
networks. This limitation will be addressed in the future when we unify Presto and CLP's
deployment infrastructure.
:::
* Set the `database.host` key to a non-localhost hostname/IP.

```yaml
database:
# type: "mariadb"
host: <IP-address>
# port: 3306
# name: "clp-db"
```

:::{note}
This change is necessary since the Presto containers run on a Docker network, whereas CLP's database runs on the host network. So localhost refers to two different entities.
:::

Comment on lines 46 to 62
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Recommend host.docker.internal over hard-coding host IP; clarify Linux note.

Hard-coding an IP is brittle. Suggest preferring host.docker.internal (works on macOS/Windows and modern Docker on Linux) and add a Linux fallback.

Proposed doc change:

-    * Set the `database.host` key to a non-localhost hostname/IP.
+    * Set the `database.host` to `host.docker.internal` (recommended). If that doesn't resolve on your Linux distro, use your host's LAN IP or the Docker gateway IP.
@@
-        database:
-        #  type: "mariadb"
-          host: <IP-address>
+        database:
+        #  type: "mariadb"
+          host: host.docker.internal   # or your host IP / Docker gateway IP on Linux
         #  port: 3306
         #  name: "clp-db"
@@
-      This change is necessary since the Presto containers run on a Docker network, whereas CLP's database runs on the host network. So localhost refers to two different entities.
+      Presto runs inside Docker, while CLP's DB runs on the host. `localhost` would point to different networks. `host.docker.internal` bridges the two.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* Set the `database.host` key to a non-localhost hostname/IP.
```yaml
database:
# type: "mariadb"
host: <IP-address>
# port: 3306
# name: "clp-db"
```
:::{note}
This change is necessary since the Presto containers run on a Docker network, whereas CLP's database runs on the host network. So localhost refers to two different entities.
:::
* Set the `database.host` to `host.docker.internal` (recommended). If that doesn't resolve on your Linux distro, use your host's LAN IP or the Docker gateway IP.
🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-presto.md around lines 46 to 59, the
instructions currently suggest hard-coding the host IP for database.host; update
the guidance to recommend using host.docker.internal instead of a fixed IP
(which is supported on macOS/Windows and on modern Docker for Linux), and add a
brief Linux fallback note explaining how to enable host.docker.internal or use
the host network or the host's IP (with a pointer to Docker/OS-specific steps)
so readers have a resilient alternative rather than brittle hard-coded
addresses.

* Set the `results_cache.retention_period` key to `null`. The CLP presto integration does not
yet provide support for garbage collection.

```yaml
results_cache:
# host: "localhost"
# port: 27017
# db_name: "clp-query-results"
# stream_collection_name: "stream-files"
#
# # Retention for search results, in minutes. Set to null to disable automatic deletion.
retention_period: null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
retention_period: null
retention_period: null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i changed the whole indenting for this block in another commit

```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Call out unbounded cache growth and offer manual cleanup guidance.

Setting retention_period: null disables GC and can cause unbounded growth. Add an explicit warning and a pointer to manual cleanup steps.

Suggested addition:

       results_cache:
@@
-        #  # Retention for search results, in minutes. Set to null to disable automatic deletion.
+        #  # Retention for search results, in minutes. Set to null to disable automatic deletion.
           retention_period: null
       ```
+
+      :::warning
+      With `retention_period: null`, the results cache will grow without bounds. Plan periodic manual cleanup or set a finite retention once GC is supported.
+      :::
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* Set the `results_cache.retention_period` key to `null`. The CLP presto integration does not
yet provide support for garbage collection.
```yaml
results_cache:
# host: "localhost"
# port: 27017
# db_name: "clp-query-results"
# stream_collection_name: "stream-files"
#
# # Retention for search results, in minutes. Set to null to disable automatic deletion.
retention_period: null
```
* Set the `results_cache.retention_period` key to `null`. The CLP presto integration does not
yet provide support for garbage collection.
🤖 Prompt for AI Agents
docs/src/user-docs/guides-using-presto.md around lines 60 to 73: the docs
currently state that results_cache.retention_period can be set to null but do
not warn about unbounded cache growth; add a clear warning block right after the
retention_period example that states that setting retention_period: null
disables GC and can cause unbounded growth, and include a short recommendation
to plan periodic manual cleanup or set a finite retention once GC is supported,
plus a pointer or link to the manual cleanup steps or housekeeping guide
elsewhere in the docs.

* Update `presto` key with host and port of the Presto cluster. If you follow the
[Setting up Presto](#setting-up-presto) guide, the host is `localhost` and the port is `8889`. The Presto cluster does not need to be running to start the package.

```yaml
presto:
host: <IP-address>
port: <port>
```

3. If you'd like to store your compressed logs on S3, follow the
[using object storage](guides-using-object-storage/index.md) guide.
Expand Down Expand Up @@ -138,7 +163,11 @@ docker compose rm

## Querying your logs through Presto

To query your logs through Presto, you can use the Presto CLI:
To query your logs through Presto, you can use either:

* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto CLI:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Disambiguate CLP Web UI vs Presto Web UI; add link to Presto UI (port verification).

Bullet reads as “WebUI” but it’s CLP’s web UI. Since this PR adds “setting up presto webui”, include the Presto coordinator UI endpoint too and confirm it matches the configured port.

Suggested wording:

-To query your logs through Presto, you can use either:
+To query your logs through Presto, you can use either:
@@
-* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
-`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
+* The CLP Web UI at [http://localhost:4000](http://localhost:4000) (if you changed
+  `webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
+* The Presto Web UI at `http://<presto.host>:<presto.port>/` (e.g., `http://localhost:8889/`) to inspect queries and cluster state
 * The Presto CLI:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
To query your logs through Presto, you can use either:
* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto CLI:
To query your logs through Presto, you can use either:
* The CLP Web UI at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto Web UI at `http://<presto.host>:<presto.port>/` (e.g., `http://localhost:8889/`) to inspect queries and cluster state
* The Presto CLI:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To query your logs through Presto, you can use either:
* The WebUI available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values)
* The Presto CLI:
To query your logs through Presto, you can use either the [webUI](#querying-in-the-webui) or the
[Presto CLI](#querying-from-the-presto-cli).
### Querying in the webUI
The webUI is available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values).
### Querying from the Presto CLI
To access the Presto CLI, navigate to the `tools/deployment/presto-clp` directory in a new terminal.
Make sure Presto is still running in whichever terminal you originally started it from. Enter the
following command to start the CLI:

I think that this warrants being split into subsections, as it's awkward to go from the end of this list (the Presto CLI point) into an un-indented explanation of the Presto CLI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought about this, but there is text underneath about certain SQL commands that apply to both. And putting the two headers makes it look like it only applies to the CLI

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah very true, my bad! I do think that could be clearer; I will come up with a format that makes that clearer, and I'll include it with my next review


```bash
docker compose exec presto-coordinator \
Expand Down
Loading