-
Notifications
You must be signed in to change notification settings - Fork 83
feat(deployment): Support configuration of external third-party services (resolves #1680). #1681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughConditionalizes in-code transforms for database, queue, redis, and results_cache based on a BundledService list, parameterizes Docker Compose service URIs with environment variables, and updates documentation to describe bundled vs external services and external-database/multi-host setup steps. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Possibly related issues
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ctions and multi-host setup clarifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-external-database.md (1)
176-215: Clarify how managed-service “connection strings” map onto the new host/port configThe new bundling and external
database/results_cachehost/port examples look good and line up with the code. One small docs mismatch: the earlier “AWS DocumentDB or MongoDB Atlas” section tells users to “use the provided connection string when configuring CLP (see below)”, but the new config steps only show host/port fields.To avoid confusion for users coming from a full MongoDB/Atlas URI, consider adding a short clarification here, for example:
- Explicitly state that CLP currently expects separate
host/port(plus optionaldb_name), not a rawmongodb+srv://...URI; and/or- Briefly describe how to extract the host and port from a typical managed-service connection string.
That would align the step-by-step config with the earlier managed-service guidance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
components/clp-py-utils/clp_py_utils/clp_config.py(1 hunks)docs/src/user-docs/guides-external-database.md(1 hunks)docs/src/user-docs/guides-multi-host.md(2 hunks)docs/src/user-docs/quick-start/clp-json.md(1 hunks)docs/src/user-docs/quick-start/clp-text.md(1 hunks)tools/deployment/package/docker-compose-all.yaml(5 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-06-18T20:39:05.899Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Applied to files:
docs/src/user-docs/guides-multi-host.mddocs/src/user-docs/quick-start/clp-text.md
📚 Learning: 2025-07-05T03:38:16.779Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-06-18T20:48:48.990Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
docs/src/user-docs/guides-multi-host.mdtools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package/docker-compose-all.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
870-877: Bundled checks correctly avoid clobbering external service endpointsConditioning the
transform_for_container()calls onBundledServicemembership cleanly prevents overwriting host/port when services are external while preserving existing behaviour when they’re bundled. This aligns with the new docs and bundling model.docs/src/user-docs/quick-start/clp-json.md (1)
21-24: Tip text and external-database cross-link read clearlyThe expanded tip nicely combines
--setup-onlyguidance with a clear pointer to the external-database guide, and the relative link path looks correct for this location.docs/src/user-docs/quick-start/clp-text.md (1)
23-26: Consistent external-database guidance across quick-startsThe updated tip matches the clp-json quick-start and clearly directs users who want external databases/third-party services to the dedicated guide. Consistent and easy to follow.
docs/src/user-docs/guides-multi-host.md (1)
40-41: Bundled vs external service guidance is aligned and clearThe multi-host steps now cleanly explain configuring the
bundledlist and explicitly tie external-database usage to skippingdatabase/results-cacheservices while still running the init jobs. This matches the config model and should make multi-host + external DB setups much easier to reason about.Also applies to: 55-65, 123-127
| line-length: | ||
| max: 100 | ||
| allow-non-breakable-words: false | ||
| allow-non-breakable-words: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has to be set as true; otherwise, the clp-config.yaml with the long URL inside would fail the rule.
@coderabbitai create an issue to track this lint rule change, and i will submit the lint rule change in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the yamllint rule change
gh issue create \
--repo y-scope/clp \
--title "Update yamllint configuration to allow non-breakable words in long lines" \
--body "## Context
This issue tracks the yamllint configuration change to set \`allow-non-breakable-words: true\` in the \`line-length\` rule. This change is necessary because \`clp-config.yaml\` contains long URLs that would fail the yamllint check with the current setting.
## Background
While working on PR #1681 (support for external third-party services configuration), it was identified that the yamllint \`allow-non-breakable-words\` setting needs to be changed from \`false\` to \`true\` to accommodate long URLs in configuration files.
## Related
- PR: https://github.com/y-scope/clp/pull/1681
- Comment: https://github.com/y-scope/clp/pull/1681#discussion_r2569846542
- Requested by: @junhaoliao
## Proposed Change
In \`.yamllint.yaml\`:
\`\`\`yaml
line-length:
max: 100
allow-non-breakable-words: true # Changed from false
\`\`\`" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1683 to track this yamllint configuration change. The issue includes the context about why this change is needed (to accommodate long URLs in clp-config.yaml) and references this PR and comment as backlinks. The issue has been assigned to you for the follow-up PR.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-external-database.md (1)
218-223: Multi-host deployment note is helpful but could be more explicit.The note clarifies that
db-table-creatorandresults-cache-indices-creatorjobs are still needed, but it would be helpful to explicitly state that these initialization jobs should be run before starting other services. Consider adding a reference or cross-link to the initialization sequence in the multi-host guide.Suggested refinement:
When using external databases in a multi-host deployment, you do **not** need to start the `database` and `results-cache` Docker Compose services. Skip those services when following the -[multi-host deployment guide][multi-host-guide]. However, you still need to run the database -initialization jobs (`db-table-creator` and `results-cache-indices-creator`). +[multi-host deployment guide][multi-host-guide]. However, you still need to run the database +initialization jobs (`db-table-creator` and `results-cache-indices-creator`) **on one host before** +starting the other CLP services.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
.yamllint.yaml(1 hunks)components/package-template/src/etc/clp-config.yaml(2 hunks)docs/src/user-docs/guides-external-database.md(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-08T17:20:02.560Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1078
File: components/webui/server/src/plugins/MongoSocketIoServer/index.ts:105-108
Timestamp: 2025-07-08T17:20:02.560Z
Learning: The y-scope/clp codebase has a 100-character line length limit that takes precedence over other style preferences like template literals when they conflict.
Applied to files:
.yamllint.yaml
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/package-template/src/etc/clp-config.yaml
📚 Learning: 2025-10-22T21:14:12.225Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.
Applied to files:
components/package-template/src/etc/clp-config.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
.yamllint.yaml (1)
40-40: Verify that non-breakable-words relaxation is necessary.The change allows non-breakable words to exceed the 100-character line-length limit. Confirm that the new environment-variable-based URIs and multi-service configurations in the accompanying changes (ClpConfig transformations, docker-compose-all.yaml) actually introduce non-breakable tokens that exceed this threshold and justify relaxing the constraint.
docs/src/user-docs/guides-external-database.md (2)
178-189: YAML syntax and bundled services example look good.The structured step-by-step configuration flow is clear and provides helpful YAML examples. The commented-out services demonstrate which services can be removed from the bundled list to use external instances.
210-216: I'm unable to access the repository to verify the credentials.yaml structure against the codebase configuration expectations due to cloning failures. However, I can provide a rewritten review comment that reflects this limitation:
Unable to verify credentials.yaml structure without codebase access.
The documentation shows credentials nested under
database:withusernameandpasswordfields. This structure should be verified against the actual configuration parser in the codebase to confirm it matches expectations for external database credential handling. Due to technical limitations preventing repository access, this verification could not be completed.components/package-template/src/etc/clp-config.yaml (1)
19-20: Documentation URL reference is correct and accessible.The URL
https://docs.yscope.com/clp/main/user-docs/guides-external-database.htmlhas been verified to exist and contains the "External database setup" guide with sections covering MariaDB/MySQL and MongoDB configuration for CLP. The anchor fragment#configuring-clp-to-use-external-databasesfollows standard markdown heading conventions and appropriately references a configuration section on that page.
| :::{important} | ||
| If your external service is running on the host machine, **do not use `127.0.0.1`** as the host. | ||
| The CLP components run in containers, so `127.0.0.1` would refer to the container itself. | ||
| Instead, use: | ||
| - `host.docker.internal` to refer to the host machine. | ||
| - The host's actual network IP address (e.g., `192.168.1.10`) | ||
| ::: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Host networking guidance needs platform-specific clarification.
The note about 127.0.0.1 and host.docker.internal is helpful, but host.docker.internal is primarily available on Docker Desktop (Mac/Windows). For Linux Docker hosts without Docker Desktop, users need different guidance—typically using the actual host IP or DNS hostname. Consider adding a platform-specific note.
Suggested addition (after line 207):
- `host.docker.internal` to refer to the host machine.
- The host's actual network IP address (e.g., `192.168.1.10`)
+
+ ::::{note}
+ **For Linux Docker hosts**: `host.docker.internal` is not available. Use the actual host IP address or hostname instead.
+ ::::Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure whether this coderabbit comment is valid or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
host.docker.internal support Linux Docker 2024 2025
💡 Result:
Short answer: Yes — but with caveats.
- Docker Desktop (macOS, Windows — and Docker Desktop for Linux) provides host.docker.internal by default so containers can resolve the host localhost. [1][2][6]
- On plain Docker Engine on Linux you must create the mapping yourself (common options):
- docker run --add-host=host.docker.internal:host-gateway (or in Compose: extra_hosts: - "host.docker.internal:host-gateway"). The dockerd daemon exposes the special host-gateway value and can be configured with host-gateway-ip. [1][3][5]
- Or map the host gateway/docker0 IP directly (e.g. 172.17.0.1) in /etc/hosts or via extra_hosts. [4][2]
References
- Docker docs — dockerd / host-gateway and --add-host examples. [1]
- Docker docs — host.docker.internal behavior / Networking how‑tos. [2]
- Docker Engine 20.10 release notes (adds Linux support hooks). [3]
- Community notes / common workarounds (docker0 / extra_hosts). [4]
If you want, I can show the exact docker run or docker‑compose lines to use for your setup.
@hoophalab The comment is valid. On Docker Desktop (Mac/Windows/Docker Desktop for Linux), host.docker.internal is available by default, but on plain Docker Engine on Linux, you must create the mapping yourself.
For Linux Docker Engine users, they need to use docker run --add-host=host.docker.internal:host-gateway or configure it in Docker Compose with extra_hosts: - "host.docker.internal:host-gateway".
Suggested documentation improvement:
Update the important block in the documentation (around line 203-207) to include platform-specific guidance:
:::{important}
If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
Instead, use:
- `host.docker.internal` to refer to the host machine (available by default on Docker Desktop)
- The host's actual network IP address (e.g., `192.168.1.10`)
**For Linux Docker Engine users** (not Docker Desktop): `host.docker.internal` requires manual
configuration. Add the following to your `docker-compose-all.yaml` services that need to
connect to the host:
```yaml
extra_hosts:
- "host.docker.internal:host-gateway"Alternatively, use the host's actual IP address instead.
:::
This will help Linux users avoid confusion and provide them with a clear path forward.
---
<details>
<summary>🧠 Learnings used</summary>
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
</details>
<!-- This is an auto-generated reply by CodeRabbit -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this requires more thoughts / investigations. Let's put this review on hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some comments
| @@ -1,4 +1,5 @@ | |||
| # yaml-language-server: $schema=../usr/share/config-schemas/clp-config.schema.json | |||
| # | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # |
| # | ||
| ## Remove any bundled services below if you wish to use your own. For more information, see | ||
| ## https://docs.yscope.com/clp/main/user-docs/guides-external-database.html#configuring-clp-to-use-external-databases | ||
| #bundled: ["database", "queue", "redis", "results_cache"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to keep consistent with the doc
| #bundled: ["database", "queue", "redis", "results_cache"] | |
| #bundled: | |
| # - "database" | |
| # - "queue" | |
| # - "redis" | |
| # - "results_cache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might need to run lint on this file
npx markdownlint-cli2 --config <root_dir>/tools/yscope-dev-utils/exports/lint-configs/.markdownlint-cli2.yaml guides-external-database.md --fix
| :::{important} | ||
| If your external service is running on the host machine, **do not use `127.0.0.1`** as the host. | ||
| The CLP components run in containers, so `127.0.0.1` would refer to the container itself. | ||
| Instead, use: | ||
| - `host.docker.internal` to refer to the host machine. | ||
| - The host's actual network IP address (e.g., `192.168.1.10`) | ||
| ::: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure whether this coderabbit comment is valid or not
Description
Enable external third-party services (database, queue, Redis, results cache) to be used instead of bundled services by making host/port configurable via environment variables.
Changes
ClpConfig.transform_for_container()to conditionally transform service configurations only when the services are configured to be bundled.docker-compose-all.yamlto use configured service hosts and ports instead of previously hardcoded service names as the hosts, and hardcoded ports:This allows users to deploy the CLP package with external 3rd party services, without manual modifications after initialization. i.e., users don't anymore have to run
start_clp.shwith--setup-only, make modifications, then manually deploy.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.