Skip to content

Conversation

@junhaoliao
Copy link
Member

@junhaoliao junhaoliao commented Nov 27, 2025

Description

Enable external third-party services (database, queue, Redis, results cache) to be used instead of bundled services by making host/port configurable via environment variables.

Changes

  • Modified ClpConfig.transform_for_container() to conditionally transform service configurations only when the services are configured to be bundled.
  • Updated docker-compose-all.yaml to use configured service hosts and ports instead of previously hardcoded service names as the hosts, and hardcoded ports:
    • Results cache: CLP_RESULTS_CACHE_HOST, CLP_RESULTS_CACHE_PORT
    • Queue: CLP_QUEUE_HOST, CLP_QUEUE_PORT
    • Redis: CLP_REDIS_HOST, CLP_REDIS_PORT
  • Update guides to include external database configuration instructions and multi-host setup clarifications

This allows users to deploy the CLP package with external 3rd party services, without manual modifications after initialization. i.e., users don't anymore have to run start_clp.sh with --setup-only, make modifications, then manually deploy.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

docker run -d \
  -p 192.168.3.77:10086:3306 \
  -e MYSQL_DATABASE=clp-db \
  -e MYSQL_USER=clp-user \
  -e MYSQL_PASSWORD=OB_EUDDk_EU \
  -e MYSQL_ROOT_PASSWORD=OB_EUDDk_EU \
  mariadb:10-jammy

docker run -d \
  -p 192.168.3.77:10087:5672 \
  -e RABBITMQ_DEFAULT_USER=clp-user \
  -e RABBITMQ_DEFAULT_PASS=wosMW_7T488 \
  rabbitmq:3.9.8

docker run -d \
  -p 192.168.3.77:10088:6379 \
  redis:7.2.4 \
  redis-server --requirepass "zHEKvSI3qYQb1ta2KQw_Bg"

docker run -d \
  -p 192.168.3.77:10089:27017 \
  mongo:7.0.1 \
  --replSet rs0 \
  --bind_ip 0.0.0.0

# update etc/clp-config.yaml to set an empty `bundled` []
# and also update the connection configs of the 3rd party services.
# also, make sure credentials.yaml match the above credentials

./sbin/start-clp.sh # observe no failure in startup
./sbin/compress.sh --timestamp-key=Timestamp ~/samples/samples/spark-event-logs/

# use the webui to perform a search with string "123" and observe results are returned
# click "Original file" on any results and observe the log viewer is opened with the log result's context

Summary by CodeRabbit

  • New Features

    • Service endpoints (database, queue, redis, results cache) are now configurable via environment variables with sensible defaults.
  • Bug Fixes

    • Transformations for database, queue, redis and results cache now apply only when those services are declared bundled, avoiding unintended local defaults when using external services.
  • Documentation

    • Added a structured external-database guide, updated multi-host deployment docs and quick-start tips to reference external service setup.
  • Chores

    • Linter rule relaxed for long YAML tokens; added explanatory comments to example config.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 27, 2025

Walkthrough

Conditionalizes in-code transforms for database, queue, redis, and results_cache based on a BundledService list, parameterizes Docker Compose service URIs with environment variables, and updates documentation to describe bundled vs external services and external-database/multi-host setup steps.

Changes

Cohort / File(s) Summary
Configuration logic
components/clp-py-utils/clp_py_utils/clp_config.py
transform_for_container calls for database, queue, redis, and results_cache are now executed only when the corresponding BundledService is present in self.bundled, instead of being unconditional.
Docker Compose templates
tools/deployment/package/docker-compose-all.yaml
Replaced hard-coded service endpoints with environment-variable-driven URIs (CLP_RESULTS_CACHE_, CLP_QUEUE_, CLP_REDIS_*) and defaults to allow external host/port configuration; adjusted multi-line URI formatting.
External-database guide
docs/src/user-docs/guides-external-database.md
Reworked flow to introduce a bundled list, separate external database/results_cache connection settings, credentials placement in etc/credentials.yaml, and multi-host notes (host networking/addressing).
Multi-host guide
docs/src/user-docs/guides-multi-host.md
Removed the separate CLP-generated-config update step; added guidance and YAML example to declare bundled vs external services; consolidated distribution/config steps.
Quick-start tips
docs/src/user-docs/quick-start/clp-json.md, docs/src/user-docs/quick-start/clp-text.md
Minor wording changes to Starting CLP tips to reference external-database guidance and clarify external service usage.
Package template config
components/package-template/src/etc/clp-config.yaml
Added comments explaining how to configure external databases and a documentation URL; no semantic config changes.
Linting config
.yamllint.yaml
Changed allow-non-breakable-words to true for line-length handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Review clp_config.py conditional checks for correct enum membership and any edge cases when self.bundled is empty or partial.
  • Verify Docker Compose environment variable names, defaults, and that multiline URI formatting produces valid YAML strings.
  • Scan docs for consistency with config keys and examples (bundled list names, credential file paths).
  • Confirm .yamllint.yaml change doesn't mask unintended long-line issues.

Possibly related issues

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding support for external third-party services configuration across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao marked this pull request as ready for review November 27, 2025 17:16
@junhaoliao junhaoliao requested a review from a team as a code owner November 27, 2025 17:16
@junhaoliao junhaoliao requested a review from hoophalab November 27, 2025 17:16
@junhaoliao junhaoliao changed the title feat(deployment): Support configuration of external 3rd-party services (resolves #1680). feat(deployment): Support configuration of external third-party services (resolves #1680). Nov 27, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-external-database.md (1)

176-215: Clarify how managed-service “connection strings” map onto the new host/port config

The new bundling and external database/results_cache host/port examples look good and line up with the code. One small docs mismatch: the earlier “AWS DocumentDB or MongoDB Atlas” section tells users to “use the provided connection string when configuring CLP (see below)”, but the new config steps only show host/port fields.

To avoid confusion for users coming from a full MongoDB/Atlas URI, consider adding a short clarification here, for example:

  • Explicitly state that CLP currently expects separate host/port (plus optional db_name), not a raw mongodb+srv://... URI; and/or
  • Briefly describe how to extract the host and port from a typical managed-service connection string.

That would align the step-by-step config with the earlier managed-service guidance.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82f73d8 and f6be19a.

📒 Files selected for processing (6)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
  • docs/src/user-docs/guides-external-database.md (1 hunks)
  • docs/src/user-docs/guides-multi-host.md (2 hunks)
  • docs/src/user-docs/quick-start/clp-json.md (1 hunks)
  • docs/src/user-docs/quick-start/clp-text.md (1 hunks)
  • tools/deployment/package/docker-compose-all.yaml (5 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-06-18T20:39:05.899Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
  • docs/src/user-docs/quick-start/clp-text.md
📚 Learning: 2025-07-05T03:38:16.779Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-06-18T20:48:48.990Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

870-877: Bundled checks correctly avoid clobbering external service endpoints

Conditioning the transform_for_container() calls on BundledService membership cleanly prevents overwriting host/port when services are external while preserving existing behaviour when they’re bundled. This aligns with the new docs and bundling model.

docs/src/user-docs/quick-start/clp-json.md (1)

21-24: Tip text and external-database cross-link read clearly

The expanded tip nicely combines --setup-only guidance with a clear pointer to the external-database guide, and the relative link path looks correct for this location.

docs/src/user-docs/quick-start/clp-text.md (1)

23-26: Consistent external-database guidance across quick-starts

The updated tip matches the clp-json quick-start and clearly directs users who want external databases/third-party services to the dedicated guide. Consistent and easy to follow.

docs/src/user-docs/guides-multi-host.md (1)

40-41: Bundled vs external service guidance is aligned and clear

The multi-host steps now cleanly explain configuring the bundled list and explicitly tie external-database usage to skipping database/results-cache services while still running the init jobs. This matches the config model and should make multi-host + external DB setups much easier to reason about.

Also applies to: 55-65, 123-127

line-length:
max: 100
allow-non-breakable-words: false
allow-non-breakable-words: true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has to be set as true; otherwise, the clp-config.yaml with the long URL inside would fail the rule.

@coderabbitai create an issue to track this lint rule change, and i will submit the lint rule change in a separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track the yamllint rule change
gh issue create \
  --repo y-scope/clp \
  --title "Update yamllint configuration to allow non-breakable words in long lines" \
  --body "## Context

This issue tracks the yamllint configuration change to set \`allow-non-breakable-words: true\` in the \`line-length\` rule. This change is necessary because \`clp-config.yaml\` contains long URLs that would fail the yamllint check with the current setting.

## Background

While working on PR #1681 (support for external third-party services configuration), it was identified that the yamllint \`allow-non-breakable-words\` setting needs to be changed from \`false\` to \`true\` to accommodate long URLs in configuration files.

## Related

- PR: https://github.com/y-scope/clp/pull/1681
- Comment: https://github.com/y-scope/clp/pull/1681#discussion_r2569846542
- Requested by: @junhaoliao

## Proposed Change

In \`.yamllint.yaml\`:
\`\`\`yaml
line-length:
  max: 100
  allow-non-breakable-words: true  # Changed from false
\`\`\`" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've created issue #1683 to track this yamllint configuration change. The issue includes the context about why this change is needed (to accommodate long URLs in clp-config.yaml) and references this PR and comment as backlinks. The issue has been assigned to you for the follow-up PR.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-external-database.md (1)

218-223: Multi-host deployment note is helpful but could be more explicit.

The note clarifies that db-table-creator and results-cache-indices-creator jobs are still needed, but it would be helpful to explicitly state that these initialization jobs should be run before starting other services. Consider adding a reference or cross-link to the initialization sequence in the multi-host guide.

Suggested refinement:

 When using external databases in a multi-host deployment, you do **not** need to start the
 `database` and `results-cache` Docker Compose services. Skip those services when following the
-[multi-host deployment guide][multi-host-guide]. However, you still need to run the database
-initialization jobs (`db-table-creator` and `results-cache-indices-creator`).
+[multi-host deployment guide][multi-host-guide]. However, you still need to run the database
+initialization jobs (`db-table-creator` and `results-cache-indices-creator`) **on one host before**
+starting the other CLP services.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6d5051 and cc2aa72.

📒 Files selected for processing (3)
  • .yamllint.yaml (1 hunks)
  • components/package-template/src/etc/clp-config.yaml (2 hunks)
  • docs/src/user-docs/guides-external-database.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-08T17:20:02.560Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1078
File: components/webui/server/src/plugins/MongoSocketIoServer/index.ts:105-108
Timestamp: 2025-07-08T17:20:02.560Z
Learning: The y-scope/clp codebase has a 100-character line length limit that takes precedence over other style preferences like template literals when they conflict.

Applied to files:

  • .yamllint.yaml
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/package-template/src/etc/clp-config.yaml
📚 Learning: 2025-10-22T21:14:12.225Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.

Applied to files:

  • components/package-template/src/etc/clp-config.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
.yamllint.yaml (1)

40-40: Verify that non-breakable-words relaxation is necessary.

The change allows non-breakable words to exceed the 100-character line-length limit. Confirm that the new environment-variable-based URIs and multi-service configurations in the accompanying changes (ClpConfig transformations, docker-compose-all.yaml) actually introduce non-breakable tokens that exceed this threshold and justify relaxing the constraint.

docs/src/user-docs/guides-external-database.md (2)

178-189: YAML syntax and bundled services example look good.

The structured step-by-step configuration flow is clear and provides helpful YAML examples. The commented-out services demonstrate which services can be removed from the bundled list to use external instances.


210-216: I'm unable to access the repository to verify the credentials.yaml structure against the codebase configuration expectations due to cloning failures. However, I can provide a rewritten review comment that reflects this limitation:


Unable to verify credentials.yaml structure without codebase access.

The documentation shows credentials nested under database: with username and password fields. This structure should be verified against the actual configuration parser in the codebase to confirm it matches expectations for external database credential handling. Due to technical limitations preventing repository access, this verification could not be completed.

components/package-template/src/etc/clp-config.yaml (1)

19-20: Documentation URL reference is correct and accessible.

The URL https://docs.yscope.com/clp/main/user-docs/guides-external-database.html has been verified to exist and contains the "External database setup" guide with sections covering MariaDB/MySQL and MongoDB configuration for CLP. The anchor fragment #configuring-clp-to-use-external-databases follows standard markdown heading conventions and appropriately references a configuration section on that page.

Comment on lines +202 to +208
:::{important}
If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
Instead, use:
- `host.docker.internal` to refer to the host machine.
- The host's actual network IP address (e.g., `192.168.1.10`)
:::
Copy link
Contributor

@coderabbitai coderabbitai bot Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Host networking guidance needs platform-specific clarification.

The note about 127.0.0.1 and host.docker.internal is helpful, but host.docker.internal is primarily available on Docker Desktop (Mac/Windows). For Linux Docker hosts without Docker Desktop, users need different guidance—typically using the actual host IP or DNS hostname. Consider adding a platform-specific note.

Suggested addition (after line 207):

   - `host.docker.internal` to refer to the host machine.
   - The host's actual network IP address (e.g., `192.168.1.10`)
+
+  ::::{note}
+  **For Linux Docker hosts**: `host.docker.internal` is not available. Use the actual host IP address or hostname instead.
+  ::::

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Contributor

@hoophalab hoophalab Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether this coderabbit comment is valid or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

host.docker.internal support Linux Docker 2024 2025

💡 Result:

Short answer: Yes — but with caveats.

  • Docker Desktop (macOS, Windows — and Docker Desktop for Linux) provides host.docker.internal by default so containers can resolve the host localhost. [1][2][6]
  • On plain Docker Engine on Linux you must create the mapping yourself (common options):
    • docker run --add-host=host.docker.internal:host-gateway (or in Compose: extra_hosts: - "host.docker.internal:host-gateway"). The dockerd daemon exposes the special host-gateway value and can be configured with host-gateway-ip. [1][3][5]
    • Or map the host gateway/docker0 IP directly (e.g. 172.17.0.1) in /etc/hosts or via extra_hosts. [4][2]

References

  • Docker docs — dockerd / host-gateway and --add-host examples. [1]
  • Docker docs — host.docker.internal behavior / Networking how‑tos. [2]
  • Docker Engine 20.10 release notes (adds Linux support hooks). [3]
  • Community notes / common workarounds (docker0 / extra_hosts). [4]

If you want, I can show the exact docker run or docker‑compose lines to use for your setup.


@hoophalab The comment is valid. On Docker Desktop (Mac/Windows/Docker Desktop for Linux), host.docker.internal is available by default, but on plain Docker Engine on Linux, you must create the mapping yourself.

For Linux Docker Engine users, they need to use docker run --add-host=host.docker.internal:host-gateway or configure it in Docker Compose with extra_hosts: - "host.docker.internal:host-gateway".

Suggested documentation improvement:

Update the important block in the documentation (around line 203-207) to include platform-specific guidance:

   :::{important}
   If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
   The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
   Instead, use:
   - `host.docker.internal` to refer to the host machine (available by default on Docker Desktop)
   - The host's actual network IP address (e.g., `192.168.1.10`)
   
   **For Linux Docker Engine users** (not Docker Desktop): `host.docker.internal` requires manual 
   configuration. Add the following to your `docker-compose-all.yaml` services that need to 
   connect to the host:
   ```yaml
   extra_hosts:
     - "host.docker.internal:host-gateway"

Alternatively, use the host's actual IP address instead.
:::


This will help Linux users avoid confusion and provide them with a clear path forward.

---

<details>
<summary>🧠 Learnings used</summary>

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).


</details>



<!-- This is an auto-generated reply by CodeRabbit -->

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this requires more thoughts / investigations. Let's put this review on hold

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some comments

@@ -1,4 +1,5 @@
# yaml-language-server: $schema=../usr/share/config-schemas/clp-config.schema.json
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#

#
## Remove any bundled services below if you wish to use your own. For more information, see
## https://docs.yscope.com/clp/main/user-docs/guides-external-database.html#configuring-clp-to-use-external-databases
#bundled: ["database", "queue", "redis", "results_cache"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to keep consistent with the doc

Suggested change
#bundled: ["database", "queue", "redis", "results_cache"]
#bundled:
# - "database"
# - "queue"
# - "redis"
# - "results_cache"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might need to run lint on this file
npx markdownlint-cli2 --config <root_dir>/tools/yscope-dev-utils/exports/lint-configs/.markdownlint-cli2.yaml guides-external-database.md --fix

Comment on lines +202 to +208
:::{important}
If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
Instead, use:
- `host.docker.internal` to refer to the host machine.
- The host's actual network IP address (e.g., `192.168.1.10`)
:::
Copy link
Contributor

@hoophalab hoophalab Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether this coderabbit comment is valid or not

@junhaoliao junhaoliao marked this pull request as draft November 28, 2025 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants