-
Notifications
You must be signed in to change notification settings - Fork 83
docs(package): Add document about using Spider in docker compose packaging. #1647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Warning Rate limit exceeded@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 21 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughDocumentation updates add Spider integration: new "Using Spider" guide and index entry, mark Celery queue/Redis as optional, and introduce Spider services ( Changes
Sequence Diagram(s)sequenceDiagram
participant User as CLI/User
participant CLP as CLP Package
participant SpiderSched as spider-scheduler
participant SpiderWorker as spider-compression-worker
participant DB as spider-db (optional)
participant Storage as ObjectStore
rect rgba(173,216,230,0.12)
Note right of CLP: When compression_scheduler.type = "spider"
end
User->>CLP: start package
CLP->>SpiderSched: submit compression job
SpiderSched->>SpiderWorker: schedule/compression task
SpiderWorker->>Storage: read input blobs
SpiderWorker->>Storage: write compressed output
SpiderWorker->>DB: record task metadata (optional)
SpiderSched->>CLP: report job status
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
docs/src/dev-docs/design-deployment-orchestration.md(8 hunks)docs/src/user-docs/guides-multi-host.md(3 hunks)docs/src/user-docs/guides-using-spider.md(1 hunks)docs/src/user-docs/index.md(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
docs/src/user-docs/guides-multi-host.md
🪛 LanguageTool
docs/src/dev-docs/design-deployment-orchestration.md
[duplication] ~87-~87: Possible typo: you repeated a word.
Context: ...Engine] queue redis end end subgraph Initialization jobs db_...
(ENGLISH_WORD_REPEAT_RULE)
docs/src/user-docs/guides-using-spider.md
[typographical] ~7-~7: It appears that a comma is missing after this introductory phrase.
Context: ...tion with CLP may change in the future. Right now Spider only supports executing CLP comp...
(RIGHT_NOW_PUNCTUATION_COMMA)
[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ..." port: 6000 ``` 3. Optionally, before starting the package, update th...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (13)
docs/src/user-docs/guides-multi-host.md (3)
165-175: Clear annotations for optional Celery infrastructure services.The addition of "(optional, only if using Celery)" comments provides helpful context for users choosing between compression frameworks.
199-203: Spider scheduler service startup instruction is clear.The new spider-scheduler service follows the established pattern and includes appropriate optional annotation. The formatting and docker compose syntax are consistent with surrounding commands.
239-249: Spider compression worker service startup instructions align with guide structure.Both the standard compression-worker (now marked as Celery-optional) and the new spider-compression-worker service follow consistent patterns. The optional annotations properly guide users on when to start each service.
docs/src/dev-docs/design-deployment-orchestration.md (5)
32-33: Theme variables for diagram styling are correctly added.The new
tertiaryColorandclusterBkgtheme variables provide proper styling support for the Spider and Celery subgraph visualizations that follow.
45-45: Spider component nodes and dependencies are logically structured.The new spider-scheduler and spider-compression-worker nodes are correctly positioned in the diagram, and their dependencies from db_table_creator match the pattern used for other initialization-dependent services. This aligns with the learnings about compression scheduler initialization requirements.
Also applies to: 47-47, 69-70
84-102: Subgraph organization and styling effectively distinguish framework choices.The Celery and Spider subgraphs clearly separate framework-specific components, with appropriate colour-coding (gold for Celery, cyan for Spider). The styling helps users visually understand the architectural differences between deployment types. However, ensure the deployed docker-compose files (
docker-compose.yaml,docker-compose-base.yaml,docker-compose-spider.yaml,docker-compose-spider-base.yaml) actually exist and match the deployment types table.Also applies to: 123-129
141-157: Service table descriptions are clear and concise.The new rows for spider_scheduler, spider_compression_worker, queue, and redis correctly describe their roles. The optional nature of queue and redis for different configurations is implicit in the context of the deployment types section.
236-244: Deployment types table provides essential configuration guidance.The table clearly explains how
package.compression_scheduler.typeandpackage.query_enginesettings map to deployment variants and their corresponding docker-compose files. This is a valuable addition for users selecting their deployment strategy.docs/src/user-docs/index.md (1)
69-69: Guides toctree updated correctly with Spider guide entry.The new
guides-using-spiderentry is properly positioned in the Guides section and follows the established toctree structure.docs/src/user-docs/guides-using-spider.md (4)
1-10: Title and introduction are clear; note properly contextualizes Spider's current scope.The introductory note appropriately sets expectations that Spider is under active development and currently supports only compression tasks. This manages user expectations effectively.
12-17: Requirements section is specific and actionable.Version constraints are clearly specified, making it straightforward for users to verify their environment. The inclusion of python3-venv is a helpful detail often overlooked.
25-58: Configuration examples are clear and well-structured.The YAML snippets demonstrate the required
compression_scheduler.type: "spider"setting clearly, and the optional configurations for spider_db, spider_scheduler, and credentials are appropriately documented. The reference back to the quick-start guide for next steps maintains good flow between documents.
60-63: Reference links are consistent with project documentation patterns.External references to CLP releases, Docker, Docker Compose, and Spider are properly formatted and link to authoritative sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md
12-12: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 6
(MD009, no-trailing-spaces)
37-37: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
49-49: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
57-57: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
63-63: Files should end with a single newline character
(MD047, single-trailing-newline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-spider.md (2)
25-62: Consider consolidating optional configuration into a dedicated subsection.The three optional configuration items (spider_db, spider_scheduler, and credentials.yaml updates) are currently interspersed with step 2 as separate sub-bullets, which creates a somewhat confusing structure with "(Optional)" repeated multiple times. This has been noted in prior reviews.
For improved readability and to reduce repetition, consider creating a dedicated "Optional Configuration" subsection after the main numbered steps to group all optional settings together:
1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package, but don't start the package just yet. + 2. Before starting the package, update the package's config file (`etc/clp-config.yaml`) as follows: - - * Set the `compression_scheduler.type` key to `"spider"`. + + Set the `compression_scheduler.type` key to `"spider"`: ```yaml compression_scheduler: type: "spider" ``` - * (Optional) Set the `spider_db`. - - ```yaml - spider_db: - db_name: "spider-db" - ``` - - * (Optional) Set the `spider_scheduler`. - - ```yaml - spider_scheduler: - host: "localhost" - port: 6000 - ``` - -3. (Optional) Before starting the package, update the package's credential file (`etc/credentials.yaml`) + +3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP. + +### Optional Configuration + +You can optionally customize the following settings: + +* **spider_db**: Set the Spider database configuration. + + ```yaml + spider_db: + db_name: "spider-db" + ``` + +* **spider_scheduler**: Set the Spider scheduler connection details. + + ```yaml + spider_scheduler: + host: "localhost" + port: 6000 + ``` + +* **Spider database credentials**: Before starting the package, update the package's credential file (`etc/credentials.yaml`) to add Spider database credentials as follows: ```yaml spider_db: username: "spider_user" password: "spider_password" ``` - -4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.This improves the flow by keeping the main procedural steps sequential while clearly separating optional customizations.
64-67: Add trailing newline to end of file.The file is missing a trailing newline character, which is required by the MD047 linting rule.
[clp-releases]: https://github.com/y-scope/clp/releases [docker-compose]: https://docs.docker.com/compose/install/ [Docker]: https://docs.docker.com/engine/install/ [Spider]: https://github.com/y-scope/spider +
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
Applied to files:
docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md
39-39: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
46-46: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
67-67: Files should end with a single newline character
(MD047, single-trailing-newline)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (1)
docs/src/user-docs/guides-using-spider.md (1)
1-10: Documentation content is accurate and well-structured.The guide provides clear setup instructions for integrating Spider with CLP. The introduction appropriately notes that Spider is under active development, and the requirements section specifies concrete version dependencies. Setup steps follow a logical flow from downloading the package to configuration to starting CLP.
Description
This PR:
guides-using-spider.mdfor instructions on setting upSpiderconfig in CLP package.guides-using-spider.mdinto the toctree.guides-multi-node.mdfor instructions on runningSpidercontainers manually.design-deployment-orchestrationfor Spider containers and some minor improvements on existing graph.Checklist
breaking change.
Validation performed
task docs:serveruns successfully.Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.