@Eden-D-Zhang Eden-D-Zhang commented Nov 25, 2025

Description

This PR refactors how credentials for the Database config class are handled:

  • Added DbUserCredentials class to store DB username and password, and ClpDbUserType enum class to store different user types.
  • Replaced the username and password fields in Database with a dict that maps each user type to its credential set.
  • Added user_type parameter to credential loading/validation/connection methods to specify which user is to be used.
  • Refactored DB access logic to use updated DB credential model.
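
The credential model described in the list above might look roughly like the following sketch. It uses plain dataclasses as a stand-in for the project's actual config classes, and the enum values, default host/port, and error message are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ClpDbUserType(Enum):
    # The enum values are assumed; the PR only names the user types.
    CLP = "clp"
    ROOT = "root"


@dataclass
class DbUserCredentials:
    username: Optional[str] = None
    password: Optional[str] = None


def _default_credentials() -> Dict[ClpDbUserType, DbUserCredentials]:
    # One empty credential set per user type, filled in later by a loader.
    return {user_type: DbUserCredentials() for user_type in ClpDbUserType}


@dataclass
class Database:
    host: str = "localhost"  # Assumed default.
    port: int = 3306  # Assumed default.
    credentials: Dict[ClpDbUserType, DbUserCredentials] = field(
        default_factory=_default_credentials
    )

    def ensure_credentials_loaded(self, user_type: ClpDbUserType) -> None:
        # Fail early if the requested user's credentials were never loaded.
        creds = self.credentials.get(user_type)
        if creds is None or creds.username is None or creds.password is None:
            raise ValueError(
                f"Credentials for DB user type '{user_type.value}' are not loaded."
            )
```

Callers then pick a credential set explicitly, e.g. `db.credentials[ClpDbUserType.CLP].username`, instead of reading a single top-level username.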

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Started the package, compressed logs, performed a query, and used the archive/dataset manager scripts.

Summary by CodeRabbit

  • New Features

    • Per-user database credentials (root and app) and selectable DB engine support for connections and pools.
  • Chores

    • Deployment templates and environment configuration now expose and require separate root and app DB variables.
  • Refactor

    • Configuration and runtime flows updated to load, validate and propagate credentials by user type across connection, tooling and container workflows.


@Eden-D-Zhang Eden-D-Zhang requested a review from a team as a code owner November 25, 2025 04:13
coderabbitai bot commented Nov 25, 2025

Walkthrough

Replace single DB username/password with per-user credentials (CLP and ROOT) across config, SQL adapter, package utilities, scripts, docker-compose and callers; add ClpDbUserType enum and per-user env constants; update APIs and call sites to select credentials by user type.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration models & template: components/clp-py-utils/clp_py_utils/clp_config.py, components/clp-package-utils/clp_package_utils/general.py, components/package-template/src/etc/credentials.template.yaml | Add ClpDbUserType and DbUserCredentials; refactor Database to a credentials mapping for CLP and ROOT; add CLP_DB_ROOT_USER/CLP_DB_ROOT_PASS constants; update credential load/validation APIs and template to include root credentials. |
| Database connection layer: components/clp-py-utils/clp_py_utils/sql_adapter.py | Introduce user_type: ClpDbUserType parameter (default CLP); dispatch connection creation by DatabaseEngine; add private creators for MySQL/MariaDB; propagate user_type into connection and pool creation. |
| Controller & package utils: components/clp-package-utils/clp_package_utils/controller.py | Read clp_config.database.credentials and set CLP/ROOT env vars using new per-user env names; switch DB image selection to DatabaseEngine. |
| Package scripts (credential usage): components/clp-package-utils/clp_package_utils/scripts/* and components/clp-package-utils/clp_package_utils/scripts/native/decompress.py; components/clp-package-utils/clp_package_utils/scripts/{archive_manager,compress,compress_from_s3,dataset_manager,decompress,search}.py | Import ClpDbUserType; replace direct clp_config.database.username/password with credentials[ClpDbUserType.CLP].username / .password when building DB-related environment variables for container invocations. |
| Job orchestration: components/job-orchestration/job_orchestration/executor/compress/compression_task.py, components/job-orchestration/job_orchestration/scheduler/utils.py | Use validated Database model and ClpDbUserType.CLP to source CLP credentials; replace MySQL-specific factory calls with generic create_connection where applicable. |
| Deployment (docker-compose): tools/deployment/package/docker-compose-all.yaml | Use CLP_DB_ROOT_PASS for MYSQL_ROOT_PASSWORD; add CLP_DB_ROOT_PASS and CLP_DB_ROOT_USER to db-table-creator; enforce required-value syntax for CLP_DB_PASS/CLP_DB_USER in services. |
| Call-site propagation: multiple callers across repo (see above) | Propagate new user_type parameter and update call sites to select per-user credentials; replace top-level username/password usages with credentials mapping. |
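
As a rough illustration of the connection-layer change summarized above, the sketch below dispatches on DatabaseEngine and selects credentials by user_type, defaulting to CLP. The class name SqlAdapterSketch, the constructor arguments, and the string return values are hypothetical; real driver calls are replaced by descriptive strings so the example runs without a database.

```python
from enum import Enum
from typing import Dict, Tuple


class DatabaseEngine(Enum):
    MYSQL = "mysql"
    MARIADB = "mariadb"


class ClpDbUserType(Enum):
    CLP = "clp"
    ROOT = "root"


class SqlAdapterSketch:
    """Hypothetical adapter; not the project's actual SQL adapter."""

    def __init__(
        self,
        engine: DatabaseEngine,
        host: str,
        credentials: Dict[ClpDbUserType, Tuple[str, str]],
    ) -> None:
        self._engine = engine
        self._host = host
        self._credentials = credentials

    def create_connection(self, user_type: ClpDbUserType = ClpDbUserType.CLP) -> str:
        # Select the credential pair for the requested user, then dispatch by engine.
        username, password = self._credentials[user_type]
        if self._engine is DatabaseEngine.MYSQL:
            return self._create_mysql_connection(username, password)
        if self._engine is DatabaseEngine.MARIADB:
            return self._create_mariadb_connection(username, password)
        raise NotImplementedError(f"Unsupported engine: {self._engine}")

    def _create_mysql_connection(self, username: str, password: str) -> str:
        # A real implementation would call a MySQL driver here.
        return f"mysql://{username}@{self._host}"

    def _create_mariadb_connection(self, username: str, password: str) -> str:
        # A real implementation would call a MariaDB driver here.
        return f"mariadb://{username}@{self._host}"
```

Because user_type defaults to CLP, existing callers keep using the unprivileged user unless they opt in to ROOT.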

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Files/areas needing extra attention:
    • components/clp-py-utils/clp_py_utils/clp_config.py: credential loading from env/file, ensure ensure_credentials_loaded(user_type) and serialization (credentials excluded) are correct.
    • components/clp-py-utils/clp_py_utils/sql_adapter.py: dispatch by DatabaseEngine, connection pooling changes, and preservation of error handling for new user_type paths.
    • components/clp-package-utils controller and scripts: consistent use of CLP vs ROOT credentials and correct env var names.
    • tools/deployment/package/docker-compose-all.yaml: ensure env var references and required-value syntax match runtime expectations.
    • Any serialization/tests/tooling that assumed top-level username/password fields.

Suggested reviewers

  • sitaowang1998

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main purpose of the PR: refactoring the Database config to support multiple user credentials and use separate root credentials. |
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

71-79: Root password integration is coherent, but avoid leaking it via config dumps

The new CLP_DB_ROOT_PASS_ENV_VAR_NAME, root_password field, has_root_password(), and the file/env loading logic form a consistent, optional root-cred path. Tolerating missing root_password via KeyError/ValueError handling is a good choice.

However, Database.dump_to_primitive_dict() still only excludes username and password, so root_password will now be serialised into any dumped config (e.g., the shared container config written into logs). That weakens the existing “don’t dump DB creds” behaviour specifically for the most privileged credential.

I’d strongly recommend excluding root_password here as well, mirroring the treatment of the other secrets:

     def dump_to_primitive_dict(self):
-        d = self.model_dump(exclude={"username", "password"})
+        d = self.model_dump(exclude={"username", "password", "root_password"})
         return d

This keeps root credentials confined to the credentials file and env vars, and avoids unnecessarily broad exposure inside containers.
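
To make the effect of the suggested exclusion concrete, here is a minimal stand-alone sketch. It uses a dataclass and a dict comprehension in place of the model's own dump method, so the class layout and default values are assumptions and only the filtering idea carries over.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# Fields that should never appear in a dumped config.
SECRET_FIELDS = {"username", "password", "root_password"}


@dataclass
class Database:
    host: str = "localhost"  # Assumed default.
    port: int = 3306  # Assumed default.
    username: Optional[str] = None
    password: Optional[str] = None
    root_password: Optional[str] = None

    def dump_to_primitive_dict(self) -> Dict[str, Any]:
        # Drop every secret field, including the root password.
        return {k: v for k, v in asdict(self).items() if k not in SECRET_FIELDS}
```

With this filter, a config that carries all three secrets still dumps only the non-sensitive fields such as host and port.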

Also applies to: 165-180, 233-240, 253-257, 265-269

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f79349 and fc7056f.

📒 Files selected for processing (5)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/general.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (4 hunks)
  • components/package-template/src/etc/credentials.template.yaml (1 hunks)
  • tools/deployment/package/docker-compose-all.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
🧬 Code graph analysis (2)
components/clp-package-utils/clp_package_utils/controller.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • has_root_password (233-239)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
  • get_config_value (28-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (5)
components/package-template/src/etc/credentials.template.yaml (1)

1-10: Template addition for root_password is consistent

The commented root_password example matches the new credentials field and helps discoverability. No issues here.

tools/deployment/package/docker-compose-all.yaml (2)

49-54: Confirm migration story for new CLP_DB_ROOT_PASS requirement

Switching MYSQL_ROOT_PASSWORD to ${CLP_DB_ROOT_PASS:?Please set a value.} cleanly separates root and user passwords, but it also makes CLP_DB_ROOT_PASS mandatory for this compose file. For existing deployments whose credentials.yaml lacks root_password, the generated .env will omit CLP_DB_ROOT_PASS and compose will now fail fast.

If that’s intentional, please make sure the upgrade path and requirement to add/generate a root_password (or re‑generate credentials) is clearly documented. If you want smoother backcompat, you might consider a fallback to CLP_DB_PASS at the config/env layer instead.
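
For readers unfamiliar with the required-value form, the following Python sketch mimics what `${NAME:?message}` does during interpolation: the lookup fails when the variable is unset or empty. The helper name require_env is hypothetical and is not part of the PR.

```python
from typing import Mapping


def require_env(
    env: Mapping[str, str], name: str, message: str = "Please set a value."
) -> str:
    # Mirror `${NAME:?message}`: both an unset and an empty variable are errors.
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name}: {message}")
    return value
```

A deployment whose generated .env lacks CLP_DB_ROOT_PASS would hit the error path here, which is the fail-fast behaviour described above.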


80-87: Propagating CLP_DB_ROOT_PASS to db-table-creator looks correct

Wiring CLP_DB_ROOT_PASS into db-table-creator alongside CLP_DB_USER/CLP_DB_PASS matches the new model. Just ensure clp_py_utils.create-db-tables is updated to read CLP_DB_ROOT_PASS (and not implicitly rely on CLP_DB_PASS) so this env var is actually honoured.

components/clp-py-utils/clp_py_utils/clp_config.py (1)

241-257: Optional: decide whether to default root_password to user password for legacy configs

load_credentials_from_file and load_credentials_from_env now make root_password truly optional (swallowing missing keys/vars), while the compose layer can require CLP_DB_ROOT_PASS. For older credentials.yaml files without database.root_password, this means:

  • Database.root_password will stay None.
  • has_root_password() is false.
  • No CLP_DB_ROOT_PASS gets written to .env, yet docker-compose currently insists on it.

If you want smoother upgrades while still allowing a distinct root password, you could consider a compatibility default such as:

  • On load, if root_password is absent, default it to password.
  • Or introduce a small helper that derives CLP_DB_ROOT_PASS from CLP_DB_PASS when has_root_password() is false, and phase that out later.

If the hard requirement for a separate root password is intentional, documenting that explicitly (and maybe providing a migration helper) would help avoid surprises.
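
The first compatibility option could be sketched as below. The function name and the flat dict return type are hypothetical, and the "database" section key is assumed from the database.root_password path mentioned above; the actual loaders in clp_config.py may be structured differently.

```python
from typing import Any, Dict


def load_db_credentials_with_fallback(config: Dict[str, Any]) -> Dict[str, str]:
    # Hypothetical loader: reads the "database" section of a parsed credentials file.
    db_section = config["database"]
    credentials = {
        "username": db_section["username"],
        "password": db_section["password"],
    }
    # Legacy files have no root_password; fall back to the regular password.
    credentials["root_password"] = db_section.get("root_password") or credentials["password"]
    return credentials
```

This keeps older credentials files working, at the cost of the root and regular users sharing a password until the file is regenerated.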

Also applies to: 265-269

components/clp-package-utils/clp_package_utils/controller.py (1)

140-150: Conditional CLP_DB_ROOT_PASS export is correct; verify other launch paths

Conditionally adding CLP_DB_ROOT_PASS when database.has_root_password() is true fits the new model and avoids writing a bogus env var when no root cred is configured. Combined with the .env writer’s skip‑None logic, this is sound.

If there are any non–docker‑compose flows that start the DB initialisation / table‑creation logic (e.g., via generate_container_start_cmd + get_credential_env_vars_list), it would be worth checking that they also propagate the root password where needed, or that they intentionally rely only on the regular DB user.
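
A minimal sketch of the conditional export combined with a skip-None .env writer is shown below. Both function names are hypothetical, and the real controller builds a larger set of variables; only the three env var names come from the discussion above.

```python
from typing import Dict, Optional


def build_db_env_vars(
    username: str, password: str, root_password: Optional[str] = None
) -> Dict[str, Optional[str]]:
    env_vars: Dict[str, Optional[str]] = {
        "CLP_DB_USER": username,
        "CLP_DB_PASS": password,
    }
    # Only export the root password when one is configured.
    if root_password is not None:
        env_vars["CLP_DB_ROOT_PASS"] = root_password
    return env_vars


def render_env_file(env_vars: Dict[str, Optional[str]]) -> str:
    # Skip entries whose value is None so no empty or bogus variable is written.
    lines = [f"{key}={value}" for key, value in env_vars.items() if value is not None]
    return "\n".join(lines) + "\n"
```

Any launch path that bypasses these two steps would need its own way of passing the root password along.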

Comment on lines 452 to 461
     def generate_credentials_file(credentials_file_path: pathlib.Path):
         credentials = {
-            DB_COMPONENT_NAME: {"username": "clp-user", "password": secrets.token_urlsafe(8)},
+            DB_COMPONENT_NAME: {
+                "username": "clp-user",
+                "password": secrets.token_urlsafe(8),
+                "root_password": secrets.token_urlsafe(8),
+            },
             QUEUE_COMPONENT_NAME: {"username": "clp-user", "password": secrets.token_urlsafe(8)},
             REDIS_COMPONENT_NAME: {"password": secrets.token_urlsafe(16)},
         }

🧹 Nitpick | 🔵 Trivial

Generating a distinct root_password looks good

Emitting a separate root_password with secrets.token_urlsafe(8) aligns with the new root-credential model and keeps it independent from the user password. If you ever revisit password policy, you might choose a larger token size for the root password, but this is acceptable and consistent with the existing DB password generation.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 452 to
461, the code already emits a distinct "root_password" using
secrets.token_urlsafe(8); leave this separate root credential generation as-is
to match the new root-credential model, no code changes required now, but if you
revisit password policy later consider increasing the token length for
root_password (e.g., token_urlsafe(12-16)).
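
If the password policy is revisited, a variant with a longer root token might look like the sketch below. The function name and the parameterized byte counts are hypothetical; secrets.token_urlsafe(nbytes) draws nbytes random bytes and Base64-encodes them, giving roughly 1.3 characters per byte.

```python
import secrets
from typing import Dict


def generate_db_credentials(user_nbytes: int = 8, root_nbytes: int = 16) -> Dict[str, str]:
    # Independent random tokens; the root token uses more random bytes.
    return {
        "username": "clp-user",
        "password": secrets.token_urlsafe(user_nbytes),
        "root_password": secrets.token_urlsafe(root_nbytes),
    }
```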

sitaowang1998 previously approved these changes Nov 25, 2025
@sitaowang1998 sitaowang1998 dismissed their stale review November 25, 2025 04:51

Exclude root_password from config serialization.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc7056f and 5cf882d.

📒 Files selected for processing (16)
  • components/clp-package-utils/clp_package_utils/controller.py (2 hunks)
  • components/clp-package-utils/clp_package_utils/general.py (1 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/compress.py (2 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (2 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (2 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/decompress.py (3 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (2 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/search.py (2 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (5 hunks)
  • components/clp-py-utils/clp_py_utils/sql_adapter.py (4 hunks)
  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/scheduler/utils.py (1 hunks)
  • components/package-template/src/etc/credentials.template.yaml (1 hunks)
  • tools/deployment/package/docker-compose-all.yaml (3 hunks)
  • tools/yscope-dev-utils (1 hunks)
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-07-25T21:29:48.947Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.

Applied to files:

  • tools/yscope-dev-utils
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.

Applied to files:

  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py
  • components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
  • components/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py
  • components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
  • components/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py
  • components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
  • components/clp-package-utils/clp_package_utils/controller.py
  • components/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py
  • components/clp-package-utils/clp_package_utils/scripts/search.py
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
  • components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
  • components/clp-py-utils/clp_py_utils/clp_config.py
  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2024-11-15T16:21:52.122Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
📚 Learning: 2025-07-03T12:58:18.407Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
  • components/clp-py-utils/clp_py_utils/clp_config.py
  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-02T15:48:58.961Z
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1368
File: components/clp-mcp-server/clp_mcp_server/__init__.py:11-15
Timestamp: 2025-10-02T15:48:58.961Z
Learning: In the clp-mcp-server component (components/clp-mcp-server/clp_mcp_server/__init__.py), the default host binding of 0.0.0.0 is intentional because the server is designed to be deployed in Docker containers where this binding is necessary to accept external connections.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
  • components/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-08-18T05:43:07.868Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1184
File: components/core/cmake/Modules/FindLibLZMA.cmake:21-24
Timestamp: 2025-08-18T05:43:07.868Z
Learning: In the CLP project, all supplied `<lib>_ROOT` variables will live within the `CLP_CORE_DEPS_DIR` as part of their architectural design. This means that using CLP_CORE_DEPS_DIR for library discovery in custom Find modules is the intended approach, and prioritizing individual `<lib>_ROOT` variables over CLP_CORE_DEPS_DIR is not necessary.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/clp-package-utils/clp_package_utils/scripts/compress.py
🧬 Code graph analysis (10)
components/clp-package-utils/clp_package_utils/scripts/search.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/job-orchestration/job_orchestration/scheduler/utils.py (1)
components/clp-py-utils/clp_py_utils/sql_adapter.py (2)
  • create_connection (105-115)
  • create_connection (123-124)
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (2)
  • read_yaml_config_file (58-64)
  • get_config_value (28-42)
components/clp-package-utils/clp_package_utils/controller.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpDbUserType (166-168)
🔇 Additional comments (17)
tools/deployment/package/docker-compose-all.yaml (3)

52-52: Separate root password from CLP user credentials in database service.

The MYSQL_ROOT_PASSWORD now correctly sources from CLP_DB_ROOT_PASS instead of CLP_DB_PASS, establishing separate credential scopes. The required-value syntax enforces presence at runtime.


463-464: Upgrade mcp-server with required-value validation for database credentials.

The CLP_DB_PASS and CLP_DB_USER environment variables now enforce presence via the ":?Please set a value." pattern, improving consistency with other services and catching missing configuration at startup time.


85-86: Unable to complete automated verification due to repository access issues.

The repository clone operation failed multiple times, preventing me from searching the codebase for how the clp_py_utils.create-db-tables module reads the CLP_DB_ROOT_USER and CLP_DB_ROOT_PASS environment variables.

The original review comment's concern is reasonable and valid: the environment variables are being set in the docker-compose file with required-value validation, but without accessing the Python module implementation, I cannot confirm whether:

  • The module actually reads these variables from the environment
  • The variables are properly passed to database connection logic
  • There are any issues with how they're consumed

Manual verification of the clp_py_utils.create-db-tables module implementation is needed to resolve this concern.

components/clp-package-utils/clp_package_utils/general.py (1)

452-459: Separate root credentials in the default template look consistent and correct

Adding root_username/root_password under DB_COMPONENT_NAME ensures newly generated credentials files carry both user and root credentials, which aligns with the new root/CLP user split and the existing validate_and_load_db_credentials_file flow. No functional or structural issues here from this file’s perspective.

components/package-template/src/etc/credentials.template.yaml (2)

5-6: Template format and naming are consistent with existing patterns.

The new root credentials follow the same structure as the existing username/password fields, using appropriate naming conventions (root_username, root_password) and default example values.


5-6: No evidence found that root credentials are integrated into credential generation or deployment code.

The template additions for root_username and root_password at lines 5-6 of components/package-template/src/etc/credentials.template.yaml do not appear to be referenced, loaded, or used anywhere in the codebase. Searches for:

  • Python code reading/parsing these fields
  • Usage of CLP_DB_ROOT_USER or CLP_DB_ROOT_PASS environment variables
  • Credential generation code that handles root credentials
  • Deployment manifests using these values
  • Documentation explaining root credential configuration

all returned no results. The template fields appear to be uncommitted additions without corresponding integration into the credential handling infrastructure.

components/clp-package-utils/clp_package_utils/controller.py (2)

16-21: LGTM!

The new imports correctly support the per-user credential model, bringing in the necessary environment variable constants and the ClpDbUserType enum for type-safe credential access.


146-152: Verify that database connections throughout the codebase use appropriate user types.

Automated verification could not be completed because the repository clone operation failed, so database access patterns across the codebase were not checked and the use of ROOT credentials in operational code could not be ruled out.

The credential separation implementation correctly distinguishes between root and CLP user credentials using enum-based indexing. Ensure database connections use the appropriate user type:

  • ClpDbUserType.ROOT for administrative tasks only (database initialization, table creation)
  • ClpDbUserType.CLP for normal application operations

Search the codebase for database connection instantiation and credential usage to confirm proper separation.

tools/yscope-dev-utils (1)

1-1: Unable to verify submodule commit and PR context; manual verification required.

The repository clone failed in the sandbox environment, and web searches did not locate the pull request or confirm the commit details, so the concerns raised in the original review comment could not be validated.

The review comment requests verification of a Git submodule pointer update for tools/yscope-dev-utils. To properly assess this change, please provide:

  1. A direct link to the pull request
  2. Confirmation that tools/yscope-dev-utils is configured as a Git submodule (recent web searches did not find evidence of this)
  3. Documentation of how this submodule update relates to the database credentials refactoring mentioned in the PR summary

components/job-orchestration/job_orchestration/scheduler/utils.py (1)

40-42: Generic create_connection() usage matches multi‑DB support; please confirm socket flag parity

Swapping to sql_adapter.create_connection() here is consistent with the new adapter API and keeps this path DB‑agnostic while still using the CLP DB user by default, which is appropriate for killing hanging jobs.

One thing to double‑check: if the previous code passed a non‑default disable_localhost_socket_connection (e.g., True) into create_mysql_connection, that behaviour is now different because this call relies on the default. If you did previously rely on disabling local socket connections in this path, you should propagate that flag into create_connection explicitly; otherwise this change is fine as‑is.

components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)

16-16: CLP‑user credential lookup for extraction env is correct and consistent

Using credentials[ClpDbUserType.CLP].username/password for CLP_DB_USER_ENV_VAR_NAME and CLP_DB_PASS_ENV_VAR_NAME aligns this script with the new per‑user credential model and ensures extraction runs under the CLP DB user rather than root. Given validate_and_load_config_file calls database.load_credentials_from_env(), the credentials mapping should always be populated before this point. No issues from a correctness or security perspective.

Also applies to: 249-254

components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)

14-14: Per‑user CLP credentials for S3 compression container env look good

The change to derive extra_env_vars from clp_config.database.credentials[ClpDbUserType.CLP] (after validate_and_load_db_credentials_file) correctly scopes the DB access in the compression container to the CLP user and matches the new credential structure. This keeps behaviour aligned with other scripts while avoiding unintended use of the root credentials.

Also applies to: 310-314

components/clp-package-utils/clp_package_utils/scripts/search.py (1)

13-13: Search container now correctly uses CLP DB user credentials

After loading credentials via validate_and_load_db_credentials_file, sourcing CLP_DB_USER_ENV_VAR_NAME and CLP_DB_PASS_ENV_VAR_NAME from credentials[ClpDbUserType.CLP] is exactly what we want: search runs with CLP user privileges and never touches the root account. The change is straightforward and consistent with the rest of the credential refactor.

Also applies to: 138-142

components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)

14-14: Dataset manager correctly uses CLP DB user credentials for container

Mapping CLP_DB_USER_ENV_VAR_NAME / CLP_DB_PASS_ENV_VAR_NAME from clp_config.database.credentials[ClpDbUserType.CLP] (post validate_and_load_db_credentials_file) keeps the dataset manager running with CLP‑scoped privileges while still supporting the new root user in the credentials file. This is aligned with the rest of the tooling and looks good.

Also applies to: 159-163

components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)

14-14: File and stream decompression now consistently use CLP user credentials

Both the file and stream extraction flows now build extra_env_vars from clp_config.database.credentials[ClpDbUserType.CLP] after validate_and_load_db_credentials_file has run, so the native decompression container always authenticates as the CLP user. That’s the right separation of duties relative to the new root account and keeps behaviour consistent across all wrapper scripts.

Also applies to: 136-140, 219-223

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

14-17: Consistent use of per-user CLP DB credentials

Using ClpDbUserType.CLP and the shared credentials mapping to populate CLP_DB_USER_ENV_VAR_NAME / CLP_DB_PASS_ENV_VAR_NAME keeps this script aligned with the new multi-user DB model without changing runtime behaviour. No issues from this change.

Also applies to: 230-234

components/clp-py-utils/clp_py_utils/clp_config.py (1)

199-223: Unfortunately, I'm unable to verify the review comment due to persistent repository cloning failures in the sandbox environment. This prevents me from:

  1. Searching for all call sites of get_clp_connection_params_and_type to confirm whether consumers have been updated to the new nested credentials structure
  2. Checking for any remaining legacy usages of database.username and database.password
  3. Verifying the credentials.yaml structure requirements and whether backward compatibility or default values are provided
  4. Determining if there's documented migration guidance or explicit breaking change handling

The concerns raised in the review comment are technically plausible—changing API return shapes and adding new required configuration keys are indeed potentially breaking changes that warrant verification. However, without access to the actual codebase, I cannot confirm:

  • Whether the changes have been properly propagated to all consumers
  • If a migration path or backward compatibility layer exists
  • Whether this is an intentional, documented breaking change

Verify that all Python-side consumers have been updated to the new shape and confirm any migration strategy with the development team. Review credentials.yaml examples and documentation to ensure existing deployments have clear upgrade guidance.

CLP_DB_USER_ENV_VAR_NAME,
CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
CLP_DEFAULT_DATASET_NAME,
ClpDbUserType,

🧹 Nitpick | 🔵 Trivial

CLP credential usage is correct; consider a shared helper for DB env construction

Using credentials[ClpDbUserType.CLP].username/password for CLP_DB_USER_ENV_VAR_NAME / CLP_DB_PASS_ENV_VAR_NAME is consistent with the new per‑user credential model and with how you validate/load credentials earlier in this script, so behaviour here looks correct.

Given the identical pattern now exists in multiple wrappers (compress.py, compress_from_s3.py, search.py, dataset_manager.py, decompress.py, and the native scripts), you may want to factor this into a small helper (e.g., in clp_package_utils.general or a tiny DB‑env utility) that takes a Database or ClpConfig and returns the CLP DB env dict. That would reduce duplication and keep future changes to credential handling in one place.

Also applies to: 256-260

🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/scripts/compress.py lines ~14 and
~256-260: the code repeats constructing CLP DB env vars using
credentials[ClpDbUserType.CLP].username/password across multiple scripts;
extract this duplicated logic into a small helper (e.g.,
clp_package_utils.general or clp_package_utils.db_utils) that accepts the
Database/ClpConfig (or credentials mapping) and returns the CLP DB env dict with
keys CLP_DB_USER_ENV_VAR_NAME and CLP_DB_PASS_ENV_VAR_NAME (and any other shared
DB env entries), then replace the inline construction in compress.py (and the
other listed scripts: compress_from_s3.py, search.py, dataset_manager.py,
decompress.py and native scripts) with calls to that helper to centralize
credential handling and reduce duplication.
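
The shared helper suggested above could look roughly like the following. This is a minimal sketch, not code from the PR: the function name `get_clp_db_env_vars` is hypothetical, the enum and credentials classes are stand-ins for the real ones in `clp_config.py`, and the env var constant values are assumed from the variable names quoted elsewhere in this review.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Assumed values, taken from the variable names quoted in this review.
CLP_DB_USER_ENV_VAR_NAME = "CLP_DB_USER"
CLP_DB_PASS_ENV_VAR_NAME = "CLP_DB_PASS"


class ClpDbUserType(Enum):
    # Stand-in for the real enum; the member values are assumptions.
    CLP = auto()
    ROOT = auto()


@dataclass
class DbUserCredentials:
    # Stand-in for the real credentials model.
    username: str
    password: str


def get_clp_db_env_vars(
    credentials: dict[ClpDbUserType, DbUserCredentials],
) -> dict[str, str]:
    """Hypothetical helper: build the DB env vars for the CLP (non-root) user."""
    clp_user = credentials[ClpDbUserType.CLP]
    return {
        CLP_DB_USER_ENV_VAR_NAME: clp_user.username,
        CLP_DB_PASS_ENV_VAR_NAME: clp_user.password,
    }
```

Each wrapper script would then call the helper once instead of indexing into the credentials mapping inline, so a later change to credential handling touches one function.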

from clp_py_utils.clp_config import (
CLP_DB_PASS_ENV_VAR_NAME,
CLP_DB_USER_ENV_VAR_NAME,
ClpDbUserType,

🧹 Nitpick | 🔵 Trivial

Switch to per‑user CLP credentials via Database.model_validate is sound; minor redundancy only

Using Database.model_validate(...).credentials[ClpDbUserType.CLP] here aligns this code with the new per‑user credential model and ensures you always work with a validated config and the CLP DB user, which matches how SqlAdapter is instantiated below. The change looks correct.

The only minor nit is that this re‑validates the same clp_metadata_db_connection_config each time you call _get_db_connection_env_vars_for_clp_cmd. If this helper ends up being called frequently, you could consider threading a Database instance (or just the credentials mapping) through instead of the raw dict to avoid repeated validation, but that’s purely an optional tidy‑up.

Also applies to: 212-216

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/compress/compression_task.py
around lines 13 and 212-216, the helper _get_db_connection_env_vars_for_clp_cmd
repeatedly calls Database.model_validate(...) to extract
credentials[ClpDbUserType.CLP], causing redundant validation on each call;
change the API so the validated Database (or at least its credentials mapping)
is created once and passed into the helper (or threaded through callers) instead
of the raw dict, update all callers to accept the validated Database/credentials
and use validated.credentials[ClpDbUserType.CLP] inside the helper.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

284-301: Env-based credential loading correctly distinguishes CLP vs ROOT; add a fallback case

Mapping ClpDbUserType.CLP to CLP_DB_USER/CLP_DB_PASS and ClpDbUserType.ROOT to CLP_DB_ROOT_USER/CLP_DB_ROOT_PASS and funnelling reads through _get_env_var gives the right separation and ensures unset env vars fail loudly.

One robustness gap: the match user_type has no default branch, so a future enum value would leave user_env_var / pass_env_var undefined instead of raising a clear error. Consider adding an explicit fallback case:

     match user_type:
         case ClpDbUserType.CLP:
             user_env_var = CLP_DB_USER_ENV_VAR_NAME
             pass_env_var = CLP_DB_PASS_ENV_VAR_NAME
         case ClpDbUserType.ROOT:
             user_env_var = CLP_DB_ROOT_USER_ENV_VAR_NAME
             pass_env_var = CLP_DB_ROOT_PASS_ENV_VAR_NAME
+        case _:
+            raise ValueError(f"Unsupported database user type: {user_type}")

This makes failures explicit if ClpDbUserType is extended later.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf882d and 9c4aad6.

📒 Files selected for processing (3)
  • components/clp-package-utils/clp_package_utils/controller.py (2 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (5 hunks)
  • tools/deployment/package/docker-compose-all.yaml (3 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-10-02T15:48:58.961Z
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1368
File: components/clp-mcp-server/clp_mcp_server/__init__.py:11-15
Timestamp: 2025-10-02T15:48:58.961Z
Learning: In the clp-mcp-server component (components/clp-mcp-server/clp_mcp_server/__init__.py), the default host binding of 0.0.0.0 is intentional because the server is designed to be deployed in Docker containers where this binding is necessary to accept external connections.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-18T05:43:07.868Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1184
File: components/core/cmake/Modules/FindLibLZMA.cmake:21-24
Timestamp: 2025-08-18T05:43:07.868Z
Learning: In the CLP project, all supplied `<lib>_ROOT` variables will live within the `CLP_CORE_DEPS_DIR` as part of their architectural design. This means that using CLP_CORE_DEPS_DIR for library discovery in custom Find modules is the intended approach, and prioritizing individual `<lib>_ROOT` variables over CLP_CORE_DEPS_DIR is not necessary.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/clp-package-utils/clp_package_utils/controller.py (1)

17-23: Imports for DB credential env vars and user type look consistent

The added imports for CLP_DB_PASS_ENV_VAR_NAME, CLP_DB_ROOT_PASS_ENV_VAR_NAME, CLP_DB_ROOT_USER_ENV_VAR_NAME, CLP_DB_USER_ENV_VAR_NAME, and ClpDbUserType are all used in _set_up_env_for_database and match the new per‑user credential model elsewhere in the codebase. No issues from this file’s perspective.

components/clp-py-utils/clp_py_utils/clp_config.py (4)

71-75: Root DB env var constants align with the new credential model

The addition of CLP_DB_ROOT_USER_ENV_VAR_NAME and CLP_DB_ROOT_PASS_ENV_VAR_NAME is consistent with existing CLP_DB_* naming and matches the docker-compose usage; no issues here.


176-184: ClpDbUserType and DbUserCredentials cleanly model CLP vs ROOT users

Using ClpDbUserType plus a dedicated DbUserCredentials model is a clear way to distinguish CLP and ROOT DB users and keeps the per-user credential handling well-typed and extensible.


197-200: Per-user credentials wiring into connection params and dumps looks correct

Initialising credentials with both ClpDbUserType.CLP and .ROOT, guarding via ensure_credentials_loaded, and then pulling user/password from self.credentials[user_type] into both MySQL and CLP connection params is coherent and keeps the per-user split encapsulated. Excluding credentials in dump_to_primitive_dict avoids leaking secrets in serialised config, which is important for safety.

Also applies to: 202-207, 209-229, 234-256, 258-260
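
The pattern described in this comment can be sketched as below. This is a simplified stand-in built on dataclasses, not the PR's model: the `host` and `port` fields, the `get_connection_params` name, and all method signatures are assumptions; only `ensure_credentials_loaded` and `dump_to_primitive_dict` are names taken from the review.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class ClpDbUserType(Enum):
    # Stand-in for the real enum; the member values are assumptions.
    CLP = auto()
    ROOT = auto()


@dataclass
class DbUserCredentials:
    username: Optional[str] = None
    password: Optional[str] = None


def _default_credentials() -> dict:
    # One empty credential set per user type, filled in later by a loader.
    return {user_type: DbUserCredentials() for user_type in ClpDbUserType}


@dataclass
class Database:
    host: str = "localhost"  # assumed field
    port: int = 3306  # assumed field
    credentials: dict = field(default_factory=_default_credentials)

    def ensure_credentials_loaded(self, user_type: ClpDbUserType) -> None:
        creds = self.credentials[user_type]
        if creds.username is None or creds.password is None:
            raise ValueError(f"Credentials for {user_type.name} not loaded.")

    def get_connection_params(self, user_type: ClpDbUserType = ClpDbUserType.CLP) -> dict:
        # Hypothetical method name; guards before reading the credentials.
        self.ensure_credentials_loaded(user_type)
        creds = self.credentials[user_type]
        return {
            "host": self.host,
            "port": self.port,
            "user": creds.username,
            "password": creds.password,
        }

    def dump_to_primitive_dict(self) -> dict:
        # Credentials are deliberately left out of the serialised config.
        return {"host": self.host, "port": self.port}
```

Defaulting `user_type` to the CLP user means a caller has to ask for ROOT explicitly, which keeps root access limited to the administrative paths that need it.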


262-279: File-based loading of CLP and ROOT DB credentials is consistent

load_credentials_from_file now populates both CLP and ROOT entries (username/password and root_username/root_password) under the database section and fails fast if any key is missing. That matches the new credentials template and keeps error messages clear when the file is incomplete.
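
As a rough sketch of the fail-fast behaviour described here, assuming the credentials file has already been parsed into a nested dict: `_lookup` is a hypothetical stand-in for the project's dotted-key lookup, and the function name, return shape, and error message are illustrative only.

```python
def _lookup(config: dict, dotted_key: str):
    """Hypothetical dotted-key lookup; raises KeyError if any part is missing."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def load_db_credentials(config: dict) -> dict[str, tuple[str, str]]:
    """Return (username, password) pairs for the CLP and root DB users."""
    try:
        return {
            "clp": (
                _lookup(config, "database.username"),
                _lookup(config, "database.password"),
            ),
            "root": (
                _lookup(config, "database.root_username"),
                _lookup(config, "database.root_password"),
            ),
        }
    except KeyError as ex:
        # Fail fast with the name of the first missing key.
        raise ValueError(f"Credentials file is missing key {ex}.") from ex
```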

tools/deployment/package/docker-compose-all.yaml (2)

55-56: MYSQL_ROOT_PASSWORD now correctly uses the dedicated root credential

Pointing MYSQL_ROOT_PASSWORD at ${CLP_DB_ROOT_PASS} instead of ${CLP_DB_PASS} cleanly separates the database root password from the CLP application user password and matches the new config model.


473-477: mcp-server DB env now matches the rest of the stack

Requiring CLP_DB_USER and CLP_DB_PASS for mcp-server (with :?Please set a value.) brings its DB authentication in line with the other services and with the new per-user credential model; no further issues spotted here.
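
For readers unfamiliar with the Compose interpolation forms used in this file, the sketch below emulates them in Python: `${VAR:?message}` fails when the variable is unset or empty, and `${VAR:-default}` falls back to a default in the same situation. The helper name resolve and its keyword arguments are made up for this illustration; Docker Compose performs the real substitution itself.

```python
from typing import Mapping, Optional


def resolve(
    env: Mapping[str, str],
    name: str,
    *,
    default: Optional[str] = None,
    error: Optional[str] = None,
) -> str:
    """Emulate ${name:-default} and ${name:?error} for an environment mapping."""
    value = env.get(name)
    if value:  # set and non-empty
        return value
    if default is not None:  # ${name:-default}
        return default
    if error is not None:  # ${name:?error}
        raise ValueError(f"{name}: {error}")
    return ""  # plain ${name} expands to an empty string


env = {"CLP_DB_USER": "clp-user", "CLP_DB_ROOT_PASS": ""}
app_user = resolve(env, "CLP_DB_USER", error="Please set a value.")
root_user = resolve(env, "CLP_DB_ROOT_USER", default="root")
```

With this environment, resolving CLP_DB_ROOT_PASS with an error message would raise, because an empty value is treated the same as an unset one.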

@Eden-D-Zhang Eden-D-Zhang changed the title from "refactor(clp-package): Separate database root user password from clp db user." to "refactor(clp-package): Refactor Database config to support multiple DB user credentials." Nov 27, 2025
Member

@junhaoliao junhaoliao left a comment

Let's mark this change as BREAKING (feat(clp-package)!), because the SQL root user's username and password are no longer the same as the clp-user's.

Member

@junhaoliao junhaoliao left a comment

Well done! The changes are all clean and correct.

We recently enabled more linter checks, and I helped write some docstrings so the linter doesn't complain.

Let me know if you have any questions.

for the title, how about:

feat(clp-package)!: Add support for multiple database user credentials; Use separate root database credentials.

try:
self.username = get_config_value(config, f"{DB_COMPONENT_NAME}.username")
self.password = get_config_value(config, f"{DB_COMPONENT_NAME}.password")
self.credentials[ClpDbUserType.CLP].username = get_config_value(
Member

@kirkrodrigues

Unrelated to this PR: using the get_config_value helper to read a value from a dictionary seems to overcomplicate things. Why did we create this helper?

Member

@junhaoliao junhaoliao Nov 28, 2025

Briefly discussed offline: the get_config_value helper might not be that helpful, so we should investigate whether to remove it.

@coderabbitai create an issue to track
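
To make the trade-off in this thread concrete, here is a hypothetical stand-in for a dotted-key lookup helper next to plain dictionary indexing. The function lookup_dotted is invented for this sketch and is not the project's get_config_value implementation, whose behaviour is not shown in this PR; the only detail taken from the diff is that the helper is called with a dotted path such as "<component>.username".

```python
from functools import reduce
from typing import Any


def lookup_dotted(config: dict, dotted_key: str) -> Any:
    """Hypothetical helper: walk nested dicts following a dotted path."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), config)


config = {"database": {"username": "clp-user", "password": "secret"}}

# The helper-based lookup and direct indexing return the same value.
via_helper = lookup_dotted(config, "database.username")
via_indexing = config["database"]["username"]
```

Both forms raise KeyError on a missing key, so in this simplified picture the helper mainly trades nested brackets for a single string path.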

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f70670 and 04f1538.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/sql_adapter.py (4 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.506Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
📚 Learning: 2025-08-13T15:36:37.998Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/SchemaTree.hpp:35-35
Timestamp: 2025-08-13T15:36:37.998Z
Learning: The user anlowee prefers logical grouping of related enum values for code organization and readability, even when it conflicts with backward compatibility constraints. They intentionally group semantically related types together rather than following append-only patterns.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-04-17T16:55:23.796Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:23.796Z
Learning: In the CLP project, SQL queries should use parameterized queries with placeholders (%s) and pass values as a tuple to `db_cursor.execute()` to prevent SQL injection, rather than directly interpolating values into the query string.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-07-03T20:10:43.789Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-py-utils/clp_py_utils/sql_adapter.py
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/sql_adapter.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
  • ClpDbUserType (176-180)
  • Database (190-337)
  • DatabaseEngine (117-119)
  • get_mysql_connection_params (220-251)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/sql_adapter.py (4)

1-13: LGTM! Module docstring and imports are appropriate.

The module docstring clearly describes the adapter's purpose, and the imports correctly bring in the refactored credential types (ClpDbUserType, Database, DatabaseEngine) needed to support per-user credentials.


63-67: LGTM! Class and constructor documentation is clear.

The class and constructor docstrings properly document the SqlAdapter's purpose and initialization with the CLP database config model.


124-146: LGTM! MySQL connection creation correctly handles user_type.

The private _create_mysql_connection method properly passes user_type to get_mysql_connection_params and maintains the existing error handling for access denied and bad database errors.


148-163: LGTM! MariaDB connection creation correctly handles user_type.

The private _create_mariadb_connection method properly passes user_type to get_mysql_connection_params and includes appropriate error handling.
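
The pattern these two comments describe (look up connection parameters for the requested user type, pass them to the driver, and translate driver errors) can be sketched without a real database driver. Everything here is illustrative: SqlAdapterSketch, the injected params_for and connect callables, and the use of PermissionError as a stand-in for a driver's access-denied error are assumptions, not the adapter's actual code.

```python
from enum import Enum
from typing import Any, Callable, Dict


class ClpDbUserType(Enum):
    CLP = "clp"
    ROOT = "root"


class SqlAdapterSketch:
    """Illustrative only; `params_for` and `connect` are injected stand-ins."""

    def __init__(
        self,
        params_for: Callable[[ClpDbUserType], Dict[str, Any]],
        connect: Callable[..., Any],
    ) -> None:
        self._params_for = params_for
        self._connect = connect

    def create_connection(self, user_type: ClpDbUserType) -> Any:
        # Look up parameters for the requested user, then hand them to the driver.
        params = self._params_for(user_type)
        try:
            return self._connect(**params)
        except PermissionError as err:
            # Stand-in for the driver's access-denied error.
            raise RuntimeError(
                f"Access denied for the '{user_type.value}' database user."
            ) from err
```

Injecting the driver call keeps the sketch runnable anywhere and makes the user_type plumbing easy to exercise with a fake connect function.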

@Eden-D-Zhang Eden-D-Zhang changed the title from "refactor(clp-package): Refactor Database config to support multiple DB user credentials." to "refactor(clp-package)!: Refactor Database config to support multiple DB user credentials." Nov 28, 2025
CLP_DB_USER: "${CLP_DB_USER}"
CLP_DB_PASS: "${CLP_DB_PASS}"
CLP_DB_PASS: "${CLP_DB_PASS:?Please set a value.}"
CLP_DB_USER: "${CLP_DB_USER:?Please set a value.}"
Member

Unrelated to the current PR: CLP_DB_PASS should be an optional env var as well (defaulting to "clp-user").

@coderabbitai please create an issue to track

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track making CLP_DB_PASS optional

gh issue create \
  --repo y-scope/clp \
  --title "Make CLP_DB_PASS environment variable optional with default value" \
  --body "## Description

Make the \`CLP_DB_PASS\` environment variable optional with a default value of \"clp-user\" in the Docker Compose configuration.

## Context

This was identified during review of PR #1655 (comment: https://github.com/y-scope/clp/pull/1655#discussion_r2572465293).

Currently, \`CLP_DB_PASS\` is a required environment variable. Making it optional with a sensible default would improve the developer experience for local deployments.

## Requested by

@junhaoliao" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've created issue #1691 to track making CLP_DB_PASS an optional environment variable with a default value of "clp-user": #1691


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/general.py:0-0
Timestamp: 2025-08-19T14:41:28.901Z
Learning: In the CLP codebase, prefer explicit failures over automatic directory creation in utility functions like dump_config. The user junhaoliao prefers to let file operations fail when parent directories don't exist, as this helps catch implementation errors during development rather than masking setup issues with automatic directory creation.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.506Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

@Eden-D-Zhang Eden-D-Zhang changed the title from "refactor(clp-package)!: Refactor Database config to support multiple DB user credentials." to "feat(clp-package)!: Add support for multiple database user credentials; Use separate root database credentials." Nov 28, 2025
@Eden-D-Zhang Eden-D-Zhang merged commit cb2505d into y-scope:main Nov 28, 2025
21 checks passed
Comment on lines +88 to +89
CLP_DB_ROOT_PASS: "${CLP_DB_ROOT_PASS:?Please set a value.}"
CLP_DB_ROOT_USER: "${CLP_DB_ROOT_USER:-root}"
Member

Although these are not yet used, we should plan for integration with the Helm chart, which is work in progress. For more details, see #1309 for the feature request and #1603 for a prototype.

@coderabbitai create an issue to track this and assign @junhaoliao to work on it

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track integration of root DB credentials with Helm chart

gh issue create \
  --repo y-scope/clp \
  --title "Integrate root database credentials with Helm chart deployment" \
  --body "## Context

PR #1655 introduces support for multiple database user credentials, specifically adding root user credentials (\`CLP_DB_ROOT_USER\` and \`CLP_DB_ROOT_PASS\`) to the database configuration. These new credentials are currently integrated into the Docker Compose deployment but need to be incorporated into the Helm chart deployment as well.

## Related Work

- Feature request: #1309
- Prototype: #1603
- Originating PR: #1655

## Task

Integrate the new root database credential environment variables (\`CLP_DB_ROOT_USER\` and \`CLP_DB_ROOT_PASS\`) into the Helm chart deployment configuration, ensuring consistency with the Docker Compose implementation.

## Reference

This issue was created following discussion in PR #1655:
https://github.com/y-scope/clp/pull/1655#discussion_r2572875618

Requested by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've created issue #1697 to track the integration of the new root database credentials (CLP_DB_ROOT_USER and CLP_DB_ROOT_PASS) with the Helm chart deployment. The issue references PR #1655, #1309, and #1603, and has been assigned to you.

Issue link: #1697


🐰✨


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.506Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.
