-
Notifications
You must be signed in to change notification settings - Fork 83
feat(clp-package)!: Add support for multiple database user credentials; Use separate root database credentials. #1655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughReplace single DB username/password with per-user credentials (CLP and ROOT) across config, SQL adapter, package utilities, scripts, docker-compose and callers; add ClpDbUserType enum and per-user env constants; update APIs and call sites to select credentials by user type. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
71-79: Root password integration is coherent, but avoid leaking it via config dumpsThe new
CLP_DB_ROOT_PASS_ENV_VAR_NAME,root_passwordfield,has_root_password(), and the file/env loading logic form a consistent, optional root-cred path. Tolerating missingroot_passwordviaKeyError/ValueErrorhandling is a good choice.However,
Database.dump_to_primitive_dict()still only excludesusernameandpassword, soroot_passwordwill now be serialised into any dumped config (e.g., the shared container config written into logs). That weakens the existing “don’t dump DB creds” behaviour specifically for the most privileged credential.I’d strongly recommend excluding
root_passwordhere as well, mirroring the treatment of the other secrets:def dump_to_primitive_dict(self): - d = self.model_dump(exclude={"username", "password"}) + d = self.model_dump(exclude={"username", "password", "root_password"}) return dThis keeps root credentials confined to the credentials file and env vars, and avoids unnecessarily broad exposure inside containers.
Also applies to: 165-180, 233-240, 253-257, 265-269
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/clp-package-utils/clp_package_utils/controller.py(1 hunks)components/clp-package-utils/clp_package_utils/general.py(1 hunks)components/clp-py-utils/clp_py_utils/clp_config.py(4 hunks)components/package-template/src/etc/credentials.template.yaml(1 hunks)tools/deployment/package/docker-compose-all.yaml(2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.pycomponents/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
🧬 Code graph analysis (2)
components/clp-package-utils/clp_package_utils/controller.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
has_root_password(233-239)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (1)
get_config_value(28-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (5)
components/package-template/src/etc/credentials.template.yaml (1)
1-10: Template addition forroot_passwordis consistentThe commented
root_passwordexample matches the new credentials field and helps discoverability. No issues here.tools/deployment/package/docker-compose-all.yaml (2)
49-54: Confirm migration story for newCLP_DB_ROOT_PASSrequirementSwitching
MYSQL_ROOT_PASSWORDto${CLP_DB_ROOT_PASS:?Please set a value.}cleanly separates root and user passwords, but it also makesCLP_DB_ROOT_PASSmandatory for this compose file. For existing deployments whosecredentials.yamllacksroot_password, the generated.envwill omitCLP_DB_ROOT_PASSand compose will now fail fast.If that’s intentional, please make sure the upgrade path and requirement to add/generate a
root_password(or re‑generate credentials) is clearly documented. If you want smoother backcompat, you might consider a fallback toCLP_DB_PASSat the config/env layer instead.
80-87: PropagatingCLP_DB_ROOT_PASStodb-table-creatorlooks correctWiring
CLP_DB_ROOT_PASSintodb-table-creatoralongsideCLP_DB_USER/CLP_DB_PASSmatches the new model. Just ensureclp_py_utils.create-db-tablesis updated to readCLP_DB_ROOT_PASS(and not implicitly rely onCLP_DB_PASS) so this env var is actually honoured.components/clp-py-utils/clp_py_utils/clp_config.py (1)
241-257: Optional: decide whether to defaultroot_passwordto user password for legacy configs
load_credentials_from_fileandload_credentials_from_envnow makeroot_passwordtruly optional (swallowing missing keys/vars), while the compose layer can requireCLP_DB_ROOT_PASS. For oldercredentials.yamlfiles withoutdatabase.root_password, this means:
Database.root_passwordwill stayNone.has_root_password()is false.- No
CLP_DB_ROOT_PASSgets written to.env, yet docker-compose currently insists on it.If you want smoother upgrades while still allowing a distinct root password, you could consider a compatibility default such as:
- On load, if
root_passwordis absent, default it topassword.- Or introduce a small helper that derives
CLP_DB_ROOT_PASSfromCLP_DB_PASSwhenhas_root_password()is false, and phase that out later.If the hard requirement for a separate root password is intentional, documenting that explicitly (and maybe providing a migration helper) would help avoid surprises.
Also applies to: 265-269
components/clp-package-utils/clp_package_utils/controller.py (1)
140-150: ConditionalCLP_DB_ROOT_PASSexport is correct; verify other launch pathsConditionally adding
CLP_DB_ROOT_PASSwhendatabase.has_root_password()is true fits the new model and avoids writing a bogus env var when no root cred is configured. Combined with the.envwriter’s skip‑Nonelogic, this is sound.If there are any non–docker‑compose flows that start the DB initialisation / table‑creation logic (e.g., via
generate_container_start_cmd+get_credential_env_vars_list), it would be worth checking that they also propagate the root password where needed, or that they intentionally rely only on the regular DB user.
| def generate_credentials_file(credentials_file_path: pathlib.Path): | ||
| credentials = { | ||
| DB_COMPONENT_NAME: {"username": "clp-user", "password": secrets.token_urlsafe(8)}, | ||
| DB_COMPONENT_NAME: { | ||
| "username": "clp-user", | ||
| "password": secrets.token_urlsafe(8), | ||
| "root_password": secrets.token_urlsafe(8), | ||
| }, | ||
| QUEUE_COMPONENT_NAME: {"username": "clp-user", "password": secrets.token_urlsafe(8)}, | ||
| REDIS_COMPONENT_NAME: {"password": secrets.token_urlsafe(16)}, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Generating a distinct root_password looks good
Emitting a separate root_password with secrets.token_urlsafe(8) aligns with the new root-credential model and keeps it independent from the user password. If you ever revisit password policy, you might choose a larger token size for the root password, but this is acceptable and consistent with the existing DB password generation.
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 452 to
461, the code already emits a distinct "root_password" using
secrets.token_urlsafe(8); leave this separate root credential generation as-is
to match the new root-credential model, no code changes required now, but if you
revisit password policy later consider increasing the token length for
root_password (e.g., token_urlsafe(12-16)).
Exclude root_password from config serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (16)
components/clp-package-utils/clp_package_utils/controller.py(2 hunks)components/clp-package-utils/clp_package_utils/general.py(1 hunks)components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/compress.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/native/decompress.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/search.py(2 hunks)components/clp-py-utils/clp_py_utils/clp_config.py(5 hunks)components/clp-py-utils/clp_py_utils/sql_adapter.py(4 hunks)components/job-orchestration/job_orchestration/executor/compress/compression_task.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/utils.py(1 hunks)components/package-template/src/etc/credentials.template.yaml(1 hunks)tools/deployment/package/docker-compose-all.yaml(3 hunks)tools/yscope-dev-utils(1 hunks)
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-07-25T21:29:48.947Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.
Applied to files:
tools/yscope-dev-utils
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/compression_task.pycomponents/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pycomponents/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/compression_task.pycomponents/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pycomponents/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/compression_task.pycomponents/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pycomponents/clp-package-utils/clp_package_utils/controller.pycomponents/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/compression_task.pycomponents/clp-package-utils/clp_package_utils/scripts/search.pycomponents/clp-package-utils/clp_package_utils/scripts/archive_manager.pycomponents/clp-package-utils/clp_package_utils/scripts/dataset_manager.pycomponents/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pycomponents/clp-py-utils/clp_py_utils/clp_config.pycomponents/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2024-11-15T16:21:52.122Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
📚 Learning: 2025-07-03T12:58:18.407Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.pycomponents/clp-py-utils/clp_py_utils/clp_config.pycomponents/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-02T15:48:58.961Z
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1368
File: components/clp-mcp-server/clp_mcp_server/__init__.py:11-15
Timestamp: 2025-10-02T15:48:58.961Z
Learning: In the clp-mcp-server component (components/clp-mcp-server/clp_mcp_server/__init__.py), the default host binding of 0.0.0.0 is intentional because the server is designed to be deployed in Docker containers where this binding is necessary to accept external connections.
Applied to files:
tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pycomponents/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-08-18T05:43:07.868Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1184
File: components/core/cmake/Modules/FindLibLZMA.cmake:21-24
Timestamp: 2025-08-18T05:43:07.868Z
Learning: In the CLP project, all supplied `<lib>_ROOT` variables will live within the `CLP_CORE_DEPS_DIR` as part of their architectural design. This means that using CLP_CORE_DEPS_DIR for library discovery in custom Find modules is the intended approach, and prioritizing individual `<lib>_ROOT` variables over CLP_CORE_DEPS_DIR is not necessary.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/compress.py
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/compress.py
🧬 Code graph analysis (10)
components/clp-package-utils/clp_package_utils/scripts/search.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/job-orchestration/job_orchestration/scheduler/utils.py (1)
components/clp-py-utils/clp_py_utils/sql_adapter.py (2)
create_connection(105-115)create_connection(123-124)
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (2)
read_yaml_config_file(58-64)get_config_value(28-42)
components/clp-package-utils/clp_package_utils/controller.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpDbUserType(166-168)
🔇 Additional comments (17)
tools/deployment/package/docker-compose-all.yaml (3)
52-52: Separate root password from CLP user credentials in database service.The
MYSQL_ROOT_PASSWORDnow correctly sources fromCLP_DB_ROOT_PASSinstead ofCLP_DB_PASS, establishing separate credential scopes. The required-value syntax enforces presence at runtime.
463-464: Upgrade mcp-server with required-value validation for database credentials.The
CLP_DB_PASSandCLP_DB_USERenvironment variables now enforce presence via the ":?Please set a value." pattern, improving consistency with other services and catching missing configuration at startup time.
85-86: Unable to complete automated verification due to repository access issues.The repository clone operation failed multiple times, preventing me from searching the codebase for how the
clp_py_utils.create-db-tablesmodule reads theCLP_DB_ROOT_USERandCLP_DB_ROOT_PASSenvironment variables.The original review comment's concern is reasonable and valid: the environment variables are being set in the docker-compose file with required-value validation, but without accessing the Python module implementation, I cannot confirm whether:
- The module actually reads these variables from the environment
- The variables are properly passed to database connection logic
- There are any issues with how they're consumed
Manual verification of the
clp_py_utils.create-db-tablesmodule implementation is needed to resolve this concern.components/clp-package-utils/clp_package_utils/general.py (1)
452-459: Separate root credentials in the default template look consistent and correctAdding
root_username/root_passwordunderDB_COMPONENT_NAMEensures newly generated credentials files carry both user and root credentials, which aligns with the new root/CLP user split and the existingvalidate_and_load_db_credentials_fileflow. No functional or structural issues here from this file’s perspective.components/package-template/src/etc/credentials.template.yaml (2)
5-6: Template format and naming are consistent with existing patterns.The new root credentials follow the same structure as the existing username/password fields, using appropriate naming conventions (
root_username,root_password) and default example values.
5-6: I'll verify the review comment by checking if the credential generation code and deployment manifests properly handle the root credentials mentioned in the template. Let me start by exploring the codebase.
<function_calls>
#!/bin/bashFirst, let's understand the repository structure and locate credential-related files
echo "=== Repository structure ==="
git ls-files --others --exclude-standard | head -20
echo ""
echo "=== Key directories ==="
find . -maxdepth 2 -type d -name "credential" -o -type d -name "config" -o -type d -name "deploy" 2>/dev/null | head -20
echo ""
echo "=== Files matching credential/config patterns ==="
fd -e yaml -e yml -e py -e sh -e json | grep -i credential | head -20
</function_calls><function_calls>
#!/bin/bashCheck the target template file
echo "=== Target template file ==="
cat -n components/package-template/src/etc/credentials.template.yaml
echo ""Find all credential-related files
echo "=== All credential-related files ==="
fd credentials -t fecho ""
echo "=== All YAML template files ==="
fd -e template.yaml -e template.yml | head -20
</function_calls><function_calls>
#!/bin/bashSearch for credential generation code that might read the template
echo "=== Searching for Python code handling root_username/root_password ==="
rg -nP '(root_username|root_password)' --type=py -C 3echo ""
echo "=== Searching for credential generation/loading code ==="
rg -nP 'credentials.*template|template.*credentials' --type=py -C 2echo ""
echo "=== Searching for CLP_DB_ROOT environment variables ==="
rg -nP 'CLP_DB_ROOT_(USER|PASS)' -C 2
</function_calls><function_calls>
#!/bin/bashSearch in deployment files
echo "=== Docker Compose and deployment files ==="
fd -e docker-compose.yaml -e docker-compose.yml -e Dockerfile -e "*.sh" | head -20echo ""
echo "=== Search for root_username/root_password in all files ==="
rg -nP '(root_username|root_password)' -C 2echo ""
echo "=== Search for CLP_DB_ROOT in all files ==="
rg -nP 'CLP_DB_ROOT' -C 2
</function_calls><function_calls>
#!/bin/bashLook for documentation and README files
echo "=== README and documentation files ==="
fd -e md -e rst -e txt -i readmeecho ""
echo "=== Search for root credential documentation ==="
rg -nP '(root.*credential|root.*user|root.*pass)' --type=md -C 2echo ""
echo "=== Check credentials.template.yaml content ==="
wc -l components/package-template/src/etc/credentials.template.yaml
cat components/package-template/src/etc/credentials.template.yaml
</function_calls><function_calls>
#!/bin/bashSearch for code that might load or process this template
echo "=== Search for YAML loading in Python code ==="
rg -nP 'yaml.load|yaml.safe_load' --type=py -C 3echo ""
echo "=== Search for credentials file loading ==="
rg -nP 'credentials.*.yaml|.yaml.*credentials' --type=py -C 3echo ""
echo "=== Find Python files in the same directory as template ==="
fd -t f components/package-template/src/etc/ | head -20
</function_calls><function_calls>
#!/bin/bashSearch for database-related Python code
echo "=== Search for database initialization code ==="
rg -nP '(database|db|mysql|mariadb).credential|credential.(database|db|mysql|mariadb)' --type=py -C 3 | head -50echo ""
echo "=== Find Python package initialization in package-template ==="
fd -t f '.py$' components/package-template/ 2>/dev/null | head -20echo ""
echo "=== Check for docker-compose files ==="
fd -i docker-compose | xargs wc -l 2>/dev/null
</function_calls><function_calls>
#!/bin/bashComprehensive search for credential usage patterns
echo "=== All Python files in components/package-template ==="
find components/package-template -name "*.py" -type f | head -20echo ""
echo "=== Files in etc/ directory of package-template ==="
ls -la components/package-template/src/etc/ 2>/dev/null || find components/package-template -name "etc" -type decho ""
echo "=== Search for any database root user setup ==="
rg -nP '(root|ROOT).user|user.(root|ROOT)' --type=py --type=sh -C 2 | head -60
</function_calls><function_calls>
#!/bin/bashDirect content check
echo "=== Full content of credentials template ==="
cat components/package-template/src/etc/credentials.template.yamlecho ""
echo "=== All YAML files in components/package-template ==="
find components/package-template -name ".yaml" -o -name ".yml" | xargs wc -l 2>/dev/nullecho ""
echo "=== Check for docker-compose in root or containers ==="
find . -maxdepth 3 -name "docker-compose*" -type f | head -10
</function_calls><function_calls>
#!/bin/bashBroader search for credential usage
echo "=== All references to root_username anywhere ==="
rg 'root_username' -C 3echo ""
echo "=== All references to root_password anywhere ==="
rg 'root_password' -C 3echo ""
echo "=== All references to CLP_DB_ROOT anywhere ==="
rg 'CLP_DB_ROOT' -C 3
</function_calls><function_calls>
#!/bin/bashSearch for credential processing Python code
echo "=== Python files that might process credentials ==="
fd -e setup.py -e init.py -e config.py -e credential*.pyecho ""
echo "=== Search for "credentials" file pattern in any context ==="
rg -n 'credentials.template.yaml' -C 2echo ""
echo "=== Check git history for this file ==="
git log --oneline -n 20 -- components/package-template/src/etc/credentials.template.yaml 2>/dev/null || echo "No git history available"echo ""
echo "=== Current diff of this file in PR ==="
git diff HEAD -- components/package-template/src/etc/credentials.template.yaml 2>/dev/null || echo "No git diff available"
</function_calls><function_calls>
#!/bin/bashMore targeted search for where credentials are actually used
echo "=== Search for files that mention 'credentials' ==="
rg -l 'credentials' | head -30echo ""
echo "=== Search for environment variable references to DB credentials ==="
rg -nP '(CLP_DB_|DB_|DATABASE_)' --type=py --type=sh | grep -iE '(user|pass|password|root)' | head -20echo ""
echo "=== Files in package-template that might initialize DB ==="
find components/package-template -type f ( -name ".py" -o -name ".sh" -o -name "*.yaml" ) | xargs ls -lh 2>/dev/null | head -20
</function_calls><function_calls>
#!/bin/bashLet's look at specific files that likely handle credentials
echo "=== Files in components/package-template/src ==="
find components/package-template/src -type f | head -30echo ""
echo "=== Search for any Python class or function related to credentials ==="
ast-grep --pattern 'class $_ {
$$$
credential
$$$
}'echo ""
echo "=== Check if credentials.template.yaml is referenced in Python ==="
rg 'credentials.template' -C 3
</function_calls><function_calls>
#!/bin/bashDirect listing of package-template structure
echo "=== Full directory tree of components/package-template ==="
find components/package-template -type f | sortecho ""
echo "=== Count of each file type ==="
find components/package-template -type f | sed 's/.*.//' | sort | uniq -c
</function_calls><function_calls>
#!/bin/bashCheck for any reference to the template file being used
echo "=== All references to 'template' in the codebase ==="
rg -n '.template.' -C 2 | head -60echo ""
echo "=== All references to copying or using config templates ==="
rg -nP 'template.*copy|copy.*template|setup.*template' --type=py --type=sh -C 2echo ""
echo "=== Check the template file itself for any clues about usage ==="
file components/package-template/src/etc/credentials.template.yaml
cat -n components/package-template/src/etc/credentials.template.yaml
</function_calls><function_calls>
#!/bin/bashSearch for setup/initialization in the entire repository
echo "=== Search for 'setup' or 'init' related code ==="
rg -n '(setup|initialize|init|config).*credential' --type=py --type=sh | head -20echo ""
echo "=== Docker-related files ==="
find . -maxdepth 3 ( -name "Dockerfile*" -o -name "docker-compose*" -o -name ".dockerignore" ) -type fecho ""
echo "=== Search in deployment or setup scripts ==="
fd -e setup.py -e setup.sh -o -path "/deploy/" -o -path "/docker/" | head -20
</function_calls><function_calls>
#!/bin/bashMore comprehensive search
echo "=== All Python files in repo ==="
fd -e setup.py -e init.py -e main.py -e config.pyecho ""
echo "=== Check for MariaDB or MySQL setup code ==="
rg -n '(mariadb|mysql|database|db.*init)' --type=py -i -C 2 | head -40echo ""
echo "=== Check for environment variable setup code ==="
rg -n 'environ[|os.getenv|getenv' --type=py -C 2 | head -40
</function_calls><function_calls>
#!/bin/bashSearch for any environment variables starting with CLP
echo "=== All CLP_ environment variable references ==="
rg -nP 'CLP_[A-Z_]+' --type=py --type=sh --type=yaml -C 1 | head -80echo ""
echo "=== Check README or documentation ==="
cat README.md 2>/dev/null | head -100 || echo "No README.md found"echo ""
echo "=== List all .md files ==="
fd -e md | head -20
</function_calls><function_calls>
#!/bin/bashGet a focused view of the relevant code
echo "=== All environment variables containing DB or ROOT ==="
rg -nP '(DB_.*ROOT|ROOT.*DB|CLP_DB)' --type=py --type=sh --type=yamlecho ""
echo "=== Look for where CLP_DB variables are defined or used ==="
rg -nP '\bCLP_DB_' --type=py --type=sh --type=yaml -C 2 | head -100echo ""
echo "=== Search for docs about database setup ==="
rg -n '(database|root.*user|root.*pass)' --type=md -i
</function_calls><function_calls>
#!/bin/bashFinal comprehensive check
echo "=== Git status to see what changed ==="
git status --short 2>/dev/null | head -20 || echo "Not a git repo or no changes"echo ""
echo "=== Check the actual PR/changes to credentials.template.yaml ==="
git diff HEAD components/package-template/src/etc/credentials.template.yaml 2>/dev/null || git show HEAD:components/package-template/src/etc/credentials.template.yaml 2>/dev/null || cat components/package-template/src/etc/credentials.template.yamlecho ""
echo "=== Count all references to root_username and root_password in repo ==="
echo "root_username count: $(rg 'root_username' -c --color=never | wc -l)"
echo "root_password count: $(rg 'root_password' -c --color=never | wc -l)"
echo "CLP_DB_ROOT count: $(rg 'CLP_DB_ROOT' -c --color=never | wc -l)"
</function_calls><function_calls>
#!/bin/bashGet more context about credentials file
echo "=== Full content of credentials.template.yaml ==="
cat components/package-template/src/etc/credentials.template.yamlecho ""
echo "=== Any other template files to compare ==="
find . -name ".template." -o -name "template.yaml" | head -20echo ""
echo "=== Search for any code that might parse this template ==="
rg -n 'yaml|template|credential' components/package-template/ --type=py
</function_calls>Based on my comprehensive verification of the codebase, I can now provide the rewritten review comment:
No evidence found that root credentials are integrated into credential generation or deployment code.
The template additions for
root_usernameandroot_passwordat lines 5-6 ofcomponents/package-template/src/etc/credentials.template.yamldo not appear to be referenced, loaded, or used anywhere in the codebase. Searches for:
- Python code reading/parsing these fields
- Usage of
CLP_DB_ROOT_USERorCLP_DB_ROOT_PASSenvironment variables- Credential generation code that handles root credentials
- Deployment manifests using these values
- Documentation explaining root credential configuration
all returned no results. The template fields appear to be uncommitted additions without corresponding integration into the credential handling infrastructure.
components/clp-package-utils/clp_package_utils/controller.py (2)
16-21: LGTM!The new imports correctly support the per-user credential model, bringing in the necessary environment variable constants and the
ClpDbUserTypeenum for type-safe credential access.
146-152: The repository clone operation failed, which prevents me from executing the verification scripts to check database access patterns across the codebase.Based on the context provided in the review comment:
The original review comment makes a reasonable request for verification - it asks developers to ensure that
ClpDbUserType.ROOTis used only for administrative tasks andClpDbUserType.CLPis used for normal operations. The code snippet shows proper credential separation using enum-based indexing.However, without access to the repository, I cannot:
- Verify if database connections throughout the codebase actually use the appropriate user types
- Confirm whether ROOT credentials are misused in operational code
- Validate the pattern consistency claims made in the review
Verify that database connections throughout the codebase use appropriate user types.
The credential separation implementation correctly distinguishes between root and CLP user credentials using enum-based indexing. Ensure database connections use the appropriate user type:
ClpDbUserType.ROOTfor administrative tasks only (database initialization, table creation)ClpDbUserType.CLPfor normal application operationsSearch the codebase for database connection instantiation and credential usage to confirm proper separation.
tools/yscope-dev-utils (1)
1-1: Based on my verification attempts, I was unable to access sufficient information to validate the concerns raised in the original review comment. The repository clone failed in the sandbox environment, and web searches did not locate the pull request or confirm the commit details.Final Output
Unable to verify submodule commit and PR context—manual verification required.
The review comment requests verification of a Git submodule pointer update for
tools/yscope-dev-utils, but the underlying PR context could not be located or verified through available search and repository access methods. To properly assess this change, please provide:
- A direct link to the pull request
- Confirmation that
tools/yscope-dev-utilsis configured as a Git submodule (recent web searches did not find evidence of this)- Documentation of how this submodule update relates to the database credentials refactoring mentioned in the PR summary
components/job-orchestration/job_orchestration/scheduler/utils.py (1)
40-42: Genericcreate_connection()usage matches multi‑DB support; please confirm socket flag paritySwapping to
sql_adapter.create_connection()here is consistent with the new adapter API and keeps this path DB‑agnostic while still using the CLP DB user by default, which is appropriate for killing hanging jobs.One thing to double‑check: if the previous code passed a non‑default
disable_localhost_socket_connection(e.g.,True) intocreate_mysql_connection, that behaviour is now different because this call relies on the default. If you did previously rely on disabling local socket connections in this path, you should propagate that flag intocreate_connectionexplicitly; otherwise this change is fine as‑is.components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
16-16: CLP‑user credential lookup for extraction env is correct and consistentUsing
credentials[ClpDbUserType.CLP].username/passwordforCLP_DB_USER_ENV_VAR_NAMEandCLP_DB_PASS_ENV_VAR_NAMEaligns this script with the new per‑user credential model and ensures extraction runs under the CLP DB user rather than root. Givenvalidate_and_load_config_filecallsdatabase.load_credentials_from_env(), thecredentialsmapping should always be populated before this point. No issues from a correctness or security perspective.Also applies to: 249-254
components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)
14-14: Per‑user CLP credentials for S3 compression container env look goodThe change to derive
extra_env_varsfromclp_config.database.credentials[ClpDbUserType.CLP](aftervalidate_and_load_db_credentials_file) correctly scopes the DB access in the compression container to the CLP user and matches the new credential structure. This keeps behaviour aligned with other scripts while avoiding unintended use of the root credentials.Also applies to: 310-314
components/clp-package-utils/clp_package_utils/scripts/search.py (1)
13-13: Search container now correctly uses CLP DB user credentialsAfter loading credentials via
validate_and_load_db_credentials_file, sourcingCLP_DB_USER_ENV_VAR_NAMEandCLP_DB_PASS_ENV_VAR_NAMEfromcredentials[ClpDbUserType.CLP]is exactly what we want: search runs with CLP user privileges and never touches the root account. The change is straightforward and consistent with the rest of the credential refactor.Also applies to: 138-142
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)
14-14: Dataset manager correctly uses CLP DB user credentials for containerMapping
CLP_DB_USER_ENV_VAR_NAME/CLP_DB_PASS_ENV_VAR_NAMEfromclp_config.database.credentials[ClpDbUserType.CLP](postvalidate_and_load_db_credentials_file) keeps the dataset manager running with CLP‑scoped privileges while still supporting the new root user in the credentials file. This is aligned with the rest of the tooling and looks good.Also applies to: 159-163
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
14-14: File and stream decompression now consistently use CLP user credentialsBoth the file and stream extraction flows now build
extra_env_varsfromclp_config.database.credentials[ClpDbUserType.CLP]aftervalidate_and_load_db_credentials_filehas run, so the native decompression container always authenticates as the CLP user. That’s the right separation of duties relative to the new root account and keeps behaviour consistent across all wrapper scripts.Also applies to: 136-140, 219-223
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
14-17: Consistent use of per-user CLP DB credentialsUsing
ClpDbUserType.CLPand the sharedcredentialsmapping to populateCLP_DB_USER_ENV_VAR_NAME/CLP_DB_PASS_ENV_VAR_NAMEkeeps this script aligned with the new multi-user DB model without changing runtime behaviour. No issues from this change.Also applies to: 230-234
components/clp-py-utils/clp_py_utils/clp_config.py (1)
199-223: Unfortunately, I'm unable to verify the review comment due to persistent repository cloning failures in the sandbox environment. This prevents me from:
- Searching for all call sites of
get_clp_connection_params_and_typeto confirm whether consumers have been updated to the new nestedcredentialsstructure- Checking for any remaining legacy usages of
database.usernameanddatabase.password- Verifying the credentials.yaml structure requirements and whether backward compatibility or default values are provided
- Determining if there's documented migration guidance or explicit breaking change handling
The concerns raised in the review comment are technically plausible—changing API return shapes and adding new required configuration keys are indeed potentially breaking changes that warrant verification. However, without access to the actual codebase, I cannot confirm:
- Whether the changes have been properly propagated to all consumers
- If a migration path or backward compatibility layer exists
- Whether this is an intentional, documented breaking change
Verify that all Python-side consumers have been updated to the new shape and confirm any migration strategy with the development team. Review credentials.yaml examples and documentation to ensure existing deployments have clear upgrade guidance.
| CLP_DB_USER_ENV_VAR_NAME, | ||
| CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH, | ||
| CLP_DEFAULT_DATASET_NAME, | ||
| ClpDbUserType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
CLP credential usage is correct; consider a shared helper for DB env construction
Using credentials[ClpDbUserType.CLP].username/password for CLP_DB_USER_ENV_VAR_NAME / CLP_DB_PASS_ENV_VAR_NAME is consistent with the new per‑user credential model and with how you validate/load credentials earlier in this script, so behaviour here looks correct.
Given the identical pattern now exists in multiple wrappers (compress.py, compress_from_s3.py, search.py, dataset_manager.py, decompress.py, and the native scripts), you may want to factor this into a small helper (e.g., in clp_package_utils.general or a tiny DB‑env utility) that takes a Database or ClpConfig and returns the CLP DB env dict. That would reduce duplication and keep future changes to credential handling in one place.
Also applies to: 256-260
🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/scripts/compress.py lines ~14 and
~256-260: the code repeats constructing CLP DB env vars using
credentials[ClpDbUserType.CLP].username/password across multiple scripts;
extract this duplicated logic into a small helper (e.g.,
clp_package_utils.general or clp_package_utils.db_utils) that accepts the
Database/ClpConfig (or credentials mapping) and returns the CLP DB env dict with
keys CLP_DB_USER_ENV_VAR_NAME and CLP_DB_PASS_ENV_VAR_NAME (and any other shared
DB env entries), then replace the inline construction in compress.py (and the
other listed scripts: compress_from_s3.py, search.py, dataset_manager.py,
decompress.py and native scripts) with calls to that helper to centralize
credential handling and reduce duplication.
| from clp_py_utils.clp_config import ( | ||
| CLP_DB_PASS_ENV_VAR_NAME, | ||
| CLP_DB_USER_ENV_VAR_NAME, | ||
| ClpDbUserType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Switch to per‑user CLP credentials via Database.model_validate is sound; minor redundancy only
Using Database.model_validate(...).credentials[ClpDbUserType.CLP] here aligns this code with the new per‑user credential model and ensures you always work with a validated config and the CLP DB user, which matches how SqlAdapter is instantiated below. The change looks correct.
The only minor nit is that this re‑validates the same clp_metadata_db_connection_config each time you call _get_db_connection_env_vars_for_clp_cmd. If this helper ends up being called frequently, you could consider threading a Database instance (or just the credentials mapping) through instead of the raw dict to avoid repeated validation, but that’s purely an optional tidy‑up.
Also applies to: 212-216
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/compress/compression_task.py
around lines 13 and 212-216, the helper _get_db_connection_env_vars_for_clp_cmd
repeatedly calls Database.model_validate(...) to extract
credentials[ClpDbUserType.CLP], causing redundant validation on each call;
change the API so the validated Database (or at least its credentials mapping)
is created once and passed into the helper (or threaded through callers) instead
of the raw dict, update all callers to accept the validated Database/credentials
and use validated.credentials[ClpDbUserType.CLP] inside the helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
284-301: Env-based credential loading correctly distinguishes CLP vs ROOT; add a fallback caseMapping
ClpDbUserType.CLPtoCLP_DB_USER/CLP_DB_PASSandClpDbUserType.ROOTtoCLP_DB_ROOT_USER/CLP_DB_ROOT_PASSand funnelling reads through_get_env_vargives the right separation and ensures unset env vars fail loudly.One robustness gap: the
match user_typehas no default branch, so a future enum value would leaveuser_env_var/pass_env_varundefined instead of raising a clear error. Consider adding an explicit fallback case:match user_type: case ClpDbUserType.CLP: user_env_var = CLP_DB_USER_ENV_VAR_NAME pass_env_var = CLP_DB_PASS_ENV_VAR_NAME case ClpDbUserType.ROOT: user_env_var = CLP_DB_ROOT_USER_ENV_VAR_NAME pass_env_var = CLP_DB_ROOT_PASS_ENV_VAR_NAME + case _: + raise ValueError(f"Unsupported database user type: {user_type}")This makes failures explicit if
ClpDbUserTypeis extended later.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/clp-package-utils/clp_package_utils/controller.py(2 hunks)components/clp-py-utils/clp_py_utils/clp_config.py(5 hunks)tools/deployment/package/docker-compose-all.yaml(3 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-10-02T15:48:58.961Z
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1368
File: components/clp-mcp-server/clp_mcp_server/__init__.py:11-15
Timestamp: 2025-10-02T15:48:58.961Z
Learning: In the clp-mcp-server component (components/clp-mcp-server/clp_mcp_server/__init__.py), the default host binding of 0.0.0.0 is intentional because the server is designed to be deployed in Docker containers where this binding is necessary to accept external connections.
Applied to files:
tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.pycomponents/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-18T05:43:07.868Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1184
File: components/core/cmake/Modules/FindLibLZMA.cmake:21-24
Timestamp: 2025-08-18T05:43:07.868Z
Learning: In the CLP project, all supplied `<lib>_ROOT` variables will live within the `CLP_CORE_DEPS_DIR` as part of their architectural design. This means that using CLP_CORE_DEPS_DIR for library discovery in custom Find modules is the intended approach, and prioritizing individual `<lib>_ROOT` variables over CLP_CORE_DEPS_DIR is not necessary.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/clp-package-utils/clp_package_utils/controller.py (1)
17-23: Imports for DB credential env vars and user type look consistentThe added imports for
CLP_DB_PASS_ENV_VAR_NAME,CLP_DB_ROOT_PASS_ENV_VAR_NAME,CLP_DB_ROOT_USER_ENV_VAR_NAME,CLP_DB_USER_ENV_VAR_NAME, andClpDbUserTypeare all used in_set_up_env_for_databaseand match the new per‑user credential model elsewhere in the codebase. No issues from this file’s perspective.components/clp-py-utils/clp_py_utils/clp_config.py (4)
71-75: Root DB env var constants align with the new credential modelThe addition of
CLP_DB_ROOT_USER_ENV_VAR_NAMEandCLP_DB_ROOT_PASS_ENV_VAR_NAMEis consistent with existingCLP_DB_*naming and matches the docker-compose usage; no issues here.
176-184: ClpDbUserType and DbUserCredentials cleanly model CLP vs ROOT usersUsing
ClpDbUserTypeplus a dedicatedDbUserCredentialsmodel is a clear way to distinguish CLP and ROOT DB users and keeps the per-user credential handling well-typed and extensible.
197-200: Per-user credentials wiring into connection params and dumps looks correctInitialising
credentialswith bothClpDbUserType.CLPand.ROOT, guarding viaensure_credentials_loaded, and then pullinguser/passwordfromself.credentials[user_type]into both MySQL and CLP connection params is coherent and keeps the per-user split encapsulated. Excludingcredentialsindump_to_primitive_dictavoids leaking secrets in serialised config, which is important for safety.Also applies to: 202-207, 209-229, 234-256, 258-260
262-279: File-based loading of CLP and ROOT DB credentials is consistent
load_credentials_from_filenow populates both CLP and ROOT entries (username/passwordandroot_username/root_password) under thedatabasesection and fails fast if any key is missing. That matches the new credentials template and keeps error messages clear when the file is incomplete.tools/deployment/package/docker-compose-all.yaml (2)
55-56: MYSQL_ROOT_PASSWORD now correctly uses the dedicated root credentialPointing
MYSQL_ROOT_PASSWORDat${CLP_DB_ROOT_PASS}instead of${CLP_DB_PASS}cleanly separates the database root password from the CLP application user password and matches the new config model.
473-477: mcp-server DB env now matches the rest of the stackRequiring
CLP_DB_USERandCLP_DB_PASSformcp-server(with:?Please set a value.) brings its DB authentication in line with the other services and with the new per-user credential model; no further issues spotted here.
Database config to support multiple DB user credentials.
junhaoliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's mark this change as BREAKING (feat(clp-package)!)
because the sql root user's username and password are no longer the same as the clp-user's
junhaoliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done! the changes are all clean and correct
we recently enabled more linter checks and i helped write some docstrings so the linter doesn't complain
Let me know if you have any questions
for the title, how about:
feat(clp-package)!: Add support for multiple database user credentials; Use separate root database credentials.
| try: | ||
| self.username = get_config_value(config, f"{DB_COMPONENT_NAME}.username") | ||
| self.password = get_config_value(config, f"{DB_COMPONENT_NAME}.password") | ||
| self.credentials[ClpDbUserType.CLP].username = get_config_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to the PR - using the get_config_value helper to read a value from a dictionary seems overcomplicating things. why did we create this helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
briefly discussed offline - the get_config_value value might not be that helpful. we should investigate whether to remove it then
@coderabbitai create an issue to track
Co-authored-by: Junhao Liao <[email protected]>
Co-authored-by: Junhao Liao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/sql_adapter.py(4 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.506Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
📚 Learning: 2025-08-13T15:36:37.998Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/SchemaTree.hpp:35-35
Timestamp: 2025-08-13T15:36:37.998Z
Learning: The user anlowee prefers logical grouping of related enum values for code organization and readability, even when it conflicts with backward compatibility constraints. They intentionally group semantically related types together rather than following append-only patterns.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-04-17T16:55:23.796Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:23.796Z
Learning: In the CLP project, SQL queries should use parameterized queries with placeholders (%s) and pass values as a tuple to `db_cursor.execute()` to prevent SQL injection, rather than directly interpolating values into the query string.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-07-03T20:10:43.789Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-py-utils/clp_py_utils/sql_adapter.py
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/sql_adapter.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
ClpDbUserType(176-180)Database(190-337)DatabaseEngine(117-119)get_mysql_connection_params(220-251)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/sql_adapter.py (4)
1-13: LGTM! Module docstring and imports are appropriate.The module docstring clearly describes the adapter's purpose, and the imports correctly bring in the refactored credential types (
ClpDbUserType,Database,DatabaseEngine) needed to support per-user credentials.
63-67: LGTM! Class and constructor documentation is clear.The class and constructor docstrings properly document the SqlAdapter's purpose and initialization with the CLP database config model.
124-146: LGTM! MySQL connection creation correctly handles user_type.The private
_create_mysql_connectionmethod properly passesuser_typetoget_mysql_connection_paramsand maintains the existing error handling for access denied and bad database errors.
148-163: LGTM! MariaDB connection creation correctly handles user_type.The private
_create_mariadb_connectionmethod properly passesuser_typetoget_mysql_connection_paramsand includes appropriate error handling.
Database config to support multiple DB user credentials.Database config to support multiple DB user credentials.
| CLP_DB_USER: "${CLP_DB_USER}" | ||
| CLP_DB_PASS: "${CLP_DB_PASS}" | ||
| CLP_DB_PASS: "${CLP_DB_PASS:?Please set a value.}" | ||
| CLP_DB_USER: "${CLP_DB_USER:?Please set a value.}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to the current PR - CLP_DB_PASS should be an optional env as well (default to "clp-user")
@coderabbitai please create an issue to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track making CLP_DB_PASS optional
gh issue create \
--repo y-scope/clp \
--title "Make CLP_DB_PASS environment variable optional with default value" \
--body "## Description
Make the \`CLP_DB_PASS\` environment variable optional with a default value of \"clp-user\" in the Docker Compose configuration.
## Context
This was identified during review of PR #1655 (comment: https://github.com/y-scope/clp/pull/1655#discussion_r2572465293).
Currently, \`CLP_DB_PASS\` is a required environment variable. Making it optional with a sensible default would improve the developer experience for local deployments.
## Requested by
@junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1691 to track making CLP_DB_PASS an optional environment variable with a default value of "clp-user": #1691
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/general.py:0-0
Timestamp: 2025-08-19T14:41:28.901Z
Learning: In the CLP codebase, prefer explicit failures over automatic directory creation in utility functions like dump_config. The user junhaoliao prefers to let file operations fail when parent directories don't exist, as this helps catch implementation errors during development rather than masking setup issues with automatic directory creation.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.506Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Database config to support multiple DB user credentials.| CLP_DB_ROOT_PASS: "${CLP_DB_ROOT_PASS:?Please set a value.}" | ||
| CLP_DB_ROOT_USER: "${CLP_DB_ROOT_USER:-root}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although these are not yet used, we should plan for integration with the Helm chart, which is work in progress. For more details, see #1309 for the feature request and #1603 for a prototype.
@coderabbitai create an issue to track this and assign @junhaoliao to work on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track integration of root DB credentials with Helm chart
gh issue create \
--repo y-scope/clp \
--title "Integrate root database credentials with Helm chart deployment" \
--body "## Context
PR #1655 introduces support for multiple database user credentials, specifically adding root user credentials (\`CLP_DB_ROOT_USER\` and \`CLP_DB_ROOT_PASS\`) to the database configuration. These new credentials are currently integrated into the Docker Compose deployment but need to be incorporated into the Helm chart deployment as well.
## Related Work
- Feature request: #1309
- Prototype: #1603
- Originating PR: #1655
## Task
Integrate the new root database credential environment variables (\`CLP_DB_ROOT_USER\` and \`CLP_DB_ROOT_PASS\`) into the Helm chart deployment configuration, ensuring consistency with the Docker Compose implementation.
## Reference
This issue was created following discussion in PR #1655:
https://github.com/y-scope/clp/pull/1655#discussion_r2572875618
Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1697 to track the integration of the new root database credentials (CLP_DB_ROOT_USER and CLP_DB_ROOT_PASS) with the Helm chart deployment. The issue references PR #1655, #1309, and #1603, and has been assigned to you.
Issue link: #1697
🐰✨
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.506Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
Description
This PR refactors how credentials for the
Databaseconfig class are handled:DbUserCredentialsclass to store DB username and password, andClpDbUserTypeenum class to store different user types.usernameandpasswordfields inDatabasewithdictof user types and credential sets.user_typeparameter to credential loading/validation/connection methods to specify which user is to be used.Checklist
breaking change.
Validation performed
Started the package, compressed logs, performed query, used archive/dataset manager scripts.
Summary by CodeRabbit
New Features
Chores
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.