Add Oracle database tools for the agent framework, enabling LLM agents to query and discover Oracle database schemas #450

Open
Viphava280444 wants to merge 11 commits into archi-physics:dev from Viphava280444:Oracle_DB

Conversation

@Viphava280444
Collaborator

Key changes

New files:

  • oracle_connection.py — Manages Oracle DB connections. Handles DSN, credentials from .env, schema
    allowlists, timeouts, and thick mode for CERN NNE.
  • oracle_query.py — Two LangChain tools: query_oracle_db (read-only SQL with safety checks) and
    describe_oracle_schema (list databases/tables/columns). Includes error hints for common Oracle errors so
    the LLM can self-correct.
  • oracle_safety.py — SQL validation: blocks non-SELECT statements, enforces schema allowlists, injects
    row limits.
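
The three checks listed for oracle_safety.py could be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function name `validate_and_limit`, the regex-based approach, and the `ROWNUM` wrapper are all assumptions. A regex check is deliberately conservative (it also rejects a forbidden keyword inside a string literal) and does not catch every way of referencing a schema, such as comma-separated tables in a FROM clause.

```python
import re

# Keywords that should never appear in a read-only query (assumed list).
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)
# Captures the schema part of "FROM schema.table" or "JOIN schema.table".
SCHEMA_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w$#]*)\.[A-Za-z_]", re.IGNORECASE
)


def validate_and_limit(sql: str, allowed_schemas: set[str], max_rows: int) -> str:
    """Return a row-limited query, or raise ValueError if the SQL is rejected."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"SELECT\b", statement, re.IGNORECASE):
        raise ValueError("only SELECT statements are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a forbidden keyword")
    allowed = {schema.upper() for schema in allowed_schemas}
    for schema in SCHEMA_REF.findall(statement):
        if schema.upper() not in allowed:
            raise ValueError(f"schema {schema} is not in the allowlist")
    # Wrap the query instead of editing it, so the limit holds for any shape.
    return f"SELECT * FROM ({statement}) WHERE ROWNUM <= {int(max_rows)}"
```

For example, `validate_and_limit("SELECT run_id FROM CMS_T0AST_REPLAY1.run;", {"CMS_T0AST_REPLAY1"}, 200)` returns the wrapped query, while a `DELETE` or a reference to a schema outside the allowlist raises `ValueError`. The table name `run` is hypothetical.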

Modified files:

  • cms_comp_ops_agent.py — Wires up Oracle tools when services.chat_app.tools.oracle_databases is
    configured.
  • tools/__init__.py — Exports Oracle tool factories.
  • Dockerfile-chat — Installs oracledb and Oracle Instant Client when INSTALL_ORACLE=true.
  • templates_manager.py — Auto-detects Oracle config to set the build arg.
  • base-config.yaml — Oracle config passes through via {{ services.chat_app.tools | tojson }}.
  • pyproject.toml — Adds oracledb as optional dependency.
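
As a rough sketch of the auto-detection described for templates_manager.py: the build arg can be derived by walking the config down to `services.chat_app.tools.oracle_databases` and checking whether anything is configured there. The helper name `oracle_build_args` is hypothetical, and the real implementation may differ.

```python
from typing import Any, Mapping


def oracle_build_args(config: Mapping[str, Any]) -> dict[str, str]:
    """Return the INSTALL_ORACLE build arg for a rendered config (hypothetical helper)."""
    node: Any = config
    for key in ("services", "chat_app", "tools", "oracle_databases"):
        if not isinstance(node, Mapping):
            # A missing or non-mapping level means no Oracle config exists.
            node = None
            break
        node = node.get(key)
    # An absent or empty oracle_databases section leaves Oracle out of the image.
    return {"INSTALL_ORACLE": "true" if node else "false"}
```

Walking the keys one at a time avoids an exception when an intermediate level is missing or set to null in the YAML.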

Oracle databases are configured under services.chat_app.tools.oracle_databases:

services:
 chat_app:
   tools:
     oracle_databases:
       tier0_replay2:
         dsn: "int2r-s.cern.ch:10121/int2r_nolb.cern.ch"
         user: "CMS_T0AST_REPLAY2"                         # TODO: replace with real user
         password_secret: "ORACLE_T0_PASSWORD"     # env var or Docker secret
         description: "Tier-0 replay database"
         allowed_schemas:
           - CMS_T0AST_REPLAY2
         max_rows: 200
         query_timeout_seconds: 30

       tier0_replay3:
         dsn: "int2r-s.cern.ch:10121/int2r_nolb.cern.ch"
         user: "CMS_T0AST_REPLAY3"                         # TODO: replace with real user
         password_secret: "ORACLE_T0_PASSWORD"     # env var or Docker secret
         description: "Tier-0 replay database"
         allowed_schemas:
           - CMS_T0AST_REPLAY3
         max_rows: 200
         query_timeout_seconds: 30

       tier0_replay1:
         dsn: "int2r-s.cern.ch:10121/int2r_nolb.cern.ch"
         user: "CMS_T0AST_REPLAY1"                         # TODO: replace with real user
         password_secret: "ORACLE_T0_PASSWORD"     # env var or Docker secret
         description: "Tier-0 replay database"
         allowed_schemas:
           - CMS_T0AST_REPLAY1
         max_rows: 200
         query_timeout_seconds: 30
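
A config block like the one above might be loaded along these lines. This is a sketch under assumptions: the class `OracleDbConfig`, both function names, the default values (taken from the example), and the `/run/secrets` lookup path are illustrative and not confirmed by the PR. Only the "env var or Docker secret" behaviour comes from the config comments.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping


@dataclass(frozen=True)
class OracleDbConfig:
    """One entry under oracle_databases (hypothetical container type)."""
    name: str
    dsn: str
    user: str
    password_secret: str
    description: str = ""
    allowed_schemas: tuple[str, ...] = ()
    max_rows: int = 200
    query_timeout_seconds: int = 30


def parse_oracle_databases(tools: Mapping[str, Any]) -> dict[str, OracleDbConfig]:
    """Turn the services.chat_app.tools mapping into typed config objects."""
    parsed = {}
    for name, raw in (tools.get("oracle_databases") or {}).items():
        parsed[name] = OracleDbConfig(
            name=name,
            dsn=raw["dsn"],
            user=raw["user"],
            password_secret=raw["password_secret"],
            description=raw.get("description", ""),
            allowed_schemas=tuple(s.upper() for s in raw.get("allowed_schemas", [])),
            max_rows=int(raw.get("max_rows", 200)),
            query_timeout_seconds=int(raw.get("query_timeout_seconds", 30)),
        )
    return parsed


def resolve_password(secret_name: str, secrets_dir: str = "/run/secrets") -> str:
    """Look up the password in the environment first, then in a secrets file."""
    value = os.environ.get(secret_name)
    if value:
        return value
    secret_file = Path(secrets_dir) / secret_name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    raise KeyError(f"secret {secret_name} not found in environment or {secrets_dir}")
```

Keeping only the secret's name in the YAML means the password itself never appears in the rendered config.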

Examples

Example 1: [Screenshot 2026-02-17 at 11 52 52]

Example 2: [Screenshot 2026-02-17 at 11 52 07]

Example 3: [Screenshot 2026-02-17 at 11 52 26]

Example 4: [Screenshot 2026-02-17 at 11 52 36]

@Viphava280444 changed the base branch from main to dev on February 18, 2026, 16:59
@haozturk self-requested a review on February 19, 2026, 09:42
pyproject.toml Outdated
]

[project.optional-dependencies]
oracle = ["oracledb>=2.0.0"]
Collaborator

why do we need this outside the container?

Collaborator Author

You're right, no need for it outside the chat container. I'll remove it.


# Auto-detect Oracle database config and set INSTALL_ORACLE build arg
base_config = (context.config_manager.get_configs() or [{}])[0]
oracle_dbs = base_config.get("services", {}).get("chat_app", {}).get("tools", {}).get("oracle_databases")
Collaborator

tools are determined via the agents.md now, so i think this wouldn't work.

but I agree with your concern about installing oracle in the docker even if you don't want it. How large / slow is this?

another option would be to set up an external MCP server, and ping it?

Collaborator Author

My thinking was to install oracledb only when it's needed, to keep the container lightweight. I use thick mode by default because the CERN Oracle databases require it, and thick mode also covers everything that thin mode supports. The size of the package is about 145 MB.
I haven't looked into the MCP server approach yet, but I'll have a look.

@pmlugato added the enhancement label (New feature or request) on Feb 23, 2026
@haozturk
Collaborator

@LinaresToine can you please take a look at this from the Tier0 angle? I'm curious what you think about this

@LinaresToine

Thanks for pinging me, @haozturk. I really like this feature; I can see it being of great use for Tier-0 day-to-day operations. Thank you, @Viphava280444.

What is the cost of this tool? I mean, would it require heavy use of the higher-end models?

@LinaresToine

@haozturk @Viphava280444 I would also like to point out that this tool is not to be used blindly. There are cases in which the operator needs to be very critical of what is going on. One example is data invalidation whenever we must reprocess something. Although this is unlikely now with Run 3 coming to an end, I think it is valuable to keep such cases in mind.

build:
  context: .
  dockerfile: archi_code/cli/templates/dockerfiles/Dockerfile-data-manager
  {% if host_mode %}
Collaborator

Why have you added these host_mode conditionals? How are they relevant to the purpose of this PR?

Collaborator Author

Hi Hasan, no, it's not directly relevant to this PR. I added it in the build section as an ad hoc solution because the image failed to build when I tried to install Archi on a Tier 0 machine. Docker's network on the Tier 0 machine behaves differently and needs host mode to reach the network when installing packages (pip install, apt install, etc.).

Collaborator Author

I can remove them once I finish development on this PR, since I still need to deploy it on Tier 0 for testing the Oracle tool.

@Viphava280444
Collaborator Author

> Thanks for pinging me, @haozturk. I really like this feature; I can see it being of great use for Tier-0 day-to-day operations. Thank you, @Viphava280444.
>
> What is the cost of this tool? I mean, would it require heavy use of the higher-end models?

Hi @LinaresToine, no, it doesn't require heavy use of a higher-end model. It just needs the same model to call this tool as it does for the other tools.

@Viphava280444
Collaborator Author

> @haozturk @Viphava280444 I would also like to point out that this tool is not to be used blindly. There are cases in which the operator needs to be very critical of what is going on. One example is data invalidation whenever we must reprocess something. Although this is unlikely now with Run 3 coming to an end, I think it is valuable to keep such cases in mind.

@LinaresToine Thank you for pointing this out, Antonio. Yesterday, I modified the output of the tool to also show the query that the AI uses, so that people can check it for correctness.
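
To illustrate the idea only, echoing the executed query alongside the rows could look something like the sketch below. The function name `format_tool_output` and the plain-text layout are assumptions, not the tool's actual output format.

```python
def format_tool_output(sql: str, columns: list[str], rows: list[tuple]) -> str:
    """Render query results with the executed SQL shown first (hypothetical layout)."""
    lines = [
        "Executed query:",
        sql,
        "",
        f"Rows returned: {len(rows)}",
        " | ".join(columns),
    ]
    # One line per row, with values converted to text.
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)
```

Putting the SQL at the top lets an operator confirm which tables and filters were used before trusting the numbers underneath.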

Labels

enhancement New feature or request

6 participants