Conversation

@JonahSussman (Contributor) commented Sep 4, 2025

Fixes #861

Summary by CodeRabbit

  • New Features

    • Automatic database table creation on startup.
    • New local targets to run a PostgreSQL-backed server and stress tests.
  • Bug Fixes

    • More reliable SSL bypass behavior to prevent duplicate patching.
    • Clearer, more verbose error reporting when HTTP transport fails.
  • Refactor

    • Improved database pooling and session handling for higher concurrency and stability.
    • Removed per-call locking to boost throughput.
    • Safer startup/shutdown lifecycle for the database engine.
  • Tests

    • Added a high-concurrency stress test simulating many simultaneous clients.
  • Chores

    • CI workflow to run automated stress tests.
    • Expanded .gitignore to exclude local database files.

Signed-off-by: JonahSussman <[email protected]>
@coderabbitai bot (Contributor) commented Sep 4, 2025

Walkthrough

Refactors DB engine creation and lifecycle, removes runtime drop-all setting, replaces per-call locks with transactional sessions, adds explicit table management helpers, introduces a GitHub Actions stress-test workflow and Makefile stress/podman targets, and adds a high-concurrency external-server integration test plus supporting test-tooling tweaks.
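The lock-to-transaction change in the walkthrough can be illustrated with the stdlib sqlite3 module; the PR itself uses SQLAlchemy's async_sessionmaker(...).begin(), so every name below is illustrative:

```python
import sqlite3

def insert_incident(conn: sqlite3.Connection, uri: str) -> int:
    # Using the connection as a context manager opens a transaction and
    # commits on success / rolls back on error: no explicit conn.commit()
    # call and no per-call lock.
    with conn:
        cur = conn.execute("INSERT INTO incidents (uri) VALUES (?)", (uri,))
        return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER PRIMARY KEY, uri TEXT)")
first_id = insert_incident(conn, "file://a.java")
```

Exiting the `with conn:` block commits on success and rolls back on exception, the same scope-based guarantee that `session.begin()` provides without serializing every call behind a lock.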

Changes

  • CI: Stress test workflow — .github/workflows/stress-test-mcp-server.yml
    New GitHub Actions workflow to start Postgres, launch the MCP server, wait for readiness, run a parametrized stress test, report Postgres connection stats, and exit with the test result.
  • Git ignore updates — .gitignore
    Adds ignore patterns: tests/test_*.db, kai_mcp_solution_server/tests/test_*.db, *.db.
  • Make targets for Postgres & stress — kai_mcp_solution_server/Makefile
    Adds podman-postgres and test-stress targets, removes the pytest target, updates help/usage and examples, and makes minor formatting tweaks.
  • DB engine and schema helpers — kai_mcp_solution_server/src/.../db/dao.py
    Simplifies the get_async_engine(url) signature (removes drop_all); adds drop_all_tables(engine) and ensure_tables_exist(engine); configures pooling per DB type (StaticPool for SQLite, a tuned pool for others).
  • Server lifecycle, sessions, concurrency — kai_mcp_solution_server/src/.../server.py
    Removes the drop_all setting; uses get_async_engine(self.settings.db_dsn) and ensures tables on startup; disposes the engine on shutdown; replaces explicit commit() calls with async_sessionmaker.begin() scopes; removes per-call locks; adjusts hint-generation flows and the v2 diff source.
  • MCP client args (external mode) — kai_mcp_solution_server/tests/mcp_client.py
    Makes server_path optional (`Path | None`).
  • MCP loader HTTP behavior — kai_mcp_solution_server/tests/mcp_loader_script.py
    Improves HTTP transport error reporting with a traceback, removes the per-request insecure flag from the client builder, updates ClientSession usage, and re-raises exceptions after logging.
  • HTTPX SSL patch idempotency — kai_mcp_solution_server/tests/ssl_utils.py
    Adds an idempotent patch check and marks patched wrappers to avoid double-patching httpx.
  • Concurrency stress test — kai_mcp_solution_server/tests/test_multiple_integration.py
    Adds a test_multiple_users stress test that runs many concurrent external-server clients exercising create/list/accept/hint flows; supports NUM_CONCURRENT_CLIENTS and MCP_SERVER_URL. A duplicated method appears in the diff.

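As a rough stdlib analogue of the tuned pool: SQLAlchemy's QueuePool caps connections at pool_size plus max_overflow and blocks up to pool_timeout when exhausted. The TinyPool class below is a hypothetical illustration of that model, not code from this PR:

```python
import queue
import threading

class TinyPool:
    """Hand out at most pool_size + max_overflow connections at once."""

    def __init__(self, factory, pool_size: int = 5, max_overflow: int = 10):
        self._factory = factory
        self._idle: queue.Queue = queue.Queue()
        self._capacity = threading.Semaphore(pool_size + max_overflow)

    def acquire(self, timeout: float = 30.0):
        # Block (like SQLAlchemy's pool_timeout) once the pool is exhausted.
        if not self._capacity.acquire(timeout=timeout):
            raise TimeoutError("pool exhausted")
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._factory()  # or open a fresh one

    def release(self, conn) -> None:
        self._idle.put(conn)
        self._capacity.release()

pool = TinyPool(factory=object, pool_size=2, max_overflow=1)
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
pool.release(a)
d = pool.acquire()  # reuses the idle connection released above
```

The exhaustion behavior this models is exactly the failure mode from issue #861: callers that never release (leaked sessions) eventually make every acquire time out.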
Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Server
  participant DB
  participant LLM as LLM/HintGen

  rect rgb(240,248,255)
  note right of Server: Startup: create engine & ensure tables
  Server->>DB: get_async_engine(db_dsn)
  Server->>DB: ensure_tables_exist()
  end

  Client->>Server: create_incident / create_solution
  Server->>DB: session.begin() { insert }
  DB-->>Server: OK (ids)
  Server-->>Client: IDs

  Client->>Server: accept_file
  Server->>DB: session.begin() { update status }
  alt all client's files accepted/modified
    Server--)LLM: spawn async generate_hint_v3()
  end
  Server-->>Client: OK

  rect rgb(245,240,255)
  note right of LLM: Async hint task uses DB read/write
  LLM->>DB: read code + diffs
  LLM-->>Server: produced hint
  Server->>DB: session.begin() { insert hint }
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • hhpatel14
  • fabianvf
  • savitharaghunathan

Pre-merge checks (3 passed, 2 warnings)

❌ Failed checks (2 warnings)

  • Out of Scope Changes Check (⚠️ Warning): The pull request includes updates to .gitignore to ignore database files and modifications to the test loader script's error reporting, neither of which is mentioned in the objectives for fixing pool exhaustion or handling HTTP connection errors in production. Resolution: remove or isolate the .gitignore database ignore patterns and the unrelated test loader script adjustments into a separate PR, or expand the description to explain how they support the primary goal of resolving connection-pool exhaustion.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 40.63%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title Check (✅ Passed): The title clearly summarizes the primary intent of the pull request, indicating a fix for the database connection pool exhaustion issue and directly referencing the core problem in the linked issue. It is concise, specific, and omits extraneous detail, so a teammate scanning the history can grasp the main change without parsing the entire diff.
  • Linked Issues Check (✅ Passed): The changes introduce explicit connection-pool parameters in get_async_engine, ensure proper engine disposal, remove per-call locks in favor of session management, and add concurrency stress tests plus a CI workflow to validate under load. These modifications directly address the race conditions, session leaks, and pool exhaustion under concurrent usage described in issue #861.
  • Description Check (✅ Passed): Although minimal, the description "Fixes #861" directly ties the pull request to the relevant issue and indicates its purpose. It is on-topic and sufficient for this repository's lenient description standard.

Poem

I nibble code beneath the moonlit log, 🐇
Spawn a Postgres warren, set tests to jog.
Locks hop away, sessions snug and neat,
Clients sprint; async hints arrive on beat.
If connections stumble, I’ll debug and dig—tiny paws, big grit.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


@JonahSussman changed the title from "🐛 Fix #861" to "🐛 Fix another potential race condition" on Sep 4, 2025
@coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)

51-55: Validate port and normalize mount_path to avoid malformed URLs ("None" in URL).

If args.port or args.mount_path is unset, server_url becomes invalid.

-    transport = ""
-    if not args.host.startswith("http"):
-        transport = "http://"
-    server_url = f"{transport}{args.host}:{args.port}{args.mount_path}"
+    transport = ""
+    if args.port is None:
+        raise ValueError("Port must be specified for HTTP transport")
+    mount_path = args.mount_path or ""
+    if not (args.host.startswith("http://") or args.host.startswith("https://")):
+        transport = "http://"
+    server_url = f"{transport}{args.host}:{args.port}{mount_path}"

73-101: Re-raise after logging to prevent “generator didn’t yield” from asynccontextmanager; also print errors to stderr.

Catching and not re-raising inside an @asynccontextmanager leads to a confusing RuntimeError: generator didn't yield. Also, printing tracebacks to stdout can pollute test output.

-        logger.error("HTTP transport error: %s", str(e), exc_info=True)
-        print(f"x Error with HTTP transport: {traceback.format_exc()}")
+        logger.error("HTTP transport error: %s", str(e), exc_info=True)
+        print(f"x Error with HTTP transport: {traceback.format_exc()}", file=sys.stderr)
@@
-        print(f"! Make sure the server is running at {server_url}")
+        print(f"! Make sure the server is running at {server_url}", file=sys.stderr)
+        raise
kai_mcp_solution_server/tests/test_multiple_integration.py (1)

1-542: Fix remaining lint errors before verification

  • In tests/mcp_loader_script.py: remove blind except Exception blocks (BLE001), replace generic Exception raises with custom exception classes (TRY002), and shorten long literal messages by defining them within exception types (TRY003).
  • In tests/test_multiple_integration.py: remove unused locals (SOLUTION_FOR_INCIDENT_A_ID, multiple_user_mcp_args, ssl_patch) (F841), drop f prefix on static strings (F541), replace bare assert with explicit checks (S101), avoid blind except Exception clauses (BLE001), shorten long runtime-error messages (TRY003), and use sys.executable instead of hardcoded "python" in subprocess calls (S607).
  • After these fixes, re-run ruff check kai_mcp_solution_server/tests/... and the multi-user pytest.
🧹 Nitpick comments (12)
kai_mcp_solution_server/tests/test_multiple_integration.py (12)

2-3: Drop redundant base package import.

import concurrent is unnecessary alongside import concurrent.futures.

-import concurrent
 import concurrent.futures

36-43: Use filename-safe timestamps.

datetime.now().isoformat() includes : which breaks filenames on Windows. Use a safe format.

-        timestamp = datetime.datetime.now().isoformat()
+        timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")

84-95: Prefer os.path.join and explicit UTF-8 for fixtures.

Reduces path bugs across platforms and avoids default-encoding surprises.

-            expected_tool_schemas = json.loads(
-                open(self.test_data_dir + "/expected_tool_schemas.json").read()
-            )
+            expected_path = os.path.join(self.test_data_dir, "expected_tool_schemas.json")
+            with open(expected_path, "r", encoding="utf-8") as f:
+                expected_tool_schemas = json.load(f)

305-306: Fix log message to reference the correct ruleset/violation (B variant).

Current print uses A-variant names in the B-flow.

-            print(f"Success rate of {RULESET_NAME_A}/{VIOLATION_NAME_A}: {metric}")
+            print(f"Success rate of {RULESET_NAME_B}/{VIOLATION_NAME_B}: {metric}")

340-341: Fix log message to reference the correct ruleset/violation (B variant).

-            print(
-                f"Success rate of {RULESET_NAME_A}/{VIOLATION_NAME_A} after accepting file: {metric}"
-            )
+            print(
+                f"Success rate of {RULESET_NAME_B}/{VIOLATION_NAME_B} after accepting file: {metric}"
+            )

420-422: Resolve ruff F841 in skipped test (unused variable).

Avoid unused assignment even in skipped tests.

-            SOLUTION_FOR_INCIDENT_A_ID = int(
-                create_solution_for_incident_a.model_dump()["content"][0]["text"]
-            )
+            _ = int(create_solution_for_incident_a.model_dump()["content"][0]["text"])

437-439: Fix f-string without placeholders (ruff F541).

-                "responses": [
-                    f"You should add a smiley face to the file.",
-                ],
+                "responses": [
+                    "You should add a smiley face to the file.",
+                ],

452-459: Remove unused poll_process helper or wire it in.

Currently dead code. Recommend removal to reduce noise.

-        def poll_process(process: subprocess.Popen) -> None:
-            # Check if the process has exited early
-            if process.poll() is not None:
-                output = process.stdout.read() if process.stdout else b""
-                raise RuntimeError(
-                    f"HTTP server process exited prematurely. Output: {output.decode(errors='replace')}"
-                )

511-512: Wait for server readiness rather than sleeping a fixed second.

Avoid flakiness by polling the port.

-            await asyncio.sleep(1)  # give the server a second to start
+            # Wait up to 10s for server to accept connections on 8087
+            await asyncio.wait_for(asyncio.to_thread(_wait_for_port, "127.0.0.1", 8087, 10.0), 12.0)

Add this helper in the file (near other helpers):

import socket

def _wait_for_port(host: str, port: int, timeout: float) -> None:
    deadline = asyncio.get_event_loop().time() + timeout
    while True:
        with socket.socket() as s:
            s.settimeout(0.5)
            try:
                s.connect((host, port))
                return
            except OSError:
                pass
        if asyncio.get_event_loop().time() >= deadline:
            raise TimeoutError(f"Timed out waiting for {host}:{port}")

513-521: If you want “multi-user”, increase concurrency beyond 1.

Currently NUM_TASKS = 1. Consider bumping to at least 5 to exercise concurrency.

-            NUM_TASKS = 1
+            NUM_TASKS = 5

537-542: Guard cleanup in case startup fails before thread assignment.

Prevents UnboundLocalError on stream_thread.

-            self.http_server_process.terminate()
-            self.http_server_process.wait()
-            print("Server process terminated.")
-            stream_thread.join()
+            if getattr(self, "http_server_process", None):
+                self.http_server_process.terminate()
+                self.http_server_process.wait()
+                print("Server process terminated.")
+            if "stream_thread" in locals():
+                stream_thread.join()

491-491: Subprocess security notice + avoid broad except in futures loop.

Bandit warnings are acceptable in tests, but consider narrowing exception types in the futures loop for better diagnostics.

Also applies to: 532-532

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1d90d04 and c85c725.

📒 Files selected for processing (2)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (4)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
  • CallToolResult (22-24)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (669-671)
  • SuccessRateMetric (728-734)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (31-41)
  • create_client (136-151)
kai/jsonrpc/util.py (1)
  • model_dump (96-97)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/test_multiple_integration.py

[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.


[failure] 425-425: ruff(F841)
[new] Local variable multiple_user_mcp_args is assigned to but never used


[failure] 475-475: ruff(F841)
[new] Local variable ssl_patch is assigned to but never used


[failure] 420-420: ruff(F841)
[new] Local variable SOLUTION_FOR_INCIDENT_A_ID is assigned to but never used


[notice] 491-491: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 491-491: bandit(B603)
[new] subprocess call - check for execution of untrusted input.


[failure] 437-437: ruff(F541)
[new] f-string without any placeholders

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

420-420: Local variable SOLUTION_FOR_INCIDENT_A_ID is assigned to but never used

Remove assignment to unused variable SOLUTION_FOR_INCIDENT_A_ID

(F841)


425-425: Local variable multiple_user_mcp_args is assigned to but never used

Remove assignment to unused variable multiple_user_mcp_args

(F841)


437-437: f-string without any placeholders

Remove extraneous f prefix

(F541)


444-444: Use of assert detected

(S101)


447-447: Do not catch blind exception: Exception

(BLE001)


456-458: Avoid specifying long messages outside the exception class

(TRY003)


475-475: Local variable ssl_patch is assigned to but never used

Remove assignment to unused variable ssl_patch

(F841)


492-500: Starting a process with a partial executable path

(S607)


532-532: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)

Comment on lines 491 to 500
self.http_server_process = subprocess.Popen(
[
"python",
"-m",
"kai_mcp_solution_server",
"--transport",
"streamable-http",
"--port",
"8087",
],

🛠️ Refactor suggestion

Use sys.executable instead of bare "python" (Bandit B607/B603).

More robust across environments and virtualenvs.

-            self.http_server_process = subprocess.Popen(
+            self.http_server_process = subprocess.Popen(
                 [
-                    "python",
+                    sys.executable,
                     "-m",
                     "kai_mcp_solution_server",
                     "--transport",
                     "streamable-http",
                     "--port",
                     "8087",
                 ],

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

492-500: Starting a process with a partial executable path

(S607)

🪛 GitHub Check: Trunk Check

[notice] 491-491: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 491-491: bandit(B603)
[new] subprocess call - check for execution of untrusted input.

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 491
to 500, the subprocess invocation uses the literal "python" which is brittle
across environments; replace the string with sys.executable and ensure the
module imports sys at the top of the test file so the current Python interpreter
(including virtualenvs) is used when launching the server process.

@coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (3)
kai_mcp_solution_server/tests/test_multiple_integration.py (3)

477-486: Avoid runtime assert; guard stdout and safe-close.

         def stream_output(process: subprocess.Popen) -> None:
             try:
-                assert process.stdout is not None
-                for line in iter(process.stdout.readline, b""):
-                    print(f"[Server] {line.decode().rstrip()}")
+                stdout = process.stdout
+                if stdout is None:
+                    raise RuntimeError("HTTP server process has no stdout pipe")
+                for line in iter(stdout.readline, b""):
+                    print(f"[Server] {line.decode().rstrip()}")
             except Exception as e:
                 print(f"Error while streaming output: {e}")
             finally:
-                process.stdout.close()
+                if process.stdout:
+                    try:
+                        process.stdout.close()
+                    except Exception:
+                        pass

508-516: Actually apply the SSL bypass or remove it.

         async def client_task(client_id: str) -> None:
             print(f"[Client {client_id}] starting")
-            apply_ssl_bypass()
-
-            client = Client(
-                transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-            )
+            with apply_ssl_bypass():
+                client = Client(
+                    transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
+                )

Note: if the server is plain HTTP, the bypass is unnecessary and can be deleted instead.
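The double-patch guard described for ssl_utils.py can be sketched generically: a marker attribute on the wrapper makes the patch idempotent. Everything below (patch_send, the fake module object) is a hypothetical stand-in, not the actual ssl_utils code:

```python
import functools
import types

_PATCH_MARKER = "_insecure_patch_applied"

def patch_send(mod) -> None:
    """Wrap mod.send once; calling patch_send again is a no-op."""
    if getattr(mod.send, _PATCH_MARKER, False):
        return  # already patched: do not wrap the wrapper again

    original = mod.send

    @functools.wraps(original)
    def insecure_send(*args, **kwargs):
        kwargs.setdefault("verify", False)  # stand-in for the SSL bypass
        return original(*args, **kwargs)

    setattr(insecure_send, _PATCH_MARKER, True)
    mod.send = insecure_send

# A stand-in for the httpx module being patched.
fake_httpx = types.SimpleNamespace(send=lambda *a, **kw: kw.get("verify", True))
patch_send(fake_httpx)
patch_send(fake_httpx)  # second call must be a no-op
```

Without the marker check, the second call would wrap the wrapper, and each client that applies the bypass would add another layer.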


608-617: Use sys.executable for portability (Bandit B607/B603).

-                self.http_server_process = subprocess.Popen(
+                self.http_server_process = subprocess.Popen(
                     [
-                        "python",
+                        sys.executable,
                         "-m",
                         "kai_mcp_solution_server",
                         "--transport",
                         "streamable-http",
                         "--port",
                         "8087",
                     ],

Also ensure import sys is present at the top of the file.

🧹 Nitpick comments (18)
.gitignore (1)

210-213: Avoid ignoring all *.db files repository-wide

Adding test DB ignores is good, but a blanket *.db may hide legitimate DB assets outside tests. Prefer scoping to test/CI artifacts only.

Apply:

   # Test database files
   tests/test_*.db
   kai_mcp_solution_server/tests/test_*.db
-  *.db
+  # Consider avoiding a global wildcard:
+  # **/tests/**/*.db
+  # Or remove this line entirely if not strictly needed
kai_mcp_solution_server/tests/mcp_client.py (1)

311-333: Small robustness nit: avoid repeated Path(None) conversions

Since you only support stdio when server_path is present, move server_path = Path(args.server_path) after the None-guard (above) to safeguard future edits.

.github/workflows/stress-test-mcp-server.yml (1)

66-67: Avoid hardcoding DB credentials in env to quiet secret scanners

Even for ephemeral CI, static analyzers will flag inline creds. Build the DSN at runtime from variables to reduce false positives.

-          KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
           KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
         run: |
-          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
-          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
+          export PGUSER=kai_user
+          export PGPASSWORD=kai_password
+          export PGDATABASE=kai_test_db
+          export PGHOST=localhost
+          export PGPORT=5432
+          export KAI_DB_DSN="postgresql+asyncpg://${PGUSER}:${PGPASSWORD}@${PGHOST}:${PGPORT}/${PGDATABASE}"
+          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
+          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)

113-127: Revisit SQLite pooling strategy to avoid cross-task connection reuse

Using StaticPool forces a single shared connection. Under concurrent async workloads this can serialize DB access and risk “connection already in use”/transaction conflicts. Prefer NullPool with WAL and a sensible busy timeout for better concurrency on file-backed SQLite.

Suggested change:

-from sqlalchemy.pool import StaticPool
+from sqlalchemy.pool import StaticPool, NullPool
...
-    if "sqlite" in url_str:
+    if url_str.startswith("sqlite"):
         engine = create_async_engine(
             url,
-            connect_args={"check_same_thread": False},
-            poolclass=StaticPool,  # Single shared connection for SQLite
+            connect_args={"check_same_thread": False, "timeout": 30},
+            poolclass=NullPool,  # One connection per use; avoids cross-task sharing
             echo_pool=False,  # Set to True for debugging connection pool
         )
+        # Enable WAL + reasonable sync level for better read/write concurrency
+        # (requires `from sqlalchemy import event` at the top of the module)
+        @event.listens_for(engine.sync_engine, "connect")
+        def _sqlite_pragma(dbapi_conn, _):
+            cur = dbapi_conn.cursor()
+            try:
+                cur.execute("PRAGMA journal_mode=WAL;")
+                cur.execute("PRAGMA synchronous=NORMAL;")
+                cur.execute("PRAGMA foreign_keys=ON;")
+            finally:
+                cur.close()

Verification: run the SQLite stress job and confirm absence of “database is locked” and connection reuse errors under NUM_CONCURRENT_CLIENTS=5..10.

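The suggested WAL pragmas are easy to exercise directly with the stdlib sqlite3 module (WAL requires a file-backed database, hence the temp file; this is a sketch, not the DAO code):

```python
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "wal_demo.db")

conn = sqlite3.connect(db_path)
# journal_mode=WAL is persistent: it is recorded in the database file itself.
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
conn.execute("PRAGMA synchronous=NORMAL;")
# foreign_keys is per-connection and must be enabled on every new connection.
conn.execute("PRAGMA foreign_keys=ON;")
fk_on = conn.execute("PRAGMA foreign_keys;").fetchone()[0]
conn.close()
```

Because foreign_keys is per-connection, the event-listener approach in the diff (re-running the pragmas on every "connect") is the right shape.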

117-118: Use stricter SQLite detection

Minor: avoid accidental matches on non-SQLite URLs by using startswith("sqlite").

-    if "sqlite" in url_str:
+    if url_str.startswith("sqlite"):
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)

332-351: Duplicate writes and TODO: clarify after-file creation path.

  • Comment “FIXME: Something is fishy here...” remains unresolved; the code always inserts a new “after” file and then queries for an existing one. This can create duplicates under concurrency.
  • session.add(new_file) is called twice later in accept_file (see below).

Consider an upsert-like path: query latest by (client_id, uri, content) and reuse if identical; only insert when content differs. Also, add a unique (client_id, uri, created_at) or application-level de-dup.


847-855: Redundant add().

session.add(new_file) is executed twice (Lines 847 and 853). Keep one.

             session.add(new_file)
             for solution, old_file in files_to_update:
                 # Remove the old file from the solution
                 solution.after.remove(old_file)
                 solution.after.add(new_file)

-                session.add(new_file)
                 session.add(solution)
kai_mcp_solution_server/tests/test_multiple_integration.py (2)

487-494: Remove or use poll_process().

poll_process is defined but never called. Either delete it or call it inside your readiness wait to surface early-exit errors promptly.


634-646: Thread-per-client is heavy but OK for stress; consider async tasks later.

For large NUM_TASKS, a single event loop with asyncio.gather would lower overhead. Not blocking this PR.
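The single-event-loop approach suggested above can be sketched with asyncio.gather; client_task is a hypothetical stand-in for one client's MCP create/list/accept/hint calls:

```python
import asyncio

async def client_task(client_id: int) -> str:
    # Stand-in for one client's create/list/accept/hint flow.
    await asyncio.sleep(0.01)
    return f"client-{client_id}: ok"

async def run_stress(num_clients: int) -> list[str]:
    # One event loop schedules every client concurrently; far lighter than
    # spawning a thread (each with its own loop) per client.
    return await asyncio.gather(*(client_task(i) for i in range(num_clients)))

results = asyncio.run(run_stress(30))
```

gather preserves argument order in its result list, so per-client results can still be correlated with client IDs after the run.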

kai_mcp_solution_server/scripts/manage_db.py (4)

21-41: Ensure engine disposal on error.

Wrap engine usage in try/finally so the pool is disposed even if drop_everything raises.

-    engine = await get_async_engine(settings.db_dsn)
-
-    async with engine.begin() as conn:
-        print("Dropping all tables...")
-        await conn.run_sync(drop_everything)
-        print("✅ All tables dropped successfully.")
-
-    await engine.dispose()
+    engine = await get_async_engine(settings.db_dsn)
+    try:
+        async with engine.begin() as conn:
+            print("Dropping all tables...")
+            await conn.run_sync(drop_everything)
+            print("✅ All tables dropped successfully.")
+    finally:
+        await engine.dispose()

55-65: Same here: dispose on all paths.

-    engine = await get_async_engine(settings.db_dsn)
-
-    async with engine.begin() as conn:
-        print("Dropping all tables...")
-        await conn.run_sync(drop_everything)
-        print("Creating tables...")
-        await conn.run_sync(Base.metadata.create_all)
-        print("✅ Database reset successfully.")
-
-    await engine.dispose()
+    engine = await get_async_engine(settings.db_dsn)
+    try:
+        async with engine.begin() as conn:
+            print("Dropping all tables...")
+            await conn.run_sync(drop_everything)
+            print("Creating tables...")
+            await conn.run_sync(Base.metadata.create_all)
+            print("✅ Database reset successfully.")
+    finally:
+        await engine.dispose()

103-109: Tighten error message: LLM params aren’t required for this script.

Mention only KAI_DB_DSN to avoid confusing users.

-        print("\nMake sure to set required environment variables:")
-        print("  - KAI_DB_DSN: Database connection string")
-        print('  - KAI_LLM_PARAMS: LLM configuration (can be \'{"model": "fake"}\')')
+        print("\nMake sure to set required environment variables:")
+        print("  - KAI_DB_DSN: Database connection string")

1-1: Shebang present but file may not be executable.

Either make the script executable in git (chmod +x) or drop the shebang.

kai_mcp_solution_server/Makefile (5)

122-129: Prefer “podman compose up” over “podman-compose” for wider compatibility

This avoids dependency on the separate python podman-compose shim.

Apply this diff:

-	IMAGE=$(IMAGE) KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' MOUNT_PATH='$(MOUNT_PATH)' \
-		podman-compose up
+	IMAGE=$(IMAGE) KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' MOUNT_PATH='$(MOUNT_PATH)' \
+		podman compose up

Please confirm CI runners support the subcommand.


131-141: Self-contained stress test: verify whether LLM params are required; add guard if needed

If the test server path still requires KAI_LLM_PARAMS, this may fail/flap.

Optionally add a guard like run-local:

 test-stress:
 	@echo "Running self-contained stress test with $${NUM_CONCURRENT_CLIENTS:-30} concurrent clients..."
 	@echo "Database: $${KAI_DB_DSN:-sqlite+aiosqlite:///test.db}"
 	@echo "The test will start its own server on port 8087"
+	@if [ -z "$${KAI_LLM_PARAMS}" ]; then echo "Error: KAI_LLM_PARAMS is required for the test server"; exit 1; fi
 	NUM_CONCURRENT_CLIENTS=$${NUM_CONCURRENT_CLIENTS:-30} \
 	KAI_DB_DSN=$${KAI_DB_DSN:-sqlite+aiosqlite:///test.db} \
 	uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs

181-205: run-with-tests may fail without KAI_LLM_PARAMS; also pass it into the server container

Align with run-local’s requirement.

Apply this diff:

 run-with-tests: build
 	@echo "Creating Podman pod for MCP server and test client..."
 	-podman pod rm -f kai-mcp-pod 2>/dev/null || true
 	podman pod create --name kai-mcp-pod -p 8000:8000
@@
 	@echo "Starting MCP server in the pod..."
-	podman run -d --pod kai-mcp-pod \
+	@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
+	podman run -d --pod kai-mcp-pod \
 		--name kai-mcp-server \
 		--volume kai-mcp-data:/data:Z \
+		-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
+		-e KAI_DB_DSN='$(KAI_DB_DSN)' \
 		$(IMAGE)

212-240: Help mentions ‘logs’ target, but no implementation exists

Either add a logs target or remove the line from help to avoid confusion.

Proposed new target (outside changed lines):

# Show Kubernetes deployment logs
.PHONY: logs
logs:
	@echo "Showing logs for deployment in namespace $(NAMESPACE)..."
	kubectl logs -n $(NAMESPACE) deploy/kai-mcp-solution-server -f

243-246: Fix NAMESPACE default in help to match actual default

Help says “konveyor-kai” but Makefile sets “konveyor-kai-solution-server”.

Apply this diff:

-	@echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai)"
+	@echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai-solution-server)"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c85c725 and 02641db.

📒 Files selected for processing (8)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • .gitignore (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/scripts/manage_db.py (1 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (10 hunks)
  • kai_mcp_solution_server/tests/mcp_client.py (1 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (1)
kai_mcp_solution_server/scripts/manage_db.py (1)
  • drop_all_tables (21-40)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)
  • Base (91-97)
  • get_async_engine (113-142)
kai_mcp_solution_server/src/kai_mcp_solution_server/constants.py (1)
  • log (9-10)
kai_mcp_solution_server/tests/test_multiple_integration.py (6)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
  • CallToolResult (22-24)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (663-665)
  • SuccessRateMetric (721-727)
kai_mcp_solution_server/tests/mcp_client.py (1)
  • MCPClientArgs (247-256)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (31-41)
  • create_client (136-151)
kai/jsonrpc/util.py (1)
  • model_dump (96-97)
kai/jsonrpc/streams.py (1)
  • close (70-75)
kai_mcp_solution_server/scripts/manage_db.py (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (4)
  • Base (91-97)
  • drop_everything (53-88)
  • get_async_engine (113-142)
  • drop_all_tables (100-104)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (1)
  • SolutionServerSettings (39-108)
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/test_multiple_integration.py

[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.


[notice] 608-608: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 608-608: bandit(B603)
[new] subprocess call - check for execution of untrusted input.

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

479-479: Use of assert detected

(S101)


482-482: Do not catch blind exception: Exception

(BLE001)


491-493: Avoid specifying long messages outside the exception class

(TRY003)


609-617: Starting a process with a partial executable path

(S607)


658-658: Do not catch blind exception: Exception

(BLE001)

kai_mcp_solution_server/scripts/manage_db.py

1-1: Shebang is present but file is not executable

(EXE001)


103-103: Do not catch blind exception: Exception

(BLE001)


122-122: Do not catch blind exception: Exception

(BLE001)

🪛 GitHub Actions: Test Code on PR
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 1-1: test_multiple_users failed: 30 concurrent clients failed (Event loop is closed; all connection attempts failed).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Stress Test with PostgreSQL
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (10)
.github/workflows/stress-test-mcp-server.yml (2)

72-79: Post-test DB utilization check is a nice touch

Querying pg_stat_activity post-run helps validate connection pooling assumptions. LGTM.


107-114: SQLite job concurrency cap is appropriate

Capping to 5 clients acknowledges SQLite write-lock limits. LGTM.

kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)

100-112: Good separation of schema management from engine creation

Introducing drop_all_tables and ensure_tables_exist clarifies lifecycle and avoids accidental drops during engine init. LGTM.


128-141: Pool sizing aligns with the 100-client stress target; confirm PG max_connections

pool_size=20 with max_overflow=80 caps at 100 connections, matching the test load. Ensure the Postgres service’s max_connections accommodates this plus overhead (psql, autovacuum). The workflow’s post-check helps track this.

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)

116-124: Good: engine bootstrap is now safe and idempotent.

Creating the engine via get_async_engine and ensuring tables exist inside engine.begin() is correct and removes startup races around schema readiness.


164-169: Proper engine disposal on shutdown.

Disposing the engine in the lifespan finally block prevents pool leaks across reconnects.

kai_mcp_solution_server/Makefile (4)

89-100: run-podman target looks good

Flexible DSN with optional PODMAN_ARGS and required KAI_LLM_PARAMS check is solid.


146-153: test-stress-external target looks good

Clear UX and useful defaults.


228-233: Help additions for new targets look good

New entries are clear and consistent.


262-267: Example usage LGTM

Good guidance for local and external stress runs.

Comment on lines 64 to 72
- name: Run stress test with 100 clients
env:
KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
run: |
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
timeout-minutes: 10

🛠️ Refactor suggestion

Coalesce workflow input using the inputs context, not github.event.inputs

github.event.inputs is only defined for workflow_dispatch payloads and is null for PR/push events. Use inputs.num_clients which safely evaluates to null outside workflow_dispatch and works with || coalescing.

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Run stress test with 100 clients
env:
KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
run: |
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
timeout-minutes: 10
- name: Run stress test with 100 clients
env:
KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
run: |
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
timeout-minutes: 10
🧰 Tools
🪛 Checkov (3.2.334)

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 64 to 72, the
workflow uses github.event.inputs.num_clients which is only set for
workflow_dispatch events; replace it with the inputs context so the coalescing
works for non-dispatch events. Update the NUM_CONCURRENT_CLIENTS assignment to
use ${{ inputs.num_clients || '100' }} (keeping the default '100') and ensure
the rest of the step remains unchanged so the workflow uses the inputs context
safely across event types.

Comment on lines 101 to 121
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
@echo "Running MCP solution server with SQLite..."
@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
podman run --rm -it -p 8000:8000 \
-e MOUNT_PATH='$(MOUNT_PATH)' \
-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
-v kai-sqlite-data:/data:Z \
--name kai-mcp-sqlite $(IMAGE)


⚠️ Potential issue

Avoid leaking API keys via process args; pass-through host env instead of inlining values

Using -e KEY='value' can expose secrets via ps/history. Prefer -e KEY to pass through from host.

Apply this diff:

-		$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
-		$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
-		$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
-		$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
-		$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
-		$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
-		$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
-		$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
-		$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+		$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+		$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+		$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+		$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+		$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+		$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+		$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+		$(if $(AWS_REGION),-e AWS_REGION,) \
+		$(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
📝 Committable suggestion


Suggested change
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
@echo "Running MCP solution server with SQLite..."
@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
podman run --rm -it -p 8000:8000 \
-e MOUNT_PATH='$(MOUNT_PATH)' \
-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
-v kai-sqlite-data:/data:Z \
--name kai-mcp-sqlite $(IMAGE)
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
@echo "Running MCP solution server with SQLite..."
@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
podman run --rm -it -p 8000:8000 \
-e MOUNT_PATH='$(MOUNT_PATH)' \
-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
$(if $(AWS_REGION),-e AWS_REGION,) \
$(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
-v kai-sqlite-data:/data:Z \
--name kai-mcp-sqlite $(IMAGE)

Comment on lines +260 to 264
# Lock removed - database handles concurrent access
for extended_incident in extended_incidents:
incident_id = await create_incident(kai_ctx, client_id, extended_incident)
results.append(CreateIncidentResult(incident_id=incident_id, solution_id=0))


🛠️ Refactor suggestion

Race still likely: unthrottled LLM calls spawned in background.

Removing the app-level lock is fine for DB ops, but outbound kai_ctx.model.ainvoke(...) calls (e.g., in generate_hint_v1/v2/v3) can now stampede under concurrent accept_file paths (asyncio.create_task(...)). This commonly manifests as httpcore/httpx TLS/connection errors under load, matching Issue #861. Add a small per-process semaphore and explicit timeouts/retries to bound concurrency and avoid connection storms.

Apply this diff to introduce an LLM concurrency gate and timeouts:

 class KaiSolutionServerContext:
     def __init__(self, settings: SolutionServerSettings) -> None:
         self.settings = settings
         self.lock = asyncio.Lock()

     async def create(self) -> None:
         from kai_mcp_solution_server.db.dao import Base

         self.engine = await get_async_engine(self.settings.db_dsn)

         # Ensure tables exist (safe - only creates if not already there)
         async with self.engine.begin() as conn:
             await conn.run_sync(Base.metadata.create_all)

         self.session_maker = async_sessionmaker(
             bind=self.engine, expire_on_commit=False
         )
         self.model: BaseChatModel
+        # Limit outbound LLM concurrency to mitigate connection spikes
+        self.llm_semaphore = asyncio.Semaphore(
+            int(os.getenv("KAI_LLM_MAX_CONCURRENCY", "8"))
+        )

         if self.settings.llm_params is None:
             raise ValueError("LLM parameters must be provided in the settings.")
         elif self.settings.llm_params.get("model") == "fake":

And wrap LLM calls with the gate and a timeout (repeat similarly in v2/v3):

-            response = await kai_ctx.model.ainvoke(prompt)
+            async with kai_ctx.llm_semaphore:
+                response = await asyncio.wait_for(
+                    kai_ctx.model.ainvoke(prompt),
+                    timeout=float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+                )

Optionally harden initialization with retries/timeouts:

-            self.model = init_chat_model(**self.settings.llm_params)
+            params = {
+                "max_retries": int(os.getenv("KAI_LLM_MAX_RETRIES", "2")),
+                "timeout": float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+                **self.settings.llm_params,
+            }
+            self.model = init_chat_model(**params)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In kai_mcp_solution_server/src/kai_mcp_solution_server/server.py around lines
260-264, the app removed the lock but still spawns unthrottled background LLM
calls (kai_ctx.model.ainvoke) which can stampede and cause httpx/httpcore
TLS/connection errors; introduce a module-level asyncio.Semaphore (e.g.,
max_concurrent_llm = 5) and ensure all LLM invocations (generate_hint_v1/v2/v3
and any other ainvoke paths) acquire the semaphore before calling and release it
after, wrapping the actual ai call in asyncio.wait_for to enforce a per-call
timeout and add a simple retry loop with exponential backoff for transient
failures; also initialize the semaphore once at startup and consider applying
the same timeout/retry pattern to any LLM client initialization to harden
startup.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kai_mcp_solution_server/tests/mcp_loader_script.py (1)

47-55: Validate port and normalize mount_path to avoid malformed URLs.

If port or mount_path is None, server_url becomes http://host:NoneNone. Guard and normalize.

Apply this diff:

   if args.host is None:
       raise ValueError("Host must be specified for HTTP transport")

   # Setup HTTP transport using streamable-http
   transport = ""
   if not args.host.startswith("http"):
       transport = "http://"
-  server_url = f"{transport}{args.host}:{args.port}{args.mount_path}"
+  if args.port is None:
+      raise ValueError("Port must be specified for HTTP transport")
+  mount_path = args.mount_path or ""
+  if mount_path and not mount_path.startswith("/"):
+      mount_path = "/" + mount_path
+  server_url = f"{transport}{args.host}:{args.port}{mount_path}"
   print(f"Connecting to server at {server_url}...")
♻️ Duplicate comments (4)
.github/workflows/stress-test-mcp-server.yml (1)

69-69: Coalesce workflow input via inputs context, not github.event.inputs.

github.event.inputs is null outside workflow_dispatch; use inputs.num_clients so the default coalesces reliably across events.

Apply this diff:

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
kai_mcp_solution_server/Makefile (1)

110-118: Avoid leaking secrets in podman-sqlite env flags

Inline values in -e KEY='value' are visible via ps/history. Pass-through host env with -e KEY instead.

-        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
-        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
-        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
-        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
-        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
-        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
-        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
-        $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
-        $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+        $(if $(AWS_REGION),-e AWS_REGION,) \
+        $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
kai_mcp_solution_server/tests/test_multiple_integration.py (2)

477-486: Remove runtime assert; guard stdout close; avoid blind except

Prevents crashes and aligns with prior review.

-        def stream_output(process: subprocess.Popen) -> None:
-            try:
-                assert process.stdout is not None
-                for line in iter(process.stdout.readline, b""):
-                    print(f"[Server] {line.decode().rstrip()}")
-            except Exception as e:
-                print(f"Error while streaming output: {e}")
-            finally:
-                process.stdout.close()
+        def stream_output(process: subprocess.Popen) -> None:
+            try:
+                stdout = process.stdout
+                if stdout is None:
+                    return
+                for line in iter(stdout.readline, b""):
+                    print(f"[Server] {line.decode(errors='replace').rstrip()}")
+            except BaseException as e:
+                print(f"Error while streaming output: {e}")
+            finally:
+                try:
+                    if process.stdout:
+                        process.stdout.close()
+                except Exception:
+                    pass

693-702: Use sys.executable instead of bare "python"

More robust and fixes Bandit B607/B603 warnings.

-                self.http_server_process = subprocess.Popen(
+                self.http_server_process = subprocess.Popen(
                     [
-                        "python",
+                        sys.executable,
                         "-m",
                         "kai_mcp_solution_server",
                         "--transport",
                         "streamable-http",
                         "--port",
                         "8087",
                     ],

Add import:

import sys
🧹 Nitpick comments (14)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)

9-9: Remove unused import (ruff F401).

timedelta is imported but never used.

Apply this diff:

-from datetime import timedelta

63-69: Avoid unused local; don’t bind get_session_id if not used.

Prevents potential linter warnings and clarifies intent.

Apply this diff:

-            read, write, get_session_id = await stack.enter_async_context(
+            read, write, _ = await stack.enter_async_context(
                 streamablehttp_client(server_url)
             )
kai_mcp_solution_server/tests/ssl_utils.py (3)

36-49: Make monkey-patching reentrant-safe across concurrent contexts.

Current design patches/unpatches global httpx methods per-context without reference counting. Overlapping contexts (e.g., parallel tests/clients) can restore while others still rely on the patch. Add a module-level refcount + lock so only the last exiting context restores.

Here’s a minimal pattern to adopt (outside the selected range):

# Module-level
import threading
_PATCH_LOCK = threading.Lock()
_PATCH_COUNT = 0

def _acquire_patch() -> bool:
    global _PATCH_COUNT
    with _PATCH_LOCK:
        if _PATCH_COUNT == 0:
            return True  # caller should apply patches
        _PATCH_COUNT += 1
        return False

def _release_patch() -> bool:
    global _PATCH_COUNT
    with _PATCH_LOCK:
        _PATCH_COUNT -= 1
        return _PATCH_COUNT == 0  # caller should restore

Then in SSLMonkeyPatch:

  • In __enter__/apply_ssl_bypass(): call _acquire_patch(), apply patches only when it returns True.
  • In restore_ssl_settings(): call _release_patch(), restore only when it returns True.

Also applies to: 137-155


61-63: Over-broad warning suppression.

warnings.filterwarnings("ignore", category=Warning) silences all warnings. Restrict to the specific SSL warning message or a specific category.

Apply this diff:

-        warnings.filterwarnings("ignore", category=Warning)
+        # Keep the specific message filter; avoid blanket suppression

123-128: Clean up unused SSL environment variables
HTTPX_SSL_VERIFY, HTTPX_NO_VERIFY, and SSL_CERT_VERIFY aren’t referenced elsewhere; retain only the standard PYTHONHTTPSVERIFY.

-        ssl_env_vars = {
-            "SSL_CERT_VERIFY": "false",
-            "HTTPX_SSL_VERIFY": "false",
-            "HTTPX_NO_VERIFY": "true",
-            "PYTHONHTTPSVERIFY": "0",
-        }
+        ssl_env_vars = {
+            "PYTHONHTTPSVERIFY": "0",
+        }
.github/workflows/stress-test-mcp-server.yml (2)

74-74: Trim trailing whitespace (yamllint).

These lines have trailing spaces; remove to appease linters and keep diffs clean.

Also applies to: 90-90, 96-96, 100-100


66-69: Optional: avoid hardcoding DB credentials in DSN to satisfy secret scanners.

Even if they’re test-only, Checkov flags inline creds. Consider building KAI_DB_DSN from job env vars.

Apply this minimal change (job-level env at the top of the job), then reference them in DSN:

   stress-test-postgres:
     name: Stress Test with PostgreSQL
     runs-on: ubuntu-latest
+    env:
+      PGUSER: kai_user
+      PGPASSWORD: kai_password
+      PGDATABASE: kai_test_db
...
         env:
-          KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
+          KAI_DB_DSN: "postgresql+asyncpg://${{ env.PGUSER }}:${{ env.PGPASSWORD }}@localhost:5432/${{ env.PGDATABASE }}"

Optionally add an allowlist annotation if your policy permits.

kai_mcp_solution_server/Makefile (3)

96-99: Protect JSON in KAI_LLM_PARAMS

Quoting JSON inline can break with embedded quotes/newlines. Prefer an env file or base64.

  • Option A: --env-file .env (store KAI_LLM_PARAMS there)
  • Option B (inline-safe): -e KAI_LLM_PARAMS_B64="$$(printf %s "$$KAI_LLM_PARAMS" | base64 -w0)" and decode in entrypoint.

243-243: Fix help text: NAMESPACE default mismatch

Help says konveyor-kai but variable default is konveyor-kai-solution-server.

-   @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai)"
+   @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai-solution-server)"

224-224: Help lists 'logs' target that doesn't exist

Either add a logs target or remove this line from help.

Example target:

+# Show container logs
+.PHONY: logs
+logs:
+	@echo "Showing logs for deployment in namespace $(NAMESPACE)..."
+	kubectl logs -n $(NAMESPACE) -l app=kai-mcp-solution-server --all-containers=true --tail=200
kai_mcp_solution_server/tests/test_multiple_integration.py (4)

748-756: Shorten post-run sleeps to reduce flakiness

Ten-second sleep is brittle; prefer awaiting the last client futures before proceeding, or add a small bounded wait (1–2s) if needed for logs.

-            await asyncio.sleep(10)  # wait a moment for all output to be printed
+            await asyncio.sleep(2)  # brief pause for buffered output

792-800: Fail fast if no hints; add context to failure

Good check. Consider logging the server mode and NUM_TASKS for easier triage.

-                    self.fail(
-                        "No hints were generated! Hint generation may not be working correctly."
-                    )
+                    mode = "external" if not start_own_server else "self-contained"
+                    self.fail(f"No hints generated (mode={mode}, clients={NUM_TASKS}). Hint generation may not be working.")

36-43: Cross-platform DB path safety

isoformat() includes : which is invalid on Windows filenames. Use a safe format.

-        timestamp = datetime.datetime.now().isoformat()
+        timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")

723-747: Collect root causes from client failures

Wrap exceptions with context (client_id, mode, transport) to speed up debugging.

-                    except Exception as exc:
-                        error_msg = f"Task {task_id} failed: {exc}"
+                    except Exception as exc:
+                        error_msg = f"Task {task_id} failed (mode={'external' if not start_own_server else 'self-contained'}): {exc}"
                         print(f"[Main] {error_msg}")
                         errors.append(error_msg)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 02641db and 17e6fd6.

📒 Files selected for processing (5)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/ssl_utils.py (2 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (7)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
  • CallToolResult (22-24)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (663-665)
  • SuccessRateMetric (721-727)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (32-42)
  • create_client (137-152)
kai_mcp_solution_server/tests/mcp_client.py (1)
  • MCPClientArgs (247-256)
kai_mcp_solution_server/tests/ssl_utils.py (2)
  • apply_ssl_bypass (49-81)
  • apply_ssl_bypass (168-192)
kai/jsonrpc/util.py (1)
  • model_dump (96-97)
kai/jsonrpc/streams.py (1)
  • close (70-75)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/mcp_loader_script.py

[failure] 9-9: ruff(F401)
[new] datetime.timedelta imported but unused

kai_mcp_solution_server/tests/test_multiple_integration.py

[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.


[notice] 693-693: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 693-693: bandit(B603)
[new] subprocess call - check for execution of untrusted input.

🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 YAMLlint (1.37.1)
.github/workflows/stress-test-mcp-server.yml

[error] 74-74: trailing spaces

(trailing-spaces)


[error] 90-90: trailing spaces

(trailing-spaces)


[error] 96-96: trailing spaces

(trailing-spaces)


[error] 100-100: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

479-479: Use of assert detected

(S101)


482-482: Do not catch blind exception: Exception

(BLE001)


491-493: Avoid specifying long messages outside the exception class

(TRY003)


547-547: Use of assert detected

(S101)


573-573: Use of assert detected

(S101)


597-597: Use of assert detected

(S101)


608-608: Use of assert detected

(S101)


611-611: Use of assert detected

(S101)


652-652: Use of assert detected

(S101)


664-664: Use of assert detected

(S101)


667-667: Use of assert detected

(S101)


674-674: Use of assert detected

(S101)


679-679: Use of assert detected

(S101)


694-702: Starting a process with a partial executable path

(S607)


743-743: Do not catch blind exception: Exception

(BLE001)


789-789: Do not catch blind exception: Exception

(BLE001)

🪛 GitHub Actions: Test Code on PR
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 752-752: TestMultipleIntegration.test_multiple_users failed: 30 clients failed due to event loop errors (Event loop is closed, All connection attempts failed).


[warning] 1-1: RuntimeWarning: coroutine 'TestMultipleIntegration.test_multiple_users..client_task' was never awaited.

🪛 GitHub Actions: MCP Server Stress Test
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 1-1: Test step failed: pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users - All connection attempts failed for 5 concurrent clients against the self-contained MCP server.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_mcp_solution_server/tests/ssl_utils.py (1)

88-93: Good idempotence guard on httpx patching.

The _ssl_bypass_patched sentinel effectively avoids double-patching.

Comment on lines 439 to 456
if external_server_url:
# External server mode - parse URL to get host and port
from urllib.parse import urlparse

parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
transport="http",
host=host,
port=port,
insecure=True,
server_path=None, # Not needed for external server
)
# Don't set KAI_LLM_PARAMS for external server - it should already be configured
start_own_server = False
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Honor URL scheme in external mode

Currently the client always uses http://{host}:{port}. If MCP_SERVER_URL is https://..., this will fail with connection errors.

-            parsed = urlparse(external_server_url)
+            parsed = urlparse(external_server_url)
             host = parsed.hostname or "localhost"
             port = parsed.port or 8000
+            scheme = (parsed.scheme or "http").lower()
             print(f"Using external MCP server at {host}:{port}")
@@
-            multiple_user_mcp_args = MCPClientArgs(
+            multiple_user_mcp_args = MCPClientArgs(
                 transport="http",
                 host=host,
                 port=port,
                 insecure=True,
                 server_path=None,  # Not needed for external server
             )

And use scheme when constructing the transport (see suggestion at Lines 509-515).

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +509 to +515
print(f"[Client {client_id}] starting")
apply_ssl_bypass()

client = Client(
transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
)

Contributor

🛠️ Refactor suggestion

Scope the SSL bypass and only when needed

Global monkeypatching per thread is racy and unnecessary for HTTP. Use the context manager and gate on scheme.

-        async def client_task(client_id: str) -> None:
+        async def client_task(client_id: str) -> None:
             print(f"[Client {client_id}] starting")
-            apply_ssl_bypass()
-
-            client = Client(
-                transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-            )
+            scheme = (multiple_user_mcp_args.transport or "http").lower()
+            transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+            cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+            with cm:
+                client = Client(transport=transport)

Add at top-level imports:

import contextlib
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
509-515, avoid globally monkeypatching SSL per thread by scoping the bypass: add
"import contextlib" at the top, remove the standalone apply_ssl_bypass() call
and instead wrap the client creation in a context manager that only applies the
bypass when the transport scheme is HTTPS, e.g. compute the transport URL
string, then use "with (apply_ssl_bypass() if transport_url.startswith('https')
else contextlib.nullcontext()):" and create the Client inside that block so the
SSL bypass is temporary and only used when needed.

Comment on lines 713 to 714
await asyncio.sleep(1) # give the server a second to start
else:
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Replace fixed sleep with active readiness probe

Prevents clients racing the server socket bind (root cause of intermittent connect failures).

-                await asyncio.sleep(1)  # give the server a second to start
+                await wait_for_http_server("localhost", 8087, timeout=30)

Place this helper near other helpers in the module (outside the selected range):

async def wait_for_http_server(host: str, port: int, timeout: float = 30.0) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_err: Exception | None = None
    while loop.time() < deadline:
        try:
            reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except Exception as e:
            last_err = e
            await asyncio.sleep(0.2)
    raise TimeoutError(f"Server at {host}:{port} not ready after {timeout}s: {last_err}")
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
713-714, replace the fixed await asyncio.sleep(1) used to wait for the server
with an active readiness probe: add the provided async helper function
wait_for_http_server(host, port, timeout=30.0) near the other test helpers in
the module (outside the selected range), and call await
wait_for_http_server(server_host, server_port, timeout=30.0) instead of the
fixed sleep so the test actively retries connecting until the server socket is
ready and avoids race conditions.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
kai_mcp_solution_server/Makefile (1)

110-118: Avoid leaking secrets via inlined env values in podman-sqlite

Passing secrets as -e KEY='$(VALUE)' exposes them via ps/history. Pass-through host env instead.

Apply this diff:

-        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
-        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
-        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
-        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
-        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
-        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
-        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
-        $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
-        $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+        $(if $(AWS_REGION),-e AWS_REGION,) \
+        $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
.github/workflows/stress-test-mcp-server.yml (1)

69-69: Use inputs context for num_clients (works across event types)

github.event.inputs is null except for workflow_dispatch. Use inputs.num_clients.

Apply this diff:

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
🧹 Nitpick comments (7)
kai_mcp_solution_server/Makefile (2)

89-100: Consider passing provider creds in run-podman, too

External LLMs won’t work unless creds are provided. Mirror the pass-through flags here or document using PODMAN_ARGS.


239-239: Help text default for NAMESPACE doesn’t match actual default

Help says default is “konveyor-kai” but Makefile sets “konveyor-kai-solution-server”.

Apply this diff:

-    @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai)"
+    @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai-solution-server)"

Also applies to: 5-5

kai_mcp_solution_server/tests/test_multiple_integration.py (5)

460-472: Use asyncio.run per thread to avoid loop lifecycle bugs

Simpler and avoids “event loop is closed” under stress.

Apply this diff:

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))
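For reference, the asyncio.run pattern composes cleanly with a thread pool; a minimal sketch (not the test's actual workload):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def work(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async client calls
    return n * 2


def run_async_in_thread(fn, *args, **kwargs):
    # asyncio.run creates and tears down a fresh event loop per call,
    # so each worker thread gets an isolated loop lifecycle.
    return asyncio.run(fn(*args, **kwargs))


with ThreadPoolExecutor(max_workers=4) as pool:
    results = sorted(pool.map(lambda n: run_async_in_thread(work, n), range(5)))
```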

700-702: Apply SSL bypass for hint check when using HTTPS

Mirror the scoped bypass used for client tasks so HTTPS external servers pass.

Apply this diff:

-        async with create_client(multiple_user_mcp_args) as hint_session:
+        cm = apply_ssl_bypass() if multiple_user_mcp_args.transport == "https" else contextlib.nullcontext()
+        with cm:
+            async with create_client(multiple_user_mcp_args) as hint_session:
+                await hint_session.initialize()
-            await hint_session.initialize()

512-647: Prefer unittest assertions over bare assert for clearer failures

Optional, but aligns with unittest style and avoids S101 lints. Example:

Apply this pattern:

-assert incident_id > 0, f"Invalid incident ID: {incident_id}"
+self.assertGreater(incident_id, 0, f"Invalid incident ID: {incident_id}")

Repeat similarly for other asserts in this block.


555-555: Remove redundant local import

import json is already at module top.

Apply this diff:

-                    import json

684-697: Replace fixed sleeps with active polling

Use a readiness/poll helper instead of arbitrary sleeps to cut flakiness/time.

I can provide a small wait_for_condition or HTTP probe helper if you want it integrated.
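Such a helper could look like this (a minimal sketch; wait_for_condition is a hypothetical name, not an existing utility in the test module):

```python
import asyncio


async def wait_for_condition(check, timeout: float = 10.0, interval: float = 0.2) -> None:
    # Poll an async predicate until it returns True, failing loudly on timeout
    # instead of sleeping a fixed amount and hoping the server is ready.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await check():
            return
        await asyncio.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")


async def demo() -> int:
    calls = 0

    async def ready() -> bool:
        nonlocal calls
        calls += 1
        return calls >= 3  # becomes true on the third poll

    await wait_for_condition(ready, timeout=5.0, interval=0.01)
    return calls


polls = asyncio.run(demo())
```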

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 17e6fd6 and 4ad87a5.

📒 Files selected for processing (4)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • kai_mcp_solution_server/tests/mcp_loader_script.py
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

512-512: Use of assert detected

(S101)


538-538: Use of assert detected

(S101)


562-562: Use of assert detected

(S101)


573-573: Use of assert detected

(S101)


576-576: Use of assert detected

(S101)


617-617: Use of assert detected

(S101)


629-629: Use of assert detected

(S101)


632-632: Use of assert detected

(S101)


639-639: Use of assert detected

(S101)


644-644: Use of assert detected

(S101)


679-679: Do not catch blind exception: Exception

(BLE001)


725-725: Do not catch blind exception: Exception

(BLE001)

🪛 GitHub Actions: MCP Server Stress Test
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 688-688: Test encountered parsing error: incident_id could not be parsed due to an error response from create_incident (invalid literal for int() with base 10: 'Error calling tool 'create_incident': ...'). Underlying cause: database integrity violation causing the tool call to fail.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)

Comment on lines +71 to +101
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!
# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
if curl -s http://localhost:8000/ > /dev/null 2>&1; then
echo "Server is ready!"
break
fi
if [ $i -eq 30 ]; then
echo "Server failed to start in 30 seconds"
kill $SERVER_PID || true
exit 1
fi
echo -n "."
sleep 1
done
# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?
# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
exit $TEST_RESULT
Contributor

🛠️ Refactor suggestion

Harden server lifecycle: fail-fast, trap, and wait on exit

Prevents orphaned background server and ensures proper failure propagation.

Apply this diff:

-        run: |
-          echo "Starting MCP server connected to PostgreSQL..."
-          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
-          SERVER_PID=$!
+        run: |
+          set -euo pipefail
+          echo "Starting MCP server connected to PostgreSQL..."
+          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
+          SERVER_PID=$!
+          trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
@@
-          kill $SERVER_PID || true
-
-          exit $TEST_RESULT
+          kill $SERVER_PID || true
+          wait $SERVER_PID || true
+          exit $TEST_RESULT
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!
# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
if curl -s http://localhost:8000/ > /dev/null 2>&1; then
echo "Server is ready!"
break
fi
if [ $i -eq 30 ]; then
echo "Server failed to start in 30 seconds"
kill $SERVER_PID || true
exit 1
fi
echo -n "."
sleep 1
done
# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?
# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
exit $TEST_RESULT
run: |
set -euo pipefail
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!
trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
if curl -s http://localhost:8000/ > /dev/null 2>&1; then
echo "Server is ready!"
break
fi
if [ $i -eq 30 ]; then
echo "Server failed to start in 30 seconds"
kill $SERVER_PID || true
exit 1
fi
echo -n "."
sleep 1
done
# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?
# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
wait $SERVER_PID || true
exit $TEST_RESULT
🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 71 to 101, the
workflow starts the MCP server in the background but doesn't fail-fast, trap
signals, or wait for the server process, which can leave orphaned processes or
hide failures; modify the script to: enable strict failure handling (e.g., set
-euo pipefail), register a trap on EXIT/INT/TERM that kills the background
server PID if still running, after launching the server save SERVER_PID and on
startup failure kill that PID and exit non‑zero, and when running tests use wait
$SERVER_PID to propagate the server's exit status (or capture and return
TEST_RESULT appropriately) so the job fails fast and cleans up the server in all
termination cases.

Comment on lines +104 to +109
- name: Check PostgreSQL connection count
if: always()
run: |
PGPASSWORD=kai_password psql -h localhost -U kai_user -d kai_test_db -c \
"SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
Contributor

🛠️ Refactor suggestion

Install psql before querying pg_stat_activity

Ubuntu runners may lack psql; this step can fail the job.

Add this step before “Check PostgreSQL connection count”:

      - name: Install PostgreSQL client
        run: |
          sudo apt-get update
          sudo apt-get install -y postgresql-client
🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 104 to 109, the
workflow runs psql directly which can fail on Ubuntu runners that don't have the
PostgreSQL client installed; add a step immediately before the "Check PostgreSQL
connection count" step that runs "sudo apt-get update" and then "sudo apt-get
install -y postgresql-client" to ensure the psql binary is available for the
subsequent query.

Comment on lines +446 to +457
parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
transport="http",
host=host,
port=port,
insecure=True,
server_path=None, # Not needed for external server
)
Contributor

🛠️ Refactor suggestion

Honor MCP_SERVER_URL scheme and scope SSL bypass; prevent TLS flakiness

Use scheme from MCP_SERVER_URL and apply SSL bypass only for HTTPS.

Apply this diff:

 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
@@
-        multiple_user_mcp_args = MCPClientArgs(
-            transport="http",
+        multiple_user_mcp_args = MCPClientArgs(
+            transport=scheme,
             host=host,
             port=port,
             insecure=True,
             server_path=None,  # Not needed for external server
         )
@@
-        async def client_task(client_id: str) -> None:
+        async def client_task(client_id: int) -> None:
             print(f"[Client {client_id}] starting")
-            apply_ssl_bypass()
-
-            client = Client(
-                transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-            )
+            transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+            cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+            with cm:
+                client = Client(transport=transport)

Add import at top (near other imports):

import contextlib

Also applies to: 473-480

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
446-457, the test currently hardcodes transport="http" and insecure=True; change
it to derive transport from the MCP_SERVER_URL scheme (use parsed.scheme or
default "http") and set insecure=True only when the scheme is "https" (otherwise
False) so SSL bypass is applied only for HTTPS; add "import contextlib" near the
other imports; apply the same scheme+insecure logic to the similar block at
lines 473-480.
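The scheme-derivation step can be sketched as follows (the port defaults are illustrative, not values mandated by the test suite):

```python
from urllib.parse import urlparse


def transport_from_url(url: str) -> tuple[str, str, int, bool]:
    # Derive scheme/host/port from an MCP_SERVER_URL-style value; the SSL
    # bypass (insecure) only makes sense for https endpoints.
    parsed = urlparse(url)
    scheme = (parsed.scheme or "http").lower()
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if scheme == "https" else 8000)
    return scheme, host, port, scheme == "https"


info = transport_from_url("https://mcp.example:8443/api")
```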

Comment on lines +499 to +505
"uri": f"file://test/file_{client_id}.java",
"message": f"Test issue for client {client_id}",
"ruleset_name": f"test-ruleset-{client_id % 10}", # Share some rulesets
"violation_name": f"test-violation-{client_id % 5}", # Share some violations
"violation_category": "potential",
"code_snip": "// test code",
"line_number": 42,
Contributor

⚠️ Potential issue

Make URIs unique per client to avoid DB integrity violations

Use per-client UUID in URIs to prevent unique-constraint collisions.

Apply this diff:

-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",

Also applies to: 522-529, 586-588

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
499-505 (and similarly at 522-529 and 586-588), the test uses identical file://
URIs per client which causes DB unique-constraint collisions; update the URI to
include a per-client unique identifier (e.g., a generated UUID or
client-specific UUID string) so each client gets a distinct URI (e.g.,
file://test/{client_uuid}/file_{client_id}.java or append ?uid={uuid}); ensure
the UUID is generated/available in the test setup and used consistently in all
three ranges mentioned to prevent integrity violations.
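A sketch of the per-client URI idea (unique_uri is an illustrative helper, not an existing fixture in the test setup):

```python
from uuid import uuid4


def unique_uri(client_id: int) -> str:
    # Embed a fresh UUID so concurrent clients can never collide on a
    # unique constraint over the URI column.
    return f"file://test/file_{client_id}_{uuid4().hex}.java"


a = unique_uri(1)
b = unique_uri(1)
```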

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
kai_mcp_solution_server/tests/test_multiple_integration.py (3)

460-471: Use asyncio.run to avoid event-loop lifecycle bugs.
Simpler and less flaky in threads.

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))

446-457: Honor MCP_SERVER_URL scheme; set insecure only for HTTPS.
Prevents HTTP/HTTPS mismatches and avoids unnecessary global SSL patching.

 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
 
 multiple_user_mcp_args = MCPClientArgs(
-    transport="http",
+    transport=scheme,
     host=host,
     port=port,
-    insecure=True,
+    insecure=(scheme == "https"),
     server_path=None,  # Not needed for external server
 )

473-481: Scope SSL bypass and use correct scheme; fix type annotation.
Avoids global monkeypatching and ensures HTTPS works; annotation must be int since you use modulo arithmetic.

-async def client_task(client_id: str) -> None:
+async def client_task(client_id: int) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    transport_url = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport_url)
🧹 Nitpick comments (5)
kai_mcp_solution_server/tests/test_multiple_integration.py (5)

1-11: Add contextlib import (needed for scoped SSL bypass).
Used below to conditionally apply the SSL bypass only for HTTPS.

 import asyncio
-import concurrent
 import concurrent.futures
 import datetime
 import json
 import os
 import unittest
 from uuid import uuid4
+import contextlib

2-3: Drop redundant import.
import concurrent is unnecessary since import concurrent.futures already brings concurrent into scope.

-import concurrent
 import concurrent.futures

621-623: Replace fixed sleep with readiness polling.
Reduces flakiness under load by waiting until metrics reflect acceptance (with timeout).

-# Wait a bit for the acceptance to be processed
-await asyncio.sleep(0.5)
+# Wait for acceptance to be reflected in metrics (timeout safeguards the loop)
+await wait_for_metric_change(client, client_id, initial_accepted, initial_pending, ruleset=f"test-ruleset-{client_id % 10}", violation=f"test-violation-{client_id % 5}", timeout=10.0)

Add this helper (outside the diffed range, e.g., near other helpers in this test module):

async def wait_for_metric_change(client: Client, client_id: int, initial_accepted: int, initial_pending: int, ruleset: str, violation: str, timeout: float = 10.0) -> None:
    start = asyncio.get_event_loop().time()
    last_err: Exception | None = None
    while asyncio.get_event_loop().time() - start < timeout:
        res = await client.session.call_tool(
            "get_success_rate",
            {"violation_ids": [{"ruleset_name": ruleset, "violation_name": violation}]},
        )
        if res.isError:
            last_err = RuntimeError(res.content[0].text)
            await asyncio.sleep(0.2)
            continue
        rate = json.loads(res.content[0].text)
        if isinstance(rate, list):
            rate = rate[0] if rate else {}
        if (rate.get("accepted_solutions", 0) - initial_accepted) >= 1 and (rate.get("pending_solutions", 0) - initial_pending) <= 0:
            return
        await asyncio.sleep(0.2)
    raise TimeoutError(f"[Client {client_id}] metrics did not reflect acceptance within {timeout}s: last_err={last_err}")

683-687: Probe server readiness before launching clients.
Avoids initial connection races against the external server.

 # External server should already be running
 # Allow configuring number of concurrent clients via environment variable
 NUM_TASKS = int(os.environ.get("NUM_CONCURRENT_CLIENTS", "30"))
 print(f"Testing with {NUM_TASKS} concurrent clients")
+await wait_for_http_server(host, port, timeout=30)

Add this helper (outside the diffed range):

async def wait_for_http_server(host: str, port: int, timeout: float = 30.0) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_err: Exception | None = None
    while loop.time() < deadline:
        try:
            reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except Exception as e:
            last_err = e
            await asyncio.sleep(0.2)
    raise TimeoutError(f"Server at {host}:{port} not ready after {timeout}s: {last_err}")

720-757: Wrap hint-session in scoped SSL bypass for HTTPS.
Keeps HTTPS behavior consistent with client tasks.

 print("Verifying hints were generated...")
-async with create_client(multiple_user_mcp_args) as hint_session:
-    await hint_session.initialize()
+cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+with cm:
+    async with create_client(multiple_user_mcp_args) as hint_session:
+        await hint_session.initialize()
 
-# Check hints for a few different violation combinations
-violations_to_check = [
+    # Check hints for a few different violation combinations
+    violations_to_check = [
         ("test-ruleset-0", "test-violation-0"),
         ("test-ruleset-1", "test-violation-1"),
         ("test-ruleset-2", "test-violation-2"),
-]
+    ]
 
-hints_found = 0
-for ruleset, violation in violations_to_check:
-    try:
-        hint_result = await hint_session.call_tool(
+    hints_found = 0
+    for ruleset, violation in violations_to_check:
+        try:
+            hint_result = await hint_session.call_tool(
                 "get_best_hint",
                 {
                     "ruleset_name": ruleset,
                     "violation_name": violation,
                 },
-        )
-        if hint_result and not hint_result.isError:
-            hints_found += 1
-            print(f"  ✓ Found hint for {ruleset}/{violation}")
-        else:
-            print(f"  ✗ No hint for {ruleset}/{violation}")
-    except Exception as e:
-        print(f"  ✗ Error checking hint for {ruleset}/{violation}: {e}")
+            )
+            if hint_result and not hint_result.isError:
+                hints_found += 1
+                print(f"  ✓ Found hint for {ruleset}/{violation}")
+            else:
+                print(f"  ✗ No hint for {ruleset}/{violation}")
+        except Exception as e:
+            print(f"  ✗ Error checking hint for {ruleset}/{violation}: {e}")
 
 if hints_found == 0:
     self.fail(
         "No hints were generated! Hint generation may not be working correctly."
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4ad87a5 and 1a0942c.

📒 Files selected for processing (1)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (4)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (663-665)
  • SuccessRateMetric (721-727)
kai_mcp_solution_server/tests/mcp_client.py (1)
  • MCPClientArgs (247-256)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (31-41)
  • create_client (136-151)
kai_mcp_solution_server/tests/ssl_utils.py (2)
  • apply_ssl_bypass (49-81)
  • apply_ssl_bypass (168-192)
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

514-516: Abstract raise to an inner function

(TRY301)


514-516: Create your own exception

(TRY002)


514-516: Avoid specifying long messages outside the exception class

(TRY003)


521-523: Create your own exception

(TRY002)


521-523: Avoid specifying long messages outside the exception class

(TRY003)


526-526: Use of assert detected

(S101)


554-556: Abstract raise to an inner function

(TRY301)


554-556: Create your own exception

(TRY002)


554-556: Avoid specifying long messages outside the exception class

(TRY003)


561-563: Create your own exception

(TRY002)


561-563: Avoid specifying long messages outside the exception class

(TRY003)


566-566: Use of assert detected

(S101)


590-590: Use of assert detected

(S101)


601-601: Use of assert detected

(S101)


604-604: Use of assert detected

(S101)


645-645: Use of assert detected

(S101)


657-657: Use of assert detected

(S101)


660-660: Use of assert detected

(S101)


667-667: Use of assert detected

(S101)


672-672: Use of assert detected

(S101)


706-706: Do not catch blind exception: Exception

(BLE001)


747-747: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)

Comment on lines +493 to +508
# Create an incident
incident_result = await client.session.call_tool(
    "create_incident",
    {
        "client_id": f"stress-test-{client_uuid}",
        "extended_incident": {
            "uri": f"file://test/file_{client_id}.java",
            "message": f"Test issue for client {client_id}",
            "ruleset_name": f"test-ruleset-{client_id % 10}",  # Share some rulesets
            "violation_name": f"test-violation-{client_id % 5}",  # Share some violations
            "violation_category": "potential",
            "code_snip": "// test code",
            "line_number": 42,
            "variables": {},
        },
    },

🛠️ Refactor suggestion

Make file URIs unique per client to prevent DB unique-constraint collisions.
Shareable rulesets/violations are fine; URIs should be distinct.

                         "extended_incident": {
-                            "uri": f"file://test/file_{client_id}.java",
+                            "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                             "message": f"Test issue for client {client_id}",
                             "ruleset_name": f"test-ruleset-{client_id % 10}",  # Share some rulesets
                             "violation_name": f"test-violation-{client_id % 5}",  # Share some violations
                             "violation_category": "potential",
                             "code_snip": "// test code",
                             "line_number": 42,
                             "variables": {},
                         },
@@
                             "before": [
                                 {
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                                     "content": "// original code",
                                 }
                             ],
                             "after": [
                                 {
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                                     "content": "// fixed code",
                                 }
                             ],
@@
                             "solution_file": {
-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                                 "content": "// fixed code",
                             },

Also applies to: 535-546, 612-617

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 493
to 508 (and also apply the same change at 535-546 and 612-617), the file URI
used for incidents is not unique per client which causes DB unique-constraint
collisions; change the "uri" value to include a client-unique identifier (for
example client_uuid or client_id) so each test incident uses a distinct file://
path (e.g., append client_uuid/client_id into the filename or path) while
leaving shared ruleset/violation names unchanged.

Comment on lines +568 to +586
# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
import json

success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")


🛠️ Refactor suggestion

Check success_rate tool-call errors and remove redundant import.
Prevents JSON parse on error payload; json is already imported at top.

 success_rate_result = await client.session.call_tool(
     "get_success_rate",
     {
         "violation_ids": [
             {
                 "ruleset_name": f"test-ruleset-{client_id % 10}",
                 "violation_name": f"test-violation-{client_id % 5}",
             }
         ]
     },
 )
-# Parse and verify success rate
-success_rate_text = success_rate_result.content[0].text
-import json
+if success_rate_result.isError:
+    error_msg = success_rate_result.content[0].text
+    raise Exception(f"[Client {client_id}] get_success_rate failed: {error_msg}")
+# Parse and verify success rate
+success_rate_text = success_rate_result.content[0].text
 
 success_rate = json.loads(success_rate_text)
 print(f"[Client {client_id}] got success rate: {success_rate}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
import json
success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")

# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
if success_rate_result.isError:
    error_msg = success_rate_result.content[0].text
    raise Exception(f"[Client {client_id}] get_success_rate failed: {error_msg}")
# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 568
to 586, the code unconditionally parses success_rate_result.content[0].text with
a local "import json" and may attempt to JSON-parse an error payload; remove the
redundant "import json" and first check the tool-call result for an error/failed
status (or presence of an error field) before parsing: if the call failed,
log/raise/assert with the error message instead of parsing, and only call
json.loads on the text when the result indicates success.

Comment on lines +624 to +637
# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)

🛠️ Refactor suggestion

Also check the second success_rate call for errors.
Matches the first call; avoids parsing error payloads.

 success_rate_result2 = await client.session.call_tool(
     "get_success_rate",
     {
         "violation_ids": [
             {
                 "ruleset_name": f"test-ruleset-{client_id % 10}",
                 "violation_name": f"test-violation-{client_id % 5}",
             }
         ]
     },
 )
-success_rate_text2 = success_rate_result2.content[0].text
+if success_rate_result2.isError:
+    error_msg = success_rate_result2.content[0].text
+    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {error_msg}")
+success_rate_text2 = success_rate_result2.content[0].text
 success_rate2 = json.loads(success_rate_text2)
📝 Committable suggestion


Suggested change
# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)

# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
if success_rate_result2.isError:
    error_msg = success_rate_result2.content[0].text
    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {error_msg}")
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 624
to 637, the second get_success_rate call result isn't being checked for an error
before parsing; add the same safety/assertions used on the first call: assert
success_rate_result2.success is True (or raise a clear assertion with result
payload), assert content exists and has at least one item, and assert the first
content item contains a text field before calling json.loads on it so we avoid
parsing error payloads.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
Signed-off-by: Fabian von Feilitzsch <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kai_mcp_solution_server/tests/mcp_loader_script.py (1)

105-114: STDIO client: guard missing server_path.

Prevent passing cwd=None into StdioServerParameters.

 async def create_stdio_client(args: MCPClientArgs) -> AsyncIterator[ClientSession]:
-    print(f"Using server path: {args.server_path}")
+    if not args.server_path:
+        raise ValueError("server_path is required for stdio transport")
+    print(f"Using server path: {args.server_path}")
♻️ Duplicate comments (9)
kai_mcp_solution_server/tests/mcp_client.py (1)

305-333: Guard stdio path: avoid Path(None) TypeError.

When transport=stdio and server_path is None, Path(None) raises. Add an early check.

-        else:  # stdio transport
-            print(f"Using server path: {args.server_path}")
+        else:  # stdio transport
+            if args.server_path is None:
+                raise ValueError("server_path is required when using --transport stdio")
+            print(f"Using server path: {args.server_path}")
             logger.debug(
                 f"Initializing STDIO transport with server path: {args.server_path}"
             )
@@
-            server_path = Path(args.server_path)
+            server_path = Path(args.server_path)
.github/workflows/stress-test-mcp-server.yml (3)

69-69: Use inputs context for dispatch parameter.

github.event.inputs is null outside workflow_dispatch; inputs works across events.

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}

71-101: Harden server lifecycle: fail-fast, trap, and wait.

Prevents orphaned server and ensures errors propagate reliably under load.

-        run: |
-          echo "Starting MCP server connected to PostgreSQL..."
-          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
-          SERVER_PID=$!
+        run: |
+          set -euo pipefail
+          echo "Starting MCP server connected to PostgreSQL..."
+          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
+          SERVER_PID=$!
+          trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
@@
-          kill $SERVER_PID || true
-
-          exit $TEST_RESULT
+          kill $SERVER_PID || true
+          wait $SERVER_PID || true
+          exit $TEST_RESULT

104-109: Install psql before querying pg_stat_activity.

Ubuntu runners may not have the client by default.

+      - name: Install PostgreSQL client
+        if: always()
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y postgresql-client
       - name: Check PostgreSQL connection count
         if: always()
         run: |
           PGPASSWORD=kai_password psql -h localhost -U kai_user -d kai_test_db -c \
             "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
kai_mcp_solution_server/tests/test_multiple_integration.py (5)

401-413: Use asyncio.run to avoid loop lifecycle bugs in threads.

Simpler, less flaky than manual loop creation/closing.

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))

440-448: Make file URIs unique per client to avoid collisions in shared DBs.

Add client_uuid so URIs don’t collide across concurrent runs/clients.

-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",

Also applies to: 477-485, 483-485, 555-557


414-421: Activate SSL bypass and use parsed scheme in transport URL.

Calling apply_ssl_bypass() without entering the context has no effect; also the URL is forced to HTTP.

-async def client_task(client_id: str) -> None:
+async def client_task(client_id: int) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    transport_url = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport_url)

509-527: Check tool-call result before JSON parse; remove inner import.

Avoid parsing error payloads; reuse top-level json import.

 success_rate_result = await client.session.call_tool(
@@
 )
-# Parse and verify success rate
-success_rate_text = success_rate_result.content[0].text
-import json
+if getattr(success_rate_result, "isError", False):
+    msg = success_rate_result.content[0].text if success_rate_result.content else "unknown error"
+    raise Exception(f"[Client {client_id}] get_success_rate failed: {msg}")
+success_rate_text = success_rate_result.content[0].text
 success_rate = json.loads(success_rate_text)

565-579: Also guard the second success-rate call.

 success_rate_result2 = await client.session.call_tool(
@@
 )
-success_rate_text2 = success_rate_result2.content[0].text
+if getattr(success_rate_result2, "isError", False):
+    msg = success_rate_result2.content[0].text if success_rate_result2.content else "unknown error"
+    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {msg}")
+success_rate_text2 = success_rate_result2.content[0].text
 success_rate2 = json.loads(success_rate_text2)
🧹 Nitpick comments (13)
.gitignore (1)

209-213: Also ignore SQLite sidecar files (-wal/-shm).

To keep repos clean after local runs, consider ignoring SQLite’s write-ahead log artifacts too.

 # Test database files
 tests/test_*.db
 kai_mcp_solution_server/tests/test_*.db
 *.db
+*.db-wal
+*.db-shm
kai_mcp_solution_server/tests/mcp_client.py (1)

351-361: Increase client run timeout to reduce CI flakes under load.

15s can be tight on busy runners; bumping improves stress-test stability.

-                await asyncio.wait_for(run_with_timeout(), timeout=15.0)
+                await asyncio.wait_for(run_with_timeout(), timeout=45.0)
kai_mcp_solution_server/tests/ssl_utils.py (2)

88-93: Idempotence flag: don’t mark as patched if we didn’t patch.

If we early-return, leave httpx_patched False so restore() doesn’t attempt to revert unknown state.

-            if hasattr(httpx.AsyncClient.__init__, "_ssl_bypass_patched"):
-                logger.debug("httpx already patched, skipping")
-                self.httpx_patched = True
-                return
+            if hasattr(httpx.AsyncClient.__init__, "_ssl_bypass_patched"):
+                logger.debug("httpx already patched, skipping")
+                self.httpx_patched = False
+                return

106-109: Preserve function metadata on wrappers.

Use functools.wraps so debuggers and introspection don’t degrade.

+            import functools
@@
-            patched_client_init._ssl_bypass_patched = True
-            patched_async_client_init._ssl_bypass_patched = True
+            patched_client_init = functools.wraps(self.original_httpx_client_init)(patched_client_init)
+            patched_async_client_init = functools.wraps(self.original_httpx_async_client_init)(patched_async_client_init)
+            patched_client_init._ssl_bypass_patched = True
+            patched_async_client_init._ssl_bypass_patched = True

Optional: narrow warning suppression (lines 60–63) to specific SSL categories instead of all Warning to avoid masking unrelated issues.

kai_mcp_solution_server/Makefile (1)

95-100: Avoid inlining secrets in process args.

If KAI_LLM_PARAMS may contain credentials, prefer pass-through from environment to avoid exposure via ps/history.

-        -e MOUNT_PATH='$(MOUNT_PATH)' \
-        -e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-        -e KAI_DB_DSN='$(KAI_DB_DSN)' \
+        -e MOUNT_PATH \
+        -e KAI_LLM_PARAMS \
+        -e KAI_DB_DSN \

If you need non-env defaults, consider --env-file or mounting a config file.

kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (3)

113-127: Detect backend via URL, and add sqlite timeout.

String contains("sqlite") can produce false positives; use make_url. Also add aiosqlite connect timeout to reduce “database is locked” under load.

-async def get_async_engine(url: URL | str) -> AsyncEngine:
-    # Convert to string if URL object
-    url_str = str(url)
-
-    # Configure connection pool based on database type
-    if "sqlite" in url_str:
+async def get_async_engine(url: URL | str) -> AsyncEngine:
+    from sqlalchemy.engine.url import make_url
+    backend = make_url(str(url)).get_backend_name()
+
+    # Configure connection pool based on database type
+    if backend == "sqlite":
         # SQLite needs special handling due to its file-based nature
         # and limited write concurrency (even with WAL mode)
         engine = create_async_engine(
             url,
-            connect_args={"check_same_thread": False},
+            connect_args={"check_same_thread": False, "timeout": 30},
             poolclass=StaticPool,  # Single shared connection for SQLite
             echo_pool=False,  # Set to True for debugging connection pool
         )

Note: If you expect concurrent writes on sqlite, consider NullPool + WAL mode, or prefer Postgres in stress paths.


100-105: Use logger instead of stderr print for DDL ops.

Keeps diagnostics consistent with server logging.

-        print("Dropping all tables", file=sys.stderr)
+        import logging
+        logging.getLogger(__name__).warning("Dropping all tables")

113-141: Upstream API race mitigation (follow-up).

Most reported errors were httpx/OpenAI connect/TLS issues under concurrency. Beyond DB pools, ensure:

  • A single shared Async OpenAI client per process with tuned httpx.Limits (e.g., max_connections ~ 100, max_keepalive ~ 20), and explicit timeouts.
  • Bounded concurrency around the LLM call (semaphore) and retry with jitter on connect/read timeouts.
  • Idempotency guards so retries don’t duplicate work.

I can draft a patch wiring a shared client and a retry/backoff policy if you want.
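The gating-plus-retry idea above can be sketched generically, without any LLM specifics. The semaphore size, the `ConnectionError` stand-in for httpx connect/read failures, and the backoff constants below are illustrative assumptions, not the server's actual configuration:

```python
import asyncio
import random

# Process-wide bound on concurrent outbound calls (size is an assumption).
LLM_SEMAPHORE = asyncio.Semaphore(8)

async def call_with_retry(coro_fn, retries: int = 3, base_delay: float = 0.2):
    """Run coro_fn under the semaphore, retrying with jittered backoff."""
    last_err: Exception | None = None
    for attempt in range(retries):
        async with LLM_SEMAPHORE:
            try:
                return await coro_fn()
            except ConnectionError as e:  # stand-in for httpx connect/TLS errors
                last_err = e
        # Exponential backoff with jitter, released outside the semaphore
        # so waiting does not hold a concurrency slot.
        await asyncio.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    raise last_err

async def main() -> int:
    calls = {"n": 0}

    async def flaky():
        # Fails twice, then succeeds -- simulates transient connect timeouts.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated connect timeout")
        return 42

    return await call_with_retry(flaky)

print(asyncio.run(main()))  # prints 42 after two retried failures
```

In the real server, `coro_fn` would wrap the model's `ainvoke()` and the semaphore would live on the app context so every request shares one bound.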

kai_mcp_solution_server/tests/test_multiple_integration.py (3)

2-3: Remove redundant import.

import concurrent is unused; import concurrent.futures is sufficient.

-import concurrent
 import concurrent.futures

567-604: Prefer unittest assertions over bare asserts (optional).

For clearer failure output, replace plain assert with self.assert* equivalents.


651-655: Drop the fixed 10s sleep.

All futures are awaited; this extra wait extends CI time without benefit.

-        await asyncio.sleep(10)  # wait a moment for all output to be printed
+        # No-op: results already collected; avoid fixed delays.
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)

111-115: Remove unused lock or repurpose for LLM gating.

self.lock is no longer used; either delete it or replace with a concurrency gate for outbound LLM calls (see below).

 class KaiSolutionServerContext:
     def __init__(self, settings: SolutionServerSettings) -> None:
         self.settings = settings
-        self.lock = asyncio.Lock()

260-264: Re-removal of lock: add DB-level get-or-create safety.

With the lock gone, concurrent create_incident calls can race on first-time DBViolation creation. Ensure a unique constraint exists on (ruleset_name, violation_name) and handle IntegrityError with a re-query.

Would you like a small patch that wraps violation creation in a retry-on-conflict block?
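As a rough sketch of that retry-on-conflict pattern, using stdlib `sqlite3` in place of the async SQLAlchemy session (the table and column names here are illustrative): try the INSERT first, and on a unique-constraint violation re-query for the row a concurrent writer just created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE violations ("
    " id INTEGER PRIMARY KEY,"
    " ruleset_name TEXT NOT NULL,"
    " violation_name TEXT NOT NULL,"
    " UNIQUE (ruleset_name, violation_name))"
)

def get_or_create_violation(ruleset: str, violation: str) -> int:
    try:
        cur = conn.execute(
            "INSERT INTO violations (ruleset_name, violation_name) VALUES (?, ?)",
            (ruleset, violation),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another writer won the race; re-query for the existing row.
        row = conn.execute(
            "SELECT id FROM violations"
            " WHERE ruleset_name = ? AND violation_name = ?",
            (ruleset, violation),
        ).fetchone()
        return row[0]

a = get_or_create_violation("test-ruleset-0", "test-violation-0")
b = get_or_create_violation("test-ruleset-0", "test-violation-0")
print(a == b)  # True: the second call resolves to the same row
```

With SQLAlchemy the same shape applies: catch `IntegrityError`, roll back the failed transaction, and re-select inside a fresh one.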

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60a0052 and f20601b.

📒 Files selected for processing (9)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • .gitignore (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (10 hunks)
  • kai_mcp_solution_server/tests/mcp_client.py (1 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/ssl_utils.py (2 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[medium] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

455-457: Abstract raise to an inner function

(TRY301)


455-457: Create your own exception

(TRY002)


455-457: Avoid specifying long messages outside the exception class

(TRY003)


462-464: Create your own exception

(TRY002)


462-464: Avoid specifying long messages outside the exception class

(TRY003)


467-467: Use of assert detected

(S101)


495-497: Abstract raise to an inner function

(TRY301)


495-497: Create your own exception

(TRY002)


495-497: Avoid specifying long messages outside the exception class

(TRY003)


502-504: Create your own exception

(TRY002)


502-504: Avoid specifying long messages outside the exception class

(TRY003)


507-507: Use of assert detected

(S101)


531-531: Use of assert detected

(S101)


542-542: Use of assert detected

(S101)


545-545: Use of assert detected

(S101)


586-586: Use of assert detected

(S101)


598-598: Use of assert detected

(S101)


601-601: Use of assert detected

(S101)


608-608: Use of assert detected

(S101)


613-613: Use of assert detected

(S101)


647-647: Do not catch blind exception: Exception

(BLE001)


688-688: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (1)

133-140: Pool sizing looks reasonable; pre_ping enabled.

Good choices for high fan-out workloads. Ensure server disposes the engine on shutdown to release connections.

If not already, verify server shutdown calls engine.dispose() after async_engine.close() to avoid lingering connections in tests.

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (1)

386-394: Wrapper changes look good.

Directly calling the underlying functions without the app lock aligns with DB-managed concurrency.

Also applies to: 652-656, 714-718, 806-809, 893-897, 924-928

Comment on lines +242 to +243
@echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"


⚠️ Potential issue

Fix corrupted help text.

User-facing help prints a garbled example. Replace with a clean domain.

-    @echo "  make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"
-    
+    @echo "  make test-http HOST=api.example.com PORT=443 BEARER_TOKEN='token'"
📝 Committable suggestion


Suggested change
@echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"
@echo " make test-http HOST=api.example.com PORT=443 BEARER_TOKEN='token'"
🤖 Prompt for AI Agents
In kai_mcp_solution_server/Makefile around lines 242 to 243, the help text
contains corrupted/garbled example usage ("make test-http HOST=api.exayeah,
there are a lot of little ones...") — replace that line with a clean, concise
example such as: echo a correct make invocation using a normal domain, PORT and
BEARER_TOKEN (for example: make test-http HOST=api.example.com PORT=443
BEARER_TOKEN='token') so the printed help shows a clear, readable usage example.

Comment on lines 116 to +124
async def create(self) -> None:
    self.engine = await get_async_engine(
        self.settings.db_dsn, self.settings.drop_all
    )
    from kai_mcp_solution_server.db.dao import Base

    self.engine = await get_async_engine(self.settings.db_dsn)

    # Ensure tables exist (safe - only creates if not already there)
    async with self.engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)


🛠️ Refactor suggestion

Bound outbound LLM concurrency to prevent connection stampedes (root cause of #861).

Unthrottled ainvoke() calls (spawned from accept_file → generate_hint_v3) can overwhelm connection pools and trigger httpx/httpcore TLS/connection errors under load. Add a process-wide semaphore and use it in generators.

 async def create(self) -> None:
@@
-        self.engine = await get_async_engine(self.settings.db_dsn)
+        self.engine = await get_async_engine(self.settings.db_dsn)
@@
         self.session_maker = async_sessionmaker(
             bind=self.engine, expire_on_commit=False
         )
         self.model: BaseChatModel
+        # Limit concurrent LLM calls (configurable via env)
+        self.llm_semaphore = asyncio.Semaphore(
+            int(os.getenv("KAI_LLM_MAX_CONCURRENCY", "8"))
+        )

Also harden model init with retries/timeouts:

-        else:
-            self.model = init_chat_model(**self.settings.llm_params)
+        else:
+            params = {
+                "max_retries": int(os.getenv("KAI_LLM_MAX_RETRIES", "2")),
+                "timeout": float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+                **self.settings.llm_params,
+            }
+            self.model = init_chat_model(**params)

Apply the gate and timeout at call sites (outside this range):

# In generate_hint_v1/v2/v3 where response = await kai_ctx.model.ainvoke(prompt)
async with kai_ctx.llm_semaphore:
    response = await asyncio.wait_for(
        kai_ctx.model.ainvoke(prompt),
        timeout=float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
    )

Comment on lines +164 to 169
        # Clean up database connections when client disconnects
        if "ctx" in locals() and hasattr(ctx, "engine"):
            log("Disposing database engine...")
            await ctx.engine.dispose()
            log("Database engine disposed")


🛠️ Refactor suggestion

Graceful shutdown: await background hint tasks before disposing engine.

Disposing the engine while generate_hint_v* tasks are in-flight risks DB errors. Track tasks and drain/cancel with a timeout before disposal.

 finally:
-        # Clean up database connections when client disconnects
-        if "ctx" in locals() and hasattr(ctx, "engine"):
-            log("Disposing database engine...")
-            await ctx.engine.dispose()
-            log("Database engine disposed")
+        if "ctx" in locals():
+            # Wait briefly for background tasks to finish
+            if hasattr(ctx, "bg_tasks") and ctx.bg_tasks:
+                log(f"Draining {len(ctx.bg_tasks)} background tasks...")
+                done, pending = await asyncio.wait(
+                    ctx.bg_tasks, timeout=float(os.getenv("KAI_BG_DRAIN_TIMEOUT", "15"))
+                )
+                for t in pending:
+                    t.cancel()
+            if hasattr(ctx, "engine"):
+                log("Disposing database engine...")
+                await ctx.engine.dispose()
+                log("Database engine disposed")

Outside this range (supporting changes):

# In KaiSolutionServerContext.__init__
self.bg_tasks: set[asyncio.Task] = set()

# In accept_file after create_task(...)
task = asyncio.create_task(generate_hint_v3(kai_ctx, client_id))
kai_ctx.bg_tasks.add(task)
task.add_done_callback(kai_ctx.bg_tasks.discard)
🤖 Prompt for AI Agents
In kai_mcp_solution_server/src/kai_mcp_solution_server/server.py around lines
164 to 169, before disposing ctx.engine you must drain or cancel background hint
tasks tracked on ctx.bg_tasks to avoid DB use during shutdown: check if
ctx.bg_tasks is non-empty, await their completion with a configurable timeout
(e.g., use asyncio.wait with a timeout), cancel any tasks still pending after
the timeout, await them again to let cancellations propagate, and handle/ignore
asyncio.CancelledError or other exceptions while logging failures; only after
pending tasks are finished or cancelled should you call await
ctx.engine.dispose().

Comment on lines 62 to 69
            read, write, get_session_id = await stack.enter_async_context(
                streamablehttp_client(server_url, insecure=args.insecure)
                streamablehttp_client(server_url)
            )
            logger.debug("Streamable HTTP client connection established")
            session = await stack.enter_async_context(
                ClientSession(read, write, get_session_id)
            )
            # Note: get_session_id is for session management, not passed to ClientSession
            session = await stack.enter_async_context(ClientSession(read, write))
            logger.debug("MCP ClientSession initialized")


🛠️ Refactor suggestion

HTTP client: validate required args and default mount_path.

If port/mount_path are None, server_url becomes malformed. Add minimal validation/defaults.

 async def create_http_client(args: MCPClientArgs) -> AsyncIterator[ClientSession]:
     if args.host is None:
         raise ValueError("Host must be specified for HTTP transport")
+    if args.port is None:
+        raise ValueError("Port must be specified for HTTP transport")
+    if args.mount_path is None:
+        args.mount_path = "/"
@@
-            read, write, get_session_id = await stack.enter_async_context(
+            read, write, get_session_id = await stack.enter_async_context(
                 streamablehttp_client(server_url)
             )
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/mcp_loader_script.py around lines 62 to 69, the
code builds server_url without validating port and mount_path which can produce
malformed URLs when those values are None; ensure port and mount_path are
validated and defaulted before constructing server_url: if mount_path is None or
empty set it to "/", trim leading/trailing slashes to avoid double slashes when
concatenating, and if port is None set it to 80 for "http" or 443 for "https"
(or raise ValueError for unknown scheme); also validate types (port int >=1) and
raise a clear exception when invalid so the subsequent ClientSession setup
receives a well-formed URL.

Comment on lines +384 to +399
# External server mode - parse URL to get host and port
from urllib.parse import urlparse

parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
    transport="http",
    host=host,
    port=port,
    insecure=True,
    server_path=None,  # Not needed for external server
)
# Don't set KAI_LLM_PARAMS for external server - it should already be configured

🛠️ Refactor suggestion

Honor MCP_SERVER_URL scheme and scope SSL bypass.

Hard-coding HTTP ignores HTTPS endpoints and makes the bypass ineffective. Parse the scheme, use it in the transport, and only apply the SSL bypass for HTTPS.

 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
@@
 multiple_user_mcp_args = MCPClientArgs(
-    transport="http",
+    transport=scheme,
     host=host,
     port=port,
-    insecure=True,
+    insecure=(scheme == "https"),
     server_path=None,  # Not needed for external server
 )

Add import (outside range):

import contextlib

@JonahSussman JonahSussman changed the title 🐛 Fix another potential race condition 🐛 Fix DB connection pool getting exhausted Sep 9, 2025
Signed-off-by: JonahSussman <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)

306-307: Early guard for stdio transport: good fix; addresses prior review

This resolves the previous TypeError risk when server_path was None for stdio. Minor: Ruff hints (TRY003/TRY301) flag this raise. If you want to silence them without changing flow, add per-line ignores.

Apply inline ignores:

-            if args.server_path is None:
-                raise ValueError("server_path is required when using --transport stdio")
+            if args.server_path is None:
+                raise ValueError(
+                    "server_path is required when using --transport stdio"
+                )  # noqa: TRY003,TRY301
🧹 Nitpick comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)

198-199: Fix printf-style placeholders in print statements

print("...", result) won’t interpolate %s. Use f-strings for readable output.

-        print("get_best_hint tool call completed, result: %s", result)
+        print(f"get_best_hint tool call completed, result: {result}")
-        print("get_success_rate resource call completed, result: %s", result)
+        print(f"get_success_rate resource call completed, result: {result}")

Also applies to: 229-229

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f20601b and 76df1f1.

📒 Files selected for processing (2)
  • kai_mcp_solution_server/tests/mcp_client.py (2 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • kai_mcp_solution_server/tests/mcp_loader_script.py
🧰 Additional context used
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/mcp_client.py

307-307: Abstract raise to an inner function

(TRY301)


307-307: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)

251-251: Optional server_path type change: LGTM

Making server_path optional on the model aligns with external-server (HTTP) mode and avoids forcing an irrelevant parameter. No issues found.


Development

Successfully merging this pull request may close these issues.

DB Connection pool is getting exhausted
