Conversation

@JonahSussman (Contributor) commented Sep 4, 2025

Fixes #861

Summary by CodeRabbit

  • New Features

    • Automatic database table creation on startup.
    • New local targets to run a PostgreSQL-backed server and stress tests.
  • Bug Fixes

    • More reliable SSL bypass behavior to prevent duplicate patching.
    • Clearer, more verbose error reporting when HTTP transport fails.
  • Refactor

    • Improved database pooling and session handling for higher concurrency and stability.
    • Removed per-call locking to boost throughput.
    • Safer startup/shutdown lifecycle for the database engine.
  • Tests

    • Added a high-concurrency stress test simulating many simultaneous clients.
  • Chores

    • CI workflow to run automated stress tests.
    • Expanded .gitignore to exclude local database files.

Signed-off-by: JonahSussman <[email protected]>
@coderabbitai bot (Contributor) commented Sep 4, 2025

Walkthrough

Refactors DB engine creation and lifecycle, removes runtime drop-all setting, replaces per-call locks with transactional sessions, adds explicit table management helpers, introduces a GitHub Actions stress-test workflow and Makefile stress/podman targets, and adds a high-concurrency external-server integration test plus supporting test-tooling tweaks.
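The lock-to-transaction change in the walkthrough can be illustrated with the stdlib sqlite3 module; the PR itself uses SQLAlchemy's async_sessionmaker(...).begin(), so every name below is illustrative:

```python
import sqlite3

def insert_incident(conn: sqlite3.Connection, uri: str) -> int:
    # Using the connection as a context manager opens a transaction and
    # commits on success / rolls back on error: no explicit conn.commit()
    # call and no per-call lock.
    with conn:
        cur = conn.execute("INSERT INTO incidents (uri) VALUES (?)", (uri,))
        return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER PRIMARY KEY, uri TEXT)")
first_id = insert_incident(conn, "file://a.java")
```

Exiting the `with conn:` block commits on success and rolls back on exception, the same scope-based guarantee that `session.begin()` provides without serializing every call behind a lock.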

Changes

  • CI: Stress test workflow — .github/workflows/stress-test-mcp-server.yml
    New GitHub Actions workflow to start Postgres, launch the MCP server, wait for readiness, run a parametrized stress test, report Postgres connection stats, and exit with the test result.
  • Git ignore updates — .gitignore
    Adds ignore patterns: tests/test_*.db, kai_mcp_solution_server/tests/test_*.db, *.db.
  • Make targets for Postgres & stress — kai_mcp_solution_server/Makefile
    Adds podman-postgres and test-stress targets, removes the pytest target, updates help/usage and examples, and makes minor formatting tweaks.
  • DB engine and schema helpers — kai_mcp_solution_server/src/.../db/dao.py
    Simplifies the get_async_engine(url) signature (removes drop_all); adds drop_all_tables(engine) and ensure_tables_exist(engine); configures pooling per DB type (StaticPool for SQLite, a tuned pool for others).
  • Server lifecycle, sessions, concurrency — kai_mcp_solution_server/src/.../server.py
    Removes the drop_all setting; uses get_async_engine(self.settings.db_dsn) and ensures tables on startup; disposes the engine on shutdown; replaces explicit commit() calls with async_sessionmaker.begin() scopes; removes per-call locks; adjusts hint-generation flows and the v2 diff source.
  • MCP client args (external mode) — kai_mcp_solution_server/tests/mcp_client.py
    Makes server_path optional (`Path | None`).
  • MCP loader HTTP behavior — kai_mcp_solution_server/tests/mcp_loader_script.py
    Improves HTTP transport error reporting with a traceback, removes the per-request insecure flag from the client builder, updates ClientSession usage, and re-raises exceptions after logging.
  • HTTPX SSL patch idempotency — kai_mcp_solution_server/tests/ssl_utils.py
    Adds an idempotent patch check and marks patched wrappers to avoid double-patching httpx.
  • Concurrency stress test — kai_mcp_solution_server/tests/test_multiple_integration.py
    Adds a test_multiple_users stress test that runs many concurrent external-server clients exercising create/list/accept/hint flows; supports NUM_CONCURRENT_CLIENTS and MCP_SERVER_URL. A duplicated method appears in the diff.

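As a rough stdlib analogue of the tuned pool: SQLAlchemy's QueuePool caps connections at pool_size plus max_overflow and blocks up to pool_timeout when exhausted. The TinyPool class below is a hypothetical illustration of that model, not code from this PR:

```python
import queue
import threading

class TinyPool:
    """Hand out at most pool_size + max_overflow connections at once."""

    def __init__(self, factory, pool_size: int = 5, max_overflow: int = 10):
        self._factory = factory
        self._idle: queue.Queue = queue.Queue()
        self._capacity = threading.Semaphore(pool_size + max_overflow)

    def acquire(self, timeout: float = 30.0):
        # Block (like SQLAlchemy's pool_timeout) once the pool is exhausted.
        if not self._capacity.acquire(timeout=timeout):
            raise TimeoutError("pool exhausted")
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._factory()  # or open a fresh one

    def release(self, conn) -> None:
        self._idle.put(conn)
        self._capacity.release()

pool = TinyPool(factory=object, pool_size=2, max_overflow=1)
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
pool.release(a)
d = pool.acquire()  # reuses the idle connection released above
```

The exhaustion behavior this models is exactly the failure mode from issue #861: callers that never release (leaked sessions) eventually make every acquire time out.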
Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Server
  participant DB
  participant LLM as LLM/HintGen

  rect rgb(240,248,255)
  note right of Server: Startup: create engine & ensure tables
  Server->>DB: get_async_engine(db_dsn)
  Server->>DB: ensure_tables_exist()
  end

  Client->>Server: create_incident / create_solution
  Server->>DB: session.begin() { insert }
  DB-->>Server: OK (ids)
  Server-->>Client: IDs

  Client->>Server: accept_file
  Server->>DB: session.begin() { update status }
  alt all client's files accepted/modified
    Server--)LLM: spawn async generate_hint_v3()
  end
  Server-->>Client: OK

  rect rgb(245,240,255)
  note right of LLM: Async hint task uses DB read/write
  LLM->>DB: read code + diffs
  LLM-->>Server: produced hint
  Server->>DB: session.begin() { insert hint }
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • hhpatel14
  • fabianvf
  • savitharaghunathan

Pre-merge checks (3 passed, 2 warnings)

❌ Failed checks (2 warnings)

  • Out of Scope Changes Check (⚠️ Warning): The pull request includes updates to .gitignore to ignore database files and modifications to the test loader script's error reporting, neither of which is mentioned in the objectives for fixing pool exhaustion or handling HTTP connection errors in production. Resolution: remove or isolate the .gitignore database ignore patterns and the unrelated test loader script adjustments into a separate PR, or expand the description to explain how they support the primary goal of resolving connection-pool exhaustion.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 40.63%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title Check (✅ Passed): The title clearly summarizes the primary intent of the pull request, indicating a fix for the database connection pool exhaustion issue and directly referencing the core problem in the linked issue. It is concise, specific, and omits extraneous detail, so a teammate scanning the history can grasp the main change without parsing the entire diff.
  • Linked Issues Check (✅ Passed): The changes introduce explicit connection-pool parameters in get_async_engine, ensure proper engine disposal, remove per-call locks in favor of session management, and add concurrency stress tests plus a CI workflow to validate under load. These modifications directly address the race conditions, session leaks, and pool exhaustion under concurrent usage described in issue #861.
  • Description Check (✅ Passed): Although minimal, the description "Fixes #861" directly ties the pull request to the relevant issue and indicates its purpose. It is on-topic and sufficient for this repository's lenient description standard.

Poem

I nibble code beneath the moonlit log, 🐇
Spawn a Postgres warren, set tests to jog.
Locks hop away, sessions snug and neat,
Clients sprint; async hints arrive on beat.
If connections stumble, I’ll debug and dig—tiny paws, big grit.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


@JonahSussman changed the title from "🐛 Fix #861" to "🐛 Fix another potential race condition" on Sep 4, 2025
@coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)

51-55: Validate port and normalize mount_path to avoid malformed URLs ("None" in URL).

If args.port or args.mount_path is unset, server_url becomes invalid.

-    transport = ""
-    if not args.host.startswith("http"):
-        transport = "http://"
-    server_url = f"{transport}{args.host}:{args.port}{args.mount_path}"
+    transport = ""
+    if args.port is None:
+        raise ValueError("Port must be specified for HTTP transport")
+    mount_path = args.mount_path or ""
+    if not (args.host.startswith("http://") or args.host.startswith("https://")):
+        transport = "http://"
+    server_url = f"{transport}{args.host}:{args.port}{mount_path}"

73-101: Re-raise after logging to prevent “generator didn’t yield” from asynccontextmanager; also print errors to stderr.

Catching and not re-raising inside an @asynccontextmanager leads to a confusing RuntimeError: generator didn't yield. Also, printing tracebacks to stdout can pollute test output.

-        logger.error("HTTP transport error: %s", str(e), exc_info=True)
-        print(f"x Error with HTTP transport: {traceback.format_exc()}")
+        logger.error("HTTP transport error: %s", str(e), exc_info=True)
+        print(f"x Error with HTTP transport: {traceback.format_exc()}", file=sys.stderr)
@@
-        print(f"! Make sure the server is running at {server_url}")
+        print(f"! Make sure the server is running at {server_url}", file=sys.stderr)
+        raise
kai_mcp_solution_server/tests/test_multiple_integration.py (1)

1-542: Fix remaining lint errors before verification

  • In tests/mcp_loader_script.py: remove blind except Exception blocks (BLE001), replace generic Exception raises with custom exception classes (TRY002), and shorten long literal messages by defining them within exception types (TRY003).
  • In tests/test_multiple_integration.py: remove unused locals (SOLUTION_FOR_INCIDENT_A_ID, multiple_user_mcp_args, ssl_patch) (F841), drop f prefix on static strings (F541), replace bare assert with explicit checks (S101), avoid blind except Exception clauses (BLE001), shorten long runtime-error messages (TRY003), and use sys.executable instead of hardcoded "python" in subprocess calls (S607).
  • After these fixes, re-run ruff check kai_mcp_solution_server/tests/... and the multi-user pytest.
🧹 Nitpick comments (12)
kai_mcp_solution_server/tests/test_multiple_integration.py (12)

2-3: Drop redundant base package import.

import concurrent is unnecessary alongside import concurrent.futures.

-import concurrent
 import concurrent.futures

36-43: Use filename-safe timestamps.

datetime.now().isoformat() includes : which breaks filenames on Windows. Use a safe format.

-        timestamp = datetime.datetime.now().isoformat()
+        timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")

84-95: Prefer os.path.join and explicit UTF-8 for fixtures.

Reduces path bugs across platforms and avoids default-encoding surprises.

-            expected_tool_schemas = json.loads(
-                open(self.test_data_dir + "/expected_tool_schemas.json").read()
-            )
+            expected_path = os.path.join(self.test_data_dir, "expected_tool_schemas.json")
+            with open(expected_path, "r", encoding="utf-8") as f:
+                expected_tool_schemas = json.load(f)

305-306: Fix log message to reference the correct ruleset/violation (B variant).

Current print uses A-variant names in the B-flow.

-            print(f"Success rate of {RULESET_NAME_A}/{VIOLATION_NAME_A}: {metric}")
+            print(f"Success rate of {RULESET_NAME_B}/{VIOLATION_NAME_B}: {metric}")

340-341: Fix log message to reference the correct ruleset/violation (B variant).

-            print(
-                f"Success rate of {RULESET_NAME_A}/{VIOLATION_NAME_A} after accepting file: {metric}"
-            )
+            print(
+                f"Success rate of {RULESET_NAME_B}/{VIOLATION_NAME_B} after accepting file: {metric}"
+            )

420-422: Resolve ruff F841 in skipped test (unused variable).

Avoid unused assignment even in skipped tests.

-            SOLUTION_FOR_INCIDENT_A_ID = int(
-                create_solution_for_incident_a.model_dump()["content"][0]["text"]
-            )
+            _ = int(create_solution_for_incident_a.model_dump()["content"][0]["text"])

437-439: Fix f-string without placeholders (ruff F541).

-                "responses": [
-                    f"You should add a smiley face to the file.",
-                ],
+                "responses": [
+                    "You should add a smiley face to the file.",
+                ],

452-459: Remove unused poll_process helper or wire it in.

Currently dead code. Recommend removal to reduce noise.

-        def poll_process(process: subprocess.Popen) -> None:
-            # Check if the process has exited early
-            if process.poll() is not None:
-                output = process.stdout.read() if process.stdout else b""
-                raise RuntimeError(
-                    f"HTTP server process exited prematurely. Output: {output.decode(errors='replace')}"
-                )

511-512: Wait for server readiness rather than sleeping a fixed second.

Avoid flakiness by polling the port.

-            await asyncio.sleep(1)  # give the server a second to start
+            # Wait up to 10s for server to accept connections on 8087
+            await asyncio.wait_for(asyncio.to_thread(_wait_for_port, "127.0.0.1", 8087, 10.0), 12.0)

Add this helper in the file (near other helpers):

import socket

def _wait_for_port(host: str, port: int, timeout: float) -> None:
    deadline = asyncio.get_event_loop().time() + timeout
    while True:
        with socket.socket() as s:
            s.settimeout(0.5)
            try:
                s.connect((host, port))
                return
            except OSError:
                pass
        if asyncio.get_event_loop().time() >= deadline:
            raise TimeoutError(f"Timed out waiting for {host}:{port}")

513-521: If you want “multi-user”, increase concurrency beyond 1.

Currently NUM_TASKS = 1. Consider bumping to at least 5 to exercise concurrency.

-            NUM_TASKS = 1
+            NUM_TASKS = 5

537-542: Guard cleanup in case startup fails before thread assignment.

Prevents UnboundLocalError on stream_thread.

-            self.http_server_process.terminate()
-            self.http_server_process.wait()
-            print("Server process terminated.")
-            stream_thread.join()
+            if getattr(self, "http_server_process", None):
+                self.http_server_process.terminate()
+                self.http_server_process.wait()
+                print("Server process terminated.")
+            if "stream_thread" in locals():
+                stream_thread.join()

491-491: Subprocess security notice + avoid broad except in futures loop.

Bandit warnings are acceptable in tests, but consider narrowing exception types in the futures loop for better diagnostics.

Also applies to: 532-532

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1d90d04 and c85c725.

📒 Files selected for processing (2)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (4)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
  • CallToolResult (22-24)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (669-671)
  • SuccessRateMetric (728-734)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (31-41)
  • create_client (136-151)
kai/jsonrpc/util.py (1)
  • model_dump (96-97)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/test_multiple_integration.py

[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.


[failure] 425-425: ruff(F841)
[new] Local variable multiple_user_mcp_args is assigned to but never used


[failure] 475-475: ruff(F841)
[new] Local variable ssl_patch is assigned to but never used


[failure] 420-420: ruff(F841)
[new] Local variable SOLUTION_FOR_INCIDENT_A_ID is assigned to but never used


[notice] 491-491: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 491-491: bandit(B603)
[new] subprocess call - check for execution of untrusted input.


[failure] 437-437: ruff(F541)
[new] f-string without any placeholders

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

420-420: Local variable SOLUTION_FOR_INCIDENT_A_ID is assigned to but never used

Remove assignment to unused variable SOLUTION_FOR_INCIDENT_A_ID

(F841)


425-425: Local variable multiple_user_mcp_args is assigned to but never used

Remove assignment to unused variable multiple_user_mcp_args

(F841)


437-437: f-string without any placeholders

Remove extraneous f prefix

(F541)


444-444: Use of assert detected

(S101)


447-447: Do not catch blind exception: Exception

(BLE001)


456-458: Avoid specifying long messages outside the exception class

(TRY003)


475-475: Local variable ssl_patch is assigned to but never used

Remove assignment to unused variable ssl_patch

(F841)


492-500: Starting a process with a partial executable path

(S607)


532-532: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)

Comment on lines 491 to 500
self.http_server_process = subprocess.Popen(
[
"python",
"-m",
"kai_mcp_solution_server",
"--transport",
"streamable-http",
"--port",
"8087",
],

🛠️ Refactor suggestion

Use sys.executable instead of bare "python" (Bandit B607/B603).

More robust across environments and virtualenvs.

-            self.http_server_process = subprocess.Popen(
+            self.http_server_process = subprocess.Popen(
                 [
-                    "python",
+                    sys.executable,
                     "-m",
                     "kai_mcp_solution_server",
                     "--transport",
                     "streamable-http",
                     "--port",
                     "8087",
                 ],

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

492-500: Starting a process with a partial executable path

(S607)

🪛 GitHub Check: Trunk Check

[notice] 491-491: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 491-491: bandit(B603)
[new] subprocess call - check for execution of untrusted input.

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 491
to 500, the subprocess invocation uses the literal "python" which is brittle
across environments; replace the string with sys.executable and ensure the
module imports sys at the top of the test file so the current Python interpreter
(including virtualenvs) is used when launching the server process.

@coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (3)
kai_mcp_solution_server/tests/test_multiple_integration.py (3)

477-486: Avoid runtime assert; guard stdout and safe-close.

         def stream_output(process: subprocess.Popen) -> None:
             try:
-                assert process.stdout is not None
-                for line in iter(process.stdout.readline, b""):
-                    print(f"[Server] {line.decode().rstrip()}")
+                stdout = process.stdout
+                if stdout is None:
+                    raise RuntimeError("HTTP server process has no stdout pipe")
+                for line in iter(stdout.readline, b""):
+                    print(f"[Server] {line.decode().rstrip()}")
             except Exception as e:
                 print(f"Error while streaming output: {e}")
             finally:
-                process.stdout.close()
+                if process.stdout:
+                    try:
+                        process.stdout.close()
+                    except Exception:
+                        pass

508-516: Actually apply the SSL bypass or remove it.

         async def client_task(client_id: str) -> None:
             print(f"[Client {client_id}] starting")
-            apply_ssl_bypass()
-
-            client = Client(
-                transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-            )
+            with apply_ssl_bypass():
+                client = Client(
+                    transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
+                )

Note: if the server is plain HTTP, the bypass is unnecessary and can be deleted instead.
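The double-patch guard described for ssl_utils.py can be sketched generically: a marker attribute on the wrapper makes the patch idempotent. Everything below (patch_send, the fake module object) is a hypothetical stand-in, not the actual ssl_utils code:

```python
import functools
import types

_PATCH_MARKER = "_insecure_patch_applied"

def patch_send(mod) -> None:
    """Wrap mod.send once; calling patch_send again is a no-op."""
    if getattr(mod.send, _PATCH_MARKER, False):
        return  # already patched: do not wrap the wrapper again

    original = mod.send

    @functools.wraps(original)
    def insecure_send(*args, **kwargs):
        kwargs.setdefault("verify", False)  # stand-in for the SSL bypass
        return original(*args, **kwargs)

    setattr(insecure_send, _PATCH_MARKER, True)
    mod.send = insecure_send

# A stand-in for the httpx module being patched.
fake_httpx = types.SimpleNamespace(send=lambda *a, **kw: kw.get("verify", True))
patch_send(fake_httpx)
patch_send(fake_httpx)  # second call must be a no-op
```

Without the marker check, the second call would wrap the wrapper, and each client that applies the bypass would add another layer.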


608-617: Use sys.executable for portability (Bandit B607/B603).

-                self.http_server_process = subprocess.Popen(
+                self.http_server_process = subprocess.Popen(
                     [
-                        "python",
+                        sys.executable,
                         "-m",
                         "kai_mcp_solution_server",
                         "--transport",
                         "streamable-http",
                         "--port",
                         "8087",
                     ],

Also ensure import sys is present at the top of the file.

🧹 Nitpick comments (18)
.gitignore (1)

210-213: Avoid ignoring all *.db files repository-wide

Adding test DB ignores is good, but a blanket *.db may hide legitimate DB assets outside tests. Prefer scoping to test/CI artifacts only.

Apply:

   # Test database files
   tests/test_*.db
   kai_mcp_solution_server/tests/test_*.db
-  *.db
+  # Consider avoiding a global wildcard:
+  # **/tests/**/*.db
+  # Or remove this line entirely if not strictly needed
kai_mcp_solution_server/tests/mcp_client.py (1)

311-333: Small robustness nit: avoid repeated Path(None) conversions

Since you only support stdio when server_path is present, move server_path = Path(args.server_path) after the None-guard (above) to safeguard future edits.

.github/workflows/stress-test-mcp-server.yml (1)

66-67: Avoid hardcoding DB credentials in env to quiet secret scanners

Even for ephemeral CI, static analyzers will flag inline creds. Build the DSN at runtime from variables to reduce false positives.

-          KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
           KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
         run: |
-          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
-          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
+          export PGUSER=kai_user
+          export PGPASSWORD=kai_password
+          export PGDATABASE=kai_test_db
+          export PGHOST=localhost
+          export PGPORT=5432
+          export KAI_DB_DSN="postgresql+asyncpg://${PGUSER}:${PGPASSWORD}@${PGHOST}:${PGPORT}/${PGDATABASE}"
+          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
+          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)

113-127: Revisit SQLite pooling strategy to avoid cross-task connection reuse

Using StaticPool forces a single shared connection. Under concurrent async workloads this can serialize DB access and risk “connection already in use”/transaction conflicts. Prefer NullPool with WAL and a sensible busy timeout for better concurrency on file-backed SQLite.

Suggested change:

-from sqlalchemy.pool import StaticPool
+from sqlalchemy.pool import StaticPool, NullPool
...
-    if "sqlite" in url_str:
+    if url_str.startswith("sqlite"):
         engine = create_async_engine(
             url,
-            connect_args={"check_same_thread": False},
-            poolclass=StaticPool,  # Single shared connection for SQLite
+            connect_args={"check_same_thread": False, "timeout": 30},
+            poolclass=NullPool,  # One connection per use; avoids cross-task sharing
             echo_pool=False,  # Set to True for debugging connection pool
         )
+        # Enable WAL + reasonable sync level for better read/write concurrency
+        # (requires `from sqlalchemy import event` at the top of the module)
+        @event.listens_for(engine.sync_engine, "connect")
+        def _sqlite_pragma(dbapi_conn, _):
+            cur = dbapi_conn.cursor()
+            try:
+                cur.execute("PRAGMA journal_mode=WAL;")
+                cur.execute("PRAGMA synchronous=NORMAL;")
+                cur.execute("PRAGMA foreign_keys=ON;")
+            finally:
+                cur.close()

Verification: run the SQLite stress job and confirm absence of “database is locked” and connection reuse errors under NUM_CONCURRENT_CLIENTS=5..10.

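The suggested WAL pragmas are easy to exercise directly with the stdlib sqlite3 module (WAL requires a file-backed database, hence the temp file; this is a sketch, not the DAO code):

```python
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "wal_demo.db")

conn = sqlite3.connect(db_path)
# journal_mode=WAL is persistent: it is recorded in the database file itself.
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
conn.execute("PRAGMA synchronous=NORMAL;")
# foreign_keys is per-connection and must be enabled on every new connection.
conn.execute("PRAGMA foreign_keys=ON;")
fk_on = conn.execute("PRAGMA foreign_keys;").fetchone()[0]
conn.close()
```

Because foreign_keys is per-connection, the event-listener approach in the diff (re-running the pragmas on every "connect") is the right shape.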

117-118: Use stricter SQLite detection

Minor: avoid accidental matches on non-SQLite URLs by using startswith("sqlite").

-    if "sqlite" in url_str:
+    if url_str.startswith("sqlite"):
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)

332-351: Duplicate writes and TODO: clarify after-file creation path.

  • Comment “FIXME: Something is fishy here...” remains unresolved; the code always inserts a new “after” file and then queries for an existing one. This can create duplicates under concurrency.
  • session.add(new_file) is called twice later in accept_file (see below).

Consider an upsert-like path: query latest by (client_id, uri, content) and reuse if identical; only insert when content differs. Also, add a unique (client_id, uri, created_at) or application-level de-dup.


847-855: Redundant add().

session.add(new_file) is executed twice (Lines 847 and 853). Keep one.

             session.add(new_file)
             for solution, old_file in files_to_update:
                 # Remove the old file from the solution
                 solution.after.remove(old_file)
                 solution.after.add(new_file)

-                session.add(new_file)
                 session.add(solution)
kai_mcp_solution_server/tests/test_multiple_integration.py (2)

487-494: Remove or use poll_process().

poll_process is defined but never called. Either delete it or call it inside your readiness wait to surface early-exit errors promptly.


634-646: Thread-per-client is heavy but OK for stress; consider async tasks later.

For large NUM_TASKS, a single event loop with asyncio.gather would lower overhead. Not blocking this PR.
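The single-event-loop approach suggested above can be sketched with asyncio.gather; client_task is a hypothetical stand-in for one client's MCP create/list/accept/hint calls:

```python
import asyncio

async def client_task(client_id: int) -> str:
    # Stand-in for one client's create/list/accept/hint flow.
    await asyncio.sleep(0.01)
    return f"client-{client_id}: ok"

async def run_stress(num_clients: int) -> list[str]:
    # One event loop schedules every client concurrently; far lighter than
    # spawning a thread (each with its own loop) per client.
    return await asyncio.gather(*(client_task(i) for i in range(num_clients)))

results = asyncio.run(run_stress(30))
```

gather preserves argument order in its result list, so per-client results can still be correlated with client IDs after the run.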

kai_mcp_solution_server/scripts/manage_db.py (4)

21-41: Ensure engine disposal on error.

Wrap engine usage in try/finally so the pool is disposed even if drop_everything raises.

-    engine = await get_async_engine(settings.db_dsn)
-
-    async with engine.begin() as conn:
-        print("Dropping all tables...")
-        await conn.run_sync(drop_everything)
-        print("✅ All tables dropped successfully.")
-
-    await engine.dispose()
+    engine = await get_async_engine(settings.db_dsn)
+    try:
+        async with engine.begin() as conn:
+            print("Dropping all tables...")
+            await conn.run_sync(drop_everything)
+            print("✅ All tables dropped successfully.")
+    finally:
+        await engine.dispose()

55-65: Same here: dispose on all paths.

-    engine = await get_async_engine(settings.db_dsn)
-
-    async with engine.begin() as conn:
-        print("Dropping all tables...")
-        await conn.run_sync(drop_everything)
-        print("Creating tables...")
-        await conn.run_sync(Base.metadata.create_all)
-        print("✅ Database reset successfully.")
-
-    await engine.dispose()
+    engine = await get_async_engine(settings.db_dsn)
+    try:
+        async with engine.begin() as conn:
+            print("Dropping all tables...")
+            await conn.run_sync(drop_everything)
+            print("Creating tables...")
+            await conn.run_sync(Base.metadata.create_all)
+            print("✅ Database reset successfully.")
+    finally:
+        await engine.dispose()

103-109: Tighten error message: LLM params aren’t required for this script.

Mention only KAI_DB_DSN to avoid confusing users.

-        print("\nMake sure to set required environment variables:")
-        print("  - KAI_DB_DSN: Database connection string")
-        print('  - KAI_LLM_PARAMS: LLM configuration (can be \'{"model": "fake"}\')')
+        print("\nMake sure to set required environment variables:")
+        print("  - KAI_DB_DSN: Database connection string")

1-1: Shebang present but file may not be executable.

Either make the script executable in git (chmod +x) or drop the shebang.

kai_mcp_solution_server/Makefile (5)

122-129: Prefer “podman compose up” over “podman-compose” for wider compatibility

This avoids dependency on the separate python podman-compose shim.

Apply this diff:

-	IMAGE=$(IMAGE) KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' MOUNT_PATH='$(MOUNT_PATH)' \
-		podman-compose up
+	IMAGE=$(IMAGE) KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' MOUNT_PATH='$(MOUNT_PATH)' \
+		podman compose up

Please confirm CI runners support the subcommand.


131-141: Self-contained stress test: verify whether LLM params are required; add guard if needed

If the test server path still requires KAI_LLM_PARAMS, this may fail/flap.

Optionally add a guard like run-local:

 test-stress:
 	@echo "Running self-contained stress test with $${NUM_CONCURRENT_CLIENTS:-30} concurrent clients..."
 	@echo "Database: $${KAI_DB_DSN:-sqlite+aiosqlite:///test.db}"
 	@echo "The test will start its own server on port 8087"
+	@if [ -z "$${KAI_LLM_PARAMS}" ]; then echo "Error: KAI_LLM_PARAMS is required for the test server"; exit 1; fi
 	NUM_CONCURRENT_CLIENTS=$${NUM_CONCURRENT_CLIENTS:-30} \
 	KAI_DB_DSN=$${KAI_DB_DSN:-sqlite+aiosqlite:///test.db} \
 	uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs

181-205: run-with-tests may fail without KAI_LLM_PARAMS; also pass it into the server container

Align with run-local’s requirement.

Apply this diff:

 run-with-tests: build
 	@echo "Creating Podman pod for MCP server and test client..."
 	-podman pod rm -f kai-mcp-pod 2>/dev/null || true
 	podman pod create --name kai-mcp-pod -p 8000:8000
@@
 	@echo "Starting MCP server in the pod..."
-	podman run -d --pod kai-mcp-pod \
+	@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
+	podman run -d --pod kai-mcp-pod \
 		--name kai-mcp-server \
 		--volume kai-mcp-data:/data:Z \
+		-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
+		-e KAI_DB_DSN='$(KAI_DB_DSN)' \
 		$(IMAGE)

212-240: Help mentions ‘logs’ target, but no implementation exists

Either add a logs target or remove the line from help to avoid confusion.

Proposed new target (outside changed lines):

# Show Kubernetes deployment logs
.PHONY: logs
logs:
	@echo "Showing logs for deployment in namespace $(NAMESPACE)..."
	kubectl logs -n $(NAMESPACE) deploy/kai-mcp-solution-server -f

243-246: Fix NAMESPACE default in help to match actual default

Help says “konveyor-kai” but Makefile sets “konveyor-kai-solution-server”.

Apply this diff:

-	@echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai)"
+	@echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai-solution-server)"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c85c725 and 02641db.

📒 Files selected for processing (8)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • .gitignore (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/scripts/manage_db.py (1 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (10 hunks)
  • kai_mcp_solution_server/tests/mcp_client.py (1 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (1)
kai_mcp_solution_server/scripts/manage_db.py (1)
  • drop_all_tables (21-40)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)
  • Base (91-97)
  • get_async_engine (113-142)
kai_mcp_solution_server/src/kai_mcp_solution_server/constants.py (1)
  • log (9-10)
kai_mcp_solution_server/tests/test_multiple_integration.py (6)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
  • CallToolResult (22-24)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (663-665)
  • SuccessRateMetric (721-727)
kai_mcp_solution_server/tests/mcp_client.py (1)
  • MCPClientArgs (247-256)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (31-41)
  • create_client (136-151)
kai/jsonrpc/util.py (1)
  • model_dump (96-97)
kai/jsonrpc/streams.py (1)
  • close (70-75)
kai_mcp_solution_server/scripts/manage_db.py (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (4)
  • Base (91-97)
  • drop_everything (53-88)
  • get_async_engine (113-142)
  • drop_all_tables (100-104)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (1)
  • SolutionServerSettings (39-108)
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/test_multiple_integration.py

[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.


[notice] 608-608: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 608-608: bandit(B603)
[new] subprocess call - check for execution of untrusted input.

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

479-479: Use of assert detected

(S101)


482-482: Do not catch blind exception: Exception

(BLE001)


491-493: Avoid specifying long messages outside the exception class

(TRY003)


609-617: Starting a process with a partial executable path

(S607)


658-658: Do not catch blind exception: Exception

(BLE001)

kai_mcp_solution_server/scripts/manage_db.py

1-1: Shebang is present but file is not executable

(EXE001)


103-103: Do not catch blind exception: Exception

(BLE001)


122-122: Do not catch blind exception: Exception

(BLE001)

🪛 GitHub Actions: Test Code on PR
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 1-1: test_multiple_users failed: 30 concurrent clients failed (Event loop is closed; all connection attempts failed).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Stress Test with PostgreSQL
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (10)
.github/workflows/stress-test-mcp-server.yml (2)

72-79: Post-test DB utilization check is a nice touch

Querying pg_stat_activity post-run helps validate connection pooling assumptions. LGTM.


107-114: SQLite job concurrency cap is appropriate

Capping to 5 clients acknowledges SQLite write-lock limits. LGTM.

kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)

100-112: Good separation of schema management from engine creation

Introducing drop_all_tables and ensure_tables_exist clarifies lifecycle and avoids accidental drops during engine init. LGTM.


128-141: Pool sizing aligns with the 100-client stress target; confirm PG max_connections

pool_size=20 with max_overflow=80 caps at 100 connections, matching the test load. Ensure the Postgres service’s max_connections accommodates this plus overhead (psql, autovacuum). The workflow’s post-check helps track this.

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)

116-124: Good: engine bootstrap is now safe and idempotent.

Creating the engine via get_async_engine and ensuring tables exist inside engine.begin() is correct and removes startup races around schema readiness.


164-169: Proper engine disposal on shutdown.

Disposing the engine in the lifespan finally block prevents pool leaks across reconnects.

kai_mcp_solution_server/Makefile (4)

89-100: run-podman target looks good

Flexible DSN with optional PODMAN_ARGS and required KAI_LLM_PARAMS check is solid.


146-153: test-stress-external target looks good

Clear UX and useful defaults.


228-233: Help additions for new targets look good

New entries are clear and consistent.


262-267: Example usage LGTM

Good guidance for local and external stress runs.

Comment on lines 64 to 72
- name: Run stress test with 100 clients
env:
KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
run: |
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
timeout-minutes: 10

🛠️ Refactor suggestion

Coalesce workflow input using the inputs context, not github.event.inputs

github.event.inputs is only defined for workflow_dispatch payloads and is null for PR/push events. Use inputs.num_clients which safely evaluates to null outside workflow_dispatch and works with || coalescing.

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Run stress test with 100 clients
env:
KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
run: |
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
timeout-minutes: 10
- name: Run stress test with 100 clients
env:
KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
run: |
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
timeout-minutes: 10
🧰 Tools
🪛 Checkov (3.2.334)

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 64 to 72, the
workflow uses github.event.inputs.num_clients which is only set for
workflow_dispatch events; replace it with the inputs context so the coalescing
works for non-dispatch events. Update the NUM_CONCURRENT_CLIENTS assignment to
use ${{ inputs.num_clients || '100' }} (keeping the default '100') and ensure
the rest of the step remains unchanged so the workflow uses the inputs context
safely across event types.

Comment on lines 101 to 121
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
@echo "Running MCP solution server with SQLite..."
@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
podman run --rm -it -p 8000:8000 \
-e MOUNT_PATH='$(MOUNT_PATH)' \
-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
-v kai-sqlite-data:/data:Z \
--name kai-mcp-sqlite $(IMAGE)


⚠️ Potential issue

Avoid leaking API keys via process args; pass-through host env instead of inlining values

Using -e KEY='value' can expose secrets via ps/history. Prefer -e KEY to pass through from host.

Apply this diff:

-		$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
-		$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
-		$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
-		$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
-		$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
-		$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
-		$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
-		$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
-		$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+		$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+		$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+		$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+		$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+		$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+		$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+		$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+		$(if $(AWS_REGION),-e AWS_REGION,) \
+		$(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
📝 Committable suggestion


Suggested change
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
@echo "Running MCP solution server with SQLite..."
@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
podman run --rm -it -p 8000:8000 \
-e MOUNT_PATH='$(MOUNT_PATH)' \
-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
-v kai-sqlite-data:/data:Z \
--name kai-mcp-sqlite $(IMAGE)
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
@echo "Running MCP solution server with SQLite..."
@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
podman run --rm -it -p 8000:8000 \
-e MOUNT_PATH='$(MOUNT_PATH)' \
-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
$(if $(AWS_REGION),-e AWS_REGION,) \
$(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
-v kai-sqlite-data:/data:Z \
--name kai-mcp-sqlite $(IMAGE)

Comment on lines +260 to 264
# Lock removed - database handles concurrent access
for extended_incident in extended_incidents:
incident_id = await create_incident(kai_ctx, client_id, extended_incident)
results.append(CreateIncidentResult(incident_id=incident_id, solution_id=0))


🛠️ Refactor suggestion

Race still likely: unthrottled LLM calls spawned in background.

Removing the app-level lock is fine for DB ops, but outbound kai_ctx.model.ainvoke(...) calls (e.g., in generate_hint_v1/v2/v3) can now stampede under concurrent accept_file paths (asyncio.create_task(...)). This commonly manifests as httpcore/httpx TLS/connection errors under load, matching Issue #861. Add a small per-process semaphore and explicit timeouts/retries to bound concurrency and avoid connection storms.

Apply this diff to introduce an LLM concurrency gate and timeouts:

 class KaiSolutionServerContext:
     def __init__(self, settings: SolutionServerSettings) -> None:
         self.settings = settings
         self.lock = asyncio.Lock()

     async def create(self) -> None:
         from kai_mcp_solution_server.db.dao import Base

         self.engine = await get_async_engine(self.settings.db_dsn)

         # Ensure tables exist (safe - only creates if not already there)
         async with self.engine.begin() as conn:
             await conn.run_sync(Base.metadata.create_all)

         self.session_maker = async_sessionmaker(
             bind=self.engine, expire_on_commit=False
         )
         self.model: BaseChatModel
+        # Limit outbound LLM concurrency to mitigate connection spikes
+        self.llm_semaphore = asyncio.Semaphore(
+            int(os.getenv("KAI_LLM_MAX_CONCURRENCY", "8"))
+        )

         if self.settings.llm_params is None:
             raise ValueError("LLM parameters must be provided in the settings.")
         elif self.settings.llm_params.get("model") == "fake":

And wrap LLM calls with the gate and a timeout (repeat similarly in v2/v3):

-            response = await kai_ctx.model.ainvoke(prompt)
+            async with kai_ctx.llm_semaphore:
+                response = await asyncio.wait_for(
+                    kai_ctx.model.ainvoke(prompt),
+                    timeout=float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+                )

Optionally harden initialization with retries/timeouts:

-            self.model = init_chat_model(**self.settings.llm_params)
+            params = {
+                "max_retries": int(os.getenv("KAI_LLM_MAX_RETRIES", "2")),
+                "timeout": float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+                **self.settings.llm_params,
+            }
+            self.model = init_chat_model(**params)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In kai_mcp_solution_server/src/kai_mcp_solution_server/server.py around lines
260-264, the app removed the lock but still spawns unthrottled background LLM
calls (kai_ctx.model.ainvoke) which can stampede and cause httpx/httpcore
TLS/connection errors; introduce a module-level asyncio.Semaphore (e.g.,
max_concurrent_llm = 5) and ensure all LLM invocations (generate_hint_v1/v2/v3
and any other ainvoke paths) acquire the semaphore before calling and release it
after, wrapping the actual ai call in asyncio.wait_for to enforce a per-call
timeout and add a simple retry loop with exponential backoff for transient
failures; also initialize the semaphore once at startup and consider applying
the same timeout/retry pattern to any LLM client initialization to harden
startup.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kai_mcp_solution_server/tests/mcp_loader_script.py (1)

47-55: Validate port and normalize mount_path to avoid malformed URLs.

If port or mount_path is None, server_url becomes http://host:NoneNone. Guard and normalize.

Apply this diff:

   if args.host is None:
       raise ValueError("Host must be specified for HTTP transport")

   # Setup HTTP transport using streamable-http
   transport = ""
   if not args.host.startswith("http"):
       transport = "http://"
-  server_url = f"{transport}{args.host}:{args.port}{args.mount_path}"
+  if args.port is None:
+      raise ValueError("Port must be specified for HTTP transport")
+  mount_path = args.mount_path or ""
+  if mount_path and not mount_path.startswith("/"):
+      mount_path = "/" + mount_path
+  server_url = f"{transport}{args.host}:{args.port}{mount_path}"
   print(f"Connecting to server at {server_url}...")
♻️ Duplicate comments (4)
.github/workflows/stress-test-mcp-server.yml (1)

69-69: Coalesce workflow input via inputs context, not github.event.inputs.

github.event.inputs is null outside workflow_dispatch; use inputs.num_clients so the default coalesces reliably across events.

Apply this diff:

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
kai_mcp_solution_server/Makefile (1)

110-118: Avoid leaking secrets in podman-sqlite env flags

Inline values in -e KEY='value' are visible via ps/history. Pass-through host env with -e KEY instead.

-        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
-        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
-        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
-        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
-        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
-        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
-        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
-        $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
-        $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+        $(if $(AWS_REGION),-e AWS_REGION,) \
+        $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
kai_mcp_solution_server/tests/test_multiple_integration.py (2)

477-486: Remove runtime assert; guard stdout close; avoid blind except

Prevents crashes and aligns with prior review.

-        def stream_output(process: subprocess.Popen) -> None:
-            try:
-                assert process.stdout is not None
-                for line in iter(process.stdout.readline, b""):
-                    print(f"[Server] {line.decode().rstrip()}")
-            except Exception as e:
-                print(f"Error while streaming output: {e}")
-            finally:
-                process.stdout.close()
+        def stream_output(process: subprocess.Popen) -> None:
+            try:
+                stdout = process.stdout
+                if stdout is None:
+                    return
+                for line in iter(stdout.readline, b""):
+                    print(f"[Server] {line.decode(errors='replace').rstrip()}")
+            except BaseException as e:
+                print(f"Error while streaming output: {e}")
+            finally:
+                try:
+                    if process.stdout:
+                        process.stdout.close()
+                except Exception:
+                    pass

693-702: Use sys.executable instead of bare "python"

More robust and fixes Bandit B607/B603 warnings.

-                self.http_server_process = subprocess.Popen(
+                self.http_server_process = subprocess.Popen(
                     [
-                        "python",
+                        sys.executable,
                         "-m",
                         "kai_mcp_solution_server",
                         "--transport",
                         "streamable-http",
                         "--port",
                         "8087",
                     ],

Add import:

import sys
🧹 Nitpick comments (14)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)

9-9: Remove unused import (ruff F401).

timedelta is imported but never used.

Apply this diff:

-from datetime import timedelta

63-69: Avoid unused local; don’t bind get_session_id if not used.

Prevents potential linter warnings and clarifies intent.

Apply this diff:

-            read, write, get_session_id = await stack.enter_async_context(
+            read, write, _ = await stack.enter_async_context(
                 streamablehttp_client(server_url)
             )
kai_mcp_solution_server/tests/ssl_utils.py (3)

36-49: Make monkey-patching reentrant-safe across concurrent contexts.

Current design patches/unpatches global httpx methods per-context without reference counting. Overlapping contexts (e.g., parallel tests/clients) can restore while others still rely on the patch. Add a module-level refcount + lock so only the last exiting context restores.

Here’s a minimal pattern to adopt (outside the selected range):

# Module-level
import threading
_PATCH_LOCK = threading.Lock()
_PATCH_COUNT = 0

def _acquire_patch() -> bool:
    global _PATCH_COUNT
    with _PATCH_LOCK:
        if _PATCH_COUNT == 0:
            return True  # caller should apply patches
        _PATCH_COUNT += 1
        return False

def _release_patch() -> bool:
    global _PATCH_COUNT
    with _PATCH_LOCK:
        _PATCH_COUNT -= 1
        return _PATCH_COUNT == 0  # caller should restore

Then in SSLMonkeyPatch:

  • In __enter__/apply_ssl_bypass(): call _acquire_patch(), apply patches only when it returns True.
  • In restore_ssl_settings(): call _release_patch(), restore only when it returns True.

Also applies to: 137-155


61-63: Over-broad warning suppression.

warnings.filterwarnings("ignore", category=Warning) silences all warnings. Restrict to the specific SSL warning message or a specific category.

Apply this diff:

-        warnings.filterwarnings("ignore", category=Warning)
+        # Keep the specific message filter; avoid blanket suppression

123-128: Clean up unused SSL environment variables
HTTPX_SSL_VERIFY, HTTPX_NO_VERIFY, and SSL_CERT_VERIFY aren’t referenced elsewhere; retain only the standard PYTHONHTTPSVERIFY.

-        ssl_env_vars = {
-            "SSL_CERT_VERIFY": "false",
-            "HTTPX_SSL_VERIFY": "false",
-            "HTTPX_NO_VERIFY": "true",
-            "PYTHONHTTPSVERIFY": "0",
-        }
+        ssl_env_vars = {
+            "PYTHONHTTPSVERIFY": "0",
+        }
.github/workflows/stress-test-mcp-server.yml (2)

74-74: Trim trailing whitespace (yamllint).

These lines have trailing spaces; remove to appease linters and keep diffs clean.

Also applies to: 90-90, 96-96, 100-100


66-69: Optional: avoid hardcoding DB credentials in DSN to satisfy secret scanners.

Even if they’re test-only, Checkov flags inline creds. Consider building KAI_DB_DSN from job env vars.

Apply this minimal change (job-level env at the top of the job), then reference them in DSN:

   stress-test-postgres:
     name: Stress Test with PostgreSQL
     runs-on: ubuntu-latest
+    env:
+      PGUSER: kai_user
+      PGPASSWORD: kai_password
+      PGDATABASE: kai_test_db
...
         env:
-          KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
+          KAI_DB_DSN: "postgresql+asyncpg://${{ env.PGUSER }}:${{ env.PGPASSWORD }}@localhost:5432/${{ env.PGDATABASE }}"

Optionally add an allowlist annotation if your policy permits.

kai_mcp_solution_server/Makefile (3)

96-99: Protect JSON in KAI_LLM_PARAMS

Quoting JSON inline can break with embedded quotes/newlines. Prefer an env file or base64.

  • Option A: --env-file .env (store KAI_LLM_PARAMS there)
  • Option B (inline-safe): -e KAI_LLM_PARAMS_B64="$$(printf %s "$$KAI_LLM_PARAMS" | base64 -w0)" and decode in entrypoint.

243-243: Fix help text: NAMESPACE default mismatch

Help says konveyor-kai but variable default is konveyor-kai-solution-server.

-   @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai)"
+   @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai-solution-server)"

224-224: Help lists 'logs' target that doesn't exist

Either add a logs target or remove this line from help.

Example target:

+# Show container logs
+.PHONY: logs
+logs:
+	@echo "Showing logs for deployment in namespace $(NAMESPACE)..."
+	kubectl logs -n $(NAMESPACE) -l app=kai-mcp-solution-server --all-containers=true --tail=200
kai_mcp_solution_server/tests/test_multiple_integration.py (4)

748-756: Shorten post-run sleeps to reduce flakiness

Ten-second sleep is brittle; prefer awaiting the last client futures before proceeding, or add a small bounded wait (1–2s) if needed for logs.

-            await asyncio.sleep(10)  # wait a moment for all output to be printed
+            await asyncio.sleep(2)  # brief pause for buffered output

792-800: Fail fast if no hints; add context to failure

Good check. Consider logging the server mode and NUM_TASKS for easier triage.

-                    self.fail(
-                        "No hints were generated! Hint generation may not be working correctly."
-                    )
+                    mode = "external" if not start_own_server else "self-contained"
+                    self.fail(f"No hints generated (mode={mode}, clients={NUM_TASKS}). Hint generation may not be working.")

36-43: Cross-platform DB path safety

isoformat() includes : which is invalid on Windows filenames. Use a safe format.

-        timestamp = datetime.datetime.now().isoformat()
+        timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")

723-747: Collect root causes from client failures

Wrap exceptions with context (client_id, mode, transport) to speed up debugging.

-                    except Exception as exc:
-                        error_msg = f"Task {task_id} failed: {exc}"
+                    except Exception as exc:
+                        error_msg = f"Task {task_id} failed (mode={'external' if not start_own_server else 'self-contained'}): {exc}"
                         print(f"[Main] {error_msg}")
                         errors.append(error_msg)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 02641db and 17e6fd6.

📒 Files selected for processing (5)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/ssl_utils.py (2 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (7)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
  • CallToolResult (22-24)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (663-665)
  • SuccessRateMetric (721-727)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (32-42)
  • create_client (137-152)
kai_mcp_solution_server/tests/mcp_client.py (1)
  • MCPClientArgs (247-256)
kai_mcp_solution_server/tests/ssl_utils.py (2)
  • apply_ssl_bypass (49-81)
  • apply_ssl_bypass (168-192)
kai/jsonrpc/util.py (1)
  • model_dump (96-97)
kai/jsonrpc/streams.py (1)
  • close (70-75)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/mcp_loader_script.py

[failure] 9-9: ruff(F401)
[new] datetime.timedelta imported but unused

kai_mcp_solution_server/tests/test_multiple_integration.py

[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.


[notice] 693-693: bandit(B607)
[new] Starting a process with a partial executable path


[notice] 693-693: bandit(B603)
[new] subprocess call - check for execution of untrusted input.

🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 YAMLlint (1.37.1)
.github/workflows/stress-test-mcp-server.yml

[error] 74-74: trailing spaces

(trailing-spaces)


[error] 90-90: trailing spaces

(trailing-spaces)


[error] 96-96: trailing spaces

(trailing-spaces)


[error] 100-100: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

479-479: Use of assert detected

(S101)


482-482: Do not catch blind exception: Exception

(BLE001)


491-493: Avoid specifying long messages outside the exception class

(TRY003)


547-547: Use of assert detected

(S101)


573-573: Use of assert detected

(S101)


597-597: Use of assert detected

(S101)


608-608: Use of assert detected

(S101)


611-611: Use of assert detected

(S101)


652-652: Use of assert detected

(S101)


664-664: Use of assert detected

(S101)


667-667: Use of assert detected

(S101)


674-674: Use of assert detected

(S101)


679-679: Use of assert detected

(S101)


694-702: Starting a process with a partial executable path

(S607)


743-743: Do not catch blind exception: Exception

(BLE001)


789-789: Do not catch blind exception: Exception

(BLE001)

🪛 GitHub Actions: Test Code on PR
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 752-752: TestMultipleIntegration.test_multiple_users failed: 30 clients failed due to event loop errors (Event loop is closed, All connection attempts failed).


[warning] 1-1: RuntimeWarning: coroutine 'TestMultipleIntegration.test_multiple_users..client_task' was never awaited.

🪛 GitHub Actions: MCP Server Stress Test
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 1-1: Test step failed: pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users - All connection attempts failed for 5 concurrent clients against the self-contained MCP server.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_mcp_solution_server/tests/ssl_utils.py (1)

88-93: Good idempotence guard on httpx patching.

The _ssl_bypass_patched sentinel effectively avoids double-patching.

Comment on lines 439 to 456
if external_server_url:
# External server mode - parse URL to get host and port
from urllib.parse import urlparse

parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
transport="http",
host=host,
port=port,
insecure=True,
server_path=None, # Not needed for external server
)
# Don't set KAI_LLM_PARAMS for external server - it should already be configured
start_own_server = False
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Honor URL scheme in external mode

Currently the client always uses http://{host}:{port}. If MCP_SERVER_URL is https://..., this will fail with connection errors.

-            parsed = urlparse(external_server_url)
+            parsed = urlparse(external_server_url)
             host = parsed.hostname or "localhost"
             port = parsed.port or 8000
+            scheme = (parsed.scheme or "http").lower()
             print(f"Using external MCP server at {host}:{port}")
@@
-            multiple_user_mcp_args = MCPClientArgs(
+            multiple_user_mcp_args = MCPClientArgs(
                 transport="http",
                 host=host,
                 port=port,
                 insecure=True,
                 server_path=None,  # Not needed for external server
             )

And use scheme when constructing the transport (see suggestion at Lines 509-515).

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +509 to +515
print(f"[Client {client_id}] starting")
apply_ssl_bypass()

client = Client(
transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
)

Contributor

🛠️ Refactor suggestion

Scope the SSL bypass and only when needed

Global monkeypatching per thread is racy and unnecessary for HTTP. Use the context manager and gate on scheme.

-        async def client_task(client_id: str) -> None:
+        async def client_task(client_id: str) -> None:
             print(f"[Client {client_id}] starting")
-            apply_ssl_bypass()
-
-            client = Client(
-                transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-            )
+            scheme = (multiple_user_mcp_args.transport or "http").lower()
+            transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+            cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+            with cm:
+                client = Client(transport=transport)

Add at top-level imports:

import contextlib
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
509-515, avoid globally monkeypatching SSL per thread by scoping the bypass: add
"import contextlib" at the top, remove the standalone apply_ssl_bypass() call
and instead wrap the client creation in a context manager that only applies the
bypass when the transport scheme is HTTPS, e.g. compute the transport URL
string, then use "with (apply_ssl_bypass() if transport_url.startswith('https')
else contextlib.nullcontext()):" and create the Client inside that block so the
SSL bypass is temporary and only used when needed.

Comment on lines 713 to 714
await asyncio.sleep(1) # give the server a second to start
else:
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Replace fixed sleep with active readiness probe

Prevents clients racing the server socket bind (root cause of intermittent connect failures).

-                await asyncio.sleep(1)  # give the server a second to start
+                await wait_for_http_server("localhost", 8087, timeout=30)

Place this helper near other helpers in the module (outside the selected range):

async def wait_for_http_server(host: str, port: int, timeout: float = 30.0) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_err: Exception | None = None
    while loop.time() < deadline:
        try:
            reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except Exception as e:
            last_err = e
            await asyncio.sleep(0.2)
    raise TimeoutError(f"Server at {host}:{port} not ready after {timeout}s: {last_err}")
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
713-714, replace the fixed await asyncio.sleep(1) used to wait for the server
with an active readiness probe: add the provided async helper function
wait_for_http_server(host, port, timeout=30.0) near the other test helpers in
the module (outside the selected range), and call await
wait_for_http_server(server_host, server_port, timeout=30.0) instead of the
fixed sleep so the test actively retries connecting until the server socket is
ready and avoids race conditions.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
kai_mcp_solution_server/Makefile (1)

110-118: Avoid leaking secrets via inlined env values in podman-sqlite

Passing secrets as -e KEY='$(VALUE)' exposes them via ps/history. Pass-through host env instead.

Apply this diff:

-        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
-        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
-        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
-        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
-        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
-        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
-        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
-        $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
-        $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+        $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+        $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+        $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+        $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+        $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+        $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+        $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+        $(if $(AWS_REGION),-e AWS_REGION,) \
+        $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
.github/workflows/stress-test-mcp-server.yml (1)

69-69: Use inputs context for num_clients (works across event types)

github.event.inputs is null except for workflow_dispatch. Use inputs.num_clients.

Apply this diff:

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
🧹 Nitpick comments (7)
kai_mcp_solution_server/Makefile (2)

89-100: Consider passing provider creds in run-podman, too

External LLMs won’t work unless creds are provided. Mirror the pass-through flags here or document using PODMAN_ARGS.


239-239: Help text default for NAMESPACE doesn’t match actual default

Help says default is “konveyor-kai” but Makefile sets “konveyor-kai-solution-server”.

Apply this diff:

-    @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai)"
+    @echo "  NAMESPACE      : Kubernetes namespace (default: konveyor-kai-solution-server)"

Also applies to: 5-5

kai_mcp_solution_server/tests/test_multiple_integration.py (5)

460-472: Use asyncio.run per thread to avoid loop lifecycle bugs

Simpler and avoids “event loop is closed” under stress.

Apply this diff:

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))
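For reference, the asyncio.run pattern composes cleanly with a thread pool; a minimal sketch (not the test's actual workload):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def work(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async client calls
    return n * 2


def run_async_in_thread(fn, *args, **kwargs):
    # asyncio.run creates and tears down a fresh event loop per call,
    # so each worker thread gets an isolated loop lifecycle.
    return asyncio.run(fn(*args, **kwargs))


with ThreadPoolExecutor(max_workers=4) as pool:
    results = sorted(pool.map(lambda n: run_async_in_thread(work, n), range(5)))
```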

700-702: Apply SSL bypass for hint check when using HTTPS

Mirror the scoped bypass used for client tasks so HTTPS external servers pass.

Apply this diff:

-        async with create_client(multiple_user_mcp_args) as hint_session:
+        cm = apply_ssl_bypass() if multiple_user_mcp_args.transport == "https" else contextlib.nullcontext()
+        with cm:
+            async with create_client(multiple_user_mcp_args) as hint_session:
+                await hint_session.initialize()
-            await hint_session.initialize()

512-647: Prefer unittest assertions over bare assert for clearer failures

Optional, but aligns with unittest style and avoids S101 lints. Example:

Apply this pattern:

-assert incident_id > 0, f"Invalid incident ID: {incident_id}"
+self.assertGreater(incident_id, 0, f"Invalid incident ID: {incident_id}")

Repeat similarly for other asserts in this block.


555-555: Remove redundant local import

import json is already at module top.

Apply this diff:

-                    import json

684-697: Replace fixed sleeps with active polling

Use a readiness/poll helper instead of arbitrary sleeps to cut flakiness/time.

I can provide a small wait_for_condition or HTTP probe helper if you want it integrated.
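Such a helper could look like this (a minimal sketch; wait_for_condition is a hypothetical name, not an existing utility in the test module):

```python
import asyncio


async def wait_for_condition(check, timeout: float = 10.0, interval: float = 0.2) -> None:
    # Poll an async predicate until it returns True, failing loudly on timeout
    # instead of sleeping a fixed amount and hoping the server is ready.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await check():
            return
        await asyncio.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")


async def demo() -> int:
    calls = 0

    async def ready() -> bool:
        nonlocal calls
        calls += 1
        return calls >= 3  # becomes true on the third poll

    await wait_for_condition(ready, timeout=5.0, interval=0.01)
    return calls


polls = asyncio.run(demo())
```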

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 17e6fd6 and 4ad87a5.

📒 Files selected for processing (4)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • kai_mcp_solution_server/tests/mcp_loader_script.py
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[MEDIUM] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

512-512: Use of assert detected

(S101)


538-538: Use of assert detected

(S101)


562-562: Use of assert detected

(S101)


573-573: Use of assert detected

(S101)


576-576: Use of assert detected

(S101)


617-617: Use of assert detected

(S101)


629-629: Use of assert detected

(S101)


632-632: Use of assert detected

(S101)


639-639: Use of assert detected

(S101)


644-644: Use of assert detected

(S101)


679-679: Do not catch blind exception: Exception

(BLE001)


725-725: Do not catch blind exception: Exception

(BLE001)

🪛 GitHub Actions: MCP Server Stress Test
kai_mcp_solution_server/tests/test_multiple_integration.py

[error] 688-688: Test encountered parsing error: incident_id could not be parsed due to an error response from create_incident (invalid literal for int() with base 10: 'Error calling tool 'create_incident': ...'). Underlying cause: database integrity violation causing the tool call to fail.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)

Comment on lines +71 to +101
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!
# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
if curl -s http://localhost:8000/ > /dev/null 2>&1; then
echo "Server is ready!"
break
fi
if [ $i -eq 30 ]; then
echo "Server failed to start in 30 seconds"
kill $SERVER_PID || true
exit 1
fi
echo -n "."
sleep 1
done
# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?
# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
exit $TEST_RESULT
Contributor

🛠️ Refactor suggestion

Harden server lifecycle: fail-fast, trap, and wait on exit

Prevents orphaned background server and ensures proper failure propagation.

Apply this diff:

-        run: |
-          echo "Starting MCP server connected to PostgreSQL..."
-          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
-          SERVER_PID=$!
+        run: |
+          set -euo pipefail
+          echo "Starting MCP server connected to PostgreSQL..."
+          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
+          SERVER_PID=$!
+          trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
@@
-          kill $SERVER_PID || true
-
-          exit $TEST_RESULT
+          kill $SERVER_PID || true
+          wait $SERVER_PID || true
+          exit $TEST_RESULT
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!
# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
if curl -s http://localhost:8000/ > /dev/null 2>&1; then
echo "Server is ready!"
break
fi
if [ $i -eq 30 ]; then
echo "Server failed to start in 30 seconds"
kill $SERVER_PID || true
exit 1
fi
echo -n "."
sleep 1
done
# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?
# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
exit $TEST_RESULT
run: |
set -euo pipefail
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!
trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
if curl -s http://localhost:8000/ > /dev/null 2>&1; then
echo "Server is ready!"
break
fi
if [ $i -eq 30 ]; then
echo "Server failed to start in 30 seconds"
kill $SERVER_PID || true
exit 1
fi
echo -n "."
sleep 1
done
# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?
# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
wait $SERVER_PID || true
exit $TEST_RESULT
🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 71 to 101, the
workflow starts the MCP server in the background but doesn't fail-fast, trap
signals, or wait for the server process, which can leave orphaned processes or
hide failures; modify the script to: enable strict failure handling (e.g., set
-euo pipefail), register a trap on EXIT/INT/TERM that kills the background
server PID if still running, after launching the server save SERVER_PID and on
startup failure kill that PID and exit non‑zero, and when running tests use wait
$SERVER_PID to propagate the server's exit status (or capture and return
TEST_RESULT appropriately) so the job fails fast and cleans up the server in all
termination cases.

Comment on lines +104 to +109
- name: Check PostgreSQL connection count
if: always()
run: |
PGPASSWORD=kai_password psql -h localhost -U kai_user -d kai_test_db -c \
"SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
Contributor

🛠️ Refactor suggestion

Install psql before querying pg_stat_activity

Ubuntu runners may lack psql; this step can fail the job.

Add this step before “Check PostgreSQL connection count”:

      - name: Install PostgreSQL client
        run: |
          sudo apt-get update
          sudo apt-get install -y postgresql-client
🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 104 to 109, the
workflow runs psql directly which can fail on Ubuntu runners that don't have the
PostgreSQL client installed; add a step immediately before the "Check PostgreSQL
connection count" step that runs "sudo apt-get update" and then "sudo apt-get
install -y postgresql-client" to ensure the psql binary is available for the
subsequent query.

Comment on lines +446 to +457
parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
transport="http",
host=host,
port=port,
insecure=True,
server_path=None, # Not needed for external server
)
Contributor

🛠️ Refactor suggestion

Honor MCP_SERVER_URL scheme and scope SSL bypass; prevent TLS flakiness

Use scheme from MCP_SERVER_URL and apply SSL bypass only for HTTPS.

Apply this diff:

 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
@@
-        multiple_user_mcp_args = MCPClientArgs(
-            transport="http",
+        multiple_user_mcp_args = MCPClientArgs(
+            transport=scheme,
             host=host,
             port=port,
             insecure=True,
             server_path=None,  # Not needed for external server
         )
@@
-        async def client_task(client_id: str) -> None:
+        async def client_task(client_id: int) -> None:
             print(f"[Client {client_id}] starting")
-            apply_ssl_bypass()
-
-            client = Client(
-                transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-            )
+            transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+            cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+            with cm:
+                client = Client(transport=transport)

Add import at top (near other imports):

import contextlib

Also applies to: 473-480

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
446-457, the test currently hardcodes transport="http" and insecure=True; change
it to derive transport from the MCP_SERVER_URL scheme (use parsed.scheme or
default "http") and set insecure=True only when the scheme is "https" (otherwise
False) so SSL bypass is applied only for HTTPS; add "import contextlib" near the
other imports; apply the same scheme+insecure logic to the similar block at
lines 473-480.
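The scheme-derivation step can be sketched as follows (the port defaults are illustrative, not values mandated by the test suite):

```python
from urllib.parse import urlparse


def transport_from_url(url: str) -> tuple[str, str, int, bool]:
    # Derive scheme/host/port from an MCP_SERVER_URL-style value; the SSL
    # bypass (insecure) only makes sense for https endpoints.
    parsed = urlparse(url)
    scheme = (parsed.scheme or "http").lower()
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if scheme == "https" else 8000)
    return scheme, host, port, scheme == "https"


info = transport_from_url("https://mcp.example:8443/api")
```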

Comment on lines +499 to +505
"uri": f"file://test/file_{client_id}.java",
"message": f"Test issue for client {client_id}",
"ruleset_name": f"test-ruleset-{client_id % 10}", # Share some rulesets
"violation_name": f"test-violation-{client_id % 5}", # Share some violations
"violation_category": "potential",
"code_snip": "// test code",
"line_number": 42,
Contributor

⚠️ Potential issue

Make URIs unique per client to avoid DB integrity violations

Use per-client UUID in URIs to prevent unique-constraint collisions.

Apply this diff:

-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",

Also applies to: 522-529, 586-588

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
499-505 (and similarly at 522-529 and 586-588), the test uses identical file://
URIs per client which causes DB unique-constraint collisions; update the URI to
include a per-client unique identifier (e.g., a generated UUID or
client-specific UUID string) so each client gets a distinct URI (e.g.,
file://test/{client_uuid}/file_{client_id}.java or append ?uid={uuid}); ensure
the UUID is generated/available in the test setup and used consistently in all
three ranges mentioned to prevent integrity violations.
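A sketch of the per-client URI idea (unique_uri is an illustrative helper, not an existing fixture in the test setup):

```python
from uuid import uuid4


def unique_uri(client_id: int) -> str:
    # Embed a fresh UUID so concurrent clients can never collide on a
    # unique constraint over the URI column.
    return f"file://test/file_{client_id}_{uuid4().hex}.java"


a = unique_uri(1)
b = unique_uri(1)
```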

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
kai_mcp_solution_server/tests/test_multiple_integration.py (3)

460-471: Use asyncio.run to avoid event-loop lifecycle bugs.
Simpler and less flaky in threads.

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))

446-457: Honor MCP_SERVER_URL scheme; set insecure only for HTTPS.
Prevents HTTP/HTTPS mismatches and avoids unnecessary global SSL patching.

 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
 
 multiple_user_mcp_args = MCPClientArgs(
-    transport="http",
+    transport=scheme,
     host=host,
     port=port,
-    insecure=True,
+    insecure=(scheme == "https"),
     server_path=None,  # Not needed for external server
 )

473-481: Scope SSL bypass and use correct scheme; fix type annotation.
Avoids global monkeypatching and ensures HTTPS works; annotation must be int since you use modulo arithmetic.

-async def client_task(client_id: str) -> None:
+async def client_task(client_id: int) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    transport_url = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport_url)
🧹 Nitpick comments (5)
kai_mcp_solution_server/tests/test_multiple_integration.py (5)

1-11: Add contextlib import (needed for scoped SSL bypass).
Used below to conditionally apply the SSL bypass only for HTTPS.

 import asyncio
-import concurrent
 import concurrent.futures
 import datetime
 import json
 import os
 import unittest
 from uuid import uuid4
+import contextlib

2-3: Drop redundant import.
import concurrent is unnecessary since import concurrent.futures already brings concurrent into scope.

-import concurrent
 import concurrent.futures

621-623: Replace fixed sleep with readiness polling.
Reduces flakiness under load by waiting until metrics reflect acceptance (with timeout).

-# Wait a bit for the acceptance to be processed
-await asyncio.sleep(0.5)
+# Wait for acceptance to be reflected in metrics (timeout safeguards the loop)
+await wait_for_metric_change(client, client_id, initial_accepted, initial_pending, ruleset=f"test-ruleset-{client_id % 10}", violation=f"test-violation-{client_id % 5}", timeout=10.0)

Add this helper (outside the diffed range, e.g., near other helpers in this test module):

async def wait_for_metric_change(client: Client, client_id: int, initial_accepted: int, initial_pending: int, ruleset: str, violation: str, timeout: float = 10.0) -> None:
    start = asyncio.get_event_loop().time()
    last_err: Exception | None = None
    while asyncio.get_event_loop().time() - start < timeout:
        res = await client.session.call_tool(
            "get_success_rate",
            {"violation_ids": [{"ruleset_name": ruleset, "violation_name": violation}]},
        )
        if res.isError:
            last_err = RuntimeError(res.content[0].text)
            await asyncio.sleep(0.2)
            continue
        rate = json.loads(res.content[0].text)
        if isinstance(rate, list):
            rate = rate[0] if rate else {}
        if (rate.get("accepted_solutions", 0) - initial_accepted) >= 1 and (rate.get("pending_solutions", 0) - initial_pending) <= 0:
            return
        await asyncio.sleep(0.2)
    raise TimeoutError(f"[Client {client_id}] metrics did not reflect acceptance within {timeout}s: last_err={last_err}")

683-687: Probe server readiness before launching clients.
Avoids initial connection races against the external server.

 # External server should already be running
 # Allow configuring number of concurrent clients via environment variable
 NUM_TASKS = int(os.environ.get("NUM_CONCURRENT_CLIENTS", "30"))
 print(f"Testing with {NUM_TASKS} concurrent clients")
+await wait_for_http_server(host, port, timeout=30)

Add this helper (outside the diffed range):

async def wait_for_http_server(host: str, port: int, timeout: float = 30.0) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_err: Exception | None = None
    while loop.time() < deadline:
        try:
            reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except Exception as e:
            last_err = e
            await asyncio.sleep(0.2)
    raise TimeoutError(f"Server at {host}:{port} not ready after {timeout}s: {last_err}")

720-757: Wrap hint-session in scoped SSL bypass for HTTPS.
Keeps HTTPS behavior consistent with client tasks.

 print("Verifying hints were generated...")
-async with create_client(multiple_user_mcp_args) as hint_session:
-    await hint_session.initialize()
+cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+with cm:
+    async with create_client(multiple_user_mcp_args) as hint_session:
+        await hint_session.initialize()
 
-# Check hints for a few different violation combinations
-violations_to_check = [
+    # Check hints for a few different violation combinations
+    violations_to_check = [
         ("test-ruleset-0", "test-violation-0"),
         ("test-ruleset-1", "test-violation-1"),
         ("test-ruleset-2", "test-violation-2"),
-]
+    ]
 
-hints_found = 0
-for ruleset, violation in violations_to_check:
-    try:
-        hint_result = await hint_session.call_tool(
+    hints_found = 0
+    for ruleset, violation in violations_to_check:
+        try:
+            hint_result = await hint_session.call_tool(
                 "get_best_hint",
                 {
                     "ruleset_name": ruleset,
                     "violation_name": violation,
                 },
-        )
-        if hint_result and not hint_result.isError:
-            hints_found += 1
-            print(f"  ✓ Found hint for {ruleset}/{violation}")
-        else:
-            print(f"  ✗ No hint for {ruleset}/{violation}")
-    except Exception as e:
-        print(f"  ✗ Error checking hint for {ruleset}/{violation}: {e}")
+            )
+            if hint_result and not hint_result.isError:
+                hints_found += 1
+                print(f"  ✓ Found hint for {ruleset}/{violation}")
+            else:
+                print(f"  ✗ No hint for {ruleset}/{violation}")
+        except Exception as e:
+            print(f"  ✗ Error checking hint for {ruleset}/{violation}: {e}")
 
 if hints_found == 0:
     self.fail(
         "No hints were generated! Hint generation may not be working correctly."
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4ad87a5 and 1a0942c.

📒 Files selected for processing (1)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (4)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
  • GetBestHintResult (663-665)
  • SuccessRateMetric (721-727)
kai_mcp_solution_server/tests/mcp_client.py (1)
  • MCPClientArgs (247-256)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
  • MCPClientArgs (31-41)
  • create_client (136-151)
kai_mcp_solution_server/tests/ssl_utils.py (2)
  • apply_ssl_bypass (49-81)
  • apply_ssl_bypass (168-192)
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

514-516: Abstract raise to an inner function

(TRY301)


514-516: Create your own exception

(TRY002)


514-516: Avoid specifying long messages outside the exception class

(TRY003)


521-523: Create your own exception

(TRY002)


521-523: Avoid specifying long messages outside the exception class

(TRY003)


526-526: Use of assert detected

(S101)


554-556: Abstract raise to an inner function

(TRY301)


554-556: Create your own exception

(TRY002)


554-556: Avoid specifying long messages outside the exception class

(TRY003)


561-563: Create your own exception

(TRY002)


561-563: Avoid specifying long messages outside the exception class

(TRY003)


566-566: Use of assert detected

(S101)


590-590: Use of assert detected

(S101)


601-601: Use of assert detected

(S101)


604-604: Use of assert detected

(S101)


645-645: Use of assert detected

(S101)


657-657: Use of assert detected

(S101)


660-660: Use of assert detected

(S101)


667-667: Use of assert detected

(S101)


672-672: Use of assert detected

(S101)


706-706: Do not catch blind exception: Exception

(BLE001)


747-747: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)

Comment on lines +493 to +508
# Create an incident
incident_result = await client.session.call_tool(
    "create_incident",
    {
        "client_id": f"stress-test-{client_uuid}",
        "extended_incident": {
            "uri": f"file://test/file_{client_id}.java",
            "message": f"Test issue for client {client_id}",
            "ruleset_name": f"test-ruleset-{client_id % 10}",  # Share some rulesets
            "violation_name": f"test-violation-{client_id % 5}",  # Share some violations
            "violation_category": "potential",
            "code_snip": "// test code",
            "line_number": 42,
            "variables": {},
        },
    },

🛠️ Refactor suggestion

Make file URIs unique per client to prevent DB unique-constraint collisions.
Shareable rulesets/violations are fine; URIs should be distinct.

                         "extended_incident": {
-                            "uri": f"file://test/file_{client_id}.java",
+                            "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                             "message": f"Test issue for client {client_id}",
                             "ruleset_name": f"test-ruleset-{client_id % 10}",  # Share some rulesets
                             "violation_name": f"test-violation-{client_id % 5}",  # Share some violations
                             "violation_category": "potential",
                             "code_snip": "// test code",
                             "line_number": 42,
                             "variables": {},
                         },
@@
                             "before": [
                                 {
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                                     "content": "// original code",
                                 }
                             ],
                             "after": [
                                 {
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                                     "content": "// fixed code",
                                 }
                             ],
@@
                             "solution_file": {
-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",
                                 "content": "// fixed code",
                             },

Also applies to: 535-546, 612-617

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 493
to 508 (and also apply the same change at 535-546 and 612-617), the file URI
used for incidents is not unique per client which causes DB unique-constraint
collisions; change the "uri" value to include a client-unique identifier (for
example client_uuid or client_id) so each test incident uses a distinct file://
path (e.g., append client_uuid/client_id into the filename or path) while
leaving shared ruleset/violation names unchanged.

Comment on lines +568 to +586
# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
import json

success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")


🛠️ Refactor suggestion

Check success_rate tool-call errors and remove redundant import.
Prevents JSON parse on error payload; json is already imported at top.

 success_rate_result = await client.session.call_tool(
     "get_success_rate",
     {
         "violation_ids": [
             {
                 "ruleset_name": f"test-ruleset-{client_id % 10}",
                 "violation_name": f"test-violation-{client_id % 5}",
             }
         ]
     },
 )
-# Parse and verify success rate
-success_rate_text = success_rate_result.content[0].text
-import json
+if success_rate_result.isError:
+    error_msg = success_rate_result.content[0].text
+    raise Exception(f"[Client {client_id}] get_success_rate failed: {error_msg}")
+# Parse and verify success rate
+success_rate_text = success_rate_result.content[0].text
 
 success_rate = json.loads(success_rate_text)
 print(f"[Client {client_id}] got success rate: {success_rate}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
import json
success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")

# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
if success_rate_result.isError:
    error_msg = success_rate_result.content[0].text
    raise Exception(f"[Client {client_id}] get_success_rate failed: {error_msg}")
# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 568
to 586, the code unconditionally parses success_rate_result.content[0].text with
a local "import json" and may attempt to JSON-parse an error payload; remove the
redundant "import json" and first check the tool-call result for an error/failed
status (or presence of an error field) before parsing: if the call failed,
log/raise/assert with the error message instead of parsing, and only call
json.loads on the text when the result indicates success.

Comment on lines +624 to +637
# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)

🛠️ Refactor suggestion

Also check the second success_rate call for errors.
Matches the first call; avoids parsing error payloads.

 success_rate_result2 = await client.session.call_tool(
     "get_success_rate",
     {
         "violation_ids": [
             {
                 "ruleset_name": f"test-ruleset-{client_id % 10}",
                 "violation_name": f"test-violation-{client_id % 5}",
             }
         ]
     },
 )
-success_rate_text2 = success_rate_result2.content[0].text
+if success_rate_result2.isError:
+    error_msg = success_rate_result2.content[0].text
+    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {error_msg}")
+success_rate_text2 = success_rate_result2.content[0].text
 success_rate2 = json.loads(success_rate_text2)
📝 Committable suggestion


Suggested change
# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)

# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
if success_rate_result2.isError:
    error_msg = success_rate_result2.content[0].text
    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {error_msg}")
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 624
to 637, the second get_success_rate call result isn't being checked for an error
before parsing; add the same safety/assertions used on the first call: assert
success_rate_result2.success is True (or raise a clear assertion with result
payload), assert content exists and has at least one item, and assert the first
content item contains a text field before calling json.loads on it so we avoid
parsing error payloads.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
Signed-off-by: Fabian von Feilitzsch <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kai_mcp_solution_server/tests/mcp_loader_script.py (1)

105-114: STDIO client: guard missing server_path.

Prevent passing cwd=None into StdioServerParameters.

 async def create_stdio_client(args: MCPClientArgs) -> AsyncIterator[ClientSession]:
-    print(f"Using server path: {args.server_path}")
+    if not args.server_path:
+        raise ValueError("server_path is required for stdio transport")
+    print(f"Using server path: {args.server_path}")
♻️ Duplicate comments (9)
kai_mcp_solution_server/tests/mcp_client.py (1)

305-333: Guard stdio path: avoid Path(None) TypeError.

When transport=stdio and server_path is None, Path(None) raises. Add an early check.

-        else:  # stdio transport
-            print(f"Using server path: {args.server_path}")
+        else:  # stdio transport
+            if args.server_path is None:
+                raise ValueError("server_path is required when using --transport stdio")
+            print(f"Using server path: {args.server_path}")
             logger.debug(
                 f"Initializing STDIO transport with server path: {args.server_path}"
             )
@@
-            server_path = Path(args.server_path)
+            server_path = Path(args.server_path)
.github/workflows/stress-test-mcp-server.yml (3)

69-69: Use inputs context for dispatch parameter.

github.event.inputs is null outside workflow_dispatch; inputs works across events.

-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}

71-101: Harden server lifecycle: fail-fast, trap, and wait.

Prevents orphaned server and ensures errors propagate reliably under load.

-        run: |
-          echo "Starting MCP server connected to PostgreSQL..."
-          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
-          SERVER_PID=$!
+        run: |
+          set -euo pipefail
+          echo "Starting MCP server connected to PostgreSQL..."
+          uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
+          SERVER_PID=$!
+          trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
@@
-          kill $SERVER_PID || true
-
-          exit $TEST_RESULT
+          kill $SERVER_PID || true
+          wait $SERVER_PID || true
+          exit $TEST_RESULT

104-109: Install psql before querying pg_stat_activity.

Ubuntu runners may not have the client by default.

+      - name: Install PostgreSQL client
+        if: always()
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y postgresql-client
       - name: Check PostgreSQL connection count
         if: always()
         run: |
           PGPASSWORD=kai_password psql -h localhost -U kai_user -d kai_test_db -c \
             "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
kai_mcp_solution_server/tests/test_multiple_integration.py (5)

401-413: Use asyncio.run to avoid loop lifecycle bugs in threads.

Simpler, less flaky than manual loop creation/closing.

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))

440-448: Make file URIs unique per client to avoid collisions in shared DBs.

Add client_uuid so URIs don’t collide across concurrent runs/clients.

-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                    "uri": f"file://test/file_{client_id}.java",
+                                    "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-                                "uri": f"file://test/file_{client_id}.java",
+                                "uri": f"file://test/file_{client_id}_{client_uuid}.java",

Also applies to: 477-485, 483-485, 555-557


414-421: Activate SSL bypass and use parsed scheme in transport URL.

Calling apply_ssl_bypass() without entering the context has no effect; also the URL is forced to HTTP.

-async def client_task(client_id: str) -> None:
+async def client_task(client_id: int) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    transport_url = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport_url)

509-527: Check tool-call result before JSON parse; remove inner import.

Avoid parsing error payloads; reuse top-level json import.

 success_rate_result = await client.session.call_tool(
@@
 )
-# Parse and verify success rate
-success_rate_text = success_rate_result.content[0].text
-import json
+if getattr(success_rate_result, "isError", False):
+    msg = success_rate_result.content[0].text if success_rate_result.content else "unknown error"
+    raise Exception(f"[Client {client_id}] get_success_rate failed: {msg}")
+success_rate_text = success_rate_result.content[0].text
 success_rate = json.loads(success_rate_text)

565-579: Also guard the second success-rate call.

 success_rate_result2 = await client.session.call_tool(
@@
 )
-success_rate_text2 = success_rate_result2.content[0].text
+if getattr(success_rate_result2, "isError", False):
+    msg = success_rate_result2.content[0].text if success_rate_result2.content else "unknown error"
+    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {msg}")
+success_rate_text2 = success_rate_result2.content[0].text
 success_rate2 = json.loads(success_rate_text2)
🧹 Nitpick comments (13)
.gitignore (1)

209-213: Also ignore SQLite sidecar files (-wal/-shm).

To keep repos clean after local runs, consider ignoring SQLite’s write-ahead log artifacts too.

 # Test database files
 tests/test_*.db
 kai_mcp_solution_server/tests/test_*.db
 *.db
+*.db-wal
+*.db-shm
kai_mcp_solution_server/tests/mcp_client.py (1)

351-361: Increase client run timeout to reduce CI flakes under load.

15s can be tight on busy runners; bumping improves stress-test stability.

-                await asyncio.wait_for(run_with_timeout(), timeout=15.0)
+                await asyncio.wait_for(run_with_timeout(), timeout=45.0)
kai_mcp_solution_server/tests/ssl_utils.py (2)

88-93: Idempotence flag: don’t mark as patched if we didn’t patch.

If we early-return, leave httpx_patched False so restore() doesn’t attempt to revert unknown state.

-            if hasattr(httpx.AsyncClient.__init__, "_ssl_bypass_patched"):
-                logger.debug("httpx already patched, skipping")
-                self.httpx_patched = True
-                return
+            if hasattr(httpx.AsyncClient.__init__, "_ssl_bypass_patched"):
+                logger.debug("httpx already patched, skipping")
+                self.httpx_patched = False
+                return

106-109: Preserve function metadata on wrappers.

Use functools.wraps so debuggers and introspection don’t degrade.

+            import functools
@@
-            patched_client_init._ssl_bypass_patched = True
-            patched_async_client_init._ssl_bypass_patched = True
+            patched_client_init = functools.wraps(self.original_httpx_client_init)(patched_client_init)
+            patched_async_client_init = functools.wraps(self.original_httpx_async_client_init)(patched_async_client_init)
+            patched_client_init._ssl_bypass_patched = True
+            patched_async_client_init._ssl_bypass_patched = True

Optional: narrow warning suppression (lines 60–63) to specific SSL categories instead of all Warning to avoid masking unrelated issues.

kai_mcp_solution_server/Makefile (1)

95-100: Avoid inlining secrets in process args.

If KAI_LLM_PARAMS may contain credentials, prefer pass-through from environment to avoid exposure via ps/history.

-        -e MOUNT_PATH='$(MOUNT_PATH)' \
-        -e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-        -e KAI_DB_DSN='$(KAI_DB_DSN)' \
+        -e MOUNT_PATH \
+        -e KAI_LLM_PARAMS \
+        -e KAI_DB_DSN \

If you need non-env defaults, consider --env-file or mounting a config file.

kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (3)

113-127: Detect backend via URL, and add sqlite timeout.

String contains("sqlite") can produce false positives; use make_url. Also add aiosqlite connect timeout to reduce “database is locked” under load.

-async def get_async_engine(url: URL | str) -> AsyncEngine:
-    # Convert to string if URL object
-    url_str = str(url)
-
-    # Configure connection pool based on database type
-    if "sqlite" in url_str:
+async def get_async_engine(url: URL | str) -> AsyncEngine:
+    from sqlalchemy.engine.url import make_url
+    backend = make_url(str(url)).get_backend_name()
+
+    # Configure connection pool based on database type
+    if backend == "sqlite":
         # SQLite needs special handling due to its file-based nature
         # and limited write concurrency (even with WAL mode)
         engine = create_async_engine(
             url,
-            connect_args={"check_same_thread": False},
+            connect_args={"check_same_thread": False, "timeout": 30},
             poolclass=StaticPool,  # Single shared connection for SQLite
             echo_pool=False,  # Set to True for debugging connection pool
         )

Note: If you expect concurrent writes on sqlite, consider NullPool + WAL mode, or prefer Postgres in stress paths.


100-105: Use logger instead of stderr print for DDL ops.

Keeps diagnostics consistent with server logging.

-        print("Dropping all tables", file=sys.stderr)
+        import logging
+        logging.getLogger(__name__).warning("Dropping all tables")

113-141: Upstream API race mitigation (follow-up).

Most reported errors were httpx/OpenAI connect/TLS issues under concurrency. Beyond DB pools, ensure:

  • A single shared Async OpenAI client per process with tuned httpx.Limits (e.g., max_connections ~ 100, max_keepalive ~ 20), and explicit timeouts.
  • Bounded concurrency around the LLM call (semaphore) and retry with jitter on connect/read timeouts.
  • Idempotency guards so retries don’t duplicate work.

I can draft a patch wiring a shared client and a retry/backoff policy if you want.
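The gating-plus-retry idea above can be sketched generically, without any LLM specifics. The semaphore size, the `ConnectionError` stand-in for httpx connect/read failures, and the backoff constants below are illustrative assumptions, not the server's actual configuration:

```python
import asyncio
import random

# Process-wide bound on concurrent outbound calls (size is an assumption).
LLM_SEMAPHORE = asyncio.Semaphore(8)

async def call_with_retry(coro_fn, retries: int = 3, base_delay: float = 0.2):
    """Run coro_fn under the semaphore, retrying with jittered backoff."""
    last_err: Exception | None = None
    for attempt in range(retries):
        async with LLM_SEMAPHORE:
            try:
                return await coro_fn()
            except ConnectionError as e:  # stand-in for httpx connect/TLS errors
                last_err = e
        # Exponential backoff with jitter, released outside the semaphore
        # so waiting does not hold a concurrency slot.
        await asyncio.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    raise last_err

async def main() -> int:
    calls = {"n": 0}

    async def flaky():
        # Fails twice, then succeeds -- simulates transient connect timeouts.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated connect timeout")
        return 42

    return await call_with_retry(flaky)

print(asyncio.run(main()))  # prints 42 after two retried failures
```

In the real server, `coro_fn` would wrap the model's `ainvoke()` and the semaphore would live on the app context so every request shares one bound.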

kai_mcp_solution_server/tests/test_multiple_integration.py (3)

2-3: Remove redundant import.

import concurrent is unused; import concurrent.futures is sufficient.

-import concurrent
 import concurrent.futures

567-604: Prefer unittest assertions over bare asserts (optional).

For clearer failure output, replace plain assert with self.assert* equivalents.


651-655: Drop the fixed 10s sleep.

All futures are awaited; this extra wait extends CI time without benefit.

-        await asyncio.sleep(10)  # wait a moment for all output to be printed
+        # No-op: results already collected; avoid fixed delays.
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)

111-115: Remove unused lock or repurpose for LLM gating.

self.lock is no longer used; either delete it or replace with a concurrency gate for outbound LLM calls (see below).

 class KaiSolutionServerContext:
     def __init__(self, settings: SolutionServerSettings) -> None:
         self.settings = settings
-        self.lock = asyncio.Lock()

260-264: Re-removal of lock: add DB-level get-or-create safety.

With the lock gone, concurrent create_incident calls can race on first-time DBViolation creation. Ensure a unique constraint exists on (ruleset_name, violation_name) and handle IntegrityError with a re-query.

Would you like a small patch that wraps violation creation in a retry-on-conflict block?
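As a rough sketch of that retry-on-conflict pattern, using stdlib `sqlite3` in place of the async SQLAlchemy session (the table and column names here are illustrative): try the INSERT first, and on a unique-constraint violation re-query for the row a concurrent writer just created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE violations ("
    " id INTEGER PRIMARY KEY,"
    " ruleset_name TEXT NOT NULL,"
    " violation_name TEXT NOT NULL,"
    " UNIQUE (ruleset_name, violation_name))"
)

def get_or_create_violation(ruleset: str, violation: str) -> int:
    try:
        cur = conn.execute(
            "INSERT INTO violations (ruleset_name, violation_name) VALUES (?, ?)",
            (ruleset, violation),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another writer won the race; re-query for the existing row.
        row = conn.execute(
            "SELECT id FROM violations"
            " WHERE ruleset_name = ? AND violation_name = ?",
            (ruleset, violation),
        ).fetchone()
        return row[0]

a = get_or_create_violation("test-ruleset-0", "test-violation-0")
b = get_or_create_violation("test-ruleset-0", "test-violation-0")
print(a == b)  # True: the second call resolves to the same row
```

With SQLAlchemy the same shape applies: catch `IntegrityError`, roll back the failed transaction, and re-select inside a fresh one.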

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60a0052 and f20601b.

📒 Files selected for processing (9)
  • .github/workflows/stress-test-mcp-server.yml (1 hunks)
  • .gitignore (1 hunks)
  • kai_mcp_solution_server/Makefile (5 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2 hunks)
  • kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (10 hunks)
  • kai_mcp_solution_server/tests/mcp_client.py (1 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
  • kai_mcp_solution_server/tests/ssl_utils.py (2 hunks)
  • kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml

[medium] 66-67: Basic Auth Credentials

(CKV_SECRET_4)

🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py

455-457: Abstract raise to an inner function

(TRY301)


455-457: Create your own exception

(TRY002)


455-457: Avoid specifying long messages outside the exception class

(TRY003)


462-464: Create your own exception

(TRY002)


462-464: Avoid specifying long messages outside the exception class

(TRY003)


467-467: Use of assert detected

(S101)


495-497: Abstract raise to an inner function

(TRY301)


495-497: Create your own exception

(TRY002)


495-497: Avoid specifying long messages outside the exception class

(TRY003)


502-504: Create your own exception

(TRY002)


502-504: Avoid specifying long messages outside the exception class

(TRY003)


507-507: Use of assert detected

(S101)


531-531: Use of assert detected

(S101)


542-542: Use of assert detected

(S101)


545-545: Use of assert detected

(S101)


586-586: Use of assert detected

(S101)


598-598: Use of assert detected

(S101)


601-601: Use of assert detected

(S101)


608-608: Use of assert detected

(S101)


613-613: Use of assert detected

(S101)


647-647: Do not catch blind exception: Exception

(BLE001)


688-688: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (1)

133-140: Pool sizing looks reasonable; pre_ping enabled.

Good choices for high fan-out workloads. Ensure server disposes the engine on shutdown to release connections.

If not already, verify server shutdown calls engine.dispose() after async_engine.close() to avoid lingering connections in tests.

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (1)

386-394: Wrapper changes look good.

Directly calling the underlying functions without the app lock aligns with DB-managed concurrency.

Also applies to: 652-656, 714-718, 806-809, 893-897, 924-928

Comment on lines +242 to +243
@echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"


⚠️ Potential issue

Fix corrupted help text.

User-facing help prints a garbled example. Replace with a clean domain.

-    @echo "  make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"
-    
+    @echo "  make test-http HOST=api.example.com PORT=443 BEARER_TOKEN='token'"
📝 Committable suggestion


Suggested change
@echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"
@echo " make test-http HOST=api.example.com PORT=443 BEARER_TOKEN='token'"
🤖 Prompt for AI Agents
In kai_mcp_solution_server/Makefile around lines 242 to 243, the help text
contains corrupted/garbled example usage ("make test-http HOST=api.exayeah,
there are a lot of little ones...") — replace that line with a clean, concise
example such as: echo a correct make invocation using a normal domain, PORT and
BEARER_TOKEN (for example: make test-http HOST=api.example.com PORT=443
BEARER_TOKEN='token') so the printed help shows a clear, readable usage example.

Comment on lines 116 to +124
async def create(self) -> None:
    self.engine = await get_async_engine(
        self.settings.db_dsn, self.settings.drop_all
    )
    from kai_mcp_solution_server.db.dao import Base

    self.engine = await get_async_engine(self.settings.db_dsn)

    # Ensure tables exist (safe - only creates if not already there)
    async with self.engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)


🛠️ Refactor suggestion

Bound outbound LLM concurrency to prevent connection stampedes (root cause of #861).

Unthrottled ainvoke() calls (spawned from accept_file → generate_hint_v3) can overwhelm connection pools and trigger httpx/httpcore TLS/connection errors under load. Add a process-wide semaphore and use it in generators.

 async def create(self) -> None:
@@
-        self.engine = await get_async_engine(self.settings.db_dsn)
+        self.engine = await get_async_engine(self.settings.db_dsn)
@@
         self.session_maker = async_sessionmaker(
             bind=self.engine, expire_on_commit=False
         )
         self.model: BaseChatModel
+        # Limit concurrent LLM calls (configurable via env)
+        self.llm_semaphore = asyncio.Semaphore(
+            int(os.getenv("KAI_LLM_MAX_CONCURRENCY", "8"))
+        )

Also harden model init with retries/timeouts:

-        else:
-            self.model = init_chat_model(**self.settings.llm_params)
+        else:
+            params = {
+                "max_retries": int(os.getenv("KAI_LLM_MAX_RETRIES", "2")),
+                "timeout": float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+                **self.settings.llm_params,
+            }
+            self.model = init_chat_model(**params)

Apply the gate and timeout at call sites (outside this range):

# In generate_hint_v1/v2/v3 where response = await kai_ctx.model.ainvoke(prompt)
async with kai_ctx.llm_semaphore:
    response = await asyncio.wait_for(
        kai_ctx.model.ainvoke(prompt),
        timeout=float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
    )

Comment on lines +164 to 169
        # Clean up database connections when client disconnects
        if "ctx" in locals() and hasattr(ctx, "engine"):
            log("Disposing database engine...")
            await ctx.engine.dispose()
            log("Database engine disposed")


🛠️ Refactor suggestion

Graceful shutdown: await background hint tasks before disposing engine.

Disposing the engine while generate_hint_v* tasks are in-flight risks DB errors. Track tasks and drain/cancel with a timeout before disposal.

 finally:
-        # Clean up database connections when client disconnects
-        if "ctx" in locals() and hasattr(ctx, "engine"):
-            log("Disposing database engine...")
-            await ctx.engine.dispose()
-            log("Database engine disposed")
+        if "ctx" in locals():
+            # Wait briefly for background tasks to finish
+            if hasattr(ctx, "bg_tasks") and ctx.bg_tasks:
+                log(f"Draining {len(ctx.bg_tasks)} background tasks...")
+                done, pending = await asyncio.wait(
+                    ctx.bg_tasks, timeout=float(os.getenv("KAI_BG_DRAIN_TIMEOUT", "15"))
+                )
+                for t in pending:
+                    t.cancel()
+            if hasattr(ctx, "engine"):
+                log("Disposing database engine...")
+                await ctx.engine.dispose()
+                log("Database engine disposed")

Outside this range (supporting changes):

# In KaiSolutionServerContext.__init__
self.bg_tasks: set[asyncio.Task] = set()

# In accept_file after create_task(...)
task = asyncio.create_task(generate_hint_v3(kai_ctx, client_id))
kai_ctx.bg_tasks.add(task)
task.add_done_callback(kai_ctx.bg_tasks.discard)
🤖 Prompt for AI Agents
In kai_mcp_solution_server/src/kai_mcp_solution_server/server.py around lines
164 to 169, before disposing ctx.engine you must drain or cancel background hint
tasks tracked on ctx.bg_tasks to avoid DB use during shutdown: check if
ctx.bg_tasks is non-empty, await their completion with a configurable timeout
(e.g., use asyncio.wait with a timeout), cancel any tasks still pending after
the timeout, await them again to let cancellations propagate, and handle/ignore
asyncio.CancelledError or other exceptions while logging failures; only after
pending tasks are finished or cancelled should you call await
ctx.engine.dispose().

Comment on lines 62 to 69
            read, write, get_session_id = await stack.enter_async_context(
                streamablehttp_client(server_url, insecure=args.insecure)
                streamablehttp_client(server_url)
            )
            logger.debug("Streamable HTTP client connection established")
            session = await stack.enter_async_context(
                ClientSession(read, write, get_session_id)
            )
            # Note: get_session_id is for session management, not passed to ClientSession
            session = await stack.enter_async_context(ClientSession(read, write))
            logger.debug("MCP ClientSession initialized")


🛠️ Refactor suggestion

HTTP client: validate required args and default mount_path.

If port/mount_path are None, server_url becomes malformed. Add minimal validation/defaults.

 async def create_http_client(args: MCPClientArgs) -> AsyncIterator[ClientSession]:
     if args.host is None:
         raise ValueError("Host must be specified for HTTP transport")
+    if args.port is None:
+        raise ValueError("Port must be specified for HTTP transport")
+    if args.mount_path is None:
+        args.mount_path = "/"
@@
-            read, write, get_session_id = await stack.enter_async_context(
+            read, write, get_session_id = await stack.enter_async_context(
                 streamablehttp_client(server_url)
             )
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/mcp_loader_script.py around lines 62 to 69, the
code builds server_url without validating port and mount_path which can produce
malformed URLs when those values are None; ensure port and mount_path are
validated and defaulted before constructing server_url: if mount_path is None or
empty set it to "/", trim leading/trailing slashes to avoid double slashes when
concatenating, and if port is None set it to 80 for "http" or 443 for "https"
(or raise ValueError for unknown scheme); also validate types (port int >=1) and
raise a clear exception when invalid so the subsequent ClientSession setup
receives a well-formed URL.

Comment on lines +384 to +399
# External server mode - parse URL to get host and port
from urllib.parse import urlparse

parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
    transport="http",
    host=host,
    port=port,
    insecure=True,
    server_path=None,  # Not needed for external server
)
# Don't set KAI_LLM_PARAMS for external server - it should already be configured

🛠️ Refactor suggestion

Honor MCP_SERVER_URL scheme and scope SSL bypass.

Hard-coding HTTP ignores HTTPS endpoints and makes the bypass ineffective. Parse the scheme, use it in the transport, and only apply the SSL bypass for HTTPS.

 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
@@
 multiple_user_mcp_args = MCPClientArgs(
-    transport="http",
+    transport=scheme,
     host=host,
     port=port,
-    insecure=True,
+    insecure=(scheme == "https"),
     server_path=None,  # Not needed for external server
 )

Add import (outside range):

import contextlib

@JonahSussman JonahSussman changed the title 🐛 Fix another potential race condition 🐛 Fix DB connection pool getting exhausted Sep 9, 2025
Signed-off-by: JonahSussman <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)

306-307: Early guard for stdio transport: good fix; addresses prior review

This resolves the previous TypeError risk when server_path was None for stdio. Minor: Ruff hints (TRY003/TRY301) flag this raise. If you want to silence them without changing flow, add per-line ignores.

Apply inline ignores:

-            if args.server_path is None:
-                raise ValueError("server_path is required when using --transport stdio")
+            if args.server_path is None:
+                raise ValueError(
+                    "server_path is required when using --transport stdio"
+                )  # noqa: TRY003,TRY301
🧹 Nitpick comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)

198-199: Fix printf-style placeholders in print statements

print("...", result) won’t interpolate %s. Use f-strings for readable output.

-        print("get_best_hint tool call completed, result: %s", result)
+        print(f"get_best_hint tool call completed, result: {result}")
-        print("get_success_rate resource call completed, result: %s", result)
+        print(f"get_success_rate resource call completed, result: {result}")

Also applies to: 229-229

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f20601b and 76df1f1.

📒 Files selected for processing (2)
  • kai_mcp_solution_server/tests/mcp_client.py (2 hunks)
  • kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • kai_mcp_solution_server/tests/mcp_loader_script.py
🧰 Additional context used
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/mcp_client.py

307-307: Abstract raise to an inner function

(TRY301)


307-307: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)

251-251: Optional server_path type change: LGTM

Making server_path optional on the model aligns with external-server (HTTP) mode and avoids forcing an irrelevant parameter. No issues found.


Development

Successfully merging this pull request may close these issues.

DB Connection pool is getting exhausted
