🐛 Fix DB connection pool getting exhausted #862
Conversation
Signed-off-by: JonahSussman <[email protected]>
Walkthrough

Refactors DB engine creation and lifecycle, removes the runtime drop-all setting, replaces per-call locks with transactional sessions, adds explicit table-management helpers, introduces a GitHub Actions stress-test workflow and Makefile stress/podman targets, and adds a high-concurrency external-server integration test plus supporting test-tooling tweaks.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Server
    participant DB
    participant LLM as LLM/HintGen
    rect rgb(240,248,255)
    note right of Server: Startup: create engine & ensure tables
    Server->>DB: get_async_engine(db_dsn)
    Server->>DB: ensure_tables_exist()
    end
    Client->>Server: create_incident / create_solution
    Server->>DB: session.begin() { insert }
    DB-->>Server: OK (ids)
    Server-->>Client: IDs
    Client->>Server: accept_file
    Server->>DB: session.begin() { update status }
    alt all client's files accepted/modified
    Server--)LLM: spawn async generate_hint_v3()
    end
    Server-->>Client: OK
    rect rgb(245,240,255)
    note right of LLM: Async hint task uses DB read/write
    LLM->>DB: read code + diffs
    LLM-->>Server: produced hint
    Server->>DB: session.begin() { insert hint }
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks: ✅ 3 passed, ❌ 2 warnings
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
51-55: Validate port and normalize mount_path to avoid malformed URLs ("None" in URL).

If `args.port` or `args.mount_path` is unset, `server_url` becomes invalid.

```diff
-    transport = ""
-    if not args.host.startswith("http"):
-        transport = "http://"
-    server_url = f"{transport}{args.host}:{args.port}{args.mount_path}"
+    transport = ""
+    if args.port is None:
+        raise ValueError("Port must be specified for HTTP transport")
+    mount_path = args.mount_path or ""
+    if not (args.host.startswith("http://") or args.host.startswith("https://")):
+        transport = "http://"
+    server_url = f"{transport}{args.host}:{args.port}{mount_path}"
```
73-101: Re-raise after logging to prevent "generator didn't yield" from asynccontextmanager; also print errors to stderr.

Catching and not re-raising inside an `@asynccontextmanager` leads to a confusing `RuntimeError: generator didn't yield`. Also, printing tracebacks to stdout can pollute test output.

```diff
-            logger.error("HTTP transport error: %s", str(e), exc_info=True)
-            print(f"x Error with HTTP transport: {traceback.format_exc()}")
+            logger.error("HTTP transport error: %s", str(e), exc_info=True)
+            print(f"x Error with HTTP transport: {traceback.format_exc()}", file=sys.stderr)
@@
-            print(f"! Make sure the server is running at {server_url}")
+            print(f"! Make sure the server is running at {server_url}", file=sys.stderr)
+            raise
```

kai_mcp_solution_server/tests/test_multiple_integration.py (1)
1-542: Fix remaining lint errors before verification

- In tests/mcp_loader_script.py: remove blind `except Exception` blocks (BLE001), replace generic `Exception` raises with custom exception classes (TRY002), and shorten long literal messages by defining them within exception types (TRY003).
- In tests/test_multiple_integration.py: remove unused locals (`SOLUTION_FOR_INCIDENT_A_ID`, `multiple_user_mcp_args`, `ssl_patch`) (F841), drop the `f` prefix on static strings (F541), replace bare `assert` with explicit checks (S101), avoid blind `except Exception` clauses (BLE001), shorten long runtime-error messages (TRY003), and use `sys.executable` instead of hardcoded `"python"` in subprocess calls (S607).
- After these fixes, re-run `ruff check kai_mcp_solution_server/tests/...` and the multi-user pytest.
🧹 Nitpick comments (12)
kai_mcp_solution_server/tests/test_multiple_integration.py (12)
2-3: Drop redundant base package import.

`import concurrent` is unnecessary alongside `import concurrent.futures`.

```diff
-import concurrent
 import concurrent.futures
```
36-43: Use filename-safe timestamps.

`datetime.now().isoformat()` includes `:`, which breaks filenames on Windows. Use a safe format.

```diff
-    timestamp = datetime.datetime.now().isoformat()
+    timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
```
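A quick standard-library illustration of why the suggested format is filename-safe (the fixed datetime value is just for demonstration):

```python
import datetime

# A fixed value so the output is deterministic
now = datetime.datetime(2024, 5, 1, 13, 30, 59)

# isoformat() embeds ':' separators, which are invalid in Windows filenames
iso = now.isoformat()
print(iso)  # 2024-05-01T13:30:59

# A compact strftime pattern keeps only filename-safe characters
safe = now.strftime("%Y%m%dT%H%M%S")
print(safe)  # 20240501T133059
```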
84-95: Prefer os.path.join and explicit UTF-8 for fixtures.

Reduces path bugs across platforms and avoids default-encoding surprises.

```diff
-        expected_tool_schemas = json.loads(
-            open(self.test_data_dir + "/expected_tool_schemas.json").read()
-        )
+        expected_path = os.path.join(self.test_data_dir, "expected_tool_schemas.json")
+        with open(expected_path, "r", encoding="utf-8") as f:
+            expected_tool_schemas = json.load(f)
```
305-306: Fix log message to reference the correct ruleset/violation (B variant).

The current print uses A-variant names in the B flow.

```diff
-    print(f"Success rate of {RULESET_NAME_A}/{VIOLATION_NAME_A}: {metric}")
+    print(f"Success rate of {RULESET_NAME_B}/{VIOLATION_NAME_B}: {metric}")
```
340-341: Fix log message to reference the correct ruleset/violation (B variant).

```diff
-    print(
-        f"Success rate of {RULESET_NAME_A}/{VIOLATION_NAME_A} after accepting file: {metric}"
-    )
+    print(
+        f"Success rate of {RULESET_NAME_B}/{VIOLATION_NAME_B} after accepting file: {metric}"
+    )
```
420-422: Resolve ruff F841 in skipped test (unused variable).

Avoid unused assignments even in skipped tests.

```diff
-    SOLUTION_FOR_INCIDENT_A_ID = int(
-        create_solution_for_incident_a.model_dump()["content"][0]["text"]
-    )
+    _ = int(create_solution_for_incident_a.model_dump()["content"][0]["text"])
```
437-439: Fix f-string without placeholders (ruff F541).

```diff
-    "responses": [
-        f"You should add a smiley face to the file.",
-    ],
+    "responses": [
+        "You should add a smiley face to the file.",
+    ],
```
452-459: Remove unused `poll_process` helper or wire it in.

Currently dead code. Recommend removal to reduce noise.

```diff
-    def poll_process(process: subprocess.Popen) -> None:
-        # Check if the process has exited early
-        if process.poll() is not None:
-            output = process.stdout.read() if process.stdout else b""
-            raise RuntimeError(
-                f"HTTP server process exited prematurely. Output: {output.decode(errors='replace')}"
-            )
```
511-512: Wait for server readiness rather than sleeping a fixed second.

Avoid flakiness by polling the port.

```diff
-    await asyncio.sleep(1)  # give the server a second to start
+    # Wait up to 10s for server to accept connections on 8087
+    await asyncio.wait_for(asyncio.to_thread(_wait_for_port, "127.0.0.1", 8087, 10.0), 12.0)
```

Add this helper in the file (near other helpers). Note it runs in a worker thread via `asyncio.to_thread`, so it uses `time.monotonic()` rather than the event loop clock, which is not available off the loop's thread:

```python
import socket
import time

def _wait_for_port(host: str, port: int, timeout: float) -> None:
    deadline = time.monotonic() + timeout
    while True:
        with socket.socket() as s:
            s.settimeout(0.5)
            try:
                s.connect((host, port))
                return
            except OSError:
                pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Timed out waiting for {host}:{port}")
```
513-521: If you want "multi-user", increase concurrency beyond 1.

Currently `NUM_TASKS = 1`. Consider bumping to at least 5 to exercise concurrency.

```diff
-    NUM_TASKS = 1
+    NUM_TASKS = 5
```
537-542: Guard cleanup in case startup fails before thread assignment.

Prevents `UnboundLocalError` on `stream_thread`.

```diff
-        self.http_server_process.terminate()
-        self.http_server_process.wait()
-        print("Server process terminated.")
-        stream_thread.join()
+        if getattr(self, "http_server_process", None):
+            self.http_server_process.terminate()
+            self.http_server_process.wait()
+            print("Server process terminated.")
+        if "stream_thread" in locals():
+            stream_thread.join()
```
491-491: Subprocess security notice + avoid broad except in futures loop.

Bandit warnings are acceptable in tests, but consider narrowing exception types in the futures loop for better diagnostics.

Also applies to: 532-532
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
- kai_mcp_solution_server/tests/test_multiple_integration.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (4)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
- CallToolResult (22-24)

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
- GetBestHintResult (669-671)
- SuccessRateMetric (728-734)

kai_mcp_solution_server/tests/mcp_loader_script.py (2)
- MCPClientArgs (31-41)
- create_client (136-151)

kai/jsonrpc/util.py (1)
- model_dump (96-97)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/test_multiple_integration.py
[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.
[failure] 425-425: ruff(F841)
[new] Local variable multiple_user_mcp_args is assigned to but never used
[failure] 475-475: ruff(F841)
[new] Local variable ssl_patch is assigned to but never used
[failure] 420-420: ruff(F841)
[new] Local variable SOLUTION_FOR_INCIDENT_A_ID is assigned to but never used
[notice] 491-491: bandit(B607)
[new] Starting a process with a partial executable path
[notice] 491-491: bandit(B603)
[new] subprocess call - check for execution of untrusted input.
[failure] 437-437: ruff(F541)
[new] f-string without any placeholders
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py
420-420: Local variable SOLUTION_FOR_INCIDENT_A_ID is assigned to but never used
Remove assignment to unused variable SOLUTION_FOR_INCIDENT_A_ID
(F841)
425-425: Local variable multiple_user_mcp_args is assigned to but never used
Remove assignment to unused variable multiple_user_mcp_args
(F841)
437-437: f-string without any placeholders
Remove extraneous f prefix
(F541)
444-444: Use of assert detected
(S101)
447-447: Do not catch blind exception: Exception
(BLE001)
456-458: Avoid specifying long messages outside the exception class
(TRY003)
475-475: Local variable ssl_patch is assigned to but never used
Remove assignment to unused variable ssl_patch
(F841)
492-500: Starting a process with a partial executable path
(S607)
532-532: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
```python
        self.http_server_process = subprocess.Popen(
            [
                "python",
                "-m",
                "kai_mcp_solution_server",
                "--transport",
                "streamable-http",
                "--port",
                "8087",
            ],
```
🛠️ Refactor suggestion
Use sys.executable instead of bare "python" (Bandit B607/B603).
More robust across environments and virtualenvs.
```diff
-        self.http_server_process = subprocess.Popen(
+        self.http_server_process = subprocess.Popen(
             [
-                "python",
+                sys.executable,
                 "-m",
                 "kai_mcp_solution_server",
                 "--transport",
                 "streamable-http",
                 "--port",
                 "8087",
             ],
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.12.2)
492-500: Starting a process with a partial executable path
(S607)
🪛 GitHub Check: Trunk Check
[notice] 491-491: bandit(B607)
[new] Starting a process with a partial executable path
[notice] 491-491: bandit(B603)
[new] subprocess call - check for execution of untrusted input.
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 491
to 500, the subprocess invocation uses the literal "python" which is brittle
across environments; replace the string with sys.executable and ensure the
module imports sys at the top of the test file so the current Python interpreter
(including virtualenvs) is used when launching the server process.
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Actionable comments posted: 5
♻️ Duplicate comments (3)
kai_mcp_solution_server/tests/test_multiple_integration.py (3)
477-486: Avoid runtime assert; guard stdout and safe-close.

```diff
 def stream_output(process: subprocess.Popen) -> None:
     try:
-        assert process.stdout is not None
-        for line in iter(process.stdout.readline, b""):
-            print(f"[Server] {line.decode().rstrip()}")
+        stdout = process.stdout
+        if stdout is None:
+            raise RuntimeError("HTTP server process has no stdout pipe")
+        for line in iter(stdout.readline, b""):
+            print(f"[Server] {line.decode().rstrip()}")
     except Exception as e:
         print(f"Error while streaming output: {e}")
     finally:
-        process.stdout.close()
+        if process.stdout:
+            try:
+                process.stdout.close()
+            except Exception:
+                pass
```
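The guarded streaming pattern can be exercised standalone. A minimal sketch with a stand-in child process (the child's output lines are illustrative, not the server's):

```python
import subprocess
import sys

# Spawn a child that writes a few lines, then stream them as they arrive
proc = subprocess.Popen(
    [sys.executable, "-c", "for i in range(3): print(f'line {i}')"],
    stdout=subprocess.PIPE,
)
lines = []
stdout = proc.stdout
if stdout is None:  # explicit guard instead of a bare assert
    raise RuntimeError("child has no stdout pipe")
for raw in iter(stdout.readline, b""):  # b"" marks EOF on the pipe
    lines.append(raw.decode().rstrip())
    print(f"[Server] {lines[-1]}")
stdout.close()
proc.wait()
print(lines)  # ['line 0', 'line 1', 'line 2']
```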
508-516: Actually apply the SSL bypass or remove it.

```diff
 async def client_task(client_id: str) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    with apply_ssl_bypass():
+        client = Client(
+            transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
+        )
```

Note: if the server is plain HTTP, the bypass is unnecessary and can be deleted instead.
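If `apply_ssl_bypass` is meant to work as a context manager, one common shape is a `contextlib.contextmanager` that patches the default HTTPS context and restores it on exit. This is a hedged sketch only — the repo's actual helper may be implemented differently, and `ssl._create_default_https_context` is CPython-private API:

```python
import contextlib
import ssl

@contextlib.contextmanager
def apply_ssl_bypass():
    # Hypothetical sketch: swap the default HTTPS context factory for an
    # unverified one, and restore the original even if the body raises.
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        ssl._create_default_https_context = original
```

With this shape, `with apply_ssl_bypass(): ...` guarantees the patch is scoped to the block, so a bare call without `with` (as in the original code) would be a no-op until the generator is entered.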
608-617: Use sys.executable for portability (Bandit B607/B603).

```diff
-        self.http_server_process = subprocess.Popen(
+        self.http_server_process = subprocess.Popen(
             [
-                "python",
+                sys.executable,
                 "-m",
                 "kai_mcp_solution_server",
                 "--transport",
                 "streamable-http",
                 "--port",
                 "8087",
             ],
```

Also ensure `import sys` is present at the top of the file.
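A minimal self-contained check of the pattern (standard library only): `sys.executable` is an absolute path to the running interpreter, so the spawned child is guaranteed to use the same environment, including any active virtualenv, where a bare `"python"` depends on `PATH` lookup.

```python
import os
import subprocess
import sys

# sys.executable is an absolute path to the current interpreter
print(os.path.isabs(sys.executable))  # True

# Launch a child process with the same interpreter
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # hello from child
```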
import sysis present at the top of the file.
🧹 Nitpick comments (18)
.gitignore (1)
210-213: Avoid ignoring all `*.db` repository-wide

Adding test DB ignores is good, but a blanket `*.db` may hide legitimate DB assets outside tests. Prefer scoping to test/CI artifacts only.

Apply:

```diff
 # Test database files
 tests/test_*.db
 kai_mcp_solution_server/tests/test_*.db
-*.db
+# Consider avoiding a global wildcard:
+# **/tests/**/*.db
+# Or remove this line entirely if not strictly needed
```

kai_mcp_solution_server/tests/mcp_client.py (1)
311-333: Small robustness nit: avoid repeated Path(None) conversions

Since you only support stdio when `server_path` is present, move `server_path = Path(args.server_path)` after the None-guard (above) to safeguard future edits.

.github/workflows/stress-test-mcp-server.yml (1)
66-67: Avoid hardcoding DB credentials in env to quiet secret scanners

Even for ephemeral CI, static analyzers will flag inline creds. Build the DSN at runtime from variables to reduce false positives.

```diff
-          KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
           KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
         run: |
-          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
-          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
+          export PGUSER=kai_user
+          export PGPASSWORD=kai_password
+          export PGDATABASE=kai_test_db
+          export PGHOST=localhost
+          export PGPORT=5432
+          export KAI_DB_DSN="postgresql+asyncpg://${PGUSER}:${PGPASSWORD}@${PGHOST}:${PGPORT}/${PGDATABASE}"
+          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
+          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
```

kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)
113-127: Revisit SQLite pooling strategy to avoid cross-task connection reuse

Using `StaticPool` forces a single shared connection. Under concurrent async workloads this can serialize DB access and risk "connection already in use"/transaction conflicts. Prefer `NullPool` with WAL and a sensible busy timeout for better concurrency on file-backed SQLite.

Suggested change:

```diff
-from sqlalchemy.pool import StaticPool
+from sqlalchemy.pool import StaticPool, NullPool
 ...
-    if "sqlite" in url_str:
+    if url_str.startswith("sqlite"):
         engine = create_async_engine(
             url,
-            connect_args={"check_same_thread": False},
-            poolclass=StaticPool,  # Single shared connection for SQLite
+            connect_args={"check_same_thread": False, "timeout": 30},
+            poolclass=NullPool,  # One connection per use; avoids cross-task sharing
             echo_pool=False,  # Set to True for debugging connection pool
         )
+        # Enable WAL + reasonable sync level for better read/write concurrency
+        @event.listens_for(engine.sync_engine, "connect")
+        def _sqlite_pragma(dbapi_conn, _):
+            cur = dbapi_conn.cursor()
+            try:
+                cur.execute("PRAGMA journal_mode=WAL;")
+                cur.execute("PRAGMA synchronous=NORMAL;")
+                cur.execute("PRAGMA foreign_keys=ON;")
+            finally:
+                cur.close()
```

Verification: run the SQLite stress job and confirm absence of "database is locked" and connection-reuse errors under NUM_CONCURRENT_CLIENTS=5..10.
117-118: Use stricter SQLite detection

Minor: avoid accidental matches on non-SQLite URLs by using `startswith("sqlite")`.

```diff
-    if "sqlite" in url_str:
+    if url_str.startswith("sqlite"):
```

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
332-351: Duplicate writes and TODO: clarify after-file creation path.

- The comment "FIXME: Something is fishy here..." remains unresolved; the code always inserts a new "after" file and then queries for an existing one. This can create duplicates under concurrency.
- `session.add(new_file)` is called twice later in `accept_file` (see below).

Consider an upsert-like path: query the latest by `(client_id, uri, content)` and reuse it if identical; only insert when content differs. Also, add a unique `(client_id, uri, created_at)` constraint or application-level de-dup.
847-855: Redundant add().

`session.add(new_file)` is executed twice (Lines 847 and 853). Keep one.

```diff
         session.add(new_file)
         for solution, old_file in files_to_update:
             # Remove the old file from the solution
             solution.after.remove(old_file)
             solution.after.add(new_file)
-            session.add(new_file)
             session.add(solution)
```

kai_mcp_solution_server/tests/test_multiple_integration.py (2)
487-494: Remove or use poll_process().

`poll_process` is defined but never called. Either delete it or call it inside your readiness wait to surface early-exit errors promptly.
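A sketch of the second option, wiring a `poll_process`-style early-exit check into the readiness loop. The helper name and port-probe shape are illustrative, not the test file's actual code:

```python
import socket
import subprocess
import time

def wait_for_server(process: subprocess.Popen, host: str, port: int, timeout: float) -> None:
    """Poll the port, but fail fast if the server process has already exited."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process.poll() is not None:  # early exit -> surface it immediately
            out = process.stdout.read() if process.stdout else b""
            raise RuntimeError(
                f"server exited early (rc={process.returncode}): {out.decode(errors='replace')}"
            )
        with socket.socket() as s:
            s.settimeout(0.5)
            try:
                s.connect((host, port))
                return  # port is accepting connections
            except OSError:
                time.sleep(0.1)
    raise TimeoutError(f"server never came up on {host}:{port}")
```

A process that dies during startup now raises `RuntimeError` with its captured output right away, instead of a `TimeoutError` after the full wait.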
634-646: Thread-per-client is heavy but OK for stress; consider async tasks later.

For large NUM_TASKS, a single event loop with `asyncio.gather` would lower overhead. Not blocking this PR.

kai_mcp_solution_server/scripts/manage_db.py (4)
21-41: Ensure engine disposal on error.

Wrap engine usage in `try/finally` so the pool is disposed even if `drop_everything` raises.

```diff
-    engine = await get_async_engine(settings.db_dsn)
-
-    async with engine.begin() as conn:
-        print("Dropping all tables...")
-        await conn.run_sync(drop_everything)
-        print("✅ All tables dropped successfully.")
-
-    await engine.dispose()
+    engine = await get_async_engine(settings.db_dsn)
+    try:
+        async with engine.begin() as conn:
+            print("Dropping all tables...")
+            await conn.run_sync(drop_everything)
+            print("✅ All tables dropped successfully.")
+    finally:
+        await engine.dispose()
```
55-65: Same here: dispose on all paths.

```diff
-    engine = await get_async_engine(settings.db_dsn)
-
-    async with engine.begin() as conn:
-        print("Dropping all tables...")
-        await conn.run_sync(drop_everything)
-        print("Creating tables...")
-        await conn.run_sync(Base.metadata.create_all)
-        print("✅ Database reset successfully.")
-
-    await engine.dispose()
+    engine = await get_async_engine(settings.db_dsn)
+    try:
+        async with engine.begin() as conn:
+            print("Dropping all tables...")
+            await conn.run_sync(drop_everything)
+            print("Creating tables...")
+            await conn.run_sync(Base.metadata.create_all)
+            print("✅ Database reset successfully.")
+    finally:
+        await engine.dispose()
```
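The point of the `try/finally` shape can be demonstrated with a tiny stand-in engine (the `FakeEngine` class and error message are illustrative, not the project's real types): `dispose()` runs on both the success and error paths.

```python
import asyncio

class FakeEngine:
    """Stand-in for an async engine with a dispose() coroutine."""
    def __init__(self) -> None:
        self.disposed = False

    async def dispose(self) -> None:
        self.disposed = True

async def reset_db(engine: FakeEngine, fail: bool) -> str:
    try:
        if fail:
            raise RuntimeError("drop_everything blew up")
        return "ok"
    finally:
        await engine.dispose()  # runs on success AND on error

engine = FakeEngine()
try:
    asyncio.run(reset_db(engine, fail=True))
except RuntimeError:
    pass
print(engine.disposed)  # True — disposed even though the body raised
```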
103-109: Tighten error message: LLM params aren't required for this script.

Mention only `KAI_DB_DSN` to avoid confusing users.

```diff
-    print("\nMake sure to set required environment variables:")
-    print("  - KAI_DB_DSN: Database connection string")
-    print('  - KAI_LLM_PARAMS: LLM configuration (can be \'{"model": "fake"}\')')
+    print("\nMake sure to set required environment variables:")
+    print("  - KAI_DB_DSN: Database connection string")
```
1-1: Shebang present but file may not be executable.

Either make the script executable in git (`chmod +x`) or drop the shebang.
122-129: Prefer "podman compose up" over "podman-compose" for wider compatibility

This avoids a dependency on the separate Python podman-compose shim.

Apply this diff:

```diff
-	IMAGE=$(IMAGE) KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' MOUNT_PATH='$(MOUNT_PATH)' \
-	podman-compose up
+	IMAGE=$(IMAGE) KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' MOUNT_PATH='$(MOUNT_PATH)' \
+	podman compose up
```

Please confirm CI runners support the subcommand.
131-141: Self-contained stress test: verify whether LLM params are required; add a guard if needed

If the test server path still requires KAI_LLM_PARAMS, this may fail/flap.

Optionally add a guard like run-local:

```diff
 test-stress:
 	@echo "Running self-contained stress test with $${NUM_CONCURRENT_CLIENTS:-30} concurrent clients..."
 	@echo "Database: $${KAI_DB_DSN:-sqlite+aiosqlite:///test.db}"
 	@echo "The test will start its own server on port 8087"
+	@if [ -z "$${KAI_LLM_PARAMS}" ]; then echo "Error: KAI_LLM_PARAMS is required for the test server"; exit 1; fi
 	NUM_CONCURRENT_CLIENTS=$${NUM_CONCURRENT_CLIENTS:-30} \
 	KAI_DB_DSN=$${KAI_DB_DSN:-sqlite+aiosqlite:///test.db} \
 	uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
```
181-205: run-with-tests may fail without KAI_LLM_PARAMS; also pass it into the server container

Align with run-local's requirement.

Apply this diff:

```diff
 run-with-tests: build
 	@echo "Creating Podman pod for MCP server and test client..."
 	-podman pod rm -f kai-mcp-pod 2>/dev/null || true
 	podman pod create --name kai-mcp-pod -p 8000:8000
@@
 	@echo "Starting MCP server in the pod..."
-	podman run -d --pod kai-mcp-pod \
+	@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
+	podman run -d --pod kai-mcp-pod \
 		--name kai-mcp-server \
 		--volume kai-mcp-data:/data:Z \
+		-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
+		-e KAI_DB_DSN='$(KAI_DB_DSN)' \
 		$(IMAGE)
```
212-240: Help mentions a 'logs' target, but no implementation exists

Either add a logs target or remove the line from help to avoid confusion.

Proposed new target (outside changed lines):

```make
# Show Kubernetes deployment logs
.PHONY: logs
logs:
	@echo "Showing logs for deployment in namespace $(NAMESPACE)..."
	kubectl logs -n $(NAMESPACE) deploy/kai-mcp-solution-server -f
```
243-246: Fix NAMESPACE default in help to match the actual default

Help says "konveyor-kai" but the Makefile sets "konveyor-kai-solution-server".

Apply this diff:

```diff
-	@echo "  NAMESPACE : Kubernetes namespace (default: konveyor-kai)"
+	@echo "  NAMESPACE : Kubernetes namespace (default: konveyor-kai-solution-server)"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- .github/workflows/stress-test-mcp-server.yml (1 hunk)
- .gitignore (1 hunk)
- kai_mcp_solution_server/Makefile (5 hunks)
- kai_mcp_solution_server/scripts/manage_db.py (1 hunk)
- kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2 hunks)
- kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (10 hunks)
- kai_mcp_solution_server/tests/mcp_client.py (1 hunk)
- kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (1)
kai_mcp_solution_server/scripts/manage_db.py (1)
- drop_all_tables (21-40)

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
- kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)
  - Base (91-97)
  - get_async_engine (113-142)
- kai_mcp_solution_server/src/kai_mcp_solution_server/constants.py (1)
  - log (9-10)
kai_mcp_solution_server/tests/test_multiple_integration.py (6)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
- CallToolResult (22-24)

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
- GetBestHintResult (663-665)
- SuccessRateMetric (721-727)

kai_mcp_solution_server/tests/mcp_client.py (1)
- MCPClientArgs (247-256)

kai_mcp_solution_server/tests/mcp_loader_script.py (2)
- MCPClientArgs (31-41)
- create_client (136-151)

kai/jsonrpc/util.py (1)
- model_dump (96-97)

kai/jsonrpc/streams.py (1)
- close (70-75)
kai_mcp_solution_server/scripts/manage_db.py (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (4)
- Base (91-97)
- drop_everything (53-88)
- get_async_engine (113-142)
- drop_all_tables (100-104)

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (1)
- SolutionServerSettings (39-108)
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml
[MEDIUM] 66-67: Basic Auth Credentials
(CKV_SECRET_4)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/test_multiple_integration.py
[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.
[notice] 608-608: bandit(B607)
[new] Starting a process with a partial executable path
[notice] 608-608: bandit(B603)
[new] subprocess call - check for execution of untrusted input.
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py
479-479: Use of assert detected
(S101)
482-482: Do not catch blind exception: Exception
(BLE001)
491-493: Avoid specifying long messages outside the exception class
(TRY003)
609-617: Starting a process with a partial executable path
(S607)
658-658: Do not catch blind exception: Exception
(BLE001)
kai_mcp_solution_server/scripts/manage_db.py
1-1: Shebang is present but file is not executable
(EXE001)
103-103: Do not catch blind exception: Exception
(BLE001)
122-122: Do not catch blind exception: Exception
(BLE001)
🪛 GitHub Actions: Test Code on PR
kai_mcp_solution_server/tests/test_multiple_integration.py
[error] 1-1: test_multiple_users failed: 30 concurrent clients failed (Event loop is closed; all connection attempts failed).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Stress Test with PostgreSQL
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (10)
.github/workflows/stress-test-mcp-server.yml (2)
72-79: Post-test DB utilization check is a nice touch

Querying `pg_stat_activity` post-run helps validate connection pooling assumptions. LGTM.
107-114: SQLite job concurrency cap is appropriate

Capping to 5 clients acknowledges SQLite write-lock limits. LGTM.

kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2)
100-112: Good separation of schema management from engine creation

Introducing `drop_all_tables` and `ensure_tables_exist` clarifies the lifecycle and avoids accidental drops during engine init. LGTM.
128-141: Pool sizing aligns with the 100-client stress target; confirm PG max_connections

`pool_size=20` with `max_overflow=80` caps at 100 connections, matching the test load. Ensure the Postgres service's `max_connections` accommodates this plus overhead (psql, autovacuum). The workflow's post-check helps track this.

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
116-124: Good: engine bootstrap is now safe and idempotent.

Creating the engine via `get_async_engine` and ensuring tables exist inside `engine.begin()` is correct and removes startup races around schema readiness.
164-169: Proper engine disposal on shutdown.

Disposing the engine in the lifespan `finally` block prevents pool leaks across reconnects.

kai_mcp_solution_server/Makefile (4)
89-100: run-podman target looks good

Flexible DSN with optional PODMAN_ARGS and a required KAI_LLM_PARAMS check is solid.

146-153: test-stress-external target looks good

Clear UX and useful defaults.

228-233: Help additions for new targets look good

New entries are clear and consistent.

262-267: Example usage LGTM

Good guidance for local and external stress runs.
```yaml
      - name: Run stress test with 100 clients
        env:
          KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
          KAI_LLM_PARAMS: '{"model": "fake", "responses": ["Test response"]}'
          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
        run: |
          echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
          uv run python -m pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users -xvs
        timeout-minutes: 10
```
🛠️ Refactor suggestion
Coalesce workflow input using the inputs context, not github.event.inputs
`github.event.inputs` is only defined for workflow_dispatch payloads and is null for PR/push events. Use `inputs.num_clients`, which safely evaluates to null outside workflow_dispatch and works with `||` coalescing.

```diff
-          NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+          NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
```

📝 Committable suggestion
🧰 Tools
🪛 Checkov (3.2.334)
[MEDIUM] 66-67: Basic Auth Credentials
(CKV_SECRET_4)
🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 64 to 72, the
workflow uses github.event.inputs.num_clients which is only set for
workflow_dispatch events; replace it with the inputs context so the coalescing
works for non-dispatch events. Update the NUM_CONCURRENT_CLIENTS assignment to
use ${{ inputs.num_clients || '100' }} (keeping the default '100') and ensure
the rest of the step remains unchanged so the workflow uses the inputs context
safely across event types.
kai_mcp_solution_server/Makefile
Outdated
# Convenience target: Run with SQLite
.PHONY: podman-sqlite
podman-sqlite: build
	@echo "Running MCP solution server with SQLite..."
	@if [ -z "$(KAI_LLM_PARAMS)" ]; then echo "Error: KAI_LLM_PARAMS is required"; exit 1; fi
	podman run --rm -it -p 8000:8000 \
		-e MOUNT_PATH='$(MOUNT_PATH)' \
		-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
		-e KAI_DB_DSN='sqlite+aiosqlite:///data/kai.db' \
		$(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
		$(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
		$(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
		$(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
		$(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
		$(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
		$(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
		$(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
		$(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
		-v kai-sqlite-data:/data:Z \
		--name kai-mcp-sqlite $(IMAGE)
Avoid leaking API keys via process args; pass-through host env instead of inlining values
Using -e KEY='value' can expose secrets via ps/history. Prefer -e KEY to pass through from host.
Apply this diff:
- $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
- $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
- $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
- $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
- $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
- $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
- $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
- $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
- $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+ $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+ $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+ $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+ $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+ $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+ $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+ $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+ $(if $(AWS_REGION),-e AWS_REGION,) \
+ $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \
# Lock removed - database handles concurrent access
for extended_incident in extended_incidents:
    incident_id = await create_incident(kai_ctx, client_id, extended_incident)
    results.append(CreateIncidentResult(incident_id=incident_id, solution_id=0))
🛠️ Refactor suggestion
Race still likely: unthrottled LLM calls spawned in background.
Removing the app-level lock is fine for DB ops, but outbound `kai_ctx.model.ainvoke(...)` calls (e.g., in `generate_hint_v1/v2/v3`) can now stampede under concurrent accept_file paths (`asyncio.create_task(...)`). This commonly manifests as httpcore/httpx TLS/connection errors under load, matching Issue #861. Add a small per-process semaphore and explicit timeouts/retries to bound concurrency and avoid connection storms.
Apply this diff to introduce an LLM concurrency gate and timeouts:
 class KaiSolutionServerContext:
     def __init__(self, settings: SolutionServerSettings) -> None:
         self.settings = settings
         self.lock = asyncio.Lock()

     async def create(self) -> None:
         from kai_mcp_solution_server.db.dao import Base

         self.engine = await get_async_engine(self.settings.db_dsn)
         # Ensure tables exist (safe - only creates if not already there)
         async with self.engine.begin() as conn:
             await conn.run_sync(Base.metadata.create_all)
         self.session_maker = async_sessionmaker(
             bind=self.engine, expire_on_commit=False
         )
         self.model: BaseChatModel
+        # Limit outbound LLM concurrency to mitigate connection spikes
+        self.llm_semaphore = asyncio.Semaphore(
+            int(os.getenv("KAI_LLM_MAX_CONCURRENCY", "8"))
+        )
         if self.settings.llm_params is None:
             raise ValueError("LLM parameters must be provided in the settings.")
         elif self.settings.llm_params.get("model") == "fake":

And wrap LLM calls with the gate and a timeout (repeat similarly in v2/v3):
-        response = await kai_ctx.model.ainvoke(prompt)
+        async with kai_ctx.llm_semaphore:
+            response = await asyncio.wait_for(
+                kai_ctx.model.ainvoke(prompt),
+                timeout=float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+            )

Optionally harden initialization with retries/timeouts:
-        self.model = init_chat_model(**self.settings.llm_params)
+        params = {
+            "max_retries": int(os.getenv("KAI_LLM_MAX_RETRIES", "2")),
+            "timeout": float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+            **self.settings.llm_params,
+        }
+        self.model = init_chat_model(**params)

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In kai_mcp_solution_server/src/kai_mcp_solution_server/server.py around lines
260-264, the app removed the lock but still spawns unthrottled background LLM
calls (kai_ctx.model.ainvoke) which can stampede and cause httpx/httpcore
TLS/connection errors; introduce a module-level asyncio.Semaphore (e.g.,
max_concurrent_llm = 5) and ensure all LLM invocations (generate_hint_v1/v2/v3
and any other ainvoke paths) acquire the semaphore before calling and release it
after, wrapping the actual ai call in asyncio.wait_for to enforce a per-call
timeout and add a simple retry loop with exponential backoff for transient
failures; also initialize the semaphore once at startup and consider applying
the same timeout/retry pattern to any LLM client initialization to harden
startup.
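The semaphore-plus-`wait_for` pattern suggested above can be exercised standalone to confirm it actually bounds fan-out. A sketch with a fake `call_llm` (the names, limits, and counters are illustrative, not the server's real client):

```python
import asyncio

async def call_llm(prompt: str, state: dict) -> str:
    # Stand-in for the real model call; records how many calls run at once.
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)
    state["in_flight"] -= 1
    return f"hint for {prompt}"

async def gated_call(sem: asyncio.Semaphore, prompt: str, state: dict) -> str:
    # The pattern from the diff: bound concurrency first, then bound latency.
    async with sem:
        return await asyncio.wait_for(call_llm(prompt, state), timeout=5.0)

async def main(clients: int = 100, max_concurrency: int = 8) -> int:
    sem = asyncio.Semaphore(max_concurrency)
    state = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(gated_call(sem, f"p{i}", state) for i in range(clients)))
    return state["peak"]

print(asyncio.run(main()))  # peak concurrent calls never exceeds 8
```

Even with 100 clients fanning in at once, the observed peak stays at or below the semaphore's limit, which is exactly the property that prevents connection storms.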
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
kai_mcp_solution_server/tests/mcp_loader_script.py (1)
47-55: Validate port and normalize mount_path to avoid malformed URLs. If `port` or `mount_path` is `None`, `server_url` becomes `http://host:NoneNone`. Guard and normalize.

Apply this diff:
 if args.host is None:
     raise ValueError("Host must be specified for HTTP transport")

 # Setup HTTP transport using streamable-http
 transport = ""
 if not args.host.startswith("http"):
     transport = "http://"
-server_url = f"{transport}{args.host}:{args.port}{args.mount_path}"
+if args.port is None:
+    raise ValueError("Port must be specified for HTTP transport")
+mount_path = args.mount_path or ""
+if mount_path and not mount_path.startswith("/"):
+    mount_path = "/" + mount_path
+server_url = f"{transport}{args.host}:{args.port}{mount_path}"
 print(f"Connecting to server at {server_url}...")
♻️ Duplicate comments (4)
.github/workflows/stress-test-mcp-server.yml (1)
69-69: Coalesce workflow input via the `inputs` context, not `github.event.inputs`. `github.event.inputs` is null outside `workflow_dispatch`; use `inputs.num_clients` so the default coalesces reliably across events.

Apply this diff:

- NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+ NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}

kai_mcp_solution_server/Makefile (1)
110-118: Avoid leaking secrets in `podman-sqlite` env flags. Inline values in `-e KEY='value'` are visible via ps/history. Pass through host env with `-e KEY` instead.

- $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
- $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
- $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
- $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
- $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
- $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
- $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
- $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
- $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+ $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+ $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+ $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+ $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+ $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+ $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+ $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+ $(if $(AWS_REGION),-e AWS_REGION,) \
+ $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \

kai_mcp_solution_server/tests/test_multiple_integration.py (2)
477-486: Remove runtime assert; guard stdout close; avoid blind except. Prevents crashes and aligns with prior review.

-    def stream_output(process: subprocess.Popen) -> None:
-        try:
-            assert process.stdout is not None
-            for line in iter(process.stdout.readline, b""):
-                print(f"[Server] {line.decode().rstrip()}")
-        except Exception as e:
-            print(f"Error while streaming output: {e}")
-        finally:
-            process.stdout.close()
+    def stream_output(process: subprocess.Popen) -> None:
+        try:
+            stdout = process.stdout
+            if stdout is None:
+                return
+            for line in iter(stdout.readline, b""):
+                print(f"[Server] {line.decode(errors='replace').rstrip()}")
+        except BaseException as e:
+            print(f"Error while streaming output: {e}")
+        finally:
+            try:
+                if process.stdout:
+                    process.stdout.close()
+            except Exception:
+                pass
693-702: Use `sys.executable` instead of bare `"python"`. More robust and fixes Bandit B607/B603 warnings.

 self.http_server_process = subprocess.Popen(
     [
-        "python",
+        sys.executable,
         "-m",
         "kai_mcp_solution_server",
         "--transport",
         "streamable-http",
         "--port",
         "8087",
     ],

Add import:

import sys
🧹 Nitpick comments (14)
kai_mcp_solution_server/tests/mcp_loader_script.py (2)
9-9: Remove unused import (ruff F401).
`timedelta` is imported but never used. Apply this diff:
-from datetime import timedelta
63-69: Avoid unused local; don't bind `get_session_id` if not used. Prevents potential linter warnings and clarifies intent.

Apply this diff:

-    read, write, get_session_id = await stack.enter_async_context(
+    read, write, _ = await stack.enter_async_context(
         streamablehttp_client(server_url)
     )

kai_mcp_solution_server/tests/ssl_utils.py (3)
36-49: Make monkey-patching reentrant-safe across concurrent contexts. The current design patches/unpatches global `httpx` methods per-context without reference counting. Overlapping contexts (e.g., parallel tests/clients) can restore while others still rely on the patch. Add a module-level refcount + lock so only the last exiting context restores.

Here's a minimal pattern to adopt (outside the selected range):

# Module-level
import threading

_PATCH_LOCK = threading.Lock()
_PATCH_COUNT = 0

def _acquire_patch() -> bool:
    global _PATCH_COUNT
    with _PATCH_LOCK:
        _PATCH_COUNT += 1
        return _PATCH_COUNT == 1  # first acquirer applies the patches

def _release_patch() -> bool:
    global _PATCH_COUNT
    with _PATCH_LOCK:
        _PATCH_COUNT -= 1
        return _PATCH_COUNT == 0  # last releaser restores

Then in SSLMonkeyPatch:
- In `__enter__`/`apply_ssl_bypass()`: call `_acquire_patch()` and apply patches only when it returns True.
- In `restore_ssl_settings()`: call `_release_patch()` and restore only when it returns True.

Also applies to: 137-155
61-63: Over-broad warning suppression.
`warnings.filterwarnings("ignore", category=Warning)` silences all warnings. Restrict to the specific SSL warning message or a specific category.

Apply this diff:

- warnings.filterwarnings("ignore", category=Warning)
+ # Keep the specific message filter; avoid blanket suppression
123-128: Cleanup unused SSL environment variables
`HTTPX_SSL_VERIFY`, `HTTPX_NO_VERIFY`, and `SSL_CERT_VERIFY` aren't referenced elsewhere; retain only the standard `PYTHONHTTPSVERIFY`.

-ssl_env_vars = {
-    "SSL_CERT_VERIFY": "false",
-    "HTTPX_SSL_VERIFY": "false",
-    "HTTPX_NO_VERIFY": "true",
-    "PYTHONHTTPSVERIFY": "0",
-}
+ssl_env_vars = {
+    "PYTHONHTTPSVERIFY": "0",
+}

.github/workflows/stress-test-mcp-server.yml (2)
74-74: Trim trailing whitespace (yamllint). These lines have trailing spaces; remove to appease linters and keep diffs clean.
Also applies to: 90-90, 96-96, 100-100
66-69: Optional: avoid hardcoding DB credentials in DSN to satisfy secret scanners. Even if they're test-only, Checkov flags inline creds. Consider building `KAI_DB_DSN` from job env vars.

Apply this minimal change (job-level env at the top of the job), then reference them in DSN:

 stress-test-postgres:
   name: Stress Test with PostgreSQL
   runs-on: ubuntu-latest
+  env:
+    PGUSER: kai_user
+    PGPASSWORD: kai_password
+    PGDATABASE: kai_test_db
   ...
       env:
-        KAI_DB_DSN: "postgresql+asyncpg://kai_user:kai_password@localhost:5432/kai_test_db"
+        KAI_DB_DSN: "postgresql+asyncpg://${{ env.PGUSER }}:${{ env.PGPASSWORD }}@localhost:5432/${{ env.PGDATABASE }}"

Optionally add an allowlist annotation if your policy permits.
kai_mcp_solution_server/Makefile (3)
96-99: Protect JSON in `KAI_LLM_PARAMS`. Quoting JSON inline can break with embedded quotes/newlines. Prefer an env file or base64.
- Option A: `--env-file .env` (store KAI_LLM_PARAMS there)
- Option B (inline-safe): `-e KAI_LLM_PARAMS_B64="$$(printf %s "$$KAI_LLM_PARAMS" | base64 -w0)"` and decode in entrypoint.
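Option B round-trips cleanly; a sketch of both halves (the decode step would live in the container entrypoint — the variable names are illustrative, and `tr -d '\n'` stands in for GNU-only `base64 -w0` for portability):

```shell
# Encode once on the host so quoting survives `podman run -e`:
KAI_LLM_PARAMS='{"model": "fake", "responses": ["Test response"]}'
KAI_LLM_PARAMS_B64="$(printf %s "$KAI_LLM_PARAMS" | base64 | tr -d '\n')"

# Inside the entrypoint, decode before starting the server:
DECODED="$(printf %s "$KAI_LLM_PARAMS_B64" | base64 -d)"
echo "$DECODED"
```

Because the value travels as base64, embedded quotes and newlines in the JSON can no longer break the shell quoting.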
243-243: Fix help text: NAMESPACE default mismatch. Help says `konveyor-kai` but the variable default is `konveyor-kai-solution-server`.

- @echo " NAMESPACE : Kubernetes namespace (default: konveyor-kai)"
+ @echo " NAMESPACE : Kubernetes namespace (default: konveyor-kai-solution-server)"
224-224: Help lists a `logs` target that doesn't exist. Either add a `logs` target or remove this line from help.

Example target:

+# Show container logs
+.PHONY: logs
+logs:
+	@echo "Showing logs for deployment in namespace $(NAMESPACE)..."
+	kubectl logs -n $(NAMESPACE) -l app=kai-mcp-solution-server --all-containers=true --tail=200

kai_mcp_solution_server/tests/test_multiple_integration.py (4)
748-756: Shorten post-run sleeps to reduce flakiness. A ten-second sleep is brittle; prefer awaiting the last client futures and then proceeding, or add a small bounded wait (1-2s) if needed for logs.

- await asyncio.sleep(10)  # wait a moment for all output to be printed
+ await asyncio.sleep(2)  # brief pause for buffered output
792-800: Fail fast if no hints; add context to failure. Good check. Consider logging the server mode and NUM_TASKS for easier triage.

-        self.fail(
-            "No hints were generated! Hint generation may not be working correctly."
-        )
+        mode = "external" if not start_own_server else "self-contained"
+        self.fail(f"No hints generated (mode={mode}, clients={NUM_TASKS}). Hint generation may not be working.")
36-43: Cross-platform DB path safety. `isoformat()` includes `:`, which is invalid in Windows filenames. Use a safe format.

- timestamp = datetime.datetime.now().isoformat()
+ timestamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
723-747: Collect root causes from client failures. Wrap exceptions with context (client_id, mode, transport) to speed up debugging.

-        except Exception as exc:
-            error_msg = f"Task {task_id} failed: {exc}"
+        except Exception as exc:
+            error_msg = f"Task {task_id} failed (mode={'external' if not start_own_server else 'self-contained'}): {exc}"
             print(f"[Main] {error_msg}")
             errors.append(error_msg)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
.github/workflows/stress-test-mcp-server.yml (1 hunks)
kai_mcp_solution_server/Makefile (5 hunks)
kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
kai_mcp_solution_server/tests/ssl_utils.py (2 hunks)
kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (7)
kai_mcp_solution_server/ts-mcp-client/src/types/sdk.d.ts (1)
CallToolResult (22-24)

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
GetBestHintResult (663-665)
SuccessRateMetric (721-727)

kai_mcp_solution_server/tests/mcp_loader_script.py (2)
MCPClientArgs (32-42)
create_client (137-152)

kai_mcp_solution_server/tests/mcp_client.py (1)
MCPClientArgs (247-256)

kai_mcp_solution_server/tests/ssl_utils.py (2)
apply_ssl_bypass (49-81)
apply_ssl_bypass (168-192)

kai/jsonrpc/util.py (1)
model_dump (96-97)

kai/jsonrpc/streams.py (1)
close (70-75)
🪛 GitHub Check: Trunk Check
kai_mcp_solution_server/tests/mcp_loader_script.py
[failure] 9-9: ruff(F401)
[new] datetime.timedelta imported but unused
kai_mcp_solution_server/tests/test_multiple_integration.py
[notice] 7-7: bandit(B404)
[new] Consider possible security implications associated with the subprocess module.
[notice] 693-693: bandit(B607)
[new] Starting a process with a partial executable path
[notice] 693-693: bandit(B603)
[new] subprocess call - check for execution of untrusted input.
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml
[MEDIUM] 66-67: Basic Auth Credentials
(CKV_SECRET_4)
🪛 YAMLlint (1.37.1)
.github/workflows/stress-test-mcp-server.yml
[error] 74-74: trailing spaces
(trailing-spaces)
[error] 90-90: trailing spaces
(trailing-spaces)
[error] 96-96: trailing spaces
(trailing-spaces)
[error] 100-100: trailing spaces
(trailing-spaces)
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py
479-479: Use of assert detected
(S101)
482-482: Do not catch blind exception: Exception
(BLE001)
491-493: Avoid specifying long messages outside the exception class
(TRY003)
547-547: Use of assert detected
(S101)
573-573: Use of assert detected
(S101)
597-597: Use of assert detected
(S101)
608-608: Use of assert detected
(S101)
611-611: Use of assert detected
(S101)
652-652: Use of assert detected
(S101)
664-664: Use of assert detected
(S101)
667-667: Use of assert detected
(S101)
674-674: Use of assert detected
(S101)
679-679: Use of assert detected
(S101)
694-702: Starting a process with a partial executable path
(S607)
743-743: Do not catch blind exception: Exception
(BLE001)
789-789: Do not catch blind exception: Exception
(BLE001)
🪛 GitHub Actions: Test Code on PR
kai_mcp_solution_server/tests/test_multiple_integration.py
[error] 752-752: TestMultipleIntegration.test_multiple_users failed: 30 clients failed due to event loop errors (Event loop is closed, All connection attempts failed).
[warning] 1-1: RuntimeWarning: coroutine 'TestMultipleIntegration.test_multiple_users..client_task' was never awaited.
🪛 GitHub Actions: MCP Server Stress Test
kai_mcp_solution_server/tests/test_multiple_integration.py
[error] 1-1: Test step failed: pytest tests/test_multiple_integration.py::TestMultipleIntegration::test_multiple_users - All connection attempts failed for 5 concurrent clients against the self-contained MCP server.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_mcp_solution_server/tests/ssl_utils.py (1)
88-93: Good idempotence guard on httpx patching. The `_ssl_bypass_patched` sentinel effectively avoids double-patching.
if external_server_url:
    # External server mode - parse URL to get host and port
    from urllib.parse import urlparse

    parsed = urlparse(external_server_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 8000
    print(f"Using external MCP server at {host}:{port}")

    multiple_user_mcp_args = MCPClientArgs(
        transport="http",
        host=host,
        port=port,
        insecure=True,
        server_path=None,  # Not needed for external server
    )
    # Don't set KAI_LLM_PARAMS for external server - it should already be configured
    start_own_server = False
🛠️ Refactor suggestion
Honor URL scheme in external mode
Currently the client always uses `http://{host}:{port}`. If MCP_SERVER_URL is `https://...`, this will fail with connection errors.
 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
+scheme = (parsed.scheme or "http").lower()
 print(f"Using external MCP server at {host}:{port}")
@@
 multiple_user_mcp_args = MCPClientArgs(
     transport="http",
     host=host,
     port=port,
     insecure=True,
     server_path=None,  # Not needed for external server
 )

And use scheme when constructing the transport (see suggestion at Lines 509-515).
Committable suggestion skipped: line range outside the PR's diff.
print(f"[Client {client_id}] starting")
apply_ssl_bypass()

client = Client(
    transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
)
🛠️ Refactor suggestion
Scope the SSL bypass and only when needed
Global monkeypatching per thread is racy and unnecessary for HTTP. Use the context manager and gate on scheme.
 async def client_task(client_id: str) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    scheme = "http"
+    try:
+        scheme = scheme  # external mode may set this; fallback to http
+    except NameError:
+        pass
+    transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport)

Add at top-level imports:

import contextlib

🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
509-515, avoid globally monkeypatching SSL per thread by scoping the bypass: add
"import contextlib" at the top, remove the standalone apply_ssl_bypass() call
and instead wrap the client creation in a context manager that only applies the
bypass when the transport scheme is HTTPS, e.g. compute the transport URL
string, then use "with (apply_ssl_bypass() if transport_url.startswith('https')
else contextlib.nullcontext()):" and create the Client inside that block so the
SSL bypass is temporary and only used when needed.
    await asyncio.sleep(1)  # give the server a second to start
else:
🛠️ Refactor suggestion
Replace fixed sleep with active readiness probe
Prevents clients racing the server socket bind (root cause of intermittent connect failures).
- await asyncio.sleep(1) # give the server a second to start
+ await wait_for_http_server("localhost", 8087, timeout=30)

Place this helper near other helpers in the module (outside the selected range):
async def wait_for_http_server(host: str, port: int, timeout: float = 30.0) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_err: Exception | None = None
    while loop.time() < deadline:
        try:
            reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except Exception as e:
            last_err = e
            await asyncio.sleep(0.2)
    raise TimeoutError(f"Server at {host}:{port} not ready after {timeout}s: {last_err}")
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
713-714, replace the fixed await asyncio.sleep(1) used to wait for the server
with an active readiness probe: add the provided async helper function
wait_for_http_server(host, port, timeout=30.0) near the other test helpers in
the module (outside the selected range), and call await
wait_for_http_server(server_host, server_port, timeout=30.0) instead of the
fixed sleep so the test actively retries connecting until the server socket is
ready and avoids race conditions.
Signed-off-by: Fabian von Feilitzsch <[email protected]>
6709119 to
4ad87a5
Compare
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Actionable comments posted: 5
♻️ Duplicate comments (2)
kai_mcp_solution_server/Makefile (1)
110-118: Avoid leaking secrets via inlined env values in podman-sqlite. Passing secrets as `-e KEY='$(VALUE)'` exposes them via ps/history. Pass through host env instead.

Apply this diff:

- $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY='$(OPENAI_API_KEY)',) \
- $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY='$(ANTHROPIC_API_KEY)',) \
- $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY='$(AZURE_OPENAI_API_KEY)',) \
- $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT='$(AZURE_OPENAI_ENDPOINT)',) \
- $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY='$(GOOGLE_API_KEY)',) \
- $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID='$(AWS_ACCESS_KEY_ID)',) \
- $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY='$(AWS_SECRET_ACCESS_KEY)',) \
- $(if $(AWS_REGION),-e AWS_REGION='$(AWS_REGION)',) \
- $(if $(OLLAMA_HOST),-e OLLAMA_HOST='$(OLLAMA_HOST)',) \
+ $(if $(OPENAI_API_KEY),-e OPENAI_API_KEY,) \
+ $(if $(ANTHROPIC_API_KEY),-e ANTHROPIC_API_KEY,) \
+ $(if $(AZURE_OPENAI_API_KEY),-e AZURE_OPENAI_API_KEY,) \
+ $(if $(AZURE_OPENAI_ENDPOINT),-e AZURE_OPENAI_ENDPOINT,) \
+ $(if $(GOOGLE_API_KEY),-e GOOGLE_API_KEY,) \
+ $(if $(AWS_ACCESS_KEY_ID),-e AWS_ACCESS_KEY_ID,) \
+ $(if $(AWS_SECRET_ACCESS_KEY),-e AWS_SECRET_ACCESS_KEY,) \
+ $(if $(AWS_REGION),-e AWS_REGION,) \
+ $(if $(OLLAMA_HOST),-e OLLAMA_HOST,) \

.github/workflows/stress-test-mcp-server.yml (1)
69-69: Use inputs context for num_clients (works across event types)
`github.event.inputs` is null except for workflow_dispatch. Use `inputs.num_clients`.

Apply this diff:

- NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+ NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
🧹 Nitpick comments (7)
kai_mcp_solution_server/Makefile (2)
89-100: Consider passing provider creds in run-podman, too. External LLMs won't work unless creds are provided. Mirror the pass-through flags here or document using PODMAN_ARGS.
239-239: Help text default for NAMESPACE doesn't match actual default

Help says the default is "konveyor-kai" but the Makefile sets "konveyor-kai-solution-server".

Apply this diff:

```diff
-	@echo "  NAMESPACE        : Kubernetes namespace (default: konveyor-kai)"
+	@echo "  NAMESPACE        : Kubernetes namespace (default: konveyor-kai-solution-server)"
```

Also applies to: 5-5
kai_mcp_solution_server/tests/test_multiple_integration.py (5)
460-472: Use asyncio.run per thread to avoid loop lifecycle bugs

Simpler and avoids "event loop is closed" errors under stress.
Apply this diff:
```diff
 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))
```
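As an illustration of why the suggested `asyncio.run` pattern is safer in worker threads, here is a minimal, self-contained sketch; the `fetch` coroutine and the pool size are invented for the example:

```python
import asyncio
import concurrent.futures


async def fetch(n: int) -> int:
    # Placeholder coroutine standing in for a real client task.
    await asyncio.sleep(0)
    return n * 2


def run_async_in_thread(fn, *args, **kwargs):
    # asyncio.run creates, runs, and closes a fresh event loop per call,
    # so threads never share or re-enter a stale loop.
    return asyncio.run(fn(*args, **kwargs))


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda n: run_async_in_thread(fetch, n), range(5)))

print(results)  # [0, 2, 4, 6, 8]
```

Because each call owns its loop for exactly its own lifetime, there is no loop to leak or close prematurely, which is the failure mode the manual `new_event_loop`/`close` dance invites under concurrency.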
700-702: Apply SSL bypass for hint check when using HTTPS

Mirror the scoped bypass used for client tasks so HTTPS external servers pass.
Apply this diff:
```diff
-async with create_client(multiple_user_mcp_args) as hint_session:
-    await hint_session.initialize()
+cm = apply_ssl_bypass() if multiple_user_mcp_args.transport == "https" else contextlib.nullcontext()
+with cm:
+    async with create_client(multiple_user_mcp_args) as hint_session:
+        await hint_session.initialize()
```
512-647: Prefer unittest assertions over bare assert for clearer failures

Optional, but aligns with unittest style and avoids S101 lints. Example:
Apply this pattern:
```diff
-assert incident_id > 0, f"Invalid incident ID: {incident_id}"
+self.assertGreater(incident_id, 0, f"Invalid incident ID: {incident_id}")
```

Repeat similarly for the other asserts in this block.
555-555: Remove redundant local import
`import json` is already at the module top.

Apply this diff:

```diff
-    import json
```
684-697: Replace fixed sleeps with active polling

Use a readiness/poll helper instead of arbitrary sleeps to cut flakiness and runtime.
I can provide a small `wait_for_condition` or HTTP probe helper if you want it integrated.
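A minimal version of such a polling helper might look like this; `wait_for_condition`, its parameters, and the demo state are a sketch, not code from the PR:

```python
import asyncio
import time


async def wait_for_condition(predicate, timeout: float = 10.0, interval: float = 0.2) -> None:
    # Polls a zero-arg predicate (sync or async) until it returns truthy,
    # raising TimeoutError instead of relying on a fixed sleep.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if asyncio.iscoroutine(result):
            result = await result
        if result:
            return
        await asyncio.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")


async def demo() -> bool:
    state = {"ready": False}

    async def flip() -> None:
        # Simulates the server-side work the test is waiting on.
        await asyncio.sleep(0.05)
        state["ready"] = True

    asyncio.get_running_loop().create_task(flip())
    await wait_for_condition(lambda: state["ready"], timeout=2.0)
    return state["ready"]


ready = asyncio.run(demo())
print(ready)  # True
```

The helper returns as soon as the condition holds, so in the common case it is faster than a fixed sleep while still bounding the worst case with a timeout.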
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- .github/workflows/stress-test-mcp-server.yml (1 hunks)
- kai_mcp_solution_server/Makefile (5 hunks)
- kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
- kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- kai_mcp_solution_server/tests/mcp_loader_script.py
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml
[MEDIUM] 66-67: Basic Auth Credentials
(CKV_SECRET_4)
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py
512-512: Use of assert detected
(S101)
538-538: Use of assert detected
(S101)
562-562: Use of assert detected
(S101)
573-573: Use of assert detected
(S101)
576-576: Use of assert detected
(S101)
617-617: Use of assert detected
(S101)
629-629: Use of assert detected
(S101)
632-632: Use of assert detected
(S101)
639-639: Use of assert detected
(S101)
644-644: Use of assert detected
(S101)
679-679: Do not catch blind exception: Exception
(BLE001)
725-725: Do not catch blind exception: Exception
(BLE001)
🪛 GitHub Actions: MCP Server Stress Test
kai_mcp_solution_server/tests/test_multiple_integration.py
[error] 688-688: Test encountered parsing error: incident_id could not be parsed due to an error response from create_incident (invalid literal for int() with base 10: 'Error calling tool 'create_incident': ...'). Underlying cause: database integrity violation causing the tool call to fail.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
```shell
echo "Starting MCP server connected to PostgreSQL..."
uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
SERVER_PID=$!

# Wait for server to be ready
echo "Waiting for server to start..."
for i in {1..30}; do
  if curl -s http://localhost:8000/ > /dev/null 2>&1; then
    echo "Server is ready!"
    break
  fi
  if [ $i -eq 30 ]; then
    echo "Server failed to start in 30 seconds"
    kill $SERVER_PID || true
    exit 1
  fi
  echo -n "."
  sleep 1
done

# Run the stress test
echo ""
echo "Testing with $NUM_CONCURRENT_CLIENTS concurrent clients against PostgreSQL"
make test-stress
TEST_RESULT=$?

# Stop the server
echo "Stopping MCP server..."
kill $SERVER_PID || true
exit $TEST_RESULT
```
🛠️ Refactor suggestion
Harden server lifecycle: fail-fast, trap, and wait on exit
Prevents orphaned background server and ensures proper failure propagation.
Apply this diff:
- run: |
- echo "Starting MCP server connected to PostgreSQL..."
- uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
- SERVER_PID=$!
+ run: |
+ set -euo pipefail
+ echo "Starting MCP server connected to PostgreSQL..."
+ uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
+ SERVER_PID=$!
+ trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
@@
- kill $SERVER_PID || true
-
- exit $TEST_RESULT
+ kill $SERVER_PID || true
+ wait $SERVER_PID || true
+      exit $TEST_RESULT
🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 71 to 101, the
workflow starts the MCP server in the background but doesn't fail-fast, trap
signals, or wait for the server process, which can leave orphaned processes or
hide failures; modify the script to: enable strict failure handling (e.g., set
-euo pipefail), register a trap on EXIT/INT/TERM that kills the background
server PID if still running, after launching the server save SERVER_PID and on
startup failure kill that PID and exit non‑zero, and when running tests use wait
$SERVER_PID to propagate the server's exit status (or capture and return
TEST_RESULT appropriately) so the job fails fast and cleans up the server in all
termination cases.
```yaml
- name: Check PostgreSQL connection count
  if: always()
  run: |
    PGPASSWORD=kai_password psql -h localhost -U kai_user -d kai_test_db -c \
      "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
```
🛠️ Refactor suggestion
Install psql before querying pg_stat_activity
Ubuntu runners may lack psql; this step can fail the job.
Add this step before “Check PostgreSQL connection count”:
- name: Install PostgreSQL client
run: |
sudo apt-get update
sudo apt-get install -y postgresql-client🤖 Prompt for AI Agents
In .github/workflows/stress-test-mcp-server.yml around lines 104 to 109, the
workflow runs psql directly which can fail on Ubuntu runners that don't have the
PostgreSQL client installed; add a step immediately before the "Check PostgreSQL
connection count" step that runs "sudo apt-get update" and then "sudo apt-get
install -y postgresql-client" to ensure the psql binary is available for the
subsequent query.
```python
parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
print(f"Using external MCP server at {host}:{port}")

multiple_user_mcp_args = MCPClientArgs(
    transport="http",
    host=host,
    port=port,
    insecure=True,
    server_path=None,  # Not needed for external server
)
```
🛠️ Refactor suggestion
Honor MCP_SERVER_URL scheme and scope SSL bypass; prevent TLS flakiness
Use scheme from MCP_SERVER_URL and apply SSL bypass only for HTTPS.
Apply this diff:
parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
@@
- multiple_user_mcp_args = MCPClientArgs(
- transport="http",
+ multiple_user_mcp_args = MCPClientArgs(
+ transport=scheme,
host=host,
port=port,
insecure=True,
server_path=None, # Not needed for external server
)
@@
- async def client_task(client_id: str) -> None:
+ async def client_task(client_id: int) -> None:
print(f"[Client {client_id}] starting")
- apply_ssl_bypass()
-
- client = Client(
- transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
- )
+ transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+ cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+ with cm:
+ client = Client(transport=transport)Add import at top (near other imports):
```python
import contextlib
```

Also applies to: 473-480
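The `contextlib.nullcontext()` trick this suggestion relies on lets a single `with` statement cover both the bypass and no-bypass cases. A small self-contained sketch, with a stub standing in for `apply_ssl_bypass`:

```python
import contextlib

entered = []


@contextlib.contextmanager
def ssl_bypass_stub():
    # Stand-in for apply_ssl_bypass(); records enter/exit so the
    # scoping behavior is observable.
    entered.append("bypass")
    try:
        yield
    finally:
        entered.append("restored")


def scoped_bypass(scheme: str):
    # Only HTTPS gets the bypass; HTTP gets a no-op context manager,
    # so call sites never need to branch around the `with` block.
    return ssl_bypass_stub() if scheme == "https" else contextlib.nullcontext()


with scoped_bypass("http"):
    pass  # no bypass applied

with scoped_bypass("https"):
    pass  # bypass applied, then restored on exit

print(entered)  # ['bypass', 'restored']
```

Scoping the bypass this way guarantees the global SSL state is restored as soon as the block exits, instead of leaking the monkeypatch into unrelated code.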
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
446-457, the test currently hardcodes transport="http" and insecure=True; change
it to derive transport from the MCP_SERVER_URL scheme (use parsed.scheme or
default "http") and set insecure=True only when the scheme is "https" (otherwise
False) so SSL bypass is applied only for HTTPS; add "import contextlib" near the
other imports; apply the same scheme+insecure logic to the similar block at
lines 473-480.
```python
"uri": f"file://test/file_{client_id}.java",
"message": f"Test issue for client {client_id}",
"ruleset_name": f"test-ruleset-{client_id % 10}",  # Share some rulesets
"violation_name": f"test-violation-{client_id % 5}",  # Share some violations
"violation_category": "potential",
"code_snip": "// test code",
"line_number": 42,
```
Make URIs unique per client to avoid DB integrity violations
Use per-client UUID in URIs to prevent unique-constraint collisions.
Apply this diff:
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",Also applies to: 522-529, 586-588
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines
499-505 (and similarly at 522-529 and 586-588), the test uses identical file://
URIs per client which causes DB unique-constraint collisions; update the URI to
include a per-client unique identifier (e.g., a generated UUID or
client-specific UUID string) so each client gets a distinct URI (e.g.,
file://test/{client_uuid}/file_{client_id}.java or append ?uid={uuid}); ensure
the UUID is generated/available in the test setup and used consistently in all
three ranges mentioned to prevent integrity violations.
Actionable comments posted: 3
♻️ Duplicate comments (3)
kai_mcp_solution_server/tests/test_multiple_integration.py (3)
460-471: Use asyncio.run to avoid event-loop lifecycle bugs.
Simpler and less flaky in threads.

```diff
 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))
```
446-457: Honor MCP_SERVER_URL scheme; set insecure only for HTTPS.
Prevents HTTP/HTTPS mismatches and avoids unnecessary global SSL patching.

```diff
 parsed = urlparse(external_server_url)
 host = parsed.hostname or "localhost"
 port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")

 multiple_user_mcp_args = MCPClientArgs(
-    transport="http",
+    transport=scheme,
     host=host,
     port=port,
-    insecure=True,
+    insecure=(scheme == "https"),
     server_path=None,  # Not needed for external server
 )
```
473-481: Scope SSL bypass and use correct scheme; fix type annotation.
Avoids global monkeypatching and ensures HTTPS works; the annotation must be int since you use modulo arithmetic.

```diff
-async def client_task(client_id: str) -> None:
+async def client_task(client_id: int) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    transport = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport)
```
🧹 Nitpick comments (5)
kai_mcp_solution_server/tests/test_multiple_integration.py (5)
1-11: Add contextlib import (needed for scoped SSL bypass).
Used below to conditionally apply the SSL bypass only for HTTPS.

```diff
 import asyncio
-import concurrent
 import concurrent.futures
 import datetime
 import json
 import os
 import unittest
 from uuid import uuid4
+import contextlib
```
2-3: Drop redundant import.
`import concurrent` is unnecessary since `import concurrent.futures` already brings `concurrent` into scope.

```diff
-import concurrent
 import concurrent.futures
```
621-623: Replace fixed sleep with readiness polling.
Reduces flakiness under load by waiting until metrics reflect acceptance (with timeout).

```diff
-# Wait a bit for the acceptance to be processed
-await asyncio.sleep(0.5)
+# Wait for acceptance to be reflected in metrics (timeout safeguards the loop)
+await wait_for_metric_change(client, client_id, initial_accepted, initial_pending, ruleset=f"test-ruleset-{client_id % 10}", violation=f"test-violation-{client_id % 5}", timeout=10.0)
```

Add this helper (outside the diffed range, e.g., near other helpers in this test module):

```python
async def wait_for_metric_change(client: Client, client_id: int, initial_accepted: int, initial_pending: int, ruleset: str, violation: str, timeout: float = 10.0) -> None:
    start = asyncio.get_event_loop().time()
    last_err: Exception | None = None
    while asyncio.get_event_loop().time() - start < timeout:
        res = await client.session.call_tool(
            "get_success_rate",
            {"violation_ids": [{"ruleset_name": ruleset, "violation_name": violation}]},
        )
        if res.isError:
            last_err = RuntimeError(res.content[0].text)
            await asyncio.sleep(0.2)
            continue
        rate = json.loads(res.content[0].text)
        if isinstance(rate, list):
            rate = rate[0] if rate else {}
        if (rate.get("accepted_solutions", 0) - initial_accepted) >= 1 and (rate.get("pending_solutions", 0) - initial_pending) <= 0:
            return
        await asyncio.sleep(0.2)
    raise TimeoutError(f"[Client {client_id}] metrics did not reflect acceptance within {timeout}s: last_err={last_err}")
```
683-687: Probe server readiness before launching clients.
Avoids initial connection races against the external server.

```diff
 # External server should already be running
 # Allow configuring number of concurrent clients via environment variable
 NUM_TASKS = int(os.environ.get("NUM_CONCURRENT_CLIENTS", "30"))
 print(f"Testing with {NUM_TASKS} concurrent clients")
+await wait_for_http_server(host, port, timeout=30)
```

Add this helper (outside the diffed range):

```python
async def wait_for_http_server(host: str, port: int, timeout: float = 30.0) -> None:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_err: Exception | None = None
    while loop.time() < deadline:
        try:
            reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except Exception as e:
            last_err = e
            await asyncio.sleep(0.2)
    raise TimeoutError(f"Server at {host}:{port} not ready after {timeout}s: {last_err}")
```
720-757: Wrap hint-session in scoped SSL bypass for HTTPS.
Keeps HTTPS behavior consistent with client tasks.

```diff
 print("Verifying hints were generated...")
-async with create_client(multiple_user_mcp_args) as hint_session:
-    await hint_session.initialize()
+cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+with cm:
+    async with create_client(multiple_user_mcp_args) as hint_session:
+        await hint_session.initialize()

-# Check hints for a few different violation combinations
-violations_to_check = [
+        # Check hints for a few different violation combinations
+        violations_to_check = [
             ("test-ruleset-0", "test-violation-0"),
             ("test-ruleset-1", "test-violation-1"),
             ("test-ruleset-2", "test-violation-2"),
-]
+        ]

-hints_found = 0
-for ruleset, violation in violations_to_check:
-    try:
-        hint_result = await hint_session.call_tool(
+        hints_found = 0
+        for ruleset, violation in violations_to_check:
+            try:
+                hint_result = await hint_session.call_tool(
                     "get_best_hint",
                     {
                         "ruleset_name": ruleset,
                         "violation_name": violation,
                     },
-        )
-        if hint_result and not hint_result.isError:
-            hints_found += 1
-            print(f"  ✓ Found hint for {ruleset}/{violation}")
-        else:
-            print(f"  ✗ No hint for {ruleset}/{violation}")
-    except Exception as e:
-        print(f"  ✗ Error checking hint for {ruleset}/{violation}: {e}")
+                )
+                if hint_result and not hint_result.isError:
+                    hints_found += 1
+                    print(f"  ✓ Found hint for {ruleset}/{violation}")
+                else:
+                    print(f"  ✗ No hint for {ruleset}/{violation}")
+            except Exception as e:
+                print(f"  ✗ Error checking hint for {ruleset}/{violation}: {e}")

 if hints_found == 0:
     self.fail(
         "No hints were generated! Hint generation may not be working correctly."
     )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
kai_mcp_solution_server/tests/test_multiple_integration.py (4)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (2)
- GetBestHintResult (663-665)
- SuccessRateMetric (721-727)

kai_mcp_solution_server/tests/mcp_client.py (1)
- MCPClientArgs (247-256)

kai_mcp_solution_server/tests/mcp_loader_script.py (2)
- MCPClientArgs (31-41)
- create_client (136-151)

kai_mcp_solution_server/tests/ssl_utils.py (2)
- apply_ssl_bypass (49-81)
- apply_ssl_bypass (168-192)
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py
514-516: Abstract raise to an inner function
(TRY301)
514-516: Create your own exception
(TRY002)
514-516: Avoid specifying long messages outside the exception class
(TRY003)
521-523: Create your own exception
(TRY002)
521-523: Avoid specifying long messages outside the exception class
(TRY003)
526-526: Use of assert detected
(S101)
554-556: Abstract raise to an inner function
(TRY301)
554-556: Create your own exception
(TRY002)
554-556: Avoid specifying long messages outside the exception class
(TRY003)
561-563: Create your own exception
(TRY002)
561-563: Avoid specifying long messages outside the exception class
(TRY003)
566-566: Use of assert detected
(S101)
590-590: Use of assert detected
(S101)
601-601: Use of assert detected
(S101)
604-604: Use of assert detected
(S101)
645-645: Use of assert detected
(S101)
657-657: Use of assert detected
(S101)
660-660: Use of assert detected
(S101)
667-667: Use of assert detected
(S101)
672-672: Use of assert detected
(S101)
706-706: Do not catch blind exception: Exception
(BLE001)
747-747: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
```python
# Create an incident
incident_result = await client.session.call_tool(
    "create_incident",
    {
        "client_id": f"stress-test-{client_uuid}",
        "extended_incident": {
            "uri": f"file://test/file_{client_id}.java",
            "message": f"Test issue for client {client_id}",
            "ruleset_name": f"test-ruleset-{client_id % 10}",  # Share some rulesets
            "violation_name": f"test-violation-{client_id % 5}",  # Share some violations
            "violation_category": "potential",
            "code_snip": "// test code",
            "line_number": 42,
            "variables": {},
        },
    },
```
🛠️ Refactor suggestion
Make file URIs unique per client to prevent DB unique-constraint collisions.
Shareable rulesets/violations are fine; URIs should be distinct.
"extended_incident": {
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",
"message": f"Test issue for client {client_id}",
"ruleset_name": f"test-ruleset-{client_id % 10}", # Share some rulesets
"violation_name": f"test-violation-{client_id % 5}", # Share some violations
"violation_category": "potential",
"code_snip": "// test code",
"line_number": 42,
"variables": {},
},
@@
"before": [
{
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",
"content": "// original code",
}
],
"after": [
{
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",
"content": "// fixed code",
}
],
@@
"solution_file": {
- "uri": f"file://test/file_{client_id}.java",
+ "uri": f"file://test/file_{client_id}_{client_uuid}.java",
"content": "// fixed code",
},Also applies to: 535-546, 612-617
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 493
to 508 (and also apply the same change at 535-546 and 612-617), the file URI
used for incidents is not unique per client which causes DB unique-constraint
collisions; change the "uri" value to include a client-unique identifier (for
example client_uuid or client_id) so each test incident uses a distinct file://
path (e.g., append client_uuid/client_id into the filename or path) while
leaving shared ruleset/violation names unchanged.
```python
# Get success rate (before accepting)
success_rate_result = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)

# Parse and verify success rate
success_rate_text = success_rate_result.content[0].text
import json

success_rate = json.loads(success_rate_text)
print(f"[Client {client_id}] got success rate: {success_rate}")
```
🛠️ Refactor suggestion
Check success_rate tool-call errors and remove redundant import.
Prevents JSON parse on error payload; json is already imported at top.
success_rate_result = await client.session.call_tool(
"get_success_rate",
{
"violation_ids": [
{
"ruleset_name": f"test-ruleset-{client_id % 10}",
"violation_name": f"test-violation-{client_id % 5}",
}
]
},
)
-# Parse and verify success rate
-success_rate_text = success_rate_result.content[0].text
-import json
+if success_rate_result.isError:
+ error_msg = success_rate_result.content[0].text
+ raise Exception(f"[Client {client_id}] get_success_rate failed: {error_msg}")
+# Parse and verify success rate
+success_rate_text = success_rate_result.content[0].text
 success_rate = json.loads(success_rate_text)
 print(f"[Client {client_id}] got success rate: {success_rate}")
print(f"[Client {client_id}] got success rate: {success_rate}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Get success rate (before accepting) | |
| success_rate_result = await client.session.call_tool( | |
| "get_success_rate", | |
| { | |
| "violation_ids": [ | |
| { | |
| "ruleset_name": f"test-ruleset-{client_id % 10}", | |
| "violation_name": f"test-violation-{client_id % 5}", | |
| } | |
| ] | |
| }, | |
| ) | |
| # Parse and verify success rate | |
| success_rate_text = success_rate_result.content[0].text | |
| import json | |
| success_rate = json.loads(success_rate_text) | |
| print(f"[Client {client_id}] got success rate: {success_rate}") | |
| # Get success rate (before accepting) | |
| success_rate_result = await client.session.call_tool( | |
| "get_success_rate", | |
| { | |
| "violation_ids": [ | |
| { | |
| "ruleset_name": f"test-ruleset-{client_id % 10}", | |
| "violation_name": f"test-violation-{client_id % 5}", | |
| } | |
| ] | |
| }, | |
| ) | |
| if success_rate_result.isError: | |
| error_msg = success_rate_result.content[0].text | |
| raise Exception(f"[Client {client_id}] get_success_rate failed: {error_msg}") | |
| # Parse and verify success rate | |
| success_rate_text = success_rate_result.content[0].text | |
| success_rate = json.loads(success_rate_text) | |
| print(f"[Client {client_id}] got success rate: {success_rate}") |
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 568
to 586, the code unconditionally parses success_rate_result.content[0].text with
a local "import json" and may attempt to JSON-parse an error payload; remove the
redundant "import json" and first check the tool-call result for an error/failed
status (or presence of an error field) before parsing: if the call failed,
log/raise/assert with the error message instead of parsing, and only call
json.loads on the text when the result indicates success.
```python
# Get success rate again (after accepting)
success_rate_result2 = await client.session.call_tool(
    "get_success_rate",
    {
        "violation_ids": [
            {
                "ruleset_name": f"test-ruleset-{client_id % 10}",
                "violation_name": f"test-violation-{client_id % 5}",
            }
        ]
    },
)
success_rate_text2 = success_rate_result2.content[0].text
success_rate2 = json.loads(success_rate_text2)
```
🛠️ Refactor suggestion
Also check the second success_rate call for errors.
Matches the first call; avoids parsing error payloads.
success_rate_result2 = await client.session.call_tool(
"get_success_rate",
{
"violation_ids": [
{
"ruleset_name": f"test-ruleset-{client_id % 10}",
"violation_name": f"test-violation-{client_id % 5}",
}
]
},
)
-success_rate_text2 = success_rate_result2.content[0].text
+if success_rate_result2.isError:
+ error_msg = success_rate_result2.content[0].text
+ raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {error_msg}")
+success_rate_text2 = success_rate_result2.content[0].text
 success_rate2 = json.loads(success_rate_text2)
🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/test_multiple_integration.py around lines 624
to 637, the second get_success_rate call result isn't being checked for an error
before parsing; add the same safety/assertions used on the first call: assert
success_rate_result2.success is True (or raise a clear assertion with result
payload), assert content exists and has at least one item, and assert the first
content item contains a text field before calling json.loads on it so we avoid
parsing error payloads.
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
kai_mcp_solution_server/tests/mcp_loader_script.py (1)
105-114: STDIO client: guard missing server_path. Prevent passing cwd=None into StdioServerParameters.

 async def create_stdio_client(args: MCPClientArgs) -> AsyncIterator[ClientSession]:
-    print(f"Using server path: {args.server_path}")
+    if not args.server_path:
+        raise ValueError("server_path is required for stdio transport")
+    print(f"Using server path: {args.server_path}")
♻️ Duplicate comments (9)
kai_mcp_solution_server/tests/mcp_client.py (1)
305-333: Guard stdio path: avoid Path(None) TypeError. When transport=stdio and server_path is None, Path(None) raises. Add an early check.

-        else:  # stdio transport
-            print(f"Using server path: {args.server_path}")
+        else:  # stdio transport
+            if args.server_path is None:
+                raise ValueError("server_path is required when using --transport stdio")
+            print(f"Using server path: {args.server_path}")
             logger.debug(
                 f"Initializing STDIO transport with server path: {args.server_path}"
             )
@@
-            server_path = Path(args.server_path)
+            server_path = Path(args.server_path)

.github/workflows/stress-test-mcp-server.yml (3)
69-69: Use the inputs context for the dispatch parameter. github.event.inputs is null outside workflow_dispatch; inputs works across events.

-      NUM_CONCURRENT_CLIENTS: ${{ github.event.inputs.num_clients || '100' }}
+      NUM_CONCURRENT_CLIENTS: ${{ inputs.num_clients || '100' }}
71-101: Harden server lifecycle: fail-fast, trap, and wait. Prevents orphaned server and ensures errors propagate reliably under load.

-      run: |
-        echo "Starting MCP server connected to PostgreSQL..."
-        uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
-        SERVER_PID=$!
+      run: |
+        set -euo pipefail
+        echo "Starting MCP server connected to PostgreSQL..."
+        uv run python -m kai_mcp_solution_server --transport streamable-http --host 0.0.0.0 --port 8000 &
+        SERVER_PID=$!
+        trap 'echo "Stopping MCP server..."; kill ${SERVER_PID} 2>/dev/null || true; wait ${SERVER_PID} 2>/dev/null || true' EXIT
@@
-        kill $SERVER_PID || true
-
-        exit $TEST_RESULT
+        kill $SERVER_PID || true
+        wait $SERVER_PID || true
+        exit $TEST_RESULT
104-109: Install psql before querying pg_stat_activity. Ubuntu runners may not have the client by default.

+      - name: Install PostgreSQL client
+        if: always()
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y postgresql-client
       - name: Check PostgreSQL connection count
         if: always()
         run: |
           PGPASSWORD=kai_password psql -h localhost -U kai_user -d kai_test_db -c \
             "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

kai_mcp_solution_server/tests/test_multiple_integration.py (5)
401-413: Use asyncio.run to avoid loop lifecycle bugs in threads. Simpler, less flaky than manual loop creation/closing.

 def run_async_in_thread(fn, *args, **kwargs):
-    try:
-        loop = asyncio.get_event_loop()
-    except RuntimeError:
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
-
-    try:
-        result = loop.run_until_complete(fn(*args, **kwargs))
-        return result
-    finally:
-        loop.close()
+    return asyncio.run(fn(*args, **kwargs))
440-448: Make file URIs unique per client to avoid collisions in shared DBs. Add client_uuid so URIs don’t collide across concurrent runs/clients.

-            "uri": f"file://test/file_{client_id}.java",
+            "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-            "uri": f"file://test/file_{client_id}.java",
+            "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-            "uri": f"file://test/file_{client_id}.java",
+            "uri": f"file://test/file_{client_id}_{client_uuid}.java",
@@
-            "uri": f"file://test/file_{client_id}.java",
+            "uri": f"file://test/file_{client_id}_{client_uuid}.java",

Also applies to: 477-485, 483-485, 555-557
414-421: Activate SSL bypass and use parsed scheme in transport URL. Calling apply_ssl_bypass() without entering the context has no effect; also the URL is forced to HTTP.

-async def client_task(client_id: str) -> None:
+async def client_task(client_id: int) -> None:
     print(f"[Client {client_id}] starting")
-    apply_ssl_bypass()
-
-    client = Client(
-        transport=f"http://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}",
-    )
+    transport_url = f"{scheme}://{multiple_user_mcp_args.host}:{multiple_user_mcp_args.port}"
+    cm = apply_ssl_bypass() if scheme == "https" else contextlib.nullcontext()
+    with cm:
+        client = Client(transport=transport_url)
509-527: Check tool-call result before JSON parse; remove inner import. Avoid parsing error payloads; reuse top-level json import.

 success_rate_result = await client.session.call_tool(
@@
 )
-# Parse and verify success rate
-success_rate_text = success_rate_result.content[0].text
-import json
+if getattr(success_rate_result, "isError", False):
+    msg = success_rate_result.content[0].text if success_rate_result.content else "unknown error"
+    raise Exception(f"[Client {client_id}] get_success_rate failed: {msg}")
+success_rate_text = success_rate_result.content[0].text
 success_rate = json.loads(success_rate_text)
565-579: Also guard the second success-rate call.

 success_rate_result2 = await client.session.call_tool(
@@
 )
-success_rate_text2 = success_rate_result2.content[0].text
+if getattr(success_rate_result2, "isError", False):
+    msg = success_rate_result2.content[0].text if success_rate_result2.content else "unknown error"
+    raise Exception(f"[Client {client_id}] get_success_rate (after) failed: {msg}")
+success_rate_text2 = success_rate_result2.content[0].text
 success_rate2 = json.loads(success_rate_text2)
🧹 Nitpick comments (13)
.gitignore (1)
209-213: Also ignore SQLite sidecar files (-wal/-shm). To keep repos clean after local runs, consider ignoring SQLite’s write-ahead log artifacts too.

 # Test database files
 tests/test_*.db
 kai_mcp_solution_server/tests/test_*.db
 *.db
+*.db-wal
+*.db-shm

kai_mcp_solution_server/tests/mcp_client.py (1)
351-361: Increase client run timeout to reduce CI flakes under load. 15s can be tight on busy runners; bumping improves stress-test stability.

-            await asyncio.wait_for(run_with_timeout(), timeout=15.0)
+            await asyncio.wait_for(run_with_timeout(), timeout=45.0)

kai_mcp_solution_server/tests/ssl_utils.py (2)
88-93: Idempotence flag: don’t mark as patched if we didn’t patch. If we early-return, leave httpx_patched False so restore() doesn’t attempt to revert unknown state.

-        if hasattr(httpx.AsyncClient.__init__, "_ssl_bypass_patched"):
-            logger.debug("httpx already patched, skipping")
-            self.httpx_patched = True
-            return
+        if hasattr(httpx.AsyncClient.__init__, "_ssl_bypass_patched"):
+            logger.debug("httpx already patched, skipping")
+            self.httpx_patched = False
+            return
106-109: Preserve function metadata on wrappers. Use functools.wraps so debuggers and introspection don’t degrade.

+        import functools
@@
-        patched_client_init._ssl_bypass_patched = True
-        patched_async_client_init._ssl_bypass_patched = True
+        patched_client_init = functools.wraps(self.original_httpx_client_init)(patched_client_init)
+        patched_async_client_init = functools.wraps(self.original_httpx_async_client_init)(patched_async_client_init)
+        patched_client_init._ssl_bypass_patched = True
+        patched_async_client_init._ssl_bypass_patched = True

Optional: narrow warning suppression (lines 60–63) to specific SSL categories instead of all Warning to avoid masking unrelated issues.
kai_mcp_solution_server/Makefile (1)
95-100: Avoid inlining secrets in process args. If KAI_LLM_PARAMS may contain credentials, prefer pass-through from environment to avoid exposure via ps/history.

-		-e MOUNT_PATH='$(MOUNT_PATH)' \
-		-e KAI_LLM_PARAMS='$(KAI_LLM_PARAMS)' \
-		-e KAI_DB_DSN='$(KAI_DB_DSN)' \
+		-e MOUNT_PATH \
+		-e KAI_LLM_PARAMS \
+		-e KAI_DB_DSN \

If you need non-env defaults, consider --env-file or mounting a config file.
113-127: Detect backend via URL, and add sqlite timeout. String contains("sqlite") can produce false positives; use make_url. Also add aiosqlite connect timeout to reduce “database is locked” under load.

-async def get_async_engine(url: URL | str) -> AsyncEngine:
-    # Convert to string if URL object
-    url_str = str(url)
-
-    # Configure connection pool based on database type
-    if "sqlite" in url_str:
+async def get_async_engine(url: URL | str) -> AsyncEngine:
+    from sqlalchemy.engine.url import make_url
+    backend = make_url(str(url)).get_backend_name()
+
+    # Configure connection pool based on database type
+    if backend == "sqlite":
         # SQLite needs special handling due to its file-based nature
         # and limited write concurrency (even with WAL mode)
         engine = create_async_engine(
             url,
-            connect_args={"check_same_thread": False},
+            connect_args={"check_same_thread": False, "timeout": 30},
             poolclass=StaticPool,  # Single shared connection for SQLite
             echo_pool=False,  # Set to True for debugging connection pool
         )

Note: If you expect concurrent writes on sqlite, consider NullPool + WAL mode, or prefer Postgres in stress paths.
100-105: Use logger instead of stderr print for DDL ops. Keeps diagnostics consistent with server logging.

-    print("Dropping all tables", file=sys.stderr)
+    import logging
+    logging.getLogger(__name__).warning("Dropping all tables")
113-141: Upstream API race mitigation (follow-up). Most reported errors were httpx/OpenAI connect/TLS issues under concurrency. Beyond DB pools, ensure:
- A single shared Async OpenAI client per process with tuned httpx.Limits (e.g., max_connections ~ 100, max_keepalive ~ 20), and explicit timeouts.
- Bounded concurrency around the LLM call (semaphore) and retry with jitter on connect/read timeouts.
- Idempotency guards so retries don’t duplicate work.
I can draft a patch wiring a shared client and a retry/backoff policy if you want.
kai_mcp_solution_server/tests/test_multiple_integration.py (3)
2-3: Remove redundant import.
import concurrent is unused; import concurrent.futures is sufficient.

-import concurrent
 import concurrent.futures
567-604: Prefer unittest assertions over bare asserts (optional). For clearer failure output, replace plain assert with self.assert* equivalents.
651-655: Drop the fixed 10s sleep. All futures are awaited; this extra wait extends CI time without benefit.

-    await asyncio.sleep(10)  # wait a moment for all output to be printed
+    # No-op: results already collected; avoid fixed delays.
111-115: Remove unused lock or repurpose for LLM gating.
self.lock is no longer used; either delete it or replace with a concurrency gate for outbound LLM calls (see below).

 class KaiSolutionServerContext:
     def __init__(self, settings: SolutionServerSettings) -> None:
         self.settings = settings
-        self.lock = asyncio.Lock()
260-264: Re-removal of lock: add DB-level get-or-create safety. With the lock gone, concurrent create_incident calls can race on first-time DBViolation creation. Ensure a unique constraint exists on (ruleset_name, violation_name) and handle IntegrityError with a re-query.
Would you like a small patch that wraps violation creation in a retry-on-conflict block?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.github/workflows/stress-test-mcp-server.yml (1 hunks)
.gitignore (1 hunks)
kai_mcp_solution_server/Makefile (5 hunks)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (2 hunks)
kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (10 hunks)
kai_mcp_solution_server/tests/mcp_client.py (1 hunks)
kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
kai_mcp_solution_server/tests/ssl_utils.py (2 hunks)
kai_mcp_solution_server/tests/test_multiple_integration.py (3 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/stress-test-mcp-server.yml
[medium] 66-67: Basic Auth Credentials
(CKV_SECRET_4)
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/test_multiple_integration.py
455-457: Abstract raise to an inner function
(TRY301)
455-457: Create your own exception
(TRY002)
455-457: Avoid specifying long messages outside the exception class
(TRY003)
462-464: Create your own exception
(TRY002)
462-464: Avoid specifying long messages outside the exception class
(TRY003)
467-467: Use of assert detected
(S101)
495-497: Abstract raise to an inner function
(TRY301)
495-497: Create your own exception
(TRY002)
495-497: Avoid specifying long messages outside the exception class
(TRY003)
502-504: Create your own exception
(TRY002)
502-504: Avoid specifying long messages outside the exception class
(TRY003)
507-507: Use of assert detected
(S101)
531-531: Use of assert detected
(S101)
542-542: Use of assert detected
(S101)
545-545: Use of assert detected
(S101)
586-586: Use of assert detected
(S101)
598-598: Use of assert detected
(S101)
601-601: Use of assert detected
(S101)
608-608: Use of assert detected
(S101)
613-613: Use of assert detected
(S101)
647-647: Do not catch blind exception: Exception
(BLE001)
688-688: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (2)
kai_mcp_solution_server/src/kai_mcp_solution_server/db/dao.py (1)
133-140: Pool sizing looks reasonable; pre_ping enabled. Good choices for high fan-out workloads. Ensure server disposes the engine on shutdown to release connections.
If not already, verify server shutdown calls engine.dispose() after async_engine.close() to avoid lingering connections in tests.

kai_mcp_solution_server/src/kai_mcp_solution_server/server.py (1)
386-394: Wrapper changes look good. Directly calling the underlying functions without the app lock aligns with DB-managed concurrency.
Also applies to: 652-656, 714-718, 806-809, 893-897, 924-928
| @echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'" | ||
|
|
Fix corrupted help text.
User-facing help prints a garbled example. Replace with a clean domain.
- @echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'"
-
+	@echo "  make test-http HOST=api.example.com PORT=443 BEARER_TOKEN='token'"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @echo " make test-http HOST=api.exayeah, there are a lot of little ones that have been filed. I feel pretty comfortable delaying them to a point releasemple.com PORT=443 BEARER_TOKEN='token'" | |
| @echo " make test-http HOST=api.example.com PORT=443 BEARER_TOKEN='token'" |
🤖 Prompt for AI Agents
In kai_mcp_solution_server/Makefile around lines 242 to 243, the help text
contains corrupted/garbled example usage ("make test-http HOST=api.exayeah,
there are a lot of little ones...") — replace that line with a clean, concise
example such as: echo a correct make invocation using a normal domain, PORT and
BEARER_TOKEN (for example: make test-http HOST=api.example.com PORT=443
BEARER_TOKEN='token') so the printed help shows a clear, readable usage example.
| async def create(self) -> None: | ||
| self.engine = await get_async_engine( | ||
| self.settings.db_dsn, self.settings.drop_all | ||
| ) | ||
| from kai_mcp_solution_server.db.dao import Base | ||
|
|
||
| self.engine = await get_async_engine(self.settings.db_dsn) | ||
|
|
||
| # Ensure tables exist (safe - only creates if not already there) | ||
| async with self.engine.begin() as conn: | ||
| await conn.run_sync(Base.metadata.create_all) | ||
|
|
🛠️ Refactor suggestion
Bound outbound LLM concurrency to prevent connection stampedes (root cause of #861).
Unthrottled ainvoke() calls (spawned from accept_file → generate_hint_v3) can overwhelm connection pools and trigger httpx/httpcore TLS/connection errors under load. Add a process-wide semaphore and use it in generators.
async def create(self) -> None:
@@
- self.engine = await get_async_engine(self.settings.db_dsn)
+ self.engine = await get_async_engine(self.settings.db_dsn)
@@
self.session_maker = async_sessionmaker(
bind=self.engine, expire_on_commit=False
)
self.model: BaseChatModel
+ # Limit concurrent LLM calls (configurable via env)
+ self.llm_semaphore = asyncio.Semaphore(
+ int(os.getenv("KAI_LLM_MAX_CONCURRENCY", "8"))
+        )

Also harden model init with retries/timeouts:
- else:
- self.model = init_chat_model(**self.settings.llm_params)
+ else:
+ params = {
+ "max_retries": int(os.getenv("KAI_LLM_MAX_RETRIES", "2")),
+ "timeout": float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
+ **self.settings.llm_params,
+ }
+            self.model = init_chat_model(**params)

Apply the gate and timeout at call sites (outside this range):
# In generate_hint_v1/v2/v3 where response = await kai_ctx.model.ainvoke(prompt)
async with kai_ctx.llm_semaphore:
response = await asyncio.wait_for(
kai_ctx.model.ainvoke(prompt),
timeout=float(os.getenv("KAI_LLM_TIMEOUT_SECONDS", "60")),
)| # Clean up database connections when client disconnects | ||
| if "ctx" in locals() and hasattr(ctx, "engine"): | ||
| log("Disposing database engine...") | ||
| await ctx.engine.dispose() | ||
| log("Database engine disposed") | ||
|
|
🛠️ Refactor suggestion
Graceful shutdown: await background hint tasks before disposing engine.
Disposing the engine while generate_hint_v* tasks are in-flight risks DB errors. Track tasks and drain/cancel with a timeout before disposal.
finally:
- # Clean up database connections when client disconnects
- if "ctx" in locals() and hasattr(ctx, "engine"):
- log("Disposing database engine...")
- await ctx.engine.dispose()
- log("Database engine disposed")
+ if "ctx" in locals():
+ # Wait briefly for background tasks to finish
+ if hasattr(ctx, "bg_tasks") and ctx.bg_tasks:
+ log(f"Draining {len(ctx.bg_tasks)} background tasks...")
+ done, pending = await asyncio.wait(
+ ctx.bg_tasks, timeout=float(os.getenv("KAI_BG_DRAIN_TIMEOUT", "15"))
+ )
+ for t in pending:
+ t.cancel()
+ if hasattr(ctx, "engine"):
+ log("Disposing database engine...")
+ await ctx.engine.dispose()
+            log("Database engine disposed")

Outside this range (supporting changes):
# In KaiSolutionServerContext.__init__
self.bg_tasks: set[asyncio.Task] = set()
# In accept_file after create_task(...)
task = asyncio.create_task(generate_hint_v3(kai_ctx, client_id))
kai_ctx.bg_tasks.add(task)
task.add_done_callback(kai_ctx.bg_tasks.discard)🤖 Prompt for AI Agents
In kai_mcp_solution_server/src/kai_mcp_solution_server/server.py around lines
164 to 169, before disposing ctx.engine you must drain or cancel background hint
tasks tracked on ctx.bg_tasks to avoid DB use during shutdown: check if
ctx.bg_tasks is non-empty, await their completion with a configurable timeout
(e.g., use asyncio.wait with a timeout), cancel any tasks still pending after
the timeout, await them again to let cancellations propagate, and handle/ignore
asyncio.CancelledError or other exceptions while logging failures; only after
pending tasks are finished or cancelled should you call await
ctx.engine.dispose().
| read, write, get_session_id = await stack.enter_async_context( | ||
| streamablehttp_client(server_url, insecure=args.insecure) | ||
| streamablehttp_client(server_url) | ||
| ) | ||
| logger.debug("Streamable HTTP client connection established") | ||
| session = await stack.enter_async_context( | ||
| ClientSession(read, write, get_session_id) | ||
| ) | ||
| # Note: get_session_id is for session management, not passed to ClientSession | ||
| session = await stack.enter_async_context(ClientSession(read, write)) | ||
| logger.debug("MCP ClientSession initialized") | ||
|
|
🛠️ Refactor suggestion
HTTP client: validate required args and default mount_path.
If port/mount_path are None, server_url becomes malformed. Add minimal validation/defaults.
async def create_http_client(args: MCPClientArgs) -> AsyncIterator[ClientSession]:
if args.host is None:
raise ValueError("Host must be specified for HTTP transport")
+ if args.port is None:
+ raise ValueError("Port must be specified for HTTP transport")
+ if args.mount_path is None:
+ args.mount_path = "/"
@@
- read, write, get_session_id = await stack.enter_async_context(
+ read, write, get_session_id = await stack.enter_async_context(
streamablehttp_client(server_url)
)🤖 Prompt for AI Agents
In kai_mcp_solution_server/tests/mcp_loader_script.py around lines 62 to 69, the
code builds server_url without validating port and mount_path which can produce
malformed URLs when those values are None; ensure port and mount_path are
validated and defaulted before constructing server_url: if mount_path is None or
empty set it to "/", trim leading/trailing slashes to avoid double slashes when
concatenating, and if port is None set it to 80 for "http" or 443 for "https"
(or raise ValueError for unknown scheme); also validate types (port int >=1) and
raise a clear exception when invalid so the subsequent ClientSession setup
receives a well-formed URL.
| # External server mode - parse URL to get host and port | ||
| from urllib.parse import urlparse | ||
|
|
||
| parsed = urlparse(external_server_url) | ||
| host = parsed.hostname or "localhost" | ||
| port = parsed.port or 8000 | ||
| print(f"Using external MCP server at {host}:{port}") | ||
|
|
||
| multiple_user_mcp_args = MCPClientArgs( | ||
| transport="http", | ||
| host=host, | ||
| port=port, | ||
| insecure=True, | ||
| server_path=None, # Not needed for external server | ||
| ) | ||
| # Don't set KAI_LLM_PARAMS for external server - it should already be configured |
🛠️ Refactor suggestion
Honor MCP_SERVER_URL scheme and scope SSL bypass.
Hard-coding HTTP ignores HTTPS endpoints and makes the bypass ineffective. Parse the scheme, use it in the transport, and only apply the SSL bypass for HTTPS.
parsed = urlparse(external_server_url)
host = parsed.hostname or "localhost"
port = parsed.port or 8000
-print(f"Using external MCP server at {host}:{port}")
+scheme = (parsed.scheme or "http").lower()
+print(f"Using external MCP server at {host}:{port} ({scheme})")
@@
multiple_user_mcp_args = MCPClientArgs(
- transport="http",
+ transport=scheme,
host=host,
port=port,
- insecure=True,
+ insecure=(scheme == "https"),
server_path=None, # Not needed for external server
 )

Add import (outside range):
import contextlib

Signed-off-by: JonahSussman <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)
306-307: Early guard for stdio transport: good fix; addresses prior review. This resolves the previous TypeError risk when server_path was None for stdio. Minor: Ruff hints (TRY003/TRY301) flag this raise. If you want to silence them without changing flow, add per-line ignores.
Apply inline ignores:

-    if args.server_path is None:
-        raise ValueError("server_path is required when using --transport stdio")
+    if args.server_path is None:
+        raise ValueError(
+            "server_path is required when using --transport stdio"
+        )  # noqa: TRY003,TRY301
🧹 Nitpick comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)
198-199: Fix printf-style placeholders in print statements
print("...", result) won’t interpolate %s. Use f-strings for readable output.

-            print("get_best_hint tool call completed, result: %s", result)
+            print(f"get_best_hint tool call completed, result: {result}")

-            print("get_success_rate resource call completed, result: %s", result)
+            print(f"get_success_rate resource call completed, result: {result}")

Also applies to: 229-229
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
kai_mcp_solution_server/tests/mcp_client.py (2 hunks)
kai_mcp_solution_server/tests/mcp_loader_script.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- kai_mcp_solution_server/tests/mcp_loader_script.py
🧰 Additional context used
🪛 Ruff (0.12.2)
kai_mcp_solution_server/tests/mcp_client.py
307-307: Abstract raise to an inner function
(TRY301)
307-307: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_mcp_solution_server/tests/mcp_client.py (1)
251-251: Optional server_path type change: LGTM. Making server_path optional on the model aligns with external-server (HTTP) mode and avoids forcing an irrelevant parameter. No issues found.
Fixes #861
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests
Chores