Conversation
Summary of ChangesHello @wwwillchen, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refines the Claude configuration, primarily by standardizing the E2E testing build workflow and introducing a new command for collaborative issue planning. A major focus has been on enhancing the multi-agent PR review system, making it more intelligent with a dedicated code health agent, LLM-powered issue deduplication, and improved reporting. These changes aim to streamline development processes, improve code quality through more nuanced reviews, and ensure more robust and secure interactions with external comments. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly enhances Claude-based code review and agent interaction capabilities, introducing a dedicated code health agent, smart LLM-based issue deduplication, and improved summary reporting. It also refactors the stop-hook to delegate task analysis to the model and enforces build steps before E2E tests. A security audit identified a potential prompt injection vulnerability in the LLM interaction logic within the review orchestration scripts, which could allow untrusted input to manipulate LLM tools. Remediation should focus on better prompt engineering techniques such as delimiters and explicit data-handling instructions. Additionally, consider improving error visibility in the stop-hook script.
| prompt = f"""You are analyzing code review issues to identify duplicates. | ||
|
|
||
| Multiple reviewers have identified issues in a code review. Some issues may refer to the same underlying problem, even if described differently. | ||
|
|
||
| def aggregate_issues( | ||
| Group the following issues by whether they refer to the SAME underlying problem. Issues should be grouped together if: | ||
| - They point to the same file and similar line ranges (within ~10 lines) | ||
| - They describe the same fundamental issue (even if worded differently) | ||
| - They would result in the same fix | ||
|
|
||
| Do NOT group issues that: | ||
| - Are in different files | ||
| - Are in the same file but describe different problems | ||
| - Point to significantly different line ranges (>20 lines apart) | ||
|
|
||
| Issues to analyze: | ||
| {chr(10).join(issue_descriptions)} |
There was a problem hiding this comment.
The group_similar_issues function constructs a prompt for the deduplication model by directly interpolating issue descriptions. These descriptions are derived from the code being reviewed, which is untrusted. An attacker could craft malicious comments in the code that, when picked up by the review agents, inject instructions into the deduplication prompt to manipulate the grouping of issues (e.g., to hide certain findings). It is recommended to use clear delimiters to separate the issue descriptions from the instructions and explicitly instruct the model to treat the descriptions as data.
| except Exception: | ||
| return {"tasks": {}, "remaining": [], "total": 0} | ||
| return [] |
There was a problem hiding this comment.
The broad except Exception block silently ignores all errors that might occur during transcript processing, such as file I/O errors. This could lead to the hook incorrectly approving a stop if the transcript file is unreadable, as it would appear that no tasks were ever created. It's better to log these unexpected errors to stderr for visibility and debugging, while still returning an empty list to prevent the hook from crashing.
| except Exception: | |
| return {"tasks": {}, "remaining": [], "total": 0} | |
| return [] | |
| except Exception as e: | |
| print(f"Warning: Failed to extract task tool calls from {transcript_path}: {e}", file=sys.stderr) | |
| return [] | |
Greptile OverviewGreptile SummaryThis PR enhances the Claude Code configuration with several key improvements: Multi-Agent PR Review System:
Stop Hook Improvements:
E2E Testing Workflow:
New Features:
The changes are well-structured and focused on improving the Claude Code development experience. Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant GH as GitHub PR
participant Workflow as Claude Code Action
participant Orch as orchestrate_review.py
participant Agent1 as Code Health Agent
participant Agent2 as Default Agent 1
participant Agent3 as Default Agent 2
participant Sonnet as Sonnet (Dedup)
participant Post as post_comment.py
GH->>Workflow: PR opened/updated
Workflow->>Orch: /dyad:multi-pr-review {PR_NUM}
Orch->>GH: Fetch PR diff
Orch->>GH: Fetch existing comments
Orch->>Orch: Parse diff into files
Orch->>Orch: Create 3 shuffled orderings
par Parallel Review (3 agents)
Orch->>Agent1: Review (code health focus, order 1)
Agent1-->>Orch: Issues JSON
and
Orch->>Agent2: Review (default focus, order 2)
Agent2-->>Orch: Issues JSON
and
Orch->>Agent3: Review (default focus, order 3)
Agent3-->>Orch: Issues JSON
end
Orch->>Sonnet: Group similar issues
Sonnet-->>Orch: Issue groups
Orch->>Orch: Consensus voting (2+ agents)
Orch->>Orch: Filter by severity (MEDIUM+)
Orch->>Post: consensus_results.json
Post->>Post: Deduplicate against existing comments
Post->>Post: Separate HIGH/MEDIUM from LOW
Post->>GH: Post summary comment (always)
Post->>GH: Post inline comments (HIGH/MEDIUM only)
|
There was a problem hiding this comment.
2 issues found across 17 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".claude/skills/multi-pr-review/scripts/orchestrate_review.py">
<violation number="1" location=".claude/skills/multi-pr-review/scripts/orchestrate_review.py:325">
P2: Potential prompt injection vulnerability: Issue descriptions derived from code being reviewed are directly interpolated into the deduplication prompt without sanitization or clear delimiters. An attacker could craft malicious code comments that inject instructions to manipulate issue grouping (e.g., hiding security findings). Consider using clear delimiters (like XML tags) to separate data from instructions and explicitly instruct the model to treat descriptions as data only.</violation>
</file>
<file name=".claude/skills/multi-pr-review/scripts/post_comment.py">
<violation number="1" location=".claude/skills/multi-pr-review/scripts/post_comment.py:159">
P2: Issues without a valid `line_start` can be incorrectly filtered as duplicates because the duplicate check compares existing lines against 0. Guard the line comparison so only line-aware issues are deduplicated.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| f"title=\"{issue.title}\", description=\"{issue.description}\"" | ||
| ) | ||
|
|
||
| prompt = f"""You are analyzing code review issues to identify duplicates. |
There was a problem hiding this comment.
P2: Potential prompt injection vulnerability: Issue descriptions derived from code being reviewed are directly interpolated into the deduplication prompt without sanitization or clear delimiters. An attacker could craft malicious code comments that inject instructions to manipulate issue grouping (e.g., hiding security findings). Consider using clear delimiters (like XML tags) to separate data from instructions and explicitly instruct the model to treat descriptions as data only.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/skills/multi-pr-review/scripts/orchestrate_review.py, line 325:
<comment>Potential prompt injection vulnerability: Issue descriptions derived from code being reviewed are directly interpolated into the deduplication prompt without sanitization or clear delimiters. An attacker could craft malicious code comments that inject instructions to manipulate issue grouping (e.g., hiding security findings). Consider using clear delimiters (like XML tags) to separate data from instructions and explicitly instruct the model to treat descriptions as data only.</comment>
<file context>
@@ -250,94 +301,159 @@ async def run_sub_agent(
+ f"title=\"{issue.title}\", description=\"{issue.description}\""
+ )
+
+ prompt = f"""You are analyzing code review issues to identify duplicates.
+Multiple reviewers have identified issues in a code review. Some issues may refer to the same underlying problem, even if described differently.
</file context>
| existing_body = existing.get('body', '').lower() | ||
|
|
||
| # Same line (within tolerance) and similar title/content | ||
| if abs(existing_line - line) <= 3: |
There was a problem hiding this comment.
P2: Issues without a valid line_start can be incorrectly filtered as duplicates because the duplicate check compares existing lines against 0. Guard the line comparison so only line-aware issues are deduplicated.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/skills/multi-pr-review/scripts/post_comment.py, line 159:
<comment>Issues without a valid `line_start` can be incorrectly filtered as duplicates because the duplicate check compares existing lines against 0. Guard the line comparison so only line-aware issues are deduplicated.</comment>
<file context>
@@ -133,58 +133,141 @@ def post_inline_review(repo: str, pr_number: int, commit_sha: str,
+ existing_body = existing.get('body', '').lower()
+
+ # Same line (within tolerance) and similar title/content
+ if abs(existing_line - line) <= 3:
+ # Check if title keywords appear in existing comment
+ title_words = set(title.split())
</file context>
| if abs(existing_line - line) <= 3: | |
| if line > 0 and abs(existing_line - line) <= 3: |
🔍 Multi-Agent Code Review✅ No new issues found by consensus review. 3 independent reviewers analyzed this PR. All issues that reached consensus (2+ agents agreeing) have already been commented on by previous reviewers:
Reviewer Breakdown
ℹ️ Non-consensus observations (single agent only)These observations were flagged by only one agent and don't meet the 2+ consensus threshold:
Generated by multi-agent consensus review |
| continue | ||
|
|
||
| tool_name = part.get("name", "") | ||
| tool_input = part.get("input", {}) | ||
|
|
||
| if tool_name == "TaskCreate": | ||
| # TaskCreate assigns sequential IDs starting from 1 | ||
| task_id = str(len(tasks) + 1) | ||
| subject = tool_input.get("subject", f"Task {task_id}") | ||
| tasks[task_id] = { | ||
| "subject": subject, | ||
| "status": "pending" | ||
| } | ||
|
|
||
| elif tool_name == "TaskUpdate": | ||
| task_id = tool_input.get("taskId", "") | ||
| if task_id and task_id in tasks: | ||
| if "status" in tool_input: | ||
| tasks[task_id]["status"] = tool_input["status"] | ||
| if "subject" in tool_input: | ||
| tasks[task_id]["subject"] = tool_input["subject"] | ||
| if tool_name.startswith("Task"): | ||
| task_calls.append({ | ||
| "tool_name": tool_name, | ||
| "input": part.get("input", {}) | ||
| }) | ||
|
|
||
| except json.JSONDecodeError: | ||
| continue | ||
|
|
||
| except Exception: | ||
| return {"tasks": {}, "remaining": [], "total": 0} | ||
| return [] | ||
|
|
||
| # Find remaining (non-completed) tasks | ||
| remaining = [ | ||
| (task_id, info["subject"], info["status"]) | ||
| for task_id, info in tasks.items() | ||
| if info["status"] != "completed" | ||
| ] | ||
|
|
||
| return { | ||
| "tasks": tasks, | ||
| "remaining": remaining, | ||
| "total": len(tasks) | ||
| } | ||
| return task_calls |
There was a problem hiding this comment.
🔴 Test file not updated after refactoring extract_task_state to extract_task_tool_calls
The stop-hook.py was refactored to replace extract_task_state() with extract_task_tool_calls() and change analyze_with_claude() from 2 parameters to 3 parameters. However, the test file test_stop_hook.py was NOT updated, causing test failures.
Click to expand
Breaking Changes
- Function removed:
extract_task_stateno longer exists, replaced byextract_task_tool_callswith different return type - Signature changed:
analyze_with_claude(transcript, cwd)changed toanalyze_with_claude(transcript, task_calls, cwd)
Affected Test Locations
Tests calling the removed extract_task_state function:
test_stop_hook.py:270:module.extract_task_state(str(fixture_path))test_stop_hook.py:294:module.extract_task_state(str(fixture_path))test_stop_hook.py:302:module.extract_task_state("/nonexistent/path.jsonl")test_stop_hook.py:313:module.extract_task_state(str(transcript))
Tests calling analyze_with_claude with wrong number of arguments (2 instead of 3):
test_stop_hook.py:102:module.analyze_with_claude("test transcript", str(tmp_path))test_stop_hook.py:136,154,171,187,203,216,224
Impact
Running the tests will result in AttributeError: module has no attribute 'extract_task_state' and TypeError: analyze_with_claude() missing 1 required positional argument.
(Refers to lines 25-65)
Recommendation: Update the test file test_stop_hook.py to use the new extract_task_tool_calls function and update all calls to analyze_with_claude to include the new task_calls parameter.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".claude/hooks/gh-permission-hook.py">
<violation number="1" location=".claude/hooks/gh-permission-hook.py:152">
P1: The new `SAFE_FALLBACK_PATTERN` allows double-quoted or unquoted `echo` arguments that can include command substitution (e.g., `$(...)` or backticks), and then strips them before injection checks. This creates a bypass where a malicious `|| echo "$(rm -rf /)"` won’t be detected. Restrict the allowed echo arguments to exclude `$`/backticks or only allow single-quoted literals.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| # Safe fallback pattern - || echo "..." is commonly used for error handling | ||
| # This pattern matches: || echo "string" or || echo 'string' or || echo WORD | ||
| # The echo command only outputs text, making this safe for fallback values | ||
| SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"]*"|\'[^\']*\'|\S+)\s*$') |
There was a problem hiding this comment.
P1: The new SAFE_FALLBACK_PATTERN allows double-quoted or unquoted echo arguments that can include command substitution (e.g., $(...) or backticks), and then strips them before injection checks. This creates a bypass where a malicious || echo "$(rm -rf /)" won’t be detected. Restrict the allowed echo arguments to exclude $/backticks or only allow single-quoted literals.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/hooks/gh-permission-hook.py, line 152:
<comment>The new `SAFE_FALLBACK_PATTERN` allows double-quoted or unquoted `echo` arguments that can include command substitution (e.g., `$(...)` or backticks), and then strips them before injection checks. This creates a bypass where a malicious `|| echo "$(rm -rf /)"` won’t be detected. Restrict the allowed echo arguments to exclude `$`/backticks or only allow single-quoted literals.</comment>
<file context>
@@ -143,7 +143,13 @@
+# Safe fallback pattern - || echo "..." is commonly used for error handling
+# This pattern matches: || echo "string" or || echo 'string' or || echo WORD
+# The echo command only outputs text, making this safe for fallback values
+SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"]*"|\'[^\']*\'|\S+)\s*$')
</file context>
| SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"]*"|\'[^\']*\'|\S+)\s*$') | |
| SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"$`]*"|\'[^\']*\'|[^\s$`]+)\s*$') |
#skip-bb
Summary by cubic
Upgrades the Claude PR review to include a dedicated code-health agent, smart deduplication, and an always-posted summary, while enforcing app builds before E2E tests to avoid stale binaries. Also improves stop behavior by analyzing Task* tool calls and updates the workflow to use the new multi-agent review command.
New Features
/dyad:multi-pr-review; adds “Plan to Issue” command; updates trusted commenters and E2E run permissions.Migration
npm run build.buildscript; updates docs and commands to require building before tests.Written for commit 3b5e3c7. Summary will update on new commits.