Skip to content

Improve claude configs#2361

Merged
wwwillchen merged 3 commits intodyad-sh:mainfrom
wwwillchen:claude-improvements
Jan 27, 2026
Merged

Improve claude configs#2361
wwwillchen merged 3 commits intodyad-sh:mainfrom
wwwillchen:claude-improvements

Conversation

@wwwillchen
Copy link
Copy Markdown
Collaborator

@wwwillchen wwwillchen commented Jan 27, 2026

#skip-bb


Open with Devin

Summary by cubic

Upgrades the Claude PR review to include a dedicated code-health agent, smart deduplication, and an always-posted summary, while enforcing app builds before E2E tests to avoid stale binaries. Also improves stop behavior by analyzing Task* tool calls and updates the workflow to use the new multi-agent review command.

  • New Features

    • Adds a Code Health reviewer alongside correctness-focused agents.
    • Deduplicates issues with Sonnet and skips already-commented items.
    • Posts a summary every run; inline comments only for HIGH/MEDIUM.
    • Improves stop-hook by sending Task* calls to the model and recognizing plan-mode waits.
    • Switches GitHub Action to /dyad:multi-pr-review; adds “Plan to Issue” command; updates trusted commenters and E2E run permissions.
  • Migration

    • Before any E2E run, build the app: npm run build.
    • Adds a build script; updates docs and commands to require building before tests.

Written for commit 3b5e3c7. Summary will update on new commits.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @wwwillchen, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the Claude configuration, primarily by standardizing the E2E testing build workflow and introducing a new command for collaborative issue planning. A major focus has been on enhancing the multi-agent PR review system, making it more intelligent with a dedicated code health agent, LLM-powered issue deduplication, and improved reporting. These changes aim to streamline development processes, improve code quality through more nuanced reviews, and ensure more robust and secure interactions with external comments.

Highlights

  • Standardized E2E Build Process: The command for building the application binary for E2E tests has been standardized from npm run pre:e2e to npm run build across various documentation and configuration files. Crucially, new 'IMPORTANT' notes have been added to clarify that E2E tests run against the built binary and require a rebuild after application code changes.
  • New 'Plan to Issue' Command: A new Claude command, dyad/plan-to-issue, has been introduced. This command facilitates collaborative planning with the user, allowing for the creation of a comprehensive plan that can then be converted into a GitHub issue, detailing summary, background, implementation steps, files to modify, and testing approach.
  • Enhanced Multi-Agent PR Review Skill: The dyad:multi-pr-review skill has undergone a significant upgrade. It now includes a dedicated 'Code Health' sub-agent focused on maintainability, uses an LLM (Claude Sonnet) for semantic deduplication of issues, and features a more structured and informative summary comment format with collapsible low-priority issues. It also fetches existing comments to avoid posting duplicates.
  • Improved Claude Stop Hook Logic: The stop-hook.py script has been refactored to use AI analysis for determining when Claude should stop. Instead of deterministic task state parsing, it now collects all Task* tool calls and sends them inline with the conversation context to the LLM. New conditions for stopping have been added, including considering incomplete tasks and allowing stops when in 'plan mode' awaiting user input.
  • Updated Trusted Authors for PR Comments: The list of trusted authors for PR comments has been expanded to include devin-ai-integration. Additionally, the security warning for untrusted authors has been clarified to explicitly state that all further processing of threads from untrusted authors should be skipped.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/claude-pr-review.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances Claude-based code review and agent interaction capabilities, introducing a dedicated code health agent, smart LLM-based issue deduplication, and improved summary reporting. It also refactors the stop-hook to delegate task analysis to the model and enforces build steps before E2E tests. A security audit identified a potential prompt injection vulnerability in the LLM interaction logic within the review orchestration scripts, which could allow untrusted input to manipulate LLM tools. Remediation should focus on better prompt engineering techniques such as delimiters and explicit data-handling instructions. Additionally, consider improving error visibility in the stop-hook script.

Comment on lines +325 to +340
prompt = f"""You are analyzing code review issues to identify duplicates.

Multiple reviewers have identified issues in a code review. Some issues may refer to the same underlying problem, even if described differently.

def aggregate_issues(
Group the following issues by whether they refer to the SAME underlying problem. Issues should be grouped together if:
- They point to the same file and similar line ranges (within ~10 lines)
- They describe the same fundamental issue (even if worded differently)
- They would result in the same fix

Do NOT group issues that:
- Are in different files
- Are in the same file but describe different problems
- Point to significantly different line ranges (>20 lines apart)

Issues to analyze:
{chr(10).join(issue_descriptions)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The group_similar_issues function constructs a prompt for the deduplication model by directly interpolating issue descriptions. These descriptions are derived from the code being reviewed, which is untrusted. An attacker could craft malicious comments in the code that, when picked up by the review agents, inject instructions into the deduplication prompt to manipulate the grouping of issues (e.g., to hide certain findings). It is recommended to use clear delimiters to separate the issue descriptions from the instructions and explicitly instruct the model to treat the descriptions as data.

Comment on lines 62 to +63
except Exception:
return {"tasks": {}, "remaining": [], "total": 0}
return []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The broad except Exception block silently ignores all errors that might occur during transcript processing, such as file I/O errors. This could lead to the hook incorrectly approving a stop if the transcript file is unreadable, as it would appear that no tasks were ever created. It's better to log these unexpected errors to stderr for visibility and debugging, while still returning an empty list to prevent the hook from crashing.

Suggested change
except Exception:
return {"tasks": {}, "remaining": [], "total": 0}
return []
except Exception as e:
print(f"Warning: Failed to extract task tool calls from {transcript_path}: {e}", file=sys.stderr)
return []

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

This PR enhances the Claude Code configuration with several key improvements:

Multi-Agent PR Review System:

  • Added a dedicated Code Health reviewer agent focused on maintainability alongside the existing correctness-focused agents
  • Implemented smart deduplication using Sonnet to group similar issues across agents
  • Enhanced comment posting to always show a summary, with inline comments only for HIGH/MEDIUM severity issues
  • Added filtering against existing PR comments to prevent duplicate reviews

Stop Hook Improvements:

  • Improved stop-hook behavior by sending Task* tool calls directly to the LLM for analysis instead of parsing deterministically
  • Added recognition of plan-mode waits where Claude is correctly waiting for user input

E2E Testing Workflow:

  • Added build script as an alias to pre:e2e for better discoverability
  • Updated all documentation and commands to require npm run build before E2E tests with stronger warnings
  • Added permissions for PLAYWRIGHT_HTML_OPEN=never test variants

New Features:

  • Added /dyad:plan-to-issue command for converting approved plans into GitHub issues
  • Updated GitHub Action workflow to use /dyad:multi-pr-review and increased max output tokens to 48000

The changes are well-structured and focused on improving the Claude Code development experience.

Confidence Score: 5/5

  • This PR is safe to merge - all changes are configuration and documentation updates
  • Score reflects the non-production nature of changes (Claude configs, hooks, documentation) with no risk to the Electron application code. Changes follow consistent patterns and improve the development workflow.
  • No files require special attention - all changes are configuration improvements

Important Files Changed

Filename Overview
.claude/hooks/stop-hook.py Improved stop hook by sending Task* tool calls to LLM and recognizing plan-mode waits
.claude/skills/multi-pr-review/scripts/orchestrate_review.py Added code health agent, LLM-based deduplication, and existing comment filtering
.claude/skills/multi-pr-review/scripts/post_comment.py Enhanced comment posting with deduplication and always-post summary behavior
.github/workflows/claude-pr-review.yml Updated to use /dyad:multi-pr-review and increased max output tokens to 48000
package.json Added build script aliasing pre:e2e for consistency with documentation

Sequence Diagram

sequenceDiagram
    participant GH as GitHub PR
    participant Workflow as Claude Code Action
    participant Orch as orchestrate_review.py
    participant Agent1 as Code Health Agent
    participant Agent2 as Default Agent 1
    participant Agent3 as Default Agent 2
    participant Sonnet as Sonnet (Dedup)
    participant Post as post_comment.py

    GH->>Workflow: PR opened/updated
    Workflow->>Orch: /dyad:multi-pr-review {PR_NUM}
    
    Orch->>GH: Fetch PR diff
    Orch->>GH: Fetch existing comments
    Orch->>Orch: Parse diff into files
    Orch->>Orch: Create 3 shuffled orderings
    
    par Parallel Review (3 agents)
        Orch->>Agent1: Review (code health focus, order 1)
        Agent1-->>Orch: Issues JSON
    and
        Orch->>Agent2: Review (default focus, order 2)
        Agent2-->>Orch: Issues JSON
    and
        Orch->>Agent3: Review (default focus, order 3)
        Agent3-->>Orch: Issues JSON
    end
    
    Orch->>Sonnet: Group similar issues
    Sonnet-->>Orch: Issue groups
    Orch->>Orch: Consensus voting (2+ agents)
    Orch->>Orch: Filter by severity (MEDIUM+)
    
    Orch->>Post: consensus_results.json
    Post->>Post: Deduplicate against existing comments
    Post->>Post: Separate HIGH/MEDIUM from LOW
    Post->>GH: Post summary comment (always)
    Post->>GH: Post inline comments (HIGH/MEDIUM only)
Loading

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional flags.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 17 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".claude/skills/multi-pr-review/scripts/orchestrate_review.py">

<violation number="1" location=".claude/skills/multi-pr-review/scripts/orchestrate_review.py:325">
P2: Potential prompt injection vulnerability: Issue descriptions derived from code being reviewed are directly interpolated into the deduplication prompt without sanitization or clear delimiters. An attacker could craft malicious code comments that inject instructions to manipulate issue grouping (e.g., hiding security findings). Consider using clear delimiters (like XML tags) to separate data from instructions and explicitly instruct the model to treat descriptions as data only.</violation>
</file>

<file name=".claude/skills/multi-pr-review/scripts/post_comment.py">

<violation number="1" location=".claude/skills/multi-pr-review/scripts/post_comment.py:159">
P2: Issues without a valid `line_start` can be incorrectly filtered as duplicates because the duplicate check compares existing lines against 0. Guard the line comparison so only line-aware issues are deduplicated.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

f"title=\"{issue.title}\", description=\"{issue.description}\""
)

prompt = f"""You are analyzing code review issues to identify duplicates.
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Potential prompt injection vulnerability: Issue descriptions derived from code being reviewed are directly interpolated into the deduplication prompt without sanitization or clear delimiters. An attacker could craft malicious code comments that inject instructions to manipulate issue grouping (e.g., hiding security findings). Consider using clear delimiters (like XML tags) to separate data from instructions and explicitly instruct the model to treat descriptions as data only.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/skills/multi-pr-review/scripts/orchestrate_review.py, line 325:

<comment>Potential prompt injection vulnerability: Issue descriptions derived from code being reviewed are directly interpolated into the deduplication prompt without sanitization or clear delimiters. An attacker could craft malicious code comments that inject instructions to manipulate issue grouping (e.g., hiding security findings). Consider using clear delimiters (like XML tags) to separate data from instructions and explicitly instruct the model to treat descriptions as data only.</comment>

<file context>
@@ -250,94 +301,159 @@ async def run_sub_agent(
+            f"title=\"{issue.title}\", description=\"{issue.description}\""
+        )
+
+    prompt = f"""You are analyzing code review issues to identify duplicates.
 
+Multiple reviewers have identified issues in a code review. Some issues may refer to the same underlying problem, even if described differently.
</file context>
Fix with Cubic

existing_body = existing.get('body', '').lower()

# Same line (within tolerance) and similar title/content
if abs(existing_line - line) <= 3:
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Issues without a valid line_start can be incorrectly filtered as duplicates because the duplicate check compares existing lines against 0. Guard the line comparison so only line-aware issues are deduplicated.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/skills/multi-pr-review/scripts/post_comment.py, line 159:

<comment>Issues without a valid `line_start` can be incorrectly filtered as duplicates because the duplicate check compares existing lines against 0. Guard the line comparison so only line-aware issues are deduplicated.</comment>

<file context>
@@ -133,58 +133,141 @@ def post_inline_review(repo: str, pr_number: int, commit_sha: str,
+                existing_body = existing.get('body', '').lower()
+
+                # Same line (within tolerance) and similar title/content
+                if abs(existing_line - line) <= 3:
+                    # Check if title keywords appear in existing comment
+                    title_words = set(title.split())
</file context>
Suggested change
if abs(existing_line - line) <= 3:
if line > 0 and abs(existing_line - line) <= 3:
Fix with Cubic

@github-actions
Copy link
Copy Markdown
Contributor

🔍 Multi-Agent Code Review

✅ No new issues found by consensus review.

3 independent reviewers analyzed this PR. All issues that reached consensus (2+ agents agreeing) have already been commented on by previous reviewers:

Issue File Status
Silent exception swallowing .claude/hooks/stop-hook.py:63 Already commented
Prompt injection in dedup prompt .claude/skills/multi-pr-review/scripts/orchestrate_review.py:325-340 Already commented
Line 0 false positive in duplicate detection .claude/skills/multi-pr-review/scripts/post_comment.py:159 Already commented

Reviewer Breakdown

Agent Focus Issues Found
Agent 1 Code Health 4 issues
Agent 2 Correctness 3 issues
Agent 3 Correctness 4 issues
ℹ️ Non-consensus observations (single agent only)

These observations were flagged by only one agent and don't meet the 2+ consensus threshold:

  • Removal of deterministic task check (stop-hook.py) - Agent 3 noted that removing the deterministic check for incomplete tasks means relying entirely on LLM analysis, which adds latency and cost. However, the change provides the LLM with full task context for more nuanced analysis.

  • SAFE_FALLBACK_PATTERN allows arbitrary echo content (gh-permission-hook.py) - Agent 1 noted the pattern allows || echo WORD where WORD can be any non-whitespace. Risk is low since echo only outputs text.

Generated by multi-agent consensus review

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 6 additional flags in Devin Review.

Open in Devin Review

Comment on lines 50 to +65
continue

tool_name = part.get("name", "")
tool_input = part.get("input", {})

if tool_name == "TaskCreate":
# TaskCreate assigns sequential IDs starting from 1
task_id = str(len(tasks) + 1)
subject = tool_input.get("subject", f"Task {task_id}")
tasks[task_id] = {
"subject": subject,
"status": "pending"
}

elif tool_name == "TaskUpdate":
task_id = tool_input.get("taskId", "")
if task_id and task_id in tasks:
if "status" in tool_input:
tasks[task_id]["status"] = tool_input["status"]
if "subject" in tool_input:
tasks[task_id]["subject"] = tool_input["subject"]
if tool_name.startswith("Task"):
task_calls.append({
"tool_name": tool_name,
"input": part.get("input", {})
})

except json.JSONDecodeError:
continue

except Exception:
return {"tasks": {}, "remaining": [], "total": 0}
return []

# Find remaining (non-completed) tasks
remaining = [
(task_id, info["subject"], info["status"])
for task_id, info in tasks.items()
if info["status"] != "completed"
]

return {
"tasks": tasks,
"remaining": remaining,
"total": len(tasks)
}
return task_calls
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Test file not updated after refactoring extract_task_state to extract_task_tool_calls

The stop-hook.py was refactored to replace extract_task_state() with extract_task_tool_calls() and change analyze_with_claude() from 2 parameters to 3 parameters. However, the test file test_stop_hook.py was NOT updated, causing test failures.

Click to expand

Breaking Changes

  1. Function removed: extract_task_state no longer exists, replaced by extract_task_tool_calls with different return type
  2. Signature changed: analyze_with_claude(transcript, cwd) changed to analyze_with_claude(transcript, task_calls, cwd)

Affected Test Locations

Tests calling the removed extract_task_state function:

  • test_stop_hook.py:270: module.extract_task_state(str(fixture_path))
  • test_stop_hook.py:294: module.extract_task_state(str(fixture_path))
  • test_stop_hook.py:302: module.extract_task_state("/nonexistent/path.jsonl")
  • test_stop_hook.py:313: module.extract_task_state(str(transcript))

Tests calling analyze_with_claude with wrong number of arguments (2 instead of 3):

  • test_stop_hook.py:102: module.analyze_with_claude("test transcript", str(tmp_path))
  • test_stop_hook.py:136, 154, 171, 187, 203, 216, 224

Impact

Running the tests will result in AttributeError: module has no attribute 'extract_task_state' and TypeError: analyze_with_claude() missing 1 required positional argument.

(Refers to lines 25-65)

Recommendation: Update the test file test_stop_hook.py to use the new extract_task_tool_calls function and update all calls to analyze_with_claude to include the new task_calls parameter.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".claude/hooks/gh-permission-hook.py">

<violation number="1" location=".claude/hooks/gh-permission-hook.py:152">
P1: The new `SAFE_FALLBACK_PATTERN` allows double-quoted or unquoted `echo` arguments that can include command substitution (e.g., `$(...)` or backticks), and then strips them before injection checks. This creates a bypass where a malicious `|| echo "$(rm -rf /)"` won’t be detected. Restrict the allowed echo arguments to exclude `$`/backticks or only allow single-quoted literals.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

# Safe fallback pattern - || echo "..." is commonly used for error handling
# This pattern matches: || echo "string" or || echo 'string' or || echo WORD
# The echo command only outputs text, making this safe for fallback values
SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"]*"|\'[^\']*\'|\S+)\s*$')
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: The new SAFE_FALLBACK_PATTERN allows double-quoted or unquoted echo arguments that can include command substitution (e.g., $(...) or backticks), and then strips them before injection checks. This creates a bypass where a malicious || echo "$(rm -rf /)" won’t be detected. Restrict the allowed echo arguments to exclude $/backticks or only allow single-quoted literals.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/hooks/gh-permission-hook.py, line 152:

<comment>The new `SAFE_FALLBACK_PATTERN` allows double-quoted or unquoted `echo` arguments that can include command substitution (e.g., `$(...)` or backticks), and then strips them before injection checks. This creates a bypass where a malicious `|| echo "$(rm -rf /)"` won’t be detected. Restrict the allowed echo arguments to exclude `$`/backticks or only allow single-quoted literals.</comment>

<file context>
@@ -143,7 +143,13 @@
+# Safe fallback pattern - || echo "..." is commonly used for error handling
+# This pattern matches: || echo "string" or || echo 'string' or || echo WORD
+# The echo command only outputs text, making this safe for fallback values
+SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"]*"|\'[^\']*\'|\S+)\s*$')
 
 
</file context>
Suggested change
SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"]*"|\'[^\']*\'|\S+)\s*$')
SAFE_FALLBACK_PATTERN = re.compile(r'\|\|\s*echo\s+(?:"[^"$`]*"|\'[^\']*\'|[^\s$`]+)\s*$')
Fix with Cubic

@wwwillchen wwwillchen merged commit 1dae4f9 into dyad-sh:main Jan 27, 2026
2 checks passed
@wwwillchen wwwillchen deleted the claude-improvements branch January 27, 2026 20:57
@github-actions github-actions bot locked and limited conversation to collaborators Jan 27, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant