
Conversation

@celebdor
Collaborator

What this PR does / why we need it:

This PR adds comprehensive weekly PR report generation capability for the HyperShift project. It provides an automated way to:

  • Track weekly PR activity across openshift/hypershift and openshift-eng/ai-helpers repositories
  • Understand the connection between PRs and strategic OCPSTRAT initiatives through complete Jira hierarchy (Ticket → Epic → OCPSTRAT)
  • Analyze review patterns and merge timing metrics
  • Generate markdown reports with clickable links for weekly status updates

The tool uses async GitHub GraphQL API calls for fast PR fetching, automatically extracts Jira tickets from PR titles/bodies, queries Jira in parallel for hierarchy enrichment, and generates comprehensive markdown reports grouped by OCPSTRAT strategic initiatives.
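The ticket-extraction step can be sketched as follows (a minimal illustration, not the script's actual code; the helper name is hypothetical and the project-key list is assumed from the tickets mentioned in this PR):

```python
import re
from typing import Optional

# Hypothetical pattern; word boundaries avoid partial-key false matches.
JIRA_KEY_RE = re.compile(r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b')

def extract_jira_tickets(title: str, body: Optional[str]) -> list:
    """Return sorted unique Jira keys found in a PR's title and body."""
    text = f"{title}\n{body or ''}"  # newline delimiter keeps title/body boundaries clean
    return sorted(set(JIRA_KEY_RE.findall(text)))

print(extract_jira_tickets(
    "CNTRLPLANE-1977: add weekly PR report",
    "Relates to OCPSTRAT-123 and CNTRLPLANE-1977",
))  # → ['CNTRLPLANE-1977', 'OCPSTRAT-123']
```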

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-1977

Special notes for your reviewer:

Key Features

  • Performance: Async parallel API calls complete in ~2-3 seconds for 28 PRs + 20+ Jira tickets
  • Jira Integration: Automatic extraction and parallel hierarchy queries (Ticket → Epic → OCPSTRAT)
  • Markdown Links: All PRs and Jira tickets are clickable links
  • Claude Code Integration: New /weekly-pr-report [since-date] slash command
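The parallel-query pattern behind those timing numbers can be sketched with `asyncio.gather` (illustrative only; `fetch_ticket` is a hypothetical stand-in for a real Jira API call):

```python
import asyncio

# Illustrative stand-in for one Jira hierarchy query.
async def fetch_ticket(ticket: str) -> tuple:
    await asyncio.sleep(0.01)  # simulates the network round-trip
    return ticket, {"key": ticket}

async def fetch_all(tickets: list) -> dict:
    # gather() runs all queries concurrently, so total wall time is
    # roughly one round-trip instead of one per ticket
    results = await asyncio.gather(*(fetch_ticket(t) for t in tickets))
    return dict(results)

hierarchy = asyncio.run(fetch_all(["CNTRLPLANE-1977", "OCPSTRAT-123"]))
print(sorted(hierarchy))  # → ['CNTRLPLANE-1977', 'OCPSTRAT-123']
```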

Files Added

  • contrib/repo_metrics/weekly_pr_report.py - Report generator (576 lines)
  • .claude/commands/weekly-pr-report.md - Slash command documentation
  • contrib/repo_metrics/README.md - Updated with usage instructions

Example Output

Report includes:

  • OCPSTRAT-grouped PRs showing strategic alignment
  • Complete Jira hierarchy for each PR
  • Timing metrics (Draft→Ready, Ready→Merge)
  • Review activity analysis
  • All with clickable markdown links
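For illustration, an entry in such a report might look like this (hypothetical PR number and OCPSTRAT key; the script's exact layout may differ):

```markdown
## [OCPSTRAT-123](https://issues.redhat.com/browse/OCPSTRAT-123) - Example initiative

- PR [#9999](https://github.com/openshift/hypershift/pull/9999) - [CNTRLPLANE-1977](https://issues.redhat.com/browse/CNTRLPLANE-1977): Example change
  - Hierarchy: CNTRLPLANE-1977 → Epic → OCPSTRAT-123
  - Draft→Ready: 4.0h, Ready→Merge: 12.5h
  - Reviews: 2 approvals
```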

Testing Done

  • Generated report for 2025-11-13 to 2025-11-21 (28 PRs, 21 Jira tickets)
  • Verified all markdown links work correctly
  • Verified async performance (~2.7 seconds)
  • Tested both with and without Jira hierarchy cache

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 21, 2025
@openshift-ci-robot

openshift-ci-robot commented Nov 21, 2025

@celebdor: This pull request references CNTRLPLANE-1977 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 21, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Adds a new PR Report Generator: a Python script that collects GitHub PR data (GraphQL), optionally enriches with cached Jira hierarchy, produces a Markdown weekly report and raw JSON, plus documentation and a Claude command definition.

Changes

Cohort / File(s) Summary
Documentation & Command
contrib/repo_metrics/README.md, .claude/commands/pr-report.md
Adds README section and a Claude command spec describing the PR Report Generator, usage examples, parameters (since-date), output format, and functional workflow.
PR Report Generator Implementation
contrib/repo_metrics/weekly_pr_report.py
Adds new PRReportGenerator script: fetches contributors and PRs (GraphQL) with pagination, supports asyncio/aiohttp with requests fallback, processes PR metadata (reviewers, draft → ready, time-to-merge, labels, Jira tickets), loads optional /tmp/jira_hierarchy.json, generates detailed Markdown report and saves raw JSON. Exposes PRReportGenerator and main() entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Review async vs sync HTTP fallback and error handling (aiohttp usage, requests fallback warning).
  • Verify GraphQL pagination and field selection for PRs/contributors.
  • Inspect Jira hierarchy cache loading and mapping logic.
  • Validate timing calculations (draft→ready, ready→merge), reviewer/approver extraction, and Markdown formatting for report sections.

@openshift-ci openshift-ci bot added area/ai Indicates the PR includes changes related to AI - Claude agents, Cursor rules, etc. and removed do-not-merge/needs-area labels Nov 21, 2025
@celebdor celebdor marked this pull request as ready for review November 21, 2025 16:08
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2025
@openshift-ci openshift-ci bot requested review from enxebre and muraee November 21, 2025 16:08
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (7)
.claude/commands/update-ocpstrat-weekly-status.md (1)

43-48: Add languages to fenced code blocks to satisfy markdownlint

markdownlint (MD040) is flagging several fenced code blocks without a language (component list example, status template, and usage examples). Adding explicit languages will quiet the linter and improve rendering:

  • Component list / plain text templates: use ```text
  • CLI examples: use ```bash

Applies to the blocks starting around lines 43, 134, 205, 210, 215, 220, 225, 230, and 235.

Also applies to: 134-143, 205-237

.claude/commands/weekly-pr-report.md (1)

11-13: Specify a language for the usage code fence (MD040)

The usage block:

/weekly-pr-report [since-date]

is flagged by markdownlint (MD040) for missing a language. Consider marking it as bash (for consistency with other examples) or text:

```bash
/weekly-pr-report [since-date]
```
contrib/repo_metrics/weekly_pr_report.py (5)

`265-293`: **Jira hierarchy load relies on `/tmp/jira_hierarchy.json` without validation**

`fetch_jira_hierarchy` currently:

- Logs “Fetching Jira hierarchy…”, but the ticket loop is effectively a no‑op (`pass`).
- Then unconditionally tries to load `/tmp/jira_hierarchy.json`, falling back to an empty dict with a warning if the file is missing.

Two considerations:

1. The log message is a bit misleading: no fetch actually happens here; all enrichment depends on an external process populating `/tmp/jira_hierarchy.json`.
2. Reading a fixed, world‑writable `/tmp` path means a stale or malicious file could influence how PRs are grouped in the report (even if it’s “just” reporting logic).

If you want to harden this:

- Clarify the log message (e.g., “Loading Jira hierarchy cache from /tmp/jira_hierarchy.json …”).
- Optionally make the cache path configurable via env var or argument.
- Validate the JSON structure (e.g. ensure it’s a dict of ticket keys) before trusting it.

For now this is more about correctness/clarity than a critical security issue, but worth tightening.

---

`298-400`: **Minor report-generation cleanups for maintainability**

A few small issues in `generate_report` are flagged by Ruff and can be tidied up without behavior change:

- Lines like `f.write(f"# Weekly PR Report: openshift/hypershift\n")`, `f.write(f"- **openshift/release:** 0 PRs\n\n")`, and `f.write(f"- Ready: Created ready\n")` are f-strings without placeholders. These can be plain strings.
- In the ungrouped PRs section, `pr_type` is assigned but never used (line 355), and can be removed or actually surfaced in the output if you want to show “bug fix” vs “feature”.

Example adjustments:

```diff
-            f.write(f"# Weekly PR Report: openshift/hypershift\n")
+            f.write("# Weekly PR Report: openshift/hypershift\n")
...
-            f.write(f"- **openshift/release:** 0 PRs\n\n")
+            f.write("- **openshift/release:** 0 PRs\n\n")
...
-                pr_type = "bug fix" if any(label in ['bugfix', 'bug'] for label in pr['labels']) else "feature"
-                f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]}) - {self._linkify_jira_tickets(pr['title'])}\n")
+                pr_kind = "bug fix" if any(label in ['bugfix', 'bug'] for label in pr['labels']) else "feature"
+                f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]} {pr_kind}) - {self._linkify_jira_tickets(pr['title'])}\n")
...
-                    f.write(f"- Ready: Created ready\n")
+                    f.write("- Ready: Created ready\n")
```

None of this is urgent, but it will quiet lint warnings and make the code a bit cleaner.

Also applies to: 443-451


464-501: Tighten _generate_impact_statement and _extract_component (unused vars, style)

Two minor maintainability points:

  • In _generate_impact_statement, title and description are assigned but never used (title at line 466, description at 477). You can drop them or incorporate them into the heuristics.
  • _extract_component uses multiple if ...: return ... on single lines (lines 507–514), which Ruff flags (E701) and makes diffs less readable.

If you don’t plan to extend the logic soon, consider:

```diff
-        title = pr['title'].lower()
-        labels = [label.lower() for label in pr['labels']]
+        labels = [label.lower() for label in pr['labels']]
...
-                    description = jira_info.get('description', '')
+                    # description is available if you want to enrich impact text further
+                    # description = jira_info.get('description', '')
```

and in _extract_component:

```diff
-        # Common patterns
-        if 'aws' in title: return 'AWS platform support'
-        if 'azure' in title or 'aro' in title: return 'Azure platform support'
-        if 'gcp' in title: return 'GCP platform support'
-        if 'nodepool' in title: return 'NodePool management'
-        if 'hosted' in title or 'hcp' in title: return 'hosted control plane'
-        if 'cli' in title: return 'CLI tooling'
-        if 'operator' in title: return 'operator functionality'
-        if 'test' in title: return 'testing infrastructure'
+        # Common patterns
+        if 'aws' in title:
+            return 'AWS platform support'
+        if 'azure' in title or 'aro' in title:
+            return 'Azure platform support'
+        if 'gcp' in title:
+            return 'GCP platform support'
+        if 'nodepool' in title:
+            return 'NodePool management'
+        if 'hosted' in title or 'hcp' in title:
+            return 'hosted control plane'
+        if 'cli' in title:
+            return 'CLI tooling'
+        if 'operator' in title:
+            return 'operator functionality'
+        if 'test' in title:
+            return 'testing infrastructure'
```

These are purely cosmetic but will keep Ruff happy and make future edits easier.

Also applies to: 502-527


37-49: Consider narrowing the exception in _get_gh_token (optional)

_get_gh_token currently catches a broad Exception and exits:

```python
        except Exception as e:
            print(f"Error getting GitHub token: {e}")
            sys.exit(1)
```

For a CLI this is acceptable, but if you want more precise diagnostics, you could:

  • Catch subprocess.CalledProcessError and FileNotFoundError separately.
  • Include a hint when gh is not installed vs. when the user isn’t authenticated.

Not mandatory, but would improve debuggability when someone runs the tool on a fresh machine.


92-162:

Based on my verification, the concern in the original review comment is valid and well-founded. The GitHub GraphQL API requires supplying a first or last argument on any connection with values between 1 and 100, and both the REST API and GraphQL API cap the number of items returned from a search query at 1,000 results.

Critically, a real-world case shows that pagination through search results only yields 1,000 Pull Requests total despite the issueCount reporting 2,129 results—this occurred after 10 pages (10 × 100 results) due to GitHub's API limit on the search endpoint.

The code does not implement pagination and will silently drop results beyond the first 100 (or 1000 max for search). This is a legitimate concern for active repositories.


GraphQL search is limited to first 100 PRs; implement pagination or document the constraint

fetch_prs_graphql uses search(query: $searchQuery, type: ISSUE, first: 100) without pagination via pageInfo.hasNextPage and endCursor. GitHub's API limits search results to 1,000 total, so for repositories with frequent merges or when widening the date range, data will silently drop from the report beyond the first page.

Recommended fixes:

  • Add pagination loop over search.pageInfo.hasNextPage / endCursor, or
  • Explicitly document the weekly time window assumption and add a log warning when len(prs) == 100 to surface the limitation.
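A pagination loop of that shape could look like the following sketch, with the HTTP call injected as a callable so the paging logic stays testable (the query variables, field names, and warning threshold are assumptions, not the script's actual code):

```python
from typing import Callable, Dict, List

SEARCH_RESULT_CAP = 1000  # GitHub's search API returns at most 1,000 items

def fetch_all_prs(post: Callable[[dict], dict], search_query: str) -> List[Dict]:
    """Page through GraphQL search results via pageInfo.hasNextPage/endCursor.

    `post(variables)` performs one GraphQL request (e.g. requests.post with a
    timeout) and returns the parsed JSON response.
    """
    prs: List[Dict] = []
    cursor = None
    while True:
        data = post({"searchQuery": search_query, "cursor": cursor})
        search = data["data"]["search"]
        prs.extend(search["nodes"])
        page_info = search["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]
    # Surface the hard cap instead of silently dropping results
    if len(prs) >= SEARCH_RESULT_CAP:
        print("Warning: results may be truncated at GitHub's search cap")
    return prs
```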
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3c81d1b and 93a2ead.

📒 Files selected for processing (4)
  • .claude/commands/update-ocpstrat-weekly-status.md (1 hunks)
  • .claude/commands/weekly-pr-report.md (1 hunks)
  • contrib/repo_metrics/README.md (1 hunks)
  • contrib/repo_metrics/weekly_pr_report.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • contrib/repo_metrics/README.md
  • contrib/repo_metrics/weekly_pr_report.py
🪛 LanguageTool
.claude/commands/update-ocpstrat-weekly-status.md

[style] ~167-~167: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ... activity - Proposed status update Ask if they want to: - Proceed with your proposed update - ...

(IF_YOU_WANT)

🪛 markdownlint-cli2 (0.18.1)
.claude/commands/weekly-pr-report.md

11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

.claude/commands/update-ocpstrat-weekly-status.md

43-43: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


134-134: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


205-205: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


210-210: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


215-215: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


220-220: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


225-225: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


230-230: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


235-235: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.5)
contrib/repo_metrics/weekly_pr_report.py

41-41: Starting a process with a partial executable path

(S607)


47-47: Do not catch blind exception: Exception

(BLE001)


73-73: Probable use of requests call without timeout

(S113)


149-149: Probable use of requests call without timeout

(S113)


283-283: Do not catch blind exception: Exception

(BLE001)


288-288: Probable insecure usage of temporary file or directory: "/tmp/jira_hierarchy.json"

(S108)


300-300: f-string without any placeholders

Remove extraneous f prefix

(F541)


315-315: f-string without any placeholders

Remove extraneous f prefix

(F541)


355-355: Local variable pr_type is assigned to but never used

Remove assignment to unused variable pr_type

(F841)


449-449: f-string without any placeholders

Remove extraneous f prefix

(F541)


466-466: Local variable title is assigned to but never used

Remove assignment to unused variable title

(F841)


477-477: Local variable description is assigned to but never used

Remove assignment to unused variable description

(F841)


507-507: Multiple statements on one line (colon)

(E701)


508-508: Multiple statements on one line (colon)

(E701)


509-509: Multiple statements on one line (colon)

(E701)


510-510: Multiple statements on one line (colon)

(E701)


511-511: Multiple statements on one line (colon)

(E701)


512-512: Multiple statements on one line (colon)

(E701)


513-513: Multiple statements on one line (colon)

(E701)


514-514: Multiple statements on one line (colon)

(E701)


563-563: Probable insecure usage of temporary file or directory: "/tmp/weekly_pr_report_fast.md"

(S108)


564-564: Probable insecure usage of temporary file or directory: "/tmp/hypershift_pr_details_fast.json"

(S108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request

Comment on lines 51 to 98
```python
    async def fetch_repository_contributors(self, repo_owner: str, repo_name: str) -> Set[str]:
        """Fetch repository contributors using GitHub REST API"""
        headers = {
            "Authorization": f"Bearer {self.github_token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28"
        }

        contributors = set()
        page = 1
        per_page = 100

        while True:
            url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"

            if HAS_AIOHTTP:
                async with aiohttp.ClientSession() as session:
                    async with session.get(url, headers=headers) as response:
                        if response.status != 200:
                            break
                        data = await response.json()
            else:
                response = requests.get(url, headers=headers)
                if response.status_code != 200:
                    break
                data = response.json()
```

Contributor

⚠️ Potential issue | 🟠 Major

Add timeouts to synchronous requests calls to avoid indefinite hangs

In the synchronous fallback paths:

  • fetch_repository_contributors uses requests.get(...) without a timeout (line 73).
  • fetch_prs_graphql uses requests.post(...) without a timeout (line 149).

A slow or unresponsive GitHub endpoint will block the entire script indefinitely. Even though this is a CLI tool, adding modest timeouts improves robustness with minimal complexity.

Example tightening:

```diff
-            response = requests.get(url, headers=headers)
+            response = requests.get(url, headers=headers, timeout=10)
...
-            response = requests.post(
+            response = requests.post(
                 'https://api.github.com/graphql',
                 json={"query": query, "variables": variables},
-                headers=headers
+                headers=headers,
+                timeout=10,
             )
```

You can tune the timeout value as needed, but having one is better than none.

Also applies to: 135-155

🧰 Tools
🪛 Ruff (0.14.5)

73-73: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In contrib/repo_metrics/weekly_pr_report.py around lines 51-77 and 135-155, the
synchronous fallback calls to requests.get and requests.post lack timeouts and
can hang indefinitely; update these calls to include a modest timeout (e.g.,
timeout=10) and propagate/handle requests.exceptions.Timeout and
requests.exceptions.RequestException as appropriate (catch and break/return or
log and continue) so the CLI does not block on slow/unresponsive GitHub
endpoints.


Parse the command arguments to check for `--component` parameter:

1. **If `--component` is provided:**
Member

Just curious if you tried putting in some other flag not mentioned here and seeing if it pulls it into JQL query as well. I've found Claude will accept more than just the defined flags.

Collaborator Author

I have not

**IMPORTANT**: Only fetch the fields you need to save context:
- Use `fields=summary,issuelinks,comment,customfield_12320841` when getting issue details
- Use `expand=changelog` to get field update history
- Set `comment_limit=20` to limit comment history
Member

Should it pull the latest 20 comments? I haven't seen many Jira tickets with that many comments but they do exist. Just wondering if this will just pull the first 20 even if they are not the 20 most recent comments.

Collaborator Author

Good point. I'll change that

- **Efficiency**: Don't fetch all fields (`*all`) - only get what you need. Always use `expand=changelog` to get update history
- **User Input**: Accept user corrections and exact text for status updates
- **Skip Issues**: Allow user to skip issues when requested, especially those recently updated
- **Context Labels**: Note if issues have labels like `aro`, `rosa-hcp`, or `no_core_payload` that indicate non-core-payload work
Member

Why would the aro and rosa-hcp labels necessarily mean there is no CPO work involved?

arguments: <since-date>
---

# Weekly PR Report Generator
Member

Should this really be called a weekly PR report generator when you can actually pull further back than a week?

Collaborator Author

Good point. Renamed the command from `/weekly-pr-report` to `/pr-report` since it supports custom date ranges. Also improved the argument hints to show the expected format.


AI-assisted response via Claude Code

while True:
url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"

if HAS_AIOHTTP:
Member

From CC review -

1. Session Creation Anti-Pattern (Lines 66-70, 141-147)

  The code creates a new session for each paginated request:

  # INCORRECT - creates new session per page
  if HAS_AIOHTTP:
      async with aiohttp.ClientSession() as session:
          async with session.get(url, headers=headers) as response:
              ...

  Problem: This defeats connection pooling and wastes resources. Each request creates and tears down TCP connections.

  Fix: Create session once at the method level:

  async def fetch_repository_contributors(self, repo_owner: str, repo_name: str) -> Set[str]:
      """Fetch repository contributors using GitHub REST API"""
      headers = {...}
      contributors = set()
      page = 1
      per_page = 100

      if HAS_AIOHTTP:
          async with aiohttp.ClientSession() as session:
              while True:
                  url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"
                  async with session.get(url, headers=headers) as response:
                      if response.status != 200:
                          break
                      data = await response.json()
                      # ... process data
      else:
          while True:
              url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"
              response = requests.get(url, headers=headers)
              # ... sync logic

  Same issue in fetch_prs_graphql() (lines 141-147).

Collaborator Author

Done. Refactored to create the aiohttp.ClientSession once at the method level and reuse it for all paginated requests. This enables proper connection pooling.


AI-assisted response via Claude Code

self.prs = hypershift_prs + filtered_ai_helpers
print(f"Found {len(self.prs)} PRs ({len(hypershift_prs)} hypershift, {len(filtered_ai_helpers)} ai-helpers)")

async def fetch_jira_hierarchy(self):
Member

From CC review -

  2. Unused/Placeholder Jira Logic (Lines 265-292)

  async def fetch_jira_hierarchy(self):
      """Fetch Jira hierarchy for all tickets in parallel using MCP tool"""
      print("Fetching Jira hierarchy...")

      all_tickets = set()
      for pr in self.prs:
          all_tickets.update(pr['jiraTickets'])

      print(f"Found {len(all_tickets)} unique Jira tickets")

      # For now, use the MCP tool via subprocess
      # In a future optimization, we could call the Jira REST API directly
      for ticket in sorted(all_tickets):
          try:
              # This is a placeholder - in practice we'd use the MCP tool
              # For now, load from the existing cache if available
              pass
          except Exception as e:
              print(f"Error fetching {ticket}: {e}")

  Problem: This doesn't actually fetch anything. It just loads from cache if available. Misleading function name and docstring.

  Fix: Either implement the actual Jira fetching or rename to load_jira_hierarchy_cache() and update docstring.

Collaborator Author

Done. Renamed function to load_jira_hierarchy_cache() and updated docstring to accurately reflect that it loads from cache rather than fetching from Jira directly.


AI-assisted response via Claude Code

ready_to_merge_hours = (merged_at - ready_at).total_seconds() / 3600

# Extract Jira tickets
jira_tickets = re.findall(
Member

From CC review -

  4. Regex Jira Ticket Extraction Too Broad (Lines 201-204)

  jira_tickets = re.findall(
      r'(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+',
      pr['title'] + ' ' + (pr['body'] or '')
  )

  Problem:
  - Concatenating title + body without any delimiters could create false matches at boundary
  - No deduplication of tickets (fixed later with list(set(...)) but inefficient)

  Fix:
  text = f"{pr['title']}\n{pr['body'] or ''}"
  jira_tickets = list(set(re.findall(
      r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b',
      text
  )))

Collaborator Author

Done. Added word boundaries (\b) to the regex pattern and changed text concatenation to use a newline delimiter.


AI-assisted response via Claude Code

Member

From CC review -

  8. Magic Numbers

  Lines 61, 98, 111, 117, 365-367: Hardcoded limits (100, 50, 20) without constants.

  Fix:
  GITHUB_CONTRIBUTORS_PER_PAGE = 100
  GITHUB_PR_SEARCH_LIMIT = 100
  GITHUB_REVIEWS_LIMIT = 50
  GITHUB_LABELS_LIMIT = 20

Collaborator Author

Done. Extracted magic numbers into named constants at module level including DEFAULT_DAYS_AGO = 7 for the default report period.


AI-assisted response via Claude Code

import os
import re
import sys
from datetime import datetime, timezone
Member

timezone isn't used

Collaborator Author

Done. Removed the unused timezone import (now using timedelta for dynamic date calculation).


AI-assisted response via Claude Code

@openshift-ci
Contributor

openshift-ci bot commented Nov 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: celebdor
Once this PR has been reviewed and has the lgtm label, please assign jparrill for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
contrib/repo_metrics/weekly_pr_report.py (3)

355-356: Unused pr_type variable suggests incomplete implementation.

The pr_type is computed but never used. If the intent was to include it in the output, this should be completed; otherwise, remove the dead code.

-                    pr_type = "bug fix" if any(label in ['bugfix', 'bug'] for label in pr['labels']) else "feature"
-                    f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]}) - {self._linkify_jira_tickets(pr['title'])}\n")
+                    f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]}) - {self._linkify_jira_tickets(pr['title'])}\n")

464-500: Unused variables title and description in impact generation.

Lines 466 and 477 assign title and description but never use them. Either utilize these for richer impact statements or remove the dead assignments.

 def _generate_impact_statement(self, pr: Dict) -> str:
     """Generate an OCPSTRAT impact statement based on PR data and Jira info"""
-    title = pr['title'].lower()
     labels = [label.lower() for label in pr['labels']]
     
     # Try to use Jira ticket description for better context
     if pr['jiraTickets']:
         for ticket in pr['jiraTickets']:
             if ticket in self.jira_hierarchy:
                 jira_info = self.jira_hierarchy[ticket]
                 
                 # Use Jira summary and description for richer impact statement
                 summary = jira_info.get('summary', '')
-                description = jira_info.get('description', '')

570-576: Consider simplifying asyncio entry point.

The asyncio.get_event_loop() pattern at lines 575-576 is deprecated in Python 3.10+. Since modern Python versions are likely in use, you could simplify to always use asyncio.run(), or add a version check if 3.6 compatibility is truly required.

 if __name__ == '__main__':
-    if HAS_AIOHTTP:
-        asyncio.run(main())
-    else:
-        # Python 3.6 compatibility
-        loop = asyncio.get_event_loop()
-        loop.run_until_complete(main())
+    asyncio.run(main())
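A minimal sketch of the simplified entry point (the coroutine body here is a placeholder, not the script's actual main()):

```python
import asyncio

async def main():
    # Stand-in for the report pipeline; a real run would await API calls here.
    await asyncio.sleep(0)
    return "report done"

if __name__ == "__main__":
    # asyncio.run() creates, runs, and closes the event loop itself
    # (available since Python 3.7), replacing the deprecated
    # get_event_loop()/run_until_complete() pairing.
    print(asyncio.run(main()))
```

Since Python 3.6 reached end of life in 2021, dropping the compatibility branch is a reasonable simplification.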
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 93a2ead and ac4fb79.

📒 Files selected for processing (3)
  • .claude/commands/weekly-pr-report.md (1 hunks)
  • contrib/repo_metrics/README.md (1 hunks)
  • contrib/repo_metrics/weekly_pr_report.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • contrib/repo_metrics/README.md
  • contrib/repo_metrics/weekly_pr_report.py
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/weekly-pr-report.md

11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.6)
contrib/repo_metrics/weekly_pr_report.py

41-41: Starting a process with a partial executable path

(S607)


47-47: Do not catch blind exception: Exception

(BLE001)


73-73: Probable use of requests call without timeout

(S113)


149-149: Probable use of requests call without timeout

(S113)


283-283: Do not catch blind exception: Exception

(BLE001)


288-288: Probable insecure usage of temporary file or directory: "/tmp/jira_hierarchy.json"

(S108)


300-300: f-string without any placeholders

Remove extraneous f prefix

(F541)


315-315: f-string without any placeholders

Remove extraneous f prefix

(F541)


355-355: Local variable pr_type is assigned to but never used

Remove assignment to unused variable pr_type

(F841)


449-449: f-string without any placeholders

Remove extraneous f prefix

(F541)


466-466: Local variable title is assigned to but never used

Remove assignment to unused variable title

(F841)


477-477: Local variable description is assigned to but never used

Remove assignment to unused variable description

(F841)


507-507: Multiple statements on one line (colon)

(E701)


508-508: Multiple statements on one line (colon)

(E701)


509-509: Multiple statements on one line (colon)

(E701)


510-510: Multiple statements on one line (colon)

(E701)


511-511: Multiple statements on one line (colon)

(E701)


512-512: Multiple statements on one line (colon)

(E701)


513-513: Multiple statements on one line (colon)

(E701)


514-514: Multiple statements on one line (colon)

(E701)


563-563: Probable insecure usage of temporary file or directory: "/tmp/weekly_pr_report_fast.md"

(S108)


564-564: Probable insecure usage of temporary file or directory: "/tmp/hypershift_pr_details_fast.json"

(S108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (2)
contrib/repo_metrics/README.md (1)

19-78: Documentation looks good.

The README clearly documents the new Weekly PR Report Generator with appropriate usage examples, output format, and performance notes.

.claude/commands/weekly-pr-report.md (1)

1-214: Comprehensive command documentation.

The slash command documentation is thorough, covering usage, data sources, expected output format, and implementation steps. The workflow clearly explains the two-phase approach (Python script + Jira enrichment).

Add comprehensive weekly PR report generation for HyperShift project
with automated Jira hierarchy enrichment and markdown link generation.

Features:
- Fast async PR fetching using GitHub GraphQL API
- Automatic Jira ticket extraction from PR titles and bodies
- Complete Jira hierarchy (Ticket → Epic → OCPSTRAT) with parallel queries
- PR grouping by OCPSTRAT strategic initiatives
- Timing metrics (draft→ready, ready→merge) and review activity analysis
- Clickable markdown links for all PRs and Jira tickets
- Claude Code slash command: /weekly-pr-report [since-date]

Files added:
- contrib/repo_metrics/weekly_pr_report.py: Report generator with async API calls
- .claude/commands/weekly-pr-report.md: Slash command for Claude Code
- contrib/repo_metrics/README.md: Documentation updated

Signed-off-by: Antoni Segura Puimedon <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
contrib/repo_metrics/weekly_pr_report.py (1)

92-109: Add timeouts and basic error handling to synchronous GitHub HTTP calls

The requests.get/requests.post calls in the sync fallback still lack timeouts and can hang the whole CLI on a slow/unresponsive GitHub endpoint; earlier review already called this out. Consider adding modest timeouts and handling RequestException to fail fast and log the issue.

@@
-        else:
-            while True:
-                url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={GITHUB_CONTRIBUTORS_PER_PAGE}&page={page}"
-                response = requests.get(url, headers=headers)
-                if response.status_code != 200:
-                    break
-                data = response.json()
+        else:
+            while True:
+                url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={GITHUB_CONTRIBUTORS_PER_PAGE}&page={page}"
+                try:
+                    response = requests.get(url, headers=headers, timeout=10)
+                except requests.exceptions.RequestException as e:
+                    print(f"Error fetching contributors page {page}: {e}")
+                    break
+                if response.status_code != 200:
+                    break
+                data = response.json()
@@
-        else:
-            while True:
-                variables = {"searchQuery": search_query, "cursor": cursor}
-                response = requests.post(
-                    'https://api.github.com/graphql',
-                    json={"query": query, "variables": variables},
-                    headers=headers
-                )
-                data = response.json()
+        else:
+            while True:
+                variables = {"searchQuery": search_query, "cursor": cursor}
+                try:
+                    response = requests.post(
+                        'https://api.github.com/graphql',
+                        json={"query": query, "variables": variables},
+                        headers=headers,
+                        timeout=10,
+                    )
+                    data = response.json()
+                except requests.exceptions.RequestException as e:
+                    print(f"Error fetching PRs page (cursor={cursor}): {e}")
+                    break

If you prefer a different timeout value, keeping any finite value is better than none.

Also applies to: 186-193
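The same fail-fast principle can be sketched with the standard library's urllib instead of requests (the helper name and timeout value are illustrative):

```python
from urllib.error import URLError
from urllib.request import urlopen

def fetch_with_timeout(url, timeout=10):
    """Return the response body, or None on network failure or timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (URLError, ValueError, OSError) as e:
        # Fail fast and report the problem instead of hanging the whole CLI.
        print(f"fetch failed: {e}")
        return None
```

Without a finite timeout, a stalled connection blocks the caller indefinitely; with one, the worst case is bounded and the error is surfaced to the user.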

🧹 Nitpick comments (4)
contrib/repo_metrics/weekly_pr_report.py (2)

245-250: Unify Jira ticket regex and reuse a shared pattern constant

_process_pr_data extracts tickets using a word‑bounded pattern, while _linkify_jira_tickets uses a slightly different regex without word boundaries. This can cause tickets to be linkified that weren’t captured (or vice versa) and makes maintenance harder.

@@
-# API pagination and limit constants
+# API pagination and limit constants
 GITHUB_CONTRIBUTORS_PER_PAGE = 100
 GITHUB_PR_SEARCH_LIMIT = 100
 GITHUB_REVIEWS_LIMIT = 50
 GITHUB_LABELS_LIMIT = 20
 GITHUB_TIMELINE_ITEMS_LIMIT = 100
+
+# Jira ticket ID pattern
+JIRA_TICKET_PATTERN = r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b'
@@
-        jira_tickets = list(set(re.findall(
-            r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b',
-            text
-        )))
+        jira_tickets = list(set(re.findall(JIRA_TICKET_PATTERN, text)))
@@
-        return re.sub(
-            r'(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+',
-            replace_ticket,
-            text
-        )
+        return re.sub(JIRA_TICKET_PATTERN, replace_ticket, text)

This keeps extraction and linkification in sync and avoids future regex drift.

Also applies to: 567-577
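A sketch of the unified approach, with both helpers driven by one pattern constant (helper names are illustrative; the Jira base URL is the one used for ticket links elsewhere in this PR):

```python
import re

# Single source of truth: extraction and linkification cannot drift apart.
JIRA_TICKET_PATTERN = r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b'
JIRA_BASE_URL = "https://issues.redhat.com/browse"

def extract_tickets(text):
    return sorted(set(re.findall(JIRA_TICKET_PATTERN, text)))

def linkify_tickets(text):
    # Every ticket the extractor would find is turned into a markdown link.
    return re.sub(
        JIRA_TICKET_PATTERN,
        lambda m: f"[{m.group(0)}]({JIRA_BASE_URL}/{m.group(0)})",
        text,
    )
```

Because both functions compile the same pattern, any future change (a new project key, say) lands in one place.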


503-566: Tighten impact/component helpers (remove unused locals, expand one-line if chain)

_generate_impact_statement assigns title and description but never uses them, and _extract_component uses multiple one‑line if statements, both of which Ruff flags and slightly hurt readability.

Concrete changes:

  • In _generate_impact_statement, drop the unused locals:
-        title = pr['title'].lower()
-        labels = [label.lower() for label in pr['labels']]
+        labels = [label.lower() for label in pr['labels']]
@@
-                    summary = jira_info.get('summary', '')
-                    description = jira_info.get('description', '')
+                    summary = jira_info.get('summary', '')
  • In _extract_component, expand the one‑line if/return chain into a conventional multi‑line chain for clarity and to satisfy E701, for example:
-        # Common patterns
-        if 'aws' in title: return 'AWS platform support'
-        if 'azure' in title or 'aro' in title: return 'Azure platform support'
-        if 'gcp' in title: return 'GCP platform support'
-        if 'nodepool' in title: return 'NodePool management'
-        if 'hosted' in title or 'hcp' in title: return 'hosted control plane'
-        if 'cli' in title: return 'CLI tooling'
-        if 'operator' in title: return 'operator functionality'
-        if 'test' in title: return 'testing infrastructure'
-
-        return 'core functionality'
+        # Common patterns
+        if 'aws' in title:
+            return 'AWS platform support'
+        if 'azure' in title or 'aro' in title:
+            return 'Azure platform support'
+        if 'gcp' in title:
+            return 'GCP platform support'
+        if 'nodepool' in title:
+            return 'NodePool management'
+        if 'hosted' in title or 'hcp' in title:
+            return 'hosted control plane'
+        if 'cli' in title:
+            return 'CLI tooling'
+        if 'operator' in title:
+            return 'operator functionality'
+        if 'test' in title:
+            return 'testing infrastructure'
+
+        return 'core functionality'

These are small cleanups but keep the helpers easier to extend.

.claude/commands/pr-report.md (2)

11-13: Specify a language for the /pr-report fenced code block

markdownlint is flagging the bare triple‑backtick block; adding a language (e.g., bash or text) will satisfy MD040 and improve rendering.

-```
-/pr-report [since-date]
-```
+```bash
+/pr-report [since-date]
+```

55-115: Align “Expected Output / Key Requirements” with what the Python script currently emits

The doc promises sections like auto‑classified “Topics”, “Top Contributors”, and “Cross‑Repository Contributions” in the metrics, but weekly_pr_report.py currently focuses on timing metrics, OCPSTRAT groupings, review activity, and busiest merge days without those extra sections.

To avoid confusing users of /pr-report, consider either:

  • Extending weekly_pr_report.py to add the missing metrics/sections (e.g., author‑based contributor stats, cross‑repo contributors, explicit topic classification), or
  • Relaxing the examples/requirements here to match the current output structure and noting any future enhancements separately.

This keeps the command docs as an accurate contract for what the tool produces today.

Also applies to: 117-127

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a92aa7f and c22dc60.

📒 Files selected for processing (3)
  • .claude/commands/pr-report.md (1 hunks)
  • contrib/repo_metrics/README.md (1 hunks)
  • contrib/repo_metrics/weekly_pr_report.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • contrib/repo_metrics/README.md
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • contrib/repo_metrics/weekly_pr_report.py
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/pr-report.md

11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.6)
contrib/repo_metrics/weekly_pr_report.py

51-51: Starting a process with a partial executable path

(S607)


57-57: Do not catch blind exception: Exception

(BLE001)


94-94: Probable use of requests call without timeout

(S113)


188-188: Probable use of requests call without timeout

(S113)


326-326: Probable insecure usage of temporary file or directory: "/tmp/jira_hierarchy.json"

(S108)


340-340: f-string without any placeholders

Remove extraneous f prefix

(F541)


355-355: f-string without any placeholders

Remove extraneous f prefix

(F541)


488-488: f-string without any placeholders

Remove extraneous f prefix

(F541)


505-505: Local variable title is assigned to but never used

Remove assignment to unused variable title

(F841)


516-516: Local variable description is assigned to but never used

Remove assignment to unused variable description

(F841)


546-546: Multiple statements on one line (colon)

(E701)


547-547: Multiple statements on one line (colon)

(E701)


548-548: Multiple statements on one line (colon)

(E701)


549-549: Multiple statements on one line (colon)

(E701)


550-550: Multiple statements on one line (colon)

(E701)


551-551: Multiple statements on one line (colon)

(E701)


552-552: Multiple statements on one line (colon)

(E701)


553-553: Multiple statements on one line (colon)

(E701)


605-605: Probable insecure usage of temporary file or directory: "/tmp/weekly_pr_report_fast.md"

(S108)


606-606: Probable insecure usage of temporary file or directory: "/tmp/hypershift_pr_details_fast.json"

(S108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
🔇 Additional comments (1)
contrib/repo_metrics/weekly_pr_report.py (1)

112-208: Async GraphQL pagination and PR aggregation look solid

The use of cursor‑based pagination in fetch_prs_graphql, async session reuse, and the parallel fetching/aggregation in fetch_all_prs and main() is well‑structured and should scale cleanly for the intended weekly windows.

Also applies to: 274-310, 586-609

@openshift-ci
Contributor

openshift-ci bot commented Nov 27, 2025

@celebdor: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • area/ai: Indicates the PR includes changes related to AI - Claude agents, Cursor rules, etc.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

3 participants