CNTRLPLANE-1977: Add weekly PR report generation capability #7272
Conversation
@celebdor: This pull request references CNTRLPLANE-1977 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Skipping CI for Draft Pull Request.
Walkthrough: Adds a new PR Report Generator: a Python script that collects GitHub PR data (GraphQL), optionally enriches it with a cached Jira hierarchy, and produces a Markdown weekly report and raw JSON, plus documentation and a Claude command definition.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Actionable comments posted: 3
🧹 Nitpick comments (7)
.claude/commands/update-ocpstrat-weekly-status.md (1)
43-48: Add languages to fenced code blocks to satisfy markdownlint

markdownlint (MD040) is flagging several fenced code blocks without a language (component list example, status template, and usage examples). Adding explicit languages will quiet the linter and improve rendering:
- Component list / plain text templates: use `text`
- CLI examples: use `bash`

Applies to the blocks starting around lines 43, 134, 205, 210, 215, 220, 225, 230, and 235.
Also applies to: 134-143, 205-237
.claude/commands/weekly-pr-report.md (1)
11-13: Specify a language for the usage code fence (MD040)

The usage block:

```
/weekly-pr-report [since-date]
```

is flagged by markdownlint (MD040) for missing a language. Consider marking it as bash (for consistency with other examples) or text:

```bash
/weekly-pr-report [since-date]
```

contrib/repo_metrics/weekly_pr_report.py (5)

265-293: Jira hierarchy load relies on `/tmp/jira_hierarchy.json` without validation

`fetch_jira_hierarchy` currently:

- Logs "Fetching Jira hierarchy…", but the ticket loop is effectively a no-op (`pass`).
- Then unconditionally tries to load `/tmp/jira_hierarchy.json`, falling back to an empty dict with a warning if the file is missing.

Two considerations:

1. The log message is a bit misleading: no fetch actually happens here; all enrichment depends on an external process populating `/tmp/jira_hierarchy.json`.
2. Reading a fixed, world-writable `/tmp` path means a stale or malicious file could influence how PRs are grouped in the report (even if it's "just" reporting logic).

If you want to harden this:

- Clarify the log message (e.g., "Loading Jira hierarchy cache from /tmp/jira_hierarchy.json …").
- Optionally make the cache path configurable via env var or argument.
- Validate the JSON structure (e.g. ensure it's a dict of ticket keys) before trusting it.

For now this is more about correctness/clarity than a critical security issue, but worth tightening.

298-400: Minor report-generation cleanups for maintainability

A few small issues in `generate_report` are flagged by Ruff and can be tidied up without behavior change:

- Lines like `f.write(f"# Weekly PR Report: openshift/hypershift\n")`, `f.write(f"- **openshift/release:** 0 PRs\n\n")`, and `f.write(f"- Ready: Created ready\n")` are f-strings without placeholders. These can be plain strings.
- In the ungrouped PRs section, `pr_type` is assigned but never used (line 355), and can be removed or actually surfaced in the output if you want to show "bug fix" vs "feature".

Example adjustments:

```diff
- f.write(f"# Weekly PR Report: openshift/hypershift\n")
+ f.write("# Weekly PR Report: openshift/hypershift\n")
...
- f.write(f"- **openshift/release:** 0 PRs\n\n")
+ f.write("- **openshift/release:** 0 PRs\n\n")
...
- pr_type = "bug fix" if any(label in ['bugfix', 'bug'] for label in pr['labels']) else "feature"
- f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]}) - {self._linkify_jira_tickets(pr['title'])}\n")
+ pr_kind = "bug fix" if any(label in ['bugfix', 'bug'] for label in pr['labels']) else "feature"
+ f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]} {pr_kind}) - {self._linkify_jira_tickets(pr['title'])}\n")
...
- f.write(f"- Ready: Created ready\n")
+ f.write("- Ready: Created ready\n")
```

None of this is urgent, but it will quiet lint warnings and make the code a bit cleaner.
Also applies to: 443-451
464-501: Tighten `_generate_impact_statement` and `_extract_component` (unused vars, style)

Two minor maintainability points:

- In `_generate_impact_statement`, `title` and `description` are assigned but never used (`title` at line 466, `description` at 477). You can drop them or incorporate them into the heuristics.
- `_extract_component` uses multiple `if ...: return ...` on single lines (lines 507-514), which Ruff flags (E701) and makes diffs less readable.

If you don't plan to extend the logic soon, consider:

```diff
- title = pr['title'].lower()
- labels = [label.lower() for label in pr['labels']]
+ labels = [label.lower() for label in pr['labels']]
...
- description = jira_info.get('description', '')
+ # description is available if you want to enrich impact text further
+ # description = jira_info.get('description', '')
```

and in `_extract_component`:

```diff
- # Common patterns
- if 'aws' in title: return 'AWS platform support'
- if 'azure' in title or 'aro' in title: return 'Azure platform support'
- if 'gcp' in title: return 'GCP platform support'
- if 'nodepool' in title: return 'NodePool management'
- if 'hosted' in title or 'hcp' in title: return 'hosted control plane'
- if 'cli' in title: return 'CLI tooling'
- if 'operator' in title: return 'operator functionality'
- if 'test' in title: return 'testing infrastructure'
+ # Common patterns
+ if 'aws' in title:
+     return 'AWS platform support'
+ if 'azure' in title or 'aro' in title:
+     return 'Azure platform support'
+ if 'gcp' in title:
+     return 'GCP platform support'
+ if 'nodepool' in title:
+     return 'NodePool management'
+ if 'hosted' in title or 'hcp' in title:
+     return 'hosted control plane'
+ if 'cli' in title:
+     return 'CLI tooling'
+ if 'operator' in title:
+     return 'operator functionality'
+ if 'test' in title:
+     return 'testing infrastructure'
```

These are purely cosmetic but will keep Ruff happy and make future edits easier.
Also applies to: 502-527
37-49: Consider narrowing the exception in `_get_gh_token` (optional)

`_get_gh_token` currently catches a broad `Exception` and exits:

```python
except Exception as e:
    print(f"Error getting GitHub token: {e}")
    sys.exit(1)
```

For a CLI this is acceptable, but if you want more precise diagnostics, you could:

- Catch `subprocess.CalledProcessError` and `FileNotFoundError` separately.
- Include a hint when `gh` is not installed vs. when the user isn't authenticated.

Not mandatory, but would improve debuggability when someone runs the tool on a fresh machine.
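A minimal sketch of that narrower handling (assuming the existing helper shells out to `gh auth token` via `subprocess.run`; the exact command is an assumption, not confirmed by this PR):

```python
import subprocess
import sys

def _get_gh_token() -> str:
    # Assumes the token comes from `gh auth token`, as the current helper appears to do.
    try:
        result = subprocess.run(
            ["gh", "auth", "token"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except FileNotFoundError:
        # The gh binary is missing entirely
        print("Error: `gh` CLI not found; install it from https://cli.github.com/")
        sys.exit(1)
    except subprocess.CalledProcessError as e:
        # gh is present but the command failed, most likely not authenticated
        print(f"Error: `gh` failed (try `gh auth login`): {e.stderr.strip()}")
        sys.exit(1)
```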
92-162: I'll fix the script syntax: the `gh search prs` command doesn't use `--search`; it builds the query from individual flags like `--merged-at` for date filtering. Now let me examine the actual code file to understand the context better:
Based on my verification, the concern in the original review comment is valid and well-founded. The GitHub GraphQL API requires supplying a first or last argument on any connection with values between 1 and 100, and both the REST API and GraphQL API cap the number of items returned from a search query at 1,000 results.
Critically, a real-world case shows that pagination through search results only yields 1,000 Pull Requests total despite the issueCount reporting 2,129 results—this occurred after 10 pages (10 × 100 results) due to GitHub's API limit on the search endpoint.
The code does not implement pagination and will silently drop results beyond the first 100 (or 1000 max for search). This is a legitimate concern for active repositories.
GraphQL search is limited to first 100 PRs; implement pagination or document the constraint
`fetch_prs_graphql` uses `search(query: $searchQuery, type: ISSUE, first: 100)` without pagination via `pageInfo.hasNextPage` and `endCursor`. GitHub's API limits search results to 1,000 total, so for repositories with frequent merges or when widening the date range, data will silently drop from the report beyond the first page.

Recommended fixes:

- Add a pagination loop over `search.pageInfo.hasNextPage` / `endCursor` (see the sketch below), or
- Explicitly document the weekly time window assumption and add a log warning when `len(prs) == 100` to surface the limitation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (4)
- `.claude/commands/update-ocpstrat-weekly-status.md` (1 hunks)
- `.claude/commands/weekly-pr-report.md` (1 hunks)
- `contrib/repo_metrics/README.md` (1 hunks)
- `contrib/repo_metrics/weekly_pr_report.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- `contrib/repo_metrics/README.md`
- `contrib/repo_metrics/weekly_pr_report.py`
🪛 LanguageTool
.claude/commands/update-ocpstrat-weekly-status.md
[style] ~167-~167: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ... activity - Proposed status update Ask if they want to: - Proceed with your proposed update - ...
(IF_YOU_WANT)
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/weekly-pr-report.md
11-11: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
.claude/commands/update-ocpstrat-weekly-status.md
43-43: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
134-134: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
205-205: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
210-210: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
215-215: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
220-220: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
225-225: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
230-230: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
235-235: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.5)
contrib/repo_metrics/weekly_pr_report.py
41-41: Starting a process with a partial executable path
(S607)
47-47: Do not catch blind exception: Exception
(BLE001)
73-73: Probable use of requests call without timeout
(S113)
149-149: Probable use of requests call without timeout
(S113)
283-283: Do not catch blind exception: Exception
(BLE001)
288-288: Probable insecure usage of temporary file or directory: "/tmp/jira_hierarchy.json"
(S108)
300-300: f-string without any placeholders
Remove extraneous f prefix
(F541)
315-315: f-string without any placeholders
Remove extraneous f prefix
(F541)
355-355: Local variable pr_type is assigned to but never used
Remove assignment to unused variable pr_type
(F841)
449-449: f-string without any placeholders
Remove extraneous f prefix
(F541)
466-466: Local variable title is assigned to but never used
Remove assignment to unused variable title
(F841)
477-477: Local variable description is assigned to but never used
Remove assignment to unused variable description
(F841)
507-507: Multiple statements on one line (colon)
(E701)
508-508: Multiple statements on one line (colon)
(E701)
509-509: Multiple statements on one line (colon)
(E701)
510-510: Multiple statements on one line (colon)
(E701)
511-511: Multiple statements on one line (colon)
(E701)
512-512: Multiple statements on one line (colon)
(E701)
513-513: Multiple statements on one line (colon)
(E701)
514-514: Multiple statements on one line (colon)
(E701)
563-563: Probable insecure usage of temporary file or directory: "/tmp/weekly_pr_report_fast.md"
(S108)
564-564: Probable insecure usage of temporary file or directory: "/tmp/hypershift_pr_details_fast.json"
(S108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
```python
async def fetch_repository_contributors(self, repo_owner: str, repo_name: str) -> Set[str]:
    """Fetch repository contributors using GitHub REST API"""
    headers = {
        "Authorization": f"Bearer {self.github_token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28"
    }

    contributors = set()
    page = 1
    per_page = 100

    while True:
        url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"

        if HAS_AIOHTTP:
            async with aiohttp.ClientSession() as session:
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        break
                    data = await response.json()
        else:
            response = requests.get(url, headers=headers)
            if response.status_code != 200:
                break
            data = response.json()
```
Add timeouts to synchronous requests calls to avoid indefinite hangs
In the synchronous fallback paths:
- `fetch_repository_contributors` uses `requests.get(...)` without a timeout (line 73).
- `fetch_prs_graphql` uses `requests.post(...)` without a timeout (line 149).
A slow or unresponsive GitHub endpoint will block the entire script indefinitely. Even though this is a CLI tool, adding modest timeouts improves robustness with minimal complexity.
Example tightening:
```diff
- response = requests.get(url, headers=headers)
+ response = requests.get(url, headers=headers, timeout=10)
...
- response = requests.post(
+ response = requests.post(
      'https://api.github.com/graphql',
      json={"query": query, "variables": variables},
-     headers=headers
+     headers=headers,
+     timeout=10,
  )
```

You can tune the timeout value as needed, but having one is better than none.
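Going one step further, a shared session with retries would also smooth over transient GitHub errors; a sketch using the standard requests/urllib3 stack (the constant name and retry policy are assumptions, not part of this PR):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

HTTP_TIMEOUT = 10  # seconds; shared by all sync GitHub calls

def make_session() -> requests.Session:
    session = requests.Session()
    # Retry transient failures and rate limits with exponential backoff
    retry = Retry(total=3, backoff_factor=1,
                  status_forcelist=[429, 500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

# usage: session.get(url, headers=headers, timeout=HTTP_TIMEOUT)
```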
Also applies to: 135-155
🧰 Tools
🪛 Ruff (0.14.5)
73-73: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
In contrib/repo_metrics/weekly_pr_report.py around lines 51-77 and 135-155, the
synchronous fallback calls to requests.get and requests.post lack timeouts and
can hang indefinitely; update these calls to include a modest timeout (e.g.,
timeout=10) and propagate/handle requests.exceptions.Timeout and
requests.exceptions.RequestException as appropriate (catch and break/return or
log and continue) so the CLI does not block on slow/unresponsive GitHub
endpoints.
Parse the command arguments to check for `--component` parameter:

1. **If `--component` is provided:**
Just curious if you tried putting in some other flag not mentioned here and seeing if it pulls it into JQL query as well. I've found Claude will accept more than just the defined flags.
I have not
**IMPORTANT**: Only fetch the fields you need to save context:

- Use `fields=summary,issuelinks,comment,customfield_12320841` when getting issue details
- Use `expand=changelog` to get field update history
- Set `comment_limit=20` to limit comment history
Should it pull the latest 20 comments? I haven't seen many Jira tickets with that many comments but they do exist. Just wondering if this will just pull the first 20 even if they are not the 20 most recent comments.
Good point. I'll change that
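A hypothetical sketch of requesting the most recent comments explicitly (Jira's REST comment endpoint documents an `orderBy` parameter, but support varies by Jira version, so verify against issues.redhat.com; the helper name and auth scheme here are illustrative):

```python
import requests

def fetch_latest_comments(base_url: str, issue_key: str, token: str, limit: int = 20) -> list:
    # Ask Jira for comments newest-first, capped at `limit`
    resp = requests.get(
        f"{base_url}/rest/api/2/issue/{issue_key}/comment",
        params={"orderBy": "-created", "maxResults": limit},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("comments", [])
```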
- **Efficiency**: Don't fetch all fields (`*all`) - only get what you need. Always use `expand=changelog` to get update history
- **User Input**: Accept user corrections and exact text for status updates
- **Skip Issues**: Allow user to skip issues when requested, especially those recently updated
- **Context Labels**: Note if issues have labels like `aro`, `rosa-hcp`, or `no_core_payload` that indicate non-core-payload work
Why would the aro and rosa-hcp labels necessarily mean there is no CPO work involved?
.claude/commands/weekly-pr-report.md
Outdated
```text
arguments: <since-date>
---

# Weekly PR Report Generator
```
Should this really be called a weekly PR report generator when you can actually pull further back than a week?
Good point. Renamed the command from `/weekly-pr-report` to `/pr-report` since it supports custom date ranges. Also improved the argument hints to show the expected format.
AI-assisted response via Claude Code
```python
while True:
    url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"

    if HAS_AIOHTTP:
```
From CC review -
1. Session Creation Anti-Pattern (Lines 66-70, 141-147)
The code creates a new session for each paginated request:
```python
# INCORRECT - creates new session per page
if HAS_AIOHTTP:
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as response:
            ...
```
Problem: This defeats connection pooling and wastes resources. Each request creates and tears down TCP connections.
Fix: Create session once at the method level:
```python
async def fetch_repository_contributors(self, repo_owner: str, repo_name: str) -> Set[str]:
    """Fetch repository contributors using GitHub REST API"""
    headers = {...}
    contributors = set()
    page = 1
    per_page = 100

    if HAS_AIOHTTP:
        async with aiohttp.ClientSession() as session:
            while True:
                url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        break
                    data = await response.json()
                # ... process data
    else:
        while True:
            url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={per_page}&page={page}"
            response = requests.get(url, headers=headers)
            # ... sync logic
```
Same issue in `fetch_prs_graphql()` (lines 141-147).
Done. Refactored to create the aiohttp.ClientSession once at the method level and reuse it for all paginated requests. This enables proper connection pooling.
AI-assisted response via Claude Code
```python
self.prs = hypershift_prs + filtered_ai_helpers
print(f"Found {len(self.prs)} PRs ({len(hypershift_prs)} hypershift, {len(filtered_ai_helpers)} ai-helpers)")

async def fetch_jira_hierarchy(self):
```
From CC review -
2. Unused/Placeholder Jira Logic (Lines 265-292)
```python
async def fetch_jira_hierarchy(self):
    """Fetch Jira hierarchy for all tickets in parallel using MCP tool"""
    print("Fetching Jira hierarchy...")
    all_tickets = set()
    for pr in self.prs:
        all_tickets.update(pr['jiraTickets'])
    print(f"Found {len(all_tickets)} unique Jira tickets")

    # For now, use the MCP tool via subprocess
    # In a future optimization, we could call the Jira REST API directly
    for ticket in sorted(all_tickets):
        try:
            # This is a placeholder - in practice we'd use the MCP tool
            # For now, load from the existing cache if available
            pass
        except Exception as e:
            print(f"Error fetching {ticket}: {e}")
```
Problem: This doesn't actually fetch anything. It just loads from cache if available. Misleading function name and docstring.
Fix: Either implement the actual Jira fetching or rename to `load_jira_hierarchy_cache()` and update the docstring.
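A minimal sketch of the renamed helper under that suggestion (assuming the same `/tmp/jira_hierarchy.json` cache path the script already reads; the validation step is an extra assumption):

```python
import json
from pathlib import Path

def load_jira_hierarchy_cache(cache_path: str = "/tmp/jira_hierarchy.json") -> dict:
    """Load a pre-populated Jira hierarchy cache; performs no fetching."""
    path = Path(cache_path)
    if not path.exists():
        print(f"Warning: {cache_path} not found; PRs will be reported ungrouped")
        return {}
    data = json.loads(path.read_text())
    # Only trust a dict keyed by ticket IDs
    return data if isinstance(data, dict) else {}
```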
Done. Renamed function to load_jira_hierarchy_cache() and updated docstring to accurately reflect that it loads from cache rather than fetching from Jira directly.
AI-assisted response via Claude Code
```python
ready_to_merge_hours = (merged_at - ready_at).total_seconds() / 3600

# Extract Jira tickets
jira_tickets = re.findall(
```
From CC review -
4. Regex Jira Ticket Extraction Too Broad (Lines 201-204)
```python
jira_tickets = re.findall(
    r'(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+',
    pr['title'] + ' ' + (pr['body'] or '')
)
```
Problem:
- Concatenating title + body without any delimiters could create false matches at boundary
- No deduplication of tickets (fixed later with list(set(...)) but inefficient)
Fix:

```python
text = f"{pr['title']}\n{pr['body'] or ''}"
jira_tickets = list(set(re.findall(
    r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b',
    text
)))
```
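A quick illustrative check of what the word boundaries buy (the sample strings are made up):

```python
import re

pattern = r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b'
sample = "XCNTRLPLANE-123 and CNTRLPLANE-456"

# Without \b, the first token would yield a false match on its tail;
# with \b, only the standalone ticket ID is captured.
print(re.findall(pattern, sample))  # ['CNTRLPLANE-456']
```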
Done. Added word boundaries (\b) to the regex pattern and changed text concatenation to use a newline delimiter.
AI-assisted response via Claude Code
From CC review -
8. Magic Numbers
Lines 61, 98, 111, 117, 365-367: Hardcoded limits (100, 50, 20) without constants.
Fix:

```python
GITHUB_CONTRIBUTORS_PER_PAGE = 100
GITHUB_PR_SEARCH_LIMIT = 100
GITHUB_REVIEWS_LIMIT = 50
GITHUB_LABELS_LIMIT = 20
```
Done. Extracted magic numbers into named constants at module level including DEFAULT_DAYS_AGO = 7 for the default report period.
AI-assisted response via Claude Code
```python
import os
import re
import sys
from datetime import datetime, timezone
```
timezone isn't used
Done. Removed the unused timezone import (now using timedelta for dynamic date calculation).
AI-assisted response via Claude Code
Force-pushed 93a2ead to f23d75f.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: celebdor. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed f23d75f to ac4fb79.
Actionable comments posted: 1
🧹 Nitpick comments (3)
contrib/repo_metrics/weekly_pr_report.py (3)
355-356: Unused `pr_type` variable suggests incomplete implementation.

`pr_type` is computed but never used. If the intent was to include it in the output, this should be completed; otherwise, remove the dead code.

```diff
- pr_type = "bug fix" if any(label in ['bugfix', 'bug'] for label in pr['labels']) else "feature"
- f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]}) - {self._linkify_jira_tickets(pr['title'])}\n")
+ f.write(f"- PR [#{pr['number']}]({pr['url']}) ({pr['repo'].split('/')[1]}) - {self._linkify_jira_tickets(pr['title'])}\n")
```
464-500: Unused variables `title` and `description` in impact generation.

Lines 466 and 477 assign `title` and `description` but never use them. Either utilize these for richer impact statements or remove the dead assignments.

```diff
 def _generate_impact_statement(self, pr: Dict) -> str:
     """Generate an OCPSTRAT impact statement based on PR data and Jira info"""
-    title = pr['title'].lower()
     labels = [label.lower() for label in pr['labels']]

     # Try to use Jira ticket description for better context
     if pr['jiraTickets']:
         for ticket in pr['jiraTickets']:
             if ticket in self.jira_hierarchy:
                 jira_info = self.jira_hierarchy[ticket]
                 # Use Jira summary and description for richer impact statement
                 summary = jira_info.get('summary', '')
-                description = jira_info.get('description', '')
```
570-576: Consider simplifying the asyncio entry point.

The `asyncio.get_event_loop()` pattern at lines 575-576 is deprecated in Python 3.10+. Since modern Python versions are likely in use, you could simplify to always use `asyncio.run()`, or add a version check if 3.6 compatibility is truly required.

```diff
 if __name__ == '__main__':
-    if HAS_AIOHTTP:
-        asyncio.run(main())
-    else:
-        # Python 3.6 compatibility
-        loop = asyncio.get_event_loop()
-        loop.run_until_complete(main())
+    asyncio.run(main())
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
- `.claude/commands/weekly-pr-report.md` (1 hunks)
- `contrib/repo_metrics/README.md` (1 hunks)
- `contrib/repo_metrics/weekly_pr_report.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- `contrib/repo_metrics/README.md`
- `contrib/repo_metrics/weekly_pr_report.py`
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/weekly-pr-report.md
11-11: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.6)
contrib/repo_metrics/weekly_pr_report.py
41-41: Starting a process with a partial executable path
(S607)
47-47: Do not catch blind exception: Exception
(BLE001)
73-73: Probable use of requests call without timeout
(S113)
149-149: Probable use of requests call without timeout
(S113)
283-283: Do not catch blind exception: Exception
(BLE001)
288-288: Probable insecure usage of temporary file or directory: "/tmp/jira_hierarchy.json"
(S108)
300-300: f-string without any placeholders
Remove extraneous f prefix
(F541)
315-315: f-string without any placeholders
Remove extraneous f prefix
(F541)
355-355: Local variable pr_type is assigned to but never used
Remove assignment to unused variable pr_type
(F841)
449-449: f-string without any placeholders
Remove extraneous f prefix
(F541)
466-466: Local variable title is assigned to but never used
Remove assignment to unused variable title
(F841)
477-477: Local variable description is assigned to but never used
Remove assignment to unused variable description
(F841)
507-507: Multiple statements on one line (colon)
(E701)
508-508: Multiple statements on one line (colon)
(E701)
509-509: Multiple statements on one line (colon)
(E701)
510-510: Multiple statements on one line (colon)
(E701)
511-511: Multiple statements on one line (colon)
(E701)
512-512: Multiple statements on one line (colon)
(E701)
513-513: Multiple statements on one line (colon)
(E701)
514-514: Multiple statements on one line (colon)
(E701)
563-563: Probable insecure usage of temporary file or directory: "/tmp/weekly_pr_report_fast.md"
(S108)
564-564: Probable insecure usage of temporary file or directory: "/tmp/hypershift_pr_details_fast.json"
(S108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (2)
contrib/repo_metrics/README.md (1)
19-78: Documentation looks good.

The README clearly documents the new Weekly PR Report Generator with appropriate usage examples, output format, and performance notes.
.claude/commands/weekly-pr-report.md (1)
1-214: Comprehensive command documentation.

The slash command documentation is thorough, covering usage, data sources, expected output format, and implementation steps. The workflow clearly explains the two-phase approach (Python script + Jira enrichment).
Force-pushed ac4fb79 to a92aa7f.
Add comprehensive weekly PR report generation for HyperShift project with automated Jira hierarchy enrichment and markdown link generation.

Features:
- Fast async PR fetching using GitHub GraphQL API
- Automatic Jira ticket extraction from PR titles and bodies
- Complete Jira hierarchy (Ticket → Epic → OCPSTRAT) with parallel queries
- PR grouping by OCPSTRAT strategic initiatives
- Timing metrics (draft→ready, ready→merge) and review activity analysis
- Clickable markdown links for all PRs and Jira tickets
- Claude Code slash command: /weekly-pr-report [since-date]

Files added:
- contrib/repo_metrics/weekly_pr_report.py: Report generator with async API calls
- .claude/commands/weekly-pr-report.md: Slash command for Claude Code
- contrib/repo_metrics/README.md: Documentation updated

Signed-off-by: Antoni Segura Puimedon <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Force-pushed a92aa7f to c22dc60.
Actionable comments posted: 0
♻️ Duplicate comments (1)
contrib/repo_metrics/weekly_pr_report.py (1)
92-109: Add timeouts and basic error handling to synchronous GitHub HTTP calls

The `requests.get`/`requests.post` calls in the sync fallback still lack timeouts and can hang the whole CLI on a slow/unresponsive GitHub endpoint; earlier review already called this out. Consider adding modest timeouts and handling `RequestException` to fail fast and log the issue.

```diff
@@
-        else:
-            while True:
-                url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={GITHUB_CONTRIBUTORS_PER_PAGE}&page={page}"
-                response = requests.get(url, headers=headers)
-                if response.status_code != 200:
-                    break
-                data = response.json()
+        else:
+            while True:
+                url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contributors?per_page={GITHUB_CONTRIBUTORS_PER_PAGE}&page={page}"
+                try:
+                    response = requests.get(url, headers=headers, timeout=10)
+                except requests.exceptions.RequestException as e:
+                    print(f"Error fetching contributors page {page}: {e}")
+                    break
+                if response.status_code != 200:
+                    break
+                data = response.json()
@@
-        else:
-            while True:
-                variables = {"searchQuery": search_query, "cursor": cursor}
-                response = requests.post(
-                    'https://api.github.com/graphql',
-                    json={"query": query, "variables": variables},
-                    headers=headers
-                )
-                data = response.json()
+        else:
+            while True:
+                variables = {"searchQuery": search_query, "cursor": cursor}
+                try:
+                    response = requests.post(
+                        'https://api.github.com/graphql',
+                        json={"query": query, "variables": variables},
+                        headers=headers,
+                        timeout=10,
+                    )
+                    data = response.json()
+                except requests.exceptions.RequestException as e:
+                    print(f"Error fetching PRs page (cursor={cursor}): {e}")
+                    break
```

If you prefer a different timeout value, keeping any finite value is better than none.
Also applies to: 186-193
🧹 Nitpick comments (4)
contrib/repo_metrics/weekly_pr_report.py (2)
245-250: Unify the Jira ticket regex and reuse a shared pattern constant

`_process_pr_data` extracts tickets using a word-bounded pattern, while `_linkify_jira_tickets` uses a slightly different regex without word boundaries. This can cause tickets to be linkified that weren't captured (or vice versa) and makes maintenance harder.

```diff
 # API pagination and limit constants
 GITHUB_CONTRIBUTORS_PER_PAGE = 100
 GITHUB_PR_SEARCH_LIMIT = 100
 GITHUB_REVIEWS_LIMIT = 50
 GITHUB_LABELS_LIMIT = 20
 GITHUB_TIMELINE_ITEMS_LIMIT = 100
+
+# Jira ticket ID pattern
+JIRA_TICKET_PATTERN = r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b'
@@
-    jira_tickets = list(set(re.findall(
-        r'\b(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+\b',
-        text
-    )))
+    jira_tickets = list(set(re.findall(JIRA_TICKET_PATTERN, text)))
@@
-    return re.sub(
-        r'(?:OCPBUGS|CNTRLPLANE|OCPSTRAT|RFE|HOSTEDCP)-\d+',
-        replace_ticket,
-        text
-    )
+    return re.sub(JIRA_TICKET_PATTERN, replace_ticket, text)
```

This keeps extraction and linkification in sync and avoids future regex drift.
Also applies to: 567-577
503-566: Tighten impact/component helpers (remove unused locals, expand one-line `if` chain)

`_generate_impact_statement` assigns `title` and `description` but never uses them, and `_extract_component` uses multiple one-line `if` statements, both of which Ruff flags and slightly hurt readability.

Concrete changes:

- In `_generate_impact_statement`, drop the unused locals:

```diff
-    title = pr['title'].lower()
-    labels = [label.lower() for label in pr['labels']]
+    labels = [label.lower() for label in pr['labels']]
@@
-                summary = jira_info.get('summary', '')
-                description = jira_info.get('description', '')
+                summary = jira_info.get('summary', '')
```

- In `_extract_component`, expand the one-line `if`/`return` chain into a conventional multi-line chain for clarity and to satisfy E701, for example:

```diff
-    # Common patterns
-    if 'aws' in title: return 'AWS platform support'
-    if 'azure' in title or 'aro' in title: return 'Azure platform support'
-    if 'gcp' in title: return 'GCP platform support'
-    if 'nodepool' in title: return 'NodePool management'
-    if 'hosted' in title or 'hcp' in title: return 'hosted control plane'
-    if 'cli' in title: return 'CLI tooling'
-    if 'operator' in title: return 'operator functionality'
-    if 'test' in title: return 'testing infrastructure'
-
-    return 'core functionality'
+    # Common patterns
+    if 'aws' in title:
+        return 'AWS platform support'
+    if 'azure' in title or 'aro' in title:
+        return 'Azure platform support'
+    if 'gcp' in title:
+        return 'GCP platform support'
+    if 'nodepool' in title:
+        return 'NodePool management'
+    if 'hosted' in title or 'hcp' in title:
+        return 'hosted control plane'
+    if 'cli' in title:
+        return 'CLI tooling'
+    if 'operator' in title:
+        return 'operator functionality'
+    if 'test' in title:
+        return 'testing infrastructure'
+
+    return 'core functionality'
```

These are small cleanups but keep the helpers easier to extend.
.claude/commands/pr-report.md (2)
11-13: Specify a language for the `/pr-report` fenced code block

`markdownlint` is flagging the bare triple-backtick block; adding a language (e.g., `bash` or `text`) will satisfy MD040 and improve rendering:

~~~diff
-```
+```bash
 /pr-report [since-date]
 ```
~~~
55-115: Align "Expected Output / Key Requirements" with what the Python script currently emits

The doc promises sections like auto-classified "Topics", "Top Contributors", and "Cross-Repository Contributions" in the metrics, but `weekly_pr_report.py` currently focuses on timing metrics, OCPSTRAT groupings, review activity, and busiest merge days without those extra sections.

To avoid confusing users of `/pr-report`, consider either:

- Extending `weekly_pr_report.py` to add the missing metrics/sections (e.g., author-based contributor stats, cross-repo contributors, explicit topic classification), or
- Relaxing the examples/requirements here to match the current output structure and noting any future enhancements separately.
This keeps the command docs as an accurate contract for what the tool produces today.
Also applies to: 117-127
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
- `.claude/commands/pr-report.md` (1 hunks)
- `contrib/repo_metrics/README.md` (1 hunks)
- `contrib/repo_metrics/weekly_pr_report.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- contrib/repo_metrics/README.md
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
contrib/repo_metrics/weekly_pr_report.py
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/pr-report.md
11-11: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.6)
contrib/repo_metrics/weekly_pr_report.py
51-51: Starting a process with a partial executable path
(S607)
57-57: Do not catch blind exception: Exception
(BLE001)
94-94: Probable use of requests call without timeout
(S113)
188-188: Probable use of requests call without timeout
(S113)
326-326: Probable insecure usage of temporary file or directory: "/tmp/jira_hierarchy.json"
(S108)
340-340: f-string without any placeholders
Remove extraneous f prefix
(F541)
355-355: f-string without any placeholders
Remove extraneous f prefix
(F541)
488-488: f-string without any placeholders
Remove extraneous f prefix
(F541)
505-505: Local variable title is assigned to but never used
Remove assignment to unused variable title
(F841)
516-516: Local variable description is assigned to but never used
Remove assignment to unused variable description
(F841)
546-546: Multiple statements on one line (colon)
(E701)
547-547: Multiple statements on one line (colon)
(E701)
548-548: Multiple statements on one line (colon)
(E701)
549-549: Multiple statements on one line (colon)
(E701)
550-550: Multiple statements on one line (colon)
(E701)
551-551: Multiple statements on one line (colon)
(E701)
552-552: Multiple statements on one line (colon)
(E701)
553-553: Multiple statements on one line (colon)
(E701)
605-605: Probable insecure usage of temporary file or directory: "/tmp/weekly_pr_report_fast.md"
(S108)
606-606: Probable insecure usage of temporary file or directory: "/tmp/hypershift_pr_details_fast.json"
(S108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
🔇 Additional comments (1)
contrib/repo_metrics/weekly_pr_report.py (1)
112-208: Async GraphQL pagination and PR aggregation look solid

The use of cursor-based pagination in `fetch_prs_graphql`, async session reuse, and the parallel fetching/aggregation in `fetch_all_prs` and `main()` is well-structured and should scale cleanly for the intended weekly windows.
fetch_prs_graphql, async session reuse, and the parallel fetching/aggregation infetch_all_prsandmain()is well‑structured and should scale cleanly for the intended weekly windows.Also applies to: 274-310, 586-609
@celebdor: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What this PR does / why we need it:
This PR adds comprehensive weekly PR report generation capability for the HyperShift project. It provides an automated way to:
The tool uses async GitHub GraphQL API calls for fast PR fetching, automatically extracts Jira tickets from PR titles/bodies, queries Jira in parallel for hierarchy enrichment, and generates comprehensive markdown reports grouped by OCPSTRAT strategic initiatives.
Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNTRLPLANE-1977
Special notes for your reviewer:
Key Features
- `/weekly-pr-report [since-date]` slash command

Files Added
- `contrib/repo_metrics/weekly_pr_report.py` - Report generator (576 lines)
- `.claude/commands/weekly-pr-report.md` - Slash command documentation
- `contrib/repo_metrics/README.md` - Updated with usage instructions

Example Output
Report includes:
Testing Done
Checklist:
🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>