📊 Agentic Workflow Lock File Statistics - December 9, 2025 #5921

2025-12-09T03:38:38Z

github-actions[bot]
bot Dec 9, 2025

This comprehensive analysis examines all 111 agentic workflow lock files (.lock.yml) in the githubnext/gh-aw repository, uncovering usage patterns, structural characteristics, and best practices. The lock files collectively represent 36.6 MB of workflow configurations powering automated agents across this repository.

Key Highlights

Most workflows combine schedule + workflow_dispatch triggers (56 workflows, 50%)
create-discussion is the dominant safe output (41 workflows, 37%)
Read-only permissions dominate with 93% of workflows using only read permissions
Average workflow timeout is 18 minutes, with most workflows configured between 5 and 30 minutes
The github and bash tools are the most widely used, appearing in 72 (65%) and 69 (62%) workflows respectively

📈 Executive Summary

Overview

Total Lock Files: 111 (excluding 2 MCP server lock files)
Total Size: 36.6 MB (38,343,913 bytes)
Average File Size: 340.4 KB
Size Range: 105.4 KB (smallest) to 628.3 KB (largest)
Analysis Date: December 9, 2025

File Size Distribution

The vast majority of lock files fall within a consistent size range, suggesting standardized workflow patterns:

Size Range	Count	Percentage
100-200 KB	6	5.4%
200-300 KB	10	9.0%
300-400 KB	82	73.9%
400-500 KB	11	9.9%
> 500 KB	2	1.8%

Key Insight: 74% of all lock files are between 300 and 400 KB, indicating a highly consistent workflow structure across the repository.

Outliers:

Smallest: test-skip-if-match-object.lock.yml (105.4 KB) - minimal test workflow
Largest: poem-bot.lock.yml (628.3 KB) - comprehensive multi-output workflow with 15 safe output types
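
The size buckets in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not the agent's actual script: the function names are hypothetical, and it assumes that "KB" in this report means 1024 bytes.

```python
from collections import Counter

KB = 1024  # ASSUMPTION: the report's "KB" is 1024 bytes

# Bucket edges in KB, matching the table's ranges; the last bucket is open-ended.
BUCKETS = [(100, 200), (200, 300), (300, 400), (400, 500)]


def bucket_label(size_bytes: int) -> str:
    """Return the size-range label for one file size."""
    size_kb = size_bytes / KB
    for low, high in BUCKETS:
        if low <= size_kb < high:
            return f"{low}-{high} KB"
    return "> 500 KB" if size_kb >= 500 else "< 100 KB"


def size_distribution(sizes_bytes: list) -> dict:
    """Map each label to (count, percentage of all files, one decimal)."""
    counts = Counter(bucket_label(s) for s in sizes_bytes)
    total = len(sizes_bytes)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
```

Applied to the table's own counts, the same rounding gives 82 of 111 files as 73.9%, which matches the 300-400 KB row.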

🎯 Trigger Analysis

Workflow Triggers

Most Popular Triggers

Workflows use a variety of trigger types to respond to repository events:

Trigger Type	Count	Percentage	Purpose
workflow_dispatch	90	81%	Manual execution via GitHub UI/API
schedule	69	62%	Cron-based automated runs
command	15	14%	Comment-based commands (e.g., `/do-something`)
reaction	13	12%	Emoji reaction-based triggers
pull_request	9	8%	Pull request events
skip-if-match	5	5%	Conditional skipping logic
issues	4	4%	Issue creation/update events
stop-after	2	2%	Time-limited execution
workflow_run	2	2%	Triggered by other workflows
push	2	2%	Code push events
workflow_call	1	1%	Reusable workflow
issue_comment	1	1%	Issue comment events
pull_request_review_comment	1	1%	PR review comment events

Common Trigger Combinations

Most workflows combine multiple triggers for flexibility:

Combination	Count	Use Case
schedule + workflow_dispatch	56	Automated daily/weekly reports with manual override
workflow_dispatch only	13	On-demand tools and utilities
schedule + workflow_dispatch + pull_request + reaction	6	Smoke tests with multiple entry points
command only	6	Interactive comment-based agents
schedule + workflow_dispatch + skip-if-match	4	Smart workflows that skip redundant runs
command + workflow_dispatch	3	Flexible command tools
command + reaction	3	Multi-modal interactive agents

Insight: The dominant pattern (50% of workflows) is schedule + workflow_dispatch, enabling both automated execution and manual intervention - ideal for reporting and analysis workflows.
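
A combination count like the one above can be derived from parsed frontmatter by treating each workflow's `on:` keys as a set. The sketch below is illustrative only: `trigger_combinations` and the input shape (workflow name mapped to a parsed frontmatter dict) are assumptions, not the agent's code.

```python
from collections import Counter


def trigger_combinations(workflows: dict) -> Counter:
    """Count how often each combination of `on:` keys occurs across workflows."""
    combos = Counter()
    for frontmatter in workflows.values():
        # PyYAML follows YAML 1.1, where a bare `on` key is read as the
        # boolean True, so both spellings of the key are checked here.
        on = frontmatter.get("on", frontmatter.get(True)) or {}
        # `on:` may be a single string, a list, or a mapping.
        keys = [on] if isinstance(on, str) else list(on)
        combos[" + ".join(sorted(keys))] += 1
    return combos
```

Sorting the keys before joining makes `schedule + workflow_dispatch` and `workflow_dispatch + schedule` count as the same combination.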

Schedule Patterns

69 workflows use cron schedules for automation. Top patterns:

Schedule (Cron)	Count	Description	UTC Time
`0 9 * * *`	6	Daily at 9 AM UTC	9:00 AM
`0 14 * * 1-5`	5	Weekdays at 2 PM UTC	2:00 PM (Mon-Fri)
`0 0,6,12,18 * * *`	4	Every 6 hours	4× daily
`0 8 * * *`	4	Daily at 8 AM UTC	8:00 AM
`0 11 * * 1-5`	4	Weekdays at 11 AM UTC	11:00 AM (Mon-Fri)
`0 13 * * 1-5`	3	Weekdays at 1 PM UTC	1:00 PM (Mon-Fri)

Total Unique Schedules: 36 different cron patterns across 66 scheduled runs

Insight: The most common schedules run between 8 AM and 2 PM UTC, and weekday-only schedules are common for team-focused reports.
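
The cron expressions in the table above can be checked mechanically for their run hours and for weekday-only scheduling. This is a minimal sketch with hypothetical helper names; it handles only numeric fields (`*`, lists, ranges, and steps such as `*/6`), not day names like `MON`, and it assumes day-of-week 0 means Sunday.

```python
def parse_field(field: str, low: int, high: int) -> set:
    """Expand one numeric cron field into the set of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            a, b = base.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def summarize_cron(cron: str) -> dict:
    """Report the UTC hours a 5-field cron expression fires and whether it is Mon-Fri only."""
    _minute, hour, _dom, _month, dow = cron.split()
    days = parse_field(dow, 0, 6)
    return {
        "hours_utc": sorted(parse_field(hour, 0, 23)),
        "weekdays_only": days <= {1, 2, 3, 4, 5},
    }
```

For example, `summarize_cron("0 14 * * 1-5")` reports a single run hour of 14 with `weekdays_only` set to `True`.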

🔒 Safe Outputs Analysis

Safe Output Types

Safe outputs define how workflows communicate results back to GitHub. The distribution shows a strong preference for discussions and issues:

Safe Output Distribution

Safe Output Type	Count	Percentage	Primary Use
create-discussion	41	37%	Analysis reports, summaries, insights
messages	21	19%	Custom workflow messaging
create-issue	20	18%	Bug reports, action items, alerts
add-comment	19	17%	Inline feedback on issues/PRs
upload-assets	18	16%	Charts, logs, generated artifacts
create-pull-request	17	15%	Automated code changes
push-to-pull-request-branch	5	5%	Update existing PRs
add-labels	4	4%	Automated triage/classification
close-discussion	4	4%	Cleanup/archival
close-pull-request	4	4%	Automated PR management
create-pull-request-review-comment	3	3%	Line-specific code review
link-sub-issue	3	3%	Issue hierarchy management

Additional outputs (each used 1-2×): staged, threat-detection, assign-to-agent, missing-tool, close-issue, update-pull-request, update-project, create-code-scanning-alert, minimize-comment, update-release, assign-to-user, update-issue, create-agent-task, noop, app

Common Safe Output Combinations

Combination	Count	Pattern
create-discussion (solo)	23	Simple reporting workflows
create-discussion + upload-assets	11	Rich reports with charts/data
create-pull-request (solo)	9	Code modification workflows
create-issue (solo)	8	Alert/task creation workflows
add-comment + messages	6	Interactive response workflows
close-pull-request (solo)	3	Test cleanup workflows

Insight: Discussion creation is the most popular output (37%), reflecting the repository's focus on analysis and reporting workflows. The combination of create-discussion + upload-assets (11 workflows) represents rich reporting with data visualizations.

Discussion Categories

For workflows creating discussions, the primary category is "audits" - used for analytical reports, security reviews, and system assessments.

🛡️ Permission Patterns

GitHub Permissions

The repository follows least-privilege principles, with most workflows using minimal read-only permissions.

Most Common Permissions

Permission	Count	Type
contents:read	102	Read repository files
pull-requests:read	89	Read PR data
issues:read	86	Read issue data
actions:read	45	Read workflow run data
discussions:read	11	Read discussion data
security-events:read	6	Read security alerts
discussions:write	4	Create/update discussions
repository-projects:read	3	Read project boards
issues:write	3	Create/update issues
contents:write	1	Modify repository files
pull-requests:write	1	Modify pull requests

Permission Distribution

Read-only workflows: 103 (93%)
Workflows with write permissions: 7 (6%)
Workflows with no specified permissions: 1 (1%)

Security Insight: 93% of workflows use exclusively read permissions, demonstrating strong security practices. Write permissions are reserved for specific use cases like PR creation and issue management.
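
The three-way split above (read-only, write, unspecified) can be expressed as a small classifier over each workflow's `permissions` block. The sketch below is an assumption about how such a classification could work, not the agent's implementation; `classify_permissions` is a hypothetical name.

```python
from typing import Optional, Union


def classify_permissions(permissions: Optional[Union[str, dict]]) -> str:
    """Classify a parsed `permissions:` value as 'unspecified', 'write', or 'read-only'."""
    if permissions is None:
        return "unspecified"
    if isinstance(permissions, str):
        # GitHub Actions also accepts the shorthand values read-all and write-all.
        return "write" if permissions == "write-all" else "read-only"
    if any(level == "write" for level in permissions.values()):
        return "write"
    # An empty mapping (no permissions at all) also lands here,
    # since it grants no write access.
    return "read-only"
```

A single `write` scope is enough to move a workflow out of the read-only group, which is the conservative choice for a security audit.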

Write Permission Use Cases:

PR automation (create-pull-request outputs)
Issue triage (add-labels)
Discussion management (create/close)
Release updates

🧰 Tool & Configuration Analysis

Tool Usage

Workflows leverage various tool categories for agent capabilities:

Tool Category	Count	Percentage	Purpose
github	72	65%	GitHub API interactions
bash	69	62%	Shell command execution
edit	54	49%	File editing capabilities
cache-memory	47	42%	Persistent storage across runs
serena	16	14%	Serena-specific tooling
playwright	9	8%	Browser automation
web-fetch	7	6%	External web content
repo-memory	4	4%	Repository-wide memory
timeout	3	3%	Timeout management
agentic-workflows	3	3%	Workflow-specific tools
web-search	2	2%	Web search capabilities

Insight: The core toolset (github, bash, edit, cache-memory) appears in 42-65% of workflows, forming the foundational agent capabilities. Specialized tools like playwright and web-search are used selectively for specific use cases.

Engine Distribution

Only 11 of 111 workflows explicitly specify an AI engine (10%):

Engine	Count	Workflows
claude	5	daily-multi-device-docs-tester, smoke-claude, unbloat-docs, commit-changes-analyzer, cloclo
copilot	4	technical-doc-writer, smoke-copilot-playwright, glossary-maintainer, poem-bot
codex	2	changeset, daily-fact

Note: Most workflows (90%) don't specify an engine, using the default configuration.

Timeout Configuration

106 of 111 workflows (95%) specify explicit timeout values:

Timeout Statistics

Minimum: 5 minutes (43 workflows)
Maximum: 60 minutes (4 workflows)
Average: 18 minutes
Median: 15 minutes

Timeout Distribution

Timeout (min)	Count	Use Case
5	43	Quick analysis, tests, simple tasks
10	35	Standard agent tasks
15	19	Complex analysis
20	14	In-depth reports
30	12	Comprehensive audits
45	6	Large-scale analysis
60	4	Maximum-duration workflows
12	1	Custom configuration

Insight: Most workflows (69%) use 5-15 minute timeouts, suggesting relatively quick agent operations. Long-running workflows (45-60 min) are reserved for comprehensive analysis tasks like deep reports and performance summaries.
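
Summary statistics can be recomputed directly from a (timeout, count) table like the one above. The sketch below uses the table's rows as printed; the function names are hypothetical. Note that those counts total 134, more than the 106 workflows with explicit timeouts, so the table may be counting something other than one timeout per workflow (an assumption), and the recomputed figures need not match the reported average and median.

```python
import statistics

# (timeout in minutes, count) pairs copied from the distribution table.
TIMEOUT_COUNTS = [(5, 43), (10, 35), (12, 1), (15, 19), (20, 14), (30, 12), (45, 6), (60, 4)]


def expand(distribution):
    """Turn (value, count) pairs into a flat list of values."""
    return [value for value, count in distribution for _ in range(count)]


def summarize_timeouts(distribution):
    """Return the sample size, mean, median, and share of values in the 5-15 minute band."""
    values = expand(distribution)
    in_band = sum(1 for v in values if 5 <= v <= 15)
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 1),
        "median": statistics.median(values),
        "share_5_to_15": round(100 * in_band / len(values), 1),
    }
```

Run on the table as printed, this gives a mean of about 15.0 minutes and a median of 10, compared with the reported 18 and 15, so it is worth confirming what unit the table counts before relying on either set of figures.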

🎨 Interesting Findings

Notable Patterns & Anomalies

1. Command-Based Agent Pattern

15 workflows use the command trigger, enabling natural language interaction through GitHub comments. Examples:

/q - General Q&A agent
/craft - Code generation agent
/grumpy - Critical code reviewer
/scout - Codebase explorer

This is a distinctive UX pattern in which agents respond to slash commands in PR and issue comments.

2. Smoke Test Consistency

Multiple smoke test workflows (claude, codex, copilot) share identical trigger patterns:

on:
  schedule: [...]
  workflow_dispatch: {}
  pull_request: {}
  reaction: {}

This ensures comprehensive testing across different engines with consistent trigger logic.

3. The "Poem Bot" Anomaly

poem-bot.lock.yml is exceptional with:

Largest file size: 628.3 KB (nearly 2× the average)
Most safe outputs: 15 different output types
All-in-one capability: Supports nearly every safe output type

This appears to be a demonstration or test workflow showcasing all available capabilities.

4. Minimal Test Workflows

Some test workflows are remarkably small:

test-skip-if-match-object.lock.yml: 105.4 KB
test-firewall-default.lock.yml: 126.2 KB

These minimal configurations help validate specific features without the overhead of full workflow logic.

5. Weekday-Only Schedules

Many reporting workflows use weekday-only schedules (Monday-Friday), suggesting:

Focus on team collaboration during work hours
Reduced noise during weekends
Cost optimization by avoiding unnecessary weekend runs

6. Cache Memory Adoption

47 workflows (42%) use cache-memory tools, indicating:

Strong pattern of stateful workflows
Memory persistence across runs
Evolution toward more intelligent, context-aware agents

7. Discussion-Centric Architecture

With 41 workflows creating discussions (37%), the repository heavily favors threaded conversations over ephemeral notifications. This creates a searchable, organized history of agent insights.

📋 Methodology

Analysis Approach

Data Collection

File Discovery: Used find to locate all 111 .lock.yml files in .github/workflows/
Frontmatter Extraction: Python script to parse YAML frontmatter from comment blocks
Statistical Analysis: Comprehensive parsing of triggers, safe outputs, permissions, tools, and configurations
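
The discovery and extraction steps above can be sketched as follows. This is not the agent's `extract_frontmatter.py`: the function names are hypothetical, and the sketch assumes the frontmatter appears as a leading run of `#`-prefixed comment lines at the top of each lock file, which may not match the real layout.

```python
from pathlib import Path


def find_lock_files(workflows_dir: str) -> list:
    """Return all *.lock.yml files directly under the given directory, sorted by name."""
    return sorted(Path(workflows_dir).glob("*.lock.yml"))


def extract_commented_frontmatter(text: str) -> str:
    """Collect the leading `#` comment lines and strip the comment prefix.

    ASSUMPTION: the frontmatter is the first uninterrupted block of comment lines.
    """
    lines = []
    for line in text.splitlines():
        if not line.startswith("#"):
            break
        lines.append(line[1:].removeprefix(" "))
    return "\n".join(lines)


def size_summary(paths: list) -> dict:
    """Summarize file count, total bytes, and average size in KB (1024 bytes)."""
    sizes = [p.stat().st_size for p in paths]
    if not sizes:
        return {"count": 0, "total_bytes": 0, "average_kb": 0.0}
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "average_kb": round(sum(sizes) / len(sizes) / 1024, 1),
    }
```

The extracted text could then be handed to a YAML parser; the report mentions regular expressions for this step, which the sketch does not attempt to reproduce.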

Tools Used

Python 3.12: Primary analysis scripting
Bash: File discovery and text processing
JSON: Structured data storage for intermediate results
Regular Expressions: Pattern matching for YAML parsing

Cache Memory

Analysis scripts and data stored in /tmp/gh-aw/cache-memory/ for:

Script Persistence: Reusable analysis tools for future runs
Historical Tracking: Baseline for trend analysis
Pattern Library: Documented successful parsing patterns

Files Generated

/tmp/gh-aw/cache-memory/
├── scripts/
│   ├── extract_frontmatter.py
│   ├── advanced_analysis.py
│   └── generate_report.py
├── data/
│   ├── analysis_clean.json
│   ├── report.json
│   └── lockfile_stats_report.md
└── patterns/
    └── (for future use)

💡 Recommendations

Best Practices & Optimization Opportunities

1. Standardize Timeout Values

Current: Wide range of timeout configurations
Recommendation: Define standard timeout tiers (5/10/15/30/60 min) and align workflows
Benefit: Easier maintenance, predictable resource usage
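
One way to align existing values with the proposed tiers is to round each timeout up to the next tier, so that no workflow ends up with less time than it has today. The sketch below is a hypothetical helper; rounding up rather than to the nearest tier is a design assumption, not something the analysis prescribes.

```python
import bisect

TIERS = [5, 10, 15, 30, 60]  # the tiers proposed in this recommendation


def snap_to_tier(timeout_minutes: int, tiers: list = TIERS) -> int:
    """Round a timeout up to the smallest tier that is at least as large."""
    index = bisect.bisect_left(tiers, timeout_minutes)
    if index == len(tiers):
        raise ValueError(f"{timeout_minutes} min exceeds the largest tier ({tiers[-1]} min)")
    return tiers[index]
```

Under this rule the current 12, 20, and 45 minute timeouts would move to 15, 30, and 60 minutes respectively.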

2. Permission Templates

Pattern: Most workflows use contents:read + issues:read + pull-requests:read
Recommendation: Create named permission templates (e.g., read-only, pr-writer)
Benefit: Consistent security posture, easier auditing

3. Trigger Combination Documentation

The schedule + workflow_dispatch pattern is dominant but implicit
Recommendation: Document this pattern as the "standard reporting workflow trigger"
Benefit: Clearer onboarding, consistent design

4. Safe Output Strategy

23 workflows use create-discussion alone
Recommendation: Establish when to use discussions vs. issues vs. comments
Benefit: Consistent user experience, easier discoverability

5. Engine Selection Guidelines

Only 10% of workflows specify an engine explicitly
Recommendation: Document criteria for engine selection (claude vs. copilot vs. codex)
Benefit: Optimal performance, cost management

6. Cache Memory Adoption

42% of workflows use cache-memory, 58% don't
Recommendation: Audit remaining workflows for potential cache-memory benefits
Benefit: Smarter agents, reduced API calls, faster execution

7. Schedule Optimization

66 scheduled runs with 36 unique cron patterns
Recommendation: Consolidate overlapping schedules, stagger high-resource workflows
Benefit: Reduced contention, cost optimization

8. Size Consistency Verification

74% of files are 300-400 KB (very consistent)
Outliers like poem-bot (628 KB) warrant review
Recommendation: Investigate if outliers contain unnecessary duplication
Benefit: Faster git operations, easier code review

Comparison with Historical Data

Note: This is the initial baseline analysis. Future runs will compare against this data to identify:

Growth in workflow count
Changes in trigger patterns
Evolution of safe output usage
Permission scope changes
Tool adoption trends

Historical tracking enabled - data stored in /tmp/gh-aw/cache-memory/history/2025-12-09.json
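
A future run could compare its counts against the stored baseline along these lines. This is only a sketch: the snapshot schema (a flat mapping of names to counts) and both function names are assumptions, since the report does not describe the contents of the history file.

```python
import json
from pathlib import Path


def load_snapshot(path: str) -> dict:
    """Load one JSON snapshot from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_counts(previous: dict, current: dict) -> dict:
    """Return the change in each count; keys missing on one side count as 0."""
    keys = sorted(set(previous) | set(current))
    changes = {k: current.get(k, 0) - previous.get(k, 0) for k in keys}
    # Keep only the entries that actually changed.
    return {k: delta for k, delta in changes.items() if delta != 0}
```

The same function would work for trigger counts, safe output counts, or tool counts, as long as each is stored as a name-to-count mapping.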

📊 Generated by Lockfile Statistics Analysis Agent on December 9, 2025
Analysis scripts available in /tmp/gh-aw/cache-memory/scripts/ for future runs

