docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)#916
docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)#916harinarayansrivatsan wants to merge 6 commits intomicrosoft:mainfrom
Conversation
Introduce human-in-the-loop escalation as a third decision tier between allow and deny. The tutorial covers creating escalation requests, human approve/deny flows, and timeout with safe defaults. New files: - 05-approval-workflows.md: tutorial narrative with bank teller analogy, five steps, ASCII flowchart, and exercises - examples/05_approval_policy.yaml: three-tier policy (allow, deny, escalate) - examples/05_approval_workflows.py: runnable demo with five parts Updated files: - README.md: link chapter 5, update coming-soon notice - 04-conditional-policies.md: update Next navigation link
Normalize license headers across all tutorial chapters. Chapters 5-7 already had them; this adds them to chapters 1-4 markdown and YAML example files for consistency.
Covers structural validation with Pydantic, declarative YAML test scenarios, cross-policy test matrices, and regression detection. Includes runnable Python example and test fixtures.
Covers side-by-side version comparison, structural diffing, behavioral regression detection, and deploy gates. Updates README to link chapters 6-7 and fixes chapter 5 nav link.
|
Welcome to the Agent Governance Toolkit! Thanks for your first pull request. |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Feedback on Pull Request: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)
🔴 CRITICAL: Security Issues
-
Escalation Timeout Default Action
- The tutorial suggests setting
DefaultTimeoutAction.ALLOWfor less critical actions in certain scenarios. This is a dangerous recommendation, as it could lead to silent approvals of sensitive actions if the timeout expires. Even if the action is deemed "less critical," allowing it by default without human review introduces a potential security bypass. - Action: Remove or strongly discourage the use of
DefaultTimeoutAction.ALLOWin the documentation. Defaulting toDENYis the safer and more appropriate choice for all escalation scenarios.
- The tutorial suggests setting
-
Policy Rule Priority Conflicts
- The tutorial does not address potential priority conflicts between rules. For example, if two rules apply to the same tool and have overlapping conditions, the higher-priority rule will take precedence. However, if the priority order is misconfigured, this could lead to unintended decisions (e.g., escalation bypassed by a lower-priority rule).
- Action: Add a section to the tutorial explaining how to handle rule priority conflicts and the importance of testing for such scenarios.
-
Escalation Request Spoofing
- The tutorial does not mention any mechanisms to prevent spoofing of escalation requests. For example, an attacker could potentially craft a fake escalation request with a forged
agent_idoractionfield to bypass security checks. - Action: Document the need for cryptographic signing or authentication mechanisms for escalation requests to ensure their integrity and authenticity.
- The tutorial does not mention any mechanisms to prevent spoofing of escalation requests. For example, an attacker could potentially craft a fake escalation request with a forged
🟡 WARNING: Potential Breaking Changes
- Backward Compatibility of Policy Schema
- The addition of new rules and features (e.g., escalation handling) may break existing policies that do not conform to the updated schema. For example, older policies without the
messagefield or escalation rules might fail validation. - Action: Ensure backward compatibility by providing migration tools or fallback mechanisms for older policy versions. Update the tutorial to include instructions for migrating legacy policies.
- The addition of new rules and features (e.g., escalation handling) may break existing policies that do not conform to the updated schema. For example, older policies without the
💡 Suggestions for Improvement
-
Automated Policy Testing
- The tutorial introduces policy testing but does not mention integration with CI/CD pipelines. Automated policy tests should be run as part of the CI/CD process to catch regressions early.
- Action: Add a section on integrating policy tests with CI/CD pipelines, including examples of how to run tests using
pytestor similar frameworks.
-
Thread Safety in Escalation Handling
- The tutorial uses
InMemoryApprovalQueue, which may not be thread-safe in concurrent environments. While this is acceptable for demonstration purposes, production systems should use thread-safe or distributed backends. - Action: Add a note in the tutorial recommending thread-safe or distributed backends (e.g., Redis, RabbitMQ) for production use.
- The tutorial uses
-
Policy Diffing and Versioning
- Chapter 7 introduces policy versioning but does not provide examples of how to handle structural diffs in YAML files. This could be useful for detecting unintended changes in policies.
- Action: Expand Chapter 7 to include examples of YAML diffing tools or libraries that can be used to compare policy versions.
-
Error Handling in Policy Validation
- The tutorial demonstrates how to catch validation errors but does not provide guidance on how to handle them effectively (e.g., logging, notifying developers).
- Action: Add best practices for handling validation errors, including logging and alerting mechanisms.
-
OWASP Agentic Top 10 Compliance
- The tutorial does not explicitly address compliance with OWASP Agentic Top 10 guidelines, such as ensuring audit trails for all policy decisions and escalation requests.
- Action: Add a section mapping the tutorial's features to OWASP Agentic Top 10 compliance, emphasizing auditability and traceability.
-
Improved Documentation Navigation
- The tutorial chapters are linked sequentially, but navigating between sections could be improved with a table of contents or sidebar navigation.
- Action: Add a table of contents or sidebar navigation to improve usability.
Summary
This pull request introduces valuable documentation for policy testing and versioning. However, it raises critical security concerns, particularly around escalation timeout defaults and spoofing risks. Addressing these issues is essential to ensure the library remains secure and compliant with best practices. Additionally, improving backward compatibility, thread safety, and documentation usability will enhance the overall quality of the tutorial.
Priority Actions:
- Remove or discourage
DefaultTimeoutAction.ALLOW. - Document cryptographic signing for escalation requests.
- Address rule priority conflicts in the tutorial.
Suggested Enhancements:
- Integrate policy testing into CI/CD pipelines.
- Recommend thread-safe backends for production.
- Expand policy versioning examples to include YAML diffing.
- Map features to OWASP Agentic Top 10 compliance.
🤖 AI Agent: security-scanner — 🔵 **LOW**: Potential for Misleading Error Messages in Policy ValidationAfter reviewing the provided pull request, I have identified the following security-related findings and their implications: 🔵 LOW: Potential for Misleading Error Messages in Policy ValidationIssue: The Attack Vector: An attacker could exploit this by crafting a malicious policy file with sensitive information embedded in the fields. If the validation error is logged or exposed, the attacker could retrieve this sensitive information. Recommendation: Ensure that error messages generated during policy validation are sanitized to avoid exposing sensitive information. For example, avoid including the full content of the invalid field in the error message. Instead, provide a generic error message and log the details securely for debugging purposes. 🟡 MEDIUM: Escalation Timeout Default Action ConfigurationIssue: The tutorial suggests using Attack Vector: If a policy is misconfigured to use Recommendation: Enforce stricter validation for the use of 🔵 LOW: Lack of Explicit Mention of Secure Storage for Approval QueueIssue: The tutorial mentions the use of Attack Vector: If the approval queue is not securely implemented, an attacker could tamper with escalation requests (e.g., approving their own requests or denying legitimate ones). Recommendation: Update the tutorial to explicitly mention the need for secure storage mechanisms for the approval queue in production environments. This could include encryption at rest, access controls, and audit logging. 🟡 MEDIUM: Lack of Mention of Input Sanitization for Escalation RequestsIssue: The tutorial demonstrates creating escalation requests with fields like Attack Vector: An attacker could craft a malicious Recommendation: Update the tutorial to include a note about the importance of sanitizing input fields in escalation requests. Additionally, ensure that the 🟡 MEDIUM: Lack of Explicit Mention of Secure Communication for WebhookApprovalBackendIssue: The tutorial briefly mentions the Attack Vector: If the webhook communication is not secured, an attacker could intercept or tamper with approval requests and responses, potentially leading to unauthorized actions being approved or legitimate actions being denied. Recommendation: Update the tutorial to explicitly state that HTTPS must be used for the 🔵 LOW: Lack of Mention of Audit Logging for Escalation DecisionsIssue: The tutorial demonstrates how escalation decisions are made and resolved but does not explicitly mention the importance of audit logging for these decisions. Attack Vector: Without proper audit logging, it may be difficult to investigate and attribute responsibility for escalation decisions, especially in the case of malicious or unauthorized actions. Recommendation: Update the tutorial to include a section on the importance of audit logging for escalation decisions. Ensure that the 🔵 LOW: Potential for Policy Drift in YAML FilesIssue: The tutorial highlights the risk of policy drift (e.g., accidental changes to YAML files) and suggests using automated tests to catch regressions. However, it does not explicitly mention the importance of version control and change management for policy files. Attack Vector: Without proper version control and change management, an attacker with access to the repository could introduce malicious changes to the policy files, potentially bypassing security controls. Recommendation: Update the tutorial to emphasize the importance of using version control (e.g., Git) and implementing change management processes (e.g., code reviews, automated CI/CD pipelines) for policy files. Consider integrating policy validation and testing into the CI/CD pipeline to catch issues before deployment. Summary of Findings
General Recommendations
Let me know if you need further clarification or additional recommendations! |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Pull Request Review: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)
🔴 CRITICAL Issues
1. Escalation Timeout Default Action
- The tutorial suggests using
DefaultTimeoutAction.ALLOWas an alternative toDENY. While this is presented as a rare case, it introduces a potential security bypass. If an escalation request times out and defaults toALLOW, malicious or unintended actions could proceed without human review. - Recommendation: Clearly document that
DefaultTimeoutAction.ALLOWshould only be used for non-critical actions. Add safeguards in the library to warn or prevent its use for high-risk actions.
2. Policy Rule Priority Conflicts
- The example policies rely heavily on rule priorities to determine outcomes. If two rules have overlapping conditions and the same priority, the behavior is undefined. This could lead to security vulnerabilities if a lower-priority rule inadvertently overrides a higher-priority rule.
- Recommendation: Add validation logic to detect and prevent overlapping rules with identical priorities. This should be highlighted in the tutorial as a potential pitfall.
3. EscalationHandler Timeout Behavior
- The
EscalationHandlertimeout mechanism relies on a default action, but the tutorial does not address what happens if the timeout value is set toNoneor an extremely high value (e.g.,timeout_seconds=999999). This could lead to agents waiting indefinitely, potentially causing denial-of-service scenarios. - Recommendation: Enforce a reasonable maximum timeout value in the library and document this limitation in the tutorial.
🟡 WARNING: Potential Breaking Changes
1. License Header Normalization
- Adding MIT license headers to existing files (chapters 1-4) changes the file content. If any external systems or scripts rely on exact file hashes or content, this could cause issues.
- Recommendation: Ensure that all downstream dependencies are updated to account for these changes. Communicate this update clearly in the release notes.
2. Policy Versioning Diff Behavior
- The tutorial introduces structural diffing for policy versioning. If users rely on the current behavior of policy evaluation without versioning, this could lead to unexpected results when deploying new policies.
- Recommendation: Provide clear migration guidance for users transitioning to policy versioning. Include examples of how to handle backward compatibility.
💡 Suggestions for Improvement
1. Thread Safety in EscalationHandler
- The
InMemoryApprovalQueueis used in the examples, but it is unclear whether it is thread-safe. If multiple agents are running concurrently, race conditions could occur when accessing or modifying the queue. - Recommendation: Ensure that
InMemoryApprovalQueueis thread-safe or explicitly document that it is not suitable for concurrent use. Consider providing examples with a thread-safe backend, such as a database or message broker.
2. Policy Testing Coverage
- The tutorial introduces policy testing but does not emphasize edge cases, such as testing for unintended rule overlaps or missing escalation tags. These are critical for security-focused applications.
- Recommendation: Expand the tutorial to include examples of edge case testing, such as overlapping rules, missing escalation tags, or malformed policies.
3. SPIFFE/SVID Integration
- The tutorial does not mention how policies interact with SPIFFE/SVID for agent identity verification. This is a missed opportunity to highlight the library's security features.
- Recommendation: Add a section to the tutorial explaining how SPIFFE/SVID can be used to authenticate agents and enforce identity-based policies.
4. Behavioral Regression Detection
- The tutorial mentions behavioral regression detection but does not provide concrete examples of how to implement it.
- Recommendation: Include a detailed example of a regression test that compares policy behavior across versions, highlighting potential issues like changes in escalation behavior or default actions.
5. Markdown Link Verification
- The test plan includes verifying markdown links between chapters but does not automate this step.
- Recommendation: Add a script to the CI pipeline that checks for broken links in markdown files.
6. Type Safety and Pydantic Validation
- The tutorial demonstrates Pydantic validation for policies but does not emphasize type safety for test scenarios.
- Recommendation: Extend the tutorial to include type-safe definitions for test scenarios, ensuring that all inputs are validated before execution.
Summary
This pull request introduces valuable documentation for policy testing and versioning, but it raises critical security concerns and potential backward compatibility issues. Addressing these concerns will improve the library's robustness and ensure compliance with security best practices.
Actions Required:
- Address 🔴 CRITICAL issues related to escalation timeout behavior and rule priority conflicts.
- Mitigate 🟡 WARNING risks by documenting changes and providing migration guidance.
- Implement 💡 SUGGESTIONS to enhance thread safety, testing coverage, and SPIFFE/SVID integration.
Once these issues are resolved, the documentation will be a strong addition to the repository.
There was a problem hiding this comment.
Pull request overview
Adds new “Policy-as-Code” tutorial content covering automated policy testing (chapter 6) and policy versioning/regression detection (chapter 7), plus updates navigation and normalizes MIT license headers across earlier chapters/examples.
Changes:
- Add chapters 6–7 markdown + runnable Python/YAML examples for scenario testing, test matrices, and v1/v2 comparisons.
- Update tutorial README + chapter navigation links to include the new chapters (and chapter 5 due to stack).
- Normalize MIT license headers across chapters 1–4 markdown and YAML example files.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/tutorials/policy-as-code/README.md | Links chapters 5–7 (removes “coming soon”). |
| docs/tutorials/policy-as-code/examples/07_policy_versioning.py | New runnable example to diff/test v1 vs v2 and flag regressions. |
| docs/tutorials/policy-as-code/examples/07_policy_v2.yaml | New v2 policy used for structural/behavioral comparison demo. |
| docs/tutorials/policy-as-code/examples/07_policy_v1.yaml | New v1 baseline policy used for comparison demo. |
| docs/tutorials/policy-as-code/examples/06_test_scenarios.yaml | New declarative scenario set for CLI-based policy testing. |
| docs/tutorials/policy-as-code/examples/06_test_policy.yaml | New combined test policy used by scenario runner/matrix. |
| docs/tutorials/policy-as-code/examples/06_policy_testing.py | New runnable example covering validation, scenarios, matrix, regression check. |
| docs/tutorials/policy-as-code/examples/05_approval_workflows.py | (Stacked) Runnable example for human-in-the-loop escalation. |
| docs/tutorials/policy-as-code/examples/05_approval_policy.yaml | (Stacked) YAML policy for approval workflow chapter. |
| docs/tutorials/policy-as-code/examples/04_support_team_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/04_global_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/04_env_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/03_rate_limit_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/02_reader_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/02_admin_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/01_first_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/07-policy-versioning.md | New chapter 7 tutorial doc (diffing + regression gates). |
| docs/tutorials/policy-as-code/06-policy-testing.md | New chapter 6 tutorial doc (validation, scenarios, matrices). |
| docs/tutorials/policy-as-code/05-approval-workflows.md | (Stacked) New chapter 5 tutorial doc (escalation workflows). |
| docs/tutorials/policy-as-code/04-conditional-policies.md | Add MIT header + update “Next” link to chapter 5. |
| docs/tutorials/policy-as-code/03-rate-limiting.md | Add MIT header. |
| docs/tutorials/policy-as-code/02-capability-scoping.md | Add MIT header. |
| docs/tutorials/policy-as-code/01-your-first-policy.md | Add MIT header. |
| scenarios_path = EXAMPLES_DIR / "06_test_scenarios.yaml" | ||
| with open(scenarios_path) as f: | ||
| scenarios_data = yaml.safe_load(f) | ||
|
|
There was a problem hiding this comment.
When reading the scenarios YAML, open the file with an explicit UTF-8 encoding (e.g., encoding="utf-8") to avoid Windows default-encoding issues and keep consistency with the repo’s other file reads (e.g., Path.read_text(..., encoding="utf-8") in the policies CLI).
| # ── Escalation-tagged (deny with "requires human approval") ───────── | ||
| - name: transfer-funds-denied | ||
| context: { tool_name: transfer_funds } | ||
| expected_action: deny | ||
|
|
||
| - name: send-email-denied | ||
| context: { tool_name: send_email } | ||
| expected_action: deny |
There was a problem hiding this comment.
The comment labels these as “Escalation-tagged” scenarios, but the scenario schema/CLI test runner only asserts expected_action/expected_allowed and cannot verify the escalation keyword in the decision reason. Either adjust the wording to avoid implying escalation is being tested here, or extend the example runner/schema to assert on reason/tiers.
| with open(examples_dir / "06_test_scenarios.yaml") as f: | ||
| scenarios = yaml.safe_load(f)["scenarios"] | ||
|
|
||
| for scenario in scenarios: | ||
| decision = evaluator.evaluate(scenario["context"]) | ||
| expected = scenario.get("expected_action") | ||
| actual = decision.action | ||
| ok = (expected is None) or (actual == expected) | ||
| status = "✅ pass" if ok else "❌ FAIL" |
There was a problem hiding this comment.
This code sample suggests you can validate scenarios via YAML, but the example runner shown here only checks expected_action and ignores expected_allowed (even though the scenarios format includes both). Consider updating the sample to handle both fields (like the runnable 06_policy_testing.py does) so the docs match the declared schema.
| ```bash | ||
| python -m agent_os.policies.cli diff \ | ||
| examples/07_policy_v1.yaml \ | ||
| examples/07_policy_v2.yaml | ||
| ``` |
There was a problem hiding this comment.
The CLI command uses relative paths under examples/…, but this only works if the user’s working directory is docs/tutorials/policy-as-code/. Since earlier chapters emphasize running from the repo root, consider using full paths (docs/tutorials/policy-as-code/examples/...) or explicitly state the required working directory.
| # Changed: send_email is now a hard deny, no longer escalated | ||
| - name: escalate-send-email | ||
| condition: | ||
| field: tool_name | ||
| operator: eq | ||
| value: send_email | ||
| action: deny | ||
| priority: 85 | ||
| message: "Communication: send_email is blocked by policy" |
There was a problem hiding this comment.
In v2 this rule is now a hard deny (message no longer indicates escalation), but the rule name still starts with escalate-.... For tutorial clarity, either rename it to something like block-send-email in v2 or add an explicit note explaining the name was kept stable to make the diff easier to read.
| for name in v2_rules: | ||
| if name not in v1_rules: | ||
| diffs.append(f"rule added: {name}") | ||
|
|
||
| for name in v1_rules: | ||
| if name not in v2_rules: | ||
| diffs.append(f"rule removed: {name}") | ||
|
|
||
| for name in v1_rules: |
There was a problem hiding this comment.
diff_rules() iterates over dict keys in insertion order, which ties diff output ordering to YAML rule ordering and can produce noisy output if rule order changes. Sorting rule names before iterating would make the script’s diff output deterministic and easier to compare across edits.
| evaluator = PolicyEvaluator(policies=[policy]) | ||
|
|
||
| with open(examples_dir / "06_test_scenarios.yaml") as f: | ||
| scenarios = yaml.safe_load(f)["scenarios"] | ||
|
|
There was a problem hiding this comment.
The docs’ Python snippet reads the scenarios file without an explicit encoding. Using encoding="utf-8" avoids platform-dependent default encodings and matches other parts of the repo that read YAML/JSON as UTF-8.
Summary
New files
docs/tutorials/policy-as-code/06-policy-testing.mddocs/tutorials/policy-as-code/07-policy-versioning.mddocs/tutorials/policy-as-code/examples/06_policy_testing.pydocs/tutorials/policy-as-code/examples/06_test_policy.yamldocs/tutorials/policy-as-code/examples/06_test_scenarios.yamldocs/tutorials/policy-as-code/examples/07_policy_versioning.pydocs/tutorials/policy-as-code/examples/07_policy_v1.yamldocs/tutorials/policy-as-code/examples/07_policy_v2.yamlTest plan
python docs/tutorials/policy-as-code/examples/06_policy_testing.py— all 4 parts pass, 8/8 scenarios pass, matrix finds expected surprisepython docs/tutorials/policy-as-code/examples/07_policy_versioning.py— diff accurate, regression correctly identifiedRef #706