-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Description
What is missing or needs to be updated?
The current AI Agent Security Cheat Sheet provides guidance on:
- Tool security and least privilege
- Prompt injection defense
- Memory protection
- Multi-agent security
- Output validation
- Monitoring and privacy
But, it lacks guidance in the following areas:
- High-Impact Action Controls
- There is no dedicated section addressing safeguards for irreversible, destructive, or security-sensitive actions.
- The sheet references high-risk actions (e.g., financial operations) but does not provide architectural controls such as separation of decision and execution, replay protection, or risk-based gating.
- Structured Security Testing for Agents
- The cheat sheet does not include guidance on testing or abuse-case simulation.
- There is no framework for validating resistance to:
a. prompt injection override attempts
b. privilege escalation
c. memory poisoning
d. recursive tool abuse
e. denial of Wallet / resource exhaustion
- Risk Scoring & Decision Observability
- Monitoring guidance does not explicitly recommend logging structured decision metadata for high-risk actions (e.g., risk score, model confidence, action classification).
- There is no guidance on monitoring deviations in approval behavior or risk scoring distribution.
How should this be resolved?
Expand Key Risks:
Add two new risk items:
- High-Impact Action Abuse – Agents executing irreversible or security-sensitive operations without independent validation.
- Model Manipulation & Risk Abuse – Attackers influencing decision confidence or approval thresholds to bypass safeguards.
Add a New Section: “High-Impact Action Controls” - section 9.
This will look something like:
- High-Impact Action Controls
AI agents operating in production environments may perform actions with significant operational, financial, or security impact. These actions require additional safeguards beyond standard tool permission controls.
Examples of high-impact actions include:
- modifying or deleting persistent data
- executing system commands
- changing access control policies
- triggering automation workflows
- sending external communications
- initiating irreversible operations
9.1 Risk-Based Action Gating
Agents should evaluate action risk dynamically and enforce escalation mechanisms for high-risk operations.
Risk considerations may include:
- Irreversibility
- Data sensitivity
- Required privilege level
- External system interaction
- Resource consumption
- Regulatory or compliance impact
High-risk actions should require at least one or more of the following: - Explicit user confirmation
- Step-up authentication
- Manual review
- Independent service validation
9.2 Explicit Intent Verification
Before executing high-risk actions consider adding explicit intent verification:
- display a clear, human-readable preview of the action.
- require explicit confirmation from the user.
- prevent silent execution triggered by prompt injection.
- if possible log confirmation artifacts for audit/verification purposes.
9.3 Separation of Decision and Execution
Avoid architectures where a single agent executes both:
- decides to perform a high-risk action
- executes that action directly
Instead consider next: - one agent generates a signed, short-lived authorization artifact.
- a separate execution component independently validates:
a. authorization scope
b. expiration
c. privilege level
This reduces the impact of prompt injection or goal hijacking.
Add a very critical section that is missing in this sheet as section 10 with the next content.
10. Secure Agent Testing & Adversarial Validation
AI agents should undergo structured security testing prior to deployment. Traditional application security testing techniques should be adapted for agentic systems.
Security validation should be integrated into CI/CD pipelines.
10.1 Prompt Injection & Override Testing
Testing should simulate:
- Instruction override attempts
- Tool misuse
- Privilege escalation
- Data exfiltration requests
- Recursive tool chaining
- Goal hijacking attempts
Agents should demonstrate resistance to these abuse patterns before production deployment.
10.2 Memory Poisoning Testing
Validate next:
- Malicious content is sanitized before being stored in memory.
- Memory is isolated per user or session.
- Expired memory entries are purged.
- Integrity checks detect tampering.
10.3 Denial of Wallet & Resource Exhaustion Testing
If possible simulate:
- Recursive reasoning loops
- Excessive tool invocation
- High token usage prompts
- Rapid repeated execution
Also try to validate enforcement of:
- Rate limits
- Budget controls (if applicable)
- Circuit breakers
- Automatic session termination
10.4 Abuse Case Simulation
Incorporate structured abuse-case testing for:
- Unauthorized data access attempts
- Cross-user data leakage
- Multi-agent privilege chaining
- Approval flow bypass attempts
Security testing should be repeatable and automated.
I'm also thinking we can enhance Section 6 “Monitoring & Observability” with this bullet - but this is optional:
- If possible and applicable log structured decision metadata for high-risk actions, including risk score (if applicable), model confidence, action classification, authorization outcome, execution result.
Add this to section 6:
Anomaly Detection for High-Impact Actions
Monitor for:
- Sudden increases in high-risk approvals
- Elevated privilege usage
- Repeated approval bypass attempts
- Abnormal tool invocation frequency
- Drift in risk scoring distributions
- Deviations from established baselines should trigger investigation or escalation.
And last if we agree on all these changes we can update Do's and Don'ts:
Add to Do section:
- Separate decision-making from execution for irreversible operations.
- Perform structured adversarial testing prior to production deployment.
- Enforce resource consumption limits.
- Log structured decision metadata for high-risk actions.
- Apply replay protection.
Add to Don't section:
- Allow agents to directly execute destructive or irreversible operations without independent validation.
- Rely solely on model output for authorization decisions.
- Skip adversarial testing before production deployment.
- Permit unlimited recursion or tool chaining.
I can split these changes into separate PRs (1 per section add/update) or have all in one PR - whatever works best for reviewers.