# Scenarios

A `Scenario` is a higher-level construct that groups multiple attack configurations together, allowing you to execute a comprehensive testing campaign with multiple attack methods run sequentially. Scenarios are meant to be configured and written to test specific workflows, so it is acceptable to hard-code some values.

## What is a Scenario?

A `Scenario` represents a comprehensive testing campaign composed of multiple atomic attack tests. It orchestrates the execution of multiple `AtomicAttack` instances sequentially and aggregates the results into a single `ScenarioResult`.

### Key Components

- **Scenario**: The top-level orchestrator that groups and executes multiple atomic attacks
- **AtomicAttack**: An atomic test unit combining an attack strategy, objectives, and execution parameters
- **ScenarioResult**: Contains the aggregated results from all atomic attacks and scenario metadata
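
As a rough illustration of how these pieces relate, here is a hedged sketch of assembling a single `AtomicAttack` (the objectives are placeholders, `PromptSendingAttack` is just one possible attack, and import paths may differ across PyRIT versions):

```python
# Import paths are indicative and may vary by PyRIT version.
from pyrit.executor.attack import PromptSendingAttack
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.scenarios import AtomicAttack

target = OpenAIChatTarget()  # placeholder objective target

# One atomic test unit: an attack strategy plus the objectives it will try.
atomic = AtomicAttack(
    atomic_attack_name="example_prompt_sending",
    attack=PromptSendingAttack(objective_target=target),
    objectives=["example objective 1", "example objective 2"],
)
```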

## Use Cases

Some examples of scenarios you might create:

- **VibeCheckScenario**: Randomly selects a few prompts from HarmBench to quickly assess model behavior
- **QuickViolence**: Checks how resilient a model is to violent objectives using multiple attack techniques
- **ComprehensiveFoundry**: Tests a target with all available attack converters and strategies
- **CustomCompliance**: Tests against specific compliance requirements with curated datasets and attacks

You can update and extend these scenarios as you refine what you are testing for.

## How It Works

Each `Scenario` contains a collection of `AtomicAttack` objects. When executed:

1. Each `AtomicAttack` is executed sequentially
2. Every `AtomicAttack` tests its configured attack against all specified objectives and datasets
3. Results are aggregated into a single `ScenarioResult` with all attack outcomes
4. Optional memory labels help track and categorize the scenario execution
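
In code, running a scenario end to end might look like the following sketch. This is hedged: `initialize_async` and `run_async` reflect the execution model described above, but their exact names and signatures are assumptions, and `MyScenario` is the custom class built later in this doc.

```python
import asyncio

from pyrit.prompt_target import OpenAIChatTarget


async def main():
    target = OpenAIChatTarget()  # assumed objective target
    scenario = MyScenario(objective_target=target)

    # Hypothetical entry points: run each AtomicAttack sequentially and
    # aggregate all attack outcomes into one ScenarioResult.
    await scenario.initialize_async()
    result = await scenario.run_async()
    print(result)


asyncio.run(main())
```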

## Creating Custom Scenarios

To create a custom scenario, extend the `Scenario` base class and implement the required abstract methods.

### Required Components

1. **Strategy Enum**: Create a `ScenarioStrategy` enum that defines the available strategies for your scenario.
   - Each enum member is defined as `(value, tags)`, where `value` is a string and `tags` is a set of strings
   - Include an `ALL` aggregate strategy that expands to all available strategies
   - Optionally implement `supports_composition()` and `validate_composition()` for strategy composition rules (see the sketch after this list)

2. **Scenario Class**: Extend `Scenario` and implement these abstract methods:
   - `get_strategy_class()`: Return your strategy enum class
   - `get_default_strategy()`: Return the default strategy (typically `YourStrategy.ALL`)
   - `_get_atomic_attacks_async()`: Build and return a list of `AtomicAttack` instances

3. **Constructor**: Use the `@apply_defaults` decorator and call `super().__init__()` with scenario metadata:
   - `name`: Descriptive name for your scenario
   - `version`: Integer version number
   - `objective_target`: The target system being tested
   - `objective_scorer_identifier`: Identifier for the scoring mechanism
   - `memory_labels`: Optional labels for tracking
   - `max_concurrency`: Number of concurrent operations (default: 10)
   - `max_retries`: Number of retry attempts on failure (default: 0)
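
Because the composition hooks are optional, the example in the next section omits them. A minimal sketch of what they might look like (hedged: the exact base-class signatures may differ in your PyRIT version, and the composition rule here is invented for illustration):

```python
from typing import Sequence

from pyrit.scenarios import ScenarioStrategy  # import path may vary


class MyStrategy(ScenarioStrategy):
    ALL = ("all", {"all"})
    StrategyA = ("strategy_a", {"tag1", "tag2"})
    StrategyB = ("strategy_b", {"tag1"})

    @classmethod
    def supports_composition(cls) -> bool:
        # Opt in to letting callers combine strategies into one composite attack.
        return True

    @classmethod
    def validate_composition(cls, strategies: Sequence["MyStrategy"]) -> None:
        # Illustrative rule: these two strategies cannot be combined.
        if MyStrategy.StrategyA in strategies and MyStrategy.StrategyB in strategies:
            raise ValueError("strategy_a and strategy_b cannot be composed together.")
```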

### Example Structure

```python
# Import paths are indicative and may differ across PyRIT versions.
from typing import Dict, List, Optional, Sequence, Type

from pyrit.common import apply_defaults
from pyrit.executor.attack import AttackScoringConfig, PromptSendingAttack
from pyrit.prompt_target import PromptTarget
from pyrit.scenarios import (
    AtomicAttack,
    Scenario,
    ScenarioCompositeStrategy,
    ScenarioStrategy,
)
from pyrit.score import TrueFalseScorer


class MyStrategy(ScenarioStrategy):
    ALL = ("all", {"all"})
    StrategyA = ("strategy_a", {"tag1", "tag2"})
    StrategyB = ("strategy_b", {"tag1"})


class MyScenario(Scenario):
    version: int = 1

    @classmethod
    def get_strategy_class(cls) -> Type[ScenarioStrategy]:
        return MyStrategy

    @classmethod
    def get_default_strategy(cls) -> ScenarioStrategy:
        return MyStrategy.ALL

    @apply_defaults
    def __init__(
        self,
        *,
        objective_target: PromptTarget,
        scenario_strategies: Sequence[MyStrategy | ScenarioCompositeStrategy] | None = None,
        objective_scorer: Optional[TrueFalseScorer] = None,
        memory_labels: Optional[Dict[str, str]] = None,
        max_concurrency: int = 10,
        max_retries: int = 0,
    ):
        # Expand the requested strategies (defaulting to ALL) into compositions.
        self._strategy_compositions = MyStrategy.prepare_scenario_strategies(
            scenario_strategies, default_aggregate=MyStrategy.ALL
        )

        # Initialize scoring and targets.
        self._objective_target = objective_target
        self._objective_scorer = objective_scorer or self._get_default_scorer()
        self._scorer_config = AttackScoringConfig(objective_scorer=self._objective_scorer)

        # Call the parent constructor with the scenario metadata.
        super().__init__(
            name="My Custom Scenario",
            version=self.version,
            objective_target=objective_target,
            objective_scorer_identifier=self._objective_scorer.get_identifier(),
            memory_labels=memory_labels,
            max_concurrency=max_concurrency,
            max_retries=max_retries,
        )

    async def _get_atomic_attacks_async(self) -> List[AtomicAttack]:
        atomic_attacks = []
        for strategy in self._strategy_compositions:
            # Create an attack instance based on the strategy.
            attack = PromptSendingAttack(
                objective_target=self._objective_target,
                attack_scoring_config=self._scorer_config,
            )
            atomic_attacks.append(
                AtomicAttack(
                    atomic_attack_name=strategy.name,
                    attack=attack,
                    objectives=["objective1", "objective2"],
                    memory_labels=self._memory_labels,
                )
            )
        return atomic_attacks
```

### Existing Scenarios

- **EncodingScenario**: Tests encoding attacks (Base64, ROT13, etc.) with seed prompts and decoding templates
- **FoundryScenario**: Comprehensive converter and multi-turn attack testing with difficulty-based strategies
- **ContentHarmScenario**: Tests harm categories (hate, violence, etc.) by loading datasets from `CentralMemory`

See [`FoundryScenario`](../../../pyrit/scenarios/scenarios/foundry_scenario.py),
[`EncodingScenario`](../../../pyrit/scenarios/scenarios/encoding_scenario.py), and
[`ContentHarmScenario`](../../../pyrit/scenarios/scenarios/e2e/content_harm_scenario.py) for complete examples.

## Resiliency

Scenarios can run for a long time, and over a long run things can go wrong: network issues, rate limits, or other transient failures can interrupt execution. PyRIT provides built-in resiliency features to handle these situations gracefully.

### Automatic Resume

If you re-run a `Scenario`, it automatically resumes where it left off. The framework tracks completed attacks and objectives in memory, so you won't lose progress if something interrupts your scenario execution. This means you can safely stop and restart scenarios without duplicating work.

### Retry Mechanism

You can use the `max_retries` parameter to handle transient failures. If an unknown exception occurs during execution, PyRIT automatically retries the failed operation (resuming where it left off) up to the specified number of times. This helps your scenario complete successfully even in the face of temporary issues.
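
For example, here is a sketch of configuring retries and concurrency when constructing the custom scenario from earlier (the parameter names come from this doc; the target construction is illustrative):

```python
from pyrit.prompt_target import OpenAIChatTarget

target = OpenAIChatTarget()  # placeholder objective target

scenario = MyScenario(
    objective_target=target,
    max_concurrency=5,  # throttle concurrent operations
    max_retries=3,      # retry up to 3 times on unknown exceptions
)
```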

### Dynamic Configuration

During a long-running scenario, you may want to adjust parameters like `max_concurrency` to manage resource usage, or switch your scorer to use a different target. PyRIT's resiliency features make it safe to stop, reconfigure, and continue scenarios as needed.

For more information, see [resiliency](../setup/2_resiliency.ipynb).