
Conversation

@hannahwestra25 (Contributor) commented Nov 6, 2025

Description

Add rapid response harm scenario which tests several different strategies for each harm category. The idea is to have a quick, comprehensive scenario to run before drilling down into more specific strategies.

Tests and Documentation

Added rapid response notebook plus instructions for dataset naming.
Added unit tests

hannahwestra25 changed the title from "[DRAFT] rapid response harm scenario" to "Rapid response harm scenario" on Nov 7, 2025

# Hate speech datasets

hate_stories = await create_seed_dataset(
@rlundeen2 (Contributor) commented Nov 8, 2025:

I think we should manage a few of these, even if the list is incomplete. So instead of having strings in the notebooks, I'd put these in datasets/seed_prompts/ai_rt and maybe one file per category.

Eventually it might be nice to have a single function call that can load all our YAML seed prompts into the database, so folks can use those as examples.
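Not part of this PR, but a stdlib-only sketch of what the discovery half of such a helper could look like (the folder layout and function name are hypothetical, not existing PyRIT API):

```python
from pathlib import Path


def discover_seed_prompt_files(root: str) -> dict[str, list[Path]]:
    """Group YAML seed-prompt files under `root` by their category subfolder.

    Hypothetical layout, one folder per harm category:
        datasets/seed_prompts/ai_rt/hate/*.yaml
        datasets/seed_prompts/ai_rt/violence/*.yaml
    """
    grouped: dict[str, list[Path]] = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        # The parent folder name serves as the harm category key.
        grouped.setdefault(path.parent.name, []).append(path)
    return grouped
```

Each category's files could then be parsed and inserted into memory in one pass.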

Contributor:

I would go even further and say we should provide a truly end-to-end solution here that gives some results even if the customer doesn't bring their own datasets. Of course, the conundrum is that we may not be able to share the exact datasets we're using, but maybe it's something we should actually strive for.

Btw I'll keep fighting against the ai_rt naming for external assets 😆

Contributor:

I agree, but I think it's safe to require an upload, which could even be done as part of initialization. I think the dataset question can be tackled independently of this PR. Although for this one it'd be nice to include some sample datasets that we can later add to the db easily

E.g., the workflow for an external user:

  1. memory.add_dataset(redteam) # not part of this PR
  2. Run the scenario

"ScenarioStrategy",
"ScenarioIdentifier",
"ScenarioResult",
"RapidResponseHarmScenario",
Contributor:

I'm thinking we might want a whole import line here? For example

from pyrit.scenarios.ai_rt import RapidResponseHarmScenario

But we may need some __init__.py shenanigans. IDK, what do you think?

hannahwestra25 (Author):

You mean importing from the ai_rt folder rather than the file itself?

Each harm category has a few different strategies to test different aspects of the harm type.
"""

ALL = ("all", {"all"})
@rlundeen2 (Contributor) commented Nov 8, 2025:

One idea is to only have the meta-categories. I think it may make the most sense to just have hate, fairness, violence, ..., leakage rather than each individual scenario_strategy.

Contributor:

I think the composition makes the code quite a bit more complicated, and I would guess most users will either just want to use "all" or a subset of the categories

@rlundeen2 (Contributor) commented Nov 8, 2025:

In other words, I think it should look like the following (and that's it)

class RapidResponseHarmStrategy(ScenarioStrategy):
    """
    RapidResponseHarmStrategy defines a set of strategies for testing model behavior
    in several different harm categories.

    Each harm category has a few different strategies to test different aspects of the harm type.
    """

    ALL = ("all", {"all"})
    HATE = ("hate", set[str]())
    FAIRNESS = ("fairness", set[str]())
    VIOLENCE = ("violence", set[str]())
    SEXUAL = ("sexual", set[str]())
    HARASSMENT = ("harassment", set[str]())
    MISINFORMATION = ("misinformation", set[str]())
    LEAKAGE = ("leakage", set[str]())

Alternatively, if you do want long- and short-running versions (which I also think is legit!), I might split it up like this, where the complex attacks contain the long-running methods. But my gut is that it might just be simpler to have a completely separate scenario class for those:

    ALL = ("all", {"all"})
    HATE_QUICK = ("hate_quick", {"quick", "hate"})
    HATE_EXTENDED = ("hate_extended", {"complex", "hate"})
    FAIRNESS_QUICK = ("fairness_quick", {"quick", "fairness"})
    ...

Either way, I'd keep specific techniques, tests, and datasets out of the strategy definitions.
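To make the tag mechanics concrete, here's a self-contained illustration using a plain Python Enum (this is not the actual ScenarioStrategy base class; all names are hypothetical):

```python
from enum import Enum


class HarmStrategy(Enum):
    """Each member carries a label and a frozenset of tags, mirroring the (value, tags) tuples above."""

    ALL = ("all", frozenset({"all"}))
    HATE = ("hate", frozenset())
    FAIRNESS = ("fairness", frozenset())
    VIOLENCE = ("violence", frozenset())

    def __init__(self, label: str, tags: frozenset):
        # Enum unpacks the tuple value into these attributes per member.
        self.label = label
        self.tags = tags


def expand(requested: list["HarmStrategy"]) -> list["HarmStrategy"]:
    """Expand the 'all' meta-strategy into every concrete harm category."""
    concrete = [s for s in HarmStrategy if s is not HarmStrategy.ALL]
    expanded: list[HarmStrategy] = []
    for s in requested:
        expanded.extend(concrete if s is HarmStrategy.ALL else [s])
    return expanded
```

With only meta-categories, "all" expands to the full category list and everything else passes through unchanged.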

# Extract RapidResponseHarmStrategy enums from the composite
strategy_list = [s for s in composite_strategy.strategies if isinstance(s, RapidResponseHarmStrategy)]

# Determine the attack type based on the strategy tags
@rlundeen2 (Contributor) commented Nov 8, 2025:

I think we should make this decision in advance if we can (and if it's what operators want).

Say we get the strategy "Hate". Maybe we could do something like pick a set of strategies for hate that we want. Something like PromptSending for baseline, MultiTurn, and RolePlaying. But I could also see specific attacks/converters being created for different categories, so it might make sense to split it up this way too.

if strategy.value == "hate":
    seed_groups = memory.get_seed_groups(dataset_name="ai_rt_rapid_response_1", harm_category="hate")
elif strategy.value == "violence":
    ....

# Now we have the seed groups. Do we do the same attacks with every category, or are they different?
# My guess might be they're the same?
# And can we decide?
# My guess would be they're the same strategies but different objectives.

attack1 = PromptSendingAttack(
    objective_target=self._objective_target,
    attack_converter_config=attack_converter_config,
    attack_scoring_config=self._scorer_config,
)
attack2 = ....

# Then append all of these atomic attacks in the same spot. E.g. you can have more than
# one "hate" attack and they will be grouped together.

atomic_attacks.append(
    AtomicAttack(
        atomic_attack_name="hate", attack=attack1, objectives=hate_objectives, seed_groups=hate_seed_groups
    )
)

atomic_attacks.append(
    AtomicAttack(
        atomic_attack_name="hate", attack=attack2, objectives=hate_objectives, seed_groups=hate_seed_groups
    )
)

attack_type: type[AttackStrategy] = PromptSendingAttack
if attack_tag:
    if attack_tag[0] == RapidResponseHarmStrategy.Crescendo:
        attack_type = CrescendoAttack
Contributor:

One attack you might be thinking about is Crescendo. But because that takes so much longer to run, we might consider a different rapid response scenario for it. And/or for this one, we could pre-compute successes so it runs really fast (e.g. similar to our second cookbook).

hannahwestra25 (Author):

Yes, after talking with Frederic and considering the purpose of this scenario, I think it makes sense to exclude Crescendo. It's used in only one of the ~20 attacks in the notebook Frederic created that inspired this scenario. IMO Crescendo could be considered a more in-depth analysis of a harm, while this scenario is supposed to be a higher-level, initial analysis.

@rlundeen2 (Contributor) commented Nov 8, 2025

Overall this is good! It'll be really nice to have solid examples here :)

My biggest feedback is that I think we should define exactly what we want out of this scenario. Here is what I think it is. "Can I get a vibe of this objective_target in a couple hours based on how it does on these harm categories".

And if we keep that strategy, we want to do the best we can to answer that question, and the strategies themselves should be baked in as much as possible. Along these lines, I'd recommend:

  1. Simplify the strategies. I suspect most users just want to run "all" to get a vibe check, or to run specific harm categories. And if there is a strategy they want but it takes a long time (like Crescendo), maybe we should split that off into a separate longer-running scenario class.
  2. Choose the attacks to do with those strategies explicitly (which converters and attacks to use). E.g. we can get the objectives from memory, and then this scenario can decide how we send those. I wouldn't make this configurable, because it adds another dimension to things.

romanlutz changed the title from "Rapid response harm scenario" to "FEAT Rapid response harm scenario" on Nov 8, 2025
@fdubut (Contributor) left a comment:

Adding a few comments, mainly on structure and naming. I'll try to run my notebook shortly "as a scenario" to get a better sense of how this all works, and will share more feedback then.



model_name=""
)

# Define the helper adversarial target
Contributor:

Given the nature of the scenario (returning aggregate results on a variety of test cases), I'm wondering if we should give customers the option to skip all test cases that require an adversarial target if they don't have one available. I think a lot of attacks that would succeed with a true adversarial target will fail with a regular model, skewing the final results.

# %%
# Load the datasets into memory

violence_civic_data = await create_seed_dataset(
Contributor:

In the original notebook, the prompts are sequential (passed using multi-prompt attack). I haven't looked yet at the actual scenario definition but wanted to point that out.

hannahwestra25 (Author):

Mentioned offline, but there's an issue with Multi Prompt that basically makes it error out, so for now I'm using the red teaming attack. For this PR I'm going to keep RedTeaming, and once we work through that issue I can update this scenario. (I like the idea of keeping this simple, and multi-prompt is a simpler multi-turn attack, so I'd prefer to use it.)

*,
objective_target: PromptTarget,
scenario_strategies: Sequence[RapidResponseHarmStrategy | ScenarioCompositeStrategy] | None = None,
adversarial_chat: Optional[PromptChatTarget] = None,
Contributor:

Similar to what I mentioned in my comment on the notebook, I'm wondering if we should exclude from the scenario the attacks that require an adversarial chat when none is passed.

hannahwestra25 (Author):

I think we can set a default; this is what the Foundry scenario does.

Comment on lines 111 to 113
scenario_strategies (Sequence[RapidResponseHarmStrategy | ScenarioCompositeStrategy] | None):
The harm strategies or composite strategies to include in this scenario. If None,
defaults to RapidResponseHarmStrategy.ALL.
Contributor:

Will a user be able to compose a multi-turn scenario strategy like Crescendo, or are we just sticking with single-turn/multi-prompt sending attacks?

hannahwestra25 (Author):

Currently, the default behavior is to run PromptSending (the baseline), RolePlaying, RedTeaming, and ManyShot. I'm on the fence about having basic & extended versions (basic would maybe just run PromptSending and RedTeaming, vs. extended, which would run them all). My reservation is that I don't know how much value the scenario has when the basic version is run because, as the name suggests, it's pretty basic. Wdyt?
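For what it's worth, the basic/extended split could stay tiny; a hypothetical sketch of the selection logic (attack names here are just labels, not the real attack classes):

```python
# Tag each attack with the cheapest tier that includes it. Labels are illustrative.
ATTACK_TIERS: dict[str, str] = {
    "prompt_sending": "basic",
    "red_teaming": "basic",
    "role_playing": "extended",
    "many_shot": "extended",
}


def select_attacks(tier: str = "extended") -> list[str]:
    """Return attacks to run: 'basic' runs the quick subset, 'extended' runs everything."""
    if tier == "extended":
        return list(ATTACK_TIERS)
    return [name for name, t in ATTACK_TIERS.items() if t == tier]
```

That keeps one scenario class with a single knob, rather than two enums or two classes.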


5 participants