Conversation

@ValbuenaVC (Contributor) commented Nov 10, 2025

Description

Adds a cybersecurity-harms scenario, CyberScenario, to PyRIT, which tests a model's willingness to generate malware via single-turn or multi-turn (red-teaming) attack methods. Changes listed below:

  • Added CyberScenario and CyberStrategy classes
  • Added generic malware-oriented prompts to induce cyber harms as seed prompts
  • Added true/false scoring YAML for malware-oriented prompts
  • Fixed minor typo in grounded.yaml
  • Added unit tests for CyberScenario

This PR is meant to be a starting point for additional cybersecurity harm scaffolding as there are still many places CyberScenario can be expanded on.

Tests and Documentation

Unit tests focus on initialization, attack generation, execution, and scenario properties, similar to those for other scenarios.

@ValbuenaVC ValbuenaVC marked this pull request as ready for review November 12, 2025 00:23
@ValbuenaVC changed the title from "[DRAFT] FEAT: Cyber scenario" to "FEAT: Cyber scenario" Nov 12, 2025
@hannahwestra25 (Contributor) commented Nov 12, 2025

This looks good! I'm wondering if there are ways to incorporate XPIA attacks or converters (MaliciousQuestionGeneratorConverter; there might be more at first glance) to be a bit more creative rather than just updating prompts.

@ValbuenaVC (Contributor, Author) replied:

> This looks good! I'm wondering if there are ways to incorporate XPIA attacks or converters (MaliciousQuestionGeneratorConverter; there might be more at first glance) to be a bit more creative rather than just updating prompts.

There definitely are! CyberStrategy is used very sparsely here, which I don't like, but I haven't found a way to reconcile the nature of cybersecurity harms (which are often sequential and iterative, and don't rely on converters as much) with the tag-based system. But it's definitely something I want to drive in a second PR.


# Adversarial chat is used by the RedTeamingAttack (multiturn) variation of the scenario. Otherwise, it's not
# invoked.
self._adversarial_chat = adversarial_chat if adversarial_chat else self._get_default_adversarial_target()
A reviewer (Contributor) commented:
Personally I don't think the level of flexibility in passing an adversarial chat is necessary. I think we can pick our best environment variables and use those. Probably I'd use AZURE_OPENAI_GPT4O_UNSAFE_ENDPOINT.

But it would be good to sync with @hannahwestra25, since I do think it'd be nice to be consistent across AIRT scenarios.

@ValbuenaVC (Author) replied:

FoundryScenario allows flexible adversarial chats, which is why I added it here, but both scenarios default to AZURE_OPENAI_GPT4O_UNSAFE_ENDPOINT anyway. I think we should leave it as-is for consistency, but I'm open to removing it.
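As a rough illustration of the fallback pattern under discussion (this is not the actual PyRIT implementation; `ChatTarget` and `resolve_adversarial_chat` below are simplified stand-ins):

```python
import os


class ChatTarget:
    """Simplified stand-in for a chat target; the real PyRIT class takes more config."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint


def resolve_adversarial_chat(adversarial_chat=None):
    """Use the caller-supplied target if given; otherwise fall back to the
    AZURE_OPENAI_GPT4O_UNSAFE_ENDPOINT environment variable."""
    if adversarial_chat is not None:
        return adversarial_chat
    endpoint = os.environ.get("AZURE_OPENAI_GPT4O_UNSAFE_ENDPOINT")
    if endpoint is None:
        raise ValueError("No adversarial chat supplied and no default endpoint configured")
    return ChatTarget(endpoint)
```

Keeping the explicit parameter while defaulting to the shared environment variable gives both scenarios the same behavior out of the box without removing the override.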

1. Tries to detect malware using the instructions in the scoring config below.
2. Returns a true/false score for malware presence rather than something like a danger score in [0.0, 1.0].
"""
return SelfAskTrueFalseScorer(
A reviewer (Contributor) commented:

This is good! I'd recommend also making it into a composite scorer and adding a refusal scorer to check that it's not a refusal.

@ValbuenaVC (Author) replied:

Do you think this would be a good place to actually fix CyberScenario's scorer as a CyberScenarioScorer, which itself is a composite scorer that checks for multiple criteria? Using the malware-generation example, that could mean:

  1. SelfAskTrueFalseScorer for malware presence
  2. Likert-based scorer for severity of exploit
  3. Refusal scorer as a backstop (if the refusal scorer flags true, disregard the other two scorers)
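The backstop logic in the list above could be sketched roughly as follows (plain Python stand-ins, not real PyRIT scorer classes; the `CyberScore` type and field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class CyberScore:
    malware_present: bool  # true/false scorer result
    severity: int          # Likert-style severity, e.g. 1-5
    refused: bool          # refusal scorer result


def composite_cyber_score(malware_present: bool, severity: int, refused: bool) -> CyberScore:
    """If the refusal scorer fires, disregard the other two scorers:
    a refusal means no harmful content was actually produced."""
    if refused:
        return CyberScore(malware_present=False, severity=0, refused=True)
    return CyberScore(malware_present=malware_present, severity=severity, refused=False)
```

The refusal check runs first so a refusal can never be counted as both a refusal and a successful malware generation.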

@ValbuenaVC (Author) added:

I'm also not sure if it's in scope here, but I could go either way; what do you think? I'll work on it locally in the meantime.

# objective_target is guaranteed to be non-None by parent class validation
assert self._objective_target is not None

attack_strategy: Optional[AttackStrategy] = None
@ValbuenaVC (Author) commented:

This is a quirk of the existing Scenario contract that I don't know how we should resolve. I don't like my implementation here, so I would appreciate suggestions on how to reconcile it.

Scenario.initialize_async populates the Scenario._scenario_composites attribute using ScenarioStrategy.prepare_scenario_strategies, expecting that Scenario._get_atomic_attacks_async will use scenario_composites to create the List[AtomicAttack]. However, this breaks scenarios that do not want or cannot use composite strategies, since scenario_strategies is ephemeral in initialize_async and is only saved via scenario_composites.

I tried fixing this by just creating Scenario._original_strategies in initialize_async, but this is hacky, because the argument passed to initialize_async is of type Sequence[ScenarioStrategy | ScenarioCompositeStrategy]. So you have to deserialize the composite strategies (if there are any), which leads to duplication and is hard-coded anyway.

I felt the easiest fix was to compartmentalize this in CyberScenario by just accessing the first element of the composite with strategy.strategies[0], but this is a consequence of CyberStrategy being non-composable which I want to change soon. In my opinion that's a large enough change to the logic of CyberScenario that it warrants a second PR but I'm curious if there's a quicker fix I haven't found yet.
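For concreteness, the workaround described above amounts to something like the following (with minimal stand-in classes for ScenarioStrategy and ScenarioCompositeStrategy, not the real PyRIT types):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioStrategy:
    """Stand-in for a single tag-based strategy."""
    name: str


@dataclass
class ScenarioCompositeStrategy:
    """Stand-in for the composite wrapper that initialize_async produces."""
    strategies: list  # list of ScenarioStrategy


def unwrap_strategy(composite: ScenarioCompositeStrategy) -> ScenarioStrategy:
    """CyberScenario's workaround: because CyberStrategy is non-composable,
    each composite is expected to wrap exactly one strategy, so take the first."""
    return composite.strategies[0]
```

This keeps the hack local to CyberScenario, but it silently discards any additional strategies in the composite, which is why making CyberStrategy composable is the better long-term fix.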



