update: unify on `attempt.notes["triggers"]` by leondz · Pull Request #1147 · NVIDIA/garak

leondz · 2025-04-02T08:32:00Z

Removes use of attempt.notes["trigger"], unifying on attempt.notes["triggers"] as a list of str.

Includes tests for some affected plugins and updates their code to comply with the tests.

Plugins known to be affected:

continuation
promptinject

API changes:

hitlog param "trigger" (str) is now "triggers" (list)
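For callers that still carry the old singular key, the migration could look roughly like the sketch below. This is an illustration only, not code from the PR: `normalize_trigger_notes` is a hypothetical helper, and `notes` is assumed to be a plain dict shaped like `attempt.notes`.

```python
from typing import Any, Dict


def normalize_trigger_notes(notes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `notes` that only uses the plural "triggers" key.

    Hypothetical helper for illustration; it is not part of garak.
    """
    migrated = dict(notes)
    legacy = migrated.pop("trigger", None)  # old convention: a single str
    if legacy is None and "triggers" not in migrated:
        return migrated  # nothing trigger-related to migrate
    triggers = list(migrated.get("triggers") or [])
    if legacy is not None and legacy not in triggers:
        triggers.append(legacy)
    migrated["triggers"] = triggers  # new convention: always a list of str
    return migrated
```

Under these assumptions, `normalize_trigger_notes({"trigger": "abc"})` yields `{"triggers": ["abc"]}`, and notes without either key pass through unchanged.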

Verification

Run general tests
Run the new tests
Run a scan with continuation and check results are reasonable
Run a scan with promptinject and check results are reasonable


jmartin-tech

Looks reasonable, one possible enhancement for a hardcoded expectation noted.

jmartin-tech · 2025-04-02T13:26:57Z

garak/detectors/promptinject.py

+        triggers = attempt.notes.get("triggers", None)
+        if triggers == None:
+            return results
+        trigger = triggers[0]


It may be useful to iterate all "triggers" even if only one is ever expected to exist. The hardcoded [0] will likely be a source for a future issue report.

Agreed, will address.
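One way the suggestion to iterate over every trigger might look is sketched below. It is a standalone illustration, not the detector from the PR: `detect_any_trigger` is a hypothetical function, outputs are assumed to be plain strings (or None), and a missing or empty trigger list is assumed to produce no results.

```python
from typing import Any, Dict, List, Optional


def detect_any_trigger(
    outputs: List[Optional[str]],
    notes: Dict[str, Any],
    case_sensitive: bool = False,
) -> List[float]:
    """Score each output 1.0 if it contains any trigger, else 0.0.

    Hypothetical sketch; not the garak detector API.
    """
    triggers = notes.get("triggers")
    if not triggers:  # key missing, None, or empty list
        return []
    results: List[float] = []
    for output in outputs:
        if output is None:  # assumption: skip missing outputs
            continue
        haystack = output if case_sensitive else output.lower()
        # Check every trigger rather than hardcoding triggers[0]
        hit = any(
            (t if case_sensitive else t.lower()) in haystack for t in triggers
        )
        results.append(1.0 if hit else 0.0)
    return results
```

Using `any()` keeps the single-trigger case working exactly as before while avoiding a silent miss if a probe ever records more than one trigger.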

jmartin-tech · 2025-04-02T13:28:12Z

garak/probes/continuation.py

                candidate_prompt = prefix + term_variant
                if candidate_prompt not in self.prompts:
-                    self.prompts.append(candidate_prompt.strip())
+                    self.prompts.append(candidate_prompt)


Do we know why the strip() call was here originally?

Yes. When this was coded, a few popular continuation models (since displaced by chat models) would exhibit odd behaviour when prompts ended with a space. This isn't the case any more. I think we now want to see these odd behaviours if they can be summoned by something as simple as ending a prompt with a space.
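The effect of dropping strip() can be seen in a small standalone sketch. It mirrors only the lines visible in the diff; `build_prompts` is a hypothetical function, the term and prefix are harmless placeholders, and only the two variants shown in the hunk (blank and first letter) are generated.

```python
from typing import Iterable, List, Tuple


def build_prompts(pairs: Iterable[Tuple[str, str]], strip: bool = False) -> List[str]:
    """Build continuation prompts from (term, prefix) pairs.

    Hypothetical sketch; strip=True imitates the old behaviour.
    """
    prompts: List[str] = []
    for term, prefix in pairs:
        # blank and first-letter variants, sorted for a stable order
        for variant in sorted({"", term[0]}):
            candidate = prefix + variant
            if candidate not in prompts:
                prompts.append(candidate.strip() if strip else candidate)
    return prompts


pairs = [("Xyzzy", "The magic word is ")]
with_strip = build_prompts(pairs, strip=True)
without_strip = build_prompts(pairs)
```

With the placeholder data above, the blank variant ends in a space only when strip() is not applied, which is the case the new behaviour deliberately exposes to the model.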

jmartin-tech · 2025-04-02T13:53:22Z

garak/probes/continuation.py

        for term, prefix in self._slur_prefixes:
-            lower_term = term.lower()
-            term_variants = set(["", lower_term[0]])  # blank, first letter
+            term_variants = set(["", term[0]])  # blank, first letter


Could the change of case here affect the response? The added test suggests not, as this should be case-insensitive.

Looking closer at the codebase, existing runtime calls to detect() never pass case_sensitive, so all detection is expected to run with case_sensitive=False.

It might impact response. I think with this change the responsibility for getting a case that elicits useful responses is shifted to the slur prefixes data, which seems a fine place for it to reside.

Interesting re: no use of case_sensitive! Maybe this isn't the right place to expose that param.
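The two sides of this exchange, prompt case now following the source data while matching stays case-insensitive, can be illustrated with a short sketch. Both functions are hypothetical stand-ins written for this example, not garak APIs, and the term is a harmless placeholder.

```python
from typing import Set


def first_letter_variants(term: str, lowercase: bool = False) -> Set[str]:
    """Return the blank and first-letter variants of a term.

    lowercase=True imitates the old lower_term[0] behaviour.
    """
    source = term.lower() if lowercase else term
    return {"", source[0]}


def contains_trigger(output: str, trigger: str, case_sensitive: bool = False) -> bool:
    """Substring match that ignores case unless case_sensitive is set."""
    if case_sensitive:
        return trigger in output
    return trigger.lower() in output.lower()


old_variants = first_letter_variants("Xyzzy", lowercase=True)
new_variants = first_letter_variants("Xyzzy")
```

Here the prompt variant changes from "x" to "X", yet a case-insensitive match on the full term still succeeds for either casing of the model output, which is consistent with the added test not being affected.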

leondz added 3 commits April 2, 2025 09:06

unify on attempt.notes[triggers] as List[str], drop attempt.notes[trigger]:str

e240202

specify continuation detector behaviour via tests

e13221b

test continuation probe, clean up prompt formulation

6cc190f

leondz added the architecture (Architectural upgrades) label Apr 2, 2025

leondz requested a review from jmartin-tech April 2, 2025 08:32

jmartin-tech approved these changes Apr 2, 2025

handle empty triggers list with no results

7459bb0

leondz merged commit 744d7b0 into NVIDIA:main Apr 2, 2025
9 checks passed

leondz deleted the update/trigger_triggers branch April 2, 2025 19:09

github-actions bot locked and limited conversation to collaborators Apr 2, 2025

leondz restored the update/trigger_triggers branch April 2, 2025 19:21

leondz merged 4 commits into NVIDIA:main from leondz:update/trigger_triggers
