refactor `LatentInjection` by leondz · Pull Request #1152 · NVIDIA/garak

leondz · 2025-04-07T09:32:17Z

The LatentInjection module had an overly complex inheritance graph.

This is updated to use mixins, as follows:

LatentInjectionMixin - retained; used in all probes; adds tags, detector, methods for assembling prompts & triggers
NonFullMixin - used to map Full probes to standard (i.e. lightweight) versions
TranslationMixin - templates and assembly for translation-based injections
FactSnippetMixin - docs, instructions, injections, and assembly for fact snippet-based instructions

LatentWhoisSnippet took a heavy refactoring; previously inherited on LatentInjectionFactSnippetEiffelFull, now inherits LatentWhois and FactSnippetMixin

Verification

python -m pytest tests/probes/test_probes_latentinjection.py (this should pass solo without config fixtures being loaded)
python -m pytest tests/probes/test_probes.py
Check that prompts from each probe correspond to the intended attack
Check that probes have the correct active True/False depending on their size

…nality to mixins

mrowebot

LGTM 👍

garak/probes/latentinjection.py

jmartin-tech

Testing shows this changed the prompt count for LatentWhoisSnippet, looking closer into why.

From 0.10.3.1

latentinjection.LatentWhois                                                 base.TriggerListDetector: PASS  ok on   28/  28
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on   32/  32

On main @ 55da36b

latentinjection.LatentWhois                                                 base.TriggerListDetector: PASS  ok on   28/  28
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on   32/  32

This PR branch:

latentinjection.LatentWhois                                                 base.TriggerListDetector: PASS  ok on   28/  28
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on  256/ 256

I suspect this is not an expected change.

leondz · 2025-04-17T18:29:39Z

Thanks, good catch. I believe the randomisation logic changed for this probe to be more in line with common practice in garak, rather than predicated on generations, so I think some change is likely if that happens to be the cause.

garak/probes/latentinjection.py

Co-authored-by: Erick Galinkin <erick.galinkin@gmail.com> Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>

…ing on non-string input)

jmartin-tech

Review of LatentWhoisSnippet shows the permutations selected are consistent but not identical to previous prompts. The application of soft_prompt_cap as a limiter in the Full version is part of the reason in testing no identical prompts were found.

This can land as is, however I made some minor comments related to this unexpected usage of soft_prompt_cap in Full probes and that some for the application of FactSnippitMixin seems spurious which make continue to make this module difficult to maintain.

garak/probes/latentinjection.py

jmartin-tech · 2025-04-25T22:00:56Z

garak/probes/latentinjection.py


-
-class LatentWhoisSnippet(LatentInjectionFactSnippetEiffelFull):
+class LatentWhoisSnippetFull(FactSnippetMixin, LatentWhois):


Optional, this does not look be needed, nothing is being inherited from FactSnippetMixin.

FactSnippetMixin.injection_instructions is still accessible as written in line 602 if this is removed.

Suggested change

class LatentWhoisSnippetFull(FactSnippetMixin, LatentWhois):

class LatentWhoisSnippetFull(LatentWhois):

Snippet building needs refactoring anyway after the current fix in #1181 - will update that PR and then it should be processed directly after this one.

on second thoughts:

will leave this inheritance to express intent

bugfix: reduce latent optimisation permutation explosion #1181 creeps if it addresses this

will pick up in later refactor-only PR that needn't block recalibration

leondz · 2025-04-26T04:12:50Z

Review of LatentWhoisSnippet shows the permutations selected are consistent but not identical to previous prompts

Sounds like expected behaviour - both are intentionally sampling rather than using whole population. Good it's consistent. Tests based on validation learnings welcome

leondz added 2 commits April 7, 2025 11:17

refactor LatentInjection; move Translation, Snippet, Non-full functio…

071ad4c

…nality to mixins

move docstrings into non-mixin classes

8e3e5af

leondz added architecture Architectural upgrades probes Content & activity of LLM probes labels Apr 7, 2025

leondz assigned jmartin-tech and unassigned jmartin-tech Apr 7, 2025

leondz requested a review from jmartin-tech April 7, 2025 09:43

merge main

6f0589c

mrowebot reviewed Apr 14, 2025

View reviewed changes

garak/probes/latentinjection.py Show resolved Hide resolved

garak/probes/latentinjection.py Show resolved Hide resolved

This was referenced Apr 15, 2025

refactor probe tier as enum with value in plugin cache #1159

Merged

bug: latentinjection.FactSnippetMixin can give dupe environments #1161

Closed

jmartin-tech mentioned this pull request Apr 15, 2025

Removed detector prefix from eval records #1157

Merged

2 tasks

jmartin-tech requested changes Apr 17, 2025

View reviewed changes

erickgalinkin reviewed Apr 18, 2025

View reviewed changes

garak/probes/latentinjection.py Outdated Show resolved Hide resolved

leondz and others added 2 commits April 18, 2025 16:57

Update garak/probes/latentinjection.py

07a2a6f

Co-authored-by: Erick Galinkin <erick.galinkin@gmail.com> Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>

merge main

5b8a3c3

leondz mentioned this pull request Apr 23, 2025

script: qualitative review output #1144

Merged

1 task

leondz added 3 commits April 23, 2025 16:06

prune leftover bcp47 tags

91acf85

give clear err msg if probe lang not set (instead of langcodes explod…

c02540b

…ing on non-string input)

prune spurious declarations

2e119fe

leondz force-pushed the update/latentinjection_refactor branch from b80749a to 2e119fe Compare April 24, 2025 14:00

Merge branch 'main' into update/latentinjection_refactor

99a4242

leondz mentioned this pull request Apr 25, 2025

bugfix: reduce latent optimisation permutation explosion #1181

Merged

2 tasks

jmartin-tech approved these changes Apr 25, 2025

View reviewed changes

leondz added 2 commits April 26, 2025 07:57

probe load must fail if target_lang_name_en is not set

2fd7857

also log if target_lang_name_en not set

028e3c9

leondz merged commit 37f684d into NVIDIA:main Apr 26, 2025
9 checks passed

github-actions bot locked and limited conversation to collaborators Apr 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor `LatentInjection`#1152

refactor `LatentInjection`#1152
leondz merged 11 commits intoNVIDIA:mainfrom
leondz:update/latentinjection_refactor

leondz commented Apr 7, 2025 •

edited

Loading

Uh oh!

mrowebot left a comment

Uh oh!

Uh oh!

Uh oh!

jmartin-tech left a comment

Uh oh!

leondz commented Apr 17, 2025

Uh oh!

Uh oh!

jmartin-tech left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

jmartin-tech Apr 25, 2025

Uh oh!

leondz Apr 26, 2025

Uh oh!

leondz Apr 26, 2025

Uh oh!

leondz commented Apr 26, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants



		class LatentWhoisSnippet(LatentInjectionFactSnippetEiffelFull):
		class LatentWhoisSnippetFull(FactSnippetMixin, LatentWhois):

	class LatentWhoisSnippetFull(FactSnippetMixin, LatentWhois):
	class LatentWhoisSnippetFull(LatentWhois):

Conversation

leondz commented Apr 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Verification

Uh oh!

mrowebot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jmartin-tech left a comment

Choose a reason for hiding this comment

Uh oh!

leondz commented Apr 17, 2025

Uh oh!

Uh oh!

jmartin-tech left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jmartin-tech Apr 25, 2025

Choose a reason for hiding this comment

Uh oh!

leondz Apr 26, 2025

Choose a reason for hiding this comment

Uh oh!

leondz Apr 26, 2025

Choose a reason for hiding this comment

Uh oh!

leondz commented Apr 26, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

leondz commented Apr 7, 2025 •

edited

Loading

jmartin-tech left a comment •

edited

Loading