reporting: tier biased security aggregate #1329
Merged
leondz merged 31 commits into NVIDIA:main, Jan 28, 2026
Conversation
aishwaryap (Collaborator) approved these changes, Jan 16, 2026, leaving a comment:
Didn't try running it, but logic-wise this looks good to me!
Co-authored-by: Jeffrey Martin <jmartin@Op3n4M3.dev> Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>
erickgalinkin (Collaborator) approved these changes, Jan 27, 2026, leaving a comment:
Some minor phrasing nits. Otherwise looks good to me.
Reviewed documentation text:

TBSA is a method for getting a rough single number estimating the risk posed by a target, based on a garak run.

While we've done our best to represent security knowledge in this score, it's no substitute for examining the run results.
Relying on a TBSA score instead of the run report is a security risk - without exceptions. **Do not do this, do not let other people do this**.

Suggested change:

  You can also join our `Discord <https://discord.gg/uVch4puUCs>`_
- and follow us on `Twitter <https://twitter.com/garak_llm>`_!
+ and follow us on `LinkedIn <https://www.linkedin.com/company/garakllm/>`_ & `X <https://www.twitter.com/garak_llm>`_!
Collaborator: We're still on X, the everything app? Should we be on Bluesky? :P
Author: We're open to receiving followers. Open up a bluesky if you want to manage it!
Co-authored-by: Erick Galinkin <erick.galinkin@gmail.com> Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>

Adds single-scalar aggregation of garak results, biased by tier.
Problem
Since garak results are spread across individual probes, it is hard to evaluate whether a given model is good or bad; reaching any conclusion requires the researcher to have a deep understanding of each probe.
After finetuning a model, there is no easy way to compare Model A vs. Model B.
Ask: the ask is NOT to turn garak into a benchmark score, but rather to give non-security experts an easy way to evaluate results.
Desiderata
Proposal
Let's use a "tier-based score aggregate".
How it works
garak.analyze
NB: No garak score is stable over time. This is intended behaviour. For comparability, appropriate parts of the config need to match; see #1193.
Notes
The score can only be compared among identical versions of garak with identical configs. We need to give a key identifying the basis for comparison (e.g. config & version). This feature is tracked in garak issue #1193.
Because the score is a proportion of probe detections passed/failed, we can use a sensible scale.
Percentage is still a bad idea - 100% sounds like a target is fully secure, but garak can never show this. There is no intent to support this design, for safety reasons.
Scores will be affected by the mixture of detectors and probes within Tier 1; later iterations, after technique/intent lands, will let us group this by strategies & impacts, allowing more meaningful & stable results.
Items not in the bag may not be included, meaning an integration overhead for probe updates. One proposal: just use the absolute score, since we're doing a min() for aggregation anyway.
Besides config & version, any given TBSA score is predicated on:
How this fills the goals
What the ideal solution looks like
We should really be balancing score components grouped by their characteristics, rather than by how many probe/detector pairs there are. Ideally we'd like to be able to group techniques and group impacts. We have a couple of typologies for these, but aren't well-informed enough to commit to one. Uniform weighting seems the only reasonable choice in the absence of anything else.
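As a rough illustration of the aggregation described above (uniform weighting within a tier, min() across tiers so the weakest tier dominates), here is a minimal sketch. The function name, tier mapping, and default tier are illustrative assumptions, not the actual garak.analyze.tbsa implementation.

```python
# Hypothetical sketch of a tier-biased score aggregate (TBSA).
# Names, shapes, and the default tier are assumptions for illustration,
# not the actual garak.analyze.tbsa implementation.

def tbsa(pass_rates: dict[str, float], probe_tiers: dict[str, int]) -> float:
    """Aggregate per-probe pass rates into one scalar, biased by tier.

    pass_rates:  probe name -> proportion of detections passed (0.0-1.0)
    probe_tiers: probe name -> tier (1 = most critical; assumed default 1)
    """
    # Group pass rates by tier.
    by_tier: dict[int, list[float]] = {}
    for probe, rate in pass_rates.items():
        by_tier.setdefault(probe_tiers.get(probe, 1), []).append(rate)

    # Uniform weighting within each tier (per the note above), then a
    # min() across tiers so the weakest tier dominates the aggregate.
    tier_means = [sum(rates) / len(rates) for rates in by_tier.values()]
    return min(tier_means)

score = tbsa(
    {"probe.A": 0.5, "probe.B": 0.75, "probe.C": 0.9},
    {"probe.A": 1, "probe.B": 1, "probe.C": 2},
)
# tier 1 mean = 0.625, tier 2 mean = 0.9 -> min = 0.625
```

Note how this sketch realises the safety note above: because it is a min() over tier means of pass proportions, one weak tier caps the whole score, and a perfect 1.0 would require every probe to pass everywhere.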
Implementation
Build this as a new analysis tool.
consumes:
relies on:
outputs:
usage:
python -m garak.analyze.tbsa -r <report.jsonl filepath>
garak.analyze.tbsa.digest_to_tbsa()
open questions/extensions: