
reporting: tier biased security aggregate#1329

Merged
leondz merged 31 commits into NVIDIA:main from leondz:reporting/tbsa
Jan 28, 2026

Conversation


@leondz leondz commented Aug 11, 2025

adds single-scalar aggregating from garak results, biased by tier

Problem

Since garak results are spread across each probe, it is hard to evaluate whether a given model is good or bad. Reaching a conclusion requires the researcher to have a deep understanding of each probe.
After finetuning a model, there is no easy way to compare Model A vs. Model B.

Ask - The ask is NOT to turn garak into a benchmark score, but rather to provide an easy way for non-security experts to evaluate results

Desiderata

  • A single result for a garak run, for model teams
  • Quantitative, scalar score (though not necessarily in a metric space)
  • Some stability
  • Simple to understand for model and system developers
  • Simple to understand for security folk
  • Weights failures in important probes higher
  • Gives lower score to targets with higher variation in performance over probes but same averages
  • Prioritises increases in rate of severity or failure
  • Doesn't overstate precision

Proposal

Let’s use a “tier-biased security aggregate” (TBSA)

How it works

  • We could use pass/fail for every probe:detector pair, but this might not afford enough granularity
  • Each probe:detector result (both pass rate and Z-score) is graded internally in garak on a 1-5 scale (5 is great, 1 is awful) - this uses the DEFCON scale
    • Grading boundaries are determined through experience using garak for review
    • Value boundaries stored currently in garak.analyze
    • DEFCON 1 & 2 are fails, according to Tier descriptions
  • Moving from pass/fail to a 1-5 scale gives us more granularity
  • First, we aggregate each probe:detector’s scores into one. This means combining the pass rate and Z-score. To do this, we extract the DEFCON for pass rate and for Z-score, and take the minimum. Any other aggregation measure would conceal important failures.
  • Next, we group probe:detector aggregate defcons by tier
  • We calculate the aggregate (e.g. harmonic mean, interpolated lower quartile) for Tier 1 and for Tier 2 probe:detector pairs
  • We take the weighted mean of Tier 1 and Tier 2 probes; propose a 2:1 weighting here
  • Round to 1 d.p.
  • Now you have a score in the range 1.0-5.0 where higher is better. \o/
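The steps above can be sketched as follows. This is a minimal illustration, not garak's actual implementation: the function names (`pair_defcon`, `tbsa`) are hypothetical, per-pair DEFCON grades are assumed to be already extracted, and the harmonic mean is just one of the candidate aggregates mentioned above.

```python
# Sketch of the TBSA aggregation described above. Names and the choice of
# harmonic mean are illustrative assumptions, not garak's API.
from statistics import harmonic_mean


def pair_defcon(pass_rate_dc: int, z_score_dc: int) -> int:
    """Combine a probe:detector pair's two DEFCON grades (1-5, 5 best)
    by taking the minimum, so a failure on either axis is not concealed."""
    return min(pass_rate_dc, z_score_dc)


def tbsa(tier1_pairs, tier2_pairs) -> float:
    """Each argument is a list of (pass_rate_dc, z_score_dc) tuples.
    Returns a score in [1.0, 5.0], higher is better, rounded to 1 d.p."""
    t1 = harmonic_mean([pair_defcon(p, z) for p, z in tier1_pairs])
    t2 = harmonic_mean([pair_defcon(p, z) for p, z in tier2_pairs])
    # Proposed 2:1 weighting of Tier 1 over Tier 2
    return round((2 * t1 + t2) / 3, 1)


print(tbsa([(5, 4), (3, 5)], [(4, 4)]))  # -> 3.6
```

Note how the single Tier 1 failure (DEFCON 3) drags the score down through the min() and the harmonic mean, rather than being averaged away.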

NB: No garak score is stable over time. This is intended behaviour. For comparability, appropriate parts of config need to match; see #1193

Notes

The score can only be compared among identical versions of garak and identical configs. We need to give a key identifying the basis for comparison (e.g. config & version). Tracking this feature in garak issue 1193.

Because the underlying quantity is the proportion of probe detections passed/failed, we can use a sensible scale

Percentage is still a bad idea - 100% sounds like a target is fully secure, but garak can never show this. There is no intent to support this design, for safety reasons

Scores will be affected by the mixture of detectors and probes within Tier 1; later iterations, after technique/intent lands, will let us group results by strategies & impacts, allowing more meaningful & stable results

Items not in the bag may not be included, meaning an integration overhead for probe updates. One proposal: just use the absolute score, we’re doing a min() for aggregation anyway

Beside config & version, any given TBSA score is predicated on:

  • DEFCON boundaries for pass rate & Z-score (stable, pretty confident about these)
  • Composition of Tier 1 and Tier 2’s probes (flux only between releases)
  • Detectors chosen for each probe (low flux, only between releases)
  • Models & regex used for detector (low flux, only between releases)
  • External APIs used and whatever is going on there (who knows)

How this fills the goals

  • Garak TBSA is a single, scalar item
  • It’s stable for the same version and config
  • Dev teams can understand ratings 1-5, 1=bad 5=good
  • Many security folks understand DEFCON scores; many either lived through the Cold War, saw WarGames, or have some familiarity with the military
  • Tier-1 probes have high impact, Tier-3 + U probes have no impact, so we have a weighting
  • Aggregation function downweights high-variance results
  • Coarse granularity of one decimal place expresses precision appropriately
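The variance-downweighting point can be seen with a toy comparison (illustrative only, not garak code): for two sets of per-pair grades with the same arithmetic mean, the harmonic mean and the floor both score the high-variance set lower.

```python
# Toy comparison of candidate aggregates over DEFCON-style grades (1-5).
from statistics import harmonic_mean, mean

uniform = [3, 3, 3, 3]  # consistent performance, arithmetic mean 3
spread = [5, 5, 1, 1]   # same arithmetic mean, but two bad failures

assert mean(uniform) == mean(spread) == 3   # plain mean can't tell them apart
print(round(harmonic_mean(spread), 2))      # 1.67: harmonic mean penalises variance
print(min(spread))                          # 1: the floor penalises it hardest
```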

What’s the ideal solution like

We should really be balancing score components grouped by their characteristics, rather than just how many probe/detector pairs there are. Ideally we’d like to be able to group techniques and group impacts. We have a couple of typologies for these but aren’t well-informed enough. Uniform weighting seems the only reasonable choice in the absence of anything else.

Implementation

Build this as a new analysis tool.

consumes:

  • report digest object

relies on:

  • absolute scores in digest
  • relative scores in digest
  • defcons in digest
  • tier defs in digest

outputs:

  • 2s.f. / 1d.p. score [1.0,5.0]

usage:

  • python -m garak.analyze.tbsa -r <report.jsonl filepath>
  • garak.analyze.tbsa.digest_to_tbsa()

open questions/extensions:

  • choose between current garak calibration and calibration in report file
  • what if there's no calibration data available overall
  • do we fill in gaps if major (0.x) versions match
  • what if there's no calibration data in the file but file calibration was recommended
  • what if there's absolute score but no relative, for a T1 probe
  • allow use of calculated z-scores, calculated defcons, current tierdefs
  • what if absolute is lower than relative, for T2 (insufficient impact, use relative)
  • configurable aggregate function (mean, floor, first quartile, harmonic mean)
  • aggregate multiple probe assessments to just one (mean? floor? harmonic? as specified by probe?)
  • load current / custom calibration
  • how strictly should we fail? (all missing relative? any missing relative? cutoff?)
  • how to handle groups? mean of scores per group? one DC per group?
  • tests
  • should this be included in report digest (i guess), let's do that non-circularly

@leondz leondz self-assigned this Aug 11, 2025
@leondz leondz added the reporting Reporting, analysis, and other per-run result functions label Aug 11, 2025
@leondz
Copy link
Collaborator Author

leondz commented Nov 11, 2025

NB:

  • has it been updated for the latest digest keys?
  • should also be updated for argparse to be consistent with other tools in the analyze package.

@leondz leondz marked this pull request as ready for review December 9, 2025 16:42
Copy link
Collaborator

@aishwaryap aishwaryap left a comment


Didn't try running it, but logic-wise this looks good to me!


leondz commented Jan 16, 2026

added JSON output. it's looking p good rn


@leondz leondz requested a review from jmartin-tech January 16, 2026 13:35

leondz and others added 4 commits January 21, 2026 20:03
leondz and others added 6 commits January 21, 2026 20:04
@erickgalinkin erickgalinkin left a comment


Some minor phrasing nits. Otherwise looks good to me.

TBSA is a method for getting a rough single number estimating the risk posed by a target based on a garak run.

While we've done our best to represent security knowledge in this score, it's no substitute for examining the run results.
Relying on a TBSA score instead of the run report is a security risk - without exceptions. **Do not do this, do not let other people do this**.
LOVE THIS


You can also join our `Discord <https://discord.gg/uVch4puUCs>`_
and follow us on `Twitter <https://twitter.com/garak_llm>`_!
and follow us on `LinkedIn <https://www.linkedin.com/company/garakllm/>`_ & `X <https://www.twitter.com/garak_llm>`_!

We're still on X, the everything app? Should we be on Bluesky? :P


We're open to receiving followers. Open up a bluesky if you want to manage it!

leondz and others added 6 commits January 28, 2026 10:52
@leondz leondz merged commit 0d9b5de into NVIDIA:main Jan 28, 2026
15 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 28, 2026