`bcp47` is a type; it describes format, not function.
Renamed to `lang` or `lang_spec` for more descriptive, precise,
intuitive code.

Learnings & positions taken here:
* Probes are monolingual. We don't have a mechanism for making probes
operate in >1 language, or a requirement for doing this. Thus for
probes, `bcp47` -> `lang`.
* Detectors are optionally multilingual. We already have detectors
implementing this, and it's intuitive that content returned by an LLM
can be in more than one language, and that detectors support >1 language.
This is zero extra lift. Thus for detectors, `bcp47` -> `lang_spec`.
* `Attempt` language semantics are unclear. `Attempt`s should be in one
language, especially after unanimous decisions made during
implementation of multilingual, but attempt `bcp47` is occasionally
populated from detector or probe `bcp47`. This PR takes the position that
attempts are monolingual. This will be unravelled precisely when
Turn+Conversation lands (#1089), but it's something to watch for. There
are a few assignments left, e.g. in `detectors.base.Detector.detect`,
that violate this.
* `"xx,*"` is not a valid lang spec; it's equivalent to `"*"`.
* We will follow IANA BCP47 strictly.
* `langcodes` provides some normalisation functions that _might_ be
useful in the language code format mapping.
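
To make the `lang` / `lang_spec` distinction concrete, here is a minimal sketch of how a monolingual attempt language could be checked against a detector's `lang_spec`. The helper names `parse_lang_spec` and `detector_supports` are hypothetical and not part of garak. The sketch assumes `lang` is a concrete tag, uses exact case-insensitive matching only (no `en` vs `en-US` fallback), and chooses to reject mixed specs such as `"en,*"` rather than collapse them to `"*"`.

```python
def parse_lang_spec(lang_spec: str) -> set:
    """Split a comma-separated lang spec into a set of lowercased tags.

    Hypothetical helper for illustration; not garak's own API.
    """
    tags = {tag.strip().lower() for tag in lang_spec.split(",") if tag.strip()}
    if not tags:
        raise ValueError("lang_spec must name at least one tag or '*'")
    if "*" in tags and len(tags) > 1:
        # Assumption: treat mixed specs like "en,*" as an error instead of
        # silently reading them as "*".
        raise ValueError("'*' cannot be combined with other tags")
    return tags


def detector_supports(lang: str, lang_spec: str) -> bool:
    """Return True if a single language tag is covered by a lang spec."""
    tags = parse_lang_spec(lang_spec)
    # BCP47 tags are case-insensitive, so compare in lowercase.
    return "*" in tags or lang.strip().lower() in tags
```

With these definitions, `detector_supports("en", "en,fr")` and `detector_supports("de", "*")` are true, while `detector_supports("de", "en,fr")` is false.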
Resolves #1139
docs/source/garak.detectors.base.rst: 16 additions & 0 deletions
@@ -1,6 +1,22 @@
 garak.detectors.base
 ====================
 
+This class defines the basic structure of garak's detectors. All detectors inherit from ``garak.detectors.base.Detector``.
+
+Attributes:
+
+
+1. **doc_uri** URI for documentation of the detector (perhaps a paper)
+1. **lang_spec** Language this is for. Format: a comma-separated list of BCP47 tags, or "*" for any or not applicable. Content returned by a target can be in more than one language; single detectors can be capable of processing input in more than just one language. This field tracks which ones are supported. NB this is different from probe, which is monolingual and uses ``lang``.
+1. **active** Should this detector be used by default?
+1. **tags** MISP-format taxonomy categories
+1. **precision** Anticipated precision of detector
+1. **recall** Anticipated recall of detector
+1. **accuracy** Anticipated accuracy of detector
+1. **modality** Which modalities does this detector work on? ``garak`` supports mainstream any-to-any large models, but only assesses text output.
docs/source/garak.probes.base.rst: 13 additions & 2 deletions
@@ -1,11 +1,22 @@
 garak.probes.base
 =================
 
-This class defines the basic structure of garak's probes. All probes inherit from garak.probes.base.Probe.
+This class defines the basic structure of garak's probes. All probes inherit from ``garak.probes.base.Probe``.
 
 Attributes:
 
-* generations - How many responses should be requested from the generator per prompt.
+1. **doc_uri** URI for documentation of the probe (perhaps a paper)
+1. **lang** Language this is for, in BCP47 format; ``*`` for all langs. Probes tend to be either monolingual or language-agnostic, so at most a single BCP47-encoded language should go here.
+1. **active** Should this probe be run by default?
+1. **tags** MISP-format taxonomy categories
+1. **goal** What the probe is trying to do, phrased as an imperative
+1. **primary_detector** Default detector to run, if the primary/extended way of doing it is to be used
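
The "single language at most" rule for a probe's ``lang`` can be sketched as a simple shape check. This is an illustrative assumption, not garak's validation code: `is_single_lang` is a hypothetical helper, and the regular expression only checks that the value looks like one BCP47-style tag (hyphen-separated alphanumeric subtags); it does not consult the IANA subtag registry.

```python
import re

# Rough shape of one language tag: a 2-8 letter primary subtag followed by
# optional hyphen-separated subtags. This is a simplification for illustration.
_TAG_SHAPE = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")


def is_single_lang(lang: str) -> bool:
    """Return True for "*" or for a value shaped like exactly one tag.

    Hypothetical helper; a comma-separated list such as "en,fr" is rejected
    because a probe's lang is meant to hold one language at most.
    """
    if lang == "*":
        return True
    return _TAG_SHAPE.fullmatch(lang) is not None
```

Under this sketch, `"en"`, `"zh-Hant-TW"` and `"*"` pass, while `"en,fr"` and the empty string do not.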
garak/attempt.py: 9 additions & 9 deletions
@@ -38,8 +38,8 @@ class Attempt:
 :type seq: int
 :param messages: conversation turn histories; list of list of dicts have the format {"role": role, "content": text}, with actor being something like "system", "user", "assistant"
 :type messages: List(dict)
-:param bcp47: Language code for prompt as sent to the target
-:type bcp47: str
+:param lang: Language code for prompt as sent to the target
+:type lang: str, valid BCP47
 :param reverse_translator_outputs: The reverse translation of output based on the original language of the probe
 :param reverse_translator_outputs: List(str)
 
@@ -76,7 +76,7 @@ def __init__(
 detector_results=None,
 goal=None,
 seq=-1,
-bcp47=None, # language code for prompt as sent to the target
+lang=None, # language code for prompt as sent to the target