24 changes: 24 additions & 0 deletions docs/README.md
@@ -0,0 +1,24 @@
# Documentation

## Building the Documentation

1. Install dependencies:

```console
python3 -m pip install -r requirements.txt
python3 -m pip install -r docs/requirements-docs.txt
```

1. Build the documentation:

```console
make -C docs/source doc
```

The HTML is created in the `docs/source/html` directory.

## Publishing the Documentation

Tag the commit to publish with `docs-v<semver>`.

To avoid publishing the documentation as the latest version, ensure the commit message contains `/not-latest` on a line by itself, tag that commit, and push the tag to GitHub.
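The publishing step above can be sketched as follows. The version number is illustrative, and the `git` commands are commented out so the sketch is safe to run anywhere:

```console
# Illustrative version; substitute the real semver for your release.
VERSION="1.2.3"
TAG="docs-v${VERSION}"
echo "${TAG}"
# Tag the commit to publish, then push the tag to GitHub:
# git tag "${TAG}"
# git push origin "${TAG}"
```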
33 changes: 18 additions & 15 deletions docs/source/configurable.rst
@@ -1,16 +1,17 @@
.. headings: = - ^ "
Configuring ``garak``
=====================

Configuring garak
=================

Beyond the standard CLI options, garak is highly configurable.
You can use YAML files to configure a garak run, down to the level
of exactly how each plugin behaves.


Specifying custom configuration
Specifying Custom Configuration
-------------------------------

``garak`` can be configured in multiple ways:
garak can be configured in multiple ways:

* Via command-line parameters
* Using YAML configs
@@ -19,8 +20,8 @@ Specifying custom configuration
The easiest way is often to use a YAML config, and how to do that is
described below.

Garak's config hierarchy
^^^^^^^^^^^^^^^^^^^^^^^^
Garak Config Hierarchy
^^^^^^^^^^^^^^^^^^^^^^

Configuration values can come from multiple places. At garak load, the
``_config`` module manages parsing configuration. This includes determining
@@ -90,8 +91,8 @@ Here we can see many entries that correspond to command line options, such as
such as ``show_100_pass_modules``.


``system`` config items
"""""""""""""""""""""""
System Config Items
"""""""""""""""""""

* ``parallel_requests`` - For generators not supporting multiple responses per prompt: how many requests to send in parallel with the same prompt? (raising ``parallel_attempts`` generally yields higher performance, depending on how high ``generations`` is set)
* ``parallel_attempts`` - For parallelisable generators, how many attempts should be run in parallel? Raising this is a great way of speeding up garak runs for API-based models
@@ -102,8 +103,8 @@ such as ``show_100_pass_modules``.
* ``enable_experimental`` - Enable experimental function CLI flags. Disabled by default. Experimental functions may disrupt your installation and provide unusual/unstable results. Can only be set by editing core config, so a git checkout of garak is recommended for this.
* ``max_workers`` - Cap on how many parallel workers can be requested. When raising this in order to use higher parallelisation, keep an eye on system resources (e.g. `ulimit -n 4026` on Linux)

``run`` config items
""""""""""""""""""""
Run Config Items
""""""""""""""""

* ``system_prompt`` - If given and not overridden by the probe itself, probes will pass the specified system prompt when possible for generators that support chat modality.
* ``probe_tags`` - If given, the probe selection is filtered according to these tags; probes that don't match the tags are not selected
@@ -116,8 +117,9 @@ such as ``show_100_pass_modules``.
* ``target_lang`` - A single language (as BCP47) that the target application or LLM accepts as prompt and output
* ``langproviders`` - A list of configurations representing providers for converting from probe language to lang_spec target languages (BCP47)

``plugins`` config items
""""""""""""""""""""""""
Plugins Config Items
""""""""""""""""""""

* ``model_type`` - The generator model type, e.g. "nim" or "huggingface"
* ``model_name`` - The name of the model to be used (optional - if blank, type-specific default is used)
* ``probe_spec`` - A comma-separated list of probe modules or probe classnames (in ``module.classname`` format) to be used. If a module is given, only ``active`` plugins in that module are chosen; this is equivalent to passing ``-p`` to the CLI
@@ -135,8 +137,9 @@ such as ``show_100_pass_modules``.
For an example of how to use the ``detectors``, ``generators``, ``buffs``,
``harnesses``, and ``probes`` root entries, see :ref:`Configuring plugins with YAML <config_with_yaml>` below.
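As a hypothetical sketch (the key names come from the config item lists above; the values are illustrative), a YAML config combining ``system``, ``run``, and ``plugins`` entries might look like:

.. code-block:: yaml

   system:
     parallel_attempts: 8
   run:
     generations: 5
   plugins:
     model_type: nim
     model_name: mistralai/mixtral-8x7b-instruct-v0.1
     probe_spec: encoding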

``reporting`` config items
""""""""""""""""""""""""""
Reporting Config Items
""""""""""""""""""""""

* ``report_dir`` - Directory for reporting; defaults to ``$XDG_DATA/garak/garak_runs``
* ``report_prefix`` - Prefix for report files. Defaults to ``garak.$RUN_UUID``
* ``taxonomy`` - Which taxonomy to use to group probes when creating HTML report
@@ -163,7 +166,7 @@ These are great places to look at to get an idea of how garak YAML configs can l
Quick configs are stored under ``garak/configs/`` in the source code/install.


Using a custom config
Using a Custom Config
^^^^^^^^^^^^^^^^^^^^^

To override values in this config, we can create a new YAML file and point to it from the
12 changes: 6 additions & 6 deletions docs/source/translation.rst
@@ -1,4 +1,4 @@
Translation support
Translation Support
===================

Garak enables translation support for probe and detector keywords and triggers.
@@ -161,21 +161,21 @@ Google Cloud Translation
For Google Cloud Translation, run the following command:
Use the following YAML config.

.. code-block:: yaml
.. code-block:: yaml

run:
target_lang: {target language code}
target_lang: <target-language-code>
langproviders:
- language: {source language code},{target language code}
- language: <source-language-code>,<target-language-code>
model_type: remote.GoogleTranslator
- language: {target language code},{source language code}
- language: <target-language-code>,<source-language-code>
model_type: remote.GoogleTranslator


.. code-block:: bash

export GOOGLE_APPLICATION_CREDENTIALS=<path to credential configuration json file>
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}


Local
2 changes: 1 addition & 1 deletion garak/attempt.py
@@ -129,7 +129,7 @@ def last_message(self, role=None) -> Message:
"""The last message exchanged in the conversation

:param role: Optional, role to search for
type: str
:type role: str
"""
if len(self.turns) < 1:
raise ValueError("No messages available")
89 changes: 53 additions & 36 deletions garak/generators/nim.py
@@ -1,7 +1,7 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""NVIDIA Inference Microservice LLM interface"""
"""NVIDIA NIM Microservice LLM Interface"""

import logging
import random
@@ -16,23 +16,23 @@


class NVOpenAIChat(OpenAICompatible):
"""Wrapper for NVIDIA-hosted NIMs. Expects NIM_API_KEY environment variable.

Uses the [OpenAI-compatible API](https://docs.nvidia.com/ai-enterprise/nim-llm/latest/openai-api.html)
via direct HTTP request.
"""Wrapper for NVIDIA NIM microservices hosted on build.nvidia.com and self-hosted.
Connects to the v1/chat/completions endpoint.
You must set the NIM_API_KEY environment variable even if you connect to a self-hosted NIM.

To get started with this generator:
#. Visit [https://build.nvidia.com/explore/reasoning](build.nvidia.com/explore/reasoning)
and find the LLM you'd like to use.
#. On the page for the LLM you want to use (e.g. [mixtral-8x7b-instruct](https://build.nvidia.com/mistralai/mixtral-8x7b-instruct)),
click "Get API key" key above the code snippet. You may need to create an
account. Copy this key.
#. In your console, Set the ``NIM_API_KEY`` variable to this API key. On
Linux, this might look like ``export NIM_API_KEY="nvapi-xXxXxXx"``.
#. Run garak, setting ``--model_name`` to ``nim`` and ``--model_type`` to
the name of the model on [build.nvidia.com](https://build.nvidia.com/)
- e.g. ``mistralai/mixtral-8x7b-instruct-v0.1``.

#. Visit https://build.nvidia.com/explore/reasoning and find the LLM you'd like to use.
#. On the page for the LLM you want to use (such as `mixtral-8x7b-instruct <https://build.nvidia.com/mistralai/mixtral-8x7b-instruct>`__),
click **Get API key** above the code snippet.

You might need to create an account if you don't have one yet.
Copy this key.
#. In your console, set the ``NIM_API_KEY`` variable to this API key.

On Linux, this might look like ``export NIM_API_KEY="nvapi-xXxXxXx"``.
#. Run garak, setting ``--model_type='nim.NVOpenAIChat'`` and ``--model_name`` to
the name of the model on build.nvidia.com, such as ``--model_name='mistralai/mixtral-8x7b-instruct-v0.1'``.
"""

# per https://docs.nvidia.com/ai-enterprise/nim-llm/latest/openai-api.html
@@ -111,25 +111,23 @@ def __init__(self, name="", config_root=_config):


class NVOpenAICompletion(NVOpenAIChat):
"""Wrapper for NVIDIA-hosted NIMs. Expects NIM_API_KEY environment variable.
"""Wrapper for NVIDIA NIM microservices hosted on build.nvidia.com and self-hosted.
Connects to the v1/completions endpoint.
You must set the NIM_API_KEY environment variable even if you connect to a self-hosted NIM.

Uses the [OpenAI-compatible API](https://docs.nvidia.com/ai-enterprise/nim-llm/latest/openai-api.html)
via direct HTTP request.
To get started with this generator:

This generator supports only ``completion`` and NOT ``chat``-format models.
#. Visit https://build.nvidia.com/explore/reasoning and find the LLM you'd like to use.
#. On the page for the LLM you want to use (such as `mixtral-8x7b-instruct <https://build.nvidia.com/mistralai/mixtral-8x7b-instruct>`__),
click **Get API key** above the code snippet.

To get started with this generator:
#. Visit [build.nvidia.com/explore/reasoning](build.nvidia.com/explore/reasoning)
and find the LLM you'd like to use.
#. On the page for the LLM you want to use (e.g. [mixtral-8x7b-instruct](https://build.nvidia.com/mistralai/mixtral-8x7b-instruct)),
click "Get API key" key above the code snippet. You may need to create an
account. Copy this key.
#. In your console, Set the ``NIM_API_KEY`` variable to this API key. On
Linux, this might look like ``export NIM_API_KEY="nvapi-xXxXxXx"``.
#. Run garak, setting ``--model_name`` to ``nim`` and ``--model_type`` to
the name of the model on [build.nvidia.com](https://build.nvidia.com/)
- e.g. ``mistralai/mixtral-8x7b-instruct-v0.1``.
You might need to create an account if you don't have one yet.
Copy this key.
#. In your console, set the ``NIM_API_KEY`` variable to this API key.

On Linux, this might look like ``export NIM_API_KEY="nvapi-xXxXxXx"``.
#. Run garak, setting ``--model_type='nim.NVOpenAICompletion'`` and ``--model_name`` to
the name of the model on build.nvidia.com, such as ``--model_name='mistralai/mixtral-8x7b-instruct-v0.1'``.
"""

def _load_client(self):
@@ -138,11 +136,27 @@ def _load_client(self):


class NVMultimodal(NVOpenAIChat):
"""Wrapper for text + image / audio to text NIMs. Expects NIM_API_KEY environment variable.
"""Wrapper for text and image / audio to text NVIDIA NIM microservices hosted on build.nvidia.com and self-hosted.
You must set the NIM_API_KEY environment variable even if you connect to a self-hosted NIM.

Expects keys to be a dict with keys ``text`` (required), and ``image`` or ``audio`` (optional).
Message is sent with ``role`` and ``content`` where ``content`` is structured as text
followed by ``<img>`` and/or ``<audio>`` tags.
Refer to https://build.nvidia.com/microsoft/phi-4-multimodal-instruct for an example.

Expects keys to be a dict with keys 'text' (required), and 'image' or 'audio' (optional).
Message is sent with 'role' and 'content' where content is structured as text
followed by <img> and/or <audio> tags ala https://build.nvidia.com/microsoft/phi-4-multimodal-instruct
To get started with this generator:

#. Visit https://build.nvidia.com/explore/reasoning and find the LLM you'd like to use.
#. On the page for the LLM you want to use (such as `phi-4-multimodal-instruct <https://build.nvidia.com/microsoft/phi-4-multimodal-instruct>`__),
click **Get API key** above the code snippet.

You might need to create an account if you don't have one yet.
Copy this key.
#. In your console, set the ``NIM_API_KEY`` variable to this API key.

On Linux, this might look like ``export NIM_API_KEY="nvapi-xXxXxXx"``.
#. Run garak, setting ``--model_type='nim.NVMultimodal'`` and ``--model_name`` to
the name of the model on build.nvidia.com, such as ``--model_name='microsoft/phi-4-multimodal-instruct'``.
"""

DEFAULT_PARAMS = NVOpenAIChat.DEFAULT_PARAMS | {
@@ -197,10 +211,13 @@ def _prepare_prompt(self, conv: Conversation) -> Conversation:


class Vision(NVMultimodal):
"""Wrapper for text+image to text NIMs. Expects NIM_API_KEY environment variable.
"""Wrapper for text and image to text NVIDIA NIM microservices hosted on build.nvidia.com and self-hosted.
You must set the NIM_API_KEY environment variable even if you connect to a self-hosted NIM.

Following generators.huggingface.LLaVa, expects prompts to be a dict with keys
"text" and "image"; text holds the text prompt, image holds a path to the image."""
``text`` and ``image``.
The ``text`` key specifies the text prompt, and the ``image`` key specifies the path to the image.
"""

modality = {"in": {"text", "image"}, "out": {"text"}}
