Add support for datasets>2.17. #1136

@JanetVictorious

Summary

The current version of Garak is constrained to datasets>=2.14.6,<2.17 (ref). These versions have problems loading cached datasets in offline mode, which has since been fixed in later versions of datasets (ref).
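To see whether an environment is affected, the installed datasets version can be compared against the constraint quoted above. This is a minimal sketch using only the standard library. The helper names (`version_tuple`, `in_pinned_range`, `installed_datasets_version`) are hypothetical and not part of garak or datasets, and the version parsing is deliberately simplified (it ignores pre-release suffixes).

```python
import re
from importlib import metadata

# Bounds taken from the constraint quoted in this issue: datasets>=2.14.6,<2.17
PINNED_MIN = (2, 14, 6)  # inclusive lower bound
PINNED_MAX = (2, 17)     # exclusive upper bound


def version_tuple(version: str) -> tuple:
    """Turn '2.16.1' into (2, 16, 1); stop at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def in_pinned_range(version: str) -> bool:
    """Return True if the version satisfies >=2.14.6,<2.17."""
    return PINNED_MIN <= version_tuple(version) < PINNED_MAX


def installed_datasets_version():
    """Return the installed datasets version string, or None if it is not installed."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None
```

For example, `in_pinned_range(installed_datasets_version() or "0")` tells you whether the environment still uses one of the constrained versions.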

Basic example

With the following configuration:

# config.yaml
plugins: 
    model_name: namespace/repo_name
    model_type: huggingface.Model
    probe_spec: packagehallucination

When running this command:

HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 uv run -m garak --config ./config.yaml

I get the following output:

WARNING - root - Connection error loading packagehallucination detector for javascript: Couldn't reach the Hugging Face Hub for dataset 'garak-llm/npm-20240828': Offline mode is enabled.

This happens even though garak-llm/npm-20240828 is downloaded and cached under ~/.cache/huggingface/datasets.
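The behaviour can probably be isolated from garak by loading the same dataset directly with datasets in offline mode. The sketch below assumes the dataset is already in the local cache and that `datasets.load_dataset` is the relevant entry point. The function name `try_offline_load` and its `loader` parameter are hypothetical, added only so the logic can be exercised without the library installed.

```python
import os

# The offline flags are set before `datasets` is imported, mirroring the
# environment variables used in the command above.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

DATASET_NAME = "garak-llm/npm-20240828"


def try_offline_load(name: str, loader=None) -> str:
    """Try to load a dataset in offline mode and describe the outcome.

    `loader` defaults to `datasets.load_dataset`; passing another callable
    is only meant for testing this helper.
    """
    if loader is None:
        try:
            from datasets import load_dataset as loader
        except ImportError:
            return "datasets is not installed"
    try:
        loader(name)
    except Exception as exc:  # broad on purpose: we only report the failure
        return f"failed ({type(exc).__name__}): {exc}"
    return "loaded from cache"
```

Calling `print(try_offline_load(DATASET_NAME))` under a constrained version and again under a later version of datasets should show whether the cached copy is picked up.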

Motivation

When working in offline mode, some probes (e.g. packagehallucination) cannot run because they depend on datasets from the Hugging Face Hub. Even if a dataset has been cached under ~/.cache/huggingface/datasets, it still can't be loaded. This is resolved in later versions of datasets, so support for those versions is needed.

The expected outcome is proper handling of cached Hugging Face datasets, which helps when running in offline mode or in air-gapped environments.
