Description
Summary
The current version of Garak constrains datasets to >=2.14.6,<2.17 (ref). However, these versions do not handle cached datasets correctly in offline mode, and this has since been fixed in later versions of datasets (ref).
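To check whether an environment falls inside the range quoted above before running offline, here is a minimal sketch that uses only the Python standard library. The helper names `parse_release` and `in_pinned_range` are hypothetical and not part of garak. The sketch assumes plain numeric version strings such as "2.16.1" and does not implement full PEP 440 semantics. For that, `packaging.specifiers.SpecifierSet(">=2.14.6,<2.17")` is the usual tool.

```python
from importlib.metadata import PackageNotFoundError, version

# Bounds copied from the constraint quoted in this issue: datasets>=2.14.6,<2.17
PIN_LOWER = (2, 14, 6)
PIN_UPPER = (2, 17)


def parse_release(v: str) -> tuple:
    """Turn a plain release string such as "2.16.1" into (2, 16, 1).

    Assumption: only dotted numeric versions are handled. Pre-release or
    local suffixes (e.g. "2.17.0rc1") raise ValueError from int().
    """
    return tuple(int(part) for part in v.split("."))


def in_pinned_range(v: str) -> bool:
    """Return True if v satisfies >=2.14.6,<2.17, using tuple comparison.

    Tuple ordering gives (2, 17, 0) > (2, 17), so 2.17.0 is correctly excluded.
    """
    release = parse_release(v)
    return PIN_LOWER <= release < PIN_UPPER


if __name__ == "__main__":
    try:
        installed = version("datasets")
    except PackageNotFoundError:
        print("datasets is not installed")
    else:
        try:
            inside = in_pinned_range(installed)
        except ValueError:
            print(f"cannot parse datasets version {installed!r}")
        else:
            status = "inside" if inside else "outside"
            print(f"datasets {installed} is {status} the pinned range")
```

Tuple comparison keeps the sketch dependency-free. The trade-off is that anything beyond simple X.Y.Z releases falls outside what it can parse.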
Basic example
With the following configuration:
```yaml
# config.yaml
plugins:
  model_name: namespace/repo_name
  model_type: huggingface.Model
  probe_spec: packagehallucination
```
When running this command:
```shell
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 uv run -m garak --config ./config.yaml
```
I get the following output:
```
WARNING - root - Connection error loading packagehallucination detector for javascript: Couldn't reach the Hugging Face Hub for dataset 'garak-llm/npm-20240828': Offline mode is enabled.
```
This happens even though garak-llm/npm-20240828 has already been downloaded and cached under ~/.cache/huggingface/datasets.
Motivation
When working in offline mode, some probes (e.g. packagehallucination) cannot run because they depend on datasets from the Hugging Face Hub. Even if a dataset has been cached under ~/.cache/huggingface/datasets, it still can't be loaded. Later versions of datasets resolve this, so Garak needs to support them.
The expected outcome of resolving this is proper handling of cached Hugging Face datasets, which helps when running in offline mode or in air-gapped environments.