
@kevinmessiaen
Member

Description

Fixed the metamorphic tests for LLM (text generation) models.

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

@linear

linear bot commented Jun 20, 2023

GSK-1311 Error when running auto-generated test suite for LLM model

https://demo.giskard.ai/main/projects/1451/testing/1511/overview

2023-06-15 14:01:53,457 pid:801164 ml_worker_thread_0 giskard.ml_worker.server.ml_worker_service ERROR    An error occurred during the test suite execution: Invalid prediction task: SupportedModelTypes.TEXT_GENERATION
Traceback (most recent call last):
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/ml_worker/server/ml_worker_service.py", line 269, in runTestSuite
    is_pass, results = suite.run(**global_arguments)
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/core/suite.py", line 228, in run
    result = test_partial.giskard_test.get_builder()(**test_params).execute()
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/ml_worker/testing/registry/giskard_test.py", line 132, in execute
    return self.test_fn(**self.params)
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/testing/tests/metamorphic.py", line 219, in test_metamorphic_invariance
    return _test_metamorphic(
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/testing/tests/metamorphic.py", line 155, in _test_metamorphic
    passed_idx, _ = _compare_prediction(results_df, model.meta.model_type, direction, output_sensitivity)
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/ml_worker/utils/logging.py", line 76, in wrap
    result = fn(*args, **kw)
  File "/home/ubuntu/demo-venv/lib/python3.10/site-packages/giskard/testing/tests/metamorphic.py", line 71, in _compare_prediction
    raise ValueError(f"Invalid prediction task: {prediction_task}")
ValueError: Invalid prediction task: SupportedModelTypes.TEXT_GENERATION

The model is created using llm_comment_generation.ipynb
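
The traceback shows the prediction comparison raising ValueError because it has no branch for SupportedModelTypes.TEXT_GENERATION. As a rough illustration only, the sketch below dispatches on the task type and adds a text branch. Every name here (ModelType, is_invariant) is hypothetical and not giskard's code, and difflib's character-level similarity is a dependency-free stand-in for the BERTScore metric discussed later in this thread.

```python
from difflib import SequenceMatcher
from enum import Enum


class ModelType(Enum):
    """Hypothetical stand-in for giskard's SupportedModelTypes."""
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    TEXT_GENERATION = "text_generation"


def is_invariant(original, perturbed, model_type, output_sensitivity=0.1):
    """Return True when the perturbed prediction counts as unchanged."""
    if model_type is ModelType.CLASSIFICATION:
        # Labels must match exactly.
        return original == perturbed
    if model_type is ModelType.REGRESSION:
        # Relative change must stay within the sensitivity threshold.
        scale = max(abs(original), 1e-12)
        return abs(perturbed - original) / scale <= output_sensitivity
    if model_type is ModelType.TEXT_GENERATION:
        # Assumption: a text similarity score replaces exact equality.
        # SequenceMatcher is only a placeholder for a semantic metric.
        similarity = SequenceMatcher(None, original, perturbed).ratio()
        return similarity >= 1 - output_sensitivity
    raise ValueError(f"Invalid prediction task: {model_type}")


if __name__ == "__main__":
    print(is_invariant("Great video!", "Great video!", ModelType.TEXT_GENERATION))
```

Keeping the final ValueError means an unsupported task still fails loudly, as in the original traceback, instead of silently passing.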

Member

@mattbit mattbit left a comment


Good for me!

from giskard.ml_worker.utils.logging import timer
from giskard.models.base import BaseModel
from giskard.models.utils import fix_seed
from giskard.scanner.llm.utils import LLMImportError
Member


Maybe we should move LLMImportError outside of scanner now that we are integrating it more with the rest of the codebase. But not super important.

@mattbit
Member

mattbit commented Jun 20, 2023

@kevinmessiaen would be nice to add a simple test if you have time, using FakeListLLM from langchain.
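
A test along these lines could look roughly like the sketch below. To stay dependency-free, it uses a hypothetical CannedLLM class that replays fixed responses, which is the idea behind langchain's FakeListLLM. The helper invariance_pass_rate is also an assumption made for illustration, not a giskard function.

```python
class CannedLLM:
    """Hypothetical stand-in for langchain's FakeListLLM: replays fixed responses."""

    def __init__(self, responses):
        self.responses = list(responses)
        self.calls = 0

    def __call__(self, prompt):
        # Return the next canned response, wrapping around at the end.
        response = self.responses[self.calls % len(self.responses)]
        self.calls += 1
        return response


def invariance_pass_rate(llm, prompts, perturb):
    """Fraction of prompts whose output is unchanged after perturbation."""
    unchanged = 0
    for prompt in prompts:
        before = llm(prompt)
        after = llm(perturb(prompt))
        if before == after:
            unchanged += 1
    return unchanged / len(prompts)


def test_invariance_with_canned_llm():
    # Two prompts produce four calls: original and perturbed, twice.
    llm = CannedLLM(["Nice video!", "Nice video!", "Thanks.", "Thanks a lot."])
    rate = invariance_pass_rate(llm, ["prompt a", "prompt b"], str.upper)
    assert rate == 0.5
    assert llm.calls == 4


if __name__ == "__main__":
    test_invariance_with_canned_llm()
```

Because the responses are fixed, the expected pass rate is known in advance and the test needs no network access or model weights.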

except ImportError as err:
    raise LLMImportError() from err

scorer = evaluate.load("bertscore")
Contributor


It would be better to move evaluate.load("bertscore") into the try-except block, since it relies on another extra dependency that we add. That way, someone who already has evaluate installed (from a previous install, for example) but not bert-score would get the same LLMImportError with its explanation.
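
The suggestion can be sketched as follows. This is a minimal illustration under stated assumptions: LLMImportError is redefined locally as a hypothetical stand-in for the giskard class, load_scorer and its module_name parameter are invented for the example, and the sketch assumes (as the comment implies) that evaluate.load raises ImportError when bert-score is missing.

```python
import importlib


class LLMImportError(ImportError):
    """Hypothetical stand-in for giskard's LLMImportError."""

    def __init__(self):
        super().__init__(
            "Optional LLM dependencies are missing; install the LLM extras."
        )


def load_scorer(metric="bertscore", module_name="evaluate"):
    """Import the metrics package and load the metric inside one guarded block."""
    try:
        evaluate = importlib.import_module(module_name)
        # Assumed to raise ImportError if the metric's own dependency
        # (here bert-score) is not installed, so it belongs in the try block.
        return evaluate.load(metric)
    except ImportError as err:
        raise LLMImportError() from err
```

With both steps inside the same try block, a missing evaluate package and a missing bert-score package surface as the same error type, and the original ImportError stays available through the exception's __cause__.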

@kevinmessiaen kevinmessiaen requested a review from andreybavt June 22, 2023 01:59
@kevinmessiaen kevinmessiaen merged commit cc83df1 into main Jul 7, 2023
@sonarqubecloud

sonarqubecloud bot commented Jul 7, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

@Hartorn Hartorn deleted the feature/gsk-1311-error-when-running-auto-generated-test-suite-for-llm-model branch September 13, 2023 11:31
4 participants