
Conversation

rabah-khalek (Contributor) commented Jul 27, 2023

  • Refactor Giskard evaluator into utils and evaluation artifacts
  • Re-introduce unwrapping of underlying object from pyfunc models
  • Test LLM scan detectors that previously didn't work

linear bot commented Jul 27, 2023

rabah-khalek marked this pull request as draft July 27, 2023 16:37
rabah-khalek requested a review from andreybavt July 27, 2023 17:07
rabah-khalek marked this pull request as ready for review July 27, 2023 17:07
rabah-khalek (Contributor, Author) commented Jul 27, 2023

@andreybavt to make the review easier (since the change is blurred by the refactoring), here are the main modifications:

https://github.com/Giskard-AI/giskard/blob/31006aa2a84281fb7839dba03a2c958bcbe82def/python-client/giskard/integrations/mlflow/giskard_evaluator_utils.py#L30-L39

https://github.com/Giskard-AI/giskard/blob/31006aa2a84281fb7839dba03a2c958bcbe82def/python-client/giskard/integrations/mlflow/giskard_evaluator_utils.py#L79-L82
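For reference, a minimal sketch of what the unwrapping in the linked lines amounts to (the function name is hypothetical and this is not the exact implementation): if the logged model carries the langchain flavor, load the native chain so that its prompt is reachable; otherwise fall back to the generic pyfunc wrapper.

import mlflow
from mlflow.models import get_model_info

def unwrap_langchain_model(model_uri: str):
    # Hypothetical sketch, not the exact implementation in the PR:
    # load the native LLMChain when the model was logged with the
    # langchain flavor, so the underlying chain (and its prompt) is
    # accessible; otherwise return the generic pyfunc wrapper.
    if "langchain" in get_model_info(model_uri).flavors:
        return mlflow.langchain.load_model(model_uri)
    return mlflow.pyfunc.load_model(model_uri)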

So now I unwrap the langchain model from the pyfunc model to be able to use rewrite_prompt. I tested it with:

import os

import mlflow
import openai
import pandas as pd
from langchain import LLMChain, PromptTemplate
from langchain.llms import OpenAI

models = ["text-ada-001"]

# Sample 10 news summaries to keep the scan fast.
df = pd.read_csv("https://raw.githubusercontent.com/sunnysai12345/News_Summary/master/news_summary_more.csv")
df_sample = pd.DataFrame(df["text"].sample(10, random_state=11))

os.environ["OPENAI_API_KEY"] = "XXX"
openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = PromptTemplate(
    template="Create a reader comment according to the following article summary: '{text}'",
    input_variables=["text"],
)

for model in models:
    llm = OpenAI(
        openai_api_key=openai.api_key,
        request_timeout=20,
        max_retries=100,
        temperature=0,
        model_name=model,
    )
    chain = LLMChain(prompt=prompt, llm=llm)

    # Limit each detector to 10 samples to keep the scan short.
    evaluator_config = {
        "scan_config": {
            "params": {
                "text_perturbation": {"num_samples": 10},
                "ethical_bias": {"num_samples": 10},
            }
        }
    }

    with mlflow.start_run(run_name=model):
        # Log the chain with the langchain flavor, then run the Giskard evaluator on it.
        model_uri = mlflow.langchain.log_model(chain, "langchain").model_uri
        mlflow.evaluate(
            model=model_uri,
            model_type="text",
            data=df_sample,
            evaluators="giskard",
            evaluator_config=evaluator_config,
        )

which outputs the following issues (from detectors that didn't work before):

Running detector LLMToxicityDetector…2023-07-27 19:05:51,229 pid:83686 MainThread giskard.scanner.logger ERROR    Detector LLMToxicityDetector failed with error: keywords must be strings
 0 issues detected. (Took 0:00:02.373041)
Running detector HarmfulnessDetector… 1 issues detected. (Took 0:00:13.804845)
Running detector GenderStereotypeDetector… 0 issues detected. (Took 0:00:22.148888)
Running detector MinorityStereotypeDetector… 1 issues detected. (Took 0:00:46.699300)

rabah-khalek requested a review from mattbit July 28, 2023 08:48
mattbit (Member) commented Jul 28, 2023

@rabah-khalek why is the LLMToxicityDetector failing?

rabah-khalek (Contributor, Author) commented:

@rabah-khalek why is the LLMToxicityDetector failing?

You know what, I was completely blind to this; I was focused more on the new detectors. Let me investigate this. Thanks!

mattbit (Member) commented Jul 28, 2023

Also, minor observation: could you use ruff + black to format the code? I think we standardized on that (@andreybavt has configured a git pre-commit hook that you can enable).

andreybavt (Contributor) commented:

https://docs.giskard.ai/en/latest/contribute/dev-environment.html

rabah-khalek (Contributor, Author) commented Jul 28, 2023

@mattbit, the issue is fixed now. The problem was that I was setting the target on the dataset by default, even though the column might not exist in the data, which was causing this error in the scan.
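For illustration, a minimal sketch of the fix (names are illustrative, not the exact code): only pass the target to the Giskard dataset when the column actually exists in the evaluation data.

import giskard

def build_giskard_dataset(df, target_column=None):
    # Hypothetical sketch: previously the target was always set, even when
    # the column was missing from the data, which broke detectors during
    # the scan ("keywords must be strings").
    if target_column is not None and target_column in df.columns:
        return giskard.Dataset(df=df, target=target_column)
    return giskard.Dataset(df=df)

With that change, the toxicity detector no longer errors out: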

Running detector LLMToxicityDetector… 0 issues detected. (Took 0:00:10.042270)
Running detector HarmfulnessDetector… 1 issues detected. (Took 0:00:13.708245)
Running detector GenderStereotypeDetector… 0 issues detected. (Took 0:00:22.539881)
Running detector MinorityStereotypeDetector… 1 issues detected. (Took 0:00:44.672983)
Scan completed: 2 issues found. (Took 0:01:30.972032)

rabah-khalek self-assigned this Jul 28, 2023
rabah-khalek added the Python label (Pull requests that update Python code) Jul 28, 2023
mattbit (Member) left a review comment

Could you address the review comments?
Also, could you run black + ruff on the new files to fix the style issues?

rabah-khalek requested a review from mattbit August 1, 2023 10:48
sonarqubecloud bot commented Aug 1, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • Coverage: 65.7%
  • Duplication: 0.0%

rabah-khalek merged commit 586fdb1 into main Aug 1, 2023
Hartorn deleted the GSK-1473 branch September 22, 2023 10:46