
Conversation

rabah-khalek (Contributor) commented Jul 27, 2023

  • Refactor Giskard evaluator into utils and evaluation artifacts
  • Re-introduce unwrapping of underlying object from pyfunc models
  • Test LLM scan detectors that previously didn't work

linear bot commented Jul 27, 2023

rabah-khalek marked this pull request as draft July 27, 2023 16:37
rabah-khalek requested a review from andreybavt July 27, 2023 17:07
rabah-khalek marked this pull request as ready for review July 27, 2023 17:07
rabah-khalek (Contributor, Author) commented Jul 27, 2023

@andreybavt to make the review easier (since the change is blurred by the refactoring), here are the main modifications:

https://github.com/Giskard-AI/giskard/blob/31006aa2a84281fb7839dba03a2c958bcbe82def/python-client/giskard/integrations/mlflow/giskard_evaluator_utils.py#L30-L39

https://github.com/Giskard-AI/giskard/blob/31006aa2a84281fb7839dba03a2c958bcbe82def/python-client/giskard/integrations/mlflow/giskard_evaluator_utils.py#L79-L82
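For reference, a minimal sketch of what the unwrapping in the linked lines amounts to (the function name is hypothetical and this is not the exact implementation): if the logged model carries the langchain flavor, load the native chain so that its prompt is reachable; otherwise fall back to the generic pyfunc wrapper.

import mlflow
from mlflow.models import get_model_info

def unwrap_langchain_model(model_uri: str):
    # Hypothetical sketch, not the exact implementation in the PR:
    # load the native LLMChain when the model was logged with the
    # langchain flavor, so the underlying chain (and its prompt) is
    # accessible; otherwise return the generic pyfunc wrapper.
    if "langchain" in get_model_info(model_uri).flavors:
        return mlflow.langchain.load_model(model_uri)
    return mlflow.pyfunc.load_model(model_uri)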

So now I unwrap the langchain model from the pyfunc model to be able to use rewrite_prompt. I tested it with:

import os

import mlflow
import openai
import pandas as pd
from langchain import LLMChain, PromptTemplate
from langchain.llms import OpenAI

models = ["text-ada-001"]

# Sample 10 news summaries to keep the scan fast.
df = pd.read_csv("https://raw.githubusercontent.com/sunnysai12345/News_Summary/master/news_summary_more.csv")
df_sample = pd.DataFrame(df["text"].sample(10, random_state=11))

os.environ["OPENAI_API_KEY"] = "XXX"
openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = PromptTemplate(
    template="Create a reader comment according to the following article summary: '{text}'",
    input_variables=["text"],
)

for model in models:
    llm = OpenAI(
        openai_api_key=openai.api_key,
        request_timeout=20,
        max_retries=100,
        temperature=0,
        model_name=model,
    )
    chain = LLMChain(prompt=prompt, llm=llm)

    # Limit each detector to 10 samples to keep the scan short.
    evaluator_config = {
        "scan_config": {
            "params": {
                "text_perturbation": {"num_samples": 10},
                "ethical_bias": {"num_samples": 10},
            }
        }
    }

    with mlflow.start_run(run_name=model):
        # Log the chain with the langchain flavor, then run the Giskard evaluator on it.
        model_uri = mlflow.langchain.log_model(chain, "langchain").model_uri
        mlflow.evaluate(
            model=model_uri,
            model_type="text",
            data=df_sample,
            evaluators="giskard",
            evaluator_config=evaluator_config,
        )

which outputs the following issues (from detectors that didn't work before):

Running detector LLMToxicityDetector…2023-07-27 19:05:51,229 pid:83686 MainThread giskard.scanner.logger ERROR    Detector LLMToxicityDetector failed with error: keywords must be strings
 0 issues detected. (Took 0:00:02.373041)
Running detector HarmfulnessDetector… 1 issues detected. (Took 0:00:13.804845)
Running detector GenderStereotypeDetector… 0 issues detected. (Took 0:00:22.148888)
Running detector MinorityStereotypeDetector… 1 issues detected. (Took 0:00:46.699300)

rabah-khalek requested a review from mattbit July 28, 2023 08:48
mattbit (Member) commented Jul 28, 2023

@rabah-khalek why is the LLMToxicityDetector failing?

rabah-khalek (Contributor, Author) commented:

@rabah-khalek why is the LLMToxicityDetector failing?

You know what, I was completely blind to this; I was focused more on the new detectors. Let me investigate this. Thanks!

mattbit (Member) commented Jul 28, 2023

Also, minor observation: could you use ruff + black to format the code? I think we standardized on that (@andreybavt has configured a git pre-commit hook that you can enable).

andreybavt (Contributor) commented:

https://docs.giskard.ai/en/latest/contribute/dev-environment.html

rabah-khalek (Contributor, Author) commented Jul 28, 2023

@mattbit, the issue is fixed now. The problem was that I was setting the target on the dataset by default, even though the column might not exist in the data, which was causing this error in the scan.
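For illustration, a minimal sketch of the fix (names are illustrative, not the exact code): only pass the target to the Giskard dataset when the column actually exists in the evaluation data.

import giskard

def build_giskard_dataset(df, target_column=None):
    # Hypothetical sketch: previously the target was always set, even when
    # the column was missing from the data, which broke detectors during
    # the scan ("keywords must be strings").
    if target_column is not None and target_column in df.columns:
        return giskard.Dataset(df=df, target=target_column)
    return giskard.Dataset(df=df)

With that change, the toxicity detector no longer errors out: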

Running detector LLMToxicityDetector… 0 issues detected. (Took 0:00:10.042270)
Running detector HarmfulnessDetector… 1 issues detected. (Took 0:00:13.708245)
Running detector GenderStereotypeDetector… 0 issues detected. (Took 0:00:22.539881)
Running detector MinorityStereotypeDetector… 1 issues detected. (Took 0:00:44.672983)
Scan completed: 2 issues found. (Took 0:01:30.972032)

rabah-khalek self-assigned this Jul 28, 2023
rabah-khalek added the Python label (Pull requests that update Python code) Jul 28, 2023
mattbit (Member) left a review comment

Could you address the review comments?
Also, could you run black + ruff on the new files to fix the style issues?

rabah-khalek requested a review from mattbit August 1, 2023 10:48
sonarqubecloud bot commented Aug 1, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • Coverage: 65.7%
  • Duplication: 0.0%

rabah-khalek merged commit 586fdb1 into main Aug 1, 2023
Hartorn deleted the GSK-1473 branch September 22, 2023 10:46