Feature/gsk 2334 talk to my model mvp #1889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes from 204 commits (208 commits in total)
bfc87e5
Initial commit for the MVP of "Talk to my model" functionality.
AbSsEnT e8a3b6e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 3316bd4
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 4452fd2
Defined the basic pipeline of the 'talk' function.
AbSsEnT 847e8b8
Defined the Tool interface and the boilerplate for the first tool, wh…
AbSsEnT 7ca2416
small addition
AbSsEnT b44386b
Added method to initialise tools objects, each time the method 'talk'…
AbSsEnT b727b77
Initial implementation of the "__call__" method.
AbSsEnT 84318d1
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 71a08b5
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT 549c916
Bug fixes. Adapted flow to currently use legacy 'functions' instead o…
AbSsEnT 66673a3
Debugged "predict from dataset" tool workflow. Debugged the tool work…
AbSsEnT 4deb3fc
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 0e98572
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT e2781aa
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT 66f7429
Merge pull request #1687 from Giskard-AI/feature/gsk-2335-query-predi…
AbSsEnT b46fd57
Initial implementation of the 'SHAPExplanationTool'.
AbSsEnT a836a1e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 4dcd490
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT 777011a
Added handling of errors while calling tools.
AbSsEnT 19488bd
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT cd2bdf3
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 0a9315a
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT 4db4482
Merge pull request #1696 from Giskard-AI/feature/gsk-2336-query-shap-…
AbSsEnT 6b86d48
Moved more attributes and properties to the BaseTool, since they are …
AbSsEnT bbcf347
Changed PredictFromDataset tool's specification.
AbSsEnT e248aab
Adapting model.py to the use of tools API.
AbSsEnT 394b2bb
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT dc155ee
Fully changed 'talk' method workflow to use tools API.
AbSsEnT 9a70228
Added multiple tool calling for the SHAP explanation tool
AbSsEnT 30f9184
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 10929e4
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 3c6363a
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 6a1dbf8
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT 110c804
Merge pull request #1702 from Giskard-AI/feature/gsk-2419-adapt-workf…
AbSsEnT cafedd6
Code refactoring.
AbSsEnT b4983d0
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 7e3fe0d
Fixed conflicts and merged latest changes from the main branch.
AbSsEnT 126d19d
Initial implementation of the IssuesScannerTool, which gives user an …
AbSsEnT f2a8a0f
Merged and fixed conflicts of branch gsk-2367-migrate-openai-api-call…
AbSsEnT 5c6ce77
Refactoring.
AbSsEnT 6c378f7
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 69dadaa
Merged and fixed conflicts from the feature/gsk-2367-migrate-openai-a…
AbSsEnT b25a5d2
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT cf5c7d1
Merge pull request #1717 from Giskard-AI/feature/gsk-2338-query-a-mod…
AbSsEnT 4c1956a
Removed __future__ import.
AbSsEnT a954aa4
Started implementing prediction from user input tool.
AbSsEnT 4767a12
Implemented the final PredictUserInputTool.
AbSsEnT 47eb3a0
Merge pull request #1722 from Giskard-AI/feature/gsk-2337-query-predi…
AbSsEnT 41ccaa7
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 175fde1
Put the shap explanations calculation logic into separate module.
AbSsEnT 976d347
Explicitly set target to the 'None', when creating Dataset, to omit w…
AbSsEnT ebbcd3c
Distributed the tools across separate dedicated modules for easier ma…
AbSsEnT e44ff13
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 3d8fc91
Implemented history (context) persistence to enable dialogue regime b…
AbSsEnT 583f323
Small refactoring.
AbSsEnT 0de2adc
Merge pull request #1729 from Giskard-AI/feature/gsk-2421-history-per…
AbSsEnT b299208
Executed pre-commit hooks on all files.
AbSsEnT a892e69
Merged with gsk-2367
AbSsEnT a9ef45e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT c0ac0b8
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 120a272
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 69c6645
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 76418a7
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 1e16e07
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT 71ef243
Update regarding new LLMClient API.
AbSsEnT 27925e8
Updated `pdm.lock`
AbSsEnT 4d0d41b
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT d9d18a6
Finalised adaptation to the new LLMClient API for the 'talk' function…
AbSsEnT 52ddba5
Removed "_form_tool_calls" method.
AbSsEnT e80e1a9
Small fixes.
AbSsEnT 8ef5a1a
Pulled changes from main.
AbSsEnT bee5037
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT ea92292
Updated pdm.
AbSsEnT 9fd4ab7
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 59873e7
Updated pdm.
AbSsEnT ea0b7d6
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 061b982
United the PredictDatasetInput and PredictUserInput tools into single…
AbSsEnT e08bda7
If we already see, that filtered dataset is of length 0, stop further…
AbSsEnT 8bb46c3
Merge pull request #1799 from Giskard-AI/feature/gsk-2811-shap-calcul…
AbSsEnT a0888a2
Merge pull request #1800 from Giskard-AI/feature/gsk-2813-fuzzy-strin…
AbSsEnT bff9c98
Merged with the main branch.
AbSsEnT 367d40e
Merge branch 'GSK-2754' of github.com:Giskard-AI/giskard into feature…
AbSsEnT 40dd4fa
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 27b3f75
Created the new tool to calculate model's performance metrics.
AbSsEnT e9d1c8a
Talk architecture polishing.
AbSsEnT dd4de98
Merge pull request #1807 from Giskard-AI/feature/gsk-2808-metrics-cal…
AbSsEnT e964762
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 296658e
Improved the system prompt to:
AbSsEnT 13ac1f6
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 7f5efeb
System prompt improvement.
AbSsEnT 77276eb
Merge pull request #1811 from Giskard-AI/feature/gsk-2423-prompts-imp…
AbSsEnT a366563
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT f4d3587
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 64191e2
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 7302741
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT cf7862b
1) Updated to the latest gpt-4-turbo version;
AbSsEnT 5cac021
Merge pull request #1825 from Giskard-AI/feature/gsk-2915-bug-fix-met…
AbSsEnT ad8f0ac
Bug fix.
AbSsEnT d0b7a30
Added better spacing to the instruct prompt.
AbSsEnT a32b7ec
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT abc8ea8
Improved instruction to not provide generic answers.
AbSsEnT ffe9788
Added docstrings.
AbSsEnT ee84da6
Added docstrings.
AbSsEnT 69631c5
Added docstrings.
AbSsEnT 04ecfd9
Added docstrings.
AbSsEnT 08c5888
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT d186fe7
Added docstrings.
AbSsEnT 497ce21
Added docstrings.
AbSsEnT f8d44a0
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
Hartorn dbf13fd
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT c488815
Updated typing with respect to not using __future__.
AbSsEnT 961b016
Replaced thefuzz.ratio() by the native difflib.SequenceMatcher().ratio()
AbSsEnT 12d9ada
Removed optional list casting.
AbSsEnT f1dff00
Refactored the dataset filtering logic. Added comments.
AbSsEnT d8e987e
Removed useless casting to list.
AbSsEnT 35011d7
Simplified assignment expression.
AbSsEnT 44f29d1
Small fix.
AbSsEnT 5e36b7a
Replaced by the object's method call.
AbSsEnT 53b014c
Replaced the __str__ by the __repr__
AbSsEnT 5141733
Moved fuzzy similarity threshold to the config.
AbSsEnT b7a639b
Small fix.
AbSsEnT ac8f157
Removed import BaseOpenAIClient from model.py
AbSsEnT bbe997d
1) The 'dataset' argument of the 'talk' is mandatory now.
AbSsEnT eaa6b74
Added clarifying comments, on why to use non-top-level imports, as we…
AbSsEnT 0d29bd1
Added the possibility of configuring Talk LLM model through the env v…
AbSsEnT 6f10a3d
Returned the from __future__ import annotations, since we accept such…
AbSsEnT 2172428
Documented the reason, why to import functions not from the top-level.
AbSsEnT 111bc35
Improved typing and docstrings.
AbSsEnT 8cf7c92
[RESTORING] dataset is not mandatory parameter.
AbSsEnT d1a792e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 89e1ea4
Created the new group 'talk' for the 'talk-to-my-ml' feature dependen…
AbSsEnT ce1a6b1
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT 5d57757
Regenerating pdm.lock
1e19325
- Fixed ambiguity in calling for 'model performance'. Now, the metric…
AbSsEnT 747a6a8
Merge remote-tracking branch 'origin/feature/gsk-2334-talk-to-my-mode…
AbSsEnT dc543cc
Regenerating pdm.lock
4685007
Created unit-tests for the 'talk' feature.
AbSsEnT ee5768f
Small fix.
AbSsEnT 5883293
Regenerating pdm.lock
c1db839
Committing missing pytest file with unit-tests for the 'talk' feature.
AbSsEnT 53b08d4
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT bccbf5e
Update giskard/llm/talk/config.py
AbSsEnT e61f49b
Update giskard/llm/talk/config.py
AbSsEnT de23f72
Update giskard/llm/talk/config.py
AbSsEnT 20cb2a6
Update giskard/llm/talk/config.py
AbSsEnT 4473468
Update giskard/llm/talk/config.py
AbSsEnT 2637058
Update giskard/llm/talk/config.py
AbSsEnT 9ce2ec4
Fixed typos with GPT.
AbSsEnT 48a2d96
Better exception raising logic.
AbSsEnT 3d54d90
1) Specified, that model and dataset are mandatory parameters of tools.
AbSsEnT 1599c3e
Update giskard/llm/talk/tools/metric.py
AbSsEnT b452853
Removed comments.
AbSsEnT 701abea
Made features_json_type as a property.
AbSsEnT f40269f
Added `features_dict` validation logic.
AbSsEnT 0fcb048
Replaced metrics calculation functions from sklearn to giskard
AbSsEnT 16b7dd1
Fixed unit-tests by escaping regex-sensitive characters.
AbSsEnT 9c264dd
Re-made unit-tests. Mocked LLM responses to avoid dependence on OpenA…
AbSsEnT 757ee44
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT e18f5fc
Fixed CI/CD errors:
AbSsEnT 75e66c1
Regenerating pdm.lock
46cbe7f
Fixed CI/CD errors:
AbSsEnT ad57b31
Delete pdm.lock
rabah-khalek e4318dc
Regenerating pdm.lock
6ab7155
Created the docs page for the AI Quality Copilot.
AbSsEnT b4bcebb
Merged with main.
AbSsEnT 307c352
Merge remote-tracking branch 'origin/feature/gsk-2334-talk-to-my-mode…
AbSsEnT 5bc5e01
Regenerating pdm.lock
c54d0a7
Regenerating pdm.lock
663680f
Small docs fix.
AbSsEnT 0171321
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT ab8cfad
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT 436395c
Removed instruction because of redundancy.
AbSsEnT 8666f32
Rewrote the initialization of all tools. Now only mandatory tool para…
AbSsEnT e1d816b
Introduced PredictionMixin class to abstract away common prediction n…
AbSsEnT cdf8229
Small docstring fix.
AbSsEnT f7e58ec
Added doc page for the AI Quality Copilot.
AbSsEnT b2c64f1
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT f0d2818
Returned old page.
AbSsEnT 66764e4
Returned old page.
AbSsEnT ad2be7f
Once again, I added the doc page for the AI Quality Copilot.
AbSsEnT 5cf2beb
Delete pdm.lock
rabah-khalek f003572
Regenerating pdm.lock
8d2af8b
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
rabah-khalek c746770
Delete pdm.lock
rabah-khalek 05d2c82
Regenerating pdm.lock
b0d4e7a
Update talk_result.py
rabah-khalek 479d91d
Returned the logic of tool calling to the LLMClient.
AbSsEnT d195484
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' into openai-clie…
AbSsEnT 2982c02
Modified the LLMClient to support tool calling functionality as it wa…
AbSsEnT fe2f650
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT ac1156b
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
AbSsEnT a754010
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
rabah-khalek 922ddaf
create a client for copilot
71b8358
add missing imports
1d68c53
propagagated ToolChatMessage
bf9acb0
fixing imports in model.py
69b1307
more work on imports
569ce2c
switching from llmimporterr to giskardinstallationerr
68ded26
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
rabah-khalek 81d9e68
simplifying the client calling for copilot
6de2f80
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of https://githu…
becf3df
restored client init and updated imports
0431f94
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
andreybavt 74f5351
removed unused function
945addd
implemented AA's feedback
15747f5
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
rabah-khalek 0b5190c
fixing tests
# 🗣🤖💡 AI Quality Copilot
> ⚠️ **AI Quality Copilot is currently an early version and is subject to change**. Feel free to reach out on our [Discord server](https://discord.gg/fkv7CAr3FE) if you have any trouble or want to provide feedback.

Obtaining information about an ML model requires some coding effort. This can be time-consuming and create friction when one seeks prediction results, explanations, or the model's performance. In reality, the code simply translates the user's intent, which can be described in natural language. For instance, the phrase "What are the predictions for women across the dataset?" could be converted to the snippet `model.predict(df[df.sex == 'female'])`. In many cases, however, the transformation is not as straightforward and may require more effort than forming the query in natural language.

To address this, we introduce the **AI Quality Copilot**: an LLM agent that provides access to essential information about the model's predictions, explanations, performance metrics, and issues through a natural language interface. Users can prompt the Copilot, and it will provide the requested information about the specified ML model. In essence, instead of writing code, one can simply "talk" to the model.

## How does it work?
The AI Quality Copilot is an LLM agent that, based on a user's input query, determines which function to call and with which arguments. The output of these functions is then used to produce an answer. To use it, all you need to do is call the `talk` method of the Giskard model and ask a question about the model.

We implemented this feature using the OpenAI [function calling API](https://platform.openai.com/docs/guides/function-calling). This approach expands the standard capabilities of LLM agents by enabling them to use a predefined set of Python functions, or tools, designed for various tasks. The idea is to delegate to the agent the decision of which tool to use in order to answer the user's question more effectively.

The AI Quality Copilot can provide the following information about the ML model:
1. `Prediction` on records from the given dataset or from the user's input.
2. `Prediction explanation` using the SHAP framework.
3. `Performance metrics` for classification and regression tasks.
4. `Performance issues` detected by the Giskard Scan.

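To make the function-calling mechanism concrete, here is a minimal sketch of the kind of tool specification the OpenAI function calling API expects. The tool name and parameters below are illustrative assumptions, not Giskard's actual internal schema:

```python
# Illustrative tool specification for the OpenAI function calling API.
# The agent receives a list of such specs and, given a user question,
# answers with the name of a tool to call plus JSON-encoded arguments.
predict_tool = {
    "type": "function",
    "function": {
        "name": "predict",  # hypothetical tool name, for illustration only
        "description": "Run the model on rows of the dataset matching a filter.",
        "parameters": {
            "type": "object",
            "properties": {
                "row_filter": {
                    "type": "string",
                    "description": "Natural-language description of the rows to select.",
                }
            },
            "required": ["row_filter"],
        },
    },
}
```

The agent never executes anything itself; it only selects a tool and fills in its arguments, and the calling code runs the matching Python function.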
## Before starting
First, ensure that you have installed the `talk` flavor of Giskard:
```bash
pip install "giskard[talk]"
```

To use the AI Quality Copilot, you'll need an OpenAI API key. You can set it in your notebook like this:
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-…"
```

## Prepare the necessary artifacts
First, set up the Giskard dataset:
```python
giskard_dataset = giskard.Dataset(df, target=TARGET_COLUMN, name="Titanic dataset", cat_columns=CATEGORICAL_COLUMNS)
```

Next, set up the Giskard model we will interact with. It is important to provide a detailed description as well as a name for the model to enhance the AI Quality Copilot's responses.
```python
giskard_model = giskard.Model(
    model=prediction_function,
    model_type="classification",  # Currently, the Quality Copilot supports either classification or regression.
    classification_labels=CLASSIFICATION_LABELS,
    feature_names=FEATURE_NAMES,
    # Important for the Quality Copilot.
    name="Titanic binary classification model",
    description="The binary classification model, which predicts whether the passenger survived the Titanic incident. \n"
                "The model outputs 'yes' if the person survived, and 'no' if they died."
)
```
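The snippet above assumes a `prediction_function` is already defined. Its exact contents depend on your model; roughly, it takes a pandas DataFrame of features and returns one row of per-class probabilities per record. The rule-based stand-in below is purely illustrative, not a real fitted model:

```python
import numpy as np
import pandas as pd

# Illustrative constants for the Titanic example; adapt to your own data.
FEATURE_NAMES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
CLASSIFICATION_LABELS = ["no", "yes"]


def prediction_function(df: pd.DataFrame) -> np.ndarray:
    # Stand-in for a real fitted model: returns one probability row per
    # record, with columns ordered as CLASSIFICATION_LABELS.
    p_yes = (df["Sex"] == "female").astype(float) * 0.7 + 0.15
    return np.column_stack([1 - p_yes, p_yes])
```

In practice you would wrap a fitted estimator instead, e.g. `lambda df: pipeline.predict_proba(df[FEATURE_NAMES])` for a scikit-learn pipeline.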
Lastly, generate the Giskard scan report. This step is optional: if you don't need information about the model's performance issues, you can omit it.
```python
scan_result = giskard.scan(giskard_model, giskard_dataset)
```

## AI Quality Copilot
Let's finally try the AI Quality Copilot. The primary and only way to interact with it is through the `talk` method of the Giskard model. Below is the method's API:
```python
def talk(self, question: str, dataset: Dataset, scan_report: ScanReport = None, context: str = "") -> TalkResult:
    """Perform the 'talk' to the model.

    Given `question`, lets you ask the model about prediction results, explanations, model performance, issues, etc.

    Parameters
    ----------
    question : str
        User input query.
    dataset : Dataset
        Giskard Dataset to be analysed by the 'talk'.
    scan_report : ScanReport
        Giskard Scan Report to be analysed by the 'talk'.
    context : str
        Context of the previous 'talk' results. Necessary to keep context between sequential 'talk' calls.
    """
```

We'll start with a simple example and ask the AI Quality Copilot what it can do:
```python
giskard_model.talk(question="What can you do?", dataset=giskard_dataset)
```

The agent's response is as follows:
```markdown
I can assist you with various tasks related to a Titanic binary classification model, which predicts whether a passenger survived or not in the Titanic incident. Here's what I can do:

1. **Predict Survival**: I can predict whether a passenger survived or not based on their details such as class, sex, age, number of siblings/spouses aboard, number of parents/children aboard, fare, and port of embarkation.

2. **Model Performance Metrics**: I can estimate model performance metrics such as accuracy, F1 score, precision, recall, R2 score, explained variance, mean squared error (MSE), and mean absolute error (MAE) for the Titanic survival prediction model.

3. **SHAP Explanations**: I can provide SHAP explanations for predictions, which help understand the impact of each feature on the model's prediction.

4. **Model Vulnerabilities Scan**: I can give you a summary of the model's vulnerabilities, such as unrobustness, underconfidence, unethical behavior, data leakage, performance bias, and more.

Please let me know how I can assist you!
```
In this example, the agent describes the tasks it can perform without actually executing any of them, as there is no need to do so.

### Prediction
Now, let's ask the agent about the model's prediction on a specific record from the dataset:
```python
giskard_model.talk(question="Have 'Minahan miss Daisy E' survived in the Titanic incident?", dataset=giskard_dataset)
```

Response:
```markdown
Yes, Minahan Miss Daisy E survived in the Titanic incident.
```
This example shows how the AI Quality Copilot lets us obtain a prediction on a dataset record without writing any code: we can simply use natural language.

### Prediction explanation
The next example shows how to obtain a prediction explanation in a similar manner:
```python
giskard_model.talk(question="What was important for the survival result of Minahan, Miss. Daisy E?", dataset=giskard_dataset)
```
```markdown
For Minahan, Miss. Daisy E, the most important factors for the survival prediction (yes) were:
- Sex ('female') with a SHAP value of 0.493
- Pclass (1) with a SHAP value of 0.316
- Embarked ('Q') with a SHAP value of 0.068

These values indicate that being female, traveling in the first class, and embarking from 'Q' were significant contributors to the model's prediction of survival.
```
Once again, no manual coding is needed to obtain this result: everything is managed and executed by the AI Quality Copilot.

### Performance metrics
Now let's calculate the model's accuracy:
```python
giskard_model.talk(question="Calculate accuracy of the model.", dataset=giskard_dataset)
```
```markdown
The accuracy of the model is 0.79.
```

### Performance issues
Finally, we can ask the LLM agent about the model's performance issues. In this case, it is mandatory to provide the generated scan report.
```python
giskard_model.talk(question="Does the model have any performance issues or biases?", dataset=giskard_dataset, scan_report=scan_result)
```

```markdown
The model has been identified to have several performance issues and biases, including:

1. **Robustness issues**: The model's predictions change when the "Name" feature is transformed to title case in 6.67% of the cases, which indicates a medium level of robustness issue.
2. **Overconfidence issues**: The model shows a significantly higher number of overconfident wrong predictions in several data slices, including when the "Name" contains "mr", when the text length of "Name" is less than 28.5, when "Fare" is less than 14.85, when "Sex" is "male", when "Parch" is 0, and when "SibSp" is 0. These issues are mostly major, indicating a high level of overconfidence in wrong predictions.
3. **Spurious Correlation issues**: There are minor issues related to spurious correlations, particularly with the "Sex" feature being highly associated with the survival prediction. For example, "female" is highly associated with "Survived" = "yes", and "male" is highly associated with "Survived" = "no".
4. **Performance issues**: The model has major performance issues in several data slices, including lower recall for records where "Name" contains "mr" and "Sex" is "male", lower precision for records where "Pclass" is 3, and various accuracy and precision issues in other specific conditions.

These findings suggest that the model may not perform equally well across different groups of passengers, indicating potential biases and vulnerabilities in its predictions.
```
As you can see, the LLM agent was able to report all the performance issues the model has.

### Multiple questions in one call
Because the function calling API can call multiple tools within a single OpenAI API request, we can ask the model multiple questions within a single `talk` call. For example:
```python
giskard_model.talk(question="Calculate accuracy, f1, precision and recall scores of the model. Summarise the result in a table", dataset=giskard_dataset)
```
```markdown
Here are the model performance metrics summarized in a table:

| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.79  |
| F1        | 0.7   |
| Precision | 0.75  |
| Recall    | 0.66  |
```
In this example, to calculate each metric, the LLM agent used a dedicated tool four times with different parameters (the metric type), while we called the `talk` method only once instead of making four separate calls. This further reduces the need for repetitive code.

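For comparison, here is a rough sketch of the manual route the Copilot saves you from: computing the four binary-classification metrics yourself. The helper below is illustrative and not part of the Giskard API:

```python
def binary_metrics(y_true, y_pred, positive="yes"):
    """Compute accuracy, F1, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}
```

With `talk`, all of this bookkeeping (plus fetching predictions and labels) is delegated to the agent's metric tool.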
### Dialogue mode
By default, `talk` calls are standalone, meaning they do not preserve history. However, we can enable a so-called 'dialogue' mode by passing the summary of the current `talk` call to the subsequent call as context. For example, let's make two sequential `talk` calls, where the second question cannot be answered without a summary of the first call:
```python
talk_result = giskard_model.talk(question="Have 'Webber, miss. Susan' survived in the Titanic incident?", dataset=giskard_dataset)
giskard_model.talk(question="Can you explain why she survived?", context=talk_result.summary, dataset=giskard_dataset)
```

```markdown
The model predicted that 'Webber, Miss. Susan' survived the Titanic incident primarily due to her sex being female, which had the highest SHAP value, indicating it was the most influential factor in the prediction. Other contributing factors include her traveling in 2nd class (Pclass) and her name, which might have been considered due to encoding specific information relevant to survival. Age and fare paid for the ticket also played minor roles in the prediction. However, the number of siblings/spouses aboard (SibSp), the number of parents/children aboard (Parch), and the port of embarkation (Embarked) did not significantly influence the prediction.
```

Without passing `talk_result.summary` as the context of the second call, the response would be useless:
```markdown
To provide an explanation for why a specific individual survived, I would need more details about the person in question, such as their name, ticket class, age, or any other information that could help identify them in the dataset. Could you please provide more details?
```

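The dialogue pattern above can be wrapped in a small helper that threads each call's summary into the next call's `context`. This sketch assumes, as shown above, that `talk` returns an object with a `summary` attribute:

```python
def dialogue(model, dataset, questions):
    """Ask a sequence of questions, carrying the running summary forward."""
    context = ""
    results = []
    for question in questions:
        result = model.talk(question=question, dataset=dataset, context=context)
        context = result.summary  # feed this call's summary to the next one
        results.append(result)
    return results
```

Note that only the summary of the previous call is carried over, not the full transcript, so the context stays compact across long dialogues.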
## Frequently Asked Questions

> #### ℹ️ What data is sent to OpenAI?
>
> To answer your questions, we send the following information to OpenAI:
>
> - Data provided in your dataset
> - Model name and description

## Troubleshooting
If you encounter any issues, join our [Discord community](https://discord.gg/fkv7CAr3FE) and ask questions in our #support channel.
```python
from .client import get_default_client, set_default_client, set_llm_api, set_llm_model
from .errors import LLMImportError

__all__ = [
    "LLMImportError",
    "get_default_client",
    "set_default_client",
    "set_llm_api",
    "set_llm_model",
]
```
```python
from typing import Optional, Sequence

from dataclasses import dataclass
from logging import warning

from ..config import LLMConfigurationError
from ..errors import LLMImportError
from .base import ChatMessage
from .openai import AUTH_ERROR_MESSAGE, OpenAIClient

try:
    import openai
except ImportError as err:
    raise LLMImportError(flavor="talk") from err


@dataclass
class ToolChatMessage(ChatMessage):
    name: Optional[str] = None
    tool_call_id: Optional[str] = None
    tool_calls: Optional[list] = None


def _format_message(msg: ChatMessage) -> dict:
    """Format a chat message.

    Based on the message's role, include related attributes and exclude unrelated ones.

    Parameters
    ----------
    msg : ChatMessage
        Message to the LLMClient.

    Returns
    -------
    dict
        A dictionary with the attributes related to the role.
    """
    fmt_msg = {"role": msg.role, "content": msg.content}
    if msg.role == "tool":
        fmt_msg.update({"name": msg.name, "tool_call_id": msg.tool_call_id})
    if msg.role == "assistant" and msg.tool_calls:
        fmt_msg.update({"tool_calls": msg.tool_calls})
    return fmt_msg


class GiskardCopilotClient(OpenAIClient):
    def complete(
        self,
        messages: Sequence[ChatMessage],
        temperature: float = 1.0,
        max_tokens: Optional[int] = None,
        caller_id: Optional[str] = None,
        tools=None,
        tool_choice=None,
        seed: Optional[int] = None,
        format=None,
    ) -> ChatMessage:
        extra_params = dict()

        if tools is not None:
            extra_params["tools"] = tools
        if tool_choice is not None:
            extra_params["tool_choice"] = tool_choice

        if seed is not None:
            extra_params["seed"] = seed

        if self.json_mode:
            if format not in (None, "json", "json_object"):
                warning(f"Unsupported format '{format}', ignoring.")
                format = None

            if format == "json" or format == "json_object":
                extra_params["response_format"] = {"type": "json_object"}

        try:
            completion = self._client.chat.completions.create(
                model=self.model,
                messages=[_format_message(m) for m in messages],
                temperature=temperature,
                max_tokens=max_tokens,
                **extra_params,
            )
        except openai.AuthenticationError as err:
            raise LLMConfigurationError(AUTH_ERROR_MESSAGE) from err

        self.logger.log_call(
            prompt_tokens=completion.usage.prompt_tokens,
            sampled_tokens=completion.usage.completion_tokens,
            model=self.model,
            client_class=self.__class__.__name__,
            caller_id=caller_id,
        )

        msg = completion.choices[0].message

        return ToolChatMessage(role=msg.role, content=msg.content, tool_calls=msg.tool_calls)
```
Review comment on `tool_calls: Optional[list] = None`:
> nitpick, could be more precise: `Optional[list[ChatCompletionMessageToolCall]]`

Author's reply:
> I'll add it. I forget whether a type annotation based on a conditional import creates trouble (even if put under quotes), so I'll try it.