2 changes: 1 addition & 1 deletion .github/workflows/create-release.yml
@@ -52,7 +52,7 @@ jobs:

- name: Adding file
run: |
- git add pyproject.toml
+ git add pyproject.toml uv.lock
git fetch --quiet --tags
git commit -m "v${{ inputs.version }}" --allow-empty
git tag v${{ inputs.version }}
31 changes: 13 additions & 18 deletions README.md
@@ -69,7 +69,7 @@ Either use `hub.projects.list()` to get a list of all projects, or use

### Import a dataset

- Let's now create a dataset and add a conversation example.
+ Let's now create a dataset and add a chat test case example.

```python
# Let's create a dataset
@@ -80,12 +80,12 @@ dataset = hub.datasets.create(
)
```

- We can now add a conversation example to the dataset. This will be used
+ We can now add a chat test case example to the dataset. This will be used
for the model evaluation.

```python
- # Add a conversation example
- hub.conversations.create(
+ # Add a chat test case example
+ hub.chat_test_cases.create(
dataset_id=dataset.id,
messages=[
dict(role="user", content="What is the capital of France?"),
@@ -107,21 +107,21 @@ hub.conversations.create(
)
```

- These are the attributes you can set for a conversation (the only
+ These are the attributes you can set for a chat test case (the only
required attribute is `messages`):

- - `messages`: A list of messages in the conversation. Each message is a dictionary with the following keys:
+ - `messages`: A list of messages in the chat. Each message is a dictionary with the following keys:

- `role`: The role of the message, either "user" or "assistant".
- `content`: The content of the message.

- `demo_output`: A demonstration of a (possibly wrong) output from the
model with an optional metadata. This is just for demonstration purposes.

- - `checks`: A list of checks that the conversation should pass. This is used for evaluation. Each check is a dictionary with the following keys:
+ - `checks`: A list of checks that the chat test case should pass. This is used for evaluation. Each check is a dictionary with the following keys:
- `identifier`: The identifier of the check. If it's a built-in check, you will also need to provide the `params` dictionary. The built-in checks are:
- `correctness`: The output of the model should match the reference.
- - `conformity`: The conversation should follow a set of rules.
+ - `conformity`: The chat should follow a set of rules.
- `groundedness`: The output of the model should be grounded in the conversation.
- `string_match`: The output of the model should contain a specific string (keyword or sentence).
- `metadata`: The metadata output of the model should match a list of JSON path rules.
@@ -137,15 +137,13 @@ required attribute is `messages`):
- `expected_value_type`: The expected type of the value at the JSON path, one of `string`, `number`, `boolean`.
- For the `semantic_similarity` check, the parameters are `reference` (type: `str`) and `threshold` (type: `float`), where `reference` is the expected output and `threshold` is the similarity score below which the check will fail.

- You can add as many conversations as you want to the dataset.
+ You can add as many chat test cases as you want to the dataset.
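
For instance, a chat test case that combines the checks documented above could look like the following sketch. The `semantic_similarity` parameters (`reference`, `threshold`) are the ones listed above; the `conformity` parameter name `rules` is an assumption, since its exact schema isn't shown here.

```python
# Sketch of a chat test case with two checks (the `conformity` parameter
# name `rules` is an assumption, not taken from the README).
hub.chat_test_cases.create(
    dataset_id=dataset.id,
    messages=[
        dict(role="user", content="What is the capital of France?"),
    ],
    checks=[
        # Documented above: `reference` is the expected output, `threshold` is
        # the similarity score below which the check fails.
        dict(
            identifier="semantic_similarity",
            params=dict(reference="The capital of France is Paris.", threshold=0.8),
        ),
        # Assumed parameter name: the docs only say conformity checks a set of rules.
        dict(
            identifier="conformity",
            params=dict(rules=["The agent must answer politely."]),
        ),
    ],
)
```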

Again, you'll find your newly created dataset in the Hub UI.

### Configure a model/agent

- Before running our first evaluation, we'll need to set up a model.
- You'll need an API endpoint ready to serve the model. Then, you can
- configure the model API in the Hub:
+ Before running our first evaluation, we'll need to set up a model. You'll need an API endpoint ready to serve the model. Then, you can configure the model API in the Hub:

```python
model = hub.models.create(
@@ -159,8 +157,7 @@ model = hub.models.create(
)
```

- We can test that everything is working well by running a chat with the
- model:
+ We can test that everything is working well by running a chat with the model:

```python
response = model.chat(
@@ -198,8 +195,7 @@ eval_run = client.evaluate(
)
```

- The evaluation will run asynchronously on the Hub. To retrieve the
- results once the run is complete, you can use the following:
+ The evaluation will run asynchronously on the Hub. To retrieve the results once the run is complete, you can use the following:

```python

@@ -213,5 +209,4 @@ eval_run.print_metrics()
**Tip**

You can directly pass IDs to the evaluate function, e.g.
- `model=model_id` and `dataset=dataset_id`, without having to retrieve
- the objects first.
+ `model=model_id` and `dataset=dataset_id`, without having to retrieve the objects first.
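
A minimal sketch of that shortcut, assuming `client` is the Hub client used in the evaluation snippet above and that `model_id` and `dataset_id` refer to existing objects:

```python
# Pass IDs directly instead of the retrieved objects
eval_run = client.evaluate(
    model=model_id,      # ID of an existing model
    dataset=dataset_id,  # ID of an existing dataset
)

# Once the run is complete, inspect the results as shown above
eval_run.print_metrics()
```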
23 changes: 0 additions & 23 deletions examples/example.sh

This file was deleted.

53 changes: 0 additions & 53 deletions examples/example_python.py

This file was deleted.

14 changes: 7 additions & 7 deletions script-docs/hub/sdk/checks.rst
@@ -11,7 +11,7 @@ The Giskard Hub provides a set of built-in checks that cover common use cases, s

* **Correctness**: Verifies if the agent's response matches the expected output (reference answer).
* **Conformity**: Ensures the agent's response adheres to the rules, such as "The agent must be polite."
- * **Groundedness**: Ensures the agent's response is grounded in the conversation.
+ * **Groundedness**: Ensures the agent's response is grounded in a specific context.
* **String matching**: Checks if the agent's response contains a specific string, keyword, or sentence.
* **Metadata**: Verifies the presence of specific (tool calls, user information, etc.) metadata in the agent's response.
* **Semantic Similarity**: Verifies that the agent's response is semantically similar to the expected output.
@@ -46,7 +46,7 @@ Custom checks are reusable evaluation criteria that you can define for your proj

Custom checks can be used in the following ways:

- - Applied to conversations in your datasets
+ - Applied to chat test cases (conversations) in your datasets
- Used during agent evaluations
- Shared across your team **within the same project**
- Modified or updated as your requirements evolve
@@ -243,7 +243,7 @@ You can delete a check using the ``hub.checks.delete()`` method. Here's a basic

.. warning::

- Deleting a check is permanent and cannot be undone. Make sure you're not using the check in any active conversations or evaluations before deleting it.
+ Deleting a check is permanent and cannot be undone. Make sure you're not using the check in any active chat test cases or evaluations before deleting it.

List checks
___________
@@ -263,15 +263,15 @@ You can list all checks for a project using the ``hub.checks.list()`` method. He

.. _add-checks-to-conversations:

- Add checks to conversations
+ Add checks to chat test cases
---------------------------

- Once you've created a check, you can use it in your conversations by referencing its identifier:
+ Once you've created a check, you can use it in your chat test cases by referencing its identifier:

.. code-block:: python

- # Add a conversation that uses your check
- hub.conversations.create(
+ # Add a chat test case that uses your check
+ hub.chat_test_cases.create(
dataset_id=dataset.id,
messages=[
{"role": "user", "content": "What's the formula for compound interest?"},
8 changes: 4 additions & 4 deletions script-docs/hub/sdk/datasets/business.rst
@@ -5,7 +5,7 @@
Detect business failures by generating synthetic tests
======================================================

- Generative AI agents can face an endless variety of real-world scenarios, making it impossible to manually enumerate all possible test cases. Automated, synthetic test case generation is therefore essential—especially when you lack real user conversations to import as tests. However, a major challenge is to ensure that these synthetic cases are tailored to your business context, rather than being overly generic.
+ Generative AI agents can face an endless variety of real-world scenarios, making it impossible to enumerate them all manually. Automated, synthetic test case generation is therefore essential—especially when you lack real user chats to import as tests. However, a major challenge is to ensure that these synthetic cases are tailored to your business context, rather than being overly generic.

By generating domain-specific synthetic tests, you can proactively identify and address these types of failures before they impact your users or business operations.

@@ -31,9 +31,9 @@ Before generating test cases, you need to `create a knowledge base </hub/sdk/pro
# Wait for the dataset to be created
business_dataset.wait_for_completion()

- # List the conversations in the dataset
- for conversation in business_dataset.conversations:
-     print(conversation.messages[0].content)
+ # List the chat test cases in the dataset
+ for chat_test_case in business_dataset.chat_test_cases:
+     print(chat_test_case.messages[0].content)

.. note::

18 changes: 9 additions & 9 deletions script-docs/hub/sdk/datasets/import.rst
@@ -1,5 +1,5 @@
:og:title: Giskard Hub - Enterprise Agent Testing - Import Datasets
- :og:description: Import existing test data programmatically into Giskard Hub. Support conversations, CSV files, and other formats through our Python SDK.
+ :og:description: Import your existing test data into Giskard Hub. Bring chat test cases, CSV files, and other data formats to build comprehensive test datasets.

=============================
Import existing datasets
@@ -20,7 +20,7 @@ Let's start by initializing the Hub client or take a look at the :doc:`/hub/sdk/

hub = HubClient()

- You can now use the ``hub.datasets`` and ``hub.conversations`` clients to import datasets and conversations!
+ You can now use the ``hub.datasets`` and ``hub.chat_test_cases`` clients to import datasets and chat test cases!

Create a dataset
________________
@@ -32,20 +32,20 @@ As we have seen in the :doc:`/hub/sdk/datasets/index` section, we can create a d
dataset = hub.datasets.create(
project_id="<PROJECT_ID>",
name="Production Data",
description="This dataset contains conversations that " \
description="This dataset contains chats that " \
"are automatically sampled from the production environment.",
)

- After having created the dataset, we can import conversations into it.
+ After having created the dataset, we can import chat test cases (conversations) into it.

- Import conversations
+ Import chat test cases
____________________

- We can import conversations into the dataset using the ``hub.conversations.create()`` method.
+ We can import chat test cases into the dataset using the ``hub.chat_test_cases.create()`` method.

.. code-block:: python

- hub.conversations.create(
+ hub.chat_test_cases.create(
dataset_id=dataset.id,

# A list of messages, without the last assistant answer
@@ -98,7 +98,7 @@ We can then format the testset to the correct format and create the dataset usin
dataset = hub.datasets.create(
project_id="<PROJECT_ID>",
name="RAGET Dataset",
description="This dataset contains conversations that are used to evaluate the RAGET model.",
description="This dataset contains chats that are used to evaluate the RAGET model.",
)

for sample in testset.samples:
@@ -155,7 +155,7 @@ We can then format the testset to the correct format and create the dataset usin
}
)

- hub.conversations.create(
+ hub.chat_test_cases.create(
dataset_id=dataset.id,
messages=messages,
checks=checks,