[GSK-1590] native support for claude 3 and titan embeddings on Bedrock #1905

celmore25 · 2024-04-20T00:41:23Z

Description

As the Claude models become more popular, it is helpful to provide native support for these models in an AWS environment. This feature adds native support for using an LLM client with Amazon Bedrock. As a first step, it supports the Claude 3 family of text generation models as well as the Titan text embedding models. Having implemented this myself, I have seen the QA sets produced by Haiku to be a great blend of speed and useful for a very low cost.

Here is how I envision this feature being used in the current library's implementation. This follows the test set generation example here.

import os
import boto3
import pandas as pd

import giskard
from giskard.llm.client.bedrock import ClaudeBedrockClient
from giskard.llm.embeddings.bedrock import BedrockEmbedding
from giskard.rag import generate_testset, KnowledgeBase
from giskard.rag import QATestset

# setup the bedrock client and embedding model
bedrock_runtime = boto3.client("bedrock-runtime", region_name=os.environ["AWS_DEFAULT_REGION"])
oc = ClaudeBedrockClient(bedrock_runtime, model="anthropic.claude-3-haiku-20240307-v1:0")
embedding_model = BedrockEmbedding(bedrock_runtime, model="amazon.titan-embed-text-v1")
giskard.llm.set_default_client(oc)

# Load your data and initialize the KnowledgeBase
df = pd.read_csv("test_faqs.csv")
knowledge_base = KnowledgeBase(
    data=df,
    embedding_model=embedding_model,
)

# Generate a testset with 10 questions & answers for each question types (this will take a while)
testset = generate_testset(
    knowledge_base, 
    num_questions=60,
    language='en',  # optional, we'll auto detect if not provided
    agent_description="A customer support chatbot for Amazon SageMaker", # helps generating better questions
)

# Save the generated testset
testset.save("my_testset.jsonl")

# You can easily load it back
loaded_testset = QATestset.load("my_testset.jsonl")

# Convert it to a pandas dataframe
testset_df = loaded_testset.to_pandas()

Related Issue

GSK-1590

Type of Change

📚 Examples / docs / tutorials / dependencies update
🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
🔐 Security fix

Checklist

I've read the CODE_OF_CONDUCT.md document.
I've read the CONTRIBUTING.md guide.
I've written tests for all new methods and classes that I created.
I've written the docstring in Google format for all the methods and classes that I used.
I've updated the pdm.lock running pdm update-lock (only applicable when pyproject.toml has been
modified)

…ing models

luca-martial · 2024-04-22T08:44:27Z

Thanks a lot for this contribution @celmore25 ! We'll review it this week

Those param will be ignored but that's not a big issue. Missing them would cause certain feature to totally fail using this model.

kevinmessiaen

Thanks for your contribution!

I've done a small modification to ensure that the Client will still work when instructed to use format or seed. Those params aren't necessary for the model to work properly for the scan and tests since the format are also instructed in our promps and seeds are nice to have.

celmore25 · 2024-04-22T16:39:00Z

@kevinmessiaen - I not aware why those parameters were there; thanks for fixing that and for the clarification :)

Looks like the merge is blocked based on this failed test.

FAILED tests/integrations/test_mlflow.py::test_fast[drug_classification_data-drug_classification_model]

I am not sure what the issue is here. I did not see any change with our commits would have impacted that integration test. Any thoughts here?

kevinmessiaen · 2024-04-23T01:48:34Z

@kevinmessiaen - I not aware why those parameters were there; thanks for fixing that and for the clarification :)

Looks like the merge is blocked based on this failed test.
FAILED tests/integrations/test_mlflow.py::test_fast[drug_classification_data-drug_classification_model]
I am not sure what the issue is here. I did not see any change with our commits would have impacted that integration test. Any thoughts here?

It seems that the issue is global as we have it in other branches. I'll investigate and fix it on main so we can propagate the fix in other branches.

PS: It seems that the error was caused by the release to mlflow 2.12.1 which cause an error with imbalance-learn

mlflow 2.12.1 causes issues with imbalance-learn

celmore25 · 2024-04-23T04:31:26Z

Thanks for making the fix @kevinmessiaen

celmore25 added 2 commits April 19, 2024 17:08

adding native support for claude 3 on bedrock as well as titan embedd…

faa64a3

…ing models

adding unit tests

8a765dc

luca-martial linked an issue Apr 22, 2024 that may be closed by this pull request

Feature: Amazon Bedrock Integration #1590

Closed

kevinmessiaen self-requested a review April 22, 2024 08:46

kevinmessiaen and others added 4 commits April 22, 2024 16:30

Merge branch 'main' into feature/bedrock-claude3

1e45fa2

Added seed and format param to ClaudeBedrockClient.complete

c1da31c

Those param will be ignored but that's not a big issue. Missing them would cause certain feature to totally fail using this model.

Ran code formatter

e60af31

Added example on how to use ClaudeBedrockClient on the doc

a743f80

kevinmessiaen approved these changes Apr 22, 2024

View reviewed changes

Updated pdm.lock

cff6057

kevinmessiaen force-pushed the feature/bedrock-claude3 branch from 4f8c07f to cff6057 Compare April 22, 2024 14:06

kevinmessiaen enabled auto-merge April 22, 2024 14:19

Downgraded mlflow

a8e2e8c

mlflow 2.12.1 causes issues with imbalance-learn

kevinmessiaen merged commit cf9a4bb into Giskard-AI:main Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[GSK-1590] native support for claude 3 and titan embeddings on Bedrock #1905

[GSK-1590] native support for claude 3 and titan embeddings on Bedrock #1905

Uh oh!

celmore25 commented Apr 20, 2024 •

edited

Loading

Uh oh!

luca-martial commented Apr 22, 2024

Uh oh!

kevinmessiaen left a comment

Uh oh!

celmore25 commented Apr 22, 2024

Uh oh!

kevinmessiaen commented Apr 23, 2024 •

edited

Loading

Uh oh!

celmore25 commented Apr 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Uh oh!

[GSK-1590] native support for claude 3 and titan embeddings on Bedrock #1905

[GSK-1590] native support for claude 3 and titan embeddings on Bedrock #1905

Uh oh!

Conversation

celmore25 commented Apr 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Type of Change

Checklist

Uh oh!

luca-martial commented Apr 22, 2024

Uh oh!

kevinmessiaen left a comment

Choose a reason for hiding this comment

Uh oh!

celmore25 commented Apr 22, 2024

Uh oh!

kevinmessiaen commented Apr 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

celmore25 commented Apr 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

celmore25 commented Apr 20, 2024 •

edited

Loading

kevinmessiaen commented Apr 23, 2024 •

edited

Loading