Support for custom LLM clients #1839
Conversation
The idea is to remove function_call and tool_calls and instead ask the model to return JSON-formatted answers?
I guess this is necessary to allow any model to work, but how does it impact the reliability of the current scan?
Also, I know it might increase complexity, but couldn't we have something like this instead, so we keep the feature (not necessary if the switch won't hurt scan reliability too much):
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Tool:
    tool_spec: Dict
    examples: Optional[List[Dict]] = None

    @property
    def to_format_instruction(self) -> str:
        pass

    def use_tool(self, chat_response) -> bool:
        pass

    def parse_response(self, chat_response) -> List[Dict]:  # List because the tool might be called more than once
        pass
class LLMClient(ABC):
    @abstractmethod
    def complete(self, messages, tools: List[Tool] = [], tool_choice: Optional[Tool] = None):
        ...


class MistralClient(LLMClient):
    def complete(self, messages, tools: List[Tool] = [], tool_choice: Optional[Tool] = None):
        # This logic should be moved into an adapter so we can reuse it for other LLMClients easily (and have cleaner code)
        if len(tools) > 0:
            tools_instruction = "\n".join(tool.to_format_instruction for tool in tools)
            if tool_choice is not None:
                tools_instruction += f"\nYou must call {tool_choice.tool_spec['name']}"
            # TODO: inject '{tools_instruction}' into `messages`
        ...

# OpenAIClient would use the native tool_calls API (if the model supports it)

I agree that we should keep the LLMClient as simple as possible. But I do believe the client should be the one responsible for serialising the message to the API format, rather than the feature having to be adapted to work for all LLMs (ideally we should have something easy to understand in the scan/evaluator/..., and the client should handle the translation based on the LLM's features/needs). Feel free to disregard this comment if it's deemed not worth implementing.
@kevinmessiaen please check the test details
Yes, basically to be as flexible as possible in what we can support. By now, most models support some kind of JSON-formatted output (through …)
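For context, a minimal sketch of what the JSON-output approach looks like in practice. This assumes a client exposing the complete() and ChatMessage interface discussed in this thread; the format parameter mirrors the one visible in the diffs below, and the helper name is purely illustrative.

import json
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChatMessage:
    role: str
    content: str


def complete_as_json(client, messages: Sequence[ChatMessage]) -> dict:
    # Ask the model for a JSON object instead of relying on provider-specific
    # tool calls, then parse it; any model able to follow formatting
    # instructions can be used this way.
    out = client.complete(messages=messages, format="json")
    try:
        return json.loads(out.content)
    except json.JSONDecodeError as err:
        raise ValueError("The model did not return valid JSON") from err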
Yeah, it would be nice to abstract tools, but then we would end up reimplementing langchain. I think it's wise to keep our code to a minimum for what is not strictly required. I realized that my first implementation with function calls was over-engineered. I feel it's worth waiting for now; maybe at some point there will be some lightweight open-source library replacing langchain ;)
if seed is not None:
    extra_params["random_seed"] = seed

if format not in (None, "json", "json_object") and "large" not in self.model:
format could be of Literal type
also pydantic's validate_call could do the validation job
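For illustration, that suggestion could look roughly like this. It is a sketch only: the helper name _make_completion_params is hypothetical, and the mapping of format to the provider's request parameters is assumed.

from typing import Literal, Optional

from pydantic import validate_call


@validate_call
def _make_completion_params(
    format: Optional[Literal["json", "json_object"]] = None,
    seed: Optional[int] = None,
) -> dict:
    # validate_call rejects unsupported `format` values with a ValidationError,
    # which replaces the manual membership check above.
    extra_params: dict = {}
    if seed is not None:
        extra_params["random_seed"] = seed
    if format is not None:
        extra_params["response_format"] = {"type": "json_object"}  # assumed provider parameter
    return extra_params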
giskard/llm/embeddings/__init__.py
Outdated
    return _default_embedding

# Try with OpenAI/AzureOpenAI if available
try:
nitpick: the two try blocks could be extracted into dedicated methods
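One way to implement this, shown here as a single generic helper rather than two dedicated methods (a sketch, not the actual giskard code): each former try block becomes a factory callable that raises ImportError when its backend is unavailable.

from typing import Callable, Optional, Sequence


def _first_available_embedding(factories: Sequence[Callable[[], object]]) -> Optional[object]:
    # Try each embedding backend factory in order and return the first one
    # that can actually be constructed.
    for factory in factories:
        try:
            return factory()
        except ImportError:
            continue
    return None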
giskard/llm/embeddings/base.py
Outdated
...


def batched(iterable, batch_size):
should be moved to utils.py
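For reference, a typical implementation of such a helper (itertools.batched only exists from Python 3.12, so an islice-based version is the usual fallback; this is a sketch, not necessarily the exact code in the PR):

from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], batch_size: int) -> Iterator[Tuple[T, ...]]:
    # Yield successive tuples of at most `batch_size` items from `iterable`.
    if batch_size < 1:
        raise ValueError("batch_size must be at least one")
    it = iter(iterable)
    while batch := tuple(islice(it, batch_size)):
        yield batch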
giskard/llm/evaluators/base.py
Outdated
success_examples: Sequence[dict]
errors: Sequence[dict]
details: Optional[TestResultDetails] = None
results: Sequence[EvaluationResultExample] = field(default_factory=list)
since we're calling .append below this should rather be a List, not a Sequence
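In other words, roughly the following (a minimal sketch: only the fields from the snippet above are shown, and the enclosing dataclass name is assumed for illustration):

from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class EvaluationResult:  # name assumed for illustration
    success_examples: Sequence[dict]
    errors: Sequence[dict]
    details: Optional["TestResultDetails"] = None
    # Sequence has no append(); a mutable List matches how the field is used.
    results: List["EvaluationResultExample"] = field(default_factory=list)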
giskard/llm/evaluators/base.py
Outdated
        return result

    def _evaluate_sample(self, model: BaseModel, sample: Dict) -> Tuple[bool, str, Dict]:
return type should be Tuple[bool, str]
giskard/llm/evaluators/coherency.py
Outdated
        if len(out.tool_calls) != 1 or "passed_test" not in out.tool_calls[0].function.arguments:
            raise LLMGenerationError("Invalid function call arguments received")

    def _format_messages(self, model: BaseModel, sample: Dict) -> Sequence[ChatMessage]:
missing meta argument
giskard/llm/evaluators/coherency.py
Outdated
outputs_2 = model.predict(dataset_2).prediction

inputs_1 = dataset_1.df.to_dict("records")
inputs_2 = dataset_2.df.loc[dataset_1.df.index].to_dict("records")
dataset_2 can be None
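i.e. the parameter is optional, so this access needs a guard, for example (sketch only; raising may not be the intended behaviour if a None dataset_2 is meant to be a supported case):

# Hypothetical guard: fail with a clear message rather than an AttributeError on dataset_2.df.
if dataset_2 is None:
    raise ValueError("The coherency evaluation requires a second (perturbed) dataset")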
    debug_description=debug_description_prefix + "that are <b>failing the evaluation criteria</b>.",
)
def test_llm_as_a_judge_ground_truth_similarity(
    model: BaseModel, dataset: Dataset, prefix: str = "The requirement should be similar to: ", rng_seed: int = 1729
prefix isn't used
# Conflicts:
#   pdm.lock
#   tests/utils.py
Generally looks good to me except for minor comments, failing tests and a bunch of typing issues (we should add mypy or something similar)
I tested the new test_llm_ground_truth with the Hub, it worked fine
Should be good now. I'm just not sure about the first feedback regarding …



Support for custom LLM clients (Mistral, OpenAI, Bedrock, etc.) in Giskard:
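Based on the interfaces discussed in this thread (LLMClient, ChatMessage, and a complete() method taking seed and format parameters), a custom client would look roughly like the sketch below. Treat the import path and the set_default_client registration helper as assumptions to check against the released API; my_backend_generate is a placeholder for whatever model you actually wrap.

from typing import Optional, Sequence

import giskard
from giskard.llm.client.base import ChatMessage, LLMClient  # assumed import path


def my_backend_generate(prompt_messages, temperature: float = 1.0, seed: Optional[int] = None) -> str:
    # Placeholder for your actual model call (local model, REST API, Bedrock, etc.).
    return "dummy answer"


class MyLLMClient(LLMClient):
    def complete(
        self,
        messages: Sequence[ChatMessage],
        temperature: float = 1.0,
        max_tokens: Optional[int] = None,
        seed: Optional[int] = None,
        format: Optional[str] = None,
    ) -> ChatMessage:
        # Call your own backend and wrap the answer in a ChatMessage.
        answer = my_backend_generate(
            [{"role": m.role, "content": m.content} for m in messages],
            temperature=temperature,
            seed=seed,
        )
        return ChatMessage(role="assistant", content=answer)


giskard.llm.set_default_client(MyLLMClient())  # assumed registration helper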