@mdelapenya
Member

What does this PR do?

This PR refactors the chat code to make it more testable by injecting the OpenAI values (baseURL, key, and model) as function arguments. This makes it possible to call the chat with different values, for example one set for production and another at test time.
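To illustrate the idea, here is a minimal sketch of a chat function that receives the three values as arguments. It is not the code from this PR: the names `newChatRequest` and `chat`, the `/chat/completions` path appended to the base URL, and the fake server used in `main` are all assumptions made for the example, and the request and response shapes follow the publicly documented OpenAI chat completions format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type choice struct {
	Message message `json:"message"`
}

type chatResponse struct {
	Choices []choice `json:"choices"`
}

// newChatRequest builds the HTTP request from the injected values.
// Hypothetical helper: nothing is read from globals or the environment.
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	// Assumption: baseURL already includes any version prefix such as /v1.
	req, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

// chat sends a single user prompt and returns the first choice's content.
func chat(baseURL, apiKey, model, prompt string) (string, error) {
	req, err := newChatRequest(baseURL, apiKey, model, prompt)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("no choices in response")
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	// At test time the same function can point at a local fake server.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(chatResponse{
			Choices: []choice{{Message: message{Role: "assistant", Content: "hello from the fake model"}}},
		})
	}))
	defer srv.Close()

	answer, err := chat(srv.URL, "test-key", "test-model", "Say hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(answer)
}
```

Because `chat` takes everything it needs as arguments, production code can pass the real endpoint and credentials while a test passes the URL of a local server or container.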

Once the code is more testable, this PR adds four different tests for the chat response:

  • using string comparison: it checks whether the response contains a certain string. Simple, although not very reliable.
  • using cosine similarity and embeddings: it calculates the embeddings (numerical vectors) for the answer and compares them, using cosine similarity, with the embeddings of a reference question.
  • using a vector database: it stores the "knowledge" or reference in a vector database and does RAG to augment the prompt with the relevant docs retrieved from it.
  • using an LLM-as-a-Judge: it creates an Evaluator, which is another model with a strict prompt, that evaluates the quality of the chat LLM's response.

With these four tests, users can understand the different approaches to testing when building GenAI apps.
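The first two approaches can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the tests added in this PR: `containsIgnoreCase` and `cosineSimilarity` are hypothetical helper names, the vectors in `main` are toy values standing in for real embeddings, and the 0.8 threshold is an arbitrary choice for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// containsIgnoreCase is the string-comparison check: it passes when the
// answer mentions the expected term, regardless of letter case.
func containsIgnoreCase(answer, want string) bool {
	return strings.Contains(strings.ToLower(answer), strings.ToLower(want))
}

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// It returns an error for empty, mismatched, or all-zero vectors.
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, fmt.Errorf("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, fmt.Errorf("cosine similarity is undefined for a zero vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

func main() {
	answer := "Docker packages applications into containers."
	fmt.Println("mentions docker:", containsIgnoreCase(answer, "docker"))

	// Toy vectors; a real test would obtain these from an embedding model.
	referenceEmbedding := []float64{1, 0, 1}
	answerEmbedding := []float64{1, 0.1, 0.9}

	sim, err := cosineSimilarity(referenceEmbedding, answerEmbedding)
	if err != nil {
		panic(err)
	}
	// Assumed threshold: the test passes when the vectors are close enough.
	fmt.Printf("similarity=%.3f pass=%v\n", sim, sim >= 0.8)
}
```

The string check fails as soon as the model rephrases its answer, whereas the similarity check tolerates different wording as long as the meaning stays close to the reference, which is why the description calls the first approach less reliable.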

Collaborator

nit: this should be fixed once you update your branch; the linter doesn't return any errors on main for me at the moment.

Collaborator

I noticed you had allowed edits, so I merged main into it.

@fiam fiam merged commit 9ca7d6f into docker:main Aug 6, 2025
1 check passed

2 participants