GSK-2376 Add seed to LLM generators and evaluators in order to ensure reproducibility #1824

kevinmessiaen · 2024-02-27T08:28:34Z

Description

Added rng_seed param to evaluators, generators and tests using LLM in. order to ensure reproducibility.

Related Issue

GSK-2376 (Available on Linear)

Type of Change

📚 Examples / docs / tutorials / dependencies update
🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
🔐 Security fix

… reproducibility

linear · 2024-02-27T08:28:38Z

GSK-2376 Add seed to LLM generators and evaluators in order to ensure reproducibility

sonarqubecloud · 2024-02-28T02:32:57Z

Quality Gate failed

Failed conditions
68.4% Coverage on New Code (required ≥ 70%)

See analysis details on SonarCloud

Hartorn

No issue for me, would like @mattbit or @rabah-khalek opinion

rabah-khalek · 2024-02-28T12:56:16Z

looks okay to me, although one caveat: https://github.com/openai/openai-python/blob/a7115b5f33acd27326e5f78e19beb0d73bd3268e/src/openai/resources/chat/completions.py#L172-L176

          seed: This feature is in Beta. If specified, our system will make a best effort to
              sample deterministically, such that repeated requests with the same `seed` and
              parameters should return the same result. Determinism is not guaranteed, and you
              should refer to the `system_fingerprint` response parameter to monitor changes
              in the backend.

Could you add this to the docstrings @kevinmessiaen ?

Let me test it on an example and get back you with my final review

rabah-khalek

Apparently I ran out of quota to run gpt-models. I asked @kevinmessiaen and he verified it on the product description tutorial. Since we can't make functional tests with openai. There's nothing much more to do here (maybe just the docstring clarification, but that's minor).

GSK-2376 Add seed to LLM generators and evaluators in order to ensure…

f22be70

… reproducibility

kevinmessiaen added 3 commits February 27, 2024 16:29

GSK-2376 Fixed expected seed

fc67e7d

GSK-2376 Added seed where missing

59e921c

GSK-2376 Fixed typo

843c6cd

kevinmessiaen marked this pull request as ready for review February 27, 2024 10:23

Merge branch 'main' into GSK-2376

839dc84

Hartorn approved these changes Feb 28, 2024

View reviewed changes

Hartorn requested review from mattbit and rabah-khalek February 28, 2024 09:09

rabah-khalek approved these changes Feb 28, 2024

View reviewed changes

Hartorn merged commit 1d9aa62 into main Feb 28, 2024

Hartorn deleted the GSK-2376 branch February 28, 2024 13:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

GSK-2376 Add seed to LLM generators and evaluators in order to ensure reproducibility #1824

GSK-2376 Add seed to LLM generators and evaluators in order to ensure reproducibility #1824

Uh oh!

kevinmessiaen commented Feb 27, 2024

Uh oh!

linear bot commented Feb 27, 2024

Uh oh!

sonarqubecloud bot commented Feb 28, 2024

Uh oh!

Hartorn left a comment

Uh oh!

rabah-khalek commented Feb 28, 2024 •

edited

Loading

Uh oh!

rabah-khalek left a comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

Uh oh!

GSK-2376 Add seed to LLM generators and evaluators in order to ensure reproducibility #1824

GSK-2376 Add seed to LLM generators and evaluators in order to ensure reproducibility #1824

Uh oh!

Conversation

kevinmessiaen commented Feb 27, 2024

Description

Related Issue

Type of Change

Uh oh!

linear bot commented Feb 27, 2024

Uh oh!

sonarqubecloud bot commented Feb 28, 2024

Quality Gate failed

Uh oh!

Hartorn left a comment

Choose a reason for hiding this comment

Uh oh!

rabah-khalek commented Feb 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rabah-khalek left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

rabah-khalek commented Feb 28, 2024 •

edited

Loading