@pierlj (Member) commented Dec 6, 2023

Description

Add multi-language support to the LLM scanner by prompting the generator model to write its evaluation queries in the specified languages.
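As a rough illustration of the prompt change, the sketch below appends a language requirement to a generator prompt. The helper name `add_language_requirement`, the instruction wording, and the use of ISO 639-1 codes such as "en" or "fr" are all assumptions for illustration, not the template used in this PR.

```python
from typing import Sequence


def add_language_requirement(base_prompt: str, languages: Sequence[str]) -> str:
    """Append an instruction asking the model to write in the given languages.

    Hypothetical helper: the instruction wording is illustrative only.
    `languages` is assumed to contain ISO 639-1 codes such as "en" or "fr".
    """
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(languages))
    if not unique:
        # No language requirement to add: return the prompt unchanged.
        return base_prompt
    return (
        f"{base_prompt}\n"
        f"Write the generated queries in the following languages: {', '.join(unique)}."
    )


prompt = add_language_requirement(
    "Generate 5 evaluation queries for the model described above.", ["en", "fr", "en"]
)
print(prompt)
# Generate 5 evaluation queries for the model described above.
# Write the generated queries in the following languages: en, fr.
```

Keeping the requirement as a separate appended sentence means the base generator prompts do not need to change when no language is detected.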

Related Issue

GSK-2152

Type of Change

  • Adds a method to the Dataset class that extracts the languages used in the dataset's 'text' columns.
  • Adds a language requirement to the generator prompts.
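A minimal sketch of what language extraction over text columns could look like. It assumes the columns are given as a plain mapping of column name to values, that free-text columns are labeled "text" in a column-type mapping, and that a pluggable `detect` function returns a language code for one string. `extract_languages` here is a hypothetical standalone function, not the Dataset method added in this PR, and `toy_detect` is a toy stand-in for a real detector (a library such as langdetect could fill that role, but that choice is an assumption).

```python
from collections import Counter
from typing import Callable, List, Mapping, Sequence


def extract_languages(
    columns: Mapping[str, Sequence[object]],
    column_types: Mapping[str, str],
    detect: Callable[[str], str],
    min_share: float = 0.0,
) -> List[str]:
    """Return the languages found in the 'text' columns, most frequent first.

    Hypothetical standalone sketch, not the Dataset method added by this PR.
    `detect` maps one string to a language code; `min_share` drops languages
    that account for less than that fraction of the detected texts.
    """
    counts: Counter = Counter()
    for name, values in columns.items():
        # Only free-text columns are considered.
        if column_types.get(name) != "text":
            continue
        for value in values:
            # Skip missing, non-string, or blank cells.
            if not isinstance(value, str) or not value.strip():
                continue
            counts[detect(value)] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [lang for lang, n in counts.most_common() if n / total >= min_share]


def toy_detect(text: str) -> str:
    # Toy stand-in for a real language detector: "fr" if a few common
    # French words appear, "en" otherwise.
    french_words = {"le", "la", "est", "les"}
    return "fr" if french_words & set(text.lower().split()) else "en"


columns = {
    "question": ["What is the capital of France?", "Quelle est la capitale ?", "How are you?"],
    "label": ["geo", "geo", "chat"],
}
column_types = {"question": "text", "label": "category"}
print(extract_languages(columns, column_types, toy_detect))  # ['en', 'fr']
```

Counting detections per row, rather than returning a set, lets a caller ignore languages that only show up in a handful of rows via `min_share`. A real detector may raise on very short or non-linguistic strings, so a production version would need to handle that case.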

@pierlj requested a review from mattbit December 6, 2023 09:03
linear bot commented Dec 6, 2023

@mattbit changed the title from "GSK-2152 Add multilanguages scanner inputs" to "Add language support in LLM generators [GSK-2152]" Dec 6, 2023
@pierlj marked this pull request as ready for review December 7, 2023 09:10
```python
def run(self, model: BaseModel, dataset: Dataset, features=None) -> Sequence[Issue]:
    # Generate inputs
    generator = ImplausibleDataGenerator(llm_temperature=0.1)
    # Detect the languages used in the dataset's text columns
    languages = dataset.extract_languages()
```
same as above

@mattbit enabled auto-merge December 8, 2023 11:56
sonarqubecloud bot commented Dec 8, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

  • Coverage: 100.0%
  • Duplication: 0.0%

@mattbit merged commit 12f1285 into main Dec 8, 2023
@mattbit deleted the GSK-2152-scanner-mulitlanguage-input branch December 8, 2023 12:26

Labels

None yet

3 participants