John Snow Labs Releases LangTest 2.4.0: Introducing Multimodal VQA Testing, New Text Robustness Tests, Enhanced Multi-Label Classification, Safety Evaluation, and NER Accuracy Fixes #1124
chakravarthik27 announced in Announcements
📢 Highlights
John Snow Labs is excited to announce the release of LangTest 2.4.0! This update introduces cutting-edge features and resolves key issues to further enhance model testing and evaluation across multiple modalities.
🔗 Multimodality Testing with VQA Task: We are thrilled to introduce multimodality testing, now supporting Visual Question Answering (VQA) tasks! With the addition of 10 new robustness tests, you can now perturb images to challenge and assess your model’s performance across visual inputs.
📝 New Robustness Tests for Text Tasks: LangTest 2.4.0 comes with two new robustness tests, add_new_lines and add_tabs, applicable to text classification, question-answering, and summarization tasks. These tests push your models to handle text variations and maintain accuracy.
🔄 Improvements to Multi-Label Text Classification: We have resolved accuracy and fairness issues affecting multi-label text classification evaluations, ensuring more reliable and consistent results.
🛡 Basic Safety Evaluation with Prompt Guard: We have incorporated safety evaluation tests using the PromptGuard model, offering crucial layers of protection to assess and filter prompts before they interact with large language models (LLMs), ensuring harmful or unintended outputs are mitigated.
🛠 NER Accuracy Test Fixes: LangTest 2.4.0 addresses and resolves issues within the Named Entity Recognition (NER) accuracy tests, improving reliability in performance assessments for NER tasks.
🔒 Security Enhancements: We have upgraded various dependencies to address security vulnerabilities, making LangTest more secure for users.
🔥 Key Enhancements
🔗 Multimodality Testing with VQA Task
In this release, we introduce multimodality testing, expanding your model’s evaluation capabilities with Visual Question Answering (VQA) tasks.
Key Features:
Test Type
image_resize
image_rotate
image_blur
image_noise
image_contrast
image_brightness
image_sharpness
image_color
image_flip
image_crop
How It Works:
Configuration: create a config.yaml
Harness Setup
Execution:
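As a rough illustration of what perturbing an image means in practice, the sketch below applies three simple transformations to a tiny grayscale image stored as nested lists. It is not LangTest's implementation: the helper names flip_horizontal, adjust_brightness, and crop are hypothetical, and they only loosely correspond to the image_flip, image_brightness, and image_crop test types listed above.

```python
from typing import List

# A grayscale image as rows of pixel intensities in the range 0..255.
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror every row left to right."""
    return [list(reversed(row)) for row in img]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor`, clamping the result to 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep the `height` x `width` window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


# Hypothetical 3x3 input used only for illustration.
sample = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
perturbed = adjust_brightness(flip_horizontal(sample), 1.5)
```

A robustness test would then ask the model the same question about the original and the perturbed image and compare the two answers.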
📝 Robustness Tests for Text Classification, Question-Answering, and Summarization
The new add_new_lines and add_tabs tests push your text models to manage input variations more effectively.
Key Features:
Tests
add_new_lines
add_tabs
How It Works:
Configuration: create a config.yaml
Harness Setup
Execution:
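The kind of perturbation these two tests describe can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not LangTest's code: the announcement does not say where or how many line breaks or tabs are inserted, so the helper insert_whitespace, its count and seed parameters, and the two wrapper names are all hypothetical.

```python
import random


def insert_whitespace(text: str, token: str, count: int, seed: int = 0) -> str:
    """Replace up to `count` randomly chosen single spaces between words with `token`."""
    words = text.split(" ")
    if len(words) < 2:
        return text  # nothing to perturb in a single-word input
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    gaps = set(rng.sample(range(1, len(words)), k=min(count, len(words) - 1)))
    pieces = [words[0]]
    for i in range(1, len(words)):
        separator = token if i in gaps else " "
        pieces.append(separator + words[i])
    return "".join(pieces)


def add_new_lines_sketch(text: str, count: int = 2, seed: int = 0) -> str:
    """Hypothetical stand-in for an add_new_lines style perturbation."""
    return insert_whitespace(text, "\n", count, seed)


def add_tabs_sketch(text: str, count: int = 2, seed: int = 0) -> str:
    """Hypothetical stand-in for an add_tabs style perturbation."""
    return insert_whitespace(text, "\t", count, seed)
```

Because only whitespace changes, a robust classifier, question-answering model, or summarizer would be expected to produce the same result for the original and the perturbed text.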
🛡 Basic Safety Evaluation with Prompt Guard
LangTest introduces safety checks using the prompt_guard model, providing essential safety layers for evaluating prompts before they are sent to large language models (LLMs), so that harmful or unethical outputs can be avoided.
Key Features:
Prompts are evaluated with the jailbreak_probabilities_score and injection_probabilities_score metrics before they are sent to LLMs.
Tests
jailbreak_probabilities_score
injection_probabilities_score
How It Works:
Configuration: create a config.yaml
Harness Setup
Execution:
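To make the two metrics more concrete, here is a minimal sketch of how classifier logits could be turned into per-prompt probabilities and checked against a threshold. Everything beyond the two metric names is an assumption: the three-label order (benign, injection, jailbreak), the 0.5 threshold, and the helpers prompt_scores and passes are hypothetical and are not taken from LangTest or from the Prompt Guard model itself.

```python
import math
from typing import Dict, Sequence

# Assumed label order for a three-class prompt classifier (hypothetical).
LABELS = ("benign", "injection", "jailbreak")


def softmax(logits: Sequence[float]) -> list:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_scores(logits: Sequence[float]) -> Dict[str, float]:
    """Map classifier logits to the two metric names used in this release."""
    probs = dict(zip(LABELS, softmax(logits)))
    return {
        "jailbreak_probabilities_score": probs["jailbreak"],
        "injection_probabilities_score": probs["injection"],
    }


def passes(scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """A prompt passes when every risk score stays below the (assumed) threshold."""
    return all(value < threshold for value in scores.values())
```

In this sketch a prompt whose jailbreak or injection probability reaches the threshold would be flagged before it is forwarded to the LLM.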
🐛 Fixes
⚡ Enhancements
What's Changed
Full Changelog: 2.3.1...2.4.0
This discussion was created from the release John Snow Labs Releases LangTest 2.4.0: Introducing Multimodal VQA Testing, New Text Robustness Tests, Enhanced Multi-Label Classification, Safety Evaluation, and NER Accuracy Fixes.