Conversation

@chakravarthik27
Collaborator

📢 Highlights

We’re thrilled to announce the latest LangTest release, bringing advanced benchmarks, new robustness testing, and improved developer experience to your model evaluation workflows.

  • 🩺 Autonomous Medical Evaluation for Guideline Adherence (AMEGA):
    We’ve integrated AMEGA, a comprehensive benchmark for assessing LLM adherence to clinical guidelines across 20 diagnostic scenarios spanning 13 specialties. The benchmark comprises 135 questions and 1,337 weighted scoring elements, providing one of the most rigorous frameworks for evaluating medical knowledge in real-world clinical settings.

  • 🧪 MedFuzz Robustness Testing:
    To better reflect real-world clinical complexity, we’re introducing MedFuzz, a healthcare-specific robustness approach that probes LLMs beyond standard benchmarks.

  • 🎲 Randomized Options in QA Tasks:
    To mitigate positional bias in multiple-choice evaluations, LangTest now supports a randomized option ordering test type in the robustness category.

  • 📝 ACI-Bench: Ambient Clinical Intelligence Benchmark:
    LangTest now supports evaluation with ACI-Bench, a benchmark for automatic visit note generation in clinical contexts.

  • 💬 MTS-Dialog: Clinical Summary Evaluation:
    We’ve added support for the MTS-Dialog dataset to evaluate models on dialogue-to-summary generation, including sectioned summaries (headers + contents) for more structured evaluation.

  • 🧠 MentalChat16K Clinical Evaluation Support:
    LangTest now supports the MentalChat16K dataset, enabling evaluation of LLMs in mental health–focused conversational contexts.

  • 🔒 Security Enhancements:
    Critical vulnerabilities and security issues have been addressed, reinforcing LangTest’s overall stability and safety.
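The idea behind the randomized option ordering test can be illustrated with a small self-contained sketch (the function name here is hypothetical, not LangTest’s API): shuffle a question’s answer options while remapping the gold label, so a model’s accuracy on the original and shuffled orderings can be compared to expose positional bias.

```python
import random

def randomize_options(options, answer_idx, seed=None):
    """Shuffle multiple-choice options and remap the gold answer index.

    Returns the shuffled options plus the new index of the original gold
    answer, so positional bias can be measured by comparing model accuracy
    on the original vs. the shuffled ordering.
    """
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    new_answer_idx = order.index(answer_idx)
    return shuffled, new_answer_idx

opts = ["aspirin", "heparin", "warfarin", "clopidogrel"]
shuffled, new_idx = randomize_options(opts, answer_idx=1, seed=7)
assert shuffled[new_idx] == "heparin"  # gold answer follows the shuffle
```

A model that only passes on the original ordering is likely keying on option position rather than content.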
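To make the notion of “weighted scoring elements” in AMEGA concrete, here is a generic weighted-adherence computation; it only illustrates the idea of weighting, and is not AMEGA’s exact rubric (which is defined by the benchmark itself):

```python
def weighted_adherence(elements):
    """Compute a weighted adherence score in [0, 1].

    `elements` is a list of (weight, met) pairs, where `met` says whether
    the model's answer satisfied that scoring element. Generic illustration
    of weighted scoring, not AMEGA's exact rubric.
    """
    total = sum(w for w, _ in elements)
    earned = sum(w for w, met in elements if met)
    return earned / total if total else 0.0

# Three scoring elements for one question: the model satisfied the two
# lighter-weighted elements but missed the heavily weighted one.
score = weighted_adherence([(2.0, True), (1.0, True), (3.0, False)])
# score == 3.0 / 6.0 == 0.5
```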

chakravarthik27 and others added 30 commits March 16, 2025 13:38
Co-authored-by: Copilot <[email protected]>
…th sample generation and tqdm progress tracking
Co-authored-by: Copilot <[email protected]>
…sks; implement AttackerLLM for adversarial learning in exam questions
…sage handling and reasoning prompt generation
…s; add MedFuzzSample for improved data handling
…-tests-in-robustness

Feature/implement the fuzz tests in robustness
… enhance DialogueToSummarySample with evaluation logic
- Simplified the `_build_prompt` method by removing it and directly using the `llm.predict` method in the `evaluate` function.
- Changed the `evaluate` method to accept single input and prediction dictionaries instead of lists.
- Updated error handling in the `evaluate` method to return a structured error response.
- Enhanced `evaluate_batch` to process a list of input and prediction pairs.
- Added support for specifying the number of samples and column names when loading datasets in `ClinicalNoteSummary`.
- Implemented a new method `aci_dialog` for loading ACI dialog datasets.
- Fixed typos in comments and class docstrings for clarity.
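The evaluate/evaluate_batch split described above can be sketched as follows (class, keys, and the toy metric are illustrative, not LangTest’s internals): a per-sample `evaluate` that accepts single input and prediction dicts and returns a structured result, catches errors into the same dict shape, and an `evaluate_batch` that maps it over pairs.

```python
class SummaryEvaluator:
    """Illustrative evaluator: per-sample dicts in, structured dicts out."""

    def evaluate(self, inputs: dict, prediction: dict) -> dict:
        """Score one (input, prediction) pair; never raise."""
        try:
            reference = inputs["summary"]
            candidate = prediction["summary"]
            # Toy metric: word overlap relative to the reference length.
            ref_words = set(reference.lower().split())
            cand_words = set(candidate.lower().split())
            score = len(ref_words & cand_words) / max(len(ref_words), 1)
            return {"pass": score >= 0.5, "score": round(score, 3)}
        except KeyError as exc:
            # Structured error response instead of a raised exception.
            return {"pass": False, "score": 0.0, "error": f"missing key: {exc}"}

    def evaluate_batch(self, pairs):
        """Evaluate a list of (inputs, prediction) pairs."""
        return [self.evaluate(i, p) for i, p in pairs]

ev = SummaryEvaluator()
results = ev.evaluate_batch([
    ({"summary": "patient reports chest pain"},
     {"summary": "chest pain reported by patient"}),
    ({"summary": "follow up in two weeks"}, {}),  # missing key -> error dict
])
```

Returning an error dict from the same method keeps batch processing total: one malformed sample cannot abort the whole run.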
…entalchat16k-dataset-support-for-clinical-evaluation

Feature/implement the mentalchat16k dataset support for clinical evaluation
@chakravarthik27 chakravarthik27 self-assigned this Sep 11, 2025
@chakravarthik27 chakravarthik27 marked this pull request as draft September 11, 2025 07:34
…s-for-270

Updates/websites updates for 270
…s-for-270

updated: api documentation with pacific ai links
@chakravarthik27 chakravarthik27 marked this pull request as ready for review September 19, 2025 12:43
@chakravarthik27 chakravarthik27 merged commit cdcadc7 into main Sep 19, 2025
3 checks passed