Releases: Pacific-AI-Corp/langtest
John Snow Labs NLP Test 1.5.0: Amplifying Model Comparisons, Bias Tests, Runtime Checks, Harnessing HF Datasets for Superior Text Classification and Introducing Augmentation Proportion Control
Overview
NLP Test 1.5.0 comes with brand-new features, including the ability to compare models from the same or different hubs within a single Harness across robustness, representation, bias, fairness and accuracy tests. It also adds support for runtime checks, the ability to pass custom replacement dictionaries for bias testing, support for Hugging Face datasets for the text classification task, and many other enhancements and bug fixes!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here!
New Features & Enhancements
- Adding support for Model Comparisons #514
- Adding support for passing custom replacement dictionaries #509
- Adding support for Hugging Face datasets for the text classification task #511
- Adding support for runtime checks #515
- Adding support for Augmentation Proportion Control #506
- Adding new tutorial notebooks #526
Bug Fixes
- Review issues with add-context for QA #507
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
# Defining a dictionary to run model comparisons
models = {
"ner.dl": "johnsnowlabs",
"en_core_web_sm": "spacy"
}
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='ner', model=models, data='/Path-to-test-conll')
# Generate test cases, run them and view a report
h.generate().run().report()
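Release 1.5.0 also adds augmentation proportion control, which caps how often each perturbation is applied when generating augmented training data. Below is a minimal sketch continuing from a Harness like the one above; the argument names (input_path, output_path, custom_proportions) and the proportions themselves are illustrative assumptions, so check the documentation for the exact signature.
# Hedged sketch: control how much of the augmented data each test contributes.
# Argument names, paths and values below are illustrative assumptions.
custom_proportions = {
    'add_typo': 0.3,   # roughly 30% of augmented samples from add_typo
    'lowercase': 0.2   # roughly 20% of augmented samples from lowercase
}
h.augment(
    input_path='/Path-to-train-conll',
    output_path='augmented_train.conll',
    custom_proportions=custom_proportions
)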
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- Fix/context-issue by @RakshitKhajuria in #507
- supports custom proportions for augument by @chakravarthik27 in #506
- Feature/ Add option to pass custom replacement dictionaries for bias tests by @RakshitKhajuria in #509
- feature/Add support for hf datasets for text classification task by @Prikshit7766 in #511
- test/hf-load-dataset by @Prikshit7766 in #517
- Features/model comparisons by @ArshaanNazir in #514
- Docs/nb docs update by @RakshitKhajuria in #518
- Feature/add runtime tests by @chakravarthik27 in #515
- Restructure quac dataset by @Prikshit7766 in #508
- Fix/runtime compare conflict by @alytarik in #522
- fix bug for runtime tests by @alytarik in #523
- fix coloring by @alytarik in #524
- support of hf dataset for jsl and spacy by @RakshitKhajuria in #521
- Chore/website updates by @ArshaanNazir in #519
- updated time unit in report() by @chakravarthik27 in #520
- augmentation and runtime tests nb by @chakravarthik27 in #525
- Chore/tutorial nbs and website updates by @ArshaanNazir in #526
- Release/1.5.0 by @ArshaanNazir in #527
Full Changelog: v1.4.0...v1.5.0
John Snow Labs NLP Test 1.4.0: Enhancing Support for Toxicity test and new QA benchmark datasets (NarrativeQA, TruthfulQA, QuAC, HellaSwag, MMLU and OpenBookQA)
Overview
NLP Test 1.4.0 comes with brand-new features, including new capabilities for testing Large Language Models for toxicity and support for new QA benchmark datasets (NarrativeQA, TruthfulQA, QuAC, HellaSwag, MMLU and OpenBookQA) across robustness, representation, fairness and accuracy tests. It also adds several new robustness tests and many other enhancements and bug fixes!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here!
New Features & Enhancements
- Adding support for NarrativeQA dataset #487
- Adding support for toxicity task #488
- Adding support for TruthfulQA dataset #477
- Adding support for new dyslexia swap test for robustness testing #474
- Adding support for new slangificator test for robustness testing #463
- Adding support for new abbreviation test for robustness testing #471
- Adding support for OpenBookQA dataset #479
- Adding support for MMLU dataset #481
- Adding support for HellaSwag dataset #486
- Adding new tutorial notebooks #497
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
import os

# Set OpenAI API keys
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='toxicity', model='text-davinci-002', hub='openai', data='toxicity-test-tiny')
# Generate test cases, run them and view a report
h.generate().run().report()
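The new QA benchmark datasets are loaded by name, just like the toxicity data above. The sketch below assumes a split identifier such as 'MMLU-test-tiny'; the exact dataset names shipped with this release are listed in the documentation.
import os
from nlptest import Harness

os.environ['OPENAI_API_KEY'] = ''

# Hedged sketch: run tests against one of the new QA benchmarks.
# The dataset identifier 'MMLU-test-tiny' is an assumption -- see the docs
# for the exact split names.
h = Harness(task='question-answering', model='text-davinci-002', hub='openai', data='MMLU-test-tiny')
h.generate().run().report()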
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- updated/doc by @Prikshit7766 in #459
- docs/Update documentation of models by @RakshitKhajuria in #465
- refactor user prompt by @alytarik in #472
- Feature/dyslexia swap feature by @ArkajyotiChakraborty in #417
- Feature/add support for abbreviation test by @RakshitKhajuria in #471
- Hotfix/get rid of some dependencies by @chakravarthik27 in #473
- Draft: refactor/perturbations and samples to support QA. by @chakravarthik27 in #460
- feature/Add speech to text typo by @Prikshit7766 in #475
- hotfix/get rid of inflect dependency and refactoring robustness by @RakshitKhajuria in #478
- Added TruthfulQA Dataset by @RakshitKhajuria in #477
- feature/Add support for slangificator test by @Prikshit7766 in #463
- Dataset/OpenBookQA datasets by @Prikshit7766 in #479
- Datasets/MMLU Datasets by @Prikshit7766 in #481
- Docs/update model hub-summarization nb-readme by @RakshitKhajuria in #480
- Hotfix/fixed some tests and refactored number_to_word.py by @RakshitKhajuria in #483
- Dataset/quac dataset by @Prikshit7766 in #484
- Feature/dyslexia swap test by @alytarik in #474
- Feature/hellaswag dataset by @alytarik in #486
- Feature/narrativeqa dataset by @alytarik in #487
- Feature/create toxicity test 438 by @chakravarthik27 in #488
- hot-fix/fix-slangify-test by @RakshitKhajuria in #489
- DRAFT : Docs/update nb and docs by @RakshitKhajuria in #490
- Update datasets by @RakshitKhajuria in #493
- Fix/toxicity by @chakravarthik27 in #492
- Feature/add tutorial nbs by @ArshaanNazir in #497
- default toxicity config by @chakravarthik27 in #498
- docs/add dataset notebooks by @alytarik in #499
- Release/1.4.0 by @ArshaanNazir in #500
New Contributors
- @ArkajyotiChakraborty made their first contribution in #417
Full Changelog: v1.3.0...v1.4.0
John Snow Labs NLP Test 1.3.0: Enhancing Support for Evaluating Large Language Models in Summarization
Overview
NLP Test 1.3.0 comes with brand-new features, including new capabilities for testing Large Language Models on the summarization task, with support for robustness, bias, representation, fairness and accuracy tests on the XSum dataset. It also adds fairness tests for the Question Answering datasets and many other enhancements and bug fixes!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here!
New Features & Enhancements
- Adding support for summarization with the XSum dataset #433
- Adding support for fairness tests for testing LLMs on Question Answering #430
- Adding support for accuracy/fairness tests for testing LLMs on summarization #446
- Adding new robustness test called add_ocr_typo #428
Bug Fixes
- Review issues with QAEval in OpenAI Natural Questions #444
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
import os

# Set OpenAI API keys
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='summarization', model='text-davinci-002', hub='openai', data='XSum-test', config='config.yml')
# Generate test cases, run them and view a report
h.generate().run().report()
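The config.yml passed above selects which tests to run and the minimum pass rate for each. A minimal sketch is shown below, written from Python so it can be pasted alongside the snippet above; the test names and thresholds are illustrative assumptions rather than the library's defaults.
# Hedged sketch of a config.yml for summarization testing.
# Test names and thresholds are illustrative assumptions.
config_yaml = """
tests:
  defaults:
    min_pass_rate: 0.65
  robustness:
    add_ocr_typo:
      min_pass_rate: 0.60
    uppercase:
      min_pass_rate: 0.60
"""
with open('config.yml', 'w') as f:
    f.write(config_yaml)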
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- Docs/website llm accuracy tests by @alytarik in #412
- Docs/website number to word robustnes test by @RakshitKhajuria in #416
- Release/1.2.0 by @ArshaanNazir in #425
- Docs/add disclaimer for QAEval by @RakshitKhajuria in #429
- feature/added ocr typo test by @Prikshit7766 in #428
- tutorials/Cleaned notebooks by @Prikshit7766 in #431
- feature/add-support-for-summarization by @ArshaanNazir in #433
- feature/fairness for qa task by @alytarik in #430
- Chore: add logos to landing page by @luca-martial in #435
- feature/add_ocr_typo_for_QA_and_Summarization by @Prikshit7766 in #436
- Fix/review issues with qa eval in open ai natural questions using custom prompt by @RakshitKhajuria in #444
- Feature/update bias in summarization by @ArshaanNazir in #445
- Feature/accuracy fairness for summarization by @alytarik in #446
- hot-fix: harness_config in Harness Class by @chakravarthik27 in #447
- Update/docs for summarization by @Prikshit7766 in #448
- fix format for qa task by @alytarik in #450
- hot-fix/XSum-test by @Prikshit7766 in #449
- update summarization prompt by @ArshaanNazir in #451
- Fix/tutorial nbs by @ArshaanNazir in #453
- DRAFT: Fix/max f1 score by @alytarik in #452
- Fix/tutorial nbs by @ArshaanNazir in #454
- fix eval score by @alytarik in #455
- update QA is_pass by @ArshaanNazir in #456
- Release/1.3.0 by @ArshaanNazir in #457
New Contributors
- @Prikshit7766 made their first contribution in #428
Full Changelog: v1.2.0...v1.3.0
John Snow Labs NLP Test 1.2.0: Announcing Support for Cohere, AI21, Azure OpenAI and Hugging Face Inference API
Overview
NLP Test 1.2.0 comes with brand-new features, including support for testing Cohere, AI21, Hugging Face Inference API and Azure OpenAI LLMs across robustness, bias, accuracy and representation tests on the BoolQ and Natural Questions datasets, plus many other enhancements and bug fixes!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here!
New Features & Enhancements
- Adding support for 4 new LLM APIs for Question Answering task #388
- Adding support for bias tests for testing LLMs on Question Answering #404
- Adding support for representation tests for testing LLMs on Question Answering #405
- Adding support for accuracy tests for testing LLMs on Question Answering #394
- Adding new robustness test called number_to_word #377
Bug Fixes
- Fixed bias tests to enable multi-token name replacements #400
- Fixed issue in ethnicity/religion-names #393
- Fixed issue in default HF text classification model #402
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
import os

# Set OpenAI API keys
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='question-answering', model='gpt-3.5-turbo', hub='openai', data='BoolQ-test', config='config.yml')
# Generate test cases, run them and view a report
h.generate().run().report()
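Switching to one of the newly supported hubs only changes the hub, model, and API-key environment variable. The values in the sketch below (the 'cohere' hub name, the model identifier, and the COHERE_API_KEY variable) are assumptions for illustration; consult the documentation for the exact values each provider expects.
import os
from nlptest import Harness

# Hedged sketch for a non-OpenAI hub. The hub name, model name and the
# environment variable are assumptions -- check the docs for exact values.
os.environ['COHERE_API_KEY'] = ''
h = Harness(task='question-answering', model='command-xlarge-nightly', hub='cohere',
            data='BoolQ-test', config='config.yml')
h.generate().run().report()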
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- fix/task test supoort check by @alytarik in #378
- Add boolq dev dataset by @alytarik in #390
- Issue 374 add representation tests by @ArshaanNazir in #381
- Issue in ethnicity religion names by @ArshaanNazir in #393
- Feature: Add representation tests for LLMs by @ArshaanNazir in #405
- Fix: default HF text classification model issue by @chakravarthik27 in #402
- Feature: Add support for bias tests for question answering by @ArshaanNazir in #404
- Chore: Adding supported hubs as logos to landing page by @luca-martial in #403
- Fix/bias_tests Enable multi-token name replacements by @ArshaanNazir in #400
- Feature: Add support for number to words robustness test by @RakshitKhajuria in #377
- Feature: Adding support for 4 new LLM APIs by @chakravarthik27 in #388
- DRAFT: Feature/accuracy for qa task by @alytarik in #394
- fix typo and order of columns by @alytarik in #406
- Fix/llm accuracy bug fix by @alytarik in #407
- Fix prompt template llm and transformer version by @ArshaanNazir in #408
- added number_to_words test to robustness nb by @RakshitKhajuria in #410
- notebooks and default_config paths updated. by @chakravarthik27 in #411
- Fix: switch default HF classifier dataset from tweet to imdb by @luca-martial in #409
- Chore: Website updates for new LLMs and pages by @luca-martial in #401
- Release/1.2.0 by @ArshaanNazir in #415
New Contributors
- @RakshitKhajuria made their first contribution in #377
Full Changelog: v1.1.0...v1.2.0
John Snow Labs NLP Test 1.1.0: Announcing Support for Testing LLMs
Overview
NLP Test 1.1.0 comes with brand-new features, including new capabilities for testing Large Language Models on Question Answering tasks, with support for testing OpenAI-based LLMs and for robustness tests on the BoolQ and Natural Questions datasets!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here!
New Features & Enhancements
- Support for testing OpenAI LLMs on Question Answering #361
- Support for BoolQ and Natural Questions datasets #361
- Improved layout for configuring tests #361
- Improved warning and error messaging #361
Bug Fixes
- Fixed overlapping and mis-formatted country names in dictionaries #347
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
import os

# Set OpenAI API keys
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='question-answering', model='gpt-3.5-turbo', hub='openai', data='BoolQ-test', config='config.yml')
# Generate test cases, run them and view a report
h.generate().run().report()
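Once the report is generated, the intermediate artifacts can be inspected as pandas DataFrames. The method names in the sketch below (testcases and generated_results) are assumptions based on the Harness API; check the documentation for the exact accessors.
# Hedged sketch: inspect generated test cases and per-sample results.
# Method names are assumptions -- see the docs for the exact API.
test_cases_df = h.testcases()         # perturbed questions per test type
results_df = h.generated_results()    # per-sample expected vs. actual results
print(results_df.head())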
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- fix country names by @alytarik in #347
- Fix/country names by @alytarik in #348
- Adding support for openAI model testing for question-answering on several benchmark datasets by @chakravarthik27 in #361
- update boolQ prompt by @ArshaanNazir in #366
- Chore: Website updates for LLM release by @luca-martial in #369
- Update notebooks by @alytarik in #368
- Release/1.1.0 by @luca-martial in #367
Full Changelog: v1.0.2...v1.1.0
John Snow Labs NLP Test 1.0.2: Patch Release
Overview
NLP Test 1.0.2 comes with several improvements and bug fixes, including a 7x speed-up in test generation, support for installation from conda-forge, brand-new Sphinx docs, fixes for token mismatches, and many other enhancements!
A big thank you to our early-stage community for their feedback, questions, and feature requests. A special thank you to @sugatoray for becoming the library's first contributor from outside of John Snow Labs!
Make sure to give the project a star right here!
New Features & Enhancements
- 7x speed-up through multithreading-based parallelization and other optimizations #325 #321
- Support for installation from conda-forge channel conda-forge/staged-recipes#22525
- Brand new sphinx docs and website updates #335
- Cleaner outputs when generating and running tests #317 #329
Bug Fixes
- Fixed token mismatch issues occurring in various edge-cases #328 #331
- Fixed representation and fairness test attribute errors in text classification #325
- Standardized model outputs for default text classification code blocks #325
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='ner', model='dslim/bert-base-NER', hub='transformers')
# Generate test cases, run them and view a report
h.generate().run().report()
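Since this release the library is also published on conda-forge (see the staged-recipes pull request above), so conda users can install it with conda install -c conda-forge nlptest, assuming the package name matches the one on PyPI.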
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- Add KDnuggets blogpost notebook by @luca-martial in #314
- Added workflow to let contributors self-assign issues by @sugatoray in #320
- fix invalid hub by @alytarik in #317
- refacto: Sample class by @JulesBelveze in #321
- remove protobuf dependency by @alytarik in #323
- Fix/new tutorials by @ArshaanNazir in #324
- Fix/remove pertubation,py by @ArshaanNazir in #327
- fix: realignment when trailing whitespace in Transformation by @JulesBelveze in #328
- Fix/remove cohyphonym test by @alytarik in #326
- remove default task by @alytarik in #329
- Fix/shouldnt generate after load by @alytarik in #330
- Integrate website alignment fixes into updated docs website branch by @luca-martial in #332
- Update quick_start.md with conda installation instruction by @sugatoray in #334
- fix alignment condition by @ArshaanNazir in #337
- fix: alignment add_contraction by @JulesBelveze in #331
- Refactoring Run Method by @chakravarthik27 in #325
- fixed: warning in augment by @chakravarthik27 in #340
- attribute error emtpy -> empty by @chakravarthik27 in #341
- Update website with new documentation and sphinx docs by @luca-martial in #335
- Release/1.0.2 by @luca-martial in #345
New Contributors
- @sugatoray made their first contribution in #320
Full Changelog: v1.0.1...v1.0.2
John Snow Labs NLP Test 1.0.1: Patch Release
Overview
NLP Test 1.0.1 comes with several improvements and bug fixes, including a clean display format for expected and actual results in NER tests, support for a default spaCy text classifier, a fix for token mismatches in transformers, and many other enhancements!
A big thank you to our early-stage community for their feedback, questions, and feature requests.
Make sure to give the project a star right here!
New Features & Enhancements
- Clean display for actual and expected results on NER tests #301
- Added default spaCy text classifier support #285
- Removed memory location display when calling Harness methods #302
- Enhanced error messages for spaCy model downloads #286
- Standardize NER model outputs for all supported libraries #289
Bug Fixes
- Fixed swap_entities augmentation failures #284
- Linked replace_to_inter_racial_lastnames and replace_to_native_american_lastnames to transformation #300
- Fixed token mismatch issue occurring with transformers #279
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='ner', model='dslim/bert-base-NER', hub='transformers')
# Generate test cases, run them and view a report
h.generate().run().report()
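The same three-line pattern works with spaCy pipelines, which this release improves with a default spaCy text classifier and clearer error messages for missing model downloads. A minimal NER sketch is shown below; it assumes the en_core_web_sm pipeline is already downloaded and that the library's default test data applies, as in the example above.
# Hedged sketch: test a spaCy pipeline instead of a transformers model.
# Assumes en_core_web_sm has been downloaded beforehand and that the
# library's default test data is used, as in the example above.
from nlptest import Harness
h = Harness(task='ner', model='en_core_web_sm', hub='spacy')
h.generate().run().report()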
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- Change default data_dir by @luca-martial in #277
- update tutorial notebook links by @ArshaanNazir in #278
- fix: add spaCy model download error message by @chakravarthik27 in #286
- Update README.md by @gadde5300 in #288
- strip bio-tag from jsl by @ArshaanNazir in #290
- chore: strip BIO tag in NEROutput comparison by @JulesBelveze in #289
- fix jsl offset issue by @ArshaanNazir in #293
- Issue 225 finalize augmentation issues by @chakravarthik27 in #284
- fix AddPunctuation test category by @ArshaanNazir in #295
- fix: add perturbation tests and compute transformations by @JulesBelveze in #279
- docs/Add disclaimers and information to tests by @alytarik in #291
- Implementing full test suite for GH actions by @alytarik in #285
- Add pydantic dependency by @luca-martial in #296
- add HF real world notebook by @ArshaanNazir in #298
- fix bias tests by @ArshaanNazir in #300
- Feature: NER label display cleanup by @luca-martial in #301
- Fix/remove output from h.generate() and h.run() and h.augment() by @alytarik in #302
- Fix/add contraction issue by @ArshaanNazir in #303
- Release v1.0.1 by @luca-martial in #306
New Contributors
- @gadde5300 made their first contribution in #288
Full Changelog: v1.0.0...v1.0.1
John Snow Labs - NLP Test 1.0.0: An open-source library for delivering safe & effective models into production!
Overview
We are very excited to release John Snow Labs' latest library: NLP Test! This is our first major step towards building responsible AI.
NLP Test is an open-source library for testing NLP models and datasets from all major NLP libraries in a few lines of code. The library has one goal: delivering safe & effective models into production.
Make sure to give the project a star right here!
Features
- Generate & run over 50 test types in a few lines of code
- Test all aspects of model quality: robustness, bias, representation, fairness and accuracy
- Automatically augment training data based on test results
- Support for popular NLP libraries: Spark NLP, Hugging Face Transformers & spaCy
- Support for popular NLP tasks: Named Entity Recognition and Text Classification
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code:
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='ner', model='dslim/bert-base-NER', hub='transformers')
# Generate test cases, run them and view a report
h.generate().run().report()
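One of the listed features is automatic augmentation of training data based on test results. A minimal sketch continuing from the Harness above is shown here; the argument names and file paths are illustrative assumptions, so check the documentation for the exact signature.
# Hedged sketch: generate an augmented training file informed by the report.
# Argument names and paths are illustrative assumptions.
h.augment(
    input_path='train.conll',
    output_path='augmented_train.conll'
)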
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Mission
While there is a lot of talk about the need to train AI models that are safe, robust, and fair, few tools have been made available to data scientists to meet these goals. As a result, the front line of NLP models in production systems reflects a sorry state of affairs.
We propose here an early-stage open-source community project that aims to fill this gap, and we would love for you to join us on this mission. We aim to build on the foundation laid by previous research such as Ribeiro et al. (2020), Song et al. (2020), Parrish et al. (2021), van Aken et al. (2021) and many others.
John Snow Labs has a full development team allocated to the project and is committed to improving the library for years, as we do with other open-source libraries. Expect frequent releases with new test types, tasks, languages, and platforms to be added regularly. We look forward to working together to make safe, reliable, and responsible NLP an everyday reality.