42 changes: 38 additions & 4 deletions docs/pages/docs/data.md
@@ -10,15 +10,16 @@ modify_date: "2019-05-16"

<div class="main-docs" markdown="1"><div class="h3-box" markdown="1">

Supported data input formats are task-dependent. For `ner` and `text-classification`, the user is meant to provide a **`CoNLL`** or **`CSV`** dataset. For `question-answering` the user is meant to choose from a list of benchmark datasets.
Supported data input formats are task-dependent. For `ner` and `text-classification`, the user is meant to provide a **`CoNLL`** or **`CSV`** dataset. For `question-answering`, `summarization` and `toxicity`, the user is meant to choose from a list of supported benchmark datasets.

{:.table2}
| Task | Supported Data Inputs |
| - | - |
|**ner** |CoNLL and CSV|
|**text-classification** |CSV
|**text-classification** |CSV or a Dictionary (containing the name, subset, split, feature_column and target_column for loading the HF dataset)
|**question-answering** |Select list of benchmark datasets
|**summarization** |Select list of benchmark datasets
|**toxicity** |Select list of benchmark datasets
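
For example, a minimal sketch of pointing the `Harness` at a local CoNLL file for `ner` (the file path used here is only a placeholder; the task-specific sections below give fuller examples):

```python
# Import Harness from the nlptest library
from nlptest import Harness

# Evaluate a spaCy NER pipeline on a local CoNLL test file (path is a placeholder)
harness = Harness(task='ner', model='en_core_web_sm', hub='spacy', data='path/to/test.conll')
```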

</div><div class="h3-box" markdown="1">

@@ -69,7 +70,7 @@ harness = Harness(task='ner',

### Text Classification

There is 1 option for datasets to test Text Classification models: **`CSV`** datasets. Here are some details of what these may look like:
There are 2 options for datasets to test Text Classification models: **`CSV`** datasets or a **`Dictionary`** containing the name, subset, split, feature_column and target_column for loading the HF dataset. Here are some details of what these may look like:

#### CSV Format for Text Classification

@@ -90,7 +91,7 @@ For `CSV` files, we support different variations of the column names.

</div><div class="h3-box" markdown="1">

#### Passing a Text Classification Dataset to the Harness
#### Passing a CSV Text Classification Dataset to the Harness

In the Harness, we specify the data input in the following way:

@@ -107,6 +108,39 @@ harness = Harness(task='text-classification',

</div><div class="h3-box" markdown="1">

#### Dictionary Format for Text Classification

To handle the text classification task with Hugging Face datasets, the `Harness` class accepts the `data` parameter as a dictionary with the following attributes:

```python
{
"name": "",
"subset": "",
"feature_column": "",
"target_column": "",
"split": ""
}
```

#### Passing a Hugging Face Dataset for Text Classification to the Harness

In the Harness, we specify the data input in the following way:

```python
# Import Harness from the nlptest library
from nlptest import Harness

harness = Harness(task="text-classification", hub="huggingface",
model="distilbert-base-uncased-finetuned-sst-2-english",
data={"name":'glue',
"subset":"sst2",
"feature_column":"sentence",
"target_column":'label',
"split":"train"
})
```
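
Once the harness is created, the rest of the workflow is the same as for any other data source. A short sketch of the usual `generate`/`run`/`report` chain (the same chain used in the one-liner examples):

```python
# Generate test cases, run them against the model and summarize the results
harness.generate().run().report()
```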

</div><div class="h3-box" markdown="1">

### Question Answering

To test Question Answering models, the user is meant to select a benchmark dataset from the following list:
4 changes: 2 additions & 2 deletions docs/pages/docs/harness.md
@@ -30,9 +30,9 @@ Here is a list of the different parameters that can be passed to the `Harness` class
| Parameter | Description |
| - | - |
|**task** |Task for which the model is to be evaluated ('text-classification', 'question-answering', 'ner')|
|**model** |Pretrained pipeline or model from the corresponding hub, or path to a saved model from the corresponding hub, or PipelineModel object - see [Model Input](https://nlptest.org/docs/pages/docs/model_input) for more details
|**model** |Pretrained pipeline or model from the corresponding hub, path to a saved model from the corresponding hub, PipelineModel object, or a dictionary mapping the names of the models you want to compare to their respective hubs - see [Model Input](https://nlptest.org/docs/pages/docs/model_input) for more details
|**hub** |Hub (library) to use in back-end for loading model from public models hub or from path|
|**data** |Path to the data to be used for evaluation. Should be `.csv` for text classification, or `.conll` or `.txt` file in CoNLL format for NER - see [Data Input](https://nlptest.org/docs/pages/docs/data_input) for more details
|**data** |Data to be used for evaluation. For text classification, either a path to a `.csv` file or a dictionary containing the name, subset, split, feature_column and target_column for loading the HF dataset; for NER, a `.conll` or `.txt` file in CoNLL format - see [Data Input](https://nlptest.org/docs/pages/docs/data_input) for more details
|**config** |Path to the YAML file with configuration of tests to be performed
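
For example, a minimal sketch of the dictionary form of the `model` parameter, comparing NER models from different hubs (the CoNLL path is a placeholder, and `config` is omitted here on the assumption that a default test configuration is applied):

```python
from nlptest import Harness

# Compare two pretrained NER models from different hubs on the same CoNLL test file
harness = Harness(task="ner",
                  model={"dslim/bert-base-NER": "huggingface",
                         "en_core_web_sm": "spacy"},
                  data="path/to/test.conll")
```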

</div></div>
30 changes: 30 additions & 0 deletions docs/pages/docs/one_liner.md
@@ -205,3 +205,33 @@ h.generate().run().report()
</div>

</div>


### One Liner - Model Comparisons

To compare different models (either from the same hub or from different hubs) on the same task and test configuration, you can pass a dictionary to the `model` parameter of the harness. This dictionary should map the names of the models you want to compare to their respective hubs.

<div id="one_liner_text_tab" class="tabs-wrapper h3-box">
<div class="tabs-body">
<div class="tabs-item">
<div class="highlight-box">
{% highlight python %}
from nlptest import Harness

# Define the dictionary
model_comparison_dict = {
"ner.dl":"johnsnowlabs",
"dslim/bert-base-NER":"huggingface",
"en_core_web_sm":"spacy"
}

# Create a Harness object
harness = Harness(task='ner', model=model_comparison_dict, data="/path-to-test-conll")

# Generate, run and get a report on your test cases
harness.generate().run().report()
{% endhighlight %}
</div>
</div>
</div>
</div>
2 changes: 2 additions & 0 deletions docs/pages/tutorials/tutorials.md
@@ -31,6 +31,7 @@ The following table gives an overview of the different tutorial notebooks.
|Representation Tests |John Snow Labs |NER |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/test-specific-notebooks/Representation_Demo.ipynb)|
|Robustness Tests |John Snow Labs |NER |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/test-specific-notebooks/Robustness_DEMO.ipynb)|
|Toxicity Test |OpenAI |Toxicity|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/Toxicity_NB.ipynb)|
|Custom Bias |Spacy |NER/Text-Classification|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb)|
|End-to-End Workflow |John Snow Labs |NER |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/end-to-end-notebooks/JohnSnowLabs_RealWorld_Notebook.ipynb)|
|End-to-End Custom Pipeline Workflow |John Snow Labs |NER |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/end-to-end-notebooks/JohnSnowLabs_RealWorld_Custom_Pipeline_Notebook.ipynb)|
|End-to-End Workflow |Spacy |NER |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/end-to-end-notebooks/Spacy_Real_World_Notebook.ipynb)|
@@ -46,6 +47,7 @@ The following table gives an overview of the different tutorial notebooks.
|TruthfulQA |OpenAI |Question-Answering |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/TruthfulQA_dataset.ipynb)|
|NarrativeQA |OpenAI |Question-Answering |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/NarrativeQA_Question_Answering.ipynb)|
|HellaSWag |OpenAI |Question-Answering |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/HellaSwag_Question_Answering.ipynb)|
|HuggingFaceDataset-Support |Hugging Face |Text-Classification |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/end-to-end-notebooks/HuggingFace_Dataset_Notebook.ipynb)|

<style>
.heading {
4 changes: 2 additions & 2 deletions setup.py
@@ -51,7 +51,7 @@
# For a discussion on single-sourcing the version across setup.py and the
# project code, see
# https://packaging.python.org/guides/single-sourcing-package-version/
version="1.4.0", # Required
version="1.5.0.rc.1", # Required
# This is a one-line description or tagline of what your project does. This
# corresponds to the "Summary" metadata field:
# https://packaging.python.org/specifications/core-metadata/#summary
@@ -174,7 +174,7 @@
"data/HellaSwag/*",
"data/toxicity/*",
"data/TruthfulQA/*",
"data/Quac/*",
"data/quac/*",
"data/OpenBookQA/*",
"data/MMLU/*",
"data/NarrativeQA/*"