Update text classification template labels in DatasetInfo __post_init__ #2392
Merged: lewtun merged 28 commits into huggingface:master from lewtun:refactor-text-clf-template on May 28, 2021
Commits
538f3be  Update labels in DatasetInfo __post_init__ (lewtun)
c02a2e4  Add emotion example (lewtun)
2eab30c  Flush task templates before casting (lewtun)
feaca48  Add labels to TextClassification __post_init__ (lewtun)
188d02c  Add comment about casting to tuple (lewtun)
1e3e830  Fix capitalisation (lewtun)
635e54d  Refactor tests to account for label update in `DatasetInfo`, add test (lewtun)
a892fde  Merge branch 'master' into refactor-text-clf-template (lewtun)
d85d73d  Merge branch 'master' into refactor-text-clf-template (lewtun)
43f9d55  Update label schema in post_init (lewtun)
5d66b4f  Use __dict__ instead of __setattr__ to update task template labels (lewtun)
e7b1f7a  Raise ValueError if TextClassification template has None or incompati… (lewtun)
6f3ff6d  Remove task templates from emotion demo (lewtun)
1bf0b5b  Add decorator to share docstrings across multiple functions (lewtun)
0dda59e  Update docstring for prepare_for_task (lewtun)
654b2b0  Reorder TextClassification args for better intuition (lewtun)
812bd87  fix missing "task" field in json + edit copy of objects instead of mo… (lhoestq)
159a6f6  style (lhoestq)
a580339  Fix failing tests due to new DatasetInfo.__post_init__ (lewtun)
cff9d52  Refactor TextClassification test to cover templates w / w-out labels (lewtun)
8146867  Refactor use of label names in task template concatenation test (lewtun)
fa53dc5  Add separate test for template with labels in DatasetInfo (lewtun)
f78d5c4  Fix log message (lewtun)
514890d  Fix comments (lewtun)
71d5ac8  Merge branch 'master' into refactor-text-clf-template (lewtun)
40e4400  Remove custom feature with lazy classlabel (lewtun)
321e4e6  Move conditional check of features to outer if statement (lewtun)
e79dfe4  Move feature is not None check to inner if-statement (lewtun)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,52 +1,32 @@ | ||
| from dataclasses import dataclass | ||
| from typing import ClassVar, Dict, List | ||
| from typing import ClassVar, Dict, Optional, Tuple | ||
|
|
||
| from ..features import ClassLabel, Features, Value | ||
| from .base import TaskTemplate | ||
|
|
||
|
|
||
| class FeaturesWithLazyClassLabel: | ||
| def __init__(self, features, label_column="labels"): | ||
| assert label_column in features, f"Key '{label_column}' missing in features {features}" | ||
| self._features = features | ||
| self._label_column = label_column | ||
|
|
||
| def __get__(self, obj, objtype=None): | ||
| if obj is None: | ||
| return self._features | ||
|
|
||
| assert hasattr(obj, self._label_column), f"Object has no attribute '{self._label_column}'" | ||
| features = self._features.copy() | ||
| features["labels"] = ClassLabel(names=getattr(obj, self._label_column)) | ||
| return features | ||
|
|
||
|
|
||
| @dataclass(frozen=True) | ||
| class TextClassification(TaskTemplate): | ||
| task: ClassVar[str] = "text-classification" | ||
| # `task` is not a ClassVar since we want it to be part of the `asdict` output for JSON serialization | ||
| task: str = "text-classification" | ||
|
||
| input_schema: ClassVar[Features] = Features({"text": Value("string")}) | ||
| # TODO(lewtun): Find a more elegant approach without descriptors. | ||
| label_schema: ClassVar[Features] = FeaturesWithLazyClassLabel(Features({"labels": ClassLabel})) | ||
| labels: List[str] | ||
| label_schema: ClassVar[Features] = Features({"labels": ClassLabel}) | ||
| text_column: str = "text" | ||
| label_column: str = "labels" | ||
| labels: Optional[Tuple[str]] = None | ||
|
|
||
| def __post_init__(self): | ||
| assert len(self.labels) == len(set(self.labels)), "Labels must be unique" | ||
| # Cast labels to tuple to allow hashing | ||
| self.__dict__["labels"] = tuple(sorted(self.labels)) | ||
| if self.labels: | ||
| assert len(self.labels) == len(set(self.labels)), "Labels must be unique" | ||
| # Cast labels to tuple to allow hashing | ||
| self.__dict__["labels"] = tuple(sorted(self.labels)) | ||
| self.__dict__["label_schema"] = self.label_schema.copy() | ||
| self.label_schema["labels"] = ClassLabel(names=self.labels) | ||
|
|
||
| @property | ||
| def column_mapping(self) -> Dict[str, str]: | ||
| return { | ||
| self.text_column: "text", | ||
| self.label_column: "labels", | ||
| } | ||
|
|
||
| @property | ||
| def label2id(self): | ||
| return {label: idx for idx, label in enumerate(self.labels)} | ||
|
|
||
| @property | ||
| def id2label(self): | ||
| return {idx: label for idx, label in enumerate(self.labels)} | ||
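
The `__post_init__` change above relies on two details of frozen dataclasses: normal attribute assignment is blocked, so the template writes through `self.__dict__`, and the class-level `label_schema` is copied onto the instance before it is filled in. The following is a minimal sketch of that pattern, not the library code. `TextClassificationSketch` is a hypothetical name, a plain `dict` stands in for `Features`, and the sorted label tuple stands in for `ClassLabel(names=...)`, so the example runs without `datasets` installed.

```python
from dataclasses import asdict, dataclass
from typing import ClassVar, Dict, Optional, Tuple


@dataclass(frozen=True)
class TextClassificationSketch:
    # Hypothetical stand-in for the template in the diff above.
    # `task` is a regular field (not a ClassVar), so asdict() includes it.
    task: str = "text-classification"
    # Assumption: a plain dict replaces Features({"labels": ClassLabel}).
    label_schema: ClassVar[Dict[str, object]] = {"labels": None}
    text_column: str = "text"
    label_column: str = "labels"
    labels: Optional[Tuple[str, ...]] = None

    def __post_init__(self):
        if self.labels:
            assert len(self.labels) == len(set(self.labels)), "Labels must be unique"
            # frozen=True blocks `self.labels = ...`, so write to __dict__ directly.
            # Casting to a sorted tuple keeps the instance hashable.
            self.__dict__["labels"] = tuple(sorted(self.labels))
            # Copy the class-level schema onto the instance so that filling in
            # the labels does not leak into other templates.
            self.__dict__["label_schema"] = dict(self.label_schema)
            self.label_schema["labels"] = self.labels


template = TextClassificationSketch(labels=["pos", "neg"])
print(template.labels)                        # sorted tuple of label names
print(asdict(template)["task"])               # "task" is part of the serialized dict
print(TextClassificationSketch.label_schema)  # class-level schema is unchanged
```

Because the copy is stored in the instance `__dict__`, attribute lookup on the instance finds the filled-in schema while the class attribute keeps its placeholder. The sketch annotates `labels` as `Tuple[str, ...]` because the tuple can hold any number of label names; that annotation is a choice made for this example.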
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| from typing import Callable | ||
|
|
||
|
|
||
| def is_documented_by(function_with_docstring: Callable): | ||
| """Decorator to share docstrings across common functions. | ||
|
|
||
| Args: | ||
| function_with_docstring (`Callable`): Name of the function with the docstring. | ||
| """ | ||
|
|
||
| def wrapper(target_function): | ||
| target_function.__doc__ = function_with_docstring.__doc__ | ||
| return target_function | ||
|
|
||
| return wrapper |
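
A short usage sketch for the decorator above, assuming two functions that should share one docstring. The function names `prepare_for_task_reference` and `prepare_for_task_alias` are hypothetical and chosen only for illustration; the decorator body repeats the one in the diff so the example is self-contained.

```python
from typing import Callable


def is_documented_by(function_with_docstring: Callable):
    """Decorator to share docstrings across common functions."""

    def wrapper(target_function):
        # Copy the docstring at decoration time and return the function unchanged.
        target_function.__doc__ = function_with_docstring.__doc__
        return target_function

    return wrapper


def prepare_for_task_reference(task):
    """Prepare a dataset for the given task (illustrative docstring)."""
    return task


@is_documented_by(prepare_for_task_reference)
def prepare_for_task_alias(task):
    # No docstring here: it is filled in by the decorator.
    return task


print(prepare_for_task_alias.__doc__)
```

Only `__doc__` is copied, so the decorated function keeps its own name, signature, and behaviour.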