checks: add readability NLP metric check #2412
Conversation
Code Review
This pull request adds a Readability check using the textstat library, allowing users to validate text quality metrics like Flesch Reading Ease. The implementation includes optional dependency handling and threshold validation. Reviewers suggested expanding the list of supported metrics, ensuring JSON serializability for error details, and adding error handling around the metric computation to improve robustness.
ReadabilityMetric = Literal[
    "flesch_reading_ease",
    "flesch_kincaid_grade",
    "gunning_fog",
]
The ReadabilityMetric literal is currently limited to only three metrics. To better align with the "configurable metric" goal mentioned in the PR description, consider expanding this list to include other common metrics supported by textstat, such as automated_readability_index, coleman_liau_index, and dale_chall_readability_score.
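A sketch of the expanded alias, restricted to names that exist as functions in textstat's public API:

from typing import Literal

# Sketch only: extends the PR's alias with the additional textstat metrics
# suggested above.
ReadabilityMetric = Literal[
    "flesch_reading_ease",
    "flesch_kincaid_grade",
    "gunning_fog",
    "automated_readability_index",
    "coleman_liau_index",
    "dale_chall_readability_score",
]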
return CheckResult.failure(
    message=(
        f"Value for key '{self.key}' must be a string, but found "
        f"{type(text).__name__}."
    ),
    details={**details, "value": text},
)
Including the raw text value in the details dictionary when it is not a string (e.g., a complex dictionary or custom object) may lead to serialization errors if the CheckResult is converted to JSON for the UI or storage. It is safer to store a string representation of the value.
Suggested change:
-return CheckResult.failure(
-    message=(
-        f"Value for key '{self.key}' must be a string, but found "
-        f"{type(text).__name__}."
-    ),
-    details={**details, "value": text},
-)
+if not isinstance(text, str):
+    return CheckResult.failure(
+        message=(
+            f"Value for key '{self.key}' must be a string, but found "
+            f"{type(text).__name__}."
+        ),
+        details={**details, "value": str(text)},
+    )
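To illustrate the serialization hazard the comment describes, a minimal standard-library sketch (the set value is hypothetical):

import json

details = {"value": {1, 2, 3}}  # e.g. a set stored as the raw value
# json.dumps(details) raises TypeError: Object of type set is not JSON serializable
safe_details = {"value": str(details["value"])}
print(json.dumps(safe_details))  # {"value": "{1, 2, 3}"}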
score_fn = getattr(textstat, self.metric)
score = float(score_fn(text))
The call to textstat metrics can raise exceptions due to internal processing errors or missing NLTK data (e.g., the punkt tokenizer). Wrapping this in a try-except block and returning a CheckStatus.ERROR result makes the check more robust and provides better feedback than an unhandled exception.
try:
    score = float(getattr(textstat, self.metric)(text))
except Exception as e:
    return CheckResult(
        status=CheckStatus.ERROR,
        message=f"Failed to compute readability score ({self.metric}): {e}",
        details={**details, "error": str(e)},
    )
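A hedged test sketch for this error path, assuming pytest; the Readability constructor and the dict-based call convention are inferred from the PR description and diff context, not confirmed by the diff itself:

def test_metric_failure_returns_error_status(monkeypatch):
    # Hypothetical names: Readability, CheckStatus, and the call convention
    # below are assumptions based on the PR description.
    from giskard.checks import CheckStatus, Readability

    def broken_metric(_text):
        raise RuntimeError("punkt tokenizer data is missing")

    # Force the underlying textstat call to fail.
    monkeypatch.setattr("textstat.flesch_reading_ease", broken_metric)

    check = Readability(key="answer", metric="flesch_reading_ease", min_score=60.0)
    result = check({"answer": "Some sample text."})
    assert result.status == CheckStatus.ERROR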
Updated/added unit tests.
Hi @harsh21234i, can you resolve the conflict? Also, can we document some realistic and expected readability scores across the various proposed metrics, so the end user can understand the min-max paradigm? After that we should be good to merge.
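For context, commonly cited interpretation ranges for the proposed metrics (approximate guidance from the readability literature, not values defined in this PR; the EXPECTED_RANGES name is hypothetical):

# Higher Flesch Reading Ease means easier text; the other two metrics
# approximate US school grade levels, where higher means harder.
EXPECTED_RANGES = {
    "flesch_reading_ease": "0-100; 60-70 reads as plain English, 90+ very easy",
    "flesch_kincaid_grade": "US grade level; ~5 is easy, 12+ is difficult",
    "gunning_fog": "US grade level; below 12 suits a wide audience, 17+ is graduate-level",
}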
| "rich>=14.2.0,<15", | ||
| ] | ||
|
|
||
| [project.optional-dependencies] |
Can we make these extras more specific, like readability?
Done, can you recheck, sir!
Adds a new built-in Readability check backed by textstat, with
configurable metric, min_score, and max_score. Exposes it via
giskard.checks exports, wires it into giskard.checks.builtin, adds
textstat under the nlp optional extra (giskard-checks[nlp]), and includes
unit tests covering pass/fail cases, missing key/type errors, missing
dependency behavior, and invalid threshold ranges. Related to issue
#2349.
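A minimal usage sketch based on the description above, after installing the extra with pip install "giskard-checks[nlp]"; the import path, constructor arguments, and call convention are assumptions, not confirmed API:

from giskard.checks import Readability  # assumed export per the PR description

# Fail any answer scoring below plain-English territory on Flesch Reading Ease.
check = Readability(
    key="answer",
    metric="flesch_reading_ease",
    min_score=60.0,
)
result = check({"answer": "Short, clear sentences keep this score high."})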