Update duration-filtering.md #1243

arhamm1 · 2025-11-19T00:52:38Z

Description

Fixed unclear sections.

Can someone confirm whether the current code snippet under CalculateSpeechRateStage works?

If not, should something like this be added?

Calculating Speech Rate Metrics

To calculate speech rate, create a custom stage using the utility functions from NeMo Curator:

from dataclasses import dataclass
from nemo_curator.stages.audio.common import LegacySpeechStage
from nemo_curator.stages.audio.metrics.get_wer import get_wordrate, get_charrate
from nemo_curator.tasks import AudioBatch


@dataclass
class CalculateSpeechRateStage(LegacySpeechStage):
    """
    Calculate speech rate metrics (word rate and character rate) for audio samples.
    
    Args:
        text_key: Key containing the transcript text
        duration_key: Key containing the audio duration in seconds
        word_rate_key: Key to store words per second (default: "word_rate")
        char_rate_key: Key to store characters per second (default: "char_rate")
    """
    text_key: str = "text"
    duration_key: str = "duration"
    word_rate_key: str = "word_rate"
    char_rate_key: str = "char_rate"
    
    def process_dataset_entry(self, data_entry: dict) -> list[AudioBatch]:
        """Calculate and add speech rate metrics to the data entry."""
        text = data_entry[self.text_key]
        duration = data_entry[self.duration_key]
        
        # Calculate rates using utility functions
        data_entry[self.word_rate_key] = get_wordrate(text, duration)
        data_entry[self.char_rate_key] = get_charrate(text, duration)
        
        return [AudioBatch(data=data_entry)]
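
For reference, here is a minimal standalone sketch of the arithmetic the stage above is expected to perform. It assumes word rate means whitespace-delimited words per second and character rate means characters per second. The helpers `words_per_second`, `chars_per_second`, and `add_speech_rates` are hypothetical names used only for illustration; they are not the NeMo Curator implementations of `get_wordrate` and `get_charrate`, whose rounding and zero-duration handling may differ.

```python
def words_per_second(text: str, duration: float) -> float:
    """Hypothetical helper: whitespace-delimited word count divided by duration."""
    if duration <= 0:
        # Assumption: reject non-positive durations instead of dividing by zero.
        raise ValueError("duration must be positive")
    return len(text.split()) / duration


def chars_per_second(text: str, duration: float) -> float:
    """Hypothetical helper: character count (spaces included) divided by duration."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(text) / duration


def add_speech_rates(
    entry: dict,
    text_key: str = "text",
    duration_key: str = "duration",
    word_rate_key: str = "word_rate",
    char_rate_key: str = "char_rate",
) -> dict:
    """Return a copy of the entry with both rate metrics added."""
    result = dict(entry)
    text = entry[text_key]
    duration = entry[duration_key]
    result[word_rate_key] = words_per_second(text, duration)
    result[char_rate_key] = chars_per_second(text, duration)
    return result


sample = add_speech_rates({"text": "hello world", "duration": 2.0})
print(sample["word_rate"], sample["char_rate"])
```

If the library functions do not guard against a zero duration, filtering out such entries before this stage would avoid a division error.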

## Checklist
<!--
Note: All commits need to be signed and signed off. This can be done via the `-sS` flags while committing:
`git commit -sS -m "...."`
-->
- [ ] I am familiar with the [Contributing Guide](https://github.com/NVIDIA-NeMo/Curator/blob/main/CONTRIBUTING.md).
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.

Signed-off-by: Arham Mehta <[email protected]>

greptile-apps · 2025-11-19T00:53:32Z

Greptile Overview

Greptile Summary

Documentation improvements that enhance clarity and usability without introducing any code changes. The PR addresses unclear sections by adding helpful context about speech rate metrics, reorganizing the filtering section with better headings and transitions, and providing a practical reference table with typical speech rate ranges for different contexts.

- Added introductory explanation for speech rate metrics
- Reorganized "Filtering by Speech Rate" section with clearer structure
- Added comprehensive reference table showing typical speech rate ranges (slow/clear, normal, fast, very fast) with corresponding use cases
- All referenced functions (get_wordrate(), get_charrate()) exist in the codebase
- No functional code changes, only documentation improvements

Confidence Score: 5/5

- This PR is safe to merge with no risk; it only contains documentation improvements.
- Documentation-only changes that improve clarity without modifying any functional code. All referenced functions and modules exist in the codebase. The additions are well-structured and provide valuable reference information for users.
- No files require special attention.

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| docs/curate-audio/process-data/quality-assessment/duration-filtering.md | 5/5 | Improved documentation clarity by adding context about speech rate metrics, reorganizing sections, and adding a helpful reference table for speech rate ranges |

Sequence Diagram

sequenceDiagram
    participant User
    participant Documentation
    participant SpeechRate as Speech Rate Section
    participant Table as Reference Table
    
    User->>Documentation: Read duration-filtering.md
    Documentation->>SpeechRate: Added introductory context (line 109)
    Note over SpeechRate: "Speech rate metrics help identify<br/>appropriate speaking speeds"
    
    Documentation->>SpeechRate: Reorganized "Filtering by Speech Rate" (line 149)
    Note over SpeechRate: Improved section heading and<br/>added transitional text
    
    Documentation->>Table: Added "Normal Speech Rate Ranges" table (line 189-198)
    Note over Table: Reference values for:<br/>- Slow/Clear: 1.5-2.5 WPS<br/>- Normal: 2.5-4.0 WPS<br/>- Fast: 4.0-5.0 WPS<br/>- Very Fast: >5.0 WPS
    
    User->>Table: Reference speech rate values
    Table-->>User: Context-specific guidance
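
The reference ranges named in the diagram can be turned into a simple lookup. The sketch below is illustrative only: the function name `classify_word_rate`, the label strings, the handling of values below 1.5 WPS, and the choice of which bucket owns each shared boundary (2.5 and 4.0) are assumptions, since the summary lists the ranges without specifying those details.

```python
def classify_word_rate(wps: float) -> str:
    """Map a words-per-second value to one of the reference ranges.

    Ranges from the summary: slow/clear 1.5-2.5, normal 2.5-4.0,
    fast 4.0-5.0, very fast >5.0. Boundary ownership is an assumption.
    """
    if wps < 0:
        raise ValueError("word rate cannot be negative")
    if wps < 1.5:
        # Assumption: the summary gives no label for rates under 1.5 WPS.
        return "below_range"
    if wps < 2.5:
        return "slow_clear"
    if wps <= 4.0:
        return "normal"
    if wps <= 5.0:
        return "fast"
    return "very_fast"


for rate in (1.0, 2.0, 3.0, 4.5, 5.1):
    print(rate, classify_word_rate(rate))
```

A lookup like this could help when choosing filter thresholds, but the ranges themselves are guidance from the documentation rather than hard limits.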

greptile-apps

1 file reviewed, no comments


greptile-apps

1 file reviewed, no comments


Update duration-filtering.md

e57b45b

Signed-off-by: Arham Mehta <[email protected]>

arhamm1 requested review from karpnv and lbliii November 19, 2025 00:52

greptile-apps bot reviewed Nov 19, 2025


lbliii approved these changes Nov 25, 2025


Merge branch 'main' into arhamm1-patch-6

6a08276

lbliii enabled auto-merge (squash) November 25, 2025 19:12

lbliii merged commit 92c586b into main Nov 25, 2025
12 checks passed

greptile-apps bot reviewed Nov 25, 2025

