
@skytin1004
Collaborator

@skytin1004 skytin1004 commented Aug 15, 2025

Enable hash-based change detection for notebook translator

Purpose

Implements hash-based change detection for Jupyter notebook translation so that unchanged notebooks are skipped and modified notebooks are retranslated. Previously, the notebook translator could not detect when a source notebook had changed, which led to unnecessary retranslations or outdated translations.

Description

This PR adds change detection to the notebook translator:

Key Features:

  • Hash-based change detection: Compares the hash of the original notebook with the hash stored in the translated notebook's metadata to detect changes
  • Smart translation skipping: Skips translation when notebooks are already up-to-date
  • Automatic retranslation: Triggers retranslation when source files are modified
  • Metadata integration: Adds coopTranslator metadata to translated notebooks for tracking
  • File type awareness: Ensures retranslation logic uses correct translator (notebook vs markdown)
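The skip/retranslate decision and the extension-based routing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper names (`sha256_hex`, `file_hash`, `needs_translation`, `pick_translator`), the use of SHA-256, and the translator labels are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex digest of raw bytes (SHA-256 is an assumed choice of hash)."""
    return hashlib.sha256(data).hexdigest()


def file_hash(path: Path) -> str:
    """Hash the source file exactly as it sits on disk."""
    return sha256_hex(path.read_bytes())


def needs_translation(current_hash: str, stored_hash: Optional[str], update: bool = False) -> bool:
    """Decide whether a target file must be (re)translated."""
    if update:
        return True  # an explicit update always retranslates
    if stored_hash is None:
        return True  # no metadata yet: treat the translation as outdated
    return current_hash != stored_hash  # source changed since the last translation


def pick_translator(path: Path) -> str:
    """Route by extension so notebooks never go through the markdown path."""
    suffix = path.suffix.lower()
    if suffix == ".ipynb":
        return "notebook"
    if suffix == ".md":
        return "markdown"
    raise ValueError(f"unsupported file type: {suffix!r}")
```

In use, a caller would compare `file_hash(Path("lesson.ipynb"))` against the hash read from the translated file's metadata and only invoke the translator returned by `pick_translator` when `needs_translation` is true.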

Technical Changes:

  • Moved notebook metadata functions from notebook_utils.py to metadata_utils.py for better organization
  • Added add_notebook_metadata(), read_notebook_metadata(), and is_notebook_up_to_date() functions
  • Enhanced JupyterNotebookTranslator to automatically add metadata after translation
  • Fixed retranslate_outdated_files() to use appropriate translator based on file extension
  • Added comprehensive test coverage for all new functionality
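A rough sketch of what notebook-level metadata helpers could look like, operating on a notebook loaded as a JSON dict. The `coopTranslator` key and the `original_hash`/`language_code` fields come from the commit notes in this PR; the function names and signatures here are hypothetical stand-ins for `add_notebook_metadata()`, `read_notebook_metadata()`, and `is_notebook_up_to_date()`, and checking the language code in the up-to-date test is an assumption.

```python
from typing import Any, Dict, Optional

META_KEY = "coopTranslator"  # key named in the PR's commit notes


def attach_metadata(notebook: Dict[str, Any], original_hash: str, language_code: str) -> Dict[str, Any]:
    """Record the source hash and target language in the notebook-level metadata."""
    meta = notebook.setdefault("metadata", {})
    meta[META_KEY] = {"original_hash": original_hash, "language_code": language_code}
    return notebook


def read_metadata(notebook: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the stored coopTranslator block, or None if the notebook has none."""
    return notebook.get("metadata", {}).get(META_KEY)


def is_up_to_date(translated: Dict[str, Any], current_hash: str, language_code: str) -> bool:
    """A translation without metadata counts as outdated (the backward-compatible default)."""
    meta = read_metadata(translated)
    if meta is None:
        return False
    return meta.get("original_hash") == current_hash and meta.get("language_code") == language_code
```

Treating a missing `coopTranslator` block as outdated is what lets notebooks translated before this change pick up metadata on their next run.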

Related Issue

Closes #[issue_number] (if applicable)

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?

  • Yes
  • No

This change is backward compatible. Existing translated notebooks without metadata will be treated as outdated and retranslated with the new metadata system.

Type of change

  • Bugfix
  • Feature
  • Code style update (e.g., formatting, local variables)
  • Refactoring (no functional or API changes)
  • Documentation content changes
  • Other... Please describe:

Checklist

Before submitting your pull request, please confirm the following:

  • I have thoroughly tested my changes: I confirm that I have run the code and manually tested all affected areas.
  • All existing tests pass: I have run all tests and confirmed that nothing is broken.
  • I have added new tests (if applicable): I have written tests that cover the new functionality introduced by my code changes.
  • I have followed the Co-op Translator's coding conventions: My code adheres to the style guide and coding conventions outlined in the repository.
  • I have documented my changes (if applicable): I have updated the documentation to reflect the changes where necessary.

Additional context

Files Modified:

  • src/co_op_translator/utils/common/metadata_utils.py - Added notebook metadata functions
  • src/co_op_translator/core/llm/jupyter_notebook_translator.py - Added metadata integration
  • src/co_op_translator/core/project/translation_manager.py - Enhanced retranslation logic
  • tests/ - Added comprehensive test coverage

Testing:

  • ✅ All existing tests pass
  • ✅ New tests cover metadata functions, notebook translation, and change detection
  • ✅ Manual testing confirms notebooks are correctly translated and tracked

Performance Impact:

  • Skips translation API calls entirely for unchanged notebooks
  • Adds minimal overhead from hash calculation and metadata operations
  • Reuses existing translations of unchanged cells, avoiding redundant translation work

This enhancement brings notebook translation in line with the existing markdown translation optimization, providing a consistent and efficient translation experience across all supported file types.

… up-to-date notebooks

- Add calculate_string_hash for per-cell change detection
- Store original_hash/language_code in notebook.metadata.coopTranslator
- Store source_hash in each markdown cell’s metadata (cell.metadata.coopTranslator)
- Reuse unchanged cells; only retranslate modified cells
- Skip notebook translation when original_hash matches unless update=True

Affects: JupyterNotebookTranslator, TranslationManager, metadata_utils
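The per-cell reuse described in this commit can be sketched as below. It is an illustration under stated assumptions, not the PR's implementation: `string_hash`, `cell_source`, and `translate_cells` are hypothetical names, SHA-256 over UTF-8 is assumed, and previous translations are looked up by stored `source_hash` rather than by cell position, which the PR may handle differently.

```python
import hashlib
from typing import Any, Callable, Dict, List


def string_hash(text: str) -> str:
    """Hash a cell's source text (SHA-256 over UTF-8 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cell_source(cell: Dict[str, Any]) -> str:
    """nbformat allows source as either a string or a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def translate_cells(
    source_cells: List[Dict[str, Any]],
    previous_cells: List[Dict[str, Any]],
    translate: Callable[[str], str],
) -> List[Dict[str, Any]]:
    """Retranslate only markdown cells whose source hash is not already known."""
    # Index earlier translations by the source hash stored on each translated cell.
    cache: Dict[str, str] = {}
    for cell in previous_cells:
        h = cell.get("metadata", {}).get("coopTranslator", {}).get("source_hash")
        if h is not None:
            cache[h] = cell_source(cell)

    out: List[Dict[str, Any]] = []
    for cell in source_cells:
        if cell.get("cell_type") != "markdown":
            out.append(cell)  # code and raw cells pass through untouched
            continue
        text = cell_source(cell)
        h = string_hash(text)
        translated = cache[h] if h in cache else translate(text)
        meta = dict(cell.get("metadata", {}))
        meta["coopTranslator"] = {"source_hash": h}  # remember what this cell was translated from
        out.append({**cell, "source": translated, "metadata": meta})
    return out
```

Keying reuse on the hash means an unchanged cell is never sent to the translator again, while any edit to a cell's text produces a new hash and forces just that cell to be retranslated.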
@skytin1004 skytin1004 self-assigned this Aug 15, 2025
@skytin1004 skytin1004 added the core Related to any changes in core source files label Aug 15, 2025
@github-actions github-actions bot changed the title Enable hash-based change detection for notebook translator Core: Enable hash-based change detection for notebook translator Aug 15, 2025
@github-actions github-actions bot added the tests label Aug 15, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 15, 2025
- Update error messages from Computer Vision to Azure AI Service
- Change environment variable references from AZURE_COMPUTER_VISION_KEY to AZURE_AI_SERVICE_API_KEY
- Rename AzureComputerVisionConfig class to AzureAIVisionConfig
- Update docstrings and comments to reflect Azure AI Service branding
Notebooks were always considered outdated because _is_translation_outdated was looking for HTML comment metadata instead of the JSON metadata format.
- docs: fix CLI option inconsistencies in command-reference.md
- docs: resolve README.md markdown linting errors
- docs: add beta warning for evaluation functionality
- fix: remove duplicate method definition in font_config.py
- docs: add evaluation command documentation and examples
@github-actions github-actions bot added the build Related to the build process, dependency management, and CI/CD configurations label Aug 17, 2025
@skytin1004
Collaborator Author

I have reviewed the changes and everything looks good.

@skytin1004 skytin1004 merged commit 235d463 into Azure:main Aug 17, 2025
2 checks passed
@skytin1004 skytin1004 deleted the enable-notebook-translator branch August 17, 2025 10:15

Labels

  • build: Related to the build process, dependency management, and CI/CD configurations
  • core: Related to any changes in core source files
  • documentation: Improvements or additions to documentation
  • tests
