-
Notifications
You must be signed in to change notification settings - Fork 69
Core: Add Jupyter Notebook (.ipynb) translation support #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: doggy8088 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for translating Jupyter Notebook (.ipynb) files by introducing a JupyterNotebookTranslator and weaving it into the existing translation workflow.
- Added
JupyterNotebookTranslatorto extract and translate markdown cells while preserving code and metadata. - Introduced
SUPPORTED_NOTEBOOK_EXTENSIONSand integrated notebook handling in bothProjectTranslatorandTranslationManager. - Updated async workflows to include notebook translation alongside markdown.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/co_op_translator/core/project/test_jupyter_notebook_integration.py | New integration tests for notebook translation |
| tests/co_op_translator/core/llm/test_jupyter_notebook_translator.py | New unit tests for JupyterNotebookTranslator |
| src/co_op_translator/config/constants.py | Defined SUPPORTED_NOTEBOOK_EXTENSIONS |
| src/co_op_translator/core/llm/jupyter_notebook_translator.py | Implemented notebook translation logic |
| src/co_op_translator/core/llm/init.py | Exposed JupyterNotebookTranslator in __all__ |
| src/co_op_translator/core/project/project_translator.py | Initialized notebook translator and updated workflow |
| src/co_op_translator/core/project/translation_manager.py | Added methods to discover and translate notebooks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @doggy8088, great work on adding Jupyter notebook support! 👍
I noticed there's a small issue with the filter_files that's causing some test failures.
Note
File: src/co_op_translator/utils/common/file_utils.py
Lines: 178-204
The function currently only accepts 2 parameters, but in translation_manager.py it's being called with 3 arguments (including the file extension). This is throwing a TypeError.
Could you please update the function signature to include the optional extension parameter and update the logic ? Something like:
def filter_files(directory: str | Path, excluded_dirs, extension: str = None) -> list:
"""
Filter and return only the files in the given directory, excluding specified directories.
Optionally filter by file extension.
Args:
directory (str | Path): The directory path to search for files.
excluded_dirs (set): A set of directory names to exclude from the search.
extension (str, optional): File extension to filter by (e.g., '.ipynb').
If None, all file types are included.
Returns:
list: A list of Path objects representing only the files (excluding specified directories).
"""
directory = Path(directory)
files = []
# Recursively traverse the directory
for path in directory.rglob("*"):
# Check if the path is a file, matches extension if specified, and doesn't contain excluded dirs
if (
path.is_file()
and (extension is None or path.suffix.lower() == extension.lower())
and not any(excluded_dir in path.parts for excluded_dir in excluded_dirs)
):
files.append(path)
return files
I've tested the fixes locally and they work perfectly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the test file needs to be updated to properly mock the JupyterNotebookTranslator and avoid test failures.
Note
File: tests/co_op_translator/core/project/test_project_translator.py
I've tested the fixes locally and they work perfectly! Here are the complete updated files to resolve all test failures:
Could you please replace the entire file content (test_project_translator.py) with this tested version?
Here's what needs to be added to support Jupyter notebook extension:
import pytest
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch
from co_op_translator.core.project.project_translator import ProjectTranslator
from co_op_translator.config.llm_config.provider import LLMProvider
@pytest.fixture
async def temp_project_dir(tmp_path):
"""Create a temporary project directory structure."""
# Create project structure
docs_dir = tmp_path / "docs"
docs_dir.mkdir()
(docs_dir / "test.md").write_text("# Test Document\nThis is a test.")
images_dir = tmp_path / "images"
images_dir.mkdir()
(images_dir / "test.png").touch()
translations_dir = tmp_path / "translations"
translations_dir.mkdir()
return tmp_path
@pytest.fixture
async def project_translator(temp_project_dir):
"""Create a ProjectTranslator instance with mocked dependencies."""
with (
patch(
"co_op_translator.core.llm.text_translator.TextTranslator"
) as mock_text_translator,
patch(
"co_op_translator.core.llm.markdown_translator.MarkdownTranslator"
) as mock_markdown_translator,
patch(
"co_op_translator.core.vision.image_translator.ImageTranslator"
) as mock_image_translator,
patch(
"co_op_translator.core.llm.jupyter_notebook_translator.JupyterNotebookTranslator"
) as mock_jupyter_translator,
patch(
"co_op_translator.config.llm_config.config.LLMConfig.get_available_provider"
) as mock_get_provider,
):
# Setup mock translators and config
mock_text_translator.create.return_value = MagicMock()
mock_markdown_translator.create.return_value = MagicMock()
mock_image_translator.create.return_value = MagicMock()
mock_jupyter_translator.create.return_value = MagicMock()
mock_get_provider.return_value = LLMProvider.AZURE_OPENAI # Mock LLM provider
translator = ProjectTranslator("ko ja", root_dir=temp_project_dir)
translator.translation_manager.translate_all_markdown_files = AsyncMock(
return_value=(2, [])
)
translator.translation_manager.translate_all_image_files = AsyncMock(
return_value=(0, [])
)
# Mock translate_project_async to avoid unawaited coroutine warning
translator.translation_manager.translate_project_async = AsyncMock(
return_value=None
)
yield translator
@pytest.mark.asyncio
async def test_check_and_retry_translations(project_translator, temp_project_dir):
"""Test checking and retrying translations."""
# Setup
md_file = temp_project_dir / "docs" / "test.md"
translated_file = temp_project_dir / "translations" / "ko" / "docs" / "test.md"
translated_file.parent.mkdir(parents=True)
translated_file.write_text(
"# Test\nBroken translation"
) # Create a "broken" translation
# Mock translation methods with proper async behavior
project_translator.translation_manager.check_outdated_files = AsyncMock(
return_value=(1, [])
)
project_translator.translation_manager.translate_all_markdown_files = AsyncMock(
return_value=(2, [])
)
project_translator.translation_manager.translate_all_image_files = AsyncMock(
return_value=(0, [])
)
# Execute and verify
total_count, errors = await project_translator.check_and_retry_translations()
# Verify results
assert total_count == 3 # 1 (outdated) + 2 (markdown) + 0 (images)
assert errors == [] # No errors expected
assert project_translator.translation_manager.check_outdated_files.called
assert project_translator.translation_manager.translate_all_markdown_files.called
assert project_translator.translation_manager.translate_all_image_files.called
def test_translate_project(project_translator):
"""Test the synchronous translate_project method."""
# Setup
with patch.object(asyncio, "run", side_effect=lambda x: None) as mock_run:
# Execute
project_translator.translate_project(images=True, markdown=True)
# Verify
mock_run.assert_called_once()
@pytest.mark.asyncio
async def test_markdown_only_mode(temp_project_dir):
"""Test ProjectTranslator in markdown-only mode."""
with (
patch(
"co_op_translator.core.llm.text_translator.TextTranslator"
) as mock_text_translator,
patch(
"co_op_translator.core.llm.markdown_translator.MarkdownTranslator"
) as mock_markdown_translator,
patch(
"co_op_translator.core.llm.jupyter_notebook_translator.JupyterNotebookTranslator"
) as mock_jupyter_translator,
patch(
"co_op_translator.config.llm_config.config.LLMConfig.get_available_provider"
) as mock_get_provider,
):
# Setup translator mocks
mock_text_translator_instance = AsyncMock()
mock_text_translator.create.return_value = mock_text_translator_instance
mock_markdown_translator_instance = AsyncMock()
mock_markdown_translator.create.return_value = mock_markdown_translator_instance
mock_jupyter_translator_instance = AsyncMock()
mock_jupyter_translator.create.return_value = mock_jupyter_translator_instance
mock_get_provider.return_value = LLMProvider.AZURE_OPENAI
# Create translator in markdown-only mode
translator = ProjectTranslator(
"ko ja", root_dir=temp_project_dir, markdown_only=True
)
# Mock the async methods after initialization
translator.translation_manager.translate_project_async = AsyncMock()
translator.translation_manager.check_outdated_files = AsyncMock(
return_value=(0, [])
)
translator.translation_manager.translate_all_markdown_files = AsyncMock(
return_value=(0, [])
)
translator.translation_manager.translate_all_image_files = AsyncMock(
return_value=(0, [])
)
# Verify markdown-only mode configuration
assert translator.markdown_only is True
assert translator.image_translator is None
# Test async operation to ensure all coroutines are properly handled
await translator.translation_manager.translate_project_async(markdown=True)
assert (
translator.translation_manager.translate_project_async.call_args.kwargs[
"markdown"
]
is True
)|
I just ran Co-op Translator with a notebook locally, and it worked perfectly, awesome work, @doggy8088! One thing I noticed is that every Markdown cell includes metadata and a disclaimer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion I have is to make the addition of metadata and disclaimers configurable in the translate_markdown function. Currently, these are added to every Markdown cell, which may not be necessary in all contexts.
Note
File: src\co_op_translator\core\llm\markdown_translator.py
Lines: 87-173
Here's a proposed refactor of the function to support optional metadata and disclaimers via parameters:
I tested the code below, and it worked well!
async def translate_markdown(
self,
document: str,
language_code: str,
md_file_path: str | Path,
markdown_only: bool = False,
add_metadata: bool = True,
add_disclaimer: bool = True,
) -> str:
"""Translate markdown document to target language.
Handles complex documents by splitting into manageable chunks while
preserving formatting, links, and code blocks.
Args:
document: Content of the markdown file
language_code: Target language code
md_file_path: Path to the markdown file
markdown_only: Skip embedded image translation if True
add_metadata: Whether to add metadata comment at the beginning
add_disclaimer: Whether to add disclaimer at the end
Returns:
str: The translated content with optional metadata and disclaimer.
"""
md_file_path = Path(md_file_path)
# Create and format metadata (only if requested)
metadata_comment = ""
if add_metadata:
metadata = self.create_metadata(md_file_path, language_code)
metadata_comment = self.format_metadata_comment(metadata)
# Step 1: Replace code blocks and inline code with placeholders
(
document_with_placeholders,
placeholder_map,
) = replace_code_blocks_and_inline_code(document)
# Step 2: Split the document into chunks and generate prompts
link_limit = 30
if count_links_in_markdown(document_with_placeholders) > link_limit:
logger.info(
f"Document contains more than {link_limit} links, splitting the document into chunks."
)
document_chunks = process_markdown_with_many_links(
document_with_placeholders, link_limit
)
else:
logger.info(
f"Document contains {link_limit} or fewer links, processing normally."
)
document_chunks = process_markdown(document_with_placeholders)
# Step 3: Generate translation prompts and translate each chunk
language_name = self.font_config.get_language_name(language_code)
is_rtl = self.font_config.is_rtl(language_code)
prompts = [
generate_prompt_template(language_code, language_name, chunk, is_rtl)
for chunk in document_chunks
]
results = await self._run_prompts_sequentially(prompts)
translated_content = "\n".join(results)
# Step 4: Restore the code blocks and inline code from placeholders
translated_content = restore_code_blocks_and_inline_code(
translated_content, placeholder_map
)
# Step 5: Update links
updated_content = update_links(
md_file_path,
translated_content,
language_code,
self.root_dir,
markdown_only=markdown_only,
)
# Step 6: Add metadata and disclaimer (only if requested)
result = updated_content
if add_metadata:
result = metadata_comment + result
if add_disclaimer:
disclaimer = await self.generate_disclaimer(language_code)
result = result + "\n\n" + disclaimer
return result
…own function Co-authored-by: doggy8088 <[email protected]>
|
Hi @doggy8088, As the implementation is mostly complete and has been inactive for some time, I'll proceed with merging it within the next few days. I'll handle the remaining items (tests, cleanup) myself. |
|
Thanks again @doggy8088 for the contribution! |
Purpose
This pull request introduces support for translating Jupyter Notebook (
.ipynb) files in theco_op_translatorproject.Description
The changes include adding a new
JupyterNotebookTranslatorclass, integrating it into the project translation workflow, and updating relevant configurations and methods to handle notebook files.Related Issue
Fixes #51
Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Type of change
Checklist
Before submitting your pull request, please confirm the following:
Additional context
Notebook Translation Support:
JupyterNotebookTranslatorclass: Added a dedicated class insrc/co_op_translator/core/llm/jupyter_notebook_translator.pyto handle the translation of Jupyter Notebook files. It extracts markdown cells for translation while preserving code cells and metadata.SUPPORTED_NOTEBOOK_EXTENSIONSinsrc/co_op_translator/config/constants.pyto define supported notebook file extensions.Integration into Translation Workflow:
Core module updates:
JupyterNotebookTranslatorinsrc/co_op_translator/core/llm/__init__.pyand added it to the__all__list.src/co_op_translator/core/project/project_translator.pyto initialize and use the notebook translator in theProjectTranslatorclass.Translation manager enhancements:
TranslationManagerclass insrc/co_op_translator/core/project/translation_manager.py. Added methodstranslate_notebookandtranslate_all_notebook_filesto handle individual and bulk notebook translations.translate_project_asyncto include notebook translation when processing markdown files.get_outdated_translationsto account for outdated notebook translations.These changes collectively enable seamless translation of Jupyter Notebook files within the existing project translation workflow.