Skip to content

✨(summary) add localization support for transcription context text#982

Merged
lebaudantoine merged 1 commit intomainfrom
localize-summary
Feb 25, 2026
Merged

✨(summary) add localization support for transcription context text#982
lebaudantoine merged 1 commit intomainfrom
localize-summary

Conversation

@cameledev
Copy link
Collaborator

@cameledev cameledev commented Feb 19, 2026

Purpose

Currently, transcription/summarization results use French structure text ("Réunion du..."), independently of user preference and/or meeting language. This PR aims to provide localization for this microservice.

Proposal

  • Add new parameter context_language to summary endpoint
  • Select context language by order of priority: context_language parameter > language parameter (language of meeting specified by user) > default_context_language
  • If language is not supported, fallback to default
  • Given the small number of strings to translate, it seems acceptable to stay within python, and not introduce additional dependencies (poedit, etc.).

TODO:

  • Test run

@cameledev cameledev changed the title Localize summary ✨(summary) add localization support for transcription summaries Feb 24, 2026
@cameledev cameledev changed the title ✨(summary) add localization support for transcription summaries ✨(summary) add localization support for transcription context text Feb 24, 2026
@cameledev cameledev marked this pull request as ready for review February 25, 2026 14:15
Copy link
Collaborator

@lebaudantoine lebaudantoine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nit-picking comment as I'm a never-nester.



@dataclass(frozen=True)
class LocaleStrings:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed IRL, we'll refactor this approach as soon as the current system is not manageable anymore.

@lebaudantoine
Copy link
Collaborator

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Feb 25, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Feb 25, 2026

Walkthrough

This pull request introduces localization support for transcription context text throughout the summary service. The changes establish a new locale system with a LocaleStrings dataclass and create locale-specific modules (fr, en, de, nl) containing translated strings and templates. The TranscriptFormatter class is refactored to accept and use locale parameters instead of hard-coded strings. A context_language parameter is added to the API request model and propagated through the Celery worker task pipeline to enable locale-aware formatting. Configuration settings are updated to support locale resolution with fallback precedence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding localization support for transcription context text in the summary service.
Description check ✅ Passed The description clearly explains the purpose, proposal, and implementation approach for adding localization support to the summary microservice.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/summary/summary/core/celery_worker.py (1)

75-90: 🧹 Nitpick | 🔵 Trivial

Hardcoded French strings in format_actions remain un-localized.

"Assignée à", "Échéance", and "Prochaines étapes" are still hardcoded in French. While this function is in the summarization path (not transcription), it produces user-visible text. Consider localizing these strings in a follow-up to complete the localization effort.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/summary/summary/core/celery_worker.py` around lines 75 - 90, The function
format_actions currently emits hardcoded French strings; update format_actions
to produce localized text by using the project's i18n mechanism (e.g.,
passing/using a translation function like _ or i18n.t) or by accepting a
locale/translator parameter, then replace "Assignée à", "Échéance" and the
header "Prochaines étapes" with translated strings (e.g., _('Assigned to'),
_('Due date'), _('Next steps')) so the markdown lines and header are localized
while preserving existing variables title, assignees, and due_date in the
output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/summary/summary/core/config.py`:
- Around line 62-64: The current free-form default_context_language str can
accept invalid locale codes; update the config to validate it against the set of
supported locales by either: (1) changing the field type to a constrained type
(an Enum or Literal) representing allowed locale codes, or (2) adding a
validator function (e.g., a pydantic `@validator` or dataclass post-init) that
checks default_context_language against the project's SUPPORTED_LOCALES constant
and raises a clear ValueError on mismatch; modify the declaration around
default_context_language and add the validator reference so invalid values are
rejected at config load time.

In `@src/summary/summary/core/locales/__init__.py`:
- Around line 19-30: Normalize incoming language tags in the resolver by
lowercasing and converting underscores to dashes (e.g., lang =
lang.lower().replace("_","-")) before membership checks, use the same
normalization when computing base_lang, and when returning the default use a
safe lookup for get_settings().default_context_language (e.g., check if it
exists in _LOCALES and if not fall back to a stable default such as "en" or the
first key from _LOCALES) so that KeyError is avoided; update the loop and the
final return to reference _LOCALES and get_settings().default_context_language
with these presence checks and normalization.

In `@src/summary/summary/core/transcript_formatter.py`:
- Around line 91-93: Add a startup-time validation that ensures all locale
template placeholder names match the expected keys by iterating over each
LocaleStrings instance and calling each template's .format() with a dummy
mapping for the named placeholders (download_link, room, room_recording_date,
room_recording_time) to surface KeyErrors early; implement this check in the
application initialization path (e.g., where locales are loaded) and reference
specific template attributes like download_header_template used in
transcript_formatter.py so any mismatched placeholders (e.g., {downlaod_link})
fail at startup rather than runtime.

---

Outside diff comments:
In `@src/summary/summary/core/celery_worker.py`:
- Around line 75-90: The function format_actions currently emits hardcoded
French strings; update format_actions to produce localized text by using the
project's i18n mechanism (e.g., passing/using a translation function like _ or
i18n.t) or by accepting a locale/translator parameter, then replace "Assignée
à", "Échéance" and the header "Prochaines étapes" with translated strings (e.g.,
_('Assigned to'), _('Due date'), _('Next steps')) so the markdown lines and
header are localized while preserving existing variables title, assignees, and
due_date in the output.

ℹ️ Review info

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e647787 and 040d6ae.

📒 Files selected for processing (13)
  • CHANGELOG.md
  • gitlint/gitlint_emoji.py
  • src/backend/core/recording/event/notification.py
  • src/summary/summary/api/route/tasks.py
  • src/summary/summary/core/celery_worker.py
  • src/summary/summary/core/config.py
  • src/summary/summary/core/locales/__init__.py
  • src/summary/summary/core/locales/de.py
  • src/summary/summary/core/locales/en.py
  • src/summary/summary/core/locales/fr.py
  • src/summary/summary/core/locales/nl.py
  • src/summary/summary/core/locales/strings.py
  • src/summary/summary/core/transcript_formatter.py

Comment on lines 62 to 64
# Locale
default_context_language: str = "fr"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Validate default_context_language against supported locale codes.

Using a free-form str allows invalid values to pass config loading and fail later at runtime.

Proposed validation change
-from typing import Annotated, List, Optional, Set
+from typing import Annotated, List, Literal, Optional, Set
@@
-    default_context_language: str = "fr"
+    default_context_language: Literal["fr", "en", "de", "nl"] = "fr"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Locale
default_context_language: str = "fr"
from typing import Annotated, List, Literal, Optional, Set
# Locale
default_context_language: Literal["fr", "en", "de", "nl"] = "fr"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/summary/summary/core/config.py` around lines 62 - 64, The current
free-form default_context_language str can accept invalid locale codes; update
the config to validate it against the set of supported locales by either: (1)
changing the field type to a constrained type (an Enum or Literal) representing
allowed locale codes, or (2) adding a validator function (e.g., a pydantic
`@validator` or dataclass post-init) that checks default_context_language against
the project's SUPPORTED_LOCALES constant and raises a clear ValueError on
mismatch; modify the declaration around default_context_language and add the
validator reference so invalid values are rejected at config load time.

Comment on lines 19 to 30
for lang in languages:
if not lang:
continue
if lang in _LOCALES:
return _LOCALES[lang].STRINGS

# Provide fallback for longer formats of ISO 639-1 (e.g. "en-au" -> "en")
base_lang = lang.split("-")[0]
if base_lang in _LOCALES:
return _LOCALES[base_lang].STRINGS

return _LOCALES[get_settings().default_context_language].STRINGS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Locale resolver can fail hard on config drift and misses common language formats.

Line 30 can throw a KeyError when default_context_language is not in _LOCALES, and inputs like en_US / uppercase variants are not normalized before lookup.

Proposed hardening patch
 def get_locale(*languages: Optional[str]) -> LocaleStrings:
@@
-    for lang in languages:
+    for lang in languages:
         if not lang:
             continue
-        if lang in _LOCALES:
-            return _LOCALES[lang].STRINGS
+        normalized = lang.strip().lower().replace("_", "-")
+        if normalized in _LOCALES:
+            return _LOCALES[normalized].STRINGS
         
         # Provide fallback for longer formats of ISO 639-1 (e.g. "en-au" -> "en")
-        base_lang = lang.split("-")[0]
+        base_lang = normalized.split("-")[0]
         if base_lang in _LOCALES:
             return _LOCALES[base_lang].STRINGS
-  
-    return _LOCALES[get_settings().default_context_language].STRINGS
+
+    default_lang = get_settings().default_context_language.strip().lower()
+    return _LOCALES.get(default_lang, fr).STRINGS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/summary/summary/core/locales/__init__.py` around lines 19 - 30, Normalize
incoming language tags in the resolver by lowercasing and converting underscores
to dashes (e.g., lang = lang.lower().replace("_","-")) before membership checks,
use the same normalization when computing base_lang, and when returning the
default use a safe lookup for get_settings().default_context_language (e.g.,
check if it exists in _LOCALES and if not fall back to a stable default such as
"en" or the first key from _LOCALES) so that KeyError is avoided; update the
loop and the final return to reference _LOCALES and
get_settings().default_context_language with these presence checks and
normalization.

Comment on lines +91 to 93
header = self._locale.download_header_template.format(
download_link=download_link
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

cd /repo && find . -type f -name "*.py" | grep -E "(locale|transcript_formatter)" | head -20

Repository: suitenumerique/meet

Length of output: 118


🏁 Script executed:

cd /repo && find . -type d -name "locale*" -o -name "locales"

Repository: suitenumerique/meet

Length of output: 118


🏁 Script executed:

cd /repo && ls -la src/summary/summary/core/ 2>/dev/null || echo "Path not found, searching for transcript_formatter..."

Repository: suitenumerique/meet

Length of output: 189


🏁 Script executed:

cd /repo && find . -name "transcript_formatter.py" -type f

Repository: suitenumerique/meet

Length of output: 118


🏁 Script executed:

find . -name "transcript_formatter.py" -type f

Repository: suitenumerique/meet

Length of output: 113


🏁 Script executed:

find . -type d -name "locales"

Repository: suitenumerique/meet

Length of output: 124


🏁 Script executed:

find . -type f -path "*/locale*" -name "*.py" | head -20

Repository: suitenumerique/meet

Length of output: 319


🏁 Script executed:

wc -l ./src/summary/summary/core/transcript_formatter.py

Repository: suitenumerique/meet

Length of output: 117


🏁 Script executed:

sed -n '85,120p' ./src/summary/summary/core/transcript_formatter.py

Repository: suitenumerique/meet

Length of output: 983


🏁 Script executed:

cat -n ./src/summary/summary/core/locales/en.py

Repository: suitenumerique/meet

Length of output: 1157


🏁 Script executed:

cat -n ./src/summary/summary/core/locales/fr.py

Repository: suitenumerique/meet

Length of output: 1207


🏁 Script executed:

cat -n ./src/summary/summary/core/locales/de.py

Repository: suitenumerique/meet

Length of output: 1254


🏁 Script executed:

cat -n ./src/summary/summary/core/locales/nl.py

Repository: suitenumerique/meet

Length of output: 1208


🏁 Script executed:

cat -n ./src/summary/summary/core/locales/__init__.py

Repository: suitenumerique/meet

Length of output: 1301


🏁 Script executed:

cat -n ./src/summary/summary/core/locales/strings.py

Repository: suitenumerique/meet

Length of output: 560


🏁 Script executed:

find . -path "*/test*" -name "*locale*" -o -path "*/test*" -name "*transcript*" | grep -i "\.py$"

Repository: suitenumerique/meet

Length of output: 45


🏁 Script executed:

grep -r "download_header_template\|document_title_template" --include="*.py" | grep -v ".pyc"

Repository: suitenumerique/meet

Length of output: 994


🏁 Script executed:

find . -type f -name "test*.py" -o -name "*test.py" | xargs grep -l "locale\|download_header\|document_title" 2>/dev/null | head -10

Repository: suitenumerique/meet

Length of output: 45


🏁 Script executed:

grep -A 10 -B 5 "def get_locale" ./src/summary/summary/core/locales/__init__.py

Repository: suitenumerique/meet

Length of output: 597


🏁 Script executed:

cat -n ./src/summary/summary/core/transcript_formatter.py

Repository: suitenumerique/meet

Length of output: 4660


Template placeholder names must be consistent across locales to avoid runtime KeyErrors.

The .format() calls use named placeholders (download_link, room, room_recording_date, room_recording_time). While the LocaleStrings frozen dataclass ensures all templates are defined, it doesn't validate placeholder names—a misspelled placeholder (e.g., {downlaod_link}) would pass validation but fail at runtime.

All current locale modules (en, fr, de, nl) correctly define matching placeholders. Consider adding a startup-time validation that calls .format() on each locale's templates with dummy values to catch placeholder mismatches before they reach production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/summary/summary/core/transcript_formatter.py` around lines 91 - 93, Add a
startup-time validation that ensures all locale template placeholder names match
the expected keys by iterating over each LocaleStrings instance and calling each
template's .format() with a dummy mapping for the named placeholders
(download_link, room, room_recording_date, room_recording_time) to surface
KeyErrors early; implement this check in the application initialization path
(e.g., where locales are loaded) and reference specific template attributes like
download_header_template used in transcript_formatter.py so any mismatched
placeholders (e.g., {downlaod_link}) fail at startup rather than runtime.

@cameledev cameledev force-pushed the localize-summary branch 7 times, most recently from 00c1bdd to da697a2 Compare February 25, 2026 16:50
Transcription and summarization results were always generated
using a French text structure (e.g. "Réunion du..."), regardless
of user preference or meeting language. Introduced basic localization
support to adapt generated string languages.
@sonarqubecloud
Copy link

@lebaudantoine lebaudantoine merged commit f5e0ddf into main Feb 25, 2026
21 checks passed
@lebaudantoine lebaudantoine deleted the localize-summary branch February 25, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants