✨(summary) add localization support for transcription context text#982
✨(summary) add localization support for transcription context text#982lebaudantoine merged 1 commit intomainfrom
Conversation
6c1b616 to
30cb0ca
Compare
lebaudantoine
left a comment
There was a problem hiding this comment.
LGTM, minor nit-picking comment as I'm a never-nester.
|
|
||
|
|
||
| @dataclass(frozen=True) | ||
| class LocaleStrings: |
There was a problem hiding this comment.
As discussed IRL, we'll refactor this approach as soon as the current system is not manageable anymore.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
WalkthroughThis pull request introduces localization support for transcription context text throughout the summary service. The changes establish a new locale system with a Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/summary/summary/core/celery_worker.py (1)
75-90: 🧹 Nitpick | 🔵 TrivialHardcoded French strings in
format_actionsremain un-localized.
"Assignée à","Échéance", and"Prochaines étapes"are still hardcoded in French. While this function is in the summarization path (not transcription), it produces user-visible text. Consider localizing these strings in a follow-up to complete the localization effort.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/summary/summary/core/celery_worker.py` around lines 75 - 90, The function format_actions currently emits hardcoded French strings; update format_actions to produce localized text by using the project's i18n mechanism (e.g., passing/using a translation function like _ or i18n.t) or by accepting a locale/translator parameter, then replace "Assignée à", "Échéance" and the header "Prochaines étapes" with translated strings (e.g., _('Assigned to'), _('Due date'), _('Next steps')) so the markdown lines and header are localized while preserving existing variables title, assignees, and due_date in the output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/summary/summary/core/config.py`:
- Around line 62-64: The current free-form default_context_language str can
accept invalid locale codes; update the config to validate it against the set of
supported locales by either: (1) changing the field type to a constrained type
(an Enum or Literal) representing allowed locale codes, or (2) adding a
validator function (e.g., a pydantic `@validator` or dataclass post-init) that
checks default_context_language against the project's SUPPORTED_LOCALES constant
and raises a clear ValueError on mismatch; modify the declaration around
default_context_language and add the validator reference so invalid values are
rejected at config load time.
In `@src/summary/summary/core/locales/__init__.py`:
- Around line 19-30: Normalize incoming language tags in the resolver by
lowercasing and converting underscores to dashes (e.g., lang =
lang.lower().replace("_","-")) before membership checks, use the same
normalization when computing base_lang, and when returning the default use a
safe lookup for get_settings().default_context_language (e.g., check if it
exists in _LOCALES and if not fall back to a stable default such as "en" or the
first key from _LOCALES) so that KeyError is avoided; update the loop and the
final return to reference _LOCALES and get_settings().default_context_language
with these presence checks and normalization.
In `@src/summary/summary/core/transcript_formatter.py`:
- Around line 91-93: Add a startup-time validation that ensures all locale
template placeholder names match the expected keys by iterating over each
LocaleStrings instance and calling each template's .format() with a dummy
mapping for the named placeholders (download_link, room, room_recording_date,
room_recording_time) to surface KeyErrors early; implement this check in the
application initialization path (e.g., where locales are loaded) and reference
specific template attributes like download_header_template used in
transcript_formatter.py so any mismatched placeholders (e.g., {downlaod_link})
fail at startup rather than runtime.
---
Outside diff comments:
In `@src/summary/summary/core/celery_worker.py`:
- Around line 75-90: The function format_actions currently emits hardcoded
French strings; update format_actions to produce localized text by using the
project's i18n mechanism (e.g., passing/using a translation function like _ or
i18n.t) or by accepting a locale/translator parameter, then replace "Assignée
à", "Échéance" and the header "Prochaines étapes" with translated strings (e.g.,
_('Assigned to'), _('Due date'), _('Next steps')) so the markdown lines and
header are localized while preserving existing variables title, assignees, and
due_date in the output.
ℹ️ Review info
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
CHANGELOG.mdgitlint/gitlint_emoji.pysrc/backend/core/recording/event/notification.pysrc/summary/summary/api/route/tasks.pysrc/summary/summary/core/celery_worker.pysrc/summary/summary/core/config.pysrc/summary/summary/core/locales/__init__.pysrc/summary/summary/core/locales/de.pysrc/summary/summary/core/locales/en.pysrc/summary/summary/core/locales/fr.pysrc/summary/summary/core/locales/nl.pysrc/summary/summary/core/locales/strings.pysrc/summary/summary/core/transcript_formatter.py
| # Locale | ||
| default_context_language: str = "fr" | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Validate default_context_language against supported locale codes.
Using a free-form str allows invalid values to pass config loading and fail later at runtime.
Proposed validation change
-from typing import Annotated, List, Optional, Set
+from typing import Annotated, List, Literal, Optional, Set
@@
- default_context_language: str = "fr"
+ default_context_language: Literal["fr", "en", "de", "nl"] = "fr"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Locale | |
| default_context_language: str = "fr" | |
| from typing import Annotated, List, Literal, Optional, Set | |
| # Locale | |
| default_context_language: Literal["fr", "en", "de", "nl"] = "fr" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/summary/summary/core/config.py` around lines 62 - 64, The current
free-form default_context_language str can accept invalid locale codes; update
the config to validate it against the set of supported locales by either: (1)
changing the field type to a constrained type (an Enum or Literal) representing
allowed locale codes, or (2) adding a validator function (e.g., a pydantic
`@validator` or dataclass post-init) that checks default_context_language against
the project's SUPPORTED_LOCALES constant and raises a clear ValueError on
mismatch; modify the declaration around default_context_language and add the
validator reference so invalid values are rejected at config load time.
| for lang in languages: | ||
| if not lang: | ||
| continue | ||
| if lang in _LOCALES: | ||
| return _LOCALES[lang].STRINGS | ||
|
|
||
| # Provide fallback for longer formats of ISO 639-1 (e.g. "en-au" -> "en") | ||
| base_lang = lang.split("-")[0] | ||
| if base_lang in _LOCALES: | ||
| return _LOCALES[base_lang].STRINGS | ||
|
|
||
| return _LOCALES[get_settings().default_context_language].STRINGS |
There was a problem hiding this comment.
Locale resolver can fail hard on config drift and misses common language formats.
Line 30 can throw a KeyError when default_context_language is not in _LOCALES, and inputs like en_US / uppercase variants are not normalized before lookup.
Proposed hardening patch
def get_locale(*languages: Optional[str]) -> LocaleStrings:
@@
- for lang in languages:
+ for lang in languages:
if not lang:
continue
- if lang in _LOCALES:
- return _LOCALES[lang].STRINGS
+ normalized = lang.strip().lower().replace("_", "-")
+ if normalized in _LOCALES:
+ return _LOCALES[normalized].STRINGS
# Provide fallback for longer formats of ISO 639-1 (e.g. "en-au" -> "en")
- base_lang = lang.split("-")[0]
+ base_lang = normalized.split("-")[0]
if base_lang in _LOCALES:
return _LOCALES[base_lang].STRINGS
-
- return _LOCALES[get_settings().default_context_language].STRINGS
+
+ default_lang = get_settings().default_context_language.strip().lower()
+ return _LOCALES.get(default_lang, fr).STRINGS🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/summary/summary/core/locales/__init__.py` around lines 19 - 30, Normalize
incoming language tags in the resolver by lowercasing and converting underscores
to dashes (e.g., lang = lang.lower().replace("_","-")) before membership checks,
use the same normalization when computing base_lang, and when returning the
default use a safe lookup for get_settings().default_context_language (e.g.,
check if it exists in _LOCALES and if not fall back to a stable default such as
"en" or the first key from _LOCALES) so that KeyError is avoided; update the
loop and the final return to reference _LOCALES and
get_settings().default_context_language with these presence checks and
normalization.
| header = self._locale.download_header_template.format( | ||
| download_link=download_link | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
cd /repo && find . -type f -name "*.py" | grep -E "(locale|transcript_formatter)" | head -20Repository: suitenumerique/meet
Length of output: 118
🏁 Script executed:
cd /repo && find . -type d -name "locale*" -o -name "locales"Repository: suitenumerique/meet
Length of output: 118
🏁 Script executed:
cd /repo && ls -la src/summary/summary/core/ 2>/dev/null || echo "Path not found, searching for transcript_formatter..."Repository: suitenumerique/meet
Length of output: 189
🏁 Script executed:
cd /repo && find . -name "transcript_formatter.py" -type fRepository: suitenumerique/meet
Length of output: 118
🏁 Script executed:
find . -name "transcript_formatter.py" -type fRepository: suitenumerique/meet
Length of output: 113
🏁 Script executed:
find . -type d -name "locales"Repository: suitenumerique/meet
Length of output: 124
🏁 Script executed:
find . -type f -path "*/locale*" -name "*.py" | head -20Repository: suitenumerique/meet
Length of output: 319
🏁 Script executed:
wc -l ./src/summary/summary/core/transcript_formatter.pyRepository: suitenumerique/meet
Length of output: 117
🏁 Script executed:
sed -n '85,120p' ./src/summary/summary/core/transcript_formatter.pyRepository: suitenumerique/meet
Length of output: 983
🏁 Script executed:
cat -n ./src/summary/summary/core/locales/en.pyRepository: suitenumerique/meet
Length of output: 1157
🏁 Script executed:
cat -n ./src/summary/summary/core/locales/fr.pyRepository: suitenumerique/meet
Length of output: 1207
🏁 Script executed:
cat -n ./src/summary/summary/core/locales/de.pyRepository: suitenumerique/meet
Length of output: 1254
🏁 Script executed:
cat -n ./src/summary/summary/core/locales/nl.pyRepository: suitenumerique/meet
Length of output: 1208
🏁 Script executed:
cat -n ./src/summary/summary/core/locales/__init__.pyRepository: suitenumerique/meet
Length of output: 1301
🏁 Script executed:
cat -n ./src/summary/summary/core/locales/strings.pyRepository: suitenumerique/meet
Length of output: 560
🏁 Script executed:
find . -path "*/test*" -name "*locale*" -o -path "*/test*" -name "*transcript*" | grep -i "\.py$"Repository: suitenumerique/meet
Length of output: 45
🏁 Script executed:
grep -r "download_header_template\|document_title_template" --include="*.py" | grep -v ".pyc"Repository: suitenumerique/meet
Length of output: 994
🏁 Script executed:
find . -type f -name "test*.py" -o -name "*test.py" | xargs grep -l "locale\|download_header\|document_title" 2>/dev/null | head -10Repository: suitenumerique/meet
Length of output: 45
🏁 Script executed:
grep -A 10 -B 5 "def get_locale" ./src/summary/summary/core/locales/__init__.pyRepository: suitenumerique/meet
Length of output: 597
🏁 Script executed:
cat -n ./src/summary/summary/core/transcript_formatter.pyRepository: suitenumerique/meet
Length of output: 4660
Template placeholder names must be consistent across locales to avoid runtime KeyErrors.
The .format() calls use named placeholders (download_link, room, room_recording_date, room_recording_time). While the LocaleStrings frozen dataclass ensures all templates are defined, it doesn't validate placeholder names—a misspelled placeholder (e.g., {downlaod_link}) would pass validation but fail at runtime.
All current locale modules (en, fr, de, nl) correctly define matching placeholders. Consider adding a startup-time validation that calls .format() on each locale's templates with dummy values to catch placeholder mismatches before they reach production.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/summary/summary/core/transcript_formatter.py` around lines 91 - 93, Add a
startup-time validation that ensures all locale template placeholder names match
the expected keys by iterating over each LocaleStrings instance and calling each
template's .format() with a dummy mapping for the named placeholders
(download_link, room, room_recording_date, room_recording_time) to surface
KeyErrors early; implement this check in the application initialization path
(e.g., where locales are loaded) and reference specific template attributes like
download_header_template used in transcript_formatter.py so any mismatched
placeholders (e.g., {downlaod_link}) fail at startup rather than runtime.
00c1bdd to
da697a2
Compare
Transcription and summarization results were always generated using a French text structure (e.g. "Réunion du..."), regardless of user preference or meeting language. Introduced basic localization support to adapt generated string languages.
da697a2 to
45a14fb
Compare
|



Purpose
Currently, transcription/summarization results use French structure text ("Réunion du..."), independently of user preference and/or meeting language. This PR aims to provide localization for this microservice.
Proposal
context_languageto summary endpointcontext_languageparameter >languageparameter (language of meeting specified by user) >default_context_languageTODO: