
Conversation

@skytin1004
Collaborator

We now send translation rules as a system instruction and user content as the user message, avoiding prompt leakage (instructions leaking into outputs) and improving output cleanliness across providers.

Key changes

  • utils/llm/markdown_utils.py
    • Added SPLIT_DELIMITER and updated generate_prompt_template() to emit a single prompt string with a clear delimiter between rules (system) and content (user); see the sketch after this list.
    • Preserved RTL/LTR guidance within the system instruction.
  • core/llm/providers/openai/markdown_translator.py
    • Switched to role-based messaging via Semantic Kernel ChatHistory.
    • Split prompt by SPLIT_DELIMITER, then add_system_message/add_user_message.
  • core/llm/providers/azure/markdown_translator.py
    • Same role-based system/user split and ChatHistory usage as OpenAI.
  • core/llm/markdown_translator.py
    • Wiring remains compatible with current prompt generation; strict split enforced (no fallback).
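
A minimal sketch of the prompt shape described in the first bullet above. The actual SPLIT_DELIMITER value and the parameters of generate_prompt_template() are assumptions here and may differ from the real implementation in utils/llm/markdown_utils.py:

```python
# Sketch only: the delimiter value and function signature are assumed,
# not taken from the repository.
SPLIT_DELIMITER = "\n<<<SPLIT>>>\n"  # assumed placeholder value

def generate_prompt_template(language: str, text_direction: str, document: str) -> str:
    """Emit a single prompt string: translation rules, delimiter, then content."""
    rules = (
        f"Translate the following markdown document into {language}. "
        "Preserve code blocks, links, and formatting. "
        f"The target language is written {text_direction} (RTL/LTR guidance)."
    )
    # Rules (system) come before the delimiter; the source chunk (user) comes after it.
    return f"{rules}{SPLIT_DELIMITER}{document}"
```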

Why this change

  • Prevent instruction echoing in model outputs (prompt leakage).
  • Make role intent explicit (system: translation rules, user: source text).
  • Improve stability across providers while keeping minimal surface changes to call sites.

Purpose

Move translation rules from the user prompt into system instructions to prevent prompt leakage (instructions appearing in model outputs). This improves the cleanliness and consistency of translated content across providers.

Description

This PR updates the markdown translation flow so that:

  • System instructions (translation rules, RTL/LTR direction) are sent as a system message.
  • The source document chunk is sent as a user message.

To achieve this with minimal surface change:

  • generate_prompt_template() emits a single prompt string with SPLIT_DELIMITER separating the rules from the source content.
  • Provider implementations split the prompt by this delimiter and add the two parts to the chat history as a system message (rules) and a user message (content), as sketched below.

Net effect: rules no longer leak into the translated output, improving translation quality while preserving the existing CLI and higher-level behavior.
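
A minimal sketch of the provider-side split, assuming Semantic Kernel's ChatHistory; the helper name and the strict-split error handling are illustrative, not the exact provider code:

```python
from semantic_kernel.contents import ChatHistory

def build_chat_history(prompt: str, split_delimiter: str) -> ChatHistory:
    """Split the combined prompt into system rules and user content."""
    parts = prompt.split(split_delimiter, 1)
    if len(parts) != 2:
        # Strict split: no fallback to sending everything as a single user message.
        raise ValueError("Prompt is missing the expected system/user delimiter.")
    rules, content = parts
    history = ChatHistory()
    history.add_system_message(rules.strip())   # translation rules (system)
    history.add_user_message(content.strip())   # source markdown chunk (user)
    return history
```

Both the OpenAI and Azure providers would then pass this ChatHistory to their chat completion service, keeping the rest of the call sites unchanged.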

Related Issue

N/A (preventative enhancement to avoid prompt leakage)

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

  • Yes
  • No

Notes:

  • Public CLI and external behavior remain compatible.
  • Internals of provider message construction changed, but tests pass and the user-facing API is unchanged.

Type of change

  • Feature
  • Bugfix
  • Code style update (e.g., formatting, local variables)
  • Refactoring (no functional or API changes)
  • Documentation content changes
  • Other... Please describe:

Checklist

Before submitting your pull request, please confirm the following:

  • I have thoroughly tested my changes: I confirm that I have run the code and manually tested all affected areas.
  • All existing tests pass: I have run all tests and confirmed that nothing is broken.
  • I have added new tests (if applicable): Not required; behavior validated via the existing test suite and manual runs.
  • I have followed the Co-op Translators coding conventions: My code adheres to the style guide and coding conventions outlined in the repository.
  • I have documented my changes (if applicable): Inline comments and commit message explain the change and rationale.

skytin1004 marked this pull request as ready for review on September 15, 2025, 07:25.
github-actions bot added the core label (Related to any changes in core source files) on Sep 15, 2025.
skytin1004 self-assigned this on Sep 15, 2025.
@skytin1004
Collaborator Author

I have reviewed the changes and everything looks good.

skytin1004 merged commit c1f90af into Azure:main on Sep 15, 2025 (2 checks passed).