PennStateLefty/document-translation-ref-arch

Document Translation Reference Architecture

A reference implementation demonstrating scalable document translation on Azure using Durable Functions fan-out/fan-in orchestration, batch splitting, and Infrastructure-as-Code patterns. Designed for deployment into Azure tenants with all local authentication disabled and managed identity used exclusively for service-to-service access.

⚠️ Reference Implementation Notice: This project demonstrates the core scalability pattern — it is not intended for production use as-is. See Known Limitations for shortcomings and the Production Readiness Guide for what would need to change to make this production-viable.

Architecture Overview

Full detailed diagram with RBAC roles: docs/architecture.md

All service-to-service communication uses managed identity with RBAC — no API keys, connection strings, or SAS tokens.

Key Patterns Demonstrated

  • Fan-out/Fan-in Orchestration: Durable Functions orchestrator fans out translation work to per-batch activity functions and fans in results
  • Automatic Batch Splitting: Transparently splits large uploads (>1,000 files or >250 MB) into parallel batches respecting Azure Document Translation service limits
  • Infrastructure-as-Code: All Azure resources defined in Bicep modules under infra/, deployable via Azure Developer CLI
  • Polling-based Status: Frontend polls backend at 5-second intervals for translation progress updates
  • Zero Local Auth: All services disable local authentication (shared keys, API keys, instrumentation keys). Every service-to-service connection uses system-assigned managed identity with least-privilege RBAC
  • Flex Consumption Plan: Function App runs on the Flex Consumption SKU (Linux) with blob-based deployment, managed identity deployment auth, and auto-scaling up to 100 instances
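
The batch-splitting and fan-out/fan-in patterns listed above can be sketched in a few lines. This is a minimal TypeScript illustration, not the repository's C# Durable Functions code: `FileEntry`, `splitIntoBatches`, `fanOutFanIn`, and the `translateBatch` callback are hypothetical names, and the 250 MB limit is assumed here to mean 250 × 1024 × 1024 bytes.

```typescript
// Hypothetical file descriptor; the real upload model in src/api may differ.
interface FileEntry {
  name: string;
  sizeBytes: number;
}

// Limits quoted in this README: 1,000 files or 250 MB per batch.
// Assumption: "MB" is treated as 1024 * 1024 bytes.
const MAX_FILES_PER_BATCH = 1000;
const MAX_BYTES_PER_BATCH = 250 * 1024 * 1024;

// Greedily packs files into batches, closing a batch before it would
// exceed either limit. A single file larger than maxBytes still gets
// a batch of its own.
function splitIntoBatches(
  files: FileEntry[],
  maxFiles: number = MAX_FILES_PER_BATCH,
  maxBytes: number = MAX_BYTES_PER_BATCH,
): FileEntry[][] {
  const batches: FileEntry[][] = [];
  let current: FileEntry[] = [];
  let currentBytes = 0;

  for (const file of files) {
    const wouldOverflow =
      current.length + 1 > maxFiles || currentBytes + file.sizeBytes > maxBytes;
    if (current.length > 0 && wouldOverflow) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += file.sizeBytes;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}

// Fan out: start one task per batch. Fan in: wait for every result.
// translateBatch is a hypothetical stand-in for the per-batch activity.
async function fanOutFanIn<T>(
  batches: FileEntry[][],
  translateBatch: (batch: FileEntry[]) => Promise<T>,
): Promise<T[]> {
  return Promise.all(batches.map((batch) => translateBatch(batch)));
}
```

In a Durable Functions orchestrator the fan-in step is typically a `Task.WhenAll` over the per-batch activity calls; `Promise.all` plays the same role in this sketch.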

Documentation

| Document | Description |
| --- | --- |
| Core Scalability Pattern | Deep dive into the Durable Functions fan-out/fan-in orchestration paired with the batch Document Translation API, the central pattern this reference architecture demonstrates |
| Known Limitations | Shortcomings of this reference implementation: Static Web Apps constraints, in-memory upload/download handling, polling-based status, and lack of authentication |
| Production Readiness Guide | What would need to change to make this production-viable, organized by the five Azure Well-Architected Framework pillars (Reliability, Security, Cost Optimization, Operational Excellence, Performance Efficiency) |
| Architecture Diagram | Detailed Mermaid diagram with RBAC roles and identity assignments |

Quick Start

Prerequisites

Deploy

# Clone the repository
git clone <repository-url>
cd document-translation-ref-arch

# Provision and deploy everything
azd up

Tear Down

azd down

Project Structure

├── azure.yaml              # azd manifest
├── infra/                   # Bicep IaC modules
│   ├── main.bicep           # Orchestrator
│   └── modules/             # Individual resource modules
├── src/
│   ├── api/                 # C# Azure Functions backend
│   │   ├── Functions/       # HTTP triggers + Durable orchestrator
│   │   ├── Models/          # Data model classes
│   │   └── Services/        # Blob storage + translation services
│   └── web/                 # React frontend
│       ├── src/components/  # UI components
│       ├── src/hooks/       # Custom React hooks
│       └── src/services/    # API client
└── .github/
    └── workflows/           # CI/CD pipelines

API Endpoints

| Method | Route | Description |
| --- | --- | --- |
| POST | /api/translate | Upload files and start translation |
| GET | /api/translate/{sessionId} | Get translation status |
| GET | /api/translate/{sessionId}/download | Download translated files |
| GET | /api/languages | List supported languages |
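
The status route pairs with the 5-second polling interval described under Key Patterns Demonstrated. The following is a minimal client-side sketch, assuming a hypothetical response shape: `TranslationStatus`, `statusUrl`, `pollUntilDone`, and the injected `getJson` and `isTerminal` callbacks are illustrative names, not the repository's actual API client.

```typescript
// Hypothetical response shape; the real contract is defined in the specs docs.
interface TranslationStatus {
  status: string;
}

// Interval stated in this README: the frontend polls every 5 seconds.
const POLL_INTERVAL_MS = 5000;

// Builds the status URL for GET /api/translate/{sessionId}.
// Assumes baseUrl has no trailing slash.
function statusUrl(baseUrl: string, sessionId: string): string {
  return `${baseUrl}/api/translate/${encodeURIComponent(sessionId)}`;
}

// Polls the status route until isTerminal reports completion.
// getJson is injected so the caller decides how requests are made
// (for example, a thin wrapper around fetch).
async function pollUntilDone(
  baseUrl: string,
  sessionId: string,
  getJson: (url: string) => Promise<TranslationStatus>,
  isTerminal: (status: TranslationStatus) => boolean,
  intervalMs: number = POLL_INTERVAL_MS,
): Promise<TranslationStatus> {
  for (;;) {
    const current = await getJson(statusUrl(baseUrl, sessionId));
    if (isTerminal(current)) {
      return current;
    }
    // Wait one interval before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Which status values count as terminal is left to the `isTerminal` callback because this README does not list them.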

Development

Backend (Azure Functions)

cd src/api
dotnet restore
dotnet build
func start

Frontend (React)

cd src/web
npm install
npm run dev

Run Tests

# Backend tests
cd src/api
dotnet test DocumentTranslation.Api.Tests/

# Frontend tests
cd src/web
npm test

Security Model

This reference architecture is designed for deployment into Azure tenants where local authentication must be disabled on all services:

| Service | Local Auth | Identity | RBAC Roles |
| --- | --- | --- | --- |
| Storage Account | allowSharedKeyAccess: false | Function App system MI | Blob Data Owner, Blob Data Contributor, Queue Data Contributor, Table Data Contributor, Storage Account Contributor |
| | | Translator system MI | Blob Data Contributor (read source / write translated) |
| Cognitive Services (Translator) | disableLocalAuth: true | Function App system MI | Cognitive Services User |
| Application Insights | DisableLocalAuth: true | Function App system MI | Monitoring Metrics Publisher |
| Log Analytics Workspace | disableLocalAuth: true | | |
| Function App | SCM & FTP basic auth disabled | System-assigned MI | |
| GitHub Actions (CI/CD) | OIDC federated credentials | User-assigned MI | Contributor, User Access Administrator |

  • No API keys or connection strings are used in application code
  • BlobServiceClient authenticates via DefaultAzureCredential
  • DocumentTranslationClient authenticates via DefaultAzureCredential
  • App Insights telemetry uses Authorization=AAD authentication
  • Translator accesses blob storage via its own system MI (no SAS tokens)

Architecture Decisions

See the specs documentation for detailed architectural decisions, data model, API contracts, and research notes.

License

This project is a reference implementation for educational purposes.
