A collection of Jupyter notebooks demonstrating the capabilities of Azure Content Understanding (GA version, API 2025-11-01). These demos show how to process and analyze documents, images, videos, and audio files using generative AI to extract structured data from unstructured content.
Azure Content Understanding is a Generally Available (GA) Azure AI service that uses generative AI to transform unstructured content into structured, searchable data. This repository contains practical, ready-to-run notebooks that demonstrate various capabilities of the service across different content types.
Azure Content Understanding in Foundry Tools is an AI service available as part of the Microsoft Foundry Resource in Azure. It processes and ingests content of many types:
- Documents (PDF, DOCX, XLSX, images)
- Videos (MP4, MOV, AVI)
- Audio (WAV, MP3, M4A)
- Images (JPG, PNG, TIFF)
The service offers a streamlined process to reason over large amounts of unstructured data, accelerating time-to-value by generating structured output that can be integrated into automation and analytical workflows.
- Document Processing: OCR, layout analysis, table recognition, and structural element detection
- Video Analysis: Frame extraction, shot detection, speech-to-text transcription
- Audio Processing: Speech-to-text transcription with high accuracy
- Image Analysis: Visual content understanding and data extraction
- Field Extraction: Define custom schemas to extract specific fields from any content type
- Classification: Categorize content into up to 200 categories with integrated classification
- Content Summarization: Generate summaries and insights from extracted content
- Face Description: Generate textual descriptions of faces in video and image content (with proper authorization)
- Microsoft Entra ID authentication
- Managed identities support
- Customer-managed keys
- Virtual networks and private endpoints
- Transparent pricing model
| Notebook | Description |
|---|---|
| Managing analyzers | Learn how to create, configure, and manage analyzers for different content types. |
| Field extraction | Demonstrates extracting predefined fields from documents using built-in schemas. |
| Custom field extraction | Shows how to define and extract custom fields tailored to your business needs. |
| Classifier | Explains how to classify content into categories using integrated classification APIs. |
| Document content extraction | Focuses on OCR, layout analysis, and table recognition for multi-page documents. |
| Audio extraction | Covers speech-to-text transcription and audio content analysis with speaker identification. |
| Video content extraction | Demonstrates video frame extraction, scene detection, and speech transcription from video. |
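The custom field extraction notebook revolves around defining a schema of fields for an analyzer. As a rough illustration only, the sketch below builds such a definition as a plain Python dictionary and checks it locally before anything is sent to the service. The property names (`baseAnalyzerId`, `fieldSchema`, `method`), the `prebuilt-document` base analyzer, and the sets of allowed types and methods are assumptions based on the general shape of the analyzer schema, and the field names are hypothetical; check them against the notebooks and the REST reference before relying on them.

```python
import json
from typing import Any, Dict, List

# Assumed field types and methods; verify against the service reference.
ALLOWED_TYPES = {"string", "number", "integer", "boolean",
                 "date", "time", "array", "object"}
ALLOWED_METHODS = {"extract", "generate", "classify"}


def build_invoice_analyzer() -> Dict[str, Any]:
    """Return a hypothetical analyzer definition with three custom fields."""
    return {
        "description": "Sample invoice analyzer (illustrative only)",
        "baseAnalyzerId": "prebuilt-document",  # assumed base analyzer name
        "fieldSchema": {
            "fields": {
                "VendorName": {
                    "type": "string",
                    "method": "extract",
                    "description": "Name of the vendor issuing the invoice",
                },
                "InvoiceTotal": {
                    "type": "number",
                    "method": "extract",
                    "description": "Total amount due on the invoice",
                },
                "Summary": {
                    "type": "string",
                    "method": "generate",
                    "description": "One-sentence summary of the invoice",
                },
            }
        },
    }


def find_schema_problems(analyzer: Dict[str, Any]) -> List[str]:
    """List local problems in the field schema (empty list means none found)."""
    problems = []
    fields = analyzer.get("fieldSchema", {}).get("fields", {})
    if not fields:
        problems.append("fieldSchema.fields is empty")
    for name, spec in fields.items():
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append("{}: unsupported type {!r}".format(name, spec.get("type")))
        if spec.get("method") not in ALLOWED_METHODS:
            problems.append("{}: unsupported method {!r}".format(name, spec.get("method")))
        if not spec.get("description"):
            problems.append("{}: missing description".format(name))
    return problems


if __name__ == "__main__":
    analyzer = build_invoice_analyzer()
    print(find_schema_problems(analyzer))
    print(json.dumps(analyzer, indent=2))
```

Checking the definition locally catches typos in field types or missing descriptions before a create-analyzer call is attempted; it does not replace the validation performed by the service.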
Before running these notebooks, ensure you have:
- Azure Subscription
  - An active Azure subscription (Create one for free)
- Azure AI Foundry Resource
  - Create an Azure AI Foundry resource
  - Note your endpoint URL and API key
- Model Deployments (required for prebuilt analyzers)
  - Deploy the GPT-4.1-mini and GPT-4.1 models
  - Deploy the text-embedding-3-large model
  - See deployment documentation
- Role Assignment
  - Grant yourself the Cognitive Services User role on the resource
  - This is required even if you're the resource owner
- Python Environment
  - Python 3.8 or higher
  - Jupyter Notebook or JupyterLab
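Once the endpoint URL and API key from the prerequisites are at hand, one common way to keep them out of the notebooks is to read them from environment variables. The sketch below assumes two hypothetical variable names, `CONTENT_UNDERSTANDING_ENDPOINT` and `CONTENT_UNDERSTANDING_KEY`; the notebooks in this repository may use different names or a `.env` file.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names; adapt them to your own setup.
ENDPOINT_VAR = "CONTENT_UNDERSTANDING_ENDPOINT"
KEY_VAR = "CONTENT_UNDERSTANDING_KEY"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read and sanity-check the endpoint and key, raising ValueError if unusable."""
    env = os.environ if env is None else env
    # Strip whitespace and any trailing slash so URLs can be joined predictably.
    endpoint = env.get(ENDPOINT_VAR, "").strip().rstrip("/")
    api_key = env.get(KEY_VAR, "").strip()
    if not endpoint.startswith("https://"):
        raise ValueError("{} must be set to an https:// URL".format(ENDPOINT_VAR))
    if not api_key:
        raise ValueError("{} must be set".format(KEY_VAR))
    return {"endpoint": endpoint, "api_key": api_key}


if __name__ == "__main__":
    try:
        settings = load_settings()
        print("Using endpoint:", settings["endpoint"])
    except ValueError as exc:
        print("Configuration problem:", exc)
```

Failing early with a clear message is usually easier to debug than an authentication error raised halfway through a notebook.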
This repository contains demonstration notebooks covering:
- Document field extraction with custom schemas
- Layout analysis and table extraction
- Multi-page document processing
- Classification and routing
- Image content extraction
- Visual question answering
- Figure detection and analysis
- Object and text recognition
- Video frame extraction
- Scene detection and segmentation
- Speech transcription from video
- Visual content summarization
- Audio transcription
- Speaker identification
- Audio content analysis
- Multi-modal content analysis
- RAG (Retrieval-Augmented Generation) integration
- Batch processing workflows
- Custom analyzer creation
Azure Content Understanding is ideal for:
- Financial Services: Tax document processing, mortgage application analysis
- Healthcare: Medical record extraction and analysis
- Legal: Contract review and clause extraction
- Manufacturing: Quality control and defect detection
- Retail: Inventory management and shelf analysis
- Media: Content cataloging and metadata extraction
- Analytics & Reporting: Enhanced business intelligence from unstructured data
These notebooks use the GA API version: 2025-11-01
This is the Generally Available version with production-ready features, enterprise security, and enhanced capabilities compared to previous preview versions.
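To make the role of the API version concrete, the sketch below builds (but does not send) an analyze request pinned to `2025-11-01`, using only the Python standard library. The URL path (`/contentunderstanding/analyzers/{id}:analyze`), the `inputs` request body, and the `Ocp-Apim-Subscription-Key` header are assumptions about the REST surface rather than something taken from this repository, and `build_analyze_request` is a hypothetical helper; confirm the details against the official REST reference or the notebooks.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

API_VERSION = "2025-11-01"  # GA version used by these notebooks


def build_analyze_request(endpoint: str, analyzer_id: str,
                          content_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for an analyze call without sending it."""
    url = "{}/contentunderstanding/analyzers/{}:analyze?{}".format(
        endpoint.rstrip("/"),
        quote(analyzer_id, safe=""),               # escape the analyzer name
        urlencode({"api-version": API_VERSION}),   # pin the API version
    )
    # Assumed request body shape: a list of inputs, each referencing a URL.
    body = json.dumps({"inputs": [{"url": content_url}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,  # key-based auth (assumed header)
        },
    )


if __name__ == "__main__":
    request = build_analyze_request(
        "https://example.services.ai.azure.com",   # placeholder endpoint
        "my-analyzer",                             # placeholder analyzer name
        "https://example.com/sample.pdf",          # placeholder document URL
        "<your-api-key>",
    )
    print(request.get_method(), request.full_url)
```

Keeping the version in a single constant makes it straightforward to see, and later change, which API surface every call targets.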
If you're migrating from preview API versions (2024-12-01-preview or 2025-05-01-preview), refer to the migration guide.
- Managed capacity for preview models retired (BYO model deployments required)
- Dedicated classifier APIs deprecated (now integrated in analyzer API)
- Video segmentation unified with classification capabilities
- Azure AI Foundry Documentation
- Azure Content Understanding Overview
- What's New in Content Understanding
- Content Understanding Studio
- Pricing Information
- Language and Region Support
Serge Retkowsky
| Platform | Link |
|---|---|
| GitHub | https://github.com/retkowsky |
| LinkedIn | https://www.linkedin.com/in/serger/ |
| YouTube | https://www.youtube.com/@serge1840/videos |
| Medium | https://medium.com/@sergems18 |
| Role | AI & APPS Global Black Belt @ Microsoft France |
Last Updated: 02-December-2025
For questions, issues, or feedback, please open an issue in this repository or reach out through the channels above.