retkowsky/azure-content-understanding-ga

Azure Content Understanding

License: MIT · Python · Azure

A collection of Jupyter notebooks demonstrating the capabilities of Azure Content Understanding (GA version, API 2025-11-01). These demos show how to process and analyze documents, images, videos, and audio files using generative AI to extract structured data from unstructured content.

🎯 Overview

Azure Content Understanding is a Generally Available (GA) Azure AI service that uses generative AI to transform unstructured content into structured, searchable data. This repository contains practical, ready-to-run notebooks that demonstrate various capabilities of the service across different content types.

🔍 What is Azure Content Understanding?

Azure Content Understanding in Foundry Tools is an AI service available as part of the Microsoft Foundry Resource in Azure. It processes and ingests content of many types:

  • Documents (PDF, DOCX, XLSX, images)
  • Videos (MP4, MOV, AVI)
  • Audio (WAV, MP3, M4A)
  • Images (JPG, PNG, TIFF)

The service offers a streamlined process to reason over large amounts of unstructured data, accelerating time-to-value by generating structured output that can be integrated into automation and analytical workflows.

✨ Key Features

Content Extraction

  • Document Processing: OCR, layout analysis, table recognition, and structural element detection
  • Video Analysis: Frame extraction, shot detection, speech-to-text transcription
  • Audio Processing: Speech-to-text transcription with high accuracy
  • Image Analysis: Visual content understanding and data extraction

Generative Capabilities

  • Field Extraction: Define custom schemas to extract specific fields from any content type
  • Classification: Categorize content into up to 200 categories with integrated classification
  • Content Summarization: Generate summaries and insights from extracted content
  • Face Description: Generate textual descriptions of faces in video and image content (with proper authorization)
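
As a rough illustration of the Field Extraction and Classification items above, a custom schema can be thought of as a mapping from field names to a type and a description. The sketch below builds such a definition as a plain Python dictionary and checks the 200-category limit locally. The key names (`fieldSchema`, `fields`, `type`, `description`, `enum`), the helper `build_analyzer_definition`, and the invoice fields are assumptions made for illustration, not the confirmed request format; check the API reference for version 2025-11-01 before sending anything like this to the service.

```python
import json

MAX_CATEGORIES = 200  # classification limit quoted in the feature list


def build_analyzer_definition(description, fields, categories=None):
    """Build a dict describing a hypothetical custom analyzer.

    fields: mapping of field name -> (type, description).
    categories: optional list of labels for classification.
    All key names in the returned dict are assumptions for illustration.
    """
    if categories is not None and len(categories) > MAX_CATEGORIES:
        raise ValueError(
            f"at most {MAX_CATEGORIES} categories allowed, got {len(categories)}"
        )
    schema_fields = {
        name: {"type": field_type, "description": field_description}
        for name, (field_type, field_description) in fields.items()
    }
    if categories:
        # Assumed representation: one string field limited to a fixed label set.
        schema_fields["Category"] = {
            "type": "string",
            "description": "Best matching category for the content",
            "enum": list(categories),
        }
    return {"description": description, "fieldSchema": {"fields": schema_fields}}


# Hypothetical invoice schema, used only to show the shape of the result.
definition = build_analyzer_definition(
    "Invoice fields (illustrative)",
    {
        "VendorName": ("string", "Name of the company issuing the invoice"),
        "InvoiceTotal": ("number", "Total amount due"),
    },
    categories=["invoice", "receipt", "other"],
)
print(json.dumps(definition, indent=2))
```

Keeping the schema as data makes it easy to version alongside the notebooks and to validate before any request is made.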

Enterprise Features (GA)

  • Microsoft Entra ID authentication
  • Managed identities support
  • Customer-managed keys
  • Virtual networks and private endpoints
  • Transparent pricing model
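
To make the authentication options above more concrete, here is a minimal sketch of building request headers for either a Microsoft Entra ID bearer token or a resource key. The helper name `build_auth_headers` is hypothetical. The `Ocp-Apim-Subscription-Key` header and the `https://cognitiveservices.azure.com/.default` scope are the ones commonly used with Azure AI services, and it is an assumption here that Content Understanding accepts them in the same way. Obtaining the token itself (for example with the `azure-identity` package) is left out so the snippet needs only the standard library.

```python
# Scope commonly requested for Entra ID tokens with Azure AI services
# (assumed to apply to Content Understanding as well).
COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default"


def build_auth_headers(*, bearer_token=None, api_key=None):
    """Return HTTP headers for exactly one of the two credential types."""
    if bool(bearer_token) == bool(api_key):
        raise ValueError("provide exactly one of bearer_token or api_key")
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Entra ID or managed identity: a token for COGNITIVE_SERVICES_SCOPE,
        # e.g. from azure.identity.DefaultAzureCredential().get_token(...).token
        headers["Authorization"] = f"Bearer {bearer_token}"
    else:
        # Key-based access, where it is enabled on the resource.
        headers["Ocp-Apim-Subscription-Key"] = api_key
    return headers


# Placeholder token value, for illustration only.
print(build_auth_headers(bearer_token="<token>"))
```

Token-based access pairs naturally with the Cognitive Services User role assignment listed in the prerequisites.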

📚 Python Notebooks

| Notebook | Description |
| --- | --- |
| Managing analyzers | Learn how to create, configure, and manage analyzers for different content types. |
| Field extraction | Demonstrates extracting predefined fields from documents using built-in schemas. |
| Custom field extraction | Shows how to define and extract custom fields tailored to your business needs. |
| Classifier | Explains how to classify content into categories using integrated classification APIs. |
| Document content extraction | Focuses on OCR, layout analysis, and table recognition for multi-page documents. |
| Audio extraction | Covers speech-to-text transcription and audio content analysis with speaker identification. |
| Video content extraction | Demonstrates video frame extraction, scene detection, and speech transcription from video. |

📦 Prerequisites

Before running these notebooks, ensure you have:

  1. Azure Subscription

  2. Azure AI Foundry Resource

  3. Model Deployments (required for prebuilt analyzers)

  4. Role Assignment

    • Grant yourself the Cognitive Services User role on the resource
    • This is required even if you're the resource owner

  5. Python Environment

    • Python 3.8 or higher
    • Jupyter Notebook or JupyterLab
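
A quick way to check the Python side of these prerequisites before opening a notebook is sketched below. It verifies the interpreter version and that the resource endpoint is configured. The environment variable name `AZURE_AI_ENDPOINT` and the helper `check_environment` are hypothetical; use whatever names the notebooks actually read.

```python
import os
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the prerequisites
REQUIRED_ENV_VARS = ("AZURE_AI_ENDPOINT",)  # hypothetical variable name


def check_environment(env=None, version=None):
    """Return a list of human-readable problems (empty if all checks pass)."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

The function only reports problems and never raises, so it can sit in the first cell of a notebook without interrupting a run.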

📓 Notebooks Overview

This repository contains demonstration notebooks covering:

Document Analysis

  • Document field extraction with custom schemas
  • Layout analysis and table extraction
  • Multi-page document processing
  • Classification and routing

Image Processing

  • Image content extraction
  • Visual question answering
  • Figure detection and analysis
  • Object and text recognition

Video Analysis

  • Video frame extraction
  • Scene detection and segmentation
  • Speech transcription from video
  • Visual content summarization

Audio Processing

  • Audio transcription
  • Speaker identification
  • Audio content analysis

Advanced Scenarios

  • Multi-modal content analysis
  • RAG (Retrieval-Augmented Generation) integration
  • Batch processing workflows
  • Custom analyzer creation

💼 Use Cases

Azure Content Understanding is ideal for:

  • Financial Services: Tax document processing, mortgage application analysis
  • Healthcare: Medical record extraction and analysis
  • Legal: Contract review and clause extraction
  • Manufacturing: Quality control and defect detection
  • Retail: Inventory management and shelf analysis
  • Media: Content cataloging and metadata extraction
  • Analytics & Reporting: Enhanced business intelligence from unstructured data

📌 API Version

These notebooks use the GA API version: 2025-11-01

This is the Generally Available version with production-ready features, enterprise security, and enhanced capabilities compared to previous preview versions.
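
To show where the version string ends up in practice, here is a small sketch that pins it in one constant and appends it as the `api-version` query parameter. The helper `build_analyze_url`, the example endpoint, and the `/contentunderstanding/analyzers/{id}:analyze` path are assumptions based on the usual Azure REST pattern; confirm the exact route in the REST reference.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2025-11-01"  # GA version used by these notebooks


def build_analyze_url(endpoint, analyzer_id, api_version=API_VERSION):
    """Compose an analyze URL; the path layout is an assumed REST pattern."""
    base = endpoint.rstrip("/")
    path = f"/contentunderstanding/analyzers/{quote(analyzer_id, safe='')}:analyze"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


# Hypothetical endpoint and analyzer name, for illustration only.
print(build_analyze_url("https://example.cognitiveservices.azure.com/", "my-analyzer"))
```

Holding the version in a single constant keeps a later upgrade, or a side-by-side comparison with a preview version, to a one-line change.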

Migration from Preview

If you're migrating from preview API versions (2024-12-01-preview or 2025-05-01-preview), refer to the migration guide.

Breaking Changes from Preview

  • Managed capacity for preview models is retired; you must bring your own (BYO) model deployments
  • Dedicated classifier APIs are deprecated; classification is now integrated into the analyzer API
  • Video segmentation is unified with the classification capabilities

📚 Resources

Documentation

Related Repositories

Responsible AI

👤 Author

Serge Retkowsky

| Platform | Link |
| --- | --- |
| GitHub | https://github.com/retkowsky |
| LinkedIn | https://www.linkedin.com/in/serger/ |
| YouTube | https://www.youtube.com/@serge1840/videos |
| Medium | https://medium.com/@sergems18 |
| Role | AI & APPS Global Black Belt @ Microsoft France |

Last Updated: 02-December-2025

For questions, issues, or feedback, please open an issue in this repository or contact through the channels above.