Add Azure OpenAI support for LangExtract recognizer (#1801)
* Add Azure OpenAI support for LangExtract recognizer
* ruff
* add redundant tests to achieve 100% coverage
* Add error handling and tests for Azure OpenAI provider initialization
* Fix Microsoft Defender secret scanning false positives
Replace test API keys with obviously fake placeholders:
- test-api-key → PLACEHOLDER_NOT_A_REAL_KEY
- test-key-123 → PLACEHOLDER_NOT_A_REAL_KEY
- env-key → PLACEHOLDER_FROM_ENV
- key → PLACEHOLDER_KEY
These are unit test mock values, not real secrets. Using
placeholder patterns that won't trigger security scanners
while maintaining test validity. All 29 tests passing.
* remove bandit
* Add bandit tool to Microsoft Security DevOps workflow
* Remove bandit from Microsoft Security DevOps workflow tools
* Update AzureOpenAILangExtractRecognizer to use deployment name from environment variable
* Refactor Azure OpenAI integration: remove legacy provider, update recognizer, and adjust tests
* Improve error handling during Azure OpenAI provider registration by logging as error and raising exception
* Refactor Azure OpenAI provider initialization logging for improved readability
* Refactor test imports in Azure OpenAI recognizer tests for improved clarity and organization
* Refactor Azure OpenAI provider tests to remove unnecessary variable assignments for improved clarity
* Refactor langextract configuration: reorder supported entities and update entity mappings for consistency
* Refactor Azure OpenAI integration: enhance documentation, improve endpoint validation, and streamline provider registration
* Refactor Azure OpenAI LangExtract Recognizer: remove unused import and clean up code formatting
* Refactor Azure OpenAI provider tests: update imports to use the correct module and remove obsolete test for langextract availability
* Refactor Azure authentication handling: consolidate credential management into azure_auth_helper and update related recognizers and tests
* Refactor Azure OpenAI recognizers: enhance module imports for registration, streamline model ID handling, and improve test coverage for credential selection
* Refactor AHDS Surrogate operator: streamline error handling by mocking Azure credentials and client, and improve code readability
* Refactor AHDS Recognizer tests: replace multiple credential mocks with a single get_azure_credential mock for improved clarity and maintainability
```python
# AZURE_OPENAI_API_KEY not set - uses DefaultAzureCredential in development mode
# (includes Azure CLI, VS Code, etc.)

azure_openai = AzureOpenAILangExtractRecognizer()
```

**Setup Managed Identity**:

1. Enable managed identity on your Azure resource (VM, App Service, Container Instance, etc.)
2. Grant the managed identity **"Cognitive Services OpenAI User"** role on your Azure OpenAI resource
3. No API keys needed - authentication is automatic

See [Azure Managed Identity documentation](https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview) for details.
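
The fallback described above (an API key when `AZURE_OPENAI_API_KEY` is set, an Azure credential otherwise) can be sketched roughly as follows. This is a minimal illustration, not the recognizer's actual code: `select_auth_mode` is a hypothetical helper, and the real selection logic (which the commit message places in `azure_auth_helper`) may differ.

```python
import os
from typing import Mapping, Optional, Tuple


def select_auth_mode(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: decide how to authenticate to Azure OpenAI.

    Returns ("api_key", <key>) when AZURE_OPENAI_API_KEY is set and non-empty.
    Otherwise returns ("azure_credential", None), meaning the caller should
    fall back to an Azure identity credential such as a managed identity.
    """
    env = os.environ if env is None else env
    api_key = env.get("AZURE_OPENAI_API_KEY", "").strip()
    if api_key:
        return ("api_key", api_key)
    return ("azure_credential", None)


# Illustration with explicit dictionaries instead of the real environment.
print(select_auth_mode({"AZURE_OPENAI_API_KEY": "PLACEHOLDER_KEY"}))
print(select_auth_mode({}))
```

Passing the environment as a parameter keeps the decision easy to unit test without touching real secrets, in the same spirit as the placeholder keys mentioned in the commit message.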

### Configuration File (Optional)

The configuration file is **optional** for basic usage. You only need it to customize:

- Supported entity types
- Entity mappings (LangExtract → Presidio)
- Prompts and examples
- Detection parameters

For basic usage, just pass `model_id` as a parameter (see examples above).

- `model.model_id`: Azure OpenAI deployment name (e.g., "gpt-4o", "gpt-4", "gpt-35-turbo")
- `model.temperature`: Model temperature for generation (null = use model default)
- `prompt_file`: Path to custom prompt template file
- `examples_file`: Path to few-shot examples file
- `entity_mappings`: Map LangExtract entity classes to Presidio entity names

See the [full config file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/langextract_config_azureopenai.yaml) for details.
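
To make the keys above concrete, the sketch below mirrors them in a plain Python dictionary and applies `entity_mappings` to a LangExtract class name. The nesting, file paths, mapping entries, and the `to_presidio_entity` helper are all assumptions made for illustration; the linked config file remains the authoritative reference.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory view of the documented keys; all values are placeholders.
config: Dict[str, Any] = {
    "model": {
        "model_id": "gpt-4o",  # Azure OpenAI deployment name
        "temperature": None,   # None (YAML null) = use the model default
    },
    "prompt_file": "path/to/prompt_template.txt",  # assumed path
    "examples_file": "path/to/examples.yaml",      # assumed path
    "entity_mappings": {  # LangExtract class -> Presidio entity name
        "person": "PERSON",
        "email": "EMAIL_ADDRESS",
    },
}


def to_presidio_entity(
    langextract_class: str, mappings: Dict[str, str]
) -> Optional[str]:
    """Return the Presidio entity name for a LangExtract class, or None if unmapped."""
    return mappings.get(langextract_class)


print(to_presidio_entity("email", config["entity_mappings"]))
print(to_presidio_entity("unknown_class", config["entity_mappings"]))
```

Returning `None` for unmapped classes is one possible design choice; whether the real recognizer drops, logs, or passes through unmapped classes is not stated in the text above.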
presidio-analyzer/README.md
41 additions & 3 deletions
@@ -14,15 +14,53 @@ Named Entity Recognition and other types of logic to detect PII in unstructured

 ### Language Model-based PII/PHI Detection

-Presidio analyzer supports language model-based PII/PHI detection (LLMs, SLMs) for flexible entity recognition. The current implementation uses [LangExtract](https://github.com/google/langextract) with Ollama for local model deployment.
+Presidio analyzer supports language model-based PII/PHI detection (LLMs, SLMs) for flexible entity recognition. The current implementation uses [LangExtract](https://github.com/google/langextract) with support for multiple providers:
+
+- **Ollama** - Local model deployment for privacy-sensitive environments
+- **Azure OpenAI** - Cloud-based deployment with enterprise features

 ```bash
 pip install presidio-analyzer[langextract]
 ```

-**Note:** The Ollama recognizer does not validate server connectivity or model availability during initialization. Connection errors or missing models will be reported when `analyze()` is first called. Ensure Ollama is running and the required model is installed before analysis.
+#### Quick Usage
+
+**Ollama** (local models):
+
+```python
+from presidio_analyzer.predefined_recognizers import OllamaLangExtractRecognizer
+**Note:** LangExtract recognizers do not validate connectivity during initialization. Connection errors or missing models will be reported when `analyze()` is first called.

-See the [Language Model-based PII/PHI Detection guide](https://microsoft.github.io/presidio/samples/python/langextract/) for setup and usage.
+See the [Language Model-based PII/PHI Detection guide](https://microsoft.github.io/presidio/samples/python/langextract/) for complete setup and usage instructions.