154 changes: 154 additions & 0 deletions mongodb-qe-tutorial/README.md
@@ -0,0 +1,154 @@
# MongoDB Queryable Encryption Tutorial (Python)
**Automatic Client-Side Field Level Encryption with Azure Key Vault – Including CMK Rotation in Atlas**

## Overview

This repository demonstrates how to set up [MongoDB Queryable Encryption (QE)](https://www.mongodb.com/docs/manual/core/queryable-encryption/#std-label-qe-manual-feature-qe) using Python and Azure Key Vault, including secure Data Encryption Key (DEK) management and rewrapping after Customer Master Key (CMK) rotation in MongoDB Atlas.

Queryable Encryption allows you to **encrypt sensitive data client side**, perform expressive queries on encrypted fields, and manage your encryption keys securely with cloud KMS providers such as Azure Key Vault.

## Features

- **Create encrypted MongoDB collections** with [automatic encryption](https://www.mongodb.com/docs/manual/core/queryable-encryption/install-library/#std-label-qe-csfle-install-library)
- **Encrypt and decrypt fields transparently** in application code
- Use [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/general/overview) for secure key management (CMK)
- **Rewrap DEKs** (re-encrypt them under the new key version) after CMK rotation
- Full Python demo including helper functions, insertion, and querying

## Prerequisites

### Software

- **Python 3**
- [MongoDB Atlas Cluster](https://www.mongodb.com/cloud/atlas/register)
- [PyMongo Driver](https://www.mongodb.com/docs/languages/python/pymongo-driver/current/) (`>=4.4`)
- [pymongocrypt](https://pypi.org/project/pymongocrypt/) (`>=1.6`)
- Automatic Encryption Shared Library ([crypt_shared](https://www.mongodb.com/docs/manual/core/queryable-encryption/install-library/#automatic-encryption-shared-library))

### Cloud Providers (Azure)

- [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/general/overview) with your **CMK**
- [Register your application in Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app)
- Assign the application the **Key Vault Administrator** role, or permissions to wrap/unwrap keys

### Other Supported KMS Providers
- AWS, GCP, KMIP, or local (see `.env` placeholders)

---

## Getting Started

### 1. Clone This Repository

```bash
git clone https://github.com/<your-org>/<your-repo>.git
cd <your-repo>/mongodb-qe-tutorial
```

### 2. Populate Environment Variables

Edit the **.env** file and replace all placeholder values (`<Your ...>`) with your credentials.

```bash
# Azure Example:
export AZURE_TENANT_ID="<Your Azure tenant ID>"
export AZURE_CLIENT_ID="<Your Azure client ID>"
export AZURE_CLIENT_SECRET="<Your Azure client secret>"
export AZURE_KEY_NAME="<Your Azure Key Name>"
export AZURE_KEY_VERSION="<Your Azure Key Version>"
export AZURE_KEY_VAULT_ENDPOINT="<Your Azure Key Vault Endpoint>"
export KEY_VAULT_MONGODB_URI="<Your Atlas Connection String>"
export MONGODB_URI="<Your Atlas Connection String>"
export SHARED_LIB_PATH="/full/path/to/mongo_crypt_v1.so"
...
```

See `.env` in repo for a full example including other KMS providers.
Review comment (Collaborator): As a best practice, we don't add .env files to the repo, so I would remove this.
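Before running the scripts, it can help to confirm that every variable from step 2 is set and no longer holds a placeholder. The following is a minimal sketch, not part of the repository: the function name `find_unset_vars` and its `environ` parameter are assumptions made for illustration, and the variable list is copied from the Azure example above. It reads `os.environ` directly, so call `load_dotenv()` first if your values live in a `.env` file.

```python
import os

# Variable names copied from the Azure example in step 2.
REQUIRED_AZURE_VARS = [
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_KEY_NAME",
    "AZURE_KEY_VERSION",
    "AZURE_KEY_VAULT_ENDPOINT",
    "KEY_VAULT_MONGODB_URI",
    "MONGODB_URI",
    "SHARED_LIB_PATH",
]


def find_unset_vars(required, environ=None):
    """Return names that are missing, empty, or still a '<Your ...>' placeholder.

    `environ` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    problems = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("<Your"):
            problems.append(name)
    return problems


if __name__ == "__main__":
    unset = find_unset_vars(REQUIRED_AZURE_VARS)
    if unset:
        print("Set these variables before running the scripts:", ", ".join(unset))
    else:
        print("All required variables are set.")
```

Failing early here gives a clearer message than the `KeyError` raised by `os.environ['...']` inside the scripts.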


### 3. Install Python Dependencies

```bash
python -m pip install -r requirements.txt
```

### 4. Download Automatic Encryption Shared Library

Follow [these instructions](https://www.mongodb.com/docs/manual/core/queryable-encryption/install-library/#automatic-encryption-shared-library) to download the correct `mongo_crypt_v1.so` (or `.dylib` for Mac) for your system, and record its full path in your `.env`.
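A quick sanity check on the recorded path can catch the most common mistakes before the driver tries to load the library. This is a sketch under stated assumptions: the helper names are hypothetical, and the Windows file name `mongo_crypt_v1.dll` is not mentioned in this README. It only checks the path and file name, not the CPU architecture.

```python
import os
import platform

# File extension per operating system, keyed by platform.system().
# The Windows entry is an assumption; this README only names .so and .dylib.
_SUFFIXES = {"Linux": ".so", "Darwin": ".dylib", "Windows": ".dll"}


def expected_library_name(system=None):
    """Return the crypt_shared file name expected on the given OS."""
    system = system or platform.system()
    suffix = _SUFFIXES.get(system)
    if suffix is None:
        raise ValueError(f"Unsupported platform: {system}")
    return "mongo_crypt_v1" + suffix


def check_shared_lib_path(path, system=None):
    """Return a list of problems with the path; an empty list means it looks plausible."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    if os.path.basename(path) != expected_library_name(system):
        problems.append("file name does not match this operating system")
    return problems


if __name__ == "__main__":
    for problem in check_shared_lib_path(os.environ.get("SHARED_LIB_PATH", "")):
        print("SHARED_LIB_PATH:", problem)
```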

---

## Usage

### Step 1: Create Key Vault and Encrypted Collection

This script creates the **key vault collection** (to hold your DEKs) and sets up an **encrypted collection** for your data.

```bash
python create_encrypted_collections.py
```
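The script relies on two helpers from `queryable_encryption_helpers.py`, which is not shown in this diff. As a rough sketch of what they might return for Azure, assuming the environment variable names from step 2 and the Azure field names used in MongoDB's KMS provider documentation (the `environ` parameter is a hypothetical addition for testing):

```python
import os


def get_kms_provider_credentials(kms_provider_name, environ=None):
    """Build the kms_providers document for the given provider (Azure only here)."""
    env = os.environ if environ is None else environ
    if kms_provider_name != "azure":
        raise ValueError(f"This sketch only covers 'azure', got {kms_provider_name!r}")
    return {
        "azure": {
            "tenantId": env["AZURE_TENANT_ID"],
            "clientId": env["AZURE_CLIENT_ID"],
            "clientSecret": env["AZURE_CLIENT_SECRET"],
        }
    }


def get_customer_master_key_credentials(kms_provider_name, environ=None):
    """Build the master_key document that identifies the CMK in Azure Key Vault."""
    env = os.environ if environ is None else environ
    if kms_provider_name != "azure":
        raise ValueError(f"This sketch only covers 'azure', got {kms_provider_name!r}")
    return {
        "keyVaultEndpoint": env["AZURE_KEY_VAULT_ENDPOINT"],
        "keyName": env["AZURE_KEY_NAME"],
        "keyVersion": env["AZURE_KEY_VERSION"],
    }
```

The first document authenticates the application to Azure; the second tells the driver which key and version to use when wrapping a new DEK.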

### Step 2: Insert Encrypted Document

This script uses automatic encryption to insert a document with encrypted fields.

```bash
python insert_encrypted_doc.py
```

**Sample output:**
```plaintext
Successfully inserted another patient with ssn: 123-45-6789
{...decrypted document...}
```
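One detail that is easy to miss: the two scripts pass the encrypted fields in different shapes. `create_encrypted_collections.py` passes a bare `{"fields": [...]}` document, while `insert_encrypted_doc.py` wraps the same document in a map keyed by `<database>.<collection>`. The sketch below shows both shapes side by side; the function names are hypothetical and the key IDs are plain `bytes` stand-ins for the BSON Binary values the key vault returns.

```python
def build_encrypted_fields(ssn_key_id, billing_key_id):
    """Field definitions for one collection (shape used when creating it)."""
    return {
        "fields": [
            {
                "path": "patientRecord.ssn",
                "bsonType": "string",
                "queries": [{"queryType": "equality"}],
                "keyId": ssn_key_id,
            },
            {
                "path": "patientRecord.billing",
                "bsonType": "object",
                "keyId": billing_key_id,
            },
        ]
    }


def build_encrypted_fields_map(database, collection, ssn_key_id, billing_key_id):
    """Namespace-keyed map (shape used in the auto-encryption options)."""
    namespace = f"{database}.{collection}"
    return {namespace: build_encrypted_fields(ssn_key_id, billing_key_id)}


if __name__ == "__main__":
    # Placeholder 16-byte IDs; real IDs come from the key vault collection.
    fields_map = build_encrypted_fields_map(
        "mongoMedicalRecords", "mongoDBpatients", b"\x00" * 16, b"\x01" * 16
    )
    print(list(fields_map))
```

Only `patientRecord.ssn` declares a query type, which is why the find in this step filters on the SSN and not on the billing object.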

### Step 3: Rotate Your CMK in Azure Key Vault

- Use the Azure Portal to [rotate your root key](https://learn.microsoft.com/en-us/azure/key-vault/keys/change-key-version).
- Record the new version in your `.env` if needed.

### Step 4: Rewrap Data Encryption Keys (DEKs)

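After CMK rotation, rewrap all the DEKs in the key vault collection so that they are wrapped under the new version of your master key. Rewrapping re-encrypts only the DEKs, so data already encrypted with them remains readable.

If needed, edit `rewrap_deks.py` with your new CMK details, then run it: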

```bash
python rewrap_deks.py
```
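`rewrap_deks.py` is not shown in this diff. As a minimal sketch of the core step, assuming a PyMongo `ClientEncryption` object like the one built in `create_encrypted_collections.py`, rewrapping can be done with `ClientEncryption.rewrap_many_data_key`. The wrapper name `rewrap_all_deks` is hypothetical.

```python
def rewrap_all_deks(client_encryption, kms_provider_name, new_master_key):
    """Rewrap every DEK in the key vault under the given master key.

    `client_encryption` is expected to be a pymongo ClientEncryption instance.
    `new_master_key` is the master_key document pointing at the rotated CMK.
    """
    # An empty filter matches every DEK in the key vault collection.
    result = client_encryption.rewrap_many_data_key(
        {},
        provider=kms_provider_name,
        master_key=new_master_key,
    )
    # bulk_write_result is None when no DEK matched the filter.
    return result.bulk_write_result
```

To rewrap only some keys, pass a filter such as `{"keyAltNames": "mongoMedicalRecords.ssn"}` in place of the empty document.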

---

## Troubleshooting

### Common Issues

- **"Not all keys were satisfied":**
If demo code is run multiple times without dropping collections, documents may be encrypted under keys that are lost or missing. Drop your vault and collection, restart, and generate keys once.

- **Shared library load errors:**
Example:
```
Error while opening candidate for crypt_shared dynamic library [/path/mongo_crypt_v1.so]
```
- Ensure your library matches your OS and CPU arch (`file mongo_crypt_v1.so`, `uname -a`)
- Path must be correct and the file must be present

---

## File Reference

- `requirements.txt` – Python package requirements
- `.env` – Environment variables for all supported KMS providers
Review comment (Collaborator): I would remove .env here as well, as this file won't be in the repository.

- `queryable_encryption_helpers.py` – Helper functions for KMS credentials and encryption setup
- `create_encrypted_collections.py` – Create vault, DEKs, and encrypted collection
- `insert_encrypted_doc.py` – Insert and query encrypted documents
- `rewrap_deks.py` – Rewrap DEKs after master key rotation

---

## References & Documentation

- [Queryable Encryption Tutorials](https://www.mongodb.com/docs/manual/core/queryable-encryption/tutorials/#queryable-encryption-tutorials)
- [Queryable Encryption Quick Start](https://www.mongodb.com/docs/manual/core/queryable-encryption/quick-start/#queryable-encryption-quick-start)
- [MongoDB Atlas](https://www.mongodb.com/docs/atlas/)
- [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/general/overview)
119 changes: 119 additions & 0 deletions mongodb-qe-tutorial/create_encrypted_collections.py
@@ -0,0 +1,119 @@
from pymongo import MongoClient #import MongoClient class to connect to MongoDB servers/clusters.
import queryable_encryption_helpers as helpers # our helper functions
import os #For reading environment variables.
from dotenv import load_dotenv #Loads variables from a .env file into your environment

load_dotenv() #Loads the values in a .env file

# start-setup-application-variables
kms_provider_name = "azure"

# URIs for Atlas clusters
key_vault_uri = os.environ['KEY_VAULT_MONGODB_URI'] # Key Vault Cluster!
data_uri = os.environ['MONGODB_URI'] # Application Data Cluster!

key_vault_database_name = "queryable_encryption"
key_vault_collection_name = "queryable_keyVault"
key_vault_namespace = f"{key_vault_database_name}.{key_vault_collection_name}"
encrypted_database_name = "mongoMedicalRecords"
encrypted_collection_name = "mongoDBpatients"



kms_provider_credentials = helpers.get_kms_provider_credentials(kms_provider_name)
customer_master_key_credentials = helpers.get_customer_master_key_credentials(kms_provider_name)

#Drop old collections for a fresh setup
data_client = MongoClient(data_uri)
try:
    data_client[encrypted_database_name][encrypted_collection_name].drop()
except Exception:
    pass

key_vault_client = MongoClient(key_vault_uri)
try:
    key_vault_client[key_vault_database_name][key_vault_collection_name].drop()
except Exception:
    pass

# ---- Ensure the key vault collection has a unique index on keyAltNames ----
key_vault_client[key_vault_database_name][key_vault_collection_name].create_index(
    "keyAltNames",
    unique=True,
    # Index only documents that actually have keyAltNames (not all do).
    partialFilterExpression={"keyAltNames": {"$exists": True}}
)
print("Created unique index on keyAltNames for key vault collection.")

# Set Up the ClientEncryption Object
#Initializes an object that lets you securely create and use data encryption keys (DEKs).
#Uses the key vault, KMS, credentials, and collection namespace.
client_encryption = helpers.get_client_encryption(
    key_vault_client,
    kms_provider_name,
    kms_provider_credentials,
    key_vault_namespace
)



# ---- Create DEKs with keyAltNames (one per field) ----
ssn_altname = f"{encrypted_database_name}.ssn"
billing_altname = f"{encrypted_database_name}.billing"

# Create each DEK once and record its key ID.
# Each key ID is a BSON Binary (UUID, subtype 4); the IDs are used in the encrypted fields map below.
ssn_key_id = client_encryption.create_data_key(
    kms_provider_name,
    master_key=customer_master_key_credentials,
    key_alt_names=[ssn_altname]
)
billing_key_id = client_encryption.create_data_key(
    kms_provider_name,
    master_key=customer_master_key_credentials,
    key_alt_names=[billing_altname]
)
print(f"Created SSN Key ID: {ssn_key_id}")
print(f"Created Billing Key ID: {billing_key_id}")

# Save the DEK IDs to disk for reference.
# insert_encrypted_doc.py looks the keys up by keyAltName instead of reading these files.
with open("ssn_key_id.bin", "wb") as f:
    f.write(ssn_key_id)
with open("billing_key_id.bin", "wb") as f:
    f.write(billing_key_id)


# start-encrypted-fields-map

encrypted_fields_map = {
    "fields": [
        {
            "path": "patientRecord.ssn",
            "bsonType": "string",
            "queries": [{"queryType": "equality"}],
            "keyId": ssn_key_id
        },
        {
            "path": "patientRecord.billing",
            "bsonType": "object",
            "keyId": billing_key_id
        }
    ]
}


# creates a new collection in your MongoDB data cluster.

try:
    client_encryption.create_encrypted_collection(
        data_client[encrypted_database_name],
        encrypted_collection_name,
        encrypted_fields_map,
        kms_provider_name,
        customer_master_key_credentials,
    )
    print("Encrypted collection created successfully.")
except Exception as e:
    print("Unable to create encrypted collection due to the following error:", e)

data_client.close()
key_vault_client.close()
96 changes: 96 additions & 0 deletions mongodb-qe-tutorial/insert_encrypted_doc.py
@@ -0,0 +1,96 @@
from pymongo import MongoClient
import queryable_encryption_helpers as helpers
import os
from dotenv import load_dotenv
from random import randint
load_dotenv()

kms_provider_name = "azure"
uri = os.environ['MONGODB_URI']

key_vault_database_name = "queryable_encryption"
key_vault_collection_name = "queryable_keyVault"
key_vault_namespace = f"{key_vault_database_name}.{key_vault_collection_name}"
encrypted_database_name = "mongoMedicalRecords"
encrypted_collection_name = "mongoDBpatients"

kms_provider_credentials = helpers.get_kms_provider_credentials(kms_provider_name)

# --- Connect to key vault and retrieve DEKs by keyAltName ---
key_vault_client = MongoClient(os.environ['KEY_VAULT_MONGODB_URI'])
key_vault_coll = key_vault_client[key_vault_database_name][key_vault_collection_name]

ssn_key_id = key_vault_coll.find_one({"keyAltNames": "mongoMedicalRecords.ssn"})["_id"]
billing_key_id = key_vault_coll.find_one({"keyAltNames": "mongoMedicalRecords.billing"})["_id"]

encrypted_fields_map = {
    f"{encrypted_database_name}.{encrypted_collection_name}": {
        "fields": [
            {
                "path": "patientRecord.ssn",
                "bsonType": "string",
                "queries": [{"queryType": "equality"}],
                "keyId": ssn_key_id
            },
            {
                "path": "patientRecord.billing",
                "bsonType": "object",
                "keyId": billing_key_id
            }
        ]
    }
}

# Reuse the key vault client created above (a dedicated key vault client is
# recommended for Atlas multi-region deployments).
auto_encryption_options = helpers.get_auto_encryption_options(
    kms_provider_name,
    key_vault_namespace,
    kms_provider_credentials,
    encrypted_fields_map=encrypted_fields_map,
    key_vault_client=key_vault_client
)

# Set up the encrypted client
encrypted_client = MongoClient(uri, auto_encryption_opts=auto_encryption_options)

# Get the encrypted collection reference
encrypted_collection = encrypted_client[encrypted_database_name][encrypted_collection_name]

ssn = f"{randint(100, 999)}-{randint(10,99)}-{randint(1000,9999)}"
new_patient = {
    "patientName": f"Alice Charles {randint(1, 1000)}",  # Randomize
    "patientId": randint(10000000, 99999999),  # random patientId
    "patientRecord": {
        "ssn": ssn,  # random SSN
        "billing": {
            "type": "Amex",
            "number": "340000000000009"
        },
        "billAmount": randint(1000, 5000),  # Optional: random bill amount
    },
}

result = encrypted_collection.insert_one(new_patient)
if result.acknowledged:
    print(f"Successfully inserted another patient with ssn: {ssn}")

# start-find-document
find_result = encrypted_collection.find_one({
    "patientRecord.ssn": ssn
})

print(find_result)
# end-find-document

encrypted_client.close()
key_vault_client.close()

'''
print("Listing all DEKs and their keyAltNames in key vault:")
for doc in key_vault_coll.find():
    print("DEK _id:", doc.get("_id"), "keyAltNames:", doc.get("keyAltNames"))
print("ssn_key_id:", ssn_key_id, type(ssn_key_id))
print("billing_key_id:", billing_key_id, type(billing_key_id))
'''