This repository was archived by the owner on Mar 11, 2026. It is now read-only.

Commit 864006b

feat(deploy): integrate Azure Key Vault secrets sync via CSI driver (#32)
1 parent 51ed7d6 commit 864006b

51 files changed

Lines changed: 733 additions & 832 deletions

README.md

Lines changed: 16 additions & 27 deletions
@@ -26,7 +26,7 @@ The infrastructure deploys an AKS cluster with GPU node pools running the NVIDIA
 | Private Endpoints | Secure access to Azure services (7 endpoints, 11+ DNS zones) |
 | AKS Cluster | Kubernetes with GPU Spot node pools and Workload Identity |
 | Key Vault | Secrets management with RBAC authorization |
-| Azure ML Workspace | Experiment tracking, model registry, compute management |
+| Azure ML Workspace | Experiment tracking, model registry |
 | Storage Account | Training data, checkpoints, and workflow artifacts |
 | Container Registry | Training and OSMO container images |
 | Azure Monitor | Log Analytics, Prometheus metrics, Managed Grafana |
@@ -88,25 +88,14 @@ OSMO orchestration on Azure enables production-scale robotics training across in
 ### 1. Deploy Infrastructure
 
 ```bash
-# Set subscription for Terraform
-source deploy/000-prerequisites/az-sub-init.sh
-
-# Register providers (new subscriptions only)
-./deploy/000-prerequisites/register-azure-providers.sh
-
 cd deploy/001-iac
-
-# Create terraform.tfvars with your values
-cat > terraform.tfvars << 'EOF'
-environment = "dev"
-resource_prefix = "robotst" # Your prefix (3-8 chars)
-location = "eastus2" # Azure region with GPU quota
-EOF
-
-terraform init && terraform apply
+source ../000-prerequisites/az-sub-init.sh
+cp terraform.tfvars.example terraform.tfvars
+# Edit terraform.tfvars with your values
+terraform init && terraform apply -var-file=terraform.tfvars
 ```
 
-For optional VPN deployment and additional configuration, see [deploy/001-iac/README.md](deploy/001-iac/README.md).
+For VPN, automation, and additional configuration, see [deploy/001-iac/README.md](deploy/001-iac/README.md).
 
 ### 2. Configure Cluster
 
@@ -172,7 +161,7 @@ az aks get-credentials --resource-group <rg> --name <aks>
 ./scripts/submit-azureml-validation.sh --model-name my-policy --stream
 ```
 
-> **Tip**: Run `./scripts/submit-*-training.sh --help` for all available options.
+> **Tip**: Run any script with `--help` for all available options.
 
 ## 🔐 Deployment Scenarios
 
@@ -189,17 +178,17 @@ See [002-setup/README.md](deploy/002-setup/README.md) for detailed instructions.
 ```text
 .
 ├── deploy/
-│ ├── 000-prerequisites/ # Validation scripts
-│ ├── 001-iac/ # Terraform infrastructure
-│ └── 002-setup/ # Cluster configuration scripts
+│ ├── 000-prerequisites/ # Azure CLI and provider setup
+│ ├── 001-iac/ # Terraform infrastructure
+│ └── 002-setup/ # Cluster configuration scripts
 ├── scripts/
-│ ├── submit-azureml-*.sh # AzureML job submission
-│ └── submit-osmo-*.sh # OSMO workflow submission
+│ ├── submit-azureml-*.sh # AzureML job submission
+│ └── submit-osmo-*.sh # OSMO workflow submission
 ├── workflows/
-│ ├── azureml/ # AzureML job templates
-│ └── osmo/ # OSMO workflow templates
-├── src/training/ # Training code
-└── docs/ # Additional documentation
+│ ├── azureml/ # AzureML job templates
+│ └── osmo/ # OSMO workflow templates
+├── src/training/ # Training code
+└── docs/ # Additional documentation
 ```
 
 ## 📖 Documentation
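
Note on the `source ../000-prerequisites/az-sub-init.sh` line added above: the script is sourced rather than executed because it exports `ARM_SUBSCRIPTION_ID`, and an export made in a child process does not reach the shell that later runs `terraform`. The following is a minimal sketch of that difference. The stand-in script and its placeholder subscription ID are hypothetical and are not the contents of the real `az-sub-init.sh`.

```shell
# Stand-in for az-sub-init.sh (hypothetical contents; the real script
# also performs the Azure login).
init_script="$(mktemp)"
cat > "$init_script" << 'EOF'
export ARM_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
EOF

# Start from a clean state so the comparison is meaningful.
unset ARM_SUBSCRIPTION_ID

# Executing the script runs it in a child process, so its export is
# gone as soon as that process exits.
bash "$init_script"
after_execute="${ARM_SUBSCRIPTION_ID:-unset}"

# Sourcing runs it in the current shell (`.` is the POSIX spelling of
# `source`), so the variable is still set for later commands.
. "$init_script"
after_source="$ARM_SUBSCRIPTION_ID"

echo "after execute: $after_execute"
echo "after source:  $after_source"
rm -f "$init_script"
```

Only the sourced run leaves the variable set in the calling shell, which is why the quick-start step uses `source`.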

deploy/000-prerequisites/README.md

Lines changed: 4 additions & 4 deletions
@@ -2,14 +2,14 @@
 
 Azure CLI initialization and subscription setup for Terraform deployments.
 
-## Scripts
+## 📜 Scripts
 
 | Script | Purpose |
 |--------|---------|
 | `az-sub-init.sh` | Azure login and `ARM_SUBSCRIPTION_ID` export |
 | `register-azure-providers.sh` | Register required Azure resource providers |
 
-## Usage
+## 🚀 Usage
 
 Source the initialization script to set `ARM_SUBSCRIPTION_ID` for Terraform:
 
@@ -33,7 +33,7 @@ For new Azure subscriptions or subscriptions that haven't deployed AKS, AzureML,
 
 The script reads providers from `robotics-azure-resource-providers.txt` and waits for registration to complete. This is a one-time operation per subscription.
 
-## What It Does
+## ⚙️ What It Does
 
 ### az-sub-init.sh
 
@@ -50,6 +50,6 @@ The subscription ID is required by Terraform's Azure provider when not running i
 3. Registers unregistered providers
 4. Polls until all providers reach `Registered` state
 
-## Next Step
+## ➡️ Next Step
 
 After initialization, proceed to [001-iac](../001-iac/) to deploy infrastructure.
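
Note on steps 3 and 4 of `register-azure-providers.sh` shown in the hunk above: a register-then-poll loop for a single provider namespace could look roughly like the sketch below. It is an illustration only, not the script from this repository. The function names `provider_state` and `ensure_registered` and the polling interval are assumptions. The `az provider show` and `az provider register` commands are standard Azure CLI.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the repository's register-azure-providers.sh may differ.
set -eu

# Print the registration state of one provider namespace.
provider_state() {
  az provider show --namespace "$1" --query registrationState --output tsv
}

# Register the provider if it is not registered yet, then poll until
# it reports the "Registered" state.
ensure_registered() {
  local ns="$1"
  local interval="${2:-10}"   # seconds between polls (assumed default)

  if [ "$(provider_state "$ns")" != "Registered" ]; then
    az provider register --namespace "$ns" > /dev/null
  fi

  until [ "$(provider_state "$ns")" = "Registered" ]; do
    sleep "$interval"
  done
  echo "$ns: Registered"
}
```

A caller would invoke `ensure_registered <namespace>` once per entry in the provider list. This sketch has no timeout, so a production script would probably cap the number of polls.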
