You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Mar 11, 2026. It is now read-only.
Copy file name to clipboardExpand all lines: README.md
+74-7Lines changed: 74 additions & 7 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -18,6 +18,56 @@ Production-ready framework for orchestrating robotics and AI workloads on [Azure
18
18
19
19
The infrastructure deploys an AKS cluster with GPU node pools running the NVIDIA GPU Operator and KAI Scheduler. Training workloads can be submitted via OSMO workflows (control plane and backend operator) and AzureML jobs (ML extension). Both platforms share common infrastructure: Azure Storage for checkpoints and data, Key Vault for secrets, and Azure Container Registry for container images. OSMO additionally uses PostgreSQL for workflow state and Redis for caching.
GPU Spot VMs provide significant savings (60-90%) compared to on-demand pricing. Actual costs depend on training frequency, job duration, and data volumes.
Copy file name to clipboardExpand all lines: deploy/001-iac/README.md
+15-1Lines changed: 15 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -10,6 +10,20 @@ Terraform configuration for the robotics reference architecture. Deploys Azure r
10
10
| Terraform | 1.5+ |`terraform version`|
11
11
| GPU VM quota | Region-specific | e.g., `Standard_NV36ads_A10_v5`|
12
12
13
+
### Azure RBAC Permissions
14
+
15
+
| Role | Scope |
16
+
|------|-------|
17
+
| Contributor | Subscription (new RG) or Resource Group (existing RG) |
18
+
| Role Based Access Control Administrator | Subscription (new RG) or Resource Group (existing RG) |
19
+
20
+
Terraform creates role assignments for managed identities, requiring `Microsoft.Authorization/roleAssignments/write` permission. The Contributor role explicitly blocks this action; the RBAC Administrator role provides it.
21
+
22
+
> [!NOTE]
23
+
> Use subscription scope if creating a new resource group (`should_create_resource_group = true`). Use resource group scope if the resource group already exists.
24
+
25
+
**Alternative**: Owner role (grants more permissions than required).
26
+
13
27
## 🚀 Quick Start
14
28
15
29
```bash
@@ -260,7 +274,7 @@ Issues and resolutions encountered during infrastructure deployment and teardown
260
274
261
275
### Destroy Takes a Long Time
262
276
263
-
Terraform destroy removes resources in dependency order. Private Endpoints, AKS clusters, and PostgreSQL servers commonly take 10-15 minutes each.
277
+
Terraform destroy removes resources in dependency order. Private Endpoints, AKS clusters, and PostgreSQL servers commonly take 5-10 minutes each. Full destruction typically takes 20-30 minutes.
0 commit comments