This repository was archived by the owner on Mar 11, 2026. It is now read-only.

Commit 3262f10

WilliamBerryiii and Bill Berry authored
docs(docs): add getting-started hub and quickstart tutorial (#369)
- create docs/getting-started/README.md hub page with navigation table
- create docs/getting-started/quickstart.md with 8-step deployment path
- use full-public networking and Access Keys for simplest onboarding path

📝 - Generated by Copilot

Co-authored-by: Bill Berry <wbery@microsoft.com>
1 parent fb7a217 commit 3262f10

2 files changed

Lines changed: 237 additions & 0 deletions

docs/getting-started/README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
---
title: Getting Started
description: Entry point for deploying the Azure NVIDIA Robotics Reference Architecture
author: Microsoft Robotics-AI Team
ms.date: 2026-02-22
ms.topic: overview
keywords:
- getting-started
- quickstart
- deployment
- onboarding
---

Deploy the Azure NVIDIA Robotics Reference Architecture and submit your first training job. This hub guides you through setup, deployment, and verification.

## 🚀 Guides

| Guide | Description |
| --- | --- |
| [Quickstart](quickstart.md) | 8-step path from clone to first training job |
| Architecture Overview (coming soon) | System topology, components, and data flow |
| Glossary (coming soon) | Term definitions for Azure, NVIDIA, and OSMO |

## ⏱️ Time and Cost

| Item | Estimate |
| --- | --- |
| Total deployment time | ~1.5-2 hours |
| Quick validation cost | ~$25-50 |
| GPU VM rate | ~$3.06/hour (A100) |

> [!NOTE]
> Run `terraform destroy` when finished to stop incurring costs. See [Cost Considerations](../contributing/cost-considerations.md) for detailed estimates.
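
The GPU line in the table above is an hourly rate, so the GPU share of a session is simply rate times hours. The sketch below is illustrative arithmetic only: the function name `estimate_gpu_cost` is hypothetical, the default rate is the ~$3.06/hour figure quoted above, and the result covers GPU VM time alone, not the other resources in the deployment.

```bash
# estimate_gpu_cost HOURS [RATE]
# Prints HOURS x RATE rounded to two decimals.
# RATE defaults to the ~$3.06/hour A100 figure quoted in the table above.
estimate_gpu_cost() {
  local hours="$1"
  local rate="${2:-3.06}"
  awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * r }'
}

# Example: GPU cost of one VM left running for 8 hours.
# estimate_gpu_cost 8
```

Pass a second argument to try a different rate for your region, for example `estimate_gpu_cost 8 3.50`.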

## 📋 Prerequisites Summary

| Tool | Version |
| --- | --- |
| Terraform | ≥1.9.8 |
| Azure CLI | ≥2.65.0 |
| kubectl | ≥1.31 |
| Helm | ≥3.16 |
| Python | ≥3.11 |

You also need an Azure subscription with the Contributor and User Access Administrator roles, GPU quota for `Standard_NC24ads_A100_v4`, and an NVIDIA NGC account. See [Prerequisites](../contributing/prerequisites.md) for full details.
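
To compare an installed tool against the minimums in the table, a small version comparison helper can be enough. This is a minimal sketch, not part of the repository: `version_ge` and `report_tool` are hypothetical names, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# version_ge INSTALLED MINIMUM
# Succeeds when INSTALLED is greater than or equal to MINIMUM.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  local installed="$1" minimum="$2" lowest
  # If MINIMUM sorts first (or the two are equal), INSTALLED is new enough.
  lowest="$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}

# report_tool NAME INSTALLED MINIMUM
# Prints one status line; returns non-zero when the version is too old.
report_tool() {
  if version_ge "$2" "$3"; then
    printf 'ok       %s %s (>= %s)\n' "$1" "$2" "$3"
  else
    printf 'too old  %s %s (< %s)\n' "$1" "$2" "$3"
    return 1
  fi
}

# Example usage (uncomment once the tools are installed):
# report_tool "Azure CLI" "$(az version --query '"azure-cli"' --output tsv)" 2.65.0
# report_tool "Python" "$(python3 -c 'import platform; print(platform.python_version())')" 3.11
```

Each tool reports its version in a different format, so the extraction commands in the usage comments are per-tool; only the comparison itself is shared.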

## 📚 Related Documentation

| Resource | Description |
| --- | --- |
| [Contributing Guide](../contributing/README.md) | Development workflow and code standards |
| [Deployment Guide](../../deploy/README.md) | Detailed deployment reference |
| [Cost Considerations](../contributing/cost-considerations.md) | Pricing breakdown and optimization |

docs/getting-started/quickstart.md

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
---
title: "Quickstart: Clone to First Training Job"
description: Deploy infrastructure and submit your first robotics training job in 8 steps
author: Microsoft Robotics-AI Team
ms.date: 2026-02-22
ms.topic: tutorial
keywords:
- quickstart
- deployment
- training
- tutorial
---

Deploy the full Azure NVIDIA Robotics stack and submit a training job in ~1.5-2 hours. This guide uses full-public networking and Access Keys authentication for the simplest path.

> [!NOTE]
> This guide expands on the [Getting Started hub](README.md).

## Prerequisites

| Requirement | Details |
| --- | --- |
| Azure subscription | Contributor + User Access Administrator roles |
| GPU quota | `Standard_NC24ads_A100_v4` in target region |
| NVIDIA NGC account | Sign up at <https://ngc.nvidia.com/> for API key |
| Development environment | Devcontainer (recommended) or local tools |

See [Prerequisites](../contributing/prerequisites.md) for installation commands and version requirements.

## Step 1: Clone and Set Up Environment

Clone the repository and initialize the development environment.

```bash
git clone https://github.com/Azure-Samples/azure-nvidia-robotics-reference-architecture.git
cd azure-nvidia-robotics-reference-architecture
```

Use the devcontainer (recommended) or run local setup:

```bash
./setup-dev.sh
```

## Step 2: Configure Azure Subscription

Authenticate with Azure and register required resource providers.

```bash
source deploy/000-prerequisites/az-sub-init.sh
bash deploy/000-prerequisites/register-azure-providers.sh
```

Verify your subscription:

```bash
az account show --query "{name:name, id:id}" -o table
```
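
Provider registration can take a while to complete after the script above returns, so it can help to confirm the state before deploying. The sketch below wraps `az provider show` in two hypothetical helpers, `provider_state` and `all_registered`. The namespaces in the usage comment are illustrative assumptions; the authoritative list is whatever `register-azure-providers.sh` registers.

```bash
# provider_state NAMESPACE
# Prints the registration state reported by Azure, for example "Registered".
provider_state() {
  az provider show --namespace "$1" --query registrationState --output tsv
}

# all_registered NAMESPACE...
# Prints one line per namespace; succeeds only if every one is "Registered".
all_registered() {
  local ns state status=0
  for ns in "$@"; do
    state="$(provider_state "$ns")"
    printf '%s %s\n' "$ns" "$state"
    [ "$state" = "Registered" ] || status=1
  done
  return "$status"
}

# Example (namespaces are illustrative, not taken from the repository script):
# all_registered Microsoft.ContainerService Microsoft.MachineLearningServices
```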

## Step 3: Configure Terraform Variables

Create a Terraform variables file for the full-public deployment path. From the repository root:

```bash
cd deploy/001-iac
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars` with these values:

```hcl
project_name = "robotics"
environment = "dev"
location = "eastus"
gpu_vm_size = "Standard_NC24ads_A100_v4"

enable_azure_ml = true
enable_osmo = true
enable_vpn_gateway = false
enable_private_dns = false
```

> [!TIP]
> For private networking, set `enable_vpn_gateway = true` and `enable_private_dns = true`. See the [Infrastructure Guide](../../deploy/001-iac/README.md) for details.
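
Because `gpu_vm_size` depends on GPU quota in the chosen `location`, it can be worth checking quota before applying. The following is a rough sketch built on `az vm list-usage`. The helper names are hypothetical, the default of 24 vCPUs corresponds to one `Standard_NC24ads_A100_v4` VM, and filtering quota families by the substring `A100` is an assumption about how the family is named in your subscription.

```bash
# quota_headroom_ok USED LIMIT [NEEDED]
# Succeeds when LIMIT - USED leaves at least NEEDED vCPUs (default 24,
# the vCPU count of one Standard_NC24ads_A100_v4 VM).
quota_headroom_ok() {
  local used="$1" limit="$2" needed="${3:-24}"
  [ $(( limit - used )) -ge "$needed" ]
}

# gpu_quota_rows LOCATION
# Prints "family used limit" (tab separated) for quota families whose name
# contains "A100". The substring filter is an assumption.
gpu_quota_rows() {
  az vm list-usage --location "$1" \
    --query "[?contains(name.value, 'A100')].[name.value, currentValue, limit]" \
    --output tsv
}

# check_gpu_quota LOCATION [NEEDED]
# Succeeds if at least one matching family has enough free vCPUs.
check_gpu_quota() {
  local location="$1" needed="${2:-24}" rows family used limit status=1
  rows="$(gpu_quota_rows "$location")"
  while read -r family used limit; do
    [ -n "$family" ] || continue
    if quota_headroom_ok "$used" "$limit" "$needed"; then
      echo "ok   $family: $used of $limit vCPUs used"
      status=0
    else
      echo "low  $family: $used of $limit vCPUs used, need $needed free"
    fi
  done <<EOF
$rows
EOF
  return "$status"
}

# Example:
# check_gpu_quota eastus
```

If no family matches the filter, the function returns non-zero without printing anything; in that case inspect the unfiltered `az vm list-usage --location <region> --output table` output instead.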

## Step 4: Deploy Infrastructure

Initialize and apply the Terraform configuration from `deploy/001-iac`, the directory you changed into in Step 3. This step takes ~30-40 minutes.

```bash
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```

Verify deployment:

```bash
terraform output
```

Connect to the AKS cluster:

```bash
az aks get-credentials \
  --resource-group "$(terraform output -raw resource_group_name)" \
  --name "$(terraform output -raw aks_cluster_name)"
```

## Step 5: Configure AKS Cluster

Deploy GPU Operator, KAI Scheduler, and the AzureML extension. From the repository root:

```bash
cd deploy/002-setup
bash 01-deploy-robotics-charts.sh
bash 02-deploy-azureml-extension.sh
```

Verify GPU operator pods:

```bash
kubectl get pods -n gpu-operator
```
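
Beyond pod status, one way to confirm that the GPU Operator has done its job is to check whether nodes advertise the `nvidia.com/gpu` resource. The sketch below is an optional extra check, not part of the repository scripts; `gpu_allocatable` and `total_gpus` are hypothetical helper names. If the GPU node pool is currently scaled to zero, no node will report GPUs even when the operator is healthy.

```bash
# gpu_allocatable
# Prints "node-name<TAB>allocatable-gpus" for every node; the second column
# is empty for nodes that do not advertise nvidia.com/gpu.
gpu_allocatable() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}

# total_gpus
# Sums the allocatable GPU count across all nodes (prints 0 if there are none).
total_gpus() {
  gpu_allocatable | awk -F'\t' '$2 ~ /^[0-9]+$/ { n += $2 } END { print n + 0 }'
}

# Example:
# gpu_allocatable
# echo "Allocatable GPUs: $(total_gpus)"
```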

## Step 6: Deploy OSMO Components

Deploy the OSMO control plane and backend using Access Keys authentication. Run these scripts from `deploy/002-setup`, the directory you changed into in Step 5.

```bash
bash 03-deploy-osmo-control-plane.sh
bash 04-deploy-osmo-backend.sh --use-access-keys
```

Verify OSMO pods:

```bash
kubectl get pods -n osmo-control-plane
```
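
When a namespace has many pods, it can be easier to list only the ones that need attention. The following sketch filters the `kubectl get pods` output shown above by its STATUS column; `unhealthy_pods` is a hypothetical helper name. It only reads the STATUS text, so a pod that is `Running` but not yet ready will not be listed.

```bash
# unhealthy_pods NAMESPACE
# Prints "pod-name status" for pods whose STATUS is neither Running nor
# Completed. Prints nothing when every pod is in one of those two states.
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1 " " $3 }'
}

# Example:
# unhealthy_pods osmo-control-plane
# unhealthy_pods gpu-operator
```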

## Step 7: Submit First Training Job

From the repository root, change to the scripts directory and submit a training job:

```bash
cd scripts
bash submit-osmo-training.sh
```

The scripts auto-detect their configuration from Terraform outputs. Override individual values with CLI arguments or environment variables as needed. See [Scripts](../../scripts/README.md) for all submission options.
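
To illustrate the general idea of layered configuration described above, the sketch below resolves one setting from three sources. It is not the repository's implementation: `resolve_setting` is a hypothetical helper, the precedence order (CLI argument, then environment variable, then Terraform output) is an assumption, and the default Terraform directory is taken from Step 3.

```bash
# resolve_setting CLI_VALUE ENV_VAR_NAME TF_OUTPUT_NAME [TF_DIR]
# Prints the first non-empty value found, checking in this assumed order:
#   1. the value passed on the command line
#   2. the exported environment variable named ENV_VAR_NAME
#   3. the Terraform output named TF_OUTPUT_NAME in TF_DIR
resolve_setting() {
  local cli_value="$1" env_name="$2" tf_output="$3" tf_dir="${4:-deploy/001-iac}"
  local env_value

  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
    return 0
  fi

  # printenv only sees exported variables and fails when the name is unset.
  env_value="$(printenv "$env_name" || true)"
  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
    return 0
  fi

  terraform -chdir="$tf_dir" output -raw "$tf_output"
}

# Example (variable and output names are illustrative):
# cluster="$(resolve_setting "$1" AKS_CLUSTER_NAME aks_cluster_name)"
```

Putting the Terraform lookup last keeps the common case free of extra flags while still letting a single value be overridden for one run.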
151+
152+
## Step 8: Verify Results
153+
154+
Confirm the training job is running:
155+
156+
```bash
157+
kubectl get pods -n osmo-control-plane --watch
158+
```
159+
160+
Check OSMO training status through the OSMO web UI or query pod logs:
161+
162+
```bash
163+
kubectl logs -n osmo-control-plane -l app=osmo-training --tail=50
164+
```

## Cleanup

Destroy all infrastructure when finished to stop incurring costs. From the repository root:

```bash
cd deploy/001-iac
terraform destroy
```

See [Cost Considerations](../contributing/cost-considerations.md) for detailed pricing.
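
Since leftover resources keep accruing charges, it can be reassuring to confirm that the resource group is really gone. The sketch below uses `az group exists`; `resource_group_gone` is a hypothetical helper, and it assumes the deployment lives in the single resource group reported by the `resource_group_name` output used in Step 4. Capture the name before running `terraform destroy`, because the output is no longer available afterwards.

```bash
# resource_group_gone NAME
# Succeeds when Azure reports that the resource group no longer exists.
resource_group_gone() {
  [ "$(az group exists --name "$1")" = "false" ]
}

# Example, run from deploy/001-iac:
# rg="$(terraform output -raw resource_group_name)"   # capture before destroy
# terraform destroy
# if resource_group_gone "$rg"; then
#   echo "Resource group $rg removed"
# else
#   echo "Resource group $rg still exists; check the portal for leftovers"
# fi
```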

## Next Steps

| Resource | Description |
| --- | --- |
| [LeRobot Inference](../lerobot-inference.md) | Run inference with trained LeRobot models |
| [MLflow Integration](../mlflow-integration.md) | Track experiments with MLflow |
| [Deployment Guide](../../deploy/README.md) | Full deployment reference and options |
| [Contributing Guide](../contributing/README.md) | Development workflow and code standards |
