Enterprise-grade astronomical computing platform enabling DESI research through hybrid Kubernetes/VM infrastructure.
This repository documents the Proxmox Astronomy Lab, a 7-node production cluster running hybrid RKE2 Kubernetes and strategic VMs for astronomical data science. We produce enhanced datasets and original research from public surveys, operating at organizational scale with enterprise infrastructure standards.
This section provides context for the platform's purpose and design. If you're familiar with hybrid Proxmox/Kubernetes infrastructure, skip to Quick Start.
The Proxmox Astronomy Lab serves as the compute foundation for RadioAstronomy.io research projects. The platform combines virtualization flexibility with container orchestration: VMs for persistent services (databases, file servers, GPU workloads) and Kubernetes for dynamic ML/AI pipelines.
Our primary workload is DESI (Dark Energy Spectroscopic Instrument) data analysis, processing tens of gigabytes of galaxy catalogs and spectral data. The infrastructure enables reproducible research at scale: PostgreSQL with pgvector for catalog queries, distributed Ray clusters for ML training, and GPU acceleration for model inference.
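As a rough illustration of the kind of similarity lookup pgvector supports, the sketch below reimplements cosine distance in plain Python and ranks a tiny in-memory catalog by it. pgvector's `<=>` operator returns cosine distance; everything else here is an assumption made for the example: the `spectra_embeddings` table, the `target_id` and `embedding` columns, the toy vectors, and the helper names are hypothetical and not taken from this repository.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, catalog, k=2):
    """Return the ids of the k rows closest to `query` by cosine distance."""
    ranked = sorted(catalog, key=lambda row: cosine_distance(query, row[1]))
    return [target_id for target_id, _ in ranked[:k]]

# Hypothetical SQL counterpart of nearest(); table and column names are assumptions.
KNN_SQL = (
    "SELECT target_id FROM spectra_embeddings "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)

# Toy (id, embedding) rows standing in for catalog entries.
catalog = [
    (101, [1.0, 0.0, 0.0]),
    (102, [0.9, 0.1, 0.0]),
    (103, [0.0, 1.0, 0.0]),
]

print(nearest([1.0, 0.05, 0.0], catalog))  # [101, 102]
```

In a real deployment the ranking would run inside PostgreSQL, typically backed by a pgvector index, rather than in Python; the in-memory version only shows what the ordering means.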
AI is embedded in 100% of our workflows, governed by a formal NIST AI RMF implementation with model cards, risk scenarios, and multi-framework compliance mapping. We target CISv8 L1 as an achievable security baseline appropriate for a research platform.
| Audience | Use Case |
|---|---|
| Researchers | Access compute resources for DESI analysis projects |
| Infrastructure Engineers | Deploy similar hybrid clusters, learn Proxmox/RKE2 patterns |
| Data Scientists | Understand ML infrastructure, GPU pipelines, database architecture |
| DevOps/SRE | Review automation patterns, monitoring stack, security implementation |
| AI Governance Practitioners | See production AI policy framework in action |
The platform uses a hybrid architecture: RKE2 Kubernetes for dynamic workloads, strategic VMs for persistent services, and multi-cloud identity and hosting.
| Cloud | Role | Active Services |
|---|---|---|
| Google Workspace Enterprise | Primary IdP, daily operations | Mail, Docs, SSO, Chrome Enterprise, DLP, Gemini integration |
| Azure | Platform services + security backbone | Static Web Apps, Cosmos DB, SharePoint, Business Premium + Intune Suite |
| AWS | Greenfield expansion | IAM Identity Center (SSO from Google), breakglass users configured |
| Service Tier | Implementation | Components |
|---|---|---|
| Identity | Google Workspace Enterprise + Netbird ZTNA | SSO, conditional access, MFA, zero-trust remote access |
| Orchestration | RKE2 + Portainer | 3-node Kubernetes control plane, container management |
| Compute | Hybrid K8s/VM | Dynamic scaling + persistent services across 35+ VMs |
| Data | PostgreSQL + File Services | 30GB+ DESI databases + distributed file systems |
| AI/ML | Ray + GPU acceleration | Distributed computing + RTX A4000 inference |
| Monitoring | Prometheus + Grafana + Loki | Centralized observability |
| Security | CISv8 L1 + CIS-RAM + NIST-AI-RMF | Infrastructure hardening, AI governance |
The platform is in production, supporting active research workloads.
| Area | Status | Description |
|---|---|---|
| Core Infrastructure | Production | 7-node cluster, networking, storage |
| RKE2 Kubernetes | Production | 3-node cluster + GPU worker |
| Database Services | Production | PostgreSQL 16 with pgvector, PostGIS |
| Monitoring Stack | Production | Prometheus, Grafana, Loki |
| AI Governance | Production | NIST AI RMF framework, 8 model cards, 10 risk scenarios |
| Security Baseline | In Progress | CISv8 L1 implementation |
| Documentation | In Progress | Phase 1 migration to new standards |
| Resource | Value |
|---|---|
| Nodes | 7 |
| Total Cores | 144 |
| Total RAM | 768 GB |
| Total NVMe | 24 TB |
| Network Fabric | LACP 2.5G/10G per node |
| GPU | 2× RTX A4000 16GB |
| Production VMs | 35+ |
| Node | CPU | Cores | RAM | Role |
|---|---|---|---|---|
| node01 | i9-12900H | 20 | 96 GB | Compute (K8s) |
| node02 | i5-12600H | 16 | 96 GB | Light compute + storage |
| node03 | i9-12900H | 20 | 96 GB | Compute (K8s) |
| node04 | i9-12900H | 20 | 96 GB | Compute (K8s) |
| node05 | i5-12600H | 16 | 96 GB | Light compute + storage |
| node06 | i9-13900H | 20 | 96 GB | Heavy compute (databases) |
| node07 | AMD 5950X | 32 | 128 GB | GPU compute |
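The per-node rows can be cross-checked against the summary table with a few lines of Python. This is a minimal sketch that hard-codes the values from the table above; the `Node` record and the `totals` and `cores_by_role` helpers are illustrative names, not part of this repository.

```python
from collections import namedtuple

# One record per row of the node table above.
Node = namedtuple("Node", "name cpu cores ram_gb role")

NODES = [
    Node("node01", "i9-12900H", 20, 96, "Compute (K8s)"),
    Node("node02", "i5-12600H", 16, 96, "Light compute + storage"),
    Node("node03", "i9-12900H", 20, 96, "Compute (K8s)"),
    Node("node04", "i9-12900H", 20, 96, "Compute (K8s)"),
    Node("node05", "i5-12600H", 16, 96, "Light compute + storage"),
    Node("node06", "i9-13900H", 20, 96, "Heavy compute (databases)"),
    Node("node07", "AMD 5950X", 32, 128, "GPU compute"),
]

def totals(nodes):
    """Sum the node count, core column, and RAM column."""
    return {
        "nodes": len(nodes),
        "cores": sum(n.cores for n in nodes),
        "ram_gb": sum(n.ram_gb for n in nodes),
    }

def cores_by_role(nodes):
    """Group the core column by the Role column."""
    grouped = {}
    for n in nodes:
        grouped[n.role] = grouped.get(n.role, 0) + n.cores
    return grouped

print(totals(NODES))         # {'nodes': 7, 'cores': 144, 'ram_gb': 704}
print(cores_by_role(NODES))
```

With the rows as listed, the core column sums to 144, matching the summary table. The RAM column sums to 704 GB rather than the 768 GB in the summary, so one of the two figures likely needs updating.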
AI is embedded throughout our workflows with formal governance based on the NIST AI Risk Management Framework. Our complete policy framework is published in the NIST AI RMF Cookbook.
| Component | Count | Description |
|---|---|---|
| Core Policies | 3 | AI governance, acceptable use, model registry |
| Technical Standards | 3 | Risk assessment, secure AI systems, transparency |
| Risk Scenarios | 10 | R01-R10 covering data egress, prompt injection, drift, etc. |
| Model Cards | 8 | Claude Sonnet 4.5, Opus 4.2, Gemini 2.5 Pro, and others |
| Framework | Scope |
|---|---|
| NIST AI RMF | Full Govern/Map/Measure/Manage implementation |
| ISO/IEC 42001 | AI management system alignment |
| CIS Controls v8 L1 | Security baseline (in progress) |
| CIS-RAM | Risk assessment methodology |
| Colorado SB24-205 | AI disclosure and duty-of-care |
Current DESI Data Release 1 analysis projects. See RadioAstronomy.io for full project details.
| Project | Focus | Status |
|---|---|---|
| Cosmic Void Galaxies | Environmental quenching, ARD factory | Active |
| QSO Anomaly Detection | ML outlier detection on 1.6M spectra | Planned |
| Quasar Outflows | AGN feedback energetics | Planned |
| RBH-1 Reanalysis | Hypervelocity SMBH validation | Active |
PROXMOX-ASTRONOMY-LAB/
├── ai-and-machine-learning/ # AI/ML infrastructure, GPU computing, RAG systems
├── applications-and-services/ # Production service configurations
├── astronomy-projects/ # DESI research projects and workflows
├── automation-and-orchestration/ # Ansible automation, infrastructure as code
├── docs/ # Documentation standards and procedures
├── hardware/ # Cluster specifications, network architecture
├── infrastructure/ # Core platform services, hybrid architecture
├── policies-and-procedures/ # Enterprise governance, compliance
├── project-management/ # Project coordination and planning
├── publishing/ # Scientific publication workflows
├── security-assurance/ # CIS Controls v8 L1 implementation
└── wiki/ # Operational procedures and guides

This organization benefits from open source programs that provide tooling to qualifying public repositories.
| Program | Provides | Use Case |
|---|---|---|
| CodeRabbit | AI code review (Pro tier) | PR review, CLI integration |
| Atlassian | Jira, Confluence (Standard) | Project tracking, documentation |
| Program | Provides | Planned Use |
|---|---|---|
| Snyk | Security scanning | Dependency vulnerability detection |
| SonarCloud | Code quality | Static analysis |
| Sentry | Error tracking | Runtime monitoring |
| Datadog | Observability | Metrics, logs, APM |
- Review Astronomy Projects for active research and datasets
- Understand Infrastructure Overview for compute capabilities
- Explore Publishing Workflows for data release procedures
- Study Hardware Architecture for cluster specifications
- Examine Security Framework for CIS Controls implementation
- Deploy using Infrastructure as Code patterns
- Explore AI/ML Infrastructure for GPU and distributed computing
- Review Application Services for databases and ML platforms
- Learn Kubernetes Platform for container orchestration
- Review NIST AI RMF Cookbook for policy framework
- Examine model cards and risk scenarios for implementation patterns
- See compliance mapping for multi-framework alignment
This project is licensed under the MIT License; see LICENSE for details.
- Proxmox: Virtualization platform
- Rancher/SUSE: RKE2 Kubernetes distribution
- DESI Collaboration: Public data access
- NIST: AI Risk Management Framework
- Open source community: Tools and libraries that make this possible
Last Updated: January 2, 2026 | Platform Status: Production


