This repository contains CloudFormation templates for deploying the core Lakerunner platform on AWS ECS using Fargate. Pre-generated templates are available in the generated-templates/ directory for immediate use.
The core Lakerunner deployment consists of CloudFormation stacks that can be deployed in order:
Option 1: Use Existing VPC
- Common Infrastructure (
lakerunner-common.yaml) - RDS database, S3 bucket, SQS queue, and ALB - ECS Setup (
lakerunner-ecs-setup.yaml) - Database and Kafka setup task that runs once during initial setup - ECS Services (
lakerunner-ecs-services.yaml) - ECS Fargate services for all Lakerunner microservices
Option 2: Create New VPC (Recommended for POCs)
- VPC Infrastructure (
lakerunner-vpc.yaml) - Cost-optimized VPC with essential VPC endpoints - Common Infrastructure (
lakerunner-common.yaml) - RDS database, S3 bucket, SQS queue, and ALB - ECS Setup (
lakerunner-ecs-setup.yaml) - Database and Kafka setup task that runs once during initial setup - ECS Services (
lakerunner-ecs-services.yaml) - ECS Fargate services for all Lakerunner microservices
If you don't have a VPC ready, use the lakerunner-vpc.yaml template to create a cost-optimized VPC for your POC deployment:
- NAT Gateway: For ECS nodes to pull container images and reach internet services
- VPC Endpoint - Secrets Manager: For RDS password retrieval without internet routing
- VPC Endpoint - CloudWatch Logs: For ECS logging without internet egress charges
- VPC Endpoint - ECS/ECR: For container orchestration and image pulls from private subnets
- S3 Gateway Endpoint: For S3 access without internet data transfer costs
- Private subnets only: All Lakerunner services run without public IP addresses
- Security groups: VPC endpoints restricted to HTTPS (port 443) from VPC CIDR only
- Single NAT Gateway: Shared across AZs instead of per-AZ (saves ~$45/month)
- Minimal VPC endpoints: Only essential AWS services to reduce interface endpoint costs
- Gateway endpoints where available: S3 uses free gateway endpoint instead of paid interface endpoint
aws cloudformation create-stack \
--stack-name lakerunner-vpc \
--template-body file://generated-templates/lakerunner-vpc.yaml \
--parameters ParameterKey=EnvironmentName,ParameterValue=lakerunner \
ParameterKey=VpcCidr,ParameterValue=10.0.0.0/16| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
EnvironmentName |
String | No | lakerunner | Environment name for resource naming |
VpcCidr |
String | No | 10.0.0.0/16 | CIDR block for the VPC |
CreateNatGateway |
String | No | Yes | Create NAT Gateway for private subnet internet access |
The VPC template exports these values for use by other stacks:
VpcId- VPC IDPublicSubnets- Comma-separated public subnet IDsPrivateSubnets- Comma-separated private subnet IDsVPCEndpointSecurityGroupId- Security group for VPC endpoints
When using the VPC template, pass the exported values to the Common Infrastructure stack:
# Get VPC outputs
VPC_ID=$(aws cloudformation describe-stacks --stack-name lakerunner-vpc --query 'Stacks[0].Outputs[?OutputKey==`VpcId`].OutputValue' --output text)
PRIVATE_SUBNETS=$(aws cloudformation describe-stacks --stack-name lakerunner-vpc --query 'Stacks[0].Outputs[?OutputKey==`PrivateSubnets`].OutputValue' --output text)
# Deploy common infrastructure
aws cloudformation create-stack \
--stack-name lakerunner-common \
--template-body file://generated-templates/lakerunner-common.yaml \
--parameters ParameterKey=VpcId,ParameterValue=$VPC_ID \
ParameterKey=PrivateSubnets,ParameterValue=$PRIVATE_SUBNETS \
--capabilities CAPABILITY_IAMPre-generated CloudFormation templates are available in the generated-templates/ directory. These are region and account agnostic, and should deploy to any AWS account where you have sufficient permissions.
The simplest way to deploy Lakerunner is using the root stack (lakerunner-root.yaml), which orchestrates all nested stacks with a single deployment command. This approach allows you to selectively enable components.
Deploy generated-templates/lakerunner-root.yaml with the following parameters:
Required parameters:
- TemplateBaseUrl – S3 URL where nested templates are stored (upload all generated-templates/*.yaml files to S3)
- VpcId – VPC where resources will be created
- PrivateSubnet1Id, PrivateSubnet2Id – Private subnet IDs (at least two in different AZs)
Component selection (set to "Yes" to enable):
- CreateS3Storage – Create S3 bucket and SQS queue (default: Yes)
- CreateRDS – Create PostgreSQL database (default: Yes)
- CreateMSK – Create Kafka cluster (default: Yes)
- CreateECSInfrastructure – Create ECS cluster (default: Yes)
- CreateECSServices – Deploy Lakerunner services (default: Yes, requires ECS infrastructure)
- CreateECSCollector – Deploy OTEL Collector (default: Yes, requires ECS infrastructure)
- CreateECSGrafana – Deploy Grafana dashboard (default: No, requires ECS infrastructure)
Example deployment:
# First, upload templates to S3
aws s3 cp generated-templates/ s3://my-bucket/lakerunner/templates/ --recursive
# Deploy with defaults (all components except Grafana)
aws cloudformation create-stack \
--stack-name lakerunner \
--template-body file://generated-templates/lakerunner-root.yaml \
--parameters \
ParameterKey=TemplateBaseUrl,ParameterValue=https://s3.amazonaws.com/my-bucket/lakerunner/templates \
ParameterKey=VpcId,ParameterValue=vpc-12345678 \
ParameterKey=PrivateSubnet1Id,ParameterValue=subnet-12345678 \
ParameterKey=PrivateSubnet2Id,ParameterValue=subnet-87654321 \
--capabilities CAPABILITY_IAM
# To also include Grafana, add:
# ParameterKey=CreateECSGrafana,ParameterValue=Yes
# To disable ECS components, add:
# ParameterKey=CreateECSInfrastructure,ParameterValue=NoFor more control over deployment order and configuration, you can deploy each stack individually:
Deploy generated-templates/lakerunner-common.yaml using the AWS Console or CLI. Required parameters:
- VpcId – VPC where resources will be created
- PrivateSubnets – Private subnet IDs (for ECS/RDS). Provide at least two in different AZs.
Optional parameters:
- PublicSubnets – Public subnet IDs (required only for internet-facing ALB)
- AlbScheme – Load balancer scheme: "internal" (default) or "internet-facing"
- ApiKeysOverride – Custom API keys configuration in YAML format
- StorageProfilesOverride – Custom storage profiles configuration in YAML format
Example AWS CLI deployment:
aws cloudformation create-stack \\
--stack-name lakerunner-common \\
--template-body file://generated-templates/lakerunner-common.yaml \\
--parameters ParameterKey=VpcId,ParameterValue=vpc-12345678 \\
ParameterKey=PrivateSubnets,ParameterValue="subnet-12345678,subnet-87654321" \\
--capabilities CAPABILITY_IAMDeploy generated-templates/lakerunner-ecs-setup.yaml to run database and Kafka setup. Required parameters:
- CommonInfraStackName – Name of the CommonInfra stack (e.g., "lakerunner-common")
Optional parameters:
- ContainerImage – Container image for setup task (default: public.ecr.aws/cardinalhq.io/lakerunner:latest)
Example AWS CLI deployment:
aws cloudformation create-stack \\
--stack-name lakerunner-ecs-setup \\
--template-body file://generated-templates/lakerunner-ecs-setup.yaml \\
--parameters ParameterKey=CommonInfraStackName,ParameterValue=lakerunner-common \\
--capabilities CAPABILITY_IAMDeploy generated-templates/lakerunner-ecs-services.yaml for all Lakerunner microservices. Required parameters:
- CommonInfraStackName – Name of the CommonInfra stack (e.g., "lakerunner-common")
Optional parameters:
- OtelEndpoint – OTEL collector HTTP endpoint URL (e.g., http://collector-dns:4318). Leave blank to disable OTLP telemetry export.
- Container image overrides for air-gapped deployments:
- GoServicesImage – Image for Go services (default: public.ecr.aws/cardinalhq.io/lakerunner:latest)
- QueryApiImage – Image for query-api (default: public.ecr.aws/cardinalhq.io/lakerunner/query-api:latest)
- QueryWorkerImage – Image for query-worker (default: public.ecr.aws/cardinalhq.io/lakerunner/query-worker:latest)
- GrafanaImage – Image for Grafana (default: grafana/grafana:latest)
Example AWS CLI deployment:
aws cloudformation create-stack \\
--stack-name lakerunner-ecs-services \\
--template-body file://generated-templates/lakerunner-ecs-services.yaml \\
--parameters ParameterKey=CommonInfraStackName,ParameterValue=lakerunner-common \\
--capabilities CAPABILITY_IAMWhen updating existing CloudFormation stacks, container image versions are not automatically updated to the new defaults. You must explicitly specify the image parameters during stack updates.
- Go Services:
public.ecr.aws/cardinalhq.io/lakerunner:v1.2.1 - Query API:
public.ecr.aws/cardinalhq.io/lakerunner/query-api:v1.2.1 - Query Worker:
public.ecr.aws/cardinalhq.io/lakerunner/query-worker:v1.2.1 - Migration:
public.ecr.aws/cardinalhq.io/lakerunner:v1.2.1
aws cloudformation update-stack \\
--stack-name lakerunner-services \\
--template-body file://generated-templates/lakerunner-services.yaml \\
--parameters \\
ParameterKey=CommonInfraStackName,UsePreviousValue=true \\
ParameterKey=GoServicesImage,ParameterValue=public.ecr.aws/cardinalhq.io/lakerunner:v1.2.1 \\
ParameterKey=QueryApiImage,ParameterValue=public.ecr.aws/cardinalhq.io/lakerunner/query-api:v1.2.1 \\
ParameterKey=QueryWorkerImage,ParameterValue=public.ecr.aws/cardinalhq.io/lakerunner/query-worker:v1.2.1 \\
--capabilities CAPABILITY_IAMaws cloudformation update-stack \\
--stack-name lakerunner-migration \\
--template-body file://generated-templates/lakerunner-migration.yaml \\
--parameters \\
ParameterKey=CommonInfraStackName,UsePreviousValue=true \\
ParameterKey=ContainerImage,ParameterValue=public.ecr.aws/cardinalhq.io/lakerunner:v1.2.1 \\
--capabilities CAPABILITY_IAMFor convenience, you can use the provided helper script:
# Update services stack
./update-images.sh lakerunner-services services
# Update migration stack
./update-images.sh lakerunner-migration migrationThe services stack supports optional OTLP (OpenTelemetry Protocol) telemetry export. When enabled, all Go services will export logs and metrics to the specified collector endpoint.
Add the OtelEndpoint parameter when deploying the services stack:
# Deploy services with OTLP telemetry enabled
aws cloudformation create-stack \
--stack-name lakerunner-services \
--template-body file://generated-templates/lakerunner-services.yaml \
--parameters ParameterKey=CommonInfraStackName,ParameterValue=lakerunner-common \
ParameterKey=OtelEndpoint,ParameterValue=http://collector-dns:4318 \
--capabilities CAPABILITY_IAMExternal Collector
For collectors outside the ECS cluster, provide the full endpoint URL:
http://my-collector.example.com:4318- External HTTP collectorhttp://internal-lb-dns:4318- Internal load balancerhttp://10.0.1.100:4318- Direct IP address
When OtelEndpoint is provided, these environment variables are automatically added to all Go services:
OTEL_EXPORTER_OTLP_ENDPOINT- The collector endpoint URLENABLE_OTLP_TELEMETRY=true- Enables telemetry export in the application
Deploy in this order:
lakerunner-common- Core infrastructurelakerunner-migration- Database migrationlakerunner-services- Core serviceslakerunner-grafana-service- Grafana dashboard (optional)
After successful deployment:
- Grafana Dashboard: Access via ALB DNS name on port 3000
- Username:
lakerunner - Password: Retrieve from AWS Secrets Manager using the GrafanaAdminSecretArn output
- Username:
- Query API: Access via ALB DNS name on port 7101
- S3 Bucket: Upload data to the created bucket with appropriate prefixes
# Get the secret ARN from stack outputs
SECRET_ARN=$(aws cloudformation describe-stacks \
--stack-name lakerunner-services \
--query 'Stacks[0].Outputs[?OutputKey==`GrafanaAdminSecretArn`].OutputValue' \
--output text)
# Retrieve the password
aws secretsmanager get-secret-value \
--secret-id $SECRET_ARN \
--query 'SecretString' \
--output text | jq -r '.password'If you lose access to Grafana (forgot password, account locked, etc.), you can reset the entire Grafana configuration using the Grafana Reset Token feature.
The GrafanaResetToken parameter allows you to wipe all Grafana data and start fresh with the original admin credentials. When you change this parameter value, Grafana will:
- Delete all existing Grafana data (dashboards, users, settings, database)
- Start with a clean database
- Use the original admin credentials (
lakerunnerusername with password from AWS Secrets Manager)
To reset Grafana:
# Update the stack with a reset token (use any unique value)
aws cloudformation update-stack \
--stack-name lakerunner-services \
--use-previous-template \
--parameters \
ParameterKey=CommonInfraStackName,UsePreviousValue=true \
ParameterKey=GrafanaResetToken,ParameterValue="reset-$(date +%s)" \
--capabilities CAPABILITY_IAMTo return to normal operation (prevent accidental resets):
# Clear the reset token after successful reset
aws cloudformation update-stack \
--stack-name lakerunner-services \
--use-previous-template \
--parameters \
ParameterKey=CommonInfraStackName,UsePreviousValue=true \
ParameterKey=GrafanaResetToken,ParameterValue="" \
--capabilities CAPABILITY_IAM- Empty token (default): Normal operation, preserves all Grafana data
- New token value: Wipes Grafana data and resets to fresh state with original admin credentials
- Same token value: No action taken, data preserved
- Token tracking: The system remembers the last reset token to prevent accidental resets
- Password Recovery: Set reset token, deploy, access with original credentials, clear token
- Clean Demo Environment: Use reset token to quickly restore demo state
- Development Reset: Reset during development to test fresh configurations
- User Management Reset: Remove all custom users and return to single admin account
An Application Load Balancer is always created and points to:
query-apiservice on port 7101grafanaservice on port 3000
The ALB can be configured as:
- Internal (default) - Only accessible from within the VPC
- Internet-facing - Accessible from the internet (requires public subnets)
Lakerunner consists of these microservices that process telemetry data:
- pubsub-sqs - Receives SQS notifications from S3 bucket
- ingest-logs/metrics - Process raw log and metric files
- compact-logs/metrics - Optimize storage format
- rollup-metrics - Pre-aggregate metrics for faster queries
- sweeper - Clean up temporary files
- query-api - REST API for data queries (ALB-attached)
- query-worker - Query execution engine
- grafana - Visualization dashboard (ALB-attached)
All services share:
- Common ECS task execution and task roles
- Unified secret injection from Secrets Manager and SSM
- Standardized logging to CloudWatch
- Scratch volumes for temporary workspace (/scratch)
- Health checks appropriate to service type (Go, Scala, cURL)
- Optional OTLP telemetry export to OpenTelemetry collectors
This repository includes several specialized guides:
- OTEL Collector - Dedicated telemetry ingestion service setup and configuration
- Building from Source - Development guide for modifying and building CloudFormation templates
This repository provides 4 CloudFormation stacks with specific dependencies:
- Common Infrastructure (
lakerunner-common.yaml) - Core infrastructure - ECS Setup (
lakerunner-ecs-setup.yaml) - Database and Kafka setup task - ECS Services (
lakerunner-ecs-services.yaml) - Core Lakerunner services - ECS Grafana (
lakerunner-ecs-grafana.yaml) - Optional Grafana dashboard
lakerunner-common (required)
├── lakerunner-ecs-setup (required after common)
├── lakerunner-ecs-services (required after common)
└── lakerunner-ecs-grafana (optional, after common + services)
# 1. Core infrastructure
aws cloudformation create-stack --stack-name lakerunner-common ...
# 2. Database and Kafka setup
aws cloudformation create-stack --stack-name lakerunner-ecs-setup ...
# 3. Services
aws cloudformation create-stack --stack-name lakerunner-ecs-services ...# 1. Core infrastructure
aws cloudformation create-stack --stack-name lakerunner-common ...
# 2. Database and Kafka setup
aws cloudformation create-stack --stack-name lakerunner-ecs-setup ...
# 3. Services
aws cloudformation create-stack --stack-name lakerunner-ecs-services ...
# 4. Grafana dashboard (optional)
aws cloudformation create-stack --stack-name lakerunner-ecs-grafana \
--parameters ParameterKey=CommonInfraStackName,ParameterValue="lakerunner-common" \
ParameterKey=ServicesStackName,ParameterValue="lakerunner-ecs-services" ...Common Infrastructure exports:
- VPC ID, Private/Public Subnets
- ECS Cluster ARN
- Database endpoint, port, credentials ARN
- S3 bucket name/ARN
- MSK cluster ARN and credentials
- Security groups
Services imports from Common:
- All infrastructure resources
- Database credentials for app configuration
- MSK cluster for Kafka connectivity
ECS Setup imports from Common:
- Database connection details
- MSK cluster for Kafka topic creation
- ECS cluster for task execution
- Network configuration
OTEL Collector imports from Common:
- VPC and networking for ALB
- ECS cluster for service deployment
Demo Apps imports from:
- Common: ECS cluster, networking, security groups
- Services: Task execution role, security groups
- OTEL Collector: Telemetry endpoint URL
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
VpcId |
AWS::EC2::VPC::Id | Yes | - | VPC where resources will be created |
PrivateSubnets |
ListAWS::EC2::Subnet::Id | Yes | - | Private subnet IDs for ECS/RDS/MSK (≥2 in different AZs) |
PublicSubnets |
ListAWS::EC2::Subnet::Id | No | - | Public subnet IDs for internet-facing ALB (≥2 in different AZs) |
ApiKeysOverride |
String | No | "" | Custom API keys configuration in YAML format |
StorageProfilesOverride |
String | No | "" | Custom storage profiles configuration in YAML format |
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
CommonInfraStackName |
String | Yes | - | Name of the CommonInfra stack to import values from |
ContainerImage |
String | No | public.ecr.aws/cardinalhq.io/lakerunner:v1.2.1 | Setup container image |
Cpu |
String | No | "512" | Fargate CPU units (256/512/1024/2048/4096) |
MemoryMiB |
String | No | "1024" | Fargate Memory in MiB (512/1024/2048/3072/4096/5120/6144/7168/8192) |
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
CommonInfraStackName |
String | Yes | - | Name of the CommonInfra stack to import values from |
AlbScheme |
String | No | "internal" | ALB scheme: "internal" or "internet-facing" |
OtelEndpoint |
String | No | "" | OTEL collector HTTP endpoint (e.g., http://collector-dns:4318) |
GrafanaResetToken |
String | No | "" | Change this value to reset Grafana data (recreate database) |
GoServicesImage |
String | No | public.ecr.aws/cardinalhq.io/lakerunner:v1.2.1 | Container image for Go services |
QueryApiImage |
String | No | public.ecr.aws/cardinalhq.io/lakerunner/query-api:latest | Container image for query-api service |
QueryWorkerImage |
String | No | public.ecr.aws/cardinalhq.io/lakerunner/query-worker:latest | Container image for query-worker service |
GrafanaImage |
String | No | grafana/grafana:latest | Container image for Grafana service |
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
CommonInfraStackName |
String | Yes | - | Name of the CommonInfra stack to import values from |
LoadBalancerType |
String | No | "internal" | ALB type: "internal" or "internet-facing" |
OrganizationId |
String | No | 12340000-0000-4000-8000-000000000000 | Organization ID for OTEL data routing |
CollectorName |
String | No | "lakerunner" | Collector name for OTEL data routing |
OtelCollectorImage |
String | No | public.ecr.aws/cardinalhq.io/cardinalhq-otel-collector:latest | OTEL collector container image |
OtelConfigYaml |
String | No | "" | Custom OTEL collector configuration in YAML format |
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
CommonInfraStackName |
String | Yes | - | Name of the CommonInfra stack to import values from |
ServicesStackName |
String | Yes | - | Name of the Services stack to import ALB target groups from |
OtelCollectorStackName |
String | Yes | - | Name of the OTEL Collector stack to get collector endpoint from |
SampleAppImage |
String | No | public.ecr.aws/cardinalhq.io/lakerunner-demo/sample-app:latest | Container image for sample-app service |
Default configurations are stored in lakerunner-stack-defaults.yaml and include:
- API Keys - Default organization and API key configurations
- Storage Profiles - S3 storage configuration for telemetry data
- Container Images - Default image locations for all services
- Service Settings - CPU, memory, replica counts, and environment variables
- Database credentials stored in AWS Secrets Manager
- Application secrets (HMAC keys, Grafana passwords) auto-generated
- ECS task roles follow principle of least privilege
- All tasks run in private subnets with no public IP assignment
- SSL/TLS encryption for database connections (
LRDB_SSLMODE: require)
- All services log to CloudWatch under
/ecs/<service-name> - Health checks can be monitored via ECS console
- Database connectivity issues often indicate security group or subnet configuration problems
- ALB target health can be checked via EC2 console under Load Balancers
- README.md - Main deployment guide (this document)
- README-DEMO-APPS.md - Demo applications and testing
- README-OTEL-COLLECTOR.md - OTEL collector setup
- README-BUILDING.md - Building and development guide