This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Commit b960c5b

Author: Jason Schmidt
feat: add otel operator and sample configurations (#68)
* feat: add observability logic
* feat: add observability configurations and readme
* fix: re-enable errexit; mistakenly disabled on previous commit.
* feat: create tracing config map and update yaml
* feat: updates to README.md for observability
1 parent e07d7ea commit b960c5b

File tree

15 files changed: +3204 −16 lines


pulumi/aws/README.md

Lines changed: 10 additions & 1 deletion

@@ -29,7 +29,8 @@ vpc - defines and installs the VPC and subnets to use with EKS
 └─certmgr - deploys the open source cert-manager.io helm chart to the EKS cluster
 └─prometheus - deploys prometheus server, node exporter, and statsd collector for metrics
 └─grafana - deploys the grafana visualization platform
-└─sirius - deploys the Bank of Sirius application to the EKS cluster
+└─observability - deploys the OTEL operator and instantiates a simple collector
+└─sirius - deploys the Bank of Sirius application to the EKS cluster
 ```

@@ -158,6 +159,14 @@ Grafana is deployed and configured with a connection to the prometheus datasource. At the time of
 writing, the NGINX Plus KIC dashboard is installed as part of the initial setup. Additional datasources and dashboards
 can be added by the user either in the code, or via the standard Grafana tooling.
+
+### Observability
+
+We deploy the [OTEL Collector Operator](https://github.com/open-telemetry/opentelemetry-collector) along with a simple
+collector. There are several other configurations in the [observability/otel-objects](./observability/otel-objects)
+directory. See the [README.md](./observability/otel-objects/README.md) file in the
+[observability/otel-objects](./observability/otel-objects) directory for more information, including an explanation of
+the default configuration.
+
 ### Demo Application

 A forked version of the Google

pulumi/aws/destroy.sh

Lines changed: 1 addition & 1 deletion

@@ -91,7 +91,7 @@ if command -v aws > /dev/null; then
   validate_aws_credentials
 fi

-k8s_projects=(sirius grafana prometheus certmgr logagent logstore kic-helm-chart)
+k8s_projects=(sirius observability grafana prometheus certmgr logagent logstore kic-helm-chart)

 # Test to see if EKS has been destroyed AND there are still Kubernetes resources
 # that are being managed by Pulumi. If so, we have to destroy the stack for
Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
name: observability
runtime:
  name: python
  options:
    virtualenv: ../venv
config: ../config
description: Deploys OTEL
Lines changed: 79 additions & 0 deletions

@@ -0,0 +1,79 @@
import os

import pulumi
import pulumi_kubernetes as k8s
from pulumi_kubernetes.yaml import ConfigGroup

from kic_util import pulumi_config


# Removes the status field from the Nginx Ingress Helm Chart, so that it is
# compatible with the Pulumi Chart implementation.
def remove_status_field(obj):
    if obj['kind'] == 'CustomResourceDefinition' and 'status' in obj:
        del obj['status']


def pulumi_eks_project_name():
    script_dir = os.path.dirname(os.path.abspath(__file__))
    eks_project_path = os.path.join(script_dir, '..', 'eks')
    return pulumi_config.get_pulumi_project_name(eks_project_path)


def pulumi_ingress_project_name():
    script_dir = os.path.dirname(os.path.abspath(__file__))
    ingress_project_path = os.path.join(script_dir, '..', 'kic-helm-chart')
    return pulumi_config.get_pulumi_project_name(ingress_project_path)


def otel_operator_location():
    script_dir = os.path.dirname(os.path.abspath(__file__))
    otel_operator_path = os.path.join(script_dir, 'otel-operator', '*.yaml')
    return otel_operator_path


def otel_deployment_location():
    script_dir = os.path.dirname(os.path.abspath(__file__))
    otel_deployment_path = os.path.join(script_dir, 'otel-objects', '*.yaml')
    return otel_deployment_path


def add_namespace(obj):
    obj['metadata']['namespace'] = 'observability'


stack_name = pulumi.get_stack()
project_name = pulumi.get_project()
eks_project_name = pulumi_eks_project_name()
pulumi_user = pulumi_config.get_pulumi_user()

eks_stack_ref_id = f"{pulumi_user}/{eks_project_name}/{stack_name}"
eks_stack_ref = pulumi.StackReference(eks_stack_ref_id)
kubeconfig = eks_stack_ref.get_output('kubeconfig').apply(lambda c: str(c))
eks_stack_ref.get_output('cluster_name').apply(
    lambda s: pulumi.log.info(f'Cluster name: {s}'))

k8s_provider = k8s.Provider(resource_name='ingress-setup-sample', kubeconfig=kubeconfig)

# Create the namespace
ns = k8s.core.v1.Namespace(resource_name='observability',
                           metadata={'name': 'observability'},
                           opts=pulumi.ResourceOptions(provider=k8s_provider))

# Config Manifests: OTEL operator
otel_operator = otel_operator_location()

otel_op = ConfigGroup(
    'otel-op',
    files=[otel_operator],
    transformations=[remove_status_field],  # Need to review w/ operator
    opts=pulumi.ResourceOptions(depends_on=[ns])
)

# Config Manifests: OTEL components
otel_deployment = otel_deployment_location()

otel_dep = ConfigGroup(
    'otel-dep',
    files=[otel_deployment],
    transformations=[add_namespace, remove_status_field],  # Need to review w/ operator
    opts=pulumi.ResourceOptions(depends_on=[ns, otel_op])
)
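The `remove_status_field` and `add_namespace` transformations above mutate the parsed manifest dicts before Pulumi registers the resources. A minimal standalone sketch (plain dicts, no Pulumi required; the sample manifests here are hypothetical) shows the effect:

```python
def remove_status_field(obj):
    # Drop the status field from CRDs so Pulumi's diffing does not fight it.
    if obj['kind'] == 'CustomResourceDefinition' and 'status' in obj:
        del obj['status']


def add_namespace(obj):
    # Force every object into the observability namespace.
    obj['metadata']['namespace'] = 'observability'


# Hypothetical manifests shaped like what the YAML parser would produce.
crd = {'kind': 'CustomResourceDefinition',
       'metadata': {'name': 'opentelemetrycollectors.opentelemetry.io'},
       'status': {'conditions': []}}
collector = {'kind': 'OpenTelemetryCollector',
             'metadata': {'name': 'simplest'}}

for manifest in (crd, collector):
    remove_status_field(manifest)
    add_namespace(manifest)

print('status' in crd)                     # False: status stripped from the CRD
print(collector['metadata']['namespace'])  # observability
```

Because `ConfigGroup` applies each transformation to every object in the group, the functions guard on `kind` (or are safe no-ops) rather than assuming a specific resource type.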
Lines changed: 44 additions & 0 deletions

@@ -0,0 +1,44 @@
## Sample Configurations
This directory contains a number of sample configurations that can be used with the
[OTEL kubernetes operator](https://github.com/open-telemetry/opentelemetry-operator) that is installed as part of the
MARA project.

Each configuration currently uses the `simplest` deployment, which uses an in-memory store for data being processed.
This is obviously not suited to a production deployment, but it is intended to illustrate the steps required to work
with the OTEL deployment.

## Commonality

### Listening Ports
Each of the sample files is configured to listen on the
[OTLP protocol](https://opentelemetry.io/docs/reference/specification/protocol/otlp/). The listen ports configured are:
* grpc on port 9978
* http on port 9979

### Logging
All of the examples log to the container's stdout. However, the basic configuration only shows the
condensed version of the traces being received. In order to see the full traces, you need to set the logging level to
`DEBUG`. The basic-debug object is configured to do this automatically.

## Configurations
### `otel-collector.yaml.basic`
This is the default collector; it only listens and logs summary spans to the container's stdout.

### `otel-collector.yaml.basic-debug`
This is a variant of the default collector that will output full spans to the container's stdout.

### `otel-collector.yaml.full`
This is a more complex variant that contains multiple receivers, processors, and exporters. Please see the file for
details.

### `otel-collector.yaml.lightstep`
This configuration file deploys lightstep as an ingester. Please note that you will need a
[lightstep](https://lightstep.com/) account to use this option, and you will need to add your lightstep access token
to the file in the field noted.

## Usage
By default, the `otel-collector.yaml.basic` configuration is copied into the live `otel-collector.yaml`. The logic for
this project runs all files ending in `.yaml` as part of the configuration, so you simply need to either rename your
chosen file to `otel-collector.yaml` or ensure that only the files you want to use have the `.yaml` extension.
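The ".yaml files only" selection described above is the same glob pattern the Pulumi program uses (`otel-objects/*.yaml`). A small sketch of the behavior, using a temporary directory with illustrative file names:

```python
import glob
import os
import tempfile

# Sketch: only files whose names end in .yaml are picked up, mirroring the
# otel-objects/*.yaml glob in the Pulumi program. File names are illustrative.
with tempfile.TemporaryDirectory() as otel_objects:
    for name in ('otel-collector.yaml',
                 'otel-collector.yaml.basic',
                 'otel-collector.yaml.lightstep'):
        open(os.path.join(otel_objects, name), 'w').close()

    selected = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(otel_objects, '*.yaml')))

print(selected)  # ['otel-collector.yaml']
```

Note that `otel-collector.yaml.basic` is not matched, because the glob anchors on the final extension; renaming a variant to `otel-collector.yaml` (or stripping the suffix) is what activates it.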
Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
  namespace: observability
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:9978
          http:
            endpoint: 0.0.0.0:9979

    processors:
      batch:

    exporters:
      logging:
        logLevel:

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
  namespace: observability
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:9978
          http:
            endpoint: 0.0.0.0:9979

    processors:
      batch:

    exporters:
      logging:
        logLevel:

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
  namespace: observability
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:9978
          http:
            endpoint: 0.0.0.0:9979

    processors:
      batch:

    exporters:
      logging:
        logLevel: debug

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
Lines changed: 65 additions & 0 deletions

@@ -0,0 +1,65 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
  namespace: observability
spec:
  config: |
    extensions:
      health_check:
      pprof:
        endpoint: 0.0.0.0:1777
      zpages:
        endpoint: 0.0.0.0:55679

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:9978
          http:
            endpoint: 0.0.0.0:9979
      opencensus:
      jaeger:
        protocols:
          grpc:
          thrift_binary:
          thrift_compact:
          thrift_http:
      zipkin:

      # Collect own metrics
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 120s
              static_configs:
                - targets: ['0.0.0.0:8080']
              metrics_path: '/z/prometheus'

    processors:
      batch:

    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"

      logging:
        logLevel: debug

      jaeger:
        endpoint: "0.0.0.0:14250"

    service:
      pipelines:
        traces:
          receivers: [otlp, opencensus, jaeger, zipkin]
          processors: [batch]
          exporters: [logging, jaeger]
        metrics:
          receivers: [otlp, opencensus, prometheus]
          processors: [batch]
          exporters: [logging]

      extensions: [health_check, pprof, zpages]
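Every component referenced in a pipeline must also be declared in the corresponding top-level section, or the collector will refuse to start. That consistency can be checked mechanically; a sketch over a hand-built plain-dict mirror of the full sample above:

```python
# Sketch: verify that pipeline references resolve to declared components.
# The dicts below are a hand-built mirror of the full sample configuration.
declared = {
    'receivers': {'otlp', 'opencensus', 'jaeger', 'zipkin', 'prometheus'},
    'processors': {'batch'},
    'exporters': {'prometheus', 'logging', 'jaeger'},
}
pipelines = {
    'traces': {'receivers': ['otlp', 'opencensus', 'jaeger', 'zipkin'],
               'processors': ['batch'],
               'exporters': ['logging', 'jaeger']},
    'metrics': {'receivers': ['otlp', 'opencensus', 'prometheus'],
                'processors': ['batch'],
                'exporters': ['logging']},
}


def undeclared_components(declared, pipelines):
    # Collect every (pipeline, section, name) reference that has no
    # matching declaration in the top-level sections.
    missing = []
    for name, pipeline in pipelines.items():
        for section, refs in pipeline.items():
            for ref in refs:
                if ref not in declared[section]:
                    missing.append((name, section, ref))
    return missing


print(undeclared_components(declared, pipelines))  # []  (all references resolve)
```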
Lines changed: 35 additions & 0 deletions

@@ -0,0 +1,35 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
  namespace: observability
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:9978
          http:
            endpoint: 0.0.0.0:9979

    exporters:
      logging:
      otlp:
        endpoint: ingest.lightstep.com:443
        headers:
          "lightstep-access-token": "YOURTOKEN"

    processors:
      batch:

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging, otlp]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging, otlp]
