This repository was archived by the owner on Oct 8, 2025. It is now read-only.
Merged
49 changes: 39 additions & 10 deletions pulumi/aws/README.md
@@ -28,9 +28,8 @@ vpc - defines and installs the VPC and subnets to use with EKS
└─logagent - deploys a logging agent (filebeat) to the EKS cluster
└─certmgr - deploys the open source cert-manager.io helm chart to the EKS cluster
└─prometheus - deploys prometheus server, node exporter, and statsd collector for metrics
└─grafana - deploys the grafana visualization platform
└─observability - deploys the OTEL operator and instantiates a simple collector
└─sirius - deploys the Bank of Sirius application to the EKS cluster
└─observability - deploys the OTEL operator and instantiates a simple collector
└─sirius - deploys the Bank of Sirius application to the EKS cluster

```

@@ -146,15 +145,40 @@ deployment.
### Prometheus

Prometheus is deployed and configured to enable the collection of metrics for all components that have
properties `prometheus.io:scrape: true` set in the annotations
(along with any other connection information). This includes the prometheus `node-exporter`
daemonset which is deployed in this step as well.
a defined service monitor. At installation time, the deployment will instantiate:
- Node Exporters
- Kubernetes Service Monitors
- Grafana preloaded with dashboards and datasources for Kubernetes management
- The NGINX Ingress Controller
- Statsd receiver

The former behavior of using the `prometheus.io:scrape: true` annotation to indicate which pods
should have their metrics scraped has been deprecated, and these annotations will be removed in
the near future.

Also, the standalone Grafana deployment has been removed from the standard deployment scripts, but has been left as
a project in the event someone wishes to run it on its own.

Finally, this namespace will hold service monitors created by other projects; for example, the Bank of Sirius
deployment currently deploys a service monitor for each of the Postgres exporters it installs.
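
In this repository the ServiceMonitor definitions are YAML manifests applied by the prometheus project (via a Pulumi
`ConfigGroup`, as shown later in this diff). The snippet below is only a hypothetical sketch, with illustrative names,
labels, and port, of how another project could register a monitor for its own exporter using the Pulumi Kubernetes
provider:

```
import pulumi_kubernetes as k8s

# Hypothetical sketch only: names, labels, and the port are illustrative.
# Registers a ServiceMonitor so the Prometheus Operator scrapes any service
# labelled app=example-exporter on its "metrics" port every 30 seconds.
example_monitor = k8s.apiextensions.CustomResource(
    'example-exporter-servicemonitor',
    api_version='monitoring.coreos.com/v1',
    kind='ServiceMonitor',
    metadata={'name': 'example-exporter', 'namespace': 'prometheus'},
    spec={
        'selector': {'matchLabels': {'app': 'example-exporter'}},
        'endpoints': [{'port': 'metrics', 'interval': '30s'}],
    })
```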

Notes:
1. The NGINX IC needs to be configured to expose prometheus metrics; this is currently done by default.
2. The default address binding of the `kube-proxy` component is set to `127.0.0.1` and as such will cause errors when the
canned prometheus scrape configurations are run. The fix is to set this address to `0.0.0.0`. An example manifest
Collaborator:

Could this be a security issue?

Contributor Author:

Based on everything I read, no, because:

  1. It's an internal address that this is exposed on (whatever the cluster addressing is internally).
  2. When the connections are made, they are made over TLS using a shared secret, so without that secret you're not going to be allowed to connect.

So, I view it as most likely safe - but I'm leaving it as something that everyone can decide for themselves whether they want to run. I suppose once we get more of an automated process in place we can have this as a "do you want to run this? y/n" prompt.
has been provided in [prometheus/extras](./prometheus/extras) that can be applied against your installation with
`kubectl apply -f ./filename`; a Pulumi-based alternative is sketched after these notes. Please only apply this change
once you have verified that it will work with your version of Kubernetes.
3. The _grafana_ namespace has been maintained in the configuration file and is used by the version of Grafana
deployed by the prometheus operator. This version only accepts a password; you can still specify a username for the
admin account, but it will be silently ignored.
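
If you would rather manage the kube-proxy manifest from note 2 with Pulumi instead of applying it by hand, a minimal
sketch using the `ConfigFile` helper already imported by the prometheus project could look like the following. The
file name is hypothetical; point it at the manifest shipped in `./prometheus/extras` once you have reviewed it.

```
from pulumi_kubernetes.yaml import ConfigFile

# Hypothetical sketch: the manifest file name is illustrative. Review the
# manifest in ./prometheus/extras for your Kubernetes version before applying.
kube_proxy_fix = ConfigFile(
    'kube-proxy-metrics-address',
    file='./prometheus/extras/kube-proxy.yaml')
```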

This also pulls data from the NGINX KIC, provided the KIC is configured to allow prometheus access (which is enabled by
default).

### Grafana

**NOTE:** This deployment has been deprecated but the project has been left as an example of how to deploy Grafana in this architecture.
Collaborator:

Let's just delete and point folks to the git history. We don't want to carry this forward. Thoughts?

Contributor Author:

I went back and forth on this. Part of me wanted to delete it, but then another part started down the "well, what if the user wants to swap out prometheus for something else and still wants grafana?"

If we go to a modular approach where the user runs a script and answers prompts as to what they want / don't want, I feel that just keeping it in place (preferably with a few tests around it to make sure it works) would be fine - since I'm pulling from the mainline grafana builds, we could just manage it like the other dependencies.

That said, I'm not married to this idea - so let me know what you think in light of that.

Collaborator:

I say, let's delete it. It will always be in the source history and we can always come back and add it again after we have better support for multiple options.

Contributor Author:

Deleted in last commit.

Grafana is deployed and configured with a connection to the prometheus datasource installed above. At the time of this
writing, the NGINX Plus KIC dashboard is installed as part of the initial setup. Additional datasources and dashboards
can be added by the user either in the code, or via the standard Grafana tooling.
@@ -188,7 +212,10 @@ As part of the Bank of Sirius deployment, we deploy a cluster-wide
[self-signed](https://cert-manager.io/docs/configuration/selfsigned/)
issuer using the cert-manager deployed above. This is then used by the Ingress object created to enable TLS access to
the application. Note that this Issuer can be changed out by the user, for example to use the
[ACME](https://cert-manager.io/docs/configuration/acme/) issuer.
[ACME](https://cert-manager.io/docs/configuration/acme/) issuer. The use of the ACME issuer has been tested and works
without issues, provided the FQDN meets the length requirements. As of this writing, the AWS ELB hostname is too long
to work with the ACME server. Additional work in this area will be undertaken to provide dynamic DNS record creation
as part of this process so that legitimate certificates can be issued.
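
For reference, a cluster-wide self-signed issuer of the kind described above can be declared with the Pulumi
Kubernetes provider roughly as follows. This is a sketch with illustrative names, not the exact resource created by
the sirius project.

```
import pulumi_kubernetes as k8s

# Sketch of a cluster-wide self-signed cert-manager issuer; names are illustrative.
selfsigned_issuer = k8s.apiextensions.CustomResource(
    'selfsigned-clusterissuer',
    api_version='cert-manager.io/v1',
    kind='ClusterIssuer',
    metadata={'name': 'selfsigned-issuer'},
    spec={'selfSigned': {}})
```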

In order to provide visibility into the Postgres databases that are running as part of the application, the Prometheus
Postgres data exporter will be deployed into the same namespace as the application and will be configured to be scraped
@@ -204,4 +231,6 @@ provides better tools for hierarchical configuration files.

In order to help enable simple load testing, a script has been provided that uses the
`kubectl` command to port-forward monitoring and management connections to the local workstation. This script
is [`test-forward.sh`](./extras/test-forward.sh) and is located in the [`extras`](./extras) directory.
is [`test-forward.sh`](./extras/test-forward.sh) and is located in the [`extras`](./extras) directory.

**NOTE:** This script has been modified to use the new Prometheus Operator based deployment.
2 changes: 1 addition & 1 deletion pulumi/aws/destroy.sh
@@ -91,7 +91,7 @@ if command -v aws > /dev/null; then
validate_aws_credentials
fi

k8s_projects=(sirius observability grafana prometheus certmgr logagent logstore kic-helm-chart)
k8s_projects=(sirius observability prometheus certmgr logagent logstore kic-helm-chart)

# Test to see if EKS has been destroyed AND there are still Kubernetes resources
# that are being managed by Pulumi. If so, we have to destroy the stack for
4 changes: 2 additions & 2 deletions pulumi/aws/extras/scripts/test-forward.sh
@@ -51,15 +51,15 @@ kubectl port-forward service/elastic-kibana --namespace logstore 5601:5601 &
echo $! > $PID01

## Grafana Tunnel
kubectl port-forward service/grafana --namespace grafana 3000:80 &
kubectl port-forward service/prometheus-grafana --namespace prometheus 3000:80 &
echo $! > $PID02

## Loadgenerator Tunnel
kubectl port-forward service/loadgenerator --namespace bos 8089:8089 &
echo $! > $PID03

## Prometheus Tunnel
kubectl port-forward service/prometheus-server --namespace prometheus 9090:80 &
kubectl port-forward service/prometheus-kube-prometheus-prometheus --namespace prometheus 9090:9090 &
echo $! > $PID04

## Elasticsearch Tunnel
30 changes: 27 additions & 3 deletions pulumi/aws/kic-helm-chart/__main__.py
@@ -51,7 +51,25 @@ def build_chart_values(repository: dict) -> helm.ChartOpts:
'service': {
'annotations': {
'co.elastic.logs/module': 'nginx'
}
},
"extraLabels": {
"app": "kic-nginx-ingress"
},
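# Publish additional ports on the controller service: the dashboard (8080) and the Prometheus exporter (9113).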
"customPorts": [
{
"name": "dashboard",
"targetPort": 8080,
"protocol": "TCP",
"port": 8080
},
{
"name": "prometheus",
"targetPort": 9113,
"protocol": "TCP",
"port": 9113
}
]

},
'pod': {
'annotations': {
@@ -62,7 +80,10 @@ def build_chart_values(repository: dict) -> helm.ChartOpts:
'prometheus': {
'create': True,
'port': 9113
}
},
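# Enable the OpenTracing module and report spans to the collector created by the observability project.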
"opentracing-tracer": "/usr/local/lib/libjaegertracing_plugin.so",
"opentracing-tracer-config": "{\n \"service_name\": \"nginx-ingress\",\n \"propagation_format\": \"w3c\",\n \"sampler\": {\n \"type\": \"const\",\n \"param\": 1\n },\n \"reporter\": {\n \"localAgentHostPort\": \"simplest-collector.observability.svc.cluster.local:9978\"\n }\n} \n",
"opentracing": True
}

has_image_tag = 'image_tag' in repository or 'image_tag_alias' in repository
@@ -109,7 +130,10 @@ def build_chart_values(repository: dict) -> helm.ChartOpts:
kubeconfig=kubeconfig)

ns = k8s.core.v1.Namespace(resource_name='nginx-ingress',
metadata={'name': 'nginx-ingress'},
metadata={'name': 'nginx-ingress',
'labels': {
'prometheus': 'scrape' }
},
opts=pulumi.ResourceOptions(provider=k8s_provider))

chart_values = ecr_repository.apply(build_chart_values)
111 changes: 99 additions & 12 deletions pulumi/aws/prometheus/__main__.py
@@ -4,6 +4,8 @@
import pulumi_kubernetes as k8s
from pulumi_kubernetes.helm.v3 import Release, ReleaseArgs, RepositoryOptsArgs
from pulumi import Output
from pulumi_kubernetes.yaml import ConfigFile
from pulumi_kubernetes.yaml import ConfigGroup

from kic_util import pulumi_config

@@ -14,6 +16,12 @@ def project_name_from_project_dir(dirname: str):
return pulumi_config.get_pulumi_project_name(project_path)


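# Glob pattern for the ServiceMonitor manifests bundled with this project; they are applied after the kube-prometheus-stack release is installed.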
def servicemon_manifests_location():
script_dir = os.path.dirname(os.path.abspath(__file__))
servicemon_manifests_path = os.path.join(script_dir, 'manifests', '*.yaml')
return servicemon_manifests_path


stack_name = pulumi.get_stack()
project_name = pulumi.get_project()
pulumi_user = pulumi_config.get_pulumi_user()
@@ -33,7 +41,7 @@ def project_name_from_project_dir(dirname: str):
config = pulumi.Config('prometheus')
chart_name = config.get('chart_name')
if not chart_name:
chart_name = 'prometheus'
chart_name = 'kube-prometheus-stack'
chart_version = config.get('chart_version')
if not chart_version:
chart_version = '14.6.0'
@@ -44,6 +52,12 @@ def project_name_from_project_dir(dirname: str):
if not helm_repo_url:
helm_repo_url = 'https://prometheus-community.github.io/helm-charts'

grafana_config = pulumi.Config('grafana')
# Require an admin password, but do not encrypt it due to the
# issues we experienced with Anthos; this can be adjusted at the
# same time that we fix the Anthos issues.
adminpass = grafana_config.require('adminpass')
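# For example: pulumi config set grafana:adminpass <password>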

prometheus_release_args = ReleaseArgs(
chart=chart_name,
repository_opts=RepositoryOptsArgs(
@@ -54,6 +68,75 @@ def project_name_from_project_dir(dirname: str):

# Values from Chart's parameters specified hierarchically,
values={
"prometheus": {
"serviceAccount": {
"create": True,
"name": "prometheus",
"annotations": {}
},
"prometheusSpec": {
"podMonitorSelectorNilUsesHelmValues": False,
"serviceMonitorSelectorNilUsesHelmValues": False,
"serviceMonitorSelector": {},
"serviceMonitorNamespaceSelector ": {
"matchLabels": {
"prometheus": True
}
},
"storageSpec": {
"volumeClaimTemplate": {
"spec": {
"accessModes": [
"ReadWriteOnce"
],
"resources": {
"requests": {
"storage": "5Gi"
}
}
}
}
}
}
},
"grafana": {
"serviceAccount": {
"create": False,
"name": "prometheus",
"annotations": {}
},
"adminPassword": adminpass,
"persistence": {
"enabled": True,
"accessModes": [
"ReadWriteOnce"
],
"size": "5Gi"
}
},
"alertmanager": {
"serviceAccount": {
"create": False,
"name": "prometheus",
"annotations": {}
},
"alertmanagerSpec": {
"storage": {
"volumeClaimTemplate": {
"spec": {
"accessModes": [
"ReadWriteOnce"
],
"resources": {
"requests": {
"storage": "5Gi"
}
}
}
}
}
}
}
},
# By default Release resource will wait till all created resources
# are available. Set this to true to skip waiting on resources being
@@ -67,15 +150,22 @@ def project_name_from_project_dir(dirname: str):
# Force update if required
force_update=True)

prometheus_release = Release("prometheus", args=prometheus_release_args)
prometheus_release = Release("prometheus", args=prometheus_release_args, opts=pulumi.ResourceOptions(depends_on=[ns]))

prom_status = prometheus_release.status

servicemon_manifests = servicemon_manifests_location()

servicemon = ConfigGroup(
'servicemon',
files=[servicemon_manifests],
opts=pulumi.ResourceOptions(depends_on=[ns, prometheus_release])
)

#
# Deploy the statsd collector
#


config = pulumi.Config('prometheus')
statsd_chart_name = config.get('statsd_chart_name')
if not statsd_chart_name:
@@ -100,18 +190,14 @@ def project_name_from_project_dir(dirname: str):

# Values from Chart's parameters specified hierarchically,
values={
"serviceMonitor": {
"enabled": True,
"namespace": "prometheus"
},
"serviceAccount": {
"create": True,
"annotations": {},
"name": ""
},
"podAnnotations": {
"prometheus.io/scrape": "true",
"prometheus.io/port": "9102"
},
"annotations": {
"prometheus.io/scrape": "true",
"prometheus.io/port": "9102"
}
},
# By default Release resource will wait till all created resources
@@ -127,7 +213,8 @@ def project_name_from_project_dir(dirname: str):
# Force update if required
force_update=True)

statsd_release = Release("statsd", args=statsd_release_args)
statsd_release = Release("statsd", args=statsd_release_args,
opts=pulumi.ResourceOptions(depends_on=[ns, prometheus_release]))

statsd_status = statsd_release.status

13 changes: 13 additions & 0 deletions pulumi/aws/prometheus/extras/README.md
@@ -0,0 +1,13 @@
## Purpose
This directory contains a manifest that can be used to change the metrics bind address
for the kube-proxy from 127.0.0.1 to 0.0.0.0 in order to allow the metrics to be scraped
by the prometheus service.

This is not being automatically applied, since it is changing the bind address that is
being used for the metrics port. That said, this should be secure since it's internal
to the installation and the connection is done via HTTPS.

However, please see this
[github issue](https://github.com/prometheus-community/helm-charts/issues/977)
for the full discussion of why this is required.
