This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Conversation

@qdzlug qdzlug commented Dec 9, 2021

Proposed changes

This change moves us from a standalone, "à la carte" deployment of the Prometheus services to an integrated Prometheus Operator based deployment using the Prometheus community's kube-prometheus-stack chart.

This update also installs the appropriate service monitors to handle statsd (from the gunicorn Python apps in the Bank of Sirius project), the Postgres Prometheus exporters (in the Bank of Sirius Postgres installs), and the NGINX KIC.
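
As an illustration of what one of those service monitors looks like, a ServiceMonitor for the NGINX KIC might resemble the sketch below. The names, labels, and port here are hypothetical; the actual manifests are produced by the PR's Pulumi code.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-kic                          # hypothetical name
  labels:
    release: kube-prometheus-stack         # so the operator's selector picks it up
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx-ingress  # must match the KIC Service's labels
  endpoints:
    - port: prometheus                     # named Service port exposing /metrics
      interval: 30s
```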

This change also includes an extras script to handle the updates needed to read kube-proxy metrics, along with a README.

Documentation updates are in progress.
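
For readers unfamiliar with the chart, a typical manual install of kube-prometheus-stack looks like the following; the repository URL and chart name are the community's published ones, while the release and namespace names are illustrative. The PR's Pulumi code performs the equivalent deployment programmatically.

```
# Add the Prometheus community chart repository and install the
# kube-prometheus-stack chart (Prometheus Operator, Prometheus,
# Alertmanager, Grafana, and the default ServiceMonitors).
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```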

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have written my commit messages in the Conventional Commits format.
  • I have read the CONTRIBUTING doc
  • I have added tests (when possible) that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto master
  • I will ensure my PR is targeting the master branch and pulling from a branch on my own fork

Jason Schmidt added 8 commits December 9, 2021 14:18
feat: update test-forwards utility script for prometheus operator use
feat: convert prometheus to kube-prometheus-stack
feat: Update utility script to use new services from prometheus operator
feat: add extras script to fix permissions on kube-proxy metrics
feat: modifications to NGINX IC to allow prometheus service monitor to pull metrics
feat: added service monitor for ledgerdb and accountdb postgres
@qdzlug qdzlug marked this pull request as ready for review December 16, 2021 23:42
@qdzlug qdzlug requested a review from dekobon December 16, 2021 23:42
@qdzlug qdzlug linked an issue Dec 16, 2021 that may be closed by this pull request
fix: adjust depends_on for prometheus deployment
Notes:
1. The NGINX IC needs to be configured to expose Prometheus metrics; this is currently done by default.
2. The default address binding of the `kube-proxy` component is `127.0.0.1`, which causes errors when the
canned Prometheus scrape configurations run. The fix is to set this address to `0.0.0.0`. An example manifest
Collaborator

Could this be a security issue?

Contributor Author

Based on everything I read, no, because:

  1. It's exposed on an internal address (whatever the cluster's internal addressing is).
  2. The connections are made over TLS using a shared secret, so without that secret you're not going to be allowed to connect.

So I view it as most likely safe, but I'm leaving it as something everyone can decide for themselves whether they want to run. Once we have more of an automated process in place, we can make this a "do you want to run this? y/n" prompt.
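
As a sketch of the kube-proxy fix discussed above (assuming a kubeadm-style cluster, where kube-proxy reads its configuration from the `kube-proxy` ConfigMap in `kube-system`), the relevant setting is `metricsBindAddress`:

```yaml
# Fragment of the KubeProxyConfiguration consumed by kube-proxy
# (kubeadm clusters store this in the kube-proxy ConfigMap in kube-system).
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Bind the metrics endpoint on all interfaces instead of loopback so the
# Prometheus scrape jobs can reach it; weigh the exposure trade-off for
# your own cluster, as discussed in this thread.
metricsBindAddress: 0.0.0.0
```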

### Grafana
**NOTE:** This deployment has been deprecated but the project has been left as an example on how to deploy Grafana in this
Collaborator

Let's just delete and point folks to the git history. We don't want to carry this forward. Thoughts?

Contributor Author

I went back and forth on this. Part of me wanted to delete it, but another part started down the "well, what if the user wants to swap out Prometheus for something else and still wants Grafana?" path.

If we go to a modular approach where the user runs a script and answers prompts about what they want and don't want, I feel that keeping it in place (preferably with a few tests around it to make sure it works) would be fine. Since I'm pulling from the mainline Grafana builds, we could just manage it like the other dependencies.

That said, I'm not married to this idea, so let me know what you think in light of that.

Collaborator

I say, let's delete it. It will always be in the source history and we can always come back and add it again after we have better support for multiple options.

Contributor Author

Deleted in last commit.

dekobon commented Dec 17, 2021

Any ideas why the build is failing?

qdzlug commented Dec 17, 2021

Re: why the build is failing.

I have no idea; I've been digging into it and we keep hitting this:

```
Run ./setup_venv.sh
  ./setup_venv.sh
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.10.1/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.1/x64/lib
    VIRTUAL_ENV: /home/runner/.virtualenvs/.venv
./setup_venv.sh: /home/runner/.virtualenvs/.venv/bin/pip3: /home/runner/.virtualenvs/.venv/bin/python: bad interpreter: No such file or directory
Error: Process completed with exit code 126.
```

Nothing has changed with this code as far as I know...

Jason Schmidt added 2 commits December 17, 2021 10:50
feat: remove grafana standalone in favor of prometheus kube stack
chore: upgrade pulumi version
@qdzlug qdzlug merged commit fd4d25c into nginxinc:master Dec 20, 2021
@qdzlug qdzlug deleted the prom-operator branch December 20, 2021 16:20
qdzlug commented Dec 20, 2021

Note that the issue with the tests was corrected by updating requirements.txt to a new version of Pulumi; I'm pretty sure it's not a matter of what was upgraded, but rather the fact that we upgraded it.



Development

Successfully merging this pull request may close these issues.

feat: convert from standalone prometheus to prometheus-kube-stack
