This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Conversation

@qdzlug qdzlug commented Dec 9, 2021

Proposed changes

This change moves us from a standalone, "à la carte" deployment of the Prometheus services to an integrated Prometheus Operator based deployment using the Prometheus community's kube-prometheus-stack chart.

This update also installs the appropriate service monitors to handle statsd (from the gunicorn Python apps in the Bank of Sirius project), the Postgres Prometheus exporters (in the Bank of Sirius Postgres installs), and the NGINX KIC.
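
As an illustration of what one of those service monitors looks like, a ServiceMonitor for the NGINX KIC might resemble the sketch below. The names, labels, and port here are hypothetical; the actual manifests are produced by the PR's Pulumi code.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-kic                          # hypothetical name
  labels:
    release: kube-prometheus-stack         # so the operator's selector picks it up
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx-ingress  # must match the KIC Service's labels
  endpoints:
    - port: prometheus                     # named Service port exposing /metrics
      interval: 30s
```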

This change also includes an extras script to handle the updates needed to read kube-proxy metrics, along with a README.

Documentation updates are in progress.
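
For readers unfamiliar with the chart, a typical manual install of kube-prometheus-stack looks like the following; the repository URL and chart name are the community's published ones, while the release and namespace names are illustrative. The PR's Pulumi code performs the equivalent deployment programmatically.

```
# Add the Prometheus community chart repository and install the
# kube-prometheus-stack chart (Prometheus Operator, Prometheus,
# Alertmanager, Grafana, and the default ServiceMonitors).
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```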

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have written my commit messages in the Conventional Commits format.
  • I have read the CONTRIBUTING doc
  • I have added tests (when possible) that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto master
  • I will ensure my PR is targeting the master branch and pulling from a branch on my own fork

Jason Schmidt added 8 commits December 9, 2021 14:18
feat: update test-forwards utility script for prometheus operator use
feat: convert prometheus to kube-prometheus-stack
feat: Update utility script to use new services from prometheus operator
feat: add extras script to fix permissions on kube-proxy metrics
feat: modifications to NGINX IC to allow prometheus service monitor to pull metrics
feat: added service monitor for ledgerdb and accountdb postgres
@qdzlug qdzlug marked this pull request as ready for review December 16, 2021 23:42
@qdzlug qdzlug requested a review from dekobon December 16, 2021 23:42
@qdzlug qdzlug linked an issue Dec 16, 2021 that may be closed by this pull request
fix: adjust depends_on for prometheus deployment
Notes:
1. The NGINX IC needs to be configured to expose Prometheus metrics; this is currently done by default.
2. The default address binding of the `kube-proxy` component is `127.0.0.1`, which causes errors when the
canned Prometheus scrape configurations run. The fix is to set this address to `0.0.0.0`. An example manifest
Collaborator

Could this be a security issue?

Contributor Author

Based on everything I read, no, because:

  1. It's exposed on an internal address (whatever the cluster's internal addressing is).
  2. The connections are made over TLS using a shared secret, so without that secret you're not going to be allowed to connect.

So I view it as most likely safe, but I'm leaving it as something everyone can decide for themselves whether they want to run. Once we have more of an automated process in place, we can make this a "do you want to run this? y/n" prompt.
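
As a sketch of the kube-proxy fix discussed above (assuming a kubeadm-style cluster, where kube-proxy reads its configuration from the `kube-proxy` ConfigMap in `kube-system`), the relevant setting is `metricsBindAddress`:

```yaml
# Fragment of the KubeProxyConfiguration consumed by kube-proxy
# (kubeadm clusters store this in the kube-proxy ConfigMap in kube-system).
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Bind the metrics endpoint on all interfaces instead of loopback so the
# Prometheus scrape jobs can reach it; weigh the exposure trade-off for
# your own cluster, as discussed in this thread.
metricsBindAddress: 0.0.0.0
```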

### Grafana
**NOTE:** This deployment has been deprecated but the project has been left as an example on how to deploy Grafana in this
Collaborator

Let's just delete and point folks to the git history. We don't want to carry this forward. Thoughts?

Contributor Author

I went back and forth on this. Part of me wanted to delete it, but another part started down the "well, what if the user wants to swap out Prometheus for something else and still wants Grafana?" path.

If we go to a modular approach where the user runs a script and answers prompts about what they want and don't want, I feel that keeping it in place (preferably with a few tests around it to make sure it works) would be fine. Since I'm pulling from the mainline Grafana builds, we could just manage it like the other dependencies.

That said, I'm not married to this idea, so let me know what you think in light of that.

Collaborator

I say, let's delete it. It will always be in the source history and we can always come back and add it again after we have better support for multiple options.

Contributor Author

Deleted in last commit.

dekobon commented Dec 17, 2021

Any ideas why the build is failing?

qdzlug commented Dec 17, 2021

Re: why the build is failing.

I have no idea; I've been digging into it and we keep hitting this:

```
Run ./setup_venv.sh
  ./setup_venv.sh
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.10.1/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.1/x64/lib
    VIRTUAL_ENV: /home/runner/.virtualenvs/.venv
./setup_venv.sh: /home/runner/.virtualenvs/.venv/bin/pip3: /home/runner/.virtualenvs/.venv/bin/python: bad interpreter: No such file or directory
Error: Process completed with exit code 126.
```

Nothing has changed with this code as far as I know...

Jason Schmidt added 2 commits December 17, 2021 10:50
feat: remove grafana standalone in favor of prometheus kube stack
chore: upgrade pulumi version
@qdzlug qdzlug merged commit fd4d25c into nginxinc:master Dec 20, 2021
@qdzlug qdzlug deleted the prom-operator branch December 20, 2021 16:20
qdzlug commented Dec 20, 2021

Note that the issue with the tests was corrected by updating requirements.txt to a new version of Pulumi; I'm pretty sure it's not a matter of what was upgraded, but rather the fact that we upgraded it.



Development

Successfully merging this pull request may close these issues.

feat: convert from standalone prometheus to prometheus-kube-stack
