57 commits
f0d8cf0
wip
mrnicegyu11 Sep 19, 2024
e906b41
Merge remote-tracking branch 'upstream/main' into main
mrnicegyu11 Oct 23, 2024
14c751d
Merge remote-tracking branch 'upstream/main' into main
mrnicegyu11 Oct 23, 2024
293f63c
Add csi-s3 and have portainer use it
mrnicegyu11 Oct 24, 2024
f7f72ec
Change request @hrytsuk 1GB max portainer volume size
mrnicegyu11 Oct 25, 2024
94cfb76
t push
mrnicegyu11 Oct 28, 2024
509c717
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Oct 29, 2024
1a65ecf
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 13, 2024
77ee45e
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 25, 2024
c9c70d6
Arch Linux Certificates Customization
mrnicegyu11 Dec 3, 2024
7b8be53
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 5, 2024
bcd61cd
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 12, 2024
58e1030
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 13, 2024
ed8d479
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Jan 10, 2025
dda6e01
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Feb 4, 2025
f6f4f36
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Feb 25, 2025
5dca5c3
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Mar 13, 2025
4a653ef
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Mar 20, 2025
3a21f0f
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Mar 28, 2025
48fbbca
Fix pgsql exporter failure
mrnicegyu11 Apr 24, 2025
08c57db
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 May 6, 2025
5ecbfec
[Kubernetes] Introduce on-prem persistent Storage (Longhorn) :tada: …
YuryHrytsuk May 6, 2025
3ea41b5
Experimental: Try to add tracing to simcore-traefik on master
mrnicegyu11 May 9, 2025
29f2f2e
Fixes https://github.com/ITISFoundation/osparc-simcore/issues/7363
mrnicegyu11 May 14, 2025
cdef57f
Merge branch 'ITISFoundation:main' into main
mrnicegyu11 May 21, 2025
c0f393e
t push
mrnicegyu11 May 23, 2025
34a86fd
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Jul 2, 2025
df3f5df
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Jul 3, 2025
ac44663
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Jul 8, 2025
4100b87
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Jul 21, 2025
759f657
Merge branch 'ITISFoundation:main' into main
mrnicegyu11 Jul 21, 2025
d60fd0c
t checkout -b 2025/redactoMerge remote-tracking branch 'upstream/main'
mrnicegyu11 Jul 25, 2025
a1e36c7
Merge branch 'ITISFoundation:main' into main
mrnicegyu11 Jul 30, 2025
b856eb0
Arch Linux Certificates Customization - 2
mrnicegyu11 Jul 30, 2025
81ce9fb
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Aug 7, 2025
0e32699
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Aug 19, 2025
a5b9950
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Aug 25, 2025
70695e2
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Aug 27, 2025
786a5d9
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Sep 10, 2025
038be52
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Sep 10, 2025
5e1f220
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Sep 23, 2025
5448087
Revert: disable loki & vector-dev, oldschool graylog logging (#1223)
mrnicegyu11 Sep 25, 2025
534f6f4
Enable Chatbot for S4L products (#1221)
mrnicegyu11 Sep 25, 2025
1e15c94
Kubernetes: fix global network policy (#1227)
YuryHrytsuk Oct 2, 2025
6571bb8
Add authentication middleware to cahtbot vendor service
mrnicegyu11 Oct 3, 2025
c05f58c
Revert "Kubernetes: fix global network policy (#1227)"
mrnicegyu11 Oct 6, 2025
9a8113b
Add ACME DNS Resolver for gitlabCD and k8s (#1217)
mrnicegyu11 Oct 7, 2025
acf8518
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Oct 15, 2025
62f4547
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Oct 27, 2025
6cb9761
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 4, 2025
dc8fbb1
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 26, 2025
e5b6414
Add prometheus metric scraping for: vector, loki, tempo, grafana, jaeger
mrnicegyu11 Nov 26, 2025
0261014
fix
mrnicegyu11 Nov 26, 2025
e268b8d
add sink for vector prom metrics
mrnicegyu11 Nov 26, 2025
3add80f
Make scrape interval and scrape timeout global and configurable
mrnicegyu11 Nov 27, 2025
9407426
Fix format
mrnicegyu11 Nov 27, 2025
0decbe0
fix
mrnicegyu11 Nov 27, 2025
4 changes: 2 additions & 2 deletions scripts/purge-docker-registry/docker-registry-curl.bash
@@ -29,7 +29,7 @@ main() {
console "${REGISTRY_HOST}"
console "${WWW_AUTHENTICATE}"

if [ "x${WWW_AUTHENTICATE}" != "x" ];then
if [ "${WWW_AUTHENTICATE}" != "" ];then
# we need to get a token
DOCKER_AUTH_TYPE=$(echo "${WWW_AUTHENTICATE}" | cut --delimiter=" " --fields=1)
DETAILS=$(echo "${WWW_AUTHENTICATE}" | cut --delimiter=" " --fields=2-)
@@ -42,7 +42,7 @@ main() {
SCOPE=$(echo "${DETAILS}" | cut --delimiter=',' --fields=3 | cut --delimiter="=" --fields=2 | tr --delete '"')
if [ -v DOCKER_AUTH ];then
:
elif [[ "x${DOCKER_USERNAME}" != "x" && "x${DOCKER_PASSWORD}" != "x" ]];then
elif [[ "${DOCKER_USERNAME}" != "" && "${DOCKER_PASSWORD}" != "" ]];then
DOCKER_AUTH="${DOCKER_USERNAME}:${DOCKER_PASSWORD}"
elif [ -e ~/.docker/config.json ];then
DOCKER_AUTH=$(jq -r ".[\"auths\"][\"${REGISTRY_HOST}\"][\"auth\"]" ~/.docker/config.json | base64 -d)
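
Both hunks in this file drop the legacy `x` guard prefix from the emptiness checks. The prefix was a workaround for old shells that could misparse operands resembling operators. With quoted operands, the plain comparison and `-n` behave the same way. A minimal sketch in standalone POSIX shell; the helper names (`is_nonempty_legacy`, `is_nonempty_plain`, `is_nonempty_n`) are made up for illustration and are not part of the script:

```shell
# Hypothetical helpers: three spellings of "argument is a non-empty string".
is_nonempty_legacy() { [ "x${1}" != "x" ]; }  # guard-character idiom (removed)
is_nonempty_plain()  { [ "${1}" != "" ]; }    # quoted comparison (added)
is_nonempty_n()      { [ -n "${1}" ]; }       # equivalent -n test

# Compare the three on inputs that include operator-like strings.
for value in "" "Bearer realm=registry" "-n" "!" "="; do
    for check in is_nonempty_legacy is_nonempty_plain is_nonempty_n; do
        if "$check" "$value"; then result=yes; else result=no; fi
        printf '%-20s %-24s %s\n' "$check" "[$value]" "$result"
    done
done
```

All three helpers should agree on every input, which is why the shorter spelling is a safe replacement here.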
3 changes: 3 additions & 0 deletions services/jaeger/docker-compose.yml.j2
@@ -41,6 +41,9 @@ services:
command:
- "--config=/etc/otel/config.yaml"
deploy:
labels:
- prometheus-job=otel-collector
- prometheus-port=8888
placement:
constraints:
- node.labels.ops==true
8 changes: 8 additions & 0 deletions services/jaeger/opentelemetry-collector-config.yaml
@@ -19,6 +19,14 @@ service:
exporters: [otlphttp,otlp]
processors: [batch,filter/drop_healthcheck]
telemetry:
metrics:
Collaborator

It would be nice to have a link to the documentation for this config excerpt, for future reference.

It looks like it exports metrics to Prometheus (i.e. sends them directly), but the URL looks more like it exposes metrics 🤔

This looks like the relevant link: https://opentelemetry.io/docs/collector/internal-telemetry/#prometheus-endpoint-for-internal-metrics

Member Author

This is the right link, but there is no concise paragraph about it; the information is scattered across the page.

readers:
- pull:
exporter:
prometheus:
host: '0.0.0.0'
port: 8888

logs:
level: ${TRACING_OPENTELEMETRY_COLLECTOR_SERVICE_TELEMETRY_LOG_LEVEL}
processors:
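
On the push-versus-pull question raised in the review thread: a `pull` reader with a `prometheus` exporter means the collector serves its own metrics over HTTP and Prometheus scrapes them, which matches the `prometheus-port=8888` label added in the compose file. A rough way to check this by hand, assuming the collector is reachable as `localhost:8888` and serves the default `/metrics` path; `metrics_url` and `count_samples` are illustrative helpers, not part of this PR:

```shell
# Hypothetical helper: build the URL a Prometheus server (or curl) pulls from.
metrics_url() { printf 'http://%s:%s/metrics\n' "$1" "$2"; }

# Hypothetical helper: count sample lines in Prometheus text exposition format
# read from stdin, skipping "# HELP" / "# TYPE" comments and blank lines.
count_samples() { grep -c -v -e '^#' -e '^[[:space:]]*$'; }

# Usage (needs a running collector, hence left commented out):
#   curl -fsS "$(metrics_url localhost 8888)" | count_samples
```

A non-zero count indicates that the endpoint is exposing metrics for scraping rather than pushing them anywhere.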
8 changes: 7 additions & 1 deletion services/logging/docker-compose.yml.j2
@@ -118,18 +118,21 @@ services:
- VECTOR_CONFIG=/etc/vector/vector.yaml
- VECTOR_LOG=info
- VECTOR_LOG_DESTINATION=${VECTOR_LOG_DESTINATION}
- PROMETHEUS_SCRAPE_INTERVAL=${PROMETHEUS_SCRAPE_INTERVAL}
configs:
- source: vector_config
target: /etc/vector/vector.yaml
deploy:
replicas: 1
labels:
- prometheus-job=vector
- prometheus-port=9598
resources:
limits:
cpus: "1.0"
memory: 512M
reservations:
memory: 256M
labels: []
networks:
logging:

@@ -153,6 +156,9 @@ services:
- S3_ENDPOINT_LOKI=${S3_ENDPOINT_LOKI}
- LOKI_RETENTION_PERIOD=${LOKI_RETENTION_PERIOD}
deploy:
labels:
- prometheus-job=loki
- prometheus-port=3100
placement:
constraints: []
replicas: 1
1 change: 1 addition & 0 deletions services/logging/template.env
@@ -25,3 +25,4 @@ S3_REGION_LOKI=${S3_REGION_LOKI}
S3_SECRET_KEY_LOKI=${S3_SECRET_KEY_LOKI}
STORAGE_DOMAIN=${STORAGE_DOMAIN}
VECTOR_LOG_DESTINATION=${VECTOR_LOG_DESTINATION}
PROMETHEUS_SCRAPE_INTERVAL=${PROMETHEUS_SCRAPE_INTERVAL}
9 changes: 8 additions & 1 deletion services/logging/vector.yaml
@@ -2,6 +2,9 @@

sources:
# Receive GELF messages from Docker containers via UDP
vector_metrics:
type: internal_metrics
scrape_interval_secs: ${PROMETHEUS_SCRAPE_INTERVAL}
docker_gelf:
type: socket
address: "0.0.0.0:12201"
@@ -115,7 +118,11 @@ sinks:

healthcheck:
enabled: true

prometheus_exporter:
Collaborator

For future reference, here is the documentation that makes clear why this is necessary:

https://vector.dev/docs/administration/monitoring/#metrics

Member Author

Will add this, thanks.

type: prometheus_exporter
inputs:
- vector_metrics
address: "0.0.0.0:9598"
# Send to Graylog via GELF over TCP
graylog:
type: socket
10 changes: 10 additions & 0 deletions services/monitoring/Makefile
@@ -132,6 +132,16 @@ config.prometheus.simcore: ${REPO_CONFIG_LOCATION} venv
envsubst < prometheus/prometheus.yml > prometheus/prometheus.temp.yml; \
mv prometheus/prometheus.temp.yml prometheus/prometheus.yml

.PHONY: config.prometheus.federation
config.prometheus.federation: ${REPO_CONFIG_LOCATION} venv
@set -o allexport; \
source $(REPO_CONFIG_LOCATION); \
set +o allexport; \
envsubst < prometheus/prometheus-federation.template.yml > prometheus/prometheus-federation.yml

.PHONY: prometheus/prometheus-federation.yml
prometheus/prometheus-federation.yml: config.prometheus.federation
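
The new target exports every variable from the repo config and then renders the template with `envsubst`. A minimal sketch of the same pattern outside `make`, assuming a plain `KEY=VALUE` config file; `load_env_file` and the `./repo.config` path are hypothetical, and the helper uses `.` rather than `source` so that it also runs under a POSIX `sh`:

```shell
# Hypothetical helper: source a KEY=VALUE file with allexport switched on, so
# every assignment in it becomes an environment variable for child processes.
# Pass a path containing a slash; "." searches PATH for bare file names.
load_env_file() {
    set -o allexport
    . "$1"
    set +o allexport
}

# Usage (needs envsubst from GNU gettext, hence left commented out):
#   load_env_file ./repo.config
#   envsubst < prometheus/prometheus-federation.template.yml \
#            > prometheus/prometheus-federation.yml
```

One thing to watch with this pattern: `envsubst` replaces references to unset variables with an empty string, so a config that lacks `PROMETHEUS_SCRAPE_INTERVAL` would render `${PROMETHEUS_SCRAPE_INTERVAL}s` as a bare `s`.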

.PHONY: config.prometheus.simcore.aws
config.prometheus.simcore.aws: ${REPO_CONFIG_LOCATION} venv
@set -o allexport; \
7 changes: 4 additions & 3 deletions services/monitoring/docker-compose.yml.j2
@@ -230,12 +230,11 @@ services:
- monitored # needed to access postgres
- public
deploy:
#restart_policy:
# condition: on-failure
labels:
- prometheus-job=grafana
- prometheus-port=3000
- traefik.enable=true
- traefik.swarm.network=${PUBLIC_NETWORK}
# direct access through port
- traefik.http.services.grafana.loadbalancer.server.port=3000
- traefik.http.routers.grafana.rule=Host(`${MONITORING_DOMAIN}`) && PathPrefix(`/grafana`)
- traefik.http.routers.grafana.entrypoints=https
@@ -391,6 +390,8 @@ services:
- monitored
deploy:
labels:
- prometheus-job=tempo
- prometheus-port=3200
- traefik.enable=true
- traefik.swarm.network=${PUBLIC_NETWORK}
- traefik.http.services.tempo.loadbalancer.server.port=9095
1 change: 1 addition & 0 deletions services/monitoring/prometheus/.gitignore
@@ -1,2 +1,3 @@
prometheus-ceph.yml
prometheus.yml
prometheus-federation.yml
6 changes: 3 additions & 3 deletions services/monitoring/prometheus/prometheus-base.yml
@@ -1,9 +1,9 @@
# global config
# DOLLAR SIGNS NEED TO BE EXCAPED (see https://stackoverflow.com/a/61259844/10198629)
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
evaluation_interval: 15s # By default, scrape targets every 15 seconds.
# scrape_timeout global default would be (10s).
scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
Collaborator @YuryHrytsuk, Nov 27, 2025

PROMETHEUS_SCRAPE_INTERVAL_SECONDS would clearly define the purpose and units.

evaluation_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s # By default, scrape targets every 15 seconds.
scrape_timeout: ${PROMETHEUS_SCRAPE_TIMEOUT}s

# Attach these labels to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
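
The templates append the `s` unit themselves (`${PROMETHEUS_SCRAPE_INTERVAL}s`), while `vector.yaml` uses the same variable as a bare number for `scrape_interval_secs`, so the value has to be a plain number of seconds. Prometheus also rejects a scrape timeout that is greater than the scrape interval. A hedged sketch of a pre-deploy check; `is_positive_int` and `check_scrape_settings` are hypothetical helpers, not part of this PR, and the sketch is deliberately strict in accepting whole seconds only:

```shell
# Hypothetical helper: succeed when the argument is a bare positive integer
# (digits only, no unit suffix such as "s").
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}

# Hypothetical helper: validate interval and timeout, both given in seconds.
check_scrape_settings() {
    scrape_interval="$1"
    scrape_timeout="$2"
    is_positive_int "$scrape_interval" || {
        echo "scrape interval must be whole seconds without a unit" >&2
        return 1
    }
    is_positive_int "$scrape_timeout" || {
        echo "scrape timeout must be whole seconds without a unit" >&2
        return 1
    }
    [ "$scrape_timeout" -le "$scrape_interval" ] || {
        echo "scrape timeout must not exceed scrape interval" >&2
        return 1
    }
}

# Usage: check_scrape_settings "$PROMETHEUS_SCRAPE_INTERVAL" "$PROMETHEUS_SCRAPE_TIMEOUT"
```

A value such as `15s` fails the first check, which would otherwise render as `15ss` in the Prometheus templates.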
4 changes: 2 additions & 2 deletions services/monitoring/prometheus/prometheus-ceph.yml.j2
@@ -2,8 +2,8 @@ scrape_configs:
- job_name: ceph-production
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 30s
scrape_interval: {{PROMETHEUS_SCRAPE_INTERVAL}}s
scrape_timeout: {{PROMETHEUS_SCRAPE_TIMEOUT}}s
metrics_path: /metrics
scheme: http
static_configs:
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
global:
scrape_interval: 29s # Set the scrape interval to every 29 seconds. Default is every 1 minute.
evaluation_interval: 29s # Evaluate rules every 29 seconds. The default is every 1 minute.
scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
evaluation_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
scrape_timeout: ${PROMETHEUS_SCRAPE_TIMEOUT}s

scrape_configs:
- job_name: 'federate' # A job defines a series of targets and parameters describing how to scrape them.
scrape_interval: 29s # Overwrite the global scrape interval for this job, set to every 29 seconds.
scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s # Overwrite the global scrape interval for this job,
scrape_timeout: ${PROMETHEUS_SCRAPE_TIMEOUT}s # Overwrite the global scrape timeout for this job.
honor_labels: true # Do not overwrite labels in scraped data.
scheme: http
metrics_path: '/federate' # Path to fetch the metrics from, '/federate' is for federation.
3 changes: 2 additions & 1 deletion services/monitoring/prometheus/prometheus-simcore.yml
@@ -1,7 +1,8 @@
scrape_configs:
# SIMCORE -------------------------------------------------------------------
- job_name: "simcore"
scrape_interval: 15s
scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
scrape_timeout: ${PROMETHEUS_SCRAPE_TIMEOUT}s
relabel_configs:
- source_labels: [__meta_dns_name]
separator: ;
2 changes: 2 additions & 0 deletions services/monitoring/template.env
@@ -25,6 +25,8 @@ MONITORED_NETWORK=${MONITORED_NETWORK}
TEMPO_S3_BUCKET=${TEMPO_S3_BUCKET}
STORAGE_DOMAIN=${STORAGE_DOMAIN}
S3_REGION=${S3_REGION}
PROMETHEUS_SCRAPE_INTERVAL=${PROMETHEUS_SCRAPE_INTERVAL}
PROMETHEUS_SCRAPE_TIMEOUT=${PROMETHEUS_SCRAPE_TIMEOUT}
S3_ACCESS_KEY=${S3_ACCESS_KEY}
S3_SECRET_KEY=${S3_SECRET_KEY}
TF_VAR_PROMETHEUS_CATCHALL_URL=${TF_VAR_PROMETHEUS_CATCHALL_URL}
3 changes: 3 additions & 0 deletions services/monitoring/tempo_config.yaml.j2
@@ -1,4 +1,5 @@
server:
http_listen_address: 0.0.0.0
http_listen_port: 3200

distributor:
@@ -70,3 +71,5 @@ overrides:
rate_limit_bytes: 30000000
burst_size_bytes: 40000000
max_traces_per_user: 10000
usage_report:
reporting_enabled: false