
[BUG][Opensearch] Cannot override startupProbe / incorrect check in startupProbe #703

@budickdaDE

Description

Describe the bug
Cannot override startupProbe when using exec instead of tcpSocket

To Reproduce
Steps to reproduce the behavior:

  1. Set startupProbe to:

```yaml
startupProbe:
  exec:
    command:
      - sh
      - -c
      - "exit 0" # this is just for reproducing the error
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30
```

  2. helm install to k8s, or in our case via fluxcd
  3. Check the events: validation error because of two kinds of probes
  4. Check the YAML of the statefulset: startupProbe has both exec.command and tcpSocket
  5. Similar error when explicitly setting tcpSocket to {}, null, or false
  6. Similar error when setting tcpSocket.port to null, except that port then has the value 0

If a port is set under tcpSocket in the chart's values.yaml, my understanding is that Helm merges the chart defaults with the user-supplied values, so tcpSocket will always be present.
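As a sketch of that merge behavior (the chart's actual default values may differ; the tcpSocket default below is assumed for illustration):

```yaml
# Assumed chart default in values.yaml (illustrative):
startupProbe:
  tcpSocket:
    port: 9200
  failureThreshold: 30

# User override, exec only:
startupProbe:
  exec:
    command: ["sh", "-c", "exit 0"]

# Helm merges maps key by key, so the rendered probe ends up
# with both handlers, which the Kubernetes API rejects:
startupProbe:
  exec:
    command: ["sh", "-c", "exit 0"]
  tcpSocket:
    port: 9200
  failureThreshold: 30
```

A probe with more than one handler fails API validation, which matches the event seen in step 3.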

Chart Name
opensearch, opensearch-dashboards, and data-prepper (for data-prepper an open PR already exists)


Host/Environment (please complete the following information):

  • Helm Version: 3.19.0
  • Kubernetes Version: 1.31

Additional context
Easy fix: remove the default startupProbe from values.yaml, because just checking whether the API is reachable is imo not enough. It is a breaking change, but keeping the current default will lead to bigger problems for users.

Example: The opensearch documentation for rolling upgrades asks in step 11:

Confirm that the cluster is healthy

Only checking that we get an answer on port 9200 tells us nothing about the cluster health.
With maxUnavailable: 1, this startupProbe/readinessProbe combination could restart the next node before the just-restarted node is actually ready.
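With a freely overridable startupProbe, startup could instead be gated on actual cluster health, e.g. (a sketch; it assumes plain HTTP on localhost:9200 without the security plugin, and the timings are placeholders to tune per cluster):

```yaml
startupProbe:
  exec:
    command:
      - sh
      - -c
      - >-
        curl -sf "http://localhost:9200/_cluster/health?local=true"
        | grep -qE '"status":"(green|yellow)"'
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30
```

The probe only succeeds once the node reports a green or yellow status, instead of merely accepting TCP connections.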

This issue touches on the problem.
But I think doing extensive checks in the readinessProbe could lead to cascading failures (the cluster goes yellow and everything gets restarted).
The startup probe is the perfect place for them.
Hence we need a freely configurable startupProbe that can run more extensive checks, to ensure that a restarted node is actually ready.
I also think there are no good one-size-fits-all default values for startup, because the speed of a restart depends entirely on the amount of data and how it is sharded.
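One possible shape for such a fix (illustrative only, not the chart's actual template): ship an empty default and render whatever the user supplies verbatim, instead of merging it onto a tcpSocket default:

```yaml
# values.yaml: no probe handler baked in
startupProbe: {}
```

```yaml
# statefulset.yaml (template fragment)
{{- with .Values.startupProbe }}
startupProbe:
  {{- toYaml . | nindent 2 }}
{{- end }}
```

Because an empty map is falsy in Go templates, `with` renders nothing by default, so users get exactly the probe they configured or none at all.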

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), untriaged (Issues that have not yet been triaged)
