67 changes: 67 additions & 0 deletions docs/recommendations.md

Member

Would it make sense to add some guidance about when to use private fork branches for working on patches for vulnerabilities?

Private avoids the possibility of leaking details in advance, but the current lack of CI for private forks makes it hard to verify a patch, particularly when many unrelated patches are queued up to go out in the same release, because of possible merge conflicts etc.

@@ -94,3 +94,70 @@ updates:
And [here's
one](https://github.com/open-telemetry/opentelemetry-operator/pull/2990) from
Dependabot.

## Vulnerability Management

There are several aspects of vulnerability management that the Security SIG
recommends to the OpenTelemetry project maintainers:

- The maintainers of an OpenTelemetry project should establish a clear

Member

Does this document follow OTel spec style?

If so, should ---> SHOULD. This applies to the whole file.

Member Author

Nope. This is a recommendation instead of a spec or enforcement, at least for now.

accountability for security issues, including identifying the direct
responsible individual for security issues at a certain time, for
example, via a duty rotation. This should be
documented in the main README.md file of the project.
Comment on lines +103 to +107

Member

this feels a bit heavyweight for most of the repos to establish a DRI rotation where there are 2-3 maintainers

as long as the maintainers are meeting the bar we are setting for triage / fixing

if we want to keep this, maybe "... or some other process that ensures vulnerabilities are being triaged in a timely manner"

- The direct responsible individual should monitor the [repository security
advisories](https://docs.github.com/code-security/security-advisories/working-with-repository-security-advisories/about-repository-security-advisories),
make sure security advisories are triaged in a timely manner, and there is
active engagement and communication on the issue. Refer to the [incident
response guideline](../security-response.md#incident-response) for more
details.
- Regularly scan your code and dependencies for **deprecations** and
**vulnerabilities** using tools. This should include but not be limited to:
- CI/CD tooling - for example, some GitHub Actions might be deprecated or no
longer maintained, certain GitHub Actions might have known vulnerabilities,
a compiler might have a known vulnerability, etc.
- CI/CD environment - for example, the CI/CD job might be running on a
deprecated or vulnerable version of the operating system.
- container image dependencies - for example, the base image used in
Dockerfiles or image referenced by Helm charts might have known
vulnerabilities, or the image might be using a deprecated version of a
library.
- repository configurations - for example, a hotfix branch might not have the
proper branch protection rules, or the repository might not have the proper
security settings enabled.
Comment on lines +125 to +127

Member

This should be mostly covered by org-wide config-as-code tooling by now but we don't cover custom branches beyond main there usually, right?

Member Author

Right.
This serves more as a checklist for the maintainers. While cleaning up the branch policies, we noticed several projects not protecting things correctly (a couple of them were surprising).

- package dependencies - for example, a package might have a known
vulnerability, or a package might be using a deprecated version of a
library.
Comment on lines +128 to +130

Member

We might want to put that one to the top of the list as it's the most frequent concern that's brought up.

Comment on lines +128 to +130

Member

Unless it's covered elsewhere, maybe add a recommendation to use dependabot or renovate?

- All security vulnerabilities that are found - whether from the user reported
repository security advisories or through automated scanning - should be
handled in a timely manner based on the severity level. In case of a real
vulnerability that doesn't have a fix available, the maintainers should
evaluate the impact and likelihood of exploitation and take appropriate
action, such as applying workarounds or communicating with affected users. In
the worst case, this can be a potential end-of-life announcement of the
affected component or project.
Comment on lines +137 to +138

Member

"or project" made me think catastrophic end of OpenTelemetry

Suggested change
the worst case, this can be a potential end-of-life announcement of the
affected component or project.
the worst case, this can be a potential end-of-life announcement of the
affected component.

Member

Or maybe change to "repository"?


Here is a check list for the maintainers:

- [ ] Identify the direct responsible individual for security issues and

Member

Suggested change
- [ ] Identify the direct responsible individual for security issues and
- [ ] Identify the directly responsible individual(s) for security issues and

document it in the main README.md file of the project.
Comment thread
reyang marked this conversation as resolved.
Outdated
- [ ] Monitor the GitHub repository security advisories and triage security
issues in a timely manner.
- [ ] Daily scan CI/CD tooling for deprecations and vulnerabilities.
- [ ] Daily scan CI/CD environment for deprecations and vulnerabilities.
- [ ] Daily scan container image dependencies for deprecations and
vulnerabilities.
Comment on lines +146 to +149

Member

Maybe add some suggested tooling?

Things like renovate and dependabot probably remove the need to explicitly do some of these as they will indirectly do it and propose patches (e.g. bumping a container's version/SHA).

- [ ] Daily scan repository configurations for deprecations and vulnerabilities.

Member

is this something maintainers can do?

- [ ] Daily scan package dependencies for deprecations and vulnerabilities.
- [ ] False positives are documented (e.g., by commenting on the security
advisory, by providing the dismissal reason to a code scanning alert) and
dismissed.
- [ ] Critical and high severity vulnerabilities are patched within 2 weeks of
Comment thread
reyang marked this conversation as resolved.
Outdated

Member

This comment generally relates to the "contrib" repositories, where maintainers are mostly responsible for the release process, merging, and guarding code quality.

What is the recommendation for vulnerabilities in components that are abandoned or whose code owners are very difficult to contact?
Should we follow this SLA? Should maintainers be responsible for fixing it, or should we just announce a GHSA ~2/4 weeks after the finding?

This is not a hypothetical question; I am affected by exactly this, and an example can be shared in a private channel.

discovery.
- [ ] Medium and low severity vulnerabilities are patched within 4 weeks of

Member

not sure about wording

Suggested change
- [ ] Medium and low severity vulnerabilities are patched within 4 weeks of
- [ ] Medium and low severity vulnerabilities (regardless of rescoring) are patched within 4 weeks of

Member

4 weeks seems pretty optimistic. I think we should first scan all OTel repositories against potential vectors, prioritize the findings, and then follow this practice.

Let's consider Codex with a security plugin. It can produce 40-80 findings for a pretty small repository. Reviewing each one and deciding whether it is a security issue, a bug, etc. takes time.

Member

I would also consider splitting the timeline for medium and low.

discovery.

Member

While it's good to document an SLA here, I think it should be defined more clearly rather than just in the checklist that's summarizing the above.
When this is done, we should likely point out that this is a best-effort guideline for the project itself but without any guarantees or legal implications.

Member

We have the following wording in the Collector SIG in case it helps:

We aim to provide a release that fixes security-related issues in at most 30 days since they are publicly announced; with the current release schedule this means security issues will typically not warrant a bugfix release.

Member

I think these are a bit too aggressive. See, for example, the ones from GitLab: https://handbook.gitlab.com/handbook/security/product-security/vulnerability-management/sla/
where low and medium have a 90/180 day SLA. I think we should align to something like that.

Member

The OpenSSF questionnaire says

There MUST be no unpatched vulnerabilities of medium or higher severity that have been publicly known for more than 60 days.

I would go with that maybe at least for the core repos

@reyang reyang Aug 14, 2025

Member Author

> The OpenSSF questionnaire says
>
> There MUST be no unpatched vulnerabilities of medium or higher severity that have been publicly known for more than 60 days.
>
> I would go with that maybe at least for the core repos

I feel this is a low bar and we should set a higher one. Here is my thinking: OpenTelemetry components are widely used by other OSS components. If our bar is to get medium or higher CVEs patched in 60 days, we are not giving our users (including other OSS software that depends on us) sufficient time to meet their own 60-day window. That's the reason why I put 4 weeks.

Member

I think a higher bar makes sense for some, but not necessarily all, OpenTelemetry components (e.g. I don't think a security vulnerability in, say, Weaver, is as problematic as one in opentelemetry-python: they have different use cases with different security implications and their usage is different). I am in favor of a stronger requirement for specific components, but since this is a universal standard I think we should set a lower bar here.

Member Author

Makes sense to me. Do you feel this is a good balance?

Core components: 30 days
Everything else: 60 days

Member

Sounds reasonable to me


Time to revisit this topic.

Member Author

Personally, I would like us to set a higher bar.
That said, the current state is that we have no bar set for OpenTelemetry, so I think we should move ahead with some bar, learn from it, and then decide whether to raise or lower it.

I suggest that we stick to this bar at least for now:

Core components: 30 days
Everything else: 60 days
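
For illustration only, the proposed bar could be checked with something like the Python sketch below. The names `PATCH_WINDOW_DAYS`, `patch_deadline`, and `is_overdue` are hypothetical and not part of any OpenTelemetry tooling. The sketch also assumes the window starts on the date the vulnerability becomes known, which this thread has not settled (discovery versus public disclosure).

```python
from datetime import date, timedelta

# Hypothetical encoding of the bar suggested in this thread:
# core components get 30 days, everything else gets 60 days.
PATCH_WINDOW_DAYS = {"core": 30, "other": 60}


def patch_deadline(known_since: date, component_kind: str) -> date:
    """Return the date by which a fix should be released.

    Assumption: the window starts on `known_since`; whether that means
    discovery or public disclosure is left open in the discussion.
    """
    return known_since + timedelta(days=PATCH_WINDOW_DAYS[component_kind])


def is_overdue(known_since: date, component_kind: str, today: date) -> bool:
    """True if the patch window has elapsed as of `today`."""
    return today > patch_deadline(known_since, component_kind)
```

A repository could run a check like this over its open advisories to flag the ones that have passed the window; how the "core" versus "other" classification is assigned would still need to be defined.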

- [ ] For vulnerabilities that cannot be patched in a timely manner (for
example, the component is depending on an outdated library, and there is no
replacement), the maintainers should evaluate the impact and likelihood of
exploitation and take appropriate action, such as applying workarounds or
communicating with affected users.
7 changes: 6 additions & 1 deletion sig-security-charter.md
@@ -51,7 +51,12 @@ Author and maintain cross-cutting security documentation. Seek out and
coordinate with experts in other SIGs for input on the documentation (i.e. we go
to them, they don't need to come to us). In-scope documentation includes:

* TBD
* Best practices for secure development lifecycle
* Authentication and authorization
* Secret management
* Storage security
* Supply chain security
* Transport security

#### Security Audit
