Add vulnerability management recommendations for maintainers #156
reyang wants to merge 4 commits into
| - repository configurations - for example, a hotfix branch might not have the | ||
| proper branch protection rules, or the repository might not have the proper | ||
| security settings enabled. |
This should be mostly covered by org-wide config-as-code tooling by now but we don't cover custom branches beyond main there usually, right?
Right.
This serves more as a checklist for the maintainers. While cleaning up the branch policies, we noticed several projects not protecting things correctly (a couple of them were surprising).
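To illustrate the kind of check discussed here, the following is a minimal Python sketch that flags release or hotfix branches lacking protection. It assumes the input has the shape of GitHub's REST "list branches" response (objects with `name` and `protected` fields). The branch patterns and the function name `unprotected_branches` are hypothetical and not part of the PR.

```python
from fnmatch import fnmatch

# Hypothetical naming patterns; adjust to the repository's own conventions.
SENSITIVE_PATTERNS = ("main", "release/*", "hotfix/*")


def unprotected_branches(branches, patterns=SENSITIVE_PATTERNS):
    """Return names of branches that match a pattern but are not protected.

    `branches` is a list of dicts with `name` and `protected` keys, the
    shape assumed here for GitHub's "list branches" API response.
    """
    return [
        b["name"]
        for b in branches
        if any(fnmatch(b["name"], p) for p in patterns)
        and not b.get("protected", False)
    ]


# Sample data for illustration only.
sample = [
    {"name": "main", "protected": True},
    {"name": "hotfix/1.2.1", "protected": False},
    {"name": "feature/foo", "protected": False},
]
print(unprotected_branches(sample))  # ['hotfix/1.2.1']
```

A check like this only reports whether a branch is protected at all. Which rules are enabled on it would still need a separate review.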
| - package dependencies - for example, a package might have a known | ||
| vulnerability, or a package might be using a deprecated version of a | ||
| library. |
We might want to move that one to the top of the list, as it's the most frequently raised concern.
| - [ ] Critical and high severity vulnerabilities are patched within 2 weeks of | ||
| discovery. | ||
| - [ ] Medium and low severity vulnerabilities are patched within 4 weeks of | ||
| discovery. |
While it's good to document an SLA here, I think it should be defined more clearly rather than just in the checklist that's summarizing the above.
When this is done, we should likely point out that this is a best-effort guideline for the project itself but without any guarantees or legal implications.
We have the following wording in the Collector SIG in case it helps:
We aim to provide a release that fixes security-related issues in at most 30 days since they are publicly announced; with the current release schedule this means security issues will typically not warrant a bugfix release.
Co-authored-by: Armin Ruech <7052238+arminru@users.noreply.github.com>
| - [ ] Critical and high severity vulnerabilities are patched within 2 weeks of | ||
| discovery. | ||
| - [ ] Medium and low severity vulnerabilities are patched within 4 weeks of | ||
| discovery. |
I think these are a bit too aggressive. See, for example, GitLab's SLAs: https://handbook.gitlab.com/handbook/security/product-security/vulnerability-management/sla/
where medium and low have 90- and 180-day SLAs, respectively. I think we should align with something like that.
The OpenSSF questionnaire says
There MUST be no unpatched vulnerabilities of medium or higher severity that have been publicly known for more than 60 days.
I would go with that maybe at least for the core repos
> The OpenSSF questionnaire says
> There MUST be no unpatched vulnerabilities of medium or higher severity that have been publicly known for more than 60 days.
> I would go with that maybe at least for the core repos
I feel this is a low bar and we should set a higher one. Here is my thinking: OpenTelemetry components are widely used by other OSS components. If our bar is to get medium or higher CVEs patched in 60 days, we are not giving our users (including other OSS software that depends on us) sufficient time to meet their own 60-day deadline. That's why I put 4 weeks.
I think a higher bar makes sense for some OpenTelemetry components but not necessarily all of them (e.g. I don't think a security vulnerability in, say, Weaver is as problematic as one in opentelemetry-python: they have different use cases, security implications, and usage patterns). I am in favor of a stronger requirement for specific components, but since this is a universal standard, I think we should set a lower bar here.
Makes sense to me. Do you feel this is a good balance?
Core components: 30 days
Everything else: 60 days
Personally, I would like us to set a higher bar.
That said, we currently have no bar set for OpenTelemetry at all, so I think we should move ahead with some bar, learn from it, and then decide whether to raise or lower it.
I suggest that we stick to this bar at least for now:
Core components: 30 days
Everything else: 60 days
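The two tiers proposed above can be sketched as a small deadline calculation. This is only an illustration in Python: the tier keys `core` and `other`, the function names, and the choice to start the clock on the date the issue became publicly known are assumptions, since the thread does not settle when the clock starts.

```python
from datetime import date, timedelta

# Tiers proposed in the thread; a best-effort guideline, not a guarantee.
# The keys "core" and "other" are hypothetical labels.
PATCH_WINDOW_DAYS = {"core": 30, "other": 60}


def patch_deadline(publicly_known_on, tier):
    """Return the best-effort date by which a fix should be released."""
    return publicly_known_on + timedelta(days=PATCH_WINDOW_DAYS[tier])


def is_overdue(publicly_known_on, tier, today):
    """True if an unpatched vulnerability is past its window as of `today`."""
    return today > patch_deadline(publicly_known_on, tier)


print(patch_deadline(date(2025, 1, 1), "core"))   # 2025-01-31
print(patch_deadline(date(2025, 1, 1), "other"))  # 2025-03-02
```

Keeping the windows in one table makes it straightforward to raise or lower the bar later, as suggested in the comment above.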
| dismissed. | ||
| - [ ] Critical and high severity vulnerabilities are patched within 2 weeks of | ||
| discovery. | ||
| - [ ] Medium and low severity vulnerabilities are patched within 4 weeks of |
not sure about wording
| - [ ] Medium and low severity vulnerabilities are patched within 4 weeks of | |
| - [ ] Medium and low severity vulnerabilities (regardless of rescoring) are patched within 4 weeks of |
|
IMHO this looks good. |
| the worst case, this can be a potential end-of-life announcement of the | ||
| affected component or project. |
"or project" made me think catastrophic end of OpenTelemetry
| the worst case, this can be a potential end-of-life announcement of the | |
| affected component or project. | |
| the worst case, this can be a potential end-of-life announcement of the | |
| affected component. |
Or maybe change to "repository"?
| - [ ] Daily scan CI/CD environment for deprecations and vulnerabilities. | ||
| - [ ] Daily scan container image dependencies for deprecations and | ||
| vulnerabilities. | ||
| - [ ] Daily scan repository configurations for deprecations and vulnerabilities. |
is this something maintainers can do?
| - The maintainers of an OpenTelemetry project should establish a clear | ||
| accountability for security issues, including identifying the direct | ||
| responsible individual for security issues at a certain time, for | ||
| example, via a duty rotation. This should be | ||
| documented in the main README.md file of the project. |
this feels a bit heavyweight for most of the repos to establish a DRI rotation where there are 2-3 maintainers
as long as the maintainers are meeting the bar we are setting for triage / fixing
if we want to keep this, maybe "... or some other process that ensures vulnerabilities are being triaged in a timely manner"
jmacd left a comment
As we know, there is a rise in security advisories. It is time to assert these recommendations.
| There are several aspects of vulnerability management that the Security SIG | ||
| recommends to the OpenTelemetry project maintainers: | ||
| | ||
| - The maintainers of an OpenTelemetry project should establish a clear |
Does this document follow the OTel spec style?
If so, "should" should become "SHOULD". This applies to the whole file.
Nope. This is a recommendation rather than a spec or an enforced requirement, at least for now.
| dismissed. | ||
| - [ ] Critical and high severity vulnerabilities are patched within 2 weeks of | ||
| discovery. | ||
| - [ ] Medium and low severity vulnerabilities are patched within 4 weeks of |
4 weeks seems pretty optimistic. I think we should first scan all OTel repositories for potential attack vectors, prioritize the findings, and only then follow this practice.
Consider Codex with a security plugin: it can produce 40-80 findings for even a fairly small repository. Reviewing each one and deciding whether it is a security issue, a bug, or something else takes time.
I would also consider splitting the timeline for medium and low.
| - [ ] False positives are documented (e.g., by commenting on the security | ||
| advisory, by providing the dismissal reason to a code scanning alert) and | ||
| dismissed. | ||
| - [ ] Critical and high severity vulnerabilities are patched within 2 weeks of |
This comment relates generally to the "contrib" repositories, where maintainers are mostly responsible for the release process, merging, and guarding code quality.
What is the recommendation for vulnerabilities in components that are abandoned or whose code owners are very hard to reach?
Should we follow this SLA? Should maintainers be responsible for fixing them, or should we just announce a GHSA about 2/4 weeks after the finding?
This is not a hypothetical question; I am affected by exactly this, and an example can be shared in a private channel.
| - package dependencies - for example, a package might have a known | ||
| vulnerability, or a package might be using a deprecated version of a | ||
| library. |
Unless it's covered elsewhere, maybe add a recommendation to use dependabot or renovate?
| | ||
| Here is a check list for the maintainers: | ||
| | ||
| - [ ] Identify the direct responsible individual for security issues and |
| - [ ] Identify the direct responsible individual for security issues and | |
| - [ ] Identify the directly responsible individual(s) for security issues and |
| - [ ] Daily scan CI/CD tooling for deprecations and vulnerabilities. | ||
| - [ ] Daily scan CI/CD environment for deprecations and vulnerabilities. | ||
| - [ ] Daily scan container image dependencies for deprecations and | ||
| vulnerabilities. |
Maybe add some suggested tooling?
Things like renovate and dependabot probably remove the need to explicitly do some of these as they will indirectly do it and propose patches (e.g. bumping a container's version/SHA).
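As a rough illustration of the tooling suggestion, the Python sketch below renders a minimal Dependabot configuration (the content of `.github/dependabot.yml`) for a list of ecosystems. The helper name `render_dependabot_config` and the chosen ecosystems are hypothetical. The keys it emits (`version`, `updates`, `package-ecosystem`, `directory`, `schedule.interval`) follow the Dependabot version 2 configuration format as understood here.

```python
def render_dependabot_config(ecosystems, interval="daily"):
    """Render a minimal Dependabot v2 config as YAML text.

    `ecosystems` is a list of (package_ecosystem, directory) pairs.
    """
    lines = ["version: 2", "updates:"]
    for ecosystem, directory in ecosystems:
        lines += [
            f'  - package-ecosystem: "{ecosystem}"',
            f'    directory: "{directory}"',
            "    schedule:",
            f'      interval: "{interval}"',
        ]
    return "\n".join(lines) + "\n"


# Example: daily update checks for GitHub Actions workflows and Dockerfiles.
config_text = render_dependabot_config([("github-actions", "/"), ("docker", "/")])
print(config_text)
```

With a daily interval, the tool proposes version bumps on its own, which may cover part of the "daily scan" checklist items without a separate manual step.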
Would it make sense to add some guidance about when to use private fork branches for working on patches for vulnerabilities?
Working in private avoids the possibility of leaking details in advance, but the current lack of CI for private forks makes it hard to verify a patch, particularly when many unrelated patches are queued up for the same release and merge conflicts become likely.
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
Co-authored-by: Martin Costello <martin@martincostello.com>
No description provided.