-
Notifications
You must be signed in to change notification settings - Fork 21
Add vulnerability management recommendations for maintainers #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -94,3 +94,70 @@ updates: | |||||||||
| And [here's | ||||||||||
| one](https://github.com/open-telemetry/opentelemetry-operator/pull/2990) from | ||||||||||
| Dependabot. | ||||||||||
|
|
||||||||||
| ## Vulnerability Management | ||||||||||
|
|
||||||||||
| There are several aspects of vulnerability management that the Security SIG | ||||||||||
| recommends to the OpenTelemetry project maintainers: | ||||||||||
|
|
||||||||||
| - The maintainers of an OpenTelemetry project should establish a clear | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this document follow OTel spec style? if so, should ---> SHOULD. It is applicable to whole file.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nope. This is a recommendation instead of a spec or enforcement, at least for now. |
||||||||||
| accountability for security issues, including identifying the direct | ||||||||||
| responsible individual for security issues at a certain time, for | ||||||||||
| example, via a duty rotation. This should be | ||||||||||
| documented in the main README.md file of the project. | ||||||||||
|
Comment on lines
+103
to
+107
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this feels a bit heavyweight for most of the repos to establish a DRI rotation where there are 2-3 maintainers as long as the maintainers are meeting the bar we are setting for triage / fixing if we want to keep this, maybe "... or some other process that ensures vulnerabilities are being triaged in a timely manner" |
||||||||||
| - The direct responsible individual should monitor the [repository security | ||||||||||
| advisories](https://docs.github.com/code-security/security-advisories/working-with-repository-security-advisories/about-repository-security-advisories), | ||||||||||
| make sure security advisories are triaged in a timely manner, and there is | ||||||||||
| active engagement and communication on the issue. Refer to the [incident | ||||||||||
| response guideline](../security-response.md#incident-response) for more | ||||||||||
| details. | ||||||||||
| - Regularly scan your code and dependencies for **deprecations** and | ||||||||||
| **vulnerabilities** using tools. This should include but not be limited to: | ||||||||||
| - CI/CD tooling - for example, some GitHub Actions might be deprecated or no | ||||||||||
| longer maintained, certain GitHub Actions might have known vulnerabilities, | ||||||||||
| a compiler might have a known vulnerability, etc. | ||||||||||
| - CI/CD environment - for example, the CI/CD job might be running on a | ||||||||||
| deprecated or vulnerable version of the operating system. | ||||||||||
| - container image dependencies - for example, the base image used in | ||||||||||
| Dockerfiles or image referenced by Helm charts might have known | ||||||||||
| vulnerabilities, or the image might be using a deprecated version of a | ||||||||||
| library. | ||||||||||
| - repository configurations - for example, a hotfix branch might not have the | ||||||||||
| proper branch protection rules, or the repository might not have the proper | ||||||||||
| security settings enabled. | ||||||||||
|
Comment on lines
+125
to
+127
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be mostly covered by org-wide config-as-code tooling by now but we don't cover custom branches beyond main there usually, right?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right. |
||||||||||
| - package dependencies - for example, a package might have a known | ||||||||||
| vulnerability, or a package might be using a deprecated version of a | ||||||||||
| library. | ||||||||||
|
Comment on lines
+128
to
+130
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might want to put that one to the top of the list as it's the most frequent concern that's brought up.
Comment on lines
+128
to
+130
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unless it's covered elsewhere, maybe add a recommendation to use dependabot or renovate? |
||||||||||
| - All security vulnerabilities that are found - whether from the user reported | ||||||||||
| repository security advisories or through automated scanning - should be | ||||||||||
| handled in a timely manner based on the severity level. In case of a real | ||||||||||
| vulnerability that doesn't have a fix available, the maintainers should | ||||||||||
| evaluate the impact and likelihood of exploitation and take appropriate | ||||||||||
| action, such as applying workarounds or communicating with affected users. In | ||||||||||
| the worst case, this can be a potential end-of-life announcement of the | ||||||||||
| affected component or project. | ||||||||||
|
Comment on lines
+137
to
+138
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "or project" made me think catastrophic end of OpenTelemetry
Suggested change
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or maybe change to "repository"? |
||||||||||
|
|
||||||||||
| Here is a check list for the maintainers: | ||||||||||
|
|
||||||||||
| - [ ] Identify the direct responsible individual for security issues and | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| document it in the main README.md file of the project. | ||||||||||
|
reyang marked this conversation as resolved.
Outdated
|
||||||||||
| - [ ] Monitor the GitHub repository security advisories and triage security | ||||||||||
| issues in a timely manner. | ||||||||||
| - [ ] Daily scan CI/CD tooling for deprecations and vulnerabilities. | ||||||||||
| - [ ] Daily scan CI/CD environment for deprecations and vulnerabilities. | ||||||||||
| - [ ] Daily scan container image dependencies for deprecations and | ||||||||||
| vulnerabilities. | ||||||||||
|
Comment on lines
+146
to
+149
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe add some suggested tooling? Things like renovate and dependabot probably remove the need to explicitly do some of these as they will indirectly do it and propose patches (e.g. bumping a container's version/SHA). |
||||||||||
| - [ ] Daily scan repository configurations for deprecations and vulnerabilities. | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this something maintainers can do? |
||||||||||
| - [ ] Daily scan package dependencies for deprecations and vulnerabilities. | ||||||||||
| - [ ] False positives are documented (e.g., by commenting on the security | ||||||||||
| advisory, by providing the dismissal reason to a code scanning alert) and | ||||||||||
| dismissed. | ||||||||||
| - [ ] Critical and high severity vulnerabilities are patched within 2 weeks of | ||||||||||
|
reyang marked this conversation as resolved.
Outdated
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Comment generally related to the "contrib" repositories where maintainers mostly are responsible for the release processs/merging/keeping code quality guard. What is the recommendation for the vulnerabilities in abandoned/high difficutlies to contact with codewoners? It is not artificial question, I am affected exactly by this, example can be shared in the private channel. |
||||||||||
| discovery. | ||||||||||
| - [ ] Medium and low severity vulnerabilities are patched within 4 weeks of | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure about wording
Suggested change
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 4 weeks seems to be pretty optimistic. I think that we should make first scan of all otel-repositories against potential vectors, prioritize it and then, follow this practice. Lets consider Codex with security plugin. It can provide 40-80 findings per pretty small repository. Reviewing, deciding if it is a security issue/bug/etc takes time.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also consider splitting the timeline for medium and low. |
||||||||||
| discovery. | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While it's good to document an SLA here, I think it should be defined more clearly rather than just in the checklist that's summarizing the above.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have the following wording in the Collector SIG in case it helps:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think these are a bit too aggressive. You can see e.g. these ones from Gitlab: https://handbook.gitlab.com/handbook/security/product-security/vulnerability-management/sla/
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The OpenSSF questionnaire says
I would go with that maybe at least for the core repos
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I feel this is a low bar and we should put a higher one. Here is my thinking: OpenTelemetry components are widely used by other OSS components, if our bar is to get the medium or higher CVEs patched in 60 days, we are not giving sufficient time for our users (including other OSS software that depend on us) to meet their 60 days time. That's the reason why I put 4 weeks.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a higher bar makes sense to some but not all OpenTelemetry components necessarily (e.g. I don't think a security vulnerability in, say, Weaver, is as problematic as one in opentelemetry-python: they have different use cases with different security implications and their usage is different). I am in favor of a stronger requirement in specific components, but since this is an universal standard I think we should put a lower bar here
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense to me. Do you feel this is a good balance? Core components: 30 days
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds reasonable to me There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Time to revisit this topic.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally, I would like us to set a higher bar. I suggest that we stick to this bar at least for now: Core components: 30 days |
||||||||||
| - [ ] For vulnerabilities that cannot be patched in a timely manner (for | ||||||||||
| example, the component is depending on an outdated library, and there is no | ||||||||||
| replacement), the maintainers should evaluate the impact and likelihood of | ||||||||||
| exploitation and take appropriate action, such as applying workarounds or | ||||||||||
| communicating with affected users. | ||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add some guidance about when to use private fork branches for working on patches for vulnerabilities?
Private avoids the possibility of leaking details in advance, but the current lack of CI for private forks makes this problematic to verify a patch, particularly when there's many different unrelated patches queued up to go out in the same release due to the possibility of merge conflicts etc.