Skip to content

Conversation

@DmitriyLewen
Copy link
Contributor

@DmitriyLewen DmitriyLewen commented Aug 25, 2025

Description

This PR fixes license handling in CycloneDX SBOM marshaling to properly support mixed license types and resolve normalization errors. The changes include:

  • Enhanced license normalization: Return the original license string as a Name field when normalization fails, instead of returning an empty LicenseChoice
  • Mixed license type handling: Added NormalizeLicenses() function:
    • to resolve conflicts when both License and Expression types are present in the same license collection
    • to merge multiple SPDX expressions
  • Schema compliance: Ensures CycloneDX schema compliance by using only one license type (Name/ID or Expression) per license collection

Examples

before:

➜  trivy -q image -f cyclonedx registry.gitlab.com/hoppr/choppr/debian-test:bugfix-fix-mypy-dependencies | jq '.components[] | select(."bom-ref"=="pkg:deb/ubuntu/[email protected]?arch=amd64&distro=ubuntu-22.04")| .licenses'
[
...
  {
    "license": {
      "id": "GPL-2.0-or-later"
    }
  },
  {
    "license": {
      "name": "BSD-3-Clause WITH advertising-clause"
    }
  },
  {
    "expression": "Libpng OR Apache-2.0 OR BSD-3-Clause"
  },
...
]

after:

➜  ./trivy -q image -f cyclonedx registry.gitlab.com/hoppr/choppr/debian-test:bugfix-fix-mypy-dependencies | jq '.components[] | select(."bom-ref"=="pkg:deb/ubuntu/[email protected]?arch=amd64&distro=ubuntu-22.04")| .licenses'
[
...
  {
    "license": {
      "id": "GPL-2.0-or-later"
    }
  },
  {
    "license": {
      "name": "BSD-3-Clause WITH advertising-clause"
    }
  },
...
  {
    "license": {
      "name": "Libpng OR Apache-2.0 OR BSD-3-Clause"
    }
  }
]

Related issues

Related PRs

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

- Use Name/ID or Expression type for Licenses
- Use Name field for Expression if both types exist
@DmitriyLewen DmitriyLewen self-assigned this Aug 25, 2025
@DmitriyLewen DmitriyLewen marked this pull request as ready for review August 25, 2025 10:32
@DmitriyLewen DmitriyLewen requested a review from knqyf263 as a code owner August 25, 2025 10:32
@knqyf263
Copy link
Collaborator

knqyf263 commented Sep 1, 2025

It’s unclear when exactly we should use an expression in licenses rather than ids/names, but the current logic feels unnecessarily complicated. Wouldn’t it be sufficient to have a simple implementation where everything is consolidated into an SPDX expression whenever possible?

diff --git a/pkg/sbom/cyclonedx/marshal.go b/pkg/sbom/cyclonedx/marshal.go
index f887692d2..d42085f86 100644
--- a/pkg/sbom/cyclonedx/marshal.go
+++ b/pkg/sbom/cyclonedx/marshal.go
@@ -295,20 +295,42 @@ func (m *Marshaler) Licenses(licenses []string) *cdx.Licenses {
 	if len(licenses) == 0 {
 		return nil
 	}
-	choices := lo.Map(licenses, func(license string, _ int) cdx.LicenseChoice {
+	return m.normalizeLicenses(licenses)
+}
+
+func (m *Marshaler) normalizeLicenses(licenses []string) *cdx.Licenses {
+	expressions := lo.Map(licenses, func(license string, _ int) expression.Expression {
 		return m.normalizeLicense(license)
 	})
 
-	return lo.ToPtr(NormalizeLicenses(choices))
+	// Check if all licenses are valid SPDX expressions (both SimpleExpr and CompoundExpr)
+	if lo.EveryBy(expressions, func(expr expression.Expression) bool {
+		return expr.IsSPDXExpression()
+	}) {
+		// When all licenses are valid SPDX expressions, combine them into a single expression field
+		exprStrs := lo.Map(expressions, func(expr expression.Expression, _ int) string {
+			return expr.String()
+		})
+		return &cdx.Licenses{{Expression: strings.Join(exprStrs, " AND ")}}
+	}
+
+	// If not all licenses are SPDX expressions, use individual LicenseChoice entries with license.id or license.name
+	choices := lo.Map(expressions, func(expr expression.Expression, _ int) cdx.LicenseChoice {
+		if s, ok := expr.(expression.SimpleExpr); ok && s.IsSPDXExpression() {
+			// Use license.id for valid SPDX ID (e.g., "MIT", "Apache-2.0")
+			return cdx.LicenseChoice{License: &cdx.License{ID: s.String()}}
+		}
+		// Use license.name for everything else (invalid SPDX ID, SPDX expression, etc.)
+		return cdx.LicenseChoice{License: &cdx.License{Name: expr.String()}}
+	})
+	return lo.ToPtr(cdx.Licenses(choices))
 }
 
-func (m *Marshaler) normalizeLicense(license string) cdx.LicenseChoice {
+func (m *Marshaler) normalizeLicense(license string) expression.Expression {
 	// Save text license as licenseChoice.license.name
 	if after, ok := strings.CutPrefix(license, licensing.LicenseTextPrefix); ok {
-		return cdx.LicenseChoice{
-			License: &cdx.License{
-				Name: after,
-			},
+		return expression.SimpleExpr{
+			License: after,
 		}
 	}
 
@@ -320,81 +342,9 @@ func (m *Marshaler) normalizeLicense(license string) cdx.LicenseChoice {
 	if err != nil {
 		// Not fail on the invalid license
 		m.logger.Warn("Unable to marshal SPDX licenses", log.String("license", license))
-		return cdx.LicenseChoice{
-			License: &cdx.License{Name: license},
-		}
-	}
-
-	// The license is not a valid SPDX ID or SPDX expression
-	if !normalizedLicenses.IsSPDXExpression() {
-		// Use LicenseChoice.License.Name for invalid SPDX ID / SPDX expression
-		return cdx.LicenseChoice{
-			License: &cdx.License{Name: normalizedLicenses.String()},
-		}
-	}
-
-	// The license is a valid SPDX ID or SPDX expression
-	var licenseChoice cdx.LicenseChoice
-	switch normalizedLicenses.(type) {
-	case expression.SimpleExpr:
-		// Use LicenseChoice.License.ID for valid SPDX ID
-		licenseChoice.License = &cdx.License{ID: normalizedLicenses.String()}
-	case expression.CompoundExpr:
-		// Use LicenseChoice.Expression for valid SPDX expression (with any conjunction)
-		// e.g. "GPL-2.0 WITH Classpath-exception-2.0" or "GPL-2.0 AND MIT"
-		licenseChoice.Expression = normalizedLicenses.String()
-	}
-
-	return licenseChoice
-}
-
-// NormalizeLicenses fixes invalid CycloneDX schema cases:
-// - using both LicenseChoice.License and LicenseChoice.Expression in Licenses (convert Expression to License.Name)
-// - using more than one LicenseChoice with Expression (combine them with AND into one Expression)
-func NormalizeLicenses(choices []cdx.LicenseChoice) cdx.Licenses {
-	var licenses, expressions []cdx.LicenseChoice
-
-	for _, choice := range choices {
-		if choice.Expression != "" {
-			expressions = append(expressions, choice)
-		} else if choice.License != nil {
-			licenses = append(licenses, choice)
-		}
-	}
-
-	if len(licenses) == 0 && len(expressions) == 0 {
-		return nil
-	}
-
-	// If both License and Expression are used, convert Expression to License.Name
-	if len(expressions) > 0 && len(licenses) > 0 {
-		for _, c := range expressions {
-			licenses = append(licenses, cdx.LicenseChoice{
-				License: &cdx.License{Name: c.Expression},
-			})
-		}
-		return licenses
-	}
-
-	// Names/IDs only
-	if len(licenses) > 0 {
-		return licenses
-	}
-
-	// Merge multiple expressions into one expression with AND
-	if len(expressions) > 1 {
-		var b strings.Builder
-		for i, c := range expressions {
-			if i > 0 {
-				b.WriteString(" AND ")
-			}
-			b.WriteString(c.Expression)
-		}
-		return cdx.Licenses{{Expression: b.String()}}
+		return expression.SimpleExpr{License: license}
 	}
-
-	// len(expressions) == 1
-	return expressions
+	return normalizedLicenses
 }
 
 func (*Marshaler) Properties(properties []core.Property) *[]cdx.Property {
diff --git a/pkg/sbom/cyclonedx/marshal_test.go b/pkg/sbom/cyclonedx/marshal_test.go
index fb0277c0c..a78cfd82a 100644
--- a/pkg/sbom/cyclonedx/marshal_test.go
+++ b/pkg/sbom/cyclonedx/marshal_test.go
@@ -511,9 +511,7 @@ func TestMarshaler_MarshalReport(t *testing.T) {
 						Version: "2.30-93.el8",
 						Licenses: &cdx.Licenses{
 							cdx.LicenseChoice{
-								License: &cdx.License{
-									ID: "GPL-3.0-or-later",
-								},
+								Expression: "GPL-3.0-or-later",
 							},
 						},
 						PackageURL: "pkg:rpm/centos/[email protected]?arch=aarch64&distro=centos-8.3.2011",
@@ -1031,9 +1029,7 @@ func TestMarshaler_MarshalReport(t *testing.T) {
 						Version: "1:2.2.53-1.el8",
 						Licenses: &cdx.Licenses{
 							cdx.LicenseChoice{
-								License: &cdx.License{
-									ID: "GPL-2.0-or-later",
-								},
+								Expression: "GPL-2.0-or-later",
 							},
 						},
 						PackageURL: "pkg:rpm/centos/[email protected]?arch=aarch64&distro=centos-8.3.2011&epoch=1",
@@ -1077,9 +1073,7 @@ func TestMarshaler_MarshalReport(t *testing.T) {
 						Version: "2.28-151.el8",
 						Licenses: &cdx.Licenses{
 							cdx.LicenseChoice{
-								License: &cdx.License{
-									ID: "GPL-2.0-or-later",
-								},
+								Expression: "GPL-2.0-or-later",
 							},
 						},
 						PackageURL: "pkg:rpm/centos/[email protected]?arch=aarch64&distro=centos-8.3.2011",
@@ -2020,9 +2014,7 @@ func TestMarshaler_MarshalReport(t *testing.T) {
 						PackageURL: "pkg:npm/[email protected]",
 						Licenses: &cdx.Licenses{
 							cdx.LicenseChoice{
-								License: &cdx.License{
-									ID: "MIT",
-								},
+								Expression: "MIT",
 							},
 						},
 						Properties: &[]cdx.Property{
@@ -2140,9 +2132,7 @@ func TestMarshaler_Licenses(t *testing.T) {
 			},
 			want: &cdx.Licenses{
 				cdx.LicenseChoice{
-					License: &cdx.License{
-						ID: "MIT",
-					},
+					Expression: "MIT",
 				},
 			},
 		},
@@ -2240,14 +2230,7 @@ func TestMarshaler_Licenses(t *testing.T) {
 			},
 			want: &cdx.Licenses{
 				cdx.LicenseChoice{
-					License: &cdx.License{
-						ID: "MIT",
-					},
-				},
-				cdx.LicenseChoice{
-					License: &cdx.License{
-						ID: "AFL-2.0",
-					},
+					Expression: "MIT AND AFL-2.0",
 				},
 			},
 		},
@@ -2278,14 +2261,7 @@ func TestMarshaler_Licenses(t *testing.T) {
 			},
 			want: &cdx.Licenses{
 				cdx.LicenseChoice{
-					License: &cdx.License{
-						ID: "MIT",
-					},
-				},
-				cdx.LicenseChoice{
-					License: &cdx.License{
-						Name: "AFL-2.0 WITH Linux-syscall-note",
-					},
+					Expression: "MIT AND AFL-2.0 WITH Linux-syscall-note",
 				},
 			},
 		},

If using license.id is preferable in the case of a single SPDX ID, then we could also handle it as shown below.

Diff

diff --git a/pkg/sbom/cyclonedx/marshal.go b/pkg/sbom/cyclonedx/marshal.go
index f887692d2..8bbb6a0b6 100644
--- a/pkg/sbom/cyclonedx/marshal.go
+++ b/pkg/sbom/cyclonedx/marshal.go
@@ -295,20 +295,50 @@ func (m *Marshaler) Licenses(licenses []string) *cdx.Licenses {
 	if len(licenses) == 0 {
 		return nil
 	}
-	choices := lo.Map(licenses, func(license string, _ int) cdx.LicenseChoice {
+	return m.normalizeLicenses(licenses)
+}
+
+func (m *Marshaler) normalizeLicenses(licenses []string) *cdx.Licenses {
+	expressions := lo.Map(licenses, func(license string, _ int) expression.Expression {
 		return m.normalizeLicense(license)
 	})
 
-	return lo.ToPtr(NormalizeLicenses(choices))
+	// Check if all licenses are valid SPDX expressions
+	allValidSPDX := lo.EveryBy(expressions, func(expr expression.Expression) bool {
+		return expr.IsSPDXExpression()
+	})
+
+	// Check if at least one is a CompoundExpr
+	hasCompoundExpr := lo.ContainsBy(expressions, func(expr expression.Expression) bool {
+		_, isCompound := expr.(expression.CompoundExpr)
+		return isCompound
+	})
+
+	// If all are valid SPDX AND at least one contains CompoundExpr, combine into single Expression
+	if allValidSPDX && hasCompoundExpr {
+		exprStrs := lo.Map(expressions, func(expr expression.Expression, _ int) string {
+			return expr.String()
+		})
+		return &cdx.Licenses{{Expression: strings.Join(exprStrs, " AND ")}}
+	}
+
+	// Otherwise use individual LicenseChoice entries with license.id or license.name
+	choices := lo.Map(expressions, func(expr expression.Expression, _ int) cdx.LicenseChoice {
+		if s, ok := expr.(expression.SimpleExpr); ok && s.IsSPDXExpression() {
+			// Use license.id for valid SPDX ID (e.g., "MIT", "Apache-2.0")
+			return cdx.LicenseChoice{License: &cdx.License{ID: s.String()}}
+		}
+		// Use license.name for everything else (invalid SPDX ID, SPDX expression, etc.)
+		return cdx.LicenseChoice{License: &cdx.License{Name: expr.String()}}
+	})
+	return lo.ToPtr(cdx.Licenses(choices))
 }
 
-func (m *Marshaler) normalizeLicense(license string) cdx.LicenseChoice {
+func (m *Marshaler) normalizeLicense(license string) expression.Expression {
 	// Save text license as licenseChoice.license.name
 	if after, ok := strings.CutPrefix(license, licensing.LicenseTextPrefix); ok {
-		return cdx.LicenseChoice{
-			License: &cdx.License{
-				Name: after,
-			},
+		return expression.SimpleExpr{
+			License: after,
 		}
 	}
 
@@ -320,81 +350,9 @@ func (m *Marshaler) normalizeLicense(license string) cdx.LicenseChoice {
 	if err != nil {
 		// Not fail on the invalid license
 		m.logger.Warn("Unable to marshal SPDX licenses", log.String("license", license))
-		return cdx.LicenseChoice{
-			License: &cdx.License{Name: license},
-		}
-	}
-
-	// The license is not a valid SPDX ID or SPDX expression
-	if !normalizedLicenses.IsSPDXExpression() {
-		// Use LicenseChoice.License.Name for invalid SPDX ID / SPDX expression
-		return cdx.LicenseChoice{
-			License: &cdx.License{Name: normalizedLicenses.String()},
-		}
-	}
-
-	// The license is a valid SPDX ID or SPDX expression
-	var licenseChoice cdx.LicenseChoice
-	switch normalizedLicenses.(type) {
-	case expression.SimpleExpr:
-		// Use LicenseChoice.License.ID for valid SPDX ID
-		licenseChoice.License = &cdx.License{ID: normalizedLicenses.String()}
-	case expression.CompoundExpr:
-		// Use LicenseChoice.Expression for valid SPDX expression (with any conjunction)
-		// e.g. "GPL-2.0 WITH Classpath-exception-2.0" or "GPL-2.0 AND MIT"
-		licenseChoice.Expression = normalizedLicenses.String()
+		return expression.SimpleExpr{License: license}
 	}
-
-	return licenseChoice
-}
-
-// NormalizeLicenses fixes invalid CycloneDX schema cases:
-// - using both LicenseChoice.License and LicenseChoice.Expression in Licenses (convert Expression to License.Name)
-// - using more than one LicenseChoice with Expression (combine them with AND into one Expression)
-func NormalizeLicenses(choices []cdx.LicenseChoice) cdx.Licenses {
-	var licenses, expressions []cdx.LicenseChoice
-
-	for _, choice := range choices {
-		if choice.Expression != "" {
-			expressions = append(expressions, choice)
-		} else if choice.License != nil {
-			licenses = append(licenses, choice)
-		}
-	}
-
-	if len(licenses) == 0 && len(expressions) == 0 {
-		return nil
-	}
-
-	// If both License and Expression are used, convert Expression to License.Name
-	if len(expressions) > 0 && len(licenses) > 0 {
-		for _, c := range expressions {
-			licenses = append(licenses, cdx.LicenseChoice{
-				License: &cdx.License{Name: c.Expression},
-			})
-		}
-		return licenses
-	}
-
-	// Names/IDs only
-	if len(licenses) > 0 {
-		return licenses
-	}
-
-	// Merge multiple expressions into one expression with AND
-	if len(expressions) > 1 {
-		var b strings.Builder
-		for i, c := range expressions {
-			if i > 0 {
-				b.WriteString(" AND ")
-			}
-			b.WriteString(c.Expression)
-		}
-		return cdx.Licenses{{Expression: b.String()}}
-	}
-
-	// len(expressions) == 1
-	return expressions
+	return normalizedLicenses
 }
 
 func (*Marshaler) Properties(properties []core.Property) *[]cdx.Property {

@DmitriyLewen
Copy link
Contributor Author

If using license.id is preferable in the case of a single SPDX ID, then we could also handle it as shown below.

I think this is more correct behavior.
So I chose this way.

Thanks for simplifying and organizing my logic.
Updated in 76bb42d

@knqyf263
Copy link
Collaborator

knqyf263 commented Sep 1, 2025

I think this is more correct behavior.

SBOM is always too flexible :) We need to make a decision.

@DmitriyLewen DmitriyLewen added this pull request to the merge queue Sep 1, 2025
Merged via the queue into aquasecurity:main with commit 46ab76a Sep 1, 2025
13 checks passed
@DmitriyLewen DmitriyLewen deleted the fix/cyclonedx-license-fields branch September 1, 2025 12:31
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Sep 3, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [mirror.gcr.io/aquasec/trivy](https://www.aquasec.com/products/trivy/) ([source](https://github.com/aquasecurity/trivy)) | minor | `0.65.0` -> `0.66.0` |

---

### Release Notes

<details>
<summary>aquasecurity/trivy (mirror.gcr.io/aquasec/trivy)</summary>

### [`v0.66.0`](https://github.com/aquasecurity/trivy/blob/HEAD/CHANGELOG.md#0660-2025-09-02)

[Compare Source](aquasecurity/trivy@v0.65.0...v0.66.0)

##### Features

- add timeout handling for cache database operations ([#&#8203;9307](aquasecurity/trivy#9307)) ([235c24e](aquasecurity/trivy@235c24e))
- **misconf:** added audit config attribute ([#&#8203;9249](aquasecurity/trivy#9249)) ([4d4a244](aquasecurity/trivy@4d4a244))
- **secret:** implement streaming secret scanner with byte offset tracking ([#&#8203;9264](aquasecurity/trivy#9264)) ([5a5e097](aquasecurity/trivy@5a5e097))
- **terraform:** use .terraform cache for remote modules in plan scanning ([#&#8203;9277](aquasecurity/trivy#9277)) ([298a994](aquasecurity/trivy@298a994))

##### Bug Fixes

- **conda:** memory leak by adding closure method for `package.json` file ([#&#8203;9349](aquasecurity/trivy#9349)) ([03d039f](aquasecurity/trivy@03d039f))
- create temp file under composite fs dir ([#&#8203;9387](aquasecurity/trivy#9387)) ([ce22f54](aquasecurity/trivy@ce22f54))
- **cyclonedx:** handle multiple license types ([#&#8203;9378](aquasecurity/trivy#9378)) ([46ab76a](aquasecurity/trivy@46ab76a))
- **fs:** avoid shadowing errors in file.glob ([#&#8203;9286](aquasecurity/trivy#9286)) ([b51c789](aquasecurity/trivy@b51c789))
- **image:** use standardized HTTP client for ECR authentication ([#&#8203;9322](aquasecurity/trivy#9322)) ([84fbf86](aquasecurity/trivy@84fbf86))
- **misconf:** ensure ignore rules respect subdirectory chart paths ([#&#8203;9324](aquasecurity/trivy#9324)) ([d3cd101](aquasecurity/trivy@d3cd101))
- **misconf:** ensure module source is known ([#&#8203;9404](aquasecurity/trivy#9404)) ([81d9425](aquasecurity/trivy@81d9425))
- **misconf:** preserve original paths of remote submodules from .terraform ([#&#8203;9294](aquasecurity/trivy#9294)) ([1319d8d](aquasecurity/trivy@1319d8d))
- **misconf:** use correct field log\_bucket instead of target\_bucket in gcp bucket ([#&#8203;9296](aquasecurity/trivy#9296)) ([04ad0c4](aquasecurity/trivy@04ad0c4))
- persistent flag option typo ([#&#8203;9374](aquasecurity/trivy#9374)) ([6e99dd3](aquasecurity/trivy@6e99dd3))
- **plugin:** don't remove plugins when updating index.yaml file ([#&#8203;9358](aquasecurity/trivy#9358)) ([5f067ac](aquasecurity/trivy@5f067ac))
- **python:** impove package name normalization  ([#&#8203;9290](aquasecurity/trivy#9290)) ([1473e88](aquasecurity/trivy@1473e88))
- **repo:** preserve RepoMetadata on FS cache hit ([#&#8203;9389](aquasecurity/trivy#9389)) ([4f2a44e](aquasecurity/trivy@4f2a44e))
- **repo:** sanitize git repo URL before inserting into report metadata ([#&#8203;9391](aquasecurity/trivy#9391)) ([1ac9b1f](aquasecurity/trivy@1ac9b1f))
- **sbom:** add support for `file` component type of `CycloneDX` ([#&#8203;9372](aquasecurity/trivy#9372)) ([aa7cf43](aquasecurity/trivy@aa7cf43))
- suppress debug log for context cancellation errors ([#&#8203;9298](aquasecurity/trivy#9298)) ([2458d5e](aquasecurity/trivy@2458d5e))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4zNS4xIiwidXBkYXRlZEluVmVyIjoiNDEuMzUuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1367
Co-authored-by: Renovate Bot <[email protected]>
Co-committed-by: Renovate Bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants