Conversation

@armintaenzertng
Collaborator

fixes #373

Collaborator

@meretp meretp left a comment

Thanks for the addition! I didn't look at each test case and regex in detail, but overall this looks very good to me. I have only two remarks.
I'm open to discussing whether a dictionary approach is suitable here, but since I have another use case for it in the RDF writer, I think such a mapping makes sense.

Comment on lines 107 to 115
CPE22TYPE_REGEX = r'^c[pP][eE]:/[AHOaho]?(:[A-Za-z0-9._\-~%]*){0,6}$'
CPE23TYPE_REGEX = r'^cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&\'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&\'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}$'
MAVEN_CENTRAL_REGEX = r'^[^:]+:[^:]+(:[^:]+)?$'
NPM_REGEX = r'^[^@]+@[^@]+$'
NUGET_REGEX = r'^[^/]+/[^/]+$'
BOWER_REGEX = r'^[^#]+#[^#]+$'
PURL_REGEX = r'^pkg:.+(\/.+)?\/.+(@.+)?(\?.+)?(#.+)?$'
SWH_REGEX = r'^swh:1:(snp|rel|rev|dir|cnt):[0-9a-fA-F]{40}$'
GITOID_REGEX = r'^gitoid:(blob|tree|commit|tag):(sha1:[0-9a-fA-F]{40}|sha256:[0-9a-fA-F]{64})$'
Collaborator

I would import these regexes from external_package_ref_validator.py rather than redefining them here. That avoids duplication and prevents the two copies from drifting apart if one of the regexes needs to change later.

Collaborator Author

done

Comment on lines 71 to 78
if reference_type == "npm":
    return validate_against_regex(locator, NPM_REGEX, "npm", context)
if reference_type == "nuget":
    return validate_against_regex(locator, NUGET_REGEX, "nuget", context)
if reference_type == "bower":
    return validate_against_regex(locator, BOWER_REGEX, "bower", context)
if reference_type == "purl":
    return validate_against_regex(locator, PURL_REGEX, "purl", context)
Collaborator

This is a lot of code duplication. We could instead use a dictionary that maps each ExternalPackageRefCategory to its set of valid types, plus a second mapping from each type to its regex.
I am currently working on the RDF writer, where I need to check whether the specified type is one of the listed ones (cpe22Type, cpe23Type, swid, and so on), so I would need such a mapping as well.
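
For illustration, a minimal sketch of what such a mapping could look like. This is not the code that was merged. The enum members, the names `CATEGORY_TO_TYPES` and `TYPE_TO_REGEX`, the type strings "swh" and "gitoid", and the `validate_locator` helper are all assumptions made for this example. Only the regexes are copied from the snippet quoted earlier in this review.

```python
import re
from enum import Enum, auto
from typing import Dict, List


class ExternalPackageRefCategory(Enum):
    # Member names are assumed for this sketch.
    PACKAGE_MANAGER = auto()
    PERSISTENT_ID = auto()


# Regexes copied from the snippet under review (subset only).
NPM_REGEX = r'^[^@]+@[^@]+$'
NUGET_REGEX = r'^[^/]+/[^/]+$'
BOWER_REGEX = r'^[^#]+#[^#]+$'
PURL_REGEX = r'^pkg:.+(\/.+)?\/.+(@.+)?(\?.+)?(#.+)?$'
SWH_REGEX = r'^swh:1:(snp|rel|rev|dir|cnt):[0-9a-fA-F]{40}$'
GITOID_REGEX = r'^gitoid:(blob|tree|commit|tag):(sha1:[0-9a-fA-F]{40}|sha256:[0-9a-fA-F]{64})$'

# Hypothetical mapping: category -> reference types allowed in that category.
CATEGORY_TO_TYPES: Dict[ExternalPackageRefCategory, List[str]] = {
    ExternalPackageRefCategory.PACKAGE_MANAGER: ["npm", "nuget", "bower", "purl"],
    ExternalPackageRefCategory.PERSISTENT_ID: ["swh", "gitoid"],
}

# Hypothetical mapping: reference type -> regex its locator must match.
TYPE_TO_REGEX: Dict[str, str] = {
    "npm": NPM_REGEX,
    "nuget": NUGET_REGEX,
    "bower": BOWER_REGEX,
    "purl": PURL_REGEX,
    "swh": SWH_REGEX,
    "gitoid": GITOID_REGEX,
}


def validate_locator(category: ExternalPackageRefCategory,
                     reference_type: str, locator: str) -> List[str]:
    """Return a list of error messages; an empty list means the locator is valid."""
    if reference_type not in CATEGORY_TO_TYPES.get(category, []):
        return [f'"{reference_type}" is not a valid type for category {category.name}']
    regex = TYPE_TO_REGEX[reference_type]
    if not re.match(regex, locator):
        return [f'locator "{locator}" does not match the "{reference_type}" pattern']
    return []
```

With this shape, the chain of `if reference_type == ...` branches collapses into two lookups, and a membership check such as `reference_type in CATEGORY_TO_TYPES[category]` is available to other callers like the RDF writer.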

Collaborator Author

Very good point, the code is now much more concise. Please check whether you can work with this in the RDF writer now, too! :)

Collaborator

@meretp meretp left a comment

Looks very good to me and yes, I can use it now, thanks a lot! 😊

@armintaenzertng armintaenzertng merged commit 6b3899f into spdx:refactor-python-tools Jan 25, 2023
@armintaenzertng armintaenzertng deleted the validateExternalPackageRef branch January 25, 2023 13:52
2 participants