Currently the PURL is defined as either an RFC 3986 URL or a WHATWG URL:
A purl is a valid URL and URI that conforms to the URL definitions or specifications at:
There is, however, a big difference between the two:
- RFC 3986 defines a URI as an ASCII string and its components are given by syntactic rules.
- WHATWG defines a URL as a struct of Unicode strings and leaves the serializing/parsing details to other sections:
A URL is a struct that represents a universal identifier. To disambiguate from a valid URL string it can also be referred to as a URL record.
Should we define a PURL in the same way as WHATWG? This would remove a lot of serialization/deserialization details from the definitions:
- scheme:
  - The scheme is a constant with the value "pkg".
- type:
  - The package type MUST be composed only of lowercase ASCII letters and numbers, '.', '+' and '-' (period, plus, and dash).
  - The type MUST start with an ASCII letter.
- namespace:
  - The namespace is a possibly empty list of segments.
- name:
  - The name is a Unicode string.
- version:
  - The version is a Unicode string.
- qualifiers:
  - The qualifiers are a map with Unicode strings as keys and values.
  - The key must be composed only of ASCII lowercase letters and numbers, '.', '-' and '_' (period, dash and underscore).
  - The value is any Unicode string.
- subpath:
  - The subpath is a possibly empty list of segments.
- segment:
  - A segment is a Unicode string.
  - A segment MUST NOT be empty, '.' or '..'.
  - A segment MUST NOT contain a '/' (slash) character.
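To make the proposal concrete, here is a minimal Python sketch of a PURL as a record with the constraints above checked on construction. The names `PurlRecord`, `check_segment`, `TYPE_RE` and `KEY_RE` are hypothetical, chosen only for this illustration (not taken from the spec or any existing library). Two gaps in the proposal are filled by assumption and flagged in comments: `None` stands for "no version", and a qualifier key is required to be non-empty. Note that nothing here mentions percent-encoding or separators; that is the point of defining the model separately from the string form.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# The scheme is a constant ("pkg"), so it is not stored per record.
SCHEME = "pkg"

# type: lowercase ASCII letters, digits, '.', '+', '-'; must start with a letter.
TYPE_RE = re.compile(r"[a-z][a-z0-9.+-]*")
# qualifier key: lowercase ASCII letters, digits, '.', '-', '_'.
# Assumption: a key must be non-empty (the proposal does not say).
KEY_RE = re.compile(r"[a-z0-9._-]+")


def check_segment(segment: str) -> None:
    """Raise ValueError for a segment the proposal forbids."""
    if segment in ("", ".", ".."):
        raise ValueError(f"segment must not be empty, '.' or '..': {segment!r}")
    if "/" in segment:
        raise ValueError(f"segment must not contain '/': {segment!r}")


@dataclass
class PurlRecord:
    type: str
    name: str                         # any Unicode string
    namespace: Tuple[str, ...] = ()   # possibly empty list of segments
    # Assumption: None means "no version"; the proposal only says "a Unicode string".
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)  # values: any Unicode string
    subpath: Tuple[str, ...] = ()     # possibly empty list of segments

    def __post_init__(self) -> None:
        # Validation runs once, at construction; later mutation is not re-checked.
        if not TYPE_RE.fullmatch(self.type):
            raise ValueError(f"invalid type: {self.type!r}")
        for segment in (*self.namespace, *self.subpath):
            check_segment(segment)
        for key in self.qualifiers:
            if not KEY_RE.fullmatch(key):
                raise ValueError(f"invalid qualifier key: {key!r}")
```

For example, `PurlRecord(type="npm", namespace=("@angular",), name="core", version="16.0.0")` is accepted as-is: the '@' is just a character in a Unicode segment, and escaping it is left to whatever serializes the record.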
The rest of the details (percent-encoding, transforming to lowercase, dropping empty segments, etc.) can be kept in the parsing/serializing sections.
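As a rough illustration of how those details could live entirely in a serializing step, here is a simplified Python sketch that turns already-validated components into a `pkg:type/namespace/name@version?qualifiers#subpath` string. The function names `serialize_purl` and `_enc` are hypothetical. Several rules are assumptions of this sketch rather than the spec's exact text: every character outside RFC 3986's unreserved set is percent-encoded (a deliberately conservative choice), qualifiers with empty values are dropped, and the remaining qualifiers are sorted by key.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import quote


def _enc(text: str) -> str:
    # Conservative choice for this sketch: percent-encode every character
    # outside RFC 3986's unreserved set (letters, digits, '-', '.', '_', '~').
    return quote(text, safe="")


def serialize_purl(
    type_: str,
    name: str,
    namespace: Sequence[str] = (),
    version: Optional[str] = None,
    qualifiers: Optional[Dict[str, str]] = None,
    subpath: Sequence[str] = (),
) -> str:
    """Serialize already-validated PURL components (simplified sketch)."""
    # The type is restricted to [a-z0-9.+-], so it is emitted unencoded.
    # Each segment is encoded on its own and only then joined with '/'.
    out = "pkg:" + "/".join([type_, *(_enc(s) for s in namespace), _enc(name)])
    if version:
        out += "@" + _enc(version)
    # Assumption: empty-valued qualifiers are dropped; the rest are sorted by key.
    pairs = [f"{k}={_enc(v)}" for k, v in sorted((qualifiers or {}).items()) if v]
    if pairs:
        out += "?" + "&".join(pairs)
    if subpath:
        out += "#" + "/".join(_enc(s) for s in subpath)
    return out
```

With these assumptions, `serialize_purl("npm", "core", namespace=["@angular"], version="16.0.0")` returns `"pkg:npm/%40angular/core@16.0.0"`. Only this layer needs to know about '@', '?', '#' and percent-encoding; the definitions themselves stay in terms of plain Unicode strings and segment lists.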