Member

@bourgeoa bourgeoa commented Nov 4, 2025

Fix Turtle parser/serializer handling of dots in terms

Summary

Fixes #601 - The parser and serializer now correctly handle dots within local names, per the Turtle 1.1 specification.

Problem

The Turtle parser incorrectly treated dots (.) inside local names as path operators or statement terminators, breaking valid Turtle like:

@prefix ex: <http://example.com/> .
ex:subject.example ex:pred ex:obj .
[] :loves [] .

The serializer also failed to produce spec-compliant abbreviated forms, outputting <http://example.com/subject.example> instead of the valid ex:subject.example. It also occasionally created a spurious loc: prefix for base‑relative IRIs like </results.ttl>.

Solution

Parser (src/n3parser.js)

  • Allow dots in names unless followed by whitespace/comment/EOF
  • Added dotTerminatesName(str, i) helper to centralize logic across 3 call sites
  • Shared wsOrHash regex at module level for performance
  • Clarified why node() leaves . for unified checkDot() handling
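
The dot rule in the bullets above can be sketched as follows. This is only an illustration: `dotTerminatesName` and `wsOrHash` are the names used in this PR, but their bodies here are assumptions, and `scanLocalName` is a hypothetical caller limited to ASCII name characters, not the actual code in src/n3parser.js.

```javascript
// Assumed shape of the shared regex: whitespace or the start of a comment.
// Defined once at module level so every call site reuses the same object.
const wsOrHash = /[\s#]/

// True when the '.' at index i ends the name, i.e. it is followed by
// whitespace, a comment ('#'), or the end of the input.
function dotTerminatesName (str, i) {
  if (str[i] !== '.') return false
  const next = str.charAt(i + 1) // '' when i is the last index (EOF)
  return next === '' || wsOrHash.test(next)
}

// Hypothetical call site: scan a local name starting at `start` and
// return the index just past its last character. Interior dots are
// consumed; a terminating dot is left for the caller to handle.
function scanLocalName (str, start) {
  let i = start
  while (i < str.length) {
    const ch = str[i]
    if (ch === '.') {
      if (dotTerminatesName(str, i)) break
    } else if (!/[A-Za-z0-9_-]/.test(ch)) {
      break
    }
    i++
  }
  return i
}

const line = 'ex:subject.example ex:pred ex:obj .'
console.log(line.slice(3, scanLocalName(line, 3))) // subject.example
```

Leaving the terminating dot unconsumed matches the note above that node() leaves `.` for checkDot() to handle.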

Serializer (src/serializer.js)

  • Implemented isValidPNLocal(local) validator per Turtle 1.1 spec
  • Produces abbreviated qnames with dots: ex:subject.example
  • Rejects trailing dots: ex:subject. is not a valid prefixed name, so the IRI is written as <http://example.com/subject.>
  • Allows empty local names: ex: for URIs ending in / or #
  • Added a minimum namespace length check so that an IRI is never split inside its protocol part
  • Avoid abbreviating when namespace equals the document's base directory (prevents accidental loc: with </>)
  • New optional flag o: do not abbreviate to a prefixed name when the local part contains a dot (opt‑out of dotted locals)
    • Usage (Turtle): serialize(doc, kb, doc.value, 'text/turtle', undefined, { flags: 'o' })
    • Notes: user flags are merged with defaults for Turtle/JSON‑LD; 'p' still disables all QName abbreviation
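
A minimal sketch of the abbreviation decision described in the list above. `isValidPNLocal` is the name used in this PR, but the body below is a simplified, ASCII-only approximation of the Turtle 1.1 `PN_LOCAL` production (the full grammar also allows non-ASCII letters, percent escapes, and backslash escapes). `abbreviate` and its parameters are hypothetical and are not part of the serializer's API.

```javascript
// Simplified validator: the empty string is accepted (ex: for IRIs ending
// in / or #), dots are allowed inside the name, and a trailing dot is not.
function isValidPNLocal (local) {
  if (local === '') return true
  // First char: letter, digit, '_' or ':'.
  // Middle chars may also be '.' or '-'; the last char may not be '.'.
  return /^[A-Za-z0-9_:]([A-Za-z0-9_:.-]*[A-Za-z0-9_:-])?$/.test(local)
}

// Hypothetical helper: choose between prefix:local and <IRI>.
function abbreviate (iri, prefix, namespace, flags = '') {
  const full = '<' + iri + '>'
  if (flags.includes('p')) return full            // 'p' disables all abbreviation
  if (!iri.startsWith(namespace)) return full
  const local = iri.slice(namespace.length)
  if (!isValidPNLocal(local)) return full         // e.g. trailing dot
  if (flags.includes('o') && local.includes('.')) return full // opt-out of dotted locals
  return prefix + ':' + local
}

const ns = 'http://example.com/'
console.log(abbreviate(ns + 'subject.example', 'ex', ns))      // ex:subject.example
console.log(abbreviate(ns + 'subject.example', 'ex', ns, 'o')) // <http://example.com/subject.example>
console.log(abbreviate(ns + 'subject.', 'ex', ns))             // <http://example.com/subject.>
console.log(abbreviate(ns, 'ex', ns))                          // ex:
```

The sketch leaves out the minimum namespace length check and the base-directory guard, which depend on serializer state not shown in this description.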

Tests (tests/unit/dot-in-term-test.ts)

  • Parse dots in local names: ex:subject.example
  • Parse blank node terminators: [] :loves [] .
  • Serialize with dots: ex:subject.example (not <URI>)
  • Reject trailing dots in serialization
  • Empty local names for URIs ending in /
  • Honors 'o' flag: dotted locals fall back to <IRI> instead of prefix:local

Results

  • Spec‑compliant default output: abbreviates dotted locals when valid.
  • Opt‑out available via 'o' flag for conservative output.
  • The serialize fixtures produce their expected outputs when the test harness passes 'o'.

Behavior Change

Before: Conservative approach — URIs with dots serialized as <IRI>.
After (default): Spec-compliant — valid qnames with dots abbreviated (e.g., ex:subject.example).
Opt-out: Pass 'o' in flags to keep dotted locals as <IRI>.

This produces more compact, spec‑correct Turtle by default while offering a simple opt‑out for legacy expectations.

Files Changed

  • src/n3parser.js — Parser fix, helper, shared regex
  • src/serializer.js — PN_LOCAL validator; improved splitting logic; base‑dir guard; 'o' flag support
  • src/serialize.ts — Merge user flags with defaults so 'o' is honored in Turtle/JSON‑LD
  • tests/unit/dot-in-term-test.ts — Added tests including 'o' flag case
  • tests/serialize/data.js — Pass 'o' for Turtle fixture generation; preserve RDF/XML behavior
  • README.md — Document serializer flags and 'o' usage
  • changes.txt — Changelog entry
  • lib/* — Transpiled updates kept in sync (for repo consumers)
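
The flag merging listed for src/serialize.ts might look roughly like the sketch below. It assumes that a flags string is a set of independent single-character flags, which is how the README entries read. `mergeFlags` is a hypothetical name, and the default strings are taken from the README text in this PR rather than from the code.

```javascript
// Hypothetical helper: union of the default flags and any user flags,
// with duplicates removed and the defaults' order kept first.
function mergeFlags (defaults, user) {
  return [...new Set(defaults + (user || ''))].join('')
}

console.log(mergeFlags('si', 'o'))       // sio  (Turtle defaults plus opt-out)
console.log(mergeFlags('dr', 'o'))       // dro  (JSON-LD defaults plus opt-out)
console.log(mergeFlags('si', undefined)) // si   (no user flags)
```

Merging instead of replacing means a caller who passes only 'o' does not lose the format's defaults.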

@bourgeoa bourgeoa self-assigned this Nov 4, 2025

## Serializer flags

The Turtle/N3/JSON‑LD serializers accept an optional flags string to tweak output formatting and abbreviation behavior.
Contributor

Suggested change
The Turtle/N3/JSON‑LD serializers accept an optional flags string to tweak output formatting and abbreviation behavior.
The Turtle/N3/JSON‑LD serializers accept an optional `flags` string to tweak output formatting and abbreviation behavior.


- `s` `i` – used by default for Turtle to suppress `=`, `=>` notations
- `d e i n p r s t u x` – used for N-Triples/N-Quads to simplify output
- `dr` – used with JSON‑LD conversion (no default, no relative prefix)
Contributor

Just looking for a confirmation that dr is a two-character flag

Suggested change
- `dr` – used with JSON‑LD conversion (no default, no relative prefix)
- `dr` – used with JSON‑LD conversion (no default, no relative prefix)

- `d e i n p r s t u x` – used for N-Triples/N-Quads to simplify output
- `dr` – used with JSON‑LD conversion (no default, no relative prefix)
- `o` – new: do not abbreviate to a prefixed name when the local part contains a dot. This keeps IRIs like
`http://foo.test/ns/subject.example` in `<...>` form instead of `ns:subject.example`.
Contributor

@TallTed TallTed Nov 4, 2025

Changed to match line 6 of scratch-serialize.js.

Suggested change
`http://foo.test/ns/subject.example` in `<...>` form instead of `ns:subject.example`.
`http://example.org/ns/subject.example` in `<...>` form instead of `ns:subject.example`.

const base = 'http://example.com/';
const doc = $rdf.sym(base + 'doc');
// A URI in a different namespace so it can abbreviate to a prefix
const other = 'http://foo.test/ns/subject.example';
Contributor

The different namespace should also be in the reserved TLDs.

Suggested change
const other = 'http://foo.test/ns/subject.example';
const other = 'http://example.org/ns/subject.example';

documentString = sz.statementsToN3(newSts)
return executeCallback(null, documentString)
case NTriplesContentType:
sz.setFlags('deinprstux') // Suppress nice parts of N3 to make ntriples
Contributor

Should this be deinprstux or d e i n p r s t u x?

// Allows dots inside the local name but not as trailing character
// Also allows empty local names (for URIs ending in / or #)
isValidPNLocal(local) {
// Empty local name is valid (e.g., ex: for http://example.com/)
Contributor

@TallTed TallTed Nov 4, 2025

Note that resolution of this "empty" local name is determined server-side. It often defaults to index.html, but not always. This is configurable in Apache and some (most? all?) other HTTP servers, so it cannot be relied upon without out-of-band communications. Configured local name can be things like index.php.

Development

Successfully merging this pull request may close these issues.

turtle parsing not correctly handling . in suffix

3 participants