Skip to content

v_ahb_diff: id_path-based JOIN produces false adds/deletes when segments are inserted #256

@hf-mrdachner

Description

@hf-mrdachner

Problem

v_ahb_diff joins old and new version on id_path. The id_path contains positional indices (e.g. @00005-00001-). When a new segment is inserted in a newer format version, all subsequent positional indices shift — causing the JOIN to fail and producing false added + deleted rows for lines that haven't actually changed.

Example: UTILMD/44001, FV2510 → FV2604

  • FV2510: 213 lines
  • FV2604: 218 lines (5 genuinely new)
  • v_ahb_diff reports: 69 added, 64 deleted, 27 modified — but most of those 69+64 are the same lines with shifted id_path

For example, DTM "Beginn zum" has id_path SG4>DTM>@00005-00001- in FV2510 but a different index in FV2604 because IMD segments were inserted before it. The view sees this as a delete + add.

Root cause

id_path is a positional index within a version, not a stable semantic identifier across versions. The v_ahb_diff view uses it as a cross-version JOIN key, which breaks on insertions.

Possible fix

Use a semantic key for the JOIN instead of the positional id_path. A stable identifier could be built from the structural path + qualifying qualifiers (the "front" qualifiers that identify which segment it is, not the value qualifiers):

  • SG4>DTM+92 = "Beginn zum" (Qualifier 92 identifies the segment)
  • SG4>DTM+93 = "Ende zum" (Qualifier 93 identifies the segment)
  • SG4>STS+7 = Transaktionsgrund

This would be stable across versions as long as the MIG structure doesn't fundamentally change (which it doesn't — segments get added but the order stays the same).

Impact

ahb-tabellen's compare feature shows the same inflated numbers. Downstream consumers (like our MCP server) have to implement content-fingerprint matching as a workaround to filter the noise.

Workaround (current)

In mcp-bdew-mako we post-process the v_ahb_diff output with two-pass fingerprint matching:

  1. Strict match (identity + content identical) → filter as noise
  2. Loose match (identity only, via segment_code + data_element + qualifier) → reclassify as modified

This reduces UTILMD/44001 from 160 "changes" to ~30 real ones.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions