Skip to content

ChangesetReader performance improvements #9415

Description

@rschili

Had a brief chat with the Jobs and Transformer people and checked our new "happy path" for getting changes instance IDs, I found a few issues that we should tackle because they cost performance. Not all of them confirmed, the biggest concern is about crossing the N-API boundary too often.

Details:

Consumers like the iModel transformer and the imodels-jobs changed-elements service only need the
instance keys of changed rows - { ECInstanceId, ECClassId, op } - not the decoded property values.
The new ChangesetReader (native PreparedChangesetReader, TS wrapper in
core/backend/src/ChangesetReader.ts) already supports this cheaply via PropertyFilter.InstanceKey,
which short-circuits ChangesetValueFactory::Create before BuildPropertyFields:

fields.emplace_back(std::move(instanceIdField));
fields.emplace_back(std::move(classIdField));
if (propertyFilter == PropertyFilter::InstanceKey)
    return SUCCESS; // skip user-property materialization

That removes the expensive value-decoding work (nav props, DateTime conversion, structs, arrays, GUID
/geometry). The remaining cost is not decoding - it is per-row overhead, dominated by N-API boundary
crossings and repeated PRAGMA lookups. For a workload of hundreds of changesets x millions of rows this
fixed cost dominates.

Problem 1 (primary): per-row N-API chatter

The TS step() wrapper crosses the native boundary 3-4 times per row:

public step(): boolean {
  ...
  if (this._nativeReader.step()) {                                  // crossing #1
    const meta = this._nativeReader.getChangeMetadata();           // crossing #2 (builds a JS object)
    ...
    if (op === "Inserted" || op === "Updated")
      const rowValue = this._nativeReader.getValue(New, opts);     // crossing #3 (builds {data,key,changeFetchedPropNames})
    if (op === "Deleted" || op === "Updated")
      const rowValue = this._nativeReader.getValue(Old, opts);     // crossing #4 (Update only)
  }
}
  • Insert -> 3 crossings
  • Delete -> 3 crossings
  • Update -> 4 crossings

Each of getChangeMetadata/getValue not only crosses but allocates and marshals a transient napi
object
(getValue returns { data, key, changeFetchedPropNames }, where data is itself an object and
changeFetchedPropNames an array). For a 1M-row changeset that is ~3-4M boundary crossings plus ~2-3M
throwaway JS objects. The N-API boundary is the known-expensive part here, and we pay it several times per
row.

Proposed fix: batch/chunk results like ECSqlReader / ConcurrentQuery's QueryReader

ECSqlReader never crosses per row. It drains a local page buffer (_localRows) and only refills it when
empty (fetchRows() -> readRows()), so the boundary cost is amortized across a whole page:

// ECSqlReader.ts (paraphrased)
private _localRows: any[] = [];
public async step(): Promise<boolean> {
  if (this._localOffset < this._localRows.length - 1) {
    ++this._localOffset;                 // pure local advance, NO crossing
  } else {
    await this.fetchRows();              // page exhausted -> ONE transfer for the next chunk
    this._localOffset = 0;
  }
}

Note: ConcurrentQuery's page size is quota-sized, not a fixed row count - the backend returns a
Partial status with as many rows as fit the time/memory quota, and the reader keeps requesting until
Done. The transfer is also request/response (RPC/IPC on the frontend, native boundary on the backend)
rather than strictly N-API. The relevant property is what we want to copy: one transfer per chunk, not
per row.
For the changeset batch API below we can pick a fixed maxRows instead of a quota.

Mirror that for the lightweight changeset path. Add a native batch drain that advances the reader and
returns up to maxRows instance keys in a single crossing, with the TS wrapper iterating the local buffer:

// native addon
nextInstanceKeys(maxRows: number): ChangesetInstanceKeyBatch | undefined;

interface ChangesetInstanceKeyBatch {
  // option A: array of small objects
  rows: { id: Id64String; classId: Id64String; op: SqliteChangeOp }[];
  // option B (cheaper, preferred for millions of rows): columnar typed buffers
  //   ids: BigUint64Array;       // ECInstanceId
  //   classIds: BigUint64Array;  // ECClassId
  //   ops: Uint8Array;           // opcode per row
  done: boolean;                  // true when the underlying walk is exhausted
}
  • Columnar (option B) is the cheapest: it avoids per-row JS object allocation entirely and packs the
    data into a couple of typed arrays, which marshal across N-API as buffers. This is the shape that makes
    "millions of rows" genuinely cheap.
  • Under PropertyFilter.InstanceKey the native side already has ECInstanceId + ECClassId materialized
    as fixed slots after Create, so assembling the batch is just reading those two slots + the opcode per
    row - no extra decode.
  • TS wrapper keeps a _localBuffer, drains it, and only calls nextInstanceKeys() when empty. Result:
    one crossing per batch (e.g. a fixed maxRows of a few thousand) instead of 3-4 per row.

This is purely additive - the existing per-row step()/getValue() API is untouched for consumers that
need full row data.

Problem 2: PRAGMA_TABLE_INFO runs 2x per stage, every row

PreparedChangesetReader::GetColumnValues re-derives static table metadata on every row:

GetColumnCountForCurrentChangedTable(...);  // SELECT COUNT(*) FROM PRAGMA_TABLE_INFO(?)
... "SELECT [name] FROM PRAGMA_TABLE_INFO(?) ORDER BY [cid]"

The statements are cached but executed every call. Column count and names for a given table are
constant for the entire walk. An UPDATE processes both stages, so that is 4 PRAGMA scans per row
re-computing constant metadata. Cache column count + ordered names keyed by table name (populated lazily on
first encounter) to eliminate this.

Problem 3: redundant Old-stage processing on UPDATE for instance keys

ReFetchValues builds both New and Old stages for an UPDATE. For PropertyFilter.InstanceKey the key is
identical in both stages (PK and ClassId do not change on update), so the entire Old-stage pass
(another GetColumnValues + 2 PRAGMA + ResolveClassId + Create) plus the extra getValue(Old)
crossing produces no new information. Under InstanceKey, skip Old-stage processing for updates (or have the
batch drain emit a single key per update row).

Minor: full column map built even for InstanceKey

GetColumnValues maps every present column into an unordered_map<Utf8String, DbValue> per stage, when
InstanceKey only needs the PK + ClassId columns. No value decode happens, but it is O(columns) string-keyed
allocations per row. Lower priority than 1-3, but relevant at scale.

Priority

  1. Batch/chunk drain (Problem 1) - biggest win, removes the per-row N-API cost that dominates the
    lightweight path. Model after ECSqlReader._localRows paging; prefer columnar typed buffers.
  2. PRAGMA caching (Problem 2) - cheap native-side fix, removes 2-4 redundant pragma scans per row.
  3. Skip Old stage on UPDATE for InstanceKey (Problem 3) - halves per-update work for key-only
    consumers.

All are native-side and backward compatible; none change the existing ChangesetReader API surface.

Files referenced

  • itwinjs-core/core/backend/src/ChangesetReader.ts - step(), getValue wrapper
  • itwinjs-core/core/common/src/ECSqlReader.ts - _localRows paging pattern to mirror
  • imodel-native/iModelJsNodeAddon/IModelJsNative.cpp - NativeChangesetReader (step/getValue/getChangeMetadata bindings)
  • imodel-native/iModelCore/ECDb/ECDb/PreparedChangesetReader.cpp - Step, ReFetchValues, GetColumnValues, GetInstanceKey
  • imodel-native/iModelCore/ECDb/ECDb/ChangesetValueFactory.cpp - Create (InstanceKey short-circuit)

Metadata

Metadata

Assignees

Labels

ecdbECDb and ECSQL related issues
No fields configured for Enhancement.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions