Had a brief chat with the Jobs and Transformer people and checked our new "happy path" for getting changes instance IDs, I found a few issues that we should tackle because they cost performance. Not all of them confirmed, the biggest concern is about crossing the N-API boundary too often.
Details:
Consumers like the iModel transformer and the imodels-jobs changed-elements service only need the
instance keys of changed rows - { ECInstanceId, ECClassId, op } - not the decoded property values.
The new ChangesetReader (native PreparedChangesetReader, TS wrapper in
core/backend/src/ChangesetReader.ts) already supports this cheaply via PropertyFilter.InstanceKey,
which short-circuits ChangesetValueFactory::Create before BuildPropertyFields:
fields.emplace_back(std::move(instanceIdField));
fields.emplace_back(std::move(classIdField));
if (propertyFilter == PropertyFilter::InstanceKey)
return SUCCESS; // skip user-property materialization
That removes the expensive value-decoding work (nav props, DateTime conversion, structs, arrays, GUID
/geometry). The remaining cost is not decoding - it is per-row overhead, dominated by N-API boundary
crossings and repeated PRAGMA lookups. For a workload of hundreds of changesets x millions of rows this
fixed cost dominates.
Problem 1 (primary): per-row N-API chatter
The TS step() wrapper crosses the native boundary 3-4 times per row:
public step(): boolean {
...
if (this._nativeReader.step()) { // crossing #1
const meta = this._nativeReader.getChangeMetadata(); // crossing #2 (builds a JS object)
...
if (op === "Inserted" || op === "Updated")
const rowValue = this._nativeReader.getValue(New, opts); // crossing #3 (builds {data,key,changeFetchedPropNames})
if (op === "Deleted" || op === "Updated")
const rowValue = this._nativeReader.getValue(Old, opts); // crossing #4 (Update only)
}
}
- Insert -> 3 crossings
- Delete -> 3 crossings
- Update -> 4 crossings
Each of getChangeMetadata/getValue not only crosses but allocates and marshals a transient napi
object (getValue returns { data, key, changeFetchedPropNames }, where data is itself an object and
changeFetchedPropNames an array). For a 1M-row changeset that is ~3-4M boundary crossings plus ~2-3M
throwaway JS objects. The N-API boundary is the known-expensive part here, and we pay it several times per
row.
Proposed fix: batch/chunk results like ECSqlReader / ConcurrentQuery's QueryReader
ECSqlReader never crosses per row. It drains a local page buffer (_localRows) and only refills it when
empty (fetchRows() -> readRows()), so the boundary cost is amortized across a whole page:
// ECSqlReader.ts (paraphrased)
private _localRows: any[] = [];
public async step(): Promise<boolean> {
if (this._localOffset < this._localRows.length - 1) {
++this._localOffset; // pure local advance, NO crossing
} else {
await this.fetchRows(); // page exhausted -> ONE transfer for the next chunk
this._localOffset = 0;
}
}
Note: ConcurrentQuery's page size is quota-sized, not a fixed row count - the backend returns a
Partial status with as many rows as fit the time/memory quota, and the reader keeps requesting until
Done. The transfer is also request/response (RPC/IPC on the frontend, native boundary on the backend)
rather than strictly N-API. The relevant property is what we want to copy: one transfer per chunk, not
per row. For the changeset batch API below we can pick a fixed maxRows instead of a quota.
Mirror that for the lightweight changeset path. Add a native batch drain that advances the reader and
returns up to maxRows instance keys in a single crossing, with the TS wrapper iterating the local buffer:
// native addon
nextInstanceKeys(maxRows: number): ChangesetInstanceKeyBatch | undefined;
interface ChangesetInstanceKeyBatch {
// option A: array of small objects
rows: { id: Id64String; classId: Id64String; op: SqliteChangeOp }[];
// option B (cheaper, preferred for millions of rows): columnar typed buffers
// ids: BigUint64Array; // ECInstanceId
// classIds: BigUint64Array; // ECClassId
// ops: Uint8Array; // opcode per row
done: boolean; // true when the underlying walk is exhausted
}
- Columnar (option B) is the cheapest: it avoids per-row JS object allocation entirely and packs the
data into a couple of typed arrays, which marshal across N-API as buffers. This is the shape that makes
"millions of rows" genuinely cheap.
- Under
PropertyFilter.InstanceKey the native side already has ECInstanceId + ECClassId materialized
as fixed slots after Create, so assembling the batch is just reading those two slots + the opcode per
row - no extra decode.
- TS wrapper keeps a
_localBuffer, drains it, and only calls nextInstanceKeys() when empty. Result:
one crossing per batch (e.g. a fixed maxRows of a few thousand) instead of 3-4 per row.
This is purely additive - the existing per-row step()/getValue() API is untouched for consumers that
need full row data.
Problem 2: PRAGMA_TABLE_INFO runs 2x per stage, every row
PreparedChangesetReader::GetColumnValues re-derives static table metadata on every row:
GetColumnCountForCurrentChangedTable(...); // SELECT COUNT(*) FROM PRAGMA_TABLE_INFO(?)
... "SELECT [name] FROM PRAGMA_TABLE_INFO(?) ORDER BY [cid]"
The statements are cached but executed every call. Column count and names for a given table are
constant for the entire walk. An UPDATE processes both stages, so that is 4 PRAGMA scans per row
re-computing constant metadata. Cache column count + ordered names keyed by table name (populated lazily on
first encounter) to eliminate this.
Problem 3: redundant Old-stage processing on UPDATE for instance keys
ReFetchValues builds both New and Old stages for an UPDATE. For PropertyFilter.InstanceKey the key is
identical in both stages (PK and ClassId do not change on update), so the entire Old-stage pass
(another GetColumnValues + 2 PRAGMA + ResolveClassId + Create) plus the extra getValue(Old)
crossing produces no new information. Under InstanceKey, skip Old-stage processing for updates (or have the
batch drain emit a single key per update row).
Minor: full column map built even for InstanceKey
GetColumnValues maps every present column into an unordered_map<Utf8String, DbValue> per stage, when
InstanceKey only needs the PK + ClassId columns. No value decode happens, but it is O(columns) string-keyed
allocations per row. Lower priority than 1-3, but relevant at scale.
Priority
- Batch/chunk drain (Problem 1) - biggest win, removes the per-row N-API cost that dominates the
lightweight path. Model after ECSqlReader._localRows paging; prefer columnar typed buffers.
- PRAGMA caching (Problem 2) - cheap native-side fix, removes 2-4 redundant pragma scans per row.
- Skip Old stage on UPDATE for InstanceKey (Problem 3) - halves per-update work for key-only
consumers.
All are native-side and backward compatible; none change the existing ChangesetReader API surface.
Files referenced
itwinjs-core/core/backend/src/ChangesetReader.ts - step(), getValue wrapper
itwinjs-core/core/common/src/ECSqlReader.ts - _localRows paging pattern to mirror
imodel-native/iModelJsNodeAddon/IModelJsNative.cpp - NativeChangesetReader (step/getValue/getChangeMetadata bindings)
imodel-native/iModelCore/ECDb/ECDb/PreparedChangesetReader.cpp - Step, ReFetchValues, GetColumnValues, GetInstanceKey
imodel-native/iModelCore/ECDb/ECDb/ChangesetValueFactory.cpp - Create (InstanceKey short-circuit)
Had a brief chat with the Jobs and Transformer people and checked our new "happy path" for getting changes instance IDs, I found a few issues that we should tackle because they cost performance. Not all of them confirmed, the biggest concern is about crossing the N-API boundary too often.
Details:
Consumers like the iModel transformer and the imodels-jobs changed-elements service only need the
instance keys of changed rows -
{ ECInstanceId, ECClassId, op }- not the decoded property values.The new
ChangesetReader(nativePreparedChangesetReader, TS wrapper incore/backend/src/ChangesetReader.ts) already supports this cheaply viaPropertyFilter.InstanceKey,which short-circuits
ChangesetValueFactory::CreatebeforeBuildPropertyFields:That removes the expensive value-decoding work (nav props, DateTime conversion, structs, arrays, GUID
/geometry). The remaining cost is not decoding - it is per-row overhead, dominated by N-API boundary
crossings and repeated PRAGMA lookups. For a workload of hundreds of changesets x millions of rows this
fixed cost dominates.
Problem 1 (primary): per-row N-API chatter
The TS
step()wrapper crosses the native boundary 3-4 times per row:Each of
getChangeMetadata/getValuenot only crosses but allocates and marshals a transient napiobject (
getValuereturns{ data, key, changeFetchedPropNames }, wheredatais itself an object andchangeFetchedPropNamesan array). For a 1M-row changeset that is ~3-4M boundary crossings plus ~2-3Mthrowaway JS objects. The N-API boundary is the known-expensive part here, and we pay it several times per
row.
Proposed fix: batch/chunk results like
ECSqlReader/ ConcurrentQuery'sQueryReaderECSqlReadernever crosses per row. It drains a local page buffer (_localRows) and only refills it whenempty (
fetchRows()->readRows()), so the boundary cost is amortized across a whole page:Note: ConcurrentQuery's page size is quota-sized, not a fixed row count - the backend returns a
Partialstatus with as many rows as fit the time/memory quota, and the reader keeps requesting untilDone. The transfer is also request/response (RPC/IPC on the frontend, native boundary on the backend)rather than strictly N-API. The relevant property is what we want to copy: one transfer per chunk, not
per row. For the changeset batch API below we can pick a fixed
maxRowsinstead of a quota.Mirror that for the lightweight changeset path. Add a native batch drain that advances the reader and
returns up to
maxRowsinstance keys in a single crossing, with the TS wrapper iterating the local buffer:data into a couple of typed arrays, which marshal across N-API as buffers. This is the shape that makes
"millions of rows" genuinely cheap.
PropertyFilter.InstanceKeythe native side already hasECInstanceId+ECClassIdmaterializedas fixed slots after
Create, so assembling the batch is just reading those two slots + the opcode perrow - no extra decode.
_localBuffer, drains it, and only callsnextInstanceKeys()when empty. Result:one crossing per batch (e.g. a fixed
maxRowsof a few thousand) instead of 3-4 per row.This is purely additive - the existing per-row
step()/getValue()API is untouched for consumers thatneed full row data.
Problem 2:
PRAGMA_TABLE_INFOruns 2x per stage, every rowPreparedChangesetReader::GetColumnValuesre-derives static table metadata on every row:The statements are cached but executed every call. Column count and names for a given table are
constant for the entire walk. An UPDATE processes both stages, so that is 4 PRAGMA scans per row
re-computing constant metadata. Cache column count + ordered names keyed by table name (populated lazily on
first encounter) to eliminate this.
Problem 3: redundant Old-stage processing on UPDATE for instance keys
ReFetchValuesbuilds both New and Old stages for an UPDATE. ForPropertyFilter.InstanceKeythe key isidentical in both stages (PK and ClassId do not change on update), so the entire Old-stage pass
(another
GetColumnValues+ 2 PRAGMA +ResolveClassId+Create) plus the extragetValue(Old)crossing produces no new information. Under InstanceKey, skip Old-stage processing for updates (or have the
batch drain emit a single key per update row).
Minor: full column map built even for InstanceKey
GetColumnValuesmaps every present column into anunordered_map<Utf8String, DbValue>per stage, whenInstanceKey only needs the PK + ClassId columns. No value decode happens, but it is O(columns) string-keyed
allocations per row. Lower priority than 1-3, but relevant at scale.
Priority
lightweight path. Model after
ECSqlReader._localRowspaging; prefer columnar typed buffers.consumers.
All are native-side and backward compatible; none change the existing
ChangesetReaderAPI surface.Files referenced
itwinjs-core/core/backend/src/ChangesetReader.ts-step(),getValuewrapperitwinjs-core/core/common/src/ECSqlReader.ts-_localRowspaging pattern to mirrorimodel-native/iModelJsNodeAddon/IModelJsNative.cpp-NativeChangesetReader(step/getValue/getChangeMetadata bindings)imodel-native/iModelCore/ECDb/ECDb/PreparedChangesetReader.cpp-Step,ReFetchValues,GetColumnValues,GetInstanceKeyimodel-native/iModelCore/ECDb/ECDb/ChangesetValueFactory.cpp-Create(InstanceKey short-circuit)