docs/assets/materialization.md: 60 additions & 5 deletions
@@ -435,10 +435,28 @@ When changes are detected in non-primary key columns:
 
 **Automatically added columns:**
 
-- `_valid_from`: TIMESTAMP when the record version became active (set to `CURRENT_TIMESTAMP()`)
-- `_valid_until`: TIMESTAMP when the record version became inactive (set to `TIMESTAMP('9999-12-31')` for current records)
+- `_valid_from`: TIMESTAMP when the record version became active (defaults to `CURRENT_TIMESTAMP()`, or uses `incremental_key` value if specified)
+- `_valid_until`: TIMESTAMP when the record version became inactive (set to `TIMESTAMP('9999-12-31')` for current records, or uses `incremental_key` value when a record is expired due to changes)
 - `_is_current`: BOOLEAN indicating if this is the current version of the record
 
+**Optional: Using `incremental_key` for timestamps:**
+
+By default, `_valid_from` and `_valid_until` are set using `CURRENT_TIMESTAMP()`. However, if your source data has a column that indicates when changes actually occurred (e.g., an `updated_at` timestamp), you can specify it using the `incremental_key` option:
+
+```yaml
+materialization:
+  type: table
+  strategy: scd2_by_column
+  incremental_key: updated_at
+```
+
+When `incremental_key` is specified:
+- `_valid_from` for new/updated records will be set to the value of the `incremental_key` column
+- `_valid_until` for records being expired (due to changes) will be set to the value of the `incremental_key` column from the new record
+- Records expiring because they're no longer in the source data will still use `CURRENT_TIMESTAMP()` for `_valid_until`
+
+This is useful when you want the SCD2 timeline to reflect the actual business timestamps from your source data rather than the processing time.
+
 **NOTE:***
 
 - Unless otherwise specified by `partition_by`, the SCD2 table will be partitioned by `_valid_from` for platforms which support partitioning (BigQuery, Athena, Snowflake).
@@ -475,6 +493,43 @@ UNION ALL
 SELECT 3 AS ID, 'Keyboard' AS Name, 89.99 AS Price
 ```
 
+**Example with `incremental_key`:**
+
+When you want `_valid_from` and `_valid_until` to reflect actual business timestamps instead of processing time:
+
+```bruin-sql
+/* @bruin
+name: test.product_catalog
+type: bq.sql
+
+materialization:
+  type: table
+  strategy: scd2_by_column
+  incremental_key: updated_at
+
+columns:
+  - name: ID
+    type: INTEGER
+    description: "Unique identifier for Product"
+    primary_key: true
+  - name: Name
+    type: VARCHAR
+    description: "Name of the Product"
+  - name: Price
+    type: FLOAT
+    description: "Price of the Product"
+  - name: updated_at
+    type: TIMESTAMP
+    description: "When the product was last modified in the source system"
+@bruin */
+
+SELECT 1 AS ID, 'Wireless Mouse' AS Name, 29.99 AS Price, TIMESTAMP '2024-01-15 10:30:00' AS updated_at
+UNION ALL
+SELECT 2 AS ID, 'USB Cable' AS Name, 12.99 AS Price, TIMESTAMP '2024-01-14 14:00:00' AS updated_at
+```
+
+In this case, `_valid_from` will be set to the `updated_at` value from each record, preserving the actual business timeline of when changes occurred.
+
 **Example behavior:**
 
 Let's say you want to create a new table to track product catalog with SCD2. If the table doesn't exist yet, you'll need an initial run with the `--full-refresh` flag:
@@ -628,9 +683,9 @@ Notice how:
 | Aspect | scd2_by_column | scd2_by_time |
 |--------|----------------|--------------|
 | **Change Detection** | Automatically detects changes in any non-primary key column | Based on time values in the incremental_key column |
-| **_valid_from Value** | Set to `CURRENT_TIMESTAMP()` when change is processed | Derived from the incremental_key column value |
-| **Use Case** | When you want to track any column changes regardless of when they occurred | When your source data has reliable timestamps indicating when changes happened |
-| **Configuration** | Only requires primary_key columns | Requires both primary_key columns and incremental_key |
+| **_valid_from Value** | Set to `CURRENT_TIMESTAMP()` by default, or uses `incremental_key` value if specified | Always derived from the incremental_key column value |
+| **Use Case** | When you want to track any column changes; optionally use `incremental_key` for business timestamps | When your source data has reliable timestamps indicating when changes happened |
+| **Configuration** | Only requires primary_key columns; `incremental_key` is optional | Requires both primary_key columns and incremental_key |
 
 > [!WARNING]
 > SCD2 materializations are currently only supported for BigQuery, Snowflake, Postgres, Amazon Redshift, MySQL, DuckDB, and Databricks.
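
The timestamp rules this change documents for `scd2_by_column` with an optional `incremental_key` can be modelled in a few lines of Python. This is only a rough in-memory sketch of the documented behavior, not Bruin's implementation or the SQL it generates: `apply_scd2`, `OPEN_END`, the list-of-dicts table representation, and the second-run values (price `24.99`, `2024-01-20`) are all hypothetical, and `now` stands in for `CURRENT_TIMESTAMP()`.

```python
from datetime import datetime

# Sentinel `_valid_until` for current rows, mirroring TIMESTAMP('9999-12-31').
OPEN_END = datetime(9999, 12, 31)


def apply_scd2(history, source, now, pk="ID", incremental_key=None):
    """Merge `source` rows into `history` and return the new history.

    Hypothetical model of the documented rules, not Bruin's code.
    """
    incoming = {row[pk]: row for row in source}
    result, handled = [], set()

    def stamp(row):
        # Business timestamp if `incremental_key` is set, else processing time.
        return row[incremental_key] if incremental_key else now

    for old in history:
        if not old["_is_current"]:
            result.append(dict(old))  # closed versions are never touched
            continue
        handled.add(old[pk])
        new = incoming.get(old[pk])
        if new is None:
            # Key no longer in the source: always expire at processing time.
            result.append({**old, "_valid_until": now, "_is_current": False})
        elif any(old.get(col) != val for col, val in new.items() if col != pk):
            # A non-primary-key column changed: close the old version and
            # open a new one at the same timestamp.
            ts = stamp(new)
            result.append({**old, "_valid_until": ts, "_is_current": False})
            result.append({**new, "_valid_from": ts,
                           "_valid_until": OPEN_END, "_is_current": True})
        else:
            result.append(dict(old))  # unchanged

    for key, new in incoming.items():
        if key not in handled:  # first appearance (or reappearance) of the key
            result.append({**new, "_valid_from": stamp(new),
                           "_valid_until": OPEN_END, "_is_current": True})
    return result


# First run: the two rows from the `incremental_key` example in the diff.
run1 = apply_scd2([], [
    {"ID": 1, "Name": "Wireless Mouse", "Price": 29.99,
     "updated_at": datetime(2024, 1, 15, 10, 30)},
    {"ID": 2, "Name": "USB Cable", "Price": 12.99,
     "updated_at": datetime(2024, 1, 14, 14, 0)},
], now=datetime(2024, 2, 1), incremental_key="updated_at")

# Second run (illustrative values): ID 1 changes price, ID 2 disappears.
run2 = apply_scd2(run1, [
    {"ID": 1, "Name": "Wireless Mouse", "Price": 24.99,
     "updated_at": datetime(2024, 1, 20, 9, 0)},
], now=datetime(2024, 2, 2), incremental_key="updated_at")
```

In this model the old version of ID 1 is closed at the new record's `updated_at`, while ID 2, which vanished from the source, is closed at `now`. Whether a change in the `incremental_key` column alone counts as a tracked change is an assumption of the sketch.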