Skip to content

5.x: RFC - Schema Drift Detection #971

@dereuromark

Description

@dereuromark

Status quo:

No built-in drift detection => TestHelper/Setup plugin?

I am happy to put this in the plugins mentioned, but maybe this is worth considering directly here?

Issue

Schema drift occurs when your actual database schema gets out of sync with what your migration history says it should be. This can happen due to:

  • Manual SQL changes directly on the database (hotfixes, DBA interventions)
  • Failed migrations that partially applied
  • Different developers applying migrations in different orders
  • External tools modifying the schema
  • Restoring from backups that don't match the migration state

How Other Frameworks Handle It

Prisma's Approach (Shadow Database)

Prisma uses a clever technique with a "shadow database":

  1. Create temporary shadow database
  2. Apply ALL migrations from history to shadow database
  3. Compare shadow database schema vs actual development database
  4. Report any differences as "drift"
  5. Delete shadow database

When you run prisma migrate dev:
Drift detected: Your database schema is not in sync with your migration history.

The following is a summary of the differences between the expected database
schema given your migrations files, and the actual schema of the database.

It should be understood as the set of changes to get from the expected schema
to the actual schema.

[+] Added table _temp_hotfix_users
[-] Removed column legacy_id from table users
[~] Changed column status on table orders: type changed from VARCHAR(50) to TEXT

Django's Approach

Django compares your Python models against the current migration history:

// Check if models match migrations (CI-friendly)
python manage.py makemigrations --check --dry-run

// Output if drift exists:
Migrations for 'myapp':
myapp/migrations/0005_auto.py
- Add field new_column to mymodel

SystemCheckError: Your models have changes that are not yet reflected
in a migration.

Django also has inspectdb to generate models from an existing database, allowing reverse comparison.

Rails' Approach

Rails maintains a db/schema.rb file that represents the current expected state:

// Detect drift by comparing schema.rb to actual database
rails db:schema:dump # Generate from actual DB
git diff db/schema.rb # See what changed

What CakePHP Migrations Could Offer

Currently, you can manually check with:
// Generate a diff migration - if it's not empty, you have drift
bin/cake bake migration_diff CheckDrift

But a dedicated command could be more useful:

// Proposed: Explicit drift detection
bin/cake migrations check

// Output:
Schema drift detected!

Tables in database but not in migrations:
- _temp_backup_users

Missing columns (expected but not in database):
- orders.discount_code

Extra columns (in database but not expected):
- users.legacy_field

Column type mismatches:
- products.price: expected DECIMAL(10,2), found DECIMAL(8,2)

Indexes missing:
- idx_users_email on users(email)

Implementation Approaches

Option A: Shadow Database (like Prisma)
// 1. Create temp database
// 2. Run all migrations on temp database
// 3. Compare schemas using INFORMATION_SCHEMA queries
// 4. Report differences
// 5. Drop temp database

  • Pros: Accurate, catches all drift types
  • Cons: Requires CREATE DATABASE permission, slower

Option B: Schema Snapshot Comparison
// 1. Store expected schema as JSON/PHP after each migration
// 2. Query actual schema from INFORMATION_SCHEMA
// 3. Diff the two

  • Pros: Fast, no extra database needed
  • Cons: Need to maintain snapshot file

Option C: Migration History Replay (in-memory)
// 1. Parse all migration files to build expected schema state
// 2. Query actual database schema
// 3. Compare

  • Pros: No extra files or databases
  • Cons: Complex parsing, may miss raw SQL changes

Use Cases

  1. CI/CD pipelines: Fail deployment if production has unexpected changes
  2. Team development: Detect if someone made manual DB changes
  3. Post-incident: Verify schema after emergency hotfixes
  4. Debugging: "Why doesn't my migration work?" - because the table already has that column from a hotfix

Example Command Output (Proposed)

$ bin/cake migrations check --connection default

Checking schema drift...

Comparing migration history against database 'myapp_production'...

  ✗ Schema drift detected!

  +---------------------+----------------+----------------------------------+
  | Type                | Object         | Details                          |
  +---------------------+----------------+----------------------------------+
  | EXTRA_TABLE         | _backup_users  | Table exists but not in history  |
  | MISSING_COLUMN      | orders.coupon  | Expected but not found           |
  | TYPE_MISMATCH       | users.status   | Expected: VARCHAR(20)            |
  |                     |                | Actual: VARCHAR(50)              |
  | MISSING_INDEX       | idx_email      | On users(email)                  |
  +---------------------+----------------+----------------------------------+

Run 'bin/cake bake migration_diff FixDrift' to generate a migration
that resolves these differences.

This would be valuable for production environments where manual changes sometimes happen and need to be tracked back into the migration history.

Refs #959

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions