Table Diff Guide
SQLMesh's table diff tool allows you to compare the schema and data of two data objects. It supports comparison of a SQLMesh model across two environments or direct comparison of database tables or views.
Note: Table diff requires the two objects to already exist in your project's underlying database or engine. If comparing models, this means you should have already planned and applied your changes to an environment.
Table diff comparisons
Table diff executes two types of comparison on the source and target objects: a schema diff and a row diff.
The schema diff identifies whether fields have been added, removed, or changed data types in the target object relative to the source object.
The row diff identifies changes in data values across columns with the same name and data type in both tables. It does this by performing an
OUTER JOIN of the two tables then, for each column with the same name and data type, comparing data values from one table to those from the other.
The table diff tool can be called in two ways: comparison of a SQLMesh model across two project environments or direct comparison of tables/views. It executes the comparison using the database or engine specified in the SQLMesh project configuration.
Diffing models across environments
Compare a SQLMesh model across environments with the SQLMesh CLI interface by using the command
sqlmesh table_diff [source environment]:[target environment] [model name].
For example, we could make two modifications to the SQLMesh quickstart model
- Change the row whose
- Remove row whose
1by adding a
sqlmesh plan dev and applying the plan, the updated model will be present in the
dev environment but not in
Compare the two versions of the model with the table diff tool by running
sqlmesh table_diff prod:dev sqlmesh_example.incremental_model.
The first argument
prod:dev specifies that
prod is the source environment to which we will compare the target environment
dev. The second argument
sqlmesh_example.incremental_model is the name of the model to compare across the
grain is set to
[id, ds] in the
MODEL statement, SQLMesh knows how to perform the join between the two models. If
grain were not set, the command would need to include the
-o id -o ds option to specify that the tables should be joined on column
-o once for each join column.
Table diff returns this output:
The "Schema Diff" section shows that the
DEV schemas match because no columns have been added, removed, or change data type.
The "Row Counts" section shows that 6 rows were successfully joined and the 1 row we removed is only present in the
COMMON ROWS column comparison stats section shows that the
item_id column values had an 83.3% match for the six joined rows (5 of the 6 row values were unchanged by our
CASE WHEN statement). All non-join columns with the same data type in both tables are included in the comparison stats.
If we include the
--show-sample option in the command, the output also includes rows from the different join components.
COMMON ROWS sample data differences section displays the row whose
item_id value changed. The
PROD__item_id column shows that
item_id is 3 in the
PROD table, and the
DEV__item_id column shows that
item_id is 4.0 in the
PROD ONLY sample rows section shows the one row that is present in
PROD but not in
Diffing tables or views
Compare specific tables or views with the SQLMesh CLI interface by using the command
sqlmesh table_diff [source table]:[target table].
The source and target tables should be fully qualified with catalog or schema names such that a SQL query of the form
SELECT ... FROM [source table] would execute correctly.
Recall that SQLMesh models are accessible via views in the database. In the
prod environment, the view has the same name as the model. For example, in the quickstart example project the
prod incremental model is represented by the view
sqlmesh_example.incremental_model. In the
__dev is appended to the schema name so the incremental model is represented by the view
We can replicate the comparison in the previous section by comparing the model views directly. Because we are passing the view names directly, the command needs to manually specify that the join should be on the
ds columns with the
-o id -o ds flags.
The output matches, with the exception of the column labels in the
COMMON ROWS sample data differences. The underlying table for each column is indicated by
s__ for "source" table (first table in the command's colon operator
t__ for "target" table (second table in the command's colon operator