Table migration guide
SQLMesh projects can read directly from tables not managed by SQLMesh, but in some scenarios it may be useful to migrate an existing table into a SQLMesh project.
This guide describes two methods for migrating existing tables into a SQLMesh project.
Do you need to migrate?
SQLMesh does not assume it manages all data sources: SQL models can read from any data source accessible by the SQL engine, treating them as external models that include column-level lineage or as generic sources. This approach is preferred to migrating existing tables into a SQLMesh project.
You should only migrate a table if both of the following are true:
- The table is ingesting from an upstream source that will continue generating new data
- The table is either too large to be rebuilt or cannot be rebuilt because the necessary historical data is unavailable
If the table's upstream source will not generate more data, there is no ongoing activity for SQLMesh to manage. A SQLMesh model or any other downstream consumer can select directly from the table under its current name.
If the table's upstream source is generating new data, we assume that the table is already being loaded incrementally, as there is no need for migration if the table can be fully rebuilt.
We describe two migration methods below. The stage and union method is preferred and should be used if feasible.
Migration methods
This section describes two methods for migrating tables into SQLMesh.
The method descriptions contain renaming steps that are only necessary if downstream consumers must select from the original table name (e.g., step 2 in the first example). If that is not the case, the original table can retain its name.
The table and model names in the examples below are arbitrary - you may name them whatever is appropriate for your project.
Stage and union
The stage and union method works by treating new and historical data as separate sources.
It requires creating an incremental staging model to ingest new records and a VIEW
model that unions those records with the existing table's static historical records.
Example
Consider an existing table named my_schema.existing_table
. Migrating this table with the stage and union method consists of five steps:
- Ensure
my_schema.existing_table
is up to date (has ingested all available source data) - Rename
my_schema.existing_table
to any other name, such asmy_schema.existing_table_historical
- Optionally, enable column-level lineage for the table by making it an
EXTERNAL
model and adding it to the project'sexternal_models.yaml
file
- Optionally, enable column-level lineage for the table by making it an
- Create a new incremental staging model named
my_schema.existing_table_staging
(see below for code) - Create a new
VIEW
model namedmy_schema.existing_table
(see below for code) - Run
sqlmesh plan
to create and backfill the models
The staging model would contain code similar to the following for an INCREMENTAL_BY_TIME_RANGE
model. An INCREMENTAL_BY_UNIQUE_KEY
model would have a different kind
specification in the MODEL
DDL and might not include the query's WHERE
clause.
The primary model would contain code similar to:
Changes to columns in the source data or staging model may require modifying the code selecting from the historical data so the two tables can be safely unioned.
Snapshot replacement
The snapshot replacement method works by renaming an existing table to a name that SQLMesh recognizes as an existing SQLMesh model.
Background
This section briefly describes how SQLMesh's virtual data environments, forward-only models, and start times work. This information is not necessary for migrating tables but is necessary for understanding why each step in the migration process is required.
Virtual data environments
Conceptually, SQLMesh divides the database into a "physical layer" where data is stored and a "virtual layer" where data is accessed by end users. The physical layer stores materialized objects like tables, and the virtual layer contains views that point to the physical layer objects.
Each time a SQLMesh plan
adds or modifies a model, SQLMesh creates a physical layer "snapshot" object to which the virtual layer view points. The snapshot replacement method simply renames the migrating table to the name of the appropriate snapshot table.
Forward-only models
Sometimes a model's data may be so large that it is not feasible to rebuild either its own or its downstream models' physical tables. In those situations a "forward only" model can be used. The name reflects that the change is only applied "going forward" in time.
Historical data already in the migrated table should not be overwritten, so we specify that the new model is forward-only in step 3a below.
Start time
SQLMesh incremental by time models track the time periods whose data a model has loaded with the interval approach.
The interval approach requires specifying the earliest time interval SQLMesh should track - when time "starts" for the model. For migrated tables, SQLMesh should never load data for the time intervals the table ingested before migration, so interval tracking should start immediately after the time of the last ingested record.
In the example below, we set the model's start time in its MODEL
DDL (step 3b) and pass it as an option to the sqlmesh plan
command (step 3c). The same value must be used in both the MODEL
DDL and the plan command. In this example, the existing table's data ingestion stopped on 2023-12-31, so the model and plan start date is the next day 2024-01-01.
Example
Consider an existing table named my_schema.existing_table
. Migrating this table with the snapshot replacement method involves five steps:
- Ensure
my_schema.existing_table
is up to date (has ingested all available source data) - Rename
my_schema.existing_table
to any other name, such asmy_schema.existing_table_temp
-
Create and initialize an empty incremental model named
my_schema.existing_table
:a. Make the model forward only by setting the
MODEL
DDLkind
'sforward_only
key totrue
b. Specify the start of the first time interval SQLMesh should track in the
MODEL
DDLstart
key (example uses "2024-01-01")c. Create the model in the SQLMesh project without backfilling any data by running
sqlmesh plan [environment name] --skip-backfill --start 2024-01-01
, replacing "[environment name]" with an environment name other thanprod
and using the same start date from theMODEL
DDL in step 3b. -
Determine the name of the model's snapshot physical table by running
sqlmesh table_name my_schema.existing_table
. For example, it might returnsqlmesh__my_schema.existing_table_123456
. - Rename the original table
my_schema.existing_table_temp
tosqlmesh__my_schema.existing_table_123456
The model would have code similar to: