SQL models
SQL models are the main type of models used by SQLMesh. These models can be defined using either SQL or Python that generates SQL.
SQL-based definition
The SQL-based definition of SQL models is the most common one, and consists of the following sections:
- The
MODEL
DDL - Optional pre-statements
- A single query
- Optional post-statements
These models are designed to look and feel like you're simply using SQL, but they can be customized for advanced use cases.
To create a SQL-based model, add a new file with the .sql
suffix into the models/
directory (or a subdirectory of models/
) within your SQLMesh project. Although the name of the file doesn't matter, it is customary to use the model's name (without the schema) as the file name. For example, the file containing the model sqlmesh_example.seed_model
would be named seed_model.sql
.
Example
MODEL
DDL
The MODEL
DDL is used to specify metadata about the model such as its name, kind, owner, cron, and others. This should be the first statement in your SQL-based model's file.
Refer to MODEL
properties for the full list of allowed properties.
Optional pre/post-statements
Optional pre/post-statements can help prepare the model's query and execute "clean-up" actions after it has successfully executed, respectively. For example, you might create temporary views, set permissions, or evict previously cached tables from memory.
However, be careful not to run any statement that could conflict with the execution of another statement if the models run concurrently, such as creating a physical table.
The model query
The model must contain a standalone query, which can be a single SELECT
expression, or multiple SELECT
expressions combined with the UNION
, INTERSECT
, or EXCEPT
operators. The result of this query will be used to populate the model's table or view.
Python-based definition
The Python-based definition of SQL models consists of a single python function, decorated with SQLMesh's @model
decorator. The decorator is required to have the is_sql
keyword argument set to True
to distinguish it from Python models that return DataFrame instances.
This function's return value serves as the model's query, and it must be either a SQL string or a SQLGlot expression. The @model
decorator is used to define the model's metadata and, optionally its pre/post-statements that are also in the form of SQL strings or SQLGlot expressions.
Defining a SQL model using Python can be beneficial in cases where its query is too complex to express cleanly in SQL, for example due to having many dynamic components that would require heavy use of macros. Since Python-based models generate SQL, they support the same features as regular SQL models, such as column-level lineage, automatic change categorization, etc.
To create a Python-based model, add a new file with the .py
suffix into the models/
directory (or a subdirectory of models/
) within your SQLMesh project. The file naming conventions of Python-based models are similar to those of SQL-based models. Inside this file, define a function named entrypoint
with a single evaluator
argument, as shown in the example below.
Example
The following example demonstrates how the above db.customers
model can be defined as a Python-based model using SQLGlot's Expression
builder methods:
Note that one could also define this model by simply returning a string that contained the SQL query of the SQL-based example. Strings used as pre/post-statements or return values in Python-based models will be parsed into SQLGlot expressions, which means that SQLMesh will still be able to understand them semantically and thus provide information such as column-level lineage.
@model
decorator
The @model
decorator is the Python equivalent of the MODEL
DDL. In addition to model metadata and configuration information, one can also set the keyword arguments pre_statements
and post_statements
to a list of SQL strings and/or SQLGlot expressions to define the pre/post-statements of the model, respectively.
Note: All of the metadata field names are the same as those in the MODEL
DDL.
Automatic dependencies
SQLMesh parses your SQL, so it understands what the code does and how it relates to other models. There is no need for you to manually specify dependencies to other models with special tags or commands.
For example, consider a model with this query:
SQLMesh will detect that the model depends on both employees
and countries
. When executing this model, it will ensure that employees
and countries
are executed first.
External dependencies not defined in SQLMesh are also supported. SQLMesh can either depend on them implicitly through the order in which they are executed, or through signals if you are using Airflow.
Although automatic dependency detection works most of the time, there may be specific cases for which you want to define dependencies manually. You can do so in the MODEL
DDL with the dependencies property.
Conventions
SQLMesh encourages explicitly specifying the data types of a model's columns through casting. This allows SQLMesh to understand the data types in your models, and it prevents incorrect type inference. SQLMesh supports the casting format <column name>::<data type>
in models of any SQL dialect.
Explicit SELECTs
Although SELECT *
is convenient, it is dangerous because a model's results can change due to external factors (e.g., an upstream source adding or removing a column). In general, we encourage listing out every column you need or using create_external_models
to capture the schema of an external data source.
If you select from an external source, SELECT *
will prevent SQLMesh from performing some optimization steps and from determining upstream column-level lineage. Use an external
model kind to enable optimizations and upstream column-level lineage for external sources.
Transpilation
SQLMesh leverages SQLGlot to parse and transpile SQL. Therefore, you can write your SQL in any supported dialect and transpile it into another supported dialect.
You can also use advanced syntax that may not be available in your engine of choice. For example, x::int
is equivalent to CAST(x as INT)
, but is only supported in some dialects. SQLGlot allows you to use this feature regardless of what engine you're using.
Additionally, you won't have to worry about minor formatting differences such as trailing commas, as SQLGlot will remove them at parse time.
Macros
Although standard SQL is very powerful, complex data systems often require running SQL queries with dynamic components such as date filters. For example, you may want to change the date ranges in a between
statement so that you can get the latest batch of data. SQLMesh provides these dates automatically through macro variables.
Additionally, large queries can be difficult to read and maintain. In order to make queries more compact, SQLMesh supports a powerful macro syntax as well as Jinja, allowing you to write macros that make your SQL queries easier to manage.