SQLMesh macros
Macro systems: two approaches
SQLMesh macros behave differently than those of templating systems like Jinja.
Macro systems are based on string substitution. The macro system scans code files, identifies special characters that signify macro content, and replaces the macro elements with other text.
In a general sense, that is the entire functionality of templating systems. They have tools that provide control flow logic (if-then) and other functionality, but that functionality is solely to support substituting in the correct strings.
Templating systems are intentionally agnostic to the programming language being templated, and most of them work for everything from blog posts to HTML to SQL.
In contrast, SQLMesh macros are designed specifically for generating SQL code. They have semantic understanding of the SQL code being created by analyzing it with the Python sqlglot library, and they allow use of Python code so users can tidily implement sophisticated macro logic.
SQLMesh macro approach
This section describes how SQLMesh macros work under the hood. Feel free to skip over this section and return if and when it is useful. This information is not required to use SQLMesh macros, but it will be useful for debugging any macros exhibiting puzzling behavior.
The critical distinction between the SQLMesh macro approach and templating systems is the role string substitution plays. In templating systems, string substitution is the entire and only point.
In SQLMesh, string substitution is just one step toward modifying the semantic representation of the SQL query. SQLMesh macros work by building and modifying the semantic representation of the SQL query.
After processing all the non-SQL text, it uses the substituted values to modify the semantic representation of the query to its final state.
It uses the following five step approach to accomplish this:
-
Parse the text with the appropriate sqlglot SQL dialect (e.g., Postgres, BigQuery, etc.). During the parsing, it detects the special macro symbol
@
to differentiate non-SQL from SQL text. The parser builds a semantic representation of the SQL code's structure, capturing non-SQL text as "placeholder" values to use in subsequent steps. -
Examine the placeholder values to classify them as one of the following types:
- Creation of user-defined macro variables with the
@DEF
operator (see more about user-defined macro variables) - Macro variables: SQLMesh pre-defined, user-defined local, and user-defined global
- Macro functions, both SQLMesh's and user-defined
- Creation of user-defined macro variables with the
-
Substitute macro variable values where they are detected. In most cases, this is direct string substitution as with a templating system.
-
Execute any macro functions and substitute the returned values.
-
Modify the semantic representation of the SQL query with the substituted variable values from (3) and functions from (4).
User-defined variables
SQLMesh supports three kinds of user-defined macro variables: global, gateway, and local.
Global and gateway macro variables are defined in the project configuration file and can be accessed in any project model. Local macro variables are defined in a model definition and can only be accessed in that model.
Macro variables with the same name may be specified at any or all of the global, gateway, and local levels. When variables are specified at multiple levels, the value of the most specific level takes precedence. For example, the value of a local variable takes precedence over the value of a gateway variable with the same name, and the value of a gateway variable takes precedence over the value of a global variable.
Global variables
Global variables are defined in the project configuration file variables
key.
Global variable values may be any of the following data types or lists or dictionaries containing these types: int
, float
, bool
, str
.
Access global variable values in a model definition using the @<VAR_NAME>
macro or the @VAR()
macro function. The latter function requires the name of the variable in single quotes as the first argument and an optional default value as the second argument. The default value is a safety mechanism used if the variable name is not found in the project configuration file.
For example, this SQLMesh configuration key defines six variables of different data types:
A model definition could access the int_var
value in a WHERE
clause like this:
Alternatively, the same variable can be accessed by passing the variable name into the @VAR()
macro function. Note that the variable name is in single quotes in the call @VAR('int_var')
:
A default value can be passed as a second argument to the @VAR()
macro function, which will be used as a fallback value if the variable is missing from the configuration file.
In this example, the WHERE
clause would render to WHERE some_value = 0
because no variable named missing_var
was defined in the project configuration file:
A similar API is available for Python macro functions via the evaluator.var
method and Python models via the context.var
method.
Gateway variables
Like global variables, gateway variables are defined in the project configuration file. However, they are specified in a specific gateway's variables
key:
Access them in models using the same methods as global variables.
Gateway-specific variable values take precedence over variables with the same name specified in the root variables
key.
Local variables
Local macro variables are defined in a model. Local variable values take precedence over global or gateway-specific variables with the same name.
Define your own local macro variables with the @DEF
macro operator. For example, you could set the macro variable macro_var
to the value 1
with:
SQLMesh has three basic requirements for using the @DEF
operator:
- The
MODEL
statement must end with a semi-colon;
- All
@DEF
uses must come after theMODEL
statement and before the SQL query - Each
@DEF
use must end with a semi-colon;
For example, consider the following model sqlmesh_example.full_model
from the SQLMesh quickstart guide:
This model could be extended with a user-defined macro variable to filter the query results based on item_size
like this:
This example defines the macro variable size
with @DEF(size, 1)
. When the model is run, SQLMesh will substitute in the number 1
where @size
appears in the WHERE
clause.
Macro functions
In addition to inline user-defined variables, SQLMesh also supports inline macro functions. These functions can be used to express more readable and reusable logic than is possible with variables alone. Lets look at an example:
Multiple arguments can be expressed in a macro function as well:
You can nest macro functions like so:
Macro operators
SQLMesh's macro system has multiple operators that allow different forms of dynamic behavior in models.
@EACH
@EACH
is used to transform a list of items by applying a function to each of them, analogous to a for
loop.
Learn more about for
loops and @EACH
Before diving into the @EACH
operator, let's dissect a for
loop to understand its components.
for
loops have two primary parts: a collection of items and an action that should be taken for each item. For example, here is a for
loop in Python:
This for loop prints each number present in the brackets:
The first line of the example sets up the loop, doing two things:
- Telling Python that code inside the loop will refer to each item as
number
- Telling Python to step through the list of items in brackets
The second line tells Python what action should be taken for each item. In this case, it prints the item.
The loop executes one time for each item in the list, substituting in the item for the word number
in the code. For example, the first time through the loop the code would execute as print(4)
and the second time as print(5)
.
The SQLMesh @EACH
operator is used to implement the equivalent of a for
loop in SQLMesh macros.
@EACH
gets its name from the fact that a loop performs the action "for each" item in the collection. It is fundamentally equivalent to the Python loop above, but you specify the two loop components differently.
@EACH
takes two arguments: a list of items and a function definition.
The function definition is specified inline. This example specifies the identity function, returning the input unmodified:
The loop is set up by the first argument: @EACH([4, 5, 6]
tells SQLMesh to step through the list of items in brackets.
The second argument number -> number
tells SQLMesh what action should be taken for each item using an "anonymous" function (aka "lambda" function). The left side of the arrow states what name the code on the right side will refer to each item as (like name
in for [name] in [items]
in a Python for
loop).
The right side of the arrow specifies what should be done to each item in the list. number -> number
tells @EACH
that for each item number
it should return that item (e.g., 1
).
SQLMesh macros use their semantic understanding of SQL code to take automatic actions based on where in a SQL query macro variables are used. If @EACH
is used in the SELECT
clause of a SQL statement:
- It prints the item
- It knows fields are separated by commas in
SELECT
, so it automatically separates the printed items with commas
Because of the automatic print and comma-separation, the anonymous function number -> number
tells @EACH
that for each item number
it should print the item and separate the items with commas. Therefore, the complete output from the example is:
This basic example is too simple to be useful. Many uses of @EACH
will involve using the values as one or both of a literal value and an identifier.
For example, a column favorite_number
in our data might contain values 4
, 5
, and 6
, and we want to unpack that column into three indicator (i.e., binary, dummy, one-hot encoded) columns. We could write that by hand as:
In that SQL query each number is being used in two distinct ways. For example, 4
is being used:
- As a literal numeric value in
favorite_number = 4
- As part of a column name in
favorite_4
We describe each of these uses separately.
For the literal numeric value, @EACH
substitutes in the exact value that is passed in the brackets, including quotes. For example, consider this query similar to the CASE WHEN
example above:
It renders to this SQL:
Note that the number 4
, 5
, and 6
are unquoted in both the input @EACH
array in brackets and the resulting SQL query.
We can instead quote them in the input @EACH
array:
And they will be quoted in the resulting SQL query:
We can place the array values at the end of a column name by using the SQLMesh macro operator @
inside the @EACH
function definition:
This query will render to:
This syntax works regardless of whether the array values are quoted or not.
NOTE: SQLMesh macros support placing macro values at the end of a column name simply using column_@x
. However if you wish to substitute the variable anywhere else in the identifier, you need to use the more explicit substitution syntax @{}
. This avoids ambiguity. These are valid uses: @{x}_column
or my_@{x}_column
.
@IF
SQLMesh's @IF
macro allows components of a SQL query to change based on the result of a logical condition.
It includes three elements:
- A logical condition that evaluates to
TRUE
orFALSE
- A value to return if the condition is
TRUE
- A value to return if the condition is
FALSE
[optional]
These elements are specified as:
The value to return if the condition is FALSE
is optional - if it is not provided and the condition is FALSE
, then the macro has no effect on the resulting query.
The logical condition should be written in SQL and is evaluated with SQLGlot's SQL executor. It supports the following operators:
- Equality:
=
for equals,!=
or<>
for not equals - Comparison:
<
,>
,<=
,>=
, - Between:
[number] BETWEEN [low number] AND [high number]
- Membership:
[item] IN ([comma-separated list of items])
For example, the following simple conditions are all valid SQL and evaluate to TRUE
:
'a' = 'a'
'a' != 'b'
0 < 1
1 >= 1
2 BETWEEN 1 AND 3
'a' IN ('a', 'b')
@IF
can be used to modify any part of a SQL query. For example, this query conditionally includes sensitive_col
in the query results:
Because 1 > 0
evaluates to TRUE
, the query is rendered as:
Note that @IF(1 > 0, sensitive_col)
does not include the third argument specifying a value if FALSE
. Had the condition evaluated to FALSE
, @IF
would return nothing and only col1
would be selected.
Alternatively, we could specify that nonsensitive_col
be returned if the condition evaluates to FALSE
:
Because 1 > 2
evaluates to FALSE
, the query is rendered as:
Pre/post-statements
@IF
may be used to conditionally execute pre/post-statements:
The @IF
statement itself must end with a semi-colon, but the inner statement argument must not.
This example conditionally executes a pre/post-statement depending on the model's runtime stage, accessed via the pre-defined macro variable @runtime_stage
. The @IF
post-statement will only be executed at model evaluation time:
NOTE: alternatively, we could alter a column's type if the @runtime_stage = 'creating'
, but that would only be useful if the model is incremental and the alteration would persist. FULL
models are rebuilt on each evaluation, so changes made at their creation stage will be overwritten each time the model is evaluated.
@EVAL
@EVAL
evaluates its arguments with SQLGlot's SQL executor.
It allows you to execute mathematical or other calculations in SQL code. It behaves similarly to the first argument of the @IF
operator, but it is not limited to logical conditions.
For example, consider a query adding 5 to a macro variable:
After macro variable substitution, this would render as @EVAL(5 + 1)
and be evaluated to 6
, resulting in the final rendered query:
@FILTER
@FILTER
is used to subset an input array of items to only those meeting the logical condition specified in the anonymous function. Its output can be consumed by other macro operators such as @EACH
or @REDUCE
.
The user-specified anonymous function must evaluate to TRUE
or FALSE
. @FILTER
applies the function to each item in the array, only including the item in the output array if it meets the condition.
The anonymous function should be written in SQL and is evaluated with SQLGlot's SQL executor. It supports standard SQL equality and comparison operators - see @IF
above for more information about supported operators.
For example, consider this @FILTER
call:
It applies the condition x > 1
to each item in the input array [1,2,3]
and returns [2,3]
.
@REDUCE
@REDUCE
is used to combine the items in an array.
The anonymous function specifies how the items in the input array should be combined. In contrast to @EACH
and @FILTER
, the anonymous function takes two arguments whose values are named in parentheses.
For example, an anonymous function for @EACH
might be specified x -> x + 1
. The x
to the left of the arrow tells SQLMesh that the array items will be referred to as x
in the code to the right of the arrow.
Because the @REDUCE
anonymous function takes two arguments, the text to the left of the arrow must contain two comma-separated names in parentheses. For example, (x, y) -> x + y
tells SQLMesh that items will be referred to as x
and y
in the code to the right of the arrow.
Even though the anonymous function takes only two arguments, the input array can contain as many items as necessary.
Consider the anonymous function (x, y) -> x + y
. Conceptually, only the y
argument corresponds to items in the array; the x
argument is a temporary value created when the function is evaluated.
For the call @REDUCE([1,2,3,4], (x, y) -> x + y)
, the anonymous function is applied to the array in the following steps:
- Take the first two items in the array as
x
andy
. Apply the function to them:1 + 2
=3
. - Take the output of step (1) as
x
and the next item in the array3
asy
. Apply the function to them:3 + 3
=6
. - Take the output of step (2) as
x
and the next item in the array4
asy
. Apply the function to them:6 + 4
=10
. - No items remain. Return value from step (3):
10
.
@REDUCE
will almost always be used with another macro operator. For example, we might want to build a WHERE
clause from multiple column names:
We can use @EACH
to build each column's predicate (e.g., col1 = 1
) and @REDUCE
to combine them into a single statement:
@STAR
@STAR
is used to return a set of column selections in a query.
@STAR
is named after SQL's star operator *
, but it allows you to programmatically generate a set of column selections and aliases instead of just selecting all available columns. A query may use more than one @STAR
and may also include explicit column selections.
@STAR
uses SQLMesh's knowledge of each table's columns and data types to generate the appropriate column list. The resulting query always CAST
s columns to their data type in the source table.
@STAR
supports the following arguments, in this order:
relation
: The relation/table whose columns are being selectedalias
(optional): The alias of the relation (if it has one)except
(optional): A list of columns to excludeprefix
(optional): A string to use as a prefix for all selected column namessuffix
(optional): A string to use as a suffix for all selected column namesquote_identifiers
(optional): Whether to quote the resulting identifiers, defaults to true
SQLMesh macro operators do not accept named arguments. For example, @STAR(relation=a)
will error.
For example, consider the following query:
The arguments to @STAR
are:
1. The name of the table foo
(from the query's FROM foo
)
2. The table alias bar
(from the query's AS bar
)
3. A list of columns to exclude from the selection, containing one column c
4. A string baz_
to use as a prefix for all column names
5. A string _qux
to use as a suffix for all column names
foo
is a table that contains four columns: a
(TEXT
), b
(TEXT
), c
(TEXT
) and d
(INT
). After macro expansion, the query would be rendered as:
Note these aspects of the rendered query:
- Each column is CAST
to its data type in the table foo
(e.g., a
to TEXT
)
- Each column selection uses the alias bar
(e.g., "bar"."a"
)
- Column c
is not present because it was passed to @STAR
's except
argument
- Each column alias is prefixed with baz_
and suffixed with _qux
(e.g., "baz_a_qux"
)
Now consider a more complex example that provides different prefixes to a
and b
than to d
and includes an explicit column my_column
:
As before, foo
is a table that contains four columns: a
(TEXT
), b
(TEXT
), c
(TEXT
) and d
(INT
). After macro expansion, the query would be rendered as:
Note these aspects of the rendered query:
- Columns a
and b
have the prefix "ab_pre_"
, while column d
has the prefix "d_pre_"
- Column c
is not present because it was passed to the except
argument in both @STAR
calls
- my_column
is present in the query
@GENERATE_SURROGATE_KEY
@GENERATE_SURROGATE_KEY
generates a surrogate key from a set of columns. The surrogate key is a sequence of alphanumeric digits returned by the MD5
hash function on the concatenated column values.
The surrogate key is created by:
1. CAST
ing each column's value to TEXT
(or the SQL engine's equivalent type)
2. Replacing NULL
values with the text '_sqlmesh_surrogate_key_null_'
for each column
3. Concatenating the column values after steps (1) and (2)
4. Applying the MD5()
hash function to the concatenated value returned by step (3)
For example, the following query:
would be rendered as:
@SAFE_ADD
@SAFE_ADD
adds two or more operands, substituting NULL
s with 0
s. It returns NULL
if all operands are NULL
.
For example, the following query:
would be rendered as:@SAFE_SUB
@SAFE_SUB
subtracts two or more operands, substituting NULL
s with 0
s. It returns NULL
if all operands are NULL
.
For example, the following query:
would be rendered as:@SAFE_DIV
@SAFE_DIV
divides two numbers, returning NULL
if the denominator is 0
.
For example, the following query:
would be rendered as:@UNION
@UNION
returns a UNION
query that selects all columns with matching names and data types from the tables.
Its first argument is the UNION
"type", 'DISTINCT
(removing duplicated rows) or 'ALL'
(returning all rows). Subsequent arguments are the tables to be combined.
Let's assume that:
foo
is a table that contains three columns:a
(INT
),b
(TEXT
),c
(TEXT
)bar
is a table that contains three columns:a
(INT
),b
(INT
),c
(TEXT
)
Then, the following expression:
would be rendered as:
@HAVERSINE_DISTANCE
@HAVERSINE_DISTANCE
returns the haversine distance between two geographic points.
It supports the following arguments, in this order:
lat1
: Latitude of the first pointlon1
: Longitude of the first pointlat2
: Latitude of the second pointlon2
: Longitude of the second pointunit
(optional): The measurement unit, currently only'mi'
(miles, default) and'km'
(kilometers) are supported
SQLMesh macro operators do not accept named arguments. For example, @HAVERSINE_DISTANCE(lat1=lat_column)
will error.
For example, the following query:
would be rendered as:
@PIVOT
@PIVOT
returns a set of columns as a result of pivoting an input column on the specified values. This operation is sometimes described a pivoting from a "long" format (multiple values in a single column) to a "wide" format (one value in each of multiple columns).
It supports the following arguments, in this order:
column
: The column to pivotvalues
: The values to use for pivoting (one column is created for each value invalues
)alias
: Whether to create aliases for the resulting columns, defaults to trueagg
(optional): The aggregation function to use, defaults toSUM
cmp
(optional): The comparison operator to use for comparing the column values, defaults to=
prefix
(optional): A prefix to use for all aliasessuffix
(optional): A suffix to use for all aliasesthen_value
(optional): The value to be used if the comparison succeeds, defaults to1
else_value
(optional): The value to be used if the comparison fails, defaults to0
quote
(optional): Whether to quote the resulting aliases, defaults to truedistinct
(optional): Whether to apply aDISTINCT
clause for the aggregation function, defaults to false
SQLMesh macro operators do not accept named arguments. For example, @PIVOT(column=column_to_pivot)
will error.
For example, the following query:
would be rendered as:
@AND
@AND
combines a sequence of operands using the AND
operator, filtering out any NULL expressions.
For example, the following expression:
would be rendered as:
@OR
@OR
combines a sequence of operands using the OR
operator, filtering out any NULL expressions.
For example, the following expression:
would be rendered as:
SQL clause operators
SQLMesh's macro system has six operators that correspond to different clauses in SQL syntax. They are:
@WITH
: common table expressionWITH
clause@JOIN
: tableJOIN
clause(s)@WHERE
: filteringWHERE
clause@GROUP_BY
: groupingGROUP BY
clause@HAVING
: group by filteringHAVING
clause@ORDER_BY
: orderingORDER BY
clause@LIMIT
: limitingLIMIT
clause
Each of these operators is used to dynamically add the code for its corresponding clause to a model's SQL query.
How SQL clause operators work
The SQL clause operators take a single argument that determines whether the clause is generated.
If the argument is TRUE
the clause code is generated, if FALSE
the code is not. The argument should be written in SQL and its value is evaluated with SQLGlot's SQL engine.
Each SQL clause operator may only be used once in a query, but any common table expressions or subqueries may contain their own single use of the operator as well.
As an example of SQL clause operators, let's revisit the example model from the User-defined Variables section above.
As written, the model will always include the WHERE
clause. We could make its presence dynamic by using the @WHERE
macro operator:
The @WHERE
argument is set to TRUE
, so the WHERE code is included in the rendered model:
If the @WHERE
argument were instead set to FALSE
the WHERE
clause would be omitted from the query.
These operators aren't too useful if the argument's value is hard-coded. Instead, the argument can consist of code executable by the SQLGlot SQL executor.
For example, the WHERE
clause will be included in this query because 1 less than 2:
The operator's argument code can include macro variables.
In this example, the two numbers being compared are defined as macro variables instead of being hard-coded:
The argument to @WHERE
will be "1 < 2" as in the previous hard-coded example after the macro variables left_number
and right_number
are substituted in.
SQL clause operator examples
This section provides brief examples of each SQL clause operator's usage.
The examples use variants of this simple select statement:
@WITH operator
The @WITH
operator is used to create common table expressions, or "CTEs."
CTEs are typically used in place of derived tables (subqueries in the FROM
clause) to make SQL code easier to read. Less commonly, recursive CTEs support analysis of hierarchical data with SQL.
renders to
@JOIN operator
The @JOIN
operator specifies joins between tables or other SQL objects; it supports different join types (e.g., INNER, OUTER, CROSS, etc.).
renders to
The @JOIN
operator recognizes that LEFT OUTER
is a component of the JOIN
specification and will omit it if the @JOIN
argument evaluates to False.
@WHERE operator
The @WHERE
operator adds a filtering WHERE
clause(s) to the query when its argument evaluates to True.
renders to
@GROUP_BY operator
renders to
@HAVING operator
renders to
@ORDER_BY operator
renders to
@LIMIT operator
renders to
User-defined macro functions
User-defined macro functions allow the same macro code to be used in multiple models.
SQLMesh supports user-defined macro functions written in two languages - SQL and Python:
- SQL macro functions must use the Jinja templating system.
- Python macro functions use the SQLGlot library to allow more complex operations than macro variables and operators provide alone.
Python macro functions
Python macro functions should be placed in .py
files in the SQLMesh project's macros
directory. Multiple functions can be defined in one .py
file, or they can be distributed across multiple files.
An empty __init__.py
file must be present in the SQLMesh project's macros
directory. It will be created automatically when the project scaffold is created with sqlmesh init
.
Each .py
file containing a macro definition must import SQLMesh's macro
decorator with from sqlmesh import macro
.
Python macros are defined as regular python functions adorned with the SQLMesh @macro()
decorator. The first argument to the function must be evaluator
, which provides the macro evaluation context in which the macro function will run.
Python macros will parse all arguments with SQLGlot before they are used in the function body. Therefore, the function code exclusively processes SQLGlot expressions and may need to extract the expression's attributes/contents for use.
Python macro functions may return values of either string
or SQLGlot expression
types. SQLMesh will automatically parse returned strings into a SQLGlot expression after the function is executed so they can be incorporated into the model query's semantic representation.
Macro functions may return a list of strings or expressions that all play the same role in the query (e.g., specifying column definitions). For example, a list containing multiple CASE WHEN
statements would be incorporated into the query properly, but a list containing both CASE WHEN
statements and a WHERE
clause would not.
Macro function basics
This example demonstrates the core requirements for defining a python macro - it takes no user-supplied arguments and returns the string text
.
We could use this in a SQLMesh SQL model like this:
After processing, it will render to this:
Note that the python function returned a string 'text'
, but the rendered query uses text
as a column name. That is due to the function's returned text being parsed as SQL code and integrated into the query's semantic representation.
The rendered query will treat text
as a string if we double-quote the single-quoted value in the function definition as "'text'"
:
When run in the same model query as before, this will render to:
Returning more than one value
Macro functions are a convenient way to tidy model code by creating multiple outputs from one function call. Python macro functions do this by returning a list of strings or SQLGlot expressions.
For example, we might want to create indicator variables from the values in a string column. We can do that by passing in the name of column and a list of values for which it should create indicators, which we then interpolate into CASE WHEN
statements.
Because SQLMesh parses the input objects, they become SQLGLot expressions in the function body. Therefore, the function code cannot treat the input list as a regular Python list.
Two things will happen to the input Python list before the function code is executed:
-
Each of its entries will be parsed by SQLGlot. Different inputs are parsed into different SQLGlot expressions:
- Numbers are parsed into
Literal
expressions - Quoted strings are parsed into
Literal
expressions - Unquoted strings are parsed into
Column
expressions
- Numbers are parsed into
-
The parsed entries will be contained in a SQLGlot
Array
expression, the SQL entity analogous to a Python list
Because the input Array
expression named values
is not a Python list, we cannot iterate over it directly - instead, we iterate over its expressions
attribute with values.expressions
:
We call this function in a model query to create CASE WHEN
statements for the vehicle
column values truck
and bus
like this:
Which renders to:
Note that in the call @make_indicators(vehicle, [truck, bus])
none of the three values is quoted.
Because they are unquoted, SQLGlot will parse them all as Column
expressions. In the places we used single quotes when building the string ('{value}'
), they will be single-quoted in the output. In the places we did not quote them ({string_column} =
and {string_column}_{value}
), they will not.
Accessing predefined and local variable values
Pre-defined variables and user-defined local variables can be accessed within the macro's body via the evaluator.locals
attribute.
The first argument to every macro function, the macro evaluation context evaluator
, contains macro variable values in its locals
attribute. evaluator.locals
is a dictionary whose key:value pairs are macro variables names and the associated values.
For example, a function could access the predefined execution_epoch
variable containing the epoch timestamp of when the execution started.
The function would return the execution_epoch
value when called in a model query:
The same approach works for user-defined local macro variables, where the key "execution_epoch"
would be replaced with the name of the user-defined variable to be accessed.
One downside of that approach to accessing user-defined local variables is that the name of the variable is hard-coded into the function. A more flexible approach is to pass the name of the local macro variable as a function argument:
We could define a local macro variable my_macro_var
with a value of 1 and pass it to the get_macro_var
function like this:
The model query would render to:
Accessing global variable values
User-defined global variables can be accessed within the macro's body using the evaluator.var
method.
For example:
Accessing model schemas
Model schemas can be accessed within a Python macro function through its evaluation context's column_to_types
method, when they can be statically determined. For instance, a schema of an external model can be accessed only after the sqlmesh create_external_models
command has been executed.
As an example, consider the following macro function which aims to rename the columns of a target model by adding a prefix to them:
This can then be used in a SQL model as shown below:
Note that columns_to_types
expects an unquoted model name, such as schema.parent
. Since macro arguments are SQLGlot expressions, they need to be processed accordingly in order to extract meaningful information from them. For instance, the lookup key in the above macro definition is extracted by generating the SQL code for model_name
using the sql
method.
Having access to the schema of an upstream model can be useful for various reasons:
- Renaming columns so that downstream consumers are not tightly coupled to external or source tables
- Selecting only a subset of columns that satisfy some criteria (e.g. columns whose names start with a specific prefix)
- Applying transformations to columns, such as masking PII or computing various statistics based on the column types
Thus, leveraging columns_to_types
can also enable one to write code according to the DRY principle, as they can implement these transformations in a single function instead of duplicating them in each model of interest.
Accessing snapshots
After a SQLMesh project has been successfully loaded, its snapshots can be accessed in Python macro functions and Python models that generate SQL through the get_snapshot
method of MacroEvaluator
.
This enables the inspection of physical table names or the processed intervals for certain snapshots at runtime, as shown in the example below:
Using SQLGlot expressions
SQLMesh automatically parses strings returned by Python macro functions into SQLGlot expressions so they can be incorporated into the model query's semantic representation. Functions can also return SQLGlot expressions directly.
For example, consider a macro function that uses the BETWEEN
operator in the predicate of a WHERE
clause. A function returning the predicate as a string might look like this, where the function arguments are substituted into a Python f-string:
The function could then be called in a query:
And it would render to:
Alternatively, the function could return a SQLGLot expression equivalent to that string by using SQLGlot's expression methods for building semantic representations:
The methods are available because the column
argument is parsed as a SQLGlot Column expression when the macro function is executed.
Column expressions are sub-classes of the Condition class, so they have builder methods like between
and like
.
Mixing macro systems
SQLMesh supports both SQLMesh and Jinja macro systems. We strongly recommend using only one system in a model - if both are present, they may fail or behave in unintuitive ways.