Edit on GitHub

This script's goal is to warn users about two situations that could lead to a diff:

  • They have blueprint models and some of their variables may be trimmed from python_env
  • Variables are used in metadata-only contexts, e.g., within metadata-only macros

Context:

We used to store all blueprint variables in python_env, even though some of them were redundant. For example, if a blueprint variable is only used in the model's name property, then it is rendered once, at load time, and after that point it's not needed elsewhere.

This behavior is now different: we only store the blueprint variables that are required to render expressions at runtime, such as model query or runtime-rendered properties, like merge_filter.

Additionally, variables were previously treated as non-metadata, regardless of how they were used. This behavior changed as well: SQLMesh now analyzes variable references and tracks the data flow, in order to detect whether changing them will result in a metadata diff for a given model.

Some examples where variables can be treated as metadata-only python_env executables are:

  • A variable is referenced in metadata-only macros
  • A variable is referenced in metadata-only expressions, such as virtual update statements
  • A variable is passed as argument to metadata-only macros
 1"""
 2This script's goal is to warn users about two situations that could lead to a diff:
 3
 4- They have blueprint models and some of their variables may be trimmed from `python_env`
 5- Variables are used in metadata-only contexts, e.g., within metadata-only macros
 6
 7Context:
 8
 9We used to store *all* blueprint variables in `python_env`, even though some of them were
10redundant. For example, if a blueprint variable is only used in the model's `name` property,
11then it is rendered once, at load time, and after that point it's not needed elsewhere.
12
13This behavior is now different: we only store the blueprint variables that are required to render
14expressions at runtime, such as model query or runtime-rendered properties, like `merge_filter`.
15
16Additionally, variables were previously treated as non-metadata, regardless of how they were used.
17This behavior changed as well: SQLMesh now analyzes variable references and tracks the data flow,
18in order to detect whether changing them will result in a metadata diff for a given model.
19
20Some examples where variables can be treated as metadata-only `python_env` executables are:
21
22- A variable is referenced in metadata-only macros
23- A variable is referenced in metadata-only expressions, such as virtual update statements
24- A variable is passed as argument to metadata-only macros
25"""
26
27import json
28
29from sqlglot import exp
30
31from sqlmesh.core.console import get_console
32
33SQLMESH_VARS = "__sqlmesh__vars__"
34SQLMESH_BLUEPRINT_VARS = "__sqlmesh__blueprint__vars__"
35METADATA_HASH_EXPRESSIONS = {"on_virtual_update", "audits", "signals", "audit_definitions"}
36
37
38def migrate_schemas(engine_adapter, schema, **kwargs):  # type: ignore
39    pass
40
41
42def migrate_rows(engine_adapter, schema, **kwargs):  # type: ignore
43    snapshots_table = "_snapshots"
44    if schema:
45        snapshots_table = f"{schema}.{snapshots_table}"
46
47    warning = (
48        "SQLMesh detected that it may not be able to fully migrate the state database. This should not impact "
49        "the migration process, but may result in unexpected changes being reported by the next `sqlmesh plan` "
50        "command. Please run `sqlmesh diff prod` after the migration has completed, before making any new "
51        "changes. If any unexpected changes are reported, consider running a forward-only plan to apply these "
52        "changes and avoid unnecessary backfills: sqlmesh plan prod --forward-only. "
53        "See https://sqlmesh.readthedocs.io/en/stable/concepts/plans/#forward-only-plans for more details.\n"
54    )
55
56    for (snapshot,) in engine_adapter.fetchall(
57        exp.select("snapshot").from_(snapshots_table), quote_identifiers=True
58    ):
59        parsed_snapshot = json.loads(snapshot)
60        node = parsed_snapshot["node"]
61
62        # Standalone audits don't have a data hash, so they're unaffected
63        if node.get("source_type") == "audit":
64            continue
65
66        python_env = node.get("python_env") or {}
67
68        if SQLMESH_BLUEPRINT_VARS in python_env or (
69            SQLMESH_VARS in python_env
70            and (
71                any(v.get("is_metadata") for v in python_env.values())
72                or any(node.get(k) for k in METADATA_HASH_EXPRESSIONS)
73            )
74        ):
75            get_console().log_warning(warning)
76            return
SQLMESH_VARS = '__sqlmesh__vars__'
SQLMESH_BLUEPRINT_VARS = '__sqlmesh__blueprint__vars__'
METADATA_HASH_EXPRESSIONS = {'signals', 'audit_definitions', 'audits', 'on_virtual_update'}
def migrate_schemas(engine_adapter, schema, **kwargs):
39def migrate_schemas(engine_adapter, schema, **kwargs):  # type: ignore
40    pass
def migrate_rows(engine_adapter, schema, **kwargs):
43def migrate_rows(engine_adapter, schema, **kwargs):  # type: ignore
44    snapshots_table = "_snapshots"
45    if schema:
46        snapshots_table = f"{schema}.{snapshots_table}"
47
48    warning = (
49        "SQLMesh detected that it may not be able to fully migrate the state database. This should not impact "
50        "the migration process, but may result in unexpected changes being reported by the next `sqlmesh plan` "
51        "command. Please run `sqlmesh diff prod` after the migration has completed, before making any new "
52        "changes. If any unexpected changes are reported, consider running a forward-only plan to apply these "
53        "changes and avoid unnecessary backfills: sqlmesh plan prod --forward-only. "
54        "See https://sqlmesh.readthedocs.io/en/stable/concepts/plans/#forward-only-plans for more details.\n"
55    )
56
57    for (snapshot,) in engine_adapter.fetchall(
58        exp.select("snapshot").from_(snapshots_table), quote_identifiers=True
59    ):
60        parsed_snapshot = json.loads(snapshot)
61        node = parsed_snapshot["node"]
62
63        # Standalone audits don't have a data hash, so they're unaffected
64        if node.get("source_type") == "audit":
65            continue
66
67        python_env = node.get("python_env") or {}
68
69        if SQLMESH_BLUEPRINT_VARS in python_env or (
70            SQLMESH_VARS in python_env
71            and (
72                any(v.get("is_metadata") for v in python_env.values())
73                or any(node.get(k) for k in METADATA_HASH_EXPRESSIONS)
74            )
75        ):
76            get_console().log_warning(warning)
77            return