This script's goal is to warn users about two situations that could lead to a diff:
- They have blueprint models and some of their variables may be trimmed from
python_env - Variables are used in metadata-only contexts, e.g., within metadata-only macros
Context:
We used to store all blueprint variables in python_env, even though some of them were
redundant. For example, if a blueprint variable is only used in the model's name property,
then it is rendered once, at load time, and after that point it's not needed elsewhere.
This behavior is now different: we only store the blueprint variables that are required to render
expressions at runtime, such as model query or runtime-rendered properties, like merge_filter.
Additionally, variables were previously treated as non-metadata, regardless of how they were used. This behavior changed as well: SQLMesh now analyzes variable references and tracks the data flow, in order to detect whether changing them will result in a metadata diff for a given model.
Some examples where variables can be treated as metadata-only python_env executables are:
- A variable is referenced in metadata-only macros
- A variable is referenced in metadata-only expressions, such as virtual update statements
- A variable is passed as argument to metadata-only macros
1""" 2This script's goal is to warn users about two situations that could lead to a diff: 3 4- They have blueprint models and some of their variables may be trimmed from `python_env` 5- Variables are used in metadata-only contexts, e.g., within metadata-only macros 6 7Context: 8 9We used to store *all* blueprint variables in `python_env`, even though some of them were 10redundant. For example, if a blueprint variable is only used in the model's `name` property, 11then it is rendered once, at load time, and after that point it's not needed elsewhere. 12 13This behavior is now different: we only store the blueprint variables that are required to render 14expressions at runtime, such as model query or runtime-rendered properties, like `merge_filter`. 15 16Additionally, variables were previously treated as non-metadata, regardless of how they were used. 17This behavior changed as well: SQLMesh now analyzes variable references and tracks the data flow, 18in order to detect whether changing them will result in a metadata diff for a given model. 19 20Some examples where variables can be treated as metadata-only `python_env` executables are: 21 22- A variable is referenced in metadata-only macros 23- A variable is referenced in metadata-only expressions, such as virtual update statements 24- A variable is passed as argument to metadata-only macros 25""" 26 27import json 28 29from sqlglot import exp 30 31from sqlmesh.core.console import get_console 32 33SQLMESH_VARS = "__sqlmesh__vars__" 34SQLMESH_BLUEPRINT_VARS = "__sqlmesh__blueprint__vars__" 35METADATA_HASH_EXPRESSIONS = {"on_virtual_update", "audits", "signals", "audit_definitions"} 36 37 38def migrate_schemas(engine_adapter, schema, **kwargs): # type: ignore 39 pass 40 41 42def migrate_rows(engine_adapter, schema, **kwargs): # type: ignore 43 snapshots_table = "_snapshots" 44 if schema: 45 snapshots_table = f"{schema}.{snapshots_table}" 46 47 warning = ( 48 "SQLMesh detected that it may not be able to fully migrate the state database. This should not impact " 49 "the migration process, but may result in unexpected changes being reported by the next `sqlmesh plan` " 50 "command. Please run `sqlmesh diff prod` after the migration has completed, before making any new " 51 "changes. If any unexpected changes are reported, consider running a forward-only plan to apply these " 52 "changes and avoid unnecessary backfills: sqlmesh plan prod --forward-only. " 53 "See https://sqlmesh.readthedocs.io/en/stable/concepts/plans/#forward-only-plans for more details.\n" 54 ) 55 56 for (snapshot,) in engine_adapter.fetchall( 57 exp.select("snapshot").from_(snapshots_table), quote_identifiers=True 58 ): 59 parsed_snapshot = json.loads(snapshot) 60 node = parsed_snapshot["node"] 61 62 # Standalone audits don't have a data hash, so they're unaffected 63 if node.get("source_type") == "audit": 64 continue 65 66 python_env = node.get("python_env") or {} 67 68 if SQLMESH_BLUEPRINT_VARS in python_env or ( 69 SQLMESH_VARS in python_env 70 and ( 71 any(v.get("is_metadata") for v in python_env.values()) 72 or any(node.get(k) for k in METADATA_HASH_EXPRESSIONS) 73 ) 74 ): 75 get_console().log_warning(warning) 76 return
43def migrate_rows(engine_adapter, schema, **kwargs): # type: ignore 44 snapshots_table = "_snapshots" 45 if schema: 46 snapshots_table = f"{schema}.{snapshots_table}" 47 48 warning = ( 49 "SQLMesh detected that it may not be able to fully migrate the state database. This should not impact " 50 "the migration process, but may result in unexpected changes being reported by the next `sqlmesh plan` " 51 "command. Please run `sqlmesh diff prod` after the migration has completed, before making any new " 52 "changes. If any unexpected changes are reported, consider running a forward-only plan to apply these " 53 "changes and avoid unnecessary backfills: sqlmesh plan prod --forward-only. " 54 "See https://sqlmesh.readthedocs.io/en/stable/concepts/plans/#forward-only-plans for more details.\n" 55 ) 56 57 for (snapshot,) in engine_adapter.fetchall( 58 exp.select("snapshot").from_(snapshots_table), quote_identifiers=True 59 ): 60 parsed_snapshot = json.loads(snapshot) 61 node = parsed_snapshot["node"] 62 63 # Standalone audits don't have a data hash, so they're unaffected 64 if node.get("source_type") == "audit": 65 continue 66 67 python_env = node.get("python_env") or {} 68 69 if SQLMESH_BLUEPRINT_VARS in python_env or ( 70 SQLMESH_VARS in python_env 71 and ( 72 any(v.get("is_metadata") for v in python_env.values()) 73 or any(node.get(k) for k in METADATA_HASH_EXPRESSIONS) 74 ) 75 ): 76 get_console().log_warning(warning) 77 return