SQLMesh currently offers two ways of scheduling model evaluation: a built-in scheduler and an integration with Apache Airflow.
SQLMesh includes a built-in scheduler that schedules model evaluation without any additional tools or dependencies. It provides all the functionality needed to use SQLMesh in production.
By default, the scheduler stores your SQLMesh project's state (information about models, data, and run history) in the SQL engine used to execute your models. Some engines, such as BigQuery, are not optimized for the transactions the scheduler executes to store state, which may degrade the scheduler's performance.
When running the scheduler in production, we recommend evaluating its performance with your SQL engine. If you observe degraded performance, consider providing the scheduler its own transactional database such as PostgreSQL to improve performance. See the connections guide for more information on providing a separate database/engine for the scheduler.
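As an illustrative sketch only (the gateway name and credentials are placeholders, and the exact keys for your SQLMesh version are documented in the connections guide), state can be routed to PostgreSQL with a dedicated state connection:

```yaml
gateways:
  my_gateway:                # placeholder gateway name
    connection:
      type: bigquery         # engine that evaluates your models
      project: my-project    # placeholder GCP project ID
    state_connection:        # dedicated transactional store for SQLMesh state
      type: postgres
      host: localhost
      port: 5432
      user: sqlmesh          # placeholder credentials
      password: sqlmesh
      database: sqlmesh_state
```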
To perform model evaluation using the built-in scheduler, run the following command:
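```bash
sqlmesh run
```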
The command above automatically detects missing intervals for all models in the current project and then evaluates them.
The `sqlmesh run` command performs model evaluation based on the missing data intervals identified at the time it runs. It does not run continuously and exits once evaluation is complete, so you must invoke it periodically with a cron job, a CI/CD tool like Jenkins, or a similar mechanism.
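For example, a crontab entry like the following (the five-minute cadence and project path are illustrative assumptions, not recommendations) would trigger evaluation periodically:

```bash
# Illustrative crontab entry: evaluate any due intervals every 5 minutes.
# Adjust the path and cadence to your project and your models' crons.
*/5 * * * * cd /path/to/sqlmesh-project && sqlmesh run
```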
Integrating with Airflow
Configuring the Airflow cluster
SQLMesh natively integrates with the popular open source workflow orchestrator Apache Airflow, both self-hosted and managed (e.g. Google Cloud Composer, Amazon MWAA, Astronomer).
Install the SQLMesh Python package on all nodes of the Airflow cluster using the following command:
```bash
pip install sqlmesh
```
Note: The Airflow webserver must be restarted after installation.
Within the Airflow `dags/` folder, create a new Python file (e.g. `sqlmesh.py`). Within that file, add the following, making sure to replace `"spark"` with your engine if you are using a different one:
```python
from sqlmesh.schedulers.airflow.integration import SQLMeshAirflow

sqlmesh_airflow = SQLMeshAirflow("spark")

# Expose each SQLMesh-generated DAG at module level so that
# Airflow's DAG discovery can pick it up.
for dag in sqlmesh_airflow.dags:
    globals()[dag.dag_id] = dag
```
The example above uses `spark` as the engine of choice. Other engines can be configured instead by providing the corresponding string as an argument to the `SQLMeshAirflow` constructor. The supported strings are listed in Engine support.
Note: An Airflow Connection must be configured for each engine accordingly. Refer to Engine support for more details.
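As a rough sketch of what this can look like with the Airflow CLI (the connection ID, type, and host below are hypothetical; the ID your engine actually expects is described in Engine support):

```bash
# Hypothetical example: register a connection for the Spark engine.
# Replace the connection ID, type, and host with the values your engine requires.
airflow connections add 'spark_default' \
    --conn-type 'spark' \
    --conn-host 'spark://spark-master:7077'
```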
After setup is completed, the `sqlmesh_janitor_dag` DAG should become available in the Airflow UI when filtered by the `sqlmesh` tag.
Configuring the client
On the client side, you must configure the connection to your Airflow cluster in the `config.yaml` file as follows:
```yaml
default_scheduler:
  type: airflow
  airflow_url: http://localhost:8080/
  username: airflow
  password: airflow
```
Alternatively, the configuration above can be generated automatically as part of project initialization using the `sqlmesh init` command.
For Airflow configuration types specific to Google Cloud Composer, configure the file as follows:
```yaml
default_scheduler:
  type: cloud_composer
  airflow_url: https://XXXXXXXX.composer.googleusercontent.com/
```
Note: Guidelines for integrating with managed offerings other than Google Cloud Composer will be added later.
Run the `sqlmesh plan` command to apply all changes on the target Airflow cluster.
Below is example output from running the `sqlmesh plan` command in the example project generated by the `sqlmesh init` command:
```
$ sqlmesh plan
======================================================================
Successfully Ran 1 tests against duckdb
----------------------------------------------------------------------
Summary of differences against `prod`:
└── Added Models:
    ├── sqlmesh_example.example_incremental_model
    └── sqlmesh_example.example_full_model
Models needing backfill (missing dates):
├── sqlmesh_example.example_incremental_model: (2020-01-01, 2023-02-13)
└── sqlmesh_example.example_full_model: (2023-02-13, 2023-02-13)
Enter the backfill start date (eg. '1 year', '2020-01-01') or blank for the beginning of history: 2023-02-13
Apply - Backfill Tables [y/n]: y
Waiting for the plan application DAG 'sqlmesh_plan_application__prod__fb88a0c6_16f9_4a3e_93ec_7f8026bc878c' to be provisioned on Airflow
Track plan application progress using link
```
Once the command runs, the following DAGs will become available within the Airflow UI: