Scheduling guide

SQLMesh currently offers two ways of scheduling model evaluation:

Built-in scheduler

SQLMesh includes a built-in scheduler that schedules model evaluation without any additional tools or dependencies. It provides all the functionality needed to use SQLMesh in production.

By default, the scheduler stores your SQLMesh project's state (information about models, data, and run history) in the SQL engine used to execute your models. Some engines, such as BigQuery, are not optimized for the transactions the scheduler executes to store state, which may degrade the scheduler's performance.

When running the scheduler in production, we recommend evaluating its performance with your SQL engine. If you observe degraded performance, consider providing the scheduler with its own transactional database, such as PostgreSQL. See the connections guide for more information on configuring a separate database/engine for the scheduler.
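For example, a gateway in config.yaml might evaluate models on BigQuery while storing SQLMesh state in a dedicated PostgreSQL database. The snippet below is a minimal sketch: the project, host, and credential values are placeholders, and the exact connection fields for your engine are documented in the connections guide.

    gateways:
      my_gateway:
        connection:
          # Engine used to evaluate models
          type: bigquery
          project: my-gcp-project        # placeholder project
        state_connection:
          # Separate transactional database used only for SQLMesh state
          type: postgres
          host: state-db.internal        # placeholder host
          port: 5432
          database: sqlmesh_state
          user: sqlmesh
          password: example_password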

To perform model evaluation using the built-in scheduler, run the following command:

sqlmesh run

The command above will automatically detect missing intervals for all models in the current project and then evaluate them:

$ sqlmesh run

All model batches have been executed successfully

sqlmesh_example.example_incremental_model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0%  1/1  0:00:00
       sqlmesh_example.example_full_model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0%  1/1  0:00:00

Note: The sqlmesh run command performs model evaluation based on the missing data intervals identified at the time it is run. It does not run continuously and will exit once evaluation is complete. You must run this command periodically, for example with a cron job, a CI/CD tool like Jenkins, or a similar mechanism, as in the sketch below.
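For instance, a crontab entry along the following lines would trigger a run every five minutes. This is a sketch only: it assumes the sqlmesh executable is on the cron user's PATH and that the project lives at /opt/sqlmesh_project, both of which are placeholders for your own environment.

    */5 * * * * cd /opt/sqlmesh_project && sqlmesh run >> /var/log/sqlmesh_run.log 2>&1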

Integrating with Airflow

Configuring the Airflow cluster

SQLMesh natively integrates with the popular open source workflow orchestrator Apache Airflow, both self-hosted and managed (e.g. Google Cloud Composer, Amazon MWAA, Astronomer).

To integrate with Airflow, ensure that you meet the prerequisites, then perform the following:

  1. Install the SQLMesh Python package on all nodes of the Airflow cluster using the following command:

    pip install sqlmesh
    

    Note: The Airflow webserver must be restarted after installation.

  2. Within the Airflow dags/ folder, create a file called sqlmesh.py.

  3. Within that file, add the following, making sure to replace "spark" with your engine and spark_catalog with your default catalog:

    from sqlmesh.schedulers.airflow.integration import SQLMeshAirflow

    # Generates all DAGs that SQLMesh needs for the target engine and default catalog.
    sqlmesh_airflow = SQLMeshAirflow("spark", default_catalog="spark_catalog")

    # Expose each generated DAG in the module's namespace so Airflow can discover it.
    for dag in sqlmesh_airflow.dags:
        globals()[dag.dag_id] = dag
    

    The example above uses spark as the engine of choice. Other engines can be configured by passing the corresponding string to the SQLMeshAirflow constructor; supported strings are "spark", "databricks", "snowflake", "bigquery", "redshift", "trino", "mssql", and "mysql". See the Airflow Cluster Configuration for a full list of arguments and their descriptions.
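    For instance, a BigQuery-backed cluster would only change the constructor call in the file above. This is a sketch; "my-gcp-project" is a placeholder for your project's default catalog:

    # Only the engine string and default catalog change; the rest of sqlmesh.py stays the same.
    sqlmesh_airflow = SQLMeshAirflow("bigquery", default_catalog="my-gcp-project")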

Once setup is complete, the sqlmesh_janitor_dag DAG should become available in the Airflow UI when filtered by the sqlmesh tag:

Airflow UI after successful setup

Configuring the client

On the client side, you must configure the connection to your Airflow cluster in the config.yaml file as follows:

    default_scheduler:
      type: airflow
      airflow_url: http://localhost:8080/
      username: airflow
      password: airflow

Alternatively, the configuration above can be generated automatically as part of the project initialization using the airflow template:

sqlmesh init [PROJECT SQL DIALECT] -t airflow

For Google Cloud Composer, use the cloud_composer scheduler type and configure the file as follows:

    default_scheduler:
      type: cloud_composer
      airflow_url: https://XXXXXXXX.composer.googleusercontent.com/

Note: Guidelines for integrating with managed offerings other than Google Cloud Composer will be added later.

Running the plan command

Run the sqlmesh plan command to apply all changes to the target Airflow cluster.

Below is example output from running the sqlmesh plan command in the example project generated by the sqlmesh init command:

$ sqlmesh plan
======================================================================
Successfully Ran 1 tests against duckdb
----------------------------------------------------------------------
Summary of differences against `prod`:
└── Added Models:
    ├── sqlmesh_example.example_incremental_model
    └── sqlmesh_example.example_full_model
Models needing backfill (missing dates):
├── sqlmesh_example.example_incremental_model: (2020-01-01, 2023-02-13)
└── sqlmesh_example.example_full_model: (2023-02-13, 2023-02-13)
Enter the backfill start date (eg. '1 year', '2020-01-01') or blank for the beginning of history: 2023-02-13
Apply - Backfill Tables [y/n]: y
Waiting for the plan application DAG 'sqlmesh_plan_application__prod__fb88a0c6_16f9_4a3e_93ec_7f8026bc878c' to be provisioned on Airflow
Track plan application progress using link

Once the command runs, the following DAGs will become available within the Airflow UI:

Airflow UI after successful plan application