
BigQuery

Introduction

This guide provides step-by-step instructions on how to connect SQLMesh to the BigQuery SQL engine.

It will walk you through the steps of installing SQLMesh and BigQuery connection libraries locally, configuring the connection in SQLMesh, and running the quickstart project.

Prerequisites

This guide assumes the following about the BigQuery project being used with SQLMesh:

Installation

Follow the quickstart installation guide up to the step that installs SQLMesh, where we deviate to also install the necessary BigQuery libraries.

Instead of installing just SQLMesh core, we will also include the BigQuery engine libraries:

> pip install "sqlmesh[bigquery]"
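
To confirm that the install succeeded and the BigQuery extra was picked up, you can ask the CLI for its version (an optional sanity check):

> sqlmesh --version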

Install Google Cloud SDK

SQLMesh connects to BigQuery via the Python google-cloud-bigquery library, which uses the Google Cloud SDK gcloud tool for authenticating with BigQuery.

Follow these steps to install and configure the Google Cloud SDK on your computer:

  • Download the appropriate installer for your system from the Google Cloud installation guide
  • Unpack the downloaded file with the tar command:

    > tar -xzvf google-cloud-cli-{SYSTEM_SPECIFIC_INFO}.tar.gz
    
  • Run the installation script:

    > ./google-cloud-sdk/install.sh
    
  • Reload your shell profile (e.g., for zsh):

    > source $HOME/.zshrc
    
  • Run gcloud init to set up authentication
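
Once gcloud init completes, it is worth confirming that credentials are in place before moving on. Both of the following are standard gcloud commands; the second is only needed if you have not yet created Application Default Credentials:

> gcloud auth list
> gcloud auth application-default login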

Configuration

Configure SQLMesh for BigQuery

Add the following gateway specification to your SQLMesh project's config.yaml file:

gateways:
  bigquery:
    connection:
      type: bigquery
      project: <your_project_id>

default_gateway: bigquery

This creates a gateway named bigquery and makes it your project's default gateway.

It uses the oauth authentication method, which does not specify a username or other information directly in the connection configuration. Other authentication methods are described below.
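
For example, a gateway using the service-account method instead would look like the sketch below; the keyfile path is a placeholder for the location of your own service account key:

gateways:
  bigquery:
    connection:
      type: bigquery
      method: service-account
      project: <your_project_id>
      # placeholder - point this at your own service account key
      keyfile: /path/to/service-account-key.json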

In BigQuery, navigate to the dashboard and select the BigQuery project your SQLMesh project will use. From the Google Cloud dashboard, use the arrow to open the pop-up menu:

BigQuery Dashboard

Now we can identify the project ID needed in the config.yaml gateway specification above. Select the project you want to work with; the project ID you need to add to your yaml file is the ID label in the pop-up menu.

BigQuery Dashboard: selecting your project

For this guide, we will use the Docs-Demo project, so the project ID for this example is healthy-life-440919-s0.
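
With that ID in hand, the complete config.yaml gateway specification for this example project reads:

gateways:
  bigquery:
    connection:
      type: bigquery
      project: healthy-life-440919-s0

default_gateway: bigquery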

Usage

Test the connection

Run the following command to verify that SQLMesh can connect to BigQuery:

> sqlmesh info

The output will look something like this:

Terminal Output

  • Set quota project (optional)

    You may see warnings like this when you run sqlmesh info:

    Terminal Output with warnings

    You can avoid these warnings about quota projects by running:

    > gcloud auth application-default set-quota-project <your_project_id>
    > gcloud config set project <your_project_id>
    

Create and run a plan

We've verified our connection, so we're ready to create and execute a plan in BigQuery:

> sqlmesh plan
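
If you would rather not answer the interactive prompt, the plan command also accepts an auto-apply flag:

> sqlmesh plan --auto-apply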

View results in BigQuery Console

Let's confirm that our project models are as expected.

First, navigate to the BigQuery Studio Console:

Steps to the Studio

Then use the left sidebar to find your project and the newly created models:

New Models

We have confirmed that our SQLMesh project is running properly in BigQuery!

Local/Built-in Scheduler

Engine Adapter Type: bigquery

Installation

pip install "sqlmesh[bigquery]"

Connection options

| Option | Description | Type | Required |
|--------|-------------|------|----------|
| type | Engine type name - must be bigquery | string | Y |
| method | Connection method - see allowed values below. (Default: oauth) | string | N |
| project | The ID of the GCP project | string | N |
| location | The location of the datasets (can be regional or multi-regional) | string | N |
| execution_project | The name of the GCP project to bill for the execution of the models. If not set, the project associated with the model will be used. | string | N |
| quota_project | The name of the GCP project used for the quota. If not set, the quota_project_id set within the credentials of the account used to authenticate to BigQuery is used. | string | N |
| keyfile | Path to the keyfile to be used with the service-account method | string | N |
| keyfile_json | Keyfile information provided inline (not recommended) | dict | N |
| token | OAuth 2.0 access token | string | N |
| refresh_token | OAuth 2.0 refresh token | string | N |
| client_id | OAuth 2.0 client ID | string | N |
| client_secret | OAuth 2.0 client secret | string | N |
| token_uri | OAuth 2.0 authorization server's token endpoint URI | string | N |
| scopes | The scopes used to obtain authorization | list | N |
| job_creation_timeout_seconds | The maximum amount of time, in seconds, to wait for the underlying job to be created | int | N |
| job_execution_timeout_seconds | The maximum amount of time, in seconds, to wait for the underlying job to complete | int | N |
| job_retries | The number of times to retry the underlying job if it fails (Default: 1) | int | N |
| priority | The priority of the underlying job (Default: INTERACTIVE) | string | N |
| maximum_bytes_billed | The maximum number of bytes to be billed for the underlying job | int | N |
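
To illustrate how these options combine, the gateway sketch below sets a dataset location, a retry policy, a job priority, and a cost guardrail; all values are illustrative placeholders, not recommendations:

gateways:
  bigquery:
    connection:
      type: bigquery
      project: <your_project_id>
      location: US                      # regional or multi-regional dataset location
      job_retries: 3                    # retry failed jobs up to 3 times (default: 1)
      priority: BATCH                   # queue jobs instead of the INTERACTIVE default
      maximum_bytes_billed: 1000000000  # illustrative cap: fail jobs that would scan more than ~1 GB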

Airflow Scheduler

Engine Name: bigquery

In order to share a common implementation across the local and Airflow schedulers, SQLMesh BigQuery implements its own hook and operator.

Installation

To enable support for this operator, the Airflow BigQuery provider package should be installed on the target Airflow cluster along with SQLMesh with the BigQuery extra:

pip install "apache-airflow-providers-google"
pip install "sqlmesh[bigquery]"

Connection info

The operator requires an Airflow connection to determine the target BigQuery account. Please see GoogleBaseHook and GCP connection for more details. Use the sqlmesh_google_cloud_bigquery_default connection ID (the default) instead of the google_cloud_default one from the Airflow guide.

By default, the connection ID is set to sqlmesh_google_cloud_bigquery_default, but it can be overridden using the engine_operator_args parameter to the SQLMeshAirflow instance as in the example below:

sqlmesh_airflow = SQLMeshAirflow(
    "bigquery",
    default_catalog="<project id>",
    engine_operator_args={
        "bigquery_conn_id": "<Connection ID>"
    },
)

Optional Arguments

  • location: Sets the default location for datasets and tables. If not set, BigQuery defaults to US for new datasets. See location in Connection options for more details.
sqlmesh_airflow = SQLMeshAirflow(
    "bigquery",
    default_catalog="<project id>",
    engine_operator_args={
        "bigquery_conn_id": "<Connection ID>",
        "location": "<location>"
    },
)

Authentication Methods

  • oauth (default)
    • Related Credential Configuration:
      • scopes (Optional)
  • oauth-secrets (a configuration sketch follows this list)
    • Related Credential Configuration:
      • token (Optional): Can be None if refresh information is provided.
      • refresh_token (Optional): If specified, credentials can be refreshed.
      • client_id (Optional): Must be specified for refresh; can be left as None if the token cannot be refreshed.
      • client_secret (Optional): Must be specified for refresh; can be left as None if the token cannot be refreshed.
      • token_uri (Optional): Must be specified for refresh; can be left as None if the token cannot be refreshed.
      • scopes (Optional): OAuth 2.0 credentials cannot request additional scopes after authorization. If refresh information is provided, the scopes must be derivable from the refresh token (e.g., the refresh token scopes are a superset of these or contain a wildcard scope like 'https://www.googleapis.com/auth/any-api').
  • service-account
    • Related Credential Configuration:
      • keyfile (Required)
      • scopes (Optional)
  • service-account-json
    • Related Credential Configuration:
      • keyfile_json (Required)
      • scopes (Optional)
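
As an illustration of the oauth-secrets method, a refreshable configuration might look like the sketch below; every credential value is a placeholder, and the token endpoint shown is Google's standard OAuth 2.0 token URI:

gateways:
  bigquery:
    connection:
      type: bigquery
      method: oauth-secrets
      project: <your_project_id>
      # all credential values below are placeholders
      refresh_token: <your_refresh_token>
      client_id: <your_client_id>
      client_secret: <your_client_secret>
      token_uri: https://oauth2.googleapis.com/token
      scopes:
        - https://www.googleapis.com/auth/bigquery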

Permissions Required

With any of the above connection methods, ensure these BigQuery permissions are enabled to allow SQLMesh to work correctly.