BigQuery
Introduction
This guide provides step-by-step instructions on how to connect SQLMesh to the BigQuery SQL engine.
It will walk you through the steps of installing SQLMesh and BigQuery connection libraries locally, configuring the connection in SQLMesh, and running the quickstart project.
Prerequisites
This guide assumes the following about the BigQuery project being used with SQLMesh:
- The project already exists
- Project CLI/API access is enabled
- Project billing is configured (i.e. it's not a sandbox project)
- SQLMesh can authenticate using an account with permissions to execute commands against the project
Installation
Follow the quickstart installation guide up to the step that installs SQLMesh, where we deviate to also install the necessary BigQuery libraries.
Instead of installing just SQLMesh core, we will also include the BigQuery engine libraries:
Install Google Cloud SDK
SQLMesh connects to BigQuery via the Python google-cloud-bigquery
library, which uses the Google Cloud SDK gcloud
tool for authenticating with BigQuery.
Follow these steps to install and configure the Google Cloud SDK on your computer:
- Download the appropriate installer for your system from the Google Cloud installation guide
-
Unpack the downloaded file with the
tar
command: -
Run the installation script:
-
Reload your shell profile (e.g., for zsh):
Configuration
Configure SQLMesh for BigQuery
Add the following gateway specification to your SQLMesh project's config.yaml
file:
This creates a gateway named bigquery
and makes it your project's default gateway.
It uses the oauth
authentication method, which does not specify a username or other information directly in the connection configuration. Other authentication methods are described below.
In BigQuery, navigate to the dashboard and select the BigQuery project your SQLMesh project will use. From the Google Cloud dashboard, use the arrow to open the pop-up menu:
Now we can identify the project ID needed in the config.yaml
gateway specification above. Select the project that you want to work with, the project ID that you need to add to your yaml file is the ID label from the pop-up menu.
For this guide, the Docs-Demo is the one we will use, thus the project ID for this example is healthy-life-440919-s0
.
Usage
Test the connection
Run the following command to verify that SQLMesh can connect to BigQuery:
The output will look something like this:
-
Set quota project (optional)
You may see warnings like this when you run
sqlmesh info
:You can avoid these warnings about quota projects by running:
Create and run a plan
We've verified our connection, so we're ready to create and execute a plan in BigQuery:
View results in BigQuery Console
Let's confirm that our project models are as expected.
First, navigate to the BigQuery Studio Console:
Then use the left sidebar to find your project and the newly created models:
We have confirmed that our SQLMesh project is running properly in BigQuery!
Local/Built-in Scheduler
Engine Adapter Type: bigquery
Installation
Connection options
Option | Description | Type | Required |
---|---|---|---|
type |
Engine type name - must be bigquery |
string | Y |
method |
Connection methods - see allowed values below. Default: oauth . |
string | N |
project |
The ID of the GCP project | string | N |
location |
The location of for the datasets (can be regional or multi-regional) | string | N |
execution_project |
The name of the GCP project to bill for the execution of the models. If not set, the project associated with the model will be used. | string | N |
quota_project |
The name of the GCP project used for the quota. If not set, the quota_project_id set within the credentials of the account is used to authenticate to BigQuery. |
string | N |
keyfile |
Path to the keyfile to be used with service-account method | string | N |
keyfile_json |
Keyfile information provided inline (not recommended) | dict | N |
token |
OAuth 2.0 access token | string | N |
refresh_token |
OAuth 2.0 refresh token | string | N |
client_id |
OAuth 2.0 client ID | string | N |
client_secret |
OAuth 2.0 client secret | string | N |
token_uri |
OAuth 2.0 authorization server's toke endpoint URI | string | N |
scopes |
The scopes used to obtain authorization | list | N |
job_creation_timeout_seconds |
The maximum amount of time, in seconds, to wait for the underlying job to be created. | int | N |
job_execution_timeout_seconds |
The maximum amount of time, in seconds, to wait for the underlying job to complete. | int | N |
job_retries |
The number of times to retry the underlying job if it fails. (Default: 1 ) |
int | N |
priority |
The priority of the underlying job. (Default: INTERACTIVE ) |
string | N |
maximum_bytes_billed |
The maximum number of bytes to be billed for the underlying job. | int | N |
Airflow Scheduler
Engine Name: bigquery
In order to share a common implementation across local and Airflow, SQLMesh BigQuery implements its own hook and operator.
Installation
To enable support for this operator, the Airflow BigQuery provider package should be installed on the target Airflow cluster along with SQLMesh with the BigQuery extra:
Connection info
The operator requires an Airflow connection to determine the target BigQuery account. Please see GoogleBaseHook and GCP connectionfor more details. Use the sqlmesh_google_cloud_bigquery_default
(by default) connection ID instead of the google_cloud_default
one in the Airflow guide.
By default, the connection ID is set to sqlmesh_google_cloud_bigquery_default
, but it can be overridden using the engine_operator_args
parameter to the SQLMeshAirflow
instance as in the example below:
Optional Arguments
location
: Sets the default location for datasets and tables. If not set, BigQuery defaults to US for new datasets. Seelocation
in Connection options for more details.
Authentication Methods
- oauth (default)
- Related Credential Configuration:
scopes
(Optional)
- Related Credential Configuration:
- oauth-secrets
- Related Credential Configuration:
token
(Optional): Can be None if refresh information is provided.refresh_token
(Optional): If specified, credentials can be refreshed.client_id
(Optional): Must be specified for refresh, can be left as None if the token can not be refreshed.client_secret
(Optional): Must be specified for refresh, can be left as None if the token can not be refreshed.token_uri
(Optional): Must be specified for refresh, can be left as None if the token can not be refreshed.scopes
(Optional): OAuth 2.0 credentials can not request additional scopes after authorization. The scopes must be derivable from the refresh token if refresh information is provided (e.g. The refresh token scopes are a superset of this or contain a wild card scope like 'https://www.googleapis.com/auth/any-api')
- Related Credential Configuration:
- service-account
- Related Credential Configuration:
keyfile
(Required)scopes
(Optional)
- Related Credential Configuration:
- service-account-json
- Related Credential Configuration:
keyfile_json
(Required)scopes
(Optional)
- Related Credential Configuration:
Permissions Required
With any of the above connection methods, ensure these BigQuery permissions are enabled to allow SQLMesh to work correctly.