Skip to content

Athena

Installation

pip install "sqlmesh[athena]"

Connection options

PyAthena connection options

SQLMesh leverages the PyAthena DBAPI driver to connect to Athena. Therefore, the connection options relate to the PyAthena connection options. Note that PyAthena uses boto3 under the hood so you can also use boto3 environment variables for configuration.

Option Description Type Required
type Engine type name - must be athena string Y
aws_access_key_id The access key for your AWS user string N
aws_secret_access_key The secret key for your AWS user string N
role_arn The ARN of a role to assume once authenticated string N
role_session_name The session name to use when assuming role_arn string N
region_name The AWS region to use string N
work_group The Athena workgroup to send queries to string N
s3_staging_dir The S3 location for Athena to write query results. Only required if not using work_group OR the configured work_group doesnt have a results location set string N
schema_name The default schema to place objects in if a schema isnt specified. Defaults to default string N
catalog_name The default catalog to place schemas in. Defaults to AwsDataCatalog string N

SQLMesh connection options

These options are specific to SQLMesh itself and are not passed to PyAthena

Option Description Type Required
s3_warehouse_location Set the base path in S3 where SQLMesh will instruct Athena to place table data. Only required if you arent specifying the location in the model itself. See S3 Locations below. string N

Model properties

The Athena adapter utilises the following model top-level properties:

Name Description Type Required
table_format Sets the table_type Athena uses when creating the table. Valid values are hive or iceberg. string N
storage_format Configures the file format to be used by the table_format. For Hive tables, this sets the STORED AS option. For Iceberg tables, this sets format property. string N

The Athena adapter recognises the following model physical_properties:

Name Description Type Default
s3_base_location s3:// base URI of where the snapshot tables for this model should be written. Overrides s3_warehouse_location if one is configured. string

S3 Locations

When creating tables, Athena needs to know where in S3 the table data is located. You cannot issue a CREATE TABLE statement without specifying a LOCATION for the table data.

In addition, unlike other engines such as Trino, Athena will not infer a table location if you set a schema location via CREATE SCHEMA <schema> LOCATION 's3://schema/location'.

Therefore, in order for SQLMesh to issue correct CREATE TABLE statements to Athena, you need to configure where the tables should be stored. There are two options for this:

  • Project-wide: set s3_warehouse_location in the connection config. SQLMesh will set the table LOCATION to be <s3_warehouse_location>/<schema_name>/<snapshot_table_name> when it creates a snapshot of your model.
  • Per-model: set s3_base_location in the model physical_properties. SQLMesh will set the table LOCATION to be <s3_base_location>/<snapshot_table_name> every time it creates a snapshot of your model. This takes precedence over any s3_warehouse_location set in the connection config.

Limitations

Athena was initially designed to read data stored in S3 and to do so without changing that data. This means that it does not have good support for mutating tables. In particular, it will not delete data from Hive tables.

Consequently, forward only changes that mutate the schemas of existing tables have a high chance of failure because Athena supports very limited schema modifications on Hive tables.

However, Athena does support Apache Iceberg tables which allow a full range of operations. These can be used for more complex model types such as INCREMENTAL_BY_UNIQUE_KEY and SCD_TYPE_2.

To use an Iceberg table for a model, set table_format iceberg in the model properties.

In general, Iceberg tables offer the most flexibility and you'll run into the least SQLMesh limitations when using them. However, we create Hive tables by default because Athena creates Hive tables by default, so Iceberg tables are opt-in rather than opt-out.