SQLMesh
Overview
SQLMesh is an open-source data transformation framework with native BigQuery support. By setting destination_dataset_id to one of your SQLMesh source datasets, the Decode GA4 events table is written directly into a dataset that SQLMesh can reference as an external model — no intermediate loading step required.
Requirements
- SQLMesh with the BigQuery extra
- Decode GA4 data exported using the `events_external` template, which stores transformed data in GCS as Parquet files and exposes it as an external BigQuery table
- The BigQuery service account (or user account) used by SQLMesh must have Storage Object Viewer access to the GCS bucket where Decode GA4 stores its Parquet files
- A Decode GA4 installation with `destination_dataset_id` set to a BigQuery dataset that SQLMesh uses as a source
Setup
1. Install SQLMesh with BigQuery support

```shell
pip install "sqlmesh[bigquery]"
```

2. Configure Decode GA4
Set `destination_dataset_id` to the BigQuery dataset you want SQLMesh to read from. Decode GA4 will write the events table into that dataset, making it available as a SQLMesh external model.
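Before deploying, the options JSON can be sanity-checked locally; the following is a minimal Python sketch, assuming the four keys shown in this guide's example are the ones required (`validate_options` is a hypothetical helper, not part of Decode GA4):

```python
import json

# Keys used by the deployment options in this guide (assumed required
# for the events_external template; check the Decode GA4 docs).
REQUIRED_KEYS = {
    "ga4_dataset_id",
    "transform_config_template",
    "gcs_bucket_name",
    "destination_dataset_id",
}

def validate_options(options_json: str) -> dict:
    """Parse the options JSON and check the expected keys are present."""
    options = json.loads(options_json)
    missing = REQUIRED_KEYS - options.keys()
    if missing:
        raise ValueError(f"missing option keys: {sorted(missing)}")
    # Dataset ids in this guide are fully qualified: project_id.dataset_name
    for key in ("ga4_dataset_id", "destination_dataset_id"):
        if "." not in options[key]:
            raise ValueError(f"{key} should look like 'project_id.dataset_name'")
    return options

options = validate_options(json.dumps({
    "ga4_dataset_id": "project_id.ga4_dataset_name",
    "transform_config_template": "events_external",
    "gcs_bucket_name": "bucketname",
    "destination_dataset_id": "project_id.sqlmesh_sources",
}))
```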
```sql
DECLARE options JSON;

SET options = JSON '''
{
  "ga4_dataset_id": "project_id.ga4_dataset_name",
  "transform_config_template": "events_external",
  "gcs_bucket_name": "bucketname",
  "destination_dataset_id": "project_id.sqlmesh_sources"
}
''';

EXECUTE IMMEDIATE (
  SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
);

CALL `project_id.ga4_dataset_name.install_decode_ga4`();
CALL `project_id.decode_ga4_dataset_name.RUN`(NULL);
```

3. Grant GCS access to your BigQuery principal
The external events table reads Parquet files from GCS. The account SQLMesh uses to query BigQuery must also be able to read from the bucket.
In the Google Cloud Console, grant the Storage Object Viewer role (roles/storage.objectViewer) on your Decode GA4 GCS bucket to the service account or user account that SQLMesh authenticates with.
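Equivalently, the binding can be added from the command line; a sketch, where the bucket name and service-account email are placeholders:

```shell
# Grant read access on the Decode GA4 bucket to the principal SQLMesh
# queries BigQuery with. Bucket name and service-account email below
# are placeholders -- substitute your own.
gcloud storage buckets add-iam-policy-binding gs://bucketname \
  --member="serviceAccount:sqlmesh-runner@project_id.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```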
4. Configure your SQLMesh project
In your project's `config.yaml`, add a BigQuery gateway:
```yaml
gateways:
  default:
    connection:
      type: bigquery
      project: your-gcp-project-id
      location: EU
model_defaults:
  dialect: bigquery
  start: 2020-01-01
```

Authenticate with:

```shell
gcloud auth application-default login
```

For a service account, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your key file instead.
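For example, in the shell that launches SQLMesh (the key path is a placeholder):

```shell
# Point Application Default Credentials at a service-account key file
# instead of user credentials. The path is a placeholder.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"
```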
5. Declare the Decode GA4 events table as an external model
Create an `external_models.yaml` file in your project (or add to an existing one):
```yaml
- name: your-gcp-project-id.sqlmesh_sources.events
  gateway: default
```

Replace `sqlmesh_sources` with the dataset name you set as `destination_dataset_id`.
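Rather than writing this file by hand, SQLMesh can generate external model schemas by introspecting the tables your models reference (this requires a working BigQuery connection; check that your SQLMesh version supports the command):

```shell
# Introspect external tables referenced by project models and write
# their schemas to the external models file.
sqlmesh create_external_models
```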
6. Reference the source in a model
Create a model file (e.g. `models/pageviews.sql`):
```sql
MODEL (
  name your_project.your_dataset.pageviews,
  kind FULL
);

select
  partition_date as event_date,
  event_name,
  event_param.page_location,
  count(*) as event_count
from `your-gcp-project-id.sqlmesh_sources.events`
where event_name = 'page_view'
group by partition_date, event_name, page_location
```

Run your models with:

```shell
sqlmesh run
```
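Note that `sqlmesh run` only executes models whose changes have already been applied to the target environment; on first use, apply a plan before running:

```shell
# Apply pending model changes to the target environment, then execute
# models on their schedules.
sqlmesh plan
sqlmesh run
```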