SQLMesh

Overview

SQLMesh is an open-source data transformation framework with native BigQuery support. Set destination_dataset_id to one of your SQLMesh source datasets, and the Decode GA4 events table is written directly into a dataset that SQLMesh can reference as an external model — no intermediate loading step required.

Requirements

  • SQLMesh with the BigQuery extra
  • Decode GA4 data exported using the events_external template, which stores transformed data in GCS as Parquet files and exposes it as an external BigQuery table
  • The BigQuery service account (or user account) used by SQLMesh must have Storage Object Viewer access to the GCS bucket where Decode GA4 stores its Parquet files
  • A Decode GA4 installation with destination_dataset_id set to a BigQuery dataset that SQLMesh reads from as a source

Setup

1. Install SQLMesh with BigQuery support

pip install "sqlmesh[bigquery]"

2. Configure Decode GA4

Set destination_dataset_id to the BigQuery dataset you want SQLMesh to read from. Decode GA4 will write the events table into that dataset, making it available as a SQLMesh external model.

DECLARE options JSON;

SET options = JSON '''
{
    "ga4_dataset_id": "project_id.ga4_dataset_name",
    "transform_config_template": "events_external",
    "gcs_bucket_name": "bucketname",
    "destination_dataset_id": "project_id.sqlmesh_sources"
}
''';

EXECUTE IMMEDIATE (
    SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
);

CALL `project_id.ga4_dataset_name.install_decode_ga4`();

CALL `project_id.decode_ga4_dataset_name.RUN`(NULL);

3. Grant GCS access to your BigQuery principal

The external events table reads Parquet files from GCS. The account SQLMesh uses to query BigQuery must also be able to read from the bucket.

In the Google Cloud Console, grant the Storage Object Viewer role (roles/storage.objectViewer) on your Decode GA4 GCS bucket to the service account or user account that SQLMesh authenticates with.
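You can also grant the role from the command line. A sketch using gcloud — the service account name and bucket here are placeholders; substitute the principal SQLMesh actually authenticates with and your own bucket:

```shell
# Grant read access on the Decode GA4 bucket to the SQLMesh principal
# (service account name and bucket are illustrative placeholders)
gcloud storage buckets add-iam-policy-binding gs://bucketname \
  --member="serviceAccount:sqlmesh-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```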

4. Configure your SQLMesh project

In your project config.yaml, add a BigQuery gateway:

gateways:
  default:
    connection:
      type: bigquery
      project: your-gcp-project-id
      location: EU

model_defaults:
  dialect: bigquery
  start: 2020-01-01

Authenticate with:

gcloud auth application-default login

For a service account, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your key file instead.
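For example (the key-file path is illustrative):

```shell
# Point Application Default Credentials at a service account key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```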

5. Declare the Decode GA4 events table as an external model

Create an external_models.yaml file in your project (or add to an existing one):

- name: your-gcp-project-id.sqlmesh_sources.events
  gateway: default

Replace sqlmesh_sources with the dataset name you set as destination_dataset_id.
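If you would rather not maintain this file by hand, recent SQLMesh versions can generate it by introspecting the warehouse for every external table your models reference — check your version's documentation for availability:

```shell
# Populate external_models.yaml (including column schemas) from BigQuery
sqlmesh create_external_models
```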

6. Reference the source in a model

Create a model file (e.g. models/pageviews.sql):

MODEL (
  name your_project.your_dataset.pageviews,
  kind FULL
);

SELECT
    partition_date AS event_date,
    event_name,
    event_param.page_location,
    COUNT(*) AS event_count
FROM `your-gcp-project-id.sqlmesh_sources.events`
WHERE event_name = 'page_view'
GROUP BY partition_date, event_name, page_location

Run your models with:

sqlmesh run
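In a fresh project you typically apply a plan before the first run, so SQLMesh can register the new model and backfill it; sqlmesh run then executes models on their scheduled cadence:

```shell
sqlmesh plan   # review and apply changes, backfilling as needed
sqlmesh run    # execute models on their schedule
```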

Further Reading