Dataform

Overview

Dataform is a Google Cloud service for managing SQL workflows in BigQuery. Because it runs natively within GCP, no separate connection configuration is needed. By setting destination_dataset_id to a dataset that Dataform uses as a source, the Decode GA4 events table is written directly into a dataset Dataform can declare and reference.

Requirements

  • A Dataform repository in Google Cloud
  • Decode GA4 data exported using the events_external template, which stores transformed data in GCS as Parquet files and exposes it as an external BigQuery table
  • The Dataform service account must have Storage Object Viewer access to the GCS bucket where Decode GA4 stores its Parquet files
  • A Decode GA4 installation using destination_dataset_id set to a BigQuery dataset that Dataform uses as a source

Setup

1. Configure Decode GA4

Set destination_dataset_id to the BigQuery dataset you want Dataform to read from. Decode GA4 will write the events table into that dataset, making it available to declare as a Dataform source.

DECLARE options JSON;

SET options = JSON '''
{
    "ga4_dataset_id": "project_id.ga4_dataset_name",
    "transform_config_template": "events_external",
    "gcs_bucket_name": "bucketname",
    "destination_dataset_id": "project_id.dataform_sources"
}
''';

EXECUTE IMMEDIATE (
    SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
    );

CALL `project_id.ga4_dataset_name.install_decode_ga4`();

CALL `project_id.decode_ga4_dataset_name.RUN`(NULL);

2. Grant GCS access to the Dataform service account

The external events table reads Parquet files from GCS. The Dataform service account must be able to read from the bucket.

In the Google Cloud Console, grant the Storage Object Viewer role (roles/storage.objectViewer) on your Decode GA4 GCS bucket to the Dataform service account. The service account is listed under your Dataform repository settings.

3. Declare the Decode GA4 events table as a source

In your Dataform repository, create a declaration file (e.g. definitions/sources/events.sqlx):

config {
  type: "declaration",
  database: "your-gcp-project-id",
  schema: "dataform_sources",
  name: "events"
}

Replace dataform_sources with the dataset name you set as destination_dataset_id.

4. Reference the source in a model

Create a model file (e.g. definitions/pageviews.sqlx):

config { type: "table" }

select
    partition_date AS event_date,
    event_name,
    event_param.page_location,
    count(*) as event_count
from ${ref("events")}
where event_name = 'page_view'
group by partition_date, event_name, page_location

Further Reading