dbt

Overview

dbt (data build tool) is an open-source transformation framework that lets you write SQL models, manage dependencies between them, and test the data in your warehouse. By setting destination_dataset_id to one of your dbt source datasets, Decode GA4 writes its events table directly into a dataset that dbt can reference as a source — no intermediate loading step required.

Requirements

  • dbt Core with the dbt-bigquery adapter, or a dbt Cloud account
  • Decode GA4 data exported using the events_external template, which stores transformed data in GCS as Parquet files and exposes it as an external BigQuery table
  • The BigQuery service account (or user account) used by dbt must have Storage Object Viewer access to the GCS bucket where Decode GA4 stores its Parquet files
  • A Decode GA4 installation with destination_dataset_id set to the BigQuery dataset that dbt reads as a source

Setup

1. Install dbt-bigquery

pip install dbt-bigquery

2. Configure Decode GA4

Set destination_dataset_id to the BigQuery dataset you want dbt to read from. Decode GA4 will write the events table into that dataset, making it available as a dbt source.

DECLARE options JSON;

SET options = JSON '''
{
    "ga4_dataset_id": "project_id.ga4_dataset_name",
    "transform_config_template": "events_external",
    "gcs_bucket_name": "bucketname",
    "destination_dataset_id": "project_id.dbt_sources"
}
''';

EXECUTE IMMEDIATE (
    SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
);

CALL `project_id.ga4_dataset_name.install_decode_ga4`();

CALL `project_id.decode_ga4_dataset_name.RUN`(NULL);
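To confirm that the run wrote the events table into the destination dataset, you can list the dataset's tables. The dataset name below assumes the destination_dataset_id from the example options above; an external table shows a table_type of EXTERNAL:

-- List tables in the dbt source dataset; the events table should
-- appear here after RUN completes.
SELECT table_name, table_type
FROM `project_id.dbt_sources.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name;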

3. Grant GCS access to your BigQuery principal

The external events table reads Parquet files from GCS. The account dbt uses to query BigQuery must also be able to read from the bucket.

In the Google Cloud Console, grant the Storage Object Viewer role (roles/storage.objectViewer) on your Decode GA4 GCS bucket to the service account or user account that dbt authenticates with.
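Alternatively, you can grant the role from the command line. The bucket name matches the earlier example options; the service account shown is a placeholder for whichever principal dbt authenticates with:

gcloud storage buckets add-iam-policy-binding gs://bucketname \
    --member="serviceAccount:dbt-sa@your-gcp-project-id.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"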

4. Configure your dbt connection

In ~/.dbt/profiles.yml, add a BigQuery profile. The simplest approach uses OAuth via gcloud:

my_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: your-gcp-project-id
      dataset: your_dbt_output_dataset
      threads: 4
      location: EU

Authenticate with:

gcloud auth application-default login

For a service account, use method: service-account and provide a keyfile path instead.
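A service-account variant of the same profile might look like the following sketch; the keyfile path is a placeholder for wherever you store the downloaded JSON key:

my_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account
      keyfile: /path/to/keyfile.json
      project: your-gcp-project-id
      dataset: your_dbt_output_dataset
      threads: 4
      location: EU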

5. Declare the Decode GA4 events table as a source

In your dbt project, create a sources.yml file (e.g. models/sources.yml):

version: 2

sources:
  - name: decode_ga4
    database: your-gcp-project-id
    schema: dbt_sources
    tables:
      - name: events

Replace dbt_sources with the dataset name you set as destination_dataset_id.
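Optionally, you can attach dbt tests to the source so that dbt test validates the incoming data. For example, extending the tables entry above (event_name is a column of the Decode GA4 events table):

tables:
  - name: events
    columns:
      - name: event_name
        tests:
          - not_null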

6. Reference the source in a model

select
    partition_date as event_date,
    event_name,
    event_param.page_location as page_location,
    count(*) as event_count
from {{ source('decode_ga4', 'events') }}
where event_name = 'page_view'
group by event_date, event_name, page_location

Run your models with:

dbt run
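dbt's node selection syntax also lets you build just the models downstream of this source, which is handy once the project grows:

# Run only the models that depend on the decode_ga4 source
dbt run --select source:decode_ga4+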

Further Reading