Bruin

Overview

Bruin is an open-source data pipeline CLI that lets you define SQL and Python assets, manage dependencies between them, and run transformations directly against BigQuery. If you set destination_dataset_id to a BigQuery dataset that Bruin references through a connection, Decode GA4 writes its events table straight into a dataset your Bruin pipelines can query; no intermediate loading step is required.

Requirements

  • Bruin CLI installed
  • Decode GA4 data exported using the events_external template, which stores transformed data in GCS as Parquet files and exposes it as an external BigQuery table
  • The BigQuery service account (or user account) used by Bruin must have Storage Object Viewer access to the GCS bucket where Decode GA4 stores its Parquet files
  • A Decode GA4 installation with destination_dataset_id set to a BigQuery dataset that Bruin uses as a source

Setup

1. Install Bruin

curl -LsSf https://raw.githubusercontent.com/bruin-data/bruin/refs/heads/main/install.sh | sh
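Once the install script finishes, you can confirm the CLI is on your PATH (the version subcommand name is assumed from the Bruin CLI):

```shell
# Should print the installed Bruin CLI version.
bruin version
```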

2. Configure Decode GA4

Set destination_dataset_id to the BigQuery dataset you want Bruin to read from. Decode GA4 will write the events table into that dataset.

DECLARE options JSON;

SET options = JSON '''
{
    "ga4_dataset_id": "project_id.ga4_dataset_name",
    "transform_config_template": "events_external",
    "gcs_bucket_name": "bucketname",
    "destination_dataset_id": "project_id.bruin_sources"
}
''';

EXECUTE IMMEDIATE (
    SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
);

CALL `project_id.ga4_dataset_name.install_decode_ga4`();

CALL `project_id.decode_ga4_dataset_name.RUN`(NULL);
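Once the run completes, you can check that the events table landed in the destination dataset. A quick sketch, reusing the placeholder names from the options above:

```sql
-- List the tables in the destination dataset; `events` should appear.
SELECT table_name
FROM `project_id.bruin_sources`.INFORMATION_SCHEMA.TABLES;
```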

3. Grant GCS access to your BigQuery principal

The external events table reads Parquet files from GCS. The account Bruin uses to query BigQuery must also be able to read from the bucket.

In the Google Cloud Console, grant the Storage Object Viewer role (roles/storage.objectViewer) on your Decode GA4 GCS bucket to the service account or user account that Bruin authenticates with.
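The same grant can also be applied from the command line. A sketch using gcloud, where the bucket name and service account are placeholders for your own values:

```shell
# Grant read access on the Decode GA4 bucket to the principal Bruin authenticates with.
gcloud storage buckets add-iam-policy-binding gs://bucketname \
    --member="serviceAccount:bruin-sa@your-gcp-project-id.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
```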

4. Configure a BigQuery connection

In your Bruin project's .bruin.yml, add a BigQuery connection:

connections:
  google_cloud_platform:
    - name: bigquery_default
      project_id: your-gcp-project-id
      location: EU
      use_application_default_credentials: true

Authenticate with Application Default Credentials:

gcloud auth application-default login

For a service account, replace use_application_default_credentials with service_account_file: /path/to/key.json.
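For example, a service-account-based connection would look like this (the key file path is a placeholder):

```yaml
connections:
  google_cloud_platform:
    - name: bigquery_default
      project_id: your-gcp-project-id
      location: EU
      service_account_file: /path/to/key.json
```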

5. Reference the Decode GA4 events table in an asset

Create a SQL asset file (e.g. assets/pageviews.sql):

/* @bruin

name: your_dataset.pageviews
type: bq.sql
connection: bigquery_default

materialization:
  type: table

@bruin */

select
    partition_date as event_date,
    event_name,
    event_param.page_location,
    count(*) as event_count
from `your-gcp-project-id.bruin_sources.events`
where event_name = 'page_view'
group by partition_date, event_name, page_location

Replace bruin_sources with the dataset name you set as destination_dataset_id.
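Bruin expects assets to live inside a pipeline folder that contains a pipeline.yml. A minimal sketch, where the pipeline name is an assumption and the default connection points at the connection defined earlier:

```yaml
name: decode_ga4_transforms
default_connections:
  google_cloud_platform: bigquery_default
```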

Run the pipeline with:

bruin run assets/pageviews.sql
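Before running, you can lint the project to catch broken asset definitions or missing connections (subcommand name assumed from the Bruin CLI):

```shell
# Check asset definitions and connections without executing anything.
bruin validate
```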

Further Reading