Decode GA4

Overview

Decode GA4 is a BigQuery utility that empowers consumers of the Google Analytics 4 (GA4) BigQuery export to:

  • Automate extraction of complex nested data and evolving schemas/parameters.
  • Detect modifications in source data, and automatically reprocess impacted partitions.
  • Simplify the structure to support downstream analytics.
  • Augment data to unlock downstream use-cases.
  • Optimize compression and storage for ongoing cost savings.
  • Access prepared data directly from external systems.

It removes the complexity involved in handling the structural nuances of the data export, simplifying downstream data modelling and analytics. It is a robust, efficient, future-proof and low-maintenance solution to the practical challenges presented by the GA4 BigQuery export.
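As an illustration of those structural nuances, the raw export stores event parameters as a repeated key/value structure that must be unnested before analysis. The sketch below shows the kind of manual extraction Decode GA4 automates and generalises; the project and dataset names are placeholders, and the chosen parameters are purely illustrative.

    -- Sketch only: manually unnesting GA4 event_params for two example parameters.
    -- Project/dataset names are placeholders; Decode GA4 automates this step.
    SELECT
      event_date,
      event_name,
      user_pseudo_id,
      (SELECT value.string_value
         FROM UNNEST(event_params)
         WHERE key = 'page_location') AS page_location,
      (SELECT value.int_value
         FROM UNNEST(event_params)
         WHERE key = 'ga_session_id') AS ga_session_id
    FROM `my-project.analytics_123456789.events_20240601`;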

Philosophy

Working with GA4 data in BigQuery is extremely challenging due to its complex schema and deeply nested structure. Decode GA4 'shifts left' this challenging work, providing an automated, dynamic and configurable solution.

This means that data engineers, analytics engineers and analysts can focus on downstream, value-adding activities instead of the low-level engineering the GA4 export requires before it becomes useful.

We also understand that GA4 data is useful beyond Google Cloud, and provide a simple mechanism to export the transformed data efficiently to externally accessible storage and to destinations outside Google Cloud.

Architecture

Decode GA4 is built entirely within Google BigQuery, fitting into and simplifying existing GA4-related workflows.

[Diagram: Decode GA4 architecture]

It uses Google Cloud Storage as a partitioned, compressed file store for transformed event data, with BigQuery external tables to read the data back into BigQuery when required. This both compresses the data and stores it in a low-cost, flexible location, delivering significant, immediate and ongoing cost savings. It can also be configured to load data directly into BigQuery as a date-partitioned table.
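To make the storage pattern concrete, the sketch below shows one way this architecture can be expressed in BigQuery SQL: write a decoded date partition to Google Cloud Storage as compressed Parquet, then expose the files through a hive-partitioned external table. The bucket, dataset and table names are illustrative assumptions, not Decode GA4's actual configuration.

    -- Sketch only: bucket, project and table names are assumptions.
    -- 1) Write one decoded date partition to GCS as compressed Parquet.
    EXPORT DATA OPTIONS (
      uri = 'gs://my-bucket/decoded_ga4/event_date=2024-06-01/*.parquet',
      format = 'PARQUET',
      compression = 'SNAPPY',
      overwrite = true
    ) AS
    SELECT * EXCEPT (event_date)
    FROM `my-project.decoded_ga4.events_staging`
    WHERE event_date = DATE '2024-06-01';

    -- 2) Read the partitioned files back via a hive-partitioned external table.
    CREATE EXTERNAL TABLE IF NOT EXISTS `my-project.decoded_ga4.events`
    WITH PARTITION COLUMNS (event_date DATE)
    OPTIONS (
      format = 'PARQUET',
      uris = ['gs://my-bucket/decoded_ga4/*'],
      hive_partition_uri_prefix = 'gs://my-bucket/decoded_ga4'
    );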

Execution

Metadata-driven incremental logic keeps processing costs low, and schema changes are incorporated automatically, so a full data refresh is never required to reflect a new schema in the output table.

It is designed to process each inbound date partition exactly once, unless upstream data changes are detected; such changes are observed unpredictably in the BigQuery GA4 export.
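One possible way to detect such changes, sketched below under the assumption of a date-sharded events_YYYYMMDD export and a hypothetical processing-log table, is to compare each shard's last-modified time in the dataset metadata with the time it was last processed. This illustrates the general pattern, not necessarily Decode GA4's internal implementation.

    -- Sketch: find export shards that are new, or have changed since last processing.
    -- The GA4 dataset name and the processed_shards log table are hypothetical.
    SELECT
      t.table_id,
      TIMESTAMP_MILLIS(t.last_modified_time) AS source_last_modified,
      log.processed_at
    FROM `my-project.analytics_123456789.__TABLES__` AS t
    LEFT JOIN `my-project.decode_ga4_meta.processed_shards` AS log
      ON log.table_id = t.table_id
    WHERE t.table_id LIKE 'events_%'
      AND (log.processed_at IS NULL
           OR TIMESTAMP_MILLIS(t.last_modified_time) > log.processed_at);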

This supports downstream BigQuery transformations using any method or tool, such as Dataform, dbt or SQLMesh. Subsequent transformations can also be deployed and executed natively in BigQuery.
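For example, a downstream model, whether defined in Dataform, dbt, SQLMesh or plain SQL, can query the decoded output directly. The table and column names below are assumptions about the flattened schema, used purely for illustration.

    -- Hypothetical downstream model: daily users and purchases from the decoded events.
    SELECT
      event_date,
      COUNT(DISTINCT user_pseudo_id) AS users,
      COUNTIF(event_name = 'purchase') AS purchases
    FROM `my-project.decoded_ga4.events`
    GROUP BY event_date
    ORDER BY event_date;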

Transformed event data (or the outputs of subsequent transformations) can also be exported to AWS S3 or Azure Blob Storage, enabling simple, robust cross-cloud workflows.
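As a sketch of what such a cross-cloud hand-off could look like, BigQuery Omni's EXPORT DATA WITH CONNECTION statement can write query results to Amazon S3. This assumes a pre-configured AWS connection resource and that the queried data is available in the corresponding Omni region; it is one possible transfer mechanism for illustration, not necessarily the one Decode GA4 uses.

    -- Sketch only: requires BigQuery Omni, a pre-created AWS connection resource, and
    -- data queryable from the Omni region. Bucket, connection and table names are assumptions.
    EXPORT DATA WITH CONNECTION `aws-us-east-1.my_s3_connection`
    OPTIONS (
      uri = 's3://my-bucket/decoded_ga4/*',
      format = 'PARQUET'
    ) AS
    SELECT *
    FROM `my-project.decoded_ga4.events`
    WHERE event_date = DATE '2024-06-01';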