Decode GA4
Overview
Decode GA4 is a BigQuery data automation utility that transforms GA4 data and exports it to cloud storage (Google Cloud Storage, AWS S3 or Azure Blob Storage).
It empowers consumers of the Google Analytics 4 BigQuery export to:
- Automate extraction of complex nested data and evolving schemas/parameters.
- Detect modifications in source data, and automatically reprocess impacted partitions.
- Simplify the structure to support downstream analytics.
- Augment data to unlock downstream use-cases.
- Optimize compression and storage for ongoing cost savings.
- Access prepared data directly from external systems.
It removes the complexity of handling the structural nuances of the data export, simplifying downstream data modelling and analytics. It is a robust, efficient, future-proof and low-maintenance solution to the practical challenges presented by the GA4 BigQuery export.
Objectives
Working with GA4 data in BigQuery is extremely challenging due to its schema and nested structure. Decode GA4 'shifts left' this challenging work, providing an automated, dynamic and configurable solution.
This means that data engineers, analytics engineers and analysts can focus on downstream value-add activities, instead of the low-level engineering the GA4 export requires before it becomes useful.
We also understand that GA4 data is useful beyond Google Cloud, and provide a simple mechanism to export the transformed data efficiently to external storage and systems.
Architecture
Decode GA4 is built entirely within Google BigQuery, fitting into and simplifying existing GA4-related workflows.
It uses Google Cloud Storage as a partitioned, compressed file store for the transformed event data, with BigQuery external tables to read the data back into BigQuery when required. Compressing the data and holding it in a low-cost, flexible location delivers significant, immediate and ongoing cost savings. It can also be configured to load data directly into BigQuery as a date-partitioned table.
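As a rough illustration of the external-table pattern, the following sketch uses the BigQuery Python client to define an external table over hive-partitioned Parquet files in Cloud Storage. The project, dataset, table and bucket names are placeholders, and the snippet shows the general approach rather than Decode GA4's own deployment code.

```python
# Minimal sketch: a BigQuery external table over hive-partitioned Parquet in GCS.
# All names (project, dataset, table, bucket) are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Point the external table at the compressed Parquet files exported to GCS.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-ga4-export-bucket/events/*"]

# Let BigQuery infer the hive partition keys (e.g. event_date) from the folder layout.
hive_opts = bigquery.HivePartitioningOptions()
hive_opts.mode = "AUTO"
hive_opts.source_uri_prefix = "gs://my-ga4-export-bucket/events/"
external_config.hive_partitioning = hive_opts

table = bigquery.Table("my-project.decode_ga4.ga4_events_ext")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)
```

Partition pruning on the hive keys limits how much of the exported data each query against the external table actually reads.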
Execution
Metadata-driven incremental logic keeps processing costs low, and schema changes are automatically incorporated, so a full data refresh is never required to reflect a new schema in the output data table.
It is designed to process each inbound date partition exactly once, unless upstream data changes are detected; such changes often appear unpredictably in the BigQuery GA4 export.
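The shape of this incremental check can be sketched as follows: compare the last-modified timestamps of the source GA4 date shards against a record of what has already been processed. The `processing_log` metadata table and the project/dataset names here are hypothetical illustrations, not Decode GA4's actual metadata model.

```python
# Illustrative sketch of metadata-driven change detection for GA4 date shards.
# The processing_log table and all project/dataset names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Last-modified timestamps of the source GA4 export shards (events_YYYYMMDD).
source_sql = """
SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `my-project.analytics_123456789.__TABLES__`
WHERE table_id LIKE 'events_2%'
"""
source_state = {row.table_id: row.last_modified for row in client.query(source_sql)}

# Watermarks recorded the last time each shard was processed (hypothetical table).
processed_sql = """
SELECT table_id, processed_source_timestamp
FROM `my-project.decode_ga4.processing_log`
"""
watermarks = {row.table_id: row.processed_source_timestamp
              for row in client.query(processed_sql)}

# A shard is (re)processed only if it is new or its source has changed since last run.
to_process = [
    shard for shard, modified in source_state.items()
    if shard not in watermarks or modified > watermarks[shard]
]
print(sorted(to_process))
```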
Transformation
The decoded output supports subsequent transformations using any method or tool, such as dbt, Dataform or SQLMesh. It also supports the deployment and execution of simple additional data transformations natively in BigQuery.
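For example, a simple downstream transformation can be executed natively in BigQuery against the decoded output; the dataset, table and column names below are assumptions about the flattened schema, shown only to illustrate the workflow.

```python
# Hedged illustration of a downstream transformation run natively in BigQuery.
# Dataset, table and column names are assumptions about the decoded output schema.
from google.cloud import bigquery

client = bigquery.Client()

transform_sql = """
CREATE OR REPLACE TABLE `my-project.decode_ga4.daily_event_counts` AS
SELECT
  event_date,
  event_name,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNT(*) AS events
FROM `my-project.decode_ga4.ga4_events_ext`
GROUP BY event_date, event_name
"""
client.query(transform_sql).result()  # block until the transformation completes
```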
External Storage
Transformed event data (or the outputs of subsequent transformations) can also be exported to AWS S3 or Azure Blob Storage, empowering simple, robust cross-cloud workflows.
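One way to picture the cross-cloud hand-off is a plain object copy from Cloud Storage to S3, preserving the hive-partitioned key layout. This is only an illustrative pattern, not necessarily the mechanism Decode GA4 uses, and the bucket names are placeholders.

```python
# Illustrative cross-cloud copy: exported Parquet objects from GCS to S3.
# Bucket names and prefixes are placeholders; credentials come from each cloud's
# standard environment configuration.
import boto3
from google.cloud import storage

gcs = storage.Client()
s3 = boto3.client("s3")

source_bucket = gcs.bucket("my-ga4-export-bucket")
for blob in gcs.list_blobs(source_bucket, prefix="events/"):
    # Stream each object across clouds, keeping the hive-partitioned key layout.
    s3.put_object(
        Bucket="my-ga4-export-s3-bucket",
        Key=blob.name,
        Body=blob.download_as_bytes(),
    )
```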
External Systems
Once exported to cloud storage in compressed Parquet format and a hive-partitioned folder structure, the data can be queried and explored efficiently by external systems such as DuckDB or MotherDuck, for significant query cost reductions.
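For example, DuckDB can read the hive-partitioned Parquet files in place via its httpfs extension; the bucket path, partition column and GCS HMAC credentials below are placeholders you would supply yourself.

```python
# Sketch: querying the exported Parquet files directly with DuckDB over gs:// paths.
# Requires a recent DuckDB with the secrets manager; credentials and paths are placeholders.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# GCS access via HMAC (interoperability) keys for the bucket.
con.execute("""
    CREATE SECRET gcs_secret (
        TYPE GCS,
        KEY_ID 'MY_HMAC_KEY_ID',
        SECRET 'MY_HMAC_SECRET'
    )
""")

# Hive partitioning exposes the partition column (e.g. event_date) for pruning,
# so only the matching partition's files are scanned.
result = con.sql("""
    SELECT event_name, COUNT(*) AS events
    FROM read_parquet('gs://my-ga4-export-bucket/events/*/*.parquet',
                      hive_partitioning = true)
    WHERE event_date = '20240101'
    GROUP BY event_name
    ORDER BY events DESC
""")
print(result.fetchall())
```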