Events
The events table is a non-destructive, unfiltered, zero-assumption transformation of raw events data. This means that the row count by date will exactly match the row count for each corresponding date shard of the source events data.
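Because the transformation is non-destructive, row-count parity with the source can be verified directly. A minimal sketch, assuming an illustrative source GA4 export dataset `project.analytics_123456` and output table `project.output.events` (substitute your own names; the partition column depends on which template you installed):

```sql
-- Compare the row count of one source date shard with the
-- corresponding date in the output events table.
SELECT
  (SELECT COUNT(*)
   FROM `project.analytics_123456.events_20240101`) AS source_rows,
  (SELECT COUNT(*)
   FROM `project.output.events`
   WHERE event_date = '2024-01-01') AS output_rows;
```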
Partitioning
The output events table is date-partitioned, which enables optimized date-bounded queries from downstream systems, is straightforward for humans and agents to work with, and fits well within BigQuery's limits on the number of partitions per table.
If you have installed using the events_external template, the output table is hive-partitioned on the partition_date DATE column. This means that partition_date should be used in the WHERE clause of subsequent filtering statements.
The events_partitioned template will result in a date-partitioned Native Table in BigQuery, partitioned on the event_date DATE column, which should be used in subsequent WHERE clause filters.
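For example, downstream queries should filter on the partition column appropriate to the installed template (table names are illustrative):

```sql
-- Native table (events_partitioned template): filter on event_date
SELECT event_id, event_name, session_id
FROM `project.output.events`
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31';

-- External table (events_external template): filter on partition_date
SELECT event_id, event_name, session_id
FROM `project.output.events`
WHERE partition_date BETWEEN '2024-01-01' AND '2024-01-31';
```

Filtering on the partition column allows BigQuery to prune partitions, so only the relevant dates are scanned.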
Schema
The output schema is composed of existing and converted columns from the source data, restructured columns, as well as additional data and metadata columns.
Augmentation
In addition to the event_count, event_param and user_property STRUCT columns, the following additional fields are added to support downstream modelling. SQL definitions are included in the SQL Definitions section.
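For context, this is the kind of repeated UNNEST extraction against the raw GA4 export that the restructured STRUCT columns are designed to replace (dataset name illustrative):

```sql
-- Extracting ga_session_id from the raw GA4 export requires
-- unnesting event_params on every query.
SELECT
  event_name,
  (SELECT value.int_value
   FROM UNNEST(event_params)
   WHERE key = 'ga_session_id') AS ga_session_id
FROM `project.analytics_123456.events_20240101`;
```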
Metadata
| Column | Data Type | Description |
|---|---|---|
| event_id | STRING | Unique ID identifying each event |
| ga_session_id | STRING | Extracted from the ga_session_id key of event_params, corresponding to the timestamp of the session start |
| ga4_dataset_id | STRING | The dataset ID (project_id.dataset_name) of the source GA4 property |
| session_id | STRING | Session ID derived from a combination of stream_id, user_pseudo_id and ga_session_id |
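A hypothetical sketch of how the session_id composition described above could look; the actual expression is given in the SQL Definitions section and may differ:

```sql
-- Hypothetical: hash the three identifying inputs into one session_id
SELECT
  TO_HEX(MD5(CONCAT(
    CAST(stream_id AS STRING), '|',
    user_pseudo_id, '|',
    CAST(ga_session_id AS STRING)))) AS session_id
FROM `project.output.events`;
```

Combining all three inputs is necessary because ga_session_id alone is not guaranteed to be unique across users or streams.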
Consent
| Column | Data Type | Description |
|---|---|---|
| ga_session_id_is_null | BOOL | Flag whether the ga_session_id is null |
| user_pseudo_id_is_null | BOOL | Flag whether the user_pseudo_id is null |
| consent_status | STRING | Consent status for each event |
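As an illustration only, a consent classification could be derived from the two null-flags along these lines; the status labels and logic here are hypothetical, and the actual definition is given in the SQL Definitions section:

```sql
-- Hypothetical consent classification from null identifier flags
SELECT
  user_pseudo_id IS NULL AS user_pseudo_id_is_null,
  ga_session_id IS NULL AS ga_session_id_is_null,
  CASE
    WHEN user_pseudo_id IS NULL THEN 'no_consent'      -- assumed label
    WHEN ga_session_id IS NULL THEN 'partial_consent'  -- assumed label
    ELSE 'consented'                                   -- assumed label
  END AS consent_status
FROM `project.output.events`;
```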
Localization
The IANA tz database timezone in which each event occurred is derived from the geo.country, geo.region and geo.city columns. This supports offsetting the observed timestamps to compute the precise local time-of-day at which each event occurred, for more accurate user journey time-delta metric modelling.
| Column | Field Path | Data Type | Description |
|---|---|---|---|
| local | local.timezone_id | STRING | Timezone ID (e.g. Europe/Madrid) |
| local | local.timezone_name | STRING | Timezone name (e.g. Central European Standard Time) |
| local | local.country_code | STRING | ISO 3166-1 two-letter country code |
| local | local.timezone_source | STRING | city, region or country depending on the matching level of the geo fields |
| local | local.latitude | STRING | Latitude of the identified location |
| local | local.longitude | STRING | Longitude of the identified location |
| local | local.event_date | DATE | Locally-adjusted event_date column |
| local | local.event_timestamp | TIMESTAMP | Locally-adjusted event_timestamp column |
| local | local.event_previous_timestamp | TIMESTAMP | Locally-adjusted event_previous_timestamp column |
| local | local.user_first_touch_timestamp | TIMESTAMP | Locally-adjusted user_first_touch_timestamp column |
Approximate latitude and longitude are also included for versatile mapping applications.
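With a timezone_id in hand, the local wall-clock time can be recovered from the UTC event timestamp. A sketch assuming the raw microsecond-precision event_timestamp and an illustrative output table name:

```sql
-- Convert the raw microsecond UTC timestamp to local time-of-day
-- using the derived IANA timezone ID.
SELECT
  DATETIME(TIMESTAMP_MICROS(event_timestamp),
           local.timezone_id) AS local_datetime,
  EXTRACT(HOUR FROM
    DATETIME(TIMESTAMP_MICROS(event_timestamp),
             local.timezone_id)) AS local_hour
FROM `project.output.events`
WHERE event_date = '2024-01-01';
```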
SQL Definitions
Default Configuration
The default configurations for web and app streams are aligned to the Automatically Collected Events documentation. Additional detected values will also be automatically reflected in the output schema.
These default values can be excluded upon installation by setting the include_default_events and/or include_default_event_params BOOL options to false.
Metadata
Column Descriptions
Note that the events table does not contain column descriptions. This is intentional, as Decode GA4 builds a foundational model which will require subsequent transformation to improve downstream analytics performance and agentic understanding.
Since column descriptions do not propagate automatically through subsequent transformation steps, the recommended approach is to add column descriptions to the final analytics/agent-ready downstream tables only, where they act as valuable context for users, BI tools, Semantic Layers, Agents or MCPs.
The input schema and descriptions should be provided as context to the LLM which is writing the column descriptions, however human verification before publication is always advised.
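Descriptions can then be attached to the downstream table with standard BigQuery DDL (table, column and description text here are illustrative):

```sql
-- Attach a description to a column on the final downstream table
ALTER TABLE `project.marts.events_enriched`
  ALTER COLUMN session_id
  SET OPTIONS (
    description = 'Unique session identifier derived from stream_id, user_pseudo_id and ga_session_id'
  );
```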