Automation
Since execution of the RUN function can be triggered with a single SQL statement, there are a number of different automation options available, depending on your specific tools and objectives.
Scheduling
The time of day at which the GA4 BigQuery export arrives is notoriously unpredictable, posing a scheduling challenge: do you run on a fixed schedule and risk running before new data has arrived, or deploy infrastructure to trigger the transformation when new data is detected?
Thankfully, we can look at the metadata to help inform the decision. Running the following query on the _partitions metadata table in BigQuery post-installation and analysing the results should help identify potential patterns and thresholds.

    select partition_date, creation_time_decimal
    from `[deployment_dataset_id]._partitions`

From this scatter plot, it is clear that all inbound data arrives before 10:00 UTC, so it makes sense to schedule Decode GA4 at this time. However, you should run this query on your own metadata to confirm the appropriate scheduling time.
You can also apply statistical methods to this metadata to set the timing, for example based on confidence intervals or high percentiles of historical arrival times, as in the sketch below.
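As a minimal sketch, assuming creation_time_decimal represents the export arrival time as a decimal hour of day (UTC), the following query estimates the 99th percentile of arrival times over the last 90 days; scheduling shortly after this hour should catch the export on the vast majority of days.

```sql
-- Sketch: estimate a safe daily scheduling time from historical arrival times.
-- Assumes creation_time_decimal is the export arrival time as a decimal hour (UTC).
select
  approx_quantiles(creation_time_decimal, 100)[offset(99)] as p99_arrival_hour_utc
from `[deployment_dataset_id]._partitions`
where partition_date >= date_sub(current_date(), interval 90 day)
```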
The built-in Visualization tab in the Query results pane of BigQuery Studio enables you to quickly inspect the time-series plot of your data, using the following chart configuration on the outputs of the _partitions query:

- Visualization type: Scatter
- Dimension (x-axis): partition_date
- Measures (y-axis): creation_time_decimal

Tools
Native Scheduling
The simplest scheduling approach is to use Scheduled Queries in BigQuery, running on a set schedule every day. Since there is no Decode GA4 cost if no data is processed, you can run it multiple times a day to catch late-arriving data.
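As a sketch, the scheduled query body can simply be the CALL statement for the deployed procedure; the project, dataset, and zero-argument signature below are placeholders and should match your own deployment.

```sql
-- Example scheduled query body: trigger Decode GA4.
-- The procedure location and signature are placeholders for your deployment.
call `[deployment_project_id].[deployment_dataset_id].RUN`();
```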
This could also be triggered via a Cloud Workflows workflow, for enterprise integration with CI/CD pipelines.
Transformation Tools
In order to use the outputs of Decode GA4 in a downstream transformation tool, it is recommended to set the destination_dataset_id option upon installation.
This will result in a segregated dataset containing only the transformed GA4 event data and statistics, to be used as a clean input to the subsequent process.
Execution of the RUN function can then be incorporated into the pre-operations for the tool you are using; the exact mechanism varies between tools (see the example after the table below):
| Tool | Approach |
|---|---|
| dbt | Execute a pre-hook before core model transformations. |
| Dataform | Execute a pre-operation before core model transformations. |
| SQLMesh | Execute a SQL pre_statement before core model transformations (also Python). |
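For instance, a minimal dbt sketch attaches the CALL as a pre-hook on an early model; the model name, source table, and procedure signature below are hypothetical and should be adapted to your deployment.

```sql
-- models/stg_ga4__events.sql (hypothetical dbt model)
-- Runs Decode GA4 before downstream models are built; the procedure location,
-- zero-argument signature, and source table name are placeholders.
{{ config(
    pre_hook="call `[deployment_project_id].[deployment_dataset_id].RUN`()"
) }}

select *
from `[destination_dataset_id].events`
```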
Orchestration Tools
Common orchestration tools can execute BigQuery procedures by submitting SQL query jobs.
| Tool | Approach |
|---|---|
| Airflow | Execute a CALL statement via a BigQueryInsertJobOperator query job configuration. |
| Dagster | Execute a CALL statement using the BigQueryResource from dagster-gcp. |
| Prefect | Execute a CALL statement via the bigquery_query task from prefect-gcp. |
| Kestra | Execute a CALL statement via the BigQuery Query task. |
| Orchestra | Execute a CALL statement via the Run SQL BigQuery task. |
Essentially, any mechanism via which you can execute SQL on BigQuery can be used to trigger Decode GA4.