Test
Objectives
Installing Decode GA4 does not incur any consumption costs, and gives you immediate access
to a number of useful metadata resources. However, it is not until you have called the RUN function that your data is transformed and your output tables are created.
You can use the pricing calculator to estimate the volume and cost for monthly automation and backfill of historic data.
However, if you want to sample a subset of data before committing to a full backfill, the following approach is recommended.
Approach
This approach installs Decode GA4 and runs it on a recent subset of data, with the option to run a full or targeted backfill after verification.
Test Installation
- Subscribe to Decode GA4 via Google Cloud Marketplace.
- Log into the Installer App by clicking Manage on provider or by going directly to ga4.decodedata.io.
- Upon login you will be taken to the GA4 Properties page. Select the Google Cloud Project and GA4 dataset from the dropdowns.
- Select the Installation Type. If the Quick Install type is External Storage Destination, then you need to input the GCS Bucket Name too. You will now see the installation options.
- Ensure that the Run on installation option is checked.
- Set the number of days to backfill: a minimum of 7 days is required, with 30–90 days recommended for a meaningful sample.
- Click Install Decode GA4.
- Navigate to BigQuery Studio, where you will find both the linked dataset and the newly created dataset and resources.
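Beyond browsing in BigQuery Studio, you can confirm what the installation created from SQL. A minimal sketch, assuming your deployment dataset is named `deployment_dataset_id` (substitute your own dataset name):

```sql
-- List the tables and views created by the installation
-- in the deployment dataset (name is illustrative).
SELECT table_name, table_type
FROM `deployment_dataset_id.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name;
```

This is a read-only check and does not incur Decode GA4 consumption costs.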
Verification
Take some time to review the output structure and verify it meets your expectations. If needed, revise your configuration — for example, to exclude any parameters that are not relevant to your analysis.
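One way to sanity-check the initial run is to compare the output against the daily event volumes in the GA4 export itself. A sketch of such a check, assuming a standard GA4 export dataset (the project and dataset names are placeholders) and a 7-day test window:

```sql
-- Count events per day in the GA4 export for the test window,
-- to confirm each backfilled date partition had source data.
-- Replace your_project and analytics_123456789 with your own values.
SELECT
  _TABLE_SUFFIX AS event_date,
  COUNT(*) AS events
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY event_date
ORDER BY event_date;
```

Dates with events here but no corresponding rows in your output tables would suggest the backfill window or configuration needs revisiting.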
Backfill
Upon verification, you may want to run a full or targeted backfill. The commands and configurations are presented below, and the full documentation is available in the run section.
Full
To process all unprocessed data in one go, set the run_mode to auto as a one-time operation.
CALL `deployment_dataset_id.RUN`(
  JSON'{"run_mode": "auto"}'
)

Executing this procedure will fill in any unprocessed date partitions from the source to the destination.
Range
To process a specific date range, set the range with the desired start_date and end_date values.
CALL `deployment_dataset_id.RUN`(
  JSON'{"run_mode": "range", "start_date": "YYYY-MM-DD", "end_date": "YYYY-MM-DD"}'
)

Executing this procedure will process or reprocess any date partitions in the date range from the source to the destination.
Automation
Once you are satisfied with the output, schedule the function as per the options in the automate section.
For automation, the run_mode is typically set to auto or incremental, which processes only unprocessed date partitions.
Depending on your downstream use case, you may need to set the auto_partition_detection, auto_schema_evolution and auto_parameter_evolution booleans to configure behaviour in response to changes in the input contents or schema.
Full documentation is provided in the run section.
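Putting the automation options together, a sketch of a scheduled call, assuming an incremental run with all three evolution booleans enabled (whether each should be true depends on your downstream use case; check the run documentation for defaults):

```sql
-- Example scheduled call: process only unprocessed partitions,
-- and adapt automatically to input partition, schema and
-- parameter changes (boolean values are illustrative).
CALL `deployment_dataset_id.RUN`(
  JSON'{"run_mode": "incremental", "auto_partition_detection": true, "auto_schema_evolution": true, "auto_parameter_evolution": true}'
)
```

This statement can be run on a schedule, for example via a BigQuery scheduled query, per the options in the automate section.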