Home Integrations Cloud Storage Amazon S3

Amazon Web Services S3

Overview

Exporting transformed data to Amazon S3 requires a BigQuery connection to AWS. The connection allows BigQuery to authenticate with AWS, write to your S3 bucket using the EXPORT DATA statement and read data back to the external table in BigQuery.

Requirements

Google Cloud

The executing user or service account must have:

  • BigQuery Connection Admin (roles/bigquery.connectionAdmin) — to create the AWS connection

AWS

An AWS IAM role with the following policy attached:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::BUCKET_NAME"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::BUCKET_NAME/*"]
    }
  ]
}

Replace BUCKET_NAME with your S3 bucket name.

Connection Setup

1. Create the IAM role in AWS

In the AWS IAM console, create a new role with Web Identity as the trusted entity type. Select Google as the identity provider and enter 0 as a temporary audience value (you will update this after the BigQuery connection is created). Attach the S3 policy above to the role, and set the maximum session duration to 12 hours.

2. Create the BigQuery connection

bq mk --connection \
  --connection_type='AWS' \
  --iam_role_id=arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME \
  --location=AWS_REGION \
  CONNECTION_ID

For AWS_REGION, use the BigQuery AWS region identifier, e.g. aws-us-east-1. For ACCOUNT_ID and ROLE_NAME, use the AWS account ID and the role name created in step 1.

3. Get the Google identity

After creating the connection, retrieve the Google identity that BigQuery uses to assume the AWS role:

bq show --connection --project_id=PROJECT_ID --location=AWS_REGION CONNECTION_ID

Note the googleIdentity value from the output (a numeric string).

4. Update the AWS role trust policy

In the AWS IAM console, update the trust relationship on your role to:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Federated": "accounts.google.com"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:sub": "GOOGLE_IDENTITY_ID"
        }
      }
    }
  ]
}

Replace GOOGLE_IDENTITY_ID with the googleIdentity value from the previous step.

Example

The following install script will install Decode GA4, run the transformation and export all transformed data to Amazon S3:

DECLARE options JSON;

SET options = JSON '''
{
    "ga4_dataset_id": "project_id.ga4_dataset_name",
    "transform_config_template": "events_external",
    "s3_bucket_name": "my-bucket",
    "connection_id": "project_id.aws-us-east-1.my_s3_connection"
}
''';

EXECUTE IMMEDIATE (
    SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
    );

CALL `project_id.ga4_dataset_name.install_decode_ga4`();

CALL `project_id.decode_ga4_dataset_name.RUN`(NULL);

Note that you will have to replace project_id, ga4_dataset_name, and the connection details with your actual values. The connection_id must match the connection created in the setup steps above, in the format project_id.aws_region.connection_name.

Further Reading