# Google BigQuery

{% hint style="info" %}
**Private preview**: This integration is available to all accounts.
{% endhint %}

The Google BigQuery integration allows users to query policy protected data directly in BigQuery as secure views within an Immuta-created dataset. Immuta controls who can see what within the views, allowing data governors to create complex ABAC policies and data users to query the right data within the BigQuery console.

### Configuration

Google BigQuery is configured through the Immuta console and a script provided by Immuta. While you can complete some steps within the BigQuery console, it is easiest to install using gcloud and the Immuta script.

1. [Create a custom role and assign that role to a custom user to use as the Immuta system account.](#configure-the-google-bigquery-integration)
2. [Enable the integration in the Immuta console.](#enable-the-google-bigquery-integration)

### Protect your data

Once Google BigQuery has been configured, BigQuery admins can start creating subscription and data policies to meet compliance requirements and users can start querying policy protected data directly in BigQuery.

1. Create a global [subscription](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/section-contents/reference-guides/subscription-policies) or [supported data policy](#supported-policies).
2. [Register your BigQuery tables and views in Immuta as data sources.](https://documentation.immuta.com/SaaS/configuration/integrations/data-and-integrations/registering-metadata/register-data-sources/bigquery-tutorial)
3. [Recommended: Organize your data sources into domains and assign domain permissions to accountable teams.](https://documentation.immuta.com/SaaS/configuration/domains/configure-domains)
4. Revoke user access to the original datasets and grant users access to the Immuta created datasets in BigQuery.
5. Users query data from the Immuta created datasets directly in BigQuery.

### FAQs

1. What permissions will Immuta have in my BigQuery environment?
   * You can find a list of the permissions the custom Immuta role has [here](#configure-the-google-bigquery-integration).
2. What integration features will Immuta support for BigQuery?
   * Immuta can enforce [specific policies](#supported-policies) on data in a single BigQuery project. At this time, workspaces, tag ingestion, user impersonation, query audit, and multiple integrations are not supported.

## Google BigQuery integration conceptual overview

In this policy push integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table. Access controls are applied in the view, allowing users to leverage Immuta’s powerful set of attribute-based policies and query data directly in BigQuery.

BigQuery is organized by projects (which can be thought of as databases), datasets (which can be compared to schemas), tables, and views. When you enable the integration, an Immuta dataset is created in BigQuery that contains the Immuta-required user entitlements information. These objects within the Immuta dataset are intended to only be used and altered by the Immuta application.

After data sources are registered, Immuta uses the custom user and role, created before the integration is enabled, to push the Immuta data sources as views into a mirrored dataset of the original table. Immuta manages grants on the created view to ensure only users subscribed to the Immuta data source will see the data.

### Secure views

The Immuta integration uses a mirrored dataset approach. That is, if the source dataset is named `mydataset`, Immuta will create a dataset named `mydataset_secure`, assuming that `_secure` is the specified Immuta dataset suffix. This mirrored dataset is an [authorized dataset](https://cloud.google.com/bigquery/docs/authorized-datasets#console), allowing it to access the data of the original dataset. It will contain the Immuta-managed views, which have identical names to the original tables they’re based on.

### Managing access

Following the principle of least privilege, Immuta does not have permission to manage Google Cloud Platform users, specifically in granting or denying access to a project and its datasets. This means that data governors should limit user access to original datasets to ensure data users are accessing the data through the Immuta created views and not the backing tables. The only users who need to have access to the backing tables are the credentials used to register the tables in Immuta.

Additionally, a data governor must grant users access to the mirrored datasets that Immuta will create and populate with views. Immuta and BigQuery’s best practice recommendation is to grant access via groups in Google Cloud Platform. Because users still must be registered in Immuta and subscribed to an Immuta data source to be able to query Immuta views, all Immuta users can be granted access to the mirrored datasets that Immuta creates.

<figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2Fgit-blob-b695a151ca292c174d3ce039662595bed687249e%2Fbigquery.png?alt=media" alt=""><figcaption></figcaption></figure>

### Integration health status

The definitions for each status and the state of configured data platform integrations is available in the [response schema of the integrations API](https://documentation.immuta.com/SaaS/developer-guides/api-intro/integrations-api/reference-guides/response-schema#integration-statuses).

### Limitations

* This integration can only be enabled through a manual bootstrap using the Immuta API.
* This integration can only be enabled to work in a single region.
* BigQuery does not allow views partitioned by pseudo-columns. If you would like to partition a table by a pseudo-column and have Immuta govern it, take the following steps:
  * Create a view in BigQuery of the partitioned table, with the pseudo-column aliased. For example,

    ```
    create view `sales`.`emea`.`sales_view` as SELECT *, _PARTITIONTIME as __partitiontime from `sales`.`emea`.`sales`
    ```
  * [Register this view as a BigQuery data source](https://documentation.immuta.com/SaaS/configuration/data-and-integrations/registering-metadata/register-data-sources/bigquery-tutorial#register-data-sources-in-immuta) in Immuta.
  * Immuta will then be able to create Immuta-managed views off of this view with the pseudo-column aliased.

### Supported policies

This integration supports the following policy types:

* Column masking
  * Mask using hashing (SHA256())
  * Mask by making NULL
  * Mask using constant
  * Mask using a regular expression
  * Mask by date rounding
  * Mask by numeric rounding
  * Mask using custom functions
* Row-level masking
* Row visibility based on user attributes and/or object attributes
* Only show rows that fall within a given time window
* Minimize rows
* Filter rows using custom WHERE clause
* Always hide rows

### Additional resources

See the resources below to start implementing and using the BigQuery integration:

* [Configuring the Google BigQuery integration](#configure-the-google-bigquery-integration)
* [Creating BigQuery data sources](https://documentation.immuta.com/SaaS/configuration/integrations/data-and-integrations/registering-metadata/register-data-sources/bigquery-tutorial)
* Building global [subscription](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/section-contents/reference-guides/subscription-policies) and [data policies](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/data-policies/reference-guides/data-policies) to govern data
* [Creating projects](https://documentation.immuta.com/SaaS/govern/secure-your-data/projects-and-purpose-based-access-control/purpose-index/reference-guides/projects) to collaborate

## Configure the Google BigQuery integration

Follow this guide to connect your Google BigQuery data warehouse to Immuta.

### Google Cloud service account and role used by Immuta to connect to Google BigQuery

The Google BigQuery integration requires you to create a Google Cloud service account and role that will be used by Immuta to

* create a Google BigQuery dataset that will be used to store a table of user entitlements, UDFs for policy enforcement, etc.
* manage the table of user entitlements via updates when entitlements change in Immuta.
* create datasets and secure views with access control policies enforced, which mirror tables inside of datasets you ingest as Immuta data sources.

You have two options to create the required Google Cloud service account and role:

* [Run the script provided by Immuta](#create-a-service-account-and-role-by-running-the-script-provided-by-immuta)
* [Use the Google Cloud Console](#create-a-service-account-and-role-by-using-google-cloud-console)

#### The Immuta script

The `bootstrap.sh` script is a shell script provided by Immuta that creates prerequisite Google Cloud IAM objects for the integration to connect. When you run this script from your command line, it will create the following items, [scoped at the project-level](#user-content-fn-1)[^1]:

* A new Google Cloud IAM role
* A new Google Cloud service account, which will be granted the newly-created role
* A JSON keyfile for the newly-created service account

You will need to use the objects created in these steps to [enable the Google BigQuery integration](#enable-the-google-bigquery-integration).

**Google Cloud IAM roles required to run the script**

To execute `bootstrap.sh` from your command line, you must be authenticated to the gcloud CLI utility as a user with **all** of the following roles:

* `roles/iam.roleAdmin`
* `roles/iam.serviceAccountAdmin`
* `roles/serviceusage.serviceUsageAdmin`

Having these three roles is the least-privilege set of Google Cloud IAM roles required to successfully run the `bootstrap.sh` script from your command line. However, having **either** of the following Google Cloud IAM roles will also allow you to run the script successfully:

* `roles/editor`
* `roles/owner`

#### Create a service account and role by running the script provided by Immuta

1. Install [gcloud](https://cloud.google.com/sdk/docs/install).
2. Set the account property in the core section for Google Cloud CLI to the account gcloud should use for authentication. (You can run gcloud auth list to see your currently available accounts):

   ```bash
   gcloud config set account ACCOUNT
   ```
3. In Immuta, navigate to the **App Settings** page and click the **Integrations** tab.
4. Click **Add Integration** and select **Google BigQuery** from the dropdown menu.
5. Click **Select Authentication Method** and select **Key File**.
6. Click **Download Script(s)**.
7. Before you run the script, update your permissions to execute it:

   ```bash
   chmod 755 <path to downloaded script>
   ```
8. Run the script, where

   * **PROJECT\_ID** is the Google Cloud Platform project to operate on.
   * **ROLE\_ID** is the name of the custom role to create.
   * **NAME** will create a service account with the provided name.
   * **OUTPUT\_FILE** is the path where the resulting private key should be written. File system write permission will be checked on the specified path prior to the key creation.
   * **undelete-role** (optional) will undelete the custom role from the project. Roles that have been deleted for a long time can't be undeleted. This option can fail for the following reasons:
     * The role specified does not exist.
     * The active user does not have permission to access the given role.
   * **enable-api** (optional) provided you’ve been granted access to enable the Google BigQuery API, will enable the service.

   ```bash
   $ bootstrap.sh \
       --project PROJECT_ID \
       --role ROLE_ID \
       --service_account NAME \
       --keyfile OUTPUT_FILE \
       [--undelete-role] \
       [--enable-api]
   ```

#### Create a service account and role by using Google Cloud console

Alternatively, you may use the Google Cloud Console to create the prerequisite role, service account, and private key file for the integration to connect to Google BigQuery.

1. [Create a custom role using the console](https://cloud.google.com/iam/docs/creating-custom-roles#creating_a_custom_role) with the following privileges:
   * `bigquery.datasets.create`
   * `bigquery.datasets.delete`
   * `bigquery.datasets.get`
   * `bigquery.datasets.update`
   * `bigquery.jobs.create`
   * `bigquery.jobs.get`
   * `bigquery.jobs.list`
   * `bigquery.jobs.listAll`
   * `bigquery.routines.create`
   * `bigquery.routines.delete`
   * `bigquery.routines.get`
   * `bigquery.routines.list`
   * `bigquery.routines.update`
   * `bigquery.tables.create`
   * `bigquery.tables.delete`
   * `bigquery.tables.export`
   * `bigquery.tables.get`
   * `bigquery.tables.getData`
   * `bigquery.tables.list`
   * `bigquery.tables.setCategory`
   * `bigquery.tables.update`
   * `bigquery.tables.updateData`
   * `bigquery.tables.updateTag`
2. [Create a service account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating) and grant it the custom role you just created.
3. [Enable the Google BigQuery API](https://cloud.google.com/endpoints/docs/openapi/enable-api#enabling_an_api).

### Enable the Google BigQuery integration

Once the Google Cloud IAM custom role and service account are created, you can enable the Google BigQuery integration. This section illustrates how to enable the integration on the Immuta app settings page. To configure this integration via the Immuta API, see the [Configure a Google BigQuery integration API guide](https://documentation.immuta.com/SaaS/developer-guides/api-intro/integrations-api/how-to-guides/bigquery-api).

1. In Immuta, navigate to the **App Settings** page and click the **Integrations** tab.
2. Click **Add Integration** and select **Google BigQuery** from the dropdown menu.
3. Click **Select Authentication Method** and select **Key File**.
4. Upload your **GCP Service Account Key File**. This is the private key file generated in [create a Google Cloud service account and role for Immuta to use to connect to Google BigQuery](#google-cloud-service-account-and-role-used-by-immuta-to-connect-to-google-bigquery). Uploading this file will auto-populate the following fields:
   * **Project Id**: The Google Cloud Platform project to operate on, where your Google BigQuery data warehouse is located. A new dataset will be provisioned in this Google BigQuery project to store the integration configuration.
   * **Service Account**: The service account you created in [create a Google Cloud service account and role for Immuta to use to connect to Google BigQuery](#google-cloud-service-account-and-role-used-by-immuta-to-connect-to-google-bigquery).
5. Complete the following fields:
   * **Immuta Dataset**: The name of the Google BigQuery dataset to provision inside of the project. Important: if you are using multiple environments in the same Google BigQuery project, this dataset to provision **must** be unique across environments.
   * **Immuta Role**: The custom role you created in [create a Google Cloud service account and role for Immuta to use to connect to Google BigQuery](#google-cloud-service-account-and-role-used-by-immuta-to-connect-to-google-bigquery).
   * **Dataset Suffix**: The suffix that will be postfixed to the name of each dataset created to store secure views, one per dataset that you ingest a table for as a data source in Immuta. Important: if you are using multiple environments in the same Google BigQuery project, this suffix **must** be unique across environments.
   * **GCP Location**: The dataset’s location. After a dataset is created, the location can't be changed. Note that
     * If you choose EU for the dataset location, your Core BigQuery Customer Data resides in the EU.
6. Click **Test Google BigQuery Integration**.
7. Click **Save**.

{% hint style="warning" %}
**GCP location must match dataset region**

The region set for the GCP location must match the region of your datasets. Set GCP location to a general region (for example, `US`) to include child regions.
{% endhint %}

### Disable the Google BigQuery integration

You can disable the Google BigQuery integration automatically or manually.

#### Automatically disable integration

1. Click the **App Settings** icon, and then click the **Integrations** tab.
2. Select the Google BigQuery integration you would like to disable, and select the **Disable Integration** checkbox.
3. Click **Save**.

#### Manually disable integration

*The privileges required to run the cleanup script are the same as the Google Cloud IAM roles required to run the `bootstrap.sh` script.*

1. Click the **App Settings** icon, and then click the **Integrations** tab.
2. Select the Google BigQuery integration you would like to disable, and click **Download Scripts**.
3. Click **Save**. *Wait until Immuta has finished saving your configuration changes before proceeding.*
4. Before you run the script, update your permissions to execute it:

   ```bash
   chmod 755 <path to downloaded script>
   ```
5. Run the cleanup script.

## Next steps

* [Create Google BigQuery data sources](https://documentation.immuta.com/SaaS/configuration/integrations/data-and-integrations/registering-metadata/register-data-sources/bigquery-tutorial)
* Build [global subscription policies](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/section-contents/reference-guides/subscription-policies) and [data policies](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/data-policies/reference-guides/data-policies)
* [Create projects](https://documentation.immuta.com/SaaS/govern/secure-your-data/projects-and-purpose-based-access-control/purpose-index/reference-guides/projects) to securely collaborate on analytical workloads

[^1]: As opposed to scoped at the organization-level, to adhere to the principle of least privilege.
