
2024.2

Self-Managed Deployment

This section illustrates how to install Immuta on Kubernetes using the Immuta Enterprise Helm chart.

Getting Started

This how-to guide includes instructions and links for installing Immuta in any Kubernetes environment.

Requirements

This reference guide provides an overview of the Immuta Enterprise Helm chart version requirements and infrastructure recommendations.

Install

- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)

Configuration

The guides in this section illustrate how to configure your Immuta Enterprise Helm chart for various scenarios, including optimizing your deployment for production environments.

Upgrade

The guides in this section illustrate how to upgrade the Immuta Enterprise Helm chart.

Disaster Recovery

This guide provides links to additional resources for disaster recovery strategies.

Troubleshooting

This page provides troubleshooting guidance and answers frequently asked questions about the Immuta installation.

Release Notes

This guide outlines the updates and bug fixes to the Immuta Enterprise Helm chart.

Getting Started

Prerequisites and requirements

Use a supported version of Kubernetes.
Use Helm 3.2.0 or newer. If you use a Helm version older than 3.8.0, enable OCI support by exporting the environment variable HELM_EXPERIMENTAL_OCI=1.
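As a minimal sketch, assuming bash and a `sort` that supports `-V` (GNU coreutils), the version rules above can be checked before pulling the chart. The function names `version_lt` and `helm_requirements` are hypothetical helpers, not part of Helm or the IEHC.

```shell
#!/usr/bin/env bash
# Sketch: classify a Helm version against the requirements above.
# Assumes GNU-style `sort -V` (version sort) is available.

# version_lt A B: succeeds when version A sorts strictly before version B.
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# helm_requirements VERSION: prints "unsupported" (older than 3.2.0),
# "needs-oci-flag" (3.2.0 up to, but not including, 3.8.0) or "ok".
# A leading "v", as printed by `helm version`, is accepted.
helm_requirements() {
  local v="${1#v}"
  if version_lt "$v" 3.2.0; then
    echo "unsupported"
  elif version_lt "$v" 3.8.0; then
    echo "needs-oci-flag"
  else
    echo "ok"
  fi
}
```

For example, `[ "$(helm_requirements "$(helm version --template '{{.Version}}')")" = needs-oci-flag ] && export HELM_EXPERIMENTAL_OCI=1` sets the variable only when the installed Helm needs it.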

Pull the Helm chart

The Immuta Enterprise Helm chart is available from the OCI registry ocir.immuta.com.

Helm chart availability

The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

Copy the snippet below and replace the placeholder text with the credentials provided to you by your Immuta support professional:

```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Install Immuta

- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)

Configure Ingress

Additional recommendations

Deployment Requirements

Immuta comprises three core services (Secure, Discover, and Detect) that rely on PostgreSQL and Elasticsearch to store their states. The illustration below shows the relationships among these services.

The Immuta Enterprise Helm chart (IEHC) (represented by the yellow box above) does not deploy PostgreSQL or Elasticsearch, so you must deploy and manage them separately.

Although Immuta recommends using Elasticsearch because it supports several new Immuta features and services, you can deploy Immuta without Elasticsearch. The table below outlines the Immuta features supported with and without Elasticsearch and the dependencies you must deploy and manage yourself.

Immuta with Elasticsearch

Immuta without Elasticsearch

Dependencies

Immuta Detect

Audit of Immuta and data platform events

Legacy audit

Immuta Monitors

Sensitive data discovery

For guidance on how to configure the IEHC to deploy Immuta with or without Elasticsearch, see one of the guides below:

Version requirements

Kubernetes versions

Kubernetes 1.29 to 1.32

Metadata database (PostgreSQL)

PostgreSQL incompatibilities

Immuta is not compatible with PostgreSQL abstraction layers, such as Amazon Aurora.

PostgreSQL 15.0 or newer
The pgcrypto extension must be enabled
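The PostgreSQL requirements can be spot-checked with a short bash sketch once the immuta role and database exist (they are created in the Managed Public Cloud guide). It assumes the psql client can reach the database and uses the hypothetical helper names `pg_version_supported` and `check_immuta_db`; it does not detect abstraction layers such as Amazon Aurora.

```shell
#!/usr/bin/env bash
# Sketch: check the metadata database against the requirements above.
# Assumes the `psql` client is installed and can reach the database.

# pg_version_supported NUM: NUM is the output of `SHOW server_version_num`
# (for example 150004 for PostgreSQL 15.4). Succeeds for 15.0 or newer.
pg_version_supported() {
  [[ $1 =~ ^[0-9]+$ ]] && [ "$1" -ge 150000 ]
}

# check_immuta_db HOST: queries the server version and whether pgcrypto is
# installed in the immuta database (role and database names follow this guide).
check_immuta_db() {
  local host="$1" num ext
  num=$(psql --host "$host" --username immuta --dbname immuta -tAc 'SHOW server_version_num') || return 1
  ext=$(psql --host "$host" --username immuta --dbname immuta -tAc "SELECT extversion FROM pg_extension WHERE extname = 'pgcrypto'") || return 1
  pg_version_supported "$num" || { echo "PostgreSQL $num is older than 15.0" >&2; return 1; }
  [ -n "$ext" ] || { echo "pgcrypto is not enabled in the immuta database" >&2; return 1; }
  echo "ok: server_version_num=$num pgcrypto=$ext"
}
```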

Elasticsearch

Elasticsearch v7 API or newer
OpenSearch compatible with Elasticsearch v7 API or newer

OpenSearch user

The user provided during the install must have the following permissions:

cluster:monitor/health
indices:data/write/bulk*
indices:data/write/bulk
indices:data/read/search
indices:admin/exists
indices:admin/create
indices:admin/delete
indices:admin/settings/update
indices:admin/get
indices:data/write/delete/byquery
indices:data/write/index
indices:admin/mapping/put
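One way to grant these permissions is through the OpenSearch Security plugin REST API, which accepts a role definition at `PUT _plugins/_security/api/roles/<role>`. The bash sketch below only prints such a body; the role name `immuta` and the `*` index pattern are assumptions to adjust. It places `cluster:monitor/health` and `indices:data/write/bulk` under cluster permissions, since OpenSearch documents bulk as a cluster-level action, and keeps the remaining actions as index permissions.

```shell
#!/usr/bin/env bash
# Sketch: print an OpenSearch Security role body covering the permissions above.
# Assumption: the "*" index pattern is a placeholder; narrow it if you can.
opensearch_role_body() {
  cat <<'EOF'
{
  "cluster_permissions": [
    "cluster:monitor/health",
    "indices:data/write/bulk"
  ],
  "index_permissions": [
    {
      "index_patterns": ["*"],
      "allowed_actions": [
        "indices:data/write/bulk*",
        "indices:data/read/search",
        "indices:admin/exists",
        "indices:admin/create",
        "indices:admin/delete",
        "indices:admin/settings/update",
        "indices:admin/get",
        "indices:data/write/delete/byquery",
        "indices:data/write/index",
        "indices:admin/mapping/put"
      ]
    }
  ]
}
EOF
}
```

A possible invocation: `opensearch_role_body | curl -sS -u <admin-user>:<admin-password> -X PUT 'https://<opensearch-host>:9200/_plugins/_security/api/roles/immuta' -H 'Content-Type: application/json' --data-binary @-`, followed by mapping the role to the Immuta user through the `rolesmapping` endpoint. Managed offerings may require fine-grained access control to be enabled before this API is available.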

Cache (Redis/Memcached)

Built-in cache

The IEHC manages its own Memcached deployment inside the cluster. The key-value cache can optionally be externalized post installation.

Redis 7.0 or newer
Memcached 1.6 or newer

Infrastructure recommendations

Kubernetes distribution

Ingress

External metadata database

External Elasticsearch

Amazon Elastic Kubernetes Service (EKS)

AWS Load Balancer Controller

Azure Kubernetes Service (AKS)

Azure Application Gateway Ingress Controller

Google Kubernetes Engine (GKE)

GKE Ingress Controller

Red Hat OpenShift

OpenShift Ingress Operator

SUSE Rancher Government (RKE2)

Ingress NGINX Controller

SUSE K3s - For evaluation purposes only

Traefik

Legacy features and services

Some legacy services and features are no longer enabled in the recommended configuration of the IEHC. The table below lists these features and provides links to documentation that outlines how to enable them in Immuta.

Feature

Immuta Enterprise Helm chart configuration

Legacy audit

Set each of the following secure.extraEnvVars in your immuta-values.yaml file to false:

FeatureFlag_AuditService
FeatureFlag_detect
FeatureFlag_auditLegacyViewHide
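Rather than editing the main values file, the three flags can be kept in a second values file passed after it, because Helm gives the right-most `--values` file precedence. A minimal bash sketch, assuming the hypothetical file name `legacy-audit-values.yaml`:

```shell
#!/usr/bin/env bash
# Sketch: write a separate Helm values file that sets the three
# secure.extraEnvVars feature flags to "false" (enabling legacy audit).
# The default output file name is hypothetical.
write_legacy_audit_values() {
  local out="${1:-legacy-audit-values.yaml}"
  cat > "$out" <<'EOF'
secure:
  extraEnvVars:
    - name: FeatureFlag_AuditService
      value: "false"
    - name: FeatureFlag_detect
      value: "false"
    - name: FeatureFlag_auditLegacyViewHide
      value: "false"
EOF
}
```

Then run `helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --values legacy-audit-values.yaml --version 2024.2.16`. Helm replaces lists rather than merging them, so this `extraEnvVars` list overrides any `secure.extraEnvVars` already set in immuta-values.yaml; copy other entries across if you rely on them.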

Legacy sensitive data discovery

Data platforms

Amazon Redshift
Azure Synapse Analytics
Google BigQuery

Policies

Masking with format preserving masking (unless using the Snowflake integration)
Masking with k-anonymization
Masking using randomized response (unless using the Snowflake integration)

Next step

Install

- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)


Immuta in an Air-Gapped Environment

This page provides one possible way to download and package Immuta artifacts for consumption on a separate network with no Internet access.

Install Skopeo

Authenticate Skopeo to the Immuta registry

Copy the snippet below and replace the placeholder text with the credentials provided by your Immuta representative:

```
skopeo login https://ocir.immuta.com -u <username> -p <password>
```

Copy images from the Immuta registry

```
export IMMUTA_VERSION=2024.2.16
export IMMUTA_IMAGES="audit-service audit-export-cronjob cache classify-service immuta-service"
export IMMUTA_LEGACY_IMAGES="immuta-db immuta-fingerprint"
for image in ${IMMUTA_IMAGES} ${IMMUTA_LEGACY_IMAGES}; do
  skopeo copy docker://ocir.immuta.com/stable/${image}:${IMMUTA_VERSION} docker-archive://${PWD}/${image}-${IMMUTA_VERSION}.tar;
done
```
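Because the archives cross a network boundary, it can help to record checksums on the connected side and verify them after transfer. This bash sketch reuses the `IMMUTA_*` variables exported above and assumes GNU coreutils `sha256sum` (on macOS, `shasum -a 256` is the usual substitute); the helper names and the `SHA256SUMS` file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: checksum manifest for the image archives created above.
# Relies on IMMUTA_VERSION, IMMUTA_IMAGES and IMMUTA_LEGACY_IMAGES being exported.

# immuta_archive_names: prints the expected archive file names, one per line.
immuta_archive_names() {
  local image
  for image in ${IMMUTA_IMAGES} ${IMMUTA_LEGACY_IMAGES}; do
    printf '%s-%s.tar\n' "$image" "$IMMUTA_VERSION"
  done
}

# write_checksums: run on the connected side, in the directory with the archives.
write_checksums() {
  immuta_archive_names | xargs sha256sum > SHA256SUMS
}

# verify_checksums: run on the air-gapped side; fails if any archive changed.
verify_checksums() {
  sha256sum --check SHA256SUMS
}
```

Run `write_checksums` next to the archives before transfer, copy `SHA256SUMS` along with them, and run `verify_checksums` (with the same variables exported) before pushing to the private registry.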

Pull the Immuta Enterprise Helm chart (IEHC)

Copy the snippet below and replace the placeholder text with the credentials provided by your Immuta representative:
```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Download the IEHC for the current Immuta release:

```
helm pull oci://ocir.immuta.com/stable/immuta-enterprise --version 2024.2.16
```

Push images to the private registry

After transferring the Immuta container images and IEHC to your air-gapped network, load them into the container registry there after authenticating.

```
export PRIVATE_REGISTRY=your.private-registry.com
export IMMUTA_VERSION=2024.2.16
export IMMUTA_IMAGES="audit-service audit-export-cronjob cache classify-service immuta-service"
export IMMUTA_LEGACY_IMAGES="immuta-db immuta-fingerprint"
for image in ${IMMUTA_IMAGES} ${IMMUTA_LEGACY_IMAGES}; do
  skopeo copy docker-archive://${PWD}/${image}-${IMMUTA_VERSION}.tar docker://${PRIVATE_REGISTRY}/immuta/${image}:${IMMUTA_VERSION};
done
```

Install from IEHC tarball

Override the image registry in your Helm values file:

immuta-values.yaml
```
---
global:
  imageRegistry: your.private-registry.com
  imageRepositoryMap:
    immuta/immuta-service: immuta/immuta-service
    immuta/immuta-db: immuta/immuta-db
    immuta/immuta-fingerprint: immuta/immuta-fingerprint
    immuta/audit-service: immuta/audit-service
    immuta/audit-export-cronjob: immuta/audit-export-cronjob
    immuta/classify-service: immuta/classify-service
    immuta/cache: immuta/cache
```

The IEHC can be referenced via filename if there is no Helm chart repository on the destination network:

```
helm upgrade --install immuta ./immuta-enterprise-2024.2.16.tgz -f immuta-values.yaml
```


Cosign Verification

Cosign installation

Verify

Create a file named immuta-cosign.pub with the following content:
Verify artifact signature.
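Assuming every artifact listed in the air-gapped guide, plus the chart itself, is signed with the key in immuta-cosign.pub (this page does not list the signed artifacts, so treat the list as an assumption), a bash loop over `cosign verify --key` could check a whole release. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify every artifact of a release with cosign.
# Assumption: the artifact list mirrors the air-gapped guide; adjust as needed.

# immuta_image_refs VERSION [REGISTRY]: prints one OCI reference per artifact.
immuta_image_refs() {
  local version="$1" registry="${2:-ocir.immuta.com/stable}" name
  for name in immuta-enterprise audit-service audit-export-cronjob cache \
      classify-service immuta-service immuta-db immuta-fingerprint; do
    printf '%s/%s:%s\n' "$registry" "$name" "$version"
  done
}

# verify_immuta_artifacts VERSION: runs `cosign verify` on each reference,
# reports failures on stderr, and returns non-zero if any check failed.
verify_immuta_artifacts() {
  local ref status=0
  while read -r ref; do
    cosign verify --key immuta-cosign.pub "$ref" </dev/null >/dev/null \
      || { echo "FAILED: $ref" >&2; status=1; }
  done < <(immuta_image_refs "$1")
  return "$status"
}
```

For example, `verify_immuta_artifacts 2024.2.16` prints each reference whose signature could not be verified.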

Frequently asked question

How can I list all container images referenced in the IEHC?

Yq installation

List all container images by rendering the chart templates locally.
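The approach described above uses yq; as a dependency-free stand-in, a bash sketch using `sed` can pull `image:` values out of `helm template` output. It only recognizes the common single-line `image: <ref>` form, and `list_images` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read rendered Kubernetes manifests on stdin and print each distinct
# container image reference once. Handles `image: ref`, `image: "ref"`,
# `image: 'ref'` and `- image: ref`; other YAML forms are ignored.
list_images() {
  sed -n -E 's/^[[:space:]]*(- )?image:[[:space:]]*["'\'']?([^"'\'' ]+)["'\'']?[[:space:]]*$/\2/p' | sort -u
}
```

For example: `helm template immuta oci://ocir.immuta.com/stable/immuta-enterprise --version 2024.2.16 --values immuta-values.yaml | list_images`.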

TLS Configuration

Kubernetes namespace

The following section(s) presume the Immuta Enterprise Helm chart was deployed into namespace immuta and that the current namespace is immuta.

Prerequisite

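Ingress NGINX Controller

Edit immuta-values.yaml to include the following Helm values. The nginx.ingress.kubernetes.io/auth-tls-secret annotation configures client-certificate authentication in ingress-nginx, not the server certificate, so the TLS Secret is referenced through the chart's tls and secretName values instead.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    tls: true
    # If left unset the TLS secret name defaults to <hostname>-tls
    secretName: <secret-name>
```

Create the TLS Secret.
```
kubectl create secret tls <secret-name> --cert=path/to/tls.cert --key=path/to/tls.key
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```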
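Before creating a TLS Secret with `kubectl create secret tls`, it is worth confirming that the certificate and key belong together and that the certificate covers the Immuta hostname. A bash sketch assuming the `openssl` CLI, version 1.1.1 or newer (for `x509 -ext`); the helper names are hypothetical, and the hostname check does not handle wildcard certificates.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks for a certificate/key pair before creating a TLS Secret.

# cert_matches_key CERT KEY: succeeds when the certificate's public key
# is the public half of the private key.
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# cert_covers_host CERT HOST: succeeds when DNS:HOST appears exactly in the
# certificate's subjectAltName extension (no wildcard matching).
cert_covers_host() {
  openssl x509 -in "$1" -noout -ext subjectAltName 2>/dev/null \
    | tr ',' '\n' | sed 's/^[[:space:]]*//' | grep -qxF "DNS:$2"
}
```

For example, `cert_matches_key path/to/tls.cert path/to/tls.key && cert_covers_host path/to/tls.cert <immuta-fqdn>`.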

GKE Ingress Controller

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    annotations:
      ingress.gcp.kubernetes.io/pre-shared-cert: <certificate-name>
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

AWS Load Balancer Controller

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    annotations:
      alb.ingress.kubernetes.io/certificate-arn: <certificate-arn>
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Azure Application Gateway Ingress Controller

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    annotations:
      appgw.ingress.kubernetes.io/appgw-ssl-certificate: <certificate-name>
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Traefik

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    annotations:
      traefik.ingress.kubernetes.io/router.tls: "true"
    hostname: <immuta-fqdn>
    tls: true
    # If left unset the TLS secret name defaults to <hostname>-tls
    secretName: <secret-name>
```

Create the TLS Secret.
```
kubectl create secret tls <secret-name> --cert=path/to/tls.cert --key=path/to/tls.key
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```


Deployment Requirements

The Immuta Enterprise Helm chart (IEHC) (represented by the yellow box above) does not deploy PostgreSQL or Elasticsearch, so you must deploy and manage them separately.

Immuta with Elasticsearch

Immuta without Elasticsearch

Dependencies

Immuta Detect

Audit of Immuta and data platform events

Legacy audit (until October 2024)

Immuta Monitors

Sensitive data discovery

For guidance on how to configure the IEHC to deploy Immuta with or without Elasticsearch, see one of the guides below:

For more information about legacy features and services no longer enabled in the recommended deployment of Immuta, see Legacy features and services.

Version requirements

Kubernetes versions

Kubernetes 1.29 to 1.32

Metadata database (PostgreSQL)

PostgreSQL incompatibilities

Immuta is not compatible with PostgreSQL abstraction layers, such as Amazon Aurora.

PostgreSQL 15.0 or newer
The pgcrypto extension must be enabled

Elasticsearch

Elasticsearch v7 API or newer
OpenSearch compatible with Elasticsearch v7 API or newer

OpenSearch user

The user provided during the install must have the following permissions:

cluster:monitor/health
indices:data/write/bulk*
indices:data/write/bulk
indices:data/read/search
indices:admin/exists
indices:admin/create
indices:admin/delete
indices:admin/settings/update
indices:admin/get
indices:data/write/delete/byquery
indices:data/write/index
indices:admin/mapping/put

Follow the OpenSearch documentation to create this user and add the permissions.


Next step

Follow the installation guide for your Kubernetes environment (Amazon EKS, Google GKE, or Microsoft AKS) to install Immuta.

Managed Public Cloud

This is a guide on how to deploy Immuta on Kubernetes in the following managed public cloud providers:

Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)

Prerequisites

The following cloud-managed services must be provisioned before proceeding:

Validation

PostgreSQL

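The PostgreSQL instance's hostname/FQDN is resolvable from within the Kubernetes cluster.
The PostgreSQL instance accepts connections from within the Kubernetes cluster.

Elasticsearch

The Elasticsearch instance's hostname/FQDN is resolvable from within the Kubernetes cluster.
The Elasticsearch instance accepts connections from within the Kubernetes cluster.
The user must have the permissions listed under OpenSearch user in Deployment Requirements.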
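DNS resolution and TCP reachability can be approximated from a shell with the same network path as the cluster, such as a temporary pod. A bash sketch assuming a Linux environment with `getent` and coreutils `timeout`; it uses bash's `/dev/tcp` pseudo-device, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: basic network checks for the managed PostgreSQL and Elasticsearch endpoints.

# host_resolves HOST: succeeds when HOST resolves through the system resolver.
host_resolves() {
  getent hosts "$1" >/dev/null
}

# port_open HOST PORT [SECONDS]: succeeds when a TCP connection to HOST:PORT
# can be opened within SECONDS (default 5).
port_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, `host_resolves <postgres-fqdn> && port_open <postgres-fqdn> 5432`, and likewise for the Elasticsearch host and port. A successful TCP connection does not prove that credentials or TLS settings are correct.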

Authenticate with OCI registry

Helm chart availability

The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Setup

Create a Kubernetes namespace named immuta for Immuta.
```
kubectl create namespace immuta
```

Switch to namespace immuta.

```
kubectl config set-context --current --namespace=immuta
```

Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile.
```
kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com
```

PostgreSQL

Connecting to the database

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the kubectl run command outlined below. Wait 5 seconds, and then proceed by entering a password.
```
kubectl run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username postgres --port 5432 --password
```

Create an immuta role and database.

```
CREATE ROLE immuta with login encrypted password '<postgres-password>';

GRANT immuta TO CURRENT_USER;

CREATE DATABASE immuta OWNER immuta;

GRANT all ON DATABASE immuta TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
```

Revoke privileges from CURRENT_USER as they're no longer required.
```
REVOKE immuta FROM CURRENT_USER;
```
Enable the pgcrypto extension.
```
\c immuta
CREATE EXTENSION pgcrypto;
```
Type \q, and then press Enter to exit.
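The interactive psql steps above can also be fed to psql as a single script. This bash sketch prints the same statements and relies on psql's `:'name'` variable interpolation to quote the password safely, so it assumes the script is read from standard input or a file (interpolation is not applied to `psql -c`). The function name `immuta_bootstrap_sql` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the database bootstrap steps above as one psql script.
# The password is not embedded; psql substitutes :'immuta_password' as a
# properly quoted literal from a variable set with --set.
immuta_bootstrap_sql() {
  cat <<'EOF'
\set ON_ERROR_STOP on
CREATE ROLE immuta WITH LOGIN ENCRYPTED PASSWORD :'immuta_password';
GRANT immuta TO CURRENT_USER;
CREATE DATABASE immuta OWNER immuta;
GRANT ALL ON DATABASE immuta TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
REVOKE immuta FROM CURRENT_USER;
\c immuta
CREATE EXTENSION pgcrypto;
EOF
}
```

For example, from any host or pod that can reach the database: `immuta_bootstrap_sql | psql --host <postgres-fqdn> --username postgres --dbname postgres --set immuta_password='<postgres-password>'`. With `ON_ERROR_STOP` set, psql stops at the first failing statement instead of continuing.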

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Create a Helm values file named immuta-values.yaml with the following content:

```
global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  imageRepositoryMap:
    immuta/immuta-service: stable/immuta-service
    immuta/immuta-db: stable/immuta-db
    immuta/immuta-fingerprint: stable/immuta-fingerprint
    immuta/audit-service: stable/audit-service
    immuta/audit-export-cronjob: stable/audit-export-cronjob
    immuta/classify-service: stable/classify-service
    immuta/cache: stable/cache

audit:
  config:
    databaseConnectionString: postgres://immuta:<postgres-password>@<postgres-fqdn>:5432/immuta?schema=audit
    elasticsearchEndpoint: <elasticsearch-endpoint>
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>

secure:
  ingress:
    enabled: false
    tls: false
  extraEnvVars:
    - name: FeatureFlag_AuditService
      value: "true"
    - name: FeatureFlag_detect
      value: "true"
    - name: FeatureFlag_auditLegacyViewHide
      value: "true"

  postgresql:
    host: <postgres-fqdn>
    port: 5432
    database: immuta
    username: immuta
    password: <postgres-password>
    ssl: true
```

Replace all placeholder values (enclosed in angle brackets, such as <postgres-fqdn>) in the immuta-values.yaml file.
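Two quick checks before deploying, as a bash sketch with hypothetical helper names: `find_placeholders` lists any `<...>` placeholders still left in the file, and `urlencode` percent-encodes a value. Encoding matters because the PostgreSQL password is embedded in `audit.config.databaseConnectionString` as a URL, where characters such as `@`, `:`, or `/` would otherwise break parsing. The encoder is intended for ASCII input.

```shell
#!/usr/bin/env bash
# Sketch: pre-deployment checks for immuta-values.yaml.

# find_placeholders FILE: prints remaining <placeholder> tokens with line
# numbers; exits non-zero when none are found (grep semantics).
find_placeholders() {
  grep -nE '<[a-z0-9-]+>' "$1"
}

# urlencode VALUE: percent-encodes everything except RFC 3986 unreserved
# characters (letters, digits, "-", ".", "_", "~"). Intended for ASCII input.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, `! find_placeholders immuta-values.yaml` succeeds only when no placeholders remain, and `urlencode 'p@ss:w/rd'` prints `p%40ss%3Aw%2Frd`.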

Deploy Immuta.

```
helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.2.16
```

Validation

Wait for all pods in the namespace to become ready.
```
kubectl wait --for=condition=Ready pods --all
```

Determine the name of the Secure service.

```
kubectl get service --selector "app.kubernetes.io/component=secure" --output jsonpath='{.items[0].metadata.name}'
```

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward service/<name> 8080:http
```

Next steps

Configure Ingress to complete your installation and access your Immuta application.
Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

Ingress Configuration

This guide demonstrates how to configure Ingress for Immuta. Ingress can be configured in numerous ways; configurations for the most popular controllers are outlined below.

Kubernetes namespace

The following section(s) presume the Immuta Enterprise Helm chart was deployed into namespace immuta and that the current namespace is immuta.

The Immuta web service listens on the following ports:

| Port | Protocol | Description | Optional |
| --- | --- | --- | --- |
| 443 | TCP | HTTPS | False |
| 80 | TCP | HTTP (redirects to HTTPS) | True |

Ingress hostname

The ingress hostname is the fully qualified domain name (FQDN), as defined by RFC 3986, used to access the Immuta UI. If an FQDN has not yet been determined, set Secure's ingress hostname to immuta.local.
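A hostname that breaks DNS label rules tends to fail in ways that are hard to trace back to the values file. A bash sketch of an RFC 1123-style check (labels of 1 to 63 letters, digits, or hyphens, not starting or ending with a hyphen, 253 characters in total); `is_valid_hostname` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: validate a candidate ingress hostname before putting it in
# secure.ingress.hostname. Trailing dots and wildcards are rejected.
is_valid_hostname() {
  local label_re='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
  [[ ${#1} -le 253 && $1 =~ ^${label_re}(\.${label_re})*$ ]]
}
```

While the hostname is still immuta.local, curl's `--resolve` option can reach the ingress without a DNS record, for example `curl --resolve immuta.local:443:<ingress-ip> --insecure https://immuta.local/`, where `<ingress-ip>` is the address reported by `kubectl get ingress`.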

Ingress NGINX Controller

Edit the immuta-values.yaml file to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    ingressClassName: nginx
    annotations:
      nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
      nginx.ingress.kubernetes.io/proxy-body-size: '64m'
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Refer to the Ingress NGINX Controller documentation for further assistance.

GKE Ingress Controller

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    annotations:
      # Determines which type of load balancer is provisioned
      #   gce-internal
      #   gce
      kubernetes.io/ingress.class: gce
      # Listen on both 80 and 443
      kubernetes.io/ingress.allow-http: 'true'
      # Redirect traffic from 80 to 443
      cloud.google.com/frontend-config: immuta
```

Create a file named frontendconfig.yaml with the following content.
```
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: immuta
spec:
  redirectToHttps:
    enabled: true
    # One of MOVED_PERMANENTLY_DEFAULT (301), FOUND (302), SEE_OTHER (303),
    # TEMPORARY_REDIRECT (307), or PERMANENT_REDIRECT (308)
    responseCodeName: MOVED_PERMANENTLY_DEFAULT
```

Apply the FrontendConfig resource.
```
kubectl apply -f frontendconfig.yaml
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Refer to the GKE Ingress documentation for further assistance.

AWS Load Balancer Controller

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    ingressClassName: alb
    annotations:
      # Determines which type of load balancer is provisioned
      #   internal
      #   internet-facing
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      # Listen on both 80 and 443
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
      # Redirect traffic from 80 to 443
      alb.ingress.kubernetes.io/ssl-redirect: '443'
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Refer to the AWS Load Balancer Controller documentation for further assistance.

Azure Application Gateway Ingress Controller

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    # IngressClass created by the Application Gateway Ingress Controller
    ingressClassName: azure-application-gateway
    # https://azure.github.io/application-gateway-kubernetes-ingress/annotations/
    annotations:
      appgw.ingress.kubernetes.io/ssl-redirect: 'true'
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Refer to the Azure Application Gateway Ingress Controller documentation for further assistance.

Traefik

Edit immuta-values.yaml to include the following Helm values.
```
secure:
  ingress:
    hostname: <immuta-fqdn>
    ingressClassName: traefik
    annotations:
      # Listen on ports 80 and 443
      traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
      # Redirect HTTP to HTTPS
      # When referencing middleware you must prefix the name with its namespace
      # <namespace>-<middleware-name>@kubernetescrd
      traefik.ingress.kubernetes.io/router.middlewares: immuta-https-redirectscheme@kubernetescrd
```

Create a file named middleware.yaml with the following content.
```
# Traefik v2.10 or newer (including v3) uses apiVersion traefik.io/v1alpha1;
# traefik.containo.us/v1alpha1 is the older API group.
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: https-redirectscheme
spec:
  redirectScheme:
    scheme: https
    permanent: true
```

Apply the Middleware resource.
```
kubectl apply -f middleware.yaml
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Refer to the Traefik documentation for further assistance.

OpenShift Ingress Operator

Edit immuta-values.yaml to include the following Helm values. Because Immuta is exposed through the OpenShift Route you create rather than through an Ingress resource from the Immuta Enterprise Helm chart, secure.ingress.enabled is set to false below.
```
secure:
  ingress:
    enabled: false
```

Get the service name for Secure.
```
oc get service --selector "app.kubernetes.io/component=secure" --output jsonpath='{.items[0].metadata.name}'
```

Create a file named route.yaml with the following content. Replace the placeholder values with your own, and set spec.to.name to the service name returned by the previous command.
```
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: immuta
spec:
  host: <immuta-fqdn>
  to:
    kind: Service
    name: immuta-secure
  port:
    targetPort: http
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
```

Apply the Route.
```
oc apply -f route.yaml
```

Perform a Helm upgrade to apply the changes made to immuta-values.yaml.
```
helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.16
```

Refer to the OpenShift Route documentation for further assistance.

Red Hat OpenShift

This is an OpenShift-specific guide on how to deploy Immuta with the following managed services:

Cloud-managed PostgreSQL
Cloud-managed Redis
Cloud-managed Elasticsearch

Prerequisites

Review the following criteria before proceeding with deploying Immuta.

PostgreSQL

The PostgreSQL instance has been provisioned and is actively running.

Redis

The Redis instance has been provisioned and is actively running.

Elasticsearch

The Elasticsearch instance has been provisioned and is actively running.

Authenticate with OCI registry

Helm chart availability

The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Setup

Create a new OpenShift project named immuta for Immuta.
```
oc new-project immuta
```
Get the UID range allocated to the project. Each running container's UID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
```
Get the GID range allocated to the project. Each running container's GID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
```
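The range annotations have the form `<start>/<size>` (for example `1000680000/10000`), so the usable IDs run from start to start + size - 1. A bash sketch for turning an annotation value into bounds and checking an ID against it; it assumes a single range per annotation, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: interpret OpenShift UID/GID range annotations of the form <start>/<size>.

# range_bounds VALUE: prints "<first> <last>" for the range, or fails on bad input.
range_bounds() {
  local start="${1%%/*}" size="${1##*/}"
  [[ $start =~ ^[0-9]+$ && $size =~ ^[0-9]+$ && $size -gt 0 ]] || return 1
  echo "$start $(( start + size - 1 ))"
}

# in_range VALUE ID: succeeds when ID falls inside the range described by VALUE.
in_range() {
  local bounds first last
  bounds=$(range_bounds "$1") || return 1
  read -r first last <<< "$bounds"
  (( $2 >= first && $2 <= last ))
}
```

For example, `range_bounds "$(oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}')"` prints the first and last UID the project allows.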
Switch to project immuta.
```
oc project immuta
```

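Create an image pull secret named immuta-oci-registry so the cluster can pull Immuta images from ocir.immuta.com. Use the same credentials you used to log in to the registry.
```
oc create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com
```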
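The database steps below use a `<postgres-password>` placeholder. If you need a value for it, the following bash sketch can generate one. It is a hypothetical helper, not part of Immuta's tooling. It reads 24 random bytes from `/dev/urandom` and prints them as 48 lowercase hex characters. Hex output avoids quotes, `@`, `:`, and `/`, which would otherwise need escaping in a SQL string literal or percent-encoding in a `postgres://` connection string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate a value for <postgres-password>.
# od prints the 24 random bytes as space-separated hex pairs across
# several lines; tr deletes the spaces and newlines, leaving 48 hex chars.
generate_postgres_password() {
  od -An -tx1 -N24 /dev/urandom | tr -d ' \n'
  printf '\n'
}

# Example: keep the value in a shell variable for the steps below.
# POSTGRES_PASSWORD="$(generate_postgres_password)"
```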

Cloud-managed PostgreSQL

Connecting to the database

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the oc run command outlined below. Wait 5 seconds, and then proceed by entering a password.
```
oc run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username postgres --port 5432 --password
```

Create an immuta role and database.

```
CREATE ROLE immuta with login encrypted password '<postgres-password>';
GRANT immuta TO CURRENT_USER;
CREATE DATABASE immuta OWNER immuta;
GRANT all ON DATABASE immuta TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
```

Revoke the immuta role from CURRENT_USER, since that membership is no longer required.
```
REVOKE immuta FROM CURRENT_USER;
```
Enable the pgcrypto extension.
```
\c immuta
CREATE EXTENSION pgcrypto;
```
Type \q, and then press Enter to exit.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Create a Helm values file named immuta-values.yaml with the following content:

```
global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  imageRepositoryMap:
    immuta/immuta-service: stable/immuta-service
    immuta/immuta-db: stable/immuta-db
    immuta/immuta-fingerprint: stable/immuta-fingerprint
    immuta/audit-service: stable/audit-service
    immuta/audit-export-cronjob: stable/audit-export-cronjob
    immuta/classify-service: stable/classify-service
    immuta/cache: stable/cache

audit:
  config:
    databaseConnectionString: postgres://immuta:<postgres-password>@<postgres-fqdn>:5432/immuta?schema=audit
    elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>

  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

discover:
  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

secure:
  extraEnvVars:
    - name: FeatureFlag_AuditService
      value: "true"
    - name: FeatureFlag_detect
      value: "true"
    - name: FeatureFlag_auditLegacyViewHide
      value: "true"

  ingress:
    enabled: false
    tls: false

  postgresql:
    host: <postgres-fqdn>
    port: 5432
    database: immuta
    username: immuta
    password: <postgres-password>
    ssl: false

  web:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  backgroundWorker:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
```
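`audit.config.databaseConnectionString` embeds `<postgres-password>` inside a `postgres://` URI. Reserved characters such as `@`, `:`, `/`, or `%` in the password therefore generally have to be percent-encoded there. `secure.postgresql.password`, by contrast, takes the raw value. The bash sketch below assumes an ASCII password. `urlencode` and `audit_connection_string` are hypothetical helper names, and the URI layout is copied from the values file above.

```shell
#!/usr/bin/env bash
# urlencode STRING: percent-encode every character outside the RFC 3986
# unreserved set (A-Z a-z 0-9 - . _ ~). Assumes ASCII input.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric code.
      *) out+="$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# audit_connection_string HOST PASSWORD: build the URI used for
# audit.config.databaseConnectionString, with the password encoded.
audit_connection_string() {
  printf 'postgres://immuta:%s@%s:5432/immuta?schema=audit\n' \
    "$(urlencode "$2")" "$1"
}
```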

Deploy Immuta.

```
helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.2.16
```

Validation

Wait for all pods in the namespace to become ready.
```
oc wait --for=condition=Ready pods --all
```

Determine the name of the Secure service.

```
oc get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
```

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
oc port-forward service/<name> 8080:http
```

Next steps

Upgrade to Immuta 2024.2 LTS

This guide demonstrates how to upgrade an existing Immuta deployment installed with the Immuta Helm chart (IHC) to the latest LTS release using the newer Immuta Enterprise Helm chart (IEHC).

Helm chart deprecation notice

As of Immuta version 2024.2, the IHC has been deprecated in favor of the IEHC. The immuta-values.yaml Helm values files are not cross-compatible.

Prerequisites

Create a PostgreSQL database

The PostgreSQL instance has been provisioned and is actively running.

For additional information, consult the Deployment requirements.

Validate the Helm release

Fetch the metadata for the Helm release associated with Immuta.
Review the output from the previous step and verify the following:
- The Immuta version (appVersion) meets both of these conditions:
  - It is the last LTS (2022.5.x), or 2024.1 or newer
  - It is less than 2024.2
- The Immuta Helm chart version (version) is greater than or equal to 4.13.5
- The Immuta Helm chart name (chart) is immuta

If any of these criteria are not met, you must first perform a Helm upgrade using the IHC. Contact your Immuta representative for guidance.
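The criteria above can be checked with a short bash sketch, assuming GNU `sort -V` is available for version ordering. `version_ge` and `check_ihc_release` are hypothetical helpers. Pass them the appVersion, version, and chart values you read from the release metadata.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_ihc_release APP_VERSION CHART_VERSION CHART_NAME
# Succeeds only if the release meets the upgrade criteria listed above.
check_ihc_release() {
  local app="$1" chart_version="$2" chart_name="$3"
  # appVersion must be the last LTS (2022.5.x), or 2024.1 or newer ...
  case "$app" in
    2022.5.*) ;;
    *) version_ge "$app" 2024.1 || return 1 ;;
  esac
  # ... and must be less than 2024.2.
  if version_ge "$app" 2024.2; then return 1; fi
  # The IHC chart version must be >= 4.13.5 and the chart name immuta.
  version_ge "$chart_version" 4.13.5 || return 1
  [ "$chart_name" = "immuta" ]
}
```

For example, `check_ihc_release 2024.1.3 4.13.5 immuta` succeeds. `check_ihc_release 2024.2.0 4.13.5 immuta` fails because that version is not less than 2024.2.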

Metadata database

The new IEHC no longer supports deploying a Metadata database (PostgreSQL) inside the Kubernetes cluster. Before transitioning to the new IEHC, it's first necessary to externalize the Metadata database.

Built-in

The following demonstrates how to take a database backup and import the data into each cloud provider's managed PostgreSQL service.

Create backup of old database

Get the metadata database pod name.
Spawn a shell inside the running metadata database pod.
Perform a database backup.
Type exit, and then press Enter to exit the shell prompt.
Copy file bometadata.dump from the pod to the host's working directory.

Setup new database

Create a pod named immuta-setup-db and spawn a shell.
Connect to the new PostgreSQL database as a superuser. Depending on the cloud provider, the default superuser name (postgres) might differ.
Create an immuta role and database.
Type \q, and then press Enter to exit the psql prompt.
Authenticate as the immuta user and create the pgcrypto extension.
Type \q, and then press Enter to exit the psql prompt.

Restore backup to new database

Create a pod named immuta-restore-db and spawn a shell.
Copy file bometadata.dump from the host's working directory to pod immuta-restore-db.
Spawn a shell inside pod immuta-restore-db.
Perform a database restore while authenticated as role immuta. When prompted for a password, enter the value you substituted for <postgres-password>.
Type exit, and then press Enter to exit the shell prompt.
Delete pod immuta-restore-db that was previously created.

External

No additional work is required. The existing database can be reused with the new IEHC.

Helm values

Helm values file compatibility

The immuta-values.yaml Helm values file used by the IHC is not compatible with the new IEHC.

Rename the existing immuta-values.yaml Helm values file used by the IHC.
Legacy audit records: If you want to be able to view audit records from before the 2024.2 upgrade, set FeatureFlag_auditLegacyViewHide to false in your Helm values file.
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)

This is a guide on how to deploy Immuta on Kubernetes on the following managed public cloud providers:

Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)

Prerequisites

The following cloud-managed services must be provisioned before proceeding:

Amazon Web Services (AWS):
Microsoft Azure:
Google Cloud Platform (GCP):

Validation

The PostgreSQL instance's hostname/FQDN is .
The PostgreSQL instance is .

Authenticate with OCI registry

Helm chart availability

The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Setup

Create a Kubernetes namespace named immuta for Immuta.
```
kubectl create namespace immuta
```

Switch to namespace immuta.

```
kubectl config set-context --current --namespace=immuta
```

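Create an image pull secret named immuta-oci-registry so the cluster can pull Immuta images from ocir.immuta.com. Use the same credentials you used to log in to the registry.
```
kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com
```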

PostgreSQL

Connecting to the database

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the kubectl run command outlined below. Wait 5 seconds, and then proceed by entering a password.
```
kubectl run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username postgres --port 5432 --password
```

Create an immuta role and database.

```
CREATE ROLE immuta with login encrypted password '<postgres-password>';
GRANT immuta TO CURRENT_USER;
CREATE DATABASE immuta OWNER immuta;
GRANT all ON DATABASE immuta TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
```

Revoke the immuta role from CURRENT_USER, since that membership is no longer required.
```
REVOKE immuta FROM CURRENT_USER;
```
Enable the pgcrypto extension.
```
\c immuta
CREATE EXTENSION pgcrypto;
```
Type \q, and then press Enter to exit.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Create a Helm values file named immuta-values.yaml with the following content:

```
global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  imageRepositoryMap:
    immuta/immuta-service: stable/immuta-service
    immuta/immuta-db: stable/immuta-db
    immuta/immuta-fingerprint: stable/immuta-fingerprint
    immuta/audit-service: stable/audit-service
    immuta/audit-export-cronjob: stable/audit-export-cronjob
    immuta/classify-service: stable/classify-service
    immuta/cache: stable/cache

audit:
  enabled: false

secure:
  ingress:
    enabled: false
    tls: false
  extraEnvVars:
    - name: FeatureFlag_AuditService
      value: "false"
    - name: FeatureFlag_detect
      value: "false"
    - name: FeatureFlag_auditLegacyViewHide
      value: "false"

  postgresql:
    host: <postgres-fqdn>
    port: 5432
    database: immuta
    username: immuta
    password: <postgres-password>
    ssl: true
```

Deploy Immuta.

```
helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.2.16
```

Validation

Wait for all pods in the namespace to become ready.
```
kubectl wait --for=condition=Ready pods --all
```

Determine the name of the Secure service.

```
kubectl get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
```

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward service/<name> 8080:http
```

Next steps

Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)

This is an OpenShift-specific guide on how to deploy Immuta with the following managed services:

Cloud-managed PostgreSQL
Cloud-managed Redis

Prerequisites

Review the following criteria before proceeding with deploying Immuta.

PostgreSQL

The PostgreSQL instance has been provisioned and is actively running.

Redis

The Redis instance has been provisioned and is actively running.

Authenticate with OCI registry

Helm chart availability

The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Setup

Create a new OpenShift project named immuta for Immuta.
```
oc new-project immuta
```
Get the UID range allocated to the project. Each running container's UID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
```
Get the GID range allocated to the project. Each running container's GID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
```
Switch to project immuta.
```
oc project immuta
```

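Create an image pull secret named immuta-oci-registry so the cluster can pull Immuta images from ocir.immuta.com. Use the same credentials you used to log in to the registry.
```
oc create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com
```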

Cloud-managed PostgreSQL

Connecting to the database

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the oc run command outlined below. Wait 5 seconds, and then proceed by entering a password.
```
oc run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username postgres --port 5432 --password
```

Create an immuta role and database.

```
CREATE ROLE immuta with login encrypted password '<postgres-password>';
GRANT immuta TO CURRENT_USER;
CREATE DATABASE immuta OWNER immuta;
GRANT all ON DATABASE immuta TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
```

Revoke the immuta role from CURRENT_USER, since that membership is no longer required.
```
REVOKE immuta FROM CURRENT_USER;
```
Enable the pgcrypto extension.
```
\c immuta
CREATE EXTENSION pgcrypto;
```
Type \q, and then press Enter to exit.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Create a Helm values file named immuta-values.yaml with the following content:

```
global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  imageRepositoryMap:
    immuta/immuta-service: stable/immuta-service
    immuta/immuta-db: stable/immuta-db
    immuta/immuta-fingerprint: stable/immuta-fingerprint
    immuta/audit-service: stable/audit-service
    immuta/audit-export-cronjob: stable/audit-export-cronjob
    immuta/classify-service: stable/classify-service
    immuta/cache: stable/cache

audit:
  enabled: false

  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

discover:
  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

secure:
  extraEnvVars:
    - name: FeatureFlag_AuditService
      value: "false"
    - name: FeatureFlag_detect
      value: "false"
    - name: FeatureFlag_auditLegacyViewHide
      value: "false"

  ingress:
    enabled: false
    tls: false

  postgresql:
    host: <postgres-fqdn>
    port: 5432
    database: immuta
    username: immuta
    password: <postgres-password>
    ssl: true

  web:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  backgroundWorker:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
```

Deploy Immuta.

```
helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.2.16
```

Validation

Wait for all pods in the namespace to become ready.
```
oc wait --for=condition=Ready pods --all
```

Determine the name of the Secure service.

```
oc get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
```

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
oc port-forward service/<name> 8080:http
```

Next steps

This is a generic guide that demonstrates how to deploy Immuta into any Kubernetes cluster without dependencies on any particular cloud provider.

Considerations

Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged for the following reasons:

Operational overhead: Managing PostgreSQL on Kubernetes requires expertise in deploying, maintaining, and scaling the database effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.
Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL has sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.
Data integrity and high availability: PostgreSQL deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances requires careful consideration and often additional tooling.
Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.
Observability: Troubleshooting issues in a Kubernetes environment, especially those related to database performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.
Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

Authenticate with OCI registry

Helm chart availability

The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

```
echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
```

Setup

Create a Kubernetes namespace named immuta for Immuta and its third-party dependencies.
```
kubectl create namespace immuta
```

Switch to namespace immuta.

```
kubectl config set-context --current --namespace=immuta
```

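Create an image pull secret named immuta-oci-registry so the cluster can pull Immuta images from ocir.immuta.com. Use the same credentials you used to log in to the registry.
```
kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com
```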

PostgreSQL

Create a Helm values file named pg-values.yaml with the following content:

```
auth:
    database: immuta
    username: immuta
    password: <postgres-password>
```
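You may prefer to generate pg-values.yaml from a script, for example to substitute a password stored in a shell variable. A bash sketch could look like the following. `write_pg_values` is a hypothetical helper. It double-quotes the password in YAML so that values containing `:` or `#` survive intact. A password containing a double quote or backslash would still need escaping.

```shell
#!/usr/bin/env bash
# write_pg_values PASSWORD [FILE]: write the Bitnami PostgreSQL values
# shown above, substituting PASSWORD for <postgres-password>.
write_pg_values() {
  local password="$1" file="${2:-pg-values.yaml}"
  cat > "$file" <<EOF
auth:
    database: immuta
    username: immuta
    password: "${password}"
EOF
}

# Example (hypothetical variable): write_pg_values "$POSTGRES_PASSWORD"
```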

Deploy PostgreSQL.

```
helm install pg-db oci://registry-1.docker.io/bitnamicharts/postgresql \
    --values pg-values.yaml
```

Wait for all pods in the namespace to become ready.
```
kubectl wait --for=condition=Ready pods --all
```

Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.

```
kubectl get pod --selector "app.kubernetes.io/name=postgresql" --output template='{{ .metadata.name }}'
```

Exec into the PostgreSQL database pod using the psql command and immuta user to configure the PostgreSQL user used by Immuta.
```
kubectl exec --stdin --tty pod/<database-pod-name> -- psql -U immuta
```

Alter the search_path for the immuta user.

```
ALTER ROLE immuta SET search_path TO bometadata,public;
```

Enable the pgcrypto extension.
```
CREATE EXTENSION pgcrypto;
```
Type \q then press Enter to exit.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite local services are configured.

Create a Helm values file named immuta-values.yaml with the following content:

```
global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  imageRepositoryMap:
    immuta/immuta-service: stable/immuta-service
    immuta/immuta-db: stable/immuta-db
    immuta/immuta-fingerprint: stable/immuta-fingerprint
    immuta/audit-service: stable/audit-service
    immuta/audit-export-cronjob: stable/audit-export-cronjob
    immuta/classify-service: stable/classify-service
    immuta/cache: stable/cache

audit:
  enabled: false

secure:
  ingress:
    enabled: false
  extraEnvVars:
    - name: FeatureFlag_AuditService
      value: "false"
    - name: FeatureFlag_detect
      value: "false"
    - name: FeatureFlag_auditLegacyViewHide
      value: "false"

  postgresql:
    # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
    # The anatomy of a domain name is as follows:
    #   <service>.<namespace>.svc.<cluster-domain>
    #
    # Where the default cluster domain is: cluster.local
    host: pg-db-postgresql.immuta.svc.cluster.local
    port: 5432
    database: immuta
    username: immuta
    password: <postgres-password>
    # The Bitnami PostgreSQL chart deployed above does not enable TLS by default.
    ssl: false
```
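The naming rule described in the values file comments can be expressed as a small bash function (`service_fqdn` is a hypothetical helper). With the Bitnami chart, the Service for a release named pg-db is typically named pg-db-postgresql, which is where the host value comes from.

```shell
#!/usr/bin/env bash
# service_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Print <service>.<namespace>.svc.<cluster-domain>; the cluster domain
# defaults to cluster.local, as noted in the values file comments.
service_fqdn() {
  local service="$1" namespace="$2" domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$domain"
}
```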

Deploy Immuta.

```
helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.2.16
```

Validation

Wait for all pods in the namespace to become ready.
```
kubectl wait --for=condition=Ready pods --all
```

Determine the name of the Secure service.

```
kubectl get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
```

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward service/<name> 8080:http
```
Navigate to http://localhost:8080 in a web browser.

Next steps

Configure a Snowflake Integration

Permissions

- APPLICATION_ADMIN Immuta permission
- The Snowflake user must have the following privileges:
  - CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
  - CREATE ROLE ON ACCOUNT WITH GRANT OPTION
  - CREATE USER ON ACCOUNT WITH GRANT OPTION
  - MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
  - APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
  - APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
  - USAGE on all databases and schemas with registered data sources
  - REFERENCES on all tables and views registered in Immuta

Different accounts

The setup account used to enable the integration must be different from the account used to register data sources in Immuta.

Configure the integration

Snowflake resource names: Use uppercase for the names of the Snowflake resources you create below.

Click the App Settings icon in the navigation panel.
Click the Integrations tab.
Click the +Add Integration button and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Opt to check the Enable Impersonation box and customize the Impersonation Role to allow users to natively impersonate another user. You cannot edit this choice after you configure the integration.
2. Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.
3. Continue with your integration configuration.

Select your configuration method

You have two options for configuring your Snowflake environment:

Automatic setup

The setup will use the provided credentials to create a user called IMMUTA_SYSTEM_ACCOUNT and grant the following privileges to that user:

APPLY MASKING POLICY ON ACCOUNT
APPLY ROW ACCESS POLICY ON ACCOUNT
Additional grants associated with the IMMUTA database

These credentials will be used to create and configure a new IMMUTA database within the specified Snowflake instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

You can create a new account for Immuta to use that has these privileges, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate privileges is ACCOUNTADMIN. If you create a new account, it can be deleted after initial setup is complete.

From the Select Authentication Method Dropdown, select one of the following authentication methods:

2. When using an encrypted private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
3. Click Key Pair (Required), and upload a Snowflake private key pair file.
4. Complete the Role field.

Manual setup

The bootstrap script creates a user called IMMUTA_SYSTEM_ACCOUNT and grants the following privileges to that user:

APPLY MASKING POLICY ON ACCOUNT
APPLY ROW ACCESS POLICY ON ACCOUNT
Additional grants associated with the IMMUTA database

Run the script

Select Manual.
Use the Dropdown Menu to select your Authentication Method:
- Snowflake External OAuth:
  2. Fill out the Token Endpoint. This is where the generated token is sent.
  3. Fill out the Client ID. This is the subject of the generated token.
  4. Select the method Immuta will use to obtain an access token:
     - Certificate:
       - Keep the Use Certificate checkbox enabled.
       - Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
       - Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as `x5t` or is called `sub` (Subject).
       - Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
     - Client secret:
       - Uncheck the Use Certificate checkbox.
       - Enter the Client Secret (string). Immuta uses this secret to authenticate with the authorization server when it requests a token.
In the Setup section, click bootstrap script to download the script. Then, fill out the appropriate fields and run the bootstrap script in Snowflake.

Select available warehouses (optional)

If you enabled a Snowflake workspace, use the Warehouses dropdown menu to select the warehouses that will be available to project owners when they create Snowflake workspaces. The list includes all warehouses available to the privileged account entered above. Any warehouse accessible by the PUBLIC role does not need to be explicitly added.

Select excepted roles and users

Enter the Excepted Roles/User List. Each role or username (both case-sensitive) in this list should be separated by a comma. Wildcards are unsupported.
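The format rules above (comma-separated, case-sensitive, no wildcards) can be checked locally before you paste a list into Immuta. The bash sketch below is a hypothetical helper, not an Immuta tool. It leaves each entry exactly as typed, including any surrounding spaces, and it rejects empty entries and the glob characters `*` and `?`.

```shell
#!/usr/bin/env bash
# check_excepted_list LIST: split a comma-separated Excepted Roles/User
# List and print one entry per line. Entries are left exactly as typed
# (names are case-sensitive); empty entries and wildcards are rejected.
check_excepted_list() {
  local -a entries
  local entry
  # read -a splits on IFS without glob expansion, so "*" stays literal.
  IFS=',' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    if [ -z "$entry" ]; then
      echo "empty entry in list" >&2
      return 1
    fi
    case "$entry" in
      *[*?]*) echo "wildcards are unsupported: $entry" >&2; return 1 ;;
    esac
    printf '%s\n' "$entry"
  done
}
```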

Excepted roles/users will have no policies applied to queries

Any user whose username is in this list, or who is acting under a role in this list, will have no policies applied when querying Immuta-protected tables in Snowflake. Therefore, use this list for service or system accounts and for the default role of the account used to create the data sources in Immuta projects (if you have Snowflake workspace enabled).

Save the configuration

Click Save.

Opt to enable Snowflake tag ingestion

Register data

Snowflake Lineage Tag Propagation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

Snowflake Access History tracks user read and write operations. Snowflake column lineage extends this Access History to specify how data flows from source columns to target columns in write operations. Data stewards can use it to understand how sensitive data moves from ancestor tables to target tables, so that they can:

- trace data back to its source to validate the integrity of dashboards and reports,
- identify who performed write operations to meet compliance requirements,
- evaluate data quality and pinpoint points of failure, and
- tag sensitive data on source tables without having to tag columns on their descendant tables.

However, tagging sensitive data doesn’t innately protect that data in Snowflake; users need Immuta to disseminate these lineage tags automatically to descendant tables registered in Immuta so data stewards can build policies using the semantic and business context captured by those tags to restrict access to sensitive data. When Snowflake lineage tag propagation is enabled, Immuta propagates tags applied to a data source to its descendant data source columns in Immuta, which keeps your data inventory in Immuta up-to-date and allows you to protect your data with policies without having to manually tag every new Snowflake data source you register in Immuta.

Data flow

An application administrator enables the feature on the Immuta app settings page.
Snowflake lineage metadata (column names and tags) for the Snowflake tables is stored in the metadata database.
A data owner creates a new data source (or adds a new column to a Snowflake table). This initiates a job that applies to each column all the tags from its ancestor columns.
A data owner or governor adds a tag to a column in Immuta that has descendants, which initiates a job that propagates the tag to all descendants.
An audit record is created that includes which tags were applied and from which columns those tags originated.

Snowflake access history view and Immuta lineage job

The Snowflake Account Usage ACCESS_HISTORY view contains column lineage information.

To appropriately propagate tags to descendant data sources, Immuta fetches Access History metadata to determine what column tags have been updated, stores this metadata in the Immuta metadata database, and then applies those tags to relevant descendant columns of tables registered in Immuta.
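To illustrate where this lineage comes from, the Python sketch below turns one ACCESS_HISTORY `OBJECTS_MODIFIED` value into (source column, target column) pairs. It assumes the `objectName`, `columns`, `columnName`, and `directSources` fields described in Snowflake's column lineage documentation. The sample value is invented, and the sketch is not how Immuta's lineage job is implemented.

```python
import json

# Invented OBJECTS_MODIFIED value for a CTAS that created CUSTOMER_2 from
# CUSTOMER. Real rows carry more fields (object IDs, baseSources, and so on).
SAMPLE_OBJECTS_MODIFIED = json.dumps([
    {
        "objectDomain": "Table",
        "objectName": "MY_DATABASE.PUBLIC.CUSTOMER_2",
        "columns": [
            {
                "columnName": "C_FIRST_NAME",
                "directSources": [
                    {
                        "objectDomain": "Table",
                        "objectName": "MY_DATABASE.PUBLIC.CUSTOMER",
                        "columnName": "C_FIRST_NAME",
                    }
                ],
            }
        ],
    }
])


def lineage_edges(objects_modified_json: str) -> list[tuple[str, str]]:
    """Return (source_column, target_column) pairs from one ACCESS_HISTORY row."""
    edges = []
    for target in json.loads(objects_modified_json):
        for column in target.get("columns", []):
            target_col = f"{target['objectName']}.{column['columnName']}"
            # directSources lists the columns this column was written from.
            for source in column.get("directSources", []):
                source_col = f"{source['objectName']}.{source['columnName']}"
                edges.append((source_col, target_col))
    return edges


print(lineage_edges(SAMPLE_OBJECTS_MODIFIED))
# [('MY_DATABASE.PUBLIC.CUSTOMER.C_FIRST_NAME', 'MY_DATABASE.PUBLIC.CUSTOMER_2.C_FIRST_NAME')]
```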

Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

If the Discovered.Electronic Mail Address tag is added to the Customer data source in Immuta, that tag will propagate through lineage to the Customer 2 and Customer 3 data sources.

Data source registration

After an application administrator has enabled Snowflake lineage tag propagation, data owners can register data in Immuta and have tags in Snowflake propagated from ancestor tables to descendant data sources. Whenever new tags are added to those tables in Immuta, those upstream tags will propagate to descendant data sources.

By default all tags are propagated, but these tags can be filtered on the app settings page or using the Immuta API.

Managing tags

Lineage tag propagation works with any tag added to the data dictionary. Tags can be manually added, synced from an external catalog, or discovered by SDD. Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

Immuta added the Discovered.Electronic Mail Address tag to the Customer data source, and that tag propagated through lineage to the Customer 2 and Customer 3 data sources.

Removing the tag from the Customer 2 table soft deletes it from the Customer 2 data source. When a tag is deleted, downstream lineage tags are removed, unless another parent data source still has that tag. The tag remains visible, but it will not be re-added if a future propagation event specifies the same tag again. Immuta prevents you from removing Snowflake object tags from data sources. You can only remove Immuta-managed tags. To remove Snowflake object tags from tables, you must remove them in Snowflake.

However, the Discovered.Electronic Mail Address tag still applies to the Customer 3 data source because Customer still has the tag applied. A tag is removed from a descendant data source only if no other ancestor of that descendant still prescribes the tag.
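The propagation and removal behavior in this example can be modeled in a few lines of Python. This is a simplified, hypothetical model, and the `LineageTags` class is not an Immuta API. In this model, a data source carries a tag unless the tag was soft deleted on that data source. It also inherits any tag that is applied, and not soft deleted, on at least one of its ancestors.

```python
from collections import defaultdict


class LineageTags:
    """Hypothetical, simplified model of lineage tag propagation."""

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = defaultdict(set)
        self.applied: dict[str, set[str]] = defaultdict(set)       # tags added directly
        self.soft_deleted: dict[str, set[str]] = defaultdict(set)  # tags removed by a user

    def add_edge(self, parent: str, child: str) -> None:
        self.parents[child].add(parent)

    def add_tag(self, source: str, tag: str) -> None:
        self.applied[source].add(tag)
        # Assumption: manually re-adding a tag clears its soft delete.
        self.soft_deleted[source].discard(tag)

    def remove_tag(self, source: str, tag: str) -> None:
        # Soft delete: later propagation will not re-add the tag here.
        self.soft_deleted[source].add(tag)

    def ancestors(self, source: str) -> set[str]:
        """All transitive ancestors of a data source."""
        seen: set[str] = set()
        stack = list(self.parents[source])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.parents[node])
        return seen

    def tags(self, source: str) -> set[str]:
        """Own tags plus tags still prescribed by any ancestor, minus soft deletes."""
        inherited: set[str] = set()
        for ancestor in self.ancestors(source):
            inherited |= self.applied[ancestor] - self.soft_deleted[ancestor]
        return (self.applied[source] | inherited) - self.soft_deleted[source]


EMAIL = "Discovered.Electronic Mail Address"
lineage = LineageTags()
lineage.add_edge("Customer", "Customer 2")
lineage.add_edge("Customer 2", "Customer 3")
lineage.add_tag("Customer", EMAIL)
lineage.remove_tag("Customer 2", EMAIL)
print(lineage.tags("Customer 2"))  # set()
print(lineage.tags("Customer 3"))  # {'Discovered.Electronic Mail Address'}
```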

If the Snowflake lineage tag propagation feature is disabled, tags will remain on Immuta data sources.

Sensitive data discovery

Snowflake lineage audit

Immuta audit records include Snowflake lineage tag events when a tag is added or removed.

The example audit record below illustrates the SNOWFLAKE_TAGS.pii tag successfully propagating from the Customer table to Customer 2:

{
  "id": "c8e020cb-232c-4ba9-a0d8-f3a84ba6808d",
  "dateTime": "1670355170336",
  "month": 1475,
  "profileId": 1,
  "userId": "immuta_system_account",
  "dataSourceId": 2,
  "dataSourceName": "Customer 2",
  "count": 1,
  "recordType": "nativeLineageDataSourceTagUpdate",
  "success": true,
  "component": "dataSource",
  "extra": {
    "sourceColumn": {
      "nativeColumnName": "\"MY_DATABASE\".\"PUBLIC\".\"CUSTOMER\".\"C_FIRST_NAME\"",
      "dataSourceId": 1,
      "columnName": "c_first_name"
    },
    "dataSourceId": 2,
    "columnName": "c_first_name",
    "tagPropagationDirection": "downstream",
    "tags": [
      {
        "name": "SNOWFLAKE_TAGS.pii",
        "source": "immuta-us-east-1"
      }
    ]
  },
  "newAuditServiceFields": {
    "actorIp": null,
    "sessionId": null
  },
  "createdAt": "2022-12-06T19:32:50.372Z",
  "updatedAt": "2022-12-06T19:32:50.372Z"
}
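A small Python helper can extract the propagation details from these records for scripted review. The sketch below uses a trimmed copy of the record above and treats `dateTime` as epoch milliseconds. That interpretation is an assumption, but it matches the record's `createdAt` value. The function name is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the audit record above (only the fields used below).
RECORD_JSON = """
{
  "dateTime": "1670355170336",
  "recordType": "nativeLineageDataSourceTagUpdate",
  "dataSourceName": "Customer 2",
  "extra": {
    "sourceColumn": {"dataSourceId": 1, "columnName": "c_first_name"},
    "dataSourceId": 2,
    "columnName": "c_first_name",
    "tagPropagationDirection": "downstream",
    "tags": [{"name": "SNOWFLAKE_TAGS.pii", "source": "immuta-us-east-1"}]
  }
}
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_lineage_audit(record: dict) -> dict:
    """Pull the propagation details out of a lineage tag audit record."""
    if record.get("recordType") != "nativeLineageDataSourceTagUpdate":
        raise ValueError("not a lineage tag propagation record")
    extra = record["extra"]
    source = extra["sourceColumn"]
    return {
        # dateTime holds epoch milliseconds as a string (exact integer math).
        "timestamp": EPOCH + timedelta(milliseconds=int(record["dateTime"])),
        "tags": [tag["name"] for tag in extra["tags"]],
        "from": (source["dataSourceId"], source["columnName"]),
        "to": (extra["dataSourceId"], extra["columnName"]),
        "direction": extra["tagPropagationDirection"],
    }


summary = summarize_lineage_audit(json.loads(RECORD_JSON))
print(summary["timestamp"].isoformat())  # 2022-12-06T19:32:50.336000+00:00
print(summary["tags"], summary["from"], "->", summary["to"])
```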

Limitations

Without tableFilter set, Immuta will ingest lineage for every table on the Snowflake instance.
Tag propagation based on lineage is not retroactive. For example, if you add a table, add tags to that table, and then run the lineage ingestion job, tags will not get propagated. However, if you add a table, run the lineage ingestion job, and then add tags to the table, the tags will get propagated.
The lineage job needs to pull in lineage data before any tag is applied in Immuta. When Immuta gets new lineage information from Snowflake, Immuta does not update existing tags in Immuta.
There can be up to a 3-hour delay in Snowflake for a lineage event to make it into the ACCESS_HISTORY view.
Immuta does not ingest lineage information for views.
Snowflake only captures lineage events for CTAS, CLONE, MERGE, and INSERT write operations. Snowflake does not capture lineage events for DROP, RENAME, ADD, or SWAP. Instead of using these latter operations, you need to recreate a table with the same name if you need to make changes.
Immuta cannot enforce coherence of your Snowflake lineage. If a column, table, or schema in the middle of the lineage graph gets dropped, Immuta will not do anything unless a table with that same name gets recreated. This means a table that gets dropped but not recreated could live in Immuta’s system indefinitely.

Snowflake Integration

Snowflake Enterprise Edition required

As with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove the policy management burden and significantly reduce role explosion.

How the integration works

Data flow

Immuta creates a database in the configured Snowflake account that contains Immuta policy definitions and user entitlements.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Policy enforcement

For a user to query Immuta-protected data, they must meet two qualifications:

They must be subscribed to the Immuta data source.

After a user has met these qualifications they can query Snowflake tables directly.

Comply with column length and precision requirements in a Snowflake masking policy

Consider these columns in a data source that have the following masking policies applied:

Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

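Querying this data source in Snowflake would return the following values. Each masked value fits its column's declared length or precision. For example, the constant REDACTED is truncated to REDAC in the VARCHAR(5) column, and the hashed values are six characters long in the VARCHAR(6) column.

| Column A | Column B | Column C | Column D |
|---|---|---|---|
| 5w4502 | REDAC | null | 990 |
| 6e3611 | REDAC | null | 750 |
| 9s7934 | REDAC | null | 380 |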
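A short Python sketch can reproduce the truncation behavior. The hash function (SHA-256, hex-encoded), the helper names, and the input row are all assumptions for illustration. Immuta's hashed values above use a different alphabet, so the hashed output here will not match them.

```python
import hashlib
from typing import Optional


def fit_varchar(value: Optional[str], length: int) -> Optional[str]:
    """Truncate a masked string to the column's VARCHAR(length) limit."""
    return None if value is None else value[:length]


def mask_hash(value: str, length: int) -> Optional[str]:
    # SHA-256 is an illustrative choice; the digest is cut to fit the column.
    return fit_varchar(hashlib.sha256(value.encode("utf-8")).hexdigest(), length)


def mask_constant(constant: str, length: int) -> Optional[str]:
    return fit_varchar(constant, length)


def mask_round(value: int, nearest: int) -> int:
    # Round to the nearest multiple; ties round up (toward positive infinity).
    return nearest * ((value + nearest // 2) // nearest)


row = {"A": "alice@example.com", "B": "secret", "C": "555-0100", "D": 987}
masked = {
    "A": mask_hash(row["A"], 6),        # VARCHAR(6)
    "B": mask_constant("REDACTED", 5),  # VARCHAR(5)
    "C": None,                          # null masking
    "D": mask_round(row["D"], 10),      # NUMBER(3, 0)
}
print(masked["B"], masked["D"])  # REDAC 990
```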

Hashing collisions

Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.
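The birthday-problem approximation gives a rough estimate of how truncation affects collision risk: p ≈ 1 - exp(-n(n - 1) / 2N). Here, n is the number of distinct values and N is the number of possible truncated hashes. The Python sketch below assumes hashed values use a 36-character lowercase alphanumeric alphabet, as the sample values above suggest. The real alphabet may differ.

```python
import math


def collision_probability(distinct_values: int, length: int, alphabet_size: int = 36) -> float:
    """Approximate chance that at least two distinct inputs share a truncated hash.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)) with
    N = alphabet_size ** length. expm1 keeps precision for tiny probabilities.
    """
    space = alphabet_size ** length
    n = distinct_values
    return -math.expm1(-n * (n - 1) / (2 * space))


# 10,000 distinct values hashed into a 5-character vs. a 20-character column.
print(f"{collision_probability(10_000, 5):.2f}")   # roughly 0.56
print(f"{collision_probability(10_000, 20):.1e}")  # vanishingly small
```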

Query performance

Snowflake privileges

The privilege grants that the Snowflake integration requires follow the principle of least privilege. The table below describes each privilege required in Snowflake and the user or role that requires it. You can replace the references to IMMUTA_DB, IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE in the table with the names you chose for your Immuta database, warehouse, and impersonation role, respectively, when setting up the integration.

| Snowflake privilege | User requiring privilege | Features | Explanation |
|---|---|---|---|
| CREATE DATABASE ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates an Immuta database in your organization's Snowflake account. All Immuta-managed objects (UDFs, masking policies, row access policies, and user entitlements) will be written to and stored in this database. |
| CREATE ROLE ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates a ROLE for Immuta that will be used to manage the integration once it has been initialized. |
| CREATE USER ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta will use to manage the integration. |
| MANAGE GRANTS ON ACCOUNT | Setup user | All | The user configuring the integration must be able to GRANT global privileges and access to objects within the Snowflake account. This setup user grants all the privileges documented here to the IMMUTA_SYSTEM_ACCOUNT user. |
| OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE | IMMUTA_SYSTEM_ACCOUNT user | Impersonation | If impersonation is enabled, Immuta must be able to manage the Snowflake role used for impersonation, which is created when the setup script runs. |
| ALL PRIVILEGES ON DATABASE IMMUTA_DB; ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB; USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES | IMMUTA_SYSTEM_ACCOUNT user | All | The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed. |
| USAGE ON WAREHOUSE IMMUTA_WH | IMMUTA_SYSTEM_ACCOUNT user | All | To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations; others are DML operations and require compute. |
| IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE | IMMUTA_SYSTEM_ACCOUNT user | Audit | |
| APPLY MASKING POLICY ON ACCOUNT; APPLY ROW ACCESS POLICY ON ACCOUNT | IMMUTA_SYSTEM_ACCOUNT user | Snowflake integration with governance features enabled | |
| MANAGE GRANTS ON ACCOUNT | IMMUTA_SYSTEM_ACCOUNT user | Table grants | Immuta must be able to MANAGE GRANTS on objects throughout your organization's Snowflake account. |
| CREATE ROLE ON ACCOUNT | IMMUTA_SYSTEM_ACCOUNT user | Table grants | When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in your organization's Snowflake account. |
| USAGE on all databases and schemas with registered data sources; REFERENCES on all tables and views registered in Immuta | Metadata registration user | Data source registration | Immuta must be able to see metadata on securables to register them as data sources and populate the data dictionary. |
| SELECT on all tables and views registered in Immuta | Metadata registration user | Sensitive data discovery and specialized masking policies that require fingerprinting | |
| APPLY TAG ON ACCOUNT | Metadata registration user | Tag ingestion | |
| IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE | Metadata registration user | Tag ingestion | |
| USAGE ON DATABASE IMMUTA_DB; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM; SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE | PUBLIC role | All | Immuta's stored procedures and functions are used for policy enforcement and do not expose or contain any sensitive information. All users must be able to access these objects so that policies, or views that enforce Immuta policies, can be created and used in Snowflake. |
| SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST | PUBLIC role | All | Immuta retains a list of excepted roles and users when using the Snowflake integration. The roles and users in this list are exempt from policies applied to tables in Snowflake. This gives organizations flexibility when some entities should not be bound to Immuta policies in Snowflake (for example, a system or application role or user). |
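You can replace IMMUTA_DB with your own database name, so a small template can render the PUBLIC role grants from the last two table rows for a custom name. This Python sketch is illustrative only. The setup script normally issues these grants, and the function name is hypothetical. The object-type keywords (ON SCHEMA, ON TABLE) follow standard Snowflake GRANT syntax rather than the table's wording. The sketch also assumes USER_PROFILE and ALLOW_LIST are tables; use ON VIEW if they are views. Review the output before running it.

```python
import re

# Snowflake unquoted identifiers: a letter or underscore, then letters,
# digits, underscores, or dollar signs.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

PUBLIC_GRANT_TEMPLATES = [
    "GRANT USAGE ON DATABASE {db} TO ROLE PUBLIC;",
    "GRANT USAGE ON SCHEMA {db}.IMMUTA_PROCEDURES TO ROLE PUBLIC;",
    "GRANT USAGE ON SCHEMA {db}.IMMUTA_FUNCTIONS TO ROLE PUBLIC;",
    "GRANT USAGE ON FUTURE FUNCTIONS IN SCHEMA {db}.IMMUTA_FUNCTIONS TO ROLE PUBLIC;",
    "GRANT USAGE ON SCHEMA {db}.IMMUTA_SYSTEM TO ROLE PUBLIC;",
    "GRANT SELECT ON TABLE {db}.IMMUTA_SYSTEM.USER_PROFILE TO ROLE PUBLIC;",
    "GRANT SELECT ON TABLE {db}.IMMUTA_SYSTEM.ALLOW_LIST TO ROLE PUBLIC;",
]


def public_role_grants(database: str = "IMMUTA_DB") -> list[str]:
    """Render the PUBLIC role grants for a custom Immuta database name."""
    if not IDENTIFIER.fullmatch(database):
        raise ValueError(f"not a plain Snowflake identifier: {database!r}")
    return [template.format(db=database) for template in PUBLIC_GRANT_TEMPLATES]


for statement in public_role_grants("ACME_IMMUTA"):
    print(statement)
```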

Integration health status

Registering data sources

Register Snowflake data sources using a dedicated Snowflake role. Avoid using individual user accounts for data source onboarding. Instead, create a service account (Snowflake user account TYPE=SERVICE) with SELECT access for onboarding data sources. No policies will apply to that role, ensuring that your integration works with the following use cases:

Snowflake bulk data source creation

Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running sensitive data discovery or applying policies.

Resource allocations

Based on performance tests that create 100,000 data sources, the following minimum resource allocations need to be applied to the appropriate pods in your Kubernetes environment for successful bulk data source creation.

| Resource | Web | Database |
|---|---|---|
| Memory | 4Gi | 16Gi |
| CPU | | |
| Storage | 8Gi | 24Gi |

Limitations

Performance gains are limited when enabling sensitive data discovery at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

Excepted roles/users

Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.

Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement. The statement checks whether the user is acting under one of the listed usernames or roles. If so, the policy is not applied to the queried table. If not, the policy is enforced as normal. Immuta does not distinguish between roles and usernames. If a role and a user have exactly the same name, both that user and any user acting under that role will have full access to the data sources, and no policies will be enforced for them.
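The CASE wrapping described above can be sketched as a Python function that builds a Snowflake SQL expression. The function names are hypothetical, and this guide does not show the exact SQL Immuta generates. CURRENT_USER() and CURRENT_ROLE() are Snowflake context functions, and this sketch checks only the primary role that CURRENT_ROLE() returns. Checking one list against both functions reproduces the same-name behavior noted above.

```python
def sql_string_list(values: list[str]) -> str:
    """Render values as comma-separated SQL string literals ('' escapes ')."""
    return ", ".join("'" + v.replace("'", "''") + "'" for v in values)


def wrap_with_exceptions(column: str, masked_expr: str, excepted: list[str]) -> str:
    """Wrap a masking expression so excepted users and roles see raw data.

    The same list is compared with CURRENT_USER() and CURRENT_ROLE(), so a
    role and a user with the same name are both excepted.
    """
    names = sql_string_list(excepted)
    return (
        f"CASE WHEN CURRENT_USER() IN ({names}) OR CURRENT_ROLE() IN ({names}) "
        f"THEN {column} ELSE {masked_expr} END"
    )


print(wrap_with_exceptions("email", "'REDACTED'", ["SVC_LOADER", "ETL_ROLE"]))
```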

Authentication methods

The Snowflake integration supports the following authentication methods to configure the integration and create data sources:

Username and password: Users can authenticate with their Snowflake username and password.

Snowflake External OAuth

Workflow

An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
Immuta sends the access token it received from the authorization server to Snowflake.
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.

Supported Snowflake feature

Supported Immuta features

The Snowflake integration supports the Immuta features outlined below.

Immuta project workspaces

Immuta system account required Snowflake privileges

CREATE [OR REPLACE] PROCEDURE
DROP ROLE
REVOKE ROLE

Caveat

To use project workspaces with the Snowflake integration, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.

Tag ingestion

You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

The Snowflake tags' key and value pairs will be reflected in Immuta as two levels: the key will be the top level and the value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.
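The key/value mapping and hierarchical inheritance can be sketched in Python. The `.` separator between the two levels mirrors tag names elsewhere in this guide, such as Discovered.Electronic Mail Address, and is an assumption. The sketch also assumes that a value set lower in the hierarchy overrides the same key set higher up. The function names are hypothetical.

```python
def immuta_tag(key: str, value: str) -> str:
    """Two-level Immuta tag name: Snowflake tag key on top, value beneath."""
    return f"{key}.{value}"


def inherited_tags(path: tuple[str, ...],
                   object_tags: dict[tuple[str, ...], dict[str, str]]) -> set[str]:
    """Collect tags set on an object and on every parent object above it.

    `path` is (database, schema, table, column) or any prefix of it, and
    `object_tags` maps an object path to its Snowflake tag key/value pairs.
    """
    effective: dict[str, str] = {}
    for depth in range(1, len(path) + 1):
        # Walk from the database down; deeper objects override the same key.
        effective.update(object_tags.get(path[:depth], {}))
    return {immuta_tag(key, value) for key, value in effective.items()}


object_tags = {
    ("SALES",): {"PII": "true"},
    ("SALES", "PUBLIC", "CUSTOMER", "EMAIL"): {"CLASSIFICATION": "email"},
}
print(sorted(inherited_tags(("SALES", "PUBLIC", "CUSTOMER", "EMAIL"), object_tags)))
# ['CLASSIFICATION.email', 'PII.true']
```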

Caveats

Query audit

Multiple Snowflake instances

Caveats

There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for the view to be created.
Projects can only be configured to use one Snowflake host.

Limitations

Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.
When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.

Custom WHERE clause limitations

Requirements for a custom WHERE policy

All column names must be fully qualified: Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery: The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

Subquery limitations

Any subqueries that error in Snowflake will also error in Immuta.

Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on Snowflake subquery limitations, see the Snowflake documentation.

Databricks Unity Catalog Integration Reference Guide

Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.

Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

Metastore: Created at the account level and is attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those objects.
Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas
Schema: Organizes tables and views
Table-etc: Table (managed or external tables), view, volume, model, and function

Feature support

The Databricks Unity Catalog integration supports:

- applying column masks and row filters on specific securable objects
- applying subscription policies on tables and views
- enforcing Unity Catalog access controls, even if Immuta becomes disconnected
- allowing non-Immuta reads and writes
- using Photon
- using a proxy server

Architecture

Immuta uses the service principal configured for the integration to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. When the integration is enabled, Immuta creates a catalog that contains these schemas:

immuta_system: Contains internal Immuta data.
immuta_policies_n: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.

Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
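For orientation, the Python sketch below renders the kind of Databricks SQL that attaches a column-mask UDF and a row-filter UDF to a table. It uses Databricks' `SET MASK` and `SET ROW FILTER` clauses and the `is_account_group_member` function. Table, function, and group names are hypothetical, and the statements are not the UDFs Immuta generates. Inputs are assumed to be trusted identifiers, because nothing is escaped.

```python
def mask_statements(table: str, column: str, udf: str, exemption_group: str) -> list[str]:
    """SQL that creates a STRING column-mask UDF and attaches it to a column."""
    return [
        f"CREATE OR REPLACE FUNCTION {udf}(val STRING) RETURNS STRING "
        f"RETURN CASE WHEN is_account_group_member('{exemption_group}') "
        f"THEN val ELSE 'REDACTED' END;",
        f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {udf};",
    ]


def row_filter_statements(table: str, column: str, allowed_value: str,
                          udf: str, exemption_group: str) -> list[str]:
    """SQL that creates a BOOLEAN row-filter UDF and attaches it to a table."""
    return [
        f"CREATE OR REPLACE FUNCTION {udf}({column} STRING) RETURNS BOOLEAN "
        f"RETURN is_account_group_member('{exemption_group}') OR {column} = '{allowed_value}';",
        f"ALTER TABLE {table} SET ROW FILTER {udf} ON ({column});",
    ]


for stmt in mask_statements("main.sales.customers", "email",
                            "governance.policies.mask_email", "example_exempt_group"):
    print(stmt)
for stmt in row_filter_statements("main.sales.orders", "region", "US",
                                  "governance.policies.filter_region", "example_exempt_group"):
    print(stmt)
```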

The Unity Catalog integration supports the following policy types:

- Conditional masking
- Constant
- Custom masking
- Hashing
- Null
- Rounding (date and numeric rounding)
- Matching (only show rows where)
  - Custom WHERE
  - Never
  - Where user
  - Where value in column
- Minimization
- Time-based restrictions

Project-scoped purpose exceptions for Databricks Unity Catalog

Public preview: This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.

Databricks Unity Catalog views

If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:

The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).
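The two conditions above reduce to a small decision function. This Python sketch encodes only what this section states, and the function name is hypothetical. It treats any combination not covered by the two documented conditions as not applying.

```python
def purpose_exception_applies_to_view(
    view_registered: bool,
    table_registered: bool,
    view_in_project: bool,
    table_in_project: bool,
) -> bool:
    """Whether a project-scoped purpose exception reaches a Unity Catalog view."""
    # Condition 1: view and table are both data sources, so both must be in the project.
    if view_registered and table_registered:
        return view_in_project and table_in_project
    # Condition 2: only the table is a data source; the exception follows it to the view.
    if table_registered and not view_registered:
        return table_in_project
    # Neither documented condition holds.
    return False


print(purpose_exception_applies_to_view(True, True, True, False))   # False
print(purpose_exception_applies_to_view(False, True, False, True))  # True
```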

Policy exemption groups

Some users may need to be exempt from masking and row-level policy enforcement. When you add user accounts to the configured exemption group in Databricks, Immuta will not enforce policies for those users. Exemption groups are created when the Unity Catalog integration is configured, and no policies will apply to these users' queries, despite any policies enforced on the tables they query.

The principal used to register data sources in Immuta will be automatically added to this exemption group for that Databricks table. Consequently, users added to this list and used to register data sources in Immuta should be limited to service accounts.

Policy support with `hive_metastore`

When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

Authentication methods

The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:

Immuta data sources in Unity Catalog

External data connectors and query-federated tables

Query audit

Access requirements

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
- system.access.audit
- system.access.table_lineage
- system.access.column_lineage

Configuration requirements

Supported Databricks cluster configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

Example cluster

Databricks Runtime

Unity Catalog in Databricks

Databricks Spark integration

Databricks Unity Catalog integration

Cluster 1

9.1

Unavailable

Cluster 2

10.4

Unavailable

Cluster 3

11.3

Unavailable

Cluster 4

11.3

Cluster 5

11.3

Legend:

Unity Catalog caveats

Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
You must use the global regex flag (g) when creating a regex masking policy in this integration, and you cannot use the case-insensitive regex flag (i). See the examples below for guidance:
- regex with a global flag (supported): /^ssn|social ?security$/g
- regex without a global flag (unsupported): /^ssn|social ?security$/
- regex with a case-insensitive flag (unsupported): /^ssn|social ?security$/gi
- regex without a case-insensitive flag (supported): /^ssn|social ?security$/g
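A quick pre-check for these flag rules might look like the Python sketch below. It assumes policies are written as /pattern/flags literals, as in the examples, and the function name is hypothetical.

```python
def is_supported_regex_policy(literal: str) -> bool:
    """Check a /pattern/flags literal against this integration's flag rules.

    The global flag (g) is required and the case-insensitive flag (i) is not allowed.
    """
    last_slash = literal.rfind("/")
    if not literal.startswith("/") or last_slash == 0:
        raise ValueError(f"expected /pattern/flags, got {literal!r}")
    flags = literal[last_slash + 1:]
    return "g" in flags and "i" not in flags


for example in ["/^ssn|social ?security$/g",
                "/^ssn|social ?security$/",
                "/^ssn|social ?security$/gi"]:
    print(example, is_supported_regex_policy(example))
```

Because the i flag is not allowed, one workaround is to spell out case variants in the pattern itself, for example `[Ss][Ss][Nn]`.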

Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

Feature limitations

The following features are currently unsupported:

Databricks change data feed support
Multiple IAMs on a single cluster
Column masking policies on views
Mixing masking policies on the same column
Row-redaction policies on views
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies on ARRAY, MAP, or STRUCT type columns
Shallow clones

Known issue

Snippets for Databricks data sources may be empty in the Immuta UI.

Configuration

This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

Prerequisites

Databricks instance has network level access to Immuta tenant
Permissions and access to download (outside Internet access) or transfer files to the host machine

Recommended Databricks Workspace Configurations:

Supported Databricks Runtime Versions

Use the table below to determine which version of Immuta supports your Databricks Runtime version:

| Databricks Runtime version | Immuta version |
|---|---|
| 11.3 LTS | 2023.1 and newer |
| 10.4 LTS | 2022.2.x and newer |
| 7.3 LTS, 9.1 LTS | 2021.5.x and newer |
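The version table can be encoded as a lookup for scripted checks. This Python sketch assumes Immuta versions compare by their first two numeric components (year and release) and ignores patch components such as `.x`. It treats runtimes that are not listed as unsupported, and the names are hypothetical.

```python
# Minimum Immuta version (year, release) for each listed LTS runtime.
MIN_IMMUTA_VERSION = {
    "11.3": (2023, 1),
    "10.4": (2022, 2),
    "9.1": (2021, 5),
    "7.3": (2021, 5),
}


def parse_immuta_version(version: str) -> tuple[int, int]:
    """'2022.2.x' or '2024.2' -> (year, release); later components are ignored."""
    year, release = version.split(".")[:2]
    return int(year), int(release)


def immuta_supports_runtime(runtime: str, immuta_version: str) -> bool:
    """True if the Immuta version meets the table's minimum for this runtime."""
    key = runtime.replace("LTS", "").strip()
    if key not in MIN_IMMUTA_VERSION:
        return False  # not listed in the table
    return parse_immuta_version(immuta_version) >= MIN_IMMUTA_VERSION[key]


print(immuta_supports_runtime("11.3 LTS", "2024.2"))    # True
print(immuta_supports_runtime("11.3 LTS", "2022.3.1"))  # False
```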

Supported Databricks Cluster Configurations

Example cluster

Databricks Runtime

Unity Catalog in Databricks

Databricks Spark integration

Databricks Unity Catalog integration

Cluster 1

9.1

Unavailable

Cluster 2

10.4

Unavailable

Cluster 3

11.3

Unavailable

Cluster 4

11.3

Cluster 5

11.3

Legend:

Supported Access Mode and Languages

Immuta supports the Custom access mode.

Supported Languages:
- Python
- SQL
- R (requires advanced configuration; work with your Immuta support professional to use R)
- Scala (requires advanced configuration; work with your Immuta support professional to use Scala)

Databricks Installation Overview

Users who can read raw tables on-cluster

If a Databricks Admin is tied to an Immuta account, they will have the ability to read raw tables on-cluster.
If a Databricks user is listed as an "ignored" user, they will have the ability to read raw tables on-cluster. Users can be added to the immuta.spark.acl.whitelist configuration to become ignored users.

The Immuta Databricks integration injects an Immuta plugin into the SparkSQL stack at cluster startup. The Immuta plugin creates an "immuta" database that is available for querying and intercepts all queries executed against it. For these queries, policy determinations will be obtained from the connected Immuta tenant and applied before returning the results to the user.

The Databricks cluster init script provided by Immuta downloads the Immuta artifacts onto the target cluster and puts them in the appropriate locations on local disk for use by Spark. Once the init script runs, the Spark application running on the Databricks cluster will have the appropriate artifacts on its CLASSPATH to use Immuta for policy enforcement.

The cluster init script uses environment variables in order to

Determine the location of the required artifacts for downloading.
Authenticate with the service/storage containing the artifacts.

Note: Each target system/storage layer (HTTPS, for example) can only have one set of environment variables, so the cluster init script assumes that any artifact retrieved from that system uses the same environment variables.
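The one-credential-set-per-system rule can be pictured as a lookup keyed by storage system. In the Python sketch below, the environment variable names (`EXAMPLE_HTTPS_USERNAME` and so on) and the artifact URLs are hypothetical placeholders, not the names Immuta's init script reads.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse

# Hypothetical mapping: exactly one set of credential variables per storage system.
CREDENTIAL_VARS = {
    "https": ("EXAMPLE_HTTPS_USERNAME", "EXAMPLE_HTTPS_PASSWORD"),
    "s3": ("EXAMPLE_AWS_ACCESS_KEY_ID", "EXAMPLE_AWS_SECRET_ACCESS_KEY"),
}


def credentials_for(artifact_url: str, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Pick credentials for an artifact based only on its URL scheme.

    Every artifact fetched from the same system gets the same credentials,
    so one system cannot mix credential sets.
    """
    scheme = urlparse(artifact_url).scheme
    if scheme not in CREDENTIAL_VARS:
        raise ValueError(f"no credential variables configured for {scheme!r}")
    user_var, secret_var = CREDENTIAL_VARS[scheme]
    return env[user_var], env[secret_var]


demo_env = {"EXAMPLE_HTTPS_USERNAME": "artifact-user", "EXAMPLE_HTTPS_PASSWORD": "s3cret"}
print(credentials_for("https://artifacts.example.com/immuta-spark.jar", demo_env))
```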

Limitations

Installation Methods

There are two installation options for Databricks:

- Configuring the integration in Immuta:
  1. Adding the integration on the App Settings page.
  2. Downloading or automatically pushing cluster policies to your Databricks workspace.
  3. Creating or restarting your cluster.
- Manual configuration:
  1. Downloading and configuring Immuta artifacts.
  2. Staging Immuta artifacts somewhere the cluster can read from during its startup procedures.
  3. Protecting Immuta environment variables with Databricks Secrets.
  4. Creating and configuring the cluster to start with the init script and load Immuta into its SparkSQL environment.

Debugging Immuta Installation Issues

For easier debugging of the Immuta Databricks installation, enable cluster init script logging. On the target cluster's page in Databricks, under Advanced Options -> Logging, change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be appended to the provided path.

For debugging issues between the Immuta web service and Databricks, you can view the Spark UI on your target Databricks cluster. On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.

Using the Validation and Debugging Notebook

The Validation and Debugging Notebook (immuta-validation.ipynb) is packaged with other Databricks release artifacts (for manual installations), or it can be downloaded from the App Settings page when configuring Databricks through the Immuta UI. This notebook is designed to be used by or under the guidance of an Immuta Support Professional.

Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.
Click the arrow next to your name and select Import.
Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.

Configure a Databricks Unity Catalog Integration

Permissions

The following permissions and personas are used in the registration process.

Immuta user: An Immuta user with the APPLICATION_ADMIN Immuta permission must configure the Databricks Unity Catalog integration.
Databricks user: The Databricks user must have the following privileges.
- Account admin
- CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables
- (only required if enabling query audit)
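Databricks service principal: The service principal that Immuta uses must have the following privileges: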
- USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources and USE SCHEMA on all schemas containing securables registered as Immuta data sources.
- MODIFY and SELECT on all securables registered as Immuta data sources. MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on a securable; to do so, the service principal must also have SELECT on the securable, USE CATALOG on its parent catalog, and USE SCHEMA on its parent schema. Because privileges are inherited, you can grant the service principal MODIFY and SELECT on all catalogs or schemas containing Immuta data sources, which automatically grants it MODIFY and SELECT on all current and future securables in those catalogs or schemas. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog for grants to be fully applied.
- Optionally, to include audit, the service principal needs the following additional privileges:
  - USE CATALOG on the system catalog
  - USE SCHEMA on the system.access schema
  - SELECT on the system.access.audit table
  - SELECT on the system.access.table_lineage table
  - SELECT on the system.access.column_lineage table
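The service principal privileges above can be written as Databricks SQL GRANT statements. The sketch below generates them. It assumes catalog-level grants that rely on Unity Catalog privilege inheritance. The catalog and principal names are placeholders, and ServicePrincipalGrants is a hypothetical helper. Check the generated statements against the Databricks GRANT documentation for your workspace before running them.

```scala
// Hypothetical helper: render GRANT statements for the Immuta service principal.
// Catalog and principal names are placeholders; verify syntax before running.
object ServicePrincipalGrants {
  private def quote(principal: String): String = s"`$principal`"

  // Grants on a catalog containing Immuta data sources. Schema- and table-level
  // privileges granted on the catalog are inherited by its securables; MANAGE is
  // set directly on the parent catalog, as the requirements above describe.
  def dataSourceGrants(catalog: String, principal: String): Seq[String] = Seq(
    s"GRANT USE CATALOG, MANAGE ON CATALOG $catalog TO ${quote(principal)}",
    s"GRANT USE SCHEMA, SELECT, MODIFY ON CATALOG $catalog TO ${quote(principal)}"
  )

  // Optional grants needed only for query audit.
  def auditGrants(principal: String): Seq[String] =
    Seq(
      s"GRANT USE CATALOG ON CATALOG system TO ${quote(principal)}",
      s"GRANT USE SCHEMA ON SCHEMA system.access TO ${quote(principal)}"
    ) ++ Seq("audit", "table_lineage", "column_lineage").map { table =>
      s"GRANT SELECT ON TABLE system.access.$table TO ${quote(principal)}"
    }
}
```

Usage: `(ServicePrincipalGrants.dataSourceGrants("sales", "immuta-sp") ++ ServicePrincipalGrants.auditGrants("immuta-sp")).foreach(println)` prints the statements so that an account admin can review and run them.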

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
If you select single user access mode for your cluster, you must enable serverless compute for your workspace.

Unity Catalog best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Migrate data to Unity Catalog

Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

Configure the Databricks Unity Catalog integration

You have two options for configuring your Databricks Unity Catalog integration:

Automatic setup

Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Optionally, fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Configure query audit ingestion:
  1. Optionally, scope the query audit ingestion by entering Unity Catalog workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
  2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
  3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.
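You can check the field rules above (catalog name, audit frequency, and workspace ID list) before saving. The following minimal sketch assumes that "letters" means ASCII letters. UnityCatalogSettings is a hypothetical helper, and Immuta's own validation may differ.

```scala
import scala.util.Try

// Hypothetical pre-flight checks mirroring the documented field rules.
object UnityCatalogSettings {
  // Letters, numbers, and underscores only; must not start with a number.
  private val CatalogName = "^[A-Za-z_][A-Za-z0-9_]*$".r

  def isValidCatalogName(name: String): Boolean =
    CatalogName.pattern.matcher(name).matches()

  // Audit ingestion frequency: an integer number of hours between 1 and 24.
  def parseAuditFrequency(hours: String): Either[String, Int] =
    Try(hours.trim.toInt).toOption match {
      case Some(h) if h >= 1 && h <= 24 => Right(h)
      case _ => Left(s"Audit frequency must be an integer between 1 and 24, got '$hours'")
    }

  // Comma-separated workspace IDs; an empty result means all workspaces are audited.
  def parseWorkspaceIds(value: String): Seq[String] =
    value.split(",").toSeq.map(_.trim).filter(_.nonEmpty)
}
```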

Manual setup

Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Optionally, fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Configure query audit ingestion:
  1. Optionally, scope the query audit ingestion by entering Unity Catalog workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
  2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
  3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Save.
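The default OAuth M2M token endpoints listed in both setup procedures differ only in their cloud-specific domain. The sketch below builds them. The workspace name is a placeholder, and TokenEndpoint is a hypothetical helper. Confirm the endpoint against your actual workspace URL.

```scala
// Hypothetical helper: build the default OAuth M2M token endpoint shown above.
object TokenEndpoint {
  sealed trait Cloud
  case object Aws extends Cloud
  case object Azure extends Cloud

  def defaultFor(cloud: Cloud, workspaceName: String): String = cloud match {
    case Aws   => s"https://$workspaceName.cloud.databricks.com/oidc/v1/token"
    case Azure => s"https://$workspaceName.azuredatabricks.net/oidc/v1/token"
  }
}
```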

Enable query audit for Unity Catalog

To enable query audit for Unity Catalog, complete the following steps before configuring the integration:

1. Grant the account that Immuta will use the following privileges on the Databricks system tables:
   - USE CATALOG on the system catalog
   - USE SCHEMA on the system.access schema
   - SELECT on the following system tables:
     - system.access.audit
     - system.access.table_lineage
     - system.access.column_lineage
2. In the integration configuration above, use the Databricks personal access token for the account you just granted system table access. This account will be the Immuta service principal.

Map Databricks users to Immuta

If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to the corresponding Immuta user account to ensure Immuta properly enforces policies, using one of the methods linked below:

Register data

Manual Databricks Configuration

Databricks Unity Catalog: If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.

The immuta_conf.xml file is no longer required

The immuta_conf.xml file that was previously used to configure the Databricks integration is no longer required to install Immuta, so it is no longer staged as a deployment artifact. However, you can use these snippets if you wish to deploy an immuta_conf.xml file to set properties.

The required Immuta base URL and Immuta system API key properties, along with any other valid properties, can still be specified as Spark environment variables or in the optional immuta_conf.xml file. As before, if the same property is specified in both locations, the Spark environment variable takes precedence.

If you have an existing immuta_conf.xml file, you can continue using it. However, it's recommended that you delete any default properties from the file that you have not explicitly overridden, or remove the file completely and rely on Spark environment variables. Either method will ensure that any property defaults changed in upcoming Immuta releases are propagated to your environment.
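The following sketch shows the precedence rule between immuta_conf.xml and Spark environment variables. It assumes that immuta_conf.xml has already been parsed into a map and that environment variable names follow the convention described on this page (upper case, with underscores in place of periods). ImmutaProperties is hypothetical and is not Immuta code.

```scala
import java.util.Locale

// Hypothetical sketch of property resolution: a Spark environment variable
// wins over the same property in immuta_conf.xml.
object ImmutaProperties {
  // immuta.base.url -> IMMUTA_BASE_URL (Locale.ROOT avoids locale-specific casing)
  def envVarName(property: String): String =
    property.toUpperCase(Locale.ROOT).replace('.', '_')

  def resolve(property: String, xmlProperties: Map[String, String], env: Map[String, String]): Option[String] =
    env.get(envVarName(property)).orElse(xmlProperties.get(property))
}
```

On a cluster, `env` would typically be `sys.env`. If you remove a default from immuta_conf.xml, `resolve` returns None for that property, which leaves the product's own default in charge. That behavior matches the recommendation above to delete properties you have not explicitly overridden.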

1 - Download and Configure Immuta Artifacts

Scroll to the release that corresponds to your Immuta version.
Download the .jar file (Immuta plugin) as well as the other scripts listed below, which will load the plugin at cluster startup.
The immuta-benchmark-suite.dbc is a collection of notebooks packaged as a .dbc file. After you have added cluster policies to your cluster, you can import this file into Databricks to run performance tests and compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries. Note: Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.
Specify the following properties as Spark environment variables or in the optional immuta_conf.xml file. If the same property is specified in both locations, the Spark environment variable takes precedence. The variable names are the config names in all upper case, with underscores (_) in place of periods (.). For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com
- immuta.base.url: The full URL of the target Immuta tenant (for example, https://immuta.mycompany.com).
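The Spark 2 / Spark 3 compatibility note in the steps above comes down to a check on the Databricks Runtime major version. The following sketch assumes version strings such as "6.4" or "11.3 LTS". PluginArtifact is a hypothetical helper.

```scala
// Hypothetical helper: pick the Spark major version of the Immuta jar
// for a Databricks Runtime version (Spark 2 before 7.x, Spark 3 from 7.x).
object PluginArtifact {
  def sparkMajorFor(runtimeVersion: String): Int = {
    val major = runtimeVersion.trim.takeWhile(_.isDigit)
    require(major.nonEmpty, s"Cannot parse Databricks Runtime version: $runtimeVersion")
    if (major.toInt < 7) 2 else 3
  }
}
```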

Environment variables with Google Cloud Platform

Do not use environment variables to set sensitive properties when using Google Cloud Platform. Set them directly in immuta_conf.xml.

2 - Stage Immuta Artifacts

When configuring the Databricks cluster, a path will need to be provided to each of the artifacts downloaded/created in the previous step. To do this, those artifacts must be hosted somewhere that your Databricks instance can access. The following methods can be used for this step:

These artifacts will be downloaded to the required location within the cluster's file system by the init script downloaded in the previous step. For the init script to find these files, a URI must be provided through environment variables configured on the cluster. Each method's URI structure and setup is explained below.

AWS/S3

URI Structure: s3://[bucket]/[path]

Upload the configuration file, JSON file, and JAR file to an S3 bucket that the role from step 1 has access to.

Authenticating with Access Keys or Session Tokens (Optional)

If you wish to authenticate using access keys, add the following items to the cluster's environment variables:

If you've assumed a role and received a session token, that can be added here as well:

Azure

ADL Gen 2

URI Structure: abfs(s)://[container]@[account].dfs.core.windows.net/[path]
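You can split the ADL Gen 2 URI structure above into its parts with java.net.URI, as the following sketch does. AbfsUri is a hypothetical helper, and the account and container names in the tests are placeholders.

```scala
import java.net.URI

// Hypothetical helper: parse abfs(s)://[container]@[account].dfs.core.windows.net/[path]
object AbfsUri {
  final case class Parts(secure: Boolean, container: String, account: String, path: String)

  private val Suffix = ".dfs.core.windows.net"

  def parse(uri: String): Parts = {
    val u = new URI(uri)
    val scheme = Option(u.getScheme).getOrElse("")
    require(scheme == "abfs" || scheme == "abfss", s"Not an ABFS URI: $uri")
    // The container appears in the user-info position (before '@').
    val container = Option(u.getUserInfo).getOrElse(
      throw new IllegalArgumentException(s"Missing container in $uri"))
    val host = Option(u.getHost).getOrElse("")
    require(host.endsWith(Suffix), s"Unexpected ADL Gen 2 host: $host")
    Parts(scheme == "abfss", container, host.stripSuffix(Suffix), u.getPath)
  }
}
```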

Environment Variables:

If you want to authenticate using an account key, add the following to your cluster's environment variables:

If you want to authenticate using an Azure SAS token, add the following to your cluster's environment variables:

ADL Gen 1

URI Structure: adl://[account].azuredatalakestore.net/[path]

Environment Variables:

If authenticating as a Microsoft Entra ID user,

If authenticating using a service principal,

HTTPS

URI Structure: http(s)://[host](:port)/[path]

Artifacts are available for download from Immuta using basic authentication. Your basic authentication credentials can be obtained from your Immuta support professional.

Environment Variables (Optional)

DBFS

DBFS does not support access control

Any Databricks user can access DBFS via the Databricks command line utility. Files containing sensitive materials (such as Immuta API keys) should not be stored there in plain text. Use other methods described herein to properly secure such materials.

URI Structure: dbfs:/[path]

Since any user has access to everything in DBFS:

The artifacts can be stored anywhere in DBFS.
It's best to have a cluster-specific place for your artifacts in DBFS when testing, to avoid accidentally overwriting or reusing someone else's artifacts.

3 - Protect Immuta Environment Variables with Databricks Secrets

Databricks secrets can be used in the Environment Variables configuration section for a cluster by referencing the secret path rather than the actual value of the environment variable. For example, if a user wanted to make the following value secret

they could instead create a Databricks secret and reference it as the value of that variable. For instance, if the secret scope my_secrets was created, and the user added a secret with the key my_secret_env_var containing the desired sensitive environment variable, they would reference it in the Environment Variables section:

Then, at runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.
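The reference syntax above follows the pattern {{secrets/<scope>/<key>}}. The sketch below builds such a reference and simulates the runtime substitution. SecretRefs is hypothetical, and the `readable` map stands in for the secrets the cluster owner can access. Databricks performs the real substitution itself.

```scala
import scala.util.matching.Regex

// Hypothetical sketch: build and (in simulation) resolve Databricks secret references.
object SecretRefs {
  private val Ref: Regex = """\{\{secrets/([^/{}]+)/([^/{}]+)\}\}""".r

  def reference(scope: String, key: String): String = s"{{secrets/$scope/$key}}"

  // Replaces each reference with the secret if readable; otherwise leaves it untouched.
  def resolve(value: String, readable: Map[(String, String), String]): String =
    Ref.replaceAllIn(value, (m: Regex.Match) =>
      Regex.quoteReplacement(readable.getOrElse((m.group(1), m.group(2)), m.matched)))
}
```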

Best practice: Replace sensitive variables with secrets

Immuta recommends that any sensitive environment variables listed below in the various artifact deployment instructions be replaced with secrets.

4 - Create and Configure the Cluster

Cluster creation in an Immuta-enabled organization or Databricks workspace should be limited to administrative users to avoid allowing users to create non-Immuta enabled clusters.

Select the Custom Access mode.
Optionally, adjust the Autopilot Options and Worker Type settings. The default values provided here may be more than necessary for non-production or smaller use cases. To reduce resource usage, you can enable or disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.
In the Advanced Options section, click the Instances tab.
Click the Spark tab. In the Spark Config field, add your configuration.
- Cluster Configuration Requirements:
Click the Init Scripts tab and set the following configurations:
- Destination: Specify the service you used to host the Immuta artifacts.
- File Path: Specify the full URI to the immuta_cluster_init_script.sh.
- Add the new key/value to the configuration.
Click the Permissions tab and configure the following setting:
- Who has access: Users or groups will need to have the permission Can Attach To to execute queries against Immuta configured data sources.
(Re)start the cluster.

Additional Hadoop Configuration File (Optional)

As mentioned in the "Environment Variables" section of the cluster configuration, there may be some cases where it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration in order to read the data composing Immuta data sources.

As an example, when accessing external tables stored in Azure Data Lake Gen 2, Spark must have credentials to access the target containers/filesystems in ADLg2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access ADLg2.

The additional configuration file looks very similar to the Immuta Configuration file referenced above. Some example configuration files for accessing different storage layers are below.
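The additional file uses a layout similar to a Hadoop configuration file, so you can generate it from key/value pairs. The sketch below assumes the usual Hadoop <configuration>/<property> layout. HadoopConfFile is hypothetical. The key fs.azure.account.key.<account>.dfs.core.windows.net (the Hadoop ABFS property for a storage account key) appears only as an example.

```scala
// Hypothetical helper: render key/value pairs as a Hadoop-style configuration file.
object HadoopConfFile {
  // Escape XML special characters in names and values.
  private def esc(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
      .replace("\"", "&quot;").replace("'", "&apos;")

  def render(props: Seq[(String, String)]): String = {
    val body = props.map { case (k, v) =>
      s"  <property>\n    <name>${esc(k)}</name>\n    <value>${esc(v)}</value>\n  </property>"
    }
    (Seq("<configuration>") ++ body ++ Seq("</configuration>")).mkString("", "\n", "\n")
  }
}
```

Usage: `HadoopConfFile.render(Seq("fs.azure.account.key.mystorage.dfs.core.windows.net" -> "<storage-account-key>"))`. Because the file holds a storage key, stage it with the same care as other sensitive artifacts.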

Amazon S3

IAM role for S3 access

Azure Data Lake Gen 2

Azure Data Lake Gen 1

ADL prefix: Prior to Databricks Runtime version 6, the following configuration items should use the prefix dfs.adls rather than fs.adl.
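You can apply the prefix rule above mechanically. The sketch below assumes properties are written with the fs.adl prefix and rewritten for older runtimes. AdlPrefix is hypothetical, and fs.adl.oauth2.client.id is only an example key.

```scala
// Hypothetical helper: adapt ADL Gen 1 property keys to the runtime's expected prefix.
object AdlPrefix {
  def prefixFor(runtimeMajor: Int): String =
    if (runtimeMajor < 6) "dfs.adls" else "fs.adl"

  // Rewrites keys that start with "fs.adl." for the target runtime; other keys are unchanged.
  def adapt(props: Map[String, String], runtimeMajor: Int): Map[String, String] = {
    val target = prefixFor(runtimeMajor)
    props.map { case (k, v) =>
      if (k.startsWith("fs.adl.")) (target + k.stripPrefix("fs.adl"), v) else (k, v)
    }
  }
}
```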

Azure Blob Storage

5 - Register Data

6 - Query Immuta Data

When the Immuta-enabled Databricks cluster has been successfully started, users will see a new database labeled "immuta". This database is the virtual layer that provides access to data sources configured within the connected Immuta instance.

Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster and GRANT the user access to the immuta database.

The following SQL query can be run as an administrator within a notebook to give the user access to the immuta database:

Creating a Databricks Data Source

Databricks to Immuta User Mapping

By default, the IAM used to map users between Databricks and Immuta is the BIM (Immuta's internal IAM). The Immuta Spark plugin checks the Databricks username against the username within the BIM to determine access. For a basic integration, this means the user's email address in Databricks and in the connected Immuta tenant must match.

Run spark-submit Jobs on Databricks

This guide illustrates how to run R and Scala spark-submit jobs on Databricks, including prerequisites and caveats.

Language support: R and Scala are supported, but require advanced configuration; work with your Immuta support professional to use these languages. Python spark-submit jobs are not supported by the Databricks Spark integration.

Using R in a notebook: Because of how some user properties are populated in Databricks, users should load the SparkR library in a separate cell before attempting to use any SparkR functions.

R `spark-submit`

Prerequisites

Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.

Initialize the Spark session by entering these settings into the R submit script: immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".
This will enable the R script to access Immuta data sources, scratch paths, and workspace tables.
Once the script is written, upload the script to a location in dbfs/S3/ABFS to give the Databricks cluster access to it.

Create the R `spark-submit` Job

To create the R spark-submit job,

Go to the Databricks jobs page.
Create a new job, and select Configure spark-submit.

Set up the parameters:

 [
 "--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
 "--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
 "--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
 "dbfs:/path/to/script.R",
 "arg1", "arg2", "..."
 ]

Note: The path dbfs:/path/to/script.R can be in S3 or ABFS (on Azure Databricks), assuming the cluster is configured with access to that path.

Edit the cluster configuration, and change the Databricks Runtime to be a supported version.

Scala spark-submit

Prerequisites

Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.

Configure the Spark session with immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".
Note: Stop your Spark session (spark.stop()) at the end of your job or the cluster will not terminate.

The spark-submit job needs to be launched using a separate classloader that points at the designated user JARs directory. The following Scala template can be used to launch your submit code using a separate classloader:

package com.example.job

import java.io.File
import java.net.URLClassLoader

import org.apache.spark.sql.SparkSession

object ImmutaSparkSubmitExample {
  def main(args: Array[String]): Unit = {
    val jarDir = new File("/databricks/immuta/jars/")
    // listFiles returns null if the directory does not exist; fall back to no extra jars
    val urls = Option(jarDir.listFiles).getOrElse(Array.empty[File]).map(_.toURI.toURL)

    // Configure a new ClassLoader which will load jars from the additional jars directory
    val cl = new URLClassLoader(urls)
    val jobClass = cl.loadClass(classOf[ImmutaSparkSubmitExample].getName)
    // Instantiate the job through the new ClassLoader and invoke runJob reflectively
    val job = jobClass.getDeclaredConstructor().newInstance()
    jobClass.getMethod("runJob").invoke(job)
  }
}

class ImmutaSparkSubmitExample {

  def getSparkSession(): SparkSession = {
    SparkSession.builder()
      .appName("Example Spark Submit")
      .enableHiveSupport()
      .config("immuta.spark.acl.assume.not.privileged", "true")
      .config("spark.hadoop.immuta.databricks.config.update.service.enabled", "false")
      .getOrCreate()
  }

  def runJob(): Unit = {
    val spark = getSparkSession()
    try {
      val df = spark.table("immuta.<YOUR DATASOURCE>")

      // Run Immuta Spark queries...

    } finally {
      // Stop the session so the cluster can terminate when the job ends
      spark.stop()
    }
  }
}

Create the Scala `spark-submit` Job

To create the Scala spark-submit job,

Build and upload your JAR to dbfs/S3/ABFS where the cluster has access to it.

Select Configure spark-submit, and configure the parameters:

 [
 "--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
 "--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
 "--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
 "--class","org.youorg.package.MainClass",
 "dbfs:/path/to/code.jar",
 "arg1", "arg2", "..."
 ]

Note: The --class parameter takes the fully-qualified name of the class whose main function will be used as the entry point for your code.

Note: The path dbfs:/path/to/code.jar can be in S3 or ABFS (on Azure Databricks) assuming the cluster is configured with access to that path.
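If you generate job definitions programmatically, you can also assemble the parameter list above in code. The sketch below reproduces the Scala parameter array and renders it as a JSON list. SparkSubmitParams is hypothetical, and its JSON escaping covers only quotes and backslashes, which is enough for these values.

```scala
// Hypothetical helper: build the spark-submit parameters shown above as a JSON list.
object SparkSubmitParams {
  private val JavaOpts =
    "-Djava.security.manager=com.immuta.security.ImmutaSecurityManager " +
      "-Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json " +
      "-Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service"

  def forScalaJob(mainClass: String, jarUri: String, args: Seq[String]): Seq[String] =
    Seq(
      "--conf", s"spark.driver.extraJavaOptions=$JavaOpts",
      "--conf", s"spark.executor.extraJavaOptions=$JavaOpts",
      "--conf", "spark.databricks.repl.allowedLanguages=python,sql,scala,r",
      "--class", mainClass,
      jarUri
    ) ++ args

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  private def jsonString(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def toJson(params: Seq[String]): String =
    params.map(jsonString).mkString("[", ",", "]")
}
```

Usage: `SparkSubmitParams.toJson(SparkSubmitParams.forScalaJob("com.example.job.ImmutaSparkSubmitExample", "dbfs:/path/to/code.jar", Seq("arg1")))` produces a list that you can paste into the spark-submit parameters field.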

Include IMMUTA_INIT_ADDITIONAL_JARS_URI=dbfs:/path/to/code.jar in the "Environment Variables" (where dbfs:/path/to/code.jar is the path to your jar) so that the jar is uploaded to all the cluster nodes.

Caveats

The user mapping works differently from notebooks because spark-submit clusters are not configured with access to the Databricks SCIM API. The cluster tags are read to get the cluster creator and match that user to an Immuta user.
Privileged users (Databricks Admins and Whitelisted Users) must be tied to an Immuta user and given access through Immuta to access data through spark-submit jobs because the setting immuta.spark.acl.assume.not.privileged="true" is used.
Alternatively, you can use the immuta.api.key setting with an Immuta API key generated on the Immuta profile page.
Currently, generating an API key invalidates the previous key. This can cause issues if a user is using multiple clusters in parallel, since each cluster will generate a new API key for that Immuta user. To avoid these issues, manually generate the API key in Immuta and set immuta.api.key on all the clusters, or use a specified job user for the submit job.

Environment Variables

This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended).

This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

Environment variable overrides

Properties in the config file can be overridden during installation using environment variables. The variable names are the config names in all upper case, with underscores (_) in place of periods (.). For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com

immuta.ephemeral.host.override
- Default: true
- Description: Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source httpPaths with the httpPath of the Databricks cluster running the user's Spark application.
immuta.ephemeral.host.override.httpPath
- Description: This configuration item can be used if automatic detection of the Databricks httpPath should be disabled in favor of a static path to use for ephemeral overrides.
immuta.ephemeral.table.path.check.enabled
- Default: true
- Description: When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time. Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.
immuta.spark.acl.enabled
- Default: true
- Description: Immuta Access Control List (ACL). Controls whether Databricks users are blocked from accessing non-Immuta tables. Ignored if Databricks Table ACLs are enabled (i.e., spark.databricks.acl.dfAclsEnabled=true).
immuta.spark.acl.whitelist
- Description: Comma-separated list of Databricks usernames who may access raw tables when the Immuta ACL is in use.
immuta.spark.acl.privileged.timeout.seconds
- Default: 3600
- Description: The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is whitelisted in immuta.spark.acl.whitelist.
immuta.spark.acl.assume.not.privileged
- Default: false
- Description: Session property that overrides privileged user status when the Immuta ACL is in use. This should only be used in R scripts associated with spark-submit jobs.
immuta.spark.audit.all.queries
- Default: false
- Description: Enables auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.
immuta.spark.databricks.allow.non.immuta.reads
- Default: false
immuta.spark.databricks.allow.non.immuta.writes
- Default: false
immuta.spark.databricks.allowed.impersonation.users
- Description: This configuration is a comma-separated list of Databricks users who are allowed to impersonate Immuta users.
immuta.spark.databricks.dbfs.mount.enabled
- Default: false
- Description: Exposes the DBFS FUSE mount located at /dbfs. Granular permissions are not possible, so all users will have read/write access to all objects therein. Note: Raw, unfiltered source data should never be stored in DBFS.
immuta.spark.databricks.disabled.udfs
immuta.spark.databricks.filesystem.blacklist
- Default: hdfs
- Description: A list of filesystem protocols that this instance of Immuta will not support for workspaces. This is useful in cases where a filesystem is available to a cluster but should not be used on that cluster.
immuta.spark.databricks.jar.uri
- Default: file:///databricks/jars/immuta-spark-hive.jar
- Description: The location of immuta-spark-hive.jar on the filesystem for Databricks. This should not need to change unless a custom initialization script that places immuta-spark-hive in a non-standard location is necessary.
immuta.spark.databricks.local.scratch.dir.enabled
- Default: true
- Description: Creates a world-readable/writable scratch directory on local disk to facilitate the use of dbutils and 3rd party libraries that may write to local disk. Its location is non-configurable and is stored in the environment variable IMMUTA_LOCAL_SCRATCH_DIR. Note: Sensitive data should not be stored at this location.
immuta.spark.databricks.log.level
- Default: INFO
- Description: The SLF4J log level to apply to Immuta's Spark plugins.
immuta.spark.databricks.log.stdout.enabled
- Default: false
- Description: If true, writes logging output to stdout/the console as well as the log4j-active.txt file (default in Databricks).
immuta.spark.databricks.py4j.strict.enabled
- Default: true
- Description: Disable to allow the use of the dbutils API in Python. Note: This setting should only be disabled for customers who employ a homogeneous integration (i.e., all users have the same level of data access).
immuta.spark.databricks.scratch.database
- Description: This configuration is a comma-separated list of additional databases that will appear as scratch databases when running a SHOW DATABASES query. This configuration increases performance by circumventing the Metastore to get the metadata for all the databases to determine what to display for a SHOW DATABASES query; it won't affect access to the scratch databases. Instead, use immuta.spark.databricks.scratch.paths to control read and write access to the underlying database paths.
  Additionally, this configuration will only display the scratch databases that are configured and will not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to properly set this value and keep it current.
immuta.spark.databricks.scratch.paths
- Description: Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by configuring its specified location (or configure dbfs:/user/hive/warehouse/<db_name>.db for the default location).
  To create a scratch path to a location or a database stored at that location, configure:
  ```
  <property>
    <name>immuta.spark.databricks.scratch.paths</name>
    <value>s3://path/to/the/dir</value>
  </property>
  ```
  To create a scratch path to a database created using the default location, configure:
  ```
  <property>
    <name>immuta.spark.databricks.scratch.paths</name>
    <value>s3://path/to/the/dir, dbfs:/user/hive/warehouse/any_db_name.db</value>
  </property>
  ```
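The two scratch-path configurations differ only in the comma-separated value. As a minimal sketch, assuming a path counts as scratch space when it equals a configured entry or sits beneath it on a path-segment boundary (the plugin's actual matching rules are not described here), the following Python parses the property value and checks candidate paths. The helper names `parse_scratch_paths` and `is_scratch_path` are hypothetical.

```python
def parse_scratch_paths(value: str) -> list[str]:
    """Split the property value on commas, trimming whitespace and trailing slashes."""
    return [p.strip().rstrip("/") for p in value.split(",") if p.strip()]


def is_scratch_path(path: str, scratch_paths: list[str]) -> bool:
    """Return True if `path` is a configured scratch path or lies beneath one.

    Assumption: matching is prefix-based on whole path segments, so
    s3://path/to/the/directory does not match s3://path/to/the/dir.
    """
    candidate = path.rstrip("/")
    return any(
        candidate == root or candidate.startswith(root + "/")
        for root in scratch_paths
    )


paths = parse_scratch_paths(
    "s3://path/to/the/dir, dbfs:/user/hive/warehouse/any_db_name.db"
)
print(is_scratch_path("dbfs:/user/hive/warehouse/any_db_name.db/t1", paths))  # True
print(is_scratch_path("s3://path/to/the/directory", paths))  # False
```

Trimming whitespace matters because the documented example places a space after the comma.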
immuta.spark.databricks.scratch.paths.create.db.enabled
- Default: false
- Description: Enables non-privileged users to create or drop scratch databases.
immuta.spark.databricks.single.impersonation.user
- Default: false
- Description: When true, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. This configuration should be set when the BI tool or other service allows users to submit arbitrary SQL or issue SET commands.
immuta.spark.databricks.submit.tag.job
- Default: true
- Description: Denotes whether to run the Spark job that "tags" a Databricks cluster as being associated with Immuta.
immuta.spark.databricks.trusted.lib.uris
immuta.spark.non.immuta.table.cache.seconds
- Default: 3600
- Description: The number of seconds Immuta caches whether a table has been exposed as a source in Immuta. This setting only applies when immuta.spark.databricks.allow.non.immuta.writes or immuta.spark.databricks.allow.non.immuta.reads is enabled.
immuta.spark.require.equalization
- Default: false
- Description: Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via spark.databricks.repl.allowedLanguages.
immuta.spark.resolve.raw.tables.enabled
- Default: true
- Description: Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Administrators or whitelisted users can set immuta.spark.session.resolve.raw.tables.enabled to false to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.
immuta.spark.session.resolve.raw.tables.enabled
- Default: true
- Description: Same as above, but a session property that allows users to toggle this functionality. If users run set immuta.spark.session.resolve.raw.tables.enabled=false, they will see raw data only (not Immuta data policy-enforced data). Note: This property is not set in immuta_conf.xml.
immuta.spark.show.immuta.database
- Default: true
- Description: This shows the immuta database in the configured Databricks cluster. When set to false Immuta will no longer show this database when a SHOW DATABASES query is performed. However, queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this feature is enabled.
immuta.spark.version.validate.enabled
- Default: true
- Description: Immuta checks the versions of its artifacts to verify that they are compatible with each other. When set to true, if versions are incompatible, that information will be logged to the Databricks driver logs and the cluster will not be usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), this check can be set to false so that the versions don't get logged as incompatible and make the cluster unusable.
immuta.user.context.class
- Default: com.immuta.spark.OSUserContext
- Description: The class name of the UserContext that will be used to determine the current user in immuta-spark-hive. The default implementation gets the OS user running the JVM for the Spark application.
immuta.user.mapping.iamid
- Default: bim
- Description: Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to Immuta's internal IAM (bim) but should be updated to reflect an actual production IAM.

Starburst (Trino) Integration Reference Guide

Starburst and Trino

The Starburst (Trino) integration allows you to access policy-enforced data directly in your Starburst catalogs without rewriting queries or changing workflows. Instead of generating policy-enforced views and adding them to an Immuta catalog that users have to query (like in the legacy Starburst (Trino) integration), Immuta policies are translated into Starburst (Trino) rules and permissions and applied directly to tables within users’ existing catalogs.

Architecture

Policy enforcement

When a user queries a table in Starburst (Trino), the Trino Execution Engine reaches out to the Immuta plugin to determine what the user is allowed to see:

masking policies: For each column, Starburst (Trino) requests a view expression from the Immuta plugin. If there is a masking policy on the column, the Immuta plugin returns the corresponding view expression for that column. Otherwise, nothing is returned.
row-level policies: For each table, Starburst (Trino) requests the rows a user can see in a table from Immuta. If there is a WHERE clause policy on the data source, Immuta returns the corresponding view expression as a WHERE clause. Otherwise, nothing is returned.

The Immuta plugin then requests policy information about the tables being queried from the Immuta Web Service and sends this information to the Trino Execution Engine. Finally, the Trino Execution Engine constructs the SQL statement, executes it on the backing tables to apply the policies, and returns the response to the user.
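Conceptually, the engine combines the per-column view expressions and the row filter into the statement it executes. The Python sketch below illustrates that composition only; `build_enforced_query` is a hypothetical helper, the hash expression is just an example mask, and Trino applies these expressions inside its planner rather than by concatenating SQL text.

```python
from typing import Optional


def build_enforced_query(table: str, columns: list[str],
                         masks: dict[str, str],
                         row_filter: Optional[str] = None) -> str:
    """Return SQL that selects `columns` from `table` with policies applied.

    `masks` maps a column name to the view expression returned for it;
    columns without an entry are selected unchanged.
    """
    select_list = [
        f"{masks[col]} AS {col}" if col in masks else col
        for col in columns
    ]
    sql = f"SELECT {', '.join(select_list)} FROM {table}"
    if row_filter:  # WHERE clause returned for a row-level policy
        sql += f" WHERE {row_filter}"
    return sql


print(build_enforced_query(
    "postgres.public.customer",
    ["id", "email", "region"],
    {"email": "to_hex(sha256(to_utf8(email)))"},
    "region = 'US'",
))
# SELECT id, to_hex(sha256(to_utf8(email))) AS email, region FROM postgres.public.customer WHERE region = 'US'
```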

System access control providers

Users cannot bypass Immuta controls by changing roles in their system access control provider.

Multiple system access control providers can be configured in the Starburst (Trino) integration. This approach allows Immuta to work with existing Starburst (Trino) installations that already have an access control provider configured.

Immuta does not manage all permissions in Starburst (Trino) and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

If you have multiple access control providers configured, those providers interact in the following ways:

For a user to have access to a resource (catalog, schema, or a table), that user must have access in all of the configured access control providers.
In catalog, schema, or table filtering (such as show catalogs, show schemas, or show tables), the user will see the intersection of all access control providers. For example, if a Starburst (Trino) environment includes the catalogs public, demo, and restricted and one provider restricts a user from accessing the restricted catalog and another provider restricts the user from accessing the demo catalog, running show catalogs will only return the public catalog for that user.
Only one column masking policy can be applied per column across all system access control providers. If two or more access control providers return a mask for a column, Starburst (Trino) will throw an error at query time.
For row filtering policies, the expression for each system access control provider is applied one after the other.
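The four rules above can be modeled in a few lines. In this Python sketch, each provider's answer for one user is a hypothetical `ProviderDecision`; applying row filters one after the other is represented as their conjunction (AND), which selects the same rows. It reproduces the public/demo/restricted example and is an illustration, not Starburst's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProviderDecision:
    """What one (hypothetical) access control provider returns for a user."""
    visible_catalogs: set[str]
    masks: dict[str, str] = field(default_factory=dict)   # column -> mask expression
    row_filters: list[str] = field(default_factory=list)  # WHERE-clause expressions


def visible_catalogs(all_catalogs: set[str], decisions: list[ProviderDecision]) -> set[str]:
    """A catalog is listed only if every provider allows it (intersection)."""
    visible = set(all_catalogs)
    for decision in decisions:
        visible &= decision.visible_catalogs
    return visible


def combined_masks(decisions: list[ProviderDecision]) -> dict[str, str]:
    """At most one provider may mask a given column; a second mask is an error."""
    merged: dict[str, str] = {}
    for decision in decisions:
        for column, expr in decision.masks.items():
            if column in merged:
                raise ValueError(f"more than one mask returned for column {column!r}")
            merged[column] = expr
    return merged


def combined_row_filter(decisions: list[ProviderDecision]) -> Optional[str]:
    """Filters applied in sequence keep the same rows as their AND."""
    filters = [f"({expr})" for d in decisions for expr in d.row_filters]
    return " AND ".join(filters) if filters else None


provider_a = ProviderDecision({"public", "demo"})        # hides restricted
provider_b = ProviderDecision({"public", "restricted"})  # hides demo
print(visible_catalogs({"public", "demo", "restricted"}, [provider_a, provider_b]))  # {'public'}
```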

Starburst (Trino) query passthrough

Starburst (Trino) query passthrough is available in most connectors through the query table function (or the raw_query table function in the Elasticsearch connector). Consequently, Immuta blocks functions named raw_query or query, because those table functions would completely bypass Immuta's access controls.

For example, without blocking those functions, this query would access the public.customer table directly:

```
select * from table(postgres.system.query(query => 'select * from public.customer limit 10'));
```
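The blocking rule keys on the table function's name, not on the catalog it belongs to. A minimal Python sketch of that check, assuming functions are referenced in the `catalog.schema.function` form used above; `is_blocked_table_function` is hypothetical and not the plugin's code.

```python
BLOCKED_TABLE_FUNCTIONS = {"query", "raw_query"}


def is_blocked_table_function(qualified_name: str) -> bool:
    """Return True if the final name segment is query or raw_query (case-insensitive)."""
    function_name = qualified_name.rsplit(".", 1)[-1]
    return function_name.lower() in BLOCKED_TABLE_FUNCTIONS


print(is_blocked_table_function("postgres.system.query"))           # True
print(is_blocked_table_function("elasticsearch.system.raw_query"))  # True
```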

Data flow

An Immuta Application Administrator configures the Starburst (Trino) integration, adding the ImmutaSystemAccessControl plugin on their Starburst (Trino) node.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
The Trino Execution Engine calls various methods on the interface to ask the ImmutaSystemAccessControl plugin where the policies should be applied. The masking and row-level security methods apply the actual policy expressions.
The Immuta System Access Control plugin calls the Immuta Web Service to retrieve policy information for that data source for the querying user, using the querying user's project, purpose, and entitlements.
The Immuta System Access Control plugin provides the SQL view expression (for masked columns) or WHERE clause SQL view expression (for row filtering) to the Trino Execution Engine.
The Trino Execution Engine constructs and executes the SQL statement on the backing catalogs and retrieves the data with appropriate policy enforcement.
User sees policy-enforced data.

Authentication methods

The Starburst (Trino) integration supports the following authentication methods to create data sources in Immuta:

Username and password: You can authenticate with your Starburst (Trino) username and password.

OAuth Authentication for creating data sources

Configure JWT authentication method in Starburst (Trino)

When using OAuth authentication to create data sources in Immuta, configure your Starburst (Trino) cluster to use JWT authentication, not OpenID Connect or OAuth.

When users query a Starburst (Trino) data source, Immuta sends a username with the view SQL so that policies apply in the right context. Since OAuth authentication does not require a username to be associated with a data source upon data source creation, Immuta does not send a username and Starburst (Trino) queries fail. To avoid this error, you must configure a global admin username.

Supported Starburst (Trino) feature

Starburst (Trino)-created logical view support

The descriptions below provide guidance for applying policies to Starburst (Trino)-created logical views in the Starburst (Trino) integration.

However, there are other approaches you can use to apply policies to Starburst (Trino)-created logical views. The examples below are the simplest approaches.

Views created in the `DEFINER` security mode

For views created using the DEFINER security mode,

ensure the user who created the view is configured as an admin user in the Immuta plugin so that policies are never applied to the underlying tables.
create Immuta data sources and apply policies to logical views exposing those tables.
lock down access to the underlying tables in Starburst (Trino) so that all end user access is provided through the views.

Views created in the `INVOKER` security mode

Applying policies to views or tables

Avoid creating data policies for both a logical view and its underlying tables. Instead, apply policies to the logical view or the underlying tables.

For views created using the INVOKER security mode, the querying user needs access to the logical view and underlying tables.

If non-Immuta table reads are disabled, provide access to the views and tables through Immuta. To do so, create Immuta data sources for the view and underlying tables, and grant access to the querying user in Immuta. If creating data policies, apply the policies to either the view or underlying tables, not both.
If non-Immuta table reads are enabled, the user already has access to the table and view. Create Immuta data sources and apply policies to the underlying table; this approach will enforce access controls for both the table and view in Starburst (Trino).

Supported Immuta features

Query audit

In addition to the information included on the Starburst (Trino) Audit Logs page, the audit logs payload in the Starburst (Trino) integration includes immutaPlanningDuration, which represents the planning overhead in Immuta.

Multiple Starburst (Trino) integrations

You can configure multiple Starburst (Trino) integrations with a single Immuta tenant and use them dynamically. Configure the integration once in Immuta to use it in multiple Starburst (Trino) clusters. However, consider the following limitations:

Names of catalogs cannot overlap because Immuta cannot distinguish among them.
A combination of cluster types on a single Immuta tenant is supported unless your Trino cluster is configured to use a proxy. In that case, you can only connect either Trino clusters or Starburst clusters to the same Immuta tenant.
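The catalog-name limitation can be checked before you connect several clusters to one tenant. This Python sketch, with hypothetical cluster and catalog names, reports every catalog name that appears in more than one cluster.

```python
from collections import defaultdict


def overlapping_catalogs(clusters: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each catalog name used by more than one cluster to the clusters using it."""
    owners: dict[str, set[str]] = defaultdict(set)
    for cluster, catalogs in clusters.items():
        for catalog in catalogs:
            owners[catalog].add(cluster)
    return {catalog: names for catalog, names in owners.items() if len(names) > 1}


conflicts = overlapping_catalogs({
    "starburst-east": {"sales", "hr"},
    "trino-west": {"sales", "ops"},
})
print(conflicts)  # "sales" is reported because both clusters define it
```

An empty result means no catalog names overlap across the listed clusters.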

Policy caveat

Limit your masked joins to columns with matching column types. Starburst truncates the result of the masking expression to conform to the native column type when performing the join, so joining two masked columns with different data types produces invalid results when one of the columns' lengths is less than the length of the masked value.

For example, if the value of a hashed column is 64 characters, joining a hashed varchar(50) and a hashed varchar(255) column will not be joined correctly, since the varchar(50) value is truncated and doesn’t match the varchar(255) value.
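The Python sketch below reproduces that mismatch. The choice of hex-encoded SHA-256 as the mask is an assumption for illustration (it yields exactly 64 characters), and `fit_to_varchar` is a hypothetical stand-in for the truncation to the native column type.

```python
import hashlib


def hashed(value: str) -> str:
    """Example mask: hex-encoded SHA-256, which is always 64 characters long."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def fit_to_varchar(value: str, length: int) -> str:
    """Model truncation of a masked value to a varchar(length) column."""
    return value[:length]


left = fit_to_varchar(hashed("alice@example.com"), 50)    # varchar(50) column
right = fit_to_varchar(hashed("alice@example.com"), 255)  # varchar(255) column
print(len(left), len(right), left == right)  # 50 64 False
```

The same input produces two different join keys, so the row that should match is silently dropped from the join.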

Configure Starburst (Trino) Integration

The plugin comes pre-installed with Starburst Enterprise, so this page provides separate sets of guidelines for configuring Starburst clusters and Trino clusters.

Starburst Cluster Configuration

Requirement

Starburst does not support using Starburst built-in access control (BIAC) concurrently with any other access control providers such as Immuta. If Starburst BIAC is in use, it must be disabled to allow Immuta to enforce policies on cluster.

1 - Enable the Integration

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click Add Integration and select Trino from the Integration Type dropdown menu.
Click Save.

OAuth Authentication

If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, configure the globalAdminUsername property in the advanced configuration section of the Immuta app settings page.

Click the App Settings page icon.
Click Advanced Settings and scroll to Advanced Configuration.
Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:
```
trino:
  globalAdminUsername: "admins@trino.com"
```

2 - Configure the Immuta System Access Control Plugin in Starburst

Default Configuration Property Values

If you use the default property values in the configuration file described in this section, you will give users read and write access to tables that are not registered in Immuta, and results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Starburst deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Starburst installations that have already configured an access control provider. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

TLS Certificate Generation

If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

If the hostnames in your certificate don't match the hostname specified in your Starburst (Trino) integration, you can set immuta.disable-hostname-verification to true in the Immuta access control config file to get the integration working in the interim.

The Starburst (Trino) integration uses the immuta.ca-file property to communicate with Immuta. When configuring the plugin in Starburst (outlined below), specify a path to your CA file using the immuta.ca-file property in the Immuta access control configuration file.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
The table below describes the properties that can be set during configuration.
| Property | Starburst version | Required or optional | Description |
| --- | --- | --- | --- |
| access-control.name | 392 and newer | Required | This property enables the integration. |
| access-control.config-files | 392 and newer | Optional | Configure this property to allow Immuta to work with existing Starburst installations that have already configured an access control provider. |
| immuta.allowed.immuta.datasource.operations | 413 and newer | Optional | A comma-separated list of operations allowed on schemas and tables registered as Immuta data sources. Default: READ. |
| immuta.allowed.non.immuta.datasource.operations | 392 and newer | Optional | A comma-separated list of operations allowed on schemas and tables not registered as Immuta data sources. Default: READ,WRITE. Set to empty to allow no operations on non-Immuta data sources. |
| immuta.apikey | 392 and newer | Required | Set this to the Immuta API key displayed when enabling the integration on the app settings page. |
| immuta.audit.legacy.enabled | 392 and newer | Optional | Allows you to turn off the legacy Starburst (Trino) audit if you do not have Elasticsearch set up in your installation. |
| immuta.ca-file | 392 and newer | Optional | Allows you to specify a path to your CA file. |
| immuta.cache.views.seconds | 392 and newer | Optional | Amount of time, in seconds, for which a user's specific representation of an Immuta data source is cached. Changing this affects how quickly policy changes are reflected for users actively querying Starburst. Default: 30 seconds. |
| immuta.cache.datasource.seconds | 392 and newer | Optional | Amount of time, in seconds, for which a user's available Immuta data sources are cached. Changing this affects how quickly data sources become available after changes to projects or subscriptions. Default: 30 seconds. |
| immuta.endpoint | 392 and newer | Required | The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Starburst (for example, https://my.immuta.instance.io). Set this to the endpoint displayed when enabling the integration on the app settings page. |
| immuta.filter.unallowed.table.metadata | 392 and newer | Optional | When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. When set to true, running show catalogs, for example, reflects what that user has access to instead of returning all catalogs. Default: false. |
| immuta.group.admin | 420 and newer | Required if immuta.user.admin is not set | Identifies the Starburst group that is the Immuta administrator. Users in this group will not have Immuta policies applied to them, so data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with immuta.user.admin, and regex filtering can be used (with a `\|` delimiter between expressions) to assign multiple groups as the Immuta administrator. You must escape regex special characters (for example, `john\\.doe\\+svcacct@immuta\\.com`). |
| immuta.user.admin | 392 and newer | Required if immuta.group.admin is not set | Identifies the Starburst user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account runs the subqueries, so data sources should be created by this user so that they have access to everything. This property can be used in conjunction with immuta.group.admin, and regex filtering can be used (with a `\|` delimiter between expressions) to assign multiple users as the Immuta administrator. You must escape regex special characters (for example, `john\\.doe\\+svcacct@immuta\\.com`). |
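To see why the escaping matters, the Python sketch below builds an administrator pattern from literal usernames. It assumes the plugin matches the entire username against the pattern (full-match semantics) and that the file is read with Java properties escaping, where `\\` stands for one backslash, which would explain the doubled backslashes in the documented examples. `admin_pattern`, `is_admin`, and `to_properties_value` are hypothetical helpers; Python's `re` and Java's regex engine treat `.`, `+`, `|`, and backslash escapes the same way.

```python
import re


def admin_pattern(usernames: list[str]) -> str:
    """Escape each literal username and separate the alternatives with |."""
    return "|".join(re.escape(name) for name in usernames)


def is_admin(username: str, pattern: str) -> bool:
    """Assumption: the whole username must match the pattern."""
    return re.fullmatch(pattern, username) is not None


def to_properties_value(pattern: str) -> str:
    """Double each backslash so the pattern survives properties-file unescaping."""
    return pattern.replace("\\", "\\\\")


pattern = admin_pattern(["john.doe+svcacct@immuta.com", "immuta_system_account"])
print(to_properties_value(pattern))
# john\\.doe\\+svcacct@immuta\\.com|immuta_system_account
print(is_admin("john.doe+svcacct@immuta.com", pattern))  # True
# Unescaped, "+" is a quantifier and "." matches any character:
print(is_admin("john.doe+svcacct@immuta.com", "john.doe+svcacct@immuta.com"))  # False
```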
Enable the Immuta access control plugin in Starburst's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,

```
access-control.config-files=/etc/starburst/immuta-access-control.properties
```

Example Immuta System Access Control Configuration

```
# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false
```
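Before restarting the cluster, it can help to confirm that the required properties from the table are present. This Python sketch is a hypothetical pre-flight check: `parse_properties` handles only simple `key=value` lines (no properties-file escapes or line continuations), and `check_immuta_access_control` encodes the required settings listed above, with `immuta` as the expected value of access-control.name based on the example.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Simplification: ignores properties-file escapes and line continuations.
    """
    props: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_immuta_access_control(props: dict[str, str]) -> list[str]:
    """Return a list of problems with the required settings (empty if none)."""
    problems = []
    if props.get("access-control.name") != "immuta":
        problems.append("access-control.name must be set to immuta")
    for key in ("immuta.endpoint", "immuta.apikey"):
        if not props.get(key):
            problems.append(f"{key} is required")
    if not (props.get("immuta.user.admin") or props.get("immuta.group.admin")):
        problems.append("set immuta.user.admin or immuta.group.admin")
    return problems


config = """
# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta
immuta.endpoint=http://service.immuta.com:3000
immuta.apikey=45jdljfkoe82b13eccfb9c
immuta.user.admin=immuta_system_account
"""
print(check_immuta_access_control(parse_properties(config)))  # []
```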

3 - Add Starburst Users to Immuta

- All Starburst users must either map to Immuta users or match the immuta.user.admin regex configured on the cluster. To query policy-enforced data, a user's Starburst username must be mapped to Immuta.
- A user impersonating a different user in Starburst requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.

4 - Register data

Trino Cluster Configuration

1 - Enable the Integration

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click Add Integration and select Trino from the dropdown menu.
Click Save.

OAuth Authentication

Click the App Settings page icon.
Click Advanced Settings and scroll to Advanced Configuration.
Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:
```
trino:
  globalAdminUsername: "admins@trino.com"
```

2 - Configure the Immuta System Access Control Plugin in Trino

Default Configuration Property Values

If you use the default property values in the configuration file described in this section, you will give users read and write access to tables that are not registered in Immuta, and results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst (Trino) integration installation is minimally disruptive for existing Trino deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Trino installations that have already configured an access control provider. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

TLS Certificate Generation

If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

Download the assets for the release.
Enable Immuta on your cluster. Follow the instructions below for your installation method:

Docker installations

Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

immuta-trino Docker image

For Trino versions 414 and newer, an immuta-trino Docker image that includes the Trino plugin jars is available from registry.immuta.com. Before using this image, consider the following factors:

This image was designed to provide a method for customers to quickly set up and validate the integration, so it should only be used in development environments. Use the Docker installation method above for production environments.
Immuta only supports the Immuta Trino plugin on the Docker image, not any other software packaged on the image.
If you experience an issue with the image outside of the scope of the Immuta plugin, you must rebuild your own version of the image using the Docker installation method above.

To use this image,

Pull the image and start the container. The example below specifies the Immuta Trino plugin version 414 with the 414 tag, but any supported Trino version newer than 414 can be used:
```
docker run registry.immuta.com/immuta/immuta-trino:414
```
Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

Standalone installations

Create the Immuta access control configuration file in the Trino configuration directory: <trino_install_directory>/etc/immuta-access-control.properties.

Configure the properties described in the table below.

Property

Trino version

Required or optional

Description

access-control.name

392 and newer

Required

This property enables the integration.

access-control.config-files

392 and newer

Optional

Trino allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. This approach allows Immuta to work with existing Trino installations that have already configured an access control provider. Immuta does not manage all permissions in Trino and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

immuta.allowed.immuta.datasource.operations

413 and newer

Optional

immuta.allowed.non.immuta.datasource.operations

392 and newer

Optional

immuta.apikey

392 and newer

Required

This should be set to the Immuta API key displayed when enabling the integration on the app settings page.

immuta.audit.legacy.enabled

392 and newer

Optional

This property allows you to turn off the legacy Starburst (Trino) audit if you do not have Elasticsearch set up in your install.

immuta.ca-file

392 and newer

Optional

This property allows you to specify a path to your CA file.

immuta.cache.views.seconds

392 and newer

Optional

Amount of time in seconds for which a user's specific representation of an Immuta data source will be cached for. Changing this will impact how quickly policy changes are reflected for users actively querying Trino. By default, cache expires after 30 seconds.

immuta.cache.datasource.seconds

392 and newer

Optional

Amount of time in seconds for which a user's available Immuta data sources will be cached for. Changing this will impact how quickly data sources will be available due to changing projects or subscriptions. By default, cache expires after 30 seconds.

immuta.endpoint

392 and newer

Required

The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Trino (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.

immuta.filter.unallowed.table.metadata

392 and newer

Optional

When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.

immuta.group.admin

420 and newer

Required if immuta.user.admin is not set

This property identifies the Trino group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe\\+svcacct@immuta\\.com).

immuta.user.admin

392 and newer

Required if immuta.group.admin is not set

This property identifies the Trino user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe\\+svcacct@immuta\\.com).
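To illustrate why escaping matters, here is a minimal Python sketch of how an administrator regex such as the one above could be evaluated. It assumes the value is read from a .properties file (where each doubled backslash collapses to one), that the whole username must match the pattern, and that Python's `re` module behaves like the plugin's regex engine for these simple patterns; the helper names are hypothetical. Note that `+` is also a regex metacharacter, so this sketch escapes it as well.

```python
import re

# Hypothetical property value, written exactly as it would appear in
# immuta-access-control.properties (doubled backslashes, | between expressions).
ADMIN_USER_PROPERTY = r"immuta_system_account|john\\.doe\\+svcacct@immuta\\.com"


def unescape_properties_value(raw: str) -> str:
    # Simplified .properties unescaping: collapse each doubled backslash
    # into a single backslash. Real .properties parsing handles more cases.
    return raw.replace("\\\\", "\\")


# The regex the plugin would see once the properties file is loaded.
ADMIN_PATTERN = re.compile(unescape_properties_value(ADMIN_USER_PROPERTY))


def is_immuta_admin(username: str) -> bool:
    # Assumption: the entire username must match (full-match semantics).
    return ADMIN_PATTERN.fullmatch(username) is not None


if __name__ == "__main__":
    for user in ("immuta_system_account", "john.doe+svcacct@immuta.com", "johnXdoe+svcacct@immuta.com"):
        print(user, is_immuta_admin(user))
    # Without escaping, "." matches any character and "e+" means "one or more e",
    # so the literal address no longer matches its own unescaped pattern.
    print(re.fullmatch("john.doe+svcacct@immuta.com", "john.doe+svcacct@immuta.com"))
```

Only the first two usernames are treated as administrators; the unescaped pattern fails to match the literal address, which is exactly the mistake the escaping rule guards against.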

Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/trino/immuta-access-control.properties
```
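Because access-control.config-files accepts a comma-separated list (for example, `access-control.config-files=/etc/trino/immuta-access-control.properties,/etc/trino/other-access-control.properties`, where the second file is hypothetical), Immuta can run alongside another provider. The following Python sketch models each provider as a yes/no check and assumes, consistent with the lock-down example earlier in this section, that an operation succeeds only when every configured provider allows it. The provider functions and table names are illustrative stand-ins, not the plugins' real logic.

```python
from typing import Callable, Sequence

# A provider is modeled as a function (user, operation, table) -> allowed?
AccessCheck = Callable[[str, str, str], bool]


def immuta_provider(user: str, operation: str, table: str) -> bool:
    # Hypothetical stand-in for the Immuta plugin on non-Immuta tables,
    # using the default immuta.allowed.non.immuta.datasource.operations=READ,WRITE.
    return operation in {"READ", "WRITE"}


def lockdown_provider(user: str, operation: str, table: str) -> bool:
    # Hypothetical second provider that blocks writes to one specific table.
    return not (operation == "WRITE" and table == "finance.ledger")


def is_allowed(providers: Sequence[AccessCheck], user: str, operation: str, table: str) -> bool:
    # Every configured provider must allow the operation; one denial blocks it.
    return all(check(user, operation, table) for check in providers)


PROVIDERS = [immuta_provider, lockdown_provider]

if __name__ == "__main__":
    print(is_allowed(PROVIDERS, "alice", "WRITE", "sales.orders"))
    print(is_allowed(PROVIDERS, "alice", "WRITE", "finance.ledger"))
```

With both providers configured, writes remain open on most non-Immuta tables while the second provider closes them on the one table it protects.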

Example Immuta System Access Control Configuration

```
# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but that the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false
```
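The three metadata-filtering cases described in the comments above can be sketched in Python as follows. This is an illustration of the documented behavior, not the plugin's code; the `Table` type, the function name, and the sample table names are hypothetical. The comments do not cover the case where only non-Immuta writes are enabled, so the sketch assumes non-Immuta tables stay visible whenever non-Immuta READ or WRITE is allowed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Table:
    name: str
    registered_in_immuta: bool


def visible_tables(
    tables: Iterable[Table],
    user_immuta_access: Set[str],
    filter_enabled: bool,
    non_immuta_operations: Set[str],
) -> List[str]:
    """Return the table names a user would see in 'show tables' (illustrative only)."""
    if not filter_enabled:
        # immuta.filter.unallowed.table.metadata=false: all metadata is visible.
        return [t.name for t in tables]

    # Assumption: non-Immuta tables stay visible if non-Immuta READ or WRITE is allowed.
    show_non_immuta = bool(non_immuta_operations & {"READ", "WRITE"})
    result: List[str] = []
    for t in tables:
        if t.registered_in_immuta:
            # Registered tables are shown only if the user has access through Immuta.
            if t.name in user_immuta_access:
                result.append(t.name)
        elif show_non_immuta:
            result.append(t.name)
    return result


TABLES = [
    Table("sales.orders", True),
    Table("hr.salaries", True),
    Table("scratch.tmp", False),
]

if __name__ == "__main__":
    access = {"sales.orders"}
    print(visible_tables(TABLES, access, False, {"READ", "WRITE"}))
    print(visible_tables(TABLES, access, True, {"READ", "WRITE"}))
    print(visible_tables(TABLES, access, True, set()))
```

The three calls correspond, in order, to filtering disabled, filtering enabled with non-Immuta reads allowed, and filtering enabled with non-Immuta reads and writes disabled.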

3 - Add Trino Users to Immuta

- All Trino users must either map to Immuta users or match the immuta.user.admin regex configured on the cluster. To query policy-enforced data, a user's Trino username must be mapped to their Immuta user.
- A user impersonating a different user in Trino requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.
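The mapping and impersonation rules above can be expressed as two small checks. In this Python sketch the username mapping, the permission store, and the administrator test are hypothetical in-memory stand-ins. It also reads the impersonation rule as "the querying user matches the administrator regex, or both users are mapped and the querying user holds IMPERSONATE_USER", which is one interpretation of the text rather than confirmed plugin behavior.

```python
from typing import Dict, Set

ADMIN_USER = "immuta_system_account"  # hypothetical immuta.user.admin value

# Hypothetical mapping of Trino usernames to Immuta users.
USER_MAPPING: Dict[str, str] = {"alice": "alice@example.com", "bob": "bob@example.com"}

# Hypothetical Immuta permissions, keyed by Immuta user.
PERMISSIONS: Dict[str, Set[str]] = {"alice@example.com": {"IMPERSONATE_USER"}}


def is_admin(trino_user: str) -> bool:
    # Simplification: exact match instead of evaluating the admin regex.
    return trino_user == ADMIN_USER


def can_query(trino_user: str) -> bool:
    # Users must either match the admin setting or be mapped to an Immuta user.
    return is_admin(trino_user) or trino_user in USER_MAPPING


def can_impersonate(querying_user: str, target_user: str) -> bool:
    if is_admin(querying_user):
        return True
    # Both users must be mapped to Immuta users...
    if querying_user not in USER_MAPPING or target_user not in USER_MAPPING:
        return False
    # ...and the querying user's Immuta account needs IMPERSONATE_USER.
    immuta_user = USER_MAPPING[querying_user]
    return "IMPERSONATE_USER" in PERMISSIONS.get(immuta_user, set())


if __name__ == "__main__":
    print(can_query("alice"), can_query("mallory"))
    print(can_impersonate("alice", "bob"), can_impersonate("bob", "alice"))
```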

4 - Register data

Customize Read and Write Access Policies for Starburst (Trino)

Private preview: Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.

Requirements

Starburst (Trino) version 438 or newer
Write policies for Starburst (Trino) enabled. Contact your Immuta representative to get this feature enabled on your account.

Configuration options

Immuta web service configuration

If all Starburst (Trino) data source operations should be handled identically across every Starburst (Trino) cluster connected to your Immuta tenant, contact your Immuta representative to configure read and write access in the Immuta web service. A configuration example is provided below.

Configuration example

The following example maps the WRITE policy to the READ, WRITE, and OWN permissions and maps the READ policy to only the READ permission. Both mappings should always include READ:

```
accessGrantMapping:
  WRITE: ['READ', 'WRITE', 'OWN']
  READ: ['READ']
```
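To make the mapping concrete, this Python sketch expands granted policies into permissions using the accessGrantMapping above and enforces the rule that every mapping includes READ. It only illustrates the mapping's meaning; the function names are hypothetical, and the Immuta web service configuration is managed by your Immuta representative, not by code like this.

```python
from typing import Dict, Iterable, List, Set

# The accessGrantMapping example above, as a Python dict.
ACCESS_GRANT_MAPPING: Dict[str, List[str]] = {
    "WRITE": ["READ", "WRITE", "OWN"],
    "READ": ["READ"],
}


def validate_mapping(mapping: Dict[str, List[str]]) -> None:
    # Both READ and WRITE mappings should always include READ.
    for policy, permissions in mapping.items():
        if "READ" not in permissions:
            raise ValueError(f"{policy} mapping must include READ")


def effective_permissions(granted_policies: Iterable[str], mapping: Dict[str, List[str]]) -> Set[str]:
    # Union of the permissions mapped to each granted policy.
    # An unknown policy name raises KeyError rather than being ignored.
    permissions: Set[str] = set()
    for policy in granted_policies:
        permissions.update(mapping[policy])
    return permissions


if __name__ == "__main__":
    validate_mapping(ACCESS_GRANT_MAPPING)
    print(sorted(effective_permissions(["WRITE"], ACCESS_GRANT_MAPPING)))
    print(sorted(effective_permissions(["READ"], ACCESS_GRANT_MAPPING)))
```

A user granted the WRITE policy ends up with READ, WRITE, and OWN, while a READ grant yields only READ.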

Starburst cluster configuration

Configure the integration to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a Starburst cluster.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
Modify one or both properties below to customize the behavior of read or write access policies for all users:
- immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. These permissions apply to all querying users except for administrators defined in immuta.user.admin (who get all permissions).
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns.
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties.
- immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns.
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties.
  - CREATE: Grants CREATE on catalogs, schemas, tables, and views. CREATE can only be granted through this property, because it is enforced on new objects that do not yet exist in Starburst or Immuta (such as a new table created with CREATE TABLE).
For example, the following configuration authorizes READ, WRITE, and OWN operations on data sources registered in Immuta and permits all operations on data that is not registered in Immuta:
```
immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
```
Enable the Immuta access control plugin in the Starburst cluster's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/starburst/immuta-access-control.properties
```
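The access values above amount to a lookup from each value to the statements it grants. The following Python sketch shows how the two properties could decide whether a statement is permitted. It collapses the object-type distinctions (tables, views, materialized views) into bare statement keywords, applies the administrator exemption only to registered objects because that is where the text states it, and ignores CREATE for registered objects; all names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Statement keywords granted by each access value (object types simplified away).
ACCESS_VALUE_GRANTS: Dict[str, FrozenSet[str]] = {
    "READ": frozenset({"SELECT", "SHOW"}),
    "WRITE": frozenset({"INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE", "REFRESH"}),
    "OWN": frozenset({"ALTER", "DROP", "SET"}),
    "CREATE": frozenset({"CREATE"}),
}

# CREATE is only honored through the non-Immuta property.
REGISTERED_VALUES = {"READ", "WRITE", "OWN"}


def parse_operations(csv_value: str) -> Set[str]:
    # An empty value yields an empty set (no operations allowed).
    return {v.strip() for v in csv_value.split(",") if v.strip()}


def is_statement_allowed(
    statement: str,
    registered: bool,
    is_admin: bool,
    immuta_ops: str,
    non_immuta_ops: str,
) -> bool:
    if registered:
        # Administrators defined in immuta.user.admin get all permissions.
        if is_admin:
            return True
        values = parse_operations(immuta_ops) & REGISTERED_VALUES
    else:
        values = parse_operations(non_immuta_ops)
    return any(statement in ACCESS_VALUE_GRANTS.get(v, frozenset()) for v in values)


if __name__ == "__main__":
    # Values from the example configuration above.
    immuta_ops, non_immuta_ops = "READ,WRITE,OWN", "READ,WRITE,CREATE,OWN"
    print(is_statement_allowed("INSERT", True, False, immuta_ops, non_immuta_ops))
    print(is_statement_allowed("CREATE", False, False, immuta_ops, non_immuta_ops))
```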

Trino cluster configuration

Create the Immuta access control configuration file in the Trino configuration directory (/etc/trino/immuta-access-control.properties for Docker installations or <trino_install_directory>/etc/immuta-access-control.properties for standalone installations).
Modify one or both properties below to customize the behavior of read or write access policies for all users:
- immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. These permissions apply to all querying users except for administrators defined in immuta.user.admin (who get all permissions).
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns.
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties.
- immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns.
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties.
  - CREATE: Grants CREATE on catalogs, schemas, tables, and views. CREATE can only be granted through this property, because it is enforced on new objects that do not yet exist in Trino or Immuta (such as a new table created with CREATE TABLE).
For example, the following configuration authorizes READ, WRITE, and OWN operations on data sources registered in Immuta and permits all operations on data that is not registered in Immuta:
```
immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
```
Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/trino/immuta-access-control.properties
```
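Before restarting the cluster, it can help to sanity-check the access control file against the property requirements listed earlier. This Python sketch is a hypothetical pre-flight check, not an Immuta tool: it uses a simplified key=value parser (no line continuations or escapes), treats access-control.name=immuta as required because the example configuration sets it, and flags CREATE in the registered-data-source property because only the non-Immuta property honors it.

```python
from typing import Dict, List

REQUIRED_KEYS = ("immuta.endpoint", "immuta.apikey")
ALLOWED_VALUES = {
    "immuta.allowed.immuta.datasource.operations": {"READ", "WRITE", "OWN"},
    "immuta.allowed.non.immuta.datasource.operations": {"READ", "WRITE", "OWN", "CREATE"},
}


def parse_properties(text: str) -> Dict[str, str]:
    # Simplified: key=value lines and '#'/'!' comments only.
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def validate_access_control(props: Dict[str, str]) -> List[str]:
    problems: List[str] = []
    if props.get("access-control.name") != "immuta":
        problems.append("access-control.name should be immuta")
    for key in REQUIRED_KEYS:
        if not props.get(key):
            problems.append(f"missing required property {key}")
    endpoint = props.get("immuta.endpoint", "")
    if endpoint and not endpoint.startswith(("http://", "https://")):
        problems.append("immuta.endpoint must include the protocol, for example https://")
    if not (props.get("immuta.user.admin") or props.get("immuta.group.admin")):
        problems.append("set immuta.user.admin or immuta.group.admin")
    for key, allowed in ALLOWED_VALUES.items():
        if key in props:
            values = {v.strip() for v in props[key].split(",") if v.strip()}
            unknown = sorted(values - allowed)
            if unknown:
                problems.append(f"{key}: unsupported values {', '.join(unknown)}")
    return problems


SAMPLE = """access-control.name=immuta
immuta.endpoint=https://my.immuta.instance.io
immuta.apikey=example-key
immuta.user.admin=immuta_system_account
immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
"""

if __name__ == "__main__":
    print(validate_access_control(parse_properties(SAMPLE)))
```

An empty list means no problems were found; each string otherwise names the property to fix.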