1 of 100

2024.3

Immuta Documentation - 2024.3

Your guide to discovering, securing, and monitoring your data with Immuta.

What is Immuta?

Immuta helps you achieve the following outcomes in your data platform:

How does Immuta do it?

Immuta provides three modules to create a full data security platform suite.

Discover sensitive data from millions of fields without manual effort. With over 60 pre-built and domain-specific identifiers, you can tailor data classification to your unique business needs based on your desired confidence level.

Leverage timely insights into data access and user activity with anomaly indicators for faster analysis and proactive actions.

Immuta’s attribute-based access control (ABAC) delivers scalable data access without role explosion, and dynamic data masking ensures the right users can access the right data.

Self-Managed Deployment

This section illustrates how to install Immuta on Kubernetes using the Immuta Enterprise Helm chart.

This reference guide provides an overview of the Immuta Enterprise Helm chart version requirements and infrastructure recommendations.

The guides in this section illustrate how to install and deploy Immuta in your Kubernetes environment.

The guides in this section illustrate how to upgrade Immuta.

The guides in this section illustrate how to configure your Immuta Enterprise Helm chart for various scenarios, including optimizing your deployment for production environments.

This guide provides links to additional resources for disaster recovery strategies.

This page provides troubleshooting guidance and outlines frequently asked questions for the Immuta installation.

This page introduces the core concepts and terminology essential for understanding the installation material.

Requirements

Immuta comprises three core services: Secure, Discover, and Detect. These services rely on PostgreSQL and Elasticsearch to store their states, a caching layer, and Temporal for job execution. The illustration below shows the relationships among these services.

The Immuta Enterprise Helm chart (IEHC) does not include the deployment of PostgreSQL or Elasticsearch, so you must deploy them separately.

Although Immuta recommends using Elasticsearch because it supports several new Immuta features and services, you can deploy Immuta without Elasticsearch. The table below outlines the Immuta features supported with and without Elasticsearch and the dependencies you must deploy and manage yourself.

Version requirements

Kubernetes versions

Kubernetes 1.29 to 1.32

Metadata database (PostgreSQL)

PostgreSQL incompatibilities

Immuta is not compatible with PostgreSQL abstraction layers, such as Amazon Aurora.

PostgreSQL 15.0 or newer
The pgcrypto, btree_gin extensions must be enabled

Elasticsearch

Elasticsearch v7 API or newer
OpenSearch compatible with Elasticsearch v7 API or newer

OpenSearch user

cluster:monitor/health
indices:data/write/bulk*
indices:data/write/bulk
indices:data/read/search
indices:admin/exists
indices:admin/create
indices:admin/delete
indices:admin/settings/update
indices:admin/get
indices:data/write/delete/byquery
indices:data/write/index
indices:admin/mapping/put
indices:data/write/bulk
indices:data/write/bulk*

Cache (Redis/Memcached)

Built-in cache

The IEHC manages its own Memcached deployment inside the cluster. The key-value cache can optionally be externalized post installation.

Redis 7.0 or newer
Memcached 1.6 or newer

Temporal

Built-in Temporal server

The IEHC deploys a Temporal server and its requisite components. However, you may choose to use your own Temporal instance.

Temporal 1.24.2 or newer

Infrastructure recommendations

Legacy features and services

Some legacy services and features are no longer enabled in the recommended configuration of the IEHC. The table below lists these features and provides links to documentation that outlines how to enable them in Immuta.

Install

The guides in this section illustrate how to install and deploy Immuta in your Kubernetes environment.

Prerequisites

Helm installation

Checklist

Quickstart

Encountering issues?

Complete the guide that corresponds with your Kubernetes cluster's distribution.
- - Amazon Elastic Kubernetes Service (EKS)
  - Google Kubernetes Engine (GKE)
  - Microsoft Azure Kubernetes Service (AKS)

Managed Public Cloud

This is a guide on how to deploy Immuta on Kubernetes in the following managed public cloud providers:

Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)

Prerequisites

Feature availability

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Credentials

PostgreSQL

You have a PostgreSQL user account with the necessary privileges to create databases, extensions, and roles.

Elasticsearch

Setup

Helm

Authenticate with OCI registry

echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create namespace

Create a Kubernetes namespace named immuta.
```
kubectl create namespace immuta
```
Switch to namespace immuta. All subsequent kubectl commands will default to this namespace.
```
kubectl config set-context --current --namespace=immuta
```

Create registry secret

kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com

PostgreSQL

Connecting a client

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect with psql by creating an ephemeral Kubernetes pod.

Connect to the database

Connect to the database as an admin (e.g., postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the kubectl run command outlined below. Wait 5 seconds, and then proceed by entering a password.

kubectl run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username <postgres-user> --port 5432 --password

Create role

Temporal's upgrade mechanism utilizes SQL command CREATE EXTENSION when managing database schema changes. However, in cloud-managed PostgreSQL offerings, this command is typically restricted to roles with elevated privileges to protect the database and maintain the stability of the cloud environment.

To ensure Temporal can successfully manage its schema, an administrator role must be granted temporarily. The role name varies depending on the cloud-managed service:

Amazon RDS: rds_superuser
Azure Database: azure_pg_admin
Google Cloud SQL: cloudsqlsuperuser

Create the immuta role.

CREATE ROLE immuta with LOGIN ENCRYPTED PASSWORD '<postgres-password>';
ALTER ROLE immuta SET search_path TO bometadata,public;

Grant administrator privileges to the immuta role. Upon successfully completing this installation guide, you can optionally revoke this role grant.
```
GRANT <admin-role> TO immuta;
```
Grant the immuta role to the current user. Upon successfully completing this installation guide, you can optionally revoke this role grant.
```
GRANT immuta TO CURRENT_USER;
```

Create databases

Create databases.

CREATE DATABASE immuta OWNER immuta;
CREATE DATABASE temporal OWNER immuta;
CREATE DATABASE temporal_visibility OWNER immuta;

GRANT ALL ON DATABASE immuta TO immuta;
GRANT ALL ON DATABASE temporal TO immuta;
GRANT ALL ON DATABASE temporal_visibility TO immuta;

Configure the immuta database.
```
\c immuta
CREATE EXTENSION pgcrypto;
```

Configure the temporal database.

\c temporal
GRANT CREATE ON SCHEMA public TO immuta;

Configure the temporal_visibility database.

\c temporal_visibility
GRANT CREATE ON SCHEMA public TO immuta;
CREATE EXTENSION btree_gin;

Exit the interactive prompt. Type \q, and then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Feature availability

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  config:
    elasticsearchEndpoint: <elasticsearch-endpoint>
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>
  postgresql:
    database: immuta

secure:
  postgresql:
    database: immuta
    ssl: true

temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls: 
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  featureFlags:
    AuditService: false
    detect: false
    auditLegacyViewHide: false
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  enabled: false

secure:
  postgresql:
    database: immuta
    ssl: true
    
temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls: 
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true

Deploy Immuta.

helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.3.8

Wait for all pods to become ready.

kubectl wait --for=condition=Ready pods --all

Validation

Determine the name of the Secure service.

kubectl get service --selector "app.kubernetes.io/component=secure" --output name

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward <service-name> 8080:http
```
Press Control+C to stop port forwarding.

Next steps

Red Hat OpenShift

This is a guide on how to deploy Immuta on OpenShift.

Prerequisites

Feature availability

PostgreSQL
(Optional) Elasticsearch/OpenSearch

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Credentials

PostgreSQL

You have a PostgreSQL user account with the necessary privileges to create databases, extensions, and roles.

Elasticsearch

Setup

Helm

Authenticate with OCI registry

echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create project

Create an OpenShift project named immuta.
```
oc new-project immuta
```
Get the UID range allocated to the project. Each running container's UID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
```
Get the GID range allocated to the project. Each running container's GID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
```
Switch to project immuta.
```
oc project immuta
```

Create registry secret

oc create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com

PostgreSQL

Connecting a client

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect with psql by creating an ephemeral Kubernetes pod.

Connect to the database

oc run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username <postgres-user> --port 5432 --password

Create Role

To ensure Temporal can successfully manage its schema, a pre-defined administrator role must be granted. The role name varies depending on the cloud-managed service:

Amazon RDS: rds_superuser
Azure Database: azure_pg_admin
Google Cloud SQL: cloudsqlsuperuser

Create the immuta role.

CREATE ROLE immuta with LOGIN ENCRYPTED PASSWORD '<postgres-password>';
ALTER ROLE immuta SET search_path TO bometadata,public;

Grant administrator privileges to the immuta role. Upon successfully completing this installation guide, you can optionally revoke this role grant.
```
GRANT <admin-role> TO immuta;
```

Create databases

Create databases.

CREATE DATABASE immuta OWNER immuta;
CREATE DATABASE temporal OWNER immuta;
CREATE DATABASE temporal_visibility OWNER immuta;

GRANT ALL ON DATABASE immuta TO immuta;
GRANT ALL ON DATABASE temporal TO immuta;
GRANT ALL ON DATABASE temporal_visibility TO immuta;

Configure the immuta database.
```
\c immuta
CREATE EXTENSION pgcrypto;
```

Configure the temporal database.

\c temporal
GRANT CREATE ON SCHEMA public TO immuta;

Configure the temporal_visibility database.

\c temporal_visibility
GRANT CREATE ON SCHEMA public TO immuta;
CREATE EXTENSION btree_gin;

Exit the interactive prompt. Type \q, and then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Why disable Ingress?

In OpenShift, Ingress resources are managed by OpenShift Routes. These routes provide a more integrated and streamlined way to handle external access to your applications. To avoid conflicts and ensure proper functionality, it's necessary to disable the pre-defined Ingress resource in the Helm chart.

Feature availability

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  config:
    elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>
  postgresql:
    database: immuta 

  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  worker:
    podSecurityContext:  
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL 

discover:
  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

secure:
  ingress:
    enabled: false

  postgresql:
    database: immuta
    ssl: false

  web:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
  
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  backgroundWorker:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  
temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls:
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true
    frontend:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    history:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    matching:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    worker:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
  schema:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  proxy:
    deployment:
      podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
      containerSecurityContext:
        enabled: true
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  featureFlags:
    AuditService: false
    detect: false
    auditLegacyViewHide: false
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  enabled: false

discover:
  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

secure:
  ingress:
    enabled: false

  postgresql:
    database: immuta
    ssl: true

  web:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  backgroundWorker:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls:
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true
    frontend:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    history:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    matching:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    worker:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
  schema:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  proxy:
    deployment:
      podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
      containerSecurityContext:
        enabled: true
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL

Deploy Immuta.

helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.3.8

Wait for all pods to become ready.

oc wait --for=condition=Ready pods --all

Validation

Determine the name of the Secure service.

oc get service --selector "app.kubernetes.io/component=secure" --output name

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
oc port-forward <service-name> 8080:http
```
Press Control+C to stop port forwarding.

Next steps

SUSE Rancher Kubernetes Engine (RKE2)

This is a generic guide that demonstrates how to deploy Immuta into RKE2 (i.e., Rancher Government) without dependencies on any particular cloud provider. Advanced Kubernetes expertise is required; therefore, it is not suitable for beginners.

Considerations

Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged due to the following reasons.

Operational overhead: Managing PostgreSQL and Elasticsearch on Kubernetes requires expertise in deploying, maintaining, and scaling these databases and search engines effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.
Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL and Elasticsearch have sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.
Data integrity and high availability: PostgreSQL and Elasticsearch deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances and search indexes requires careful consideration and often additional tooling.
Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.
Observability: Troubleshooting issues in a Kubernetes environment, especially related to database and search engine performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.
Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

Prerequisite

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Setup

Helm

Authenticate with OCI registry

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create namespace

Create a Kubernetes namespace named immuta.
Switch to namespace immuta. All subsequent kubectl commands will default to this namespace.

Create registry secret

Elasticsearch

Install Helm chart

Create a Helm values file named es-values.yaml with the following content:

Deploy Elasticsearch.
Wait for all Elasticsearch pods to become ready.

PostgreSQL

Install Helm chart

Create a Helm values file named pg-values.yaml.

Deploy PostgreSQL.
Wait for all PostgreSQL pods to become ready.

Configure databases

Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.
Exec into the PostgreSQL database pod using psql.
Configure the immuta database.
Configure the temporal database.
Configure the temporal_visibility database.
Exit the interactive prompt. Type \q, then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Deploy Immuta.
Wait for all pods to become ready.

Validation

Determine the name of the Secure service.
Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
Press Control+C to stop port forwarding.

Next steps

K3s

This is a generic guide that demonstrates how to deploy Immuta into K3s without dependencies on any particular cloud provider. Advanced Kubernetes expertise is required; therefore, it is not suitable for beginners.

Considerations

Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged due to the following reasons.

Operational overhead: Managing PostgreSQL and Elasticsearch on Kubernetes requires expertise in deploying, maintaining, and scaling these databases and search engines effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.
Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL and Elasticsearch have sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.
Data integrity and high availability: PostgreSQL and Elasticsearch deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances and search indexes requires careful consideration and often additional tooling.
Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.
Observability: Troubleshooting issues in a Kubernetes environment, especially related to database and search engine performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.
Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

Prerequisites

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Credentials

Setup

Helm

Authenticate with OCI registry

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create namespace

Create a Kubernetes namespace named immuta.
Switch to namespace immuta. All subsequent kubectl commands will default to this namespace.

Create registry secret

Elasticsearch

Install Helm chart

Create a Helm values file named es-values.yaml with the following content:

Deploy Elasticsearch.
Wait for all Elasticsearch pods to become ready.

PostgreSQL

Install Helm chart

Create a Helm values file named pg-values.yaml.

Deploy PostgreSQL.
Wait for all PostgreSQL pods to become ready.

Configure databases

Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.
Exec into the PostgreSQL database pod using psql.
Configure the immuta database.
Configure the temporal database.
Configure the temporal_visibility database.
Exit the interactive prompt. Type \q, then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Deploy Immuta.
Wait for all pods to become ready.

Validation

Determine the name of the Secure service.
Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
Press Control+C to stop port forwarding.

Next steps

Upgrade

Introduced in 2024.2, the Immuta Enterprise Helm chart (IEHC) is an entirely new Helm chart used to deploy Immuta. Unlike the previous Immuta Helm chart (IHC), the IEHC shares the same version as the Immuta product. Each version of the chart supports a singular version of Immuta. Upgrading the Immuta version now entails upgrading the underlying Helm chart. Failure to do so will lead to an unsupported configuration.

Helm chart deprecation notice

As of Immuta version 2024.2, the IHC has been deprecated in favor of the IEHC. The immuta-values.yaml Helm values files are not cross-compatible.

Legacy

Migrating to the New Helm Chart

This guide demonstrates how to upgrade an existing Immuta deployment installed with the older Immuta Helm chart (IHC) to v2024.2 LTS using the Immuta Enterprise Helm chart (IEHC).

Helm chart deprecation notice

As of Immuta version 2024.2, the IHC has been deprecated in favor of the IEHC. Their respective immuta-values.yaml Helm values files are not compatible.

Prerequisites

Create a PostgreSQL database

The PostgreSQL instance has been provisioned and is actively running.

For additional information, consult the Deployment requirements.

Validate the Helm release

Fetch the metadata for the Helm release associated with Immuta.
```
helm get metadata --output yaml <helm-release-name>
```
Review the output from the previous step and verify the following:
- The Immuta version (appVersion) is
  - The last LTS (2022.5.x) or 2024.1 or newer
  - Less than 2024.2
- The Immuta Helm chart (version) is greater than or equal to 4.13.5
- The Immuta Helm chart name (chart) is immuta
If any of the criteria is not met, it's first necessary to perform a Helm upgrade using the IHC. Contact your Immuta representative for guidance.

Metadata database

The new IEHC no longer supports deploying a Metadata database (PostgreSQL) inside the Kubernetes cluster. Before transitioning to the new IEHC, it's first necessary to externalize the Metadata database.

Built-in

The following demonstrates how to take a database backup and import the data into each cloud provider's managed PostgreSQL service.

Create backup of old database

Get the metadata database pod name.

kubectl get pod --selector "app.kubernetes.io/component=database" --output name

Spawn a shell inside the running metadata database pod.

kubectl exec --stdin --tty <metadata-database-pod-name> -- sh

Perform a database backup.

pg_dump --dbname=bometadata --file=/tmp/bometadata.dump --format=custom --no-owner --no-privileges

Type exit, and then press Enter to exit the shell prompt.
Copy file bometadata.dump from the pod to the host's working directory.
```
kubectl cp <metadata-database-pod-name>:/tmp/bometadata.dump .
```

Setup new database

Create a pod named immuta-setup-db and spawn a shell.

kubectl run immuta-setup-db --stdin --tty --rm --image docker.io/bitnami/postgresql:latest -- sh

Connect to the new PostgreSQL database as a superuser. Depending on the cloud provider, the default superuser name (postgres) might differ.
```
psql --host <postgres-fqdn> --username postgres --port 5432 --password
```

Create immuta, temporal, and temporal_visiblity databases and an immuta role.

CREATE ROLE immuta with login encrypted password '<postgres-password>';
GRANT immuta TO CURRENT_USER;

CREATE DATABASE immuta OWNER immuta;
CREATE DATABASE temporal OWNER immuta;
CREATE DATABASE temporal_visibility OWNER immuta;

GRANT all ON DATABASE immuta TO immuta;
GRANT all ON DATABASE temporal TO immuta;
GRANT all ON DATABASE temporal_visibility TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
REVOKE immuta FROM CURRENT_USER;

Type \q, and then press Enter to exit the psql prompt.

Authenticate as the immuta user and create the pgcrypto extension.

psql --host <postgres-fqdn> --username immuta --port 5432 --password
CREATE EXTENSION pgcrypto;

Type \q, and then press Enter to exit the psql prompt.

Restore backup to new database

Create a pod named immuta-restore-db and spawn a shell.

kubectl run immuta-restore-db --image docker.io/bitnami/postgresql:latest -- sleep infinity

Copy file bometadata.dump from the host's working directory to pod immuta-restore-db.
```
kubectl cp bometadata.dump immuta-restore-db:/tmp
```

Spawn a shell inside pod immuta-restore-db.

kubectl exec immuta-restore-db --stdin --tty -- sh

Perform a database restore while authenticated as role immuta. Refer to the value substituted for <postgres-password> when prompted to enter a password.
```
pg_restore --host=<postgres-fqdn> --port=5432 --username=immuta --password --dbname=immuta --no-owner --role=immuta < /tmp/bometadata.dump
```
Type exit, and then press Enter to exit the shell prompt.
Delete pod immuta-restore-db that was previously created.
```
kubectl delete pod/immuta-restore-db
```

External

No additional work is required. The existing database can be reused with the new IEHC.

Helm values

Helm values file compatibility

The immuta-values.yaml Helm values file used by the IHC is not compatible with the new IEHC.

Rename the existing immuta-values.yaml Helm values file used by the IHC.
```
mv immuta-values.yaml immuta-values.ihc.yaml
```

How-to Guides

Integration Settings

Reference Guides

How-to Guides

Reference Guides

Configuration Settings

Databricks Spark Cluster Policies

How-to Guides

Reference Guides

Managed Public Cloud

This is a guide on how to deploy Immuta on Kubernetes in the following managed public cloud providers:

Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)

Prerequisites

The following managed services must be provisioned and running before proceeding. For further assistance consult the for your respective cloud provider.

Feature availability

If deployed without ElasticSearch/OpenSearch, several core services and features will be unavailable. See the for details.

(Optional)

(Optional)

(Optional)

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Credentials

You have the credentials needed to access the ocir.immuta.com OCI registry. These can be viewed in your user profile at .

PostgreSQL

The PostgreSQL instance's hostname/FQDN is .
The PostgreSQL instance is .
You have a PostgreSQL user account with the necessary privileges to create databases, extensions, and roles.

Elasticsearch

The Elasticsearch instance's hostname/FQDN is .
The Elasticsearch instance is .
The user must have the .

Setup

Helm

Authenticate with OCI registry

echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create namespace

Create a Kubernetes namespace named immuta.
```
kubectl create namespace immuta
```
Switch to namespace immuta. All subsequent kubectl commands will default to this namespace.
```
kubectl config set-context --current --namespace=immuta
```

Create registry secret

Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at .

kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com

PostgreSQL

Connecting a client

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect with psql by creating an ephemeral Kubernetes pod.

Connect to the database

kubectl run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username <postgres-user> --port 5432 --password

Create role

To ensure Temporal can successfully manage its schema, an administrator role must be granted temporarily. The role name varies depending on the cloud-managed service:

Amazon RDS: rds_superuser
Azure Database: azure_pg_admin
Google Cloud SQL: cloudsqlsuperuser

Create the immuta role.

CREATE ROLE immuta with LOGIN ENCRYPTED PASSWORD '<postgres-password>';
ALTER ROLE immuta SET search_path TO bometadata,public;

Grant administrator privileges to the immuta role. Upon successfully completing this installation guide, you can optionally revoke this role grant.
```
GRANT <admin-role> TO immuta;
```
Grant the immuta role to the current user. Upon successfully completing this installation guide, you can optionally revoke this role grant.
```
GRANT immuta TO CURRENT_USER;
```

Create databases

Create databases.

CREATE DATABASE immuta OWNER immuta;
CREATE DATABASE temporal OWNER immuta;
CREATE DATABASE temporal_visibility OWNER immuta;

Grant role immuta additional privileges. Refer to the for further details on database roles and privileges.

GRANT ALL ON DATABASE immuta TO immuta;
GRANT ALL ON DATABASE temporal TO immuta;
GRANT ALL ON DATABASE temporal_visibility TO immuta;

Configure the immuta database.
```
\c immuta
CREATE EXTENSION pgcrypto;
```

Configure the temporal database.

\c temporal
GRANT CREATE ON SCHEMA public TO immuta;

Configure the temporal_visibility database.

\c temporal_visibility
GRANT CREATE ON SCHEMA public TO immuta;
CREATE EXTENSION btree_gin;

Exit the interactive prompt. Type \q, and then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Feature availability

If deployed without Elasticsearch/OpenSearch, several core services and features will be unavailable. See the for details.

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  config:
    elasticsearchEndpoint: <elasticsearch-endpoint>
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>
  postgresql:
    database: immuta

secure:
  postgresql:
    database: immuta
    ssl: true

temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls: 
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  featureFlags:
    AuditService: false
    detect: false
    auditLegacyViewHide: false
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  enabled: false

secure:
  postgresql:
    database: immuta
    ssl: true
    
temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls: 
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true

Create a file named immuta-values.yaml with the above content, making sure to update all .

Deploy Immuta.

helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.3.8

Wait for all pods to become ready.

kubectl wait --for=condition=Ready pods --all

Validation

This section helps you validate your Immuta installation by temporarily accessing the application locally. However, this access is limited to your own computer. To enable access for other devices, you must proceed with configuring Ingress outlined in the section.

Determine the name of the Secure service.

kubectl get service --selector "app.kubernetes.io/component=secure" --output name

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward <service-name> 8080:http
```
In a web browser, navigate to , to ensure the Immuta application loads.
Press Control+C to stop port forwarding.

Next steps

(required).
.
.

(required).
.
.

(required).
.
.

Red Hat OpenShift

This is a guide on how to deploy Immuta on OpenShift.

Prerequisites

The following managed services must be provisioned and running before proceeding. For further assistance consult the for your respective cloud provider.

Feature availability

If deployed without Elasticsearch/OpenSearch, several core services and features will be unavailable. See the for details.

PostgreSQL
(Optional) Elasticsearch/OpenSearch

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Credentials

You have the credentials needed to access the ocir.immuta.com OCI registry. These can be viewed in your user profile at .

PostgreSQL

The PostgreSQL instance's hostname/FQDN is .
The PostgreSQL instance is .
You have a PostgreSQL user account with the necessary privileges to create databases, extensions, and roles.

Elasticsearch

The Elasticsearch instance's hostname/FQDN is .
The Elasticsearch instance is .
The user must have the .

Setup

Helm

Authenticate with OCI registry

echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create project

Create an OpenShift project named immuta.
```
oc new-project immuta
```
Get the UID range allocated to the project. Each running container's UID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
```
Get the GID range allocated to the project. Each running container's GID must fall within this range. This value will be referenced later on.
```
oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
```
Switch to project immuta.
```
oc project immuta
```

Create registry secret

Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at .

oc create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com

PostgreSQL

Connecting a client

There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect with psql by creating an ephemeral Kubernetes pod.

Connect to the database

oc run pgclient \
    --stdin \
    --tty \
    --rm \
    --image docker.io/bitnami/postgresql -- \
    psql --host <postgres-fqdn> --username <postgres-user> --port 5432 --password

Create Role

To ensure Temporal can successfully manage its schema, a pre-defined administrator role must be granted. The role name varies depending on the cloud-managed service:

Amazon RDS: rds_superuser
Azure Database: azure_pg_admin
Google Cloud SQL: cloudsqlsuperuser

Create the immuta role.

CREATE ROLE immuta with LOGIN ENCRYPTED PASSWORD '<postgres-password>';
ALTER ROLE immuta SET search_path TO bometadata,public;

Grant administrator privileges to the immuta role. Upon successfully completing this installation guide, you can optionally revoke this role grant.
```
GRANT <admin-role> TO immuta;
```

Create databases

Create databases.

CREATE DATABASE immuta OWNER immuta;
CREATE DATABASE temporal OWNER immuta;
CREATE DATABASE temporal_visibility OWNER immuta;

Grant role immuta additional privileges. Refer to the for further details on database roles and privileges.

GRANT ALL ON DATABASE immuta TO immuta;
GRANT ALL ON DATABASE temporal TO immuta;
GRANT ALL ON DATABASE temporal_visibility TO immuta;

Configure the immuta database.
```
\c immuta
CREATE EXTENSION pgcrypto;
```

Configure the temporal database.

\c temporal
GRANT CREATE ON SCHEMA public TO immuta;

Configure the temporal_visibility database.

\c temporal_visibility
GRANT CREATE ON SCHEMA public TO immuta;
CREATE EXTENSION btree_gin;

Exit the interactive prompt. Type \q, and then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

Why disable Ingress?

Feature availability

If deployed without Elasticsearch/OpenSearch, several core services and features will be unavailable. See the for details.

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  config:
    elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>
  postgresql:
    database: immuta 

  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  worker:
    podSecurityContext:  
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL 

discover:
  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

secure:
  ingress:
    enabled: false

  postgresql:
    database: immuta
    ssl: false

  web:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
  
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  backgroundWorker:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault
      
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  
temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls:
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true
    frontend:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    history:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    matching:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    worker:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
  schema:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  proxy:
    deployment:
      podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
      containerSecurityContext:
        enabled: true
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  featureFlags:
    AuditService: false
    detect: false
    auditLegacyViewHide: false
  postgresql:
    host: <postgres-fqdn>
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  enabled: false

discover:
  deployment:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

secure:
  ingress:
    enabled: false

  postgresql:
    database: immuta
    ssl: true

  web:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  backgroundWorker:
    podSecurityContext:
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
      runAsUser: <user-id>
      # A number that is within the project range:
      #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
      runAsGroup: <group-id>
      seccompProfile:
        type: RuntimeDefault

    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    config:
      persistence:
        default:
          sql:
            database: temporal
            tls:
              enabled: true
        visibility:
          sql:
            database: temporal_visibility
            tls:
              enabled: true
    frontend:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    history:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    matching:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
    worker:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
  schema:
    podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
  proxy:
    deployment:
      podSecurityContext:
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
        runAsUser: <user-id>
        # A number that is within the project range:
        #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
        runAsGroup: <group-id>
        seccompProfile:
          type: RuntimeDefault
      containerSecurityContext:
        enabled: true
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL

Create a file named immuta-values.yaml with the above content, making sure to update all .

Deploy Immuta.

helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.3.8

Wait for all pods to become ready.

oc wait --for=condition=Ready pods --all

Validation

Determine the name of the Secure service.

oc get service --selector "app.kubernetes.io/component=secure" --output name

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
oc port-forward <service-name> 8080:http
```
In a web browser, navigate to , to ensure the Immuta application loads.
Press Control+C to stop port forwarding.

Next steps

Migrating to the New Helm Chart

This guide demonstrates how to upgrade an existing Immuta deployment installed with the older Immuta Helm chart (IHC) to v2024.2 LTS using the Immuta Enterprise Helm chart (IEHC).

Helm chart deprecation notice

As of Immuta version 2024.2, the IHC has been deprecated in favor of the IEHC. Their respective immuta-values.yaml Helm values files are not compatible.

Prerequisites

Create a PostgreSQL database

The PostgreSQL instance has been provisioned and is actively running.
The PostgreSQL instance's hostname/FQDN is .
The PostgreSQL instance is .

For additional information, consult the Deployment requirements.

Validate the Helm release

Fetch the metadata for the Helm release associated with Immuta.
```
helm get metadata --output yaml <helm-release-name>
```
Review the output from the previous step and verify the following:
- The Immuta version (appVersion) is
  - The last LTS (2022.5.x) or 2024.1 or newer
  - Less than 2024.2
- The Immuta Helm chart (version) is greater than or equal to 4.13.5
- The Immuta Helm chart name (chart) is immuta
If any of the criteria is not met, it's first necessary to perform a Helm upgrade using the IHC. Contact your Immuta representative for guidance.

Metadata database

Built-in

The following demonstrates how to take a database backup and import the data into each cloud provider's managed PostgreSQL service.

Create backup of old database

Get the metadata database pod name.

kubectl get pod --selector "app.kubernetes.io/component=database" --output name

Spawn a shell inside the running metadata database pod.

kubectl exec --stdin --tty <metadata-database-pod-name> -- sh

Perform a database backup.

pg_dump --dbname=bometadata --file=/tmp/bometadata.dump --format=custom --no-owner --no-privileges

Type exit, and then press Enter to exit the shell prompt.
Copy file bometadata.dump from the pod to the host's working directory.
```
kubectl cp <metadata-database-pod-name>:/tmp/bometadata.dump .
```

Setup new database

Create a pod named immuta-setup-db and spawn a shell.

kubectl run immuta-setup-db --stdin --tty --rm --image docker.io/bitnami/postgresql:latest -- sh

Connect to the new PostgreSQL database as a superuser. Depending on the cloud provider, the default superuser name (postgres) might differ.
```
psql --host <postgres-fqdn> --username postgres --port 5432 --password
```

Create immuta, temporal, and temporal_visiblity databases and an immuta role.

CREATE ROLE immuta with login encrypted password '<postgres-password>';
GRANT immuta TO CURRENT_USER;

CREATE DATABASE immuta OWNER immuta;
CREATE DATABASE temporal OWNER immuta;
CREATE DATABASE temporal_visibility OWNER immuta;

GRANT all ON DATABASE immuta TO immuta;
GRANT all ON DATABASE temporal TO immuta;
GRANT all ON DATABASE temporal_visibility TO immuta;
ALTER ROLE immuta SET search_path TO bometadata,public;
REVOKE immuta FROM CURRENT_USER;

Type \q, and then press Enter to exit the psql prompt.

Authenticate as the immuta user and create the pgcrypto extension.

psql --host <postgres-fqdn> --username immuta --port 5432 --password
CREATE EXTENSION pgcrypto;

Type \q, and then press Enter to exit the psql prompt.

Restore backup to new database

Create a pod named immuta-restore-db and spawn a shell.

kubectl run immuta-restore-db --image docker.io/bitnami/postgresql:latest -- sleep infinity

Copy file bometadata.dump from the host's working directory to pod immuta-restore-db.
```
kubectl cp bometadata.dump immuta-restore-db:/tmp
```

Spawn a shell inside pod immuta-restore-db.

kubectl exec immuta-restore-db --stdin --tty -- sh

Perform a database restore while authenticated as role immuta. Refer to the value substituted for <postgres-password> when prompted to enter a password.
```
pg_restore --host=<postgres-fqdn> --port=5432 --username=immuta --password --dbname=immuta --no-owner --role=immuta < /tmp/bometadata.dump
```
Type exit, and then press Enter to exit the shell prompt.
Delete pod immuta-restore-db that was previously created.
```
kubectl delete pod/immuta-restore-db
```

External

No additional work is required. The existing database can be reused with the new IEHC.

Helm values

Helm values file compatibility

The immuta-values.yaml Helm values file used by the IHC is not compatible with the new IEHC.

Rename the existing immuta-values.yaml Helm values file used by the IHC.
```
mv immuta-values.yaml immuta-values.ihc.yaml
```
Follow the for your Kubernetes distribution of choice.

SUSE Rancher Kubernetes Engine (RKE2)

Considerations

For the purposes of this guide, the following state stores are deployed in Kubernetes using third-party Helm charts maintained by :

Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged due to the following reasons.

Operational overhead: Managing PostgreSQL and Elasticsearch on Kubernetes requires expertise in deploying, maintaining, and scaling these databases and search engines effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.
Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL and Elasticsearch have sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.
Data integrity and high availability: PostgreSQL and Elasticsearch deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances and search indexes requires careful consideration and often additional tooling.
Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.
Observability: Troubleshooting issues in a Kubernetes environment, especially related to database and search engine performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.
Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

Prerequisite

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

You have the credentials needed to access the ocir.immuta.com OCI registry. These can be viewed in your user profile at .

Setup

Helm

Authenticate with OCI registry

echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create namespace

Create a Kubernetes namespace named immuta.
```
kubectl create namespace immuta
```
Switch to namespace immuta. All subsequent kubectl commands will default to this namespace.
```
kubectl config set-context --current --namespace=immuta
```

Create registry secret

Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at .

kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com

Elasticsearch

Install Helm chart

Create a Helm values file named es-values.yaml with the following content:

es-values.yaml

master:
    masterOnly: false
    replicaCount: 1

data:
    replicaCount: 0

coordinating:
    replicaCount: 0

ingest:
    replicaCount: 0

Deploy Elasticsearch.

helm install es-db oci://registry-1.docker.io/bitnamicharts/elasticsearch \
    --values es-values.yaml

Wait for all Elasticsearch pods to become ready.

kubectl wait --for=condition=Ready pods --selector "app.kubernetes.io/name=elasticsearch"

PostgreSQL

Install Helm chart

Create a Helm values file named pg-values.yaml.

pg-values.yaml

primary:
  # The default 'nano' preset is insufficient and will lead to Out Of Memory (OOM) 
  # errors.
  #
  # Possible values: nano, micro, small, medium, large, xlarge, 2xlarge
  resourcesPreset: medium
  initdb:
    scripts:
      00_setup.sql: |
        CREATE ROLE immuta with LOGIN ENCRYPTED PASSWORD '<postgres-password>';
        ALTER ROLE immuta SET search_path TO bometadata,public;

        CREATE DATABASE temporal WITH OWNER immuta;
        CREATE DATABASE temporal_visibility WITH OWNER immuta;
        CREATE DATABASE immuta WITH OWNER immuta;

        GRANT ALL ON DATABASE temporal TO immuta;
        GRANT ALL ON DATABASE temporal_visibility TO immuta;
        GRANT ALL ON DATABASE immuta TO immuta;

Update all in the pg-values.yaml file.

Deploy PostgreSQL.

helm install pg-db oci://registry-1.docker.io/bitnamicharts/postgresql \
    --values pg-values.yaml

Wait for all PostgreSQL pods to become ready.

kubectl wait --for=condition=Ready pods --selector "app.kubernetes.io/name=postgresql"

Configure databases

Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.
```
kubectl get pod --selector "app.kubernetes.io/name=postgresql" --output name
```

Exec into the PostgreSQL database pod using psql.

kubectl exec --stdin --tty <database-pod> -- psql -U immuta --password

Configure the immuta database.
```
\c immuta
CREATE EXTENSION pgcrypto;
```

Configure the temporal database.

\c temporal
GRANT CREATE ON SCHEMA public TO immuta;

Configure the temporal_visibility database.

\c temporal_visibility
GRANT CREATE ON SCHEMA public TO immuta;
CREATE EXTENSION btree_gin;

Exit the interactive prompt. Type \q, then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  postgresql:
    # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
    # The anatomy of a domain name is as follows:
    #   <service>.<namespace>.svc.<cluster-domain>
    #
    # Where the default cluster domain is: cluster.local
    host: pg-db-postgresql.immuta.svc.cluster.local
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  config:
    elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>
  postgresql:
    database: immuta

secure:
  postgresql:
    database: immuta
  :
    hostname: <immuta-fqdn>
    ingressClassName: nginx
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: '64m'
    
temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    config:
      persistence:
        default:
          sql:
            database: temporal
        visibility:
          sql:
            database: temporal_visibility

Create a file named immuta-values.yaml with the above content, making sure to update all .

Deploy Immuta.

helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.3.8

Wait for all pods to become ready.

kubectl wait --for=condition=Ready pods --all

Validation

Determine the name of the Secure service.

kubectl get service --selector "app.kubernetes.io/component=secure" --output name

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward <service-name> 8080:http
```
In a web browser, navigate to , to ensure the Immuta application loads.
Press Control+C to stop port forwarding.

Next steps

K3s

Considerations

For the purposes of this guide, the following state stores are deployed in Kubernetes using third-party Helm charts maintained by :

Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged due to the following reasons.

Operational overhead: Managing PostgreSQL and Elasticsearch on Kubernetes requires expertise in deploying, maintaining, and scaling these databases and search engines effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.
Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL and Elasticsearch have sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.
Data integrity and high availability: PostgreSQL and Elasticsearch deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances and search indexes requires careful consideration and often additional tooling.
Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.
Observability: Troubleshooting issues in a Kubernetes environment, especially related to database and search engine performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.
Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

Prerequisites

Checklist

This checklist outlines the necessary prerequisites for successfully deploying Immuta.

Credentials

You have the credentials needed to access the ocir.immuta.com OCI registry. These can be viewed in your user profile at .

Setup

Helm

Authenticate with OCI registry

echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

Kubernetes

Creating a dedicated namespace ensures a logically isolated environment for your Immuta deployment, preventing resource conflicts with other applications.

Create namespace

Create a Kubernetes namespace named immuta.
```
kubectl create namespace immuta
```
Switch to namespace immuta. All subsequent kubectl commands will default to this namespace.
```
kubectl config set-context --current --namespace=immuta
```

Create registry secret

Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at .

kubectl create secret docker-registry immuta-oci-registry \
    --docker-server=https://ocir.immuta.com \
    --docker-username="<username>" \
    --docker-password="<token>" \
    --docker-email=support@immuta.com

Elasticsearch

Install Helm chart

Create a Helm values file named es-values.yaml with the following content:

es-values.yaml

master:
    masterOnly: false
    replicaCount: 1

data:
    replicaCount: 0

coordinating:
    replicaCount: 0

ingest:
    replicaCount: 0

Deploy Elasticsearch.

helm install es-db oci://registry-1.docker.io/bitnamicharts/elasticsearch \
    --values es-values.yaml

Wait for all Elasticsearch pods to become ready.

kubectl wait --for=condition=Ready pods --selector "app.kubernetes.io/name=elasticsearch"

PostgreSQL

Install Helm chart

Create a Helm values file named pg-values.yaml.

pg-values.yaml

primary:
  # The default 'nano' preset is insufficient and will lead to Out Of Memory (OOM) 
  # errors.
  #
  # Possible values: nano, micro, small, medium, large, xlarge, 2xlarge
  resourcesPreset: medium
  initdb:
    scripts:
      00_setup.sql: |
        CREATE ROLE immuta with LOGIN ENCRYPTED PASSWORD '<postgres-password>';
        ALTER ROLE immuta SET search_path TO bometadata,public;

        CREATE DATABASE temporal WITH OWNER immuta;
        CREATE DATABASE temporal_visibility WITH OWNER immuta;
        CREATE DATABASE immuta WITH OWNER immuta;

        GRANT ALL ON DATABASE temporal TO immuta;
        GRANT ALL ON DATABASE temporal_visibility TO immuta;
        GRANT ALL ON DATABASE immuta TO immuta;

Update all in the pg-values.yaml file.

Deploy PostgreSQL.

helm install pg-db oci://registry-1.docker.io/bitnamicharts/postgresql \
    --values pg-values.yaml

Wait for all PostgreSQL pods to become ready.

kubectl wait --for=condition=Ready pods --selector "app.kubernetes.io/name=postgresql"

Configure databases

Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.
```
kubectl get pod --selector "app.kubernetes.io/name=postgresql" --output name
```

Exec into the PostgreSQL database pod using psql.

kubectl exec --stdin --tty <database-pod> -- psql -U immuta --password

Configure the immuta database.
```
\c immuta
CREATE EXTENSION pgcrypto;
```

Configure the temporal database.

\c temporal
GRANT CREATE ON SCHEMA public TO immuta;

Configure the temporal_visibility database.

\c temporal_visibility
GRANT CREATE ON SCHEMA public TO immuta;
CREATE EXTENSION btree_gin;

Exit the interactive prompt. Type \q, then press Enter.

Install Immuta

This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

immuta-values.yaml

global:
  imageRegistry: ocir.immuta.com
  imagePullSecrets:
    - name: immuta-oci-registry
  postgresql:
    # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
    # The anatomy of a domain name is as follows:
    #   <service>.<namespace>.svc.<cluster-domain>
    #
    # Where the default cluster domain is: cluster.local
    host: pg-db-postgresql.immuta.svc.cluster.local
    port: 5432
    username: immuta
    password: <postgres-password>

audit:
  config:
    elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
    elasticsearchUsername: <elasticsearch-username>
    elasticsearchPassword: <elasticsearch-password>
  postgresql:
    database: immuta

secure:
  postgresql:
    database: immuta
  :
    hostname: <your-hostname>
    annotations:
      traefik.ingress.kubernetes.io/router.tls: "true"
    tls: true
    # If left unset the TLS secret name defaults to <hostname>-tls
    secretName: <secret-name>
    
temporal:
  enabled: true
  schema:
    createDatabase:
      enabled: false
  server:
    config:
      persistence:
        default:
          sql:
            database: temporal
        visibility:
          sql:
            database: temporal_visibility

Create a file named immuta-values.yaml with the above content, making sure to update all .

Deploy Immuta.

helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
    --values immuta-values.yaml \
    --version 2024.3.8

Wait for all pods to become ready.

kubectl wait --for=condition=Ready pods --all

Validation

Determine the name of the Secure service.

kubectl get service --selector "app.kubernetes.io/component=secure" --output name

Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
```
kubectl port-forward <service-name> 8080:http
```
In a web browser, navigate to , to ensure the Immuta application loads.
Press Control+C to stop port forwarding.

Next steps

Configure a Snowflake Integration

Permissions

Automatic setup permissions

When performing an automated installation, Immuta requires temporary, one-time use of credentials with the following permissions:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

These permissions will be used to create and configure a new IMMUTA database within the specified Snowflake instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

You can create a new account for Immuta to use that has these permissions, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate permissions is ACCOUNTADMIN. If you create a new account, it can be deleted after initial setup is complete.

Alternatively, you can create the IMMUTA database within the specified Snowflake instance manually using the manual setup option.

Manual setup permissions

The specified role used to run the bootstrap needs to have the following privileges:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

It will create a user called IMMUTA_SYSTEM_ACCOUNT, and grant the following privileges to that user:

APPLY MASKING POLICY ON ACCOUNT
APPLY ROW ACCESS POLICY ON ACCOUNT
Additional grants associated with the IMMUTA database

Configure the integration

Snowflake resource names: Use uppercase for the names of the Snowflake resources you create below.

Click the App Settings icon in the navigation panel.
Click the Integrations tab.
Click the +Add Integration button and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Opt to check the Enable Impersonation box and customize the Impersonation Role to allow users to natively impersonate another user. You cannot edit this choice after you configure the integration.
2. Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.
3. Continue with your integration configuration.

Select your configuration method

You have two options for configuring your Snowflake environment:

Automatic setup

From the Select Authentication Method Dropdown, select one of the following authentication methods:

Username and Password: Complete the Username, Password, and Role fields.
Key Pair Authentication:
1. Complete the Username field.
2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
3. Click Key Pair (Required), and upload a Snowflake key pair file.
4. Complete the Role field.

Manual setup

Account creation best practice

The account you create for Immuta should only be used for the integration and should not be used as the credentials for creating data sources in Immuta; doing so will cause issues. Instead, create a separate, dedicated READ-ONLY account for creating and registering data sources within Immuta.

Select Manual.
Use the Dropdown Menu to select your Authentication Method:
- Username and password: Enter the Username and Password and set them in the bootstrap script for the Immuta system account credentials.
- Key pair authentication: Upload the Key Pair file and when using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
- Snowflake External OAuth:
  2. Fill out the Token Endpoint. This is where the generated token is sent.
  3. Fill out the Client ID. This is the subject of the generated token.
  4. Select the method Immuta will use to obtain an access token:
    Certificate
    Keep the Use Certificate checkbox enabled.
    Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
    Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as `x5t` or is called `sub` (Subject).
    Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
    Client secret
    Uncheck the Use Certificate checkbox.
    Enter the Client Secret (string). Immuta uses this secret to authenticate with the authorization server when it requests a token.
In the Setup section, click bootstrap script to download the script. Then, fill out the appropriate fields and run the bootstrap script in Snowflake.

Different accounts

Select available warehouses (optional)

If you enabled a Snowflake workspace, select Warehouses from the dropdown menu that will be available to project owners when creating Snowflake workspaces. Select from a list of all the warehouses available to the privileged account entered above. Note that any warehouse accessible by the PUBLIC role does not need to be explicitly added.

Select excepted roles and users

Enter the Excepted Roles/User List. Each role or username (both case-sensitive) in this list should be separated by a comma. Wildcards are unsupported.

Excepted roles/users will have no policies applied to queries

Any user with the username or acting under the role in this list will have no policies applied to them when querying Immuta protected Snowflake tables in Snowflake. Therefore, this list should be used for service or system accounts and the default role of the account used to create the data sources in the Immuta projects (if you have Snowflake workspace enabled).

Save the configuration

Click Save.

Opt to enable Snowflake tag ingestion

To allow Immuta to automatically import table and column tags from Snowflake, enable Snowflake tag ingestion in the external catalog section of the Immuta app settings page.

Snowflake user authentication

To configure Snowflake tag ingestion, which syncs Snowflake tags into Immuta, you must provide a Snowflake user who has, at minimum, the ability to set the following privileges:

GRANT IMPORTED PRIVILEGES ON DATABASE snowflake
GRANT APPLY TAG ON ACCOUNT

Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Snowflake from the dropdown menu.
Enter the Account.
Enter the Authentication information: Username, Password, Port, Default Warehouse, and Role.
Opt to enter the Proxy Host, Proxy Port, and Encrypted Key File Passphrase.
Opt to Upload Certificates.
Click the Test Connection button.
Click the Test Data Source Link.
Once both tests are successful, click Save.

Register data

Snowflake Integration

Snowflake Enterprise Edition required

Like with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.

How the integration works

Data flow

Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Policy enforcement

For a user to query Immuta-protected data, they must meet two qualifications:

They must be subscribed to the Immuta data source.

After a user has met these qualifications they can query Snowflake tables directly.

Comply with column length and precision requirements in a Snowflake masking policy

Consider these columns in a data source that have the following masking policies applied:

Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

Querying this data source in Snowflake would return the following values:

Hashing collisions

Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.

Query performance

Snowflake privileges

The privilege grants the Snowflake integration requires align to the least privilege security principle. The table below describes each privilege required in Snowflake for the or the user. The references to IMMUTA_DB , IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE in the table can be replaced with what you chose for the name of your Immuta database, warehouse, and impersonation role when setting up the integration, respectively.

Registering data sources

Register Snowflake data sources using a dedicated Snowflake role. Avoid using individual user accounts for data source onboarding. Instead, create a service account (Snowflake user account TYPE=SERVICE) with SELECT access for onboarding data sources. No policies will apply to that role, ensuring that your integration works with the following use cases:

Snowflake bulk data source creation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running sensitive data discovery or applying policies.

Resource allocations

Based on performance tests that create 100,000 data sources, the following minimum resource allocations need to be applied to the appropriate pods in your Kubernetes environment for successful bulk data source creation.

Limitations

Performance gains are limited when enabling sensitive data discovery at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

Excepted roles/users

Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.

Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement that will check if a user is acting under one of the listed usernames or roles. If a user is, then the policy will not be acted on the queried table. If the user is not, then the policy will be executed like normal. Immuta does not distinguish between role and username, so if you have a role and user with the exact same name, both the user and any user acting under that role will have full access to the data sources and no policies will be enforced for them.

Authentication methods

The Snowflake integration supports the following authentication methods to configure the integration and create data sources:

Username and password: Users can authenticate with their Snowflake username and password.

Snowflake External OAuth

Workflow

An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
Immuta sends the access token it received from the authorization server to Snowflake.
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.

Supported Snowflake feature

Supported Immuta features

The Snowflake integration supports the Immuta features outlined below. Click the links provided for more details.

Immuta project workspaces

Immuta system account required Snowflake privileges

CREATE [OR REPLACE] PROCEDURE
DROP ROLE
REVOKE ROLE

Caveat

To use project workspaces with the Snowflake integration, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.

Tag ingestion

You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

The Snowflake tags' key and value pairs will be reflected in Immuta as two levels: the key will be the top level and the value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.

Caveats

Query audit

Immuta system account required Snowflake privilege

IMPORTED PRIVILEGES ON DATABASE snowflake

Multiple Snowflake instances

Caveats

There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for the view to be created.
Projects can only be configured to use one Snowflake host.

Limitations

Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.
When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.

Custom WHERE clause limitations

Requirements for a custom WHERE policy

All column names must be fully qualified: Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery: The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

Subquery limitations

Any subqueries that error in Snowflake will also error in Immuta.

Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on the Snowflake subquery limitations see

Snowflake Lineage Tag Propagation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

Snowflake Access History tracks user read and write operations. Snowflake column lineage extends this Access History to specify how data flows from source columns to the target columns in write operations, allowing data stewards to understand how sensitive data moves from ancestor tables to target tables so that they can

trace data back to its source to validate the integrity of dashboards and reports,
identify who performed write operations to meet compliance requirements,
evaluate data quality and pinpoint points of failure, and
tag sensitive data on source tables without having tag columns on their descendant tables.

However, tagging sensitive data doesn’t innately protect that data in Snowflake; users need Immuta to disseminate these lineage tags automatically to descendant tables registered in Immuta so data stewards can build policies using the semantic and business context captured by those tags to restrict access to sensitive data. When Snowflake lineage tag propagation is enabled, Immuta propagates tags applied to a data source to its descendant data source columns in Immuta, which keeps your data inventory in Immuta up-to-date and allows you to protect your data with policies without having to manually tag every new Snowflake data source you register in Immuta.

Data flow

An application administrator enables the feature on the Immuta app settings page.
Snowflake lineage metadata (column names and tags) for the Snowflake tables is stored in the metadata database.
A data owner creates a new data source (or adds a new column to a Snowflake table) that initiates a job that applies all tags for each column from its ancestor columns.
A data owner or governor adds a tag to a column in Immuta that has descendants, which initiates a job that propagates the tag to all descendants.
An audit record is created that includes which tags were applied and from which columns those tags originated.

Snowflake access history view and Immuta lineage job

The Snowflake Account Usage ACCESS_HISTORY view contains column lineage information.

To appropriately propagate tags to descendant data sources, Immuta fetches Access History metadata to determine what column tags have been updated, stores this metadata in the Immuta metadata database, and then applies those tags to relevant descendant columns of tables registered in Immuta.

Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

If the Discovered.Electronic Mail Address tag is added to the Customer data source in Immuta, that tag will propagate through lineage to the Customer 2 and Customer 3 data sources.

Data source registration

After an application administrator has enabled Snowflake lineage tag propagation, data owners can register data in Immuta and have tags in Snowflake propagated from ancestor tables to descendant data sources. Whenever new tags are added to those tables in Immuta, those upstream tags will propagate to descendant data sources.

By default all tags are propagated, but these tags can be filtered on the app settings page or using the Immuta API.

Managing tags

Lineage tag propagation works with any tag added to the data dictionary. Tags can be manually added, synced from an external catalog, or discovered by SDD. Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

Immuta added the Discovered.Electronic Mail Address tag to the Customer data source, and that tag propagated through lineage to the Customer 2 and Customer 3 data sources.

Removing the tag from the Customer 2 table soft deletes it from the Customer 2 data source. When a tag is deleted, downstream lineage tags are removed, unless another parent data source still has that tag. The tag remains visible, but it will not be re-added if a future propagation event specifies the same tag again. Immuta prevents you from removing Snowflake object tags from data sources. You can only remove Immuta-managed tags. To remove Snowflake object tags from tables, you must remove them in Snowflake.

However the Discovered.Electronic Mail Address tag still applies to the Customer 3 data source because Customer still has the tag applied. The only way a tag will be removed from descendant data sources is if no other ancestor of the descendant still prescribes the tag.

If the Snowflake lineage tag propagation feature is disabled, tags will remain on Immuta data sources.

Sensitive data discovery

Snowflake lineage audit

Immuta audit records include Snowflake lineage tag events when a tag is added or removed.

The example audit record below illustrates the SNOWFLAKE_TAGS.pii tag successfully propagating from the Customer table to Customer 2:

{
  "id": "c8e020cb-232c-4ba9-a0d8-f3a84ba6808d",
  "dateTime": "1670355170336",
  "month": 1475,
  "profileId": 1,
  "userId": "immuta_system_account",
  "dataSourceId": 2,
  "dataSourceName": "Customer 2",
  "count": 1,
  "recordType": "nativeLineageDataSourceTagUpdate",
  "success": true,
  "component": "dataSource",
  "extra": {
    "sourceColumn": {
      "nativeColumnName": "\"MY_DATABASE\".\"PUBLIC\".\"CUSTOMER\".\"C_FIRST_NAME\"",
      "dataSourceId": 1,
      "columnName": "c_first_name"
    },
    "dataSourceId": 2,
    "columnName": "c_first_name",
    "tagPropagationDirection": "downstream",
    "tags": [
      {
        "name": "SNOWFLAKE_TAGS.pii",
        "source": "immuta-us-east-1"
      }
    ]
  },
  "newAuditServiceFields": {
    "actorIp": null,
    "sessionId": null
  },
  "createdAt": "2022-12-06T19:32:50.372Z",
  "updatedAt": "2022-12-06T19:32:50.372Z"
}

Limitations

Without tableFilter set, Immuta will ingest lineage for every table on the Snowflake instance.
Tag propagation based on lineage is not retroactive. For example, if you add a table, add tags to that table, and then run the lineage ingestion job, tags will not get propagated. However, if you add a table, run the lineage ingestion job, and then add tags to the table, the tags will get propagated.
The lineage job needs to pull in lineage data before any tag is applied in Immuta. When Immuta gets new lineage information from Snowflake, Immuta does not update existing tags in Immuta.
There can be up to a 3-hour delay in Snowflake for a lineage event to make it into the ACCESS_HISTORY view.
Immuta does not ingest lineage information for views.
Snowflake only captures lineage events for CTAS, CLONE, MERGE, and INSERT write operations. Snowflake does not capture lineage events for DROP, RENAME, ADD, or SWAP. Instead of using these latter operations, you need to recreate a table with the same name if you need to make changes.
Immuta cannot enforce coherence of your Snowflake lineage. If a column, table, or schema in the middle of the lineage graph gets dropped, Immuta will not do anything unless a table with that same name gets recreated. This means a table that gets dropped but not recreated could live in Immuta’s system indefinitely.

Configure a Databricks Unity Catalog Integration

Permissions

The following permissions and personas are used in the registration process.

Immuta user: An Immuta user with the APPLICATION_ADMIN Immuta permission must configure the Databricks Unity Catalog integration.
Databricks user: The Databricks user must have the following privileges.
- Account admin
- CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables
- (only required if enabling query audit)
:
- USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources and USE SCHEMA on all schemas containing securables registered as Immuta data sources.
- MODIFY and SELECT on all securables registered as Immuta data sources. MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.
- Optionally, to include audit, the service principal needs the following additional privileges:
  - USE CATALOG on system catalog
    USE SCHEMA on system.access schema
    SELECT on system.access.audit table
    SELECT on system.access.table_lineage table
    SELECT on system.access.column_lineage table

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
If you select single user access mode for your cluster, you must
- enable serverless compute for your workspace.

Unity Catalog best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Migrate data to Unity Catalog

Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

Create the Databricks service principal

Opt to enable query audit for Unity Catalog

- USE CATALOG on the system catalog
- USE SCHEMA on the system.access schema
- SELECT on the following system tables:
  - system.access.audit
  - system.access.table_lineage
  - system.access.column_lineage

Configure the Databricks Unity Catalog integration

You have two options for configuring your Databricks Unity Catalog integration:

Automatic setup

Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering in Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.

Manual setup

Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering in Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Save.

Map Databricks users to Immuta

If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to each Immuta user account to ensure Immuta properly enforces policies using one of the methods linked below:

Opt to enable Databricks Unity Catalog tag ingestion

Design partner preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirement: A Databricks Unity Catalog integration must be configured for tags to be ingested.

To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.

Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.

Register data

Databricks Unity Catalog Integration Reference Guide

Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.

Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

Metastore: Created at the account level and is attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those objects.
Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas
Schema: Organizes tables and views
Table-etc: Table (managed or external tables), view, volume, model, and function

Feature support

The Databricks Unity Catalog integration supports

- applying column masks and row filters on specific securable objects
- applying subscription polices on tables and views
enforcing Unity Catalog access controls, even if Immuta becomes disconnected
allowing non-Immuta reads and writes
using Photon
using a proxy server

Architecture

Immuta uses this service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the integration, Immuta will create a catalog that contains these schemas:

immuta_system: Contains internal Immuta data.
immuta_policies_n: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.

Workspace-catalog binding

Immuta’s Databricks Unity Catalog integration manages a single metastore per integration. Prior to catalog-binding, a Unity Catalog metastore in Databricks was available unrestricted across workspaces in Databricks, which made integrating against a metastore independent of the workspace attached to that metastore possible. However, when catalog-binding is enabled on a Databricks workspace, you can assign catalogs available in the Unity Catalog metastore to that workspace. As a result, Immuta cannot see all catalogs in a metastore when integrated with that workspace. This behavior is problematic for customers who use catalog isolation to separate environments or business units, as Immuta cannot see all the data that may need to be governed.

To avoid this issue,

Set up a dedicated Databricks workspace for Immuta that has catalog-binding disabled so that Immuta can see all data in the metastore.
Configure Immuta’s Databricks Unity Catalog integration with that workspace to govern all data in the metastore.

Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.

The Unity Catalog integration supports the following policy types:

- Conditional masking
- Constant
- Custom masking
- Hashing
- Null (including on ARRAY, MAP, and STRUCT type columns)
- Rounding (date and numeric rounding)
- Matching (only show rows where)
  - Custom WHERE
  - Never
  - Where user
  - Where value in column
- Minimization
- Time-based restrictions

Project-scoped purpose exceptions for Databricks Unity Catalog

Databricks Unity Catalog views

If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:

The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).

Masked joins for Databricks Unity Catalog

Policy exemption groups

Some users may need to be exempt from masking and row-level policy enforcement. When you add user accounts to the configured exemption group in Databricks, Immuta will not enforce policies for those users. Exemption groups are created when the Unity Catalog integration is configured, and no policies will apply to these users' queries, despite any policies enforced on the tables they query.

The principal used to register data sources in Immuta will be automatically added to this exemption group for that Databricks table. Consequently, users added to this list and used to register data sources in Immuta should be limited to service accounts.

Policy support with `hive_metastore`

When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

Authentication methods

The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:

Immuta data sources in Unity Catalog

External data connectors and query-federated tables

Query audit

Access requirements

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
- system.access.audit
- system.access.table_lineage
- system.access.column_lineage

Tag ingestion

Design partner preview: This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.

You can enable tag ingestion to allow Immuta to ingest Databricks Unity Catalog table and column tags so that you can use them in Immuta policies to enforce access controls. When you enable this feature, Immuta uses the credentials and connection information from the Databricks Unity Catalog integration to pull tags from Databricks and apply them to data sources as they are registered in Immuta. If Databricks data sources preexist the Databricks Unity Catalog tag ingestion enablement, those data sources will automatically sync to the catalog and tags will apply. Immuta checks for changes to tags in Databricks and syncs Immuta data sources to those changes every 24 hours.

Syncing tag changes

When syncing data sources to Databricks Unity Catalog tags, Immuta pulls the following information:

Table tags: These tags apply to the table and appear on the data source overview tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Column tags: These tags are applied to data source columns and appear on the columns listed in the data dictionary tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Table comments field: This content appears as the data source description on the data source details tab.
Column comments field: This content appears as dictionary column descriptions on the data dictionary tab.

Limitations

Only tags that apply to Databricks data sources in Immuta are available to build policies in Immuta. Immuta will not pull tags in from Databricks Unity Catalog unless those tags apply to registered data sources.
Cost implications: Tag ingestion in Databricks Unity Catalog requires compute resources. Therefore, having many Databricks data sources or frequently manually syncing data sources to Databricks Unity Catalog may incur additional costs.
Databricks Unity Catalog tag ingestion only supports tenants with fewer than 2,500 data sources registered.

Configuration requirements

Supported Databricks cluster configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

Example cluster

Databricks Runtime

Unity Catalog in Databricks

Databricks Spark integration

Databricks Unity Catalog integration

Cluster 1

9.1

Unavailable

Cluster 2

10.4

Unavailable

Cluster 3

11.3

Unavailable

Cluster 4

11.3

Cluster 5

11.3

Legend:

Unity Catalog caveats

Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
You must use the global regex flag (g) when creating a regex masking policy in this integration, and you cannot use the case insensitive regex flag (i) when creating a regex masking policy in this integration. See the examples below for guidance:
- regex with a global flag (supported): /^ssn|social ?security$/g
- regex without a global flag (unsupported): /^ssn|social ?security$/
- regex with a case insensitive flag (unsupported): /^ssn|social ?security$/gi
- regex without a case insensitive flag (supported): /^ssn|social ?security$/g

Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

Feature limitations

The following features are currently unsupported:

Databricks change data feed support
Immuta project workspaces
Multiple IAMs on a single cluster
Column masking policies on views
Mixing masking policies on the same column
Row-redaction policies on views
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies (except for masking with NULL) on ARRAY, MAP, or STRUCT type columns
Shallow clones

Known issue

Snippets for Databricks data sources may be empty in the Immuta UI.

Manual Databricks Spark Configuration

Databricks Unity Catalog: If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.

immuta_conf.xml is no longer required

The immuta_conf.xml file that was previously used to configure the Databricks Spark integration is no longer required to install Immuta, so it is no longer staged as a deployment artifact. However, you can use these snippets if you wish to deploy an immuta_conf.xml file to set properties.

The required Immuta base URL and Immuta system API key properties, along with any other valid properties, can still be specified as Spark environment variables or in the optional immuta_conf.xml file. As before, if the same property is specified in both locations, the Spark environment variable takes precedence.

If you have an existing immuta_conf.xml file, you can continue using it. However, it's recommended that you delete any default properties from the file that you have not explicitly overridden, or remove the file completely and rely on Spark environment variables. Either method will ensure that any property defaults changed in upcoming Immuta releases are propagated to your environment.

1 - Download and Configure Immuta Artifacts

Spark Version

Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.

Scroll to the release that corresponds to your Immuta version.
Download the .jar file (Immuta plugin) as well as the other scripts listed below, which will load the plugin at cluster startup.
```
allowedCallingClasses.json
immuta-benchmark-suite.dbc
immuta-spark-hive-X.X.X_YYYYMMDD-hadoop-Z.Z.Z-public.jar
immuta_cluster_init_script.sh
obscuredCommands.yaml
```
The immuta-benchmark-suite.dbc is a collection of notebooks packaged as a .dbc file. After you have added cluster policies to your cluster, you can import this file into Databricks to run performance tests and compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries. Note: Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.
Specify the following properties as Spark environment variables or in the optional immuta_conf.xml file. If the same property is specified in both locations, the Spark environment variable takes precedence. The variable names are the config names in all upper case with _ instead of .. For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com
- immuta.base.url: The full URL for the target Immuta tenant Ex: https://immuta.mycompany.com.

Environment variables with Google Cloud Platform

Do not use environment variables to set sensitive properties when using Google Cloud Platform. Set them directly in immuta_conf.xml.

Generating a key will destroy any previously generated HDFS keys. This will cause previously integrated HDFS systems to lose access to your Immuta console. The key will only be shown once when generated.

2 - Stage Immuta Artifacts

When configuring the Databricks cluster, a path will need to be provided to each of the artifacts downloaded/created in the previous step. To do this, those artifacts must be hosted somewhere that your Databricks instance can access. The following methods can be used for this step:

These artifacts will be downloaded to the required location within the clusters file-system by the init script downloaded in the previous step. In order for the init script to find these files, a URI will have to be provided through environment variables configured on the cluster. Each method's URI structure and setup is explained below.

AWS/S3

URI Structure: s3://[bucket]/[path]

Upload the configuration file, JSON file, and JAR file to an S3 bucket that the role from step 1 has access to.

Authenticating with Access Keys or Session Tokens (Optional)

If you wish to authenticate using access keys, add the following items to the cluster's environment variables:

IMMUTA_INIT_AWS_SECRET_ACCESS_KEY=<aws secret key>
IMMUTA_INIT_AWS_ACCESS_KEY_ID=<aws access key id>

If you've assumed a role and received a session token, that can be added here as well:

IMMUTA_INIT_AWS_SESSION_TOKEN=<aws session token>

Azure

ADL Gen 2

URI Structure: abfs(s)://[container]@[account].dfs.core.windows.net/[path]

Environment Variables:

If you want to authenticate using an account key, add the following to your cluster's environment variables:

IMMUTA_INIT_AZCOPY_CRED_TYPE=SharedKey
IMMUTA_INIT_ACCOUNT_NAME=<ADLg2 account name>
IMMUTA_INIT_ACCOUNT_KEY=<ADLg2 account key>

If you want to authenticate using an Azure SAS token, add the following to your cluster's environment variables:

IMMUTA_INIT_AZURE_SAS_TOKEN=<SAS token>

ADL Gen 1

URI Structure: adl://[account].azuredatalakestore.net/[path]

Environment Variables:

If authenticating as a Microsoft Entra ID user,

IMMUTA_INIT_AZURE_AD_USER=<Microsoft Entra ID username>
IMMUTA_INIT_AZURE_PASSWORD=<Microsoft Entra ID password>

If authenticating using a service principal,

IMMUTA_INIT_AZURE_SERVICE_PRINCIPAL=<azure service principal>
IMMUTA_INIT_AZURE_PASSWORD=<azure service principal password>
IMMUTA_INIT_AZURE_TENANT=<tenant ID where principal was created>

HTTPS

URI Structure: http(s)://[host](:port)/[path]

Artifacts are available for download from Immuta using basic authentication. Your basic authentication credentials can be obtained from your Immuta support professional.

Environment Variables (Optional)

IMMUTA_INIT_HTTPS_USER=<basic auth username>
IMMUTA_INIT_HTTPS_PASSWORD=<basic auth password>

DBFS

DBFS does not support access control

Any Databricks user can access DBFS via the Databricks command line utility. Files containing sensitive materials (such as Immuta API keys) should not be stored there in plain text. Use other methods described herein to properly secure such materials.

URI Structure: dbfs:/[path]

Since any user has access to everything in DBFS:

The artifacts can be stored anywhere in DBFS.
It's best to have a cluster-specific place for your artifacts in DBFS if you are testing to avoid overwriting or reusing someone else's artifacts accidentally.

3 - Protect Immuta Environment Variables with Databricks Secrets

Databricks secrets can be used in the Environment Variables configuration section for a cluster by referencing the secret path rather than the actual value of the environment variable. For example, if a user wanted to make the following value secret

MY_SECRET_ENV_VAR=super_secret_stuff

they could instead create a Databricks secret and reference it as the value of that variable. For instance, if the secret scope my_secrets was created, and the user added a secret with the key my_secret_env_var containing the desired sensitive environment variable, they would reference it in the Environment Variables section:

MY_SECRET_ENV_VAR={{secrets/my_secrets/my_secret_env_var}}

Then, at runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.

Best practice: Replace sensitive variables with secrets

Immuta recommends that any sensitive environment variables listed below in the various artifact deployment instructions be replaced with secrets.

4 - Create and Configure the Cluster

Cluster creation in an Immuta-enabled organization or Databricks workspace should be limited to administrative users to avoid allowing users to create non-Immuta enabled clusters.

Select the Custom Access mode.
Opt to adjust the Autopilot Options and Worker Type settings. The default values provided here may be more than what is necessary for non-production or smaller use-cases. To reduce resource usage you can enable/disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.
In the Advanced Options section, click the Instances tab.

Click the Spark tab. In Spark Config field, add your configuration.

Cluster Configuration Requirements:

spark.executor.extraJavaOptions -Djava.security.manager=com.immuta.security.ImmutaSecurityManager /
    -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json /
    -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
spark.driver.extraJavaOptions -Djava.security.manager=com.immuta.security.ImmutaSecurityManager /
    -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json /
    -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
spark.databricks.repl.allowedLanguages python,sql
spark.databricks.pyspark.enableProcessIsolation true
spark.databricks.isv.product Immuta

# Specify the URI to the artifacts that were hosted in the previous steps
# The URI must adhere to the supported types for each service mentioned above
IMMUTA_INIT_JAR_URI=<Full URI to immuta-spark-hive.jar>
IMMUTA_INIT_CONF_URI=<Full URI to Immuta configuration file>
IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI=<full URI to allowedCallingClasses.json>
IMMUTA_INIT_OBSCURED_COMMANDS_URI=<full URI to obscuredCommands.yaml>

# (OPTIONAL)
# Specify an additional configuration file to be added to the spark.sparkContext.hadoopConfiguration.
# This file allows administrators to add sensitive configuration needed by the SparkSession that
# should not viewable by users.
# Further explanation of this variable as well as examples are provided below.
IMMUTA_INIT_ADDITIONAL_CONF_URI=<full URI to additional configuration file>

Click the Init Scripts tab and set the following configurations:
- Destination: Specify the service you used to host the Immuta artifacts.
- File Path: Specify the full URI to the immuta_cluster_init_script.sh.
- Add the new key/value to the configuration.
Click the Permissions tab and configure the following setting:
- Who has access: Users or groups will need to have the permission Can Attach To to execute queries against Immuta configured data sources.
(Re)start the cluster.

Additional Hadoop Configuration File (Optional)

As mentioned in the "Environment Variables" section of the cluster configuration, there may be some cases where it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration in order to read the data composing Immuta data sources.

As an example, when accessing external tables stored in Azure Data Lake Gen 2, Spark must have credentials to access the target containers/filesystems in ADLg2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access ADLg2.

The additional configuration file looks very similar to the Immuta Configuration file referenced above. Some example configuration files for accessing different storage layers are below.

Amazon S3

IAM role for S3 access

<configuration>
    <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>[AWS access key ID]</value>
    </property>
    <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>[AWS secret key]</value>
    </property>
</configuration>

Azure Data Lake Gen 2

<configuration>
    <property>
        <name>fs.azure.account.key.[storage account name].dfs.core.windows.net</name>
        <value>[storage account key]</value>
    </property>
</configuration>

Azure Data Lake Gen 1

ADL prefix: Prior to Databricks Runtime version 6, the following configuration items should have a prefix of dfs.adls rather than fs.adl.

<configuration>
    <property>
        <name>fs.adl.oauth2.refresh.url</name>
        <value>https://login.microsoftonline.com/[directory ID]/oauth2/token</value>
    </property>
    <property>
        <name>fs.adl.oauth2.access.token.provider.type</name>
        <value>ClientCredential</value>
    </property>
    <property>
        <name>fs.adl.oauth2.credential</name>
        <value>[client secret from Azure]</value>
    </property>
    <property>
        <name>fs.adl.oauth2.client.id</name>
        <value>[client ID from Azure]</value>
    </property>
</configuration>

Azure Blob Storage

<configuration>
    <property>
        <name>fs.azure.account.key.[storage account name].blob.core.windows.net</name>
        <value>[storage account key]</value>
    </property>
</configuration>

5 - Register data

6 - Query Immuta Data

When the Immuta enabled Databricks cluster has been successfully started, users will see a new database labeled "immuta". This database is the virtual layer provided to access data sources configured within the connected Immuta instance.

Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster and GRANT the user access to the immuta database.

The following SQL query can be run as an administrator within a journal to give the user access to "Immuta":

%sql
GRANT SELECT,READ_METADATA ON DATABASE immuta TO `user@company.com`

%sql
select * from immuta.my_data_source limit 5;

%sql
select * from my_data_source limit 5;

Creating a Databricks Data Source

Databricks to Immuta User Mapping

By default, the IAM used to map users between Databricks and Immuta is the BIM (Immuta's internal IAM). The Immuta Spark plugin will check the Databricks username against the username within the BIM to determine access. For a basic integration, this means the users email address in Databricks and the connected Immuta tenant must match.

Environment Variables

This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended).

This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

Environment variable overrides

Properties in the config file can be overridden during installation using environment variables. The variable names are the config names in all upper case with _ instead of .. For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com

immuta.ephemeral.host.override
- Default: true
- Description: Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source httpPaths with the httpPath of the Databricks cluster running the user's Spark application.
immuta.ephemeral.host.override.httpPath
- Description: This configuration item can be used if automatic detection of the Databricks httpPath should be disabled in favor of a static path to use for ephemeral overrides.
immuta.ephemeral.table.path.check.enabled
- Default: true
- Description: When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time. Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.
immuta.spark.acl.enabled
- Default: true
- Description: Immuta Access Control List (ACL). Controls whether Databricks users are blocked from accessing non-Immuta tables. Ignored if Databricks Table ACLs are enabled (i.e., spark.databricks.acl.dfAclsEnabled=true).
immuta.spark.acl.whitelist
- Description: Comma-separated list of Databricks usernames who may access raw tables when the Immuta ACL is in use.
immuta.spark.acl.privileged.timeout.seconds
- Default: 3600
- Description: The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is whitelisted in immuta.spark.acl.whitelist.
immuta.spark.acl.assume.not.privileged
- Default: false
- Description: Session property that overrides privileged user status when the Immuta ACL is in use. This should only be used in R scripts associated with spark-submit jobs.
immuta.spark.audit.all.queries
- Default: false
- Description: Enables auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.
immuta.spark.databricks.allow.non.immuta.reads
- Default: false
immuta.spark.databricks.allow.non.immuta.writes
- Default: false
immuta.spark.databricks.allowed.impersonation.users
- Description: This configuration is a comma-separated list of Databricks users who are allowed to impersonate Immuta users.
immuta.spark.databricks.dbfs.mount.enabled
- Default: false
- Description: Exposes the DBFS FUSE mount located at /dbfs. Granular permissions are not possible, so all users will have read/write access to all objects therein. Note: Raw, unfiltered source data should never be stored in DBFS.
immuta.spark.databricks.disabled.udfs
immuta.spark.databricks.filesystem.blacklist
- Default: hdfs
- Description: A list of filesystem protocols that this instance of Immuta will not support for workspaces. This is useful in cases where a filesystem is available to a cluster but should not be used on that cluster.
immuta.spark.databricks.jar.uri
- Default: file:///databricks/jars/immuta-spark-hive.jar
- Description: The location of immuta-spark-hive.jar on the filesystem for Databricks. This should not need to change unless a custom initialization script that places immuta-spark-hive in a non-standard location is necessary.
immuta.spark.databricks.local.scratch.dir.enabled
- Default: true
- Description: Creates a world-readable/writable scratch directory on local disk to facilitate the use of dbutils and 3rd party libraries that may write to local disk. Its location is non-configurable and is stored in the environment variable IMMUTA_LOCAL_SCRATCH_DIR. Note: Sensitive data should not be stored at this location.
immuta.spark.databricks.log.level
- Default Value: INFO
- Description: The SLF4J log level to apply to Immuta's Spark plugins.
immuta.spark.databricks.log.stdout.enabled
- Default: false
- Description: If true, writes logging output to stdout/the console as well as the log4j-active.txt file (default in Databricks).
immuta.spark.databricks.py4j.strict.enabled
- Default: true
- Description: Disable to allow the use of the dbutils API in Python. Note: This setting should only be disabled for customers who employ a homogeneous integration (i.e., all users have the same level of data access).
immuta.spark.databricks.scratch.database
- Description: This configuration is a comma-separated list of additional databases that will appear as scratch databases when running a SHOW DATABASE query. This configuration increases performance by circumventing the Metastore to get the metadata for all the databases to determine what to display for a SHOW DATABASE query; it won't affect access to the scratch databases. Instead, use immuta.spark.databricks.scratch.paths to control read and write access to the underlying database paths.
  Additionally, this configuration will only display the scratch databases that are configured and will not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to properly set this value and keep it current.
immuta.spark.databricks.scratch.paths
- Description: Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by configuring its specified location (or configure dbfs:/user/hive/warehouse/<db_name>.db for the default location).
  To create a scratch path to a location or a database stored at that location, configure
  To create a scratch path to a database created using the default location,
immuta.spark.databricks.scratch.paths.create.db.enabled
- Default: false
- Description: Enables non-privileged users to create or drop scratch databases.
immuta.spark.databricks.single.impersonation.user
- Default: false
- Description: When true, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. This configuration should be set when the BI tool or other service allows users to submit arbitrary SQL or issue SET commands.
immuta.spark.databricks.submit.tag.job
- Default: true
- Description: Denotes whether the Spark job will be run that "tags" a Databricks cluster as being associated with Immuta.
immuta.spark.databricks.trusted.lib.uris
immuta.spark.non.immuta.table.cache.seconds
- Default: 3600
- Description: The number of seconds Immuta caches whether a table has been exposed as a source in Immuta. This setting only applies when immuta.spark.databricks.allow.non.immuta.writes or immuta.spark.databricks.allow.non.immuta.reads is enabled.
immuta.spark.require.equalization
- Default: false
- Description: Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via spark.databricks.repl.allowedLanguages.
immuta.spark.resolve.raw.tables.enabled
- Default: true
- Description: Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Administrators or whitelisted users can set immuta.spark.session.resolve.raw.tables.enabled to false to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.
immuta.spark.session.resolve.raw.tables.enabled
- Default: true
- Description: Same as above, but a session property that allows users to toggle this functionality. If users run set immuta.spark.session.resolve.raw.tables.enabled=false, they will see raw data only (not Immuta data policy-enforced data). Note: This property is not set in immuta_conf.xml.
immuta.spark.show.immuta.database
- Default: true
- Description: This shows the immuta database in the configured Databricks cluster. When set to false Immuta will no longer show this database when a SHOW DATABASES query is performed. However, queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this feature is enabled.
immuta.spark.version.validate.enabled
- Default: true
- Description: Immuta checks the versions of its artifacts to verify that they are compatible with each other. When set to true, if versions are incompatible, that information will be logged to the Databricks driver logs and the cluster will not be usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), this check can be set to false so that the versions don't get logged as incompatible and make the cluster unusable.
immuta.user.context.class
- Default: com.immuta.spark.OSUserContext
- Description: The class name of the UserContext that will be used to determine the current user in immuta-spark-hive. The default implementation gets the OS user running the JVM for the Spark application.
immuta.user.mapping.iamid
- Default: bim
- Description: Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to Immuta's internal IAM (bim) but should be updated to reflect an actual production IAM.

Starburst (Trino) Integration Reference Guide

Starburst and Trino

The Starburst (Trino) integration allows you to access policy-enforced data directly in your Starburst catalogs without rewriting queries or changing workflows. Instead of generating policy-enforced views and adding them to an Immuta catalog that users have to query (like in the legacy Starburst (Trino) integration), Immuta policies are translated into Starburst (Trino) rules and permissions and applied directly to tables within users’ existing catalogs.

Architecture

Rotating the Immuta API key

When you configure the integration, Immuta generates an API key for you to add to your Immuta access control properties file for API authentication between Starburst (Trino) and Immuta. You can rotate this shared secret to mitigate potential security risks and comply with your organizational policies.

Policy enforcement

When a user queries a table in Starburst (Trino), the Trino Execution Engine reaches out to the Immuta plugin to determine what the user is allowed to see:

masking policies: For each column, Starburst (Trino) requests a view expression from the Immuta plugin. If there is a masking policy on the column, the Immuta plugin returns the corresponding view expression for that column. Otherwise, nothing is returned.
row-level policies: For each table, Starburst (Trino) requests the rows a user can see in a table from Immuta. If there is a WHERE clause policy on the data source, Immuta returns the corresponding view expression as a WHERE clause. Otherwise, nothing is returned.

The Immuta plugin then requests policy information about the tables being queried from the Immuta Web Service and sends this information to the Trino Execution Engine. Finally, the Trino Execution Engine constructs the SQL statement, executes it on the backing tables to apply the policies, and returns the response to the user.

System access control providers

Users cannot bypass Immuta controls by changing roles in their system access control provider.

Multiple system access control providers can be configured in the Starburst (Trino) integration. This approach allows Immuta to work with existing Starburst (Trino) installations that already have an access control provider configured.

Immuta does not manage all permissions in Starburst (Trino) and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

If you have multiple access control providers configured, those providers interact in the following ways:

For a user to have access to a resource (catalog, schema, or a table), that user must have access in all of the configured access control providers.
In catalog, schema, or table filtering (such as show catalogs, show schemas, or show tables), the user will see the intersection of all access control providers. For example, if a Starburst (Trino) environment includes the catalogs public, demo, and restricted and one provider restricts a user from accessing the restricted catalog and another provider restricts the user from accessing the demo catalog, running show catalogs will only return the public catalog for that user.
Only one column masking policy can be applied per column across all system access control providers. If two or more access control providers return a mask for a column, Starburst (Trino) will throw an error at query time.
For row filtering policies, the expression for each system access control provider is applied one after the other.

Starburst (Trino) query passthrough

Starburst (Trino) query passthrough is available in most connectors using the query table function or raw_query in the Elasticsearch connector. Consequently, Immuta blocks functions named raw_query or query, as those table functions would completely bypass Immuta’s access controls.

For example, without blocking those functions, this query would access the public.customer table directly:

select * from table(postgres.system.query(query => 'select * from public.customer limit 10'));

Data flow

An Immuta Application Administrator configures the Starburst (Trino) integration, adding the ImmutaSystemAccessControl plugin on their Starburst (Trino) node.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
The Trino Execution Engine calls various methods on the interface to ask the ImmutaSystemAccessControl plugin where the policies should be applied. The masking and row-level security methods apply the actual policy expressions.
The Immuta System Access Control plugin calls the Immuta Web Service to retrieve policy information for that data source for the querying user, using the querying user's project, purpose, and entitlements.
The Immuta System Access Control plugin provides the SQL view expression (for masked columns) or WHERE clause SQL view expression (for row filtering) to the Trino Execution Engine.
The Trino Execution Engine constructs and executes the SQL statement on the backing catalogs and retrieves the data with appropriate policy enforcement.
User sees policy-enforced data.

Authentication methods

The Starburst (Trino) integration supports the following authentication methods to create data sources in Immuta:

Username and password: You can authenticate with your Starburst (Trino) username and password.

OAuth authentication for creating data sources

Configure JWT authentication method in Starburst (Trino)

When using OAuth authentication to create data sources in Immuta, configure your Starburst (Trino) cluster to use JWT authentication, not OpenID Connect or OAuth.

When users query a Starburst (Trino) data source, Immuta sends a username with the view SQL so that policies apply in the right context. Since OAuth authentication does not require a username to be associated with a data source upon data source creation, Immuta does not send a username and Starburst (Trino) queries fail. To avoid this error, you must configure a global admin username.

Supported Starburst (Trino) feature

Starburst (Trino)-created logical view support

The descriptions below provide guidance for applying policies to Starburst (Trino)-created logical views in the

However, there are other approaches you can use to apply policies to Starburst (Trino)-created logical views. The examples below are the simplest approaches.

Views created in the `DEFINER` security mode

For views created using the DEFINER security mode,

ensure the user who created the view is configured as an admin user in the Immuta plugin so that policies are never applied to the underlying tables.
create Immuta data sources and apply policies to logical views exposing those tables.
lock down access to the underlying tables in Starburst (Trino) so that all end user access is provided through the views.

Views created in the `INVOKER` security mode

Applying policies to views or tables

Avoid creating data policies for both a logical view and its underlying tables. Instead, apply policies to the logical view or the underlying tables.

For views created using the INVOKER security mode, the querying user needs access to the logical view and underlying tables.

If non-Immuta table reads are disabled, provide access to the views and tables through Immuta. To do so, create Immuta data sources for the view and underlying tables, and grant access to the querying user in Immuta. If creating data policies, apply the policies to either the view or underlying tables, not both.
If non-Immuta table reads are enabled, the user already has access to the table and view. Create Immuta data sources and apply policies to the underlying table; this approach will enforce access controls for both the table and view in Starburst (Trino).

Supported Immuta features

Query audit

In addition to the information included on the Starburst (Trino) Audit Logs page, the audit logs payload in the Starburst (Trino) integration includes immutaPlanningDuration, which represents the planning overhead in Immuta.

Multiple Starburst (Trino) integrations

You can configure multiple Starburst (Trino) integrations with a single Immuta tenant and use them dynamically. Configure the integration once in Immuta to use it in multiple Starburst (Trino) clusters. However, consider the following limitations:

Names of catalogs cannot overlap because Immuta cannot distinguish among them.
A combination of cluster types on a single Immuta tenant is supported unless your Trino cluster is configured to use a proxy. In that case, you can only connect either Trino clusters or Starburst clusters to the same Immuta tenant.

Policy caveat

Limit your masked joins to columns with matching column types. Starburst truncates the result of the masking expression to conform to the native column type when performing the join, so joining two masked columns with different data types produces invalid results when one of the columns' lengths is less than the length of the masked value.

For example, if the value of a hashed column is 64 characters, joining a hashed varchar(50) and a hashed varchar(255) column will not be joined correctly, since the varchar(50) value is truncated and doesn’t match the varchar(255) value.

Customize Read and Write Access Policies for Starburst (Trino)

Private preview: Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.

Requirements

Starburst (Trino) version 438 or newer
Write policies for Starburst (Trino) enabled. Contact your Immuta representative to get this feature enabled on your account.

Configuration options

Immuta web service configuration

Contact your Immuta representative to configure read and write access in the Immuta web service if all Starburst (Trino) data source operations should be affected identically across Starburst (Trino) clusters connected to your Immuta tenant. A configuration example is provided below.

Configuration example

The following example maps WRITE to READ, WRITE and OWN permissions and READ to just READ. Both READ and WRITE permissions should always include READ:

Starburst cluster configuration

Configure the integration to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a Starburst cluster.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
Modify one or both properties below to customize the behavior of read or write access policies for all users:
- immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. For these permissions to apply, the user must be subscribed to the data source in Immuta and not be an administrator (who gets all permissions).
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
- immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
  - CREATE: Grants CREATE on catalogs, schema, tables, and views. This is the only property that can allow CREATE permissions, since CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).
For example, the following configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta and all operations are permitted on data that is not registered in Immuta:
Enable the Immuta access control plugin in the Starburst cluster's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,

Trino cluster configuration

Create the Immuta access control configuration file in the Trino configuration directory (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations).
Modify one or both properties below to customize the behavior of read or write access policies for all users:
- immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. For these permissions to apply, the user must be subscribed to the data source in Immuta and not be an administrator (who gets all permissions).
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
- immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
  - CREATE: Grants CREATE on catalogs, schema, tables, and views. This is the only property that can allow CREATE permissions, since CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).
For example, the following configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta and all operations are permitted on data that is not registered in Immuta:
Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,

Configure Starburst (Trino) Integration

The plugin comes pre-installed with Starburst Enterprise, so this page provides separate sets of guidelines for configuration:

Starburst Cluster Configuration

Requirement

Starburst does not support using Starburst built-in access control (BIAC) concurrently with any other access control providers such as Immuta. If Starburst BIAC is in use, it must be disabled to allow Immuta to enforce policies on cluster.

1 - Enable the Integration

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click Add Integration and select Trino from the Integration Type dropdown menu.
Click Save.

OAuth Authentication

If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, configure the globalAdminUsername property in the advanced configuration section of the Immuta app settings page.

Click the App Settings page icon.
Click Advanced Settings and scroll to Advanced Configuration.
Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:
```
trino:
  globalAdminUsername: "admins@trino.com"
```

2 - Configure the Immuta System Access Control Plugin in Starburst

Default configuration property values

If you use the default property values in the configuration file described in this section,

you will give users read and write access to tables that are not registered in Immuta and
results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Starburst deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Starburst installations that have already configured an access control provider. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

TLS Certificate Generation

If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

If the hostnames in your certificate don't match the hostname specified in your Starburst (Trino) integration, you can set immuta.disable-hostname-verification to true in the Immuta access control config file to get the integration working in the interim.

The Starburst (Trino) integration uses the immuta.ca-file property to communicate with Immuta. When configuring the plugin in Starburst (outlined below), specify a path to your CA file using the immuta.ca-file property in the Immuta access control configuration file.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
The table below describes the properties that can be set during configuration.
Property
Starburst version
Required or optional
Description
access-control.name
392 and newer
Required
This property enables the integration.
access-control.config-files
392 and newer
Optional
immuta.allowed.immuta.datasource.operations
413 and newer
Optional
immuta.allowed.non.immuta.datasource.operations
392 and newer
Optional
immuta.apikey
392 and newer
Required
immuta.audit.legacy.enabled
392 and newer
Optional
This property allows you to turn off the legacy Starburst (Trino) audit if you do not have Elasticsearch set up in your install.
immuta.ca-file
392 and newer
Optional
This property allows you to specify a path to your CA file.
immuta.cache.views.seconds
392 and newer
Optional
Amount of time in seconds for which a user's specific representation of an Immuta data source will be cached for. Changing this will impact how quickly policy changes are reflected for users actively querying Starburst. By default, cache expires after 30 seconds.
immuta.cache.datasource.seconds
392 and newer
Optional
Amount of time in seconds for which a user's available Immuta data sources will be cached for. Changing this will impact how quickly data sources will be available due to changing projects or subscriptions. By default, cache expires after 30 seconds.
immuta.endpoint
392 and newer
Required
The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Starburst (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.
immuta.filter.unallowed.table.metadata
392 and newer
Optional
When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.
immuta.group.admin
420 and newer
Required if immuta.user.admin is not set
This property identifies the Starburst group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).
immuta.user.admin
392 and newer
Required if immuta.group.admin is not set
This property identifies the Starburst user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).
Enable the Immuta access control plugin in Starburst's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,

access-control.config-files=/etc/starburst/immuta-access-control.properties

Example Immuta System Access Control Configuration

# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false

3 - Add Starburst Users to Immuta

- All Starburst users must map to Immuta users or match the immuta.user.admin regex configured on the cluster, and their Starburst username must be mapped to Immuta so they can query policy-enforced data.
- A user impersonating a different user in Starburst requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.

4 - Register data

Trino Cluster Configuration

1 - Enable the Integration

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click Add Integration and select Trino from the dropdown menu.
Click Save.

OAuth Authentication

Click the App Settings page icon.
Click Advanced Settings and scroll to Advanced Configuration.
Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:
```
trino:
  globalAdminUsername: "admins@trino.com"
```

2 - Configure the Immuta System Access Control Plugin in Trino

Default configuration property values

If you use the default property values in the configuration file described in this section,

you will give users read and write access to tables that are not registered in Immuta and
results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Trino deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Trino installations that have already configured an access control provider. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

TLS Certificate Generation

If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

Download the assets for the release.
Enable Immuta on your cluster. Select the tab below that corresponds to your installation method for instructions:

Docker installations

Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

immuta-trino Docker image

For Trino versions 414 and newer, an immuta-trino Docker image that includes the Trino plugin jars is available from registry.immuta.com. Before using this image, consider the following factors:

This image was designed to provide a method for customers to quickly set up and validate the integration, so it should be used in a development environment. Use the Docker installation method above for production environments.
Immuta only supports the Immuta Trino plugin on the Docker image, not any other software packaged on the image.
If you experience an issue with the image outside of the scope of the Immuta plugin, you must rebuild your own version of the image using the Docker installation method above.

To use this image,

Pull the image and start the container. The example below specifies the Immuta Trino plugin version 414 with the 414 tag, but any supported Trino version newer than 414 can be used:
```
docker run registry.immuta.com/immuta/immuta-trino:414
```
Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

Standalone installations

Create the Immuta access control configuration file in the Trino configuration directory: <trino_install_directory>/etc/immuta-access-control.properties.

Configure the properties described in the table below.

Property

Trino version

Required or optional

Description

access-control.name

392 and newer

Required

This property enables the integration.

access-control.config-files

392 and newer

Optional

Trino allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. This approach allows Immuta to work with existing Trino installations that have already configured an access control provider. Immuta does not manage all permissions in Trino and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

immuta.allowed.immuta.datasource.operations

413 and newer

Optional

immuta.allowed.non.immuta.datasource.operations

392 and newer

Optional

immuta.apikey

392 and newer

Required

immuta.audit.legacy.enabled

392 and newer

Optional

This property allows you to turn off the legacy Starburst (Trino) audit if you do not have Elasticsearch set up in your install.

immuta.ca-file

392 and newer

Optional

This property allows you to specify a path to your CA file.

immuta.cache.views.seconds

392 and newer

Optional

Amount of time in seconds for which a user's specific representation of an Immuta data source will be cached for. Changing this will impact how quickly policy changes are reflected for users actively querying Trino. By default, cache expires after 30 seconds.

immuta.cache.datasource.seconds

392 and newer

Optional

Amount of time in seconds for which a user's available Immuta data sources will be cached for. Changing this will impact how quickly data sources will be available due to changing projects or subscriptions. By default, cache expires after 30 seconds.

immuta.endpoint

392 and newer

Required

The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Trino (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.

immuta.filter.unallowed.table.metadata

392 and newer

Optional

When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.

immuta.group.admin

420 and newer

Required if immuta.user.admin is not set

This property identifies the Trino group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).

immuta.user.admin

392 and newer

Required if immuta.group.admin is not set

This property identifies the Trino user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).

Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/trino/immuta-access-control.properties
```

Example Immuta System Access Control Configuration

# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false

3 - Add Trino Users to Immuta

- All Trino users must map to Immuta users or match the immuta.user.admin regex configured on the cluster, and their Trino username must be mapped to Immuta so they can query policy-enforced data.
- A user impersonating a different user in Trino requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.