Audience: System Administrators
Content Summary: Immuta generates data encryption keys (on a user-defined rollover schedule) to encrypt and decrypt values. This page provides an overview of encryption key management and outlines its configuration options in Immuta.
Use an External Key Management Service
Immuta recommends using an external Key Management Service (KMS) to encrypt or decrypt data keys as needed.
Immuta encrypts values with data encryption keys, either those that are system-generated or managed using an external key management service (KMS). Immuta recommends a KMS to encrypt or decrypt data keys and supports the AWS Key Management Service. To configure the AWS KMS, complete the steps below.
However, if no KMS is configured, Immuta generates a data encryption key on a user-defined rollover schedule, using the most recent data key to encrypt new values while preserving old data keys to decrypt old values. To change the default rollover schedule of one year, follow these steps.
Before you can configure the AWS KMS, you need to set up your AWS credentials. Immuta cannot encrypt the AWS access/secret keys in the KMS configuration, so we recommend using IAM roles.
Follow AWS documentation to create an IAM policy to attach to your IAM role. An example is provided below.
Example IAM Policy:
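The policy below is a hedged sketch rather than the exact policy from this page: it allows the data-key operations commonly needed against a single KMS key, and the key ARN is a placeholder. Confirm the required actions against AWS and Immuta documentation.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowImmutaDataKeyOperations",
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}
```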
Other ways of setting up AWS credentials can be found here.
Choose one of the following options to set up an IAM role:
Attach an IAM role to your AWS EC2 instance. Then, continue to step 2.
If you're running Immuta in Kubernetes (AWS EKS), work with your Immuta Support Professional to set up an IAM role. Then, continue to step 2.
Add credentials in the KMS configuration (not recommended): This option should only be used if Immuta is not running on your AWS infrastructure and you need to leverage a KMS on AWS. For all other scenarios, use one of the options above.
Add the following configuration (with your AWS region and keyId) to the Advanced Configuration section of the App Settings page.
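A hypothetical sketch of that YAML is shown below; the kms block and its structure are assumptions, while region and keyId are the values referenced above.

```yaml
kms:
  type: awskms            # assumed structure; confirm with Immuta documentation
  region: us-east-1
  keyId: <your-kms-key-id>
```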
Immuta Cannot Encrypt AWS Access/Secret Keys in KMS Configuration
Immuta cannot encrypt the AWS access/secret keys in the KMS configuration, so we recommend using IAM roles.
This option should only be used if Immuta is not running on your AWS infrastructure; for example, if you are running Immuta on-prem and need to leverage a KMS on AWS. For all other scenarios, use one of the two other options above.
Before you begin, create a secret access key and an access key that will authenticate to Immuta.
Navigate to the App Settings page and add the following configuration (with your AWS keyId, region, and credentials) to the Advanced Configuration section:
Click the App Settings icon in the left sidebar and scroll to the Advanced Configuration section.
Paste the following configuration in the text box, adjusting dataKeyRollOverLength days to your desired schedule:
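For example, assuming the setting sits at the top level of the advanced configuration (its placement is an assumption), a one-year schedule would look like:

```yaml
dataKeyRollOverLength: 365   # days between data key rollovers
```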
The Immuta Web Service uses ODBC drivers to communicate with back-end data platforms. Immuta deployments include only the ODBC drivers that Immuta is able to redistribute. All other drivers must be obtained and deployed by a System Administrator before Data Owners can use the corresponding data source types in Immuta.
You can install ODBC drivers on the App Settings page.
This driver is included with Immuta.
HiveODBC-2.6.9.1009-1.x86_64.rpm
ImpalaODBC-2.6.8.1008-1.x86_64.rpm
msodbcsql17-17.10.2.1-1.x86_64.rpm
Deprecation notice
Support for this data platform has been deprecated.
msodbcsql17-17.10.2.1-1.x86_64.rpm
This driver is included with Immuta.
SimbaODBCDriverforGoogleBigQuery_2.5.0.1001-Linux.tar.gz
This driver is included with Immuta.
mysql-connector-odbc-8.0.18-glibc2-12-x86_64.rpm
oracle-instantclient19.5-odbc-19.5.0.0.0-1.x86_64.rpm
This driver is included with Immuta.
You may purchase this ODBC driver from Magnitude.
This driver is included with Immuta.
The SAP Hana ODBC driver (odbc-2019.01.19.tar.gz) is available as part of your SAP Hana installation. Upload a tar.gz file that contains the ODBC driver for Linux x86_64.
This driver is included with Immuta.
This driver is included with Immuta.
StarburstODBC-64bit-2.0.1.1002.el6.x86_64.rpm
You can obtain this ODBC driver from Starburst Data with a Starburst contract.
Deprecation notice
Support for this data platform has been deprecated.
tdodbc1620-16.20.00.65-1.noarch.rpm
simbatrino-2.1.0.1001-1.el6.x86_64.rpm
You can purchase this ODBC driver from Magnitude.
Audience: System Administrators
Content Summary: This page outlines the on-cluster configurations for Immuta's Hadoop and Spark plugins. Most of these values are consistent across Hadoop providers; however, some values are provider-specific. To learn more about provider-specific deployments, see the installation guides for Cloudera and Amazon EMR.
The NameNode plugin runs on each HDFS NameNode as the hdfs user. It will have access to any configuration items available to HDFS clients, as well as potentially additional configuration items for the NameNode only. The configuration for the NameNode plugin can be placed in an alternate configuration file (detailed below) to avoid leaking sensitive configuration items.
The NameNode plugin configurations can be set in core-site.xml and hdfs-site.xml (for NameNode-specific values).
The Vulcan Service is an Immuta service that is mostly relevant to Spark applications. It has its own configuration file (generator.xml) and also reads all system-wide/client configuration for Hadoop (core-site.xml).
Clients of HDFS/Hadoop services are Spark jobs, MapReduce jobs, and other user-driven applications in the Hadoop ecosystem. The configuration items for clients can be provided system-wide in core-site.xml or configured per-job, typically on the command line or in application/job configuration.
There is an additional generator.xml file that is created for Spark applications only and contains connection information for the Vulcan Service. Immuta configuration can also be added to spark-defaults.conf or applied system-wide to Spark jobs. Unless otherwise stated, items in spark-defaults.conf should be prefixed with spark.hadoop. because they are read from Hadoop configuration.
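For example, a client property such as immuta.base.url (described below) would appear in spark-defaults.conf as follows; the hostname is illustrative.

```
# spark-defaults.conf: Hadoop-read Immuta properties take the spark.hadoop. prefix
spark.hadoop.immuta.base.url    https://immuta.example.com
```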
Public configuration is not sensitive and is shared by client libraries such as ImmutaApiKeyAuth and the NameNode plugin (as well as potentially other Immuta and non-Immuta services on the cluster). These configuration items should be in a core-site.xml file distributed across the cluster and readable by all users.
immuta.generated.api.key.dir
Default: /user
Description: The base directory under which the NameNode plugin will look for generated API keys for use with the Immuta Web Service. The default value is /user with the username and .immuta_generated appended, so that each user has their own generated API key directory; the .immuta_generated directory adds an additional layer of protection so other users can't listen on the /user/<username> directory to wait for API keys to be generated. This configuration item should never point at a non-HDFS path because attempting to generate credentials outside of HDFS is invalid. This item should be in sync between the NameNode plugin's configuration and client configuration.
immuta.credentials.dir
Default: /user
Description: A directory which will be used to store each user's Immuta API key and token for use with the Immuta Web Service. The user's API key and token are stored this way to avoid re-authenticating frequently with the web service and introducing additional overhead to processes like MapReduce and Spark. Similar to the generated API key directory, this configuration item defaults to /user with the username of the current user added on. Each user should have a directory under the credentials directory for storing their own credentials. NOTE: It is valid for a user to provide and save their own API key in /user/<username>/immuta_api_key so that their code does not attempt to generate an API key. It is also valid to override this value with a non-HDFS path in case HDFS is not being used (Spark in a non-HDFS environment, for example); e.g., file:///home/ would point to file:///home/<username>/immuta_api_key with the user's API key file.
immuta.base.url
Description: The URL at which the Immuta API can be reached. This should be the base URL of the Immuta API.
fs.immuta.impl
Description: This configuration allows users to access the immuta:// scheme in order to have their filesystem built in the same way that the Immuta FUSE filesystem is built. This filesystem is also used in Spark deployments, which read data from external object storage (e.g., S3). This means that users will have consistent filesystem views regardless of where they are accessing Immuta. This is not set by default and must be set to com.immuta.hadoop.ImmutaFileSystem system-wide in core-site.xml.
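A minimal core-site.xml entry for this setting might look like the following sketch:

```xml
<property>
  <name>fs.immuta.impl</name>
  <value>com.immuta.hadoop.ImmutaFileSystem</value>
</property>
```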
immuta.cluster.name
Default: hostname from fs.defaultFS
Description: This configuration item identifies a cluster to the Immuta Web Service. This is very important because it determines how file access is controlled in HDFS by the NameNode plugin and which data sources are available to a cluster. The default value is taken from fs.defaultFS. Administrators should be advised that when an organization has multiple HA HDFS clusters, they may all have the same nameservice name, so this value should be set on each cluster for identification purposes.
immuta.api.key
Description: (CLIENT ONLY) Users can configure their own API key when running jobs or interacting with an HDFS client, but if an API key is not configured for the user it will be generated on the first attempt to communicate with the Immuta service and stored securely in their credential directory (described above). Immuta uses the Configuration.getPassword() method to retrieve this configuration item, so it may also be set using the Hadoop CredentialProvider API.
immuta.permission.fallback.class
Default: org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider (HDFS 2.6.x/CDH) or org.apache.hadoop.hdfs.server.namenode.DefaultINodeAttributesProvider (HDFS 2.7+)
Sentry: org.apache.sentry.hdfs.SentryINodeAttributesProvider (HDFS 2.7+)
Description: The configuration key for the fully qualified class name of the fallback permission checking class that will be used after the Immuta authorization or inode attribute provider.
immuta.permission.allow.fallback
Default: false
Description: Denotes the action that the Immuta permission checking classes will take when a user is forbidden access to data in Immuta. If set to true, every time a user is denied access to a file via Immuta their permissions will be checked against the underlying default permission checker, potentially meaning that they will still have access to data that they cannot access via Immuta.
immuta.permission.users.to.ignore
Default: hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,immuta
Description: CSV list of users that will not ever have their HDFS file accesses checked in Immuta. This should include any system superusers to avoid overhead of checking permissions in Immuta that should not be relevant.
immuta.permission.groups.to.ignore
Description: Same as immuta.permission.users.to.ignore but for groups.
immuta.permission.users.to.enforce
Description: A comma delimited list of users that must go through Immuta when checking permissions on HDFS files. If this configuration item is set, then fallback authorizations will apply to everyone by default, unless they are on this list. If a user is on both the enforce list and the ignore list, then their permissions will be checked with Immuta (i.e., the enforce configuration item takes precedence).
immuta.permission.groups.to.enforce
Description: Same as immuta.permission.users.to.enforce but for groups.
immuta.permission.paths.to.enforce
Description: A comma delimited list of paths to enforce when checking permissions on HDFS files. If this configuration item is set, then these paths and their children will be checked in Immuta. All other paths will use fallback authorizations. WARNING: Setting this property effectively disables Immuta file permission checking for all paths not in this configuration item. Setting both immuta.permission.paths.to.ignore and immuta.permission.paths.to.enforce properties at the same time is unsupported.
immuta.permission.paths.to.ignore
Description: A comma delimited list of paths to ignore when checking permissions on HDFS files. If this configuration item is set, then these paths and their children will use fallback authorizations and not go through Immuta. All other paths will be checked with Immuta. Setting both immuta.permission.paths.to.ignore and immuta.permission.paths.to.enforce properties at the same time is unsupported.
immuta.system.details.cache.timeout.seconds
Default: 1800
Description: The number of seconds to cache system detail information from the Immuta Web Service. This should be high since, ideally, the relevant values in Immuta configuration won't change often (or ever).
immuta.permission.workspace.ignored.users
Default: hive,impala
Description: Comma-delimited list of users that should be ignored when accessing workspace directories. This should never have to change since the default Hive and Impala principals are covered, but this can be modified in case of non-standard configuration. This list is separate from the ignored user list above because we do not want to allow access to ignored non-system users who may be operating on a cluster with Immuta installed but who should not be allowed to see workspace data. This should be limited to the principals for Hive and Impala.
The following configuration items are only relevant to the NameNode plugin. These are typically set somewhere like hdfs-site.xml, and for the most part they are not sensitive. There are some highly sensitive configuration items, however, and those should be set in such a way that only the NameNode process has the ability to read them. Immuta provides one solution for this: an additional NameNode plugin configuration file whose path is configured elsewhere (such as in hdfs-site.xml) and which is only readable by the hdfs user. This is detailed below.
immuta.extra.name.node.plugin.config
Description: Path to a Hadoop-style XML configuration file containing items that will be used by the Immuta NameNode plugin. This item helps to configure sensitive information in a way that will only be readable by the hdfs user to avoid leaking sensitive configuration to other users. This should be in the form file:///path/to/file.xml.
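One way to wire this up in hdfs-site.xml is sketched below; the file path is illustrative.

```xml
<!-- hdfs-site.xml: point the NameNode plugin at a file readable only by the hdfs user -->
<property>
  <name>immuta.extra.name.node.plugin.config</name>
  <value>file:///etc/immuta/namenode-plugin.xml</value>
</property>
```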
immuta.system.api.key
Description: HIGHLY SENSITIVE. This configuration item is used by the NameNode plugin (and the Vulcan Service) to access privileged endpoints of the Immuta API. This is a required configuration item for both the NameNode plugin and Vulcan Service.
immuta.no.data.source.cache.timeout.seconds
Default: 60
Description: The amount of time in seconds that the NameNode plugin will cache the fact that a specific path is not a part of any Immuta data sources.
immuta.hive.impala.cache.timeout.seconds
Default: 60
Description: The amount of time in seconds to cache the fact that a user is subscribed to a Hive or Impala data source containing the target file they are attempting to access.
immuta.canisee.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds to cache the access result from Immuta for a user/path pair.
immuta.specific.access.cache.timeout
Default: 10
Description: The amount of time to temporarily unlock a file in HDFS for a user using temporary access tokens with files backing Hive and Impala data sources in Spark.
immuta.data.source.cache.timeout.seconds
Default: 300
Description: The amount of time in seconds that users' subscribed data sources should be cached in memory to avoid reaching out to Immuta for data sources over and over. Relevant to the Immuta Hadoop client FileSystem and Spark jobs.
immuta.canisee.metastore.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that the NameNode plugin will cache the fact that a path belongs to a Metastore (Impala or Hive) data source. Reduces network calls from NameNode to Immuta when Vulcan is accessing paths belonging to Metastore sources.
immuta.canisee.non.user.cache.timeout.seconds
Default: 30
Description: The amount of time that the NameNode plugin will cache that a user principal does not belong to an Immuta user. This is useful if the ignored/enforced users/groups configurations are not being used so that when the NameNode receives a 401 response from the canisee endpoint it will store that information and not retry canisee requests to Immuta during that time.
immuta.canisee.num.retries
Default: 1
Description: The number of times to retry access calls from the NameNode plugin to Immuta to account for network issues.
immuta.project.user.cache.timeout.seconds
Default: 300
Description: The amount of time in seconds that the ImmutaGroupsMapping will cache whether or not a principal is tied to an Immuta user account. This decreases the number of calls from HDFS to Immuta when there are accounts that are not tied to Immuta.
immuta.project.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that the ImmutaGroupsMapping will cache project and workspace information for a given project ID. This is also the amount of time a user's current project will be cached.
immuta.project.forbidden.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that the ImmutaCurrentProjectHelper will cache the fact that a principal tied to an Immuta user is being forbidden from using their current project.
immuta.workspace.deduplication.timeout.seconds
Default: 60
Description: The amount of time to wait before auditing duplicate workspace filesystem actions from HDFS. This is the amount of time the NameNode plugin will wait before a user reading or writing the same path will have duplicate audit records written to Immuta.
immuta.permission.system.details.retries
Default: 5
Description: The number of times the system details background worker will attempt to retrieve system details from the Immuta web service if an attempt fails.
immuta.permission.source.cache.enabled
Default: false
Description: Denotes whether a background thread should be started to periodically cache paths from Immuta that represent Immuta-protected paths in HDFS. Enabling this increases NameNode performance because it prevents the NameNode plugin from calling the Immuta web service for paths that do not back HDFS data sources.
immuta.permission.source.cache.timeout.seconds
Default: 300
Description: The time between calls to sync/cache all paths that back Immuta data sources in HDFS.
immuta.permission.source.cache.retries
Default: 5
Description: The number of times the data source cache background worker will attempt to retry calls to Immuta on failure.
immuta.permission.request.retries
Default: 5
Description: The number of retries that the NameNode plugin will attempt for any blocking web request between HDFS and the Immuta API.
immuta.permission.request.initial.delay.milliseconds
Default: 250
Description: The initial delay for the BackoffRetryHelper that the NameNode plugin will employ for any retries of blocking web requests between HDFS and the Immuta API.
immuta.permission.request.socket.timeout
Default: 1500
Description: The time in milliseconds that the NameNode plugin will wait before cancelling a request to the Immuta API if no data has been read from the HTTP connection. This applies to blocking requests only.
immuta.permission.workspace.base.path.override
Description: This configuration item can be set so that the NameNode does not have to retrieve Immuta HDFS workspace base path periodically from the Immuta API.
The following items are relevant to any Immuta Spark applications using the ImmutaSparkSession or ImmutaContext.
immuta.spark.data.source.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that data source information will be cached in the user's Spark job. This reduces the number of times the client will need to refresh data source information.
immuta.spark.sql.account.expiration
Default: 2880
Description: The amount of time in seconds that temporary SQL account credentials created by the Immuta Spark plugins for accessing data sources via Postgres over JDBC will remain valid.
immuta.postgres.fetch.size
Default: 1000
Description: The JDBC fetch size used for data sources accessed via Postgres over JDBC.
immuta.postgres.configuration
Description: The configuration key for any extra JDBC options that should be appended to the Immuta Postgres connection by the Immuta SQL Context. An example would include sslfactory=org.postgresql.ssl.NonValidatingFactory to turn off SSL validation.
immuta.enable.jdbc
Default: false
Description: If true, allows the user's Spark job to make queries to Immuta's Postgres instance automatically when we detect that the data source is not on cluster and we must pull data back via PG. This can be set per-job, but defaults to false to prevent a user from accidentally (and unknowingly) pulling huge amounts of data over JDBC.
immuta.ephemeral.host.override
Default: true
Description: Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source host names with an auto-detected host name on cluster that should be running HiveServer2. It is assumed HiveServer2 is running on the NameNode.
immuta.ephemeral.host.override.address
Description: This configuration item can be used if automatic detection of Hive's hostname should be disabled in favor of a static hostname to use for ephemeral overrides. This is useful for when your cluster is behind a load balancer or proxy.
immuta.ephemeral.host.override.name-node
Description: In an HA cluster it may be a good idea to specify the NameNode on which Hive is running for ephemeral overrides. This should contain the NameNode from configuration that is hosting HiveServer2.
immuta.secure.truststore.enabled
Default: false
Description: Enables TLS truststore verification. If enabled without a custom truststore it will use the default.
immuta.secure.truststore
Description: Location of the truststore that contains the Immuta Web Service certification.
immuta.secure.truststore.password
Description: Password for the truststore that contains the Immuta Web Service certification.
immuta.spark.visibility.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds the ImmutaContext or ImmutaSparkSession will cache visibilities from Immuta. Maximum of 30 seconds.
immuta.spark.visibility.read.timeout.seconds
Default: 300
Description: The socket read timeout for visibility calls to Immuta.
immuta.spark.audit.retries
Default: 2
Description: The number of times to retry audit calls to Immuta from Spark.
immuta.masked.jdbc.optimization.enabled
Default: true
Description: Enables pushdown filters to Postgres. This should only be changed to false if the user is joining a non-Spark data source (in PostgreSQL) on a masked column.
The following configuration items are needed by the Immuta Vulcan Service. Some of these items are also shared with the NameNode plugin as they work in tandem to protect data in HDFS.
immuta.meta.store.token.dir
Default: /user/<Vulcan Service user>/tokens
Description: The directory in which temporary access tokens for HDFS files backing Hive/Impala data sources will be stored. This needs to be configured for the NameNode plugin as well in order to unlock files in HDFS.
immuta.meta.store.remote.token.dir
Default: /user/<Vulcan Service user>/remotetokens
Description: The directory in which temporary access tokens for remote/object storage (S3, GS, etc) files backing Hive/Impala data sources will be stored.
immuta.spark.partition.generator.user
Default: immuta
Description: The username of the user that will be running the Vulcan Service. This should also be the short username of the Kerberos principal running the Vulcan Service.
immuta.secure.partition.generator.hostname
Default: localhost
Description: The interface/hostname that clients will use to communicate with the Vulcan Service.
immuta.secure.partition.generator.listen.address
Default: 0.0.0.0
Description: The interface/hostname on which the Vulcan Service will listen for connections.
immuta.secure.partition.generator.port
Default: 9070
Description: The port on which the Vulcan Service will listen for connections.
immuta.configuration.id.file.config
Default: hdfs:///user/<Vulcan Service user>/config_id
Description: The file in HDFS where the cluster configuration ID will be stored. This is used to keep track of the unique ID in Immuta tied to the current cluster.
immuta.secure.partition.generator.keystore
Description: Path to the keystore file to be used for securing the Vulcan Service with TLS.
immuta.secure.partition.generator.keystore.password
Description: The password for the keystore configured with immuta.secure.partition.generator.keystore.
immuta.secure.partition.generator.keymanager.password
Description: The configuration key for the key manager password for the keystore configured with immuta.secure.partition.generator.keystore.
immuta.secure.partition.generator.url.external
Default: <NameNode / master hostname>:<Vulcan Service port>
Description: The configuration key for specifying the externally addressable Vulcan Service URL. This URL must be reachable from the Immuta web app. If this is not set, the Vulcan Service will try to determine the URL based on its Hadoop configuration.
immuta.yarn.validation.params
Default: /user/<Vulcan Service user>/yarnParameters.json
Description: The file containing parameters to use when validating YARN applications for secure token generation for file access. When a Spark application requests tokens be generated for file access, the Vulcan Service will validate that the Spark application is configured properly using the parameters from this file.
immuta.emrfs.credential.file.path
Description: For EMR/EMRFS only. This configuration points to a file containing AWS credentials that the Vulcan Service can use for accessing data in S3. This is also useful for Hive/the hive user so that (if impersonation is turned on) only a few users (hive and the Vulcan Service user) on cluster can access data in S3 while everyone else is forced through the Vulcan Service.
immuta.workspace.allow.create.table
Default: false
Description: True if the user should be allowed to create workspace tables. Users will not be able to drop their created tables if Sentry Object Ownership is not set to ALL.
immuta.partition.tokens.ttl.seconds
Default: 3600
Description: How long in seconds Immuta temporary file access tokens should live in HDFS before being cleaned up.
immuta.partition.tokens.interval.seconds
Default: 1800
Description: Number of seconds between runs of the token cleanup job which will delete all expired temporary file access tokens from HDFS.
immuta.scheduler.heartbeat.enable
Default: true
Description: True to enable sending configuration to the Immuta Web Service and updating on an interval. This can be set to false to prevent this cluster from being available in the HDFS configurations dropdown for HDFS data sources, as well as prevent it from being used for workspaces. This makes sense for ephemeral (EMR) clusters.
immuta.scheduler.heartbeat.initial.delay.seconds
Default: 0
Description: When starting the Vulcan Service, how long in seconds to wait before first sending configuration to the Immuta Web Service.
immuta.scheduler.heartbeat.interval.seconds
Default: 28800
Description: How long in seconds to wait between each configuration update submission to the Immuta Web Service.
immuta.file.session.store.expiration.seconds
Default: 900
Description: Number of seconds that idle remote file sessions will be kept active in the Vulcan Service. This is for spark clients that are reading remote data (S3, GS) via the Vulcan Service.
immuta.file.session.status.expiration.seconds
Default: 300
Description: Number of seconds that the Vulcan Service will cache file statuses from remote object storage.
immuta.file.session.status.max.size
Default: 250
Description: Maximum number of file status objects that the Vulcan Service will cache at one time.
immuta.yarn.api.num.retries
Default: 5
Description: Number of times that the YARN Validator will attempt to contact the YARN resource manager API to validate a Spark application for partition tokens.
Audience: Application Admins
Content Summary: This page details how to use the App Settings page to configure settings for Immuta for your organization.
Click the App Settings icon in the left sidebar.
Click the link in the App Settings panel to navigate to that section.
See the identity manager pages for a tutorial to connect a Microsoft Entra ID, Okta, or OneLogin identity manager.
To configure Immuta to use all other existing IAMs,
Click the Add IAM button.
Complete the Display Name field and select your IAM type from the Identity Provider Type dropdown: LDAP/Active Directory, SAML, or OpenID.
Once you have selected LDAP/Active Directory from the Identity Provider Type dropdown menu,
Adjust Default Permissions granted to users by selecting from the list in this dropdown menu, and then complete the required fields in the Credentials and Options sections. Note: Either User Attribute OR User Search Filter is required, not both. Completing one of these fields disables the other.
Opt to have Case-insensitive user names by clicking the checkbox.
Opt to Enable Debug Logging or Enable SSL by clicking the checkboxes.
In the Profile Schema section, map attributes in LDAP/Active Directory to automatically fill in a user's Immuta profile. Note: Fields that you specify in this schema will not be editable by users within Immuta.
Opt to Link SQL Account.
Opt to Enable scheduled LDAP Sync support for LDAP/Active Directory and Enable pagination for LDAP Sync. Once enabled, confirm the sync schedule written in Cron rule; the default is every hour. Confirm the LDAP page size for pagination; the default is 1,000.
Opt to Sync groups from LDAP/Active Directory to Immuta. Once enabled, map attributes in LDAP/Active Directory to automatically pull information about the groups into Immuta.
Opt to Sync attributes from LDAP/Active Directory to Immuta. Once enabled, add attribute mappings in the attribute schema. The desired attribute prefix should be mapped to the relevant schema URN.
Opt to enable External Groups and Attributes Endpoint, Make Default IAM, or Migrate Users from another IAM by selecting the checkbox.
Then click the Test Connection button.
Note: If you select Link SQL Account, you will need to update the Query Engine configuration.
Once the connection is successful, click the Test User Login button.
Click the Test LDAP Sync button if scheduled sync has been enabled.
Once you have selected SAML from the Identity Provider Type dropdown menu,
Take note of the ID and copy the SSO Callback URL to use as the ACS URL in your identity provider.
Adjust Default Permissions granted to users by selecting from the list in this dropdown menu, and then complete the required fields in the Client Options section.
Opt to Enable SCIM support for SAML by clicking the checkbox, which will generate a SCIM API Key.
In the Profile Schema section, map attributes in SAML to automatically fill in a user's Immuta profile. Note: Fields that you specify in this schema will not be editable by users within Immuta.
Opt to Link SQL Account, Allow Identity Provider Initiated Single Sign On, Sync groups from SAML to Immuta, Sync attributes from SAML to Immuta, External Groups and Attributes Endpoint, or Migrate Users from another IAM by selecting the checkboxes, and then click the Test Connection button.
Once the connection is successful, click the Test User Login button.
Once you have selected OpenID from the Identity Provider Type dropdown menu,
Take note of the ID. You will need this value to reference the IAM in the callback URL in your identity provider with the format <base url>/bim/iam/<id>/user/authenticate/callback.
Note the SSO Callback URL shown. Navigate out of Immuta and register the client application with the OpenID provider. If prompted for client application type, choose web.
Adjust Default Permissions granted to users by selecting from the list in this dropdown menu.
Back in Immuta, enter the Client ID, Client Secret, and Discover URL in the form field.
Configure OpenID provider settings. There are two options:
Set Discover URL to the /.well-known/openid-configuration URL provided by your OpenID provider.
If you are unable to use the Discover URL option, you can fill out Authorization Endpoint, Issuer, Token Endpoint, JWKS Uri, and Supported ID Token Signing Algorithms.
If necessary, add additional Scopes.
Opt to Enable SCIM support for OpenID by clicking the checkbox, which will generate a SCIM API Key.
In the Profile Schema section, map attributes in OpenID to automatically fill in a user's Immuta profile. Note: Fields that you specify in this schema will not be editable by users within Immuta.
Opt to Allow Identity Provider Initiated Single Sign On or Migrate Users from another IAM by selecting the checkboxes.
Click the Test Connection button.
Once the connection is successful, click the Test User Login button.
To set the default permissions granted to users when they log in to Immuta, click the Default Permissions dropdown menu, and then select permissions from this list.
See the External Catalogs page.
Deprecation notice
Support for this feature has been deprecated.
To enable external masking,
Navigate to the App Settings page and click External Masking in the left sidebar.
Click Add Configuration and specify an external endpoint in the External URI field.
Click Configure, and then add at least one tag by selecting from the Search for tags dropdown menu. Note: Tag hierarchies are supported, so tagging a column as Sensitive.Customer would drive the policy if external masking was configured with the tag Sensitive.
Select Authentication Method and then complete the authentication fields (when applicable).
Click Test Connection and then Save.
Select Add Workspace.
Use the dropdown menu to select the Workspace Type and refer to the corresponding tab below.
Best Practices: Read-only Access Recommended
It is best practice, though not required, to use an AWS account with limited read-only access to the data in question.
Use the dropdown menu to select the Schema:
hdfs
Enter the Workspace Base Directory (any project workspaces created will be sub-directories of this path).
Click Test Workspace Directory.
Once the credentials are successfully tested, click Save.
s3a
Use the dropdown menu to select the AWS Region.
Enter the S3 Bucket.
Opt to enter the S3 Bucket Prefix.
Opt to Configure S3 Credentials.
Use the dropdown menu to select Authentication Method, and enter the required information.
AWS Access Key: Enter your AWS Access Key ID and AWS Secret Key. Required Permissions: s3:ListBucket, s3:GetObject, and s3:GetObjectTagging.
AWS IAM Instance Role: Opt to Assume AWS IAM Instance Role if you have ListRoles IAM permission or enter the AWS IAM Role ARN manually.
Click Test Workspace Bucket.
Once the credentials are successfully tested, click Save.
Databricks Cluster Configuration
Before creating a workspace, the cluster must send its configuration to Immuta; to do this, run a simple query on the cluster (e.g., show tables). Otherwise, an error message will occur when users attempt to create a workspace.
Databricks API Token Expiration
The Databricks API Token used for native workspace access must be non-expiring. Using a token that expires risks losing access to projects that are created using that configuration.
Use the dropdown menu to select the Schema:
Required AWS S3 Permissions
When configuring a native workspace using Databricks with S3, the following permissions need to be applied to arn:aws:s3:::immuta-workspace-bucket/workspace/base/path/* and arn:aws:s3:::immuta-workspace-bucket/workspace/base/path. Note: Two of these values are found on the App Settings page; immuta-workspace-bucket is from the S3 Bucket field and workspace/base/path is from the Workspace Remote Path field:
s3:Get*
s3:Delete*
s3:Put*
s3:AbortMultipartUpload
Additionally, these permissions must be applied to arn:aws:s3:::immuta-workspace-bucket. Note: One value is found on the App Settings page; immuta-workspace-bucket is from the S3 Bucket field:
s3:ListBucket
s3:ListBucketMultipartUploads
s3:GetBucketLocation
Enter the Name.
Click Add Workspace.
Enter the Hostname.
Opt to enter the Workspace ID (required with Azure Databricks).
Enter the Databricks API Token.
Use the dropdown menu to select the AWS Region.
Enter the S3 Bucket.
Opt to enter the S3 Bucket Prefix.
Click Test Workspace Bucket.
Once the credentials are successfully tested, click Save.
Enter the Name.
Click Add Workspace.
Enter the Hostname, Workspace ID, Account Name, Databricks API Token, and Storage Container.
Enter the Workspace Base Directory.
Click Test Workspace Directory.
Once the credentials are successfully tested, click Save.
Enter the Name.
Click Add Workspace.
Enter the Hostname, Workspace ID, Account Name, and Databricks API Token.
Use the dropdown menu to select the Google Cloud Region.
Enter the GCS Bucket.
Opt to enter the GCS Object Prefix.
Click Test Workspace Directory.
Once the credentials are successfully tested, click Save.
Select Add Native Integration.
Use the dropdown menu to select the Integration Type:
To enable Azure Synapse Analytics, see the Azure Synapse Analytics Integration page.
To enable Starburst (Trino), see the Starburst (Trino) installation page.
To enable Redshift, see the Redshift installation guide.
To enable Snowflake, see the Snowflake integration guide.
To enable Databricks Spark, see the Simplified Databricks Configuration guide.
You can enable or disable the types of data sources users can create in this section. Some of these types will require you to upload an ODBC driver before they can be enabled. The list of currently supported drivers is on the ODBC Drivers page.
To enable a data provider,
Click the menu button in the upper right corner of the provider icon you want to enable.
Select Enable from the dropdown.
If an ODBC driver needs to be uploaded,
Click the menu button in the upper right corner of the provider icon, and then select Upload Driver from the dropdown.
Click in the Add Files to Upload box and upload your file.
Click Close.
Click the menu button again, and then select Enable from the dropdown.
Application Admins can configure the SMTP server that Immuta will use to send emails to users. If this server is not configured, users will only be able to view notifications in the Immuta console.
To configure the SMTP server,
Complete the Host and Port fields for your SMTP server.
Enter the username and password Immuta will use to log in to the server in the User and Password fields, respectively.
Enter the email address that will send the emails in the From Email field.
Opt to Enable TLS by clicking this checkbox, and then enter a test email address in the Test Email Address field.
Finally, click Send Test Email.
Once SMTP is enabled in Immuta, any Immuta user can request access to notifications as emails, which will vary depending on the permissions that user has. For example, to receive email notifications about group membership changes, the receiving user will need the GOVERNANCE permission. Once a user requests access to receive emails, Immuta will compile notifications and distribute these compilations via email at 8-hour intervals.
To configure Immuta to protect data in a kerberized Hadoop cluster,
Upload your Kerberos Configuration File, and then you can add or modify the Kerberos configuration in the window pictured below.
Upload your Keytab File.
Enter the principal Immuta will use to authenticate with your KDC in the Username field. Note: This must match a principal in the Keytab file.
Adjust how often (in milliseconds) Immuta needs to re-authenticate with the KDC in the Ticket Refresh Interval field.
Click Test Kerberos Initialization.
To improve performance when using Immuta to secure Spark or HDFS access, a user's access level is cached momentarily. These cache settings are configurable, but decreasing the Time to Live (TTL) on any cache too low will negatively impact performance.
To configure cache settings, enter the time in milliseconds in each of the Cache TTL fields.
You can set the URL users will use to access the Immuta Application and Query Engine. Note: Proxy configuration must be handled outside Immuta.
Complete the Public Immuta URL, Public Query Engine Hostname, and Public Query Engine Port fields.
Opt to Enable SSL by clicking this checkbox.
To enable Sensitive Data Discovery and configure its settings, see the Sensitive Data Discovery page.
Click the Allow Policy Exemptions checkbox to allow users to specify who can bypass all policies on a data source.
Immuta merges multiple Global Subscription policies that apply to a single data source; by default, users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND). To change the default behavior to allow users to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR),
Click the Default Subscription Merge Options text in the left pane.
Select the Default "allow shared policy responsibility" to be checked checkbox.
Click Save.
Note: Even with this setting enabled, Governors can opt to have their Global Subscription policies combined with AND during policy creation.
These options allow you to restrict the power individual users with the GOVERNANCE and USER_ADMIN permissions have in Immuta. Click the checkboxes to enable or disable these options.
You can create custom permissions that can then be assigned to users and leveraged when building subscription policies. Note: You cannot configure actions users can take within the console when creating a custom permission, nor can the actions associated with existing permissions in Immuta be altered.
To add a custom permission, click the Add Permission button, and then name the permission in the Enter Permission field.
To create a custom questionnaire that all users must complete when requesting access to a data source, fill in the following fields:
Opt for the questionnaire to be required.
Key: Any unique value that identifies the question.
Header: The text that will display on reports.
Label: The text that will display in the questionnaire for the user. They will be prompted to type the answer in a text box.
To create a custom message for the login page of Immuta, enter text in the Enter Login Message box. Note: The message can be formatted in markdown.
Opt to adjust the Message Text Color and Message Background Color by clicking in these dropdown boxes.
Click the Generate Key button.
Save this API key in a secure location.
Without Fingerprints Some Policies Will Be Unavailable.
Disabling the collection of statistics will prevent the Immuta Query Engine cost-based optimizer from correctly estimating query plan costs. Additionally, these policies will be unavailable until a data owner manually generates a fingerprint:
Masking with format preserving masking
Masking with K-Anonymization
Masking using randomized response
To disable the automatic collection of statistics with a particular tag,
Use the Select Tags dropdown to select the tag(s).
Click Save.
Users can add password requirements for SQL credentials in this section, including minimum length, maximum length, and minimum number of symbols, numbers, uppercase, and lowercase characters. This will set password requirements for any new SQL credentials or any updated passwords. Previous Query Engine accounts will not be subject to the newly applied password restrictions.
To ensure that users are not easily compromised, minimum password requirements have been added as default. The password requirements can be changed to fit certain standards.
If you enable any Preview features, please provide feedback on how you would like these features to evolve.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Policy Adjustments checkbox.
Click Save.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Policy Adjustments checkbox.
Check the Enable Health Expert Determination checkbox.
Click Save.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Allow Complex Data Types checkbox.
Click Save.
For instructions on enabling this feature, navigate to the Global Subscription Policies Advanced DSL Tutorial.
Advanced configuration options provided by the Immuta Support team can be added in this section. The configuration must adhere to the YAML syntax.
To increase the default cardinality cutoff for columns compatible with k-anonymity,
Expand the Advanced Settings section and add the following text to the Advanced Configuration:
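The exact YAML is not reproduced here; as an illustration only (the key name and nesting are hypothetical), the entry might look like:

```yaml
fingerprint:
  maxKAnonymityCardinality: 500   # hypothetical key name; value is the new cutoff
```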
Click Save.
To regenerate the data source's fingerprint, navigate to that data source's page.
Click the Status in the upper right corner.
Click Re-run in the Fingerprint section of the dropdown menu.
Note: Re-running the fingerprint is only necessary for existing data sources. New data sources will be generated using the new maximum cardinality.
Expand the Advanced Settings section and add the following text to the Advanced Configuration to specify the number of seconds before webhook requests time out. For example, use 30 for 30 seconds. Setting it to 0 will result in no timeout.
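As an illustration only (the key name is hypothetical), the entry might look like:

```yaml
webhookTimeoutSeconds: 30   # hypothetical key name; 0 disables the timeout
```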
Click Save.
Expand the Advanced Settings section and add the following text to the Advanced Configuration to specify the number of minutes before an audit job expires. For example, use 300 for 300 minutes.
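As an illustration only (the key name is hypothetical), the entry might look like:

```yaml
auditJobExpirationMinutes: 300   # hypothetical key name
```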
Click Save.
By default, Immuta includes the query text in audit records, which requires it to transfer from the data platform to Immuta. To turn off this feature and stop audit records from including query text,
Expand the Advanced Settings section and add the following text to the Advanced Configuration:
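As an illustration only (the key name and nesting are hypothetical), the entry might look like:

```yaml
audit:
  includeQueryText: false   # hypothetical key name
```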
Click Save.
The Query Engine grants Immuta user accounts proxied data query access to Immuta data sources through the Immuta API and the Query Editor in the Immuta UI.
Application Administrators can turn off the Query Engine to ensure data does not leave a data localization zone when authorized users access the Immuta Application outside data jurisdiction.
To disable this feature,
Click Advanced Settings in the left panel, and scroll to the Administer Features section.
Select the Disable radio button and click Save.
Click Confirm to deploy your changes.
When you are ready to finalize your configuration changes, click the Save button at the bottom of the left panel, and then click Confirm to deploy your changes.
Audience: System Administrators
Content Summary: This page explains the benefits of enabling ImmutaGroupsMapping. It also gives the prerequisites and configurations required to enable it.
Hadoop has the concept of a group mapping service, which is a way to retrieve groups corresponding to a provided user/Kerberos principal. By default, Hadoop services (HDFS, Hive, Impala, etc.) will retrieve a user's groups from the local OS by way of the ShellBasedUnixGroupsMapping class.
With Immuta, this data can be enriched to include the user's current project. This can be helpful for systems where current project information could help provide access to data. For example, in Impala it is possible to GRANT access to a database or tables based on a user's membership in an Immuta project group. This way a user could read information from tables only when acting in the target project.
Immuta project group names are simply immuta_project_<project_id>, where project_id is the Immuta project's ID.
In Impala it is important that the auth_to_local setting is enabled in order to map Kerberos principals to short usernames so that groups can be properly retrieved from Immuta for the corresponding principal. For example, Impala should map bob/my.host.name@REALM to bob in order to properly map bob to the corresponding Immuta user account and determine the current project group (if any) for bob.
If Immuta HDFS Native Workspaces are being created on the target cluster, then the Immuta Partition Service principal needs to be a Sentry Admin user in order to CREATE databases and roles for use by Immuta.
If administrators want to allow users to CREATE non-data-source tables in the workspace database, the immuta.workspace.allow.create.table configuration option must be set to true for the Partition Service in generator.xml. It is also recommended that Sentry Object Ownership is enabled and set to ALL in this scenario, which allows users to DROP their own tables. If Object Ownership is not enabled, users will not be able to DROP tables and a Sentry Admin would need to clean up old tables.
For Hive, it is required that the ImmutaGroupsMapping service jar is added to the classpath for YARN applications. This can be done by updating the yarn.application.classpath configuration value in yarn-site.xml. In Cloudera Manager, the value /opt/cloudera/parcels/IMMUTA/lib/immuta-group-mapping.jar should be added under YARN Application Classpath on YARN's Cloudera Manager configuration page. Note that if you are using a non-standard parcel directory, you should replace /opt/cloudera/parcels/ with your custom directory.
It's a good idea to start by checking the existing Group Mapping Service in the configuration item hadoop.security.group.mapping. If this is not found in your configuration, the default is org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.
To use ImmutaGroupsMapping alongside another Group Mapping Service, there is an implementation called org.apache.hadoop.security.CompositeGroupsMapping. This Group Mapping Service takes the results of multiple Group Mapping Services and combines them. If CompositeGroupsMapping is already being used before adding ImmutaGroupsMapping, simply add ImmutaGroupsMapping as another provider in configuration. This is detailed below.
To enable the ImmutaGroupsMapping, the following configuration needs to be added to Hadoop XML configuration for any target systems where the groups mapping should be applied.
If this should be applied across the cluster, it should be added to the system-wide core-site.xml file.
If it is only being applied to a single system (Impala for instance), then it should be added to an XML file specifically for Impala.
Best Practice: Group Mapping Service
The group mapping service should be applied only to target systems requiring Immuta project groups to be determined for context-aware permissions. This can typically be limited to Hive and/or Impala. In Cloudera Manager, this configuration can be added to Impala Daemon Advanced Configuration Snippet (Safety Valve) for core-site.xml and/or Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml.
The following configuration shows the ImmutaGroupsMapping provider being used alongside the JniBasedUnixGroupsMappingWithFallback provider.
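A sketch of that configuration is below. The CompositeGroupsMapping provider keys are standard Hadoop settings; the Immuta provider class name and the provider labels are assumptions and should be confirmed against your Immuta Hadoop plugin documentation.

```xml
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.CompositeGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.providers</name>
  <value>shell,immuta</value>
</property>
<property>
  <name>hadoop.security.group.mapping.provider.shell</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
</property>
<property>
  <!-- assumed class name for the Immuta provider -->
  <name>hadoop.security.group.mapping.provider.immuta</name>
  <value>com.immuta.hadoop.ImmutaGroupsMapping</value>
</property>
```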
If the group mapping service is being configured for a specific service (i.e., Hive or Impala), it is critical that immuta.system.api.key is also configured for that target service. The ImmutaGroupsMapping provider requires the system API key in order to retrieve user group details. Add something like the following to the properties defined above.
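A sketch of that property; the value is a placeholder for your system API key.

```xml
<property>
  <name>immuta.system.api.key</name>
  <value>YOUR_IMMUTA_SYSTEM_API_KEY</value>
</property>
```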
By default, Hadoop's group services cache the retrieved groups for 5 minutes. This may not be desirable for Immuta deployments using group mapping because switching project contexts would then take up to 5 minutes to have an effect on the cluster. In order to lower this cache time, add the following configuration to the same file as above:
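A sketch using Hadoop's standard group cache setting; the value shown is illustrative.

```xml
<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>30</value>
</property>
```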
Audience: Application Admins
Content Summary: Immuta uses the Patroni tool to manage streaming replication of the PostgreSQL database. You need to interact with the Patroni API in order to change PostgreSQL settings.
This page outlines how to change the PostgreSQL settings, connect to the database container, modify the configuration, and apply changes and restart the cluster.
The easiest way to interact with the Patroni API is using the patronictl tool that is installed in the Immuta database Docker containers. Updating the PostgreSQL settings involves 3 processes:
Connecting to the database container.
Modifying the configuration.
Applying changes and restarting the cluster.
Use kubectl to determine the name of the pod running the PostgreSQL master:
For the metadata database:
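A sketch; the role label and name filter are assumptions about your deployment's conventions.

```
kubectl get pods -l role=master | grep metadata
```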
For the Immuta Query Engine database:
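Again a sketch, with an assumed label and name filter.

```
kubectl get pods -l role=master | grep query-engine
```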
Make note of the pod name, which will be used when connecting:
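Connecting is then a standard kubectl exec into that pod; the pod name comes from the previous step.

```
kubectl exec -it <postgres-master-pod-name> -- /bin/bash
```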
The following steps should be executed from within this context.
Once inside the database container, you can run patronictl to modify the configuration.
A few environment variables must be exported for the patronictl edit-config command to run successfully:
The following command will then open up the PostgreSQL configuration in vi for editing.
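A sketch of that command; depending on your Patroni configuration, the cluster name argument may be optional.

```
patronictl edit-config <cluster-name>
```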
Optional Flags: edit-config can be used with the following flags:
-q, --quiet: Do not show changes.
-s, --set: With additional text, this parameter will set a specific configuration value. Can be specified multiple times.
-p, --pg: With additional text, this parameter will set the specific PostgreSQL parameter value. Can be specified multiple times.
Make any changes, and then close the vi session by saving the configuration (type <ESC>:wq).
You will be asked if you would like to apply the changes; type y and press Enter.
Now that changes have been made, you can push these changes out using patronictl restart.
Get the Patroni cluster name:
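The cluster name appears in the output of patronictl list:

```
patronictl list
```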
The cluster name is the first column in the result. There should be only one unique value. Use this cluster name in the call to patronictl restart.
If you have modified the value of max_connections, then you should use the following command to restart the master instance only; the changes will propagate to the replicas automatically:
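A sketch of that command; the master member name is the pod/member identified earlier.

```
patronictl restart <cluster-name> <master-member-name>
```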
If you have not modified the value of max_connections, you can simply run the following command:
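A sketch of that command:

```
patronictl restart <cluster-name>
```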