Audience: System Administrators
Content Summary: Immuta generates data encryption keys (on a user-defined rollover schedule) to encrypt and decrypt values. This page provides an overview of encryption key management and outlines its configuration options in Immuta.
Use an External Key Management Service
Immuta recommends using an external Key Management Service (KMS) to encrypt or decrypt data keys as needed.
Immuta encrypts values with data encryption keys, either those that are system-generated or managed using an external key management service (KMS). Immuta recommends a KMS to encrypt or decrypt data keys and supports the AWS Key Management Service. To configure the AWS KMS, complete the steps below.
However, if no KMS is configured, Immuta generates a data encryption key on a user-defined rollover schedule, using the most recent data key to encrypt new values while preserving old data keys to decrypt old values. To change the default rollover schedule of one year, follow these steps.
Before you can configure the AWS KMS, you need to set up your AWS credentials. Immuta cannot encrypt the AWS access/secret keys in the KMS configuration, so we recommend using IAM roles.
Follow AWS documentation to create an IAM policy to attach to your IAM role. An example is provided below.
Example IAM Policy:
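The policy below is a hedged sketch rather than the exact policy from this page: it allows the data-key operations commonly needed against a single KMS key, and the key ARN is a placeholder. Confirm the required actions against AWS and Immuta documentation.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowImmutaDataKeyOperations",
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}
```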
Other ways of setting up AWS credentials can be found here.
Choose one of the following options to set up an IAM role:
Attach an IAM role to your AWS EC2 instance. Then, continue to step 2.
If you're running Immuta in Kubernetes (AWS EKS), work with your Immuta Support Professional to set up an IAM role. Then, continue to step 2.
Add credentials in the KMS configuration (not recommended): This option should only be used if Immuta is not running on your AWS infrastructure and you need to leverage a KMS on AWS. For all other scenarios, use one of the options above.
Add the following configuration (with your AWS region and keyId) to the Advanced Configuration section of the App Settings page.
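A hypothetical sketch of that YAML is shown below; the kms block and its structure are assumptions, while region and keyId are the values referenced above.

```yaml
kms:
  type: awskms            # assumed structure; confirm with Immuta documentation
  region: us-east-1
  keyId: <your-kms-key-id>
```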
Immuta Cannot Encrypt AWS Access/Secret Keys in KMS Configuration
Immuta cannot encrypt the AWS access/secret keys in the KMS configuration, so we recommend using IAM roles.
This option should only be used if Immuta is not running on your AWS infrastructure; for example, if you are running Immuta on-prem and need to leverage a KMS on AWS. For all other scenarios, use one of the two other options above.
Before you begin, create a secret access key and an access key that will authenticate to Immuta.
Navigate to the App Settings page and add the following configuration (with your AWS keyId, region, and credentials) to the Advanced Configuration section:
Click the App Settings icon in the left sidebar and scroll to the Advanced Configuration section.
Paste the following configuration in the text box, adjusting dataKeyRollOverLength days to your desired schedule:
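For example, assuming the setting sits at the top level of the advanced configuration (its placement is an assumption), a one-year schedule would look like:

```yaml
dataKeyRollOverLength: 365   # days between data key rollovers
```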
The Immuta Web Service uses ODBC drivers to communicate with back-end data platforms. Immuta deployments include only the ODBC drivers that Immuta is able to redistribute. All other drivers must be obtained and deployed by a System Administrator before Data Owners can use the corresponding data source types in Immuta.
You can install ODBC drivers on the App Settings page.
This driver is included with Immuta.
HiveODBC-2.6.9.1009-1.x86_64.rpm
ImpalaODBC-2.6.8.1008-1.x86_64.rpm
msodbcsql17-17.10.2.1-1.x86_64.rpm
Deprecation notice
Support for this data platform has been deprecated.
msodbcsql17-17.10.2.1-1.x86_64.rpm
This driver is included with Immuta.
SimbaODBCDriverforGoogleBigQuery_2.5.0.1001-Linux.tar.gz
This driver is included with Immuta.
mysql-connector-odbc-8.0.18-glibc2-12-x86_64.rpm
oracle-instantclient19.5-odbc-19.5.0.0.0-1.x86_64.rpm
This driver is included with Immuta.
You may purchase this ODBC driver from Magnitude.
This driver is included with Immuta.
The SAP Hana ODBC driver (odbc-2019.01.19.tar.gz) is available as part of your SAP Hana installation. Upload a tar.gz file that contains the ODBC driver for Linux x86_64.
This driver is included with Immuta.
This driver is included with Immuta.
StarburstODBC-64bit-2.0.1.1002.el6.x86_64.rpm
You can obtain this ODBC driver from Starburst Data with a Starburst contract.
Deprecation notice
Support for this data platform has been deprecated.
tdodbc1620-16.20.00.65-1.noarch.rpm
simbatrino-2.1.0.1001-1.el6.x86_64.rpm
You can purchase this ODBC driver from Magnitude.
Audience: System Administrators
Content Summary: This page outlines the on-cluster configurations for Immuta's Hadoop and Spark plugins. Most of these values are consistent across Hadoop providers; however, some values are provider-specific. To learn more about provider-specific deployments, see the installation guides for Cloudera and Amazon EMR.
The NameNode plugin runs on each HDFS NameNode as the hdfs user. It will have access to any configuration items available to HDFS clients, as well as potentially additional configuration items for the NameNode only. The configuration for the NameNode plugin can be placed in an alternate configuration file (detailed below) to avoid leaking sensitive configuration items.
The NameNode plugin configurations can be set in core-site.xml and hdfs-site.xml (for NameNode-specific values).
The Vulcan Service is an Immuta service that is mostly relevant to Spark applications. It has its own configuration file (generator.xml) and also reads all system-wide/client configuration for Hadoop (core-site.xml).
Clients of HDFS/Hadoop services are Spark jobs, MapReduce jobs, and other user-driven applications in the Hadoop ecosystem. The configuration items for clients can be provided system-wide in core-site.xml or configured per-job, typically on the command line or in application/job configuration.
There is an additional generator.xml file that is created for Spark applications only and contains connection information for the Vulcan Service. Immuta configuration can also be added to spark-defaults.conf or applied system-wide to Spark jobs. Unless otherwise stated, items in spark-defaults.conf should be prefixed with spark.hadoop. because they are read from Hadoop configuration.
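For example, a client property such as immuta.base.url (described below) would appear in spark-defaults.conf as follows; the hostname is illustrative.

```
# spark-defaults.conf: Hadoop-read Immuta properties take the spark.hadoop. prefix
spark.hadoop.immuta.base.url    https://immuta.example.com
```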
Public configuration is not sensitive and is shared by client libraries such as ImmutaApiKeyAuth and the NameNode plugin (as well as potentially other Immuta and non-Immuta services on the cluster). These configuration items should be in a core-site.xml file distributed across the cluster and readable by all users.
immuta.generated.api.key.dir
Default: /user
Description: The base directory under which the NameNode plugin will look for generated API keys for use with the Immuta Web Service. The default value is /user with the username and .immuta_generated appended, so that each user has their own generated API key directory; the .immuta_generated directory adds an additional layer of protection so other users can't listen on the /user/<username> directory to wait for API keys to be generated. This configuration item should never point at a non-HDFS path because attempting to generate credentials outside of HDFS is invalid. This item should be in sync between the NameNode plugin's configuration and client configuration.
immuta.credentials.dir
Default: /user
Description: A directory which will be used to store each user's Immuta API key and token for use with the Immuta Web Service. The user's API key and token are stored this way to avoid re-authenticating frequently with the web service and introducing additional overhead to processes like MapReduce and Spark. Similar to the generated API key directory, this configuration item defaults to /user with the username of the current user added on. Each user should have a directory under the credentials directory for storing their own credentials. NOTE: It is valid for a user to provide and save their own API key in /user/<username>/immuta_api_key so that their code does not attempt to generate an API key. It is also valid to override this value with a non-HDFS path in case HDFS is not being used (Spark in a non-HDFS environment, for example); e.g., file:///home/ would point to file:///home/<username>/immuta_api_key with the user's API key file.
immuta.base.url
Description: The URL at which the Immuta API can be reached. This should be the base URL of the Immuta API.
fs.immuta.impl
Description: This configuration allows users to access the immuta:// scheme in order to have their filesystem built in the same way that the Immuta FUSE filesystem is built. This filesystem is also used in Spark deployments, which read data from external object storage (e.g., S3). This means that users will have consistent filesystem views regardless of where they are accessing Immuta. This is not set by default and must be set to com.immuta.hadoop.ImmutaFileSystem system-wide in core-site.xml.
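A minimal core-site.xml entry for this setting might look like the following sketch:

```xml
<property>
  <name>fs.immuta.impl</name>
  <value>com.immuta.hadoop.ImmutaFileSystem</value>
</property>
```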
immuta.cluster.name
Default: hostname from fs.defaultFS
Description: This configuration item identifies a cluster to the Immuta Web Service. This is very important because it determines how file access is controlled in HDFS by the NameNode plugin and which data sources are available to a cluster. The default value is taken from fs.defaultFS. Administrators should be advised that when an organization has multiple HA HDFS clusters, they may all have the same nameservice name, so this value should be set on each cluster for identification purposes.
immuta.api.key
Description: (CLIENT ONLY) Users can configure their own API key when running jobs or interacting with an HDFS client, but if an API key is not configured for the user it will be generated on the first attempt to communicate with the Immuta service and stored securely in their credential directory (described above). Immuta uses the Configuration.getPassword() method to retrieve this configuration item, so it may also be set using the Hadoop CredentialProvider API.
immuta.permission.fallback.class
Default: org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider (HDFS 2.6.x/CDH) or org.apache.hadoop.hdfs.server.namenode.DefaultINodeAttributesProvider (HDFS 2.7+)
Sentry: org.apache.sentry.hdfs.SentryINodeAttributesProvider (HDFS 2.7+)
Description: The configuration key for the fully qualified class name of the fallback permission checking class that will be used after the Immuta authorization or inode attribute provider.
immuta.permission.allow.fallback
Default: false
Description: Denotes the action that the Immuta permission checking classes will take when a user is forbidden access to data in Immuta. If set to true, every time a user is denied access to a file via Immuta their permissions will be checked against the underlying default permission checker, potentially meaning that they will still have access to data that they cannot access via Immuta.
immuta.permission.users.to.ignore
Default: hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,immuta
Description: CSV list of users that will not ever have their HDFS file accesses checked in Immuta. This should include any system superusers to avoid overhead of checking permissions in Immuta that should not be relevant.
immuta.permission.groups.to.ignore
Description: Same as immuta.permission.users.to.ignore but for groups.
immuta.permission.users.to.enforce
Description: A comma delimited list of users that must go through Immuta when checking permissions on HDFS files. If this configuration item is set, then fallback authorizations will apply to everyone by default, unless they are on this list. If a user is on both the enforce list and the ignore list, then their permissions will be checked with Immuta (i.e., the enforce configuration item takes precedence).
immuta.permission.groups.to.enforce
Description: Same as immuta.permission.users.to.enforce but for groups.
immuta.permission.paths.to.enforce
Description: A comma delimited list of paths to enforce when checking permissions on HDFS files. If this configuration item is set, then these paths and their children will be checked in Immuta. All other paths will use fallback authorizations. WARNING: Setting this property effectively disables Immuta file permission checking for all paths not in this configuration item. Setting both immuta.permission.paths.to.ignore and immuta.permission.paths.to.enforce properties at the same time is unsupported.
immuta.permission.paths.to.ignore
Description: A comma delimited list of paths to ignore when checking permissions on HDFS files. If this configuration item is set, then these paths and their children will use fallback authorizations and not go through Immuta. All other paths will be checked with Immuta. Setting both immuta.permission.paths.to.ignore and immuta.permission.paths.to.enforce properties at the same time is unsupported.
immuta.system.details.cache.timeout.seconds
Default: 1800
Description: The number of seconds to cache system detail information from the Immuta Web Service. This should be high since, ideally, the relevant values in Immuta configuration won't change often (or ever).
immuta.permission.workspace.ignored.users
Default: hive,impala
Description: Comma-delimited list of users that should be ignored when accessing workspace directories. This should never have to change since the default Hive and Impala principals are covered, but this can be modified in case of non-standard configuration. This list is separate from the ignored user list above because we do not want to allow access to ignored non-system users who may be operating on a cluster with Immuta installed but who should not be allowed to see workspace data. This should be limited to the principals for Hive and Impala.
The following configuration items are only relevant to the NameNode plugin. These are typically set somewhere like hdfs-site.xml, and for the most part they are not sensitive. There are some highly sensitive configuration items, however, and those should be set in such a way that only the NameNode process has the ability to read them. Immuta provides one solution for this: an additional NameNode plugin configuration file whose path is configured elsewhere (such as in hdfs-site.xml) and which is only readable by the hdfs user. This is detailed below.
immuta.extra.name.node.plugin.config
Description: Path to a Hadoop-style XML configuration file containing items that will be used by the Immuta NameNode plugin. This item helps to configure sensitive information in a way that will only be readable by the hdfs user to avoid leaking sensitive configuration to other users. This should be in the form file:///path/to/file.xml.
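One way to wire this up in hdfs-site.xml is sketched below; the file path is illustrative.

```xml
<!-- hdfs-site.xml: point the NameNode plugin at a file readable only by the hdfs user -->
<property>
  <name>immuta.extra.name.node.plugin.config</name>
  <value>file:///etc/immuta/namenode-plugin.xml</value>
</property>
```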
immuta.system.api.key
Description: HIGHLY SENSITIVE. This configuration item is used by the NameNode plugin (and the Vulcan Service) to access privileged endpoints of the Immuta API. This is a required configuration item for both the NameNode plugin and Vulcan Service.
immuta.no.data.source.cache.timeout.seconds
Default: 60
Description: The amount of time in seconds that the NameNode plugin will cache the fact that a specific path is not a part of any Immuta data sources.
immuta.hive.impala.cache.timeout.seconds
Default: 60
Description: The amount of time in seconds to cache the fact that a user is subscribed to a Hive or Impala data source containing the target file they are attempting to access.
immuta.canisee.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds to cache the access result from Immuta for a user/path pair.
immuta.specific.access.cache.timeout
Default: 10
Description: The amount of time to temporarily unlock a file in HDFS for a user using temporary access tokens with files backing Hive and Impala data sources in Spark.
immuta.data.source.cache.timeout.seconds
Default: 300
Description: The amount of time in seconds that users' subscribed data sources should be cached in memory to avoid reaching out to Immuta for data sources over and over. Relevant to the Immuta Hadoop client FileSystem and Spark jobs.
immuta.canisee.metastore.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that the NameNode plugin will cache the fact that a path belongs to a Metastore (Impala or Hive) data source. Reduces network calls from NameNode to Immuta when Vulcan is accessing paths belonging to Metastore sources.
immuta.canisee.non.user.cache.timeout.seconds
Default: 30
Description: The amount of time that the NameNode plugin will cache that a user principal does not belong to an Immuta user. This is useful if the ignored/enforced users/groups configurations are not being used so that when the NameNode receives a 401 response from the canisee endpoint it will store that information and not retry canisee requests to Immuta during that time.
immuta.canisee.num.retries
Default: 1
Description: The number of times to retry access calls from the NameNode plugin to Immuta to account for network issues.
immuta.project.user.cache.timeout.seconds
Default: 300
Description: The amount of time in seconds that the ImmutaGroupsMapping will cache whether or not a principal is tied to an Immuta user account. This decreases the number of calls from HDFS to Immuta when there are accounts that are not tied to Immuta.
immuta.project.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that the ImmutaGroupsMapping will cache project and workspace information for a given project ID. This is also the amount of time a user's current project will be cached.
immuta.project.forbidden.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that the ImmutaCurrentProjectHelper will cache the fact that a principal tied to an Immuta user is being forbidden from using their current project.
immuta.workspace.deduplication.timeout.seconds
Default: 60
Description: The amount of time to wait before auditing duplicate workspace filesystem actions from HDFS. This is the amount of time the NameNode plugin will wait before a user reading or writing the same path will have duplicate audit records written to Immuta.
immuta.permission.system.details.retries
Default: 5
Description: The number of times the system details background worker will attempt to retrieve system details from the Immuta web service if an attempt fails.
immuta.permission.source.cache.enabled
Default: false
Description: Denotes whether a background thread should be started to periodically cache paths from Immuta that represent Immuta-protected paths in HDFS. Enabling this increases NameNode performance because it prevents the NameNode plugin from calling the Immuta web service for paths that do not back HDFS data sources.
immuta.permission.source.cache.timeout.seconds
Default: 300
Description: The time between calls to sync/cache all paths that back Immuta data sources in HDFS.
immuta.permission.source.cache.retries
Default: 5
Description: The number of times the data source cache background worker will attempt to retry calls to Immuta on failure.
immuta.permission.request.retries
Default: 5
Description: The number of retries that the NameNode plugin will attempt for any blocking web request between HDFS and the Immuta API.
immuta.permission.request.initial.delay.milliseconds
Default: 250
Description: The initial delay for the BackoffRetryHelper that the NameNode plugin will employ for any retries of blocking web requests between HDFS and the Immuta API.
immuta.permission.request.socket.timeout
Default: 1500
Description: The time in milliseconds that the NameNode plugin will wait before cancelling a request to the Immuta API if no data has been read from the HTTP connection. This applies to blocking requests only.
immuta.permission.workspace.base.path.override
Description: This configuration item can be set so that the NameNode does not have to retrieve Immuta HDFS workspace base path periodically from the Immuta API.
The following items are relevant to any Immuta Spark applications using the ImmutaSparkSession or ImmutaContext.
immuta.spark.data.source.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds that data source information will be cached in the user's Spark job. This reduces the number of times the client will need to refresh data source information.
immuta.spark.sql.account.expiration
Default: 2880
Description: The amount of time in seconds that temporary SQL account credentials created by the Immuta Spark plugins for accessing data sources via Postgres over JDBC will remain valid.
immuta.postgres.fetch.size
Default: 1000
Description: The JDBC fetch size used for data sources accessed via Postgres over JDBC.
immuta.postgres.configuration
Description: The configuration key for any extra JDBC options that should be appended to the Immuta Postgres connection by the Immuta SQL Context. An example would include sslfactory=org.postgresql.ssl.NonValidatingFactory to turn off SSL validation.
immuta.enable.jdbc
Default: false
Description: If true, allows the user's Spark job to make queries to Immuta's Postgres instance automatically when we detect that the data source is not on cluster and we must pull data back via PG. This can be set per-job, but defaults to false to prevent a user from accidentally (and unknowingly) pulling huge amounts of data over JDBC.
immuta.ephemeral.host.override
Default: true
Description: Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source host names with an auto-detected host name on cluster that should be running HiveServer2. It is assumed HiveServer2 is running on the NameNode.
immuta.ephemeral.host.override.address
Description: This configuration item can be used if automatic detection of Hive's hostname should be disabled in favor of a static hostname to use for ephemeral overrides. This is useful for when your cluster is behind a load balancer or proxy.
immuta.ephemeral.host.override.name-node
Description: In an HA cluster it may be a good idea to specify the NameNode on which Hive is running for ephemeral overrides. This should contain the NameNode from configuration that is hosting HiveServer2.
immuta.secure.truststore.enabled
Default: false
Description: Enables TLS truststore verification. If enabled without a custom truststore it will use the default.
immuta.secure.truststore
Description: Location of the truststore that contains the Immuta Web Service certification.
immuta.secure.truststore.password
Description: Password for the truststore that contains the Immuta Web Service certification.
immuta.spark.visibility.cache.timeout.seconds
Default: 30
Description: The amount of time in seconds the ImmutaContext or ImmutaSparkSession will cache visibilities from Immuta. Maximum of 30 seconds.
immuta.spark.visibility.read.timeout.seconds
Default: 300
Description: The socket read timeout for visibility calls to Immuta.
immuta.spark.audit.retries
Default: 2
Description: The number of times to retry audit calls to Immuta from Spark.
immuta.masked.jdbc.optimization.enabled
Default: true
Description: Enables pushdown filters to Postgres. This should only be changed to false if the user is joining a non-Spark data source (in PostgreSQL) on a masked column.
The following configuration items are needed by the Immuta Vulcan Service. Some of these items are also shared with the NameNode plugin as they work in tandem to protect data in HDFS.
immuta.meta.store.token.dir
Default: /user/<Vulcan Service user>/tokens
Description: The directory in which temporary access tokens for HDFS files backing Hive/Impala data sources will be stored. This needs to be configured for the NameNode plugin as well in order to unlock files in HDFS.
immuta.meta.store.remote.token.dir
Default: /user/<Vulcan Service user>/remotetokens
Description: The directory in which temporary access tokens for remote/object storage (S3, GS, etc) files backing Hive/Impala data sources will be stored.
immuta.spark.partition.generator.user
Default: immuta
Description: The username of the user that will be running the Vulcan Service. This should also be the short username of the Kerberos principal running the Vulcan Service.
immuta.secure.partition.generator.hostname
Default: localhost
Description: The interface/hostname that clients will use to communicate with the Vulcan Service.
immuta.secure.partition.generator.listen.address
Default: 0.0.0.0
Description: The interface/hostname on which the Vulcan Service will listen for connections.
immuta.secure.partition.generator.port
Default: 9070
Description: The port on which the Vulcan Service will listen for connections.
immuta.configuration.id.file.config
Default: hdfs:///user/<Vulcan Service user>/config_id
Description: The file in HDFS where the cluster configuration ID will be stored. This is used to keep track of the unique ID in Immuta tied to the current cluster.
immuta.secure.partition.generator.keystore
Description: Path to the keystore file to be used for securing the Vulcan Service with TLS.
immuta.secure.partition.generator.keystore.password
Description: The password for the keystore configured with immuta.secure.partition.generator.keystore.
immuta.secure.partition.generator.keymanager.password
Description: The configuration key for the key manager password for the keystore configured with immuta.secure.partition.generator.keystore.
immuta.secure.partition.generator.url.external
Default: <NameNode / master hostname>:<Vulcan Service port>
Description: The configuration key for specifying the externally addressable Vulcan Service URL. This URL must be reachable from the Immuta web app. If this is not set, the Vulcan Service will try to determine the URL based on its Hadoop configuration.
immuta.yarn.validation.params
Default: /user/<Vulcan Service user>/yarnParameters.json
Description: The file containing parameters to use when validating YARN applications for secure token generation for file access. When a Spark application requests tokens be generated for file access, the Vulcan Service will validate that the Spark application is configured properly using the parameters from this file.
immuta.emrfs.credential.file.path
Description: For EMR/EMRFS only. This configuration points to a file containing AWS credentials that the Vulcan Service can use for accessing data in S3. This is also useful for Hive/the hive user so that (if impersonation is turned on) only a few users (hive and the Vulcan Service user) on cluster can access data in S3 while everyone else is forced through the Vulcan Service.
immuta.workspace.allow.create.table
Default: false
Description: True if the user should be allowed to create workspace tables. Users will not be able to drop their created tables if Sentry Object Ownership is not set to ALL.
immuta.partition.tokens.ttl.seconds
Default: 3600
Description: How long in seconds Immuta temporary file access tokens should live in HDFS before being cleaned up.
immuta.partition.tokens.interval.seconds
Default: 1800
Description: Number of seconds between runs of the token cleanup job which will delete all expired temporary file access tokens from HDFS.
immuta.scheduler.heartbeat.enable
Default: true
Description: True to enable sending configuration to the Immuta Web Service and updating on an interval. This can be set to false to prevent this cluster from being available in the HDFS configurations dropdown for HDFS data sources, as well as prevent it from being used for workspaces. This makes sense for ephemeral (EMR) clusters.
immuta.scheduler.heartbeat.initial.delay.seconds
Default: 0
Description: When starting the Vulcan Service, how long in seconds to wait before first sending configuration to the Immuta Web Service.
immuta.scheduler.heartbeat.interval.seconds
Default: 28800
Description: How long in seconds to wait between each configuration update submission to the Immuta Web Service.
immuta.file.session.store.expiration.seconds
Default: 900
Description: Number of seconds that idle remote file sessions will be kept active in the Vulcan Service. This is for spark clients that are reading remote data (S3, GS) via the Vulcan Service.
immuta.file.session.status.expiration.seconds
Default: 300
Description: Number of seconds that the Vulcan Service will cache file statuses from remote object storage.
immuta.file.session.status.max.size
Default: 250
Description: Maximum number of file status objects that the Vulcan Service will cache at one time.
immuta.yarn.api.num.retries
Default: 5
Description: Number of times that the YARN Validator will attempt to contact the YARN resource manager API to validate a Spark application for partition tokens.
Audience: Application Admins
Content Summary: This page details how to use the App Settings page to configure settings for Immuta for your organization.
Click the App Settings icon in the left sidebar.
Click the link in the App Settings panel to navigate to that section.
See the identity manager pages for a tutorial to connect a Microsoft Entra ID, Okta, or OneLogin identity manager.
To configure Immuta to use all other existing IAMs,
Click the Add IAM button.
Complete the Display Name field and select your IAM type from the Identity Provider Type dropdown: LDAP/Active Directory, SAML, or OpenID.
Once you have selected LDAP/Active Directory from the Identity Provider Type dropdown menu,
Adjust Default Permissions granted to users by selecting from the list in this dropdown menu, and then complete the required fields in the Credentials and Options sections. Note: Either User Attribute OR User Search Filter is required, not both. Completing one of these fields disables the other.
Opt to have Case-insensitive user names by clicking the checkbox.
Opt to Enable Debug Logging or Enable SSL by clicking the checkboxes.
In the Profile Schema section, map attributes in LDAP/Active Directory to automatically fill in a user's Immuta profile. Note: Fields that you specify in this schema will not be editable by users within Immuta.
Opt to Link SQL Account.
Opt to Enable scheduled LDAP Sync support for LDAP/Active Directory and Enable pagination for LDAP Sync. Once enabled, confirm the sync schedule written in Cron rule; the default is every hour. Confirm the LDAP page size for pagination; the default is 1,000.
Opt to Sync groups from LDAP/Active Directory to Immuta. Once enabled, map attributes in LDAP/Active Directory to automatically pull information about the groups into Immuta.
Opt to Sync attributes from LDAP/Active Directory to Immuta. Once enabled, add attribute mappings in the attribute schema. The desired attribute prefix should be mapped to the relevant schema URN.
Opt to enable External Groups and Attributes Endpoint, Make Default IAM, or Migrate Users from another IAM by selecting the checkbox.
Then click the Test Connection button.
Note: If you select Link SQL Account, you will need to update the Query Engine configuration.
Once the connection is successful, click the Test User Login button.
Click the Test LDAP Sync button if scheduled sync has been enabled.
Once you have selected SAML from the Identity Provider Type dropdown menu,
Take note of the ID and copy the SSO Callback URL to use as the ACS URL in your identity provider.
Adjust Default Permissions granted to users by selecting from the list in this dropdown menu, and then complete the required fields in the Client Options section.
Opt to Enable SCIM support for SAML by clicking the checkbox, which will generate a SCIM API Key.
In the Profile Schema section, map attributes in SAML to automatically fill in a user's Immuta profile. Note: Fields that you specify in this schema will not be editable by users within Immuta.
Opt to Link SQL Account, Allow Identity Provider Initiated Single Sign On, Sync groups from SAML to Immuta, Sync attributes from SAML to Immuta, External Groups and Attributes Endpoint, or Migrate Users from another IAM by selecting the checkboxes, and then click the Test Connection button.
Once the connection is successful, click the Test User Login button.
Once you have selected OpenID from the Identity Provider Type dropdown menu,
Take note of the ID. You will need this value to reference the IAM in the callback URL in your identity provider with the format <base url>/bim/iam/<id>/user/authenticate/callback.
Note the SSO Callback URL shown. Navigate out of Immuta and register the client application with the OpenID provider. If prompted for client application type, choose web.
Adjust Default Permissions granted to users by selecting from the list in this dropdown menu.
Back in Immuta, enter the Client ID, Client Secret, and Discover URL in the form field.
Configure OpenID provider settings. There are two options:
Set Discover URL to the /.well-known/openid-configuration URL provided by your OpenID provider.
If you are unable to use the Discover URL option, you can fill out Authorization Endpoint, Issuer, Token Endpoint, JWKS Uri, and Supported ID Token Signing Algorithms.
If necessary, add additional Scopes.
Opt to Enable SCIM support for OpenID by clicking the checkbox, which will generate a SCIM API Key.
In the Profile Schema section, map attributes in OpenID to automatically fill in a user's Immuta profile. Note: Fields that you specify in this schema will not be editable by users within Immuta.
Opt to Allow Identity Provider Initiated Single Sign On or Migrate Users from another IAM by selecting the checkboxes.
Click the Test Connection button.
Once the connection is successful, click the Test User Login button.
To set the default permissions granted to users when they log in to Immuta, click the Default Permissions dropdown menu, and then select permissions from this list.
See the External Catalogs page.
Deprecation notice
Support for this feature has been deprecated.
To enable external masking,
Navigate to the App Settings page and click External Masking in the left sidebar.
Click Add Configuration and specify an external endpoint in the External URI field.
Click Configure, and then add at least one tag by selecting from the Search for tags dropdown menu. Note: Tag hierarchies are supported, so tagging a column as Sensitive.Customer would drive the policy if external masking was configured with the tag Sensitive.
Select Authentication Method and then complete the authentication fields (when applicable).
Click Test Connection and then Save.
Select Add Workspace.
Use the dropdown menu to select the Workspace Type and refer to the corresponding tab below.
Best Practices: Read-only Access Recommended
It is best practice, though not required, to use an AWS account with limited read-only access to the data in question.
Use the dropdown menu to select the Schema:
hdfs
Enter the Workspace Base Directory (any project workspaces created will be sub-directories of this path).
Click Test Workspace Directory.
Once the credentials are successfully tested, click Save.
s3a
Use the dropdown menu to select the AWS Region.
Enter the S3 Bucket.
Opt to enter the S3 Bucket Prefix.
Opt to Configure S3 Credentials.
Use the dropdown menu to select Authentication Method, and enter the required information.
AWS Access Key: Enter your AWS Access Key ID and AWS Secret Key. Required Permissions: s3:ListBucket, s3:GetObject, and s3:GetObjectTagging.
AWS IAM Instance Role: Opt to Assume AWS IAM Instance Role if you have ListRoles IAM permission or enter the AWS IAM Role ARN manually.
Click Test Workspace Bucket.
Once the credentials are successfully tested, click Save.
Databricks Cluster Configuration
Before creating a workspace, the cluster must send its configuration to Immuta; to do this, run a simple query on the cluster (e.g., show tables). Otherwise, an error message will occur when users attempt to create a workspace.
Databricks API Token Expiration
The Databricks API Token used for native workspace access must be non-expiring. Using a token that expires risks losing access to projects that are created using that configuration.
Use the dropdown menu to select the Schema:
Required AWS S3 Permissions
When configuring a native workspace using Databricks with S3, the following permissions need to be applied to arn:aws:s3:::immuta-workspace-bucket/workspace/base/path/* and arn:aws:s3:::immuta-workspace-bucket/workspace/base/path. Note: Two of these values are found on the App Settings page; immuta-workspace-bucket is from the S3 Bucket field and workspace/base/path is from the Workspace Remote Path field:
s3:Get*
s3:Delete*
s3:Put*
s3:AbortMultipartUpload
Additionally, these permissions must be applied to arn:aws:s3:::immuta-workspace-bucket. Note: One value is found on the App Settings page; immuta-workspace-bucket is from the S3 Bucket field:
s3:ListBucket
s3:ListBucketMultipartUploads
s3:GetBucketLocation
Enter the Name.
Click Add Workspace.
Enter the Hostname.
Opt to enter the Workspace ID (required with Azure Databricks).
Enter the Databricks API Token.
Use the dropdown menu to select the AWS Region.
Enter the S3 Bucket.
Opt to enter the S3 Bucket Prefix.
Click Test Workspace Bucket.
Once the credentials are successfully tested, click Save.
Enter the Name.
Click Add Workspace.
Enter the Hostname, Workspace ID, Account Name, Databricks API Token, and Storage Container.
Enter the Workspace Base Directory.
Click Test Workspace Directory.
Once the credentials are successfully tested, click Save.
Enter the Name.
Click Add Workspace.
Enter the Hostname, Workspace ID, Account Name, and Databricks API Token.
Use the dropdown menu to select the Google Cloud Region.
Enter the GCS Bucket.
Opt to enter the GCS Object Prefix.
Click Test Workspace Directory.
Once the credentials are successfully tested, click Save.
Select Add Native Integration.
Use the dropdown menu to select the Integration Type:
To enable Azure Synapse Analytics, see the Azure Synapse Analytics Integration page.
To enable Starburst (Trino), see the Starburst (Trino) installation page.
To enable Redshift, see the Redshift installation guide.
To enable Snowflake, see the Snowflake integration guide.
To enable Databricks Spark, see the Simplified Databricks Configuration guide.
You can enable or disable the types of data sources users can create in this section. Some of these types will require you to upload an ODBC driver before they can be enabled. The list of currently supported drivers is on the ODBC Drivers page.
To enable a data provider,
Click the menu button in the upper right corner of the provider icon you want to enable.
Select Enable from the dropdown.
If an ODBC driver needs to be uploaded,
Click the menu button in the upper right corner of the provider icon, and then select Upload Driver from the dropdown.
Click in the Add Files to Upload box and upload your file.
Click Close.
Click the menu button again, and then select Enable from the dropdown.
Application Admins can configure the SMTP server that Immuta will use to send emails to users. If this server is not configured, users will only be able to view notifications in the Immuta console.
To configure the SMTP server,
Complete the Host and Port fields for your SMTP server.
Enter the username and password Immuta will use to log in to the server in the User and Password fields, respectively.
Enter the email address that will send the emails in the From Email field.
Opt to Enable TLS by clicking this checkbox, and then enter a test email address in the Test Email Address field.
Finally, click Send Test Email.
Once SMTP is enabled in Immuta, any Immuta user can request access to notifications as emails, which will vary depending on the permissions that user has. For example, to receive email notifications about group membership changes, the receiving user will need the GOVERNANCE permission. Once a user requests access to receive emails, Immuta will compile notifications and distribute these compilations via email at 8-hour intervals.
To configure Immuta to protect data in a kerberized Hadoop cluster,
Upload your Kerberos Configuration File, and then you can add or modify the Kerberos configuration in the window pictured below.
Upload your Keytab File.
Enter the principal Immuta will use to authenticate with your KDC in the Username field. Note: This must match a principal in the Keytab file.
Adjust how often (in milliseconds) Immuta needs to re-authenticate with the KDC in the Ticket Refresh Interval field.
Click Test Kerberos Initialization.
To improve performance when using Immuta to secure Spark or HDFS access, a user's access level is cached momentarily. These cache settings are configurable, but decreasing the Time to Live (TTL) on any cache too low will negatively impact performance.
To configure cache settings, enter the time in milliseconds in each of the Cache TTL fields.
You can set the URL users will use to access the Immuta Application and Query Engine. Note: Proxy configuration must be handled outside Immuta.
Complete the Public Immuta URL, Public Query Engine Hostname, and Public Query Engine Port fields.
Opt to Enable SSL by clicking this checkbox.
To enable Sensitive Data Discovery and configure its settings, see the Sensitive Data Discovery page.
Click the Allow Policy Exemptions checkbox to allow users to specify who can bypass all policies on a data source.
Immuta merges multiple Global Subscription policies that apply to a single data source; by default, users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND). To change the default behavior to allow users to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR),
Click the Default Subscription Merge Options text in the left pane.
Select the Default "allow shared policy responsibility" to be checked checkbox.
Click Save.
Note: Even with this setting enabled, Governors can opt to have their Global Subscription policies combined with AND during policy creation.
These options allow you to restrict the power individual users with the GOVERNANCE and USER_ADMIN permissions have in Immuta. Click the checkboxes to enable or disable these options.
You can create custom permissions that can then be assigned to users and leveraged when building subscription policies. Note: You cannot configure actions users can take within the console when creating a custom permission, nor can the actions associated with existing permissions in Immuta be altered.
To add a custom permission, click the Add Permission button, and then name the permission in the Enter Permission field.
To create a custom questionnaire that all users must complete when requesting access to a data source, fill in the following fields:
Opt for the questionnaire to be required.
Key: Any unique value that identifies the question.
Header: The text that will display on reports.
Label: The text that will display in the questionnaire for the user. They will be prompted to type the answer in a text box.
To create a custom message for the login page of Immuta, enter text in the Enter Login Message box. Note: The message can be formatted in markdown.
Opt to adjust the Message Text Color and Message Background Color by clicking in these dropdown boxes.
Click the Generate Key button.
Save this API key in a secure location.
Without Fingerprints Some Policies Will Be Unavailable.
Disabling the collection of statistics will prevent the Immuta Query Engine cost-based optimizer from correctly estimating query plan costs. Additionally, these policies will be unavailable until a data owner manually generates a fingerprint:
Masking with format preserving masking
Masking with K-Anonymization
Masking using randomized response
To disable the automatic collection of statistics with a particular tag,
Use the Select Tags dropdown to select the tag(s).
Click Save.
Users can add password requirements for SQL credentials in this section, including minimum length, maximum length, and minimum number of symbols, numbers, uppercase, and lowercase characters. This will set password requirements for any new SQL credentials or any updated passwords. Previous Query Engine accounts will not be subject to the newly applied password restrictions.
To ensure that users are not easily compromised, minimum password requirements have been added as default. The password requirements can be changed to fit certain standards.
If you enable any Preview features, please provide feedback on how you would like these features to evolve.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Policy Adjustments checkbox.
Click Save.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Policy Adjustments checkbox.
Check the Enable Health Expert Determination checkbox.
Click Save.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Allow Complex Data Types checkbox.
Click Save.
For instructions on enabling this feature, navigate to the Global Subscription Policies Advanced DSL Tutorial.
Advanced configuration options provided by the Immuta Support team can be added in this section. The configuration must adhere to the YAML syntax.
To increase the default cardinality cutoff for columns compatible with k-anonymity,
Expand the Advanced Settings section and add the following text to the Advanced Configuration:
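The exact YAML is not reproduced here; as an illustration only (the key name and nesting are hypothetical), the entry might look like:

```yaml
fingerprint:
  maxKAnonymityCardinality: 500   # hypothetical key name; value is the new cutoff
```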
Click Save.
To regenerate the data source's fingerprint, navigate to that data source's page.
Click the Status in the upper right corner.
Click Re-run in the Fingerprint section of the dropdown menu.
Note: Re-running the fingerprint is only necessary for existing data sources. New data sources will be generated using the new maximum cardinality.
Expand the Advanced Settings section and add the following text to the Advanced Configuration to specify the number of seconds before webhook requests time out. For example, use 30 for 30 seconds. Setting it to 0 will result in no timeout.
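As an illustration only (the key name is hypothetical), the entry might look like:

```yaml
webhookTimeoutSeconds: 30   # hypothetical key name; 0 disables the timeout
```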
Click Save.
Expand the Advanced Settings section and add the following text to the Advanced Configuration to specify the number of minutes before an audit job expires. For example, use 300 for 300 minutes.
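As an illustration only (the key name is hypothetical), the entry might look like:

```yaml
auditJobExpirationMinutes: 300   # hypothetical key name
```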
Click Save.
By default, Immuta includes the query text in audit records, which requires it to transfer from the data platform to Immuta. To turn off this feature and stop audit records from including query text,
Expand the Advanced Settings section and add the following text to the Advanced Configuration:
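As an illustration only (the key name and nesting are hypothetical), the entry might look like:

```yaml
audit:
  includeQueryText: false   # hypothetical key name
```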
Click Save.
The Query Engine grants Immuta user accounts proxied data query access to Immuta data sources through the Immuta API and the Query Editor in the Immuta UI.
Application Administrators can turn off the Query Engine to ensure data does not leave a data localization zone when authorized users access the Immuta Application outside data jurisdiction.
To disable this feature,
Click Advanced Settings in the left panel, and scroll to the Administer Features section.
Select the Disable radio button and click Save.
Click Confirm to deploy your changes.
When you are ready to finalize your configuration changes, click the Save button at the bottom of the left panel, and then click Confirm to deploy your changes.
Audience: System Administrators
Content Summary: This page explains the benefits of enabling ImmutaGroupsMapping. It also gives the prerequisites and configurations required to enable it.
Hadoop has the concept of a group mapping service, which is a way to retrieve groups corresponding to a provided user/Kerberos principal. By default, Hadoop services (HDFS, Hive, Impala, etc.) will retrieve a user's groups from the local OS by way of the ShellBasedUnixGroupsMapping class.
With Immuta, this data can be enriched to include the user's current project. This can be helpful for systems where current project information could help provide access to data. For example, in Impala it is possible to GRANT access to a database or tables based on a user's membership in an Immuta project group. This way a user could read information from tables only when acting in the target project.
Immuta project group names are simply immuta_project_<project_id>, where project_id is the Immuta project's ID.
In Impala it is important that the auth_to_local setting is enabled in order to map Kerberos principals to short usernames so that groups can be properly retrieved from Immuta for the corresponding principal. For example, Impala should map bob/my.host.name@REALM to bob in order to properly map bob to the corresponding Immuta user account and determine the current project group (if any) for bob.
If Immuta HDFS Native Workspaces are being created on the target cluster, then the Immuta Partition Service principal needs to be a Sentry Admin user in order to CREATE databases and roles for use by Immuta.
If administrators want to allow users to CREATE non-data-source tables in the workspace database, the immuta.workspace.allow.create.table configuration option must be set to true for the Partition Service in generator.xml. It is also recommended that Sentry Object Ownership is enabled and set to ALL in this scenario, which allows users to DROP their own tables. If Object Ownership is not enabled, users will not be able to DROP tables and a Sentry Admin would need to clean up old tables.
For Hive, it is required that the ImmutaGroupsMapping service jar is added to the classpath for YARN applications. This can be done by updating the yarn.application.classpath configuration value in yarn-site.xml. In Cloudera Manager, the value /opt/cloudera/parcels/IMMUTA/lib/immuta-group-mapping.jar should be added under YARN Application Classpath on YARN's Cloudera Manager configuration page. Note that if you are using a non-standard parcel directory, you should replace /opt/cloudera/parcels/ with your custom directory.
It's a good idea to start by checking the existing Group Mapping Service in the configuration item hadoop.security.group.mapping. If this is not found in your configuration, the default is org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.
To use ImmutaGroupsMapping alongside another Group Mapping Service, there is an implementation called org.apache.hadoop.security.CompositeGroupsMapping. This Group Mapping Service takes the results of multiple Group Mapping Services and combines them. If CompositeGroupsMapping is already being used before adding ImmutaGroupsMapping, simply add ImmutaGroupsMapping as another provider in configuration. This is detailed below.
To enable the ImmutaGroupsMapping, the following configuration needs to be added to Hadoop XML configuration for any target systems where the groups mapping should be applied.
If this should be applied across the cluster, it should be added to the system-wide core-site.xml file.
If it is only being applied to a single system (Impala for instance), then it should be added to an XML file specifically for Impala.
Best Practice: Group Mapping Service
The group mapping service should be applied only to target systems requiring Immuta project groups to be determined for context-aware permissions. This can typically be limited to Hive and/or Impala. In Cloudera Manager, this configuration can be added to Impala Daemon Advanced Configuration Snippet (Safety Valve) for core-site.xml and/or Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml.
The following configuration shows the ImmutaGroupsMapping provider being used alongside the JniBasedUnixGroupsMappingWithFallback provider.
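A sketch of that configuration is below. The CompositeGroupsMapping provider keys are standard Hadoop settings; the Immuta provider class name and the provider labels are assumptions and should be confirmed against your Immuta Hadoop plugin documentation.

```xml
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.CompositeGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.providers</name>
  <value>shell,immuta</value>
</property>
<property>
  <name>hadoop.security.group.mapping.provider.shell</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
</property>
<property>
  <!-- assumed class name for the Immuta provider -->
  <name>hadoop.security.group.mapping.provider.immuta</name>
  <value>com.immuta.hadoop.ImmutaGroupsMapping</value>
</property>
```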
If the group mapping service is being configured for a specific service (i.e., Hive or Impala), it is critical that immuta.system.api.key is also configured for that target service. The ImmutaGroupsMapping provider requires the system API key in order to retrieve user group details. Add something like the following to the properties defined above.
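A sketch of that property; the value is a placeholder for your system API key.

```xml
<property>
  <name>immuta.system.api.key</name>
  <value>YOUR_IMMUTA_SYSTEM_API_KEY</value>
</property>
```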
By default, Hadoop's group services cache the retrieved groups for 5 minutes. This may not be desirable for Immuta deployments using group mapping because switching project contexts would then take up to 5 minutes to have an effect on the cluster. In order to lower this cache time, add the following configuration to the same file as above:
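A sketch using Hadoop's standard group cache setting; the value shown is illustrative.

```xml
<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>30</value>
</property>
```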
Audience: Application Admins
Content Summary: Immuta uses the Patroni tool to manage streaming replication of the PostgreSQL database. You need to interact with the Patroni API in order to change PostgreSQL settings.
This page outlines how to change the PostgreSQL settings, connect to the database container, modify the configuration, and apply changes and restart the cluster.
The easiest way to interact with the Patroni API is using the patronictl tool that is installed in the Immuta database Docker containers. Updating the PostgreSQL settings involves 3 processes:
Connecting to the database container.
Modifying the configuration.
Applying changes and restarting the cluster.
Use kubectl to determine the name of the pod running the PostgreSQL master:
For the metadata database:
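A sketch; the role label and name filter are assumptions about your deployment's conventions.

```
kubectl get pods -l role=master | grep metadata
```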
For the Immuta Query Engine database:
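Again a sketch, with an assumed label and name filter.

```
kubectl get pods -l role=master | grep query-engine
```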
Make note of the pod name, which will be used when connecting:
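Connecting is then a standard kubectl exec into that pod; the pod name comes from the previous step.

```
kubectl exec -it <postgres-master-pod-name> -- /bin/bash
```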
The following steps should be executed from within this context.
Once inside the database container, you can run patronictl to modify the configuration.
A few environment variables must be exported for the patronictl edit-config command to run successfully:
The following command will then open up the PostgreSQL configuration in vi for editing.
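A sketch of that command; depending on your Patroni configuration, the cluster name argument may be optional.

```
patronictl edit-config <cluster-name>
```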
Optional Flags: edit-config can be used with the following flags:
-q, --quiet: Do not show changes.
-s, --set: With additional text, this parameter will set a specific configuration value. Can be specified multiple times.
-p, --pg: With additional text, this parameter will set the specific PostgreSQL parameter value. Can be specified multiple times.
Make any changes, and then close the vi session by saving the configuration (type <ESC>:wq).
You will be asked if you would like to apply the changes; type y and press Enter.
Now that changes have been made, you can push these changes out using patronictl restart.
Get the Patroni cluster name:
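The cluster name appears in the output of patronictl list:

```
patronictl list
```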
The cluster name is the first column in the result. There should be only one unique value. Use this cluster name in the call to patronictl restart.
If you have modified the value of max_connections, then you should use the following command to restart the master instance only; the changes will propagate to the replicas automatically:
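A sketch of that command; the master member name is the pod/member identified earlier.

```
patronictl restart <cluster-name> <master-member-name>
```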
If you have not modified the value of max_connections, you can simply run the following command:
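A sketch of that command:

```
patronictl restart <cluster-name>
```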