Immuta Performance Optimization on CDH Clusters
Audience: System Administrators
Content Summary: This page describes strategies for improving performance of Immuta's NameNode plugin on CDH clusters.
The Immuta NameNode plugin runs inside a locked operation in the NameNode when it grants or denies permissions based on Immuta policies. This section describes configuration settings and strategies that help prevent RPC queue latency, waiting threads, and other issues during cluster-wide file permission checks.
Isolated HDFS Namespace
Best Practice: NameNode Plugin Configuration
Immuta recommends configuring the NameNode Plugin to check permissions only on the NameNode(s) that oversee the data you want to protect.
For example, say that you have a federated HDFS NameNode architecture with three nameservices (nameservice1, nameservice2, and nameservice3), with your HDFS data distributed across them.
Suppose you know that all the sensitive data you want to protect with Immuta is located under /data3. To achieve optimum performance in this case, add the Immuta NameNode-only configuration (hdfs-site.xml) to the role config group for the nameservice that oversees /data3, and leave it out of the role config groups for the other nameservices, such as nameservice2. The public/client Immuta configuration (core-site.xml) should still be configured cluster-wide. See Immuta CDH Integration Installation for more details about these configuration files.
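To make the selection step concrete, here is a minimal Python sketch that maps the paths you want to protect to the nameservices that own them. It assumes a mount table like the one a ViewFs client uses, represented here as a plain dictionary. Only /user under nameservice1 comes from this page; mapping /data1 to nameservice2 and /data3 to nameservice3 is hypothetical. The helper names (is_under, owning_nameservice, nameservices_needing_plugin) are illustrative and are not part of Immuta or Cloudera Manager.

```python
from typing import Dict, Iterable, Set


def is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies below it, matching whole components."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


def owning_nameservice(path: str, mount_table: Dict[str, str]) -> str:
    """Return the nameservice mounted at the longest prefix of path."""
    matches = [mount for mount in mount_table if is_under(path, mount)]
    if not matches:
        raise ValueError(f"no mount point covers {path}")
    return mount_table[max(matches, key=len)]


def nameservices_needing_plugin(protected_paths: Iterable[str],
                                mount_table: Dict[str, str]) -> Set[str]:
    """Nameservices whose role config group should get the NameNode-only config."""
    return {owning_nameservice(p, mount_table) for p in protected_paths}


# Hypothetical mount table: only /user -> nameservice1 is stated on this page.
MOUNT_TABLE = {
    "/user": "nameservice1",
    "/data1": "nameservice2",
    "/data3": "nameservice3",
}

if __name__ == "__main__":
    print(sorted(nameservices_needing_plugin(["/data3"], MOUNT_TABLE)))
    # ['nameservice3']
    print(sorted(nameservices_needing_plugin(["/data3", "/user/immuta"], MOUNT_TABLE)))
    # ['nameservice1', 'nameservice3']
```

The second call adds the default Vulcan location /user/immuta and pulls nameservice1 back into the result, which is the caveat discussed in the next paragraph. Matching on whole path components keeps a path such as /data30 from being treated as part of /data3.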
One caveat to consider is that Immuta's Vulcan service requires the Immuta NameNode Plugin to oversee user credentials, which are stored under /user by default. Vulcan also stores some configuration under /user/immuta by default.
This is a problem because /user resides under nameservice1, and the goal is to operate the Immuta NameNode Plugin only on the nameservice that oversees /data3.
A simple solution is to create a new directory for these credentials, such as /data3/immuta_creds, and configure the NameNode Plugin and the Vulcan service to use this directory instead of /user. This change requires the configuration modifications listed below.
HDFS - Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
Immuta - Immuta Spark 2 Vulcan Server Advanced Configuration Snippet (Safety Valve) for session/generator.xml
Note that you will need to manually create the /data3/immuta_creds/immuta directory and set its permissions so that only the immuta user can read and write in it. The /data3/immuta_creds directory itself should be world writable so that each user's directory can be created the first time that user interacts with Immuta.
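The directory setup above could be scripted roughly as follows. This is a hedged Python sketch that only builds, and optionally runs, standard `hdfs dfs` commands (-mkdir -p, -chown, -chmod). It assumes it is run as the HDFS superuser (usually the hdfs user on CDH, with a valid Kerberos ticket on secure clusters) and that /data3 resolves to the intended nameservice from the client's default filesystem; if it does not, pass a fully qualified hdfs:// URI instead. The page only asks for the parent directory to be world writable; the sticky bit in mode 1777 is an added choice. The function names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def setup_commands(creds_root: str = "/data3/immuta_creds",
                   service_user: str = "immuta") -> List[List[str]]:
    """Build hdfs dfs commands for the credentials layout described on this page."""
    # As in the page's example, the service directory is named after the service user.
    user_dir = f"{creds_root}/{service_user}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", user_dir],
        # Only the immuta user may read or write its own directory.
        ["hdfs", "dfs", "-chown", service_user, user_dir],
        ["hdfs", "dfs", "-chmod", "700", user_dir],
        # World writable so per-user directories can be created on first use;
        # the leading 1 (sticky bit) is an added choice that stops users from
        # deleting or renaming each other's directories.
        ["hdfs", "dfs", "-chmod", "1777", creds_root],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Review the output first, then rerun as the HDFS superuser with dry_run=False.
    run(setup_commands(), dry_run=True)
```

Keeping the dry run as the default makes it easy to review the exact commands before changing permissions on a production cluster.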
Essential Performance Tuning Settings
- Description: A comma-delimited list of paths to enforce when checking permissions on HDFS files. This ensures that API calls to the Immuta web service are made only when permissions are checked on the paths that you specify in this configuration. It also means that you can only create data sources against data that lives under these paths, and the Immuta Workspace must be under one of these paths as well. Alternatively, immuta.permission.paths.to.ignore can be set to a list of paths that you know do not contain Immuta data; API calls will then never be made against those paths. Setting both immuta.permission.paths.to.enforce and immuta.permission.paths.to.ignore at the same time is unsupported.
- Description: A comma-delimited list of groups that must go through Immuta when checking permissions on HDFS files. If this configuration item is set, fallback authorizations apply to everyone by default unless they are in a group on this list. If a user is on both the enforce list and the ignore list, their permissions are checked with Immuta (i.e., the enforce configuration item takes precedence). This may improve NameNode performance by making permission check API calls only for the subset of users who fall under Immuta enforcement.
- Description: Denotes whether a background thread should be started to periodically cache paths from Immuta that represent Immuta-protected paths in HDFS. Enabling this increases NameNode performance because it prevents the NameNode plugin from calling the Immuta web service for paths that do not back HDFS data sources. For performance optimization, it is best to enable this cache to act as a "backup" to
- Description: The time between calls to sync/cache all paths that back Immuta data sources in HDFS. You can increase this value to further reduce the number of API calls made from the NameNode.
- Description: This configuration item can be set so that the NameNode does not have to retrieve the Immuta HDFS workspace base path periodically from the Immuta API.
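To show how the settings above might interact, here is a minimal Python model of the decision made before the plugin calls the Immuta web service. It is a sketch under stated assumptions, not Immuta's implementation: only immuta.permission.paths.to.enforce and immuta.permission.paths.to.ignore are real property names (mirrored by the paths_to_enforce and paths_to_ignore fields). The groups_to_enforce and groups_to_ignore fields, the ProtectedPathCache class, its fetch_paths callback, and the order of the checks are all hypothetical. Path matching is assumed to be component-aware, so /data30 does not match /data3.

```python
import threading
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional


def is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies below it, matching whole components."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


@dataclass(frozen=True)
class PluginSettings:
    # Mirror immuta.permission.paths.to.enforce / immuta.permission.paths.to.ignore.
    paths_to_enforce: Optional[FrozenSet[str]] = None
    paths_to_ignore: Optional[FrozenSet[str]] = None
    # Hypothetical stand-ins for the group enforce/ignore lists.
    groups_to_enforce: Optional[FrozenSet[str]] = None
    groups_to_ignore: Optional[FrozenSet[str]] = None

    def __post_init__(self) -> None:
        if self.paths_to_enforce is not None and self.paths_to_ignore is not None:
            raise ValueError("setting both path enforce and ignore lists is unsupported")


class ProtectedPathCache:
    """Hypothetical cache of the HDFS paths that back data sources."""

    def __init__(self, fetch_paths: Callable[[], Iterable[str]], interval_s: float):
        self._fetch_paths = fetch_paths
        self._interval_s = interval_s
        self._paths: FrozenSet[str] = frozenset()
        self._stop = threading.Event()

    def refresh(self) -> None:
        # Rebind to a new immutable snapshot so readers never see a half-built set.
        self._paths = frozenset(self._fetch_paths())

    def start(self) -> threading.Thread:
        self.refresh()

        def loop() -> None:
            while not self._stop.wait(self._interval_s):
                try:
                    self.refresh()
                except Exception:
                    pass  # keep serving the last good snapshot

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()

    def covers(self, path: str) -> bool:
        return any(is_under(path, p) for p in self._paths)


def needs_immuta_call(path: str, user_groups: Iterable[str], settings: PluginSettings,
                      cache: Optional[ProtectedPathCache] = None) -> bool:
    """Return True if this permission check should go to the Immuta web service."""
    groups = set(user_groups)
    # Path filters: call only under enforced paths, or never under ignored paths.
    if settings.paths_to_enforce is not None and not any(
            is_under(path, p) for p in settings.paths_to_enforce):
        return False
    if settings.paths_to_ignore is not None and any(
            is_under(path, p) for p in settings.paths_to_ignore):
        return False
    # Group filters: with an enforce list, everyone outside it gets fallback
    # authorization, and membership in it wins over membership in the ignore list.
    if settings.groups_to_enforce is not None:
        if not groups & settings.groups_to_enforce:
            return False
    elif settings.groups_to_ignore is not None and groups & settings.groups_to_ignore:
        return False
    # Cache: skip paths that do not back any HDFS data source.
    if cache is not None and not cache.covers(path):
        return False
    return True


if __name__ == "__main__":
    settings = PluginSettings(paths_to_enforce=frozenset({"/data3"}),
                              groups_to_enforce=frozenset({"analysts"}))
    cache = ProtectedPathCache(lambda: ["/data3/sales"], interval_s=300)
    cache.refresh()
    print(needs_immuta_call("/data3/sales/part-0", ["analysts"], settings, cache))  # True
    print(needs_immuta_call("/data3/tmp/x", ["analysts"], settings, cache))  # False
    print(needs_immuta_call("/data3/sales/part-0", ["etl"], settings, cache))  # False
```

In this sketch, interval_s plays the role of the sync/cache interval: at 300 seconds the cache refreshes 288 times per day, and at 3600 seconds only 24 times, so a larger value means fewer API calls at the cost of a staler snapshot.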
Advanced Cache and Network Settings
There are also a wide variety of cache and network settings that can be used to fine-tune performance. You can refer to the Configuration Guide for details on each of these items.
Debugging Suspected Performance Issues
See Immuta Log Analysis Tool for CDH Deployments for instructions on how to identify performance issues in the Immuta NameNode Plugin.