
Immuta HDFS Plugin Installation

Audience: System Administrators

Content Summary: The Immuta HDFS plugin installation consists of two main components:

  • Immuta INode Attribute Provider
  • Immuta Hadoop Filesystem

This page describes how to install these components on a Hadoop cluster.

Installation Prerequisites

Before proceeding with installation, generate an Immuta System API key so that the NameNode can communicate securely with the Immuta Web Service. To do so, run the following command. You do not need to keep a separate copy of this key, but it must be written to the Hadoop configuration files and to the configuration of every Immuta Web Service instance.

The following reads random bytes from /dev/urandom, taking the first 30 alphanumeric characters:

cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 30 | head -n 1

The key that is generated will be referred to as HDFS_SYSTEM_TOKEN throughout this guide.
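If you would rather capture the key directly in a shell variable, the following POSIX shell sketch does that. It reads a fixed 512-byte block of /dev/urandom per pass instead of using the fold/head pipeline above, which is an illustrative choice rather than Immuta tooling. The helper names generate_hdfs_system_token and is_valid_token are hypothetical.

```shell
#!/bin/sh
# Sketch: generate a 30-character alphanumeric key into HDFS_SYSTEM_TOKEN.
# generate_hdfs_system_token and is_valid_token are hypothetical helpers.

generate_hdfs_system_token() {
  local length="${1:-30}" token=""
  # Keep reading until enough characters survive the filter.
  while [ "${#token}" -lt "$length" ]; do
    # Read 512 random bytes and keep only [a-zA-Z0-9]; LC_ALL=C makes tr
    # treat the input as raw bytes whatever the current locale is.
    token="${token}$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'a-zA-Z0-9')"
  done
  # Trim to exactly the requested length.
  printf '%s\n' "$token" | cut -c "1-${length}"
}

# Succeeds only for exactly 30 alphanumeric characters.
is_valid_token() {
  case "$1" in
    *[!A-Za-z0-9]*) return 1 ;;   # reject any non-alphanumeric character
  esac
  [ "${#1}" -eq 30 ]
}

HDFS_SYSTEM_TOKEN="$(generate_hdfs_system_token 30)"
if is_valid_token "$HDFS_SYSTEM_TOKEN"; then
  # Report only the length so the key itself is not echoed to the terminal.
  echo "Generated HDFS_SYSTEM_TOKEN (${#HDFS_SYSTEM_TOKEN} characters)"
else
  echo "Token generation failed" >&2
fi
```

Source the script (for example `. ./generate-token.sh`, a hypothetical file name) so that HDFS_SYSTEM_TOKEN stays set in the shell you use for the configuration steps.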

Installation

The two main components that will need to be installed on the Hadoop cluster are an Immuta INode Attribute Provider and the Immuta Hadoop Filesystem.

Installation consists of placing the jar files on the Hadoop classpath. This can be accomplished in a number of ways. For the purposes of this document, it is assumed that the jar files will be copied to a directory on the host and that the HADOOP_CLASSPATH variable will be updated to include these jars. Updating the HADOOP_CLASSPATH variable is covered in the Configuration section.

Setup

Each node in the cluster needs the Immuta jar files added to the Hadoop classpath. The jars can be installed to an existing directory or to a new directory, such as /opt/immuta/hadoop.

mkdir -p /opt/immuta/hadoop

If a new directory is created, the hadoop user needs read and execute (traverse) access to the directory and read access to the files it contains. The classpath also needs to be updated by setting HADOOP_CLASSPATH in ${HADOOP_CONF_DIR}/hadoop-env.sh.
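The directory setup and permissions described above can be sketched in shell as follows. This is a minimal sketch, assuming the jars are already on the host and the script runs as root (or another user that can write to the target directory). install_immuta_jars is a hypothetical helper, and the 0755/0644 modes are one way, not the only way, to give the hadoop user read access.

```shell
#!/bin/sh
# Sketch: create the Immuta jar directory and copy jars into it so that the
# hadoop user can read them. install_immuta_jars is a hypothetical helper.

install_immuta_jars() {
  local dest="$1" jar
  shift
  # 0755: owner rwx; group and others r-x (read + traverse the directory).
  install -d -m 0755 "$dest" || return 1
  for jar in "$@"; do
    # 0644: owner rw; group and others read-only.
    install -m 0644 "$jar" "$dest/" || return 1
  done
}
```

For example, as root: `install_immuta_jars /opt/immuta/hadoop immuta-hadoop-filesystem-2020.2.8.jar immuta-inode-attribute-provider-2020.2.8.jar`.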

Immuta INode Attribute Provider (all NameNodes)

The Immuta INode Attribute Provider will be installed on all NameNodes.

Place the Immuta INode Attribute Provider jar in the installation directory, in this case /opt/immuta/hadoop/, on each NameNode.

Immuta Hadoop Filesystem (all nodes)

The Immuta Hadoop Filesystem jar needs to be installed on all nodes in the Hadoop cluster. To install the jar, place it in the installation directory, /opt/immuta/hadoop/.
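Because the same jar must land on every node, a loop over a host list is a common approach. The sketch below assumes SSH key-based access as a user that can write to the target directory, a hypothetical newline-separated host file, and a hypothetical DRY_RUN switch that prints the commands instead of running them. distribute_jars is likewise hypothetical, not part of Immuta or Hadoop.

```shell
#!/bin/sh
# Sketch: copy jars to the same directory on every host in a list.
# distribute_jars, the host file, and DRY_RUN are hypothetical.

distribute_jars() {
  local hosts_file="$1" dest="$2" host
  shift 2
  # "|| [ -n "$host" ]" also handles a last line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines in the host list.
    case "$host" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host mkdir -p $dest"
      echo "scp $* $host:$dest/"
    else
      # </dev/null stops ssh and scp from consuming the host list on stdin.
      ssh "$host" mkdir -p "$dest" </dev/null &&
        scp "$@" "$host:$dest/" </dev/null || return 1
    fi
  done < "$hosts_file"
}
```

For example, with DRY_RUN=1 set, `distribute_jars all-nodes.txt /opt/immuta/hadoop immuta-hadoop-filesystem-2020.2.8.jar` previews the copy to every node; running it again with a NameNode-only host list and the INode Attribute Provider jar covers the NameNodes.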

Configuration

Configuring the Hadoop cluster to enable the Immuta INode Attribute Provider and Hadoop filesystem access requires a few steps.

Once these changes are persisted to the Hadoop configuration, the cluster services must be restarted.

Setting up the Classpath

The classpath can be updated by setting HADOOP_CLASSPATH in ${HADOOP_CONF_DIR}/hadoop-env.sh.

For example:

export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:/opt/immuta/hadoop/immuta-hadoop-filesystem-2020.2.8.jar:/opt/immuta/hadoop/immuta-inode-attribute-provider-2020.2.8.jar"
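Once hadoop-env.sh has been updated, you can check that the Immuta jars Hadoop will load actually exist on the host. The sketch below uses a hypothetical helper, check_immuta_classpath: it splits a classpath string on ':' and inspects only entries whose file name starts with immuta-, which is an assumption based on the jar names used in this guide.

```shell
#!/bin/sh
# Sketch: report whether each immuta-*.jar entry in a classpath is readable.
# check_immuta_classpath is hypothetical; the subshell body keeps the IFS and
# noglob changes local to the function.

check_immuta_classpath() (
  set -f    # keep wildcard entries such as /usr/lib/hadoop/* literal
  IFS=':'
  found=0
  missing=0
  for entry in $1; do
    # Only inspect entries whose file name looks like an Immuta jar.
    case "${entry##*/}" in
      immuta-*.jar) ;;
      *) continue ;;
    esac
    found=$((found + 1))
    if [ -r "$entry" ]; then
      printf 'OK       %s\n' "$entry"
    else
      printf 'MISSING  %s\n' "$entry"
      missing=$((missing + 1))
    fi
  done
  # Fail when no Immuta jar is listed or any listed jar is unreadable.
  [ "$found" -gt 0 ] && [ "$missing" -eq 0 ]
)
```

`check_immuta_classpath "$(hadoop classpath)"` checks the classpath that the hadoop command computes. On DataNodes, where only the filesystem jar is installed, an INode Attribute Provider entry will be reported as missing, which is expected there.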

Shared Configuration

The following configuration items should be set for both the NameNode processes and the DataNode processes. They are used by both the Immuta FileSystem and the Immuta NameNode plugin:

  • immuta.base.url: The base URL of the Immuta API.
    • Example: https://immuta.hostname
  • immuta.spark.partition.generator: This configuration item MUST be set to "secure" in order for the ImmutaContext to communicate with the Partition Service rather than attempting to generate partitions itself.
    • Example: secure
  • immuta.secure.partition.generator.hostname: Connection information for the Partition Service. Set this to the node on which you configured the Immuta Partition Service to run; if the service runs on every node, you can use localhost.
    • Example: localhost
  • immuta.partition.tokens.ttl.seconds: This is the TTL for tokens generated by the Partition Service.
    • Example: 3600
  • immuta.yarn.api.host.port: This option must be set to the host and port on which the YARN ResourceManager service is running.
    • Example: http://master:8088
  • immuta.credentials.dir: This directory must contain a subdirectory for each user, named after that user's username. Each subdirectory must be owned by the user and readable only by that user. Immuta writes a credential file there that is readable only by the owning user.
    • Example: /user
  • fs.immuta.impl: The class that implements the immuta file system scheme.
    • Example: com.immuta.hadoop.ImmutaFileSystem
  • hadoop.proxyuser.<immuta service principal>.hosts: The hosts from which the Immuta service principal is allowed to impersonate other users.
    • Example: *
  • hadoop.proxyuser.<immuta service principal>.users: The end users that the Immuta service principal is allowed to impersonate.
    • Example: *
  • hadoop.proxyuser.<immuta service principal>.groups: The groups whose members the Immuta service principal is allowed to impersonate.
    • Example: *

Make sure that user directories under immuta.credentials.dir are readable only by their owners. If a user's directory does not exist, Immuta creates it with its permissions set to 700.
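Creating a user's directory ahead of time with the required ownership and mode can be sketched with standard hdfs dfs commands. This assumes immuta.credentials.dir is /user as in the example below and that the commands run as the HDFS superuser (for example via sudo -u hdfs). prepare_credentials_dir and the HDFS_CMD override used for dry runs are hypothetical.

```shell
#!/bin/sh
# Sketch: create one user's directory under immuta.credentials.dir in HDFS,
# owned by that user with mode 700. prepare_credentials_dir is hypothetical;
# HDFS_CMD (default: hdfs) is a hypothetical override, e.g. HDFS_CMD=echo.

prepare_credentials_dir() {
  local base="$1" user="$2" hdfs="${HDFS_CMD:-hdfs}" dir
  dir="${base%/}/${user}"
  "$hdfs" dfs -mkdir -p "$dir" &&
    "$hdfs" dfs -chown "$user" "$dir" &&
    "$hdfs" dfs -chmod 700 "$dir"
}
```

For example, `prepare_credentials_dir /user alice` creates /user/alice owned by alice with mode 700; `hdfs dfs -ls -d /user/alice` should then show `drwx------`.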

<property>
    <name>immuta.base.url</name>
    <value>https://immuta.hostname</value>
    <final>true</final>
</property>
<property>
    <name>immuta.spark.partition.generator</name>
    <value>secure</value>
    <final>true</final>
</property>
<property>
    <name>immuta.secure.partition.generator.hostname</name>
    <value>localhost</value>
    <final>true</final>
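</property>
<property>
    <name>immuta.partition.tokens.ttl.seconds</name>
    <value>3600</value>
    <final>true</final>
</property>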
<property>
    <name>immuta.yarn.api.host.port</name>
    <value>http://master:8088</value>
    <final>true</final>
</property>
<property>
    <name>immuta.credentials.dir</name>
    <value>/user</value>
    <final>true</final>
</property>
<property>
    <name>fs.immuta.impl</name>
    <value>com.immuta.hadoop.ImmutaFileSystem</value>
    <final>true</final>
</property>
<!-- Replace "immuta" in the next three property names with the Immuta service principal. -->
<property>
    <name>hadoop.proxyuser.immuta.hosts</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.immuta.users</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.immuta.groups</name>
    <value>*</value>
    <final>true</final>
</property>

Note: We recommend that all Immuta configuration values be marked final.
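To catch a missing entry before restarting services, a quick text check over the site file can help. This sketch assumes the one-element-per-line layout used in the example above and is not an XML parser. missing_properties is a hypothetical helper, and the proxyuser properties are left out because their names depend on your service principal.

```shell
#!/bin/sh
# Sketch: report which shared Immuta properties are missing from a Hadoop
# site file. missing_properties is hypothetical; this is a plain-text check
# that assumes each <name> element is written as in the example above.

SHARED_IMMUTA_PROPERTIES="
immuta.base.url
immuta.spark.partition.generator
immuta.secure.partition.generator.hostname
immuta.partition.tokens.ttl.seconds
immuta.yarn.api.host.port
immuta.credentials.dir
fs.immuta.impl
"

missing_properties() {
  local file="$1" prop
  for prop in $SHARED_IMMUTA_PROPERTIES; do
    # -F treats the dots in property names literally.
    grep -qF "<name>${prop}</name>" "$file" || echo "$prop"
  done
}
```

Run it against the site file that holds these settings on each NameNode and DataNode, for example `missing_properties /etc/hadoop/conf/core-site.xml` (the path depends on your distribution); it prints one line per missing property.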

NameNode-only Configuration

The following settings should only be written to the configuration on the NameNode. Setting these values on DataNodes has security implications, so be sure that they are set in the NameNode-only section of your Hadoop configuration tool.

  • dfs.namenode.inode.attributes.provider.class: Configure Hadoop to use the Immuta Inode Attribute Provider.
    • Example: com.immuta.hadoop.ImmutaInodeAttributeProvider
  • immuta.permission.fallback.class: This class will be used as a fallback authorization/permission checker if Immuta is not protecting the target directory. This will also be used if fallback is explicitly enabled. If the deployment also requires Sentry, this should be set to org.apache.sentry.hdfs.SentryINodeAttributesProvider.
    • Example: org.apache.hadoop.hdfs.server.namenode.DefaultINodeAttributesProvider
  • immuta.permission.allow.fallback: Set to true if a user's access should be determined by the permission fallback class even if they are explicitly denied access by Immuta. This setting is dangerous: a user who is forbidden from seeing data through Immuta may still be able to see that data in HDFS.
    • Example: false
  • immuta.system.api.key: This must be set to the value of the hdfsSystemToken configuration item in Immuta. This API key is used to create user API keys in Immuta, so it is important that it can be trusted and cannot be accessed by users. This must be set when using the Immuta FileSystem. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
    • Example: mYIUy6REcWrnW1mtVjZpuZiyyRFVj3
  • immuta.permission.users.to.ignore: Comma-separated list of HDFS user accounts that bypass the Immuta authorization provider. Replace the last user in the example, immuta, with the principal used as the Immuta system user. This should match the principal in the username setting described below under Immuta Web Service configuration.
    • Example: hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta

<property>
    <name>dfs.namenode.inode.attributes.provider.class</name>
    <value>com.immuta.hadoop.ImmutaInodeAttributeProvider</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.fallback.class</name>
    <value>org.apache.hadoop.hdfs.server.namenode.DefaultINodeAttributesProvider</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.allow.fallback</name>
    <value>false</value>
    <final>true</final>
</property>
<property>
    <name>immuta.system.api.key</name>
    <value>mYIUy6REcWrnW1mtVjZpuZiyyRFVj3</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.users.to.ignore</name>
    <value>hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta</value>
    <final>true</final>
</property>

Note: We recommend that all Immuta configuration values be marked final.
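Because these settings must stay off the DataNodes, a simple check on each DataNode's configuration can confirm that none of them ended up there. As with the other checks, this is a plain-text sketch that assumes the XML layout used in this guide; check_datanode_config is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that a DataNode's Hadoop site file contains none of the
# NameNode-only properties. check_datanode_config is hypothetical and does a
# plain-text match on one <name> element per property.

NAMENODE_ONLY_PROPERTIES="
dfs.namenode.inode.attributes.provider.class
immuta.permission.fallback.class
immuta.permission.allow.fallback
immuta.system.api.key
immuta.permission.users.to.ignore
"

check_datanode_config() {
  local file="$1" prop leaked=0
  for prop in $NAMENODE_ONLY_PROPERTIES; do
    if grep -qF "<name>${prop}</name>" "$file"; then
      echo "NameNode-only property found: $prop"
      leaked=1
    fi
  done
  # Non-zero exit status when anything NameNode-only is present.
  return "$leaked"
}
```

For example, run `check_datanode_config /etc/hadoop/conf/hdfs-site.xml` on each DataNode; a non-zero exit status means at least one NameNode-only property was found.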

Enabling TLS for the Immuta Partition Service

You can enable TLS on the Immuta Partition Service by configuring it to use a keystore in JKS format.

These settings should be set in the Hadoop configuration file core-site.xml.

  • immuta.secure.partition.generator.keystore: The path to the Immuta Partition Service keystore.
    • Example: /etc/immuta/keystore.jks
  • immuta.secure.partition.generator.keystore.password: The password for the Immuta Partition Service keystore. Note that this value will be publicly readable, so use file permissions to make sure that only the user running the service can read the keystore file.
    • Example: secure_password
  • immuta.secure.partition.generator.keymanager.password: The KeyManager password for the Immuta Partition Service keystore. As with the keystore password, this value will be publicly readable, so use file permissions to make sure that only the user running the service can read the keystore file. This setting is not always necessary.
    • Example: secure_password

As noted above, the keystore password currently must be set in core-site.xml, which is readable by all users. We recommend using file permissions to protect the keystore from improper access.

chown immuta:immuta /etc/immuta/keystore.jks
chmod 600 /etc/immuta/keystore.jks
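To confirm that the chown and chmod above took effect, the keystore's owner and mode can be compared against the expected values. This sketch assumes a Linux host with GNU stat and that the Partition Service runs as the immuta user shown above; check_keystore_permissions is hypothetical.

```shell
#!/bin/sh
# Sketch: verify keystore ownership and mode 600.
# check_keystore_permissions is hypothetical; stat -c is GNU coreutils.

check_keystore_permissions() {
  local keystore="$1" owner="$2" mode actual_owner
  if [ ! -f "$keystore" ]; then
    echo "Keystore not found: $keystore"
    return 1
  fi
  mode="$(stat -c '%a' "$keystore")"
  actual_owner="$(stat -c '%U' "$keystore")"
  if [ "$mode" != "600" ]; then
    echo "Mode is $mode, expected 600"
    return 1
  fi
  if [ "$actual_owner" != "$owner" ]; then
    echo "Owner is $actual_owner, expected $owner"
    return 1
  fi
  echo "Keystore permissions look correct"
}
```

For example: `check_keystore_permissions /etc/immuta/keystore.jks immuta`.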

Example configuration:

<property>
    <name>immuta.secure.partition.generator.keystore</name>
    <value>/etc/immuta/keystore.jks</value>
    <final>true</final>
</property>
<property>
    <name>immuta.secure.partition.generator.keystore.password</name>
    <value>secure_password</value>
    <final>true</final>
</property>
<property>
    <name>immuta.secure.partition.generator.keymanager.password</name>
    <value>secure_password</value>
    <final>true</final>
</property>

Note: We recommend that all Immuta configuration values be marked final.

Impala Configuration

You must grant the service principal used by the Immuta Web Service permission to delegate in Impala. To do this, add the Immuta Web Service principal to authorized_proxy_user_config in the Impala daemon command-line arguments:

-authorized_proxy_user_config=<immuta web service principal>=*

Note: If the authorized_proxy_user_config parameter is already present for other services, append the Immuta configuration value to the end.

-authorized_proxy_user_config=hue=*;<immuta web service principal>=*
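Appending to an existing value by hand is easy to get wrong, so the rule above can be expressed as a small helper. This is a sketch assuming entries are separated by ';' as in the example; append_proxy_user is hypothetical, and immuta in the usage below stands in for your Immuta Web Service principal.

```shell
#!/bin/sh
# Sketch: append "<principal>=*" to an authorized_proxy_user_config value,
# leaving it unchanged if the principal already has an entry.
# append_proxy_user is a hypothetical helper.

append_proxy_user() {
  local current="$1" principal="$2"
  # Wrap in ";" so the first and last entries are matched the same way.
  case ";${current};" in
    *";${principal}="*)
      printf '%s\n' "$current"
      return 0
      ;;
  esac
  if [ -z "$current" ]; then
    printf '%s=*\n' "$principal"
  else
    printf '%s;%s=*\n' "$current" "$principal"
  fi
}
```

For example, `append_proxy_user 'hue=*' immuta` prints `hue=*;immuta=*`. Quote the result if you pass it through a shell, since `;` and `*` are special characters there.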

Spark Configuration

Note: Enabling Immuta's Spark Access Pattern in spark-defaults.conf will prevent Spark-based tools, such as Hive on Spark, from functioning properly. Skip this step if you use such tools.

Otherwise, configure the following in spark-conf/spark-defaults.conf:

spark.broadcast.factory=org.apache.spark.broadcast.ImmutaSerializableBroadcastFactory
spark.executor.extraClassPath=/opt/immuta/immuta-hadoop-filesystem-2020.2.8.jar:/opt/immuta/immuta-spark-context-2020.2.8.jar
spark.driver.extraClassPath=/opt/immuta/immuta-hadoop-filesystem-2020.2.8.jar:/opt/immuta/immuta-spark-context-2020.2.8.jar
spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=/user/immuta/allowedCallingClasses.json
spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=/user/immuta/allowedCallingClasses.json
spark.hadoop.fs.hdfs.impl=com.immuta.hadoop.ImmutaSparkTokenFileSystem

In spark-conf/spark-env.sh, configure:

export PYTHONPATH="/opt/immuta/python${PYTHONPATH:+:${PYTHONPATH}}"
export PYTHONSTARTUP=/opt/immuta/python/initialize.py
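A wrong path in spark.driver.extraClassPath or spark.executor.extraClassPath may only surface when a job starts, so a pre-flight check on each Spark host can be useful. The sketch below assumes `key=value` or `key value` lines in spark-defaults.conf and only approximates how Spark reads that file; spark_conf_get and check_spark_classpath are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: report extraClassPath jars from spark-defaults.conf that are not
# readable on this host. spark_conf_get and check_spark_classpath are
# hypothetical; this is a text check, not Spark's own parser.

spark_conf_get() {
  # Print the value of the last "key=value" or "key value" line for $2.
  awk -v k="$2" '
    /^[ \t]*#/ { next }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, k) != 1) next
      rest = substr(line, length(k) + 1)
      # The key must be followed by "=" or whitespace.
      if (rest !~ /^([ \t]|=)/) next
      sub(/^[ \t]*=?[ \t]*/, "", rest)
      sub(/[ \t]+$/, "", rest)
      val = rest
    }
    END { if (val != "") print val }
  ' "$1"
}

check_spark_classpath() (
  set -f    # keep any wildcard entries literal while splitting
  status=0
  for key in spark.driver.extraClassPath spark.executor.extraClassPath; do
    classpath="$(spark_conf_get "$1" "$key")"
    if [ -z "$classpath" ]; then
      echo "Not set: $key"
      status=1
      continue
    fi
    IFS=':'
    for entry in $classpath; do
      [ -z "$entry" ] && continue
      if [ ! -r "$entry" ]; then
        echo "Missing ($key): $entry"
        status=1
      fi
    done
    unset IFS
  done
  exit "$status"
)
```

For example, `check_spark_classpath spark-conf/spark-defaults.conf` lists any configured driver or executor jar that is not readable on the host.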

Immuta Web Service configuration

The Immuta Web Service needs to be updated to support the HDFS plugin. Update /etc/immuta/config.yml on the Web Service nodes with the following values.

  • hdfsSystemToken: The token used by the NameNode plugin to authenticate with the Immuta REST API. This must equal the value set in immuta.system.api.key. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
    • Example: mYIUy6REcWrnW1mtVjZpuZiyyRFVj3
  • kerberos
    • ticketRefreshInterval: Time in milliseconds to wait between kinit executions. This should be lower than the ticket lifetime enforced by the Kerberos server.
      • Example: 43200000
    • username: User principal used for kinit.
      • Example: immuta
    • keyTabPath: The path to the keytab file on disk to be used for kinit
      • Example: /etc/immuta/immuta.keytab
    • krbConfigPath: The path to the krb5 configuration file on disk.
      • Example: /etc/krb5.conf
    • krbBinPath: The path to the Kerberos installation binary directory.
      • Example: /usr/bin/
  • client
    • kerberosRealm: The default realm to use for Kerberos authentication.
      • Example: YOURCOMPANY.COM

...
client:
    kerberosRealm: YOURCOMPANY.COM
plugins:
    ...
    hdfsHandler:
        ...
        hdfsSystemToken: mYIUy6REcWrnW1mtVjZpuZiyyRFVj3
kerberos:
    ticketRefreshInterval: 43200000
    username: immuta
    keyTabPath: /etc/immuta/immuta.keytab
    krbConfigPath: /etc/krb5.conf
    krbBinPath: /usr/bin/
    ...
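The ticketRefreshInterval guidance above comes down to simple arithmetic: 43200000 milliseconds is 12 hours, which must be shorter than the ticket lifetime your Kerberos server allows. The sketch below assumes a 24-hour lifetime purely for illustration; refresh_interval_ok is hypothetical, and your actual lifetime comes from your KDC and krb5.conf settings.

```shell
#!/bin/sh
# Sketch: compare ticketRefreshInterval (ms) with a ticket lifetime (hours).
# refresh_interval_ok is hypothetical; the 24-hour lifetime is an example.

refresh_interval_ok() {
  local interval_ms="$1" lifetime_hours="$2"
  # The refresh must happen strictly before the ticket expires.
  [ "$interval_ms" -lt $((lifetime_hours * 3600 * 1000)) ]
}

interval_ms=43200000
echo "ticketRefreshInterval = $((interval_ms / 3600000)) hours"
if refresh_interval_ok "$interval_ms" 24; then
  echo "OK: refresh happens before a 24-hour ticket expires"
else
  echo "Lower ticketRefreshInterval below the ticket lifetime" >&2
fi
```

To check the keytab itself, `klist -kt /etc/immuta/immuta.keytab` lists the principals it contains, and `kinit -kt /etc/immuta/immuta.keytab immuta` should obtain a ticket without prompting for a password.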

Also make sure that the /etc/krb5.conf configuration on the Immuta Web Service nodes is accurate for your Kerberos environment.

The Web Service must be restarted after making these changes.
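Because hdfsSystemToken must equal immuta.system.api.key, a mismatch between the two files is an easy mistake to check for. The sketch below assumes both files are available on one host and follow the layouts shown in this guide (one XML element per line, `key: value` YAML lines). It uses text matching rather than real XML or YAML parsers, its helper names are hypothetical, and it prints only match or MISMATCH, never the key.

```shell
#!/bin/sh
# Sketch: compare immuta.system.api.key (Hadoop site XML) with
# hdfsSystemToken (Immuta config.yml). xml_property_value, yaml_scalar and
# check_system_token_match are hypothetical helpers using text matching.

xml_property_value() {
  # Print the <value> that follows the matching <name> element.
  awk -v n="<name>$2</name>" '
    index($0, n) { found = 1; next }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

yaml_scalar() {
  # Print the value of the first "key: value" line, trimming whitespace.
  awk -v k="$2:" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, k) == 1) {
        v = substr(line, length(k) + 1)
        sub(/^[ \t]+/, "", v)
        sub(/[ \t]+$/, "", v)
        print v
        exit
      }
    }
  ' "$1"
}

check_system_token_match() {
  local key token
  key="$(xml_property_value "$1" immuta.system.api.key)"
  token="$(yaml_scalar "$2" hdfsSystemToken)"
  if [ -z "$key" ] || [ -z "$token" ]; then
    echo "Could not read one of the values"
    return 1
  fi
  if [ "$key" = "$token" ]; then
    echo "match"
  else
    echo "MISMATCH"
    return 1
  fi
}
```

For example, `check_system_token_match /etc/hadoop/conf/hdfs-site.xml /etc/immuta/config.yml`; both paths are examples and depend on your installation.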