Legacy Integrations

This section includes how-to guides for Immuta's legacy integrations.

Securing Hive and Impala without Sentry

Immuta offers both fine- and coarse-grained protection for Hive and Impala tables. However, additional protections are required to ensure that users cannot gain unauthorized access to data by connecting to Hive or Impala directly. Cloudera recommends using the Sentry service to secure access to Hive and Impala. As an alternative, this guide details steps that CDH cluster administrators can take to lock down Hive and Impala access without running the Sentry service.

Enabling ImmutaGroupsMapping

Hadoop has the concept of a Group Mapping Service, which is a way to retrieve groups corresponding to a provided user/Kerberos principal. By default, Hadoop services (HDFS, Hive, Impala, etc.) retrieve a user's groups from the local OS by way of the ShellBasedUnixGroupsMapping class. This guide shows you how to enrich this data to include the user's current project in Immuta.

Securing Hive and Impala Without Sentry

Each section in this guide is a required step to ensure that access to Hive and Impala is secured.

Restricting Access to Hive

After installing Immuta on your cluster, users will still be able to connect to Hive via the hive shell, beeline, or JDBC/ODBC connections. To prevent users from circumventing Immuta and gaining unauthorized access to data, you can leverage HDFS access control lists (ACLs) without running Sentry.

Enable HDFS Access Control Lists in Cloudera Manager

See the official Cloudera Documentation to complete this step.

Enable Hive Impersonation in Cloudera Manager

To leverage ACLs to secure Hive, Hive impersonation must be enabled. To enable Hive impersonation in Cloudera Manager, set both hive.server2.enable.impersonation and hive.server2.enable.doAs to true in the Hive service configuration.

Configure Access Control Lists

Group in this context refers to Linux groups, not Sentry groups.

For each HDFS location that stores Hive data, you must configure ACLs that restrict access to the hive and impala users and to data owners who belong to a particular group. You can accomplish this by running the commands below.

hadoop fs -setfacl -m other::--- /user/hive/warehouse
hadoop fs -setfacl -m user::rwx /user/hive/warehouse
hadoop fs -setfacl -m group::rwx /user/hive/warehouse
hadoop fs -setfacl -m group:hive:rwx /user/hive/warehouse
hadoop fs -setfacl -m group:examplegroup:rwx /user/hive/warehouse

In this example, members of the hive and examplegroup groups are allowed to select from and insert into tables in Hive. Note that the hive group contains only the hive and impala users, while examplegroup contains the privileged users who would be considered potential data owners in Immuta.

By default, Hive stores data in HDFS under /user/hive/warehouse. However, you can change this directory in the above example if you are using a different data storage location on your cluster.
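If Hive data lives in more than one HDFS location, the same five ACL entries have to be repeated for each path. The sketch below only generates the command strings shown above so they can be reviewed before being run; the helper name build_acl_commands and the second path /data/hive/external are hypothetical and not part of Immuta or Hadoop.

```python
# Hypothetical helper: builds the same `hadoop fs -setfacl` commands shown
# above for several HDFS locations. It only returns strings; it runs nothing.


def build_acl_commands(paths, owner_groups):
    """Return setfacl commands restricting each path to hive plus owner_groups."""
    commands = []
    for path in paths:
        # Base entries: no access for others, full access for owner and owning group.
        entries = ["other::---", "user::rwx", "group::rwx", "group:hive:rwx"]
        # One named-group entry per Linux group of potential data owners.
        entries += [f"group:{group}:rwx" for group in owner_groups]
        commands += [f"hadoop fs -setfacl -m {entry} {path}" for entry in entries]
    return commands


if __name__ == "__main__":
    # /data/hive/external is an assumed second location, used only for illustration.
    for command in build_acl_commands(
        ["/user/hive/warehouse", "/data/hive/external"], ["examplegroup"]
    ):
        print(command)
```

Review the printed commands and run them as a user with permission to modify ACLs on those paths.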

Restricting Access to Impala

After installing Immuta on your cluster, users will still be able to connect to Impala via impala-shell or JDBC/ODBC connections. To prevent users from circumventing Immuta and gaining unauthorized access to data, you can leverage policy configuration files for Impala without running Sentry.

Create Policy Configuration File

Group in this context refers to Linux groups, not Sentry groups.

The policy configuration file that will drive Impala's security must be in .ini format. The example below will grant users in group examplegroup the ability to read and write data in the default database. You can add additional groups and roles that correspond to different databases or tables.

[groups]
examplegroup = example_insert_role, example_select_role

[roles]
example_insert_role = server=server1->db=default->table=*->action=insert
example_select_role = server=server1->db=default->table=*->action=select

This policy configuration file assigns the group called examplegroup to the roles example_insert_role and example_select_role, which grant insert and select (write and read) privileges on all tables in the default database.
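To make the group-to-role-to-privilege relationship concrete, the following sketch parses the example policy file with Python's standard configparser module and lists the privileges each group ends up with. It assumes one privilege per role, as in the example above; the function names are hypothetical, and this is not how Impala itself reads the file.

```python
import configparser

# The example policy file from this guide.
POLICY_TEXT = """\
[groups]
examplegroup = example_insert_role, example_select_role

[roles]
example_insert_role = server=server1->db=default->table=*->action=insert
example_select_role = server=server1->db=default->table=*->action=select
"""


def parse_privilege(privilege):
    """Split 'server=server1->db=default->...' into a dict of scope parts."""
    return dict(part.split("=", 1) for part in privilege.split("->"))


def privileges_by_group(policy_text):
    """Map each group to the parsed privileges of every role assigned to it."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(policy_text)
    result = {}
    for group, role_list in parser["groups"].items():
        roles = [role.strip() for role in role_list.split(",")]
        # Assumption: each role line holds exactly one privilege.
        result[group] = [parse_privilege(parser["roles"][role]) for role in roles]
    return result


if __name__ == "__main__":
    for group, privileges in privileges_by_group(POLICY_TEXT).items():
        for privilege in privileges:
            print(group, privilege)
```

A check like this can catch a role that is referenced in [groups] but never defined in [roles], because the lookup raises a KeyError.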

See the official Impala documentation for a detailed guide on policy configuration files. Note that while the guide mentions Sentry, running the Sentry service is not required to leverage policy configuration files.

Next, place the policy configuration file (we will call it policy.ini) in HDFS. The policy file should be owned by the impala user, and should only be accessible by the impala user. See below for an example.

hadoop fs -copyFromLocal /tmp/policy.ini /user/impala/
hadoop fs -chown impala:impala /user/impala/policy.ini
hadoop fs -chmod o-rwx /user/impala/policy.ini

Configure Impala to use Policy Configuration File

You can configure Impala to leverage your new policy file by navigating to Impala's configuration in Cloudera Manager and modifying Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) with the snippet below.

-server_name=server1
-authorization_policy_file=/user/impala/policy.ini

You must restart the Impala service in Cloudera Manager to implement the policy changes. Note that server_name should correspond to the server that you define in your policy roles. Also note that each key-value pair should be placed on its own line in the configuration snippet.
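Because a mismatch between server_name and the server named in the policy roles is easy to miss, a small pre-flight check can help. The sketch below compares the -server_name flag from the configuration snippet with the server= scope of each role; both function names are hypothetical, and the role strings are assumed to follow the format of the example policy file.

```python
def server_name_from_flags(flags_text):
    """Return the value of -server_name from a one-flag-per-line snippet."""
    for line in flags_text.splitlines():
        key, _, value = line.strip().partition("=")
        if key.lstrip("-") == "server_name":
            return value
    return None


def mismatched_roles(flags_text, roles):
    """Return names of roles whose server= scope differs from -server_name."""
    server_name = server_name_from_flags(flags_text)
    bad = []
    for role, privilege in roles.items():
        # Parse 'server=...->db=...->...' into its key/value parts.
        scope = dict(part.split("=", 1) for part in privilege.split("->"))
        if scope.get("server") != server_name:
            bad.append(role)
    return bad


if __name__ == "__main__":
    flags = "-server_name=server1\n-authorization_policy_file=/user/impala/policy.ini"
    roles = {
        "example_insert_role": "server=server1->db=default->table=*->action=insert",
        "example_select_role": "server=server1->db=default->table=*->action=select",
    }
    print(mismatched_roles(flags, roles))
```

An empty list means every role refers to the same server name as the Impala flag.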

Enabling ImmutaGroupsMapping

Introduction

Hadoop has the concept of a Group Mapping Service, which is a way to retrieve groups corresponding to a provided user/Kerberos principal. By default, Hadoop services (HDFS, Hive, Impala, etc.) will retrieve a user's groups from the local OS by way of the ShellBasedUnixGroupsMapping class.

With Immuta, this data can be enriched to include the user's current project. This can be helpful for systems where current project information could help provide access to data. For example, in Impala it is possible to GRANT access to a database or tables based on a user's membership in an Immuta project group. This way a user could read information from tables only when acting in the target project.

Group Naming

Immuta project group names take the form immuta_project_<project_id>, where project_id is the Immuta project's ID.
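The naming rule can be expressed as a pair of small helpers, for example when scripting grants or filtering group lists. This is a sketch only: the function names are hypothetical, and the project ID is treated as an opaque value because its type is not specified here.

```python
PREFIX = "immuta_project_"


def project_group_name(project_id):
    """Build the group name for an Immuta project ID."""
    return f"{PREFIX}{project_id}"


def project_id_from_group(group):
    """Return the project ID encoded in a group name, or None otherwise."""
    if group.startswith(PREFIX) and len(group) > len(PREFIX):
        return group[len(PREFIX):]
    return None


if __name__ == "__main__":
    print(project_group_name(42))
    print(project_id_from_group("immuta_project_42"))
    print(project_id_from_group("hive"))
```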

Configuration Prerequisites

In Impala, the auth_to_local setting must be enabled so that Kerberos principals are mapped to short usernames and groups can be retrieved from Immuta for the corresponding principal. For example, Impala should map bob/my.host.name@REALM to bob so that bob can be matched to the corresponding Immuta user account and the current project group (if any) for bob can be determined.
If Immuta HDFS Native Workspaces are being created on the target cluster, the Immuta Partition Service principal must be a Sentry Admin user so that it can CREATE databases and roles for use by Immuta.
If administrators want to allow users to CREATE non-data-source tables in the workspace database, the immuta.workspace.allow.create.table configuration option must be set to true for the Partition Service in generator.xml. In this scenario, it is also recommended that Sentry Object Ownership be enabled and set to ALL, which allows users to DROP their own tables. If Object Ownership is not enabled, users cannot DROP tables, and a Sentry Admin must clean up old tables.
For Hive, the ImmutaGroupsMapping service jar must be added to the classpath for YARN applications by updating the yarn.application.classpath configuration value in yarn-site.xml. In Cloudera Manager, add the value /opt/cloudera/parcels/IMMUTA/lib/immuta-group-mapping.jar under YARN Application Classpath on YARN's configuration page. If you are using a non-standard parcel directory, replace /opt/cloudera/parcels/ with your custom directory.
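As an illustration of the principal-to-short-name mapping described in the first prerequisite, the sketch below strips the optional host and realm parts from a Kerberos principal. It is a deliberate simplification with a hypothetical function name; real auth_to_local rules are configured in Kerberos and Hadoop and can apply more elaborate rewriting.

```python
def short_name(principal):
    """Strip the optional /host and @REALM parts from a Kerberos principal."""
    # Drop the realm first, then the instance (host) component.
    primary_and_instance = principal.split("@", 1)[0]
    return primary_and_instance.split("/", 1)[0]


if __name__ == "__main__":
    print(short_name("bob/my.host.name@REALM"))
```

The short name is what would then be matched against the Immuta user account.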

Check the Existing Group Mapping Service

Start by checking the existing Group Mapping Service in the configuration item hadoop.security.group.mapping. If this is not found in your configuration, the default is org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback, which uses the native JNI implementation when it is available and falls back to ShellBasedUnixGroupsMapping otherwise.

To use ImmutaGroupsMapping alongside another Group Mapping Service, there is an implementation called org.apache.hadoop.security.CompositeGroupsMapping. This Group Mapping Service takes the results of multiple Group Mapping Services and combines them. If CompositeGroupsMapping is already being used before adding ImmutaGroupsMapping, simply add ImmutaGroupsMapping as another provider in configuration. This will be detailed below.
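The combining behavior can be pictured with a short sketch. It is an illustration of the idea rather than Hadoop's implementation: the providers are stand-in functions with made-up data, and the combined parameter mirrors the hadoop.security.group.mapping.providers.combined setting as it is understood here (when false, the first provider that returns any groups wins).

```python
def composite_groups(user, providers, combined=True):
    """Collect groups for user from each provider, in configured order."""
    groups = []
    for provider in providers:
        for group in provider(user):
            if group not in groups:  # keep each group once
                groups.append(group)
        # Without combining, stop at the first provider that returned groups.
        if groups and not combined:
            break
    return groups


# Stand-in providers with illustrative data only.
def unix_groups(user):
    return {"bob": ["analysts"]}.get(user, [])


def immuta_groups(user):
    return {"bob": ["immuta_project_7"]}.get(user, [])


if __name__ == "__main__":
    print(composite_groups("bob", [unix_groups, immuta_groups]))
    print(composite_groups("bob", [unix_groups, immuta_groups], combined=False))
```

With combining enabled, the user keeps their OS groups and gains the current project group, which is the behavior this guide relies on.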

Configuration to Add

To enable ImmutaGroupsMapping, add the configuration described in this section to the Hadoop XML configuration of every target system where the group mapping should be applied.

To apply it across the cluster, add it to the system-wide core-site.xml file.
To apply it to a single system only (Impala, for instance), add it to an XML file specific to that system.

Best practice

The group mapping service should be applied only to target systems requiring Immuta project groups to be determined for context-aware permissions. This can typically be limited to Hive and/or Impala. In Cloudera Manager this configuration can be added to Impala Daemon Advanced Configuration Snippet (Safety Valve) for core-site.xml and/or Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml.

Configuration Snippet

The following configuration shows the ImmutaGroupsMapping provider being used alongside the JniBasedUnixGroupsMappingWithFallback provider.

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.CompositeGroupsMapping</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.providers</name>
  <value>jni,immuta</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.providers.combined</name>
  <value>true</value>
  <description>Set to true if the results of all mapping services should be concatenated.</description>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.provider.jni</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.provider.immuta</name>
  <value>com.immuta.security.ImmutaGroupsMapping</value>
  <final>true</final>
</property>

Critical

If the group mapping service is being configured for a specific service (i.e., Hive or Impala), it is critical that immuta.system.api.key is also configured for that target service. The ImmutaGroupsMapping provider requires the system API key in order to retrieve user group details. Add something like the following to the properties defined above.

<property>
  <name>immuta.system.api.key</name>
  <value>mYIUy6REcWrnW1mtVjZpuZiyyRFVj3</value>
  <final>true</final>
</property>

Caching Considerations

By default, Hadoop's group services cache the retrieved groups for 5 minutes. This may not be desirable for Immuta deployments using group mapping because switching project contexts would then take up to 5 minutes to have an effect on the cluster. In order to lower this cache time, add the following configuration to the same file as above:

<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>10</value>
  <description>Set to whatever value is most tolerable for the delay in group change for a user.</description>
</property>
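To see why the cache time determines how quickly a project switch takes effect, consider the following sketch of a time-based group cache. It is not Hadoop's cache implementation, which has additional features; the class name, the lookup function, and the injected clock are all hypothetical and exist only to make the delay visible.

```python
class CachedGroups:
    """Minimal time-based cache: groups are re-fetched only after ttl_secs."""

    def __init__(self, lookup, ttl_secs, clock):
        self.lookup = lookup        # function: user -> list of groups
        self.ttl_secs = ttl_secs    # plays the role of the cache.secs setting
        self.clock = clock          # function returning the current time in seconds
        self._cache = {}            # user -> (expiry time, groups)

    def get(self, user):
        now = self.clock()
        entry = self._cache.get(user)
        if entry is not None and now < entry[0]:
            return entry[1]         # still fresh: serve the cached groups
        groups = self.lookup(user)
        self._cache[user] = (now + self.ttl_secs, groups)
        return groups


if __name__ == "__main__":
    current = {"bob": ["immuta_project_1"]}
    now = [0]
    cache = CachedGroups(lambda u: list(current[u]), ttl_secs=10, clock=lambda: now[0])
    print(cache.get("bob"))                  # fetched at t=0
    current["bob"] = ["immuta_project_2"]    # bob switches project
    now[0] = 5
    print(cache.get("bob"))                  # still the cached project at t=5
    now[0] = 10
    print(cache.get("bob"))                  # refreshed once the entry expires
```

A lower cache time shortens the window in which the old project group is still served, at the cost of more frequent group lookups.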