# Enabling ImmutaGroupsMapping

## Introduction

Hadoop has the concept of a [Group Mapping Service](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/GroupsMapping.html), which retrieves the groups corresponding to a provided user or Kerberos principal. By default, Hadoop services (HDFS, Hive, Impala, etc.) retrieve a user's groups from the local OS by way of the `JniBasedUnixGroupsMappingWithFallback` class, which falls back to `ShellBasedUnixGroupsMapping` when the native library is unavailable.

With Immuta, this data can be enriched to include the user's current project, which is useful for systems that grant access to data based on project context. For example, in Impala it is possible to `GRANT` access to a database or tables based on a user's membership in an Immuta project group, so that a user can read those tables only while acting in the target project.

## Group Naming

Immuta project group names take the form `immuta_project_<project_id>`, where `<project_id>` is the Immuta project's ID. For example, a project whose ID is `42` would map to the group `immuta_project_42`.

## Configuration Prerequisites

* In Impala, the `auth_to_local` setting must be enabled so that Kerberos principals are mapped to short usernames; Immuta needs the short username to retrieve groups for the corresponding principal. For example, Impala should map `bob/my.host.name@REALM` to `bob` so that `bob` can be matched to the corresponding Immuta user account and the current project group (if any) for `bob` can be determined.
* If Immuta HDFS workspaces are being created on the target cluster, then the Immuta Partition Service principal needs to be a Sentry Admin user in order to `CREATE` databases and roles for use by Immuta.
* If administrators want to allow users to `CREATE` non-data-source tables in the workspace database, the `immuta.workspace.allow.create.table` configuration option must be set to `true` for the Partition Service in `generator.xml`. It is also recommended that Sentry Object Ownership is enabled and set to `ALL` in this scenario, which allows users to `DROP` their own tables. If Object Ownership is not enabled, users will not be able to `DROP` tables and a Sentry Admin would need to clean up old tables.
* For Hive, the ImmutaGroupsMapping service jar must be added to the classpath for YARN applications. This can be done by updating the `yarn.application.classpath` configuration value in `yarn-site.xml`. In Cloudera Manager, add the value `/opt/cloudera/parcels/IMMUTA/lib/immuta-group-mapping.jar` under `YARN Application Classpath` on YARN's configuration page. If you are using a non-standard parcel directory, replace `/opt/cloudera/parcels/` with your custom directory.
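
As a rough sketch, the two file-based prerequisites above might look like the following. It assumes that `generator.xml` uses the same Hadoop-style `<property>` layout as the other files on this page, and `EXISTING_CLASSPATH_ENTRIES` is a hypothetical placeholder for whatever your cluster already lists in `yarn.application.classpath`; keep those entries and append the Immuta jar.

```xml
<!-- generator.xml (Partition Service): lets users CREATE non-data-source
     tables in the workspace database. The file layout is an assumption. -->
<property>
  <name>immuta.workspace.allow.create.table</name>
  <value>true</value>
</property>

<!-- yarn-site.xml: EXISTING_CLASSPATH_ENTRIES is a placeholder for the
     current value; the Immuta jar is appended as one more
     comma-separated entry. -->
<property>
  <name>yarn.application.classpath</name>
  <value>EXISTING_CLASSPATH_ENTRIES,/opt/cloudera/parcels/IMMUTA/lib/immuta-group-mapping.jar</value>
</property>
```

On clusters managed by Cloudera Manager, these values are normally set through the corresponding configuration pages rather than by editing the files directly.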

### Check the Existing Group Mapping Service

It's a good idea to start by checking the existing Group Mapping Service in the configuration item `hadoop.security.group.mapping`. If this is not found in your configuration, the default is `org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback`.

To use ImmutaGroupsMapping alongside another Group Mapping Service, Hadoop provides the `org.apache.hadoop.security.CompositeGroupsMapping` implementation, which takes the results of multiple Group Mapping Services and combines them. If `CompositeGroupsMapping` is already in use before adding ImmutaGroupsMapping, add ImmutaGroupsMapping as another provider in the configuration. The configuration snippet later on this page shows the required properties.
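
If `CompositeGroupsMapping` is already configured, the change is limited to the provider list and one new provider entry. The sketch below assumes a single pre-existing provider with the hypothetical name `existing`; substitute the provider names your cluster already defines and leave their own `hadoop.security.group.mapping.provider.<name>` entries untouched.

```xml
<!-- "existing" is a hypothetical stand-in for the provider names that are
     already listed; "immuta" is appended to that list. -->
<property>
  <name>hadoop.security.group.mapping.providers</name>
  <value>existing,immuta</value>
</property>

<!-- New entry mapping the "immuta" provider name to its implementation. -->
<property>
  <name>hadoop.security.group.mapping.provider.immuta</name>
  <value>com.immuta.security.ImmutaGroupsMapping</value>
</property>
```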

## Configuration to Add

To enable ImmutaGroupsMapping, add the following configuration to the Hadoop XML configuration of every target system where the groups mapping should apply.

* If it should apply across the cluster, add it to the system-wide `core-site.xml` file.
* If it applies to a single system only (Impala, for instance), add it to an XML file specific to that system.

{% hint style="info" %}
**Best practice**

The group mapping service should be applied only to target systems requiring Immuta project groups to be determined for context-aware permissions. This can typically be limited to Hive and/or Impala. In Cloudera Manager this configuration can be added to `Impala Daemon Advanced Configuration Snippet (Safety Valve) for core-site.xml` and/or `Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml`.
{% endhint %}

### Configuration Snippet

The following configuration shows the ImmutaGroupsMapping provider being used alongside the `JniBasedUnixGroupsMappingWithFallback` provider.

```xml
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.CompositeGroupsMapping</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.providers</name>
  <value>jni,immuta</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.providers.combined</name>
  <value>true</value>
  <description>Set to true if the results of all mapping services should be concatenated.</description>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.provider.jni</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.security.group.mapping.provider.immuta</name>
  <value>com.immuta.security.ImmutaGroupsMapping</value>
  <final>true</final>
</property>
```

{% hint style="danger" %}
**Critical**

If the group mapping service is being configured for a specific service (e.g., Hive or Impala), it is critical that `immuta.system.api.key` is also configured for that target service. The `ImmutaGroupsMapping` provider requires the system API key in order to retrieve user group details. Add something like the following to the properties defined above, substituting your own system API key for the example value.

```xml
<property>
  <name>immuta.system.api.key</name>
  <value>mYIUy6REcWrnW1mtVjZpuZiyyRFVj3</value>
  <final>true</final>
</property>
```

{% endhint %}

### Caching Considerations

By default, Hadoop's group services cache retrieved groups for 5 minutes. This may not be desirable for Immuta deployments using group mapping, because a switch of project context could then take up to 5 minutes to take effect on the cluster. To lower this cache time, add the following configuration to the same file as above:

```xml
<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>10</value>
  <description>Set to whatever value is most tolerable for the delay in group change for a user.</description>
</property>
```

