Native S3 Workspace Configuration for EMR

Audience: System Administrators

Content Summary: This page describes how to configure Native Workspaces for Immuta-enabled EMR clusters. The Native S3 Workspace requires an Immuta-bootstrapped EMR cluster. For more information about EMR deployments, please see the main installation guide.

Immuta App Settings

Native S3 Workspace

  1. Navigate to the App Settings page in Immuta Console.
  2. Select Native Workspace in the left sidebar.
  3. Select Add Workspace.
  4. For Workspace Type, select EMR.
  5. For Scheme, select s3.
  6. Fill out the modal, click Test Workspace Directory, and then save your changes.

Native HDFS Workspace

  1. Navigate to the App Settings page in Immuta Console.
  2. Select Native Workspace in the left sidebar.
  3. Select Add Workspace.
  4. For Workspace Type, select EMR.
  5. For Scheme, select hdfs.
  6. Fill out the modal, click Test Workspace Directory, and then save your changes.

IAM Role Configuration

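Immuta integrates with EMRFS to control access to sensitive data stored in S3. To configure this, you must create an IAM Role Policy for Immuta as described in the main EMR Installation Guide. To leverage Immuta's Native S3 Workspace capability, you must also give the Immuta data IAM role full control of the workspace bucket or folder. In the example policy below, the role has read-only access to the data buckets and full control (s3:*) of the workspace bucket; replace the $DATA_BUCKET_1, $DATA_BUCKET_2, and $WORKSPACE_BUCKET placeholders with your own bucket names.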

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:Head*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::$DATA_BUCKET_1",
                "arn:aws:s3:::$DATA_BUCKET_2",
                "arn:aws:s3:::$DATA_BUCKET_1/*",
                "arn:aws:s3:::$DATA_BUCKET_2/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::$WORKSPACE_BUCKET",
                "arn:aws:s3:::$WORKSPACE_BUCKET/*"
            ]
        }
    ]
}
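
As a minimal sketch, the policy above can be generated from a list of bucket names instead of being edited by hand. The helper name `build_immuta_data_policy` and the example bucket names are hypothetical and are not part of Immuta or AWS. Only the actions and the ARN layout are taken from the policy shown above.

```python
import json


def build_immuta_data_policy(data_buckets, workspace_bucket):
    """Return the IAM policy document as a dict (hypothetical helper)."""
    # Bucket-level ARNs first, then object-level ARNs, as in the policy above.
    data_resources = [f"arn:aws:s3:::{b}" for b in data_buckets]
    data_resources += [f"arn:aws:s3:::{b}/*" for b in data_buckets]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only access to the protected data buckets.
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:Head*", "s3:List*"],
                "Resource": data_resources,
            },
            {
                # Full control of the workspace bucket and its objects.
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [
                    f"arn:aws:s3:::{workspace_bucket}",
                    f"arn:aws:s3:::{workspace_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Example bucket names are placeholders, not real buckets.
    policy = build_immuta_data_policy(
        ["example-data-1", "example-data-2"], "example-workspace"
    )
    print(json.dumps(policy, indent=4))
```

To restrict the workspace to a folder rather than a whole bucket, the second statement's resources would need to be narrowed to that prefix. This sketch does not cover that case.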

Hive Configuration

To let users query workspace data natively through Hive, add configuration to hive-site that gives Hive access to the Immuta System API Key.

For maximum security in production deployments, store the System API Key in a JCEKS file that Hive can read. Set the location of this file in the immuta.hadoop.security.credential.provider.path property, as in the example below.

[
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.server2.enable.doAs": "true",
            "hive.security.metastore.authorization.auth.reads": "false",
            "hive.compute.query.using.stats": "true",
            "immuta.hadoop.security.credential.provider.path": "/home/hive/immuta_provider.jceks"
        },
        "Configurations": []
    }
]
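
The classification above can also be produced programmatically, for example to keep the JCEKS path in one place. The following is a minimal sketch: `build_hive_site`, `write_config`, and the output file name are hypothetical, while the property names and values are copied from the configuration above.

```python
import json

# Default JCEKS location, taken from the example configuration above.
DEFAULT_JCEKS_PATH = "/home/hive/immuta_provider.jceks"


def build_hive_site(jceks_path=DEFAULT_JCEKS_PATH):
    """Return the EMR configuration list with the hive-site classification."""
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.server2.enable.doAs": "true",
                "hive.security.metastore.authorization.auth.reads": "false",
                "hive.compute.query.using.stats": "true",
                "immuta.hadoop.security.credential.provider.path": jceks_path,
            },
            "Configurations": [],
        }
    ]


def write_config(path, jceks_path=DEFAULT_JCEKS_PATH):
    """Write the configuration list to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_hive_site(jceks_path), fh, indent=4)


if __name__ == "__main__":
    # The output file name is an arbitrary example.
    write_config("immuta-hive-config.json")
```

A file written this way could be passed to the `--configurations` option of `aws emr create-cluster` as a `file://` path. Check this against your own cluster provisioning process before relying on it.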