Native Workspace Configuration for EMR
Audience: System Administrators
Content Summary: This page describes how to configure Native Workspaces for Immuta-enabled EMR clusters. The Native S3 Workspace requires an Immuta-bootstrapped EMR cluster. For more information about EMR deployments, please see the main installation guide.
Immuta App Settings
The native workspace must be enabled from the App Settings page.
IAM Role Configuration
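Immuta integrates with EMRFS to control access to sensitive data stored in S3. To configure this, you must create an IAM Role Policy for Immuta as described in the main EMR Installation Guide. To leverage Immuta's Native S3 Workspace capability, you must also give the Immuta data IAM role full control of the workspace bucket or folder. The following example policy grants read access to two data buckets and full S3 access to the workspace bucket; replace $DATA_BUCKET_1, $DATA_BUCKET_2, and $WORKSPACE_BUCKET with your own bucket names.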
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:Head*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::$DATA_BUCKET_1",
        "arn:aws:s3:::$DATA_BUCKET_2",
        "arn:aws:s3:::$DATA_BUCKET_1/*",
        "arn:aws:s3:::$DATA_BUCKET_2/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::$WORKSPACE_BUCKET",
        "arn:aws:s3:::$WORKSPACE_BUCKET/*"
      ]
    }
  ]
}
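When bucket names differ between environments, it can be easier to generate the policy than to edit the placeholders by hand. The following is a minimal sketch in Python, not an Immuta or AWS utility: the function name build_immuta_data_policy and the bucket names in the usage example are hypothetical, and the statements simply mirror the example policy above.

```python
import json


def build_immuta_data_policy(data_buckets, workspace_bucket):
    """Build a policy dict with the same shape as the example policy.

    Hypothetical helper: read access to every data bucket, full S3
    access to the workspace bucket.
    """
    # Each bucket needs both the bucket ARN and the object ARN ("/*").
    data_resources = [f"arn:aws:s3:::{b}" for b in data_buckets]
    data_resources += [f"arn:aws:s3:::{b}/*" for b in data_buckets]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:Head*", "s3:List*"],
                "Resource": data_resources,
            },
            {
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [
                    f"arn:aws:s3:::{workspace_bucket}",
                    f"arn:aws:s3:::{workspace_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Example bucket names are placeholders, not real resources.
    policy = build_immuta_data_policy(
        ["example-data-1", "example-data-2"], "example-workspace"
    )
    print(json.dumps(policy, indent=2))
```

Running the script prints a JSON document that can be reviewed and then attached to the Immuta data IAM role using your usual AWS tooling.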
Hive Configuration
For users to query workspace data natively via Hive, you must set additional configuration in hive-site so that Hive has access to the Immuta System API Key.
For maximum security in production deployments, you should store the System API Key in a JCEKS file for Hive to access. The location of this file should be set in immuta.hadoop.security.credential.provider.path.
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.server2.enable.doAs": "true",
      "hive.security.metastore.authorization.auth.reads": "false",
      "hive.compute.query.using.stats": "true",
      "immuta.hadoop.security.credential.provider.path": "/home/hive/immuta_provider.jceks"
    },
    "Configurations": []
  }
]
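Before passing a configuration file to the cluster, it can help to confirm that the hive-site classification is present and that the credential provider path is set. The following is a minimal sketch in Python; the function names and the problem messages are hypothetical and are not part of Immuta or EMR. It checks only for the presence of the property, not that the JCEKS file exists on the cluster.

```python
import json

# Property named in the hive-site example above.
REQUIRED_PROPERTY = "immuta.hadoop.security.credential.provider.path"


def find_hive_site_properties(configurations):
    """Return the Properties dict of the hive-site entry, or None."""
    for entry in configurations:
        if entry.get("Classification") == "hive-site":
            return entry.get("Properties", {})
    return None


def check_hive_site(configurations):
    """Return a list of problems found; an empty list means none."""
    props = find_hive_site_properties(configurations)
    if props is None:
        return ["no hive-site classification found"]
    problems = []
    if not props.get(REQUIRED_PROPERTY):
        problems.append(f"{REQUIRED_PROPERTY} is not set")
    return problems


if __name__ == "__main__":
    example = json.loads("""
    [
      {
        "Classification": "hive-site",
        "Properties": {
          "immuta.hadoop.security.credential.provider.path":
            "/home/hive/immuta_provider.jceks"
        },
        "Configurations": []
      }
    ]
    """)
    print(check_hive_site(example))
```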