Audience: System Administrators
Content Summary: This tutorial guides you through spinning up an Amazon Elastic MapReduce (EMR) cluster with Immuta's Hadoop and Spark security plugins installed.
Deprecation notice
Support for this integration has been deprecated.
This tutorial contains examples using the AWS CLI. These examples are conceptual in nature and will require modification to adapt to your exact deployment needs. If you wish to quickly familiarize yourself with Immuta's EMR integration, please visit the Quickstart Installation Guide for Immuta on AWS EMR.
This deployment is tested and known to work on the EMR releases listed below.
5.17.0
5.18.0
5.19.0
5.20.0
5.21.0
5.22.0
5.23.0
5.24.0
5.25.0
5.26.0
5.27.0
5.28.0
5.29.0
5.30.0
5.31.0
5.32.0
In addition to the EMR cluster itself, Immuta requires a handful of additional AWS resources in order to function properly.
In order to bootstrap the EMR cluster with Immuta's software bundle and startup scripts, you will need to create an S3 bucket to hold these artifacts.
In this guide, the bucket is referenced by the placeholder $BOOTSTRAP_BUCKET. Replace this placeholder with a unique bucket name of your choosing. The bucket must contain all artifacts listed below. These artifacts can be found at Immuta Downloads.
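As a minimal sketch of the bucket setup, the commands below wrap the AWS CLI calls in two shell functions. The default bucket name and the function names are hypothetical, and the flat upload layout is an assumption; the bootstrap scripts may expect specific object keys.

```shell
# Hypothetical default; S3 bucket names are globally unique, so choose your own.
BOOTSTRAP_BUCKET="${BOOTSTRAP_BUCKET:-my-immuta-bootstrap-bucket}"

create_bootstrap_bucket() {
  # Create the bucket that will hold Immuta's software bundle and startup scripts.
  aws s3 mb "s3://${BOOTSTRAP_BUCKET}"
}

upload_bootstrap_artifacts() {
  # $1: local directory containing the artifacts obtained from Immuta Downloads.
  aws s3 cp "$1" "s3://${BOOTSTRAP_BUCKET}/" --recursive
}
```

With credentials configured, running create_bootstrap_bucket followed by upload_bootstrap_artifacts ./immuta-artifacts (a hypothetical local directory) would populate the bucket.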
Immuta's Spark integration relies on an IAM role policy that has access to the S3 buckets where your sensitive data is stored. Note that the EC2 Instance Roles for your EMR cluster should not have access to these buckets. Immuta will broker access to the data in these buckets to authorized users.
Modify the JSON data below to include the correct name of your data bucket(s), and save as immuta_data_iam_policy.json.
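The snippet below is a sketch of what such a policy file might look like, not Immuta's published policy. It assumes a single data bucket (the name is a hypothetical placeholder) and read-only access; the actions your deployment needs may differ.

```shell
# Hypothetical data bucket name; replace with the bucket(s) holding your sensitive data.
DATA_BUCKET="${DATA_BUCKET:-my-sensitive-data-bucket}"

# Write a read-only policy for the data bucket (assumed action set).
cat > immuta_data_iam_policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${DATA_BUCKET}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::${DATA_BUCKET}/*"]
    }
  ]
}
EOF
```

Listing is granted on the bucket ARN and object reads on the object ARN pattern, because S3 evaluates those actions against different resource types.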
If you are leveraging Immuta's Native S3 Workspace capability, you must also give the Immuta data IAM role full control of the workspace bucket or folder.
Now you can run the following command to create the Immuta IAM user policy.
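A sketch of that command using aws iam create-policy follows. The policy name and the wrapper function name are hypothetical; the file name matches the one saved in the previous step.

```shell
create_immuta_data_policy() {
  # Policy name is a hypothetical placeholder; adapt it to your naming scheme.
  aws iam create-policy \
    --policy-name immuta_emr_data_policy \
    --policy-document file://immuta_data_iam_policy.json
}
```

The command's JSON output includes the policy ARN, which is needed later when attaching the policy to a role.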
The IAM role that brokers access to S3 data must be able to assume the cluster node instance roles, and vice versa. Since this is a cycle, you will need to create both roles with generic trust policies first, and then update the trust policies once both roles exist.
Create a file called immuta_data_role_trust_policy_generic.json as seen below.
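One plausible shape for a generic trust policy is sketched here. It assumes the placeholder principal is the EC2 service; the original file contents are not reproduced in this text, so treat the principal as an assumption to verify.

```shell
# Write a generic trust policy; the EC2 service principal is an assumed placeholder
# that gets replaced once both roles exist.
cat > immuta_data_role_trust_policy_generic.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```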
After creating the immuta_data_role_trust_policy_generic.json file from above, run the following command to create the Immuta data IAM role. Note that this uses the generic IAM role trust policy that you created in the previous step. The trust policy will be updated once both the data and instance IAM roles have been created.
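Role creation with the generic trust policy could look like the sketch below, which uses aws iam create-role. The role name and the function name are hypothetical.

```shell
create_immuta_data_role() {
  # Role name is a hypothetical placeholder.
  aws iam create-role \
    --role-name immuta_emr_data_role \
    --assume-role-policy-document file://immuta_data_role_trust_policy_generic.json
}
```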
Next, attach the IAM policy that allows access to your protected data in S3 to the Immuta data IAM role.
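Attaching the policy can be sketched with aws iam attach-role-policy. The role name is a hypothetical placeholder, and the policy ARN is passed in as an argument because it depends on your account ID and the policy name you chose.

```shell
attach_immuta_data_policy() {
  # $1: ARN of the data access policy, as returned by `aws iam create-policy`.
  aws iam attach-role-policy \
    --role-name immuta_emr_data_role \
    --policy-arn "$1"
}
```

For example, attach_immuta_data_policy "arn:aws:iam::123456789012:policy/immuta_emr_data_policy", where the account ID and policy name are illustrative only.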
Modify the JSON data below to include the correct name of your bootstrap bucket, and save as immuta_emr_instance_policy.json.
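The sketch below covers only the part of the instance policy that references the bootstrap bucket, under the assumption that cluster nodes need read access to the artifacts stored there. It is not the full instance policy; the remaining statements an EMR instance profile needs are not reproduced here.

```shell
# Hypothetical default; use the bootstrap bucket name you chose earlier.
BOOTSTRAP_BUCKET="${BOOTSTRAP_BUCKET:-my-immuta-bootstrap-bucket}"

# Partial policy: read-only access to the bootstrap artifacts (assumed action set).
cat > immuta_emr_instance_policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BOOTSTRAP_BUCKET}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::${BOOTSTRAP_BUCKET}/*"]
    }
  ]
}
EOF
```

Consistent with the earlier requirement, this fragment grants nothing on the buckets that hold sensitive data.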
Note that the instance policy for this guide is derived from the Minimal EMR role for EC2 (instance profile) policy described in Amazon's Best Practices for Securing Amazon EMR guide. You may need to tune this policy based on your organization's environment and needs.
After creating the immuta_emr_instance_policy.json file from above, run the following command to create the Immuta EMR Instance policy.
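A sketch of that command is shown below. The policy name and the function name are hypothetical. The --query and --output options are used so that only the new policy's ARN is printed, which is convenient when the ARN is needed by a later command.

```shell
create_immuta_emr_instance_policy() {
  # Policy name is a hypothetical placeholder; prints only the new policy's ARN.
  aws iam create-policy \
    --policy-name immuta_emr_instance_policy \
    --policy-document file://immuta_emr_instance_policy.json \
    --query 'Policy.Arn' \
    --output text
}
```

For instance, INSTANCE_POLICY_ARN="$(create_immuta_emr_instance_policy)" would store the ARN in a shell variable.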