Immuta AWS Integrations
Audience: System Administrators
Content Summary: Immuta deployments to AWS environments are fully supported, and Immuta provides a number of specific integrations with AWS infrastructure, data, and analytic components.
Deployment Integrations
With respect to infrastructure, Immuta's Docker Single Node, Enterprise Linux 6, and Enterprise Linux 7 installations work without issue on AWS EC2 instances. However, the recommended approach is to use AWS's Elastic Container Service for Kubernetes (EKS) as the underlying infrastructure for your Immuta deployment.
This EKS deployment scenario leverages Immuta's Helm Deployment with some configuration specific to AWS. Please see the EKS installation instructions for full details.
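For orientation, the install typically amounts to pointing kubectl/helm at the EKS cluster and installing Immuta's chart with AWS-specific values. The chart repository URL, chart name, and values file in the sketch below are placeholders rather than Immuta's published coordinates; the EKS installation instructions have the real ones.

```python
# Hypothetical sketch: driving a Helm install of Immuta on EKS from Python.
# The repo URL, chart name, and values file are placeholders -- use the chart
# coordinates and AWS-specific values from Immuta's EKS installation instructions.
import subprocess

def run(cmd):
    """Run a command and fail loudly if it returns non-zero."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Point kubectl/helm at the EKS cluster (cluster name is an example).
run(["aws", "eks", "update-kubeconfig", "--name", "immuta-cluster"])

# Add the (placeholder) Immuta chart repository and install the release.
run(["helm", "repo", "add", "immuta", "https://example.com/immuta-helm-charts"])
run(["helm", "repo", "update"])
run([
    "helm", "upgrade", "--install", "immuta", "immuta/immuta",
    "--namespace", "immuta", "--create-namespace",
    "--values", "immuta-values.aws.yaml",  # AWS-specific overrides
])
```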
Data Integrations
Immuta supports a variety of AWS object and relational data stores, as well as EMR, through the Query Engine and Spark SQL access patterns.
Relational Data
Immuta supports most AWS SQL-based data stores as Query-backed Data Sources. These include:
- Relational Database Service (RDS)
  - Amazon Aurora
  - MariaDB
  - Microsoft SQL Server
  - MySQL
  - Oracle
  - PostgreSQL
- Redshift
- Redshift Spectrum
- Athena
- Apache Hive on EMR
This capability allows Immuta users to run policy-enforced SQL queries against data in any of these source systems through a single database connection to Immuta's Query Engine.
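Any PostgreSQL-compatible client can issue those queries. The sketch below uses Python's psycopg2 driver; the hostname, credentials, and table name are placeholders for values specific to your deployment.

```python
# Minimal sketch: querying an Immuta data source through the Query Engine,
# which speaks the PostgreSQL wire protocol on port 5432.
# Hostname, credentials, and table name are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="immuta.example.com",   # your Immuta Query Engine endpoint
    port=5432,
    dbname="immuta",
    user="jane.analyst",
    password="...",              # or an Immuta-issued SQL credential
)

with conn, conn.cursor() as cur:
    # The same SQL works regardless of whether the backing store is RDS,
    # Redshift, Athena, etc.; Immuta enforces policy before returning rows.
    cur.execute("SELECT * FROM claims LIMIT 10")
    for row in cur.fetchall():
        print(row)

conn.close()
```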
Object Data and EMR
Immuta supports objects in S3 buckets as Object-backed Data Sources, and it supports those same objects in conjunction with EMR via both the HDFS Access Pattern and the Spark SQL Access Pattern.
This capability allows for policy-enforced access to objects via HDFS on EMR clusters, as well as policy-enforced Spark SQL queries against certain object types in S3 via Spark on EMR clusters.
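As a rough sketch of the Spark side, a PySpark job on an EMR cluster with the Immuta Spark SQL Access Pattern can query an exposed data source much like an ordinary table. The exact session entry point varies by Immuta release (some versions provide an Immuta-specific session or context class), and the table name below is a placeholder.

```python
# Illustrative sketch only: querying an Immuta-exposed data source from PySpark
# on EMR. The table name "claims" is a placeholder, and depending on the Immuta
# release the plain SparkSession below may be replaced by an Immuta-provided
# session/context class -- see the Spark SQL Access Pattern documentation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("immuta-emr-example").getOrCreate()

# Row- and column-level policies are enforced by the Immuta integration before
# results reach this job.
df = spark.sql("SELECT provider_state, COUNT(*) AS n FROM claims GROUP BY provider_state")
df.show()
```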
However, users who have Hive tables stored in S3N outside of EMR should switch to using S3A before exposing or accessing that data via Immuta, as the S3N FileSystem has been deprecated in Hadoop and should no longer be used. See Hadoop-AWS module: Integration with Amazon Web Services for details.
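Concretely, the switch usually amounts to repointing table locations and read paths from the s3n:// scheme to s3a://; the table, bucket, and path below are made-up examples.

```python
# Illustrative only: moving from the deprecated s3n:// scheme to s3a://.
# Table, bucket, and path names are made-up examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Repoint an existing external Hive table at the same data via s3a://.
spark.sql("ALTER TABLE claims_raw SET LOCATION 's3a://example-bucket/warehouse/claims_raw'")

# New reads should reference s3a:// paths directly.
df = spark.read.parquet("s3a://example-bucket/warehouse/claims_raw")
```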
Analytic Integrations
Any tool that can connect to a PostgreSQL database or process data in EMR can consume Immuta policy-enforced data. Below is a breakdown of AWS analytic tool integrations by access method; a brief code sketch for the "code running in" cases follows the list.
- Immuta Query Engine
  - Amazon QuickSight
  - Amazon SageMaker
  - Code running in
    - Lambda
    - EC2
    - ECS
    - EMR
    - SageMaker
- Standard Spark SQL / Hadoop usage on EMR
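For the "code running in" entries above, access works through the same PostgreSQL-style connection to the Query Engine shown earlier. The sketch below imagines an AWS Lambda handler doing so, with the endpoint and credentials read from environment variables; all names are placeholders.

```python
# Hypothetical AWS Lambda handler consuming policy-enforced data through the
# Immuta Query Engine. Endpoint, credentials, and query are placeholders; in
# practice they would come from environment variables or Secrets Manager, and
# psycopg2 would be packaged with the function (e.g., as a Lambda layer).
import os
import psycopg2

def handler(event, context):
    conn = psycopg2.connect(
        host=os.environ["IMMUTA_QE_HOST"],
        port=5432,
        dbname="immuta",
        user=os.environ["IMMUTA_USER"],
        password=os.environ["IMMUTA_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM claims")
        (count,) = cur.fetchone()
    conn.close()
    return {"claim_count": count}
```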
Hybrid Environments
In addition to direct integrations with AWS analytic tools and data stores, Immuta allows users to expose on-prem data stores to services in AWS for policy-enforced data consumption. The fine-grained control provided by Immuta policies can allow users to expose data that would otherwise not be allowed to leave on-premises facilities for use in an AWS environment. Conversely, Immuta can securely expose in-cloud data to consumers or services that operate on-premises, or even to SaaS solutions hosted elsewhere.
Architecture
Because of the number of integration points Immuta has with AWS components, the standard Immuta-on-AWS deployment architecture is split into two summary diagrams below. One focuses on the use of Immuta in conjunction with relational data stores. The other focuses on the use of Immuta in conjunction with EMR.
Immuta AWS Architecture - Relational Data Focused
Immuta AWS Architecture - EMR Focused
Architecture Considerations
The main architectural considerations for running Immuta within an AWS environment center on security. Internal to the EKS deployment, all communications are encrypted with TLS, and by default both the Immuta Web Service and the Immuta Query Engine use TLS connections for all client interaction with Immuta. It is also best practice to expose data to Immuta only over TLS connections.
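On the client side, the PostgreSQL drivers used to reach the Query Engine can be configured to require and verify TLS; the host, credentials, and CA certificate path below are placeholders.

```python
# Sketch: requiring a verified TLS connection to the Immuta Query Engine from a
# PostgreSQL client. Host, credentials, and CA certificate path are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="immuta.example.com",
    port=5432,
    dbname="immuta",
    user="jane.analyst",
    password="...",
    sslmode="verify-full",                        # refuse unencrypted or unverified connections
    sslrootcert="/etc/ssl/certs/immuta-ca.pem",   # CA that signed the Immuta certificate
)
conn.close()
```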
While Immuta can run in any VPC in your account, the recommended approach is to run Immuta in its own VPC and allow inbound traffic to that VPC on ports 80, 443, and 5432. The VPC in which Immuta runs should then be peered with any private VPCs containing EMR clusters or data stores to which Immuta will connect. Additionally, Immuta may be run in a private subnet if desired, though that will require additional VPC peering with any VPCs where data will be consumed. Security groups should also be configured to allow traffic on the appropriate ports.
This approach segments Immuta from unrelated infrastructure, preventing other software from inadvertently impacting Immuta performance.
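As a rough boto3 sketch of that security group configuration (the group ID and source CIDR are placeholders, and the source range should be scoped to your peered VPCs rather than left open):

```python
# Rough sketch: opening the Immuta-relevant ports (80, 443, 5432) on a security
# group with boto3. Group ID and source CIDR are placeholders -- scope the
# source range to your peered VPCs, not 0.0.0.0/0.
import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": "10.20.0.0/16", "Description": "peered data VPC"}],
        }
        for port in (80, 443, 5432)
    ],
)
```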
Peering Details
The exact procedure for peering your data or analytic VPCs with the VPC in which Immuta is running will depend on your network architecture within AWS. For a list of common scenarios and instructions for each, please see Amazon's VPC Peering Scenarios. Note also that you will need to update routing tables accordingly; see Amazon's guide on Updating Your Route Tables for a VPC Peering Connection for complete details.
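For the simple case of two VPCs in the same account and region, the peering request, acceptance, and route updates can be scripted with boto3 along these lines (all IDs and CIDR blocks are placeholders):

```python
# Rough sketch: peering the Immuta VPC with a data VPC in the same account and
# region, then adding routes in both directions. All IDs and CIDRs are placeholders.
import boto3

ec2 = boto3.client("ec2")

# Request and accept the peering connection.
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaa1111bbbb22223",       # VPC running Immuta
    PeerVpcId="vpc-0ccc3333dddd44445",   # VPC containing EMR clusters / data stores
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Route each VPC's traffic for the other's CIDR block over the peering link.
ec2.create_route(RouteTableId="rtb-0aaa1111bbbb22223",
                 DestinationCidrBlock="10.20.0.0/16",   # data VPC CIDR
                 VpcPeeringConnectionId=pcx_id)
ec2.create_route(RouteTableId="rtb-0ccc3333dddd44445",
                 DestinationCidrBlock="10.10.0.0/16",   # Immuta VPC CIDR
                 VpcPeeringConnectionId=pcx_id)
```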