AWS Lake Formation Reference Guide
In the AWS Lake Formation connection, Immuta registers data from Glue Data Catalog. Immuta users who have been granted access to the Glue Data Catalog table or view in AWS Lake Formation can query it using one of these analytic engines:
Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum
AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows you to register your data objects in a technology through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data on your data platform.
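The idea of adding and removing data sources to reflect the state of the data platform can be pictured as a set comparison. The sketch below is only an illustration of that idea, not Immuta's implementation; the function name `plan_object_sync` and the table names are hypothetical.

```python
from __future__ import annotations


def plan_object_sync(registered: set[str], in_catalog: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_add, to_remove) so registered data sources match the catalog."""
    to_add = in_catalog - registered      # tables present in the catalog but not yet registered
    to_remove = registered - in_catalog   # registered data sources whose tables no longer exist
    return to_add, to_remove


# Hypothetical table names, for illustration only.
registered = {"marketing-data.campaigns", "research-data.trials"}
in_catalog = {"marketing-data.campaigns", "cs-data.tickets"}

to_add, to_remove = plan_object_sync(registered, in_catalog)
print(sorted(to_add))     # ['cs-data.tickets']
print(sorted(to_remove))  # ['research-data.trials']
```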
After you set up a data lake in Lake Formation and a Glue Data Catalog, you provide Immuta an AWS IAM role with the permissions outlined on the Register an AWS Lake Formation connection page to register the Lake Formation connection.
Once the connection is registered in Immuta, Immuta ingests and stores connection metadata in the Immuta metadata database.
In the example below, the Immuta application administrator connects the Glue Data Catalog that contains marketing-data, research-data, and cs-data metadata. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in the Glue Data Catalog. Beyond making the registration of your data more intuitive, connections provide more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the Connections reference guide for details about connections and how to manage them. To configure your Lake Formation connection, see the Register an AWS Lake Formation connection guide.
The following user actions trigger processes in the Lake Formation connection that keep Immuta data in sync with data in Lake Formation. The list below provides an overview of each process:
Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database.
The Lake Formation connection supports the following authentication methods to register a connection:
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Contact your Immuta representative for the AWS account to add to your trust policy.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the AWS permissions listed in the Register an AWS Lake Formation connection guide.
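For the IAM role option, the trust relationship might look roughly like the policy document built below. This is a sketch under assumptions: the account ID is a placeholder (the real value comes from your Immuta representative), and whether the principal should be the account root or a specific role, and whether extra conditions are required, is not stated here, so follow the Register an AWS Lake Formation connection guide for the actual policy.

```python
import json

# Placeholder only: obtain the real AWS account ID from your Immuta representative.
IMMUTA_AWS_ACCOUNT_ID = "111122223333"


def build_trust_policy(trusted_account_id: str) -> dict:
    """Build an IAM trust policy that lets the trusted account assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: account-level principal; your setup may name a specific role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = build_trust_policy(IMMUTA_AWS_ACCOUNT_ID)
print(json.dumps(trust_policy, indent=2))
```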
Immuta will not apply policies in this connection.
Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports all three methods for user provisioning in the Lake Formation connection.
However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.
See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.
Enable AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the map users section for instructions on mapping users from AWS IDC to user accounts in Immuta.
Create an IAM role per user: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be a challenge because there is an IAM role max limit of 5,000 per AWS account.
Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.
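The over-provisioning risk of shared IAM roles can be illustrated with a small model. The role name, user names, and helper function below are hypothetical; the sketch only shows that a grant made to a role principal reaches every user who can assume that role.

```python
from __future__ import annotations

# Hypothetical mapping of an IAM role to the users who can assume it.
role_members = {
    "analyst-role": {"alice", "bob", "carol"},
}


def users_gaining_access(granted_role: str, members: dict[str, set[str]]) -> set[str]:
    """Everyone who can assume the role gains whatever access the role is granted."""
    return set(members.get(granted_role, set()))


requested_by = "alice"  # only this user asked for access
gained = users_gaining_access("analyst-role", role_members)
over_provisioned = gained - {requested_by}
print(sorted(over_provisioned))  # ['bob', 'carol']
```

With one IAM role per user, or with IDC, the set of users gaining access would contain only the requesting user.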
Immuta supports mapping an Immuta user to AWS in one of the following ways:
IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
Existing S3 integrations have no impact on Lake Formation connections; they can be used in tandem.
User query audit
AWS Lake Formation has the following limitations:
50 tag limit per resource
1000 tag limit total
1000 values per tag
See the AWS documentation for details.
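A pre-flight check against the limits listed above could be sketched as follows. The function name, the input shapes, and the tag and table names are assumptions made for illustration; the three constants are the limits stated in this guide.

```python
from __future__ import annotations

MAX_TAGS_PER_RESOURCE = 50
MAX_TAGS_TOTAL = 1000
MAX_VALUES_PER_TAG = 1000


def check_lf_tag_limits(
    tag_values: dict[str, set[str]],     # tag key -> set of values defined for it
    resource_tags: dict[str, set[str]],  # resource name -> tag keys attached to it
) -> list[str]:
    """Return a human-readable message for each limit that is exceeded."""
    problems = []
    if len(tag_values) > MAX_TAGS_TOTAL:
        problems.append(f"{len(tag_values)} tags exceed the total limit of {MAX_TAGS_TOTAL}")
    for key, values in tag_values.items():
        if len(values) > MAX_VALUES_PER_TAG:
            problems.append(f"tag '{key}' has {len(values)} values (limit {MAX_VALUES_PER_TAG})")
    for resource, keys in resource_tags.items():
        if len(keys) > MAX_TAGS_PER_RESOURCE:
            problems.append(f"resource '{resource}' has {len(keys)} tags (limit {MAX_TAGS_PER_RESOURCE})")
    return problems


# Hypothetical data: one tag with too many values, one resource with too many tags.
tag_values = {"example_tag": {f"v{i}" for i in range(1001)}}
resource_tags = {"marketing-data.campaigns": {f"tag{i}" for i in range(51)}}

problems = check_lf_tag_limits(tag_values, resource_tags)
for message in problems:
    print(message)
```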
Immuta is actively making improvements to the AWS Lake Formation integration throughout the preview phases. Be aware of these temporary limitations during the early preview stages:
Immuta policies are currently not supported for the AWS Lake Formation connection. Once policies are supported, the following limitations will apply:
Immuta will only synchronize policies on a 1-minute schedule, so it could be up to 1 minute from you taking an action in Immuta until Immuta starts synchronizing policies. Note that this 1-minute schedule refers to Immuta starting to synchronize, not the time it will take to complete synchronization.
LF-Tags created for automatic subscription policies are not removed when no longer applicable. This can result in growth of the LF-Tag value space and may hit quotas if many policy changes occur over time. LF-Tags can be manually removed to free up space if quota is a concern.
You cannot use the AWS Lake Formation connection if you are using data policies on Redshift Spectrum data sources, since granting access to the underlying Glue table via the AWS Lake Formation connection would allow a user to bypass the row- and column-level security of the Immuta-managed view by querying the Glue table directly.
Scale is limited to 2000 data sources and 100 users.
Multiple AWS Lake Formation integrations are not supported on a single Immuta tenant.
Immuta does not ingest existing LF-Tags.