AWS Lake Formation
In the AWS Lake Formation connection, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Immuta users who have been granted access to a Glue Data Catalog table or view can then query it using one of these analytic engines:
Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.
Once the connection is registered in Immuta, Immuta ingests and stores connection metadata in the Immuta metadata database.
In the example below, the Immuta application administrator connects the Glue Data Catalog that contains marketing-data, research-data, and cs-data metadata. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in the Glue Data Catalog. Beyond making the registration of your data more intuitive, connections provide more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
The table below outlines how two different automatic subscription policies authored in Immuta are orchestrated in Lake Formation.
| Step | Policy 1 | Policy 2 |
| --- | --- | --- |
| Governor authors a global policy in Immuta. | "Users may subscribe to data sources tagged Research when they are members of group Research." | "Users may subscribe to data sources tagged CS when they have the attribute training.complete." |
| Immuta calculates data sources affected. | research-data, marketing-data | cs-data |
| Immuta calculates users affected. | Alex, Taylor, Deepu | Casey, Mary, Catherine |
| Immuta generates a group identifier for the users and data sources affected. | 1234 | 5678 |
| Immuta creates an LF-Tag key and value. | Immuta_policy=1234 | Immuta_policy=5678 |
| Immuta assigns the LF-Tag to the AWS resource in Lake Formation. | Assign tag Immuta_policy=1234 to research-data and marketing-data | Assign tag Immuta_policy=5678 to cs-data |
| Immuta grants the LF-Tag to users in Lake Formation. | GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Alex; GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Taylor; GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Deepu | GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Casey; GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Mary; GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Catherine |
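The steps in the table can be sketched in a few lines of Python. This is only an illustration of the flow, not Immuta's implementation: the `User`, `Policy`, and `plan` names are hypothetical, the group identifiers are supplied by hand rather than generated, and the output strings simply mirror the notation used in the table.

```python
from dataclasses import dataclass
from typing import Optional

TAG_KEY = "Immuta_policy"  # LF-Tag key shown in the table


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset = frozenset()
    attributes: frozenset = frozenset()


@dataclass(frozen=True)
class Policy:
    group_id: str                       # identifier for the affected users and sources
    source_tag: str                     # data sources with this Immuta tag are affected
    required_group: Optional[str] = None
    required_attribute: Optional[str] = None

    def admits(self, user: User) -> bool:
        """Return True if the user meets the policy's subscription condition."""
        if self.required_group is not None:
            return self.required_group in user.groups
        if self.required_attribute is not None:
            return self.required_attribute in user.attributes
        return False


def plan(policy, source_tags, users, account_id="123456"):
    """Compute the LF-Tag, the tables to tag, and the grants for one policy."""
    lf_tag = f"{TAG_KEY}={policy.group_id}"
    sources = [name for name, tags in source_tags.items() if policy.source_tag in tags]
    grants = [
        f"GRANT (SELECT) on tag {lf_tag} TO arn:aws:iam::{account_id}:user/{u.name}"
        for u in users
        if policy.admits(u)
    ]
    return {"lf_tag": lf_tag, "sources": sources, "grants": grants}


# Example data taken from the table; the Immuta tags on each source are inferred from it.
SOURCE_TAGS = {
    "research-data": {"Research"},
    "marketing-data": {"Research"},
    "cs-data": {"CS"},
}
USERS = [
    User("Alex", groups=frozenset({"Research"})),
    User("Taylor", groups=frozenset({"Research"})),
    User("Deepu", groups=frozenset({"Research"})),
    User("Casey", attributes=frozenset({"training.complete"})),
    User("Mary", attributes=frozenset({"training.complete"})),
    User("Catherine", attributes=frozenset({"training.complete"})),
]
RESEARCH = Policy("1234", "Research", required_group="Research")
CS = Policy("5678", "CS", required_attribute="training.complete")

for policy in (RESEARCH, CS):
    result = plan(policy, SOURCE_TAGS, USERS)
    print(result["lf_tag"], "->", result["sources"])
    for grant in result["grants"]:
        print("  ", grant)
```

Because the grant targets the LF-Tag rather than each table, tagging an additional table with the same value extends access without issuing new per-user grants.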
The following user actions trigger processes in the Lake Formation connection that keep Immuta synchronized with Lake Formation. The list below provides an overview of each process:
Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes LF-Tags from that AWS resource.
Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and generates an LF-Tag key and value.
User manually subscribed to a data source: When a user is manually added to a data source by a data owner, Immuta grants the user direct access to the table in Lake Formation.
Automatic subscription policy deleted: Immuta deletes the LF-Tag key and values.
User removed from a data source: Immuta revokes the user's access to the table or the LF-Tag.
The image below illustrates these processes.
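As a rough mental model of the processes listed above, the toy class below tracks which principals can read which tables as events arrive. It is an in-memory stand-in, not a client for Lake Formation; the `SyncState` class and its method names are hypothetical, and revoking a tag grant is simplified to removing the principal from every tag on the table.

```python
from collections import defaultdict


class SyncState:
    """Toy in-memory model of the Lake Formation state touched by each process."""

    def __init__(self):
        self.resource_tags = defaultdict(set)  # table -> LF-Tag values assigned to it
        self.tag_grants = defaultdict(set)     # LF-Tag value -> principals granted the tag
        self.table_grants = defaultdict(set)   # table -> principals granted directly

    def apply_auto_policy(self, tag_value, tables, principals):
        # Automatic subscription policy applied or updated.
        for table in tables:
            self.resource_tags[table].add(tag_value)
        self.tag_grants[tag_value] = set(principals)

    def delete_auto_policy(self, tag_value):
        # Automatic subscription policy deleted: the tag and its grants disappear.
        for tags in self.resource_tags.values():
            tags.discard(tag_value)
        self.tag_grants.pop(tag_value, None)

    def subscribe_manually(self, table, principal):
        # A data owner adds the user: direct grant on the table.
        self.table_grants[table].add(principal)

    def remove_user(self, table, principal):
        # User removed from a data source: revoke the table grant or the tag grant.
        self.table_grants[table].discard(principal)
        for tag_value in self.resource_tags[table]:
            self.tag_grants[tag_value].discard(principal)

    def delete_data_source(self, table):
        # Data source deleted: LF-Tags are removed from the resource.
        self.resource_tags.pop(table, None)
        self.table_grants.pop(table, None)

    def can_select(self, table, principal):
        if principal in self.table_grants.get(table, set()):
            return True
        return any(
            principal in self.tag_grants.get(tag_value, set())
            for tag_value in self.resource_tags.get(table, set())
        )


state = SyncState()
state.apply_auto_policy("1234", ["research-data", "marketing-data"], ["Alex"])
state.subscribe_manually("cs-data", "Alex")
print(state.can_select("research-data", "Alex"))  # access through the LF-Tag
state.delete_auto_policy("1234")
print(state.can_select("research-data", "Alex"))  # tag access is gone
print(state.can_select("cs-data", "Alex"))        # direct grant is unaffected
```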
The Lake Formation connection supports the following authentication methods to register a connection:
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Contact your Immuta representative for the AWS account to add to your trust policy.
However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.
See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.
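For the IAM role authentication method, the trust relationship is expressed as a trust policy on the role in your account. The sketch below builds a standard IAM trust policy document that allows another principal to call `sts:AssumeRole`. The ARN is a placeholder, since the real Immuta principal must come from your Immuta representative, and your deployment may require additional conditions that are not shown here.

```python
import json


def build_trust_policy(trusted_principal_arn: str) -> dict:
    """Return an IAM trust policy that lets the given principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder only: replace with the principal supplied by your Immuta representative.
IMMUTA_PRINCIPAL_ARN = "arn:aws:iam::000000000000:role/immuta-placeholder"

print(json.dumps(build_trust_policy(IMMUTA_PRINCIPAL_ARN), indent=2))
```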
Immuta supports mapping an Immuta user to AWS in one of the following ways:
Existing S3 integrations have no impact on Lake Formation connections; they can be used in tandem.
Temporary limitation: Immuta will only synchronize policies on a 10-minute schedule, so it could be up to 10 minutes from you taking an action in Immuta until Immuta starts synchronizing policies. Note that this 10-minute schedule refers to Immuta starting to synchronize, not the time it will take to complete synchronization.
You cannot use the AWS Lake Formation connection if you are using data policies on Redshift Spectrum data sources, since granting access to the underlying Glue table via the AWS Lake Formation connection would allow a user to bypass the row- and column-level security of the Immuta-managed view by querying the Glue table directly.
User query audit
AWS Lake Formation has the following limitations:
50 tag limit per resource
1000 tag limit total
1000 values per tag
See the AWS Lake Formation documentation for more details about Lake Formation access controls.
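A quick way to reason about these limits is to check a planned tag layout against them before applying policies. The helper below is a hypothetical sketch: the `check_limits` function and its input shapes are assumptions for illustration, and the numbers are the three limits listed above.

```python
LIMITS = {
    "tags_per_resource": 50,
    "tags_total": 1000,
    "values_per_tag": 1000,
}


def check_limits(resource_tags, tag_values):
    """Return a list of human-readable limit violations.

    resource_tags: dict mapping a table name to the set of LF-Tag keys on it.
    tag_values: dict mapping an LF-Tag key to the set of its values.
    """
    problems = []
    for resource, keys in resource_tags.items():
        if len(keys) > LIMITS["tags_per_resource"]:
            problems.append(f"{resource}: {len(keys)} tags exceeds {LIMITS['tags_per_resource']}")
    if len(tag_values) > LIMITS["tags_total"]:
        problems.append(f"{len(tag_values)} tags exceeds {LIMITS['tags_total']} total")
    for key, values in tag_values.items():
        if len(values) > LIMITS["values_per_tag"]:
            problems.append(f"{key}: {len(values)} values exceeds {LIMITS['values_per_tag']}")
    return problems


# If every policy becomes one value of the same key, as in the Immuta_policy=1234
# example, the values-per-tag limit is the one reached first.
values = {str(group_id) for group_id in range(1001)}
print(check_limits({"research-data": {"Immuta_policy"}}, {"Immuta_policy": values}))
```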
AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows you to register your data objects in a technology through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data on your data platform.
After you set up a data lake in Lake Formation and a Glue Data Catalog, you provide Immuta an AWS IAM role with the required permissions to register the Lake Formation connection.
See the connections documentation for details about connections and how to manage them. To configure your Lake Formation connection, see the configuration guide.
When an Immuta subscription policy is applied to data sources, Immuta calculates and stores the policy logic in the Immuta metadata database and generates an LF-Tag key and value that is applied to the table in AWS. When users are subscribed to the data source, Immuta issues grants either directly to the table (if they are manually subscribed to the data source by a data owner) or to the LF-Tag (if they are subscribed by an automatic subscription policy). See the related documentation for details about these policy types.
Immuta then applies the LF-Tags to the affected AWS resources and grants users permissions on the LF-Tags. See the related documentation for details about this process.
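The two grant targets described above correspond to two resource shapes in the Lake Formation `GrantPermissions` API: a `Table` resource for a direct grant and an `LFTagPolicy` resource for a tag-based grant. The sketch below only builds the request arguments in the shape that boto3's `lakeformation` client accepts; whether Immuta issues exactly these calls is an assumption, and the `grant_request` helper and the `analytics` database name are hypothetical.

```python
def grant_request(principal_arn, *, database=None, table=None, tag_key=None, tag_value=None):
    """Build keyword arguments for a Lake Formation grant_permissions call.

    Pass database and table for a direct table grant (manual subscription),
    or tag_key and tag_value for an LF-Tag grant (automatic subscription policy).
    """
    if table is not None:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    elif tag_key is not None:
        resource = {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": [tag_value]}],
            }
        }
    else:
        raise ValueError("provide either a table or an LF-Tag")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


manual = grant_request(
    "arn:aws:iam::123456:user/Alex", database="analytics", table="cs-data"
)
automatic = grant_request(
    "arn:aws:iam::123456:user/Alex", tag_key="Immuta_policy", tag_value="1234"
)
print(manual["Resource"])
print(automatic["Resource"])

# With AWS credentials configured, a request could be sent like this:
# import boto3
# boto3.client("lakeformation").grant_permissions(**automatic)
```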
When a user account is mapped to Immuta, their metadata is stored in the metadata database.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the AWS permissions listed in the configuration guide.
The AWS Lake Formation connection allows users to author subscription policies to enforce access controls. Data policies are unsupported.
See the related documentation for details about subscription policy enforcement.
Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports the following approaches for user provisioning in the Lake Formation connection:
Enable IDC (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the related guide for instructions on mapping users from AWS IDC to user accounts in Immuta.
Create an IAM role per user: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be a challenge because there is an AWS quota on the number of IAM roles per account.
Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request access on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.
The IAM role name and IAM user name are case-sensitive. See the related documentation for details.
Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
See the related guide for instructions on mapping principals to user accounts in Immuta.
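Because only one Immuta user can be mapped to an IAM role, it can help to check a planned user mapping for shared principals before registering it. The helper below is a hypothetical sketch that operates on a plain dictionary, not on any Immuta API. ARNs are compared exactly as written, since IAM role and user names are case-sensitive.

```python
from collections import defaultdict


def shared_principals(user_to_principal):
    """Return principals that are mapped to more than one Immuta user.

    user_to_principal: dict mapping an Immuta user name to an AWS principal ARN.
    """
    by_principal = defaultdict(list)
    for immuta_user, arn in user_to_principal.items():
        by_principal[arn].append(immuta_user)  # no case folding: names are case-sensitive
    return {arn: users for arn, users in by_principal.items() if len(users) > 1}


# Hypothetical mapping: Casey and Mary both point at the same role.
mapping = {
    "alex": "arn:aws:iam::123456:role/alex-only",
    "casey": "arn:aws:iam::123456:role/cs-team",
    "mary": "arn:aws:iam::123456:role/cs-team",
}
print(shared_principals(mapping))
```

A non-empty result indicates a mapping that would either be rejected or, if modeled as a single role user, would grant access to everyone who can assume that role.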
While the Amazon S3 integration offers access control for raw object storage, the Lake Formation connection provides access control for Glue Data Catalog views and tables. Together, they offer support for every cloud-native data warehouse and lakehouse for AWS users.
See the related documentation for details.