AWS Lake Formation

Public preview: This connection is available to all accounts.

In the AWS Lake Formation connection, Immuta orchestrates access controls on data registered in the Glue Data Catalog. Immuta users who have been granted access to a Glue Data Catalog table or view can then query it using one of these analytic engines:

  • Amazon Athena

  • Amazon EMR Spark

  • Amazon Redshift Spectrum

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.

If a user is automatically subscribed to a data source by a policy, Immuta creates an LF-Tag for the users and data sources affected by that policy and grants the user access through that LF-Tag. If a data owner manually adds the user to a data source, Immuta grants the user direct access to the table in Lake Formation.

See the AWS Lake Formation documentation for more details about Lake Formation access controls.

What does Immuta do in my AWS environment?

Registering a connection

AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows you to register your data objects in a technology through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data on your data platform.
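The automatic add/remove behavior described above amounts to comparing what is in the catalog with what is already registered. The sketch below shows that comparison as a set difference. It is an illustration under stated assumptions, not Immuta's implementation: the `plan_object_sync` function, the `(database, table)` identifier, and the database names are all hypothetical.

```python
from typing import Dict, Set, Tuple

# A data source is identified here by (database, table); this model is hypothetical.
DataSource = Tuple[str, str]


def plan_object_sync(
    catalog: Dict[str, Set[str]],
    registered: Set[DataSource],
) -> Tuple[Set[DataSource], Set[DataSource]]:
    """Return (to_add, to_remove) so the registered data sources mirror the catalog."""
    # Flatten the database -> tables hierarchy into (database, table) pairs.
    current = {(db, table) for db, tables in catalog.items() for table in tables}
    to_add = current - registered      # objects that appeared in the catalog
    to_remove = registered - current   # objects that were dropped from the catalog
    return to_add, to_remove


# Hypothetical database names; the table names echo the example later on this page.
catalog = {"marketing": {"marketing-data"}, "research": {"research-data", "cs-data"}}
registered = {("marketing", "marketing-data"), ("research", "old-table")}
to_add, to_remove = plan_object_sync(catalog, registered)
print(sorted(to_add))     # [('research', 'cs-data'), ('research', 'research-data')]
print(sorted(to_remove))  # [('research', 'old-table')]
```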

After you set up a data lake in Lake Formation and a Glue Data Catalog, you provide Immuta an AWS IAM role with the permissions outlined on the Register an AWS Lake Formation connection page to register the Lake Formation connection.

Once the connection is registered in Immuta, Immuta ingests and stores connection metadata in the Immuta metadata database.

In the example below, the Immuta application administrator connects the Glue Data Catalog that contains marketing-data, research-data, and cs-data metadata. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in the Glue Data Catalog. Beyond making the registration of your data more intuitive, connections provide more control: instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

See the Connections reference guide for details about connections and how to manage them. To configure your Lake Formation connection, see the Register an AWS Lake Formation connection guide.

Applying policies

When an Immuta subscription policy is applied to data sources, Immuta calculates and stores the policy logic in the Immuta metadata database and generates an LF-Tag key and value that is applied to the table in AWS. When users are subscribed to the data source, Immuta issues grants either directly to the table (if they are manually subscribed to the data source by a data owner) or to the LF-Tag (if they are subscribed by an automatic subscription policy). See the Protecting data page for details about these policy types.
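The routing rule above (a direct table grant for manual subscriptions, an LF-Tag grant for automatic subscription policies) can be sketched as follows. The `Subscription` type and `grant_statement` function are hypothetical, and the returned strings only mirror the GRANT notation used on this page; they are not real Lake Formation API calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    user_arn: str                     # IAM principal the Immuta user is mapped to
    table: str                        # Glue Data Catalog table behind the data source
    manual: bool                      # True when a data owner added the user by hand
    policy_tag: Optional[str] = None  # "key=value" LF-Tag for automatic policies


def grant_statement(sub: Subscription) -> str:
    """Describe where the SELECT grant lands for one subscription."""
    if sub.manual:
        # Manual subscription: grant directly on the table.
        return f"GRANT (SELECT) on table {sub.table} TO {sub.user_arn}"
    if sub.policy_tag is None:
        raise ValueError("an automatic subscription needs an LF-Tag")
    # Automatic subscription policy: grant on the LF-Tag assigned to the table.
    return f"GRANT (SELECT) on tag {sub.policy_tag} TO {sub.user_arn}"


alex = "arn:aws:iam::123456:user/Alex"
print(grant_statement(Subscription(alex, "research-data", manual=False, policy_tag="Immuta_policy=1234")))
print(grant_statement(Subscription(alex, "cs-data", manual=True)))
```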

The table below outlines how two different automatic subscription policies authored in Immuta are orchestrated in Lake Formation.

| Immuta action | Example 1 | Example 2 |
|---|---|---|
| 1. Governor authors a global policy in Immuta. | "Users may subscribe to data sources tagged Research when they are members of group Research." | "Users may subscribe to data sources tagged CS when they have the attribute training.complete." |
| 2. Immuta calculates the data sources affected. | research-data, marketing-data | cs-data |
| 3. Immuta calculates the users affected. | Alex, Taylor, Deepu | Casey, Mary, Catherine |
| 4. Immuta generates a group identifier for the users and data sources affected. | 1234 | 5678 |
| 5. Immuta creates an LF-Tag key and value. | Immuta_policy=1234 | Immuta_policy=5678 |
| 6. Immuta assigns the LF-Tag to the AWS resource in Lake Formation. | Assign tag Immuta_policy=1234 to research-data and marketing-data | Assign tag Immuta_policy=5678 to cs-data |
| 7. Immuta grants the LF-Tag to users in Lake Formation. | GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Alex; GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Taylor; GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Deepu | GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Casey; GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Mary; GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Catherine |
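Steps 5 through 7 of the example can be sketched as a function that turns one policy's affected data sources and users into the operations listed above. This is a hypothetical illustration: the function name and the `account_id` default are assumptions, the group identifier is passed in because this page does not say how Immuta derives it, and the sketch emits one tag assignment per data source rather than grouping them as the example does.

```python
from typing import List, Set


def orchestrate_policy(
    group_id: str,
    data_sources: Set[str],
    users: Set[str],
    account_id: str = "123456",
) -> List[str]:
    """List the Lake Formation operations for one automatic subscription policy.

    group_id stands in for the identifier from step 4; how Immuta derives it is
    not described on this page, so the caller supplies it.
    """
    tag = f"Immuta_policy={group_id}"  # step 5: LF-Tag key and value
    # Step 6: assign the tag to each affected resource.
    ops = [f"Assign tag {tag} to {ds}" for ds in sorted(data_sources)]
    # Step 7: grant SELECT on the tag to each affected user's IAM principal.
    ops += [
        f"GRANT (SELECT) on tag {tag} TO arn:aws:iam::{account_id}:user/{user}"
        for user in sorted(users)
    ]
    return ops


example_1 = orchestrate_policy("1234", {"research-data", "marketing-data"}, {"Alex", "Taylor", "Deepu"})
example_2 = orchestrate_policy("5678", {"cs-data"}, {"Casey", "Mary", "Catherine"})
for op in example_1 + example_2:
    print(op)
```

Because users are granted the tag rather than each table, adding a data source to the policy later only requires assigning the existing tag to it.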

Maintaining state with Lake Formation

The following user actions trigger processes in the Lake Formation connection that keep Immuta synchronized with Lake Formation. The list below provides an overview of each process:

  • Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.

  • Data source deleted: Immuta deletes the data source metadata from the metadata database and removes LF-Tags from that AWS resource.

  • Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and generates an LF-Tag key and value.

    Immuta then applies the LF-Tags to the affected AWS resources and grants users permissions on the LF-Tags. See the applying policies section for details about this process.

  • User manually subscribed to a data source: When a user is manually added to a data source by a data owner, Immuta grants the user direct access to the table in Lake Formation.

  • Automatic subscription policy deleted: Immuta deletes the LF-Tag key and values.

  • AWS user account is mapped to Immuta: When a user account is mapped to Immuta, their metadata is stored in the metadata database.

  • User removed from a data source: Immuta revokes the user's access to the table or the LF-Tag.

The image below illustrates these processes.

See the AWS documentation for more details about the AWS components pictured above.
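The processes listed above can be modeled as updates to a small in-memory record of which LF-Tags sit on which resources and which grants exist. The sketch below is hypothetical throughout: `LakeFormationState` and its methods are illustrative names, grants are plain tuples rather than AWS objects, and the assumption that deleting an LF-Tag also clears its assignments and grants is labeled in the code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# (principal ARN, target) where target is "table:<name>" or "tag:<key=value>"
Grant = Tuple[str, str]


@dataclass
class LakeFormationState:
    """Hypothetical in-memory mirror of the tags and grants Immuta manages."""

    resource_tags: Dict[str, Set[str]] = field(default_factory=dict)
    grants: Set[Grant] = field(default_factory=set)

    def data_source_created(self, table: str) -> None:
        # Only metadata is registered; no tags or grants yet.
        self.resource_tags.setdefault(table, set())

    def data_source_deleted(self, table: str) -> None:
        # Drop the resource and the LF-Tags assigned to it.
        self.resource_tags.pop(table, None)

    def policy_applied(self, tag: str, tables: Set[str], principals: Set[str]) -> None:
        # Assign the LF-Tag to affected resources, then grant it to affected users.
        for table in tables:
            self.resource_tags.setdefault(table, set()).add(tag)
        self.grants |= {(p, f"tag:{tag}") for p in principals}

    def user_manually_subscribed(self, principal: str, table: str) -> None:
        # Manual subscriptions get a direct grant on the table.
        self.grants.add((principal, f"table:{table}"))

    def policy_deleted(self, tag: str) -> None:
        # Modeling assumption: deleting the LF-Tag also clears its assignments and grants.
        for tags in self.resource_tags.values():
            tags.discard(tag)
        self.grants = {g for g in self.grants if g[1] != f"tag:{tag}"}

    def user_removed(self, principal: str, target: str) -> None:
        # Revoke either the direct table grant or the LF-Tag grant.
        self.grants.discard((principal, target))


alex = "arn:aws:iam::123456:user/Alex"
state = LakeFormationState()
state.data_source_created("research-data")
state.policy_applied("Immuta_policy=1234", {"research-data"}, {alex})
state.user_manually_subscribed(alex, "cs-data")
state.policy_deleted("Immuta_policy=1234")
print(state.grants)  # {('arn:aws:iam::123456:user/Alex', 'table:cs-data')}
```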

Supported object types

The supported object types for AWS Lake Formation are listed below. When applying read and write access policies to these data sources, the privileges granted by Immuta vary depending on the object type. See an outline of privileges granted by Immuta on the Subscription policy access types page.

  • Glue Data Catalog table

  • Glue Data Catalog view

Security and compliance

Authentication methods

The Lake Formation connection supports the following authentication methods to register a connection:

  • Access using AWS IAM role (recommended): Provide Immuta with an IAM role from your AWS account that has a trust relationship with Immuta's IAM role. Immuta assumes this role from Immuta's AWS account whenever it calls the AWS API to perform operations in your AWS account. Contact your Immuta representative for the AWS account to add to your trust policy.

  • Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the AWS permissions listed in the Register an AWS Lake Formation connection guide.
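For the recommended IAM role method, the trust relationship is expressed as an IAM trust policy on your role. The sketch below builds the standard trust policy shape (an `Allow` statement for `sts:AssumeRole`) in Python. The principal ARN is a placeholder assumption: use the AWS account that your Immuta representative provides, and check the Register an AWS Lake Formation connection guide for any additional conditions Immuta requires.

```python
import json


def build_trust_policy(immuta_principal_arn: str) -> dict:
    """IAM trust policy allowing the given principal to assume the role it is attached to."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": immuta_principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder ARN; replace it with the principal your Immuta representative provides.
policy = build_trust_policy("arn:aws:iam::111111111111:role/immuta-placeholder")
print(json.dumps(policy, indent=2))
```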

Supported policies

The AWS Lake Formation connection allows users to author subscription policies to enforce access controls. Data policies are unsupported.

See the applying policies section for details about subscription policy enforcement.

User provisioning

Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports all three methods for user provisioning in the Lake Formation connection.

However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.

See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.

  1. Enable AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the map users section for instructions on mapping users from AWS IDC to user accounts in Immuta.

  2. Create an IAM role per user: If you do not have IDC enabled, create an IAM role for each user that is unique to that user, and assign that IAM role to the corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be challenging at scale because each AWS account is limited to a maximum of 5,000 IAM roles.

  3. Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.
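The over-provisioning risk described above follows from the fact that a grant to an IAM role reaches everyone who can act as that role. The sketch below computes the people who can effectively read a table from the principals that were granted access. The `effective_readers` function, the role names, and the role membership are hypothetical.

```python
from typing import Dict, Set


def effective_readers(granted_principals: Set[str], who_uses: Dict[str, Set[str]]) -> Set[str]:
    """People who can read a table, given the IAM principals that were granted access.

    who_uses maps each IAM principal to the people who can act as it (hypothetical data).
    """
    readers: Set[str] = set()
    for principal in granted_principals:
        readers |= who_uses.get(principal, set())
    return readers


shared_role = "arn:aws:iam::123456:role/analysts"
per_user_role = "arn:aws:iam::123456:role/alex-only"
who_uses = {shared_role: {"Alex", "Taylor", "Deepu"}, per_user_role: {"Alex"}}

# Access intended only for Alex reaches everyone in the shared role.
print(sorted(effective_readers({shared_role}, who_uses)))    # ['Alex', 'Deepu', 'Taylor']
print(sorted(effective_readers({per_user_role}, who_uses)))  # ['Alex']
```

With IDC or one role per user, each principal maps to exactly one person, so the granted set and the effective set coincide.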

Mapping IAM principals in Immuta

Names are case-sensitive

The IAM role name and IAM user name are case-sensitive. See the AWS documentation for details.

Immuta supports mapping an Immuta user to AWS in one of the following ways:

  • IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.

See the map users section for instructions on mapping principals to user accounts in Immuta.
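Because IAM role and user names are case-sensitive for this mapping, a principal entered as `user/alex` does not match `user/Alex`. The sketch below parses IAM user and role ARNs and compares them exactly. The `IamPrincipal` type and both functions are hypothetical helpers, not part of Immuta or an AWS SDK.

```python
from typing import NamedTuple


class IamPrincipal(NamedTuple):
    account_id: str
    kind: str  # "user" or "role"
    name: str  # compared exactly, since names are case-sensitive here


def parse_iam_arn(arn: str) -> IamPrincipal:
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    kind, _, rest = parts[5].partition("/")
    if kind not in ("user", "role") or not rest:
        raise ValueError(f"expected an IAM user or role ARN: {arn!r}")
    # ARNs may include a path such as role/team/name; the name is the last segment.
    return IamPrincipal(parts[4], kind, rest.rsplit("/", 1)[-1])


def same_principal(a: str, b: str) -> bool:
    # Exact comparison: "Alex" and "alex" are treated as different principals.
    return parse_iam_arn(a) == parse_iam_arn(b)


print(parse_iam_arn("arn:aws:iam::123456:role/team/analyst"))
print(same_principal("arn:aws:iam::123456:user/Alex", "arn:aws:iam::123456:user/alex"))  # False
```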

Existing S3 integrations

Existing S3 integrations have no impact on Lake Formation connections; they can be used in tandem.

While the Amazon S3 integration offers access control for raw object storage, the Lake Formation connection provides access control for Glue Data Catalog views and tables. Together, they offer support for every cloud-native data warehouse and lakehouse for AWS users.

Limitations and known issues

  • You cannot use the AWS Lake Formation connection if you are using data policies on Redshift Spectrum data sources. Granting access to the underlying Glue table through the AWS Lake Formation connection would allow a user to bypass the row- and column-level security of the Immuta-managed view by querying the Glue table directly.

  • User query audit is not supported.

  • AWS Lake Formation has the following limitations:

    • 50 LF-Tags per resource

    • 1,000 LF-Tags in total

    • 1,000 values per LF-Tag

    See the AWS documentation for details.

  • Immuta is actively making improvements to the AWS Lake Formation integration throughout the preview phases. Be aware of these temporary limitations during the early preview stages:

    • Immuta synchronizes policies on a 1-minute schedule, so up to 1 minute can pass between an action you take in Immuta and the start of policy synchronization. This schedule governs when synchronization starts, not how long it takes to complete.

    • LF-Tags created for automatic subscription policies are not removed when no longer applicable. This can result in growth of the LF-Tag value space and may hit quotas if many policy changes occur over time. LF-Tags can be manually removed to free up space if quota is a concern.

    • Scale is limited to 2,000 data sources and 100 users.

    • Multiple AWS Lake Formation integrations are not supported on a single Immuta tenant.

    • Immuta does not ingest existing LF-Tags.
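Because LF-Tags created for automatic subscription policies are not removed during the preview, the value count under a tag key can creep toward the Lake Formation quotas listed above. The sketch below checks an inventory of tags against those quotas. It is hypothetical: the function and input shapes are illustrative, and it reads "1,000 LF-Tags in total" as a limit on distinct tag keys, which is an interpretation to confirm against the AWS documentation.

```python
from typing import Dict, List, Set, Tuple

# Quotas as listed above; the total is interpreted here as distinct LF-Tag keys.
MAX_TAGS_PER_RESOURCE = 50
MAX_TAGS_TOTAL = 1000
MAX_VALUES_PER_TAG = 1000


def quota_problems(
    defined_tags: Dict[str, Set[str]],
    assignments: Dict[str, Set[Tuple[str, str]]],
) -> List[str]:
    """Report quota violations.

    defined_tags maps each LF-Tag key to all of its values, including stale ones;
    assignments maps each resource to the (key, value) pairs assigned to it.
    """
    problems = []
    if len(defined_tags) > MAX_TAGS_TOTAL:
        problems.append(f"{len(defined_tags)} LF-Tags (limit {MAX_TAGS_TOTAL})")
    for key, values in defined_tags.items():
        if len(values) > MAX_VALUES_PER_TAG:
            problems.append(f"{key}: {len(values)} values (limit {MAX_VALUES_PER_TAG})")
    for resource, tags in assignments.items():
        if len(tags) > MAX_TAGS_PER_RESOURCE:
            problems.append(f"{resource}: {len(tags)} tags (limit {MAX_TAGS_PER_RESOURCE})")
    return problems


# Stale policy values accumulate under one key because they are never removed.
defined = {"Immuta_policy": {str(i) for i in range(1000)} | {"5678"}}
assigned = {"cs-data": {("Immuta_policy", "5678")}}
print(quota_problems(defined, assigned))  # ['Immuta_policy: 1001 values (limit 1000)']
```

Counting all defined values, not just the ones currently assigned, is what matters here, since stale values still occupy quota until they are removed manually.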
