# AWS Lake Formation

{% hint style="info" %}
**Public preview**: This feature is available to all accounts.
{% endhint %}

In the AWS Lake Formation integration, Immuta orchestrates [Lake Formation access controls](#user-content-fn-1)[^1] on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:

* Amazon Athena
* Amazon EMR Spark
* [Amazon Redshift Spectrum](#user-content-fn-2)[^2]

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.

<figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2Fgit-blob-5e864e874133839cc006e2da49157e9d8e975f6e%2Flf-sequence-diagram.png?alt=media" alt="When users submit a query against a data source they are subscribed to, the analytic engine requests metadata from the Glue Data Catalog, which then queries Lake Formation to determine what data the user is allowed to see. Then, the analytic engine requests temporary access from Lake Formation, retrieves the data from S3, and filters the data to return policy-enforced data to the user."><figcaption><p>If a policy automatically subscribes a user to a data source, Immuta applies an LF-Tag to that data source and grants the user access to the LF-Tag. If a data owner manually adds the user to a data source, Immuta grants the user direct access to the table in Lake Formation.</p></figcaption></figure>

See the [AWS Lake Formation documentation](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html) for more details about Lake Formation access controls.

## What does Immuta do in my AWS environment?

### Registering a connection

AWS Lake Formation is configured and data is registered through [connections](https://documentation.immuta.com/SaaS/configuration/integrations/data-and-integrations/registering-a-connection/reference-guides/connections-overview), an Immuta feature that lets you register all the data objects in a technology through a single connection, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and let Immuta monitor your data platform for changes, adding and removing data sources automatically to reflect the current state of your data.

After you set up a data lake in Lake Formation and a Glue Data Catalog, you provide Immuta an AWS IAM role with the [permissions outlined on the Register an AWS Lake Formation connection page](https://documentation.immuta.com/SaaS/configuration/integrations/register-an-aws-lake-formation-connection#permissions) to register the Lake Formation connection.

Once the connection is registered in Immuta, Immuta ingests and stores connection metadata in the Immuta metadata database.

In the example below, the Immuta application administrator connects the Glue Data Catalog that contains the `marketing-data`, `research-data`, and `cs-data` tables. Immuta registers[^3] these tables as data sources and stores their metadata in the Immuta metadata database.

<figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2Fgit-blob-e8ea986977426afb636edc4d89e819dfe7bd6aa7%2FAWS%20Lake%20Formation%20integration%20-%20Configure%20integration%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in the Glue Data Catalog. Beyond making data registration more intuitive, connections provide more control: instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

See the [Connections reference guide](https://documentation.immuta.com/SaaS/configuration/integrations/data-and-integrations/registering-a-connection/reference-guides/connections-overview) for details about connections and how to manage them. To configure your Lake Formation integration and register data, see the [Register an AWS Lake Formation connection guide](https://documentation.immuta.com/SaaS/configuration/integrations/aws-lake-formation/register-an-aws-lake-formation-connection).

### Applying policies

When an Immuta subscription policy is applied to data sources, Immuta calculates and stores the policy logic in the Immuta metadata database and generates an LF-Tag key and value that is applied to the table in AWS. When users are subscribed to the data source, Immuta issues grants either directly to the table (if they are manually subscribed to the data source by a data owner) or to the LF-Tag (if they are subscribed by an automatic subscription policy). See the [Protecting data page](https://documentation.immuta.com/SaaS/configuration/integrations/aws-lake-formation/protecting-data#protecting-data) for details about these policy types.
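To make the two grant paths concrete, the sketch below builds the keyword arguments a caller could pass to the boto3 Lake Formation client's `grant_permissions` method: a `Table` resource for a manual subscription and an `LFTagPolicy` resource for an automatic one. This is an illustration, not Immuta's implementation; the helper name `build_grant_request`, the catalog ID, and the database name `research_db` are hypothetical placeholders. The function only builds dictionaries, so it runs without AWS credentials.

```python
from typing import Any, Dict


def build_grant_request(
    principal_arn: str,
    catalog_id: str,
    database: str = "",
    table: str = "",
    tag_key: str = "",
    tag_value: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper: build grant_permissions kwargs for either grant path.

    Manual subscription: pass database and table to grant SELECT on the table itself.
    Automatic subscription: pass tag_key and tag_value to grant SELECT on tables
    that carry that LF-Tag.
    """
    if tag_key and tag_value:
        resource = {
            "LFTagPolicy": {
                "CatalogId": catalog_id,
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": [tag_value]}],
            }
        }
    elif database and table:
        resource = {
            "Table": {"CatalogId": catalog_id, "DatabaseName": database, "Name": table}
        }
    else:
        raise ValueError("provide either database/table or tag_key/tag_value")
    return {
        "CatalogId": catalog_id,
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


ALEX = "arn:aws:iam::123456:user/Alex"
# Manual subscription: direct grant on the table (placeholder database name).
manual = build_grant_request(ALEX, "123456", database="research_db", table="research-data")
# Automatic subscription: grant on the LF-Tag expression.
auto = build_grant_request(ALEX, "123456", tag_key="Immuta_policy", tag_value="1234")
# With AWS credentials configured, either dict could be sent with:
#   boto3.client("lakeformation").grant_permissions(**auto)
```

The LF-Tag path is what lets one grant cover every table tagged for the policy: when another table receives the same tag, the user's existing grant already applies to it.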

The table below outlines how two different automatic subscription policies authored in Immuta are orchestrated in Lake Formation.

| Immuta actions                                                                                                                         | Example 1                                                                                                                                                                                                                                                                                                                                                                                            | Example 2                                                                                                                                                                                                                                                                                                                                                                                            |
| -------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <ol><li>Governor authors a global policy in Immuta.</li></ol>                                                                          | "Users may subscribe to data sources tagged `Research` when they are members of group `Research`."                                                                                                                                                                                                                                                                                                   | "Users may subscribe to data sources tagged `CS` when they have the attribute `training.complete`."                                                                                                                                                                                                                                                                                                  |
| <ol start="2"><li>Immuta calculates the data sources affected.</li></ol> | <ul><li><code>research-data</code></li><li><code>marketing-data</code></li></ul> | <ul><li><code>cs-data</code></li></ul> |
| <ol start="3"><li>Immuta calculates the users affected.</li></ol> | <ul><li><code>Alex</code></li><li><code>Taylor</code></li><li><code>Deepu</code></li></ul> | <ul><li><code>Casey</code></li><li><code>Mary</code></li><li><code>Catherine</code></li></ul> |
| <ol start="4"><li>Immuta generates a group identifier for the users and data sources affected.</li></ol>                               | `1234`                                                                                                                                                                                                                                                                                                                                                                                               | `5678`                                                                                                                                                                                                                                                                                                                                                                                               |
| <ol start="5"><li>Immuta creates an <mark style="color:red;">LF-Tag key</mark> and <mark style="color:purple;">value</mark>.</li></ol> | <mark style="color:red;">`Immuta_policy=`</mark><mark style="color:purple;">`1234`</mark>                                                                                                                                                                                                                                                                                                            | <mark style="color:red;">`Immuta_policy=`</mark><mark style="color:purple;">`5678`</mark>                                                                                                                                                                                                                                                                                                            |
| <ol start="6"><li>Immuta assigns the LF-Tag to the AWS resource in Lake Formation.</li></ol>                                           | Assign tag <mark style="color:red;">`Immuta_policy=`</mark><mark style="color:purple;">`1234`</mark> to `research-data` and `marketing-data`                                                                                                                                                                                                                                                         | Assign tag <mark style="color:red;">`Immuta_policy=`</mark><mark style="color:purple;">`5678`</mark> to `cs-data`                                                                                                                                                                                                                                                                                    |
| <ol start="7"><li>Immuta grants the LF-Tag to users in Lake Formation.</li></ol> | <p>GRANT (SELECT) on tag <mark style="color:red;"><code>Immuta\_policy=1234</code></mark> TO arn:aws:iam::123456:user/Alex</p><p>GRANT (SELECT) on tag <mark style="color:red;"><code>Immuta\_policy=1234</code></mark> TO arn:aws:iam::123456:user/Taylor</p><p>GRANT (SELECT) on tag <mark style="color:red;"><code>Immuta\_policy=1234</code></mark> TO arn:aws:iam::123456:user/Deepu</p> | <p>GRANT (SELECT) on tag <mark style="color:red;"><code>Immuta\_policy=5678</code></mark> TO arn:aws:iam::123456:user/Casey</p><p>GRANT (SELECT) on tag <mark style="color:red;"><code>Immuta\_policy=5678</code></mark> TO arn:aws:iam::123456:user/Mary</p><p>GRANT (SELECT) on tag <mark style="color:red;"><code>Immuta\_policy=5678</code></mark> TO arn:aws:iam::123456:user/Catherine</p> |
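The steps in Example 1 can be traced with the small simulation below. It is a sketch under stated assumptions, not Immuta's code: the function name `plan_automatic_subscription` is hypothetical, the policy condition is modeled only as group membership (Example 2's attribute check is not modeled), the group identifier is passed in because this page does not describe how Immuta generates it, and the tag assignments (for example, `marketing-data` carrying the `Research` tag) are assumed so the output matches the table.

```python
from typing import Dict, List, Set, Tuple


def plan_automatic_subscription(
    policy_tag: str,
    policy_group: str,
    source_tags: Dict[str, Set[str]],
    user_groups: Dict[str, Set[str]],
    group_id: str,
    account_id: str = "123456",
    tag_key: str = "Immuta_policy",
) -> Dict[str, list]:
    # Step 2: data sources that carry the Immuta tag named in the policy.
    sources = sorted(s for s, tags in source_tags.items() if policy_tag in tags)
    # Step 3: users who satisfy the policy condition (here, group membership only).
    users = sorted(u for u, groups in user_groups.items() if policy_group in groups)
    # Steps 4-5: the LF-Tag key/value pair built from the supplied group identifier.
    lf_tag = f"{tag_key}={group_id}"
    # Step 6: assign the LF-Tag to each affected resource.
    assignments: List[Tuple[str, str]] = [(s, lf_tag) for s in sources]
    # Step 7: grant SELECT on the LF-Tag to each affected user's IAM user ARN.
    grants = [
        f"GRANT (SELECT) on tag {lf_tag} TO arn:aws:iam::{account_id}:user/{u}"
        for u in users
    ]
    return {"sources": sources, "users": users, "assignments": assignments, "grants": grants}


# Example 1 from the table; the tag and group memberships are assumed for illustration.
plan = plan_automatic_subscription(
    policy_tag="Research",
    policy_group="Research",
    source_tags={
        "research-data": {"Research"},
        "marketing-data": {"Research", "Marketing"},
        "cs-data": {"CS"},
    },
    user_groups={
        "Alex": {"Research"},
        "Taylor": {"Research"},
        "Deepu": {"Research"},
        "Casey": {"CS"},
    },
    group_id="1234",
)
for line in plan["grants"]:
    print(line)
```

Because every affected user is granted on one shared tag value, adding a new data source to the policy only requires tagging that resource; no per-user grants change.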

#### Lake Formation privileges granted by Immuta

The privileges Immuta issues to users when they are subscribed to a data source vary depending on the object type. See an outline of privileges granted by Immuta on the [Subscription policy access types page](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/section-contents/reference-guides/subscription-access-types#granting-aws-lake-formation-privileges).

### Maintaining state with Lake Formation

The following user actions trigger processes in the Lake Formation integration that keep Immuta data in sync with Lake Formation:

* **Data source created**: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
* **Data source deleted**: Immuta deletes the data source metadata from the metadata database and removes LF-Tags from that AWS resource.
* **Automatic subscription policy applied to or updated on a data source**: Immuta calculates the users and data sources affected by the policy change and generates an LF-Tag key and value.

  Immuta then applies the LF-Tags to the affected AWS resources and grants users permissions on the LF-Tags. See the [applying policies section](#applying-policies) for details about this process.
* **User manually subscribed to a data source**: When a user is manually added to a data source by a data owner, Immuta grants the user direct access to the table in Lake Formation.
* **Automatic subscription policy deleted**: Immuta deletes the LF-Tag key and values.
* [**AWS user account is mapped to Immuta**](#mapping-iam-principals-in-immuta): When a user account is mapped to Immuta, their metadata is stored in the metadata database.
* **User removed from a data source**: Immuta revokes the user's access to the table or the LF-Tag.

The image below illustrates these processes.

<figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2Fgit-blob-457f9eed22a0fce0b5da8250344096ae199855b5%2FAWS%20Lake%20Formation%20integration%20(1).png?alt=media" alt=""><figcaption><p>See the <a href="https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html">AWS documentation</a> for more details about the AWS components pictured above.</p></figcaption></figure>
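This page does not describe Immuta's internal synchronization logic. As a generic illustration of how processes like those listed above keep two systems in step, the sketch below compares the grants that should exist with the grants present in Lake Formation and computes which to add and which to revoke. In Lake Formation, these would correspond to `grant_permissions` and `revoke_permissions` calls. The `reconcile` function and the `table:` and `tag:` resource prefixes are a hypothetical encoding chosen for brevity.

```python
from typing import Set, Tuple

# (principal ARN, resource) pairs; the "table:" and "tag:" prefixes are hypothetical.
Grant = Tuple[str, str]


def reconcile(desired: Set[Grant], actual: Set[Grant]) -> Tuple[Set[Grant], Set[Grant]]:
    """Return (to_grant, to_revoke) so that the actual grants match the desired ones."""
    to_grant = desired - actual  # for example, a user newly subscribed
    to_revoke = actual - desired  # for example, a user removed from a data source
    return to_grant, to_revoke


ALEX = "arn:aws:iam::123456:user/Alex"
TAYLOR = "arn:aws:iam::123456:user/Taylor"
CASEY = "arn:aws:iam::123456:user/Casey"

# Taylor was removed from the data source; Casey was manually subscribed to cs-data.
desired = {(ALEX, "tag:Immuta_policy=1234"), (CASEY, "table:cs-data")}
actual = {(ALEX, "tag:Immuta_policy=1234"), (TAYLOR, "tag:Immuta_policy=1234")}

to_grant, to_revoke = reconcile(desired, actual)
print("grant:", sorted(to_grant))
print("revoke:", sorted(to_revoke))
```

Computing a diff instead of replaying every event keeps unchanged grants (Alex's, here) untouched.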

#### Tag ingestion

{% hint style="info" %}
**Private preview**: This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
{% endhint %}

When registering an AWS Lake Formation connection, you can opt to ingest LF-Tags. If this option is enabled, Immuta pulls in the LF-Tags on every data source and applies them automatically. Immuta then checks AWS Lake Formation every 24 hours for relevant metadata changes.

Tag ingestion is an integration-wide setting and, once enabled, cannot be disabled on a data-source-by-data-source basis. Additionally, if enabled, no other external catalog can be linked to the AWS Lake Formation data sources.
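For a sense of what LF-Tag metadata looks like on the AWS side, the sketch below flattens a response shaped like the output of the boto3 Lake Formation `get_resource_lf_tags` method (keys `LFTagOnDatabase`, `LFTagsOnTable`, and `LFTagsOnColumns`) into `(scope, key, value)` triples. How Immuta maps LF-Tags to Immuta tags is not described here. The helper `flatten_lf_tags` is hypothetical, and the sample response values are made up.

```python
from typing import Any, Dict, List, Tuple


def flatten_lf_tags(response: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: flatten a get_resource_lf_tags-style response."""
    triples: List[Tuple[str, str, str]] = []
    # Tags inherited from the database that contains the table.
    for tag in response.get("LFTagOnDatabase", []):
        for value in tag["TagValues"]:
            triples.append(("database", tag["TagKey"], value))
    # Tags assigned to the table itself.
    for tag in response.get("LFTagsOnTable", []):
        for value in tag["TagValues"]:
            triples.append(("table", tag["TagKey"], value))
    # Tags assigned to individual columns.
    for column in response.get("LFTagsOnColumns", []):
        for tag in column["LFTags"]:
            for value in tag["TagValues"]:
                triples.append((f"column:{column['Name']}", tag["TagKey"], value))
    return triples


# Made-up sample; a real response could come from
#   boto3.client("lakeformation").get_resource_lf_tags(Resource=..., ShowAssignedLFTags=True)
sample = {
    "LFTagOnDatabase": [{"CatalogId": "123456", "TagKey": "domain", "TagValues": ["research"]}],
    "LFTagsOnTable": [{"CatalogId": "123456", "TagKey": "sensitivity", "TagValues": ["internal"]}],
    "LFTagsOnColumns": [
        {"Name": "email", "LFTags": [{"CatalogId": "123456", "TagKey": "pii", "TagValues": ["true"]}]}
    ],
}
for triple in flatten_lf_tags(sample):
    print(triple)
```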

## Supported object types

<table><thead><tr><th>Glue Data Catalog object type</th><th width="203.0078125">Subscription policy support</th><th>Data policy support</th><th>Request app support</th></tr></thead><tbody><tr><td>Table</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td></tr><tr><td>View</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td></tr></tbody></table>

## Supported policies

The AWS Lake Formation integration allows users to author [subscription policies](https://documentation.immuta.com/SaaS/govern/secure-your-data/authoring-policies-in-secure/section-contents) to enforce access controls. Data policies are unsupported.

See the [applying policies section](#applying-policies) for details about subscription policy enforcement.

## Security and compliance

### Authentication methods

The Lake Formation integration supports the following authentication methods to register a connection:

* **Access using AWS IAM role (recommended)**: Provide Immuta an IAM role from your AWS account that has a trust relationship with Immuta's IAM role. Immuta assumes this role from its own AWS account whenever it calls the AWS API to perform operations in your AWS account.
* **Access using access key and secret access key**: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the AWS permissions listed in the [Register an AWS Lake Formation connection guide](https://documentation.immuta.com/SaaS/configuration/integrations/register-an-aws-lake-formation-connection#permissions).
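The trust relationship in the IAM role option is expressed as a role trust policy. The sketch below builds a minimal one that allows a single external role to call `sts:AssumeRole`. Immuta's actual role ARN and any required conditions come from the Register an AWS Lake Formation connection guide. The ARN shown is a placeholder, and the optional `sts:ExternalId` condition is a common AWS hardening practice included as an assumption, not a documented Immuta requirement. The helper `build_trust_policy` is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_trust_policy(trusted_role_arn: str, external_id: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: trust policy letting one external role assume this role."""
    statement: Dict[str, Any] = {
        "Effect": "Allow",
        "Principal": {"AWS": trusted_role_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # Optional hardening; only add if the caller actually sends this external ID.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder ARN standing in for Immuta's IAM role.
policy = build_trust_policy("arn:aws:iam::111111111111:role/immuta-example")
print(json.dumps(policy, indent=2))
```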

## User provisioning

Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta [supports all three methods](#mapping-iam-principals-in-immuta) for user provisioning in the Lake Formation integration.

However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. If multiple users share an IAM role, access granted to that role extends to everyone who can assume it, which over-provisions access.

See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.

1. **Enable** [**AWS IAM Identity Center (IDC)**](https://aws.amazon.com/iam/identity-center/) **(recommended)**: IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user.

   Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the [map users section](https://documentation.immuta.com/SaaS/configuration/integrations/register-an-aws-lake-formation-connection#map-users) for instructions on mapping users from AWS IDC to user accounts in Immuta.
2. **Create an IAM role per user**: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users.

   This approach can be difficult to scale because of the [IAM limit of 5,000 roles per AWS account](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html#reference_iam-quotas-entities).
3. **Request on behalf of IAM roles (not recommended)**: Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they [request on behalf](https://documentation.immuta.com/SaaS/request/access-data-products/reference-guide/data-product-access) of the IAM role user rather than themselves.

   This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.

### Mapping IAM principals in Immuta

{% hint style="info" %}
**Names are case-sensitive**

The IAM role name and IAM user name are case-sensitive. See the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html) for details.
{% endhint %}

Immuta supports mapping an Immuta user to AWS in one of the following ways:

* [AWS IAM Identity Center user IDs](https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html)
* [IAM role principals](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html#principal-roles): Only a single Immuta user can be mapped to an IAM role, so Immuta cannot enforce policies on the individual AWS users who can assume that role. Therefore, if you use role principals, create a new user in Immuta that represents the role so that permissions are applied specifically to it.
* [IAM user principals](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html#principal-users)

See the [map users section](https://documentation.immuta.com/SaaS/configuration/integrations/register-an-aws-lake-formation-connection#map-users) for instructions on mapping principals to user accounts in Immuta.
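Before mapping principals, it can help to check a planned mapping for IAM roles assigned to more than one Immuta user, because only one Immuta user can be mapped to a role. The sketch below parses IAM user and role ARNs and flags shared roles using exact string matching, which respects the case-sensitivity of IAM names. The helpers `classify_principal` and `shared_roles` and the email-style Immuta user identifiers are hypothetical. The regular expression deliberately accepts any numeric account ID because this page's examples use `123456`, while real AWS account IDs are 12 digits.

```python
import re
from collections import defaultdict
from typing import Dict, List

# IAM user or role ARN with an optional path; the account ID check is deliberately loose.
ARN_PATTERN = re.compile(r"^arn:aws:iam::(\d+):(user|role)/(?:.*/)?([\w+=,.@-]+)$")


def classify_principal(arn: str) -> Dict[str, str]:
    """Hypothetical helper: split an IAM user or role ARN into its parts."""
    match = ARN_PATTERN.match(arn)
    if not match:
        raise ValueError(f"not an IAM user or role ARN: {arn}")
    account, kind, name = match.groups()
    return {"account": account, "type": kind, "name": name}


def shared_roles(mapping: Dict[str, str]) -> List[str]:
    """Return role ARNs mapped to more than one Immuta user (case-sensitive)."""
    users_by_arn: Dict[str, List[str]] = defaultdict(list)
    for immuta_user, arn in mapping.items():
        if classify_principal(arn)["type"] == "role":
            users_by_arn[arn].append(immuta_user)
    return sorted(arn for arn, users in users_by_arn.items() if len(users) > 1)


mapping = {
    "alex@example.com": "arn:aws:iam::123456:role/Analyst",
    "taylor@example.com": "arn:aws:iam::123456:role/Analyst",
    "deepu@example.com": "arn:aws:iam::123456:role/analyst",  # a different role: names are case-sensitive
    "casey@example.com": "arn:aws:iam::123456:user/Casey",
}
print(shared_roles(mapping))
```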

## Existing Amazon S3 integrations

Existing Amazon S3 integrations have no impact on AWS Lake Formation integrations; they can be used in tandem.

While the [Amazon S3 integration](https://documentation.immuta.com/SaaS/configuration/integrations/amazon-s3-integration) offers access control for raw object storage, the Lake Formation integration provides access control for Glue Data Catalog views and tables. Together, they offer support for every cloud-native data warehouse and lakehouse for AWS users.

## Limitations and known issues

* You cannot use the AWS Lake Formation integration if you are using data policies on Redshift Spectrum data sources. Granting access to the underlying Glue table through the AWS Lake Formation integration would let users bypass the row- and column-level security of the Immuta-managed view by querying the Glue table directly. Instead, use the [Amazon Redshift Spectrum integration](https://documentation.immuta.com/SaaS/configuration/integrations/redshift/amazon-redshift-view-based-integration).
* Impersonation is not supported.
* User query audit is not supported.
* AWS Lake Formation has the following limitations:

  * 50 LF-Tags per resource
  * 1,000 LF-Tags in total
  * 1,000 values per LF-Tag

  See the [AWS documentation](https://docs.aws.amazon.com/lake-formation/latest/dg/lf-limitations.html) for details.
* Immuta is actively making improvements to the AWS Lake Formation integration throughout the preview phases. Be aware of these temporary limitations during the early preview stages:
  * Immuta synchronizes policies on a 1-minute schedule, so up to 1 minute can pass between an action in Immuta and the start of policy synchronization. The schedule determines when synchronization starts, not how long it takes to complete.
  * LF-Tags created for automatic subscription policies are not removed when no longer applicable. This can result in growth of the LF-Tag value space and may hit quotas if many policy changes occur over time. LF-Tags can be manually removed to free up space if quota is a concern.
  * The integration supports at most 2,000 data sources and 100 users.
  * Multiple AWS Lake Formation integrations are not supported on a single Immuta tenant.
  * Immuta does not ingest existing LF-Tags.
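Because LF-Tag values created for automatic subscription policies are not removed during the preview, it can be worth tracking how close an account is to the AWS quotas listed above. The sketch below flags counts at or above a threshold and estimates how many months remain before a tag key runs out of values. It assumes the "in total" quota counts LF-Tag keys. The helper names and the sample counts are hypothetical; real counts could be gathered with the boto3 Lake Formation `list_lf_tags` method. Confirm current quota values in the AWS documentation.

```python
import math
from typing import Dict, List

# Quotas as listed above; confirm current values in the AWS documentation.
MAX_TAGS_PER_RESOURCE = 50
MAX_TAGS_TOTAL = 1000  # assumed to count LF-Tag keys
MAX_VALUES_PER_TAG = 1000


def quota_warnings(
    tags_per_resource: Dict[str, int], values_per_tag: Dict[str, int], threshold: float = 0.8
) -> List[str]:
    """Hypothetical helper: list resources and tag keys at or above `threshold` of a quota."""
    warnings = []
    for resource, count in sorted(tags_per_resource.items()):
        if count >= threshold * MAX_TAGS_PER_RESOURCE:
            warnings.append(f"{resource}: {count}/{MAX_TAGS_PER_RESOURCE} LF-Tags")
    if len(values_per_tag) >= threshold * MAX_TAGS_TOTAL:
        warnings.append(f"{len(values_per_tag)}/{MAX_TAGS_TOTAL} LF-Tags in total")
    for key, count in sorted(values_per_tag.items()):
        if count >= threshold * MAX_VALUES_PER_TAG:
            warnings.append(f"{key}: {count}/{MAX_VALUES_PER_TAG} values")
    return warnings


def months_until_value_quota(current_values: int, new_values_per_month: int) -> float:
    """Estimate months before a tag key exhausts its values if no values are removed."""
    if new_values_per_month <= 0:
        return math.inf
    return max(MAX_VALUES_PER_TAG - current_values, 0) / new_values_per_month


# Hypothetical counts for illustration.
print(quota_warnings({"research-data": 3, "cs-data": 45}, {"Immuta_policy": 850, "domain": 4}))
print(months_until_value_quota(850, 25))
```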

[^1]: AWS Lake Formation is an AWS service that allows you to govern access to Glue Data Catalog tables and views.

[^2]: You cannot use the AWS Lake Formation connection if you are using data policies on Redshift Spectrum data sources. Instead, use the [Amazon Redshift view-based integration](https://documentation.immuta.com/SaaS/configuration/integrations/redshift/amazon-redshift-view-based-integration).

    See the [Limitations and known issues section](#limitations-and-known-issues) for details.

[^3]: See the [Connections reference guide](https://documentation.immuta.com/SaaS/configuration/data-and-integrations/registering-a-connection/reference-guides/connections-overview#object-sync) for more details about how data objects are synced with Immuta so that the objects registered in your Glue Data Catalog stay synchronous with the registered objects in Immuta.
