# Register an AWS Lake Formation Connection

{% hint style="info" %}
**Public preview**: This connection is available to all accounts.
{% endhint %}

The connection API is a REST API that allows users to register an AWS Lake Formation connection[^1] to Immuta with a single set of credentials rather than configuring an integration and creating data sources separately. Then Immuta can manage and enforce access controls on your data through that connection. To manage your connection, see the [Manage a connection reference guide](https://documentation.immuta.com/saas/developer-guides/api-intro/connections-api/how-to-guides/manage-a-connection).

#### **Requirements**

The user registering the connection must have the following permissions to complete setup.

* `APPLICATION_ADMIN` Immuta permission
* [`Create LF-Tag` AWS permission](#user-content-fn-2)[^2]
* `DESCRIBE` AWS permission. You must have the `DESCRIBE` permission on the [required resources](https://docs.aws.amazon.com/lake-formation/latest/dg/lf-permissions-reference.html) in AWS:
  * All databases that should be registered in the connection
  * All tables that should be registered in the connection
  * Any LF-Tags you are using on the resources that should be registered in the connection
* The AWS account credentials or [AWS IAM role](#user-content-fn-3)[^3] you provide for the [Immuta service principal](#id-1.-set-up-the-immuta-service-principal) must have permissions to perform the following actions to register data and apply policies:
  * [Glue Data Catalog actions](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsglue.html)
    * `glue:GetDatabase`
    * `glue:GetTables`
    * `glue:GetDatabases`
    * `glue:GetTable`
  * [Lake Formation actions](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awslakeformation.html)
    * `lakeformation:ListPermissions`
    * `lakeformation:BatchGrantPermissions`
    * `lakeformation:BatchRevokePermissions`
    * `lakeformation:CreateLFTag`
    * `lakeformation:UpdateLFTag`
    * `lakeformation:DeleteLFTag`
    * `lakeformation:AddLFTagsToResource`
    * `lakeformation:RemoveLFTagsFromResource`
    * `lakeformation:GetResourceLFTags`
    * `lakeformation:ListLFTags`
    * `lakeformation:GetLFTag`
    * `lakeformation:SearchTablesByLFTags`
    * `lakeformation:SearchDatabasesByLFTags`
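
To confirm that an IAM policy document covers every action listed above, you could run a check like the following. This is a minimal sketch, not an Immuta or AWS utility: `missing_actions` is a hypothetical helper, and it only inspects `Allow` statements and their `Action` entries. It ignores `Deny`, `NotAction`, `Resource`, and `Condition`, so treat an empty result as a sanity check rather than proof of access.

```python
import fnmatch

# Actions copied from the requirements list above.
REQUIRED_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetTables",
    "glue:GetDatabases",
    "glue:GetTable",
    "lakeformation:ListPermissions",
    "lakeformation:BatchGrantPermissions",
    "lakeformation:BatchRevokePermissions",
    "lakeformation:CreateLFTag",
    "lakeformation:UpdateLFTag",
    "lakeformation:DeleteLFTag",
    "lakeformation:AddLFTagsToResource",
    "lakeformation:RemoveLFTagsFromResource",
    "lakeformation:GetResourceLFTags",
    "lakeformation:ListLFTags",
    "lakeformation:GetLFTag",
    "lakeformation:SearchTablesByLFTags",
    "lakeformation:SearchDatabasesByLFTags",
]


def missing_actions(policy: dict) -> list[str]:
    """Return the required actions that no Allow statement in the policy covers."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]

    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        patterns.extend(action.lower() for action in actions)

    return [
        action
        for action in REQUIRED_ACTIONS
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]
```

Wildcards such as `glue:Get*` are matched with `fnmatch`, which approximates how IAM expands `*` and `?` in action names.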

#### **Prerequisites**

* [Data lake is set up in AWS Lake Formation](https://docs.aws.amazon.com/lake-formation/latest/dg/initial-lf-config.html). The account in which this is set up is referred to as the **admin account**. This is the account that you will use to initially configure IAM and AWS Lake Formation permissions to give the Immuta service principal access to perform operations. The user in this account must be able to manage IAM permissions and Lake Formation permissions for all data in the Glue Data Catalog.
* No AWS Lake Formation connections configured in the same Immuta instance for the same Glue Data Catalog.
* The databases and tables you want Immuta to govern must be [configured in AWS to respect the AWS Lake Formation permissions](https://docs.aws.amazon.com/lake-formation/latest/dg/initial-lf-config.html#setup-change-cat-settings). Immuta cannot govern resources that use IAM access control or hybrid access mode. To ensure Immuta can govern your resources, verify that the default Data Catalog settings in AWS are **unchecked.** See the screenshot below and [AWS documentation](https://docs.aws.amazon.com/lake-formation/latest/dg/initial-lf-config.html#setup-change-cat-settings) for instructions on changing these settings:

  <figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2Fgit-blob-8e9b22eb1317611f8ad9e06b902d176b2c8016bc%2Fdata-catalog-settings.png?alt=media" alt="The Use only IAM access control for new databases option and the Use only IAM access control for new tables in new databases option in the Data Catalog Settings section in AWS are unchecked so that Immuta can govern resources."><figcaption><p>AWS Data Catalog settings must be unchecked for Immuta to govern access.</p></figcaption></figure>
* [**Enable** **AWS IAM Identity Center (IDC) (recommended)**](#user-content-fn-4)[^4]: [IDC](https://aws.amazon.com/iam/identity-center/) is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user.

  Enabling IDC does not impact any existing access controls; it is additive. See the [map users section](https://documentation.immuta.com/saas/configuration/integrations/aws-lake-formation/register-an-aws-lake-formation-connection#map-users) for instructions on mapping users from AWS IDC to user accounts in Immuta.
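
To check the Data Catalog default settings from the prerequisites above without opening the console, you can inspect the output of the Lake Formation `GetDataLakeSettings` API (for example, the dictionary returned by boto3's `get_data_lake_settings`). The sketch below assumes that when an IAM-only option is checked, the corresponding default permission list contains a grant to `IAM_ALLOWED_PRINCIPALS`. The helper name `iam_only_defaults` is hypothetical, and the check covers only the defaults for new resources, not existing databases or tables.

```python
def iam_only_defaults(settings_response: dict) -> dict:
    """Report which default-permission lists still grant to IAM_ALLOWED_PRINCIPALS.

    `settings_response` is assumed to have the shape of a GetDataLakeSettings
    response: {"DataLakeSettings": {"CreateDatabaseDefaultPermissions": [...], ...}}.
    """
    settings = settings_response.get("DataLakeSettings", {})
    result = {}
    for key in ("CreateDatabaseDefaultPermissions", "CreateTableDefaultPermissions"):
        grants = settings.get(key) or []
        result[key] = any(
            grant.get("Principal", {}).get("DataLakePrincipalIdentifier")
            == "IAM_ALLOWED_PRINCIPALS"
            for grant in grants
        )
    return result
```

A value of `True` for either key suggests the matching option is still checked and new resources would use IAM access control.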

## 1. Set up the Immuta service principal

The Immuta service principal is the [AWS IAM role that Immuta will assume](#user-content-fn-5)[^5] to perform operations in your AWS account. This role must have all the necessary permissions in AWS Glue and AWS Lake Formation to allow Immuta to register data sources and apply policies.

1. Create an IAM policy with the following AWS Lake Formation and AWS Glue permissions. You will attach this to your service principal once created.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTables",
        "glue:GetDatabases",
        "glue:GetTable",
        "lakeformation:ListPermissions",
        "lakeformation:BatchGrantPermissions",
        "lakeformation:BatchRevokePermissions",
        "lakeformation:CreateLFTag",
        "lakeformation:UpdateLFTag",
        "lakeformation:DeleteLFTag",
        "lakeformation:AddLFTagsToResource",
        "lakeformation:RemoveLFTagsFromResource",
        "lakeformation:GetResourceLFTags",
        "lakeformation:ListLFTags",
        "lakeformation:GetLFTag",
        "lakeformation:SearchTablesByLFTags",
        "lakeformation:SearchDatabasesByLFTags"
      ],
      "Resource": "*"
    }
  ]
}
```

2. [Create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html) and select **AWS Account** as the trusted entity type. This role will be used by Immuta to set up the connection and orchestrate AWS Lake Formation policies. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.
3. Add the IAM policy from step 1 to your service principal. These permissions will allow the service principal to register data sources and apply policies on Immuta's behalf.
4. Add the service principal as an [LF-Tag Creator](https://docs.aws.amazon.com/lake-formation/latest/dg/TBAC-adding-tag-creator.html#add-lf-tag-creator).
   1. In the Lake Formation console, navigate to **Permissions**.
   2. Select **LF-Tags and permissions**.
   3. Select **LF-Tag creators**, and then **Add LF-Tag creators**.
   4. Enter your service principal, and grant it the **Create LF-Tag** permission and grantable permission.
   5. Click **Add** to save your changes.
5. Grant the service principal permissions on any tables that will be registered in Immuta. There are two ways to give the service principal these permissions: either make a new LF-Tag that gives the appropriate permissions and apply it to [all databases or tables that Immuta will manage](#user-content-fn-6)[^6], or make the role a superuser in Lake Formation.

{% tabs %}
{% tab title="LF-Tag permissions (recommended)" %}
This method follows the principle of least privilege and is the most flexible way of granting permissions to the service principal. LF-Tags cascade down from databases to tables, while allowing for exceptions. When you apply this tag to a database, it automatically applies to all tables within that database, and you can remove it from any tables that should be outside the scope of Immuta’s governance.

1. Create a new LF-Tag, giving yourself permissions to grant that tag to a user, which will ultimately be your service principal.
   1. In the Lake Formation console, navigate to **LF-Tags and permissions** and click **Add LF-Tag**. You will need the `Create LF-Tag` permission to do this.
   2. Create a single **tag key with one tag value**. For example:
      1. Tag key: `immuta_governed`
      2. Tag value: `true`
   3. On the **LF-Tag key-value pair**, grant the `ASSOCIATE` LF-Tag permission to *your own IAM principal*.
2. Grant this tag to the Immuta service principal.
   1. In the Lake Formation console, navigate to **Data permissions** and click **Grant**.
   2. Enter the **service principal’s IAM role**.
   3. Add the **key-value pair** of the tag you created in step 1.
   4. Under **Table Permissions**, select the following grantable permissions: `SELECT`, `DESCRIBE`, `INSERT`, `DELETE`.
   5. Click **Grant**.
3. [Apply this **tag** to the **resources** you would like Immuta to govern.](https://docs.aws.amazon.com/lake-formation/latest/dg/TBAC-assigning-tags.html) The Immuta service principal will now have the minimum required permissions on these resources. If new resources are created in AWS, you must repeat this process of applying this tag to those resources if you want Immuta to govern them.
   {% endtab %}

{% tab title="Superuser" %}
{% hint style="warning" %}
This option enables all Lake Formation operations on all data in the Glue Data Catalog. It is highly privileged and risks Immuta managing permissions on data you did not intend it to govern.
{% endhint %}

This method grants all necessary permissions to the service principal, but it grants more than the service principal needs and is less flexible than the LF-Tag method because it does not allow for exceptions. You can make the service principal a superuser on the entire catalog or specify individual resources.

1. In the Lake Formation console, navigate to **Data permissions** and click **Grant**.
2. Enter your **service principal’s IAM role**.
3. Select **Named Data Catalog resources**, and input the **Glue Data Catalog ID** and any **databases** or **tables** you wish to specify.
4. Under **Grantable permissions**, select **Super** and click **Grant**.

Follow the [AWS documentation](https://docs.aws.amazon.com/lake-formation/latest/dg/change-settings.html) to grant `ALL` permissions to the `DataLakePrincipalIdentifier` for the Immuta service principal ARN.
{% endtab %}
{% endtabs %}
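
If you prefer to script the LF-Tag permissions method instead of using the console, the sketch below builds the request parameters for the three main steps: creating the tag, granting it to the service principal, and tagging a database. It assumes the example tag `immuta_governed=true` from the steps above and parameter shapes that follow the Lake Formation `CreateLFTag`, `GrantPermissions`, and `AddLFTagsToResource` APIs. The function names are hypothetical, and the account ID, role ARN, and database name you pass in are your own.

```python
TAG_KEY = "immuta_governed"  # example tag key from the steps above
TAG_VALUE = "true"           # example tag value from the steps above
TABLE_PERMISSIONS = ["SELECT", "DESCRIBE", "INSERT", "DELETE"]


def create_tag_params(catalog_id: str) -> dict:
    """Parameters for creating the LF-Tag (CreateLFTag)."""
    return {"CatalogId": catalog_id, "TagKey": TAG_KEY, "TagValues": [TAG_VALUE]}


def grant_tag_params(catalog_id: str, role_arn: str) -> dict:
    """Parameters for granting table permissions on the tag (GrantPermissions)."""
    return {
        "CatalogId": catalog_id,
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "LFTagPolicy": {
                "CatalogId": catalog_id,
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": TAG_KEY, "TagValues": [TAG_VALUE]}],
            }
        },
        # The steps above ask for grantable permissions, so both lists are set.
        "Permissions": list(TABLE_PERMISSIONS),
        "PermissionsWithGrantOption": list(TABLE_PERMISSIONS),
    }


def tag_database_params(catalog_id: str, database_name: str) -> dict:
    """Parameters for applying the tag to a database (AddLFTagsToResource)."""
    return {
        "CatalogId": catalog_id,
        "Resource": {"Database": {"CatalogId": catalog_id, "Name": database_name}},
        "LFTags": [
            {"CatalogId": catalog_id, "TagKey": TAG_KEY, "TagValues": [TAG_VALUE]}
        ],
    }
```

With boto3 installed and admin account credentials configured, each dictionary could be passed as keyword arguments to the `lakeformation` client, for example `client.create_lf_tag(**create_tag_params("<your-aws-account-id>"))`, followed by `client.grant_permissions(...)` and `client.add_lf_tags_to_resource(...)`. Because the tag cascades from a database to its tables, tagging the database is usually enough.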

## 2. Create the connection in Immuta

<mark style="color:green;">`POST`</mark> `/data/connection`

Copy the request and update the **`<placeholder_values>`** with your connection details. Then submit the request.

Find descriptions of the editable attributes in the table below.

{% hint style="info" %}
**Test run**

Optionally, test and validate the create connection payload with a dry run:

<mark style="color:green;">`POST`</mark> `/data/connection/test`
{% endhint %}

{% tabs %}
{% tab title="Access key" %}
{% code overflow="wrap" %}

```
curl -X 'POST' \
    'https://<your-immuta-url>/data/connection' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: <your-bearer-token>' \
    -d '{
     "connectionKey": "<your-connection-key-name>",
     "connection": {
       "technology": "Glue",
       "authenticationType": "accessKey",
       "region": "us-east-1",
       "accountId": "<your-aws-account-id>",
       "accessKeyId": "<your-access-key-id>",
       "secretAccessKey": "<your-secret-access-key>"
     },
     "settings": {
        "isActive": false
     },
     "options": {
        "forceRecursiveCrawl": true
     }
    }'
```

{% endcode %}
{% endtab %}

{% tab title="AWS IAM role" %}
Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.

Set the **external ID** and **account ID** that Immuta provides in the response in a condition on the trust relationship of the cross-account IAM role specified above. See the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html) for guidance.

{% code overflow="wrap" %}

```
curl -X 'POST' \
    'https://<your-immuta-url>/data/connection' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: <your-bearer-token>' \
    -d '{
     "connectionKey": "<your-connection-key-name>",
     "connection": {
       "technology": "Glue",
       "authenticationType": "assumedRole",
       "region": "us-east-1",
       "accountId": "<your-aws-account-id>",
       "roleARN": "<your-iam-role-arn>"
     },
     "settings": {
        "isActive": false
     },
     "options": {
        "forceRecursiveCrawl": true
     }
    }'
```

{% endcode %}
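
The trust relationship described above can be sketched as a policy document that lets a single AWS account assume the role only when the expected external ID is supplied. This follows the general pattern in the linked AWS external ID documentation. The helper name `build_trust_policy` is hypothetical, and both arguments are the values Immuta provides to you, not values defined in this guide.

```python
import json


def build_trust_policy(immuta_account_id: str, external_id: str) -> dict:
    """Build a trust policy that allows cross-account sts:AssumeRole with an external ID."""
    if not (immuta_account_id.isdigit() and len(immuta_account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    if not external_id:
        raise ValueError("external_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{immuta_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Reject assume-role calls that do not present the external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def trust_policy_json(immuta_account_id: str, external_id: str) -> str:
    """Serialize the trust policy so it can be pasted into the IAM console."""
    return json.dumps(build_trust_policy(immuta_account_id, external_id), indent=2)
```

You can paste the output of `trust_policy_json` into the **Trust relationships** tab of the role in the IAM console.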
{% endtab %}
{% endtabs %}
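
If you would rather submit the request from a script than from curl, the following sketch uses only the Python standard library. It mirrors the headers from the curl examples above and switches to the `/data/connection/test` endpoint for a dry run. The function names are hypothetical, the `authorization` value is passed through exactly as you would write it in the curl `Authorization` header, and error handling is left out for brevity.

```python
import json
import urllib.request


def connection_endpoint(immuta_url: str, dry_run: bool = False) -> str:
    """Return the create endpoint, or the test endpoint for a dry run."""
    base = immuta_url.rstrip("/")
    return f"{base}/data/connection/test" if dry_run else f"{base}/data/connection"


def build_request(
    immuta_url: str, authorization: str, payload: dict, dry_run: bool = False
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        connection_endpoint(immuta_url, dry_run),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would be to call `submit(build_request(..., dry_run=True))` first and send the real request only after the dry run succeeds.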

### Payload parameters

<table><thead><tr><th width="192">Attribute</th><th width="245">Description</th><th>Required</th></tr></thead><tbody><tr><td><strong>connectionKey</strong> <code>string</code></td><td>A unique name for the connection. Avoid the use of periods (<code>.</code>) or <a data-footnote-ref href="#user-content-fn-7">restricted words</a> in your connection key.</td><td><strong>Yes</strong></td></tr><tr><td><strong>connection</strong> <code>object</code></td><td>Configuration attributes of the AWS Lake Formation connection.</td><td><strong>Yes</strong></td></tr><tr><td>connection.<strong>technology</strong> <code>string</code></td><td>The technology backing the new connection.</td><td><strong>Yes</strong></td></tr><tr><td>connection.<strong>authenticationType</strong> <code>string</code></td><td>The authentication type to register the connection.</td><td><strong>Yes</strong></td></tr><tr><td>connection.<strong>region</strong> <code>string</code></td><td>The region of the AWS account associated with the Glue Data Catalog.</td><td><strong>Yes</strong></td></tr><tr><td>connection.<strong>accountId</strong> <code>string</code></td><td>The Amazon account ID of the Glue Data Catalog that contains the data you want to register.</td><td><strong>Yes</strong></td></tr><tr><td>connection.<strong>accessKeyId</strong> <code>string</code></td><td>The access key ID of an AWS account with the <a href="#id-1.-set-up-the-immuta-service-principal">AWS permissions listed in the set up the Immuta service principal section</a>.</td><td>Required if <strong>authenticationType</strong> is <code>accessKey</code>.</td></tr><tr><td>connection.<strong>secretAccessKey</strong> <code>string</code></td><td>The secret access key of an AWS account with the <a href="#id-1.-set-up-the-immuta-service-principal">AWS permissions listed in the set up the Immuta service principal section</a>.</td><td>Required if <strong>authenticationType</strong> is 
<code>accessKey</code>.</td></tr><tr><td>connection.<strong>roleARN</strong> <code>string</code></td><td>The Amazon resource name of the role Immuta will assume from Immuta's AWS account in order to perform any operations in your AWS account.</td><td>Required if <strong>authenticationType</strong> is <code>assumedRole</code>.</td></tr><tr><td><strong>settings</strong> <code>object</code></td><td>Specifications of the connection's settings, including active status.</td><td>No</td></tr><tr><td>settings.<strong>isActive</strong> <code>boolean</code></td><td>When <code>false</code>, data objects will be inactive by default when created in Immuta. Set to <code>false</code> for the recommended configuration.</td><td>No</td></tr><tr><td><strong>options</strong> <code>object</code></td><td>Specification of the connection's default behavior for object crawls.</td><td>No</td></tr><tr><td>options.<strong>forceRecursiveCrawl</strong> <code>boolean</code></td><td>If <code>false</code>, only active objects will be crawled. If <code>true</code>, both active and inactive data objects will be crawled; any child objects from inactive objects will be set as inactive. Set to <code>true</code> for the recommended configuration.</td><td>No</td></tr></tbody></table>

### Response schema

| Attribute               | Description                                                                                                                                                        |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **objectPath** `string` | The list of names that uniquely identify the path to a data object in the remote platform's hierarchy. The first element should be the associated `connectionKey`. |
| **bulkId** `string`     | A bulk ID that can be used to search for the status of background jobs triggered by this request.                                                                  |

#### Example response

```
{
  "objectPath": ["<your-connection-key-name>"],
  "bulkId": "a-new-uuid"
}
```
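
A script that submits the request will usually want to keep the `bulkId` so it can look up the background jobs later. The sketch below parses a response body with the two attributes from the response schema and checks that the first `objectPath` element matches the connection key you sent. `parse_connection_response` is a hypothetical helper, and the sample values in the usage note are placeholders.

```python
import json


def parse_connection_response(body: str, connection_key: str) -> str:
    """Return the bulkId after checking that objectPath starts with the connection key."""
    data = json.loads(body)
    object_path = data["objectPath"]
    if not object_path or object_path[0] != connection_key:
        raise ValueError(
            f"objectPath {object_path!r} does not start with {connection_key!r}"
        )
    return data["bulkId"]
```

For example, `parse_connection_response('{"objectPath": ["lf-prod"], "bulkId": "a-new-uuid"}', "lf-prod")` returns the bulk ID string from the body.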

[^1]: To learn more about connections in Immuta, see the [Connections section](https://documentation.immuta.com/saas/configuration/integrations/data-and-integrations/registering-a-connection/reference-guides/connections-overview).

[^2]: This permission is only required if using the [LF-Tag permissions method](#lf-tag-permissions-recommended) to set up the Immuta service principal.

[^3]: Instructions for setting up this role and granting the necessary permissions are provided in the [set up the Immuta service principal section](#id-1.-set-up-the-immuta-service-principal).

[^4]: See the [User provisioning](https://documentation.immuta.com/saas/configuration/integrations/aws-lake-formation/reference-guides/aws-lake-formation#user-provisioning) section for more details about this recommendation and alternatives.

[^5]: The Immuta service principal can also be an IAM user authenticating with an access key and secret key. However, this option is not recommended.

[^6]: You can opt to apply the LF-Tag to just the database, as this will also tag all the tables within the database.

[^7]: Your connection key cannot be any of the following words: `data`, `connection`, `object`, `crawl`, `search`, `settings`, `metadata`, `permission`, `sync`, `bulk`, or `upgrade`.
