# Create a Data Source

The V2 API is built to easily enable an “as-code” approach to managing your data sources, so each time you `POST` data to this endpoint, you must provide complete details of what you want in Immuta. The two examples below illustrate this design:

* If you `POST` once explicitly defining a single table under `sources`, and then `POST` a second time with a different table, the result is a single data source in Immuta pointing to the second table. The first data source is deleted or disabled (depending on the value specified for [<mark style="color:blue;">`hardDelete`</mark>](#options-object)).
* If you `POST` once with two `tableTags` specified (e.g., `Tag.A` and `Tag.B`) and do a follow-up `POST` with `tableTags: [Tag.C]`, only `Tag.C` will exist on all of the tables specified; tags `Tag.A` and `Tag.B` will be removed from all the data sources. *Note: If you are frequently using the V2 API to update data tags, consider using the* [*custom REST catalog integration*](https://documentation.immuta.com/saas/configuration/tags/catalogs/reference-guides/interface) *instead.*
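
As a rough illustration of the first case, the two request bodies below could be sent one after the other. The connection details are borrowed from the basic example on this page, the table names are hypothetical, and the `---` separator is only there to show both bodies in one block; each would be sent as its own `POST`.

```yaml
# First POST: registers a single table (table name is hypothetical).
connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks
sources:
  - table: customer
    schema: tpc
---
# Second POST: same connectionKey, but a different table under sources.
connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks
sources:
  - table: web_sales
    schema: tpc
```

After the second request, Immuta would hold one data source pointing to `web_sales`. Because `hardDelete` is not set here and defaults to `false`, the `customer` data source would be disabled rather than deleted.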

Through this endpoint, you can create or update all data sources for a given schema or database.

## <mark style="color:green;">POST</mark> `/api/v2/data`

Create or update data sources.

**Required Immuta permission**: `CREATE_DATA_SOURCE`

{% tabs %}
{% tab title="Basic data source" %}

```yaml
connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks
```

{% endtab %}

{% tab title="Complex data source" %}

```yaml
connectionKey: my-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: data
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/1110-11123
  handler: Databricks
sources:
  - table: credit_card_transactions
    schema: data
    tags:
      table:
        - PCI
        - SENSITIVE
      columns:
        - columnName: transaction_date
          tags:
            - PCI
            - DATE
  - table: crime_data
    schema: data
    naming:
      datasource: Crime Data
      table: crime_data
      schema: databricks
```

{% endtab %}
{% endtabs %}

<details>

<summary>Technology-specific examples</summary>

**Databricks data source with M2M OAuth - Azure Databricks**

{% hint style="warning" %}
**Databricks Unity Catalog behavior**

If you register a connection and a data object has no subscription policy set on it, Immuta will REVOKE access to the data in Databricks for all Immuta users, even if they had been directly granted access to the table in Unity Catalog.

If you disable a Unity Catalog data source in Immuta, all existing grants and policies on that object will be removed in Databricks for all Immuta users, regardless of whether they were set in Immuta or in Unity Catalog directly.

If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.

See the [Databricks Unity Catalog reference guide](https://documentation.immuta.com/saas/configuration/integrations/databricks/databricks-unity-catalog/unity-catalog-overview#user-permissions-immuta-revokes) for more details about permissions Immuta revokes and how to configure this behavior for your connection.
{% endhint %}

```yaml
connectionKey: my-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
  schemaProjectNameFormat: <schema>
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: data
  authenticationMethod: oAuthM2M
  useCertificate: false
  clientId: "${service_principal_clientId}"
  audience: https://your.databricks.hostname.com/oidc/v1/token
  scope: all-apis
  clientSecret: "${clientSecret}"
  httpPath: sql/protocolv1/o/0/1110-11123
  handler: Databricks
```

**Databricks data source overriding the naming convention**

{% hint style="warning" %}
**Databricks Unity Catalog behavior**

If you register a connection and a data object has no subscription policy set on it, Immuta will REVOKE access to the data in Databricks for all Immuta users, even if they had been directly granted access to the table in Unity Catalog.

If you disable a Unity Catalog data source in Immuta, all existing grants and policies on that object will be removed in Databricks for all Immuta users, regardless of whether they were set in Immuta or in Unity Catalog directly.

If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.

See the [Databricks Unity Catalog reference guide](https://documentation.immuta.com/saas/configuration/integrations/databricks/databricks-unity-catalog/unity-catalog-overview#user-permissions-immuta-revokes) for more details about permissions Immuta revokes and how to configure this behavior for your connection.
{% endhint %}

```yaml
connectionKey: ebock-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: ebock
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/1110-185737-wove
  handler: Databricks
sources:
  - table: credit_card_transactions
    schema: ebock
  - table: crime_data_delta
    schema: ebock
    naming:
      datasource: Crime Data
      table: crime_data
      schema: databricks
  - table: hipaa_data
    schema: ebock
```

**Redshift Spectrum data source**

Your `nativeSchemaFormat` must contain `_immuta` to avoid schema name conflicts.

```yaml
connectionKey: redshift
connection:
  hostname: your-redshift-cluster.djie25k.us-east-1.redshift.amazonaws.com
  port: 5439
  ssl: true
  database: your_database_with_external_schema
  username: awsuser
  password: your_password
  handler: Redshift
  schema: external_schema
nameTemplate:
  dataSourceFormat: <Tablename>
  schemaFormat: <schema>
  tableFormat: <tablename>
  schemaProjectNameFormat: <Schema>
  nativeSchemaFormat: <schema>_immuta
  nativeViewFormat: <tablename>
sources:
  - all: true
```

**Snowflake data source only registering specific tables**

```yaml
connectionKey: tpc-snowflake
nameTemplate:
  dataSourceFormat: Snowflake <Tablename>
  tableFormat: <tablename>
  schemaFormat: snowflake
connection:
  hostname: example.hostname.snowflakecomputing.com
  port: 443
  ssl: true
  database: TPC
  username: USERA
  password: "${SNOWFLAKE_PASSWORD}"
  schema: PUBLIC
  warehouse: IT_WH
  handler: Snowflake
sources:
  - table: CASE
    schema: PUBLIC
  - table: CASE2
    schema: PUBLIC
  - table: CUSTOMER
    schema: PUBLIC
  - table: WEB_SALES
    schema: PUBLIC
```

</details>

## Query parameters

| Parameter            | Description                                                                                                                   | Required or optional | Default value |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------- | -------------------- | ------------- |
| **dryRun** `boolean` | If `true`, no updates will actually be made.                                                                                  | Optional             | `false`       |
| **wait** `number`    | The number of seconds to wait for data sources to be created before returning. Anything less than `0` will wait indefinitely. | Optional             | `0`           |

## Body parameters

The body of the request contains the details of the data source you want to create. The following table describes the attributes you can include in the body.

| Attribute                                         | Description                                                                                                       | Required or optional |
| ------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------- |
| **connectionKey** `string`                        | A key/name to uniquely identify this collection of data sources.                                                  | Required             |
| [**connection**](#connection-object) `object`     | Connection information.                                                                                           | Required             |
| [**nameTemplate**](#nametemplate-object) `object` | A template to override naming conventions. If not provided, system defaults will be used.                         | Optional             |
| [**options**](#options-object) `object`           | Override options for these data sources. If not provided, system defaults will be used.                           | Optional             |
| [**owners**](#owners-object) `object`             | Specify owners for all data sources created.                                                                      | Optional             |
| [**sources**](#sources-array) `array`             | Configure which data sources are created. If not provided, all objects from the given connection will be created. | Optional             |

### `connection` object

The `connection` object specifies the connection details required to connect to your data source. The tables below describe its child attributes.

{% tabs %}
{% tab title="Snowflake data source" %}

| Attribute                            | Description                                                                                                                                               | Required or optional                                                                         |
| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| **handler**                          | `Snowflake`                                                                                                                                               | Required                                                                                     |
| **ssl** `boolean`                    | Set to `true` to enable SSL communication with the remote database.                                                                                       | Optional                                                                                     |
| **database** `string`                | The database name.                                                                                                                                        | Required                                                                                     |
| **schema** `string`                  | The schema in the remote database.                                                                                                                        | Optional                                                                                     |
| **hostname** `string`                | The hostname of the remote database instance.                                                                                                             | Required                                                                                     |
| **port** `number`                    | The port of the remote database instance.                                                                                                                 | Optional                                                                                     |
| **warehouse** `string`               | The default pool of compute resources Immuta will use to run queries and other Snowflake operations.                                                      | Required                                                                                     |
| **connectionStringOptions** `string` | Additional connection string options to be used when connecting to the remote database.                                                                   | Optional                                                                                     |
| **authenticationMethod** `string`    | The type of authentication method to use. Options include `userPassword`, `keyPair`, and `oAuthClientCredentials`.                                        | Required                                                                                     |
| **username** `string`                | The username used to connect to the remote database.                                                                                                      | Required if using `userPassword` or `keyPair`.                                               |
| **password** `string`                | The password used to connect to the remote database.                                                                                                      | Required if using `userPassword`.                                                            |
| **useCertificate** `boolean`         | Set to `true` when using client certificate credentials to request an access token. Otherwise, set to `false` to use client secret.                       | Required if using `oAuthClientCredentials`.                                                  |
| **userFiles** `object`               | Details about the files required for the request.                                                                                                         | Required if using `keyPair` or `oAuthClientCredentials` with `useCertificate` set to `true`. |
| **keyName** `string`                 | The connection name of the key file. Must be `PRIV_KEY_FILE` if using `keyPair`, or must be `oauth client certificate` if using `oAuthClientCredentials`. | Required if using `keyPair` or `oAuthClientCredentials` with `useCertificate` set to `true`. |
| **content** `string`                 | The content of the file, base-64 encoded.                                                                                                                 | Required if using `keyPair` or `oAuthClientCredentials` with `useCertificate` set to `true`. |
| **userFilename** `string`            | The name of the file - for display in the UI.                                                                                                             | Required if using `keyPair` or `oAuthClientCredentials` with `useCertificate` set to `true`. |

{% endtab %}

{% tab title="Databricks data source" %}

| Attribute                            | Description                                                                                                                                                                    | Required or optional                                                 |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------- |
| **handler**                          | `Databricks`                                                                                                                                                                   | Required                                                             |
| **ssl** `boolean`                    | Set to `true` to enable SSL communication with the remote database.                                                                                                            | Optional                                                             |
| **database** `string`                | The database name.                                                                                                                                                             | Optional                                                             |
| **hostname** `string`                | The hostname of the remote database instance.                                                                                                                                  | Required                                                             |
| **port** `number`                    | The port of the remote database instance.                                                                                                                                      | Optional                                                             |
| **connectionStringOptions** `string` | Additional connection string options to be used when connecting to the remote database.                                                                                        | Optional                                                             |
| **authenticationMethod** `string`    | The type of authentication method to use. Options include `oAuthM2M` and `token`.                                                                                              | Required                                                             |
| **token** `string`                   | The Databricks personal access token for the service principal created for Immuta.                                                                                             | Required if using `token` authentication.                            |
| **useCertificate** `boolean`         | Set to `true` when using client certificate credentials to request an access token. Otherwise, set to `false` to use client secret.                                           | Required if using `oAuthM2M`.                                        |
| **clientId** `string`                | The client identifier of the Immuta service principal you configured. This is the client ID displayed in Databricks when creating the client secret for the service principal. | Required if using `oAuthM2M`.                                        |
| **audience** `string`                | The audience for the OAuth Client Credential token request.                                                                                                                    | Required if using `oAuthM2M`.                                        |
| **clientSecret** `string`            | An application password an app can use in place of a certificate to identify itself.                                                                                           | Required if using `oAuthM2M` and `useCertificate` is set to `false`. |
| **certificateThumbprint** `string`   | The certificate thumbprint to use to generate the JWT for the OAuth Client Credential request.                                                                                 | Required if using `oAuthM2M` and `useCertificate` is set to `true`.  |
| **scope** `string`                   | The scope limits the operations and roles allowed in Databricks by the access token. See the [OAuth 2.0 documentation](https://oauth.net/2/scope/) for details about scopes.   | Optional                                                             |
| **httpPath** `string`                | The HTTP path of your Databricks cluster or SQL warehouse.                                                                                                                     | Required                                                             |

{% endtab %}

{% tab title="Redshift Spectrum data source" %}

| Attribute                            | Description                                                                             | Required or optional      |
| ------------------------------------ | --------------------------------------------------------------------------------------- | ------------------------- |
| **handler**                          | `Redshift`                                                                              | Required                  |
| **ssl** `boolean`                    | Set to `true` to enable SSL communication with the remote database.                     | Optional                  |
| **database** `string`                | The database name.                                                                      | Optional                  |
| **schema** `string`                  | The schema in the remote database.                                                      | Required                  |
| **connectionStringOptions** `string` | Additional connection string options to be used when connecting to the remote database. | Optional                  |
| **hostname** `string`                | The hostname of the remote database instance.                                           | Required                  |
| **port** `number`                    | The port of the remote database instance.                                               | Optional                  |
| **authenticationMethod** `string`    | The type of authentication method to use. Options include `userPassword` and `okta`.    | Required                  |
| **username** `string`                | The username used to connect to the remote database.                                    | Required                  |
| **password** `string`                | The password used to connect to the remote database.                                    | Required                  |
| **idpHost** `string`                 | The Okta identity provider host URL.                                                    | Required if using `okta`. |
| **appID** `string`                   | The Okta application ID.                                                                | Required if using `okta`. |
| **role** `string`                    | The Okta role.                                                                          | Required if using `okta`. |

{% endtab %}

{% tab title="Other data sources" %}

| Attribute                            | Description                                                                                                                                                                                                                           |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **handler**                          | `Google BigQuery`, `Presto`, or `Trino`                                                                                                                                                                                               |
| **ssl** `boolean`                    | Set to `true` to enable SSL communication with the remote database.                                                                                                                                                                   |
| **database** `string`                | The database name.                                                                                                                                                                                                                    |
| **schema** `string`                  | The schema in the remote database.                                                                                                                                                                                                    |
| **userFiles** `array`                | Array of objects; each object must have `keyName` (corresponds to a connection string option), `content` (base-64 encoded content), and `userFilename` (the name of the file - for display purposes in the app).                      |
| **connectionStringOptions** `string` | Additional connection string options to be used when connecting to the remote database.                                                                                                                                               |
| **hostname** `string`                | The hostname of the remote database instance.                                                                                                                                                                                         |
| **port** `number`                    | The port of the remote database instance.                                                                                                                                                                                             |
| **authenticationMethod** `string`    | The type of authentication method to use. Starburst (`Trino`) and Trino (`Presto`) options include `No Authentication`, `LDAP Authentication`, or `Kerberos Authentication`. Google BigQuery (`Google BigQuery`) option is `keyFile`. |
| **username** `string`                | The username used to connect to the remote database.                                                                                                                                                                                  |
| **password** `string`                | The password used to connect to the remote database.                                                                                                                                                                                  |
| **sid** `string`                     | The BigQuery project ID used to build the connection string. Required for Google BigQuery.                                                                                                                                            |

{% endtab %}
{% endtabs %}
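
To make the conditional requirements in the Snowflake table more concrete, here is a sketch of a `connection` object that uses `keyPair` authentication. The hostname, database, schema, warehouse, and username are reused from the Snowflake example earlier on this page; the file name and the key content placeholder are hypothetical. The Snowflake table lists `userFiles` as an object; it is written as a one-item list here on the assumption that it follows the array-of-objects shape described for other data sources.

```yaml
connection:
  handler: Snowflake
  hostname: example.hostname.snowflakecomputing.com
  port: 443
  ssl: true
  database: TPC
  schema: PUBLIC
  warehouse: IT_WH
  authenticationMethod: keyPair
  # username is required for keyPair; password is not.
  username: USERA
  # Assumption: userFiles is a list of file objects, as described for other data sources.
  userFiles:
    - keyName: PRIV_KEY_FILE                      # required value when using keyPair
      content: "<base-64 encoded private key>"    # placeholder, not a real key
      userFilename: rsa_key.p8                    # hypothetical name, shown in the UI
```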

### `nameTemplate` object

The `nameTemplate` object uses the backing table, schema, or database names to systematically name the Immuta data sources created through the connection. All names will default to lowercase. The table below describes its child attributes.

| Attribute                            | Description                                                                | Accepted values                                                                                                                |
| ------------------------------------ | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| **dataSourceFormat** `string`        | Format to be used to name the data sources created in this group.          | <ul><li><code>\<tablename></code></li><li><code>\<schema></code></li><li><code>\<database></code></li><li>Any string</li></ul> |
| **schemaFormat** `string`            | Format to be used to name the Immuta schema created in this group.         | <ul><li><code>\<tablename></code></li><li><code>\<schema></code></li><li><code>\<database></code></li><li>Any string</li></ul> |
| **tableFormat** `string`             | Format to be used to name the Immuta table created in this group.          | <ul><li><code>\<tablename></code></li><li><code>\<schema></code></li><li><code>\<database></code></li><li>Any string</li></ul> |
| **schemaProjectNameFormat** `string` | Format to be used to name the Immuta schema project created in this group. | <ul><li><code>\<tablename></code></li><li><code>\<schema></code></li><li><code>\<database></code></li><li>Any string</li></ul> |

**Example**

Suppose the table `TPC.CUSTOMER` is given the following `nameTemplate`:

```yaml
dataSourceFormat: <schema>.<tablename>
tableFormat: <tablename>
schemaFormat: <schema>
schemaProjectNameFormat: <schema>
```

This `nameTemplate` will produce a data source named `tpc.customer` in a schema project named `tpc`.

### `options` object

The `options` object allows you to override the default options for the data sources created through this connection. If not provided, Immuta will use the system defaults. The table below describes its child attributes.

| Attribute                                   | Description                                                                                                                                                                                                         | Default values |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- |
| **staleDataTolerance** `integer`            | The length in seconds that data for these data sources can be cached.                                                                                                                                               | -              |
| **disableSensitiveDataDiscovery** `boolean` | If `true`, Immuta will not perform [identification](https://documentation.immuta.com/saas/configuration/tags/data-discovery) for the data sources created through this connection.                                  | `false`        |
| **domainCollectionId** `string`             | The ID of the domain to assign the data sources to. Use the [GET /domain endpoint](https://documentation.immuta.com/saas/developer-guides/immuta-v1-api/domains-api#get-domain) to retrieve domains and domain IDs. | -              |
| **hardDelete** `boolean`                    | If `true`, when the table backing the data source is no longer available, the data source in Immuta is deleted. If this is `false`, the data source will be disabled.                                               | `false`        |
| **tableTags** `array`                       | An array of tags (strings) to place at the data source level on every data source.                                                                                                                                  | -              |
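
As a minimal sketch, an `options` object sits at the top level of the request body, alongside `connectionKey` and `connection`. The values below are illustrative only: the tolerance of one day is an arbitrary choice, and the tag names are the ones used in the introduction of this page.

```yaml
options:
  # Allow data to be cached for up to one day (value is in seconds).
  staleDataTolerance: 86400
  # Skip identification for the data sources created through this connection.
  disableSensitiveDataDiscovery: true
  # Delete (rather than disable) data sources whose backing table is gone.
  hardDelete: true
  # Applied at the data source level to every data source.
  tableTags:
    - Tag.A
    - Tag.B
```

Because each `POST` must describe the complete desired state, leaving `tableTags` out of a later request, or listing different tags, would change the tags on these data sources accordingly.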

### `owners` object

There are three options for the `owners` object when `POST`ing to the `/data` endpoint:

1. Include the object with data owners.
2. Include the object, but leave the `type`, `name`, and `iam` out. This will remove all data owners from the data source (other than the calling user).
3. Exclude the object from the payload. This will not affect your data owners, which allows you to manage them through external processes or the UI.

The `owners` object is an array of objects, one for each owner. The table below describes its child attributes.

| Attribute         | Description                                                                                                                             | Accepted values                                                |
| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| **type** `string` | The type of owner that is being added.                                                                                                  | <ul><li><code>group</code></li><li><code>user</code></li></ul> |
| **name** `string` | The name of the group or the username of the user.                                                                                      | -                                                              |
| **iam** `string`  | The ID of the identity manager system the user or group comes from. If excluded, any user/group that matches will be added as an owner. | -                                                              |
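
The first option (including the object with data owners) might look like the sketch below. The user name, group name, and `iam` ID are hypothetical placeholders, not values from a real identity manager.

```yaml
owners:
  - type: user
    name: taylor@example.com   # hypothetical username
    iam: bim                   # hypothetical identity manager ID; omit to match any IAM
  - type: group
    name: data-stewards        # hypothetical group name
```

Because `iam` is omitted for the group, any group named `data-stewards` would be added as an owner, regardless of which identity manager it comes from.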

### `sources` array

{% hint style="info" %}
**Best practices**

* Register everything and use subscription policies to control access: If you are not tagging individual columns, omit `sources` to create data sources for all tables in the schema or database, and then use subscription policies to control access to the tables instead of excluding them from Immuta.
* Use schema monitoring: Specifying `all: true` will turn on automatic schema monitoring in Immuta. Immuta will then check for added or removed tables on a schedule (by default, once a day), creating data sources for new tables and disabling or deleting data sources for removed tables.
  {% endhint %}

The `sources` array determines which tables are registered as data sources. The table below describes its child attributes.

| Option                                                      | Description                                                                                                                                                                        | Required or optional |
| ----------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------- |
| **all** `boolean`                                           | If `true`, all tables will be registered in Immuta and schema monitoring will be on.                                                                                               | Required             |
| **table** `string`                                          | The specific table to register in Immuta as a data source.                                                                                                                         | Optional             |
| **schema** `string`                                         | The specific schema to monitor with schema monitoring.                                                                                                                             | Optional             |
| [**columnDescriptions**](#columndescriptions-array) `array` | Details about the data source columns.                                                                                                                                             | Optional             |
| **description** `string`                                    | A short description for the data source.                                                                                                                                           | Optional             |
| **documentation** `string`                                  | Markdown-supported documentation for the data source.                                                                                                                              | Optional             |
| **naming** `object`                                         | Use this object to override the `nameTemplate` provided for the whole database/schema. [This object's attributes are the same as the `nameTemplate` object](#nametemplate-object). | Optional             |
| **owners** `object`                                         | Specify owners for an individual data source. [This object is the same as the `owners` object](#owners-object).                                                                    | Optional             |
| [**tags**](#tags-object) `object`                           | Details about the tags to attach to the data source.                                                                                                                               | Optional             |

**Examples**

{% tabs %}
{% tab title="Register all tables" %}

```yaml
sources:
  - all: true
```

{% endtab %}

{% tab title="Register specific tables" %}
This will register specific tables and add tags and column descriptions.

```yaml
sources:
  - table: name_of_table
    schema: name_of_schema
    tags:
      table:
        - Sensitive
        - Marketing
      columns:
        - columnName: acct_num
          tags:
            - unique_id
    columnDescriptions:
      - columnName: acct_num
        description: The account number
```

{% endtab %}
{% endtabs %}
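
The per-source overrides described in the table (`description`, `naming`, and `owners`) could be combined as in the sketch below. The `naming` attribute names are taken from the `nameTemplate` example earlier on this page. The format strings, description text, and group name are hypothetical.

```yaml
sources:
  - table: name_of_table
    schema: name_of_schema
    description: Customer accounts for the marketing team   # hypothetical text
    naming:                                                 # overrides nameTemplate for this source only
      dataSourceFormat: Marketing <Tablename>
      tableFormat: <tablename>
      schemaFormat: marketing
    owners:                                                 # owners for this data source only
      - type: group
        name: marketing-stewards                            # hypothetical group name
```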

#### `columns` object

There are three options for the `columns` object when `POST`ing to the `/data` endpoint:

1. Include the object with column details. Only the columns listed will be in the Immuta data source.
2. Include the object, but leave it empty. This will turn on column detection, and Immuta will update the columns once a day to match the backing table.
3. Exclude the object from the payload. This will register all the columns in the table, but column detection will be off.

The `columns` object is an array of objects, one for each column. The table below describes its child attributes.

| Attribute                | Description                                                    |
| ------------------------ | -------------------------------------------------------------- |
| **name** `string`        | The column name.                                               |
| **dataType** `string`    | The data type.                                                 |
| **nullable** `boolean`   | If `true`, the column can contain `null` values.               |
| **remoteType** `string`  | The actual data type in the remote database.                   |
| **primaryKey** `string`  | Specifies whether this is the primary key of the remote table. |
| **description** `string` | Describes the column.                                          |
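
The first option (listing columns explicitly) might be sketched as follows. This assumes `columns` is nested under an individual `sources` entry, as the heading level suggests. The `dataType` and `remoteType` values are illustrative guesses rather than a confirmed list of accepted values.

```yaml
sources:
  - table: name_of_table
    schema: name_of_schema
    columns:                    # only the columns listed here will be in the data source
      - name: acct_num
        dataType: text          # assumed value
        remoteType: varchar     # assumed type in the remote database
        nullable: false
        description: The account number
      - name: signup_date
        dataType: date          # assumed value
        remoteType: date
        nullable: true
```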

#### `columnDescriptions` array

You can add descriptions to columns without having to specify all the columns in the data source. `columnDescriptions` is an array of objects with the following schema:

| Attribute                | Description                    |
| ------------------------ | ------------------------------ |
| **columnName** `string`  | The column name.               |
| **description** `string` | The description of the column. |

#### `tags` object

You can add tags to columns or data sources. `tags` is an object with the following schema:

| Attribute           | Description                                                                                                                         |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **table** `array`   | An array of tags (strings) to add to this table.                                                                                    |
| **columns** `array` | An array of objects that each specify a `columnName` (string) and `tags` (an array of tags). The listed tags will be applied to the named columns. |
