Create a Data Source

The V2 API is built to easily enable an “as-code” approach to managing your data sources, so each time you POST data to this endpoint, you must provide complete details of what you want in Immuta. The two examples below illustrate this design:

If you POST once explicitly defining a single table under sources, and then POST a second time with a different table, this will result in a single data source in Immuta pointing to the second table and the first data source will be deleted or disabled (depending on the value specified for hardDelete).
If you POST once with two tableTags specified (e.g., Tag.A and Tag.B) and do a follow-up POST with tableTags: [Tag.C], only Tag.C will exist on all of the tables specified; Tag.A and Tag.B will be removed from all the data sources. Note: If you frequently use the V2 API to update data tags, consider using the custom REST catalog integration instead.
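
The replace-not-merge behavior described in the two examples above can be pictured with a small local simulation. This is only an illustrative sketch: `plan_changes` is a hypothetical helper, not part of Immuta, and it makes no calls to the API. It assumes the tables in the latest POST are the complete desired state.

```python
def plan_changes(previous_tables, desired_tables, hard_delete=False):
    """Illustrate the declarative model: the latest POST is the full desired state.

    Hypothetical helper for explanation only; it does not call Immuta.
    """
    previous, desired = set(previous_tables), set(desired_tables)
    # Tables missing from the new payload are deleted or disabled,
    # depending on the hardDelete option.
    removal = "delete" if hard_delete else "disable"
    return {
        "create": sorted(desired - previous),
        removal: sorted(previous - desired),
        "keep": sorted(previous & desired),
    }


first_post = ["credit_card_transactions"]
second_post = ["crime_data"]
print(plan_changes(first_post, second_post))
```

With the default `hard_delete=False`, the table from the first POST lands in the `disable` bucket, which mirrors the first example above.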

Through this endpoint, you can create or update all data sources for a given schema or database.

POST `/api/v2/data`

Create or update data sources.

Required Immuta permission: CREATE_DATA_SOURCE
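
As a rough sketch of how a client might prepare this request, the snippet below builds (but does not send) a POST using only the Python standard library. Several details are assumptions that this page does not state: that the endpoint accepts a JSON body equivalent to the YAML payloads shown here, that the API key is passed as a Bearer token in the Authorization header, and the example base URL. `build_create_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_create_request(base_url, api_key, payload):
    """Build a POST request for /api/v2/data without sending it.

    Assumptions (not confirmed by this page): JSON bodies are accepted
    and the API key is sent as a Bearer token.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v2/data",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_create_request(
    "https://immuta.example.com",  # hypothetical Immuta URL
    "my-api-key",                  # hypothetical API key
    {"connectionKey": "my-databricks", "connection": {"handler": "Databricks"}},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch stays free of network calls.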

connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks

connectionKey: my-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: data
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/1110-11123
  handler: Databricks
sources:
  - table: credit_card_transactions
    schema: data
    tags:
      table:
        - PCI
        - SENSITIVE
      columns:
        - columnName: transaction_date
          tags:
            - PCI
            - DATE
  - table: crime_data
    schema: data
    naming:
      datasource: Crime Data
      table: crime_data
      schema: databricks

Technology-specific examples

Databricks data source with M2M OAuth - Azure Databricks

connectionKey: my-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
  schemaProjectNameFormat: <schema>
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: data
  authenticationMethod: oAuthM2M
  useCertificate: false
  clientId: "${service_principal_clientId}"
  audience: https://your.databricks.hostname.com/oidc/v1/token 
  scope: all-apis
  clientSecret: "${clientSecret}"
  httpPath: sql/protocolv1/o/0/1110-11123
  handler: Databricks

Databricks data source that overrides the naming convention

connectionKey: ebock-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: ebock
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/1110-185737-wove
  handler: Databricks
sources:
  - table: credit_card_transactions
    schema: ebock
  - table: crime_data_delta
    schema: ebock
    naming:
      datasource: Crime Data
      table: crime_data
      schema: databricks
  - table: hipaa_data
    schema: ebock

Redshift Spectrum data source

Your nativeSchemaFormat must contain _immuta to avoid schema name conflicts.

connectionKey: redshift
connection:
  hostname: your-redshift-cluster.djie25k.us-east-1.redshift.amazonaws.com
  port: 5439
  ssl: true
  database: your_database_with_external_schema
  username: awsuser
  password: your_password
  handler: Redshift
  schema: external_schema
nameTemplate:
  dataSourceFormat: <Tablename>
  schemaFormat: <schema>
  tableFormat: <tablename>
  schemaProjectNameFormat: <Schema>
  nativeSchemaFormat: <schema>_immuta
  nativeViewFormat: <tablename>
sources:
  - all: true

Snowflake data source only registering specific tables

connectionKey: tpc-snowflake
nameTemplate:
  dataSourceFormat: Snowflake <Tablename>
  tableFormat: <tablename>
  schemaFormat: snowflake
connection:
  hostname: example.hostname.snowflakecomputing.com
  port: 443
  ssl: true
  database: TPC
  username: USERA
  password: "${SNOWFLAKE_PASSWORD}"
  schema: PUBLIC
  warehouse: IT_WH
  handler: Snowflake
sources:
  - table: CASE
    schema: PUBLIC
  - table: CASE2
    schema: PUBLIC
  - table: CUSTOMER
    schema: PUBLIC
  - table: WEB_SALES
    schema: PUBLIC

Query parameters

| Parameter | Description | Required or optional | Default value |
| --- | --- | --- | --- |
| dryRun boolean | If true, no updates will actually be made. | Optional | false |
| wait number | The number of seconds to wait for data sources to be created before returning. Anything less than 0 will wait indefinitely. | Optional | 0 |
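
A minimal sketch of attaching these parameters to the endpoint URL, assuming dryRun and wait are supplied in the URL query string (the path `/api/v2/data` itself has no placeholders). `data_endpoint_url` and the example hostname are hypothetical.

```python
from urllib.parse import urlencode


def data_endpoint_url(base_url, dry_run=False, wait=0):
    """Return the /api/v2/data URL with dryRun and wait in the query string."""
    # Booleans are rendered as lowercase "true"/"false".
    query = urlencode({"dryRun": str(dry_run).lower(), "wait": wait})
    return f"{base_url.rstrip('/')}/api/v2/data?{query}"


# Validate a payload without applying it, waiting up to 30 seconds.
print(data_endpoint_url("https://immuta.example.com", dry_run=True, wait=30))
```

A dry run is a convenient first step in an as-code workflow, since it lets you check a payload before any data sources are changed.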

Body parameters

The body of the request contains the details of the data source you want to create. The following table describes the attributes you can include in the body.

| Attribute | Description | Required or optional |
| --- | --- | --- |
| connectionKey string | A key/name to uniquely identify this collection of data sources. | Required |
| connection object | Connection information. | Required |
| nameTemplate object | A template to override naming conventions. If not provided, system defaults will be used. | Optional |
| options object | Override options for these data sources. If not provided, system defaults will be used. | Optional |
| owners object | Specify owners for all data sources created. | Optional |
| sources array | Configure which data sources are created. If not provided, all objects from the given connection will be created. | Optional |

`connection` object

The connection object specifies the connection details required to connect to your data source. The tables below describe its child attributes for each supported handler.

| Attribute | Description | Required or optional |
| --- | --- | --- |
| handler | Snowflake | Required |
| ssl boolean | Set to true to enable SSL communication with the remote database. | Optional |
| database string | The database name. | Required |
| schema string | The schema in the remote database. | Optional |
| hostname string | The hostname of the remote database instance. | Required |
| port number | The port of the remote database instance. | Optional |
| warehouse string | The default pool of compute resources Immuta will use to run queries and other Snowflake operations. | Required |
| connectionStringOptions string | Additional connection string options to be used when connecting to the remote database. | Optional |
| authenticationMethod string | The type of authentication method to use. Options include userPassword, keyPair, and oAuthClientCredentials. | Required |
| username string | The username used to connect to the remote database. | Required if using userPassword or keyPair. |
| password string | The password used to connect to the remote database. | Required if using userPassword. |
| useCertificate boolean | Set to true when using client certificate credentials to request an access token. Otherwise, set to false to use client secret. | Required if using oAuthClientCredentials. |
| userFiles object | Details about the files required for the request. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
| keyName string | The connection name of the key file. Must be PRIV_KEY_FILE if using keyPair, or must be oauth client certificate if using oAuthClientCredentials. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
| content string | The content of the file, base-64 encoded. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
| userFilename string | The name of the file - for display in the UI. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
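
The conditional requirements for a Snowflake connection can be checked locally before POSTing. The sketch below encodes only the "Required" rules listed for the Snowflake handler; `missing_snowflake_fields` is a hypothetical helper, and it checks only the top-level keys of the connection object (it does not inspect what is inside userFiles).

```python
def missing_snowflake_fields(connection):
    """Return the required top-level attributes absent from a Snowflake connection."""
    required = ["handler", "database", "hostname", "warehouse", "authenticationMethod"]
    method = connection.get("authenticationMethod")
    if method in ("userPassword", "keyPair"):
        required.append("username")
    if method == "userPassword":
        required.append("password")
    if method == "oAuthClientCredentials":
        required.append("useCertificate")
    # Key files are needed for keyPair, or for OAuth when a certificate is used.
    uses_certificate = (
        method == "oAuthClientCredentials" and connection.get("useCertificate") is True
    )
    if method == "keyPair" or uses_certificate:
        required.append("userFiles")
    return [name for name in required if name not in connection]


connection = {
    "handler": "Snowflake",
    "hostname": "example.hostname.snowflakecomputing.com",
    "database": "TPC",
    "warehouse": "IT_WH",
    "authenticationMethod": "userPassword",
    "username": "USERA",
}
print(missing_snowflake_fields(connection))  # the password has not been set yet
```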

| Attribute | Description | Required or optional |
| --- | --- | --- |
| handler | Databricks | Required |
| ssl boolean | Set to true to enable SSL communication with the remote database. | Optional |
| database string | The database name. | Optional |
| hostname string | The hostname of the remote database instance. | Required |
| port number | The port of the remote database instance. | Optional |
| connectionStringOptions string | Additional connection string options to be used when connecting to the remote database. | Optional |
| authenticationMethod string | The type of authentication method to use. Options include oAuthM2M and token. | Required |
| token string | The Databricks personal access token for the service principal created for Immuta. | Required if using token authentication. |
| useCertificate boolean | Set to true when using client certificate credentials to request an access token. Otherwise, set to false to use a client secret. | Required if using oAuthM2M. |
| clientId string | The client identifier of the Immuta service principal you configured. This is the client ID displayed in Databricks when creating the client secret for the service principal. | Required if using oAuthM2M. |
| audience string | The audience for the OAuth Client Credential token request. | Required if using oAuthM2M. |
| clientSecret string | An application password an app can use in place of a certificate to identify itself. | Required if using oAuthM2M and useCertificate is set to false. |
| certificateThumbprint string | The certificate thumbprint to use to generate the JWT for the OAuth Client Credential request. | Required if using oAuthM2M and useCertificate is set to true. |
| scope string | The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes. | Optional |
| httpPath string | The HTTP path of your Databricks cluster or SQL warehouse. | Required |
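
For oAuthM2M, exactly one of clientSecret or certificateThumbprint applies, depending on useCertificate. The following sketch builds a connection object that keeps those attributes consistent. `databricks_m2m_connection` is a hypothetical helper, and the argument values in the usage example are placeholders taken from the examples on this page.

```python
def databricks_m2m_connection(hostname, http_path, client_id, audience,
                              client_secret=None, certificate_thumbprint=None):
    """Build a Databricks oAuthM2M connection object as a plain dict."""
    # Exactly one credential type must be supplied.
    if (client_secret is None) == (certificate_thumbprint is None):
        raise ValueError("provide exactly one of client_secret or certificate_thumbprint")
    connection = {
        "handler": "Databricks",
        "hostname": hostname,
        "httpPath": http_path,
        "authenticationMethod": "oAuthM2M",
        "clientId": client_id,
        "audience": audience,
        # useCertificate follows from which credential was given.
        "useCertificate": certificate_thumbprint is not None,
    }
    if certificate_thumbprint is not None:
        connection["certificateThumbprint"] = certificate_thumbprint
    else:
        connection["clientSecret"] = client_secret
    return connection


connection = databricks_m2m_connection(
    hostname="your.databricks.hostname.com",
    http_path="sql/protocolv1/o/0/1110-11123",
    client_id="example-client-id",  # placeholder value
    audience="https://your.databricks.hostname.com/oidc/v1/token",
    client_secret="example-secret",  # placeholder value
)
print(connection["useCertificate"])
```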

| Attribute | Description | Required or optional |
| --- | --- | --- |
| handler | Redshift | Required |
| ssl boolean | Set to true to enable SSL communication with the remote database. | Optional |
| database string | The database name. | Optional |
| schema string | The schema in the remote database. | Required |
| connectionStringOptions string | Additional connection string options to be used when connecting to the remote database. | Optional |
| hostname string | The hostname of the remote database instance. | Required |
| port number | The port of the remote database instance. | Optional |
| authenticationMethod string | The type of authentication method to use. Options include userPassword and okta. | Required |
| username string | The username used to connect to the remote database. | Required |
| password string | The password used to connect to the remote database. | Required |
| idpHost string | The Okta identity provider host URL. | Required if using okta. |
| appID string | The Okta application ID. | Required if using okta. |
| role string | The Okta role. | Required if using okta. |

| Attribute | Description |
| --- | --- |
| handler | Google BigQuery, Presto, and Trino |
| ssl boolean | Set to true to enable SSL communication with the remote database. |
| database string | The database name. |
| schema string | The schema in the remote database. |
| userFiles array | Array of objects; each object must have keyName (corresponds to a connection string option), content (base-64 encoded content), and userFilename (the name of the file - for display purposes in the app). |
| connectionStringOptions string | Additional connection string options to be used when connecting to the remote database. |
| hostname string | The hostname of the remote database instance. |
| port number | The port of the remote database instance. |
| authenticationMethod string | The type of authentication method to use. Starburst (Trino) and Trino (Presto) options include No Authentication, LDAP Authentication, or Kerberos Authentication. Google BigQuery (Google BigQuery) option is keyFile. |
| username string | The username used to connect to the remote database. |
| password string | The password used to connect to the remote database. |
| sid string | Required for Google BigQuery, the BigQuery project ID used to build the connection string. |

`nameTemplate` object

Use the nameTemplate object to use the backing table, schema, or database names to systematically name the Immuta data sources created through the connection. All names will default to lowercase. The table below describes its child attributes.

| Attribute | Description | Accepted values |
| --- | --- | --- |
| dataSourceFormat string | Format to be used to name the data sources created in this group. | `<tablename>`, `<schema>`, `<database>`, or any string |
| schemaFormat string | Format to be used to name the Immuta schema created in this group. | `<tablename>`, `<schema>`, `<database>`, or any string |
| tableFormat string | Format to be used to name the Immuta table created in this group. | `<tablename>`, `<schema>`, `<database>`, or any string |
| schemaProjectNameFormat string | Format to be used to name the Immuta schema project created in this group. | `<tablename>`, `<schema>`, `<database>`, or any string |

Example

Consider the table TPC.CUSTOMER registered with the following nameTemplate:

dataSourceFormat: <schema> <tablename>
tableFormat: <tablename>
schemaFormat: <schema>
schemaProjectNameFormat: <schema>

This nameTemplate will produce a data source named tpc.customer in a schema project named tpc.

`options` object

The options object allows you to override the default options for the data sources created through this connection. If not provided, Immuta will use the system defaults. The table below describes its child attributes.

| Attribute | Description | Default values |
| --- | --- | --- |
| staleDataTolerance integer | The length in seconds that data for these data sources can be cached. | |
| disableSensitiveDataDiscovery boolean | If true, Immuta will not perform identification for the data sources created through this connection. | false |
| domainCollectionId string | The ID of the domain to assign the data sources to. Use the GET /domain endpoint to retrieve domains and domain IDs. | |
| hardDelete boolean | If true, when the table backing the data source is no longer available, the data source in Immuta is deleted. If this is false, the data source will be disabled. | false |
| tableTags array | An array of tags (strings) to place at the data source level on every data source. | |

`owners` object

There are three options for the owners object when POSTing to the /data endpoint:

Include the object with data owners.
Include the object, but leave the type, name, and iam out. This will remove all data owners from the data source (other than the calling user).
Exclude the object from the payload. This will not impact your data owners and allow you to manage data owners through external processes or the UI.

The owners object is an array of objects for each owner. The table below describes its child attributes.

| Attribute | Description | Accepted values |
| --- | --- | --- |
| type string | The type of owner that is being added. | group, user |
| name string | The name of the group or the username of the user. | |
| iam string | The ID of the identity manager system the user or group comes from. If excluded, any user/group that matches will be added as an owner. | |

`sources` array

Best practices

Register everything and use subscription policies to control access: If you are not tagging individual columns, omit sources to create data sources for all tables in the schema or database, and then use subscription policies to control access to the tables instead of excluding them from Immuta.
Use schema monitoring: Specifying all: true will turn on automatic schema monitoring in Immuta. As tables are added or removed, Immuta will look for those changes on a schedule (by default, once a day) and either disable or delete data sources for removed tables or create data sources for new tables.

The sources array determines which tables are registered as data sources. The table below describes its child attributes.

| Option | Description | Required or optional |
| --- | --- | --- |
| all boolean | If true, all tables will be registered in Immuta and schema monitoring will be on. | Required |
| table string | The specific table to register in Immuta as a data source. | Optional |
| schema string | The specific schema to monitor with schema monitoring. | Optional |
| columnDescriptions array | Details about the data dictionary. | Optional |
| description string | A short description for the data source. | Optional |
| documentation string | Markdown-supported documentation for the data source. | Optional |
| naming object | Use this object to override the nameTemplate provided for the whole database/schema. This object's attributes are the same as those of the nameTemplate object. | Optional |
| owners object | Specify owners for an individual data source. This object is the same as the owners object. | Optional |
| tags object | Details about the tags to attach to the data source. | Optional |

Examples

sources:
  - all: true

This will register specific tables and add tags and column descriptions.

sources:
  - table: name_of_table
    schema: name_of_schema
    tags:
      table:
        - Sensitive
        - Marketing
      columns:
        - columnName: acct_num
          tags:
            - unique_id
    columnDescriptions:
      - columnName: acct_num
        description: The account number
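
When many tables need tags or column descriptions, entries like the one above can be generated rather than written by hand. This is a minimal sketch, not an Immuta utility: `source_entry` is a hypothetical helper that returns a plain dict mirroring the YAML structure shown in this section.

```python
def source_entry(table, schema, table_tags=(), column_tags=None, column_descriptions=None):
    """Build one item of the sources array as a plain dict."""
    entry = {"table": table, "schema": schema}
    tags = {}
    if table_tags:
        tags["table"] = list(table_tags)
    if column_tags:
        # Mapping of column name -> list of tags.
        tags["columns"] = [
            {"columnName": name, "tags": list(values)}
            for name, values in column_tags.items()
        ]
    if tags:
        entry["tags"] = tags
    if column_descriptions:
        # Mapping of column name -> description text.
        entry["columnDescriptions"] = [
            {"columnName": name, "description": text}
            for name, text in column_descriptions.items()
        ]
    return entry


sources = [
    source_entry(
        "name_of_table",
        "name_of_schema",
        table_tags=["Sensitive", "Marketing"],
        column_tags={"acct_num": ["unique_id"]},
        column_descriptions={"acct_num": "The account number"},
    )
]
print(sources)
```

Keeping the tag and description mappings in version control alongside this kind of generator fits the as-code approach, because every POST must carry the complete desired state.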

`columns` object

There are three options for the columns object when POSTing to the /data endpoint:

Include the object with column details. Only the columns listed will be in the Immuta data source.
Include the object, but leave it empty. This will turn on column detection, and Immuta will update the columns once a day to be accurate to the backing table.
Exclude the object from the payload. This will register all the columns in the table, but column detection will be off.

The columns object is an array of objects for each column. The table below describes its child attributes.

| Attribute | Description |
| --- | --- |
| name string | The column name. |
| dataType string | The data type. |
| nullable boolean | If true, the column can contain null values. |
| remoteType string | The actual data type in the remote database. |
| primaryKey string | Specifies whether this is the primary key of the remote table. |
| description string | Describes the column. |

`columnDescriptions` array

You can add descriptions to columns without having to specify all the columns in the data source. columnDescriptions is an array of objects with the following schema:

| Attribute | Description |
| --- | --- |
| columnName string | The column name. |
| description string | The description of the column. |

`tags` object

You can add tags to columns or data sources. tags is an object with the following schema:

| Attribute | Description |
| --- | --- |
| table array | An array of tags (strings) to add to this table. |
| columns array | An array of objects that specifies columnName (string) and tags (an array of tags). The listed tags will be applied to the columns. |
