Create a Data Source

The V2 API is built to easily enable an “as-code” approach to managing your data sources, so each time you POST data to this endpoint, you must provide complete details of what you want in Immuta. The two examples below illustrate this design:

  • If you POST once explicitly defining a single table under sources, and then POST a second time with a different table, the result is a single data source in Immuta that points to the second table. The first data source is deleted or disabled, depending on the value specified for hardDelete.

  • If you POST once with two tableTags specified (e.g., Tag.A and Tag.B) and then POST again with tableTags: [Tag.C], only Tag.C will exist on the tables specified; Tag.A and Tag.B will be removed from all the data sources. Note: If you frequently use the V2 API to update data tags, consider using the custom REST catalog integration instead.
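
The first scenario above can be sketched as two successive request bodies, shown here as two YAML documents separated by ---. The table names customer and web_sales are hypothetical, and the connection block reuses placeholder values in the style of the other examples on this page.

```yaml
# First POST: registers a single table (hypothetical name: customer).
connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks
options:
  hardDelete: false
sources:
  - table: customer
    schema: tpc
---
# Second POST: same connectionKey, different table (hypothetical name: web_sales).
connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks
options:
  hardDelete: false
sources:
  - table: web_sales
    schema: tpc
```

Following the behavior described above, after the second POST only the web_sales data source remains. Because hardDelete is false, the customer data source is disabled rather than deleted.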

Through this endpoint, you can create or update all data sources for a given schema or database.

POST /api/v2/data

Create or update data sources.

Required Immuta permission: CREATE_DATA_SOURCE

connectionKey: my-databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: tpc
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/11101101
  handler: Databricks

Technology-specific examples

Databricks data source with M2M OAuth - Azure Databricks

connectionKey: my-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
  schemaProjectNameFormat: <schema>
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: data
  authenticationMethod: oAuthM2M
  useCertificate: false
  clientId: "${service_principal_clientId}"
  audience: https://your.databricks.hostname.com/oidc/v1/token
  scope: all-apis
  clientSecret: "${clientSecret}"
  httpPath: sql/protocolv1/o/0/1110-11123
  handler: Databricks

Databricks data source that overrides the naming convention

connectionKey: ebock-databricks
nameTemplate:
  dataSourceFormat: Databricks <Tablename>
  tableFormat: <tablename>
  schemaFormat: databricks
connection:
  hostname: your.databricks.hostname.com
  port: 443
  ssl: true
  database: ebock
  username: token
  password: "${DATABRICKS_PASSWORD}"
  httpPath: sql/protocolv1/o/0/1110-185737-wove
  handler: Databricks
sources:
  - table: credit_card_transactions
    schema: ebock
  - table: crime_data_delta
    schema: ebock
    naming:
      datasource: Crime Data
      table: crime_data
      schema: databricks
  - table: hipaa_data
    schema: ebock

Redshift Spectrum data source

Your nativeSchemaFormat must contain _immuta to avoid schema name conflicts.

connectionKey: redshift
connection:
  hostname: your-redshift-cluster.djie25k.us-east-1.redshift.amazonaws.com
  port: 5439
  ssl: true
  database: your_database_with_external_schema
  username: awsuser
  password: your_password
  handler: Redshift
  schema: external_schema
nameTemplate:
  dataSourceFormat: <Tablename>
  schemaFormat: <schema>
  tableFormat: <tablename>
  schemaProjectNameFormat: <Schema>
  nativeSchemaFormat: <schema>_immuta
  nativeViewFormat: <tablename>
sources:
  - all: true

Snowflake data source only registering specific tables

connectionKey: tpc-snowflake
nameTemplate:
  dataSourceFormat: Snowflake <Tablename>
  tableFormat: <tablename>
  schemaFormat: snowflake
connection:
  hostname: example.hostname.snowflakecomputing.com
  port: 443
  ssl: true
  database: TPC
  username: USERA
  password: "${SNOWFLAKE_PASSWORD}"
  schema: PUBLIC
  warehouse: IT_WH
  handler: Snowflake
sources:
  - table: CASE
    schema: PUBLIC
  - table: CASE2
    schema: PUBLIC
  - table: CUSTOMER
    schema: PUBLIC
  - table: WEB_SALES
    schema: PUBLIC

Query parameters

| Parameter | Description | Required or optional | Default value |
| --- | --- | --- | --- |
| dryRun boolean | If true, no updates will actually be made. | Optional | false |
| wait number | The number of seconds to wait for data sources to be created before returning. Anything less than 0 will wait indefinitely. | Optional | 0 |

Body parameters

The body of the request contains the details of the data source you want to create. The following table describes the attributes you can include in the body.

| Attribute | Description | Required or optional |
| --- | --- | --- |
| connectionKey string | A key/name to uniquely identify this collection of data sources. | Required |
| connection object | Connection information. | Required |
| nameTemplate object | A template to override naming conventions. If not provided, system defaults will be used. | Optional |
| options object | Override options for these data sources. If not provided, system defaults will be used. | Optional |
| owners object | Specify owners for all data sources created. | Optional |
| sources array | Configure which data sources are created. If not provided, all objects from the given connection will be created. | Optional |

connection object

The connection object specifies the details required to connect to your data source. The table below describes its child attributes.

| Attribute | Description | Required or optional |
| --- | --- | --- |
| handler | Snowflake | Required |
| ssl boolean | Set to true to enable SSL communication with the remote database. | Optional |
| database string | The database name. | Required |
| schema string | The schema in the remote database. | Optional |
| hostname string | The hostname of the remote database instance. | Required |
| port number | The port of the remote database instance. | Optional |
| warehouse string | The default pool of compute resources Immuta will use to run queries and other Snowflake operations. | Required |
| connectionStringOptions string | Additional connection string options to be used when connecting to the remote database. | Optional |
| authenticationMethod string | The type of authentication method to use. Options include userPassword, keyPair, and oAuthClientCredentials. | Required |
| username string | The username used to connect to the remote database. | Required if using userPassword or keyPair. |
| password string | The password used to connect to the remote database. | Required if using userPassword. |
| useCertificate boolean | Set to true when using client certificate credentials to request an access token. Otherwise, set to false to use client secret. | Required if using oAuthClientCredentials. |
| userFiles object | Details about the files required for the request. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
| keyName string | The connection name of the key file. Must be PRIV_KEY_FILE if using keyPair, or must be oauth client certificate if using oAuthClientCredentials. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
| content string | The content of the file, base-64 encoded. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |
| userFilename string | The name of the file, for display in the UI. | Required if using keyPair or oAuthClientCredentials with useCertificate set to true. |

nameTemplate object

The nameTemplate object uses the backing table, schema, or database names to systematically name the Immuta data sources created through the connection. All names will default to lowercase. The table below describes its child attributes.

| Attribute | Description | Accepted values |
| --- | --- | --- |
| dataSourceFormat string | Format to be used to name the data sources created in this group. | <tablename>, <schema>, <database>, or any string |
| schemaFormat string | Format to be used to name the Immuta schema created in this group. | <tablename>, <schema>, <database>, or any string |
| tableFormat string | Format to be used to name the Immuta table created in this group. | <tablename>, <schema>, <database>, or any string |
| schemaProjectNameFormat string | Format to be used to name the Immuta schema project created in this group. | <tablename>, <schema>, <database>, or any string |

Example

Consider the table TPC.CUSTOMER with the following nameTemplate:

dataSourceFormat: <schema> <tablename>
tableFormat: <tablename>
schemaFormat: <schema>
schemaProjectNameFormat: <schema>

This nameTemplate will produce a data source named tpc.customer in a schema project named tpc.

options object

The options object allows you to override the default options for the data sources created through this connection. If not provided, Immuta will use the system defaults. The table below describes its child attributes.

| Attribute | Description | Default values |
| --- | --- | --- |
| staleDataTolerance integer | The length in seconds that data for these data sources can be cached. | - |
| disableSensitiveDataDiscovery boolean | If true, Immuta will not perform identification for the data sources created through this connection. | false |
| domainCollectionId string | The ID of the domain to assign the data sources to. Use the GET /domain endpoint to retrieve domains and domain IDs. | - |
| hardDelete boolean | If true, when the table backing the data source is no longer available, the data source in Immuta is deleted. If this is false, the data source will be disabled. | false |
| tableTags array | An array of tags (strings) to place at the data source level on every data source. | - |
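
As a rough illustration, an options block that sets each of these attributes might look like the following. All values are hypothetical placeholders: the tolerance, the domain ID, and the tag names are not taken from a real Immuta instance.

```yaml
options:
  # Allow data to be cached for up to one hour (3600 seconds); illustrative value.
  staleDataTolerance: 3600
  # Leave identification enabled (the default).
  disableSensitiveDataDiscovery: false
  # Placeholder ID; retrieve real domain IDs with GET /domain.
  domainCollectionId: "<your-domain-id>"
  # Disable, rather than delete, data sources whose backing table disappears (the default).
  hardDelete: false
  # Placeholder tag names applied at the data source level on every data source.
  tableTags:
    - Tag.A
    - Tag.B
```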

owners object

There are three options for the owners object when POSTing to the /data endpoint:

  1. Include the object with data owners.

  2. Include the object, but leave the type, name, and iam out. This will remove all data owners from the data source (other than the calling user).

  3. Exclude the object from the payload. This will not impact your data owners and allows you to manage data owners through external processes or the UI.

The owners object is an array of objects, one for each owner. The table below describes the child attributes of each owner.

| Attribute | Description | Accepted values |
| --- | --- | --- |
| type string | The type of owner that is being added. | group, user |
| name string | The name of the group or the username of the user. | - |
| iam string | The ID of the identity manager system the user or group comes from. If excluded, any user/group that matches will be added as an owner. | - |
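
The first option (including the object with data owners) could look like the sketch below. The group name, username, and identity manager ID are hypothetical placeholders.

```yaml
owners:
  # A group owner; iam is omitted, so any matching group is added as an owner.
  - type: group
    name: data-stewards            # hypothetical group name
  # A user owner restricted to one identity manager.
  - type: user
    name: jane.doe@example.com     # hypothetical username
    iam: "<your-iam-id>"           # placeholder identity manager ID
```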

sources array

Best practices

  • Register everything and use subscription policies to control access: If you are not tagging individual columns, omit sources to create data sources for all tables in the schema or database, and then use subscription policies to control access to the tables instead of excluding them from Immuta.

  • Use schema monitoring: Specifying all: true will turn on automatic schema monitoring in Immuta. As tables are added or removed, Immuta will look for those changes on a schedule (by default, once a day) and either disable or delete data sources for removed tables or create data sources for new tables.

The sources array determines which tables are registered as data sources. The table below describes its child attributes.

| Option | Description | Required or optional |
| --- | --- | --- |
| all boolean | If true, all tables will be registered in Immuta and schema monitoring will be on. | Required |
| table string | The specific table to register in Immuta as a data source. | Optional |
| schema string | The specific schema to monitor with schema monitoring. | Optional |
| columns object | Details about the data dictionary. | Optional |
| description string | A short description for the data source. | Optional |
| documentation string | Markdown-supported documentation for the data source. | Optional |
| naming object | Use this object to override the nameTemplate provided for the whole database/schema. Its attributes are the same as those of the nameTemplate object. | Optional |
| owners object | Specify owners for an individual data source. This object has the same structure as the owners object. | Optional |
| tags object | Details about the tags to attach to the data source. | Optional |

Examples

sources:
  - all: true
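
To register specific tables instead, each entry names a table and its schema and can carry the per-data-source attributes described in this section. In this sketch, the table names, schema, description, and documentation text are all hypothetical.

```yaml
sources:
  # First data source, with an optional description and Markdown documentation.
  - table: customer
    schema: public
    description: Customer records for the retail business unit.
    documentation: |
      ## Customer
      One row per customer.
  # Second data source, registered with no extra attributes.
  - table: web_sales
    schema: public
```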

columns object

There are three options for the columns object when POSTing to the /data endpoint:

  1. Include the object with column details. Only the columns listed will be in the Immuta data source.

  2. Include the object, but leave it empty. This will turn on column detection, and Immuta will update the columns once a day to match the backing table.

  3. Exclude the object from the payload. This will register all the columns in the table, but column detection will be off.

The columns object is an array of objects, one for each column. The table below describes the child attributes of each column.

| Attribute | Description |
| --- | --- |
| name string | The column name. |
| dataType string | The data type. |
| nullable boolean | If true, the column can contain null values. |
| remoteType string | The actual data type in the remote database. |
| primaryKey string | Specifies whether this is the primary key of the remote table. |
| description string | Describes the column. |
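
A minimal sketch of the first option, assuming columns is set on an individual sources entry. The table, column names, descriptions, and type values are hypothetical placeholders; use the types your remote database actually reports.

```yaml
sources:
  - table: customer
    schema: public
    # Only the columns listed here will be in the Immuta data source.
    columns:
      - name: c_customer_id
        dataType: text          # placeholder type
        remoteType: VARCHAR     # placeholder remote type
        nullable: false
        description: Unique customer identifier.
      - name: c_birth_country
        dataType: text          # placeholder type
        remoteType: VARCHAR     # placeholder remote type
        nullable: true
        description: Country where the customer was born.
```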

columnDescriptions array

You can add descriptions to columns without having to specify all the columns in the data source. columnDescriptions is an array of objects with the following schema:

| Attribute | Description |
| --- | --- |
| columnName string | The column name. |
| description string | The description of the column. |
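
For example, the following fragment describes two columns without listing the rest of the columns in the data source. The column names and descriptions are hypothetical, and the fragment is shown on its own, outside the rest of the payload.

```yaml
columnDescriptions:
  - columnName: c_customer_id
    description: Unique customer identifier.
  - columnName: c_birth_country
    description: Country where the customer was born.
```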

tags object

You can add tags to columns or data sources. tags is an object with the following schema:

| Attribute | Description |
| --- | --- |
| table array | An array of tags (strings) to add to this table. |
| columns array | An array of objects that specifies columnName (string) and tags (an array of tags). The listed tags will be applied to the columns. |
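
A small sketch of a tags object that follows this schema. The column name and the tag names are hypothetical placeholders.

```yaml
tags:
  # Tags added to the table (data source) itself.
  table:
    - Tag.A
  # Tags added to individual columns.
  columns:
    - columnName: c_customer_id
      tags:
        - Tag.B
        - Tag.C
```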
