External Catalog Introduction

Users who want to use tags from outside of Immuta can connect an external catalog to automatically pull and apply them to Immuta data sources. These tags can then be used to drive policies or classification frameworks.

Supported external catalogs

Immuta supports the following external catalogs:

Alation
Collibra
Microsoft Purview
Custom REST catalog
Databricks Unity Catalog tag ingestion
Snowflake tag ingestion

To configure an external catalog, see the Configure an external catalog guide.

Architecture

Once an external catalog has been configured on the Immuta app settings page, there are two recurring process steps:

Linking to data sources and columns: Whenever a new data source is created or an external catalog is set up, Immuta will attempt to automatically link data sources to their corresponding assets in the external catalog. This is done by comparing the fully qualified name of a data source in Immuta with its corresponding asset name in the external catalog, so data sources must have the same name in Immuta and the external catalog. Alternatively, a user can also manually link a data source to an asset in an external catalog. Once a data source has been linked to an external catalog, it can be seen on the data source's detail page.
Pull and apply tags in Immuta: Using the link established in the first step, Immuta polls the external catalog to ingest and apply tags to each data source and its columns. Immuta checks every 24 hours for any relevant metadata changes in the connected external catalog. Tags originating from an external catalog can be found on the tags list page and on the data dictionary page for each data source.

See below for more information about the way Immuta integrates with each supported external catalog provider.

Alation

Immuta's Alation integration supports importing both tags and custom fields, Alation's two primary ways of allowing data stewards to apply metadata to data assets.

Tags: Tags are a single word or phrase that can be attached to most Alation objects by nearly anyone. For instance, users can add a PCI tag for financial data.
Custom fields: Custom fields are key-value pairs that can only be attached and removed by authorized users. Unlike tags, custom fields can have multiple values associated with a single key. For example, the custom field DK_STEWARD could have MARKETING, FINANCE, and CUSTOMER values associated with it. Using Alation custom fields allows you to explicitly control who can modify information associated with that field inside of Alation, whereas Alation standard tags are modifiable by any user inside of Alation.

When pulled into Immuta, Alation tags and custom fields will be applied to data sources as either column or data source tags in Immuta. Importing both Alation tags and custom fields into Immuta provides full flexibility for customers leveraging the Alation enterprise data catalog, no matter what operating model they choose to document their metadata in Alation.

Collibra

Collibra tags using the dot "." delimiter will be transformed into hierarchical tags in Immuta. To learn more about the benefits of hierarchical tags for policy authoring, see tag hierarchy.

Immuta's Collibra integration supports importing both tags and attributes. Additionally, data source and column descriptions from the connected Collibra catalog will be pulled into Immuta.

Tags: Tags are a single word or phrase that can be attached to objects in Collibra. For instance, users can add a PHI tag on health-related data assets.
Attributes: An attribute in Collibra is a characteristic that describes an asset with an individual field. Unlike tags, attributes can have multiple values associated with a single key. For example, the attribute classification could have non-sensitive, sensitive, and highly sensitive values associated with it. Using Collibra attributes allows you to explicitly control who can modify information associated with that field inside of Collibra, whereas Collibra standard tags are modifiable by any user inside of Collibra.

When pulled into Immuta, Collibra tags and attributes will be applied to data sources as either column or data source tags in Immuta. Importing both Collibra tags and attributes into Immuta provides full flexibility for customers leveraging the Collibra data catalog, no matter what operating model they choose to document their metadata in Collibra.

How Immuta gets metadata from Collibra

Linking to data sources and columns in Collibra: Immuta links data sources to assets in Collibra by looking up the full name. To ensure unique names that Immuta can easily link to, it is recommended that customers use Collibra Edge to onboard their data sources into Collibra.
Pull and apply tags in Immuta from Collibra: Immuta checks Collibra every 24 hours by observing the linked assets' history for any relevant metadata changes. Based on these changes, Immuta then polls and ingests tags from Collibra only for the relevant data sources. However, if Immuta observes more than 25,000 metadata changes in Collibra within 24 hours, it will poll all data sources for tags during that run of external catalog tag synchronization.
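
The polling rule described above can be pictured with a small sketch. This is only an illustration of the stated threshold, not Immuta's implementation; the function name, the argument shapes, and the constant name are assumptions made for this example.

```python
FULL_SYNC_THRESHOLD = 25_000  # change count stated in the text above


def data_sources_to_poll(changed_ids, linked_ids):
    """Pick which linked data sources to poll in one sync run (illustrative).

    changed_ids: asset ids taken from observed metadata changes, one per change
    linked_ids: ids of all data sources linked to the catalog
    """
    # More changes than the threshold: poll every linked data source.
    if len(changed_ids) > FULL_SYNC_THRESHOLD:
        return set(linked_ids)
    # Otherwise poll only linked data sources that actually changed.
    return set(changed_ids) & set(linked_ids)
```

With a handful of changes, only the affected linked data sources are returned; once the number of observed changes exceeds 25,000, the sketch falls back to every linked data source.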

Limitations

Collibra assets must have unique full names in order for Immuta to guarantee exact matching. If there are multiple Collibra assets with the same name, Immuta will link to the first asset it matches to.
Columns must have a direct relation to their parent asset in Collibra. Indirect/inherited relations are not supported and will result in column tags and attributes not being ingested into Immuta.

Microsoft Purview catalog

Private preview

The Microsoft Purview catalog integration is only available to select accounts. Contact your Immuta representative to enable this feature.

The Microsoft Purview catalog integration with Immuta currently supports ingestion of Classifications and Managed attributes as tags. Additionally, data source and column descriptions from the connected Microsoft Purview catalog will be pulled into Immuta.

How Immuta gets metadata from Microsoft Purview

Linking to data sources and columns in Microsoft Purview: Immuta links data sources to assets in Microsoft Purview by looking up the fully qualified name of an entity. The composition of the fully qualified name in Microsoft Purview differs depending on the technology type backing the data source.
Pull and apply tags in Immuta from Microsoft Purview: Immuta polls Microsoft Purview every 24 hours for all tags.

Limitations

Standard tags from Purview are not ingested into Immuta.
The current implementation only supports Databricks Unity Catalog, Snowflake, and Azure Synapse Analytics data sources and their associated columns.
Managed attributes are supported, but have the following limitations:
- If a managed attribute is applied to an Immuta data source but later expires, it will still appear as a tag on the data source. Expired attributes must be removed from the object in Purview for the tag to be removed from the Immuta data source.
- The following managed attribute data types are not supported and will not be applied to Immuta data sources as tags:
  - Dates
  - Number types
  - Rich text

Custom REST catalog

If users have an unsupported catalog, or have customized their catalog integration, they can connect through the REST Catalog using the Immuta API.

For more details about using a custom REST catalog with Immuta, see the Custom REST Catalog Interface Introduction.

Databricks Unity Catalog tag ingestion

Design partner preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Users can connect their Databricks Unity Catalog account to allow Immuta to ingest Databricks tags and apply them to Databricks data sources. To learn more about Databricks Unity Catalog tag ingestion, see the Databricks Unity Catalog reference guide.

Snowflake tag ingestion

Users can connect a Snowflake account to allow Immuta to ingest Snowflake tags onto Snowflake data sources. To learn more about Snowflake tag ingestion, see the Snowflake reference guide.

External catalog behaviors

Tags ingested from external catalogs cannot be edited within Immuta. To edit, delete, or add a tag from an external catalog to a data source or column, make the change in the external catalog.
You can configure multiple external catalogs within a single tenant of Immuta, but only one external catalog can be linked to a data source.
Immuta searches all external catalog providers once per day and links data sources without an external catalog attached to them to the first catalog that matches.
S3 data sources cannot currently be linked to external catalogs.

Resources

To configure an external catalog, see the Configuration how-to guide.
To learn more about how Immuta can automatically tag your data with Discover, see the Discover introduction.

Custom REST Catalog Interface Introduction

The custom REST catalog integration allows Immuta to make calls to a Custom REST service you develop to retrieve metadata. The Custom REST service receives Immuta's calls, collects the relevant information, and delivers it back to Immuta.

The diagram below highlights the main feature of Immuta's Custom REST Catalog integration.

Through a Custom REST Catalog, you can build and maintain your own solutions that provide metadata required to effectively use Immuta within your organization.

Section Contents

Custom REST Catalog Interface Endpoints

Architectural Overview

The diagram below contrasts Immuta's provided catalog integration architecture with the Custom REST Catalog interface, which gives the customer extensive control over the metadata provided to Immuta.

The custom-developed service must be built to receive and handle calls to the REST endpoints specified below. Immuta will call these endpoints as detailed below when certain events occur and at various intervals. The required responses to complete the connection are also detailed.

General Concepts

Tags in Immuta

Tags are attributes applied to data, either at the top (data source) level or at the individual column level.

Tags in Immuta take the form of a nested tree structure. There are "parents", "children", "grand-children", etc.:

| Parent (root)
|\_ Child1
|   \_ Grandchild1 (leaf)
 \_ Child2
    \_ Grandchild1 (leaf)

The REST Catalog interface interprets a tag's relationship mapping from a string based on a standard "dot" (.) notation, like:

"Parent.Child1.Grandchild1"

Tags returned must meet the following constraints:

They must be no longer than 500 characters. Longer tags will not throw an error but will be truncated silently at 500 characters.
They must be composed of letters, digits, underscores, dashes, and whitespace characters. A period (.) is used as a separator as described above. Other special characters are not supported.
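
A REST Catalog service may want to check tag names against these constraints before returning them. The following is a minimal sketch, not part of the Immuta interface: the function names are hypothetical, and it assumes that "letters" includes any Unicode letter (Python's `\w`), which the text above does not specify.

```python
import re

MAX_TAG_LENGTH = 500  # limit stated above; longer tags are truncated silently
# One segment: letters, digits, underscores, dashes, and whitespace.
# Assumption: \w is used, so non-ASCII letters are accepted as well.
_SEGMENT = re.compile(r"[\w\s-]+")


def split_tag(tag: str) -> list[str]:
    """Split a dot-delimited tag into its hierarchy, parent first."""
    return tag.split(".")


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag fits the length limit and character rules."""
    if not tag or len(tag) > MAX_TAG_LENGTH:
        return False
    # Every segment between dots must be non-empty and use allowed characters.
    return all(_SEGMENT.fullmatch(part) for part in split_tag(tag))
```

Rejecting over-long tags in the service, rather than relying on silent truncation, is a design choice of this sketch; it avoids two distinct tags collapsing into the same truncated name.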

A tag object has a single id property, which is used to uniquely identify the tag within the catalog. This id may be of either a string or integer type, and its value is completely up to the designer of the REST Catalog service. Common examples include: a standard integer value, a UUID, or perhaps a hash of the tag's string value (if it is unique within the system).

For this Custom REST Catalog interface, tags are represented in JSON like:

"<string>[.<string>[.<string>...]]": {
    "id": "<unique identifier, string or int>"
},

For example, the object below specifies 3 different tags:

"REST_Catalog_Root": {
    "id": "id_is_set_by_catalog_and_can_be_int_or_string"
},
"REST_Catalog_Root.Child1": {
    "id": "d3e859da-40e9-43d2-a302-294458e79a64"
},
"REST_Catalog_Root.Child2.Grandchild1": {
    "id": 10
}
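
One way to produce such an object is to derive each id from the tag's fully qualified name, along the lines of the "hash of the tag's string value" option mentioned above. The sketch below uses a name-based UUID for that purpose; the namespace value and the function name are arbitrary choices for illustration, not anything defined by Immuta.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would do.
TAG_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def build_tags_object(tag_names):
    """Map each fully qualified tag name to {"id": <stable identifier>}."""
    return {
        # uuid5 is deterministic: the same name always yields the same id.
        name: {"id": str(uuid.uuid5(TAG_NAMESPACE, name))}
        for name in tag_names
    }


tags = build_tags_object([
    "REST_Catalog_Root",
    "REST_Catalog_Root.Child1",
    "REST_Catalog_Root.Child2.Grandchild1",
])
```

Because the id is derived from the name, it stays stable across restarts of the service, but renaming a tag changes its id.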

For more information on tags and how they are created, managed, and displayed within Immuta, see our tag documentation.

Descriptions in Immuta

Descriptions are strings that, like tags, can be applied to either a data source or an individual column. These strings support UTF-8, including special and various language characters.

Authentication

Immuta can make requests to your REST Catalog service using any of the following authentication methods:

Username and password: Immuta can send requests with a username and a password in the Authorization HTTP header. In this case, the custom REST service will need to be able to parse a Basic Authorization Header and validate the credentials sent with it.
PKI Certificate: Immuta can also send requests using a CA certificate, a certificate, and a key.
No authentication: Immuta can make unauthenticated requests to your REST Catalog service. However, this should only be used if you have other security measures in place (e.g., if the service is in an isolated network that's reachable only by your Immuta environment).
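
For the username and password option, the service has to decode a Basic Authorization header itself. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and how the expected credentials are stored is left out.

```python
import base64
import binascii
import hmac


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic Authorization header, or None."""
    scheme, _, encoded = (header_value or "").partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The decoded value has the form "username:password".
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None


def credentials_match(header_value, expected_user, expected_password):
    """Compare the supplied credentials with the expected ones."""
    creds = parse_basic_auth(header_value)
    if creds is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(creds[0].encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(creds[1].encode(), expected_password.encode())
    return user_ok and pass_ok
```

A request whose header is missing, malformed, or carries the wrong credentials would typically be answered with a 401 status by the service.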

Authentication and specific endpoints

When accessing the /dataSource and /tags endpoints, Immuta will use the configured username and password. If you choose to also protect the human-readable pages with authentication, users will be prompted to authenticate when they first visit those pages.

Endpoint Specification

GET `/tags`

Overview

The /tags endpoint is used to collect ALL the tags the catalog can provide. It is used by Immuta to populate Immuta's tags list in the Governance section. These tags can then be used for policy creation ahead of actual data sources being created that make use of them. This enables policies to immediately apply when data sources are registered with Immuta.

As with all external catalogs, tags ingested from the REST catalog interface cannot be modified locally within Immuta, because the catalog becomes the "source of truth" for them. As a result, these tags appear in Immuta either with a lock icon next to them or without the delete button that would allow a user to manually remove them from an assigned data source or column.

Request

The /tags endpoint receives a simple GET request from Immuta. No payload or query parameters are required.

Example request:

curl http://<your_custom_rest_catalog>/tags \
     --header 'Authorization: Basic <base64 of username:password>'

Response

The Custom REST service must respond with an object that maps all tag name strings to associated ids. The tag name string fully qualifies the location of the tag in the tree structure as detailed previously, and the id is a globally unique identifier assigned by the REST catalog to that tag.

Example response:

{
  "REST_Catalog_Root": {
      "id": "id_is_set_by_catalog_and_can_be_int_or_string"
  },
  "REST_Catalog_Root.Child1": {
      "id": "d3e859da-40e9-43d2-a302-294458e79a64"
  },
  "REST_Catalog_Root.Child2.Grandchild1": {
      "id": 10
  }
}
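
A service answering this request could be sketched with Python's standard `http.server` module, as below. This is an illustrative skeleton under stated assumptions: the in-memory `TAGS` mapping, the handler and function names, the JSON content type, and the port are all choices made for the example, and authentication is omitted for brevity.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory catalog; a real service would query its own store.
TAGS = {
    "REST_Catalog_Root": {"id": "id_is_set_by_catalog_and_can_be_int_or_string"},
    "REST_Catalog_Root.Child1": {"id": "d3e859da-40e9-43d2-a302-294458e79a64"},
    "REST_Catalog_Root.Child2.Grandchild1": {"id": 10},
}


def tags_response_body(tags=TAGS):
    """Serialize the full tag-name-to-id mapping as UTF-8 JSON bytes."""
    return json.dumps(tags).encode("utf-8")


class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /tags is handled in this sketch.
        if self.path != "/tags":
            self.send_error(404)
            return
        body = tags_response_body()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch locally; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), CatalogHandler).serve_forever()
```

Calling `serve()` starts the service, after which the example curl request above can be pointed at it for local testing.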

POST `/dataSource`

Overview

The /dataSource endpoint does the vast majority of the work. It receives a POST request from Immuta, and returns the mapping of a data source and its columns to the applied tags and descriptions.

Immuta will try to fetch metadata for a data source in the system at various times:

During data source creation. During data source creation, Immuta will send metadata to the REST Catalog service, most notably the connection details of the data source, which includes the schema and table name. It is important that the Custom REST service implemented can parse this information and search its records for an appropriate record to return with an ID unique to this data source in its catalogMetadata object.
When a user manually links the data source. Data sources that either fail to auto-link, or that were created prior to the Custom REST catalog being configured, can still be manually linked. To do so, a data source owner can provide the ID of the asset as defined by the Custom REST Catalog via the Immuta UI. In order for this to work, the Custom REST Catalog service must support matching data source assets by unique ID.
During various refreshes. Once linked, Immuta will periodically call the /dataSource endpoint to ensure information is up to date.

Request

Immuta's POST requests to the /dataSource endpoint will consist of a payload containing many of the elements outlined below:

| Attribute | Data Type | Description |
| --- | --- | --- |
| catalogMetadata | dictionary | Object holding the data source's catalog metadata. |
| catalogMetadata.id | string or integer | The unique identifier of the data source in the catalog. |
| catalogMetadata.name | string | The name of the data source in the catalog. |
| handlerInfo | dictionary | Object holding the data source's connection details. |
| handlerInfo.schema | string | The data source's schema name in the source system. |
| handlerInfo.table | string | The data source's table name in the source system. |
| handlerInfo.hostname | string | The data source's connection hostname in the source storage system. |
| handlerInfo.port | integer | The data source's connection port in the source storage system. |
| handlerInfo.query | string | The data source's connection query in the source storage system, if applicable. |
| dataSource | dictionary | Object holding general data source information from Immuta. This can be viewed with debugging, but is not usually required for catalog purposes. |

This object must be parsed by the Custom REST Catalog in order to determine the specific data source metadata being requested.

For the most part, Immuta will provide the id of the data source as part of the catalogMetadata. This should be used as the primary metadata lookup value.

{
  "catalogMetadata": {
    "id": <unique integer or string value>
  }
}

When a data source is being created, such an id will not yet be known to Immuta. Immuta will instead send handlerInfo information as part of the request.

{
  "handlerInfo": {
    "schema": "schema_name",
    "table": "table_name"
  }
}

When an id is not specified, the schema and table name elements should be parsed in an attempt to identify the desired catalog entry and provide an appropriate id. If such a lookup is successful and an id is returned to Immuta in the catalogMetadata section, Immuta will establish an automatic link between the new data source and the catalog entry, and future references will use that id.
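
The lookup order described above (id first, then schema and table) can be sketched as follows. The `CATALOG` structure and the function name are hypothetical, the match on schema and table is an exact, case-sensitive comparison chosen for simplicity, and what the service returns when nothing matches is left open here.

```python
# Hypothetical catalog records keyed by catalog id.
CATALOG = {
    "ds-1": {"schema": "public", "table": "customers", "name": "Customers"},
}


def find_catalog_id(payload, catalog=CATALOG):
    """Return the catalog id for a /dataSource request, or None if no match."""
    # Prefer the id Immuta already knows about.
    catalog_id = (payload.get("catalogMetadata") or {}).get("id")
    if catalog_id is not None:
        return catalog_id if catalog_id in catalog else None
    # Otherwise fall back to schema and table from the connection details.
    handler = payload.get("handlerInfo") or {}
    wanted = (handler.get("schema"), handler.get("table"))
    for candidate_id, record in catalog.items():
        if (record["schema"], record["table"]) == wanted:
            return candidate_id
    return None
```

A real service might normalize case or include the hostname in the match if the same schema and table names exist in more than one source system.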

The schema for the /dataSource response uses the same tag object structure from the /tags response, along with the following set of metadata keys for both data sources and columns.

| Attribute | Data Type | Description |
| --- | --- | --- |
| catalogMetadata | dictionary | Object holding the data source's catalog metadata. |
| catalogMetadata.id | string or integer | The unique identifier of the data source in the catalog. |
| catalogMetadata.name | string | The name of the data source in the catalog. |
| description | string | A description of the data source. |
| tags | <tags object> | Object containing the data source-level tags. |
| dictionary | dictionary | Object containing the column names of the data source as its keys. |
| dictionary.<column> | dictionary | Object containing a single column's metadata. |
| dictionary.<column>.catalogMetadata.id | string or integer | The unique identifier of the column in the catalog. |
| dictionary.<column>.description | string | A description of the column. |
| dictionary.<column>.tags | <tags object> | Object containing the column-level tags as keys. |

Response

Example response:

{
  "catalogMetadata": {
    "id": <unique integer or string>
  },
  "description": <string>,
  "tags": {
    "Parent": {
      "id": <tag_id1>
    }
  },
  "dictionary": {
    "some_column_name": {
      "catalogMetadata": {
        "id": <col_id1>
      },
      "description": "This column has example data in it",
      "tags": {
        "Parent.Child1": {
          "id": <tag_id2>
        },
        "Parent.Child1.Grandchild1": {
          "id": <tag_id3>
        }
      }
    }
  }
}
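
A response of this shape can be assembled from a service's own records with a small helper. The sketch below is one possible way to do it; the function name, the argument layout, and the sample values are assumptions for illustration only.

```python
import json


def build_data_source_response(catalog_id, description, tags, columns):
    """Assemble a /dataSource response body as a plain dict.

    columns maps column name -> (column_id, description, column_tags).
    """
    return {
        "catalogMetadata": {"id": catalog_id},
        "description": description,
        "tags": tags,
        "dictionary": {
            name: {
                "catalogMetadata": {"id": col_id},
                "description": col_description,
                "tags": col_tags,
            }
            for name, (col_id, col_description, col_tags) in columns.items()
        },
    }


# Sample values invented for this example.
response = build_data_source_response(
    "ds-1",
    "Example data source",
    {"Parent": {"id": 1}},
    {"some_column_name": (10, "This column has example data in it",
                          {"Parent.Child1": {"id": 2}})},
)
print(json.dumps(response, indent=2))
```

The tag objects passed in use the same name-to-id structure as the /tags response, so the same mapping can back both endpoints.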

GET `/dataSource/page/{id}`

Overview

This endpoint returns a human-readable information page from the REST catalog for the data source associated with {id}. Immuta provides this as a mechanism for allowing the REST catalog to provide additional information about the data source that may not be directly ingested by or visible within Immuta. This link is accessed in the Immuta UI when a user clicks the catalog logo associated with the data source.

Request

Immuta will send a GET request to the /dataSource/page/{id} endpoint, where {id} will be:

| Attribute | Data Type | Description |
| --- | --- | --- |
| id | URL parameter, integer or string | The unique identifier of the data source in the remote catalog system. |

Example request:

curl http://<your_custom_rest_catalog>/dataSource/page/123

Response

The Custom REST Catalog can either provide such a page directly or redirect the user to any resource that provides the appropriate page. For example, if the Custom REST Catalog is only being used to support a custom data model, it can redirect to a backing full-service catalog such as Collibra.

Example response:

<html> 
  <head> 
    <title>data source 123</title> 
  </head> 
  <body> data source 123 is a data source that was created just for documentation.
  </body> 
</html>
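
Both options described above, serving the page directly or redirecting, can be sketched in one function that returns the pieces of an HTTP response. The mapping of ids to external URLs, the example URL, the choice of a 302 status, and the function name are all hypothetical.

```python
from html import escape

# Hypothetical mapping of data source ids to pages hosted elsewhere.
EXTERNAL_PAGES = {"456": "https://catalog.example.com/assets/456"}


def page_response(data_source_id):
    """Return (status, headers, body) for GET /dataSource/page/{id}."""
    target = EXTERNAL_PAGES.get(str(data_source_id))
    if target is not None:
        # Redirect to the backing catalog's own page.
        return 302, {"Location": target}, b""
    # Otherwise render a minimal page; escape the id since it comes from the URL.
    title = escape(f"data source {data_source_id}")
    body = f"<html><head><title>{title}</title></head><body>{title}</body></html>"
    return 200, {"Content-Type": "text/html; charset=utf-8"}, body.encode("utf-8")
```

Returning a plain tuple keeps the sketch independent of any web framework; a handler would copy the status, headers, and body into its own response object.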

GET `/column/{id}`

Overview

This endpoint returns the catalog's human-readable information page for the column associated with {id}. Immuta provides this as a mechanism for allowing the REST catalog to provide additional information about the specific column that may not be directly ingested by or visible within Immuta.

Request

Immuta will send a GET request to the /column/{id} endpoint, where {id} will be:

| Attribute | Data Type | Description |
| --- | --- | --- |
| id | URL parameter, integer or string | The unique identifier of the column in the remote catalog system. |

Example request:

curl http://<your_custom_rest_catalog>/column/10

Response

Example response:

<html>
  <head>
    <title>data source 123 Column 10</title>
  </head>
  <body>
    Column 10 is full of example data for documentation reasons.
  </body>
</html>

External Catalog Introduction

Supported external catalogs

Immuta supports the following external catalogs:

Alation
Collibra
Microsoft Purview
Custom REST catalog
Databricks Unity Catalog tag ingestion
Snowflake tag ingestion

To configure an external catalog, see the Configure an external catalog guide.

Architecture

Once an external catalog has been configured on the Immuta app settings page, there are two recurring process steps:

Linking to data sources and columns: Whenever a new data source is created or an external catalog is set up, Immuta will attempt to automatically link data sources to their corresponding assets in the external catalog. This is done by comparing the fully qualified name of a data source in Immuta with its corresponding asset name in the external catalog, so data sources must have the same name in Immuta and the external catalog. Alternatively, a user can also manually link a data source to an asset in an external catalog. Once a data source has been linked to an external catalog, it can be seen on the data source's detail page.
Pull and apply tags in Immuta: Using the link established in the first step, Immuta polls the external catalog to ingest and apply tags to each data source and its columns. Immuta checks every 24 hours for any relevant metadata changes in the connected external catalog. Tags originating from an external catalog can be found on the tags list page and on the data dictionary page for each data source.

See below for more information about the way Immuta integrates with each supported external catalog provider.

Alation

Immuta's Alation integration supports importing both tags and custom fields, Alation's two primary ways of allowing data stewards to apply metadata to data assets.

Tags: Tags are a single word or phrase that can be attached to most Alation objects by nearly anyone. For instance, users can add a PCI tag for financial data.
Custom fields: Custom fields are key-value pairs that can only be attached and removed by authorized users. Unlike tags, custom fields can have multiple values associated with a single key. For example, the custom field DK_STEWARD could have MARKETING, FINANCE, and CUSTOMER values associated with it. Using Alation custom fields allows you to explicitly control who can modify information associated with that field inside of Alation, whereas Alation standard tags are modifiable by any user inside of Alation.

Collibra

Collibra tags using the dot "." delimiter will be transformed into hierarchical tags in Immuta. To learn more about the benefits of hierarchical tags for policy authoring, see tag hierarchy.

Immuta's Collibra integration supports importing both tags and attributes. Additionally, data source and column descriptions from the connected Collibra catalog will be pulled into Immuta.

Tags: Tags are a single word or phrase that can be attached to objects in Collibra. For instance, users can add a PHI tag on health-related data assets.
Attributes: Attributes in Collibra are a characteristic that describes an asset with an individual field. Unlike tags, attributes can have multiple values associated with a single key. For example, the attribute classification could have non sensitive, sensitive, and highly sensitive values associated with it. Using Collibra attributes allows you to explicitly control who can modify information associated with that field inside of Collibra, whereas Collibra standard tags are modifiable by any user inside of Collibra.

How Immuta gets metadata from Collibra

Linking to data sources and columns in Collibra: Immuta links data sources to assets in Collibra by looking up the full name. To ensure unique names that Immuta can easily link to, it is recommended that customers use Collibra Edge to onboard their data sources into Collibra.
Pull and apply tags in Immuta from Collibra: Immuta checks Collibra every 24 hours by observing the linked assets history for any relevant metadata changes. Based on these changes, Immuta then only polls and ingests tags from Collibra for the relevant data sources. However, if Immuta observes more than 25,000 metadata changes in Collibra within 24 hours, it will poll all data sources for tags during that run of external catalog tag synchronization.

Limitations

Collibra assets must have unique full names in order for Immuta to guarantee exact matching. If there are multiple Collibra assets with the same name, Immuta will link to the first asset it matches to.
Columns must have a direct relation to their parent asset in Collibra. Indirect/inherited relations are not supported and will result in column tags and attributes not being ingested into Immuta.

Microsoft Purview catalog

Private preview

The Microsoft Purview catalog integration is only available to select accounts. Contact your Immuta representative to enable this feature.

How Immuta gets metadata from Microsoft Purview

Linking to data sources and columns in Microsoft Purview: Immuta links data sources to assets in Microsoft Purview by looking up the fully qualified name of an entity. The composition of the fully qualified name in Microsoft Purview differs depending on the technology type backing the data source.
Pull and apply tags in Immuta from Microsoft Purview: Immuta polls Microsoft Purview every 24 hours for all tags.

Limitations

Standard tags from Purview do not get ingested into Immuta
The current implementation only supports Databricks Unity Catalog, Snowflake and Azure Synapse Analytics data sources and their associated columns
Managed attributes are supported, but have the following limitations:
- If a managed attribute is applied to an Immuta data source but later expires, it will still appear as a tag on the data source. Expired attributes must be removed from the object in Purview for the tag to be removed from the Immuta data source.
- The following managed attribute data types are not supported and will not be applied to Immuta data sources as tags:
  - Dates
  - Number types
  - Rich text

Custom REST catalog

If users have an unsupported catalog, or have customized their catalog integration, they can connect through the REST Catalog using the Immuta API.

For more details about using a custom REST catalog with Immuta, see the Custom REST Catalog Interface Introduction.

Databricks Unity Catalog tag ingestion

Design partner preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Snowflake tag ingestion

Users can connect a Snowflake account to allow Immuta to ingest Snowflake tags onto Snowflake data sources. To learn more about Snowflake tag ingestion, see the Snowflake reference guide.

External catalog behaviors

Tags ingested from external catalogs cannot be edited within Immuta. To edit, delete, or add a tag from an external catalog to a data source or column, make the change in the external catalog.
You can configure multiple external catalogs within a single tenant of Immuta, but only one external catalog can be linked to a data source.
Immuta searches all external catalog providers once per day and links data sources without an external catalog attached to them to the first catalog that matches.
S3 data sources cannot currently be linked to external catalogs.

Resources

To configure an external catalog, see the Configuration how-to guide.
To learn more about how Immuta can automatically tag your data with Discover, see the Discover introduction.

Custom REST Catalog Interface Endpoints

Architectural Overview

General Concepts

Tags in Immuta

Tags are attributes applied to data - either at the top, data source, level or at the individual column level.

Tags in Immuta take the form of a nested tree structure. There are "parents", "children", "grand-children", etc.:

| Parent (root)
|\_ Child1
|   \_ Grandchild1 (leaf)
 \_ Child2
    \_ Grandchild1 (leaf)

The REST Catalog interface interprets a tag's relationship mapping from a string based on a standard "dot" (.) notation, like:

"Parent.Child1.Grandchild1"

Tags returned must meet the following constraints:

They must be no longer than 500 characters. Longer tags will not throw an error but will be truncated silently at 500 characters.
They must be composed of letters, digits, underscores, dashes, and whitespace characters. A period (.) is used as a separator as described above. Other special characters are not supported.

For this Customer REST Catalog interface, tags are represented in JSON like:

"<string>[.<string>[.<string>...]]": {
    "id": "<unique identifier, string or int>"
},

For example, the object below specifies 3 different tags:

"REST_Catalog_Root": {
    "id": "id_is_set_by_catalog_and_can_be_int_or_string"
},
"REST_Catalog_Root.Child1": {
    "id": "d3e859da-40e9-43d2-a302-294458e79a64"
},
"REST_Catalog_Root.Child2.Grandchild1": {
    "id": 10
}

For more information on tags and how they are created, managed, and displayed within Immuta, see our tag documentation.

Descriptions in Immuta

Descriptions are strings that, like tags, can be applied to either a data source or an individual column. These strings support UTF-8, including special and various language characters.

Authentication

Immuta can make requests to your REST Catalog service using any of the following authentication methods:

Username and password: Immuta can send requests with a username and a password in the Authorization HTTP header. In this case, the custom REST service will need to be able to parse a Basic Authorization Header and validate the credentials sent with it.
PKI Certificate: Immuta can also send requests using a CA certificate, a certificate, and a key.
NO Authentication: Immuta can make unauthenticated requests to your REST Catalog service. However, this should only be used if you have other security measures in place (e.g., if the service is in an isolated network that's reachable only by your Immuta environment).

Authentication and specific endpoints

Endpoint Specification

GET `/tags`

Overview

Request

The /tags endpoint receives a simple GET request from Immuta. No payload nor query parameters are required.

Example request:

curl http://<your_custom_rest_catalog>/tags \
     --header 'Authorization: Basic <base64 of username:password>'

Response

Example response:

{
  "REST_Catalog_Root": {
      "id": "id_is_set_by_catalog_and_can_be_int_or_string"
  },
  "REST_Catalog_Root.Child1": {
      "id": "d3e859da-40e9-43d2-a302-294458e79a64"
  },
  "REST_Catalog_Root.Child2.Grandchild1": {
      "id": 10
  }
}
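A service could produce the response above along the following lines. This is a minimal sketch using only the Python standard library; the in-memory `CATALOG_TAGS` store, the handler class name, and the `application/json` content type are assumptions for illustration and are not specified by the interface.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical in-memory tag store: tag name -> catalog-assigned id.
CATALOG_TAGS = {
    "REST_Catalog_Root": "id_is_set_by_catalog_and_can_be_int_or_string",
    "REST_Catalog_Root.Child1": "d3e859da-40e9-43d2-a302-294458e79a64",
    "REST_Catalog_Root.Child2.Grandchild1": 10,
}


def build_tags_response(tags: dict) -> dict:
    """Wrap each id in the {"id": ...} object used by the tag structure."""
    return {name: {"id": tag_id} for name, tag_id in tags.items()}


class TagsHandler(BaseHTTPRequestHandler):
    """Answers GET /tags with the full tag list as JSON."""

    def do_GET(self):
        if self.path != "/tags":
            self.send_error(404)
            return
        body = json.dumps(build_tags_response(CATALOG_TAGS)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")  # assumed content type
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("localhost", 8080), TagsHandler).serve_forever()` and queried with the curl command shown earlier.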

POST `/dataSource`

Overview

The /dataSource endpoint does the vast majority of the work. It receives a POST request from Immuta, and returns the mapping of a data source and its columns to the applied tags and descriptions.

Immuta will try to fetch metadata for a data source in the system at various times:

During data source creation: Immuta sends metadata to the REST Catalog service, most notably the data source's connection details, which include the schema and table name. The Custom REST service must be able to parse this information, search its records for a matching asset, and return that record with an ID unique to this data source in its catalogMetadata object.
When a user manually links the data source: Data sources that either fail to auto-link or were created before the Custom REST catalog was configured can still be linked manually. To do so, a data source owner provides the ID of the asset, as defined by the Custom REST Catalog, through the Immuta UI. For this to work, the Custom REST Catalog service must support matching data source assets by unique ID.
During various refreshes: Once a data source is linked, Immuta periodically calls the /dataSource endpoint to ensure its information is up to date.

Request

Immuta's POST requests to the /dataSource endpoint will consist of a payload containing many of the elements outlined below:

| Attribute | Data Type | Description |
| --- | --- | --- |
| catalogMetadata | dictionary | Object holding the data source's catalog metadata. |
| catalogMetadata.id | string or integer | The unique identifier of the data source in the catalog. |
| catalogMetadata.name | string | The name of the data source in the catalog. |
| handlerInfo | dictionary | Object holding the data source's connection details. |
| handlerInfo.schema | string | The data source’s schema name in the source system. |
| handlerInfo.table | string | The data source’s table name in the source system. |
| handlerInfo.hostname | string | The data source’s connection hostname in the source storage system. |
| handlerInfo.port | integer | The data source’s connection port in the source storage system. |
| handlerInfo.query | string | The data source’s connection query in the source storage system, if applicable. |
| dataSource | dictionary | Object holding general data source information from Immuta. This can be viewed with debugging, but is not usually required for catalog purposes. |

This object must be parsed by the Custom REST Catalog in order to determine the specific data source metadata being requested.

For the most part, Immuta will provide the id of the data source as part of the catalogMetadata. This should be used as the primary metadata lookup value.

{
  "catalogMetadata": {
    "id": <unique integer or string value>
  }
}

When a data source is being created, such an id will not yet be known to Immuta. Immuta will instead send handlerInfo information as part of the request.

{
  "handlerInfo": {
    "schema": "schema_name",
    "table": "table_name"
  }
}
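The two request shapes above suggest a lookup order: use catalogMetadata.id when it is present, and fall back to handlerInfo otherwise. The following is a minimal sketch of that order. The function name and the two index dictionaries are hypothetical, and normalizing IDs to strings is an assumption made so that an ID sent as an integer and one stored as a string still match.

```python
def resolve_catalog_record(payload: dict, by_id: dict, by_schema_table: dict):
    """Find the catalog record a /dataSource request refers to, or None.

    by_id maps str(catalog id) -> record (hypothetical index).
    by_schema_table maps (schema, table) -> record (hypothetical index).
    """
    metadata = payload.get("catalogMetadata") or {}
    catalog_id = metadata.get("id")
    if catalog_id is not None:
        # Linked data source: the catalog ID is the primary lookup value.
        return by_id.get(str(catalog_id))

    # Data source still being created: only connection details are known.
    handler = payload.get("handlerInfo") or {}
    key = (handler.get("schema"), handler.get("table"))
    return by_schema_table.get(key)
```

Returning `None` here only signals that no asset matched; how the service reports that case back to Immuta is not covered by this sketch.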

The schema for the /dataSource response uses the same tag object structure from the /tags response, along with the following set of metadata keys for both data sources and columns.

| Attribute | Data Type | Description |
| --- | --- | --- |
| catalogMetadata | dictionary | Object holding the data source's catalog metadata. |
| catalogMetadata.id | string or integer | The unique identifier of the data source in the catalog. |
| catalogMetadata.name | string | The name of the data source in the catalog. |
| description | string | A description of the data source. |
| tags | <tags object> | Object containing the data source-level tags. |
| dictionary | dictionary | Object containing the column names of the data source as its keys. |
| dictionary.<column> | dictionary | Object containing a single column's metadata. |
| dictionary.<column>.catalogMetadata.id | string or integer | The unique identifier of the column in the catalog. |
| dictionary.<column>.description | string | A description of the column. |
| dictionary.<column>.tags | <tags object> | Object containing the column-level tags as keys. |

Response

Example response:

{
  "catalogMetadata": {
    "id": <unique integer or string>
  },
  "description": <string>,
  "tags": {
    "Parent": {
      "id": <tag_id1>
    }
  },
  "dictionary": {
    "some_column_name": {
      "catalogMetadata": {
        "id": <col_id1>
      },
      "description": "This column has example data in it",
      "tags": {
        "Parent.Child1": {
          "id": <tag_id2>
        },
        "Parent.Child1.Grandchild1": {
          "id": <tag_id3>
        }
      }
    }
  }
}
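The response shape can be assembled from whatever record the catalog stores. Below is a minimal sketch that assumes a hypothetical internal record layout (`id`, `name`, `description`, `tags`, and a `columns` mapping); only the keys of the returned object come from the schema described above.

```python
def _tag_object(tags: dict) -> dict:
    """Convert {tag name: id} into the tag structure used by /tags."""
    return {name: {"id": tag_id} for name, tag_id in tags.items()}


def build_data_source_response(record: dict) -> dict:
    """Map a hypothetical catalog record onto the /dataSource response schema."""
    return {
        "catalogMetadata": {"id": record["id"], "name": record["name"]},
        "description": record.get("description", ""),
        "tags": _tag_object(record.get("tags", {})),
        # "dictionary" is keyed by column name, one entry per column.
        "dictionary": {
            column_name: {
                "catalogMetadata": {"id": column["id"]},
                "description": column.get("description", ""),
                "tags": _tag_object(column.get("tags", {})),
            }
            for column_name, column in record.get("columns", {}).items()
        },
    }
```

Serializing the result with `json.dumps` yields a body with the same layout as the example response, with the placeholders replaced by real values.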

GET `/dataSource/page/{id}`

Overview

Request

Immuta will send a GET request to the /dataSource/page/{id} endpoint, where {id} will be:

| Attribute | Data Type | Description |
| --- | --- | --- |
| id | URL Parameter, integer or string | The unique identifier of the data source in the remote catalog system. |

Example request:

curl http://<your_custom_rest_catalog>/dataSource/page/123

Response

Example response:

<html> 
  <head> 
    <title>data source 123</title> 
  </head> 
  <body> data source 123 is a data source that was created just for documentation.
  </body> 
</html>

GET `/column/{id}`

Overview

Request

Immuta will send a GET request to the /column/{id} endpoint, where {id} will be:

| Attribute | Data Type | Description |
| --- | --- | --- |
| id | URL Parameter, integer or string | The unique identifier of the column in the remote catalog system. |

Example request:

curl http://<your_custom_rest_catalog>/column/10

Response

Example response:

<html>
  <head>
    <title>data source 123 Column 10</title>
  </head>
  <body>
    Column 10 is full of example data for documentation reasons.
  </body>
</html>
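Both /dataSource/page/{id} and /column/{id} return a small HTML document in the examples above. A service could generate such pages as sketched below; the function name is hypothetical, and escaping the title and body is a general precaution for catalog text containing markup characters rather than a requirement stated by the interface.

```python
from html import escape


def render_asset_page(title: str, body: str) -> str:
    """Build a minimal HTML page for a data source or column."""
    return (
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    {escape(body)}\n"
        "  </body>\n"
        "</html>"
    )
```

For instance, `render_asset_page("data source 123 Column 10", "Column 10 is full of example data for documentation reasons.")` produces a page with the same layout as the column example.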
