Connect an external catalog to use tagging capabilities outside of Immuta and pull tags from external table schemas. Once the catalog has been connected, Immuta ingests a data dictionary from the catalog and applies data source and column tags directly to the data source. These tags can then be used to create policies.
This getting started guide outlines how to use external catalogs in Immuta to gain value from the Immuta modules: Discover and Secure.
Configure an external catalog: Configure Alation, Collibra, or a custom REST catalog to ingest tags into Immuta.
External catalog integrations: This reference guide describes the requirements of the external catalogs Immuta supports.
Custom REST catalog introduction: This reference guide describes the custom catalog option for users to make API calls to retrieve metadata on their data.
Custom REST catalog interface endpoints: This reference guide describes the endpoints for configuring a custom REST catalog.
The how-to guides linked on this page illustrate how to link an external catalog with Immuta to ingest tags and add value to the Immuta modules: Secure and Discover.
Best practice: Use a single catalog; having more than one can lead to multiple truths and data leaks.
Requirement: A catalog with tags that correspond to your Immuta data sources
When changes are made to the external catalog, refresh external tags.
Requirements:
A physical data dictionary with assets that correspond to your Immuta data sources
The Collibra global role Catalog or Catalog Author
When changes are made to the external catalog, refresh external tags.
Requirements:
A catalog with tags that correspond to your Immuta data sources
The ability to create a registered app in the Azure portal
When changes are made to the external catalog, refresh external tags.
Requirements:
A catalog with tags that correspond to your Immuta data sources
When changes are made to the external catalog, refresh external tags.
Requirements:
Fewer than 2,500 Databricks Unity Catalog data sources registered in Immuta
Databricks privileges listed on the Configure a Databricks Unity Catalog integration page
Once you register data sources, table and column tags from Databricks Unity Catalog will be ingested and applied to the corresponding data sources in Immuta.
Requirements:
A Snowflake user who can set the following permissions:
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake
GRANT APPLY TAG ON ACCOUNT
Snowflake Enterprise Edition or higher
Configure Snowflake tag ingestion in Immuta.
When changes are made to the tags in Snowflake, refresh external tags.
This page outlines how to connect an external catalog on the Immuta app settings page. For details on prerequisites and external catalogs with Immuta, see the External catalog pre-configuration checklist.
To change the default expiration period for your Alation catalog's API tokens, see configure the expiration period for Alation API tokens.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Alation from the dropdown menu.
Complete the URL and API key fields. In order to generate an API access token for your Alation instance, follow the Alation documentation.
Configure whether or not Alation tags and custom fields are imported as Immuta tags:
Link Alation tags: When selected, Immuta imports Alation tags as Immuta tags.
Link Alation Custom Fields: When selected, Immuta imports Alation custom fields as Immuta tags.
Opt to select Upload Certificates.
Upload the Certificate Authority, Certificate File, and Key File.
Opt to enable Strict SSL by selecting the checkbox.
Click the Test Connection button.
Once the connection is successful, click Save.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter the Display Name and select Collibra from the dropdown menu.
Enter the HTTP endpoint of the catalog in the URL field.
Complete the Username and Password fields. Note: This is the username and the password that Immuta can use to connect to the external catalog.
Opt to Require the data source name in Collibra to contain both the schema and table name by selecting the checkbox.
Complete the Asset Mappings modal to set which asset types in Collibra should align to Immuta's data sources and columns.
Complete the Attributes as Tags modal to specify which Collibra attributes you would like to pull in as tags in Immuta.
Opt to select Upload Certificates.
Upload the Certificate Authority, Certificate File, and Key File.
Opt to enable Strict SSL by selecting the checkbox.
Click the Test Connection button.
Once the connection is successful, click Save.
Private preview
The Microsoft Purview catalog integration is only available to select accounts. Contact your Immuta representative to enable this feature.
Register an app in the Azure portal with the following settings:
Supported account type: "Accounts in this organizational directory only"
Microsoft-Graph: User.Read API permission
A client secret
Using that registered app, navigate to Immuta and complete the following:
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter the Display Name and select Microsoft Purview from the dropdown menu.
Complete the following fields:
Enter the Microsoft Purview endpoint URL including the Azure Account Name, like https://<ACCOUNTNAME>.purview.azure.com, in the Purview Endpoint URL field.
Complete the Microsoft Entra Directory (tenant) ID and Microsoft Entra (client) ID fields.
Enter the Microsoft Entra Application Client Secret ID for Immuta to authenticate and connect to the Purview API. The secret cannot be expired.
Click the Test Connection button.
Once the test is successful, click Save.
Integrating a custom REST catalog service with Immuta requires implementing a REST interface. For details about the necessary endpoints that must be serviced, see the Custom REST catalog interface endpoints page.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter the Display Name and select Rest from the dropdown menu.
Select the Internal Plugin checkbox if the catalog has been uploaded to Immuta as a custom server plugin.
Complete the following fields:
Enter the HTTP endpoint of the catalog in the URL field.
Complete the Username and Password fields.
Enter the path of the Tags Endpoint.
Enter the path of the Data Source Endpoint.
Enter the path to the information page for a data source in the Data Source Link Template field.
Opt to enter the path to the information page for a column in the Column Link Template field.
Opt to upload a Catalog Image.
Opt to select Upload Certificates.
Upload the Certificate Authority, Certificate File, and Key File.
Opt to enable Strict SSL by selecting the checkbox.
Click the Test Connection button.
Click the Test Data Source Link.
Once both tests are successful, click Save.
See the Configure a Snowflake integration page for guidance on configuring tag ingestion.
If Snowflake data sources existed before configuring tag ingestion, Immuta will automatically sync those data sources to the catalog and apply tags to them. Immuta will automatically check the external catalog for changes and sync data sources to the catalog every 24 hours.
See the Configure a Databricks Unity Catalog integration page for guidance on configuring tag ingestion.
If Databricks Unity Catalog data sources existed before configuring tag ingestion, Immuta will automatically sync those data sources to the catalog and apply tags to them. Immuta will automatically check the external catalog for changes and sync data sources to the catalog every 24 hours.
You can manually link and remove external catalogs from data sources on the data source overview tab.
Navigate to your data source.
In the connection information section, click the Link Catalog icon (or Unlink Catalog to remove an external catalog from a data source).
Select your external catalog from the dropdown menu.
Click Link to confirm.
Navigate to your data source and click the data source Health dropdown menu.
Click Re-run in the External Catalog section.
Users who want to use tagging capabilities outside of Immuta and pull tags from external table schemas can connect Alation, Collibra, or Microsoft Purview as an external catalog. If users have an unsupported catalog, or have customized their integration, they can connect through the REST Catalog using the Immuta API. Users can also connect to and ingest tags from Snowflake and Databricks Unity Catalog onto Snowflake and Databricks Unity Catalog data sources.
Once they have been connected, Immuta will ingest a data dictionary from the catalog that will apply data source and column tags directly onto data sources. These tags can then be used to drive governance policies or classification frameworks. Using existing metadata from external catalogs can allow users to scale policy creation quickly.
Immuta supports the following external catalogs:
Alation
Collibra
Private preview
The Microsoft Purview catalog integration is only available to select accounts. Contact your Immuta representative to enable this feature.
The Microsoft Purview catalog integration with Immuta currently supports tag ingestion for Databricks Unity Catalog and Azure Synapse Analytics data sources.
In addition to tags, the following Purview objects will also be pulled in and applied to data sources as either column or data source tags in Immuta:
System classifications
Custom classifications
Managed attributes
Managed attributes are supported, but have the following limitations:
If a managed attribute is applied to an Immuta data source but later expires, it will still appear as a tag on the data source. Expired attributes must be removed from the object in Purview for the tag to be removed from the Immuta data source.
The following managed attribute data types are not supported and will not be applied to Immuta data sources as tags:
Dates
Number types
Rich text
If users have an unsupported catalog, or have customized their catalog integration, they can connect through the REST Catalog using the Immuta API.
For more details about using a custom REST catalog with Immuta, see the Custom REST Catalog Interface Introduction.
Design partner preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.
Users can connect their Databricks Unity Catalog account to allow Immuta to ingest Databricks tags and apply them to Databricks data sources. To learn more about Databricks Unity Catalog tag ingestion, see the Databricks Unity Catalog reference guide.
Users can connect a Snowflake account to allow Immuta to ingest Snowflake tags onto Snowflake data sources. To learn more about Snowflake tag ingestion, see the Snowflake reference guide.
Tags ingested from external catalogs cannot be edited within Immuta. To edit, delete, or add a tag from an external catalog to a data source or column, make the change in the external catalog.
You can configure multiple external catalogs within a single tenant of Immuta, but only one external catalog can be linked to a data source.
To configure an external catalog, see the Configuration how-to guide.
To learn more about how Immuta can automatically tag your data with Discover, see the Discover introduction.
The custom REST catalog integration allows Immuta to make a defined set of API calls to a Custom REST service you develop to retrieve metadata. The Custom REST service receives Immuta's calls, and then collects the relevant information and delivers it back to Immuta.
The diagram below highlights the main feature of Immuta's Custom REST Catalog integration.
Through a Custom REST Catalog, you can build and maintain your own solutions that provide metadata required to effectively use Immuta within your organization.
API Interface Specification Documentation: This page details the endpoints and data schemas of the API and contains example requests and responses.
The diagram below contrasts Immuta's provided catalog integration architecture with this Custom REST Catalog interface, which gives the customer tremendous control over the metadata being provided to Immuta.
The custom-developed service must be built to receive and handle calls to the REST endpoints specified below. Immuta will call these endpoints as detailed below when certain events occur and at various intervals. The required responses to complete the connection are also detailed.
Tags are attributes applied to data, either at the top (data source) level or at the individual column level.
Tags in Immuta take the form of a nested tree structure. There are "parents", "children", "grand-children", etc.:
The REST Catalog interface interprets a tag's relationship mapping from a string based on a standard "dot" (.) notation, like:
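As a minimal sketch of how dot notation maps onto the nested tree, the helper below splits each tag string on periods and builds a nested dictionary. The function name `build_tag_tree` and the tag names are hypothetical and chosen only for illustration; this is not Immuta's implementation.

```python
def build_tag_tree(tag_names):
    """Build a nested dict (parent -> children) from dot-delimited tag names."""
    tree = {}
    for name in tag_names:
        node = tree
        # Each segment between periods is one level of the hierarchy.
        for part in name.split("."):
            node = node.setdefault(part, {})
    return tree


# Hypothetical tag strings: one grandchild and one sibling under the same parent.
tree = build_tag_tree(["Parent.Child.Grandchild", "Parent.Sibling"])
print(tree)
```

Each key in the resulting dictionary is one tag level, and an empty dictionary marks a leaf tag.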
Tags returned must meet the following constraints:
They must be no longer than 500 characters. Longer tags will not throw an error but will be truncated silently at 500 characters.
They must be composed of letters, digits, underscores, dashes, and whitespace characters. A period (.) is used as a separator as described above. Other special characters are not supported.
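Because over-long tags are truncated silently rather than rejected, a custom service may want to check tags before returning them. The sketch below is one possible pre-flight check. It assumes "letters" means ASCII letters and that empty segments (for example, two consecutive periods) are invalid; neither assumption is confirmed by this page, and `is_valid_tag` is a hypothetical helper name.

```python
import re

MAX_TAG_LENGTH = 500

# One segment: letters, digits, underscores, dashes, and whitespace (ASCII assumed).
_SEGMENT = r"[A-Za-z0-9_\-\s]+"
# A full tag: one or more segments separated by single periods.
_TAG_PATTERN = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")


def is_valid_tag(name):
    """Return True if the tag fits the length limit and the allowed characters."""
    if len(name) > MAX_TAG_LENGTH:
        # Immuta would truncate this silently, so flag it here instead.
        return False
    return _TAG_PATTERN.fullmatch(name) is not None


print(is_valid_tag("PII.Social Security_Number-1"))
print(is_valid_tag("PII/SSN"))
```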
A tag object has a single id property, which is used to uniquely identify the tag within the catalog. This id may be of either a string or integer type, and its value is completely up to the designer of the REST Catalog service. Common examples include: a standard integer value, a UUID, or perhaps a hash of the tag's string value (if it is unique within the system).
For this Custom REST Catalog interface, tags are represented in JSON like:
For example, the object below specifies 3 different tags:
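As an illustration only, the snippet below builds an object with three tags. It assumes each tag name string maps to an object holding a single id property, which is the shape implied by the description of the /tags response on this page. The tag names and the `hashed_id` helper are hypothetical; the three ids show the integer, UUID, and hash options mentioned above.

```python
import hashlib
import json
import uuid


def hashed_id(tag_name):
    """Derive a short, stable id from the tag's string value (hypothetical scheme)."""
    return hashlib.sha256(tag_name.encode("utf-8")).hexdigest()[:16]


# Three hypothetical tags, each identified a different way.
tags = {
    "Sensitivity.PII": {"id": 17},
    "Sensitivity.PII.Email": {
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, "Sensitivity.PII.Email"))
    },
    "Department.Finance": {"id": hashed_id("Department.Finance")},
}

print(json.dumps(tags, indent=2))
```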
For more information on tags and how they are created, managed, and displayed within Immuta, see our tag documentation.
Descriptions are strings that, like tags, can be applied to either a data source or an individual column. These strings support UTF-8, including special and various language characters.
Immuta can make requests to your REST Catalog service using any of the following authentication methods:
Username and password: Immuta can send requests with a username and a password in the Authorization HTTP header. In this case, the custom REST service will need to be able to parse a Basic Authorization Header and validate the credentials sent with it.
PKI Certificate: Immuta can also send requests using a CA certificate, a certificate, and a key.
No authentication: Immuta can make unauthenticated requests to your REST Catalog service. However, this should only be used if you have other security measures in place (e.g., if the service is in an isolated network that's reachable only by your Immuta environment).
Authentication and specific endpoints
When accessing the /dataSource and /tags endpoints, Immuta will use the configured username and password. If you choose to also protect the human-readable pages with authentication, users will be prompted to authenticate when they first visit those pages.
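For the username and password method, the custom service has to decode a standard HTTP Basic Authorization header. The sketch below shows one way to do that with the Python standard library. The helper names `parse_basic_auth` and `credentials_match` are hypothetical, and a real service would look up credentials from its own store rather than take them as arguments.

```python
import base64
import hmac


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and bytes that are not valid UTF-8.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


def credentials_match(header_value, expected_user, expected_password):
    """Compare the header's credentials with the expected ones in constant time."""
    parsed = parse_basic_auth(header_value)
    if parsed is None:
        return False
    user, password = parsed
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), expected_password.encode("utf-8")
    )
    return user_ok and password_ok
```

Using `hmac.compare_digest` instead of `==` is a design choice that avoids leaking information through comparison timing.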
/tags
The /tags endpoint is used to collect ALL the tags the catalog can provide. It is used by Immuta to populate Immuta's tags list in the Governance section. These tags can then be used for policy creation ahead of actual data sources being created that make use of them. This enables policies to immediately apply when data sources are registered with Immuta.
As with all external catalogs, tags ingested by Immuta from the REST catalog interface are not able to be modified locally within Immuta as this catalog becomes the "source of truth" for them. This results in the tags showing in Immuta with either a lock icon next to them, or without the delete button that would allow a user to manually remove them from an assigned data source or column.
The /tags endpoint receives a simple GET request from Immuta. Neither a payload nor query parameters are required.
Example request:
The Custom REST service must respond with an object that maps all tag name strings to associated ids. The tag name string fully-qualifies the location of the tag in the tree structure as detailed previously, and the id is a globally unique identifier assigned by the REST catalog to that tag.
Example response:
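A minimal sketch of a service answering the /tags request is shown below as a plain WSGI application. The in-memory `CATALOG_TAGS` mapping and its contents are hypothetical stand-ins for a real catalog backend, and the response shape (tag name mapped to an object with an id) is an assumption based on the description above, not a copy of Immuta's published example.

```python
import json

# Hypothetical in-memory catalog: fully qualified tag name -> unique id.
CATALOG_TAGS = {"PII": 1, "PII.Email": 2, "Department.Finance": 3}


def app(environ, start_response):
    """Minimal WSGI app that serves GET /tags and rejects everything else."""
    method = environ.get("REQUEST_METHOD")
    path = environ.get("PATH_INFO")
    if method == "GET" and path == "/tags":
        payload = {name: {"id": tag_id} for name, tag_id in CATALOG_TAGS.items()}
        body = json.dumps(payload).encode("utf-8")
        start_response(
            "200 OK",
            [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
        )
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, the app could be served with `wsgiref.simple_server.make_server("", 8080, app).serve_forever()`; a production service would sit behind a proper WSGI server and add the authentication described earlier.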
/dataSource
The /dataSource endpoint does the vast majority of the work. It receives a POST request from Immuta, and returns the mapping of a data source and its columns to the applied tags and descriptions.
Immuta will try to fetch metadata for a data source in the system at various times:
During data source creation. Immuta will send metadata to the REST Catalog service, most notably the connection details of the data source, which include the schema and table name. The Custom REST service must be able to parse this information and search its records for an appropriate record to return with an ID unique to this data source in its catalogMetadata object.
When a user manually links the data source. Data sources that either fail to auto-link, or that were created prior to the Custom REST catalog being configured, can still be manually linked. To do so, a data source owner can provide the ID of the asset as defined by the Custom REST Catalog via the Immuta UI. In order for this to work, the Custom REST Catalog service must support matching data source assets by unique ID.
During various refreshes. Once linked, Immuta will periodically call the /dataSource endpoint to ensure information is up to date.
Immuta's POST requests to the /dataSource endpoint will consist of a payload containing many of the elements outlined below:
catalogMetadata (dictionary): Object holding the data source's catalog metadata.
catalogMetadata.id (string or integer): The unique identifier of the data source in the catalog.
catalogMetadata.name (string): The name of the data source in the catalog.
handlerInfo (dictionary): Object holding the data source's connection details.
handlerInfo.schema (string): The data source’s schema name in the source system.
handlerInfo.table (string): The data source’s table name in the source system.
handlerInfo.hostname (string): The data source’s connection hostname in the source storage system.
handlerInfo.port (integer): The data source’s connection port in the source storage system.
handlerInfo.query (string): The data source’s query in the source storage system, if applicable.
dataSource (dictionary): Object holding general data source information from Immuta. This can be viewed with debugging, but is not usually required for catalog purposes.
This object must be parsed by the Custom REST Catalog in order to determine the specific data source metadata being requested.
For the most part, Immuta will provide the id of the data source as part of the catalogMetadata. This should be used as the primary metadata lookup value.
When a data source is being created, such an id will not yet be known to Immuta. Immuta will instead send handlerInfo information as part of the request.
When an id is not specified, the schema and table name elements should be parsed in an attempt to identify the desired catalog entry and provide an appropriate id. If such a lookup is successful and an id is returned to Immuta in the catalogMetadata section, Immuta will establish an automatic link between the new data source and the catalog entry, and future references will use that id.
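The lookup order described above can be sketched as follows: prefer catalogMetadata.id when it is present, and fall back to handlerInfo.schema and handlerInfo.table otherwise. The `CATALOG` records, the `resolve_catalog_id` name, and the case-insensitive matching are all assumptions made for illustration; a real service would query its own metadata store and apply its own matching rules.

```python
# Hypothetical catalog records keyed by the catalog's own unique id.
CATALOG = {
    "asset-1": {"name": "Customers", "schema": "sales", "table": "customers"},
    "asset-2": {"name": "Orders", "schema": "sales", "table": "orders"},
}


def resolve_catalog_id(payload):
    """Return the catalog id for a /dataSource request payload, or None."""
    metadata = payload.get("catalogMetadata") or {}
    catalog_id = metadata.get("id")
    if catalog_id is not None:
        # Already linked (or manually linked): the id is the primary lookup value.
        return catalog_id if catalog_id in CATALOG else None

    # New data source: no id yet, so match on schema and table name instead.
    handler = payload.get("handlerInfo") or {}
    schema = handler.get("schema")
    table = handler.get("table")
    if not schema or not table:
        return None
    for candidate_id, record in CATALOG.items():
        # Case-insensitive comparison is an assumption, not a documented rule.
        if (
            record["schema"].lower() == schema.lower()
            and record["table"].lower() == table.lower()
        ):
            return candidate_id
    return None
```

Returning None when nothing matches leaves the data source unlinked, so an owner can still link it manually by id later.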
The schema for the /dataSource response uses the same tag object structure from the /tags response, along with the following set of metadata keys for both data sources and columns.
catalogMetadata (dictionary): Object holding the data source's catalog metadata.
catalogMetadata.id (string or integer): The unique identifier of the data source in the catalog.
catalogMetadata.name (string): The name of the data source in the catalog.
description (string): A description of the data source.
tags (<tags object>): Object containing the data source-level tags.
dictionary (dictionary): Object containing the column names of the data source as its keys.
dictionary.<column> (dictionary): Object containing a single column's metadata.
dictionary.<column>.catalogMetadata.id (string or integer): The unique identifier of the column in the catalog.
dictionary.<column>.description (string): A description of the column.
dictionary.<column>.tags (<tags object>): Object containing the column-level tags as keys.
Example response:
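To make the response keys above concrete, the sketch below assembles a /dataSource response from a catalog record. The input record layout, the `build_data_source_response` name, and all sample values are hypothetical; only the output keys (catalogMetadata, description, tags, dictionary) come from the schema listed on this page, and the tag object shape is assumed to match the /tags response.

```python
import json


def _tags_object(tag_ids):
    """Convert {tag name: id} into the tag object form {tag name: {"id": id}}."""
    return {name: {"id": tag_id} for name, tag_id in tag_ids.items()}


def build_data_source_response(record):
    """Assemble the /dataSource response body from a hypothetical catalog record."""
    return {
        "catalogMetadata": {"id": record["id"], "name": record["name"]},
        "description": record.get("description", ""),
        "tags": _tags_object(record.get("tags", {})),
        "dictionary": {
            column["name"]: {
                "catalogMetadata": {"id": column["id"]},
                "description": column.get("description", ""),
                "tags": _tags_object(column.get("tags", {})),
            }
            for column in record.get("columns", [])
        },
    }


# Hypothetical record for a single table with one tagged column.
record = {
    "id": "asset-1",
    "name": "Customers",
    "description": "Customer master data",
    "tags": {"Department.Sales": 3},
    "columns": [
        {"id": "col-9", "name": "email", "description": "Contact email", "tags": {"PII.Email": 2}},
    ],
}

print(json.dumps(build_data_source_response(record), indent=2))
```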
/dataSource/page/{id}
This endpoint returns a human-readable information page from the REST catalog for the data source associated with {id}. Immuta provides this as a mechanism for allowing the REST catalog to provide additional information about the data source that may not be directly ingested by or visible within Immuta. This link is accessed in the Immuta UI when a user clicks the catalog logo associated with the data source.
Immuta will send a GET request to the /dataSource/page/{id} endpoint, where {id} will be:
id (URL parameter, integer or string): The unique identifier of the data source in the remote catalog system.
Example request:
The Custom REST Catalog can either provide such a page directly or redirect the user to any resource where the appropriate page is provided. For example, it can redirect to a backing full-service catalog such as Collibra if this Custom REST catalog is simply being used to support a custom data model.
Example response:
/column/{id}
This endpoint returns the catalog's human-readable information page for the column associated with {id}. Immuta provides this as a mechanism for allowing the REST catalog to provide additional information about the specific column that may not be directly ingested by or visible within Immuta.
Immuta will send a GET request to the /column/{id} endpoint, where {id} will be:
id (URL parameter, integer or string): The unique identifier of the column in the remote catalog system.
Example request:
The Custom REST Catalog can either provide such a page directly or redirect the user to any resource where the appropriate page is provided. For example, it can redirect to a backing full-service catalog such as Collibra if this Custom REST catalog is simply being used to support a custom data model.
Example response:
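For the redirect option, a service could map both human-readable page endpoints onto URLs in a backing catalog. The sketch below shows one possible routing function. The base URL `https://catalog.example.com`, the target path templates, and the `redirect_for` name are hypothetical, and the function assumes the incoming path has already been percent-decoded (as WSGI's PATH_INFO is).

```python
import re
from urllib.parse import quote

# Hypothetical backing catalog that hosts the human-readable pages.
BACKING_CATALOG_URL = "https://catalog.example.com"

# Endpoint pattern -> hypothetical page path template in the backing catalog.
_ROUTES = [
    (re.compile(r"^/dataSource/page/(?P<id>[^/]+)$"), "/assets/{id}"),
    (re.compile(r"^/column/(?P<id>[^/]+)$"), "/columns/{id}"),
]


def redirect_for(path):
    """Return (status, headers) for a GET request to a page endpoint."""
    for pattern, template in _ROUTES:
        match = pattern.match(path)
        if match:
            # Re-encode the id so it is safe to embed in the target URL.
            safe_id = quote(match.group("id"), safe="")
            target = BACKING_CATALOG_URL + template.format(id=safe_id)
            return "302 Found", [("Location", target)]
    return "404 Not Found", []


print(redirect_for("/dataSource/page/42"))
print(redirect_for("/column/abc-1"))
```

A service that renders its own pages would return HTML with a 200 status from the same routes instead of a redirect.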