Create a Data Source


For a complete list of supported databases, see the Immuta Support Matrix.

This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

Redshift data sources

  • Redshift Spectrum data sources must be registered via the Immuta CLI or V2 API using this payload.

  • Registering Redshift datashares as Immuta data sources is unsupported.

Requirements

  • CREATE_DATA_SOURCE Immuta permission

  • The Snowflake user registering data sources must have the following privileges on all securables:

    • USAGE on all databases and schemas with registered data sources.

    • REFERENCES on all tables and views registered in Immuta.


Snowflake imported databases

Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.

  • Databricks Spark integration requirements: Ensure that at least one of the traits below is true.

    • The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).

    • The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.

    • The user exposing the tables is a Databricks workspace administrator.

  • Databricks Unity Catalog integration requirements: When registering Databricks Unity Catalog securables in Immuta, use the service principal from the integration configuration and ensure it has the privileges listed below. Immuta uses this service principal continuously to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks.

    • USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources.

    • USE SCHEMA on all schemas containing securables registered as Immuta data sources.

    • MODIFY and SELECT on all securables you want registered as Immuta data sources.

MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.
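
For illustration only, the grants above can be applied to the integration's service principal with Databricks SQL. A minimal sketch, assuming a Databricks notebook (where spark is predefined), a user allowed to grant these privileges, and placeholder catalog, schema, and application ID values:

    # Hypothetical names throughout: catalog "analytics", schema "sales", and a
    # placeholder service principal application ID.
    principal = "`0f1e2d3c-4b5a-6789-abcd-ef0123456789`"

    # MANAGE must be granted directly on the parent catalog; USE CATALOG is also required.
    spark.sql(f"GRANT USE CATALOG, MANAGE ON CATALOG analytics TO {principal}")

    # USE SCHEMA on every schema containing registered securables.
    spark.sql(f"GRANT USE SCHEMA ON SCHEMA analytics.sales TO {principal}")

    # SELECT and MODIFY granted at the schema level are inherited by all current
    # and future tables and views in that schema.
    spark.sql(f"GRANT SELECT, MODIFY ON SCHEMA analytics.sales TO {principal}")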

Enter connection information

Best Practice: Connections Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.

  1. Navigate to the My Data Sources page.

  2. Click the New Data Source button in the top right corner.

  3. Select the data platform containing the data you wish to expose by clicking a tile.

  4. Input the connection parameters to the database you're exposing. Guidance for select data platforms is outlined below.

For Amazon S3 data sources, see the Create an Amazon S3 data source guide for instructions. For Google BigQuery data sources, see the Create a Google BigQuery data source guide for instructions.

Required Google BigQuery roles for creating data sources

Ensure that the user creating the Google BigQuery data source has these roles:

  • roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset

  • roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset

  • roles/bigquery.jobUser on the project
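
As a quick, optional check that these roles are in place, you can exercise them with the google-cloud-bigquery Python client; the project, dataset, and table names below are placeholders:

    from google.cloud import bigquery

    # Placeholder project, dataset, and table names.
    client = bigquery.Client(project="my-project")  # running query jobs requires roles/bigquery.jobUser

    # Reading table metadata exercises roles/bigquery.metadataViewer.
    table = client.get_table("my-project.my_dataset.my_table")
    print([field.name for field in table.schema])

    # Reading data exercises roles/bigquery.dataViewer (or higher).
    rows = client.query("SELECT COUNT(*) FROM `my-project.my_dataset.my_table`").result()
    print(list(rows))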

Azure Databricks Unity Catalog limitation

Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the Azure Databricks Unity Catalog limitation for details.

For Databricks data sources, complete the following steps:

  1. Complete the first four fields in the Connection Information box:

    • Server: hostname or IP address

    • Port: port configured for Databricks, typically port 443

    • SSL: when enabled, ensures communication between Immuta and the remote database is encrypted

    • Database: the remote database

  2. Select your authentication method from the dropdown:

    • Access Token:

      1. Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.

      2. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

    • OAuth machine-to-machine (M2M):

      1. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

      2. Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.

      3. Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the same as the service principal's application ID.

      4. Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.

      5. Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes. (A sketch of this token exchange follows the connection steps below.)

  3. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

  4. If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:

    UseProxy=1;ProxyHost=my.host.com;ProxyPort=6789
  5. Click the Test Connection button.
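
The OAuth machine-to-machine fields above are used in a standard client-credentials exchange against the Token Endpoint. A minimal sketch with the Python requests library, using placeholder workspace and credential values (the appropriate scope value depends on your Databricks configuration):

    import requests

    # Placeholder values; Immuta performs an equivalent exchange with the fields you entered.
    token_endpoint = "https://my-workspace.cloud.databricks.com/oidc/v1/token"
    client_id = "0f1e2d3c-4b5a-6789-abcd-ef0123456789"  # service principal application ID
    client_secret = "doseXXXXXXXX"                      # OAuth secret generated for the service principal

    response = requests.post(
        token_endpoint,
        auth=(client_id, client_secret),  # HTTP basic auth with the client credentials
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    response.raise_for_status()
    access_token = response.json()["access_token"]  # short-lived token used for subsequent requests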

Further Considerations

  • Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.

  • Some data platforms require different connection information than pictured in this section. Refer to the tool-tips in the Immuta UI for this step if you need additional guidance.

  • If you are creating an Impala data source against a Kerberized instance of Impala, the username field locks down to your Immuta username unless you possess the IMPERSONATE_HDFS_USER permission.

  • If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.

Select virtual population

  1. Decide how to virtually populate the data source by selecting Create sources for all tables in this database and monitor for changes or Schema/Table.

  2. Complete the workflow for Create sources for all tables in this database and monitor for changes or Schema/Table selection, both of which are outlined below:

Create sources for all tables in this database and monitor for changes

Selecting this option will create and keep in sync all data sources within this database. New schemas will be automatically detected and the corresponding data sources and schema projects will be created.

Schema/Table

Selecting this option will create and keep in sync all tables within the schema(s) selected. No new schemas will be detected.

  1. If you choose Schema/Table, click Edit in the table selection box that appears.

  2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.

  3. After making your selection(s), click Apply.

Enter basic information

Provide information about your source to make it discoverable to users.

  1. Enter the SQL Schema Name Format to be the SQL name that the data source exists under in the Immuta Query Engine. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

  2. Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.

    1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.

    2. When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.

  3. Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

<Tablename>

The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

<Schema><Tablename>

The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

Custom

Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

  4. Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

Enable or disable schema monitoring

Note: This step will only appear if all tables within a server have been selected for creation.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Create a schema detection job in Databricks

In most cases, Immuta's schema detection job runs automatically from the Immuta web service. For Databricks, that automatic job is disabled because of the ephemeral nature of Databricks clusters. In this case, Immuta requires users to download a schema detection job template (a Python script) and import that into their Databricks workspace.

Generate Your Immuta API Key

Before you can run the script referenced in this tutorial, generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must belong to either an Immuta Admin or the user who owns the schema detection groups that are being targeted.

  1. Enable Schema Monitoring or Detect Column Changes on the Data Source creation page.

  2. Click Download Schema Job Detection Template.

  3. Click the Click Here To Download text.

  4. Before you can run the script, follow the Databricks documentation to create the scope and secret using the Immuta API Key generated on your user profile page.

  5. Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented-out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, the template uses the Python requests library, which has a simple mechanism for configuring proxies for a request (see the sketch below).

  6. Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
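
A minimal sketch of the notebook-side setup the steps above refer to: reading the Immuta API key from the Databricks secret scope and building the proxies mapping the requests library expects. The scope name, key name, and proxy address are assumptions; the downloaded template performs the actual Immuta API call:

    import requests

    # dbutils is predefined in Databricks notebooks. The scope and key names are
    # assumptions; use the ones you created in the step above.
    immuta_api_key = dbutils.secrets.get(scope="immuta", key="immuta_api_key")

    # The requests library accepts a plain dict of proxies, per request or per session.
    proxies = {"https": "http://my.host.com:6789"}

    session = requests.Session()
    session.proxies.update(proxies)
    # The schema detection template uses the API key (and, if needed, these proxy
    # settings) when it calls Immuta to trigger schema detection.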

Create the data source

Opt to configure settings in the Advanced Options section (outlined below), and then click Create to save the data source(s).

Advanced options

None of the following options are required. However, completing these steps will help maximize the utility of your data source.

Column Detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section. See Schema Projects Overview to learn more about Column Detection.

Event Time

An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

  1. Click the Edit button in the Event Time section.

  2. Select the column(s).

  3. Click Apply.

Selecting an Event Time column will enable

  • more statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.

  • the creation of time-based restrictions in the Policy Builder.

Latency

  1. Click Edit in the Latency section.

  2. Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.

  3. Click Apply.

This setting impacts the following behaviors:

  • How long Immuta waits to refresh data that is in cache by querying the remote data source. For example, if you only load data once a day in the remote source, this setting should be greater than 24 hours. If data is constantly loaded in the remote source, you need to decide how much data latency is tolerable versus how much load you want on your data source; however, this is only relevant to Immuta S3, since SQL will always interactively query the remote database.

  • How often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.

Sensitive Data Discovery

Data Owners can disable Sensitive Data Discovery for their data sources in this section.

  1. Click Edit in this section.

  2. Select Enabled or Disabled in the window that appears, and then click Apply.

Data Source Tags

Adding tags to your data source allows users to search for the data source using the tags and Governors to apply Global policies to the data source. Note that if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.

To add tags,

  1. Click the Edit button in the Data Source Tags section.

  2. Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the Data Source details page on the Overview tab or the Data Dictionary tab.