LogoLogo
2025.1Book a demo
  • Immuta Documentation - 2025.1
  • Configuration
    • Deploy Immuta
      • Requirements
      • Install
        • Managed Public Cloud
        • Red Hat OpenShift
      • Upgrade
        • Migrating to the New Helm Chart
        • Upgrading IEHC
      • Guides
        • Ingress Configuration
        • TLS Configuration
        • Cosign Verification
        • Production Best Practices
        • Rotating Credentials
        • External Cache Configuration
        • Enabling Legacy Query Engine
        • Private Container Registries
        • Air-Gapped Environments
      • Disaster Recovery
      • Troubleshooting
      • Conventions
    • Connect Data Platforms
      • Data Platforms Overview
      • Amazon S3
      • AWS Lake Formation
        • Register an AWS Lake Formation Connection
        • AWS Lake Formation Reference Guide
      • Azure Synapse Analytics
        • Getting Started with Azure Synapse Analytics
        • Configure Azure Synapse Analytics Integration
        • Reference Guides
          • Azure Synapse Analytics Integration
          • Azure Synapse Analytics Pre-Configuration Details
      • Databricks
        • Databricks Spark
          • Getting Started with Databricks Spark
          • How-to Guides
            • Configure a Databricks Spark Integration
            • Manually Update Your Databricks Cluster
            • Install a Trusted Library
            • Project UDFs Cache Settings
            • Run R and Scala spark-submit Jobs on Databricks
            • DBFS Access
            • Troubleshooting
          • Reference Guides
            • Databricks Spark Integration Configuration
              • Installation and Compliance
              • Customizing the Integration
              • Setting Up Users
              • Spark Environment Variables
              • Ephemeral Overrides
            • Security and Compliance
            • Registering and Protecting Data
            • Accessing Data
              • Delta Lake API
        • Databricks Unity Catalog
          • Getting Started with Databricks Unity Catalog
          • How-to Guides
            • Register a Databricks Unity Catalog Connection
            • Configure a Databricks Unity Catalog Integration
            • Migrate to Unity Catalog
          • Databricks Unity Catalog Integration Reference Guide
      • Google BigQuery
      • Redshift
        • Getting Started with Redshift
        • How-to Guides
          • Configure Redshift Integration
          • Configure Redshift Spectrum
        • Reference Guides
          • Redshift Integration
          • Redshift Pre-Configuration Details
      • Snowflake
        • Getting Started with Snowflake
        • How-to Guides
          • Register a Snowflake Connection
          • Configure a Snowflake Integration
          • Snowflake Table Grants Migration
          • Edit or Remove Your Snowflake Integration
          • Integration Settings
            • Enable Snowflake Table Grants
            • Use Snowflake Data Sharing with Immuta
            • Configure Snowflake Lineage Tag Propagation
            • Enable Snowflake Low Row Access Policy Mode
              • Upgrade Snowflake Low Row Access Policy Mode
        • Reference Guides
          • Snowflake Integration
          • Snowflake Data Sharing
          • Snowflake Lineage Tag Propagation
          • Snowflake Low Row Access Policy Mode
          • Snowflake Table Grants
          • Warehouse Sizing Recommendations
        • Explanatory Guides
          • Phased Snowflake Onboarding
      • Starburst (Trino)
        • Getting Started with Starburst (Trino)
        • How-to Guides
          • Configure Starburst (Trino) Integration
          • Customize Read and Write Access Policies for Starburst (Trino)
        • Starburst (Trino) Integration Reference Guide
      • Queries Immuta Runs in Remote Platforms
      • Legacy Integrations
        • Securing Hive and Impala Without Sentry
        • Enabling ImmutaGroupsMapping
      • Connect Your Data
        • Connections
          • How-to Guides
            • Run Object Sync
            • Manage Connection Settings
            • Use the Connection Upgrade Manager
              • Troubleshooting
          • Reference Guides
            • Connections Reference Guide
            • Upgrading to Connections
              • Before You Begin
              • API Changes
              • FAQ
        • Data Sources
          • Data Sources in Immuta
          • Register Data Sources
            • Amazon S3 Data Source
            • Azure Synapse Analytics Data Source
            • Databricks Data Source
            • Google BigQuery Data Source
            • Redshift Data Source
            • Snowflake Data Source
              • Bulk Create Snowflake Data Sources
            • Starburst (Trino) Data Source
          • Data Source Settings
            • How-to Guides
              • Manage Data Sources and Data Source Settings
              • Manage Data Source Members
              • Manage Access Requests and Tasks
              • Manage Data Dictionary Descriptions
              • Disable Immuta from Sampling Raw Data
            • Data Source Health Checks Reference Guide
          • Schema Monitoring
            • How-to Guides
              • Run Schema Monitoring and Column Detection Jobs
              • Manage Schema Monitoring
            • Reference Guides
              • Schema Monitoring
              • Schema Projects
            • Why Use Schema Monitoring?
    • Manage Data Metadata
      • Connect External Catalogs
        • Getting Started with External Catalogs
        • Configure an External Catalog
        • Reference Guides
          • External Catalogs
          • Custom REST Catalogs
            • Custom REST Catalog Interface Endpoints
      • Data Identification
        • Introduction
        • Getting Started with Data Identification
        • How-to Guides
          • Use Identification
          • Manage Identifiers
          • Run and Manage Identification
          • Manage Identification Frameworks
          • Use Sensitive Data Discovery (SDD)
        • Reference Guides
          • How Competitive Criteria Analysis Works
          • Built-in Identifier Reference
            • Built-In Identifier Changelog
          • Built-in Discovered Tags Reference
      • Data Classification
        • How-to Guides
          • Activate Classification Frameworks
          • Adjust Identification and Classification Framework Tags
          • How to Use a Built-In Classification Framework with Your Own Tags
        • Classification Frameworks Reference Guide
      • Manage Tags
        • How-to Guides
          • Create and Manage Tags
          • Add Tags to Data Sources and Projects
        • Tags Reference Guide
    • Manage Users
      • Getting Started with Users
      • Identity Managers (IAMs)
        • How-to Guides
          • Okta LDAP Interface
          • OpenID Connect
            • OpenID Connect Protocol
            • Okta and OpenID Connect
            • OneLogin with OpenID Connect
          • SAML
            • SAML Protocol
            • Microsoft Entra ID
            • Okta SAML SCIM
        • Reference Guides
          • Identity Managers
          • SAML Single Logout
          • SAML Protocol Configuration Options
      • Immuta Users
        • How-to Guides
          • Managing Personas and Permissions
          • Manage Attributes and Groups
          • User Impersonation
          • External User ID Mapping
          • External User Info Endpoint
        • Reference Guides
          • Attributes and Groups in Immuta
          • Permissions and Personas
    • Organize Data into Domains
      • Getting Started with Domains
      • Domains Reference Guide
    • Application Settings
      • How-to Guides
        • App Settings
        • BI Tools
          • BI Tool Configuration Recommendations
          • Power BI Configuration Example
          • Tableau Configuration Example
        • Add a License Key
        • Add ODBC Drivers
        • Manage Encryption Keys
        • System Status Bundle
      • Reference Guides
        • Data Processing, Encryption, and Masking Practices
        • Metadata Ingestion
  • Governance
    • Introduction
      • Automate Data Access Control Decisions
        • The Two Paths: Orchestrated RBAC and ABAC
        • Managing User Metadata
        • Managing Data Metadata
        • Author Policy
        • Test and Deploy Policy
      • Compliantly Open More Sensitive Data for ML and Analytics
        • Managing User Metadata
        • Managing Data Metadata
        • Author Policy
    • Author Policies for Data Access Control
      • Introduction
        • Scalability and Evolvability
        • Understandability
        • Distributed Stewardship
        • Consistency
        • Availability of Data
      • Policies
        • Authoring Policies at Scale
        • Data Engineering with Limited Policy Downtime
        • Subscription Policies
          • How-to Guides
            • Author a Subscription Policy
            • Author an ABAC Subscription Policy
            • Subscription Policies Advanced DSL Guide
            • Author a Restricted Subscription Policy
            • Clone, Activate, or Stage a Global Policy
          • Reference Guides
            • Subscription Policies
            • Subscription Policy Access Types
            • Advanced Use of Special Functions
        • Data Policies
          • Overview
          • How-to Guides
            • Author a Masking Data Policy
            • Author a Minimization Policy
            • Author a Purpose-Based Restriction Policy
            • Author a Restricted Data Policy
            • Author a Row-Level Policy
            • Author a Time-Based Restriction Policy
            • Policy Certifications and Diffs
          • Reference Guides
            • Data Policy Types
            • Masking Policies
            • Row-Level Policies
            • Custom WHERE Clause Functions
            • Data Policy Conflicts and Fallback
            • Custom Data Policy Certifications
            • Orchestrated Masking Policies
      • Projects and Purpose-Based Access Control
        • Projects and Purpose Controls
          • Getting Started
          • How-to Guides
            • Create a Project
            • Create and Manage Purposes
            • Project Management
              • Manage Projects and Project Settings
              • Manage Project Data Sources
              • Manage Project Members
          • Reference Guides
            • Projects and Purposes
          • Why Use Purposes?
        • Equalized Access
          • Manage Project Equalization
          • Project Equalization Reference Guide
          • Why Use Project Equalization?
        • Masked Joins
          • Enable Masked Joins
          • Why Use Masked Joins?
        • Writing to Projects
          • How-to Guides
            • Create and Manage Snowflake Project Workspaces
            • Create and Manage Databricks Spark Project Workspaces
            • Write Data to the Workspace
          • Reference Guides
            • Project Workspaces
            • Project UDFs (Databricks)
    • Observe Access and Activity
      • Introduction
      • Audit
        • How-to Guides
          • Export Audit Logs to S3
          • Export Audit Logs to ADLS
          • Run Governance Reports
        • Reference Guides
          • Universal Audit Model (UAM)
            • UAM Schema
          • Query Audit Logs
            • Snowflake Query Audit Logs
            • Databricks Unity Catalog Query Audit Logs
            • Databricks Spark Query Audit Logs
            • Starburst (Trino) Query Audit Logs
          • Audit Export GraphQL Reference Guide
          • Governance Report Types
          • Unknown Users in Audit Logs
      • Dashboards
        • Use the Audit Dashboards How-To Guide
        • Audit Dashboards Reference Guide
      • Monitors
        • Manage Monitors and Observations
        • Monitors Reference Guide
    • Access Data
      • Subscribe to a Data Source
      • Query Data
        • Querying Snowflake Data
        • Querying Databricks Data
        • Querying Databricks SQL Data
        • Querying Starburst (Trino) Data
        • Querying Redshift Data
        • Querying Azure Synapse Analytics Data
        • Connect to a Database Tool to Run Ad Hoc Queries
      • Subscribe to Projects
  • Releases
    • Release Notes
      • Immuta v2025.1 Release Notes
        • User Interface Changes in v2025.1 LTS
      • Immuta LTS Changelog
      • Immuta Image Digests
      • Immuta CLI Release Notes
    • Immuta Release Lifecycle
    • Immuta Support Matrix Overview
    • Preview Features
      • Features in Preview
    • Deprecations and EOL
  • Developer Guides
    • The Immuta CLI
      • Install and Configure the Immuta CLI
      • Manage Your Immuta Tenant
      • Manage Data Sources
      • Manage Sensitive Data Discovery
        • Manage Sensitive Data Discovery Rules
        • Manage Identification Frameworks
        • Run Sensitive Data Discovery on Data Sources
      • Manage Policies
      • Manage Projects
      • Manage Purposes
      • Manage Audit
    • The Immuta API
      • Integrations API
        • Getting Started
        • How-to Guides
          • Configure an Amazon S3 Integration
          • Configure an Azure Synapse Analytics Integration
          • Configure a Databricks Unity Catalog Integration
          • Configure a Google BigQuery Integration
          • Configure a Redshift Integration
          • Configure a Snowflake Integration
          • Configure a Starburst (Trino) Integration
        • Reference Guides
          • Integrations API Endpoints
          • Integration Configuration Payload
          • Response Schema
          • HTTP Status Codes and Error Messages
      • Connections API
        • How-to Guides
          • Register a Connection
            • Register a Snowflake Connection
            • Register a Databricks Unity Catalog Connection
            • Register an AWS Lake Formation Connection
          • Manage a Connection
          • Deregister a Connection
        • Connection Registration Payloads Reference Guide
      • Immuta V2 API
        • Data Source Payload Attribute Details
        • Data Source Request Payload Examples
        • Create Policies API Examples
        • Create Projects API Examples
        • Create Purposes API Examples
      • Immuta V1 API
        • Authenticate with the API
        • Configure Your Instance of Immuta
          • Get Job Status
          • Manage Frameworks
          • Manage IAMs
          • Manage Licenses
          • Manage Notifications
          • Manage Tags
          • Manage Webhooks
          • Search Filters
          • Manage Identification
            • Identification Frameworks to Identifiers in Domains
            • Manage Sensitive Data Discovery (SDD)
        • Connect Your Data
          • Create and Manage an Amazon S3 Data Source
          • Create an Azure Synapse Analytics Data Source
          • Create an Azure Blob Storage Data Source
          • Create a Databricks Data Source
          • Create a Presto Data Source
          • Create a Redshift Data Source
          • Create a Snowflake Data Source
          • Create a Starburst (Trino) Data Source
          • Manage the Data Dictionary
        • Use Domains
        • Manage Data Access
          • Manage Access Requests
          • Manage Data and Subscription Policies
          • Manage Write Policies
            • Write Policies Payloads and Response Schema Reference Guide
          • Policy Handler Objects
          • Search Connection Strings
          • Search for Organizations
          • Search Schemas
        • Subscribe to and Manage Data Sources
        • Manage Projects and Purposes
          • Manage Projects
          • Manage Purposes
        • Generate Governance Reports
Powered by GitBook

Other versions

  • SaaS
  • 2025.1
  • 2024.3
  • 2024.2

Copyright © 2014-2024 Immuta Inc. All rights reserved.

On this page
  • Requirements
  • Enter connection information
  • Select virtual population
  • Enter basic information
  • Enable or disable schema monitoring
  • Opt to configure advanced settings
  • Column detection
  • Event time
  • Latency
  • Sensitive data discovery
  • Data source tags
  • Create the data source

Was this helpful?

Export as PDF
  1. Configuration
  2. Connect Data Platforms
  3. Connect Your Data
  4. Data Sources
  5. Register Data Sources

Databricks Data Source

Last updated 1 day ago

Was this helpful?

Deprecation notice

Support for registering Databricks Unity Catalog data sources using this legacy workflow has been deprecated. Instead, register your data using .

Requirements

Databricks Spark integration

When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:

  • The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).

  • The user exposing the tables is listed in the immuta.spark.acl.allowlist configuration on the target cluster.

  • The user exposing the tables is a Databricks workspace administrator.

Databricks Unity Catalog integration

When registering Databricks Unity Catalog securables in Immuta, use and ensure it has the privileges listed below. Immuta uses this service principal continuously to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks.

  • USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources.

  • USE SCHEMA on all schemas containing securables registered as Immuta data sources.

  • MODIFY and SELECT on all securables you want registered as Immuta data sources.

MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.

Azure Databricks Unity Catalog limitation

Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the for details.

Enter connection information

Performance recommendations

  • Use a Databricks administrator account to register data sources with Immuta using the UI or API; however, you should not test Immuta policies using a Databricks administrator account, as they are able to bypass controls.

  1. Navigate to the Data Sources list page and click Register Data Source.

  2. Select the Databricks tile in the Data Platform section. When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:

    • The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).

    • The user exposing the tables is listed in the `immuta.spark.acl.allowlist` configuration on the target cluster.

    • The user exposing the tables is a Databricks workspace administrator.

  3. Complete the first four fields in the Connection Information box:

    • Server: hostname or IP address

    • Port: port configured for Databricks, typically port 443

    • SSL: when enabled, ensures communication between Immuta and the remote database is encrypted. Immuta recommends that all connections use SSL. Additional connection string arguments may also be provided below. Only Immuta uses the connection you provide and injects all policy controls when users query the system. Users always connect through Immuta with policies enforced and have no direct association with this connection.

    • Database: the remote database

  4. Select your authentication method from the dropdown:

    • Access Token:

      1. Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.

      2. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

    • OAuth machine-to-machine (M2M):

      1. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

      2. Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.

      3. Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.

  5. If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:

    UseProxy=1;ProxyHost=my.host.com;ProxyPort=6789
  6. Click Test Connection.

Further considerations

  • Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.

  • If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

  • Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.

  • Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.

    1. Opt to Edit in the table selection box that appears.

    2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.

    3. After making your selection(s), click Apply.

Enter basic information

  1. Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro but you may personalize it using lowercase letters, numbers, and underscores to personalize the format. It may have up to 255 characters.

  2. Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.

    1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.

    2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.

  3. Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

    • <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

    • <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

    • Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

  4. Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

Enable or disable schema monitoring

Note: This step will only appear if all tables within a server have been selected for creation.

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

  • Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.

  1. Generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta admin or the user who owns the schema detection groups that are being targeted.

  2. On the data source creation page, click the checkbox to enable Schema Monitoring or Detect Column Changes.

  3. Click Download Schema Job Detection Template and then the Click Here To Download text.

  4. Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

Event time

An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

  1. Click the Edit button in the Event Time section.

  2. Select the column(s).

  3. Click Apply.

Selecting an Event Time column will enable

  • more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.

  • the creation of time-based restrictions in the policy builder.

Latency

  1. Click Edit in the Latency section.

  2. Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.

  3. Click Apply.

This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

  1. Click Edit in this section.

  2. Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

Adding tags to your data source allows users to search for the data source using the tags and Governors to apply Global policies to the data source. Note if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.

To add tags,

  1. Click the Edit button in the Data Source Tags section.

  2. Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Register entire databases with Immuta and run jobs through the Python script provided during data source registration.

Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the same as the .

Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the for details about scopes.

Consider using to either run the schema monitoring job when your ETL process adds new tables or to add new tables.

Activate the to protect potentially sensitive data. This policy will null the new columns until a data owner reviews new columns that have been added, protecting your data to avoid data leaks on new columns getting added without being reviewed first.

Before you can run the script, follow the to create the scope and secret using the Immuta API Key generated on your user profile page.

Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, the template uses the , which has a simple mechanism for configuring proxies for a request.

See the page to learn more about column detection.

schema monitoring
service principal's application ID
OAuth 2.0 documentation
Immuta’s API
Databricks documentation
Python requests library
connections
new column added templated global policy
the service principal from the integration configuration
Schema projects overview
Azure Databricks Unity Catalog limitation