Create a Data Source
For a complete list of supported databases, see the .
CREATE_DATA_SOURCE Immuta permission
Snowflake data source requirements:
USAGE Snowflake privilege on the schema and database
REFERENCES Snowflake privilege on the tables
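As an illustration of the Snowflake requirements above, the following sketch builds the matching GRANT statements as plain strings. The function name, the role name, and the database and schema names are hypothetical placeholders; nothing is executed against Snowflake, and an administrator would still need to review and run the statements.

```python
def snowflake_grants(database: str, schema: str, role: str) -> list[str]:
    """Build GRANT statements for the privileges listed above.

    All three arguments are placeholders supplied by the caller;
    this function only formats text and does not connect to Snowflake.
    """
    qualified_schema = f"{database}.{schema}"
    return [
        # USAGE on the database and on the schema
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
        # REFERENCES on the tables in that schema
        f"GRANT REFERENCES ON ALL TABLES IN SCHEMA {qualified_schema} TO ROLE {role};",
    ]


# Hypothetical names, for illustration only.
for statement in snowflake_grants("ANALYTICS", "PUBLIC", "IMMUTA_REGISTRAR"):
    print(statement)
```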
Databricks Spark integration requirements: Ensure that at least one of the traits below is true.
The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.
The user exposing the tables is a Databricks workspace administrator.
Databricks Unity Catalog integration requirements: When exposing a table from Databricks Unity Catalog, be sure the credentials used to register the data sources have the Databricks privileges listed below.
The following privileges on the parent catalogs and schemas of those tables:
SELECT
USE CATALOG
USE SCHEMA
USE SCHEMA on system.information_schema
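The Unity Catalog privileges above could be granted with statements like the ones this sketch generates. The function name, catalog, schema, and principal are hypothetical, and granting SELECT at the schema level is one possible reading of the requirement; adjust the scope to your own tables. The code only formats strings and does not connect to Databricks.

```python
def unity_catalog_grants(catalog: str, schema: str, principal: str) -> list[str]:
    """Build Unity Catalog GRANT statements for the privileges listed above.

    `catalog`, `schema`, and `principal` are caller-supplied placeholders.
    """
    target = f"`{principal}`"  # principals are quoted with backticks
    qualified_schema = f"{catalog}.{schema}"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {target};",
        f"GRANT USE SCHEMA ON SCHEMA {qualified_schema} TO {target};",
        # Assumption: SELECT is granted on the parent schema so that it
        # is inherited by the tables inside it.
        f"GRANT SELECT ON SCHEMA {qualified_schema} TO {target};",
        f"GRANT USE SCHEMA ON SCHEMA system.information_schema TO {target};",
    ]


# Hypothetical names, for illustration only.
for statement in unity_catalog_grants("main", "sales", "immuta-registrar"):
    print(statement)
```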
Snowflake imported databases
Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.
Navigate to the My Data Sources page.
Click the New Data Source button in the top right corner.
Select the data platform containing the data you wish to expose by clicking a tile.
Input the connection parameters to the database you're exposing. Click the tabs below for guidance for select data platforms.
Click the Test Connection button.
Decide how to virtually populate the data source by selecting Create sources for all tables in this database and monitor for changes or Schema/Table.
Complete the workflow for your selection (Create sources for all tables in this database and monitor for changes, or Schema/Table), as outlined on the tabs below:
Create sources for all tables in this database and monitor for changes
Selecting this option will create and keep in sync all data sources within this database. New schemas will be automatically detected and the corresponding data sources and schema projects will be created.
Provide information about your source to make it discoverable to users.
Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in the Immuta Query Engine. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>
The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Note: This step will only appear if all tables within a server have been selected for creation.
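The SQL name format rules above (a required macro, only lowercase letters, numbers, and underscores otherwise, and at most 255 characters) can be checked before you paste a value into the form. This is a minimal sketch, not Immuta's own validation: the macro spellings `<schema>` and `<tablename>` are assumptions, and the length check is applied to the format string itself rather than to the expanded name.

```python
import re

SCHEMA_MACRO = "<schema>"      # assumed spelling of the schema macro
TABLE_MACRO = "<tablename>"    # assumed spelling of the table name macro
MAX_LENGTH = 255


def is_valid_sql_name_format(name_format: str, required_macro: str) -> bool:
    """Return True if the format satisfies the rules described above."""
    if len(name_format) > MAX_LENGTH:
        return False
    if required_macro not in name_format:
        return False
    # Everything outside the macro must be lowercase letters, digits, or "_".
    remainder = name_format.replace(required_macro, "")
    return re.fullmatch(r"[a-z0-9_]*", remainder) is not None


print(is_valid_sql_name_format("prod_<schema>", SCHEMA_MACRO))      # valid
print(is_valid_sql_name_format("Prod-<schema>", SCHEMA_MACRO))      # uppercase and "-"
print(is_valid_sql_name_format("raw_<tablename>_v2", TABLE_MACRO))  # valid
```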
Click Download Schema Job Detection Template.
Click the Click Here To Download text.
Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
None of the following options are required. However, completing these steps will help maximize the utility of your data source.
Column Detection
This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.
To enable, select the checkbox in this section.
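Conceptually, column detection compares the columns a data source already knows about with the columns currently present on the remote table. The sketch below illustrates that comparison only; it is not Immuta's implementation, and the function name, column names, and types are hypothetical.

```python
def diff_columns(known: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column_name: type} mappings and report the differences."""
    known_names, remote_names = set(known), set(remote)
    return {
        "added": sorted(remote_names - known_names),
        "removed": sorted(known_names - remote_names),
        "type_changed": sorted(
            name for name in known_names & remote_names
            if known[name] != remote[name]
        ),
    }


# Hypothetical snapshot of a data source and its remote table.
known = {"id": "bigint", "email": "string", "age": "int"}
remote = {"id": "bigint", "email": "string", "age": "bigint", "signup_date": "date"}
print(diff_columns(known, remote))
```

Any non-empty list in the result corresponds to a change that would lead to the data source being updated and its Data Owners being notified.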
See the for instructions.
See the for instructions.
Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the for details.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the same as the .
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the for details about scopes.
When selecting the Schema/Table option you can opt to enable by selecting the checkbox in this section.
In most cases, Immuta’s schema detection job runs automatically from the Immuta web service. For Databricks, that automatic job is disabled because of the . In this case, Immuta requires users to download a schema detection job template (a Python script) and import that into their Databricks workspace.
Enable or Detect Column Changes on the Data Source creation page.
Before you can run the script, follow the to create the scope and secret using the Immuta API Key generated on your user profile page.
Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, the template uses the , which has a simple mechanism for configuring proxies for a request.
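The effect of the two commented-out lines can be pictured as optional filters on the request the job sends: when neither is set, the request carries no filter and detection covers every database and table. The sketch below is a hypothetical illustration of that pattern, not the contents of the downloaded template; the function name and the payload field names `database` and `table` are assumptions.

```python
import json
from typing import Optional


def build_detection_payload(database: Optional[str] = None,
                            table: Optional[str] = None) -> str:
    """Build a JSON request body with optional database/table filters.

    Field names are hypothetical. Leaving both arguments as None mirrors
    keeping the two template lines commented out: no filter is sent.
    """
    if table is not None and database is None:
        raise ValueError("a table filter requires a database filter")
    payload = {}
    if database is not None:
        payload["database"] = database
    if table is not None:
        payload["table"] = table
    return json.dumps(payload, sort_keys=True)


print(build_detection_payload())                     # no filter: all databases and tables
print(build_detection_payload("sales"))              # one database
print(build_detection_payload("sales", "customers")) # one table
```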
Opt to configure settings in the section (outlined below), and then click Create to save the data source(s).
See to learn more about Column Detection.
the creation of in the Policy Builder.
Tags can also be added after you create your data source from the page on the Overview tab or the Data Dictionary tab.