A data source is how data owners expose their data across their organization to other Immuta users. The data is never copied; instead, Immuta uses metadata from the data source to determine how to expose it. A data source is therefore a virtual representation of data that exists in a remote data storage technology.
When a data source is exposed, policies (written by data owners and data governors) are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes or groups of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and collaboration.
Best practices for connecting data
The best practices outlined below will also appear in callouts within relevant tutorials.
- Two-way SSL is highly recommended, as it is the most secure configuration for a custom blob store handler endpoint.
- Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.
- It is recommended that path not be used in the resource restrictions. Additionally, single-bucket source data is the only tested configuration; Athena databases with source data in multiple buckets may work, but would require that additional resources be specified in the policy.
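To illustrate the single-bucket recommendation, the sketch below shows the shape of an AWS IAM policy statement scoped to one source bucket. The bucket name is hypothetical, and the exact actions your deployment needs may differ; a multi-bucket Athena database would require additional ARNs in the `Resource` array, which is why the single-bucket configuration is the only tested one.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-athena-source-bucket",
        "arn:aws:s3:::example-athena-source-bucket/*"
      ]
    }
  ]
}
```

Note that the bucket ARN and the object ARN (`/*`) are listed separately because `s3:ListBucket` applies to the bucket itself while `s3:GetObject` applies to the objects within it.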
This section includes concept, reference, and how-to guides for creating data sources. Some of these guides are provided below. See the left navigation for a complete list of resources.
- Connect data sources using dbt Cloud integration
- Create an Azure Synapse Analytics data source
- Create a Databricks data source
- Create a Redshift data source
- Create a Snowflake data source
- Create a Starburst data source
- Manage data sources
- Manage schema monitoring
- Run schema monitoring jobs