Data Source Management
Audience: Data Owners
Content Summary: This page illustrates the concepts behind creating and managing data sources in Immuta. For a tutorial detailing how to create and manage data sources, navigate to the Data Owner Guide.
Data Sources in Immuta
A data source is how Data Owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. In this sense, a data source is a virtual representation of data that exists in a remote data storage technology.
When a data source is exposed, policies (written by Data Owners and Data Governors) are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and collaboration.
Once subscribed, Data Users interact with data through Immuta using one of several access patterns: Databricks, the Immuta Query Engine, HDFS, S3, Snowflake, and SparkSQL. Accessing data through Immuta ensures that users consume only policy-controlled data and that their access is thoroughly audited.
Data Source User Roles
Users and groups can hold the following roles on a data source:
- Subscribers: Those who have access to the data source data. With the appropriate data accesses and attributes, these users/groups can view files, run SQL queries, and generate analytics against the data source data. All users/groups granted access to a data source (except for those with the ingest role) have subscriber status.
- Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are also capable of managing the data source's documentation and Data Dictionary.
- Owners: Those who create and manage new data sources and their users, documentation, Data Dictionaries, and queries. They are also capable of ingesting data into their data sources as well as adding ingest users (if their data source is object-backed).
- Ingest: Those who are responsible for ingesting data for the data source. This role only applies to object-backed data sources (since query-backed data sources are ingested automatically). Ingest users cannot access any data once it is inside Immuta, but they can verify whether their data was successfully ingested.
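The role descriptions above can be summarized as a small lookup table. The following is a minimal sketch for illustration only: the `Role` enum, the capability names, and the `can` helper are hypothetical and are not part of any Immuta API. It assumes that Experts and Owners also hold subscriber status, as the Subscribers entry states.

```python
from enum import Enum


class Role(Enum):
    SUBSCRIBER = "subscriber"
    EXPERT = "expert"
    OWNER = "owner"
    INGEST = "ingest"


# Hypothetical capability names, paraphrased from the role descriptions above.
CAPABILITIES = {
    Role.SUBSCRIBER: {"read_data"},
    Role.EXPERT: {"read_data", "manage_documentation", "manage_data_dictionary"},
    Role.OWNER: {
        "read_data",
        "manage_users",
        "manage_documentation",
        "manage_data_dictionary",
        "manage_queries",
        "ingest_data",
        "add_ingest_users",
    },
    # Ingest users cannot read data once it is inside Immuta.
    Role.INGEST: {"ingest_data", "verify_ingest"},
}


def can(role: Role, action: str, object_backed: bool = False) -> bool:
    """Return True if the role allows the action on this kind of data source."""
    # The ingest role only applies to object-backed data sources.
    if role is Role.INGEST and not object_backed:
        return False
    # Ingest users can only be added to object-backed data sources.
    if action == "add_ingest_users" and not object_backed:
        return False
    return action in CAPABILITIES[role]
```

For instance, `can(Role.INGEST, "read_data", object_backed=True)` is `False`, because ingest users never gain access to the data they load.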
Data Source Types
Data sources fall into one of two categories in Immuta: query-backed data sources (those that are backed by SQL technologies) and object-backed data sources (those that are not backed by SQL technologies).
- Query-backed data sources: These data sources are accessible to subscribed Data Users through the Immuta Query Engine and appear as though they are Postgres tables.
- Object-backed data sources: These data sources are backed by data storage technologies that do not support SQL and can range from blob stores, to file systems, to APIs.
Data attributes are information about the data within the data source. These attributes are then matched against policy logic to determine if a row or object should be visible to a specific user. This matching is usually done between the data attribute and the user attribute.
For example, in the policy "Only show rows where Country='US' for everyone except when user is a member of group Finance", the data attribute (US in the Country column) is matched against the user attribute (Finance group) to determine whether rows are visible to the user accessing the data. In this case, only users who are members of the Finance group can see all rows in the data source; everyone else sees only the rows where Country is US.
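The matching in the example policy can be sketched as a plain Python filter. This is an illustration of the logic only, not how Immuta enforces policies internally; the `visible_rows` function and the sample rows are hypothetical.

```python
def visible_rows(rows, user_groups):
    """Only show rows where Country='US', except for members of the Finance group."""
    # User attribute check: Finance members are exempt from the row restriction.
    if "Finance" in user_groups:
        return list(rows)
    # Data attribute check: everyone else sees only rows whose Country is US.
    return [row for row in rows if row["Country"] == "US"]


# Hypothetical sample data.
rows = [
    {"id": 1, "Country": "US"},
    {"id": 2, "Country": "DE"},
    {"id": 3, "Country": "US"},
]

analyst_view = visible_rows(rows, user_groups={"Marketing"})
finance_view = visible_rows(rows, user_groups={"Finance"})
```

Here `analyst_view` contains only the rows with ids 1 and 3, while `finance_view` contains all three rows.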
These user attributes give users access to various Immuta features and drive data source policies.
Permissions control what actions a user can take in Immuta, both API and UI actions. Permissions can be added to and removed from user accounts by a System Administrator (an Immuta user with the USER_ADMIN permission); however, the permissions themselves are managed by Immuta, and the actions associated with the permissions cannot be altered.
Groups allow System Administrators to group sets of users together. Users can belong to any number of groups and can be added to or removed from groups at any time. Like attributes, groups can be used to restrict what data a set of users has access to.
Attributes are custom tags that are applied to users to restrict what data users can see. Attributes can be added manually or mapped in from LDAP or Active Directory.
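To make the three kinds of user attributes concrete, here is a minimal sketch of a user record and of the rule that only a user with the USER_ADMIN permission can change another user's permissions. The `User` class, the `grant_permission` helper, and the `EXAMPLE_PERMISSION` name are hypothetical illustrations, not Immuta's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    permissions: set = field(default_factory=set)   # what the user may do in Immuta
    groups: set = field(default_factory=set)        # sets of users, usable in policies
    attributes: dict = field(default_factory=dict)  # custom tags, usable in policies


def grant_permission(admin: User, user: User, permission: str) -> None:
    """Add a permission to a user account; only a USER_ADMIN may do this."""
    if "USER_ADMIN" not in admin.permissions:
        raise PermissionError(f"{admin.name} lacks the USER_ADMIN permission")
    user.permissions.add(permission)


admin = User("alice", permissions={"USER_ADMIN"})
analyst = User("bob", groups={"Finance"}, attributes={"Region": "EMEA"})

# EXAMPLE_PERMISSION is a placeholder name used only for this sketch.
grant_permission(admin, analyst, "EXAMPLE_PERMISSION")
```

Keeping permissions separate from groups and attributes mirrors the text above: permissions govern actions in Immuta, while groups and attributes restrict what data a user can see.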
The Data Dictionary provides information about the columns within the data source, including column names and value types. Users subscribed to the data source can post and reply to discussion threads by commenting on the Data Dictionary.
Dictionary columns are automatically generated when the data source is created if the remote storage technology supports SQL. Otherwise, Data Owners or Experts can create the entries for the Data Dictionary manually.
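Automatic generation is possible for SQL-backed storage because column names and types can be read from the schema. The sketch below illustrates the idea using an in-memory SQLite database as a stand-in for the remote technology; the `build_data_dictionary` function is hypothetical and does not reflect how Immuta implements this.

```python
import sqlite3


def build_data_dictionary(conn, table):
    """Read column names and declared types for a table from the SQL schema."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so it must come from a trusted source.
    cursor = conn.execute(f"PRAGMA table_info({table})")
    return [{"name": row[1], "type": row[2]} for row in cursor.fetchall()]


# Stand-in for a remote SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")

dictionary = build_data_dictionary(conn, "customers")
```

An object-backed source such as a blob store has no schema to query in this way, which is why its dictionary entries are created manually.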
The Data Dictionary is also where tags are added, edited, and removed for use in Global Policies. See the Tags Section for more information on their uses and tutorials.
Schema monitoring watches servers for schema and table changes, including when schemas and tables are added or removed, and notifies Data Owners when any changes are made.
With this feature enabled, Immuta detects when a new table has been added and automatically creates a new data source. Correspondingly, if a remote table is removed, that data source is disabled in the console.
Data Owners or Governors can select which users monitor schema changes. If more than one user is selected as a monitor, one data source is created for each of these users.
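The detection step described above amounts to comparing the tables already registered as data sources with the tables currently present on the remote server. A minimal sketch, assuming table names are available as simple strings; the `detect_table_changes` function and the table names are hypothetical.

```python
def detect_table_changes(known_tables, remote_tables):
    """Compare registered tables with the tables currently on the remote server."""
    known, remote = set(known_tables), set(remote_tables)
    return {
        # New remote tables: a data source would be created for each.
        "create": sorted(remote - known),
        # Tables that disappeared remotely: their data sources would be disabled.
        "disable": sorted(known - remote),
    }


changes = detect_table_changes(
    known_tables=["sales.orders", "sales.customers"],
    remote_tables=["sales.orders", "sales.refunds"],
)
```

In this example, `changes["create"]` is `["sales.refunds"]` and `changes["disable"]` is `["sales.customers"]`.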
Table Evolution Detection
Table Evolution Detection is a feature enabled by Data Owners to monitor when columns are added or removed and when column types are changed.
When new columns are added to the remote table, Immuta automatically applies the New tag to these columns in the data source, and a seeded New Column Added Global Policy masks them, since these new columns could contain sensitive data.
Data Owners can then review and approve these changes from the Requests tab of their profile page. Approving column changes removes the New tags from the data source.
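The column-level workflow can be sketched as three steps: detect the differences, tag added columns as New, and clear the tag on approval. The functions and the column data below are hypothetical and only illustrate the logic described in this section; they are not Immuta's implementation.

```python
def detect_column_changes(known_columns, remote_columns):
    """Compare two {column name: type} mappings."""
    known, remote = set(known_columns), set(remote_columns)
    return {
        "added": sorted(remote - known),
        "removed": sorted(known - remote),
        "type_changed": sorted(
            name for name in known & remote
            if known_columns[name] != remote_columns[name]
        ),
    }


def tag_new_columns(tags, added):
    """Return a copy of the tags with 'New' applied to each added column."""
    updated = {name: set(values) for name, values in tags.items()}
    for name in added:
        updated.setdefault(name, set()).add("New")
    return updated


def approve_changes(tags):
    """Approving column changes removes the New tags."""
    return {name: values - {"New"} for name, values in tags.items()}


known = {"id": "integer", "amount": "integer"}
remote = {"id": "integer", "amount": "numeric", "ssn": "text"}

changes = detect_column_changes(known, remote)
pending = tag_new_columns({"id": set(), "amount": set()}, changes["added"])
approved = approve_changes(pending)
```

While a column still carries the New tag (as `ssn` does in `pending`), a policy keyed on that tag can mask it; once the Data Owner approves the change, the tag is gone and the mask no longer applies.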