Data Source Management
Audience: Data Owners
Content Summary: This page illustrates the concepts behind creating and managing data sources in Immuta. For a tutorial detailing how to create and manage data sources, navigate to the Data Owner Guide.
Data Sources in Immuta
A data source is how Data Owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. In this sense, a data source is a virtual representation of data that exists in a remote data storage technology.
When a data source is exposed, policies (written by Data Owners and Data Governors) are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and collaboration.
Once subscribed, Data Users interact with data through Immuta through four different access patterns: HDFS, Spark, SQL, and the Immuta Virtual Filesystem. Accessing data through Immuta ensures that users are only consuming policy-controlled data with thorough auditing.
Data Source User Roles
There are various roles users and groups can play relating to data sources, including
- Subscribers: Those who have access to the data source data. With the appropriate data accesses and authorizations, these users/groups can view files, run SQL queries, and generate analytics against the data source data. All users/groups granted access to a data source (except for those with the ingest role) have subscriber status.
- Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are also capable of managing the data source's documentation and Data Dictionary.
- Owners: Those who create and manage new data sources and their users, documentation, Data Dictionaries, and queries. They are also capable of ingesting data into their data sources as well as adding ingest users (if their data source is object-backed).
- Ingest: Those who are responsible for ingesting data for the data source. This role only applies to object-backed data sources (since query-backed data sources are ingested automatically). Ingest users cannot access any data once it's inside Immuta, but they are able to verify if their data was successfully ingested or not.
Data Source Types
Data sources fall in one of two categories: those that are backed by SQL technologies (query-backed data sources) and those that are not (object-backed data sources).
query-backed data sources: These data sources are accessible to subscribed Data Users through the Immuta Query Engine and appear as though they are Postgres tables.
object-backed data sources: These data sources are backed by data storage technologies that do not support SQL and can range from blob stores, to filesystems, to APIs.
Data attributes are information about the data within the data source. These attributes are then matched against policy logic to determine if a row or object should be visible to a specific user. This matching is usually done between the data attribute and the user attribute.
For example, in the policy
Only show rows where
Country='US' for everyone except when user is a member of group
the data attribute (
US in the
Country column) is matched against the user attribute
Finance group) to determine whether or not rows will be visible to the user accessing the data. In this case only
users who are a member of the Finance group will see all rows in the data source.
These user attributes give users access to various Immuta features and drive data source policies.
Permissions control what actions a user can take in Immuta, both API and UI actions. Permissions can be
added and removed from user accounts by a System Administrator (an Immuta user with the
ADMIN permission); however,
the permissions themselves are managed by Immuta, and the actions associated with the permissions cannot be altered.
Groups allow System Administrators to group sets of users together. Users can belong to any number of groups and can be added or removed from groups at any time. Like authorizations, groups can be used to restrict what data a set of users has access to.
Authorizations are custom tags that are applied to users to restrict what data users can see. Authorizations can be added manually or mapped in from LDAP or Active Directory.
The Data Dictionary provides information about the columns within the data source, including column names and value types. Users subscribed to the data source can post and reply to discussion threads by commenting on the Data Dictionary.
Dictionary columns are automatically generated when the data source is created if the remote storage technology supports SQL. Otherwise, Data Owners or Experts can create the entries for the Data Dictionary manually.