Skip to content

Data Source Management

Audience: Data Owners

Content Summary: This page illustrates the concepts behind creating and managing data sources in Immuta. For a tutorial detailing how to create and manage data sources, navigate to the Data Owner Guide.

Data Sources in Immuta

A data source is how Data Owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. In this sense, a data source is a virtual representation of data that exists in a remote data storage technology.

When a data source is exposed, policies (written by Data Owners and Data Governors) are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and collaboration.

Once subscribed, Data Users interact with data through Immuta through four different access patterns: HDFS, Spark, SQL, and the Immuta Virtual Filesystem. Accessing data through Immuta ensures that users are only consuming policy-controlled data with thorough auditing.

Data Source User Roles

There are various roles users and groups can play relating to data sources, including

  • Subscribers: Those who have access to the data source data. With the appropriate data accesses and authorizations, these users/groups can view files, run SQL queries, and generate analytics against the data source data. All users/groups granted access to a data source (except for those with the ingest role) have subscriber status.
  • Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are also capable of managing the data source's documentation and Data Dictionary.
  • Owners: Those who create and manage new data sources and their users, documentation, Data Dictionaries, and queries. They are also capable of ingesting data into their data sources as well as adding ingest users (if their data source is object-backed).
  • Ingest: Those who are responsible for ingesting data for the data source. This role only applies to object-backed data sources (since query-backed data sources are ingested automatically). Ingest users cannot access any data once it's inside Immuta, but they are able to verify if their data was successfully ingested or not.

Data Source Types

Data sources fall in one of two categories: those that are backed by SQL technologies (query-backed data sources) and those that are not (object-backed data sources).

  • query-backed data sources: These data sources are accessible to subscribed Data Users through the Immuta Query Engine and appear as though they are Postgres tables.

  • object-backed data sources: These data sources are backed by data storage technologies that do not support SQL and can range from blob stores, to filesystems, to APIs.

Data Attributes

Data attributes are information about the data within the data source. These attributes are then matched against policy logic to determine if a row or object should be visible to a specific user. This matching is usually done between the data attribute and the user attribute.

For example, in the policy

Only show rows where Country='US' for everyone except when user is a member of group Finance

the data attribute (US in the Country column) is matched against the user attribute (Finance group) to determine whether or not rows will be visible to the user accessing the data. In this case only users who are a member of the Finance group will see all rows in the data source.

User Attributes

User attributes are values connected to specific Immuta user accounts. These attributes fall into three categories: permissions, groups, and authorizations.

These user attributes give users access to various Immuta features and drive data source policies.

Permissions

Permissions control what actions a user can take in Immuta, both API and UI actions. Permissions can be added and removed from user accounts by a System Administrator (an Immuta user with the ADMIN permission); however, the permissions themselves are managed by Immuta, and the actions associated with the permissions cannot be altered.

Groups

Groups allow System Administrators to group sets of users together. Users can belong to any number of groups and can be added or removed from groups at any time. Like authorizations, groups can be used to restrict what data a set of users has access to.

Authorizations

Authorizations are custom tags that are applied to users to restrict what data users can see. Authorizations can be added manually or mapped in from LDAP or Active Directory.

Data Dictionary

The Data Dictionary provides information about the columns within the data source, including column names and value types. Users subscribed to the data source can post and reply to discussion threads by commenting on the Data Dictionary.

Dictionary columns are automatically generated when the data source is created if the remote storage technology supports SQL. Otherwise, Data Owners or Experts can create the entries for the Data Dictionary manually.