An access pattern is a way in which Immuta users can consume and interact with data through Immuta. Accessing data through Immuta ensures that users consume only policy-controlled data and that their access is thoroughly audited.
Each of Immuta's access patterns is described in detail in the Data Access Pattern Guides.
Authorizations are custom tags that can be added to a user to restrict what data the user can see. When creating a policy on a data source, Data Owners can apply the policy to any user that possesses an authorization. Authorizations can be added manually as well as mapped in from LDAP or Active Directory.
Blob handlers are the tools used to access the back end storage technology and stream blobs of data through Immuta. Different storage technologies can utilize their own versions of blob handlers, and Data Owners can even create their own custom blob handlers to meet their needs.
A data attribute is data about your data. These attributes can be used to match against policy logic to decide if a row/object should be visible to a given user. This matching is usually done between the data attribute and user attributes.
Data attributes are typically part of the data being exposed, as a column or metadata attribute. For example, a query-backed data source may have a column called access, which is used in policy logic to match against a user attribute to determine whether the user can see the given row. Similarly, an object in an object-backed data source may have a metadata attribute called access, which determines whether or not a user can see that object.
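The matching described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Immuta's implementation: the function name `visible_rows`, the row layout, and the attribute values are hypothetical, and only the column name `access` comes from the example above.

```python
def visible_rows(rows, user_attributes, column="access"):
    """Keep only the rows whose data attribute matches one of the user's attributes.

    Hypothetical helper: `rows` is a list of dicts and `column` names the
    data attribute that the policy logic compares against user attributes.
    """
    allowed = set(user_attributes)
    return [row for row in rows if row.get(column) in allowed]


# Illustrative data; the values are made up.
rows = [
    {"id": 1, "access": "finance"},
    {"id": 2, "access": "hr"},
    {"id": 3, "access": "finance"},
]

finance_view = visible_rows(rows, {"finance"})
print([row["id"] for row in finance_view])
```

A user with no matching attribute sees no rows at all, which is the behavior a row-level policy of this kind is meant to produce.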
The Data Dictionary provides information about columns within a data source, including column names and types, and users subscribed to the data source can comment on the Data Dictionary.
Dictionary columns are generated automatically when the data source is created if the remote storage technology supports SQL. Otherwise, the Data Owner or Expert can create the entries manually. See the Data Owner Guide for more details on managing the Data Dictionary.
Data Experts are users who are knowledgeable about the data in a data source and are capable of managing the data source's documentation and Data Dictionary.
See Managing Users and Groups in a Data Source for details on adding Experts to a data source.
A data source is how you virtually expose data across your enterprise to Immuta consumers. When you expose a data source, you are not copying the data; you are using metadata from the data source to tell Immuta how to expose it. No raw data is moved to an end user (or into the Immuta cache) until that user fetches it. The Immuta caching layer is configurable to reduce load on your exposed databases. As the cost of RAM drops, building a virtual data lake in which the desired data flows in and out through the Immuta caching layer can reduce infrastructure cost, database load, and data latency.
From a technical perspective, a data source is an abstraction over data living in a remote data storage technology. When you expose a data source, it becomes an authoritative view of that remote data without anyone having to pass around connection strings or API guides. Policy enforcement and access are maintained through Immuta based on the settings provided by the data source creator, who is known within Immuta as the Data Owner. Once exposed and subscribed to, the data is accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and sharing.
For more information about data sources, see the Data Source Management Guide.
Data Source Minimization
Minimization policies expose a percentage of the data source to querying users. This percentage is configurable by the Data Owner and is based on a column with high cardinality.
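One way to picture a percentage that is "based on a column with high cardinality" is deterministic hash bucketing. The sketch below is an assumption about how such sampling could work, not a description of Immuta's algorithm; `in_minimized_sample`, `minimize`, and the `customer_id` column are hypothetical names.

```python
import hashlib


def in_minimized_sample(value, percent):
    """Decide deterministically whether a row is exposed.

    The high-cardinality column value is hashed into one of 100 buckets;
    the row is exposed when its bucket falls below the configured percent.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def minimize(rows, column, percent):
    """Return the subset of rows exposed under a minimization percentage."""
    return [row for row in rows if in_minimized_sample(row[column], percent)]


# Illustrative data: 1,000 rows keyed by a made-up high-cardinality column.
rows = [{"customer_id": i} for i in range(1000)]
sample = minimize(rows, "customer_id", 10)
print(len(sample), "of", len(rows), "rows exposed")
```

Hashing a high-cardinality column spreads rows roughly evenly across buckets, so the exposed share approximates the configured percentage, and the same rows are returned on every query. A low-cardinality column would map many rows to the same bucket and skew the sample.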
As metadata for blobs/rows is ingested into Immuta, the data is tagged with a visibility marking, which is an arbitrary JSON object that the Data Owner defines. For queryable data sources, the visibility can be prescribed by selecting one or more columns to use as the visibility.
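As a rough illustration of a visibility marking built from selected columns, the sketch below copies the chosen column values of a row into a JSON object. The function name `visibility_marking`, the column names, and the resulting JSON layout are all assumptions made for this example; the actual structure is whatever the Data Owner defines.

```python
import json


def visibility_marking(row, visibility_columns):
    """Build a JSON-serializable visibility object from the selected columns."""
    return {column: row[column] for column in visibility_columns}


# Illustrative row; column names and values are made up.
row = {"id": 42, "region": "EU", "classification": "internal", "amount": 10.5}
marking = visibility_marking(row, ["region", "classification"])
print(json.dumps(marking, sort_keys=True))
```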
Differential Privacy provides mathematical guarantees that you cannot pinpoint an individual (row) in the data. This anonymization technique accounts for the sensitivity of the query and applies the appropriate noise (if any) to the response. For example, an average age could be changed from 50.5 to 55 at query time. To enforce this, the Immuta Query Engine restricts queries run on the data to aggregate queries (AVG, SUM, COUNT, etc.) and prevents very sensitive queries from running at all.
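To make "applies the appropriate noise" concrete, here is a minimal sketch of the textbook Laplace mechanism applied to an average. It is not Immuta's implementation. The function name `noisy_average`, the clamping bounds, and the `epsilon` privacy parameter are assumptions, and the sensitivity formula assumes the number of rows is public and that neighboring datasets differ in one row's value.

```python
import random


def noisy_average(values, lower, upper, epsilon, rng=None):
    """Return the mean of `values` with Laplace noise scaled to its sensitivity."""
    if not values:
        raise ValueError("values must not be empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower >= upper:
        raise ValueError("lower must be less than upper")
    rng = rng or random.Random()

    # Clamp each value so one row can shift the mean by a bounded amount.
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)

    # Changing one row moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clamped)
    scale = sensitivity / epsilon

    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_mean + noise


ages = [48, 53]  # illustrative values; the true average is 50.5
print(noisy_average(ages, lower=0, upper=100, epsilon=0.5))
```

A smaller `epsilon` or a smaller dataset yields more noise, which is why highly sensitive queries (for example, an average over very few rows) may be better refused than answered.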
Groups function similarly to those in Active Directory and LDAP, allowing admins to group a set of users together. Users can belong to any number of groups and can be added to or removed from groups at any time. Similar to authorizations, groups can be used to restrict what data a set of users has access to. When creating a policy on a data source, you can apply the policy to a group, which affects any user who belongs to that group. Permissions and authorizations cannot be applied to groups.
Immuta data sources can leverage two different types of handlers: Blob Handlers and Policy Handlers.
Identity managers (IAMs) authenticate Immuta users and control their access to data. Out of the box, Immuta supports several configurable identity managers:
- Immuta Identity Manager (Built-in)
- Active Directory
- Okta (SAML)
Immuta also offers support for custom IAM plugins, so you can use the Immuta API to implement your own identity manager. See the API Reference Guides for more details. See the Identity Managers Guide for information on IAM configuration.
Object-backed Data Sources
Object-backed data sources are based on data storage technologies that do not support SQL and can range from NoSQL technologies, to blob stores, to filesystems, to APIs. Object-backed data sources act like key/value stores. Immuta refers to object-backed data sources as “ingested” sources because Immuta must ingest metadata (not the raw data) about the data source to provide access and create access policy restrictions. Users provide Immuta metadata about the blobs they are exposing so that Immuta understands how to reach the blobs and apply policies.
Unlike query-backed data sources, which support data access through both the Immuta Query Engine and the Immuta Virtual Filesystem, object-backed data sources only allow users to access the unstructured data through the Filesystem. However, you can also pass Immuta features (new data you create) from within the data you are exposing, and those features are queryable via the Query Engine.
- APPLICATION_ADMIN: Gives the user access to administrative actions for the configuration of Immuta. These actions include:
  - Adding external IAMs.
  - Adding ODBC drivers.
  - Adding external catalogs.
  - Configuring email settings.
- USER_ADMIN: Gives the user access to administrative actions for managing users in Immuta. These include
- AUDIT: Gives the user access to the audit logs.
- CREATE_DATA_SOURCE: Gives the user the ability to create data sources.
- CREATE_DATA_SOURCE_IN_PROJECT: Gives the user the ability to create data sources within a project.
- CREATE_S3_DATASOURCE: Gives the user the ability to create an S3 data source.
- CREATE_S3_DATASOURCE_WITH_INSTANCE_ROLE: When creating an S3 data source, this allows the user to have the handler assume an AWS Role when ingesting data.
- CREATE_FILTER: Gives the user the ability to create and save a search filter.
- CREATE_PROJECT: Gives the user the ability to create projects.
- GOVERNANCE: Gives the user the ability to act as a Governor.
- IMPERSONATE_HDFS_USER: When creating an HDFS data source, this allows the user to enter any HDFS user name to use when accessing data.
Permissions are system-level mechanisms that control what actions a user is allowed to take. They apply to both API and UI actions. Permissions can be added to any user by an admin (any user with the ADMIN permission); however, the permissions themselves are managed by Immuta and cannot be added or removed.
Policies are fine-grained security controls that Data Owners apply when creating data sources. For query-backed data sources, columns can be masked and rows hidden. For object-backed data sources, certain blobs of data can be hidden from certain users, and particular fields in the content of the blobs can be masked if the blob is in a known format. The creator of the data source determines the logic behind what is hidden from whom, and the logic can be as complex as desired. Policies can be created through the Immuta workflows, or custom policy handlers can be created to inject complex policies.
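The two controls named for query-backed data sources, masking columns and hiding rows, can be sketched as follows. This is an illustration only: `mask_value`, `apply_policy`, and the sample columns are hypothetical, and hashing is just one possible masking choice, not necessarily the one Immuta applies.

```python
import hashlib


def mask_value(value):
    """Replace a value with its SHA-256 hex digest (one possible masking choice)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def apply_policy(rows, masked_columns, row_filter):
    """Hide rows rejected by `row_filter` and mask the listed columns in the rest."""
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row is hidden from this user
        result.append(
            {
                key: mask_value(value) if key in masked_columns else value
                for key, value in row.items()
            }
        )
    return result


# Illustrative data and policy; names and values are made up.
rows = [
    {"name": "Ada", "ssn": "123-45-6789", "country": "US"},
    {"name": "Lin", "ssn": "987-65-4321", "country": "DE"},
]
view = apply_policy(rows, masked_columns={"ssn"}, row_filter=lambda r: r["country"] == "US")
print(view)
```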
Projects are logical groupings of data, members, and discussions based on business goals. Projects can also capture the purpose of the work and audit data access.
See the Projects Tutorial for more information on projects.
Projects contain purposes which can define (or restrict) the scope and usage of data within a project. Purpose restrictions can be defined by the Immuta Governor and/or the project owner(s). The Immuta Governor typically defines Immuta-wide restrictions like "To provide analytics." The project owner typically defines project- or data-specific restrictions such as "Billing," "Marketing," or "Research." Data that is accessed under the provision of a project will incorporate purpose-based auditing. If members join a project but would like to use the information for purposes other than what is specified, they can always create another project for those purposes.
See the Project Governor Guide for details about managing project purposes.
Access to data in a data source can be restricted to data source users acting under a specific purpose within the context of an Immuta project. To see the restricted data, data source subscribers must use the credentials that are associated with a project that contains the relevant purpose.
See the Policy Builder UI tutorial for information on enforcing Purpose-based Restrictions.
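A purpose-based restriction can be thought of as a check between the purposes a policy requires and the purposes of the project a user is acting under. The sketch below assumes that matching any one required purpose is sufficient; that rule, the function name `can_see_restricted_data`, and the use of `None` for "not acting under a project" are assumptions made for illustration.

```python
def can_see_restricted_data(required_purposes, project_purposes):
    """Return True when the current project carries at least one required purpose.

    `project_purposes` is None when the user is not acting under any project.
    """
    if project_purposes is None:
        return False
    return bool(set(required_purposes) & set(project_purposes))


# Purpose names taken from the examples in this glossary.
print(can_see_restricted_data({"Billing"}, {"Billing", "Research"}))
print(can_see_restricted_data({"Billing"}, {"Marketing"}))
print(can_see_restricted_data({"Billing"}, None))
```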
Query-backed Data Sources
Query-backed data sources are accessible to subscribed users via the Immuta Query Engine and appear as though they are Postgres tables. Whether a data source is query-backed or not, its data is also accessible via the Immuta Virtual Filesystem. In either access pattern, Immuta policies are enforced either by querying the remote database or by filtering on the fly.
Note: Query-backed data sources are the only data sources that allow users to access their data via the Immuta Query Engine.
To access the data in any data source, Immuta users must first be subscribed to that data source. The users with the most basic access to a data source are referred to as subscribers. Experts are subscribers with a set of their own special privileges.
Users can subscribe to a data source by requesting access through the Immuta UI or be added to the data source by the Data Owner.
See Managing Users and Groups in a Data Source for details on managing Data Users.
A Subscription Policy refers to how open a data source is to potential subscribers and can have one of four possible restriction levels:
- Anyone: Users will automatically be granted access (Least Restricted).
- Anyone Who Asks (and is Approved): Users will need to request access and be granted permission by a configured approver (Moderately Restricted).
- Users with Specific Groups/Authorizations: Only users with the specified groups/authorizations will be able to see the data source and subscribe (Moderately Restricted).
- Individual Users You Select: The data source will not appear in search results; Data Owners must manually add/remove users (Most Restricted).
For information on how to enforce Subscription Policies, see the Policy Builder UI tutorial.
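The four restriction levels listed above can be summarized as a small decision function. This is a sketch under stated assumptions: the enum members, the outcome strings, and the `subscription_outcome` function are hypothetical names chosen for illustration and do not correspond to an Immuta API.

```python
from enum import Enum


class SubscriptionPolicy(Enum):
    ANYONE = "anyone"
    ANYONE_WHO_ASKS = "anyone_who_asks"
    SPECIFIC_GROUPS = "specific_groups_or_authorizations"
    INDIVIDUAL_USERS = "individual_users"


def subscription_outcome(policy, user_groups=(), allowed_groups=()):
    """Describe what happens when a user tries to subscribe to a data source."""
    if policy is SubscriptionPolicy.ANYONE:
        return "granted"
    if policy is SubscriptionPolicy.ANYONE_WHO_ASKS:
        return "needs_approval"
    if policy is SubscriptionPolicy.SPECIFIC_GROUPS:
        # Only users holding a specified group/authorization can see and subscribe.
        if set(user_groups) & set(allowed_groups):
            return "can_subscribe"
        return "not_visible"
    # INDIVIDUAL_USERS: hidden from search; the Data Owner adds users manually.
    return "owner_must_add"


print(subscription_outcome(SubscriptionPolicy.SPECIFIC_GROUPS, {"analysts"}, {"analysts"}))
```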
Time-based restrictions only expose data within a defined time range, which is set by the Data Owner and is based on the event time column of the data source.
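A time-based restriction amounts to filtering rows on the event time column. The sketch below assumes an inclusive range with explicit start and end timestamps; the function name `rows_in_range` and the `event_time` column name are hypothetical.

```python
from datetime import datetime, timezone


def rows_in_range(rows, event_time_column, start, end):
    """Keep only rows whose event time falls within [start, end]."""
    return [row for row in rows if start <= row[event_time_column] <= end]


# Illustrative data; timestamps are made up.
rows = [
    {"id": 1, "event_time": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2020, 6, 1, tzinfo=timezone.utc)},
    {"id": 3, "event_time": datetime(2021, 1, 1, tzinfo=timezone.utc)},
]
start = datetime(2020, 3, 1, tzinfo=timezone.utc)
end = datetime(2020, 12, 31, tzinfo=timezone.utc)
print([row["id"] for row in rows_in_range(rows, "event_time", start, end)])
```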
User attributes are used to drive data source policies as well as to give users access to certain Immuta features.