This term refers to how Immuta users can consume and interact with data through Immuta. Accessing data through Immuta ensures that users are only consuming policy-controlled data with thorough auditing.
- Immuta Data Access Patterns: Section Contents
- Native Access Pattern Installation: Section Contents
Azure Synapse Analytics
- Cloudera Native Workspace Configuration
- Disable / Uninstall the Immuta Hadoop Plugins from CDH
- Immuta CDH Integration Installation
- Immuta CDH Integration Prerequisites
- Immuta Log Analysis Tool for CDH Deployments
- Immuta Performance Optimization on CDH Clusters
- Run as a Non-Default User
- Upgrade the Immuta HDFS Plugin on CDH
Called "authorizations" prior to Immuta's 2.7 release, attributes are custom tags that can be added to a user to restrict what data the user can see. When creating a policy on a data source, Data Owners can apply the policy to any user that possesses an attribute. Attributes can be added manually as well as mapped in from an external IAM.
Blob handlers are the tools used to access the back end storage technology and stream blobs of data through Immuta. Different storage technologies can utilize their own versions of blob handlers, and Data Owners can even create their own custom blob handlers to meet their needs.
A data attribute is data about your data. These attributes can be used to match against policy logic to decide if a row/object should be visible to a given user. This matching is usually done between the data attribute and user attributes.
Data attributes are typically part of the data being exposed as a column or metadata attribute. For example, a
query-backed data source may have a column called
access, which is used in policy logic
to match against a user attribute to determine if they can see the given row. Furthermore, an object in an
object-backed data source may have a metadata attribute called
access which determines
whether or not a user can see that object.
The Data Dictionary provides information about columns within a data source, including column names and types, and users subscribed to the data source can comment on the Data Dictionary.
Dictionary columns are generated automatically when the data source is created if the remote storage technology supports SQL. Otherwise, the Data Owner or Expert can create the entries manually.
Data Experts are those who are knowledgeable about the data source data and can elaborate on it. They are responsible for managing the data source's documentation and the Data Dictionary.
A data source is how you virtually expose data across your enterprise to Immuta consumers. When you expose a data source you are not copying the data; you are using metadata from the data source to tell Immuta how to expose it. No raw data is moved to an end user (or into the Immuta cache) until it is fetched by that user. The Immuta caching layer is configurable to reduce load on your exposed databases, and with the cost of RAM dropping, building a virtual data lake with desired data flowing in and out through the Immuta caching layer will reduce infrastructure cost, database load, and data latency.
From a technical perspective, a data source is an abstraction to data living in a remote data storage technology. When you expose a data source, it becomes an authoritative view to that remote data without having to pass around connection strings or API guides. Policy enforcement and access is maintained through Immuta based on the settings provided by the data source creator, who is known within Immuta as the Data Owner. Once exposed and subscribed to, the data will be accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and sharing.
Data Source Minimization
Minimization policies expose a percentage of the data source to querying users. This percentage is configurable by the Data Owner and is based on a column with high cardinality.
As metadata for blobs/rows is ingested into Immuta, the data is tagged with a visibility marking which is an arbitrary JSON object that the Data Owner defines. The visibility for queryable data sources can be prescribed by selecting one or many columns to use as the visibility.
Differential Privacy provides mathematical guarantees that you cannot pinpoint an individual (row) in the data. This
foolproof anonymization understands the sensitivity of the query and applies the appropriate noise (if any) to
the response. For example,
average age could be changed from 50.5 to 55 at query time. Through this process, the Immuta
Query Engine restricts queries run on the data to aggregate queries (AVG, SUM, COUNT, etc.) and prevents very
sensitive queries from running at all.
Data Fingerprints capture summary statistics from data sources so that the user can view how that data changes over time or how the data changes when policies affecting that data source are changed.
Immuta pulls a sample of a data source through a Postgres proxy and it exists, temporarily, in the fingerprint container. Immuta then distills this data down to a series of summary statistics and pushes those statistics back to the Immuta Metadata Database. Those summary statistics of a data source are captured when a data source is created, when a policy is applied or changed, or when a user manually updates the data source fingerprint from the Policies tab. The user can then track changes in that data.
Groups function similarly to those in Active Directory and LDAP, allowing admins to group a set of users together. Users can belong to any number of groups and can be added or removed from groups at any time. Similar to attributes, groups can use used to restrict what data a set of users has access to. When creating a policy on a data source, you can apply the policy to a group, which would affect any user that belongs to the said group.
Immuta data sources can leverage two different types of handlers: Blob Handlers and Policy Handlers.
Identity managers (IAMs) authenticate Immuta users and control their access to data. Out of the box, Immuta supports several configurable identity managers:
- Immuta Identity Manager (Built-in)
- Active Directory
- Okta (SAML)
Immuta also offers support for custom IAM plugins, so you can use the Immuta API to implement your own identity manager.
Object-backed Data Sources
Object-backed data sources are based on data storage technologies that do not support SQL and can range from NoSQL technologies, to blob stores, to filesystems, to APIs. Object-backed data sources act like key/value stores. Immuta refers to object-backed data sources as “ingested” sources because Immuta must ingest metadata (not the raw data) about the data source to provide access and create access policy restrictions. Users provide Immuta metadata about the blobs they are exposing so that Immuta understands how to reach the blobs and apply policies.
Permissions are system-level mechanisms that control what actions a user is allowed to take. These are applied to
both the API and
Permissions can be added to any user by an admin (any user with the
USER_ADMIN permission); however, the permissions
themselves are managed by Immuta and cannot be added or removed.
- APPLICATION_ADMIN: Gives the user access to administrative actions for the configuration of Immuta. These actions include
- Adding external IAMs.
- Adding ODBC drivers.
- Adding external catalogs.
- Configuring email settings.
- USER_ADMIN: Gives the user access to administrative actions for managing users in Immuta. These include
- AUDIT: Gives the user access to the audit logs.
- CREATE_DATA_SOURCE: Gives the user the ability to create data sources.
- CREATE_DATA_SOURCE_IN_PROJECT: Gives the user the ability to create data sources within a project.
- CREATE_S3_DATASOURCE_WITH_INSTANCE_ROLE: When creating an S3 data source, this allows the user to the handler to assume an AWS Role when ingesting data.
- CREATE_FILTER: Gives the user the ability to create and save a search filter.
- CREATE_PROJECT: Gives the user the ability to create projects.
- FETCH_POLICY_INFO: Gives the user access to an endpoint that returns visibilities, masking information, and filters for a given data source.
- GOVERNANCE: Gives the user the ability to set Global Policies, create purpose-based usage restrictions on projects, and manage tags.
- IMPERSONATE_HDFS_USER: When creating an HDFS data source, this allows the user to enter any HDFS user name to use when accessing data.
Policies are fine-grained security controls Data Owners apply when creating data sources. For query-backed data sources, columns can be masked and rows hidden. For object-backed data sources, certain blobs of data can be hidden from certain users and particular fields in the content of the blobs can be masked, if the blob is a known format. The creator of the data source determines the logic behind what is hidden from whom, and the logic can be as complex as desired. Policies can be created through the Immuta workflows, or custom policy handlers can be created to inject complex policies.
Projects are logical groupings of data, members, and discussions based on business goals. Projects can also capture the purpose of the work and audit data access.
See the Chapter 5 - Collaborating, Writing, and Sharing for more information on projects.
Projects contain purposes which can define (or restrict) the scope and usage of data within a project. Purpose restrictions can be defined by the Immuta Governor and/or the project owner(s). The Immuta Governor typically defines Immuta-wide restrictions like "To provide analytics." The project owner typically defines project- or data-specific restrictions such as "Billing," "Marketing," or "Research." Data that is accessed under the provision of a project will incorporate purpose-based auditing. If members join a project but would like to use the information for purposes other than what is specified, they can always create another project for those purposes.
Access to data in a data source can be restricted to data source users acting under a specific purpose within the context of an Immuta project. To see the restricted data, data source subscribers must use the credentials that are associated with a project that contains the relevant purpose.
Query-backed Data Sources
Query-backed data sources are accessible to subscribed users via the Immuta Query Engine and appear as though they are Postgres tables. Immuta policies are put into action either through querying the remote database or filtering on-the-fly.
Note: Query-backed data sources are the only data sources that allow users to gain access to that data via the Immuta Query Engine.
Query-backed Data Sources
Sensitive Data Discovery
This feature can be enabled by an Application Admin to automatically identify and tag columns that contain sensitive data when a new data source is created. The Immuta application is pre-configured with a set of "Discovered" tags that can be used to write Global Policies proactively.
To access the data in any data source, Immuta users must first be subscribed to that data source. The users with the most basic access to a data source are referred to as subscribers. Experts are subscribers with a set of their own special privileges.
Users can subscribe to a data source by requesting access through the Immuta UI or be added to the data source by the Data Owner.
See Managing Users and Groups in a Data Source for details on managing Data Users.
A Subscription Policy refers to how open a data source is to potential subscribers and can have one of four possible restriction levels:
- Anyone: Users will automatically be granted access (Least Restricted).
- Anyone Who Asks (and is Approved): Users will need to request access and be granted permission by a configured approver (Moderately Restricted).
- Users with Specific Groups/Attributes: Only users with the specified groups/attributes will be able to see the data source and subscribe (Moderately Restricted).
- Individual Users You Select: The data source will not appear in search results, Data Owners must manually add/remove users (Most Restricted).
Tags serve several functions in Immuta. They can drive Local or Global Subscription and Data Policies, generate Immuta Reports, and drive search results in the Immuta UI. Governors can create tags or import tags from external catalogs in the Governance UI. Data Owners and Governors can then apply these tags to or remove them from projects, data sources, and specific columns within the data sources.
Time-based restrictions only expose data within a defined time range, which is set by the Data Owner and is based on the event time column of the data source.
User attributes are used to drive data source policies as well as to give users access to certain Immuta features.