Data Owner Introduction to Immuta
Audience: Data Owners
Content Summary: This section introduces Data Owners to the Immuta platform and supports them in sharing their data and enforcing policies on that data through Immuta.
This page introduces Data Owners to Immuta as a whole, while the rest of this Data Owner section includes conceptual pages, which define major features and terms, and tutorial pages, which provide step-by-step instructions for managing data sources, policies, and projects.
- Managing Data Sources Tutorial
- Managing Policies Tutorial
- Managing Projects Tutorial
- Viewing Audit Logs Tutorial
Immuta: A Single Access Point
The Immuta platform solves two of the largest issues facing data-driven organizations: access and governance. In large organizations, it can be difficult, if not impossible, for data scientists to access all the data they need. Once they do get access, it’s often difficult to make sure they use the data in ways that are compliant.
The Immuta platform is meant to solve both problems by providing a single, unified access point for data across an organization and ensuring that all restrictions placed on data are dynamically enforced through the platform. Because the Immuta platform unifies access to data, governing that access is of primary concern. Implemented properly, the Immuta platform can ensure that only the right people see the right data under the right conditions.
Immuta User Roles
User roles in Immuta are fluid and interdependent, and understanding these different roles is essential to effectively sharing, analyzing, and protecting data and maintaining compliance.
Data Owners: For data to be available in the Immuta platform, a Data Owner (the individual or team responsible for the data) must connect that data to Immuta. Once connected, the data is called a data source. When creating a data source, Data Owners can set policies that restrict which users can access it at all, which rows within the data a user can access, and which columns a user can see in full or only in masked form. Data Owners can also make their data source public, which makes it discoverable by all users in the Immuta Web UI, or private, so that only the Data Owner and the data source's assigned subscribers know it exists.
Data Users: Data Users use the data that’s been made available through Immuta. Data Users can browse the Immuta Web UI seeking access to data and easily connect their third-party data science tools to Immuta for all their data science needs.
Project Owners: Project Owners are typically either Data Owners who want to restrict how their data is used through purpose-based restrictions, or Data Users who want to organize their data sources efficiently.
Governors: Governors set Global Policies within Immuta, meaning they can restrict the ways that data is used within Immuta across multiple projects and data sources. Governors can also set purpose-based usage restrictions on projects, which can help limit the ways that data is used within Immuta. By default, Governors can subscribe to data sources; however, this setting can be disabled in the Immuta Configuration, removing the Governor's ability to create or subscribe to data sources. Additionally, users can be a Governor and Admin simultaneously by default, but this setting can also be changed in the Configuration Builder, rendering the Governor and Admin roles mutually exclusive.
Application Admins: Application Admins manage the configuration of Immuta for their organization. These users can configure Immuta to use external identity managers and catalogs, enable or disable data handlers, adjust email and cache settings, generate system API keys, and manage various other advanced settings.
User Admins: Another type of System Administrator is the User Admin, who is able to manage the permissions, attributes, and groups that attach to each user. Permissions are only managed locally within Immuta, but groups and attributes can be managed locally or derived from user management frameworks such as LDAP or Active Directory that are external to Immuta. By default, Admins can subscribe to data sources; however, this setting can be disabled in the Immuta Configuration, removing the Admin's ability to create or subscribe to data sources. Additionally, users can be an Admin and Governor simultaneously by default, but this setting can also be changed in the Configuration Builder, rendering the Admin and Governor roles mutually exclusive.
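The two role settings described above (whether Governors and Admins may create or subscribe to data sources, and whether the Governor and Admin roles are mutually exclusive) can be modeled as simple configuration flags. The sketch below is purely illustrative: the class, flag, and role names are hypothetical and are not Immuta's actual configuration keys, and it treats Application Admins and User Admins alike, which is an assumption.

```python
from dataclasses import dataclass

# Hypothetical flags mirroring the two settings described above; these are
# illustrative names, not Immuta's actual configuration keys.
@dataclass
class RoleSettings:
    governors_can_subscribe: bool = True    # default: Governors may create/subscribe
    admins_can_subscribe: bool = True       # default: Admins may create/subscribe
    governor_admin_exclusive: bool = False  # default: one user may hold both roles

# Hypothetical role labels; both admin types are treated alike here (an assumption).
ADMIN_ROLES = {"APPLICATION_ADMIN", "USER_ADMIN"}

def role_conflicts(roles: set[str], settings: RoleSettings) -> list[str]:
    """Return problems with a user's role assignment (an empty list means valid)."""
    if settings.governor_admin_exclusive and "GOVERNOR" in roles and roles & ADMIN_ROLES:
        return ["Governor and Admin roles are mutually exclusive"]
    return []

def can_create_or_subscribe(roles: set[str], settings: RoleSettings) -> bool:
    """Governors and Admins keep this ability only while their setting allows it."""
    if "GOVERNOR" in roles and not settings.governors_can_subscribe:
        return False
    if roles & ADMIN_ROLES and not settings.admins_can_subscribe:
        return False
    return True
```

With the defaults, a user can be both a Governor and a User Admin and can still subscribe; flipping the flags removes those permissions.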
Data Owner UI
Data Owners have access to the Data Sources page, Projects page, and their User Profile page by default. The sections below describe the features and actions that can be executed from each of these pages.
Data Sources Page
Once users navigate to this page, a list of data sources appears in the center window. Users can navigate between the All Data Sources tab and the My Data Sources tab to filter this list.
Additionally, users may use the Search bar in the upper left corner of the Immuta console to filter search results by data source name, tag, project, connection strings, or columns.
To navigate to a specific data source, users simply click on it from this list, and they will be taken to the Data Source Overview page.
In addition to the data source's health, this page provides detailed information about the data source and is organized by tabs across the top of the page: Overview, Users, Policies, Data Dictionary, Queries, Metrics, Discussions, Contacts, and Lineage. The visibility and appearance of the tabs will vary slightly depending on the type of user accessing the data source.
Overview Tab
This tab includes detailed information about the data source in the left side-panel, including its Description, Technology, Table Name, File Type, the Remote Database and Remote Table, the Parent Server, and the Data Source ID.
In the middle window, the information displayed is divided into three categories:
- Documentation: Data Owners can provide additional information about their data source here. If documentation has not been created, only the data source name will appear.
- Connections: This section provides the user's SQL connection string and information for connecting external analytics tools to the Immuta Query Engine, including PySpark 1.6, PySpark 2.0, Python+Psycopg2, Python+pyodbc, R, and RStudio.
- Tags: This section lists tags associated with the data source.
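Because Python+Psycopg2 is among the listed clients, the Query Engine presumably accepts PostgreSQL-style connections. The sketch below assumes that and builds a libpq connection URI from the pieces a user would copy out of the Connections section and their SQL credentials. The host, port, database name, and `sslmode=require` are placeholders and assumptions, not values taken from Immuta.

```python
from urllib.parse import quote

def query_engine_dsn(host: str, port: int, database: str, user: str, password: str,
                     sslmode: str = "require") -> str:
    """Build a PostgreSQL-style (libpq) connection URI for a SQL client.

    All values are placeholders: copy the real host, port, and database from the
    Connections section and the user name and password from your SQL credentials.
    """
    # Percent-encode credentials so characters like '@' or ':' do not break the URI.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql://{userinfo}@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
```

The resulting string can then be passed to a libpq-based client, for example `psycopg2.connect(dsn)`, which accepts connection URIs.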
Users Tab
This tab lists the users associated with the data source, along with each user's username, access expiration date, and role. It also provides an Actions button that shows each user's subscription history, including why the user needs access to the data and how they plan to use it.
This tab is visible to everyone, but Data Owners and Governors can manage users from this page.
Policies Tab
This tab lists the policies associated with the data source and includes three components:
- Subscribers: Lists who may access the data source. If a Subscription Policy has already been set by a Global Policy, a notification and a Disable button appear at the bottom of this section. Data Owners can click the Disable button to make changes to the Subscription Policy.
- Data Policies: Lists policies that enforce privacy controls on the data source. Data Owners can use this section to manage policies.
- Activity Panel: Records all changes made to policies by Data Owners or Governors, including when the data source was created, the name and type of each policy, when the policy was applied or changed, and whether the policy conflicts with other policies on the data source. Global Policy changes are identified by the Governance icon; all other updates are labeled with the Data Sources icon.
This tab is visible to everyone, but Data Owners and Governors can manage policies from this page.
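To make the effect of data policies concrete, the sketch below applies a row-level restriction and a column mask to query results based on a user's attributes, leaving the stored rows unchanged. It is a conceptual illustration only: the `region` attribute, the sample rows, and hashing as the masking technique are all assumptions, and Immuta defines and enforces its own policies rather than running code like this.

```python
import hashlib

# Hypothetical sample data for illustration only.
ROWS = [
    {"name": "Alice", "ssn": "123-45-6789", "region": "US"},
    {"name": "Bruno", "ssn": "987-65-4321", "region": "EU"},
]

def apply_policies(rows: list[dict], user_attrs: dict, masked_columns: set[str]) -> list[dict]:
    """Return only the rows the user may see, with protected columns masked."""
    allowed_regions = user_attrs.get("region", set())
    visible = []
    for row in rows:
        # Row-level policy: keep rows whose region matches one of the user's attributes.
        if row["region"] not in allowed_regions:
            continue
        out = dict(row)  # copy, so the underlying data is never modified
        for col in masked_columns:
            # Masking policy: replace the value with a truncated SHA-256 hash.
            out[col] = hashlib.sha256(str(row[col]).encode()).hexdigest()[:12]
        visible.append(out)
    return visible
```

A user whose `region` attribute is `{"US"}` sees only Alice's row, with the `ssn` value replaced by a hash, while the source rows remain intact.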
Data Dictionary Tab
The Data Dictionary is a table that details information about each column in a data source. The information within the Data Dictionary is generated automatically when the data source is created if the remote storage technology supports SQL. Otherwise, Data Owners or Experts can manually create Data Dictionaries. The Data Dictionary table includes three columns:
- Name: The name of the column in the table.
- Type: The column's data type, which may be text, integer, decimal, or timestamp.
- Actions: Users may use the buttons in this column to edit, comment, or tag items in the Data Dictionary.
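For SQL-capable storage, a Data Dictionary can be derived from the database's own column metadata. The sketch below uses SQLite's `PRAGMA table_info` as a stand-in for whatever metadata query a given remote database offers. The mapping of declared SQL types onto the four categories above is an assumption made for illustration, not Immuta's actual logic.

```python
import sqlite3

def build_data_dictionary(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return one {name, type} entry per column, read from SQL metadata."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [{"name": name, "type": _normalize(decl)} for _, name, decl, *_ in rows]

def _normalize(declared: str) -> str:
    """Map a declared SQL type onto the four categories listed above (an assumption)."""
    t = declared.upper()
    if "INT" in t:
        return "integer"
    if any(k in t for k in ("DEC", "NUM", "REAL", "FLOA", "DOUB")):
        return "decimal"
    if "TIME" in t or "DATE" in t:
        return "timestamp"
    return "text"
```

For a table created as `claims (id INTEGER, amount DECIMAL(10,2), filed_at TIMESTAMP, note TEXT)`, this yields the types integer, decimal, timestamp, and text.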
Queries Tab
This tab allows users to keep track of their personal queries, share their queries with others, sample public queries, and debug queries.
If users have issues with a query they're running, they can send a request for a query debug, which sends the query plan to the Data Owner.
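A query plan describes the steps a database engine will take to execute a query, which is what makes it useful for debugging. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` purely as a stand-in to show what a plan looks like; it does not model the Immuta Query Engine or how Immuta sends debug requests to the Data Owner.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable steps SQLite would use to run `sql`."""
    # Each EXPLAIN QUERY PLAN row has four fields; the last one is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
```

Running it on `SELECT * FROM t WHERE a = 1` against an unindexed table typically reports a full scan of `t`, a common cause of slow queries; the exact wording varies by SQLite version.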
Metrics Tab
This tab shows data source usage and general statistics. Object-backed data sources also show the total number of records available, and query-backed data sources show the total number of rows.
Discussions Tab
Users can comment on or ask questions about the Data Dictionary columns and definitions, public queries, and the data source in general. Resolved comments and questions remain available for review, preserving a complete history of the knowledge sharing that has occurred on the data source.
Contacts Tab
Contact information for Data Owners is provided for each data source, which allows users to ask questions about accessibility and attributes required for viewing the data.
Lineage Tab
This tab lists all projects, derived data sources, or parent data sources associated with the data source and includes the reason the data source was added to a project, who added the data source to the project or created it, and when the data source was added to the project or created.
Tasks Tab
When users submit a Debug Query or Unmask request in the UI, a Tasks tab appears beside the Lineage tab for the requesting user and the user receiving the request. This tab contains information about the request and allows users to view and manage the tasks listed.
Projects Page
This page lists all public projects that others can join in the All Projects tab, and all projects users own or belong to in the My Projects tab. Additionally, users with the CREATE_PROJECT permission can create a new project from this page.
To view details about a specific project, click the project name.
Overview Tab
After navigating to a specific project from the Projects page, the following information about the project is visible to users on the Overview tab:
- Project Details: Information about the project appears in the sidebar on the left of the Overview tab. Details include when the project was created, the purposes associated with the project, a description of the project, the project ID, and credentials.
- Documentation: If Project Owners choose, they may add documentation about their project, which will appear in this section to viewers. If no additional documentation about the project is added, only the project name will appear here.
- Data Sources: The data sources associated with the project are listed here. Users can click individual data sources to view why each was added to the project and to navigate to the data source itself. Project Owners can also manage their project data sources in this section.
- Tags: Tags associated with the project are listed here. Project Owners can manage tags from this section.
- Activity Panel: All activity associated with the project is listed in the sidebar on the right of the screen. Information recorded here includes who added data sources and tags to the project, which members have been added to or removed from the project, and policy updates to the project.
Members Tab
This tab lists project members, their contact information and role, and when their membership expires. From this tab, Project Owners can add and remove members from the project.
Policies Tab
This tab allows Project Owners to control who may request access to their project and whether the project is visible at all to users who are not project members.
The Project Equalization section enables Project Owners to level all members' access to data so that data appears the same to all project members, regardless of their individual attributes or groups.
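This page does not say how the common access level is chosen. The sketch below assumes it is the set of groups and attributes that every member shares (their intersection), which is one plausible reading of "level all members' access." If every member's queries are evaluated against this shared set rather than against their own entitlements, policies resolve identically for everyone, so the data looks the same to all members. The entitlement strings are hypothetical.

```python
def equalized_entitlements(members: dict[str, set[str]]) -> set[str]:
    """Return the groups/attributes that every project member holds (an assumption)."""
    if not members:
        return set()
    # Intersection: only entitlements common to all members survive.
    return set.intersection(*members.values())
```

For example, if one member holds `group:analysts` and `attr:region=US` while another holds only `group:analysts`, both would query as if they held only `group:analysts`.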
The Subscribers section allows Project Owners to make their project open to anyone, to users who request and are granted access, to users with specified groups and attributes, or only to users the Project Owners manually add.
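The four subscription options above amount to a small decision rule. The sketch below encodes them under two stated assumptions: the group/attribute option admits a user who holds any one of the required groups (rather than all of them), and a Project Owner can always add a member manually regardless of the option. The enum and parameter names are hypothetical.

```python
from enum import Enum

class SubscriptionMode(Enum):  # hypothetical names for the four options above
    ANYONE = "anyone"
    REQUEST_APPROVAL = "request"
    GROUPS_ATTRIBUTES = "groups_attributes"
    MANUAL = "manual"

def may_join(mode: SubscriptionMode, user_groups: set[str], required_groups: set[str],
             approved: bool, added_by_owner: bool) -> bool:
    """Decide whether a user may join a project under the given subscription mode."""
    if added_by_owner:  # assumption: manual addition by an owner always works
        return True
    if mode is SubscriptionMode.ANYONE:
        return True
    if mode is SubscriptionMode.REQUEST_APPROVAL:
        return approved  # the user requested access and a Project Owner granted it
    if mode is SubscriptionMode.GROUPS_ATTRIBUTES:
        return bool(user_groups & required_groups)  # assumption: any one match suffices
    return False  # MANUAL: only users the owners add themselves
```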
Discussions Tab
Project members can view, create, reply to, delete, and resolve discussion threads in this tab.
Data Sources Tab
A list of data sources within the project appears in this tab. Project members can view, comment on, and add data sources to the project here as well. Any project member can add data sources to the project, unless the Allow Masked Joins or Project Equalization features are enabled; in those instances, only Project Owners can add data sources to the project.
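The rule above for who may add data sources to a project reduces to a short check. This is a minimal sketch of that rule as stated; the function and parameter names are hypothetical.

```python
def can_add_data_source(is_owner: bool, is_member: bool,
                        allow_masked_joins: bool, project_equalization: bool) -> bool:
    """Apply the rule above: either feature restricts adding sources to Project Owners."""
    if allow_masked_joins or project_equalization:
        return is_owner
    # Otherwise any project member (owners are members too) may add data sources.
    return is_member or is_owner
```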
Data Fingerprints Tab
Data fingerprints capture summary statistics of a data source when a data source is created, when a policy is applied or changed, or when a user manually updates the data source fingerprint from the Policies tab.
The Data Fingerprints tab allows users to create fingerprint versions of the data sources in a project, which allows users to see how the data changes over time and how policies affect the data. In order for this tab to be available, Project Equalization must be enabled.
After a fingerprint version is created, users can view the changes newly enforced policies have on the data.
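A fingerprint is described above only as a set of summary statistics, so the statistics chosen here (non-null count, distinct count, and mean for numeric columns) are assumptions. The sketch computes one fingerprint per version and diffs two versions, which shows how a newly enforced policy, modeled here as a mask that nulls out a column, becomes visible as a change in the statistics.

```python
import statistics

def fingerprint(rows: list[dict]) -> dict[str, dict]:
    """Summarize each column: non-null count, distinct values, and mean if numeric."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        stats = {"count": len(values), "distinct": len(set(values))}
        if values and all(isinstance(v, (int, float)) for v in values):
            stats["mean"] = statistics.fmean(values)
        summary[col] = stats
    return summary

def diff_fingerprints(old: dict, new: dict) -> dict[str, dict]:
    """Return, per column, each statistic whose value changed as (before, after)."""
    changes = {}
    for col in old.keys() | new.keys():
        before, after = old.get(col, {}), new.get(col, {})
        changed = {k: (before.get(k), after.get(k))
                   for k in before.keys() | after.keys() if before.get(k) != after.get(k)}
        if changed:
            changes[col] = changed
    return changes
```

Comparing a fingerprint of raw rows with one taken after a masking policy nulls the `zip` column leaves `age` unchanged, while `zip` shows its count and distinct values dropping to zero.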
Query Editor Page
The Query Editor allows users to preview data and to write and execute queries directly in the Immuta UI against any data sources they are subscribed to. Data Owners can also use it to examine how their policies affect the underlying data.
Table List and Schema View
This panel contains a list of tables the user is subscribed to, and this list will automatically update when users switch their current project. Clicking a table in the list displays the schema view, which shows all columns with their respective data types.
Query Editor Panel
Users can enter, modify, and execute their own queries in this panel. After users click Run Query, results will appear in the Query Results panel.
Query Results View
This panel displays the data returned by the query. Table columns can be resized or rearranged by clicking and dragging, and results can be filtered. The currently displayed results can also be exported to a .csv file (limited to 1,000 rows).
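The export cap can be pictured as writing a header row followed by at most 1,000 result rows. The sketch below does this with Python's standard `csv` module; it illustrates the stated limit and is not Immuta's export code. The function name and its inputs are hypothetical.

```python
import csv
import io

EXPORT_ROW_LIMIT = 1000  # the export limit stated above

def export_csv(columns: list[str], rows, limit: int = EXPORT_ROW_LIMIT) -> str:
    """Serialize a header row plus at most `limit` result rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    for i, row in enumerate(rows):
        if i >= limit:
            break  # stop at the cap; any remaining rows are not exported
        writer.writerow(row)
    return buf.getvalue()
```

Because `rows` can be any iterable, including a generator, results beyond the cap are never read.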
User Profile Page
The SQL Credentials tab and the API Key tab display the following information about the user in a pane on the left of the page. With the exception of the HDFS Principal, this information may be edited by the user at any time.
- Name: The user's full name.
- Email: The user's email address.
- Position: The user's current position.
- About: A short description about the user.
- Location: The user's work location.
- Organization: The organization that a user is associated with.
- Phone Number: The user's phone number.
- HDFS Principal: The HDFS principal linked to the user's Immuta account, which provides access to Immuta's HDFS Access Pattern. Only an Administrator can set this field, so users cannot change or remove their own HDFS principal; however, they can associate it with a project purpose to access data in HDFS under that purpose.
- Receive System Notifications as Emails: The user can opt to receive email notifications. To create and manage webhooks that notify users or other systems of activity in Immuta, follow our Managing Webhooks instructions in the API Reference Guides section.
SQL Credentials Tab
In order to connect to the Query Engine using the SQL Access Pattern, each user must create SQL credentials. SQL credentials can be accessed by clicking the SQL Credentials tab in the center pane.
For more information on SQL credentials, see Managing SQL Accounts.
API Key Tab
API keys provide a secure way to communicate with the Immuta REST API without requiring a username and password. Each key can be revoked at any time, and new keys can be generated. Once a key is revoked, it can no longer be used to access the REST API, and any tool that used the revoked key must be re-authenticated with a new key.
From the API Key tab, users can generate and revoke API keys.
An API key can be linked to a project. Linking an API key to a project limits that key's visibility to the data sources associated with that project.
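The key lifecycle described above (generate, revoke, optionally scope to a project) can be modeled with a small in-memory store. This is a hypothetical model for illustration only. It does not call or describe the real Immuta REST API, and every class and method name in it is an assumption.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass
class ApiKey:
    project_id: Optional[str] = None  # None means the key is not linked to a project
    revoked: bool = False

class ApiKeyStore:
    """Hypothetical in-memory model of API key generation, revocation, and scoping."""

    def __init__(self) -> None:
        self._keys: dict[str, ApiKey] = {}

    def generate(self, project_id: Optional[str] = None) -> str:
        key = secrets.token_urlsafe(32)  # cryptographically random key string
        self._keys[key] = ApiKey(project_id)
        return key

    def revoke(self, key: str) -> None:
        self._keys[key].revoked = True

    def visible_sources(self, key: str, user_sources: dict[str, set[str]]) -> set[str]:
        """Return the data sources reachable with `key`.

        `user_sources` maps each data source the user can access to the projects containing it.
        """
        record = self._keys.get(key)
        if record is None or record.revoked:
            raise PermissionError("unknown or revoked API key")
        if record.project_id is None:
            return set(user_sources)
        return {src for src, projects in user_sources.items() if record.project_id in projects}
```

An unscoped key sees every data source the user can access, while a project-linked key sees only that project's data sources; after revocation, any further use raises an error, which mirrors the need to re-authenticate tools with a new key.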
Requests Tab
The Requests tab allows users to view and manage all pending access requests directly from their profile page.