A data product is a valuable output derived from data. It is designed to provide insights, support decision-making, and drive business value. Data products are created to meet business needs or solve problems within a domain or organization. These products can take various forms. In most cases, it is a collection of tables and views, which is what is supported by Immuta.
The Monitor and secure sensitive data platform query activity use case covers how to register your data metadata, but it's worth discussing in more depth here because you have multiple data product owners.
As a first step, all the tables/views on the data platform(s) must be onboarded in Immuta. With the no default subscription policy setting (which is the Immuta default), onboarding objects in Immuta will not make any changes to existing accesses already in place.
Immuta only starts taking control when the first policy has been applied. We suggest that one team registers all objects from your data platform(s). Alternatively, different domain owners can register their own domain data. It is recommended to use a system account to do the data registration.
Regardless of which user registered the data, it's critical that schema monitoring is enabled to ensure future tables and views are discovered by Immuta and automatically registered to the proper domain.
Be aware that registering your data product metadata with Immuta does not make it available to users. It simply makes Immuta aware of its existence. To allow users to see it and subscribe, you would manage policies on it, discussed in the Applying federated governance guide next.
Any team member that has the necessary user credentials in the underlying platform to read the data.
Data sources can be made visible (discoverable) to users in the Immuta UI. This allows users to read the metadata associated with data sources (e.g., tags, descriptions, data dictionary, and contacts) and permits users to request access to the data itself.
This allows customers to use Immuta as a data catalog and marketplace. In case an organization uses an external marketplace, Immuta easily integrates with it via the API to bring in metadata or manage access controls for users that requested access via external marketplace.
While we have been talking about data mesh, using Immuta as a marketplace for a data shopping experience is not limited to data mesh use cases - it can be used this way for any use case.
Immuta integrates with your identity manager in order to enforce controls in your data platform(s) - you likely configured this when completing the Monitor and secure sensitive data platform query activity use case. This setup has another point, though: it also allows all your users to authenticate into Immuta with their normal credential to leverage it for this "shopping" experience.
Any action taken in Immuta will automatically be reflected in the data platform via Immuta's integration.
Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups. Instead of centralizing your data governance and giving users too much governance over all your data, you delegate and control their power over data sources they own by granting them permission within domains in Immuta.
Domains are typically defined by your organization’s data governance board and closely aligned with business departments or units, such as Sales, Customer Service, or Finance. Domains can also be defined by value stream, such as Patient Health Records, Fraud Detection, or Quality Control.
The task of setting up domains in Immuta can be done by any member of the team with the GOVERNANCE
permission and assigning domain permissions can be done by any member of the team with the USER_ADMIN
permission.
When delegating control to your data product owners, it's possible to further segment that control within domains. If you want to give the user control to build policies on the data sources (the data products) in the domain, that is managed by giving them the Manage Policies
permission in the domain.
You may have read the Automate data access control decisions use case already. If so you are aware of the two paths you must choose between: orchestrated-RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.
This is because you want your data product owners to tag data with facts - what they have intimate knowledge of because they built the data product - and not have to be knowledgeable about all policies across your organization. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts, what is in the column, which is why ABAC makes much more sense.
Understanding that, read the automate data access control decisions use case's Managing data metadata guide.
It is important to distinguish between tag definition and tag application. While tag definition (e.g., a tag called “Business Unit” with the values “Finance”, “Marketing”, “Sales”) should be strongly governed to guarantee consistency and coherence, tag application can be fully decentralized, meaning every domain or data owner can apply tag values (from the centrally governed list) to their data. There needs to be a process in place for data owners to request the definition of a new tag in case they identify any gaps.
It is important to leverage Immuta Discover's sensitive data discovery (SDD) to monitor your data products. This allows you to uncover if and when sensitive data may be leaked unknowingly by a data product and mitigate that leak quickly.
The Monitor and secure sensitive data platform query activity use case covers this in great detail and is highly recommended for a data mesh environment.
This guide is intended for users who wish to delegate control of data products within a data mesh without breaking their organization's security and compliance standards.
It's best to start with the Monitor and secure sensitive data platform query activity use case prior to this use case because it takes you through the basic configuration of Immuta.
This guide is designed for those interested in understanding how Immuta can effectively assist organizations in adopting core data mesh principles. By leveraging this document, readers will be empowered to implement the essential procedures for integrating Immuta within their organization's data mesh framework. It is recommended to read the ebook Data Security for Data Mesh Architectures before starting with this use case.
Even if you are not interested in data mesh, this use case will explain how you can create a self-serve "data shopping experience" in Immuta.
Follow these steps to learn more about and start using Immuta Detect and Secure to apply federated governance for data mesh and self-serve data access:
Complete the Monitor and secure sensitive data platform query activity use case.
Opt to review the Automate data access control decisions and Compliantly open more sensitive data for ML and analytics use cases.
Define domains. This step is critical in order to delegate policy controls over certain segments of your data.
Manage data products to allow your data product creators to compliantly release data products for consumption.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful to ensure federated governance.
Work with your data product creators and global compliance users to Apply federated governance. Enforce global standards across all data products while empowering data product owners to expand on those policies.
Allow your data consumers to self-serve access data through the Immuta data portal by Discovering and subscribing to data products.
In the distributed domains of a data mesh architecture, data governance and access control must be applied vertically (locally) within specific domains or data products and horizontally (globally). Global policies should be authored and applied in line with the ecosystem’s most generic and all-encompassing principles, regardless of the data’s domain (e.g., mask all PII data). Localized domain- or product-level policies should be fine-grained and applicable to only context-specific purposes or use cases (e.g., only show rows in the Sales table where the country value matches the user's office location). In Immuta we distinguish between subscription policies and data policies.
To access a data source, Immuta users must first be subscribed to that data source. A subscription policy determines who can request access to a data source or group of data sources. Alternatively, a subscription policy can also be configured to automatically provide access based on a user’s profile, such as being part of a certain group or having a particular attribute.
It is possible to build Immuta subscription policies across all data products (horizontal), as shown in the diagram above, and then have those policies merged with additional subscription policies authored at the domain level (vertical). When this occurs, those subscription policies are merged as prescribed by the two or more policies being merged.
Whether the requirements for access are merged with an AND or an OR is prescribed by this setting in the policy builder for each of the individual policies:
Always Required = AND
Share Responsibility = OR
You can find a specific example of subscription policy merging in the Automate data access control decisions use case, but in the case of data mesh, the policies are authored by completely separate users - one user at the global (horizontal) level with GOVERNANCE
permission and the second at the domain (vertical) level with the Manage policies
permission (in domain).
When building subscription policies, it can impact what a user can discover and, if desired, "put in their shopping cart" to use.
We'll discuss the "shopping" experience in the next guide, but how you are able to manage this in subscription policies is found here:
Allow Data Source Discovery: Normally, if a user does not meet the subscription policy, that data source is hidden from them in the Immuta UI. Should you check this option when building your subscription policy, the inverse is true: anyone can see this data source. This is important if you want users to understand if the data product exists, even if they don't have access.
Require Manual Subscription: Even if the user does meet the policy, instead of automatically subscribing them, they would have to discover and subscribe themselves. If they meet the policy, they will automatically be subscribed with no intervention. This is important if you want the users to maintain the list of data products they see in the data platform rather than all data products they have access to.
Request Approval to Access: This allows the user to request access, even if they don't meet the policy. Rules can determine what user manually overrides the policy to let them in.
Once a user is subscribed to a data source via Immuta or has a pre-existing direct access on the underlying data platform, the data policies that are applied to that data source determine what data the user sees. Data policy types include masking, row-level, and other privacy-enhancing techniques.
An exemplary three-step approach to managing data policies would be
Create global data policies on the tags resulting from the out-of-the-box sensitive data discovery.
Develop business specific frameworks and protection rules and controlled them via data policies.
Update the global data policies as new sensitive data is potentially released by data products and discovered using Immuta Detect.
Data mesh is a higher level use case that pulls from concepts learned across the other use cases:
Monitor and secure sensitive data platform query activity: How can you proactively monitor for data products that are leaking sensitive data?
Automate data access control decisions: How do you manage table access across your data products?
Compliantly open more sensitive data for ML and analytics: How do you manage granular access and mitigate concerns about sensitive data leaks in data products?
Our recommended strategy is that you decide if you want to automatically subscribe users to data products or if you want a workflow where they discover and subscribe to data products. In either case, you can delegate some policy ownership to the data product creators both by allowing them to tag their data with facts that drive global data policies and allowing domain-specific subscription policy authoring which merges with global subscription policies.