Select your use case
Detect is one of the Immuta flagship modules. Immuta Detect continually monitors your data environment to help answer questions about your most active data users, the most accessed data, and the events happening within your data environment. This understanding can help drive prioritization of where to place access control policies in Immuta’s other flagship module, Immuta Secure, which is why you should start here.
Below are two use cases to get started with Detect. They contain the basic Immuta configuration steps that you need to accomplish before using Immuta Secure. It is highly recommended that you follow one of these use cases, regardless of Detect support.
Currently, Immuta Detect only supports filtering by tag and showing sensitivity with Snowflake, so you are presented with two paths to choose from:
If you use Snowflake: Monitor and secure sensitive data platform query activity. This will guide you through the basics of Immuta configuration and culminate with findings in Detect.
If you don't use Snowflake: General Immuta configuration. This will guide you through the required Immuta configurations in order to move on to Immuta Secure.
In order to take advantage of all the capabilities of Immuta, you must make Immuta aware of your data metadata. This is done by registering your data with Immuta as data sources. It’s important to remember that Immuta is not reading your actual data at all; it is simply discovering your information schemas and pulling that information back as the foundation for everything else.
This section offers the best practices when onboarding data sources into Immuta.
If you have an external data catalog, like Collibra or Alation, configure the catalog integration first; then register your data in Immuta. This process will automatically tag your data with the external catalog tags as you register it.
Find more on this topic in the guide.
Use Immuta's no default subscription policy setting to onboard metadata without affecting your users' access. This means you onboard all metadata in Immuta without any impact on current accesses, which gives you time to fully convert your operations to Immuta without causing unnecessary data downtime. Immuta will only take control when the first policies are applied. Because of this, register all tables.
While it can be tempting to start small and register only the pieces of data that you intend to protect, you must remember that Immuta is not just about access control. It’s important to register your data metadata so that Immuta can also track activity and understand where that sensitive data lies (with Immuta Detect). In other words, Immuta can’t tell you where you have problems unless you first tell it to look at your metadata.
Without the no default subscription policy, Immuta will set each data source's subscription policy to the most restrictive option which automatically locks data down during onboarding. To unlock the data and give your users access again, new subscription policies must be set.
If you are delegating the registration and control of data, then please read our Data mesh use case for more information.
Use the /api/v2/data endpoint to register a schema; then use schema monitoring to find new data sources and automatically register them.
One of the greatest benefits of a modern data platform is that you can manage all your data transformations at the data tier. This means that data is constantly changing in the data platform, which may result in the need for access control changes as well. This is why it is critical that you enable schema monitoring and column detection when registering metadata with Immuta. This will allow Immuta to constantly monitor and update for these changes.
It’s also important to understand that many data engineering tools make changes by destructively recreating tables and views, which results in all policies being dropped in the data platform. This is actually a good thing, because this gives Immuta a chance to update the access as the changes are found (policy uptime) while the only user that can see the data being recreated is the creator of that change (data downtime for all other users). This is why schema monitoring and column detection are so critical.
This guide is for users who wish to understand their data estate and where there may be security gaps or non-compliant user query activity that needs to be addressed. It also contains details for configuring Immuta that must be accomplished before moving on to the Secure your data use cases.
This use case is tailored to quickly get you monitoring queries in your data platform and understanding where you may have security gaps using Immuta Discover and Immuta Detect. If you are not using Snowflake, instead move to the General Immuta configuration use case because filtering by tags and sensitivity in Immuta Detect is currently only available on Snowflake.
As part of this use case, you will learn special considerations and configurations for setting up Immuta for Immuta Detect. Upon completion, you will understand your existing security gaps, which will help guide your Immuta Secure journey.
Follow these steps to configure Immuta and start using Detect:
Configure your users in Immuta, using the user identity best practices in order to review and summarize user activity and plan your first policy.
Read the native integration architecture overview and connect Immuta to your database. Consider the Snowflake roles best practices.
Register data sources in order to review and summarize data activity and plan your first policy.
Start using Immuta Detect. To get the most out of it, consider populating sensitivity using Automate entity and sensitivity discovery (SDD) and then configure Detect with SDD.
This guide outlines best practices for managing user identities in Immuta with your identity manager, such as Active Directory and Okta.
Reusing information you have already captured today is a good practice. A lot of information about users is in your identity manager platform and can be used in Immuta for user onboarding and policies.
All users protected by Immuta must be registered in Immuta, even if those users never log in to Immuta.
SAML is commonly used as a single sign-on mechanism for users to log in to Immuta. This means you can use your organization's SSO, which complies with your security standards.
Every user that will be protected by Immuta needs an account on the platform to enforce policy, regardless of whether they log in to Immuta. SCIM should be used to provision users from your identity manager platforms to Immuta automatically. The advantage here is that end-users do not need to log in to Immuta to create their accounts, and updates in your identity manager will be automatically reflected in Immuta, hence updating the access in your platforms.
Details on how to configure your individual identity manager's protocols can be found here:
There are several different combinations of supported protocol configurations, so consider those as you plan your user synchronization.
In Immuta, permissions control what actions a user is allowed to take through the API and UI. The different permissions can be found in the Immuta permissions guide.
We recommend using identity manager groups to manage permissions. When you configure the identity manager integration, you can enable group permissions. This allows you to control the permissions via identity manager groups and use the group assignment and approval process currently in place.
Immuta is not just a location to define your policy logic; Immuta also enforces that logic in your data platform. How that occurs varies based on each data platform, but the overall architecture remains consistent and follows the NIST Zero Trust framework. The below diagram describes the recommended architecture from NIST:
Immuta lives in the middle control plane. To do this, Immuta knows details about the subjects and enterprise resources, acts as the policy decision point through policies administered by policy administrators, and makes real-time policy decisions using the internal Immuta policy engine.
Lastly, and of importance to how Immuta Secure functions, Immuta also enables the policy enforcement point by administering the policies natively in your data platform in a way that can react to policy changes and live queries.
To use Immuta, you must configure the Immuta native integration, which will require some level of privileged access to administer policies in your data platform, depending on your data platform and how the Immuta integration works. Please refer to the Snowflake roles best practices before configuring the native integration.
Intermingling your pre-existing roles in Snowflake with Immuta can be confusing at first. This guide outlines some best practices on how to think about roles in each platform.
Roles are crucial in Snowflake for organizing and controlling access to data, platform permissions, and data warehouses. Immuta also leverages Snowflake roles to grant users permission to read data based on subscription policies.
Users who consume data (directly in Snowflake or through other applications) need roles to access objects. But roles are also used to control write access, Snowflake warehouses, and Snowflake permissions through system-defined roles.
To manage this at scale, Immuta recommends taking a 4-layer approach, where you separate the different permissions into different roles:
Roles for read access (Immuta managed)
Roles for write access (optional, soon supported by Immuta)
Roles for warehouse, internal billing
Roles for Snowflake permissions (optional)
Read access is managed by Immuta. By using subscription policies, data access can be controlled to the table level. Attribute-based table GRANTS help you scale compared to RBAC, where access control is typically done on a schema or database level.
Since Immuta leverages Snowflake roles, you can still use existing roles in Snowflake. This means you can gradually migrate to an Immuta-protected Snowflake.
Write access is typically granted on a schema or database. This makes it easy to manage in Snowflake through manual grants. We recommend creating roles that give insert, update, and delete permissions to a specific schema or database and attach this role to a user. This attachment can be done manually or using your identity manager groups. (See the Snowflake documentation for details.) Note that Immuta is working towards supporting write policies, so this will not need to be separately managed for long.
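As a minimal sketch of such a writer role in Snowflake SQL (all database, schema, role, and user names here are hypothetical; adjust to your environment):

```sql
-- Hypothetical names throughout; one writer role per schema.
CREATE ROLE IF NOT EXISTS sales_writer;
GRANT USAGE ON DATABASE analytics TO ROLE sales_writer;
GRANT USAGE ON SCHEMA analytics.sales TO ROLE sales_writer;
-- Write privileges on current and future tables in the schema:
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA analytics.sales TO ROLE sales_writer;
GRANT INSERT, UPDATE, DELETE ON FUTURE TABLES IN SCHEMA analytics.sales TO ROLE sales_writer;
-- Attach manually, or drive this grant from an identity manager group:
GRANT ROLE sales_writer TO USER alice;
```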
Warehouses are granted to users to give them access to computing resources. Since this is directly tied to Snowflake’s consumption model, warehouses are typically linked to cost centers for (internal) billing purposes. Immuta recommends creating a role/warehouse per team/domain/cost center and granting this warehouse role to users using identity manager groups.
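A per-team warehouse role might look like the following sketch (warehouse, role, and user names are hypothetical):

```sql
-- Hypothetical names; one warehouse and role per team/domain/cost center.
CREATE WAREHOUSE IF NOT EXISTS marketing_wh WITH WAREHOUSE_SIZE = 'XSMALL';
CREATE ROLE IF NOT EXISTS marketing_wh_role;
GRANT USAGE ON WAREHOUSE marketing_wh TO ROLE marketing_wh_role;
-- Grant to users, ideally via your identity manager's group mapping:
GRANT ROLE marketing_wh_role TO USER bob;
```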
Snowflake permissions are granted through system-defined roles like ACCOUNTADMIN or SECURITYADMIN. These are high-privilege roles that are only granted to administrators. This can be done manually or using AD groups.
Snowflake allows users to select a specific role, but you can also use all granted roles simultaneously. Immuta recommends using all roles, since that lets the separated read, write, warehouse, and permission roles apply together.
This feature is called ‘secondary roles’ and can be enabled using the following command in Snowflake: USE SECONDARY ROLES ALL
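For example, secondary roles can be enabled per session or set as a user's default (the user name below is hypothetical):

```sql
-- Combine all granted roles for the current session:
USE SECONDARY ROLES ALL;
-- Or make this the default so every new session combines the user's roles:
ALTER USER alice SET DEFAULT_SECONDARY_ROLES = ('ALL');
```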
Alternatively, you could create personal roles and grant the warehouse role, the Immuta-managed read role, and possibly the Snowflake permission role and write role to each personal role.
Immuta has two types of service accounts to connect to Snowflake:
Policy role: This role gives Immuta the power to create and apply policy. Immuta can create this policy role automatically, or you can run the provided bootstrap script manually to create the policy role.
Data ownership role: This role is used to register data sources. A service account/role is recommended so that when the user moves or leaves the organization, Immuta will still have the proper credentials to connect to Snowflake. You can follow one of the two best practices:
A central role for registration (recommended): It is recommended that you create a service role/user with USAGE permissions for all objects in Snowflake. This allows Immuta to register all the objects from Snowflake, populate the Immuta catalog, and scan the objects for sensitive data using Immuta Discover. Immuta will not apply policy directly by default, so no existing access will be impacted. (A sample grant sketch follows after this list.)
A role per team/domain (alternative): Alternatively, if you cannot create a role with USAGE permissions for all objects, you can allow the different domains or teams in the organization to use a service user/role scoped to their data to register data sources. This delegates metadata registration, aligns well with data mesh type use cases, and means every team is responsible for registering their data sets in Immuta.
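As referenced above, here is a minimal sketch of a central registration role in Snowflake SQL. All names are hypothetical, and whether Discover additionally needs SELECT to sample data is an assumption to verify against the Immuta documentation for your deployment:

```sql
-- Hypothetical names; repeat per database you want registered.
CREATE ROLE IF NOT EXISTS immuta_registration;
GRANT USAGE ON DATABASE analytics TO ROLE immuta_registration;
GRANT USAGE ON ALL SCHEMAS IN DATABASE analytics TO ROLE immuta_registration;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE analytics TO ROLE immuta_registration;
-- Assumption: sensitive data discovery samples rows, which requires SELECT.
GRANT SELECT ON ALL TABLES IN DATABASE analytics TO ROLE immuta_registration;
GRANT SELECT ON FUTURE TABLES IN DATABASE analytics TO ROLE immuta_registration;
GRANT ROLE immuta_registration TO USER immuta_service_user;
```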
Requirements:
Snowflake Enterprise Edition or higher
Native SDD and classification frameworks enabled in Immuta. If you do not know if they are enabled, collaborate with your Immuta representative to turn on native SDD and classification frameworks in your Immuta tenant.
Users and Data Sources have been registered in Immuta:
Snowflake tables registered as Immuta data sources
Snowflake users registered in Immuta
Currently, Detect only supports filtering by tag and showing sensitivity of audit records for Snowflake.
This onboarding process is recommended for organizations that have not tagged any sensitive data yet. Immuta will identify, classify, and tag your data. After you are fully onboarded, you will see Detect dashboards with information on your organization's data use and data sensitivity, and the Discover data inventory dashboard will show details about the data that was scanned.
Enable sensitive data discovery (SDD): SDD will sample and tag your data based on the sensitive data detected. These tags are necessary for the classification framework tags in step 2 to be applied.
Activate Immuta's built-in frameworks: Once you activate the Data Security Framework and the Risk Assessment Framework, they will tag your data with classification tags. Specific classification tags contain the metadata required to assign your data sensitivity levels.
Adjust or accept entity and classification tags: After SDD and classification frameworks have been enabled and run, it may be necessary to adjust the output tags based on your organization's data, security, and compliance needs.
Grant permissions: Grant the appropriate users the AUDIT permission to view Immuta Detect dashboards.
View Immuta Detect: Once all tags are correctly applied, the Detect dashboards will reflect accurate audit information. Navigate through Immuta Detect and explore the dashboards that visualize the sensitive data in your data environment.
After you are happy with the Detect dashboards on the select data sources you enabled, you can integrate Detect with more of your data environment.
Enable SDD for all data sources: If you already had SDD enabled before starting Detect onboarding, skip this step. Once the SDD tags and classification tags applied to your selected data sources look correct, enable SDD for all data sources. This will add entity and classification tags to the rest of the data sources within your environment. You can choose to run SDD on all data sources at once, or run it on another subset to gradually onboard the rest of your tables.
Within the Immuta product ecosystem, Immuta Detect is responsible for surfacing and indexing a wide range of security-related events, making it a rich source of data security posture insights.
In a typical deployment, Immuta Detect efficiently surfaces and processes a vast number of data security events. While these events all have security relevance, it may be challenging to understand their potential impacts without manual investigation. At the same time, the sheer volume of events typically greatly exceeds what a team can manually explore.
Enter Immuta Discover: Immuta’s data discovery and security analysis engine can identify, categorize, and classify data. Immuta Discover analyzes data available within the operational context of an event in conjunction with applicable legal, regulatory, compliance, and security frameworks to make deep inferences about the status of the data. For example, in a medical context, Immuta Discover can understand the difference between anonymized and identified medical data.
With additional classification metadata powered by Immuta Discover, Immuta Detect analyzes data security events for sensitivity, ensuring that highly significant events remain highly visible. In the context of the previous example, Immuta Detect can detect and flag the accidental identification of anonymized medical data.
With Discover, Immuta Detect can provide insightful oversight of who accesses sensitive data, where it is stored, and how it is used, enabling
Rapid and exact compliance monitoring and assessment
Insights into data usage patterns for setting data access policy
Simplified and expedient audit responses
Context-aware analysis of data flows as seen through the lens of security or regulatory compliance frameworks
Immuta Discover works in three phases: identification, categorization, and classification.
In the first phase, data is identified by its kind – for example, a name or an age. This identification can be manually performed, externally provided, or automatically determined by Immuta Discover through column-level analysis. This is commonly termed entity identification.
In the second phase, data is categorized in the context where it appears, subject to any active data compliance or security frameworks. For example, a record occurring in a clinical context containing both a name and individual health entities is Protected Health Information under HIPAA.
Though entirely customizable, for this purpose Immuta provides a default framework known as the Immuta Data Security Framework (Immuta DSF). Immuta DSF provides fine-grained categorization into a consistent set of security and compliance concepts, including whether or not a record pertains to an individual, the composition and kinds of any identifiers present, the subject matter of the data, and whether it belongs to any commonly controlled data categories. The rules of the framework use the entities found in the table in phase 1 to drive how the data is categorized.
The categorization provided by the Immuta DSF may be used directly. Still, it is best leveraged as a starting point for purpose-built compliance frameworks implementing organization-specific compliance categories or other relevant high-level regulatory and compliance frameworks, such as those for categorizing data into categories defined under CCPA, GDPR, GLBA, HIPAA, etc.
Bottom line: think of categorization as a way to apply higher-level categories to the fine-grained entities discovered in phase 1 through rules you can customize. These categories are presented as tags in Immuta, just like the entities in phase 1, and thus can be used for Immuta Secure policies.
In the third and final phase, data is classified according to its sensitivity level (e.g., Customer Financial Data is Highly Sensitive). Again, Immuta supplies sensitivity classification defaults based on standard industry practice. Just like how categories are built from phase 1 entities, classification builds on the phase 2 categories. Customers are free to customize this classification under their respective views. These classifications are key to surfacing sensitive queries in Detect based on your definition of sensitive.
There are good reasons to automate data discovery and analysis with Immuta Discover:
It formalizes the entire process, producing a coherent set of classification rules.
It makes it possible to automatically and uniformly scale compliance to new data sources.
It enables Immuta Detect to automatically detect additional threats, such as unauthorized or attempted access to sensitive data, and supports soft enforcement of organizational data access policies (for example, that access to personal information, direct identifiers, or login credentials must be masked).
It enables faster access to data by removing the manual effort of tagging and classifying new tables and columns.
Please be aware that:
Some customization may be necessary. Although Immuta's sensitive data discovery detects over 60 types of sensitive data, only some data elements may be relevant to your organization, and unique sensitive data elements may not be covered out of the box. In these cases, it is possible to create new sensitive data discovery rules or adjust the sensitivity of existing rules through testing to ensure data is properly discovered and tagged.
New global templates should be created to find only entities that are relevant to the organization. This will ensure extraneous tags are not added to data elements.
Some customers may already have an existing data catalog tagging data; Immuta’s sensitive data discovery can work in combination with the data catalog.
Because data environments are not static, it is imperative that data tagging is automatically performed with new or changed data so that policies can be enabled in real-time, lowering the risk of data leaks.
Many organizations have invested in an enterprise data catalog as part of their data governance programs. Entity tags from the data catalogs will be pulled into Immuta in a one-way sync because the catalog is the system of record for entity tags. The tags pulled in from the data catalog can later be mapped to categories in the same way that entities automatically discovered in phase 1 are mapped to categories. This in turn will associate the appropriate sensitivity via classification to the external tags.
For a concrete example, consider a scenario where the Collibra catalog has tags for Longitude and Latitude. A rule can assign Immuta DSF.Longitude to any column tagged Collibra.Location.Longitude. These rules appear in the rules array in the framework definition, as sketched below.
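A sketch of what such a rule might look like in a framework definition follows. The field names here are illustrative assumptions, not the exact schema; consult the classification frameworks API reference for the real structure, and confirm the correct source value with the tags API:

```json
{
  "rules": [
    {
      "name": "Map Collibra longitude tags",
      "classificationTag": { "name": "Immuta DSF.Longitude" },
      "columnTags": [
        { "name": "Location.Longitude", "source": "collibra" }
      ]
    }
  ]
}
```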
Incorporating tags from an external catalog into rules is fairly straightforward. External tags are referenced in rules the same way as Immuta tags, except that the source field identifies the external catalog. The correct value for the source field varies by catalog system and can be identified by examining the tag objects listed by the tags API, which include the source field.
This guide is for anyone ready to begin their journey with Immuta who isn't using Snowflake. Specifically, it provides details for configuring Immuta that must be accomplished before moving on to the Secure your data use cases.
If you are using Snowflake with Immuta, see the Monitor and secure sensitive data platform query activity use case.
This use case provides the basics for setting up Immuta and helps you understand the logic and decisions behind the tasks.
Follow these steps to configure Immuta:
Configure your users in Immuta, using the user identity best practices.
Read the native integration architecture overview and connect Immuta to your databases. Consider the Databricks roles best practices if using Databricks.
Register data sources in order to start using Immuta features on the data sources.
Immuta is not just a location to define your policy logic; Immuta also enforces that logic in your data platform. How that occurs varies based on each data platform, but the overall architecture remains consistent and follows the NIST Zero Trust framework. The below diagram describes the recommended architecture from NIST:
Immuta lives in the middle control plane. To do this, Immuta knows details about the subjects and enterprise resources, acts as the policy decision point through policies administered by policy administrators, and makes real-time policy decisions using the internal Immuta policy engine.
Lastly, and of importance to how Immuta Secure functions, Immuta also enables the policy enforcement point by administering the policies natively in your data platform in a way that can react to policy changes and live queries.
To use Immuta, you must configure the Immuta native integration, which will require some level of privileged access to administer policies in your data platform, depending on your data platform and how the Immuta integration works. If using Databricks, please refer to the Databricks roles best practices before configuring the native integration.
In order to take advantage of all the capabilities of Immuta, you must make Immuta aware of your data metadata. This is done by registering your data with Immuta as data sources. It’s important to remember that Immuta is not reading your actual data at all; it is simply discovering your information schemas and pulling that information back as the foundation for everything else.
This section offers the best practices when onboarding data sources into Immuta.
If you have an external data catalog, like Collibra or Alation, configure the catalog integration first; then register your data in Immuta. This process will automatically tag your data with the external catalog tags as you register it.
Use Immuta's no default subscription policy setting to onboard metadata without affecting your users' access. This means you onboard all metadata in Immuta without any impact on current accesses which gives you time to fully convert your operations to Immuta without causing unnecessary data downtime. Immuta will only take control when the first policies are applied. Because of this, register all tables.
While it can be tempting to start small and register only the pieces of data that you intend to protect, you must remember that Immuta is not just about access control. It’s important to register your data metadata so that Immuta can also track activity and understand where that sensitive data lies (with Immuta Detect). In other words, Immuta can’t tell you where you have problems unless you first tell it to look at your metadata.
Without the no default subscription policy, Immuta will set each data source's subscription policy to the most restrictive option which automatically locks data down during onboarding. To unlock the data and give your users access again, new subscription policies must be set.
If you are delegating the registration and control of data, then please read our Data mesh use case for more information.
Use the /api/v2/data endpoint to register a schema; then use schema monitoring to find new data sources and automatically register them.
One of the greatest benefits of a modern data platform is that you can manage all your data transformations at the data tier. This means that data is constantly changing in the data platform, which may result in the need for access control changes as well. This is why it is critical that you enable schema monitoring and column detection when registering metadata with Immuta. This will allow Immuta to constantly monitor and update for these changes.
It’s also important to understand that many data engineering tools make changes by destructively recreating tables and views, which results in all policies being dropped in the data platform. This is actually a good thing, because this gives Immuta a chance to update the access as the changes are found (policy uptime) while the only user that can see the data being recreated is the creator of that change (data downtime for all other users). This is why schema monitoring and column detection are so critical.
Immuta Detect provides value from the moment the dashboards are visible, which can be enabled for organizations with Snowflake, Databricks Spark, and Databricks Unity Catalog integrations. Currently, organizations with Snowflake integrations can get even more value with data sensitivity and tagging. To determine and surface the sensitivity of your data access, enable and tune classification.
Completing all the steps below will fully onboard you with Detect and Discover:
Prerequisites:
The onboarding process assumes that these prerequisites have already been set up, but here are the Immuta features and configuration required to enable Detect. Each integration can be used alone or a Snowflake integration can be used with either Databricks Spark or Databricks Unity Catalog. Databricks Spark and Databricks Unity Catalog are not supported together with Detect:
For Snowflake integrations:
Native query audit enabled: This feature can be enabled when first configuring the integration or when editing the integration.
(Recommended) Table grants enabled: While not required, it is recommended to enable this feature to properly audit unauthorized query events. Without it, unauthorized events will still show as successful. Project workspaces cannot be used with table grants, so if your organization relies on them, leave this feature disabled.
Benefits and limitations of enabling table grants
With table grants enabled:
Unauthorized query events will be audited and present in the Detect dashboards.
Table grants will manage the privileges in Snowflake for Immuta tables, making access management more efficient than managing those privileges manually.
Without table grants:
Unauthorized events are unavailable because users will have successful queries of zero rows, even if they do not have access to the table.
You can use project workspaces. Table grants are not compatible with project workspaces, so if your organization depends on that capability, table grants are not recommended.
Snowflake tables and users registered in Immuta: Detect only audits events by users registered in Immuta on tables registered in Immuta. If you do not register the tables and users, their actions will not appear in the audit records or on the Detect dashboards.
For Databricks Spark integrations:
For Databricks Unity Catalog integrations:
Databricks Unity Catalog integration with native query audit enabled. Note that it is enabled by default when configuring the integration.
Recommended:
This setting is not required for Detect, but can be used for better functionality:
No subscription policy by default: This feature sets the subscription policy of all new data sources to none when they are registered. Using this feature allows organizations to register all Snowflake tables in Immuta. Their audit information will appear in the Detect dashboards, but users' access to them will not be impacted by Immuta until a subscription policy is set.
Requirement:
Immuta permission USER_ADMIN
Actions:
Grant users the AUDIT permission to see the Detect dashboards.
Navigate through Immuta Detect and explore the dashboards that visualize user and query audit information for your data environment.
These actions will result in users seeing the Detect dashboards containing information on the audit events in your data environment. These dashboards will not contain any information on the sensitivity of your data.
To see sensitivity information using a Snowflake integration, proceed with the steps below.
Only available with Snowflake integrations: Discover classification is supported with Databricks and Snowflake integrations; however, the sensitivity can only be visualized in Detect dashboards with Snowflake integrations.
There are two options to tag data and activate classification frameworks to determine the sensitivity of your data:
(Recommended) Use Immuta sensitive data discovery (SDD) to automatically categorize and tag your data: This option is the smoothest onboarding experience because it is the most automated process. You will not need to manually tag your data, and the framework to determine sensitivity is already set to use the SDD tags.
Use your organization's external tags: This option requires more manual configuration, but is best for organizations that have already configured tags for their tables. Please contact your Immuta representative for guidance.
After completing either of the tutorials above, data sources are tagged with entity tags and classification tags. Once users start querying data, and after the audit data latency from Snowflake, the Detect dashboards will show audit information with sensitivity information, and the Discover data inventory dashboard will show details about the data that was scanned.
If you notice some sensitivity types are not appearing as you expect, proceed with the step below.
Only available with Snowflake integrations: Discover classification is supported with Databricks and Snowflake integrations; however, the sensitivity can only be visualized in Detect dashboards with Snowflake integrations.
Requirement:
Immuta permissions AUDIT and GOVERNANCE
Actions:
After Discover has run SDD and the classification frameworks, it may be necessary to adjust the resulting tags based on your organization's data, security, and compliance needs.
After completing the tutorials above, all data appears as the appropriate sensitivity type on the Detect dashboards with Snowflake data sources.
Detect activity pages will have active charts when configured correctly with supported integrations after audit logs have been ingested. The viewing user must have the Immuta AUDIT permission.
Detect supports the following integration for activity pages with dynamic query sensitivity that will determine and visualize the sensitivity of user queries:
Snowflake with native query audit enabled
Detect supports the following integrations for activity pages, but will not visualize any sensitivity:
Databricks Spark
Databricks Unity Catalog with native query audit enabled
See the prerequisites for more information on the required configuration for each integration.
Query event sensitivity is determined by the tags with sensitivity metadata on the columns queried from Snowflake data sources. Immuta comes with a built-in framework with sensitivity tags, the Risk Assessment Framework (RAF). Ensure you have completed the configuration steps for onboarding Detect with Discover.
Check your data source tags
If you have completed the above steps and still see query events as "Indeterminate" or "Nonsensitive", check that the right tags were applied in the data dictionary:
Navigate to the data source dictionary page.
Confirm one of the following tags is applied to one of the queried data columns:
RAF.Confidentiality.Very High
RAF.Confidentiality.High
RAF.Confidentiality.Medium
Detect uses the sensitivity scores associated with these tags to classify a query's sensitivity. When the queried columns have these tags and the associated classification rules in RAF or Data Security Framework (DSF) are enabled at the time of audit query processing, the query event will indicate the proper classification.
If there are no RAF tags applied, check if there are any DSF or Discovered tags applied. These tags are necessary for RAF tags to be applied.
If you see Discovered tags but no RAF or DSF, activate the frameworks.
If you do not see any Discovered, DSF, or RAF tags, run SDD.
Activate the frameworks
If you do not see any RAF tags, ensure the Data Security Framework and Risk Assessment Framework are active:
Navigate to the classification frameworks page.
Check the status of the Data Security Framework and the Risk Assessment Framework.
If the frameworks are inactive, activate them. Once activated, allow time for the frameworks to run on your data sources. Then, check the data source again for RAF and DSF tags.
Run sensitive data discovery
If both frameworks are activated, there are no RAF tags, and there are no Discovered tags, run SDD to apply Discovered tags.
This guide outlines best practices for managing user identities in Immuta with your identity manager, such as Active Directory and Okta.
Reusing information you have already captured today is a good practice. A lot of information about users is in your identity manager platform and can be used in Immuta for user onboarding and policies.
All users protected by Immuta must be registered in Immuta, even if those users never log in to Immuta.
SAML is commonly used as a single sign-on mechanism for users to log in to Immuta. This means you can use your organization's SSO, which complies with your security standards.
Every user that will be protected by Immuta needs an account on the platform to enforce policy, regardless of whether they log in to Immuta. SCIM should be used to provision users from your identity manager platforms to Immuta automatically. The advantage here is that end-users do not need to log in to Immuta to create their accounts, and updates in your identity manager will be automatically reflected in Immuta, hence updating the access in your platforms.
Details on how to configure your individual identity manager's protocols can be found here:
There are several different combinations of supported protocol configurations, so consider those as you plan your user synchronization.
In Immuta, permissions control what actions a user is allowed to take through the API and UI. The different permissions can be found in the Immuta permissions guide.
We recommend using identity manager groups to manage permissions. When you configure the identity manager integration, you can enable group permissions. This allows you to control the permissions via identity manager groups and use the group assignment and approval process currently in place.
Intermingling your pre-existing roles in Databricks with Immuta can be confusing at first. This guide outlines some best practices on how to think about roles in each platform.
Access to data, platform permissions, and the ability to use clusters and data warehouses are controlled in Databricks Unity Catalog with permissions to individual users or groups. Immuta can control those permissions to grant users permission to read data based on subscription policies.
This section discusses best practices for Databricks Unity Catalog permissions for end-users.
Users who consume data (directly in your Databricks workspace or through other applications) need permission to access objects. But permissions are also used to control write access, Databricks clusters and warehouses, and other object types that can be registered in Databricks Unity Catalog.
To manage this at scale, Immuta recommends taking a 3-layer approach, where you separate the different permissions into different privileges:
Privileges for read access (Immuta managed)
Privileges for write access (optional, soon supported by Immuta)
Privileges for warehouse and clusters, internal billing
Read access is managed by Immuta. By using subscription policies, data access can be controlled to the table level. Attribute-based table GRANTS help you scale compared to RBAC, where access control is typically done on a schema or catalog level.
Since Immuta leverages native Databricks Unity Catalog GRANTs, you can combine Immuta’s grants with grants done manually in Databricks Unity Catalog. This means you can gradually migrate to an Immuta-protected Databricks workspace.
Write access is typically granted on a schema, catalog, or volume level. This makes it easy to manage in Databricks Unity Catalog through manual grants. We recommend creating groups that give INSERT, UPDATE, or DELETE permissions to a specific schema or catalog and attaching the group to a user. This attachment can be done manually or using your identity manager groups. (See the Databricks documentation for details.) Note that Immuta is working toward supporting write policies, so this will not need to be separately managed for long.
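As a sketch in Databricks SQL (group and object names are hypothetical), note that Unity Catalog folds row-changing operations into the MODIFY privilege rather than separate INSERT/UPDATE/DELETE grants:

```sql
-- Hypothetical names; run in Databricks SQL.
GRANT USE CATALOG ON CATALOG analytics TO `sales-writers`;
GRANT USE SCHEMA ON SCHEMA analytics.sales TO `sales-writers`;
-- MODIFY covers INSERT, UPDATE, and DELETE on tables in the schema:
GRANT SELECT, MODIFY ON SCHEMA analytics.sales TO `sales-writers`;
```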
Warehouses and clusters are granted to users to give them access to computing resources. Since this is directly tied to Databricks’ consumption model, warehouses and clusters are typically linked to cost centers for (internal) billing purposes. Immuta recommends creating a group per team/domain/cost center, applying this group for cluster/warehouse privileges, and granting this group to users using identity manager groups.
Immuta has two types of service accounts to connect to Databricks:
Policy role: Immuta needs to use a service principal to be able to push policies to Databricks Unity Catalog and to pull audit records into Immuta (optional). This principal needs USE CATALOG and USE SCHEMA on all catalogs and schemas, and SELECT and MODIFY on all tables in the metastore managed by Immuta. (A sample grant sketch for both service accounts follows after this list.)
Data ownership role: You will also need a user/principal for the data source registration. A service account/principal is recommended so that when the user moves or leaves the organization, Immuta still has the proper credentials to connect to Databricks Unity Catalog. You can follow one of the two best practices:
A central role for registration (recommended): It is recommended that you create a service role/user with SELECT permissions for all objects in your metastore. Immuta can register all the tables and views from Databricks, populate the Immuta catalog, and scan the objects for sensitive data using Immuta Discover. Immuta will not apply policy directly by default, so no existing access will be impacted.
A service principal per domain (alternative): Alternatively, if you cannot create a service principal with SELECT permissions for all objects, you can allow the different domains or teams in the organization to use a service user/principal scoped to their data. This delegates metadata registration, aligns well with data mesh type use cases, and means every team is responsible for registering their data sets in Immuta.
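As referenced above, here is a minimal sketch of the grants for both service accounts in Databricks SQL. The principal and catalog names are hypothetical, and the sketch relies on catalog-level grants inheriting to the schemas and tables below them:

```sql
-- Policy principal: lets Immuta administer policies on objects it manages.
GRANT USE CATALOG ON CATALOG analytics TO `immuta-policy-principal`;
GRANT USE SCHEMA ON CATALOG analytics TO `immuta-policy-principal`;
GRANT SELECT, MODIFY ON CATALOG analytics TO `immuta-policy-principal`;

-- Registration principal: read-only visibility for data source registration.
GRANT USE CATALOG ON CATALOG analytics TO `immuta-registration-principal`;
GRANT USE SCHEMA ON CATALOG analytics TO `immuta-registration-principal`;
GRANT SELECT ON CATALOG analytics TO `immuta-registration-principal`;
```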