This guide is for users who wish to understand their data estate and where there may be security gaps or non-compliant user query activity that needs to be addressed. It also contains details for configuring Immuta that must be accomplished before moving on to the Secure your data use cases.
This use case is tailored to quickly get you monitoring queries in your data platform and understanding where you may have security gaps using Immuta Discover and Immuta Detect. If you are not using Snowflake, instead move to the General Immuta configuration use case because filtering by tags and sensitivity in Immuta Detect is currently only available on Snowflake.
As part of this use case, you will learn special considerations and configurations for setting up Immuta for Immuta Detect. Upon completion, you will understand your existing security gaps, which will help guide your Immuta Secure journey.
Follow these steps to configure Immuta and start using Detect:
Configure your users in Immuta, using the user identity best practices in order to review and summarize user activity and plan your first policy.
Read the native integration architecture overview and connect Immuta to your database. Consider the Snowflake roles best practices.
Register data sources in order to review and summarize data activity and plan your first policy.
Start Using Immuta Detect. To get the most out of it, consider populating sensitivity using Automate entity and sensitivity discovery (SDD) and then configure Detect with SDD.
Within the Immuta product ecosystem, Immuta Detect is responsible for surfacing and indexing a wide range of security-related events, making it a rich source of data security posture insights.
In a typical deployment, Immuta Detect efficiently surfaces and processes a vast number of data security events. While these events all have security relevance, it may be challenging to understand their potential impacts without manual investigation. At the same time, the sheer volume of events typically greatly exceeds what a team can manually explore.
Enter Immuta Discover: Immuta’s data discovery and security analysis engine can identify, categorize, and classify data. Immuta Discover analyzes data available within the operational context of an event in conjunction with applicable legal, regulatory, compliance, and security frameworks to make deep inferences about the status of the data. For example, in a medical context, Immuta Discover can understand the difference between anonymized and identified medical data.
With additional classification metadata powered by Immuta Discover, Immuta Detect analyzes data security events for sensitivity, ensuring that highly significant events remain highly visible. In the context of the previous example, Immuta Detect can detect and flag the accidental identification of anonymized medical data.
With Discover, Immuta Detect can provide insightful oversight of who accesses sensitive data, where it is stored, and how it is used, enabling:
Rapid and exact compliance monitoring and assessment
Insights into data usage patterns for setting data access policy
Simplified and expedient audit responses
Context-aware analysis of data flows as seen through the lens of security or regulatory compliance frameworks
Immuta Discover works in three phases: identification, categorization, and classification.
In the first phase, data is identified by its kind – for example, a name or an age. This identification can be manually performed, externally provided, or automatically determined by Immuta Discover through column-level analysis. This is commonly termed entity identification.
In the second phase, data is categorized in the context where it appears, subject to any active data compliance or security frameworks. For example, a record occurring in a clinical context containing both a name and individual health entities is Protected Health Information under HIPAA.
Though entirely customizable, for this purpose, Immuta provides a default framework known as the Immuta Data Security Framework (Immuta DSF). Immuta DSF gives a fine-grained categorization into a consistent set of security and compliance concepts, such as whether a record pertains to an individual, the composition and kinds of any identifiers that are present, the subject matter of the data, and whether it belongs to any commonly controlled data categories. The rules of the framework use the entities found in the table in phase 1 to drive how the data is categorized.
The categorization provided by the Immuta DSF may be used directly. Still, it is best leveraged as a starting point for purpose-built compliance frameworks implementing organization-specific compliance categories or other relevant high-level regulatory and compliance frameworks, such as those for categorizing data into categories defined under CCPA, GDPR, GLBA, HIPAA, etc.
Bottom line, think of categorization as a way to apply higher level categories to the fine-grained entities discovered in phase 1 through rules you can customize. These categories are presented as tags in Immuta, just like the entities in phase 1, and thus, can be used for Immuta Secure policies.
In the third and final phase, data is classified according to its sensitivity level (e.g., Customer Financial Data is Highly Sensitive). Again, Immuta supplies sensitivity classification defaults based on standard industry practice. Just like how categories are built from phase 1 entities, classification builds on the phase 2 categories. Customers are free to customize this classification under their respective views. These classifications are key to surfacing sensitive queries in Detect based on your definition of sensitive.
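The three phases can be pictured as a small pipeline. The following is a minimal sketch, not Immuta's implementation: the entity names, the rule table, and the function names are all hypothetical, chosen only to mirror the examples in the text (a name plus a health entity implies Protected Health Information, and a category maps to a sensitivity level).

```python
from typing import Dict, Set

# Phase 1 output (assumed): entity tags found on each column of a table.
# Phase 2 rules (hypothetical, NOT the real Immuta DSF rule set):
# if all required entities appear in the table, the category applies.
CATEGORY_RULES = [
    ({"Person Name", "Diagnosis"}, "Protected Health Information"),
    ({"Person Name", "Account Balance"}, "Customer Financial Data"),
]

# Phase 3 mapping (hypothetical defaults): category -> sensitivity level.
SENSITIVITY_BY_CATEGORY = {
    "Protected Health Information": "Highly Sensitive",
    "Customer Financial Data": "Highly Sensitive",
}


def categorize(column_entities: Dict[str, Set[str]]) -> Set[str]:
    """Phase 2: derive categories from the entities identified in phase 1."""
    present = set().union(*column_entities.values())
    return {category for required, category in CATEGORY_RULES if required <= present}


def classify(categories: Set[str]) -> Set[str]:
    """Phase 3: derive sensitivity levels from the phase 2 categories."""
    return {SENSITIVITY_BY_CATEGORY[c] for c in categories if c in SENSITIVITY_BY_CATEGORY}


# Usage: a clinical table with a name column and a health-entity column.
table = {"patient_name": {"Person Name"}, "dx_code": {"Diagnosis"}, "age": {"Age"}}
categories = categorize(table)
levels = classify(categories)
```

Each phase only consumes the output of the phase before it, which is why customizing the phase 2 rules or the phase 3 mapping does not require re-running entity identification.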
There are good reasons to automate data discovery and analysis with Immuta Discover:
It formalizes the entire process, producing a coherent set of classification rules.
It makes it possible to automatically and uniformly scale compliance to new data sources.
It enables Immuta Detect to automatically detect additional threats, such as unauthorized or attempted access to sensitive data, and supports soft enforcement of organizational data access policies (for example, a policy that access to personal information, direct identifiers, or login credentials be masked).
Speed. Automating data discovery and analysis with Immuta Discover enables faster access to data by removing the manual effort of tagging and classifying new tables and columns.
Please be aware that:
Some customization may be necessary. Although Immuta's sensitive data discovery discovers over 60 types of sensitive data, only some data elements may be relevant. Further, unique sensitive data elements may not be covered out of the box. In these cases, it is possible to create new sensitive data discovery rules or adjust the sensitivity of existing rules through testing to ensure data is properly discovered and tagged.
New global templates should be created to find only entities that are relevant to the organization. This will ensure extraneous tags are not added to data elements.
Some customers may already have an existing data catalog tagging data; Immuta’s sensitive data discovery can work in combination with the data catalog.
Because data environments are not static, it is imperative that data tagging is automatically performed with new or changed data so that policies can be enabled in real-time, lowering the risk of data leaks.
Many organizations have invested in an enterprise data catalog as part of their data governance programs. Entity tags from the data catalogs will be pulled into Immuta in a one-way sync because the catalog is the system of record for entity tags. The tags pulled in from the data catalog can later be mapped to categories in the same way that entities automatically discovered in phase 1 are mapped to categories. This in turn will associate the appropriate sensitivity via classification to the external tags.
For a concrete example, consider a scenario where the Collibra catalog has tags for Longitude and Latitude. The following rule assigns Immuta DSF.Longitude to any column tagged Collibra.Location.Longitude. These rules appear in the rules array in the framework definition.
Incorporating tags from external catalogs into rules is fairly straightforward. External tags are referenced in rules like any other tag, except that the source field identifies the external catalog. The value of the source field varies depending on the external catalog system. The correct value can be identified by examining the tag objects listed by the tags API, which include the source field.
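To illustrate why the source field matters, here is a minimal sketch of matching on both tag name and source. It does not reproduce Immuta's framework rule format: the dictionary keys `match`, `name`, and `assign`, the source value `"collibra"`, and the tag name `DSF.Longitude` are assumptions made for illustration. Only the idea that a rule identifies an external tag by its source comes from the text above.

```python
from typing import Dict, List

# Hypothetical rule shape; the real framework definition format may differ.
RULES: List[Dict] = [
    {"match": {"name": "Collibra.Location.Longitude", "source": "collibra"},
     "assign": "DSF.Longitude"},
    {"match": {"name": "Collibra.Location.Latitude", "source": "collibra"},
     "assign": "DSF.Latitude"},
]


def apply_rules(column_tags: List[Dict[str, str]], rules: List[Dict]) -> List[str]:
    """Return the tags assigned to a column whose tags match a rule.

    A tag matches only when BOTH its name and its source equal the rule's,
    so a same-named tag from a different source is ignored.
    """
    assigned = []
    for tag in column_tags:
        for rule in rules:
            match = rule["match"]
            if tag["name"] == match["name"] and tag["source"] == match["source"]:
                assigned.append(rule["assign"])
    return assigned


# Usage: a column tagged by the external catalog.
lon_column = [{"name": "Collibra.Location.Longitude", "source": "collibra"}]
result = apply_rules(lon_column, RULES)
```

Before writing a real rule, check the tag objects returned by the tags API to confirm the actual source value used for your catalog.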
Immuta is not just a location to define your policy logic; Immuta also enforces that logic in your data platform. How that occurs varies based on each data platform, but the overall architecture remains consistent and follows the NIST Zero Trust framework. The below diagram describes the recommended architecture from NIST:
Immuta lives in the middle control plane. To do this, Immuta knows details about the subjects and enterprise resources, acts as the policy decision point through policies administered by policy administrators, and makes real-time policy decisions using the internal Immuta policy engine.
Lastly, and of importance to how Immuta Secure functions, Immuta also enables the policy enforcement point by administering the policies natively in your data platform in a way that can react to policy changes and live queries.
To use Immuta, you must configure the Immuta native integration, which will require some level of privileged access to administer policies in your data platform, depending on your data platform and how the Immuta integration works. Please refer to Snowflake roles best practices for Snowflake before configuring the native integration.
Intermingling your pre-existing roles in Snowflake with Immuta can be confusing at first. This guide outlines some best practices on how to think about roles in each platform.
Roles play a crucial role in Snowflake by organizing and controlling access to data, platform permissions, and data warehouses. Immuta also leverages Snowflake roles to grant users permission to read data based on subscription policies.
Users who consume data (directly in Snowflake or through other applications) need roles to access objects. But roles are also used to control write access, Snowflake warehouses, and Snowflake permissions through system-defined roles.
To manage this at scale, Immuta recommends taking a 4-layer approach, where you separate the different permissions into different roles:
Roles for read access (Immuta managed)
Roles for write access (optional, soon supported by Immuta)
Roles for warehouse, internal billing
Roles for Snowflake permissions (optional)
Since Immuta leverages Snowflake roles, you can still use existing roles in Snowflake. This means you can gradually migrate to an Immuta-protected Snowflake.
Warehouses are granted to users to give them access to computing resources. Since this is directly tied to Snowflake’s consumption model, warehouses are typically linked to cost centers for (internal) billing purposes. Immuta recommends creating a role/warehouse per team/domain/cost center and granting this warehouse role to users using identity manager groups.
Snowflake permissions are granted through system-defined roles like ACCOUNTADMIN or SECURITYADMIN. These are high-privilege roles that are only granted to administrators. This can be done manually or using AD groups.
Snowflake allows users to select a specific role, but you can also use all roles simultaneously. Immuta recommends using all roles since that helps to separate the different roles.
Alternatively, you could create personal roles and grant the warehouse-role/immuta-read-role and possibly the snowflake-permission-role and write-role to this.
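As a rough illustration of the warehouse layer and of using all roles at once, the sketch below generates Snowflake SQL for one team. It is an assumption-laden example rather than a recommended script: the role naming scheme, the helper name `warehouse_role_statements`, and granting the role directly to users (instead of through identity manager groups) are simplifications, and identifiers are not quoted or validated.

```python
from typing import List


def warehouse_role_statements(team: str, warehouse: str, users: List[str]) -> List[str]:
    """Build SQL that creates one warehouse role per team and grants it to users."""
    role = f"{team.upper()}_WH_ROLE"  # hypothetical naming convention
    statements = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
    ]
    for user in users:
        statements.append(f"GRANT ROLE {role} TO USER {user};")
        # Lets the user's sessions combine all granted roles by default,
        # so the warehouse role and the Immuta-managed read role apply together.
        statements.append(f"ALTER USER {user} SET DEFAULT_SECONDARY_ROLES = ('ALL');")
    return statements


# Usage: print the statements for review before running them in Snowflake.
for statement in warehouse_role_statements("finance", "FINANCE_WH", ["ALICE", "BOB"]):
    print(statement)
```

Keeping the warehouse role separate from the read role means a change in cost-center assignment never touches data access, and vice versa.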
Immuta has two types of service accounts to connect to Snowflake:
Data ownership role: This role is used to register data sources. A service account/role is recommended so that when the user moves or leaves the organization, Immuta will still have the proper credentials to connect to Snowflake. You can follow one of the two best practices:
A central role for registration (recommended): It is recommended that you create a service role/user with USAGE permissions for all objects in Snowflake. This allows Immuta to register all the objects from Snowflake, populate the Immuta catalog, and scan the objects for sensitive data using Immuta Discover. Immuta will not apply policy directly by default, so no existing access will be impacted.
In order to take advantage of all the capabilities of Immuta, you must make Immuta aware of your data metadata. This is done by registering your metadata with Immuta as data sources. It’s important to remember that Immuta is not reading your actual data at all; it is simply discovering your information schemas and pulling that information back as the foundation for everything else.
This section offers the best practices when onboarding data sources into Immuta.
If you have an external data catalog, like Collibra or Alation, connect it to Immuta first; then register your data in Immuta. This process will automatically tag your data with the external catalog tags as you register it.
Find more on this topic in the guide.
Use Immuta's no default subscription policy setting to onboard metadata without affecting your users' access. This means you can onboard all metadata in Immuta without any impact on current access, which gives you time to fully convert your operations to Immuta without causing unnecessary data downtime. Immuta will only take control when the first policies are applied. Because of this, register all tables.
While it can be tempting to start small and register only the pieces of data that you intend to protect, you must remember that Immuta is not just about access control. It’s important to register your data metadata so that Immuta can also track activity and understand where that sensitive data lies (with Immuta Detect). In other words, Immuta can’t tell you where you have problems unless you first tell it to look at your metadata.
Without the no default subscription policy setting, Immuta will set each data source's subscription policy to the most restrictive option, which automatically locks data down during onboarding. To unlock the data and give your users access again, new subscription policies must be set.
If you are delegating the registration and control of data, then please read our use case for more information.
Use the to register a schema; then use schema monitoring to find new data sources and automatically register them.
One of the greatest benefits of a modern data platform is that you can manage all your data transformations at the data tier. This means that data is constantly changing in the data platform, which may result in the need for access control changes as well. This is why it is critical that you enable schema monitoring and column detection when registering metadata with Immuta. This will allow Immuta to constantly monitor and update for these changes.
It’s also important to understand that many data engineering tools make changes by destructively recreating tables and views, which results in all policies being dropped in the data platform. This is actually a good thing, because it gives Immuta the chance to update the access as the changes are found (policy uptime) while the only user that can see the data being recreated is the creator of that change (data downtime for all other users). This is why schema monitoring and column detection are so critical.
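Conceptually, schema monitoring and column detection compare what is already registered against what currently exists in the data platform. The sketch below shows that comparison in its simplest form. It is not how Immuta implements the feature: the input shape (a mapping of table name to column names) and the function name `diff_schema` are assumptions for illustration.

```python
from typing import Dict, Set


def diff_schema(registered: Dict[str, Set[str]], current: Dict[str, Set[str]]) -> Dict:
    """Compare registered metadata with the platform's current tables and columns."""
    changed_columns = {}
    for table in set(registered) & set(current):
        added = current[table] - registered[table]
        removed = registered[table] - current[table]
        if added or removed:
            changed_columns[table] = {"added": sorted(added), "removed": sorted(removed)}
    return {
        "new_tables": sorted(set(current) - set(registered)),
        "dropped_tables": sorted(set(registered) - set(current)),
        "changed_columns": changed_columns,
    }


# Usage: one new table, one dropped table, one table with a new column.
registered = {"orders": {"id", "amount"}, "customers": {"id", "email"}}
current = {"orders": {"id", "amount", "discount"}, "payments": {"id"}}
changes = diff_schema(registered, current)
```

Note that a name-based comparison like this one cannot see a table that was dropped and recreated with identical columns, which is one reason to rely on the platform-aware monitoring features rather than a homegrown check.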
Read access is managed by Immuta. By using subscription policies, data access can be controlled to the table level. These policies help you scale compared to RBAC, where access control is typically done on a schema or database level.
Write access is typically granted on a schema or database. This makes it easy to manage in Snowflake through manual grants. We recommend creating roles that give insert, update, and delete permissions to a specific schema or database and attach this role to a user. This attachment can be done manually or using your identity manager groups. (See the for details.) Note that Immuta is working towards supporting write policies, so this will not need to be separately managed for long.
Using all roles simultaneously is a Snowflake feature called ‘secondary roles’ and can be enabled using the following command in Snowflake: USE SECONDARY ROLES ALL
Policy role: This role gives Immuta the power to create and apply policy. Immuta can , or you can to create the policy role.
A role per team/domain (alternative): Alternatively, if you cannot create a role with USAGE permissions for all objects, you can allow the different domains or teams in the organization to use a service user/role scoped to their data to register data sources. This is delegating metadata registration and aligns well with type use cases and means every team is responsible for registering their data sets in Immuta.
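The two data ownership options differ only in scope: a central role covers every database, while a team role covers that team's databases. The sketch below generates the database-level and schema-level USAGE grants for either case. The role names and the helper `registration_grants` are hypothetical, identifiers are not quoted or validated, and any table-level privileges the registration role needs are not shown here and should be taken from the Immuta integration documentation.

```python
from typing import List


def registration_grants(role: str, databases: List[str]) -> List[str]:
    """Build USAGE grants for a service role used to register data sources."""
    statements = [f"CREATE ROLE IF NOT EXISTS {role};"]
    for database in databases:
        statements += [
            f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
            f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {database} TO ROLE {role};",
            # Covers schemas created after this statement runs.
            f"GRANT USAGE ON FUTURE SCHEMAS IN DATABASE {database} TO ROLE {role};",
        ]
    return statements


# Central role (recommended): pass every database.
central = registration_grants("IMMUTA_REGISTRATION", ["SALES", "FINANCE", "HR"])

# Role per team/domain (alternative): pass only that team's databases.
finance_only = registration_grants("FINANCE_REGISTRATION", ["FINANCE"])
```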
This guide outlines best practices for managing user identities in Immuta with your identity manager, such as Active Directory and Okta.
Reusing information you have already captured today is a good practice. A lot of information about users is in your identity manager platform and can be used in Immuta for user onboarding and policies.
All users protected by Immuta must be registered in Immuta, even though people might not log in to Immuta.
SAML is commonly used as a single sign-on mechanism for users to log in to Immuta. This means you can use your organization's SSO, which complies with your security standards.
Every user that will be protected by Immuta needs to have a user on the platform to enforce policy, regardless of whether they log in to Immuta. SCIM should be used to provision users from your identity manager platforms to Immuta automatically. The advantage here is that end users do not need to log in to Immuta to create their accounts, and updates in your identity manager will be automatically reflected in Immuta, hence updating the access in your platforms.
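For readers unfamiliar with SCIM, the sketch below builds the kind of user resource an identity manager sends when it provisions an account. The schema URN and attribute names follow the SCIM 2.0 core User schema (RFC 7643). The helper name `scim_user` and the sample values are hypothetical, and the endpoint, authentication, and any attribute mapping are specific to your identity manager and Immuta tenant, so they are deliberately left out.

```python
import json
from typing import Dict


def scim_user(user_name: str, given_name: str, family_name: str, email: str,
              active: bool = True) -> Dict:
    """Build a SCIM 2.0 core User resource as a plain dictionary."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": user_name,
        "name": {"givenName": given_name, "familyName": family_name},
        "emails": [{"value": email, "primary": True}],
        # Setting active to False is how a deprovisioned user is usually signaled.
        "active": active,
    }


# Usage: the JSON body an identity manager would typically POST to a SCIM Users endpoint.
payload = json.dumps(scim_user("jdoe@example.com", "Jane", "Doe", "jdoe@example.com"), indent=2)
print(payload)
```

In practice your identity manager generates these requests for you. The example is only meant to show what is being synchronized.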
Details on how to configure your individual identity manager's protocols can be found here:
There are several different combinations of supported protocol configurations, so consider those as you plan your user synchronization.
In Immuta, permissions control what actions a user is allowed to take through the API and UI. The different permissions can be found in the Immuta permissions guide.
We recommend using identity manager groups to manage permissions. When you configure the identity manager integration, you can enable group permissions. This allows you to control the permissions via identity manager groups and use the group assignment and approval process currently in place.
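The effect of group permissions can be thought of as a union: a user holds every permission attached to any of their groups. The following sketch models that idea only. The group names and the mapping are hypothetical, and the actual assignment is configured in the identity manager integration rather than in code. The permission names AUDIT, GOVERNANCE, and USER_ADMIN are the ones referenced elsewhere in this guide.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping of identity manager groups to Immuta permissions.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "data-governors": {"GOVERNANCE", "AUDIT"},
    "security-analysts": {"AUDIT"},
    "immuta-admins": {"USER_ADMIN"},
}


def effective_permissions(groups: Iterable[str]) -> Set[str]:
    """Union of the permissions granted by each group a user belongs to."""
    permissions: Set[str] = set()
    for group in groups:
        permissions |= GROUP_PERMISSIONS.get(group, set())
    return permissions


def can_view_detect_dashboards(groups: Iterable[str]) -> bool:
    """Detect dashboards require the AUDIT permission."""
    return "AUDIT" in effective_permissions(groups)


# Usage: an analyst can view Detect dashboards; an admin-only user cannot.
analyst_ok = can_view_detect_dashboards(["security-analysts"])
admin_ok = can_view_detect_dashboards(["immuta-admins"])
```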
Requirement:
Snowflake Enterprise Edition or higher
Native SDD and classification frameworks enabled in Immuta. If you do not know if they are enabled, collaborate with your Immuta representative to turn on native SDD and classification frameworks in your Immuta tenant.
Prerequisite:
Users and Data Sources have been registered in Immuta:
Snowflake tables registered as Immuta data sources
Snowflake users registered in Immuta
Currently, Detect only supports filtering by tag and showing sensitivity of audit records for Snowflake.
This onboarding process is recommended for organizations that have not tagged any sensitive data yet. Immuta will identify, classify, and tag your data. After you are fully onboarded, you will see Detect dashboards with information on your organization's data use and data sensitivity, and the Discover data inventory dashboard will show details about the data that was scanned.
After you are happy with the Detect dashboards on the select data sources you enabled, you can integrate Detect with more of your data environment.
1. SDD will sample and tag your data based on the sensitive data detected. These tags are necessary for the classification framework tags in step 2 to be applied.
2. Once you activate the classification frameworks, they will tag your data with classification tags. Classification tags contain the metadata required to assign your data sensitivity levels.
3. After SDD and classification frameworks have been enabled and run, it may be necessary to adjust the output tags based on your organization's data, security, and compliance needs.
4. Grant the appropriate users the AUDIT permission to view Immuta Detect dashboards.
5. Once all tags are correctly applied, the Detect dashboards will reflect accurate audit information. Navigate through Immuta Detect and explore the dashboards that visualize the sensitive data in your data environment.
6. If you already had SDD enabled before starting Detect onboarding, skip this step. Once you are satisfied with the SDD tags and classification tags applied to your selected data sources, enable SDD for all data sources. This will add entity and classification tags to the rest of the data sources within your environment. You can choose to run SDD on all data sources, or run another payload with just a select few to gradually onboard the rest of your tables.
Immuta Detect provides value from the moment the dashboards are visible, which can be enabled for organizations with Snowflake, Databricks Spark, and Databricks Unity Catalog integrations. Currently, organizations with Snowflake integrations can get even more value with data sensitivity and tagging. To determine and surface the sensitivity of your data access, enable and tune classification.
Completing all the steps below will fully onboard you with Detect and Discover:
Prerequisites:
The onboarding process assumes that these prerequisites have already been set up, but here are the Immuta features and configuration required to enable Detect. Each integration can be used alone or a Snowflake integration can be used with either Databricks Spark or Databricks Unity Catalog. Databricks Spark and Databricks Unity Catalog are not supported together with Detect:
For Snowflake integrations:
Native query audit enabled: This feature can be enabled when first configuring the integration or when editing the integration.
(Recommended) Table grants enabled: While not required, it is recommended to enable this feature to properly audit unauthorized query events. Without it, unauthorized events will still show as successful. Project workspaces cannot be used with table grants, so if your organization relies on them, leave this feature disabled.
Benefits and limitations of enabling table grants
With table grants enabled:
Unauthorized query events will be audited and present in the Detect dashboards.
Table grants will manage the privileges in Snowflake for Immuta tables, making it more efficient than without.
Without table grants:
Unauthorized events are unavailable because users will have successful queries of zero rows, even if they do not have access to the table.
You can use project workspaces. Table grants is not compatible with project workspaces. If your organization depends on that capability, table grants is not recommended.
Snowflake tables and users registered in Immuta: Detect only audits events by users registered in Immuta on tables registered in Immuta. If you do not register the tables and users, their actions will not appear in the audit records or on the Detect dashboards.
For Databricks Spark integrations:
For Databricks Unity Catalog integrations:
Databricks Unity Catalog integration with native query audit enabled: Note that it is enabled by default when configuring the integration.
Recommended:
This setting is not required for Detect, but can be used for better functionality:
No subscription policy by default: This feature sets the subscription policy of all new data sources to none when they are registered. Using this feature allows organizations to register all Snowflake tables in Immuta. Their audit information will appear in the Detect dashboards, but users' access to them will not be impacted by Immuta until a subscription policy is set.
Requirement:
Immuta permission USER_ADMIN
Actions:
Grant users the AUDIT permission to see the Detect dashboards.
Navigate through Immuta Detect and explore the dashboards that visualize user and query audit information for your data environment.
These actions will result in users seeing the Detect dashboards containing information on the audit events in your data environment. These dashboards will not contain any information on the sensitivity of your data.
To see sensitivity information using a Snowflake integration, proceed with the steps below.
Only available with Snowflake integrations: Discover classification is supported with Databricks and Snowflake integrations; however, the sensitivity can only be visualized in Detect dashboards with Snowflake integrations.
There are two options to tag data and activate classification frameworks to determine the sensitivity of your data:
(Recommended) Use Immuta sensitive data discovery (SDD) to automatically categorize and tag your data: This option is the smoothest onboarding experience because it is the most automated process. You will not need to manually tag your data, and the framework to determine sensitivity is already set to use the SDD tags.
Use your organization's external tags: This option requires more manual configuration, but is best for organizations that have already configured tags for their tables. Please contact your Immuta representative for guidance.
After completing either of the tutorials above, data sources are tagged with entity tags and classification tags. Once users start querying data, and after the data latency with Snowflake, the Detect dashboards will show audit information with sensitivity information and the Discover data inventory dashboard will show details about the data that was scanned.
If you notice some sensitivity types are not appearing as you expect, proceed with the step below.
Only available with Snowflake integrations: Discover classification is supported with Databricks and Snowflake integrations; however, the sensitivity can only be visualized in Detect dashboards with Snowflake integrations.
Requirement:
Immuta permissions AUDIT and GOVERNANCE
Actions:
After Discover has run SDD and the classification frameworks, it may be necessary to adjust the resulting tags based on your organization's data, security, and compliance needs:
After completing the tutorials above, all data appears as the appropriate sensitivity type on the Detect dashboards with Snowflake data sources.
Detect activity pages will have active charts when configured correctly with supported integrations after audit logs have been ingested. The user viewing must have the Immuta AUDIT permission.
Detect supports the following integration for activity pages with dynamic query sensitivity that will determine and visualize the sensitivity of user queries:
Snowflake with native query audit enabled
Detect supports the following integrations for activity pages, but will not visualize any sensitivity:
Databricks Spark
Databricks Unity Catalog with native query audit enabled
See the prerequisites for more information on the required configuration for each integration.
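The support rules above can be summarized as a small check. This is a sketch for reasoning about a planned deployment, not an Immuta API. The function names are hypothetical, and the check ignores the audit and table grants settings described in the prerequisites.

```python
from typing import Set

SNOWFLAKE = "Snowflake"
DATABRICKS_SPARK = "Databricks Spark"
DATABRICKS_UNITY = "Databricks Unity Catalog"
KNOWN_INTEGRATIONS = {SNOWFLAKE, DATABRICKS_SPARK, DATABRICKS_UNITY}


def detect_supported(integrations: Set[str]) -> bool:
    """True if Detect activity pages support this combination of integrations."""
    if not integrations or not integrations <= KNOWN_INTEGRATIONS:
        return False
    # Databricks Spark and Databricks Unity Catalog are not supported together.
    return not {DATABRICKS_SPARK, DATABRICKS_UNITY} <= integrations


def sensitivity_visualized(integrations: Set[str]) -> bool:
    """True if query sensitivity can be visualized (Snowflake data sources only)."""
    return detect_supported(integrations) and SNOWFLAKE in integrations


# Usage: Snowflake plus Unity Catalog is supported; sensitivity shows for Snowflake.
combo = {SNOWFLAKE, DATABRICKS_UNITY}
supported = detect_supported(combo)
with_sensitivity = sensitivity_visualized(combo)
```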
Query event sensitivity is determined by the tags with sensitivity metadata on the columns queried from Snowflake data sources. Immuta comes with a built-in framework with sensitivity tags, the Risk Assessment Framework (RAF). Ensure you have completed the configuration steps for onboarding Detect with Discover.
Check your data source tags
If you have completed the above steps and still see query events as "Indeterminate" or "Nonsensitive", check that the right tags were applied in the data dictionary:
Navigate to the data source dictionary page.
Confirm one of the following tags is applied to one of the queried data columns:
RAF.Confidentiality.Very High
RAF.Confidentiality.High
RAF.Confidentiality.Medium
Detect uses the sensitivity scores associated with these tags to classify a query's sensitivity. When the queried columns have these tags and the associated classification rules in RAF or Data Security Framework (DSF) are enabled at the time of audit query processing, the query event will indicate the proper classification.
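One way to picture this is that a query inherits the highest confidentiality tag found on any column it touches. The sketch below encodes that reading. Both the "highest tag wins" rule and the numeric ranks are assumptions for illustration. The actual sensitivity scores and how Detect combines them are defined by the frameworks, not by this code.

```python
from typing import Dict, List, Optional

# Hypothetical ordering of the RAF confidentiality tags listed above.
RANK = {
    "RAF.Confidentiality.Medium": 1,
    "RAF.Confidentiality.High": 2,
    "RAF.Confidentiality.Very High": 3,
}


def highest_confidentiality(column_tags: Dict[str, List[str]],
                            queried_columns: List[str]) -> Optional[str]:
    """Return the highest-ranked RAF tag on the queried columns, or None if there is none."""
    best = None
    for column in queried_columns:
        for tag in column_tags.get(column, []):
            if tag in RANK and (best is None or RANK[tag] > RANK[best]):
                best = tag
    return best


# Usage: only the columns actually queried contribute to the result.
tags = {
    "ssn": ["Discovered.Entity.Social Security Number", "RAF.Confidentiality.Very High"],
    "zip": ["RAF.Confidentiality.Medium"],
    "order_id": [],
}
narrow_query = highest_confidentiality(tags, ["zip", "order_id"])
wide_query = highest_confidentiality(tags, ["ssn", "zip"])
```

A result of None here corresponds to the case the next steps troubleshoot: no RAF tag is present on the queried columns.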
If there are no RAF tags applied, check if there are any DSF or Discovered tags applied. These tags are necessary for RAF tags to be applied.
If you see Discovered tags but no RAF or DSF, activate the frameworks.
If you do not see any Discovered, DSF, or RAF tags, run SDD.
Activate the frameworks
If you do not see any RAF tags, ensure the Data Security Framework and Risk Assessment Framework are active:
Navigate to the classification frameworks page.
Check the status of the Data Security Framework and the Risk Assessment Framework.
If the frameworks are inactive, activate them. Once activated, allow time for the frameworks to run on your data sources. Then, check the data source again for RAF and DSF tags.
Run sensitive data discovery
If both frameworks are activated, there are no RAF tags, and there are no Discovered tags, run SDD to apply Discovered tags.
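The troubleshooting steps above form a short decision tree, sketched below. The tag prefixes `RAF.`, `DSF.`, and `Discovered.` are assumptions about how the tags appear in the data dictionary, and the function name and returned messages are hypothetical. Treat this as a checklist in code form rather than an Immuta utility.

```python
from typing import Iterable


def next_step(tags: Iterable[str], frameworks_active: bool) -> str:
    """Suggest the next troubleshooting action for a data source's tags."""
    tags = list(tags)

    def has(prefix: str) -> bool:
        return any(tag.startswith(prefix) for tag in tags)

    if has("RAF."):
        return "RAF tags present: query sensitivity can be determined"
    if not has("DSF.") and not has("Discovered."):
        return "run SDD to apply Discovered tags"
    if not frameworks_active:
        return "activate the Data Security Framework and Risk Assessment Framework"
    return "allow time for the frameworks to run, then check the data source again"


# Usage: Discovered tags exist, but the frameworks have not been activated yet.
advice = next_step(["Discovered.Entity.Person Name"], frameworks_active=False)
```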