Managing Data Metadata
Prerequisites
Your schema metadata is registered using either of the Detect use cases:
Monitor and secure sensitive data platform query activity use case (Snowflake or Databricks)
General Immuta configuration use case (if not using Snowflake or Databricks)
Enriching your data metadata with tags
You may have read the Automate data access control decisions use case already. If so, you are aware of the two paths you must choose between: orchestrated RBAC and ABAC. To manage data metadata for this particular use case, you should use ABAC.
This is because this use case requires you to know the contents and sensitivity of every column in your data ecosystem. With orchestrated RBAC, you tag your columns with access logic baked in. With ABAC, you tag your columns with facts: what is in each column. Tagging with facts is feasible; tagging with access logic is extremely hard (unless you use tag lineage, described below), especially in a data ecosystem that changes constantly.
Understanding that, read the Managing data metadata guide.
However, there are some considerations specific to this use case with regard to data metadata.
Automated data tagging
Automated data tagging with Immuta sensitive data discovery (SDD) is recommended for this use case because it significantly reduces the manual effort involved in classifying and managing data. It tags data consistently and accurately, which is crucial for implementing effective data policies. It also tags data in real time, so new or updated data is immediately protected by the appropriate policies. This is critical here, because this use case requires every column to be considered for policy as soon as it appears.
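To make the idea of tagging columns with facts concrete, here is a deliberately simplified stand-in for automated classification: it tags a column when most of its sampled values match a pattern. This is not how Immuta SDD is implemented. The regex patterns, the tag names (`Email Address`, `US Phone Number`, `US ZIP Code`), the sampling approach, and the 0.8 threshold are all hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical fact tags and the patterns used to detect them.
# Illustrative only; these are not Immuta's built-in identifiers.
PATTERNS: Dict[str, re.Pattern] = {
    "Email Address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US Phone Number": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
    "US ZIP Code": re.compile(r"\d{5}(-\d{4})?"),
}


def classify_column(samples: List[str], threshold: float = 0.8) -> List[str]:
    """Return every fact tag whose pattern fully matches at least
    `threshold` of the non-empty sampled values."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return []
    tags = []
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            tags.append(tag)
    return tags


def classify_table(columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Tag every column of a table from a sample of its values."""
    return {name: classify_column(samples) for name, samples in columns.items()}


if __name__ == "__main__":
    sample = {
        "contact": ["ana@example.com", "bo@example.org", "cy@example.net"],
        "zip": ["02139", "94105-1234", "10001"],
        "notes": ["call back", "n/a", "ok"],
    }
    print(classify_table(sample))
    # {'contact': ['Email Address'], 'zip': ['US ZIP Code'], 'notes': []}
```

Note that the output only states what each column contains; nothing in it says who may see the column. That decision is left to policies built on top of the tags.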
Schema monitoring enabled
While not directly related to data tagging, schema monitoring is an Immuta feature that lets organizations keep track of changes in their data environments. When it is enabled, Immuta actively monitors the data platform to detect when tables or columns are created or deleted. It then automatically registers or disables the affected tables in Immuta and updates the tags. This ensures that any global policies set in Immuta are applied to newly created or updated data sources. This guide assumes you are using schema monitoring alongside SDD. Without schema monitoring, new data remains in data downtime while waiting for the policies to update.
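Immuta performs this detection for you; the sketch below only illustrates the underlying idea of comparing two snapshots of a catalog to find what was created or deleted. The snapshot shape (table name mapped to a set of column names) and the `SchemaChanges` type are hypothetical and do not reflect Immuta's internals.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical catalog snapshot: table name -> set of column names.
Snapshot = Dict[str, Set[str]]


@dataclass
class SchemaChanges:
    new_tables: Set[str] = field(default_factory=set)                   # to register
    dropped_tables: Set[str] = field(default_factory=set)               # to disable
    new_columns: Dict[str, Set[str]] = field(default_factory=dict)      # to tag
    removed_columns: Dict[str, Set[str]] = field(default_factory=dict)  # to untag


def diff_snapshots(before: Snapshot, after: Snapshot) -> SchemaChanges:
    """Compare two catalog snapshots and report tables and columns that
    appeared or disappeared between them."""
    changes = SchemaChanges()
    changes.new_tables = set(after) - set(before)
    changes.dropped_tables = set(before) - set(after)
    for table in set(before) & set(after):
        added = after[table] - before[table]
        removed = before[table] - after[table]
        if added:
            changes.new_columns[table] = added
        if removed:
            changes.removed_columns[table] = removed
    return changes


if __name__ == "__main__":
    before = {"sales.orders": {"id", "amount"}, "sales.tmp": {"x"}}
    after = {"sales.orders": {"id", "amount", "email"}, "sales.customers": {"id", "name"}}
    changes = diff_snapshots(before, after)
    print(changes.new_tables)      # {'sales.customers'}
    print(changes.dropped_tables)  # {'sales.tmp'}
    print(changes.new_columns)     # {'sales.orders': {'email'}}
```

In this model, every column of a new table plus each new column on an existing table would be handed to a classifier (such as SDD) right away, which is why schema monitoring and automated tagging work best together.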
Data tags are facts
As discussed above, with ABAC your data tags need to be facts. Otherwise, a human must be involved to bake access logic into your tags. For example, it's easy for SDD to find an address, but it's much harder for a human to decide whether that address is sensitive and who should have access to it, capture both decisions in a single tag, and repeat that for every column!
If you focus on tagging with facts and use Immuta's frameworks to build higher-level logic on those tags, you are set up to build data policies in a scalable, error-proof manner with limited data downtime.
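The separation between fact tags and higher-level logic can be sketched in a few lines. In Immuta you would express the rule through its own policy framework; the code below is not that API. The column names, the tag names, the `MaskingRule` type, and the `PII Approved` user attribute are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

# Column -> fact tags describing what the column contains (hypothetical).
COLUMN_TAGS: Dict[str, Set[str]] = {
    "customers.email": {"Email Address"},
    "customers.street": {"Street Address"},
    "customers.signup_date": {"Date"},
}


@dataclass(frozen=True)
class MaskingRule:
    """A higher-level rule written once against fact tags, not per column."""
    sensitive_tags: FrozenSet[str]
    exempt_attribute: str  # users holding this attribute see unmasked values


RULE = MaskingRule(
    sensitive_tags=frozenset({"Email Address", "Street Address"}),
    exempt_attribute="PII Approved",
)


def is_masked(column: str, user_attributes: Set[str], rule: MaskingRule = RULE) -> bool:
    """Mask a column if it carries any sensitive fact tag and the user
    lacks the exempting attribute."""
    tags = COLUMN_TAGS.get(column, set())
    return bool(tags & rule.sensitive_tags) and rule.exempt_attribute not in user_attributes


if __name__ == "__main__":
    print(is_masked("customers.email", {"Analyst"}))       # True
    print(is_masked("customers.email", {"PII Approved"}))  # False
    print(is_masked("customers.signup_date", set()))       # False
```

The design point is that a newly discovered email column only needs the fact tag `Email Address`, which automated discovery can supply; the access decision lives in the single rule and nobody has to edit it per column.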
Tag lineage
There is one way to accomplish this use case with orchestrated RBAC: tag lineage. Immuta's lineage feature, currently in private preview and available for Snowflake only, propagates tags based on transform lineage. When a downstream table is derived from upstream tables, the tags on the upstream tables' columns are carried through to the corresponding columns of the downstream table. This is powerful because you can tag the root table(s) once, with your policy logic tags as described in Managing data metadata, and those tags carry through to the columns of all downstream tables without any extra work.
Be aware that tag lineage only works for tables: tags are not propagated to views. They do not need to be, because policies on the backing tables are enforced on the views. However, if a table is derived from a series of views, the tags will propagate to that table.
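The propagation rules above (tags follow column-level lineage, are not stored on views, but continue through views to any downstream table) can be modeled as a small graph traversal. This is not Immuta's implementation; the object names, the `kinds` map, the edge format, and the `Access.PII_Team` policy logic tag are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Column = Tuple[str, str]  # (object name, column name)


def propagate_tags(
    kinds: Dict[str, str],               # object name -> "table" or "view"
    edges: List[Tuple[Column, Column]],  # (upstream column, downstream column)
    root_tags: Dict[Column, Set[str]],   # tags applied once on root tables
) -> Dict[Column, Set[str]]:
    """Return the tags each table column ends up with.

    Tags flow along lineage edges, including through views, but only
    table columns are reported; view columns are skipped because
    policies on their backing tables already apply to them.
    """
    downstream: DefaultDict[Column, List[Column]] = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)

    carried: DefaultDict[Column, Set[str]] = defaultdict(set)
    stack = list(root_tags.items())
    while stack:
        col, tags = stack.pop()
        new = tags - carried[col]
        if not new:  # nothing new to push; also stops cycles
            continue
        carried[col] |= new
        for child in downstream[col]:
            stack.append((child, new))

    return {col: t for col, t in carried.items() if kinds.get(col[0]) == "table"}


if __name__ == "__main__":
    kinds = {
        "raw.customers": "table",
        "analytics.customers_v": "view",
        "analytics.customer_summary": "table",
    }
    edges = [
        (("raw.customers", "email"), ("analytics.customers_v", "email")),
        (("analytics.customers_v", "email"), ("analytics.customer_summary", "contact_email")),
    ]
    root_tags = {("raw.customers", "email"): {"Access.PII_Team"}}
    for column, tags in propagate_tags(kinds, edges, root_tags).items():
        print(column, sorted(tags))
```

In this example the view column receives no tag of its own, while the table built from the view inherits the tag applied once on `raw.customers`.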