Register Data Sources
In order to take advantage of all the capabilities of Immuta, you must make Immuta aware of your data metadata. This is done by registering your data with Immuta as data sources. It’s important to remember that Immuta is not reading your actual data at all; it is simply discovering your information schemas and pulling that information back as the foundation for everything else.
This section offers the best practices when onboarding data sources into Immuta.
Configure your catalog
If you have an external data catalog, like Collibra or Alation, configure the catalog integration first; then register your data in Immuta. This process will automatically tag your data with the external catalog tags as you register it.
Register as much as possible
Use Immuta's no default subscription policy setting to onboard metadata without affecting your users' access. This means you onboard all metadata in Immuta without any impact on current accesses which gives you time to fully convert your operations to Immuta without causing unnecessary data downtime. Immuta will only take control when the first policies are applied. Because of this, register all tables.
While it can be tempting to start small and register only the pieces of data that you intend to protect, you must remember that Immuta is not just about access control. It’s important to register your data metadata so that Immuta can also track activity and understand where that sensitive data lies (with Immuta Detect). In other words, Immuta can’t tell you where you have problems unless you first tell it to look at your metadata.
Without the no default subscription policy, Immuta will set each data source's subscription policy to the most restrictive option which automatically locks data down during onboarding. To unlock the data and give your users access again, new subscription policies must be set.
If you are delegating the registration and control of data, then please read our Data mesh use case for more information.
Let Immuta monitor for change
/api/v2/data endpoint to register a schema; then use schema monitoring to find new data sources and automatically register them.
One of the greatest benefits of a modern data platform is that you can manage all your data transformations at the data tier. This means that data is constantly changing in the data platform, which may result in the need for access control changes as well. This is why it is critical that you enable schema monitoring and column detection when registering metadata with Immuta. This will allow Immuta to constantly monitor and update for these changes.
It’s also important to understand that many data engineering tools make changes by destructively recreating tables and views, which results in all policies being dropped in the data platform. This is actually a good thing, because this gives Immuta a chance to update the access as the changes are found (policy uptime) while the only user that can see the data being recreated is the creator of that change (data downtime for all other users). This is why schema monitoring and column detection are so critical.
Skip data source stats
Stats on your data sources are used for specific policy scenarios (e.g., masking with format preserving masking, k-anonymization, or randomized response). If your organization doesn't need these policies, stats are an unnecessary feature that will negatively impact performance. Use the
Skip Stats Job tag when registering data sources to skip the stats job.
Snowflake data sources automatically skip the stats job upon registration, without the
Skip Stats Job tag, as long as there are no active policies requiring them.