Dataiku access pattern examples for datasets protected by the Immuta-Snowflake integration

The purpose of this article is to show a few examples of how Dataiku can access Immuta-protected data sources in Snowflake. There are several integration patterns, each with its own caveats, and it is highly recommended to collaborate with the Dataiku team to ensure the right approach is adopted.

Example Approach 1:

One group and one connection per Dataiku sandbox project for all users with differing data clearances.

Read-Only:

Read-Write:

When users are part of the same Dataiku group, they can view each other's data in Dataiku projects, regardless of their individual data clearance. Given this data-leak risk, the next approach places guardrails on the Dataiku write schema based on each user persona and its corresponding data clearance.
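The data-leak risk of a single shared group can be illustrated with a minimal sketch. It is a toy model, not Dataiku's permission engine: the User class, the numeric clearance levels, and the user names are all hypothetical, and the only rule it encodes is the one stated above, that every member of a group can view what every other member has written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    clearance: int  # hypothetical scale: a higher number means broader data clearance


def cross_clearance_exposures(members):
    """List (viewer, author) pairs where the viewer has lower clearance than the author.

    Assumes every member of one Dataiku group can view datasets written by
    every other member, so each such pair is a potential leak.
    """
    return [
        (viewer.name, author.name)
        for viewer in members
        for author in members
        if viewer.clearance < author.clearance
    ]


# One shared group containing users with differing clearances (Approach 1).
group = [User("alice", 3), User("bob", 2), User("carol", 1)]
exposures = cross_clearance_exposures(group)
print(exposures)
```

A group whose members all share one clearance level produces an empty list, which is the property the next approach relies on.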

Example Approach 2:

Three groups and three connections per Dataiku sandbox project: one group and one connection per user persona and its data clearance level.

Read-Only:

Read-Write:

Notes:

When a user uses a Dataiku connection, their Snowflake user is passed through, so data is retrieved based on their Snowflake ID and the role configured in the connection.
Users with the same clearance level operate within the same Dataiku group and project, which eliminates data-leak issues within Dataiku.
Separate connection per group: one Snowflake role for read and write per group.
Separate groups per persona in Dataiku per use case.
Costly: 3x growth in Dataiku groups and connections, Snowflake roles, Snowflake compute needs, and potentially write-schema data.
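The per-persona layout in the notes above can be sketched as a small planning script. This is only an illustration under stated assumptions: the persona labels, the naming scheme, and the SANDBOX_DB database are hypothetical, and the generated Snowflake statements cover only role creation, schema creation, and schema-level grants. A real setup would also need database and warehouse grants, plus the Dataiku group and connection configuration itself.

```python
# Hypothetical persona labels, one per data clearance level.
PERSONAS = ["LOW", "MEDIUM", "HIGH"]


def plan_project(project, database="SANDBOX_DB"):
    """Return one group/connection/role/write-schema entry per persona."""
    plan = []
    for persona in PERSONAS:
        suffix = f"{project}_{persona}".upper()
        plan.append({
            "dataiku_group": f"grp_{suffix.lower()}",
            "dataiku_connection": f"conn_{suffix.lower()}",
            "snowflake_role": f"ROLE_{suffix}",
            "write_schema": f"{database}.{suffix}_WRITE",
        })
    return plan


def grant_statements(entry):
    """Build Snowflake SQL giving one role read and write access to its own schema."""
    role = entry["snowflake_role"]
    schema = entry["write_schema"]
    return [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role};",
        f"GRANT CREATE TABLE ON SCHEMA {schema} TO ROLE {role};",
    ]


plan = plan_project("fraud")
for entry in plan:
    print(entry["dataiku_group"], entry["dataiku_connection"])
    for statement in grant_statements(entry):
        print("   ", statement)
```

Each sandbox project yields three entries instead of one, which is where the 3x growth in groups, connections, roles, and write schemas comes from.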

Example Approach 3:

One group and one connection per Dataiku sandbox project for all users with differing data clearances.

Read-Only:

Read-Write:

Notes:

Ingest Dataiku-created Snowflake tables into Immuta as soon as they are created through Dataiku. Sensitive data discovery (SDD) must be enabled in Immuta, and table and column tags must be applied. Users in Dataiku can then read and write the data according to Immuta data and subscription policies.
True attribute-based access control (ABAC).
User entitlement to specific data can be elevated or reduced based on business needs.
Data downtime:
1. The velocity and volume of data changes could be high.
2. The "New" column policy will require Immuta admins to validate newly created columns.
Caching can be disabled in Dataiku to avoid data leaks when users sample the data after a workflow has executed.
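The interplay between column tags, user attributes, and the "New" column check described in these notes can be modeled with a short sketch. It is a simplified stand-in, not Immuta's policy engine or API: the function name, the tag names, and the rule that a column is readable only when the user holds every tag on it are assumptions made for illustration.

```python
def visible_columns(column_tags, user_attributes, validated_columns):
    """Return the columns a user may read under a simplified ABAC rule.

    A column is withheld if an admin has not validated it yet (a "new" column),
    or if it carries a tag the user has no matching attribute for.
    """
    visible = []
    for column, tags in column_tags.items():
        if column not in validated_columns:
            continue  # new column awaiting validation: withheld from everyone
        if tags <= user_attributes:  # every tag on the column is held by the user
            visible.append(column)
    return visible


# Hypothetical table written by a Dataiku workflow and tagged after discovery.
column_tags = {
    "customer_id": set(),
    "email": {"PII"},
    "risk_score": {"CONFIDENTIAL"},
    "new_flag": set(),  # created by the latest run, not yet validated
}
validated = {"customer_id", "email", "risk_score"}

analyst = visible_columns(column_tags, {"PII"}, validated)
elevated = visible_columns(column_tags, {"PII", "CONFIDENTIAL"}, validated)
print(analyst)
print(elevated)
```

Adding the CONFIDENTIAL attribute to the same user widens what they can read without any change to groups or connections, while new_flag stays hidden until it is validated, which corresponds to the data downtime noted above.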