Dataiku access pattern examples on Immuta-Snowflake integration protected datasets
This article shows a few examples of how Dataiku can access Immuta-protected data sources in Snowflake. There are several integration patterns, each with its own caveats, and it is highly recommended to collaborate with the Dataiku team to ensure the right approach is adopted.
One group and one connection per Dataiku sandbox project for all users with differing data clearance
Read-Only:
Read-Write:
When all users are part of the same Dataiku group, they can view each other's data in Dataiku projects. Because of this data leak risk, the next approach puts guardrails on the Dataiku write schema based on user persona and the corresponding data clearance.
Three groups and connections per Dataiku sandbox project, one group and connection per user persona data clearance.
Read-Only:
Read-Write:
When a user uses a Dataiku connection, they pass their Snowflake user, so data is retrieved based on their Snowflake ID and the role defined in the connection.
Users with the same clearance level operate within the same Dataiku group and project, which eliminates data leak issues within Dataiku
Separate connection per group - one Snowflake role for read and write per group
Separate groups per persona in Dataiku per use case
Costly - 3x growth in Dataiku groups and connections, Snowflake roles, Snowflake compute needs, and potentially write-schema data.
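The per-persona layout described above can be sketched as a small planning helper. This is only an illustration of why the number of objects triples: the persona names and the naming convention for groups, connections, roles, and write schemas are hypothetical and not taken from Dataiku, Immuta, or Snowflake.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical persona names; the article only states that there are three clearance levels.
PERSONAS = ("low", "medium", "high")


@dataclass(frozen=True)
class PersonaAccess:
    dataiku_group: str
    dataiku_connection: str
    snowflake_role: str
    write_schema: str


def plan_project_access(project: str) -> List[PersonaAccess]:
    """Return one group, connection, role, and write schema per persona."""
    key = project.upper()
    return [
        PersonaAccess(
            dataiku_group=f"{key}_{persona.upper()}_GRP",      # assumed naming convention
            dataiku_connection=f"{key}_{persona.upper()}_CONN",
            snowflake_role=f"{key}_{persona.upper()}_RW",       # one read-write role per group
            write_schema=f"{key}_{persona.upper()}_WRITE",
        )
        for persona in PERSONAS
    ]


plan = plan_project_access("sandbox1")
for entry in plan:
    print(entry)
```

Each sandbox project yields three entries instead of one, which is where the extra cost in groups, connections, roles, and write schemas comes from.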
One group and one connection per Dataiku sandbox project for all users with differing data clearances.
Read-Only:
Read-Write:
Ingest Dataiku-created Snowflake tables into Immuta as soon as they are created through Dataiku. SDD must be enabled in Immuta and the table and column tags applied. Users in Dataiku can then read and write the data based on Immuta data and subscription policies.
True ABAC
User entitlement to specific data can be elevated or reduced based on business needs
Data downtime -
The velocity and volume of data changes could be high.
The "New" column policy will require Immuta admins to validate newly created columns
Caching can be disabled in Dataiku to avoid data leaks when users sample the data after a workflow has executed.
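The attribute-based idea behind this pattern can be illustrated with a toy check. This is a minimal sketch and not Immuta's policy engine: the tag names, the attribute strings, and the rule table are all hypothetical, and real subscription and data policies are defined in Immuta itself.

```python
from typing import Dict, Set

# Hypothetical rule set: data tag -> user attribute required to see data with that tag.
REQUIRED_ATTRIBUTE_BY_TAG: Dict[str, str] = {
    "PII": "clearance:high",
    "Finance": "dept:finance",
}


def is_entitled(user_attributes: Set[str], table_tags: Set[str]) -> bool:
    """Return True when the user holds the attribute required by every governed tag."""
    return all(
        REQUIRED_ATTRIBUTE_BY_TAG[tag] in user_attributes
        for tag in table_tags
        if tag in REQUIRED_ATTRIBUTE_BY_TAG
    )


analyst = {"clearance:low", "dept:finance"}
new_table_tags = {"PII", "Finance"}  # tags assumed to be applied when the table is ingested

before = is_entitled(analyst, new_table_tags)

# Elevating the user's clearance changes the outcome without touching groups or connections.
analyst_elevated = (analyst - {"clearance:low"}) | {"clearance:high"}
after = is_entitled(analyst_elevated, new_table_tags)

print(before, after)
```

The point of the sketch is that entitlement follows user attributes and data tags, so a single Dataiku group and connection can serve users with different clearances.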
Again, it is recommended to collaborate with the Dataiku team to ensure the right direction is being adopted.
Remap - since each group has its own connection string, a workflow shared by one group with another must be bundled with connection remapping, as described in the doc at Data Connections, OR
Share the project or workflow by bundling it and passing the connection as a variable, or use the target group's connection string. More in the Snowflake page of the Dataiku DSS 13 documentation.
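The two sharing options above can be sketched in plain Python. This is an illustration only: the source/target pair layout, the `snowflake_connection` variable name, and the connection names are assumptions made for the example, and the actual remapping is configured through Dataiku as described in its documentation.

```python
from typing import Dict, List


def build_connection_remapping(source_to_target: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn a source -> target mapping into a list of remapping entries.

    Connections that map to themselves are skipped because they need no remapping.
    """
    return [
        {"source": source, "target": target}
        for source, target in sorted(source_to_target.items())
        if source != target
    ]


def resolve_connection(project_variables: Dict[str, str], default: str) -> str:
    """Pick the connection from a project variable, falling back to a default."""
    # "snowflake_connection" is a hypothetical variable name.
    return project_variables.get("snowflake_connection", default)


# Option 1: remap the sharing group's connection to the receiving group's connection.
remapping = build_connection_remapping(
    {
        "SANDBOX1_LOW_CONN": "SANDBOX1_HIGH_CONN",
        "SHARED_FILES": "SHARED_FILES",
    }
)
print(remapping)

# Option 2: keep the workflow unchanged and supply the connection through a variable.
connection = resolve_connection(
    {"snowflake_connection": "SANDBOX1_HIGH_CONN"}, default="SANDBOX1_LOW_CONN"
)
print(connection)
```

Either way, the workflow itself stays the same and only the connection it resolves to changes for the receiving group.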