Skip to content

You are viewing documentation for Immuta version 2.8.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Policy Enforcement in the SQL Access Pattern

Audience: Data Owners and Data Users

Content Summary: This page explains the major policy types in Immuta and how they can affect the output of SQL queries.

If a data source has policies, those policies are applied on a per-user basis when users run queries with their Immuta SQL credentials. Users with different attributes will often receive different results from running the same query against that data source.

For a thorough explanation of data source policies, see Policies Explained.

Masking

If a column in a data source contains sensitive information, the data in that column may be completely or partially masked or not masked at all depending on users' attributes. When SQL query results are returned to users, data in a column with a masking policy applied may

  • be replaced with a constant, hash, or null value;
  • contain a number that has been rounded up or down from its original value;
  • be partially replaced if it contains a match to a regular expression policy; or
  • display the real data, but only if the user possesses the attributes specified in the policy.

Row Redaction

Some rows in a data source may not be visible in queries if the data source has row-level policies and the querying user does not possess the necessary attributes to view that row.

Example: A data source with a column titled group that contains the values A, B or C. The row-level security policy for the data source establishes that a user will only be able to see any given row if the value of the group column in that row matches the name of an Immuta group that user belongs to. The data source table (named example) has 30 rows, and the group column contains 10 A's, 10 B's, and 10 C's.

Users 1, 2, and 3 run the query SELECT * FROM example using their Immuta SQL credentials:

  • User 1 belongs to groups A, B, and C. The query returns all 30 rows.
  • User 2 belongs only to group A. The query returns only the 10 rows with the value 'A' in the group column.
  • User 3 does not belong to any of the relevant groups. The query will fail with the following error: You do not have access to any policies for the data in the table, 'example'. Please contact the data source administrator for more information.

Minimization

Data sources that contain a minimization policy can limit the amount of data a user can run queries against based on specified percentages and the user's attributes.

Example: A data source contains 1000 rows, and a minimization policy restricts users without the 'Percentage:All' attribute to 30% of the data. Based on a unique id column, queries run by

  • Users who possess the specified attribute will return a maximum of 1000 rows.
  • Users without the attribute will only return data from a maximum of 300 rows.

Time-based Restrictions

If a data source has a time-based restriction policy, the amount of data a user can run queries against may be limited based on the row's eventTime and the user's attributes.

Example: A data source is configured to only return rows that occurred in the last 100 days if the user is not a operating under the 'Auditor' purpose:

  • Uses who meet this condition can run queries against all of the rows in the data, regardless of their eventTime.
  • Users who do not meet this condition can only run queries against rows whose eventTime occurred in the last 100 days (anything older will not be queried).

Purpose-based Restrictions

Another way of restricting data access is by utilizing purpose-based restrictions, which restrict usage of the data source to one or more purposes. If a user wishes to run SQL queries against a purpose-restricted data source, they must use the SQL credentials provided by a project containing that purpose. These users will have access to all of the data in the data source. Users without appropriate credentials will have access to none.

Example: A user is subscribed to a data source called example that is restricted by the Research purpose. The user is also a member of the project Research Project that contains the Research purpose and the example data source. The user runs SELECT * FROM example using the following sets of credentials:

  • The user runs the query with the credentials provided by Research Project. The query runs successfully against all of the data.
  • The user runs the query with the Immuta SQL credentials from their Profile page. The query returns an error: You do not have access to any policies for the data in the table, 'example'. Please contact the data source administrator for more information.
  • The user runs the query with the credentials provided by a different project containing the same data source but not the same purpose. The query returns an error: You do not have access to any policies for the data in the table, 'example'. Please contact the data source administrator for more information.

Differential Privacy

Data sources that utilize a differentially private policy can restrict users who do not meet the policy conditions to only run a certain type of SQL query, called aggregates. Examples of aggregates include MIN, MAX, AVG, COUNT, and SUM. These users must be careful to avoid aggregate queries that are too specific. Immuta will only return differentially private results for broad aggregate queries. Differentially private results are also cached so that security cannot be circumvented by running the same query multiple times or by multiple users.

Example 1: User A is subscribed to a differentially private data source called people. This user does not meet the policy conditions. The user is most interested in the firstname and lastname columns of the data source, and runs the following 3 queries:

  • The user runs SELECT * FROM people. The query returns an error: Invalid differentially private query. Non aggregate included in SELECT clause.
  • The user runs SELECT COUNT(*) FROM people WHERE firstname='John. The query returns a count (with added noise) of the number of people with the first name 'John'.
  • The user runs SELECT COUNT(*) FROM people WHERE firstname='John' AND lastname='Doe'. There are too few people named 'John Doe' in the people table for Immuta to return a differentially private result. The query returns an error: Your query is too specific. This data source is limited to broad / general aggregate queries.

Example 2: Another user, User B, is subscribed to the same data source. This user also does not meet the policy conditions. When they run the same queries:

  • The user runs SELECT * FROM people. The query returns an error: Invalid differentially private query. Non aggregate included in SELECT clause.
  • The user runs SELECT COUNT(*) FROM people WHERE firstname='John. The query returns the same count as User A.
  • The user runs SELECT COUNT(*) FROM people WHERE firstname='John' AND lastname='Doe'. There are too few people named 'John Doe' in the people table for Immuta to return a differentially private result. The query returns an error: Your query is too specific. This data source is limited to broad / general aggregate queries.

Example 3: User C is subscribed to the same Data source, but User C does meet the policy conditions. When they run the same queries:

  • The user runs SELECT * FROM people. The query returns real data matching the query (no noise is added).
  • The user runs SELECT COUNT(*) FROM people WHERE firstname='John. The query returns the real count of people with the first name 'John' (no noise is added).
  • The user runs SELECT COUNT(*) FROM people WHERE firstname='John' AND lastname='Doe'. The query returns the real count of people with the first name 'John' and last name 'Doe' (no noise is added).

For more detailed information on differential privacy, see Differential Privacy.