Policy Enforcement in the SQL Access Pattern
Audience: Data Owners and Data Users
Content Summary: This page explains the major policy types in Immuta and how they can affect the output of SQL queries.
If a data source has policies, those policies are applied on a per-user basis when users run queries with their Immuta SQL credentials. Users with different attributes will often receive different results from running the same query against that data source.
For a thorough explanation of data source policies, see Policies Explained.
Masking
If a column in a data source contains sensitive information, the data in that column may be completely or partially masked or not masked at all depending on users' attributes. When SQL query results are returned to users, data in a column with a masking policy applied may
- be replaced with a constant, hash, or null value;
- contain a number that has been rounded up or down from its original value;
- be partially replaced if it contains a match to a regular expression policy; or
- display the real data, but only if the user possesses the attributes specified in the policy.
Row Redaction
Some rows in a data source may not be visible in queries if the data source has row-level policies and the querying user does not possess the necessary attributes to view that row.
Example: A data source with a column titled group
that contains the values A, B
or C. The row-level security policy for the data source establishes that a user will only be able to
see any given row if the value of the group
column in that row matches the name of an Immuta group that
user belongs to. The data source table (named example
) has 30 rows, and the group
column contains 10 A's,
10 B's, and 10 C's.
Users 1, 2, and 3 run the query SELECT * FROM example
using their Immuta SQL credentials:
- User 1 belongs to groups A, B, and C. The query returns all 30 rows.
- User 2 belongs only to group A. The query returns only the 10 rows with the value 'A' in the
group
column. - User 3 does not belong to any of the relevant groups. The query will fail with the following error: You do not have access to any policies for the data in the table, 'example'. Please contact the data source administrator for more information.
Minimization
Data sources that contain a minimization policy can limit the amount of data a user can run queries against based on specified percentages and the user's attributes.
Example: A data source contains 1000 rows, and a minimization policy restricts users without the
'Percentage:All' attribute to 30% of the data. Based on a unique id
column, queries run by
- Users who possess the specified attribute will return a maximum of 1000 rows.
- Users without the attribute will only return data from a maximum of 300 rows.
Time-based Restrictions
If a data source has a time-based restriction policy, the amount of data a user can run queries against may
be limited based on the row's eventTime
and the user's attributes.
Example: A data source is configured to only return rows that occurred in the last 100 days if the user is not a operating under the 'Auditor' purpose:
- Uses who meet this condition can run queries against all of the rows in the data, regardless of their
eventTime
. - Users who do not meet this condition can only run queries against rows whose
eventTime
occurred in the last 100 days (anything older will not be queried).
Purpose-based Restrictions
Another way of restricting data access is by utilizing purpose-based restrictions, which restrict usage of the data source to one or more purposes. If a user wishes to run SQL queries against a purpose-restricted data source, they must use the SQL credentials provided by a project containing that purpose. These users will have access to all of the data in the data source. Users without appropriate credentials will have access to none.
Example: A user is subscribed to a data source called example
that is restricted by the Research
purpose.
The user is also a member of the project Research Project
that contains the Research
purpose and the example
data source. The user runs SELECT * FROM example
using the following sets of credentials:
- The user runs the query with the credentials provided by
Research Project
. The query runs successfully against all of the data. - The user runs the query with the Immuta SQL credentials from their Profile page. The query returns an error: You do not have access to any policies for the data in the table, 'example'. Please contact the data source administrator for more information.
- The user runs the query with the credentials provided by a different project containing the same data source but not the same purpose. The query returns an error: You do not have access to any policies for the data in the table, 'example'. Please contact the data source administrator for more information.
Differential Privacy
Data sources that utilize a differentially private policy can restrict users who do not meet the policy conditions
to only run a certain type of SQL query, called aggregates. Examples of aggregates include MIN
, MAX
, AVG
,
COUNT
, and SUM
. These users must be careful to avoid aggregate queries that are too specific. Immuta will only
return differentially private results for broad aggregate queries. Differentially private results are also cached
so that security cannot be circumvented by running the same query multiple times or by multiple users.
Example 1: User A is subscribed to a differentially private data source called people
. This user does not
meet the policy conditions. The user is most interested in the firstname
and lastname
columns of the data source,
and runs the following 3 queries:
- The user runs
SELECT * FROM people
. The query returns an error: Invalid differentially private query. Non aggregate included in SELECT clause. - The user runs
SELECT COUNT(*) FROM people WHERE firstname='John
. The query returns a count (with added noise) of the number of people with the first name 'John'. - The user runs
SELECT COUNT(*) FROM people WHERE firstname='John' AND lastname='Doe'
. There are too few people named 'John Doe' in thepeople
table for Immuta to return a differentially private result. The query returns an error: Your query is too specific. This data source is limited to broad / general aggregate queries.
Example 2: Another user, User B, is subscribed to the same data source. This user also does not meet the policy conditions. When they run the same queries:
- The user runs
SELECT * FROM people
. The query returns an error: Invalid differentially private query. Non aggregate included in SELECT clause. - The user runs
SELECT COUNT(*) FROM people WHERE firstname='John
. The query returns the same count as User A. - The user runs
SELECT COUNT(*) FROM people WHERE firstname='John' AND lastname='Doe'
. There are too few people named 'John Doe' in thepeople
table for Immuta to return a differentially private result. The query returns an error: Your query is too specific. This data source is limited to broad / general aggregate queries.
Example 3: User C is subscribed to the same Data source, but User C does meet the policy conditions. When they run the same queries:
- The user runs
SELECT * FROM people
. The query returns real data matching the query (no noise is added). - The user runs
SELECT COUNT(*) FROM people WHERE firstname='John
. The query returns the real count of people with the first name 'John' (no noise is added). - The user runs
SELECT COUNT(*) FROM people WHERE firstname='John' AND lastname='Doe'
. The query returns the real count of people with the first name 'John' and last name 'Doe' (no noise is added).
For more detailed information on differential privacy, see Differential Privacy.