
Managing Write Space with Sentry Object Ownership

Audience: System Administrators

Content Summary: This document outlines how users can share derivative data sources they create from upstream Immuta-protected data sources.

Sharing Derived Data Sources Workflow

  1. The organization protects table x in Immuta with masking and row redaction policies (or any other policy type).
  2. A user is subscribed to table x through Immuta.
  3. Once subscribed, the user runs a SparkSQL job (using immuta-pyspark, for example) that writes its output to their personal scratch space in HDFS (configuration of HDFS directory ACLs described here).
  4. The user can then log in to Impala or Hive and, using the scratch space database, create an external table pointing to the data they wrote in step 3. Because Sentry Object Ownership is enabled, that newly created table will only ever be visible to that user. (Instructions for enabling the scratch space database are below.) Internal Hive tables should not be used; data written by Immuta users should be stored in the secure HDFS Write Space and not in the Hive Warehouse.
  5. To share that table more broadly, the user can log in to Immuta and, using the CREATE_DATA_SOURCE_IN_PROJECT permission, expose that derivative data source in the context of a project. The policies from the parent tables are inherited automatically, so the user does not need to know which policies to enforce. To expose a data source within a project, the project must be equalized. This feature also captures data source lineage. Here is a short video on that feature.

    • It's recommended that you reserve the CREATE_DATA_SOURCE permission for users who know which policies to enforce on the data and who can impersonate the users writing to the scratch write space, in case you want to expose their derived data sources outside of a project.
    • Immuta's Summer 2019 release will also manage HDFS and database creation and configuration per project, so project-specific workspaces in HDFS and Hive/Impala can be created instead of manually creating user-specific scratch spaces. Users will then be able to write to the workspace only when they are acting under the project, which guarantees that any output landing in those spaces was derived from the data sources in the project.
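
Step 4 of the workflow can be sketched as follows. This is a minimal example under stated assumptions, not the only valid layout: the table name derived_x, its columns, the Parquet format, and the HDFS path are hypothetical placeholders, and immuta_scratch_space is the example database name used in the CDH configuration steps of this document. Substitute the schema and location of the files actually written in step 3.

    -- Work in the scratch space database (example name from this document).
    USE immuta_scratch_space;

    -- Hypothetical table name, columns, and HDFS path; adjust them to match
    -- the files written by the Spark job. EXTERNAL keeps the data in the
    -- HDFS write space instead of the Hive Warehouse.
    CREATE EXTERNAL TABLE derived_x (
      id BIGINT,
      category STRING
    )
    STORED AS PARQUET
    LOCATION '/user/example_user/scratch/derived_x';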

CDH Configurations

Sentry Object Ownership, introduced in CDH 5.16, can be used to create a safe “write” or “scratch” space for Immuta users in Hive and Impala. The write space is a single database in Hive and Impala in which Immuta users have been granted the CREATE privilege via Sentry.

Object ownership designates an owner for a database, table, or view in Sentry. When object ownership is enabled, the OWNER privilege is automatically granted to the user that creates the object. This relieves the database administrator from the task of explicitly assigning permissions to the user.

Follow the steps below to configure an Immuta scratch space with Sentry Object Ownership. For additional information on Sentry Object Ownership, see the official Cloudera Documentation.

  1. Install and configure Sentry.
  2. Navigate to the Sentry Configuration in Cloudera Manager to turn Sentry Object Ownership on.

    1. Search for sentry.db.policy.store.owner.as.privilege.
    2. Set the value to ALL privileges using the radio button.

      • Make sure you do not select ALL privileges with GRANT, as this option would allow users to circumvent Immuta by sharing data via GRANT statements.
    3. Save your changes and restart the Sentry service.

  3. As a Sentry admin user, create the scratch space database. In this example, the database will be called immuta_scratch_space. Run the following query in beeline or impala-shell.

    CREATE DATABASE immuta_scratch_space;
    
  4. Create a Sentry role for Immuta users, if you don’t have one already. In this example, the role will be called immuta_user, and it will contain all members of the group immuta_users.

    CREATE ROLE immuta_user;
    GRANT ROLE immuta_user TO GROUP immuta_users;
    
  5. Grant CREATE privileges on the scratch space database to the immuta_user role.

    GRANT CREATE ON DATABASE immuta_scratch_space TO ROLE immuta_user;
    
  6. Revoke all other privileges for users in the immuta_user role. It is important that the scratch space database is the only database these users are able to access directly via Hive or Impala.
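
Step 6 might look like the following sketch. The database name other_db is a hypothetical stand-in for any database on which the immuta_user role currently holds privileges; repeat the REVOKE for each such object. The SHOW GRANT ROLE statement then lists what remains, which should be only the CREATE privilege on immuta_scratch_space. Privileges that users hold through other roles are not affected by these statements and would need the same review.

    -- other_db is a hypothetical placeholder; repeat for each database.
    REVOKE ALL ON DATABASE other_db FROM ROLE immuta_user;

    -- List the role's remaining privileges to confirm the result.
    SHOW GRANT ROLE immuta_user;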