Immuta Hadoop Filesystem Access Pattern
Audience: Data Owners and Data Users
Content Summary: Immuta integrates with your Hadoop cluster to provide policy-compliant access to data sources directly through HDFS. This page explains how to access data through the HDFS access pattern, which only enforces file-level controls on data. For more information on installing and configuring the Immuta Hadoop plugin, see the Administration guide. If you need to enforce row-level and column-level controls on data, use the Spark SQL access pattern instead.
The Immuta Hadoop plugin can also be integrated with an existing Kerberos setup so that users can access HDFS data with their existing Kerberos principals, while Immuta manages data access and policy enforcement.
Immuta HDFS Principal
When Immuta is installed on the cluster, users can only access data through HDFS using the HDFS principal that has been set for them in Immuta. This principal can only be set by an Immuta Administrator or imported from an external Identity Manager, but users can view their own principal on their profile page.
Associating a Project with your Immuta HDFS Principal
If you wish to access data in HDFS while acting under a Project Purpose, you must associate that project with your Immuta HDFS Principal on your profile page. This is required if the data you wish to access has Purpose-based restrictions.
- Navigate to the Details section of your profile.
- Under HDFS Principal, click SELECT PROJECT.
- Choose your desired project from the drop-down menu in the modal, then click Save.
Your HDFS Principal is now tied to your selected project.
To remove a project association from your HDFS Principal, click SELECT PROJECT again and select None from the drop-down menu.
To access data through Immuta's HDFS Access Pattern, you must be authenticated as the user or Kerberos principal that is set as your Immuta HDFS principal.
- For clusters secured with Kerberos, you must successfully run kinit with your Immuta HDFS principal before attempting to access data.
- For insecure clusters, you must be logged in to the cluster as the system user that is set as your Immuta HDFS principal.
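A minimal Python sketch of these two prerequisites, assuming the standard Kerberos client tools `kinit` and `klist` are on the `PATH` (`klist -s` runs silently and exits with status 0 only when a readable, unexpired ticket cache exists). The helper names `kinit_command`, `has_valid_ticket`, and `is_expected_system_user` are hypothetical and are not part of Immuta.

```python
import getpass
import subprocess


def kinit_command(principal: str) -> list[str]:
    """Command that obtains a Kerberos ticket for the given principal.

    Running it prompts for the password (or use a keytab with -kt).
    """
    return ["kinit", principal]


def has_valid_ticket() -> bool:
    """True if `klist -s` reports a usable ticket cache.

    Note: this does not check *which* principal holds the ticket; it only
    confirms that some valid ticket exists.
    """
    try:
        return subprocess.run(["klist", "-s"]).returncode == 0
    except OSError:
        # Kerberos client tools are missing or not executable on this host.
        return False


def is_expected_system_user(hdfs_principal: str) -> bool:
    """For insecure clusters: is the current OS user the assigned principal?"""
    return getpass.getuser() == hdfs_principal
```

On a Kerberized cluster you would run the command from `kinit_command` once per session and then confirm with `has_valid_ticket()` before reading any files.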
Immuta's HDFS access pattern allows you to access data in two different ways:
The immuta:/// namespace allows you to access files relative to the Immuta data source they belong to. For example, if you want to access a file called december_report.csv that is part of an Immuta data source called reports, you can access it with the following path:
Note that the path to the file is relative to the Immuta data source that it falls under, not the real path in HDFS. Also, immuta:/// is restricted to the paths a user is allowed to see: files that the user is not authorized for will not be visible.
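To make the relative-path rule concrete, here is a hedged Python sketch that builds an immuta:/// URI from a data source name and a path within that data source. It assumes the layout `immuta:///<data source>/<path within data source>`, and that the Immuta plugin registers the `immuta` scheme with the Hadoop client so the standard `hdfs dfs -cat` command accepts such URIs; `immuta_path` and `cat_command` are hypothetical helpers, not Immuta APIs.

```python
import posixpath


def immuta_path(data_source: str, relative_path: str) -> str:
    """Build an immuta:/// URI for a file, relative to its Immuta data source.

    Assumed layout: immuta:///<data source>/<path within data source>.
    """
    rel = posixpath.normpath(relative_path.lstrip("/"))
    if rel in ("", "."):
        # No file given: refer to the root of the data source itself.
        return f"immuta:///{data_source}"
    if rel == ".." or rel.startswith("../"):
        # Paths are scoped to one data source; refuse to climb out of it.
        raise ValueError("path escapes the data source")
    return f"immuta:///{data_source}/{rel}"


def cat_command(uri: str) -> list[str]:
    """Standard Hadoop shell command that prints a file's contents."""
    return ["hdfs", "dfs", "-cat", uri]
```

For example, `cat_command(immuta_path("reports", "december_report.csv"))` yields the argument list for `hdfs dfs -cat immuta:///reports/december_report.csv`, which could be passed to `subprocess.run` on a cluster node.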
The HDFS access pattern also allows users to access data using native HDFS paths. Authorized data source subscribers can access the file december_report.csv through its native path in HDFS:
Note that in order for a user to access data using hdfs:/// paths, there must be a … where <user> corresponds to the user's Immuta HDFS principal. Also, hdfs:/// paths let users see the locations of all files, but users can only read the files they have access to in Immuta.
Both methods of accessing data are audited and comply with data source policies. If users are not subscribed to the data source that a file in HDFS falls under, or are restricted by that data source's policies, they will not be able to access the file through either namespace.
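The visibility rules of the two namespaces can be summarized with a toy model. This is an illustrative Python sketch only, not Immuta's implementation: the native paths, data source names, and the `ToyImmutaView` class are hypothetical, and "authorized" stands in for "subscribed and not policy-restricted".

```python
from dataclasses import dataclass, field


@dataclass
class ToyImmutaView:
    """Toy model of what one user sees and can read in each namespace."""

    # Native HDFS path -> (owning data source, path within that data source).
    owner: dict[str, tuple[str, str]]
    # Data sources this user is subscribed to and not policy-restricted by.
    authorized: set[str] = field(default_factory=set)

    def list_hdfs(self) -> list[str]:
        # hdfs:/// shows the locations of all files...
        return sorted(self.owner)

    def list_immuta(self) -> list[str]:
        # ...while immuta:/// only shows files the user is authorized for,
        # addressed relative to their data source.
        return sorted(
            f"immuta:///{ds}/{rel}"
            for ds, rel in self.owner.values()
            if ds in self.authorized
        )

    def can_read(self, native_path: str) -> bool:
        # Reading is allowed through either namespace only if authorized.
        entry = self.owner.get(native_path)
        return entry is not None and entry[0] in self.authorized
```

With a user authorized only for `reports`, `list_hdfs()` would include a hypothetical `/data/hr/salaries.csv`, but `can_read` on that path returns `False`, and it never appears in `list_immuta()`.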
HDFS User Impersonation
Immuta users with the IMPERSONATE_HDFS_USER permission can create HDFS, Hive, and Impala data sources as any HDFS user (provided that they have the proper credentials). For more information, see the tutorial for HDFS data sources.
For Impala and Hive data sources, see the Query-backed Data Source tutorial.