Object-Backed Data Source Tutorial
Audience: Data Owners
Content Summary: Object-backed data sources are data storage technologies that do not support SQL; they range from NoSQL technologies to blob stores, filesystems, and APIs. Object-backed data sources act like key/value stores and are often called ingested sources because Immuta must ingest metadata about the data source before it can provide access and enforce policy restrictions. Data Owners supply Immuta with metadata about the blobs they are exposing so that Immuta knows how to reach the blobs and apply policies.
This guide outlines the process of creating object-backed data sources, such as Amazon S3, Apache HDFS, Azure Blob Storage, Custom, FTP, and Persisted.
If your storage technology is not listed above, see the Query-backed Data Source Tutorial.
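The key/value idea in the summary above can be illustrated with a minimal Python sketch. It is not Immuta's implementation or API: `BlobMetadata`, `blob_store`, `catalog`, `fetch_blob`, and the tag-based rule are all hypothetical names and assumptions chosen for illustration. The point it shows is that the access decision is made from ingested metadata alone, and the blob is fetched by key only afterwards.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlobMetadata:
    """Hypothetical metadata record a Data Owner might supply for one blob."""
    blob_id: str       # identifier users ask for
    location: str      # key of the blob in the backing store
    tags: frozenset = field(default_factory=frozenset)


# Stand-in for the backing store: opaque bytes addressed by key, no SQL.
blob_store = {
    "reports/q1.csv": b"region,total\neast,10\n",
    "reports/q2.csv": b"region,total\nwest,12\n",
}

# Stand-in for the ingested metadata, keyed by blob ID.
catalog = {
    "q1": BlobMetadata("q1", "reports/q1.csv", frozenset({"finance"})),
    "q2": BlobMetadata("q2", "reports/q2.csv", frozenset({"finance", "restricted"})),
}


def fetch_blob(blob_id, user_tags):
    """Return the blob's bytes if user_tags cover every tag on the blob, else None."""
    meta = catalog[blob_id]
    if not meta.tags <= set(user_tags):
        return None  # decision made from metadata only; the blob is never read
    return blob_store[meta.location]
```

For example, `fetch_blob("q1", {"finance"})` returns the file contents, while `fetch_blob("q2", {"finance"})` returns `None` because the assumed rule requires the `restricted` tag as well.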
Step 1: Create a New Data Source
To create a new data source:
- Click the plus button in the bottom left corner of the Immuta console.
- Select the Data Source icon.
Alternatively:
- Navigate to the My Data Sources page.
- Click the New Data Source button.
Step 2: Select Your Storage Technology
Select the storage technology containing the data you wish to expose by clicking a tile. Please note that the list of enabled technologies is configurable and may differ from the image below.
Select from the list below for specific instructions on creating a data source for your chosen storage technology. If your storage technology is not listed, please refer to the Query-backed Data Source Tutorial.
Step 3: Enter Basic Information
Here you provide information about your source that makes it discoverable to users.
- Complete the Data Source Name field, which will be the name shown in the Immuta UI.
- Enter the SQL Table Name, which will be the name of the table presented in the Immuta Query Engine; this name is not the name of the table you are getting the data from. Note that for object-backed data sources, this table will only store metadata about blobs in this data source.
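To make the last point concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a SQL engine. It is only an analogy, not the Immuta Query Engine: the table name `sales_reports` and the columns `blob_id`, `location`, and `size_bytes` are assumptions made up for this example. It shows a table whose rows describe blobs without containing the blob contents.

```python
import sqlite3

# In-memory database standing in for a SQL engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# "sales_reports" plays the role of the SQL Table Name chosen in this step.
# The table holds one metadata row per blob, not the blob data itself.
conn.execute(
    "CREATE TABLE sales_reports ("
    "blob_id TEXT PRIMARY KEY, location TEXT, size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_reports VALUES (?, ?, ?)",
    [
        ("q1", "reports/q1.csv", 21),
        ("q2", "reports/q2.csv", 21),
    ],
)

# Querying the table returns information about blobs, which a client
# would then use to fetch the actual objects from the backing store.
rows = conn.execute(
    "SELECT blob_id, location FROM sales_reports ORDER BY blob_id"
).fetchall()
```

After this runs, `rows` holds one `(blob_id, location)` pair per blob.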
Step 4: Manually Re-crawl Data Sources
Some object-backed data sources can be manually re-crawled to fetch fresh metadata about the data objects. If your data source is not set up to ingest the metadata automatically, you may need to perform this action from time to time.
- Navigate to the Data Source Overview page.
- Click the menu icon in the upper right corner and select Re-crawl.
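Conceptually, a re-crawl compares the metadata already held with a fresh listing of the storage location. The sketch below illustrates that idea in plain Python under stated assumptions: `recrawl`, `list_objects`, and the shape of the metadata dictionaries are hypothetical and do not reflect how Immuta performs a re-crawl internally.

```python
def recrawl(catalog, list_objects):
    """Refresh `catalog` in place from a fresh listing and summarize the changes.

    catalog      -- dict mapping object key -> metadata dict (previous crawl)
    list_objects -- hypothetical callable returning the current key -> metadata mapping
    """
    fresh = dict(list_objects())

    # Compare old and new key sets to classify each object.
    added = sorted(fresh.keys() - catalog.keys())
    removed = sorted(catalog.keys() - fresh.keys())
    updated = sorted(
        key for key in fresh.keys() & catalog.keys() if fresh[key] != catalog[key]
    )

    # Replace the stale metadata with the fresh listing.
    catalog.clear()
    catalog.update(fresh)
    return {"added": added, "removed": removed, "updated": updated}


# Example: one object changed size, one was deleted, one is new.
catalog = {"a.csv": {"size": 10}, "b.csv": {"size": 20}}


def list_objects():
    return {"a.csv": {"size": 15}, "c.csv": {"size": 5}}


summary = recrawl(catalog, list_objects)
```

Here `summary` reports `c.csv` as added, `b.csv` as removed, and `a.csv` as updated, and `catalog` now matches the fresh listing.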