
Object-Backed Data Source Tutorial

Audience: Data Owners

Content Summary: Object-backed data sources are data storage technologies that do not support SQL; they range from NoSQL technologies to blob stores, filesystems, and APIs. Object-backed data sources act like key/value stores and are often called ingested sources because Immuta must ingest metadata about the data source to provide access and create policy restrictions. Data Owners provide Immuta with metadata about the blobs they are exposing so that Immuta understands how to reach the blobs and apply policies.

Unlike query-backed data sources, which support data access through both the Immuta Query Engine and the Immuta Virtual Filesystem, object-backed data sources only allow users to access the unstructured data through the Filesystem. However, Data Owners can pass features (new data they create from the data they are exposing) to Immuta, and these features are queryable through the Query Engine.
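As a rough mental model only, the relationship between blobs, metadata, and features can be sketched in Python. This is not Immuta's API: the class, field, and method names below are hypothetical, and the store is a plain in-memory dictionary. The sketch shows that blob contents are fetched by key, while only the owner-supplied metadata and features can be searched.

```python
from dataclasses import dataclass, field


@dataclass
class BlobMetadata:
    """Hypothetical metadata record a Data Owner supplies for one blob."""
    key: str          # how to reach the blob in the underlying store
    size_bytes: int
    features: dict = field(default_factory=dict)  # owner-derived, queryable values


class ObjectBackedSource:
    """Toy model: opaque blobs addressed by key, plus a metadata index."""

    def __init__(self):
        self._blobs = {}     # key -> raw bytes (not queryable)
        self._metadata = {}  # key -> BlobMetadata (queryable)

    def ingest(self, key, data, features=None):
        """Store the blob and record metadata describing it."""
        self._blobs[key] = data
        self._metadata[key] = BlobMetadata(key, len(data), dict(features or {}))

    def query(self, **wanted):
        """Return keys whose features match every requested value."""
        return sorted(
            key for key, meta in self._metadata.items()
            if all(meta.features.get(name) == value for name, value in wanted.items())
        )

    def fetch(self, key):
        """Return the raw blob, analogous to reading it through a filesystem."""
        return self._blobs[key]


source = ObjectBackedSource()
source.ingest("reports/2020/q1.pdf", b"%PDF-1.4 ...", {"year": 2020, "kind": "report"})
source.ingest("images/site-a.png", b"\x89PNG...", {"kind": "image"})
report_keys = source.query(kind="report")
```

In this toy model, `query` never inspects blob contents; it only sees what was recorded at ingest time, which is why stale metadata has to be refreshed separately.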

This guide outlines the process of creating object-backed data sources for the following storage technologies: Amazon S3, Apache HDFS, Azure Blob Storage, Custom, FTP, and Persisted.

If your storage technology is not listed above, navigate to the Query-backed Data Source Tutorial.

Step 1: Create a New Data Source

To create a new data source:

  1. Click the plus button in the bottom left corner of the Immuta console.
  2. Select the Data Source icon.

Alternatively:

  1. Navigate to the My Data Sources page.
  2. Click the New Data Source button.

Step 2: Select Your Storage Technology

Select the storage technology containing the data you wish to expose by clicking a tile. Please note that the list of enabled technologies is configurable and may differ from the image below.

Data Source Creation Select Backend

Select from the list below for specific instructions on creating a data source for your chosen storage technology. If your storage technology is not listed, please refer to the Query-backed Data Source Tutorial.

Step 3: Enter Basic Information

Here you provide information about your source that makes it discoverable to users.

  1. Complete the Data Source Name field, which will be the name shown in the Immuta UI.
  2. Enter the SQL Table Name, which will be the name of the table presented in the Immuta Query Engine. This is not the name of the table the data comes from. Note that for object-backed data sources, this table will only store metadata about the blobs in this data source.

    Data Source Creation Basic Information
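To illustrate the note that the SQL table holds only metadata about blobs, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the Query Engine. The table name `sensor_files` and its columns are invented for illustration; the schema Immuta actually exposes may differ.

```python
import sqlite3

# In-memory database standing in for the query layer (an assumption).
conn = sqlite3.connect(":memory:")

# "sensor_files" plays the role of the SQL Table Name chosen in this step.
# The rows describe blobs; the blob contents themselves are not stored here.
conn.execute(
    "CREATE TABLE sensor_files ("
    "blob_id TEXT PRIMARY KEY, size_bytes INTEGER, modified TEXT)"
)
conn.executemany(
    "INSERT INTO sensor_files VALUES (?, ?, ?)",
    [
        ("raw/2020-01-01.bin", 2048, "2020-01-01"),
        ("raw/2020-01-02.bin", 4096, "2020-01-02"),
    ],
)

# A SQL query can locate blobs by their metadata, but cannot read their bytes.
large = [
    row[0]
    for row in conn.execute(
        "SELECT blob_id FROM sensor_files WHERE size_bytes > ? ORDER BY blob_id",
        (3000,),
    )
]
```

A user would then take the returned identifiers and read the corresponding blobs through the filesystem rather than through SQL.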

Step 4: Manually Re-crawl Data Sources

Some object-backed data sources can be manually re-crawled to fetch fresh metadata about the data objects. If your data source is not set up to ingest the metadata automatically, you may need to perform this action from time to time.

  1. Navigate to the Data Source Overview page.
  2. Click the menu icon in the upper right corner and select Re-crawl.

    Data Source Re-crawl
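Conceptually, a re-crawl compares the objects currently in storage with the metadata recorded earlier. The sketch below is an illustration under stated assumptions, not Immuta's implementation: a local directory stands in for the storage system, the metadata is just file size and modification time, and the function names `crawl` and `recrawl` are hypothetical.

```python
import os
import tempfile


def crawl(root):
    """Collect {relative_path: (size, mtime_ns)} for every file under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            found[os.path.relpath(path, root)] = (stat.st_size, stat.st_mtime_ns)
    return found


def recrawl(root, previous):
    """Return fresh metadata plus the keys that were added, removed, or changed."""
    current = crawl(root)
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        key for key in set(current) & set(previous) if current[key] != previous[key]
    )
    return current, added, removed, changed


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("first")
    catalog = crawl(root)                    # initial ingest of metadata

    with open(os.path.join(root, "b.txt"), "w") as f:
        f.write("second")                    # a new object appears after ingest
    os.remove(os.path.join(root, "a.txt"))   # an old object disappears

    catalog, added, removed, changed = recrawl(root, catalog)
```

Until `recrawl` runs, the catalog still lists `a.txt` and knows nothing about `b.txt`, which mirrors why a source without automatic ingestion needs a manual re-crawl from time to time.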