Azure Blob Storage Data Source Creation Tutorial

Audience: Data Owners

Content Summary: This guide explains how to configure an Azure Blob Storage data source in Immuta. For data source creation guides covering other storage technologies, see the Object-backed and Query-backed guides.

Step 1: Enter Connection Information

To connect a data source to an Azure Blob Storage container, you must first create a Shared Access Signature for your Azure Blob Storage account.

Enter the Shared Access Signature Token and the corresponding URL for your Azure Storage account. Follow the steps below to retrieve your SAS credentials from the Azure Portal.

Azure Blob Storage Connection Information

Generating and Retrieving Shared Access Signature Credentials

  1. Open the Azure Portal Web UI.
  2. Find and select your desired Azure Storage Account resource.
  3. Under SETTINGS, select Shared access signature.

    Azure Blob Storage Portal Sidebar

  4. Configure the SAS Token's allowed services, resource types, and permissions to match the following image.

    Azure Blob Storage SAS Settings

  5. Set a reasonable expiration date for your SAS Token. When your SAS Token expires, your Immuta data source will no longer be able to fetch data from Azure.

  6. Select Generate SAS and save the provided credentials.
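Because an expired SAS Token stops Immuta from fetching data, it can help to check the expiry of a token before entering it. The following is a minimal sketch, not part of Immuta: it assumes the token is the usual URL query string and reads its `se` (signed expiry) field. The helper names `sas_expiry` and `is_expired`, and the example token with its placeholder signature, are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> datetime:
    """Return the expiry time stored in the token's `se` (signed expiry) field."""
    # A SAS token is a URL query string; tolerate a leading "?".
    fields = parse_qs(sas_token.lstrip("?"))
    if "se" not in fields:
        raise ValueError("SAS token has no 'se' (signed expiry) field")
    raw = fields["se"][0]
    # UTC times are commonly written with a trailing "Z"; normalise for fromisoformat.
    expiry = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Assumption: a value without an offset is treated as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(sas_token: str, now: datetime) -> bool:
    """True when `now` is at or past the token's expiry time."""
    return now >= sas_expiry(sas_token)


# Hypothetical token for illustration only; the signature is a placeholder.
example_token = "sv=2020-08-04&ss=b&srt=sco&sp=rl&se=2030-01-01T00%3A00%3A00Z&sig=PLACEHOLDER"
print(sas_expiry(example_token).isoformat())
```

Running a check like this before the expiration date gives you time to generate a new token and update the data source.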

Step 2: Select the Container

Select the container on which to base this data source. The data source will contain every blob in the container and will preserve the container's directory structure.

Azure Blob Storage Container Configuration
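Azure Blob Storage keeps blobs in a flat namespace, and a directory structure is implied by "/" separators in blob names. The sketch below illustrates how such a structure can be derived from a list of blob names. It is not Immuta's implementation; `build_tree` and the sample blob names are hypothetical.

```python
def build_tree(blob_names):
    """Group blob names into nested dicts, one level per "/"-separated segment."""
    tree = {}
    for name in blob_names:
        parts = name.split("/")
        node = tree
        # Every segment except the last is a (virtual) directory.
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        # The last segment is the blob itself; None marks a leaf.
        node[parts[-1]] = None
    return tree


# Hypothetical blob names for illustration.
names = ["logs/2020/a.csv", "logs/2020/b.csv", "readme.txt"]
print(build_tree(names))
```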

Advanced Configurations

In these final options, you can edit advanced configurations for your data source. None of these configurations are required to create the data source.

Azure Blob Storage Advanced

Option 1: Determine Refresh Interval

If left blank or set to 0, Azure blob data will only be indexed once when the data source is initially created. Otherwise, the Azure blob data will be re-indexed based on the selected time interval.

Refresh Interval

  • Set Time: This is how often Immuta will re-index data located in the remote Azure blob container.
  • Set Period: This is the unit of the interval and can be set to minutes, hours, or days.

If you do not set a refresh interval, Immuta will never automatically crawl your container. You can always manually crawl from the Data Source Overview page.
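The rule described above (a blank or 0 value means no automatic re-indexing; otherwise the time and period together define the interval) can be sketched as follows. This is an illustration only; `refresh_interval` is a hypothetical helper, not an Immuta API.

```python
from datetime import timedelta
from typing import Optional

VALID_PERIODS = ("minutes", "hours", "days")


def refresh_interval(time_value: Optional[int], period: str = "hours") -> Optional[timedelta]:
    """Return the re-index interval, or None when no automatic refresh is configured."""
    # Blank (None) or 0 means the container is indexed once, at creation.
    if not time_value:
        return None
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {VALID_PERIODS}")
    # timedelta accepts minutes=, hours=, and days= keyword arguments.
    return timedelta(**{period: time_value})


print(refresh_interval(30, "minutes"))
print(refresh_interval(0, "days"))
```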

Option 2: Select Data Format

Although the blobs in an object-backed data source can be in any format (images, videos, etc.), some use common structured formats. If your blobs are comma-separated, tab-delimited, or JSON, you can mask values through the Immuta interface. Specifying the data format allows you to create masking policies for the data source.

Data Format

Option 3: Populate Event Time

Event time allows you to catalog the blobs in your data source by date. It can also be used for creating data source minimization policies.

By default, Immuta uses each blob's Last Modified date attribute from Azure as its Event Time. However, the last modification date does not always reflect when the event actually occurred. To provide a custom event time, specify the key of the blob metadata attribute that contains the date in ISO 8601 format, for example: 2015-11-15T05:13:32+00:00.

Azure Blob Storage Event Time
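The sketch below shows how an event time could be resolved for a single blob: read the configured metadata key and parse it as ISO 8601, otherwise use the Last Modified date. The function `event_time`, the key name `event_date`, and the fallback when the key is missing are assumptions made for illustration; Immuta's actual behavior for a missing key is not described here.

```python
from datetime import datetime, timezone
from typing import Mapping


def event_time(metadata: Mapping[str, str], last_modified: datetime, key: str) -> datetime:
    """Resolve a blob's event time from a metadata attribute, else Last Modified."""
    raw = metadata.get(key)
    if raw is None:
        # Assumption: fall back to Last Modified when the attribute is absent.
        return last_modified
    # Parses values such as 2015-11-15T05:13:32+00:00.
    return datetime.fromisoformat(raw)


# Hypothetical blob metadata and Last Modified value.
blob_metadata = {"event_date": "2015-11-15T05:13:32+00:00"}
modified = datetime(2016, 1, 1, tzinfo=timezone.utc)
print(event_time(blob_metadata, modified, "event_date").isoformat())
```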

Option 4: Configure Tags and Features

Immuta will extract any existing metadata from Azure blobs. This metadata can also be used to apply tags or features to blobs. When configuring tags and features, note that Attribute Name refers to the key of your desired blob metadata attribute in Azure.

Azure Blob Storage Tags and Features
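To make the role of Attribute Name concrete, the sketch below selects the configured keys from a blob's metadata. It only illustrates the key lookup; how Immuta turns the resulting values into tags or features is not shown, and `pick_attributes` and the sample metadata are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def pick_attributes(metadata: Mapping[str, str], attribute_names: Iterable[str]) -> Dict[str, str]:
    """Return the metadata entries whose keys match the configured attribute names."""
    # Attribute names that the blob does not carry are skipped.
    return {name: metadata[name] for name in attribute_names if name in metadata}


# Hypothetical blob metadata for illustration.
sample_metadata = {"department": "finance", "region": "emea", "owner": "jdoe"}
print(pick_attributes(sample_metadata, ["department", "project"]))
```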