Azure Blob Storage Data Source Creation Tutorial
Audience: Data Owners
Content Summary: This guide details configuring an Azure Blob Storage data source in Immuta. To explore data source creation guides for other storage technologies, see the Object-backed and Query-backed guides.
Step 1: Enter Connection Information
To connect a data source to an Azure Blob Storage container, you must first create a Shared Access Signature (SAS) for your Azure Blob Storage account.
Enter the Shared Access Signature Token and the corresponding URL for your Azure Storage account. Follow the steps below to retrieve your SAS credentials from the Azure Portal.
Generating and Retrieving Shared Access Signature Credentials
- Open the Azure Portal Web UI.
- Find and select your desired Azure Storage Account resource.
- Under SETTINGS, select Shared access signature.
- Configure the SAS Token's allowed services, resource types, and permissions to match the following image.
- Set a reasonable expiration date for your SAS Token. When your SAS Token expires, your Immuta data source will no longer be able to fetch data from Azure.
- Select Generate SAS and save the provided credentials.
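Because an expired SAS Token stops the data source from fetching data, it can help to check the expiry of a saved token before entering it. The sketch below assumes the token is the usual URL query string in which the `se` field carries the signed expiry time. The function names and the sample token (including its `sp=rl` permissions and fake signature) are made up for illustration and do not reflect the settings Immuta requires.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> datetime:
    """Return the expiry time stored in the token's `se` field (hypothetical helper)."""
    # A SAS token is a URL query string; tolerate a leading "?".
    fields = parse_qs(sas_token.lstrip("?"))
    if "se" not in fields:
        raise ValueError("SAS token has no 'se' (signed expiry) field")
    raw = fields["se"][0]
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    expiry = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Assumption: a value without an offset is treated as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(sas_token: str, now: Optional[datetime] = None) -> bool:
    """True if the token's expiry is at or before `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(sas_token) <= now


# Made-up token for illustration only; the signature is not valid.
example_token = "?sv=2021-06-08&ss=b&srt=sco&sp=rl&se=2030-01-01T00%3A00%3A00Z&sig=fake"
print(sas_expiry(example_token).isoformat())
```

A check like this only reads the token text. It does not confirm that Azure will accept the signature.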
Step 2: Select the Container
Select the container that you wish to base this data source on. The data source will contain all of the blobs in this container, and it will also maintain the container's directory structure.
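To picture what maintaining the directory structure means: in a standard Azure Blob Storage container, directories are virtual and come from the "/" separators in blob names. The sketch below groups a list of blob names by that prefix. The function name and sample blob names are hypothetical, and this is not a description of how Immuta builds its index internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_directory(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group blob names by virtual directory (the text before the last "/")."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        # rpartition returns ("", "", name) for blobs at the container root.
        directory, _, filename = name.rpartition("/")
        tree[directory].append(filename)
    return dict(tree)


# Hypothetical blob names for illustration.
blobs = ["sales/2019/jan.csv", "sales/2019/feb.csv", "sales/readme.txt", "logo.png"]
for directory, files in group_by_directory(blobs).items():
    print(directory or "(container root)", files)
```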
In these final options, you can edit advanced configurations for your data source. None of these configurations are required to create the data source.
Option 1: Determine Refresh Interval
If left blank or set to 0, Azure blob data will only be indexed once when the data source is initially created. Otherwise, the Azure blob data will be re-indexed based on the selected time interval.
- Set Time: This is how often Immuta will re-index data located in the remote Azure blob container.
- Set Period: This is the time period and can be set to minutes, hours or days.
If you do not set a refresh interval, Immuta will not automatically re-crawl your container after the initial indexing. You can always trigger a crawl manually from the Data Source Overview page.
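The relationship between the Time and Period fields can be sketched as a small conversion. The function and parameter names below are hypothetical and only mirror the rules described above: a blank or zero time means no scheduled re-indexing, and the period is one of minutes, hours, or days.

```python
from datetime import timedelta
from typing import Optional

# Periods named in this guide.
ALLOWED_PERIODS = ("minutes", "hours", "days")


def refresh_interval(time_value: Optional[int], period: str = "minutes") -> Optional[timedelta]:
    """Return the re-index interval, or None when no automatic refresh is configured."""
    if not time_value:
        # Blank (None) or 0: the data is indexed once, at creation.
        return None
    if time_value < 0:
        raise ValueError("time_value must not be negative")
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"period must be one of {ALLOWED_PERIODS}")
    return timedelta(**{period: time_value})


print(refresh_interval(6, "hours"))
print(refresh_interval(0))
```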
Option 2: Select Data Format
Object-backed data sources can hold blobs of any format (images, videos, etc.), but many blobs use common structured formats. If your blobs are comma-separated, tab-delimited, or JSON, you can mask values through the Immuta interface. Specifying the data format allows you to create masking policies for the data source.
Option 3: Populate Event Time
Event time allows you to catalog the blobs in your data source by date. It can also be used for creating data source minimization policies.
By default, Immuta uses each blob's Last Modified date attribute from Azure for Event Time. However, this is not always an accurate way to represent event time. To provide a customized event time, use blob attributes: specify the key of the metadata attribute that contains the date in ISO 8601 format.
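As an illustration of the ISO 8601 format, a metadata value such as `2019-03-14T09:30:00Z` denotes 09:30 UTC on 14 March 2019. The sketch below shows one way to read such a value from a blob's metadata and fall back to the Last Modified date when the key is absent. The key name `event_time`, the function name, and the fallback behaviour are assumptions for illustration, not confirmed Immuta behaviour.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def resolve_event_time(
    metadata: Mapping[str, str],
    last_modified: datetime,
    key: Optional[str] = None,
) -> datetime:
    """Pick the event time from a metadata attribute, else use Last Modified."""
    raw = metadata.get(key) if key else None
    if raw is None:
        # Assumption: with no custom key or value, Last Modified is used.
        return last_modified
    # Normalise a trailing "Z" so older Python versions can parse it.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a value without an offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


# Hypothetical blob metadata and Last Modified value.
blob_metadata = {"event_time": "2019-03-14T09:30:00Z", "department": "finance"}
modified = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(resolve_event_time(blob_metadata, modified, key="event_time").isoformat())
```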
Option 4: Configure Tags
Immuta will extract any existing metadata from Azure blobs. This metadata can also be used to apply tags to blobs. When configuring tags, note that Attribute Name refers to the key of your desired blob metadata attribute in Azure.