Create an Azure Blob Storage Data Source

Azure Blob Storage data source API reference guide

The azureblob endpoint allows you to connect to and manage Azure Blob Storage data sources in Immuta.

Additional fields may be included in some responses you receive; however, these attributes are for internal purposes and are therefore undocumented.

Azure Blob workflow

Create a data source

POST /azureblob/handler

Save the provided connection information as an Azure Blob Storage data source.

Payload parameters

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| private | boolean | When false, the data source will be publicly available in the Immuta UI. | Yes |
| blobHandler | array[object] | A list of full URLs providing the locations of all blob store handlers to use with this data source. | Yes |
| blobHandlerType | string | The type of underlying blob handler that will be used with this data source (for example, Azure Blob Storage). | Yes |
| recordFormat | string | The data format of blobs in the data source, such as json, xml, html, or jpeg. | Yes |
| type | string | The type of data source: ingested (metadata will exist in Immuta) or queryable (metadata is dynamically queried). | Yes |
| name | string | The name of the data source. It must be unique within the Immuta instance. | Yes |
| sqlTableName | string | The name of the table that represents this data source in Immuta. | Yes |
| organization | string | The organization that owns the data source. | Yes |
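Before sending the request, a client can check that every attribute marked required above is present. The following is a minimal Python sketch, assuming these attributes are supplied inside the dataSource object (as most of them are in the request payload example below). `missing_fields` and `check_data_source` are hypothetical helpers, not part of any Immuta SDK, and Immuta still performs its own validation on the server.

```python
from typing import List

# Attributes marked "Yes" in the Required column of the payload table.
REQUIRED_FIELDS = (
    "private",
    "blobHandler",
    "blobHandlerType",
    "recordFormat",
    "type",
    "name",
    "sqlTableName",
    "organization",
)

# The two data source types described for the `type` attribute.
VALID_TYPES = {"ingested", "queryable"}


def missing_fields(data_source: dict) -> List[str]:
    """Return the required attributes that are absent from data_source."""
    return [field for field in REQUIRED_FIELDS if field not in data_source]


def check_data_source(data_source: dict) -> None:
    """Raise ValueError if data_source fails these client-side checks (hypothetical helper)."""
    missing = missing_fields(data_source)
    if missing:
        raise ValueError("missing required attributes: " + ", ".join(missing))
    if not isinstance(data_source["private"], bool):
        raise ValueError("private must be a boolean")
    if data_source["type"] not in VALID_TYPES:
        raise ValueError("type must be 'ingested' or 'queryable'")


# Illustrative values; "example-org" is a hypothetical organization name.
example = {
    "private": True,
    "blobHandler": {"scheme": "https", "url": ""},
    "blobHandlerType": "Azure Blob Storage",
    "recordFormat": "",
    "type": "ingested",
    "name": "dev",
    "sqlTableName": "dev",
    "organization": "example-org",
}
check_data_source(example)  # passes without raising
```

The request payload example below omits private and organization, so this strict check would flag it; relax REQUIRED_FIELDS if your Immuta version accepts those defaults.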

Response parameters

| Attribute | Type | Description |
| --- | --- | --- |
| id | integer | The handler ID. |
| dataSourceId | integer | The ID of the data source. |
| warnings | string | Describes issues with the created data source, such as the data source being unhealthy. |
| connectionString | string | The connection string used to connect the data source to Immuta. |

Request example

The following request saves the provided connection information (in example-payload.json) as a data source.

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.com/azureblob/handler

Request payload example

{
  "handler": {
    "metadata": {
      "tagAttributes": [],
      "eventTimeAttribute": "",
      "useDirectoryForTags": false,
      "sasToken": "?sv=your=sas?token",
      "sasTokenUrl": "https://your.blob.example.windows.net/sastoken-url",
      "container": "demodata"
    }
  },
  "dataSource": {
    "blobHandler": {
      "scheme": "https",
      "url": ""
    },
    "blobHandlerType": "Azure Blob Storage",
    "recordFormat": "",
    "type": "ingested",
    "name": "dev",
    "sqlTableName": "dev"
  }
}
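The same request can be assembled in Python with only the standard library. This sketch builds, but does not send, the POST /azureblob/handler request; `BASE_URL` and `API_TOKEN` are placeholders to replace with your own values, and the payload mirrors the example above.

```python
import json
import urllib.request

BASE_URL = "https://your-immuta-url.com"  # placeholder from the examples
API_TOKEN = "YOUR_API_TOKEN"              # hypothetical placeholder


def build_create_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /azureblob/handler request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/azureblob/handler",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )


payload = {
    "handler": {
        "metadata": {
            "tagAttributes": [],
            "eventTimeAttribute": "",
            "useDirectoryForTags": False,
            "sasToken": "?sv=your=sas?token",
            "sasTokenUrl": "https://your.blob.example.windows.net/sastoken-url",
            "container": "demodata",
        }
    },
    "dataSource": {
        "blobHandler": {"scheme": "https", "url": ""},
        "blobHandlerType": "Azure Blob Storage",
        "recordFormat": "",
        "type": "ingested",
        "name": "dev",
        "sqlTableName": "dev",
    },
}

request = build_create_request(payload)
# urllib.request.urlopen(request) would send it and return an HTTP response object.
```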

Response example

{
  "id": 18,
  "dataSourceId": 18
}
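The response example contains only id and dataSourceId, so a client should not assume the other documented attributes are always present. A minimal sketch that reads the response defensively, assuming a JSON body as shown; `CreateResult` and `parse_create_response` are hypothetical names.

```python
import json
from typing import NamedTuple, Optional


class CreateResult(NamedTuple):
    handler_id: int
    data_source_id: int
    warnings: Optional[str]


def parse_create_response(body: str) -> CreateResult:
    """Extract the documented attributes from a POST /azureblob/handler response."""
    data = json.loads(body)
    return CreateResult(
        handler_id=data["id"],
        data_source_id=data["dataSourceId"],
        # warnings does not appear in the response example, so treat it as optional.
        warnings=data.get("warnings"),
    )


result = parse_create_response('{"id": 18, "dataSourceId": 18}')
if result.warnings:
    print("Data source created with warnings:", result.warnings)
```

The handler ID (result.handler_id) is the value the other endpoints expect in place of {handlerId}.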

Get information about a data source

GET /azureblob/handler/{handlerId}

Return the handler metadata associated with the provided handler ID.

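Path and query parameters

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| handlerId | integer | The specific handler ID, supplied in the request path. | Yes |
| skipCache | boolean | If true, the handler cache will be skipped when retrieving the handler data. | No |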

Response parameters

| Attribute | Type | Description |
| --- | --- | --- |
| dataSourceId | integer | The data source ID. |
| metadata | object | Details regarding the handler, including container, accountName, sasTokenUrl, ingestUserId, tagAttributes, dataSourceName, refreshInterval, eventTimeAttribute, and useDirectoryForTags. |

Request example

The following request returns the handler metadata associated with the provided handler ID.

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://your-immuta-url.com/azureblob/handler/427
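A Python equivalent of this request, with the optional skipCache flag, might look like the sketch below. It assumes skipCache is passed as a query-string value of "true"; `BASE_URL`, `API_TOKEN`, and `build_get_request` are hypothetical placeholders, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://your-immuta-url.com"  # placeholder from the examples
API_TOKEN = "YOUR_API_TOKEN"              # hypothetical placeholder


def build_get_request(handler_id: int, skip_cache: bool = False) -> urllib.request.Request:
    """Build (but do not send) GET /azureblob/handler/{handlerId}."""
    url = f"{BASE_URL}/azureblob/handler/{handler_id}"
    if skip_cache:
        # Assumption: skipCache is sent as a query-string flag with the value "true".
        url += "?" + urllib.parse.urlencode({"skipCache": "true"})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        method="GET",
    )


cached = build_get_request(427)
fresh = build_get_request(427, skip_cache=True)
```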

Response example

{
  "dataSourceId": 427,
  "metadata": {
    "container": "integration",
    "accountName": "integration-tests",
    "sasTokenUrl": "https://your.blob.example.windows.net/",
    "ingestUserId": "azure blob storage_indexer_example",
    "tagAttributes": [],
    "dataSourceName": "Test",
    "refreshInterval": 0,
    "eventTimeAttribute": "",
    "useDirectoryForTags": false
  },
  "type": "azureBlobStorageHandler",
  "connectionString": "integration-tests/integration",
  "id": 427,
  "createdAt": "2021-09-22T18:45:47.744Z",
  "updatedAt": "2021-09-22T18:45:47.969Z"
}
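Because responses can include undocumented internal attributes (such as type, createdAt, and updatedAt above), a client may prefer to keep only the attributes it relies on. A sketch, assuming the attribute names that appear in the response example above; `documented_handler_fields` is a hypothetical helper.

```python
import json

# Top-level attributes this sketch keeps, named as in the response example.
DOCUMENTED = ("dataSourceId", "metadata")


def documented_handler_fields(body: str) -> dict:
    """Keep only the selected top-level attributes of a GET handler response."""
    data = json.loads(body)
    return {key: data[key] for key in DOCUMENTED if key in data}


body = """{
  "dataSourceId": 427,
  "metadata": {"container": "integration", "accountName": "integration-tests"},
  "type": "azureBlobStorageHandler",
  "id": 427
}"""
fields = documented_handler_fields(body)
print(fields["metadata"]["container"])  # integration
```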

Manage data sources

| Method | Path | Purpose |
| --- | --- | --- |
| PUT | /azureblob/handler/{handlerId} | Update the provided information for an Azure Blob Storage data source. |
| PUT | /azureblob/bulk | Update the handler metadata associated with the provided connection string. |
| PUT | /azureblob/handler/{handlerId}/crawl | Re-crawl the data source and update the metadata. |
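The table above maps naturally onto a small lookup from operation to HTTP method and path. This sketch fills in {handlerId} where the path requires it; the operation names ("update", "bulk_update", "crawl") and the `route` helper are hypothetical labels, not Immuta identifiers.

```python
from typing import Optional, Tuple

# Method and path template for each management operation in the table.
ROUTES = {
    "update": ("PUT", "/azureblob/handler/{handlerId}"),
    "bulk_update": ("PUT", "/azureblob/bulk"),
    "crawl": ("PUT", "/azureblob/handler/{handlerId}/crawl"),
}


def route(operation: str, handler_id: Optional[int] = None) -> Tuple[str, str]:
    """Return (method, path) for an operation, filling in {handlerId} when needed."""
    method, template = ROUTES[operation]
    if "{handlerId}" in template:
        if handler_id is None:
            raise ValueError(f"{operation} requires a handler ID")
        template = template.replace("{handlerId}", str(handler_id))
    return method, template


print(route("crawl", 427))  # ('PUT', '/azureblob/handler/427/crawl')
```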

Update a specific data source

PUT /azureblob/handler/{handlerId}

Update the provided information for an Azure Blob Storage data source.

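Path and query parameters

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| handlerId | integer | The specific handler ID, supplied in the request path. | Yes |
| skipCache | boolean | If true, the handler cache is skipped when retrieving metadata. | No |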

Response parameters

| Attribute | Type | Description |
| --- | --- | --- |
| id | integer | The ID of the handler. |
| dataSourceId | integer | The data source ID. |
| metadata | object | Details regarding the updated information. |

Request example

The following request uses the payload below to update the metadata of the data source with handler ID 18.

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.com/azureblob/handler/18

Payload example

{
  "dataSourceId": 18,
  "metadata": {
    "container": "testdata",
    "accountName": "integration-tests",
    "sasTokenUrl": "https://your.blob.example.windows.net/",
    "ingestUserId": "azure blob storage_indexer_example",
    "tagAttributes": [],
    "dataSourceName": "dev",
    "refreshInterval": 0,
    "eventTimeAttribute": "",
    "useDirectoryForTags": false
  },
  "type": "azureBlobStorageHandler",
  "connectionString": "your/testdata",
  "id": 18,
  "createdAt": "2021-09-23T18:47:52.976Z",
  "updatedAt": "2021-09-23T18:47:53.194Z"
}
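The payload example above has the same shape as the GET /azureblob/handler/{handlerId} response, which suggests a read-modify-write pattern: fetch the handler object, change the metadata you need, and send the whole object back. The sketch below assumes that pattern; `BASE_URL`, `API_TOKEN`, and `build_update_request` are hypothetical, and the request is built but not sent.

```python
import copy
import json
import urllib.request

BASE_URL = "https://your-immuta-url.com"  # placeholder from the examples
API_TOKEN = "YOUR_API_TOKEN"              # hypothetical placeholder


def build_update_request(handler: dict, **metadata_changes) -> urllib.request.Request:
    """Copy a handler object, apply metadata changes, and build the PUT request."""
    updated = copy.deepcopy(handler)  # leave the caller's object untouched
    updated["metadata"].update(metadata_changes)
    return urllib.request.Request(
        url=f"{BASE_URL}/azureblob/handler/{updated['id']}",
        data=json.dumps(updated).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="PUT",
    )


# Trimmed handler object, as it might be returned by a GET request.
current = {
    "id": 18,
    "dataSourceId": 18,
    "metadata": {"container": "demodata", "dataSourceName": "dev", "refreshInterval": 0},
    "type": "azureBlobStorageHandler",
}
request = build_update_request(current, container="testdata")
```

Copying before editing keeps the fetched object available for comparison or retry if the update fails.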

Response example

{
  "id": 18,
  "dataSourceId": 18,
  "metadata": {
    "sasToken": "2:your?sastoken==",
    "container": "testdata",
    "accountName": "your-account-name",
    "sasTokenUrl": "2:your?sastokenurlTS",
    "ingestAPIKey": "996samplee89c1apia7ckey9",
    "ingestUserId": "azure blob storage_indexer_example",
    "tagAttributes": [],
    "dataSourceName": "dev",
    "refreshInterval": 0,
    "eventTimeAttribute": "",
    "useDirectoryForTags": false
  }
}

Update multiple data sources

PUT /azureblob/bulk

Update the data source metadata associated with the provided connection string.

Payload parameters

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| handler | object | Contains a metadata object with the handler attributes to update, such as container, accountName, sasTokenUrl, or autoIngest. | Yes |
| connectionString | string | The connection string used to connect to the data sources. | Yes |

Response parameters

| Attribute | Type | Description |
| --- | --- | --- |
| bulkId | string | The ID of the bulk data source update. |
| connectionString | string | The connection string shared by the bulk-updated data sources. |
| jobsCreated | integer | The number of jobs created to update the data sources; this number corresponds to the number of data sources updated. |

Request example

The following request sets autoIngest to true for the data sources that use the connection string specified in the payload below.

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.com/azureblob/bulk

Payload example

{
  "ids": [
    5, 6
  ],
  "connectionString": "integration-tests/integration",
  "handler": {
    "metadata": {
      "autoIngest": true
    }
  }
}
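A payload like this one can be assembled programmatically. In this sketch, `build_bulk_payload` is a hypothetical helper; ids is included only because it appears in the payload example above, and its exact semantics are not described in the parameter table.

```python
import json
from typing import List, Optional


def build_bulk_payload(
    connection_string: str,
    metadata: dict,
    ids: Optional[List[int]] = None,
) -> dict:
    """Assemble a PUT /azureblob/bulk payload (hypothetical helper)."""
    payload = {"connectionString": connection_string, "handler": {"metadata": metadata}}
    if ids is not None:
        # ids appears in the payload example but is not described in the parameter table.
        payload["ids"] = ids
    return payload


payload = build_bulk_payload("integration-tests/integration", {"autoIngest": True}, ids=[5, 6])
print(json.dumps(payload, indent=2))
```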

Response example

{
  "bulkId": "bulk_ds_update_dd2600809bf8418dbea2706d6f456636",
  "connectionString": "integration-tests/integration",
  "jobsCreated": 0
}
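Since jobsCreated corresponds to the number of data sources updated, a value of 0 (as in the example above) means the request updated none of them. A sketch that surfaces that case, assuming a JSON body as shown; `summarize_bulk_response` is a hypothetical helper.

```python
import json


def summarize_bulk_response(body: str) -> str:
    """Describe a PUT /azureblob/bulk response in one line (hypothetical helper)."""
    data = json.loads(body)
    jobs = data["jobsCreated"]
    if jobs == 0:
        return f"{data['bulkId']}: no data sources updated for {data['connectionString']}"
    return f"{data['bulkId']}: {jobs} update job(s) created for {data['connectionString']}"


body = (
    '{"bulkId": "bulk_ds_update_dd2600809bf8418dbea2706d6f456636", '
    '"connectionString": "integration-tests/integration", "jobsCreated": 0}'
)
print(summarize_bulk_response(body))
```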

Re-crawl the data source

PUT /azureblob/handler/{handlerId}/crawl

Re-crawl the data source and update the metadata.

Path parameters

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| handlerId | integer | The specific handler ID, supplied in the request path. | Yes |

Response parameters

The response is a string that identifies the job run.

Request example

The following request re-crawls the data source.

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://your-immuta-url.com/azureblob/handler/427/crawl
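The equivalent request in Python needs no body, only the handler ID in the path. `BASE_URL`, `API_TOKEN`, and `build_crawl_request` are hypothetical placeholders; the sketch builds the request without sending it.

```python
import urllib.request

BASE_URL = "https://your-immuta-url.com"  # placeholder from the examples
API_TOKEN = "YOUR_API_TOKEN"              # hypothetical placeholder


def build_crawl_request(handler_id: int) -> urllib.request.Request:
    """Build (but do not send) PUT /azureblob/handler/{handlerId}/crawl."""
    return urllib.request.Request(
        url=f"{BASE_URL}/azureblob/handler/{handler_id}/crawl",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        method="PUT",
    )


request = build_crawl_request(427)
```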

Response example

a4de5af0-1be1-11ec-8131-6fe77107bfa9
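The job identifier in the example is formatted as a UUID. Assuming every job ID has that format, a client can parse and validate it with the standard uuid module; stripping surrounding quotes also covers the case where the body arrives JSON-encoded (an assumption, since the example shows a bare value). `parse_job_id` is a hypothetical helper.

```python
import uuid


def parse_job_id(body: str) -> uuid.UUID:
    """Parse the job identifier returned by the crawl endpoint."""
    # Strip whitespace and any surrounding JSON string quotes before parsing.
    return uuid.UUID(body.strip().strip('"'))


job_id = parse_job_id("a4de5af0-1be1-11ec-8131-6fe77107bfa9\n")
print(job_id)  # a4de5af0-1be1-11ec-8131-6fe77107bfa9
```

uuid.UUID raises ValueError for a malformed string, which makes an unexpected response easy to detect.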
