# Create an Azure Blob Storage Data Source

The `azureblob` endpoint allows you to connect and manage Azure Blob Storage data sources in Immuta.

{% hint style="info" %}
Additional fields may be included in some responses you receive; however, these attributes are for internal purposes and are therefore undocumented.
{% endhint %}

## Azure Blob workflow

1. [Create a data source](#create-a-data-source).
2. [Get information about a data source](#get-information-about-a-data-source).
3. [Manage data sources](#manage-data-sources).

## Create a data source

<mark style="color:green;">`POST`</mark> `/azureblob/handler`

Save the provided connection information as an Azure Blob Storage data source.

#### Payload parameters

| Attribute       | Description                                                                                                                    | Required |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------ | -------- |
| private         | `boolean` When `false`, the data source will be publicly available in the Immuta UI.                                           | **Yes**  |
| blobHandler     | `array[object]` A list of full URLs providing the locations of all blob store handlers to use with this data source.           | **Yes**  |
| blobHandlerType | `string` Describes the type of underlying blob handler that will be used with this data source (e.g., `Azure Blob Storage`).   | **Yes**  |
| recordFormat    | `string` The data format of blobs in the data source, such as `json`, `xml`, `html`, or `jpeg`.                                | **Yes**  |
| type            | `string` The type of data source: `ingested` (metadata will exist in Immuta) or `queryable` (metadata is dynamically queried). | **Yes**  |
| name            | `string` The name of the data source. It must be unique within the Immuta instance.                                            | **Yes**  |
| sqlTableName    | `string` A string that represents this data source's table in Immuta.                                                          | **Yes**  |
| organization    | `string` The organization that owns the data source.                                                                           | **Yes**  |
| category        | `string` The category of the data source.                                                                                      | No       |
| description     | `string` The description of the data source.                                                                                   | No       |
| hasExamples     | `boolean` When `true`, the data source contains examples.                                                                      | No       |

#### Response parameters

| Attribute        | Description                                                                                                   |
| ---------------- | ------------------------------------------------------------------------------------------------------------- |
| id               | `integer` The handler ID.                                                                                     |
| dataSourceId     | `integer` The ID of the data source.                                                                          |
| warnings         | `string` This message describes issues with the created data source, such as the data source being unhealthy. |
| connectionString | `string` The connection string used to connect the data source to Immuta.                                     |

### Request example

The following request saves the provided connection information (in `example-payload.json`) as a data source.

```shell
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.com/azureblob/handler
```

#### Request payload example

```json
{
  "handler": {
    "metadata": {
      "tagAttributes": [],
      "eventTimeAttribute": "",
      "useDirectoryForTags": false,
      "sasToken": "?sv=your=sas?token",
      "sasTokenUrl": "https://your.blob.example.windows.net/sastoken-url",
      "container": "demodata"
    }
  },
  "dataSource": {
    "blobHandler": {
      "scheme": "https",
      "url": ""
    },
    "blobHandlerType": "Azure Blob Storage",
    "recordFormat": "",
    "type": "ingested",
    "name": "dev",
    "sqlTableName": "dev"
  }
}
```

### Response example

```json
{
  "id": 18,
  "dataSourceId": 18
}
```
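The request above hard-codes the bearer token in the command. As a minimal sketch, assuming you export your Immuta URL and API key as `IMMUTA_URL` and `IMMUTA_API_KEY` (variable names chosen for this example), the hypothetical shell function below sends the same request and prints the handler ID from a response shaped like the example above. It extracts the ID with `sed` so it does not depend on `jq`; a real JSON parser is more robust for production scripts.

```shell
# Hypothetical helper: create an Azure Blob Storage data source and print its handler ID.
# Assumes IMMUTA_URL (e.g. https://your-immuta-url.com) and IMMUTA_API_KEY are set.
# Usage: create_blob_data_source PAYLOAD_FILE
create_blob_data_source() {
  local payload_file="$1" response
  # -sS hides the progress meter but still reports errors; --fail exits non-zero on HTTP error statuses.
  response="$(curl -sS --fail \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IMMUTA_API_KEY}" \
    --data "@${payload_file}" \
    "${IMMUTA_URL}/azureblob/handler")" || return 1
  # Join the response onto one line, then print the number after the "id" key
  # (assumes "id" appears once, as in the response example).
  printf '%s\n' "$response" | tr -d '\n' \
    | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

For example, `handler_id="$(create_blob_data_source example-payload.json)"` stores the new handler ID for use with the endpoints that follow.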

## Get information about a data source

<mark style="color:green;">`GET`</mark> `/azureblob/handler/{handlerId}`

Return the handler metadata associated with the provided handler ID.

#### Query parameters

| Attribute | Description                                                                              | Required |
| --------- | ---------------------------------------------------------------------------------------- | -------- |
| handlerId | `integer` The specific handler ID.                                                       | **Yes**  |
| skipCache | `boolean` If `true`, the handler cache will be skipped when retrieving the handler data. | No       |

#### Response parameters

| Attribute    | Description                                                                                                                                                                                                    |
| ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| dataSourceId | `integer` The data source ID.                                                                                                                                                                                  |
| metadata     | `object` Details regarding the handler, including `container`, `accountName`, `sasTokenUrl`, `ingestUserId`, `tagAttributes`, `dataSourceName`, `refreshInterval`, `eventTimeAttribute`, and `useDirectoryForTags`. |

### Request example

The following request returns the handler metadata associated with the provided handler ID.

```shell
curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://your-immuta-url.com/azureblob/handler/427
```
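The `skipCache` parameter does not appear in the request example. Assuming it is passed in the query string (for example, `?skipCache=true`), a minimal sketch of a hypothetical helper that fetches a handler with or without the cache might look like this; `IMMUTA_URL` and `IMMUTA_API_KEY` are assumed environment variables.

```shell
# Hypothetical helper: fetch handler metadata, optionally bypassing the handler cache.
# Assumes IMMUTA_URL and IMMUTA_API_KEY are set, and that skipCache is a query-string parameter.
# Usage: get_blob_handler HANDLER_ID [true|false]
get_blob_handler() {
  local handler_id="$1" skip_cache="${2:-false}" url
  url="${IMMUTA_URL}/azureblob/handler/${handler_id}"
  # Only add the parameter when the caller explicitly asks to skip the cache.
  if [ "$skip_cache" = "true" ]; then
    url="${url}?skipCache=true"
  fi
  curl -sS --fail \
    --request GET \
    --header "Authorization: Bearer ${IMMUTA_API_KEY}" \
    "$url"
}
```

With `IMMUTA_URL` set to `https://your-immuta-url.com`, `get_blob_handler 427 true` would request `https://your-immuta-url.com/azureblob/handler/427?skipCache=true`.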

### Response example

```json
{
  "dataSourceId": 427,
  "metadata": {
    "container": "integration",
    "accountName": "integration-tests",
    "sasTokenUrl": "https://your.blob.example.windows.net/",
    "ingestUserId": "azure blob storage_indexer_example",
    "tagAttributes": [],
    "dataSourceName": "Test",
    "refreshInterval": 0,
    "eventTimeAttribute": "",
    "useDirectoryForTags": false
  },
  "type": "azureBlobStorageHandler",
  "connectionString": "integration-tests/integration",
  "id": 427,
  "createdAt": "2021-09-22T18:45:47.744Z",
  "updatedAt": "2021-09-22T18:45:47.969Z"
}
```
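The `connectionString` in this response is the value that the bulk update endpoint matches on. A minimal sketch of a hypothetical helper that reads it from a handler, assuming the response shape shown above and the same illustrative `IMMUTA_URL` and `IMMUTA_API_KEY` variables:

```shell
# Hypothetical helper: print the connectionString of a handler.
# Assumes IMMUTA_URL and IMMUTA_API_KEY are set.
# Usage: get_blob_connection_string HANDLER_ID
get_blob_connection_string() {
  local handler_id="$1" response
  response="$(curl -sS --fail \
    --header "Authorization: Bearer ${IMMUTA_API_KEY}" \
    "${IMMUTA_URL}/azureblob/handler/${handler_id}")" || return 1
  # Flatten to one line, then capture the quoted value that follows "connectionString".
  printf '%s\n' "$response" | tr -d '\n' \
    | sed -n 's/.*"connectionString"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Given the response above, `get_blob_connection_string 427` would print `integration-tests/integration`.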

## Manage data sources

| Method | Path                                   | Purpose                                                                                                      |
| ------ | -------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| PUT    | `/azureblob/handler/{handlerId}`       | [Update the provided information for an Azure Blob Storage data source](#update-a-specific-data-source).     |
| PUT    | `/azureblob/bulk`                      | [Update the handler metadata associated with the provided connection string](#update-multiple-data-sources). |
| PUT    | `/azureblob/handler/{handlerId}/crawl` | [Re-crawl the data source and update the metadata](#re-crawl-the-data-source).                               |

### Update a specific data source

<mark style="color:green;">`PUT`</mark> `/azureblob/handler/{handlerId}`

Update the provided information for an Azure Blob Storage data source.

#### Query parameters

| Attribute | Description                                                                  | Required |
| --------- | ---------------------------------------------------------------------------- | -------- |
| handlerId | `integer` The specific handler ID.                                           | **Yes**  |
| skipCache | `boolean` When `true`, will skip the handler cache when retrieving metadata. | No       |

#### Response parameters

| Attribute    | Description                                        |
| ------------ | -------------------------------------------------- |
| id           | `integer` The ID of the handler.                   |
| dataSourceId | `integer` The data source ID.                      |
| metadata     | `object` Details regarding the updated information. |

#### Request example

The following request updates the metadata of the data source with handler ID `18`, using the payload below.

```shell
curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.com/azureblob/handler/18
```

**Payload example**

```json
{
  "dataSourceId": 18,
  "metadata": {
    "container": "testdata",
    "accountName": "integration-tests",
    "sasTokenUrl": "https://your.blob.example.windows.net/",
    "ingestUserId": "azure blob storage_indexer_example",
    "tagAttributes": [],
    "dataSourceName": "dev",
    "refreshInterval": 0,
    "eventTimeAttribute": "",
    "useDirectoryForTags": false
  },
  "type": "azureBlobStorageHandler",
  "connectionString": "your/testdata",
  "id": 18,
  "createdAt": "2021-09-23T18:47:52.976Z",
  "updatedAt": "2021-09-23T18:47:53.194Z"
}
```

#### Response example

```json
{
  "id": 18,
  "dataSourceId": 18,
  "metadata": {
    "sasToken": "2:your?sastoken==",
    "container": "testdata",
    "accountName": "your-account-name",
    "sasTokenUrl": "2:your?sastokenurlTS",
    "ingestAPIKey": "996samplee89c1apia7ckey9",
    "ingestUserId": "azure blob storage_indexer_example",
    "tagAttributes": [],
    "dataSourceName": "dev",
    "refreshInterval": 0,
    "eventTimeAttribute": "",
    "useDirectoryForTags": false
  }
}
```

### Update multiple data sources

<mark style="color:green;">`PUT`</mark> `/azureblob/bulk`

Update the handler metadata for all data sources that share the provided connection string.

#### Payload parameters

| Attribute        | Description                                                                                                                | Required |
| ---------------- | -------------------------------------------------------------------------------------------------------------------------- | -------- |
| handler          | `object` Contains a `metadata` object with the handler metadata values to update, such as `autoIngest`.                    | **Yes**  |
| connectionString | `string` The connection string used to connect to the data sources.                                                        | **Yes**  |

#### Response parameters

| Attribute        | Description                                                                                                                      |
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| bulkId           | `string` The ID of the bulk data source update.                                                                                  |
| connectionString | `string` The connection string shared by the updated data sources.                                                               |
| jobsCreated      | `integer` The number of jobs created to update the data sources; this number corresponds to the number of data sources updated. |

#### Request example

The following request updates the `autoIngest` value to `true` for data sources with the connection string specified in the payload below.

```shell
curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.com/azureblob/bulk
```

**Payload example**

```json
{
  "ids": [
    5, 6
  ],
  "connectionString": "integration-tests/integration",
  "handler": {
    "metadata": {
      "autoIngest": true
    }
  }
}
```

#### Response example

```json
{
  "bulkId": "bulk_ds_update_dd2600809bf8418dbea2706d6f456636",
  "connectionString": "integration-tests/integration",
  "jobsCreated": 0
}
```
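A minimal sketch of a hypothetical helper that builds the bulk payload with a heredoc and submits it. It only sets `autoIngest`, mirroring the example above, and omits the `ids` field, which the payload parameters table does not document; check whether your instance expects it. It also assumes the connection string contains no double quotes, because the value is inserted into the JSON unescaped. `IMMUTA_URL` and `IMMUTA_API_KEY` are illustrative environment variables.

```shell
# Hypothetical helper: set autoIngest on every data source that shares a connection string.
# Assumes IMMUTA_URL and IMMUTA_API_KEY are set and that the connection string has no double quotes.
# Usage: bulk_set_auto_ingest CONNECTION_STRING true|false
bulk_set_auto_ingest() {
  local connection_string="$1" auto_ingest="$2" payload_file status
  # autoIngest is written into the JSON as a bare boolean, so reject anything else.
  case "$auto_ingest" in
    true|false) ;;
    *) echo "autoIngest must be true or false" >&2; return 2 ;;
  esac
  payload_file="$(mktemp)" || return 1
  # Same shape as the payload example, without the undocumented "ids" field.
  cat > "$payload_file" <<EOF
{
  "connectionString": "${connection_string}",
  "handler": {
    "metadata": {
      "autoIngest": ${auto_ingest}
    }
  }
}
EOF
  curl -sS --fail \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IMMUTA_API_KEY}" \
    --data "@${payload_file}" \
    "${IMMUTA_URL}/azureblob/bulk"
  status=$?
  # Remove the temporary payload file whether or not the request succeeded.
  rm -f "$payload_file"
  return "$status"
}
```

`bulk_set_auto_ingest integration-tests/integration true` would send the same `autoIngest` change as the request example, minus the `ids` list.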

### Re-crawl the data source

<mark style="color:green;">`PUT`</mark> `/azureblob/handler/{handlerId}/crawl`

Re-crawl the data source and update its metadata.

#### Query parameters

| Attribute | Description                        | Required |
| --------- | ---------------------------------- | -------- |
| handlerId | `integer` The specific handler ID. | **Yes**  |

#### Response parameters

The response is a string that identifies the crawl job.

#### Request example

The following request re-crawls the data source.

```shell
curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://your-immuta-url.com/azureblob/handler/427/crawl
```

#### Response example

```text
a4de5af0-1be1-11ec-8131-6fe77107bfa9
```
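A minimal sketch of a hypothetical helper that triggers a re-crawl and prints the job identifier, again assuming illustrative `IMMUTA_URL` and `IMMUTA_API_KEY` environment variables. The response example shows a bare identifier; the sketch also strips double quotes in case the identifier is returned as a JSON string, which is an assumption rather than documented behavior.

```shell
# Hypothetical helper: start a re-crawl of a data source and print the job identifier.
# Assumes IMMUTA_URL and IMMUTA_API_KEY are set.
# Usage: recrawl_blob_handler HANDLER_ID
recrawl_blob_handler() {
  local handler_id="$1" job_id
  job_id="$(curl -sS --fail \
    --request PUT \
    --header "Authorization: Bearer ${IMMUTA_API_KEY}" \
    "${IMMUTA_URL}/azureblob/handler/${handler_id}/crawl")" || return 1
  # Strip double quotes in case the identifier comes back as a JSON string.
  printf '%s\n' "$job_id" | tr -d '"'
}
```

For example, `job_id="$(recrawl_blob_handler 427)"` keeps the identifier so you can reference the crawl job later.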
