Data Source HTTP API

Audience: Data Owners and Data Users

Content Summary: The Immuta data source metadata contains all the details about your data sources. This page describes the API to search all of your data sources.

Data Source Search Endpoint

Method  Path         Successful Status Code
GET     /dataSource  200

Query Parameters

  • blobHandlerType (array[string]): filters data sources by blob handler type(s).
  • column (array[string]): filters data sources by column name(s).
  • connectionString (array[string]): filters data sources by connection string(s).
  • getHandlerTypeFacet (boolean): when true, returns the handler type facet.
  • maxDate (date): returns data sources created before the maxDate you enter in your query.
  • minDate (date): returns data sources created after the minDate you enter in your query.
  • mode (integer): specifies the query mode, which must be 0 (FULL), 1 (COUNT), 4 (TAG), 5 (MIN_MAX), or 6 (STATUS).
  • nameOnly (boolean, default false): when true, the query will only filter by data source name.
    • Example: "nameOnly=true&query=my-data-source-name" would return only one hit
  • offset (integer, default 0): used in combination with size to fetch specific pages.
    • Example: "size=10&offset=10" returns the second page.
    • Example: "size=10&offset=20" returns the third page.
  • publicOnly (boolean): when false, the query will only filter data sources that are private.
  • searchText (string): searches text. By default, this will filter data sources by name, description, category, and organization.
  • size (integer, default 10): the number of results to return per page. Results are paged by default.
  • sortField (string): sorts results by field, which must be subscribers, freshness, name, category, organization, blobHandlerType, subscriptionStatus, recordCount, or status.
  • sortOrder (string, default desc): sorts results by order, which must be asc or desc.
  • status (array[string]): filters data sources by health check status, which must be passed, failed, or unknown.
  • subscription (array[string]): filters data sources by subscription types, which must be automatic, approval, or policy.
  • tag (array[string]): filters data sources by tags associated with the data sources.
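The paging and filter parameters above can be combined into a search URL as in the following minimal Python sketch. The helper names build_search_url and _encode are hypothetical, and the repeated-key encoding of array parameters (for example status=passed&status=failed) is an assumption, not something this page confirms. The offset arithmetic follows the size and offset examples in the list above.

```python
from urllib.parse import urlencode


def _encode(value):
    """Render booleans as lowercase true/false; leave other values as-is."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return value


def build_search_url(base_url, page=1, size=10, **filters):
    """Build a GET /dataSource URL for a 1-indexed page of results."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive integers")
    # Page 1 starts at offset 0, page 2 at offset size, and so on.
    params = {"size": size, "offset": (page - 1) * size}
    for key, value in filters.items():
        if value is None:
            continue  # unset filters are omitted so server defaults apply
        if isinstance(value, (list, tuple)):
            params[key] = [_encode(item) for item in value]
        else:
            params[key] = _encode(value)
    # doseq=True expands list values into repeated keys.
    # Assumption: the API accepts array parameters in this form.
    query = urlencode(params, doseq=True)
    return f"{base_url.rstrip('/')}/dataSource?{query}"


print(build_search_url("https://demo.immuta.com", page=2))
print(build_search_url("https://demo.immuta.com", size=100,
                       sortField="subscribers"))
```

Building the query string with urlencode also takes care of escaping, so a searchText value containing spaces or ampersands does not corrupt the URL.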

Example Request to List All Data Sources:

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://demo.immuta.com/dataSource

Example Request to Search Top 100 Data Sources:

This request sorts the data sources by the number of subscribers.

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    "https://demo.immuta.com/dataSource?size=100&sortField=subscribers"

Example Request to Query the Data Dictionary:

This request fetches the Data Dictionary for a data source returned in response.hits. Replace ${dataSource["id"]} with the id of that data source.

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://demo.immuta.com/dictionary/${dataSource["id"]}
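As a rough illustration of how the id placeholder gets filled in, the Python sketch below derives one dictionary URL per entry in the hits array of a search response. The function name dictionary_urls is hypothetical, and the sample response is trimmed to the two fields the sketch reads.

```python
def dictionary_urls(base_url, search_response):
    """Map each data source name in response.hits to its dictionary URL."""
    base = base_url.rstrip("/")
    return {
        hit["name"]: f"{base}/dictionary/{hit['id']}"
        for hit in search_response.get("hits", [])
    }


# Trimmed-down search response containing only the fields used here.
response = {
    "hits": [
        {"name": "dataSource1", "id": 1},
        {"name": "dataSource2", "id": 2},
    ],
    "count": 22,
}

for name, url in dictionary_urls("https://demo.immuta.com", response).items():
    print(name, url)
```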

Response Parameters

Responses include the following attributes:

  • hits: array of data source metadata
  • count: total number of results available. The number of results in the hits array may be less than the count, depending on the size query parameter.

Example Response:

{
    "hits": [
        {
            "name": "dataSource1",
            "id": 1,
            "recordFormat": "json",
            "deleted": false,
            "category": "test",
            "description": null,
            "organization": "immuta",
            "freshness": "2016-09-01T22:09:17.000Z",
            "private": false,
            "subscribers": 3,
            "recordCount": 0,
            "parentCount": 0,
            "subscriptionStatus": "expert",
            "blobHandlerType": "PostgreSQL",
            "subscriptionvisibility": null
        },
        {
            "name": "dataSource2",
            "id": 2,
            "recordFormat": "json",
            "deleted": false,
            "category": "testing",
            "description": "sample description",
            "organization": "immuta",
            "freshness": "2018-03-09T17:02:41.243Z",
            "private": false,
            "subscribers": 2,
            "recordCount": 11,
            "parentCount": 0,
            "subscriptionStatus": "not_subscribed",
            "blobHandlerType": "Persisted",
            "subscriptionvisibility": 2704
        }
    ],
    "count": 22
}
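Because count can exceed the number of entries in hits, a client usually has to walk through several pages. The Python sketch below shows one way to compute the next offset from a response. The helper name next_offset is hypothetical, and the sketch assumes count is the total across all pages, as the description above suggests.

```python
def next_offset(offset, response):
    """Return the offset of the next page, or None when nothing is left."""
    hits = response["hits"]
    fetched = offset + len(hits)
    # Stop on an empty page as well, to avoid looping forever if count
    # and the returned hits ever disagree.
    if not hits or fetched >= response["count"]:
        return None
    return fetched


# Two hits returned out of 22 available, as in the example response.
page = {"hits": [{"id": 1}, {"id": 2}], "count": 22}
print(next_offset(0, page))
print(next_offset(20, page))
```

The returned value can be passed directly as the offset query parameter of the next search request.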

Get Data Source Endpoint

Method  Path                        Successful Status Code
GET     /dataSource/{dataSourceId}  200

Request Path Parameters

  • dataSourceId (integer): ID of the data source that is being queried.
    • Example: 1

Example Request:

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://demo.immuta.com/dataSource/1

Response Parameters

This endpoint returns a data source object.

Example Response:

{
    "name": "loan data",
    "recordFormat": "csv",
    "category": "financial",
    "description": "sample loan data",
    "organization": "Immuta",
    "policyHandler": null,
    "sqlTableName": "loan_data",
    "blobHandler": {
        "url": "https://demo.immuta.com/pg/handler/1",
        "manualDictionary": false
    },
    "createdBy": "13",
    "deleted": false,
    "private": true,
    "type": "queryable",
    "useDatesAsDirectory": false,
    "recordCount": 0,
    "rowCount": 43,
    "documentation": null,
    "minRecordDate": null,
    "maxRecordDate": null,
    "freshness": "2017-08-14T14:15:18.490Z",
    "statsExpiration": "2017-08-15T14:15:19.588Z",
    "id": 980,
    "blobHandlerType": "PostgreSQL",
    "policyHandlerType": "None",
    "roots": [],
    "hasSamples": false,
    "subscriptionType": "approval",
    "status": {
        "sql": {
            "status": "passed",
            "message": "Passed"
        },
        "status": "passed",
        "lastAttempt": {
            "date": "2017-08-14T14:15:19.434Z",
            "userId": "jeff@immuta.com",
            "profileId": 13,
            "iamId": "bim"
        },
        "lastUpdate": {
            "date": "2017-08-14T14:15:19.434Z",
            "userId": "jeff@immuta.com",
            "profileId": 13,
            "iamId": "bim"
        }
    },
    "createdAt": "2017-08-14T14:15:18.490Z",
    "updatedAt": "2017-08-14T14:15:20.108Z",
    "subscribedAsUser": true,
    "subscriptionId": 2840,
    "acknowledgeRequired": false,
    "subscriptionStatus": "owner",
    "requestedState": "owner",
    "approved": true,
    "subscriptionExpiration": null,
    "filterId": null,
    "subscribers": 1,
    "tags": []
}

Data Source Creation Endpoint

To create a data source, construct a request that includes the blob handler and data source metadata, and then send that request to the blob handler service. See List Blob Handlers Endpoint for instructions on listing blob handlers to begin constructing a request.

Method  Path                          Successful Status Code
POST    {blobHandlerBaseUrl}/handler  200
PUT     {blobHandlerBaseUrl}/handler  200

Request Path Parameters

  • blobHandlerBaseUrl (string): The base URL of the blob handler. Blob handler base URLs can be found by sending a request to the List Blob Handlers Endpoint.

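Request Parameters

The request payload contains two top-level objects: dataSource (described in Data Source Object) and handler (described in Blob Handler Objects).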

Example Request Payload:

{
    "dataSource": {
        "private": true,
        "useDatesAsDirectory": false,
        "blobHandler": {
            "scheme": "https",
            "url": ""
        },
        "blobHandlerType": "PostgreSQL",
        "recordFormat": "json",
        "type": "queryable",
        "name": "API Test Data Source",
        "sqlTableName": "api_test_data_source",
        "organization": "API Test Org",
        "category": "API Test",
        "description": "Test Data Source created using the Immuta API.",
        "hasSamples": false,
        "owner": {
            "profiles": [],
            "groups": []
        },
        "expert": {
            "profiles": [],
            "groups": []
        },
        "ingest": {
            "profiles": [],
            "groups": []
        }
    },
    "handler": {
        "metadata": {
            "isChildDataSource": false,
            "format": "json",
            "staleDataTolerance": 1728000,
            "ssl": true,
            "username": "testuser",
            "password": "secretpassword",
            "hostname": "test-database.test-db.com",
            "port": 5432,
            "database": "test",
            "schema": "public",
            "table": "us_states",
            "columns": [
                {
                    "name": "gid",
                    "dataType": "integer"
                },
                {
                    "name": "name",
                    "dataType": "character varying"
                },
                {
                    "name": "stusps",
                    "dataType": "character varying"
                }
            ],
            "blobId": ["gid"],
            "bodataTableName": "api_test_data_source",
            "dataSourceName": "API Test Data Source"
        }
    }
}

Example Request:

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://demo.immuta.com/pg/handler

Response Parameters

  • id (integer): The blob handler ID
  • dataSourceId (integer): The data source ID

Example Response:

{
    "id": 100,
    "dataSourceId": 156
}
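The creation flow above can be sketched in Python as follows. This is a minimal illustration using only the standard library. The function name build_create_request and its parameters are hypothetical, and the sketch only constructs the request without sending it.

```python
import json
import urllib.request


def build_create_request(blob_handler_base_url, token, data_source,
                         handler_metadata):
    """Prepare a POST {blobHandlerBaseUrl}/handler request (not sent)."""
    payload = {
        "dataSource": data_source,
        "handler": {"metadata": handler_metadata},
    }
    return urllib.request.Request(
        url=f"{blob_handler_base_url.rstrip('/')}/handler",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_create_request(
    "https://demo.immuta.com/pg",
    "dea464c07bd07300095caa8",
    {"name": "API Test Data Source", "blobHandlerType": "PostgreSQL"},
    {"table": "us_states", "blobId": ["gid"]},
)
print(req.get_method(), req.full_url)
# To send it, pass req to urllib.request.urlopen and parse the JSON body,
# which is expected to carry the id and dataSourceId fields shown above.
```

The data source and handler dictionaries here are abbreviated. A real request would carry the full set of fields shown in the example payload.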

List Blob Handlers Endpoint

When interacting with data sources programmatically, most actions require sending requests directly to blob handlers. The blob handler discovery endpoint can be used to query Immuta for all registered blob handlers. This list does not include custom blob handlers.

Method  Path                          Successful Status Code
GET     /dataSource/blobHandlerTypes  200

Example Request:

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    https://demo.immuta.com/dataSource/blobHandlerTypes

Response Parameters

  • name (string): The name of the blob handler.
    • Example: "PostgreSQL"
  • clientUrl (string): The URL to the blob handler for the UI. This may be a relative URL.
    • Example: "/pg"
  • baseUrl (string): The base URL for the blob handler.
    • Example: "https://demo.immuta.com/pg"
  • config (object): Configuration options for the blob handler.
    • Example: { "port": 5432 }
  • displayOrder (integer): The display order for the UI.
    • Example: 20

Example Response:

[
    {
        "name": "PostgreSQL",
        "clientUrl": "/pg",
        "baseUrl": "https://demo.immuta.com/pg",
        "config": {
            "port": 5432
        },
        "displayOrder": 20
    },
    {
        "name": "MySQL",
        "clientUrl": "/mysql",
        "baseUrl": "https://demo.immuta.com/mysql",
        "config": {
            "port": 3306
        },
        "displayOrder": 30
    },
    {
        "name": "Apache Impala",
        "clientUrl": "/impala",
        "baseUrl": "https://demo.immuta.com/impala",
        "config": {
            "skipFileSizeChecks": true,
            "port": 21050
        },
        "displayOrder": 120
    },
    {
        "name": "Apache HDFS",
        "clientUrl": "/hdfs",
        "baseUrl": "https://demo.immuta.com/hdfs",
        "config": {
            "allowManualCrawl": true,
            "disableMasking": true
        },
        "displayOrder": 110
    }
]
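Once the list of blob handlers has been retrieved, a client typically looks up the baseUrl of the handler it needs. The Python sketch below does this against a shortened copy of the example response. The function name base_url_for is hypothetical.

```python
def base_url_for(handlers, name):
    """Return the baseUrl of the blob handler with the given name."""
    for handler in handlers:
        if handler["name"] == name:
            return handler["baseUrl"]
    raise KeyError(f"no blob handler named {name!r}")


# Shortened copy of the example response above.
handlers = [
    {"name": "PostgreSQL", "clientUrl": "/pg",
     "baseUrl": "https://demo.immuta.com/pg",
     "config": {"port": 5432}, "displayOrder": 20},
    {"name": "Apache Impala", "clientUrl": "/impala",
     "baseUrl": "https://demo.immuta.com/impala",
     "config": {"skipFileSizeChecks": True, "port": 21050},
     "displayOrder": 120},
    {"name": "Apache HDFS", "clientUrl": "/hdfs",
     "baseUrl": "https://demo.immuta.com/hdfs",
     "config": {"allowManualCrawl": True, "disableMasking": True},
     "displayOrder": 110},
]

print(base_url_for(handlers, "PostgreSQL"))

# Names in the order given by displayOrder.
ui_order = [h["name"] for h in sorted(handlers, key=lambda h: h["displayOrder"])]
print(ui_order)
```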

Blob Handler Objects

The configuration parameters for each handler depend on the handler type. See each type below to properly configure your data source.

Available Default Blob Handlers:

  • ODBC Blob Handler Object
  • HDFS Blob Handler Object

ODBC Blob Handler Object

The ODBC blob handler is the most common blob handler in Immuta.

All configuration data for the ODBC handler is contained in a metadata object.

  • ssl (boolean): Set to true to enable SSL communication with the remote database.
    • Example: true
  • username (string): The username used to connect to the remote database.
    • Example: "testuser"
  • password (string): The password used to connect to the remote database.
    • Example: "secretpassword"
  • hostname (string): The hostname of the remote database instance.
    • Example: "test.immuta-db.com"
  • port (integer): The port of the remote database instance.
    • Example: 5432
  • database (string): The database in the remote database that corresponds to this handler.
    • Example: "testdatabase"
  • format (string): The format of the blobs that will be returned for this handler (either json or csv).
    • Example: "json"
  • staleDataTolerance (integer): The length in seconds that data from this handler can be cached.
    • Example: 1728000
  • schema (string): The schema in the remote database that corresponds to this handler.
    • Example: "public"
  • table (string): The table in the remote database that corresponds to this handler.
    • Example: "states"
  • columns (array[object]): The columns that should be included in this data source.
    • Example: [{ "name": "id", "dataType": "integer" }, { "name": "description", "dataType": "character varying", "remoteColumn": "desc" }]
  • blobId (array[string]): The columns that compose the blob ID for records in this data source.
    • Example: ["id"]
  • bodataTableName (string): The name of the table associated with this data source that will be created in Postgres.
    • Example: "test_data_source"
  • dataSourceName (string): The name of the data source to which this handler corresponds.
    • Example: "Test Data Source"
  • query (string): The query that represents the data source. This query will be run to fetch blobs, stats, and the catalog for this data source.
    • Example: "SELECT id, desc FROM states"
  • connectionStringOptions (string): Additional connection string options to be used when connecting to the remote database.
    • Example: "sslmode=require;"
  • sid (string): SID/Database from Oracle TNS used to build the connection string.
    • Example: "mydb2"

Example:

"handler": {
    "metadata": {
        "format": "json",
        "staleDataTolerance": 1728000,
        "ssl": true,
        "username": "testuser",
        "password": "secretpassword",
        "hostname": "test-database.test-db.com",
        "port": 5432,
        "database": "test",
        "schema": "public",
        "table": "us_states",
        "columns": [
            {
                "name": "gid",
                "dataType": "integer"
            },
            {
                "name": "name",
                "dataType": "character varying"
            },
            {
                "name": "stusps",
                "dataType": "character varying"
            }
        ],
        "blobId": [],
        "bodataTableName": "api_test_data_source",
        "dataSourceName": "API Test Data Source"
    }
}
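The columns and blobId fields are easy to get out of sync when written by hand. The Python sketch below builds the columns array from tuples and reports blobId entries that do not match a declared column. Both helper names are hypothetical, and the rule that every blobId entry should name a column from columns is an assumption based on the field descriptions above, not a documented constraint.

```python
def odbc_columns(*specs):
    """Build the columns array from (name, dataType[, remoteColumn]) tuples."""
    columns = []
    for name, data_type, *remote in specs:
        column = {"name": name, "dataType": data_type}
        if remote:
            # remoteColumn is only included when it is provided.
            column["remoteColumn"] = remote[0]
        columns.append(column)
    return columns


def unknown_blob_id_columns(metadata):
    """Return blobId entries that do not name a declared column.

    Assumption: blobId entries are expected to refer to names in columns.
    """
    declared = {column["name"] for column in metadata["columns"]}
    return [name for name in metadata.get("blobId", []) if name not in declared]


metadata = {
    "table": "us_states",
    "columns": odbc_columns(
        ("gid", "integer"),
        ("name", "character varying"),
        ("stusps", "character varying"),
    ),
    "blobId": ["gid"],
}
print(unknown_blob_id_columns(metadata))
```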

HDFS Blob Handler Object

  • hostname (string): The hostname of the HDFS NameNode.
    • Example: "namenode.hdfs.immuta.com"
  • port (integer): The port of the HDFS NameNode.
    • Example: 8020
  • directory (string): The base directory in HDFS from which this handler will index files.
    • Example: "/data/testdata"
  • username (string): The username the handler will use when connecting to HDFS.
    • Example: "testuser"
  • kerberos (boolean): Whether or not the HDFS cluster is secured with Kerberos.
    • Example: true
  • realm (string): The Kerberos realm the handler will use when connecting to HDFS.
    • Example: "IMMUTA.COM"
  • eventTimeAttribute (string): The file x-attr to be used for blob event time. Use "" to use last modified date.
    • Example: "processed"
  • tagAttributes (array[string]): An array of x-attr file attributes to be used to tag blobs.
    • Example: ["sometag"]
  • featureAttributes (array[string]): An array of x-attr file attributes to be used to generate blob features.
    • Example: ["userid", "color"]
  • useDirectoryForTags (boolean): Whether or not to use the HDFS directory structure as tags for the blobs.
    • Example: true

Example:

"handler": {
    "metadata": {
        "featureAttributes": [],
        "tagAttributes": [],
        "eventTimeAttribute": "",
        "kerberos": true,
        "realm": "IMMUTA.COM",
        "username": "testuser",
        "hostname": "namenode.hdfs.immuta.com",
        "port": 8020,
        "directory": "/data/testdata",
        "useDirectoryForTags": true
    }
}

Data Source Object

  • private (boolean): Whether the data source is private. When false, the data source is publicly available in the Immuta Web UI.
    • Example: true
  • blobHandler (object): A list of full URLs providing the locations of all blob store handlers to use with this data source.
    • Example: { "scheme":"https", "url":"" }
  • blobHandlerType (string): Describes the type of underlying blob handler that will be used with this data source (e.g., Custom, MS SQL).
    • Example: "PostgreSQL"
  • recordFormat (string): The data format of blobs in this data source (e.g., json, xml, html, jpeg).
    • Example: "json"
  • type (string): The type of data source, whether it is ingested (metadata will exist in Immuta) or queryable (metadata is dynamically queried).
    • Example: "queryable"
  • name (string): The name of the data source. Must be unique within the Immuta instance.
    • Example: "Test API Data Source"
  • sqlTableName (string): A string that represents this data source's table in Postgres. Is either a foreign table or feature table.
    • Example: "test_api_data_source"
  • organization (string): The organization that owns this data source.
    • Example: "Test Org"
  • category (string): The category of the data source.
    • Example: "Finance"
  • description (string): The description of the data source.
    • Example: "This data source contains FY2017 finance information."
  • owner (object): Users and groups that should be added as owners to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs.
    • Example: { "profiles": [3, 5], "groups": [4, 1999] }
  • expert (object): Users and groups that should be added as expert users to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs.
    • Example: { "profiles": [87, 199], "groups": [324] }
  • ingest (object): Users and groups that should be added as ingest users to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs.
    • Example: { "profiles": [34, 23], "groups": [32] }
  • hasSamples (boolean): Whether or not the data source contains samples.
    • Example: false

Example:

"dataSource": {
    "private": true,
    "useDatesAsDirectory": false,
    "blobHandler": {
        "scheme": "https",
        "url": ""
    },
    "blobHandlerType": "PostgreSQL",
    "recordFormat": "json",
    "type": "queryable",
    "name": "API Test Data Source",
    "sqlTableName": "api_test_data_source",
    "organization": "API Test Org",
    "category": "API Test",
    "description": "Test Data Source created using the Immuta API.",
    "hasSamples": false,
    "owner": {
        "profiles": [2, 5],
        "groups": [2]
    },
    "expert": {
        "profiles": [3],
        "groups": [9]
    },
    "ingest": {
        "profiles": [],
        "groups": []
    }
}
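A data source object like the one above can be assembled programmatically, as in this minimal Python sketch. The helpers to_sql_table_name and access are hypothetical. Deriving sqlTableName by lowercasing the name and replacing non-alphanumeric runs with underscores is an assumption inferred from the examples on this page, not a documented rule.

```python
import re


def to_sql_table_name(name):
    """Lowercase a display name and join its words with underscores.

    Assumption: mirrors the naming pattern seen in the examples.
    """
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def access(profiles=(), groups=()):
    """Build an owner, expert, or ingest object from profile and group IDs."""
    return {"profiles": list(profiles), "groups": list(groups)}


name = "API Test Data Source"
data_source = {
    "private": True,
    "blobHandler": {"scheme": "https", "url": ""},
    "blobHandlerType": "PostgreSQL",
    "recordFormat": "json",
    "type": "queryable",
    "name": name,
    "sqlTableName": to_sql_table_name(name),
    "hasSamples": False,
    "owner": access(profiles=[2, 5], groups=[2]),
    "expert": access(profiles=[3], groups=[9]),
    "ingest": access(),
}
print(data_source["sqlTableName"])
```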