Data Source HTTP API
Audience: Data Owners and Data Users
Content Summary: The Immuta data source metadata contains all the details about your data sources. This page describes the API for searching, retrieving, and creating data sources, and for listing blob handlers.
Data Source Search Endpoint
Method | Path | Successful Status Code |
---|---|---|
GET | /dataSource | 200 |
Query Parameters
- `blobHandlerType` (array[string]): filters data sources by blob handler type(s).
- `column` (array[string]): filters data sources by column name(s).
- `connectionString` (array[string]): filters data sources by connection string(s).
- `getHandlerTypeFacet` (boolean): when `true`, returns the handler type facet.
- `maxDate` (date): returns data sources created before the `maxDate` you enter in your query.
- `minDate` (date): returns data sources created after the `minDate` you enter in your query.
- `mode` (integer): specifies the query mode, which must be `0` (`FULL`), `1` (`COUNT`), `4` (`TAG`), `5` (`MIN_MAX`), or `6` (`STATUS`).
- `nameOnly` (boolean, default false): when `true`, the query will only filter by data source name.
    - Example: `"nameOnly=true&query=my-data-source-name"` would return only one hit.
- `offset` (integer, default 0): used in combination with `size` to fetch specific pages.
    - Example: `"size=10&offset=10"` returns the second page.
    - Example: `"size=10&offset=20"` returns the third page.
- `publicOnly` (boolean): when `false`, the query will only filter data sources that are private.
- `searchText` (string): searches text. By default, this will filter data sources by name, description, category, and organization.
- `size` (integer, default 10): pages results by default; `size` is the number of results to return per page.
- `sortField` (string): sorts results by field, which must be `subscribers`, `freshness`, `name`, `category`, `organization`, `blobHandlerType`, `subscriptionStatus`, `recordCount`, or `status`.
- `sortOrder` (string, default `desc`): sorts results by order, which must be `asc` or `desc`.
- `status` (array[string]): filters data sources by health check status, which must be `passed`, `failed`, or `unknown`.
- `subscription` (array[string]): filters data sources by subscription types, which must be `automatic`, `approval`, or `policy`.
- `tag` (array[string]): filters data sources by tags associated with the data sources.
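The paging and sorting parameters can be combined on the client before a request is sent. The following Python sketch is one possible way to do that, not part of the Immuta API: `build_search_query` and its 1-based `page` argument are hypothetical helpers, and sending array parameters as repeated keys is an assumption that this page does not confirm.

```python
from urllib.parse import urlencode

# Allowed values copied from the query parameter list.
SORT_FIELDS = {"subscribers", "freshness", "name", "category", "organization",
               "blobHandlerType", "subscriptionStatus", "recordCount", "status"}
SORT_ORDERS = {"asc", "desc"}


def build_search_query(page=1, size=10, sort_field=None, sort_order=None, **filters):
    """Return a query string for GET /dataSource.

    `page` is a hypothetical 1-based convenience; the API itself takes `offset`.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    if sort_field is not None and sort_field not in SORT_FIELDS:
        raise ValueError(f"unsupported sortField: {sort_field}")
    if sort_order is not None and sort_order not in SORT_ORDERS:
        raise ValueError(f"unsupported sortOrder: {sort_order}")

    # size=10&offset=10 is the second page, so offset = (page - 1) * size.
    params = {"size": size, "offset": (page - 1) * size}
    if sort_field is not None:
        params["sortField"] = sort_field
    if sort_order is not None:
        params["sortOrder"] = sort_order
    # Pass booleans as the strings "true" or "false"; str(True) would give "True".
    params.update(filters)
    # ASSUMPTION: array parameters are sent as repeated keys (doseq=True).
    return urlencode(params, doseq=True)


print(build_search_query(page=2))
print(build_search_query(size=100, sort_field="subscribers"))
```

Validating `sortField` and `sortOrder` locally is a design choice that surfaces typos before a request leaves the client.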
Example Request to List All Data Sources:
curl \
--request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
https://demo.immuta.com/dataSource
Example Request to Search Top 100 Data Sources:
This request sorts the data sources by the number of subscribers.
curl \
--request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
"https://demo.immuta.com/dataSource?size=100&sortField=subscribers"
Example Request to Query the Data Dictionary:
This request queries the Data Dictionary of a `dataSource` in `response.hits`.
curl \
--request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
https://demo.immuta.com/dictionary/${dataSource["id"]}
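The `${dataSource["id"]}` placeholder in the request above stands for the `id` of one entry in `hits`. As a rough illustration, the Python sketch below builds one dictionary URL per hit; `dictionary_urls` is a hypothetical helper name, and the sample response is trimmed to the two fields the helper reads.

```python
def dictionary_urls(base_url, search_response):
    """Build one Data Dictionary URL per data source in a search response."""
    base = base_url.rstrip("/")
    return [f"{base}/dictionary/{hit['id']}"
            for hit in search_response.get("hits", [])]


# Trimmed search response: only the fields used here.
response = {
    "hits": [{"name": "dataSource1", "id": 1}, {"name": "dataSource2", "id": 2}],
    "count": 22,
}
for url in dictionary_urls("https://demo.immuta.com", response):
    print(url)
```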
Response Parameters
Responses include the following data source object attributes:
- `hits`: array of data source metadata.
- `count`: total number of results available. The number of results in the `hits` array may be less than the `count`, depending on the `size` query parameter.
Example Response:
{
"hits":
[
{
"name": "dataSource1",
"id": 1,
"recordFormat": "json",
"deleted": false,
"category": "test",
"description": null,
"organization": "immuta",
"freshness": "2016-09-01T22:09:17.000Z",
"private": false,
"subscribers": 3,
"recordCount": 0,
"parentCount": 0,
"subscriptionStatus": "expert",
"blobHandlerType": "PostgreSQL",
"subscriptionvisibility": null
},
{
"name": "dataSource2",
"id": 2,
"recordFormat": "json",
"deleted": false,
"category": "testing",
"description": "sample description",
"organization": "immuta",
"freshness": "2018-03-09T17:02:41.243Z",
"private": false,
"subscribers": 2,
"recordCount": 11,
"parentCount": 0,
"subscriptionStatus": "not_subscribed",
"blobHandlerType": "Persisted",
"subscriptionvisibility": 2704
}
],
"count": 22
}
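Because `count` can exceed the number of entries in `hits`, a client that wants every result has to advance `offset` until `count` results have been read. The Python sketch below shows one way to do this under stated assumptions: `fetch_page` is a hypothetical callable standing in for the real `GET /dataSource` request, and `fake_fetch` simulates a server holding 22 data sources.

```python
def fetch_all_hits(fetch_page, size=10):
    """Collect every hit by advancing `offset` until `count` results are read.

    `fetch_page(size, offset)` is a hypothetical callable that performs the
    GET /dataSource request and returns the parsed JSON response.
    """
    hits, offset = [], 0
    while True:
        page = fetch_page(size, offset)
        hits.extend(page["hits"])
        offset += size
        # Stop once every available result is read, or a page comes back empty.
        if offset >= page["count"] or not page["hits"]:
            return hits


# Stand-in for the real API: 22 fake data sources served page by page.
_ALL = [{"id": i, "name": f"dataSource{i}"} for i in range(1, 23)]


def fake_fetch(size, offset):
    return {"hits": _ALL[offset:offset + size], "count": len(_ALL)}


print(len(fetch_all_hits(fake_fetch, size=10)))
```

The empty-page check is a safeguard against looping forever if `count` changes between requests.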
Get Data Source Endpoint
Method | Path | Successful Status Code |
---|---|---|
GET | /dataSource/{dataSourceId} | 200 |
Request Path Parameters
- `dataSourceId` (integer): ID of the data source that is being queried.
    - Example: `1`
Example Request:
curl \
--request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
https://demo.immuta.com/dataSource/1
Response Parameters
This endpoint returns a data source object.
Example Response:
{
"name": "loan data",
"recordFormat": "csv",
"category": "financial",
"description": "sample loan data",
"organization": "Immuta",
"policyHandler": null,
"sqlTableName": "loan_data",
"blobHandler": {
"url": "https://demo.immuta.com/pg/handler/1",
"manualDictionary": false
},
"createdBy": "13",
"deleted": false,
"private": true,
"type": "queryable",
"useDatesAsDirectory": false,
"recordCount": 0,
"rowCount": 43,
"documentation": null,
"minRecordDate": null,
"maxRecordDate": null,
"freshness": "2017-08-14T14:15:18.490Z",
"statsExpiration": "2017-08-15T14:15:19.588Z",
"id": 980,
"blobHandlerType": "PostgreSQL",
"policyHandlerType": "None",
"roots": [],
"hasSamples": false,
"subscriptionType": "approval",
"status": {
"sql": {
"status": "passed",
"message": "Passed"
},
"status": "passed",
"lastAttempt": {
"date": "2017-08-14T14:15:19.434Z",
"userId": "jeff@immuta.com",
"profileId": 13,
"iamId": "bim"
},
"lastUpdate": {
"date": "2017-08-14T14:15:19.434Z",
"userId": "jeff@immuta.com",
"profileId": 13,
"iamId": "bim"
}
},
"createdAt": "2017-08-14T14:15:18.490Z",
"updatedAt": "2017-08-14T14:15:20.108Z",
"subscribedAsUser": true,
"subscriptionId": 2840,
"acknowledgeRequired": false,
"subscriptionStatus": "owner",
"requestedState": "owner",
"approved": true,
"subscriptionExpiration": null,
"filterId": null,
"subscribers": 1,
"tags": []
}
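The `status` object in this response carries an overall health check result alongside per-check results such as `sql`. The Python sketch below lists the checks that did not pass. It assumes that any nested object with its own `status` key is a check result, which is inferred from the example above rather than stated by the API; `failed_checks` is a hypothetical helper name.

```python
def failed_checks(data_source):
    """Return the names of health checks whose status is not "passed"."""
    status = data_source.get("status", {})
    failed = []
    for name, value in status.items():
        # ASSUMPTION: nested objects with a "status" key (such as "sql") are
        # individual check results; "lastAttempt" and "lastUpdate" are skipped.
        if isinstance(value, dict) and "status" in value and value["status"] != "passed":
            failed.append(name)
    if status.get("status") != "passed":
        failed.append("overall")
    return failed


# Trimmed copy of the example response.
data_source = {
    "name": "loan data",
    "status": {
        "sql": {"status": "passed", "message": "Passed"},
        "status": "passed",
        "lastAttempt": {"date": "2017-08-14T14:15:19.434Z", "profileId": 13},
    },
}
print(failed_checks(data_source))
```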
Data Source Creation Endpoint
To create a data source, construct a request that includes the blob handler and data source metadata, and then send that request to the blob handler service. See List Blob Handlers Endpoint for instructions on listing blob handlers to begin constructing a request.
Method | Path | Successful Status Code |
---|---|---|
POST | {blobHandlerBaseUrl}/handler | 200 |
PUT | {blobHandlerBaseUrl}/handler | 200 |
Request Path Parameters
- `blobHandlerBaseUrl` (string): The base URL to the blob handler. Blob handler base URLs can be found by sending a request to the List Blob Handlers Endpoint.
Request Parameters
- `dataSource` (object): The data source configuration. See Data Source Object for more details.
- `handler` (object): The blob handler configuration. See Blob Handler Objects for more details.
Example Request Payload:
{
"dataSource": {
"private": true,
"useDatesAsDirectory": false,
"blobHandler": {
"scheme": "https",
"url": ""
},
"blobHandlerType": "PostgreSQL",
"recordFormat": "json",
"type": "queryable",
"name": "API Test Data Source",
"sqlTableName": "api_test_data_source",
"organization": "API Test Org",
"category": "API Test",
"description": "Test Data Source created using the Immuta API.",
"hasSamples": false,
"owner": {
"profiles": [],
"groups": []
},
"expert": {
"profiles": [],
"groups": []
},
"ingest": {
"profiles": [],
"groups": []
}
},
"handler": {
"metadata": {
"isChildData Source": false,
"format": "json",
"staleDataTolerance": 1728000,
"ssl": true,
"username": "testuser",
"password": "secretpassword",
"hostname": "test-database.test-db.com",
"port": 5432,
"database": "test",
"schema": "public",
"table": "us_states",
"columns": [
{
"name": "gid",
"dataType": "integer"
},
{
"name": "name",
"dataType": "character varying"
},
{
"name": "stusps",
"dataType": "character varying"
}
],
"blobId": ["gid"],
"bodataTableName": "api_test_data_source",
"dataSourceName": "API Test Data Source"
}
}
}
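In the example payload, `handler.metadata.dataSourceName` and `handler.metadata.bodataTableName` repeat the data source's `name` and `sqlTableName`. The Python sketch below assembles a payload with that shape and copies the two values when they are not supplied. Whether the API requires them to match is an assumption drawn from the example only, and `build_creation_payload` is a hypothetical helper.

```python
import json


def build_creation_payload(data_source, handler_metadata):
    """Wrap the two configuration objects in the request body shape shown above."""
    metadata = dict(handler_metadata)
    # ASSUMPTION: mirrors the example payload, where these handler fields
    # repeat the data source's name and SQL table name.
    metadata.setdefault("dataSourceName", data_source["name"])
    metadata.setdefault("bodataTableName", data_source["sqlTableName"])
    return {"dataSource": data_source, "handler": {"metadata": metadata}}


payload = build_creation_payload(
    {"name": "API Test Data Source", "sqlTableName": "api_test_data_source"},
    {"format": "json", "table": "us_states"},
)
# The serialized text could be saved as example-payload.json for the curl request.
print(json.dumps(payload, indent=2))
```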
Example Request:
curl \
--request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
--data @example-payload.json \
https://demo.immuta.com/pg/handler
Response Parameters
- `id` (integer): The blob handler ID.
- `dataSourceId` (integer): The data source ID.
Example Response:
{
"id": 100,
"dataSourceId": 156
}
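The `dataSourceId` in this response is the identifier accepted by the Get Data Source Endpoint. As a small illustration in Python, with `data_source_url` being a hypothetical helper name, the follow-up URL can be derived like this:

```python
def data_source_url(base_url, creation_response):
    """Build the GET /dataSource/{dataSourceId} URL for a newly created source."""
    return f"{base_url.rstrip('/')}/dataSource/{creation_response['dataSourceId']}"


print(data_source_url("https://demo.immuta.com", {"id": 100, "dataSourceId": 156}))
```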
List Blob Handlers Endpoint
When interacting with data sources programmatically, most actions require sending requests directly to blob handlers. The blob handler discovery endpoint can be used to query Immuta for all registered blob handlers. This list does not include custom blob handlers.
Method | Path | Successful Status Code |
---|---|---|
GET | /dataSource/blobHandlerTypes | 200 |
Example Request:
curl \
--request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
https://demo.immuta.com/dataSource/blobHandlerTypes
Response Parameters
- `name` (string): The name of the blob handler.
    - Example: `"PostgreSQL"`
- `clientUrl` (string): The URL to the blob handler for the UI. This may be a relative URL.
    - Example: `"/pg"`
- `baseUrl` (string): The base URL for the blob handler.
    - Example: `"https://demo.immuta.com/pg"`
- `config` (object): Configuration options for the blob handler.
    - Example: `{ "port": 5432 }`
- `displayOrder` (integer): The display order for the UI.
    - Example: `20`
Example Response:
[
{
"name": "PostgreSQL",
"clientUrl": "/pg",
"baseUrl": "https://demo.immuta.com/pg",
"config": {
"port": 5432
},
"displayOrder": 20
},
{
"name": "MySQL",
"clientUrl": "/mysql",
"baseUrl": "https://demo.immuta.com/mysql",
"config": {
"port": 3306
},
"displayOrder": 30
},
{
"name": "Apache Impala",
"clientUrl": "/impala",
"baseUrl": "https://demo.immuta.com/impala",
"config": {
"skipFileSizeChecks": true,
"port": 21050
},
"displayOrder": 120
},
{
"name": "Apache HDFS",
"clientUrl": "/hdfs",
"baseUrl": "https://demo.immuta.com/hdfs",
"config": {
"allowManualCrawl": true,
"disableMasking": true
},
"displayOrder": 110
}
]
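The `baseUrl` of a handler in this list is the `{blobHandlerBaseUrl}` used by the Data Source Creation Endpoint. The Python sketch below looks up a handler by name and derives the creation URL; `creation_url` is a hypothetical helper, and the sample list is trimmed from the response above.

```python
def creation_url(blob_handlers, handler_name):
    """Return {blobHandlerBaseUrl}/handler for the named blob handler."""
    for handler in blob_handlers:
        if handler["name"] == handler_name:
            return handler["baseUrl"].rstrip("/") + "/handler"
    raise LookupError(f"no blob handler named {handler_name!r}")


# Trimmed copy of the example response.
handlers = [
    {"name": "PostgreSQL", "baseUrl": "https://demo.immuta.com/pg", "displayOrder": 20},
    {"name": "Apache HDFS", "baseUrl": "https://demo.immuta.com/hdfs", "displayOrder": 110},
]
print(creation_url(handlers, "PostgreSQL"))
```

Raising an error for an unknown name, instead of returning `None`, keeps a misspelled handler name from turning into a malformed request URL.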
Blob Handler Objects
The configuration parameters for each handler depend on the handler type. See each type below to properly configure your data source.
Available Default Blob Handlers:
- ODBC Blob Handler Object
- HDFS Blob Handler Object
ODBC Blob Handler Object
The ODBC blob handler is the most common blob handler in Immuta.
All configuration data for the ODBC handler is contained in a `metadata` object.
- `ssl` (boolean): Set to `true` to enable SSL communication with the remote database.
    - Example: `true`
- `username` (string): The username used to connect to the remote database.
    - Example: `"testuser"`
- `password` (string): The password used to connect to the remote database.
    - Example: `"secretpassword"`
- `hostname` (string): The hostname of the remote database instance.
    - Example: `"test.immuta-db.com"`
- `port` (integer): The port of the remote database instance.
    - Example: `5432`
- `database` (string): The database in the remote database that corresponds to this handler.
    - Example: `"testdatabase"`
- `format` (string): The format of the blobs that will be returned for this handler (either `json` or `csv`).
    - Example: `"json"`
- `staleDataTolerance` (integer): The length in seconds that data from this handler can be cached.
    - Example: `1728000`
- `schema` (string): The schema in the remote database that corresponds to this handler.
    - Example: `"public"`
- `table` (string): The table in the remote database that corresponds to this handler.
    - Example: `"states"`
- `columns` (array[object]): The columns that should be included in this data source.
    - Example: `[{ "name": "id", "dataType": "integer" }, { "name": "description", "dataType": "character varying", "remoteColumn": "desc" }]`
- `blobId` (array[string]): The columns that compose the blob ID for records in this data source.
    - Example: `["id"]`
- `bodataTableName` (string): The name of the table associated with this data source that will be created in Postgres.
    - Example: `"test_data_source"`
- `dataSourceName` (string): The name of the data source to which this handler corresponds.
    - Example: `"Test Data Source"`
- `query` (string): The query that represents the data source. This query will be run to fetch blobs, stats, and the catalog for this data source.
    - Example: `"SELECT id, desc FROM states"`
- `connectionStringOptions` (string): Additional connection string options to be used when connecting to the remote database.
    - Example: `"sslmode=require;"`
- `sid` (string): SID/Database from Oracle TNS used to build the connection string.
    - Example: `"mydb2"`
Example:
"handler": {
"metadata": {
"format": "json",
"staleDataTolerance": 1728000,
"ssl": true,
"username": "testuser",
"password": "secretpassword",
"hostname": "test-database.test-db.com",
"port": 5432,
"database": "test",
"schema": "public",
"table": "us_states",
"columns": [
{
"name": "gid",
"dataType": "integer"
},
{
"name": "name",
"dataType": "character varying"
},
{
"name": "stusps",
"dataType": "character varying"
}
],
"blobId": [],
"bodataTableName": "api_test_data_source",
"dataSourceName": "API Test Data Source"
}
}
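A few of the ODBC constraints listed above can be checked locally before a request is sent. The Python sketch below is a hypothetical pre-flight check, not Immuta's own validation. The rule that every `blobId` entry should name a listed column is an assumption based on the field descriptions.

```python
def validate_odbc_metadata(metadata):
    """Return a list of problems found in an ODBC handler `metadata` object."""
    problems = []
    if metadata.get("format") not in ("json", "csv"):
        problems.append("format must be json or csv")
    if not isinstance(metadata.get("port"), int):
        problems.append("port must be an integer")
    names = set()
    for column in metadata.get("columns", []):
        if "name" not in column or "dataType" not in column:
            problems.append(f"column needs name and dataType: {column}")
        else:
            names.add(column["name"])
    # ASSUMPTION: blobId entries should refer to columns listed in `columns`.
    for key in metadata.get("blobId", []):
        if key not in names:
            problems.append(f"blobId column not in columns: {key}")
    return problems


metadata = {
    "format": "json",
    "port": 5432,
    "columns": [{"name": "gid", "dataType": "integer"}],
    "blobId": ["gid"],
}
print(validate_odbc_metadata(metadata))
```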
HDFS Blob Handler Object
- `hostname` (string): The hostname of the HDFS NameNode.
    - Example: `"namenode.hdfs.immuta.com"`
- `port` (integer): The port of the HDFS NameNode.
    - Example: `8020`
- `directory` (string): The base directory in HDFS from which this handler will index files.
    - Example: `"/data/testdata"`
- `username` (string): The username the handler will use when connecting to HDFS.
    - Example: `"testuser"`
- `kerberos` (boolean): Whether or not the HDFS cluster is secured with Kerberos.
    - Example: `true`
- `realm` (string): The Kerberos realm the handler will use when connecting to HDFS.
    - Example: `"IMMUTA.COM"`
- `eventTimeAttribute` (string): The file x-attr to be used for blob event time. Use `""` to use last modified date.
    - Example: `"processed"`
- `tagAttributes` (array[string]): An array of x-attr file attributes to be used to tag blobs.
    - Example: `["sometag"]`
- `featureAttributes` (array[string]): An array of x-attr file attributes to be used to generate blob features.
    - Example: `["userid", "color"]`
- `useDirectoryForTags` (boolean): Whether or not to use the HDFS directory structure as tags for the blobs.
    - Example: `true`
Example:
"handler": {
"metadata": {
"featureAttributes": [],
"tagAttributes": [],
"eventTimeAttribute": "",
"kerberos": true,
"realm": "IMMUTA.COM",
"username": "testuser",
"hostname": "namenode.hdfs.immuta.com",
"port": 8020,
"directory": "/data/testdata",
"useDirectoryForTags": true
}
}
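For illustration, the HDFS handler object can also be assembled programmatically. In the Python sketch below, `hdfs_handler` and all of its default values are hypothetical, with port `8020` borrowed from the example. Requiring a realm whenever `kerberos` is true is an assumption, not a documented rule.

```python
def hdfs_handler(hostname, directory, username, port=8020, kerberos=False,
                 realm=None, event_time_attribute="", tag_attributes=(),
                 feature_attributes=(), use_directory_for_tags=False):
    """Return a handler object shaped like the HDFS example above."""
    # ASSUMPTION: a Kerberos-secured cluster needs a realm to connect.
    if kerberos and not realm:
        raise ValueError("realm is required when kerberos is true")
    metadata = {
        "hostname": hostname,
        "port": port,
        "directory": directory,
        "username": username,
        "kerberos": kerberos,
        # "" selects the file's last modified date as the event time.
        "eventTimeAttribute": event_time_attribute,
        "tagAttributes": list(tag_attributes),
        "featureAttributes": list(feature_attributes),
        "useDirectoryForTags": use_directory_for_tags,
    }
    if realm:
        metadata["realm"] = realm
    return {"metadata": metadata}


handler = hdfs_handler("namenode.hdfs.immuta.com", "/data/testdata", "testuser",
                       kerberos=True, realm="IMMUTA.COM",
                       use_directory_for_tags=True)
print(handler["metadata"]["realm"])
```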
Data Source Object
- `private` (boolean): Whether or not the data source should be publicly available in the Immuta Web UI.
    - Example: `true`
- `blobHandler` (object): A list of full URLs providing the locations of all blob store handlers to use with this data source.
    - Example: `{ "scheme":"https", "url":"" }`
- `blobHandlerType` (string): Describes the type of underlying blob handler that will be used with this data source (e.g., Custom, MS SQL).
    - Example: `"PostgreSQL"`
- `recordFormat` (string): The data format of blobs in this data source (e.g., `json`, `xml`, `html`, `jpeg`).
    - Example: `"json"`
- `type` (string): The type of data source, whether it is ingested (metadata will exist in Immuta) or queryable (metadata is dynamically queried).
    - Example: `"queryable"`
- `name` (string): The name of the data source. Must be unique within the Immuta instance.
    - Example: `"Test API Data Source"`
- `sqlTableName` (string): A string that represents this data source's table in Postgres. This is either a foreign table or a feature table.
    - Example: `"test_api_data_source"`
- `organization` (string): The organization that owns this data source.
    - Example: `"Test Org"`
- `category` (string): The category of the data source.
    - Example: `"Finance"`
- `description` (string): The description of the data source.
    - Example: `"This data source contains FY2017 finance information."`
- `owner` (object): Users and groups that should be added as owners to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs.
    - Example: `{ "profiles": [3, 5], "groups": [4, 1999] }`
- `expert` (object): Users and groups that should be added as expert users to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs.
    - Example: `{ "profiles": [87, 199], "groups": [324] }`
- `ingest` (object): Users and groups that should be added as ingest users to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs.
    - Example: `{ "profiles": [34, 23], "groups": [32] }`
- `hasSamples` (boolean): Whether or not the data source contains samples.
    - Example: `false`
Example:
"dataSource": {
"private": true,
"useDatesAsDirectory": false,
"blobHandler": {
"scheme": "https",
"url": ""
},
"blobHandlerType": "PostgreSQL",
"recordFormat": "json",
"type": "queryable",
"name": "API Test Data Source",
"sqlTableName": "api_test_data_source",
"organization": "API Test Org",
"category": "API Test",
"description": "Test Data Source created using the Immuta API.",
"hasSamples": false,
"owner": {
"profiles": [2, 5],
"groups": [2]
},
"expert": {
"profiles": [3],
"groups": [9]
},
"ingest": {
"profiles": [],
"groups": []
}
}
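The `owner`, `expert`, and `ingest` objects share one shape, and the example `sqlTableName` values look like lowercased, underscore-separated versions of the data source name. The Python sketch below captures both patterns. Both helper names are hypothetical, and the naming rule is only inferred from the examples on this page, so Immuta may accept other table names.

```python
import re


def members(profiles=(), groups=()):
    """Build an owner/expert/ingest object from profile IDs and group IDs."""
    return {"profiles": [int(p) for p in profiles],
            "groups": [int(g) for g in groups]}


def default_sql_table_name(name):
    """ASSUMPTION: derive a table name the way the examples appear to."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


data_source = {
    "name": "API Test Data Source",
    "sqlTableName": default_sql_table_name("API Test Data Source"),
    "owner": members(profiles=[2, 5], groups=[2]),
    "expert": members(profiles=[3], groups=[9]),
    "ingest": members(),
}
print(data_source["sqlTableName"])
```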