Run Sensitive Data Discovery on Data Sources
Note
In previous documentation, a rule is referred to as a classifier or identifier, and a framework is referred to as a template.
Attributes overview
All the attributes for rules and frameworks are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined below.
Attribute | Type | Description |
---|---|---|
sources | string | The name of the data sources to apply the framework to. |
all | boolean | If true, SDD will run on all Immuta data sources. The default is false. |
wait | integer | The number of seconds to wait for the SDD jobs to finish. The value -1 will wait until the jobs complete. The default is -1. |
dryRun | boolean | When true, SDD will not update the tags on the data source(s) and will only return the tags that would have been applied or removed. See the Test SDD on a data source section for an example. The default is false. |
template | string | If passed, Immuta will run SDD with this framework instead of the framework applied to the data source(s). Passing template when dryRun is false will cause an error. |
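The attribute rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of Immuta's tooling: the function name build_sdd_payload and its parameter names are hypothetical, and the check that a request names sources or sets all is an assumption rather than documented behavior.

```python
import json


def build_sdd_payload(sources=None, run_all=False, wait=None, dry_run=False, template=None):
    """Build a request body for sdd/run from the attributes described above.

    Hypothetical helper: only the key names (sources, all, wait, dryRun,
    template) come from the documentation.
    """
    if template is not None and not dry_run:
        # Documented rule: passing template when dryRun is false causes an error.
        raise ValueError("template can only be passed when dry_run is True")
    if not sources and not run_all:
        # Assumption: a useful request names data sources or sets all to true.
        raise ValueError("pass sources or set run_all=True")

    payload = {}
    if sources:
        payload["sources"] = list(sources)
    if run_all:
        payload["all"] = True
    if wait is not None:
        payload["wait"] = int(wait)
    if dry_run:
        payload["dryRun"] = True
    if template is not None:
        payload["template"] = template
    return payload


if __name__ == "__main__":
    # Write a dry-run payload to the file name used in the commands on this page.
    payload = build_sdd_payload(
        sources=["Medical Claims"], dry_run=True, template="PII_REVISION"
    )
    with open("example-payload.json", "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
```

Attributes left at their defaults are omitted from the payload, so the server-side defaults listed in the table still apply.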
Run SDD on data sources
- Specify the data sources you would like to run SDD on, and save the payload in a .json file.
{ "sources": [ "Insurance Data" ] }
Or choose to run SDD on all the data sources in Immuta, and save the payload in a .json file.
{ "all": true }
- Trigger SDD using one of these methods:
Immuta CLI
immuta api sdd/run -X POST --input ./example-payload.json
HTTP API
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
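For callers who prefer a script to the CLI or curl, the same HTTP request can be assembled with Python's standard library. This is a sketch under assumptions: make_sdd_request and run_sdd are hypothetical helper names, and the URL and token are the placeholder values from the curl example, not working credentials.

```python
import json
import urllib.request


def make_sdd_request(base_url, token, payload):
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdd/run",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def run_sdd(base_url, token, payload):
    """Send the request and decode the JSON response (performs a network call)."""
    request = make_sdd_request(base_url, token, payload)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values copied from the curl example; replace with your own.
    request = make_sdd_request(
        "https://your-immuta-url.immuta.com",
        "dea464c07bd07300095caa8",
        {"sources": ["Insurance Data"]},
    )
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it possible to inspect the method, URL, and headers before anything touches the Immuta instance.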
If sensitive data discovery ran successfully, you will receive a response similar to this:
{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": [
            "Discovered.PII"
          ],
          "email": [
            "Discovered.PII"
          ]
        },
        "removedTags": {
          "ssn": [
            "Discovered.Country.US"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.Entity.Social Security Number",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ],
        "email": [
          "Discovered.Entity.Electronic Mail Address",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ]
      }
    }
  }
}
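A response like the one above is keyed by data source name, with added and removed tags listed per column under output.diff. The sketch below flattens that structure into a per-column summary. The function name summarize_diff is hypothetical, and the code assumes every response follows the shape shown in this example.

```python
def summarize_diff(response):
    """Return {data source: {column: {"added": [...], "removed": [...]}}}.

    Assumes the response shape shown in the example above; missing keys
    are treated as empty.
    """
    summary = {}
    for source_name, job in response.items():
        diff = job.get("output", {}).get("diff", {})
        added = diff.get("addedTags", {})
        removed = diff.get("removedTags", {})
        columns = {}
        # Include every column that appears in either part of the diff.
        for column in sorted(set(added) | set(removed)):
            columns[column] = {
                "added": added.get(column, []),
                "removed": removed.get(column, []),
            }
        summary[source_name] = columns
    return summary


if __name__ == "__main__":
    # Trimmed copy of the example response above.
    example = {
        "Insurance Data": {
            "state": "completed",
            "output": {
                "diff": {
                    "addedTags": {
                        "ssn": ["Discovered.PII"],
                        "email": ["Discovered.PII"],
                    },
                    "removedTags": {"ssn": ["Discovered.Country.US"]},
                }
            },
        }
    }
    for source, columns in summarize_diff(example).items():
        for column, change in columns.items():
            print(source, column, change)
```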
Test SDD on a data source
Users can test how SDD will apply tags to their data sources by completing a dryRun, which allows users to test frameworks and tags:
- Test frameworks: If a framework is specified in the payload when dryRun is true, SDD will use this framework instead of the framework applied to the data source. Note: SDD will error if a framework is specified here when dryRun is false.
- Test tags: Instead of applying tags, SDD just returns the tags that would be applied to the data source. This allows users to evaluate whether or not rules or frameworks are applying tags correctly without updating the data source.
After evaluating whether the tags would be applied appropriately, users can make any necessary changes to a framework before triggering SDD again.
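One way to evaluate a dry run is to compare the sddTagResult portion of the response against the tags you expect on each column. The following sketch illustrates that comparison. The function name compare_tags and the expected-tags mapping are hypothetical and supplied by the user; SDD does not provide them.

```python
def compare_tags(sdd_tag_result, expected):
    """Report columns whose dry-run tags differ from the expected tags.

    Both arguments map column names to lists of tag names. Columns that
    match exactly are left out of the report.
    """
    report = {}
    for column in sorted(set(sdd_tag_result) | set(expected)):
        actual = set(sdd_tag_result.get(column, []))
        wanted = set(expected.get(column, []))
        missing = sorted(wanted - actual)      # expected, but SDD would not apply
        unexpected = sorted(actual - wanted)   # SDD would apply, but not expected
        if missing or unexpected:
            report[column] = {"missing": missing, "unexpected": unexpected}
    return report


if __name__ == "__main__":
    dry_run_result = {"ssn": ["Discovered.PII"]}
    expected = {"ssn": ["Discovered.PII", "Discovered.PHI"]}
    print(compare_tags(dry_run_result, expected))
```

An empty report suggests the framework tags the data source as intended; any entry points to a rule or framework setting worth revisiting before running SDD for real.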
To complete a dryRun,
- Specify the data sources you would like to run sensitive data discovery on and set dryRun to true in the payload in a .json file. Note: You can also apply a framework to a data source as a dryRun, like in the example below. However, when dryRun is false, a framework cannot be included in the payload. Instead, the framework must be added to the data source before running SDD.
{ "sources": [ "Medical Claims" ], "dryRun": true, "template": "PII_REVISION" }
- Trigger SDD using one of these methods:
Immuta CLI
immuta api sdd/run -X POST --input ./example-payload.json
HTTP API
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
- You will receive a response that illustrates tags that will be added, tags that will be removed, and the final SDD result:
{
  "Medical Claims": {
    "id": "86fc4f70-380f-11ec-a432-81748c911385",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}
- Once you are satisfied with how tags are applied by SDD, set dryRun to false or omit it from the payload.
{ "sources": [ "Medical Claims" ], "dryRun": false }
- Trigger SDD again:
Immuta CLI
immuta api sdd/run -X POST --input ./example-payload.json
HTTP API
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
- If the request was successful, you will receive a response similar to this one:
{
  "Medical Claims": {
    "id": "2afcfe00-3813-11ec-b171-9331e3d3aa04",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}
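The move from a dry run to a real run amounts to a small change in the payload: dryRun becomes false and template is dropped, because a framework cannot be passed when dryRun is false. A minimal sketch of that conversion follows. The helper name finalize_payload is hypothetical, and it does not add the framework to the data source, which must still be done before running SDD.

```python
def finalize_payload(dry_run_payload):
    """Derive the payload for a real SDD run from a dry-run payload.

    Returns a new dict; the dry-run payload is left unchanged.
    """
    payload = dict(dry_run_payload)
    # template is only accepted when dryRun is true, so remove it.
    payload.pop("template", None)
    payload["dryRun"] = False
    return payload


if __name__ == "__main__":
    dry_run = {
        "sources": ["Medical Claims"],
        "dryRun": True,
        "template": "PII_REVISION",
    }
    print(finalize_payload(dry_run))
```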
What's next
Continue to one of the following tutorials:
- Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
- Create a framework: Although only data governors can create rules, data owners can add rules to frameworks, which they then apply to their data sources to override minConfidence or tags for rules within the framework.
- Create a rule: Data governors can create rules to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.