Derived Data Sources

Audience: Data Owners and Data Users

Content Summary: This page describes how derived data sources inherit policies from their parent sources. For a tutorial, see Create a Derived Data Source.

Policy Inheritance

Introduction

A derived data source is a data source that is created within an equalized project and contains data from its parent sources. Because it contains that data, the derived data source inherits the necessary Data Policies from its parent data sources when it is created, which keeps the data secure.

Policy inheritance for derived data sources is unique to the environment that an equalized project creates. Within an equalized project, every data user sees the same data, so members can share and collaborate on work without any risk of a user viewing more than they should. When a derived data source is created, it inherits the Data Policies from its parent sources, and a Subscription Policy is created from the equalized entitlements on the project, allowing project members to safely share secure data.

Let's look at an example.

Example

Consider these data sources, within an equalized Project 1, that each contain Subscription and Data Policies:

Data Source A

Subscription Policy: Allow users to subscribe to the data source when user is a member of group Medical Claims

Data Policies:

  • Mask by making null the value in the column(s) city except for members of group Legal
  • Mask by making null the value in the column(s) gender for everyone

Data Source B

Subscription Policy: Allow users to subscribe to the data source when user is approved by anyone with permission Owner and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone

If a user creates a derived data source, Data Source C, from these two data sources, Data Source C will inherit the following policies, which cannot be changed:

Data Source C

Subscription Policy: Allow users to subscribe when they satisfy all of the following:

  • is a member of group Legal and is a member of group Medical Claims
  • is approved by anyone with permission Owner (of Data Source B) and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone

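The example above can be sketched as a small Python model. This is an illustration only, not the product's implementation or API: the names `DataPolicy`, `ParentSource`, and `inherit_policies` are hypothetical, and the sketch assumes the simplification that only purpose-limiting Data Policies carry over while the equalized group entitlements are supplied as an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    kind: str    # hypothetical labels: "mask" or "purpose"
    detail: str


@dataclass(frozen=True)
class ParentSource:
    name: str
    approvers: tuple       # permissions whose holders must approve a subscription
    data_policies: tuple


def inherit_policies(parents, equalized_groups):
    """Assemble a derived source's policies from its parents (illustrative model)."""
    subscription = {
        # Groups come from the project's equalized entitlements.
        "groups": sorted(equalized_groups),
        # Approval requirements from every parent are all kept.
        "approvals": [perm for p in parents for perm in p.approvers],
    }
    # Assumption: masking policies are dropped and purpose limits are kept.
    data_policies = [
        dp.detail for p in parents for dp in p.data_policies if dp.kind == "purpose"
    ]
    return subscription, data_policies


source_a = ParentSource(
    name="Data Source A",
    approvers=(),
    data_policies=(
        DataPolicy("mask", "city, except for members of group Legal"),
        DataPolicy("mask", "gender, for everyone"),
    ),
)
source_b = ParentSource(
    name="Data Source B",
    approvers=("Owner", "Governance"),
    data_policies=(DataPolicy("purpose", "Research"),),
)

subscription, data_policies = inherit_policies(
    [source_a, source_b], equalized_groups={"Medical Claims", "Legal"}
)
print(subscription)
print(data_policies)
```

In this model, the group exception on the city mask surfaces as a group requirement in the Subscription Policy rather than as a Data Policy, which mirrors the Data Source C policies listed above.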

Sensitive Data Detection applies Discovered tags to derived data sources; however, because they inherit policies from their parent sources, the Global Policies that contain these tags will not apply to derived data sources.

Limitations

There are a few details to note:

  • One of the Data Policies in Data Source A, mask by making null the value in the column(s) gender for everyone, is not included in Data Source C. Because that column was masked for everyone, the creator could not have seen its values in the parent source; therefore, there are no values in the derived data source to be masked.

    Most Local Data Policies do not need to be present in the derived data source; the exception is limit usage to purpose(s) policies. No Global Policies are added to a derived data source.

  • Data Source C's policies depend on which groups are in the project; as the groups change, so do the policies.

    For example, if there were a data user in the project who was not in the Legal group, then that trait would not be needed in the Subscription Policy because, with equalization, those values would not be visible to the project members in the parent data source.

    If all of the project members are in the groups Medical Claims and Legal, both groups are part of the Subscription Policy, as in the Data Source C example above.

    If some project members are in both Medical Claims and Legal but one project member is only in Medical Claims, the Subscription Policy requires only that member's traits. Once the project has been equalized, everyone sees the same data as the member with the fewest permissions.

  • Because of project equalization, the Subscription and Data Policies in the derived data source always reflect the minimum required permissions and traits.

  • Data Source C's policies do not adapt to changes in the parent data sources. Any changes to the parent data source policies are logged in the Lineage tab of the derived data source page but are not applied to the derived data source's policies.

    The data owner may choose to add new Local Data Policies to the derived data source to keep up with any changes, but the inherited policies are not adjustable.

  • Changes to the data in a parent data source do not propagate to the derived data source. After the derived data source is created, its relationship to its parents exists for auditing and lineage, not for updating content.
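
The equalization and lineage behavior described in this list can be sketched in Python. This is a minimal model under stated assumptions, not the product's implementation: `equalized_groups`, `DerivedSource`, and `record_parent_change` are hypothetical names, equalization is approximated as the set of groups shared by every project member, and the member names and logged change are invented for illustration.

```python
from dataclasses import dataclass, field


def equalized_groups(member_groups):
    """Return the groups that every project member belongs to (assumed model)."""
    if not member_groups:
        return frozenset()
    return frozenset(set.intersection(*(set(g) for g in member_groups.values())))


@dataclass
class DerivedSource:
    required_groups: frozenset                           # inherited requirement
    local_policies: list = field(default_factory=list)   # owner-added policies
    lineage_log: list = field(default_factory=list)      # audit trail only

    def record_parent_change(self, description):
        # A parent policy change is logged; inherited requirements are untouched.
        self.lineage_log.append(description)


# Every member is in both groups, so both are required.
all_in_both = {
    "member_1": {"Medical Claims", "Legal"},
    "member_2": {"Medical Claims", "Legal"},
}
# One member lacks Legal, so only Medical Claims is required.
one_without_legal = {
    "member_1": {"Medical Claims", "Legal"},
    "member_2": {"Medical Claims"},
}

option_1 = equalized_groups(all_in_both)
option_2 = equalized_groups(one_without_legal)
print(sorted(option_1))  # ['Legal', 'Medical Claims']
print(sorted(option_2))  # ['Medical Claims']

source_c = DerivedSource(required_groups=option_1)
source_c.record_parent_change("Data Source A: Data Policy changed (hypothetical)")
# The data owner may still add a new Local Data Policy in response.
source_c.local_policies.append("new Local Data Policy (hypothetical)")
```

The design choice to keep `lineage_log` separate from `required_groups` reflects the point that the parent relationship serves auditing and lineage rather than policy or content updates.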

Using Data Outside the Project

If members use data from outside the project to create a data source, they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected.

For detailed instructions on creating a derived data source, navigate to Create a Derived Data Source.