Commit dbc7443

Merge pull request #29 from agile-lab/main
Adding business and security information , and contacts
2 parents 6f65b5c + 2a021b6 commit dbc7443

3 files changed: +108 -27 lines changed

README.md

+29-12
@@ -40,11 +40,28 @@ The fixed structure must be technology-agnostic. The first fields of teh fixed s
 * `Email: [Option[String]]` point of contact between consumers and maintainers of the Data Product. It could be the owner or a distribution list, but must be reliable and responsive.
 * `OwnerGroup [String]`: LDAP user/group that is owning the data product.
 * `DevGroup [String]`: LDAP user/group that is in charge to develop and maintain the data product.
-* `InformationSLA: [Option[String]]` describes what SLA the Data Product team is providing to answer additional information requests about the Data Product itself.
+* `SupportSLA: [Option[String]]` describes what SLA the Data Product team is providing when some support is needed.
+* `SupportHours: [Option[String]]` define when the suport is available. Ex During working days from 9 to 18
+* `ResponseTime: [Option[String]]` define the amount of time needed to take care of an incoming feature
+* `ResolutionTime: [Option[String]]` define the amount of time needed to fix the date
+* `InformationTime: [Option[String]]` define the amount of time needed to answer clarification questions.
 * `Status: [Option[String]]` this is an enum representing the status of this version of the Data Product. Allowed values are: `[Draft|Published|Retired]`. This is a metadata that communicates the overall status of the Data Product but is not reflected to the actual deployment status.
 * `Maturity: [Option[String]]` this is an enum to let the consumer understand if it is a tactical solution or not. It is really useful during migration from Data Warehouse or Data Lake. Allowed values are: `[Tactical|Strategic]`.
 * `Billing: [Option[Yaml]]` this is a free form key-value area where is possible to put information useful for resource tagging and billing.
 * `Tags: [Array[Yaml]]` Tag labels at DP level ( please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)).
+* `BusinessConcepts: [Array[Yaml]]` Link with Business Concepts coming from the Business Ontology/Glossary at DP level ( please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)). Source field must be "Glossary" and the href must link to the Uri of the external glossary or ontology
+* `SecurityInfo: [Yaml]` Security attributes provide guidance to understand who can access this Data Product and which authorizations are needed
+* `Confidentiality: [Option[String]]` This field indicates the level of confidentiality assigned to the data product. It defines how sensitive the data is and determines the access controls and protections that need to be in place. Common examples might include "Public," "Internal," "Confidential," or "Secret."
+* `Visibility: [Option[String]]` This field defines the scope of visibility for the data product. It dictates which users, teams, or systems can view or access the data. For example, it could specify whether the data is visible to only specific internal departments
+* `GDPR: [Option[String]]` This field indicates whether the data product is subject to the General Data Protection Regulation (GDPR), and if so, what specific measures or classifications apply. Yes or No
+* `BusinessInfo: [Yaml]`
+* `ValueProposition: [Option[String]]`: Describe the valu eproposition of the data product from a business standpoint
+* `ValueGeneration: [Option[String]]`: Define what kind of value this DP will generate. It could be a Foundation DP ( tipically a source aligned one), otherwise can be "Operation Monitoring" collecting information about the company processes and providing decision support, then "Revenue Generation" for those DP that can be directly monetized.
+* `StakeholderRoles: Array[String]`: List of stakeholders involved, interested and supporting this data product
+* `PricingType: [Option[String]]`: It could be Subscription or Pay as You Consume
+* `PricingInfo: [Yaml]`: Free structure field to describe the pricing structure of the data product
+* `StrategicInitiatives: Array[String]` Provides the linking between the Data Product and the strategic initiatives of the company, for example is possible to link Company OKR
+* `TargetConsumption: [Array[String]]` Define which are the ideal consumption cases for this data product. It could be analytics, reporting, online application, etc.
 * `Specific: [Yaml]` this is a custom section where we can put all the information strictly related to a specific execution environment. It can also refer to an additional file. At this level we also embed all the information to provision the general infrastructure (resource groups, networking, etc.) needed for a specific Data Product. For example if a company decides to create a ResourceGroup for each data product and have a subscription reference for each domain and environment, it will be specified at this level. Also, it is recommended to put general security here, Azure Policy or IAM policies, VPC/Vnet, Subnet. This will be filled merging data defined at common level with values defined specifically for the selected environment.
 
 The **unique identifier** of a Data Product is the concatenation of Domain, Name and Version. So we will refer to the `DP_UK` as a URN which ends in the following way: `$DPDomain:$DPName:$DPMajorVersion`.
@@ -81,18 +98,18 @@ Constraints:
 * `IntervalOfChange: [Option[String]]` how often changes in the data are reflected.
 * `Timeliness: [Option[String]]` the skew between the time that a business fact occurs and when it becomes visibile in the data.
 * `UpTime: [Option[String]]` the percentage of port availability.
-* `TermsAndConditions: [Option[String]]` If the data is usable only in specific environments.
 * `Endpoint: [Option[URL]]` this is the API endpoint that self-describe the output port and provide insightful information at runtime about the physical location of the data, the protocol must be used, etc.
-* `biTempBusinessTs: [Option[String]]` name of the field representing the business timestamp, as per the "bi-temporality" definition; it should match with a field in the related `Schema`
-* `biTempWriteTs: [Option[String]]` name of the field representing the technical (write) timestamp, as per the "bi-temporality" definition; it should match with a field in the related `Schema`
-* `DataSharingAgreement: [Yaml]` This part is covering usage, privacy, purpose, limitations and is independent by the data contract.
-* `Purpose: [Option[String]]` what is the goal of this data set.
-* `Billing: [Option[String]]` how a consumer will be charged back when it consumes this output port.
-* `Security: [Option[String]]` additional information related to security aspects, like restrictions, masking, sensibile information and privacy.
-* `IntendedUsage: [Option[String]]` any other information needed by the consumer in order to effectively consume the data, it could be related to technical stuff (e.g. extract no more than one year of data for good performances ) or to business domains (e.g. this data is only useful in the marketing domains).
-* `Limitations: [Option[String]]` If any limitation is present it must be made super clear to the consumers.
-* `LifeCycle: [Option[String]]` Describe how the data will be historicized and how and when it will be deleted.
-* `Confidentiality: [Option[String]]` Describe what a consumer should do to keep the information confidential, how to process and store it. Permission to share or report it.
+* `DataSharingAgreement: [Yaml]` This part is covering usage, privacy, purpose, limitations and is independent by the data contract.
+* `TermsAndConditions: [Option[String]]` If the data is usable only in specific environments.
+* `Purpose: [Option[String]]` what is the goal of this data set.
+* `Billing: [Option[String]]` how a consumer will be charged back when it consumes this output port.
+* `Security: [Option[String]]` additional information related to security aspects, like restrictions, masking, sensibile information and privacy.
+* `IntendedUsage: [Option[String]]` any other information needed by the consumer in order to effectively consume the data, it could be related to technical stuff (e.g. extract no more than one year of data for good performances ) or to business domains (e.g. this data is only useful in the marketing domains).
+* `Limitations: [Option[String]]` If any limitation is present it must be made super clear to the consumers.
+* `LifeCycle: [Option[String]]` Describe how the data will be historicized and how and when it will be deleted.
+* `Confidentiality: [Option[String]]` Describe what a consumer should do to keep the information confidential, how to process and store it. Permission to share or report it.
+* `biTempBusinessTs: [Option[String]]` name of the field representing the business timestamp, as per the "bi-temporality" definition; it should match with a field in the related `Schema`
+* `biTempWriteTs: [Option[String]]` name of the field representing the technical (write) timestamp, as per the "bi-temporality" definition; it should match with a field in the related `Schema`
 * `Tags: [Array[Yaml]]` Tag labels at OutputPort level, here we can have security classification for example (please refer to [OpenMetadata documentation](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/type/taglabel)).
 * `SampleData: [Option[Yaml]]` provides a sample data of your Output Port (please refer to [OpenMetadata specification](https://docs.open-metadata.org/v1.0.0/main-concepts/metadata-standard/schemas/entity/data/table#properties)).
 * `SemanticLinking: [Option[Yaml]]` here we can express semantic relationships between this output port and other outputports (also coming from other domains and data products). For example, we could say that column "customerId" of our SQL Output Port references the column "id" of the SQL Output Port of the "Customer" Data Product.
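
The README text in this diff describes the Data Product unique identifier as a URN ending in `$DPDomain:$DPName:$DPMajorVersion`. A minimal Python sketch of that concatenation follows. The function name `dp_unique_key` is hypothetical, and the sketch assumes the version is a dot-separated string such as `1.0.0` whose first segment is the major version; the URN prefix is not shown in the README text, so only the suffix is built.

```python
def dp_unique_key(domain: str, name: str, version: str) -> str:
    """Build the `$DPDomain:$DPName:$DPMajorVersion` suffix of the DP URN.

    Assumption: `version` is dot-separated (e.g. "1.0.0") and its first
    segment is the major version. The URN prefix is deliberately left out.
    """
    major_version = version.split(".")[0]
    return f"{domain}:{name}:{major_version}"


print(dp_unique_key("my_domain", "my_data_product", "1.0.0"))
# prints: my_domain:my_data_product:1
```

The result lines up with the `my_domain:my_data_product:1` segment of the component id used in `example.yaml`.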

data-product-specification.cue

+33-5
@@ -39,6 +39,7 @@ import "strings"
 description?: string | null
 fullyQualifiedName?: string | null
 tags?: [... #OM_Tag]
+businessTerms?: [... #OM_Tag]
 constraint?: #OM_Constraint | null
 ordinalPosition?: number | null
 if dataType =~ "(?i)^(JSON)$" {
@@ -57,14 +58,13 @@ import "strings"
 upTime?: string | null
 ...
 }
-termsAndConditions?: string | null
 endpoint?: #URL | null
-biTempBusinessTs?: string | null
-biTempWriteTs?: string | null
+dataSharingAgreement: #DataSharingAgreement
 ...
 }
 
 #DataSharingAgreement: {
+termsAndConditions?: string | null
 purpose?: string | null
 billing?: string | null
 security?: string | null
@@ -92,9 +92,11 @@ import "strings"
 retentionTime?: string | null
 processDescription?: string | null
 dataContract: #DataContract
-dataSharingAgreement: #DataSharingAgreement
+biTempBusinessTs?: string | null
+biTempWriteTs?: string | null
 tags: [... #OM_Tag]
 sampleData?: #OM_TableData | null
+sampleQuery?: string | null
 semanticLinking?: {...} | null
 specific: {...}
 ...
@@ -189,10 +191,36 @@ dataProductOwnerDisplayName: string
 devGroup: string
 ownerGroup: string
 email?: string | null
-informationSLA?: string | null
+supportSLA: {
+supportHours: string | null
+responseTime: string | null
+resolutionTime: string | null
+informationTime: string | null
+}
 status?: string & =~"(?i)^(draft|published|retired)$" | null
 maturity?: string & =~"(?i)^(tactical|strategic)$" | null
 billing?: {...} | null
+businessInfo: {
+valueProposition: string | null
+valueGeneration?: string & =~"(?i)^(Foundation|RevenueGeneration|OperationMonitoring)$" | null
+strategicInitiatives: [... string] | null
+stakeholderRoles: [... string] | null
+pricingType: string & =~"(?i)^(PayPerUse|Subscription)$" | null
+pricingInfo: {...} | null
+...
+}
+securityInfo: {
+confidentiality: string & =~"(?i)^(Public|Internal|Confidential|Restricted|Secret)$"| null
+visibility: string & =~"(?i)^(Global|Department)$" | null
+GDPR: string & =~"(?i)^(Yes|No)$" | null
+...
+}
+contacts: {
+ownerContact: string
+suportContact: string
+}
+targetConsumption: [... string] | null
 tags: [... #OM_Tag]
+businessConcepts: [... #OM_Tag]
 specific: {...}
 components: [#Component, ...#Component]
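
The CUE hunk above constrains several string fields with case-insensitive regular expressions. As a rough illustration of what those constraints accept, here is a Python sketch that applies the same patterns, copied from the diff, to a plain dictionary. It is not a substitute for CUE validation: the helper names (`ENUM_PATTERNS`, `get_path`, `enum_violations`) are hypothetical, only the enum-style fields are checked, and `null` or absent values are skipped because every pattern in the schema is paired with `| null`.

```python
import re

# Patterns copied from the CUE definitions in the diff above.
ENUM_PATTERNS = {
    ("status",): r"(?i)^(draft|published|retired)$",
    ("maturity",): r"(?i)^(tactical|strategic)$",
    ("businessInfo", "valueGeneration"): r"(?i)^(Foundation|RevenueGeneration|OperationMonitoring)$",
    ("businessInfo", "pricingType"): r"(?i)^(PayPerUse|Subscription)$",
    ("securityInfo", "confidentiality"): r"(?i)^(Public|Internal|Confidential|Restricted|Secret)$",
    ("securityInfo", "visibility"): r"(?i)^(Global|Department)$",
    ("securityInfo", "GDPR"): r"(?i)^(Yes|No)$",
}


def get_path(doc, path):
    """Follow a tuple of keys through nested dicts; return None if any is missing."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def enum_violations(doc):
    """Return (dotted_path, value) pairs whose value does not match its pattern."""
    violations = []
    for path, pattern in ENUM_PATTERNS.items():
        value = get_path(doc, path)
        if value is None:  # null or absent: not checked in this sketch
            continue
        if not isinstance(value, str) or not re.match(pattern, value):
            violations.append((".".join(path), value))
    return violations


descriptor = {
    "status": "DRAFT",
    "maturity": "Strategic",
    "businessInfo": {"valueGeneration": "OperationMonitoring", "pricingType": "Pay as You Consume"},
    "securityInfo": {"visibility": "Department", "confidentiality": "Confidential", "GDPR": "Yes"},
}
print(enum_violations(descriptor))
# prints: [('businessInfo.pricingType', 'Pay as You Consume')]
```

The usage example deliberately uses the README wording "Pay as You Consume" for the pricing type: the pattern in the CUE file only accepts `PayPerUse` or `Subscription`, so the two files in this commit appear to name that option differently.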

example.yaml

+46-10
@@ -11,11 +11,40 @@ dataProductOwnerDisplayName: Tom Smith
 email: mailto:distribution_list@corp.com
 ownerGroup: dataproduct1_corp.com
 devGroup: dataproduct1_dev_corp.com
-informationSLA: 2WD
+ownerGroup: dataproduct1_corp.com
+devGroup: dataproduct1_dev_corp.com
+supportSLA:
+supportHours: 8x5
+responseTime: 1H
+resolutionTime: undefined
+informationTime: 2WD
 status: DRAFT
 maturity: Strategic
 billing: {}
+businessInfo:
+valueProposition: Unlock some capability for the organization
+valueGeneration: OperationMonitoring
+okr: increase the margin
+pricingType: Subscription
+stakeholderRoles:
+- CMO
+securityInfo:
+visibility: Department
+confidentiality: Confidential
+gdpr: Yes
+contacts:
+ownerContact: paolo.platter@agilelab.it
+suportContact: support.finance@agilelab.it
+targetConsumption:
+- Analytics
+- Reporting
+- OnlineApplication
 tags: []
+businessConcepts:
+- tagFQN: Margin
+source: Glossary
+labelType: Manual
+state: Confirmed
 specific: {}
 components:
 - id: urn:dmb:cmp:my_domain:my_data_product:1:my_raw_s3_port
@@ -39,16 +68,17 @@ components:
 intervalOfChange: 1 hours
 timeliness: 1 minutes
 upTime: 99.9%
-termsAndConditions: only usable in development environment
+
 endpoint: https://myurl/development/my_domain/my_data_product/1.0.0/my_raw_s3_port
-dataSharingAgreements:
-purpose: this output port want to provide a rich set of profitability KPIs related to the customer
-billing: 5$ for each full scan
-security: In order to consume this output port an additional security check with compliance must be done
-intendedUsage: the dataset is huge so it is recommended to extract maximum 1 year of data and to use these KPIs in the marketing or sales domain, but not for customer care
-limitations: is not possible to use this data without a compliance check
-lifeCycle: the maximum retention is 10 years, and eviction is happening on the first of january
-confidentiality: if you want to store this data somewhere else, PII columns must be masked
+dataSharingAgreements:
+termsAndConditions: only usable in development environment
+purpose: this output port want to provide a rich set of profitability KPIs related to the customer
+billing: 5$ for each full scan
+security: In order to consume this output port an additional security check with compliance must be done
+intendedUsage: the dataset is huge so it is recommended to extract maximum 1 year of data and to use these KPIs in the marketing or sales domain, but not for customer care
+limitations: is not possible to use this data without a compliance check
+lifeCycle: the maximum retention is 10 years, and eviction is happening on the first of january
+confidentiality: if you want to store this data somewhere else, PII columns must be masked
 tags:
 - tagFQN: experimental
 source: Tag
@@ -59,6 +89,7 @@ components:
 labelType: Manual
 state: Confirmed
 sampleData: {}
+sampleQuery: select * from dp.table
 semanticLinking: {}
 specific:
 directory: history
@@ -127,6 +158,11 @@ components:
 source: Tag
 labelType: Manual
 state: Confirmed
+businessTerms:
+- tagFQN: BusinessAddress
+source: Glossary
+labelType: Manual
+state: Confirmed
 - name: first_hire_date
 dataType: date
 description: the date of his/her first hire in mybank. No matter is a temporary or permanent contract
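
When reviewing a change that touches both a schema and its example, it can help to compare the key names used on each side. The sketch below does that in Python for the two new blocks, `businessInfo` and `securityInfo`. The key sets are transcribed by hand from the `data-product-specification.cue` and `example.yaml` hunks in this commit; the names `SPEC_FIELDS`, `EXAMPLE_KEYS` and `key_diff` are hypothetical. Both CUE structs are open (they end with `...`), so an extra key in the example is not necessarily a schema error; the comparison only highlights naming differences worth a second look.

```python
# Field names declared in the CUE definitions of this commit.
SPEC_FIELDS = {
    "businessInfo": [
        "valueProposition", "valueGeneration", "strategicInitiatives",
        "stakeholderRoles", "pricingType", "pricingInfo",
    ],
    "securityInfo": ["confidentiality", "visibility", "GDPR"],
}

# Keys used in example.yaml in this commit.
EXAMPLE_KEYS = {
    "businessInfo": [
        "valueProposition", "valueGeneration", "okr",
        "pricingType", "stakeholderRoles",
    ],
    "securityInfo": ["visibility", "confidentiality", "gdpr"],
}


def key_diff(spec_fields, example_keys):
    """Return (keys only in the example, fields only in the spec), both sorted."""
    spec, example = set(spec_fields), set(example_keys)
    return sorted(example - spec), sorted(spec - example)


for block in SPEC_FIELDS:
    only_example, only_spec = key_diff(SPEC_FIELDS[block], EXAMPLE_KEYS[block])
    print(block, "example-only:", only_example, "spec-only:", only_spec)
# prints:
# businessInfo example-only: ['okr'] spec-only: ['pricingInfo', 'strategicInitiatives']
# securityInfo example-only: ['gdpr'] spec-only: ['GDPR']
```

The comparison is case-sensitive on purpose, which is why `gdpr` in the example and `GDPR` in the schema are reported as different keys.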
