This document is a constituent part of the Cloud Data Management Capabilities (CDMC™) model (“the Model”) and is provided as a free license to any organization registered with EDM Council Inc. (“EDM Council”) as a recipient (“Recipient”) of the document. While this is a Free License available to both members and non-members of the EDM Council, acceptance of the CDMC Terms of Use is required to protect the Recipient’s use of proprietary EDMC property and to notify the Recipient of future updates to the Model.
CDMC™ and all related materials are the sole property of EDM Council Inc. All rights, titles and interests therein are vested in the EDM Council. The Model and related material may be used freely by the Recipient for their own internal purposes. It may only be distributed beyond the Recipient’s organization with prior written authorization of EDM Council. The Model may only be used by the Recipient for commercial purposes or external assessments if the Recipient’s organization has entered into a separate licensing and Authorized Partner Agreement with EDM Council governing the terms for such use.
Please accept these CDMC™ Terms of Use by registering at:
https://app.smartsheet.com/b/form/6e2b0bf4a3024affb98daad174b08483
FOREWORD – JOHN BOTTEGA, EDMC PRESIDENT
Introduction
When industry identifies a challenge, it's amazing what can be done when talented people collaborate. This is the underlying story of CDMC – the Cloud Data Management Capabilities Framework.
The art of data management has evolved. Once thought of as a behind-the-scenes technology function, understanding, curating, protecting and using our information resources is now a front-and-center business, technology and operations function. Data is now the lifeblood of our industry and our personal lives. As data professionals, we have a responsibility to ensure information is accurate, timely, trusted and protected, and that it is being put to use effectively and ethically.
It is this goal that has propelled the profession of data management. Chief Data Officers, Heads of Data Quality, Data Governance and Data Architecture are becoming commonplace in our businesses. We now bear the responsibility of curating information from a defensive posture—controlling risk, privacy, safety and security, as well as from an offensive posture—increasing revenue, penetrating new markets, developing new products and services.
To better equip the data professional, the EDM Council developed a data management best practice framework known as DCAM – the Data Management Capability Assessment Model.
------- logo with right copy --------
For 16 months, the CDMC Work Group worked tirelessly to build a cloud data management framework that would help the industry better manage data in the cloud, better protect data in the cloud, and better enable organizations to realize the benefits of the cloud environment.
------- second logo with right copy --------
Sincerely,
John Bottega
President, EDM Council
Acknowledgements
We would like to provide special acknowledgement to our CDMC Co-Chairs Oli Bage (LSEG) and Richard Perris (Morgan Stanley) for both their founding inspiration in advocating the CDMC Project to the EDM Council and for their extraordinary CDMC contributions and leadership over the last 18 months. Additionally, special thanks to Morgan Stanley for donating the initial draft of cloud principles that helped jump start the CDMC Project in the early days. Finally, special acknowledgement to our CDMC Project Manager, Jubair Patel (Microsoft formerly with Capco), who with steadfast support from the Capco team, kept the global CDMC project on track and was also an exemplary cloud subject matter contributor.
Over 100 companies have contributed to the production of the CDMC Framework:
- Cloud Service Providers: Amazon AWS, Google, IBM and Microsoft
- Leading financial organizations, including: Barclays, Citibank, Credit Suisse, Deutsche Bank, DTCC, Fannie Mae, Freddie Mac, Goldman Sachs, HSBC, JP Morgan, LSEG, M&G, Morgan Stanley, Société Générale, Standard Bank, Sterling National Bank, TD Bank and UBS
- Other major organizations, including: CPA Canada and Schneider Electric
- Technology Providers, including: BigID, Collibra, Informatica, Privitar, Securiti, Solidatus and Snowflake
- Consultancies and System Integrators, including: Accenture, Capco, KPMG and Ortecha
EDM Council would like to thank the 300+ individuals who have participated. Those who have provided permission to be named are listed in the following document:
https://edmcouncil.org/resource/resmgr/cdmc_master/CDMC_Framework_Acknowledgeme.pdf
Revision History
------- insert revision history table ---------
Introduction
Purpose
Digital transformation is fundamentally changing how we do business – personally and professionally. Much of this transformation is taking place in the cloud environment across the globe. Cloud implementations are occurring in all sectors across all industries. There are many benefits of managing and storing data in a cloud environment, including cost savings, flexibility, mobility, improved information security, increased collaboration, and realizing new insights within an organization’s data assets.
As with any new technology, cloud computing entails many challenges. New cloud implementations face a variety of data, technology and planning difficulties. There remains a lack of consistent industry best practices for applying data management capabilities during migrations to and operations in single, multiple and hybrid cloud environments.
Consequently, an organization will likely face cost and complexity risks when adopting cloud computing technologies. Adoption can be especially difficult for regulated entities that must demonstrate precise, consistent data control in both on-premises and cloud environments. Cloud service providers (CSPs) and technology providers also face complexity as they seek to understand the data management priorities of organizations, resulting in challenges to improving their cloud implementations.
The Cloud Data Management Capabilities (CDMC™) Framework defines the best practice capabilities necessary to manage and control data in cloud environments. The creation of this framework represents an important milestone in the global adoption of industry best practices for data management. The overall objective is to build trust, confidence and dependability for the adoption of cloud technologies, offering benefits to each of the constituencies within the cloud ecosystem:
- Cloud Service and Technology Consumers – provides a structured framework of auditable processes and controls, especially for sensitive data.
- Cloud Service Providers – provides requirements and controls that can be automated within CSP platforms, accelerating adoption and increasing market confidence.
- Application, Technology and Data Providers – applies standard, certified CDMC capabilities and controls to services and solutions to ensure a high degree of reliability and operational effectiveness.
- Consultants and System Integrators – enables training and assessments, gap analysis, strategy development, and execution services for end clients adopting cloud technologies.
- Regulators – provides industry guidance for auditing and validating key cloud environment controls, especially for sensitive data.
CDMC is a best practice assessment and certification framework for managing and controlling data in single, multiple, and hybrid cloud environments. CDMC is used to assess the capabilities of an organization that are necessary to support controlled integration and migration to cloud environments. The framework focuses and expands on capabilities critical to controlling important and sensitive data, and highlights features of contemporary cloud platforms that present opportunities for standardization and automation of data management and control.
Though CDMC is a standalone framework, it assumes that an organization already has a strong foundation of data management capabilities. A broader set of capabilities is covered in other frameworks such as the Data Management Capability Assessment Model (DCAM®) of the EDM Council. Effective data management fundamentals, together with the features and capabilities defined in CDMC, will enable an organization to build trustworthy and secure cloud environments—both now and well into the future.
Approach
CDMC was produced by the EDM Council CDMC Work Group formed in May 2020 with over 300 individual business executives, engineers, technologists and data professionals. The group includes participants from over 100 organizations across the globe, including major CSPs, technology service organizations, privacy firms and major consultancy and advisory firms. The objectives of the initiative were to:
- Develop a framework that provides direction and guidance on core data management capabilities in cloud data management aligned with industry best practices.
- Develop a consistent CDMC scoring model for industry organizations to measure maturity and readiness against the cloud data management capabilities.
- Collaborate with cloud service and technology providers and industry organizations on a set of priorities for accelerating capabilities for cloud migration and implementations while allowing cloud service and technology providers the opportunity to apply their unique innovations and services to meet these industry requirements.
- Establish methods to continuously improve the CDMC Framework and facilitate training and education on these best practices.
The structure of CDMC and the approach to its creation leveraged the structure and approach of the DCAM® framework, which the EDM Council has maintained since 2014.
CDMC – A FRAMEWORK FOR CLOUD DATA MANAGEMENT
Many organizations must establish a broad set of controls to manage data responsibly and comply with applicable regulatory requirements. Standards and best practices enable an organization to harness the enormous opportunity offered by cloud technologies while avoiding the challenges of developing and adapting home-grown controls and spending time on isolated feature requests between individual companies and CSPs.
Controlling data in cloud environments requires a complex set of data management capabilities:
- An organization must establish clear accountability, controls and governance for data migrated to or created in cloud environments.
- A critical requirement is to know, at all times, what data resides in cloud environments and the sensitivity of each data asset. Such tracking is essential to automating controls for data access and use, and vital to enforcing those controls and maintaining evidence of the required transparency, security and protection levels.
- Data management controls must be established throughout the data lifecycle.
- Data assets must be fit-for-purpose and kept to required schedules for retention and archiving.
- As with on-premises data assets, the design of the data architecture and configuration of supporting technologies are important for ensuring that business objectives are met.
CDMC captures the requirements for these capabilities in six areas. These six Components of the framework include 14 Capabilities and a total of 37 Sub-capabilities. The definition and scope of each component are presented below:
---- insert CDMC diagram ----
1.0 Governance & Accountability
The Governance & Accountability component is a set of capabilities that ensure an organization has clear accountability, controls and governance for data migrated to or created in cloud environments. These capabilities provide the foundation of well-governed business cases, effective data ownership, governance of data sourcing and consumption and management of data sovereignty and cross-border data movement risks.
This CDMC component helps to:
- Define business cases for managing data in cloud environments, including a value realization framework.
- Ensure that the roles and responsibilities of data owners extend to data in cloud environments.
- Ensure that data sourcing is managed with authoritative sources and authorized distributors.
- Exploit opportunities for automation in the cloud environment to support governance of data consumption.
- Improve understanding of the requirements for managing data sovereignty and cross-border data movement risks.
- Implement controls for data sovereignty and cross-border data movement risk.
2.0 Cataloguing & Classification
The Cataloguing & Classification component is a set of capabilities for creating, maintaining and using data catalogs that are both comprehensive and consistent. This component includes classifications for information sensitivity. These capabilities ensure that data managed in cloud environments is easily discoverable, readily understandable and supports well-controlled, efficient data use and reuse.
This CDMC component helps to:
- Define the scope and granularity of data to be cataloged.
- Define the characteristics of data as metadata.
- Catalog the data and the data sources.
- Connect the metadata among multiple sources.
- Share metadata with authorized users to promote discovery, reuse and access.
- Enable sharing of metadata and data discovery across multiple catalogs, platforms and applications.
- Define, apply and use the information sensitivity classifications.
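The cataloguing activities above can be sketched as a minimal data-catalog model. This is an illustrative sketch in Python; the names (`CatalogEntry`, `Sensitivity`, `discover`) and the four-level sensitivity scale are assumptions for demonstration, not a schema prescribed by CDMC.

```python
from dataclasses import dataclass, field
from enum import Enum

class Sensitivity(Enum):
    """Assumed example scale; CDMC does not mandate specific levels."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    HIGHLY_RESTRICTED = 4

@dataclass
class CatalogEntry:
    name: str            # the cataloged data asset
    source: str          # the data source it was cataloged from
    owner: str           # accountable data owner
    sensitivity: Sensitivity
    tags: set = field(default_factory=set)   # connective metadata

catalog = {}

def register(entry: CatalogEntry) -> None:
    """Catalog the data asset and its characteristics as metadata."""
    catalog[entry.name] = entry

def discover(max_sensitivity=None, tag=None):
    """Share metadata with authorized users to promote discovery and reuse.

    Filters entries by an optional sensitivity ceiling and/or tag.
    """
    results = []
    for entry in catalog.values():
        if tag is not None and tag not in entry.tags:
            continue
        if max_sensitivity is not None and entry.sensitivity.value > max_sensitivity.value:
            continue
        results.append(entry)
    return results
```

A consumer entitled only to internal data, for example, would call `discover(max_sensitivity=Sensitivity.INTERNAL)` and never see highly restricted entries.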
3.0 Accessibility & Usage
The Accessibility & Usage component is a set of capabilities to manage, enforce and track entitlements and to ensure that data access, data use and the outcomes of data operations are appropriate and ethical.
This CDMC component helps to:
- Express and capture data rights and obligations as metadata.
- Ensure that parties respect data rights and obligations over data they are entitled to access.
- Track and report on data access for both regulatory compliance and billing purposes.
- Establish formal organization structures for oversight of data ethics.
- Operationalize ethical access and use of data and ethical outcomes of data decisions.
4.0 Protection & Privacy
The Protection & Privacy component is a set of capabilities for collecting evidence that demonstrates compliance with the organizational policy for data sensitivity and protection. The purpose of these capabilities is to ensure that all sensitive data has adequate protection from compromise or loss as required by regulatory, industry and ethical obligations.
This CDMC component helps to ensure that:
- Data loss protection regimes are implemented.
- Evidence is gathered to demonstrate that the required data security controls have been applied.
- A data privacy framework is defined and approved.
- A data privacy framework is operational.
- Data obfuscation techniques are applied to all data types according to classification and security policies.
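As one illustration of the final point, classification-driven obfuscation might be wired up as follows. The mapping from classification to technique here is an assumption for demonstration only; an organization's own security policy defines the actual mapping.

```python
import hashlib

def obfuscate(value: str, classification: str) -> str:
    """Apply an obfuscation technique according to a classification policy.

    The classification-to-technique mapping below is illustrative, not
    prescribed by CDMC.
    """
    if classification == "public":
        return value  # no obfuscation required
    if classification == "internal":
        # simple masking: keep the first character, mask the rest
        return value[:1] + "*" * max(len(value) - 1, 0)
    # confidential and above: irreversible tokenization via a salted hash
    salted = ("example-salt:" + value).encode("utf-8")
    return hashlib.sha256(salted).hexdigest()[:16]
```

The hash-based branch is deterministic, so the same sensitive value always yields the same token, which preserves joinability across data sets without exposing the raw value.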
5.0 Data Lifecycle
The Data Lifecycle component is a set of capabilities for defining and applying a data lifecycle management framework and ensuring that data quality in cloud environments is managed across the data lifecycle.
This CDMC component helps to:
- Define, adopt and implement a data lifecycle management framework.
- Ensure that data at all stages of the data lifecycle is properly managed.
- Define, code, maintain and deploy data quality rules.
- Implement processes to measure data quality, publish metrics and remediate data quality issues.
6.0 Data & Technical Architecture
The Data & Technical Architecture component is a set of capabilities for ensuring that data movement into, out of and within cloud environments is understood and that architectural guidance is provided on key aspects of the design of cloud computing solutions.
This CDMC component helps to:
- Establish and apply principles for data availability and resilience.
- Support business requirements for backup and point-in-time recovery of data.
- Facilitate optimization of the usage and associated costs of cloud services.
- Support data portability and the ability to migrate between cloud service providers.
- Automate the identification of data processes and flows within and between cloud environments, capturing metadata that describes data movement along the data supply chain.
- Identify, track and manage changes to data lineage, and provide the ability to explain lineage at a point-in-time.
- Provide tooling to report and visualize lineage such that the outputs are meaningful from a business and technical perspective.
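The lineage-related points above could be sketched as a simple store of data-movement events that supports point-in-time tracing. The names and structures here (`record_movement`, `trace`) are illustrative assumptions, not a CDMC-specified interface.

```python
from collections import defaultdict

# target asset -> list of (source asset, process, timestamp) movement events
lineage = defaultdict(list)

def record_movement(source: str, target: str, process: str, timestamp: int) -> None:
    """Capture metadata describing data movement along the supply chain."""
    lineage[target].append((source, process, timestamp))

def trace(asset: str, as_of=None):
    """Walk upstream edges to explain lineage, optionally at a point in time."""
    upstream, seen, stack = [], set(), [asset]
    while stack:
        node = stack.pop()
        for src, _process, ts in lineage.get(node, []):
            if as_of is not None and ts > as_of:
                continue  # edge did not exist at the requested point in time
            if src not in seen:
                seen.add(src)
                upstream.append(src)
                stack.append(src)
    return upstream
```

Recording each movement as it happens (rather than reconstructing it later) is what makes the point-in-time view cheap to answer: the `as_of` filter simply ignores edges created after the requested moment.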
Structure of CDMC
As introduced above, CDMC is organized into six components. Each component begins with a definition that describes the component, explains why it is important and explains how it relates to the overall cloud data management process. These definitions are written to help business and operational executives better understand the cloud data management process. The components are organized into 14 capabilities and 37 sub-capabilities. The capabilities and sub-capabilities are the essence of the CDMC Framework. They define the goals of data management at a practical level and establish the operational requirements needed for sustainable cloud data management. Each sub-capability has a corresponding set of measurement criteria, which are used to assess an organization's cloud data management journey.
---- insert graphic ----
- Component – a group of capabilities that together deliver a foundational tenet of cloud data management. A component functions as a reference guide for data practitioners who are accountable for executing the tenet.
- Upper Matter – high-level context for the component—used as a background for understanding the component by data practitioners.
- Definition – formal description of the component—supporting common data management understanding and language.
- Scope – a set of statements to establish the guardrails for what is included in the component—used to understand and communicate reasonable boundaries.
- Overview – more detailed context and accounting at a practical level to understand the operational execution required for sustainable cloud data management—used as a guide by the respective data practitioners.
- Value Proposition – a set of statements to identify the business value of delivering the cloud data management component—used to inform the varied business cases for developing the data management initiative.
- Core Questions – high-level but probing inquiries—used to explore the cloud data management component.
- Core Artifacts – artifacts required to execute the capability—used to understand deliverables required to support the capability.
- Capability – a group of sub-capabilities that together execute tasks and achieve the stated objectives—used as a reference tool by the data practitioners accountable for the execution.
- Description – brief aggregate explanation of what is included in the sub-capabilities required to achieve the capability—used in the assessment process to inform the respondent of the scope of what they are rating.
- Sub-Capability – more granular activities required to achieve the capability—used as a reference tool by the data practitioners accountable for the execution.
- Description – a brief explanation of what is included in the sub-capability—used in the assessment process to inform the respondent of the scope of what they are rating.
- Objective – identified goals or desired outcomes from executing the sub-capability—used as a basis for defining cloud data management process design requirements.
- Advice for Data Practitioners – more detailed but casual insight on the best practices of how to execute the sub-capability with an audit review perspective—used by the data practitioner.
- Advice for Cloud Service and Technology Providers – more detailed but casual insight on how cloud technologies can support the sub-capability—used by cloud service and technology providers.
- Questions – inquiries to direct interrogation of the capability/sub-capability current-state—used by the data practitioner to inform a perspective of the assessment scoring.
- Artifacts – required documents or evidence of adherence—used for assessment and audit reference and to link to supporting best practice material—when available.
- Scoring Guidance – insight for defining an assessment score—used when completing an assessment survey.
Each CDMC Component includes references to Key Controls & Automations, which are specifications of key controls that must be established at the capability level and highlight opportunities to support the control with automation. These are used as a reference tool by data practitioners accountable for the controls and cloud service and technology providers who support their implementation and automation.
CDMC Use Cases
Organizations can use CDMC in multiple ways:
- As a well-defined control framework.
- As a tool to assess readiness for migration to and operation in cloud environments.
- As a certification model for cloud service and technology consumers.
- As a certification model for cloud service and technology providers.
Framework
When an organization adopts the standard CDMC Framework, it introduces a consistent understanding and way of describing cloud data management. CDMC is a comprehensive framework—presented as a best practice paradigm—of the capabilities required to manage data in single, multiple and hybrid cloud environments. It helps accelerate the development of a cloud data management initiative and make it operational. The CDMC Framework:
- Provides a common and measurable cloud data management framework.
- Establishes common language for the practice of cloud data management.
- Translates industry experience and expertise into operational standards.
- Documents cloud data management capability requirements.
- Proposes evidence-based artifacts.
Assessment
Performing an assessment measures the readiness of an organization to migrate to and operate in cloud environments. The assessment produces results that translate the practice of cloud data management into a quantifiable science. The benefits that an organization can gain from assessment outcomes include:
- Baseline measurement of the cloud data management capabilities in the organization compared to an industry standard.
- Quantifiable measurement of the organization’s progress in establishing the required cloud data management capabilities into its operations.
- Identification of cloud data management capability gaps to inform a prioritized roadmap for future development and improvement.
- Focused attention to the funding requirements of the cloud data management initiative.
Effective use of the CDMC Framework as an assessment tool requires the definition of the assessment objectives and strategy, planning for the management of the assessment and adequate training of the participants to establish a base understanding of the framework. Organizations may either perform a self-assessment or may engage the services of a CDMC Authorized Partner to perform an independent assessment.
CDMC Scoring Guide
The CDMC Framework is designed to assess which phase of attainment the organization reaches for each capability. It is not an assessment of the maturity or scope to which the organization has applied the capabilities. The scoring scheme used throughout the framework is as follows:
---- insert six columns ----
A CDMC assessment must also examine whether the key controls have been established. This measurement provides a binary result for each control: the control is either established or not established.
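A minimal sketch of how such an assessment result might be represented follows, assuming a six-phase attainment scale and hypothetical capability and control names; CDMC's actual scoring scheme is defined by the framework itself.

```python
def assess(capability_phases: dict, control_results: dict) -> dict:
    """Summarize a CDMC-style assessment result.

    capability_phases: capability name -> attained phase (assumed 1..6 scale).
    control_results:   key-control name -> bool (established or not).
    """
    key_controls = {name: ("Established" if established else "Not Established")
                    for name, established in control_results.items()}
    return {
        "capabilities": dict(capability_phases),
        "key_controls": key_controls,
        # the key-control measurement is binary: all must be established
        "all_key_controls_established": all(control_results.values()),
    }
```

Keeping the capability phases and the binary control results as separate outputs mirrors the framework's distinction between phase-of-attainment scoring and the established/not-established control check.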
Certification - Consumers
Organizations that achieve all capabilities and establish all key controls can obtain CDMC Certification. The certification process involves an independent assessment, performed by a CDMC Authorized Partner, of the achievement of the capabilities and the existence of the controls. If successful, the organization receives a CDMC digital certificate issued by the EDM Council, which remains valid for 12 months. This certification is similar to other cloud computing certification programs such as SOC 2.
Certification - Providers
Cloud service providers or cloud technology and solution providers can subject their platforms and products to a certification assessment against all or relevant CDMC Key Controls elements to protect sensitive data in cloud environments. An independent CDMC Authorized Partner must perform this certification assessment. Upon successfully completing a certification assessment, the EDM Council will issue a CDMC digital certificate that remains valid for 12 months. This certificate can be commercially represented in the market to indicate that the platform or product supports the respective CDMC Key Controls.
Support Materials
The CDMC Framework presented in this document is supported by additional materials in the following resources.
CDMC Controls Test Specifications
Specifications of the tests for the CDMC Key Controls within the framework. These specifications form the basis for certifying cloud products and services against the framework.
Reference: CDMC Controls Test Specification Version 1.1 – to be published Q4 2021
CDMC Information Model
An ontology that draws on and combines related open frameworks and standards to describe the information required to support cloud data management. This ontology provides a foundation for the interoperability of data catalogs and automation of controls across cloud service and technology providers.
Reference: CDMC Information Model Version 1.1 – to be published Q4 2021
CDMC Management Requirements Model
A generic model of data management requirements, with mappings to both CDMC and DCAM capabilities, showing the relationships and dependencies that CDMC capabilities have on foundational data management capabilities.
Reference: Data Management Requirements Model V1.1 – to be published Q4 2021
Training
The EDM Council and Authorized Partners offer a 2-day training course on the CDMC Framework.
Reference: https://edmcouncil.org/page/CDMCTraining
Business Glossary
The EDM Council has developed a data management business glossary containing approximately 200 data management term names and definitions. CDMC v1.1 has applied these terms consistently across the document. Where a term is defined in the glossary, the word or phrase is italicized and underlined in the text.
The business glossary is available via the following link: https://www.edmcportal.org/glossary/.
1.0 Governance & Accountability
------- insert component graphic ---------
Upper Matter
Introduction
Governance and accountability are the backbone of successful management of data in cloud environments. The cloud environment introduces challenges and opportunities for scale, standardization, automation, and the shared responsibility model. Consequently, it is important to apply an effective data governance program to data that resides in a cloud environment. All stakeholders should have a clear understanding of data controls and of the accountability attached to each role. The approach is similar to how data governance, controls and accountability are applied to conventional data management in an organization.
Description
The Governance & Accountability component is a set of capabilities that ensure an organization has clear accountability, controls and governance for data migrated to or created in cloud environments. These capabilities provide the foundation of well-governed business cases, effective data ownership, governance of data sourcing and consumption and management of data sovereignty and cross-border data movement risks.
Scope
- Defining business cases for managing data in the cloud, including a value realization framework.
- Ensuring that the roles and responsibilities of data owners extend to data in the cloud.
- Ensuring that data sourcing is managed with authoritative sources and authorized distributors.
- Leveraging cloud automation opportunities in the governance of data consumption.
- Understanding requirements for managing data sovereignty and cross-border data movement risks.
- Implementing controls for data sovereignty and cross-border data movement risk.
Overview
Business cases for cloud data management must articulate how to manage risk, deliver value, and align with the organization’s overall business, data, and cloud computing strategies. The business cases should provide a basis for ensuring there is accountability for the quality of the outcomes.
Business cases
Business cases for managing data in a cloud environment need to outline planned activities, dependencies, risks (including plans to mitigate risks, where feasible), timelines, exit strategies, and outcomes based on the use case for that data. The value to be realized as a part of outcomes should link directly to the organization’s broader data management strategy and cloud strategy. A framework of measures, metrics or key performance indicators must be established to demonstrate progress throughout the cloud data management implementation. The framework should include the depth of distinct capabilities matured (such as the number of personas with separate role-based access controls) and the coverage across the spectrum of capabilities (such as the number of users securely accessing the cloud data management catalog).
Cloud data management business cases must be approved by an appropriate authority and sponsored by accountable stakeholders. Successfully managing data in cloud environments requires substantial support from both business and technology stakeholders within an organization. The interests of these groups need to be aligned before deployment and consistently represented through deployment.
Cloud data management business cases must be enforceable and periodically reviewed by sponsors throughout the deployment. Reviews should compare the original data strategy and cloud strategy on which the business case was founded against the details of interim outcomes and milestones achieved. Acceleration or deceleration of activities within the business case should be considered according to changes in the cloud environment and data priorities.
The business cases should outline the key benefits of managing data in the cloud. While cost reduction and risk mitigation benefits are more tangible and easier to project, value-added features are critical to gaining approval from business stakeholders. The benefits should be demonstrated regularly to maintain momentum. Examples include:
- Scalability and transparency in managing the products and analytics outputs of data science teams.
- Better utilization of data management resources by simplifying capacity management.
- Availability of marketplace solutions and accelerators to rapidly mature data management capabilities.
- Controlled democratization of data access resulting from centralized storage in the cloud.
- Value from eliminating fixed capital costs and from flexibility in provisioning infrastructure, noting that this comes at the expense of increased difficulty in forecasting future costs; appropriate mitigating controls should be included in the business case.
- Performing early experimentation and prototyping, enabling the pursuit of quick wins at relatively low risk.
There are potential sources of business value realized from managing data in a cloud environment that cannot be easily replicated for data managed on-premises, such as:
- Concentrating enterprise data management tools (storage, compliance, cataloging, analytics, security, lineage, sourcing, quality) into fewer providers, easing integration and standardization and reducing architectural variance and complexity.
- Provisioning data management infrastructure on variable schedules to account for performance fluctuations.
- Centralizing management of operating expenses to automatically correct for underutilized resources and systematically track the project budget against the plan.
- Data management in the cloud can be decoupled from any existing data management conducted on-premises, removing the requirement to consider constraints posed by integrating legacy architectures into the data management solution in the business case.
Data ownership
Data ownership is fundamental to successful data governance, regardless of whether data is on-premises or resides in a cloud environment. Effective data ownership is an enabler for cloud adoption and can drive how an organization leverages new capabilities available in the cloud. It is essential that data ownership is well established and that the responsibilities of data owners extend across all environments.
Data ownership is critical to ensuring the appropriate governance of data in the cloud. The data owner has overall accountability for the meaning, content, quality, distribution and use of a given data set. The data owner is supported by other roles such as data stewards, data architects and metadata managers in executing this accountability.
An important responsibility of a data owner is to ensure that the authoritative sources and authorized distributors of their data are identified and consumption from non-authoritative sources is governed. This importance is accentuated in the cloud, multi-cloud and hybrid cloud environments—where there is increased potential of the unnecessary proliferation of copies of data. Data management in a cloud environment offers the opportunity to support data sourcing and consumption governance with automation. One example of this is automating denial of data consumption from non-authoritative sources.
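As a sketch of the automation described above, a consumption request could be checked against the authority tag recorded in the data catalog before access is granted. The catalog structure, tag names and asset identifiers below are illustrative assumptions, not part of the CDMC model:

```python
# Hedged sketch: deny consumption from non-authoritative sources.
# The catalog layout and tag values are illustrative assumptions.

AUTHORITATIVE_STATUSES = {"authoritative_source", "authorized_distributor"}

# Illustrative catalog: each data asset carries an authority tag and an owner.
catalog = {
    "crm.customers": {"authority": "authoritative_source", "owner": "head_of_sales_ops"},
    "tmp.customers_copy": {"authority": "non_authoritative", "owner": "head_of_sales_ops"},
}

def authorize_consumption(asset_id: str) -> bool:
    """Allow consumption only from assets tagged as authoritative
    sources or authorized distributors in the data catalog."""
    entry = catalog.get(asset_id)
    if entry is None:
        return False  # unregistered assets are denied by default
    return entry["authority"] in AUTHORITATIVE_STATUSES
```

The deny-by-default branch for unregistered assets reflects the governance intent that data not yet cataloged should not be consumable.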
Data owners also play a role in ensuring data sovereignty requirements are understood and addressed in managing risks associated with cross-border data movement. Data sovereignty requirements are another area in which cloud computing can increase the potential for a wider geographic footprint of data storage and consumption and offer the opportunity to automate control of data sovereignty and cross-border data movement risks.
A data owner should have a good understanding of the meaning and purpose of the data; be aligned to and familiar with the business areas with which the data is associated; understand the related business processes and outputs; and be aware of data consumers so that the impact of changes to the data can be considered.
Data ownership is agnostic to cloud service providers, except when the cloud provider generates the data, such as API or application log files. Ownership is not affected when data is moved between cloud service providers, and ownership of such technical data (for example, log files) should not change with each provider. Ownership is the sole responsibility of the organization, not the cloud provider. Cloud service providers should deliver the capability to execute data ownership activities for all data objects.
The effects of cloud service providers on data ownership include:
- Addressing ownership of new data types, such as log files, that the cloud service providers generate.
- Ensuring compliance with data sovereignty requirements in environments where data can be easily moved across borders. The data owner’s responsibility for establishing guidelines and controls for data sovereignty increases because of the broad geographic footprint of cloud computing and the abilities of some global data services.
- Understanding the controls available for cloud-managed data and the support available for executing data ownership responsibilities. The design and implementation of controls for cloud-managed data may differ from on-premises controls. Data owners should be familiar with the differences to ensure adequate protection of cloud-managed data: while the controls themselves remain consistent, their implementations may vary with each cloud service provider.
Data sourcing and consumption
Cloud computing provides an opportunity to reinforce requirements for data that is to be consumed from authoritative sources. The ability to expose metadata associated with data assets enables the discovery of data sources and the enforcement of consumption restrictions. Standardization of data sourcing processes that employ metadata can support automating authorization of data provisioning and consumption.
Migration of data assets into cloud environments or creating new data assets in cloud environments can trigger governance workflows that ensure those assets are tagged as authoritative sources, authorized distributors or non-authoritative sources. Similarly, standardization and control of data provisioning and consumption can ensure that the use of data is tracked and that the purpose of the data consumption is recorded.
An organization may want to consider the implementation of cloud data marketplaces supported by automation and driven by discoverable metadata:
- Automation can remove the need for a central team to manage data provisioning and access manually for data producers.
- Automation can facilitate standardization of the data entitlement process, leading to greater transparency for determining cost attributions (apportioning the cost of data sourcing according to variations in consumption) for data consumers.
- Automation enhances the transparency of data sources, data usage, provisioning, and organizational accountability for both data producers and consumers.
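The entitlement and cost-attribution automation above might be sketched as follows. The log structure, function names and the pro-rata apportionment rule are illustrative assumptions, not a prescribed CDMC mechanism:

```python
# Hedged sketch of metadata-driven entitlement with cost attribution:
# every consumption event records who, what, why and how much, and an
# asset's sourcing cost is apportioned by each consumer's share of use.
from collections import defaultdict

consumption_log = []  # entries of (consumer, asset_id, purpose, units_consumed)

def record_consumption(consumer, asset_id, purpose, units):
    """Track who consumed which asset, for what recorded purpose, and how much."""
    consumption_log.append((consumer, asset_id, purpose, units))

def attribute_costs(asset_id, total_cost):
    """Apportion an asset's sourcing cost to consumers by consumption share."""
    usage = defaultdict(float)
    for consumer, aid, _purpose, units in consumption_log:
        if aid == asset_id:
            usage[consumer] += units
    total_units = sum(usage.values())
    return {c: total_cost * u / total_units for c, u in usage.items()}
```

Recording the consumption purpose alongside the volume is what enables both the transparency and the cost-attribution benefits described above from a single log.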
Data sovereignty & cross-border data movement
Data sovereignty and cross-border data movement requirements relate to:
- When data must or must not be stored locally within a particular jurisdiction.
- The storage, transfer or access of data across a border.
These restrictions on data movement across borders are established for various reasons and are generally implemented through privacy, security, bank secrecy, outsourcing or data localization laws, rules, and regulations. The rules also affect how data can be accessed or shared with government authorities and law enforcement across international borders. The increased risk of significant fines and penalties for violating data sovereignty and cross-border data movement requirements is causing more organizations to re-evaluate when and how they store, access or transfer data globally. Increasingly strict data sovereignty and cross-border data movement requirements must be part of the data strategy for an organization. It is important to document how these requirements will affect business, data storage and processing activities.
Data sovereignty and cross-border data movement requirements apply whether data is stored and processed on-premises or in one or more cloud environments. The use of cloud service providers (CSPs), especially multiple CSPs in a global, hybrid cloud environment, increases the complexity of understanding where data (and which data) is being stored, accessed or processed at any given time. This complexity means that organizations should establish a framework for understanding requirements and ensuring compliance. It is also important to extend data sovereignty and cross-border data movement reviews, and to evaluate controls and clearance processes with an increasingly larger set of parties, to ensure compliance.
Data sovereignty and cross-border data movement considerations influence the geographic locations where an organization can locate or process data.
- Organizations need to track and report on the exact jurisdictional location of data to prove compliance with increasingly restrictive requirements.
- Organizations should employ processes such as tagging and classification to apply jurisdictional rules and mitigate data sovereignty and cross-border data movement risk.
- Organizations should mitigate data sovereignty and cross-border data movement risk with tools, such as advanced data masking and encryption solutions. Refer to CDMC 4.1 Data is Secured, and Controls are Evidenced.
Organizations often use multiple cloud service providers, typically alongside on-premises systems and applications, increasing the complexity of data tracking and the risk of storing, accessing or transferring data in a non-compliant manner. Applications and technology in cloud environments evolve rapidly, putting pressure on compliance efforts. Tracking, tagging and automation can make it easier to implement controls around data sovereignty requirements. Most cloud service configuration is performed using Infrastructure-as-Code, which creates a greater opportunity to implement controls at build time and deployment time.
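As an illustration of such a build-time control, an Infrastructure-as-Code plan could be validated against jurisdiction rules before deployment. The plan format, data classification tags and region mapping below are hypothetical, not part of any provider's tooling:

```python
# Hedged sketch of a build-time data sovereignty check over an
# Infrastructure-as-Code plan: each planned resource carries a data
# classification tag, and its region must be permitted for that tag.

ALLOWED_REGIONS = {
    "pii_eu_customers": {"eu-west-1", "eu-central-1"},      # must remain in the EU
    "public_reference_data": {"eu-west-1", "us-east-1"},    # no localization rule
}

def validate_plan(plan):
    """Return the names of resources whose region violates the
    jurisdiction rules for the data classification they carry."""
    violations = []
    for resource in plan:
        allowed = ALLOWED_REGIONS.get(resource["data_tag"])
        if allowed is not None and resource["region"] not in allowed:
            violations.append(resource["name"])
    return violations
```

Running such a check in the deployment pipeline blocks non-compliant storage locations before any data is moved, rather than detecting them afterwards.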
Organizations remain responsible for compliance with data sovereignty and cross-border data movement requirements, including:
- Interpreting data sovereignty and cross-border data movement rules.
- Checking their applicability to the datasets.
- Implementation of granular data location controls.
- Auditing to determine where data has been stored, accessed or transferred over long periods.
- Reporting on compliance with the data sovereignty and cross-border data movement policies and procedures of the organization.
A cloud service provider should offer tooling and support to help the organization implement these requirements. Data practitioners need to understand how the cloud or technology service provider handles data backups, replication, and caching. While the providers are responsible for the functionality, the accountability remains with the organization. Cloud and technology service providers need to provide increased transparency and auditability.
Value Proposition
Organizations that establish strong governance and data management controls over data residing in cloud applications have an opportunity to realize all of the benefits of a cloud implementation while managing the associated risks. Data governance and accountability in cloud environments help define effective business case processes, identify accountable stakeholders and data owners, ensure the proper management of data sourcing, and provide proper tracking and control of data movement concerning data sovereignty guidelines.
Effective data governance and controls help an organization exploit cloud data management capabilities to increase the effectiveness of data ownership, improve the ability to track and report on data usage, enforce policy, better monitor data owner assignment, improve data access controls to authoritative sources and better monitor and control data sovereignty requirements. Data management in a cloud environment enables an organization to move from legacy systems that were not built to track data location into new data environments in which data location and data types can be readily tracked and audited for better compliance.
Core Questions
- Has data governance been established for managing data in cloud environments?
- Have business cases for managing data in the cloud been defined?
- Do cloud data management business cases include a value realization framework?
- Are cloud data management business cases governed?
- Have the roles and responsibilities of data owners been extended to data in the cloud?
- Are data owners in place for all cloud data?
- Are all cloud data assets identified as authoritative sources, authorized distributors or non-authoritative sources?
- Does the governance of data consumption leverage cloud automation opportunities?
- Are requirements for managing data sovereignty and cross-border data movement risks defined?
- Have controls for data sovereignty and cross-border data movement risk been implemented?
Core Artifacts
- Cloud Data Management Business Cases
- Data Ownership Roles and Responsibilities
- Data Catalog Report – indicating data owner
- Register of Authoritative Sources and Authorized Distributors
- Data Sovereignty and Cross-Border Data Movement Requirements Definition
- Data Sovereignty and Cross-Border Data Movement Issues Log
1.1 CLOUD DATA MANAGEMENT BUSINESS CASES ARE DEFINED AND GOVERNED
The organization must have clearly defined business cases for the management of data in cloud environments. These must include a framework of measures of the value to be realized. Each business case must be approved by an appropriate authority and sponsored by accountable stakeholders.
Scope of Controls
The framework addresses the control of data in cloud, multi-cloud and hybrid-cloud environments. Controls that address technology risks in other areas such as software development and service management are not within the scope of the document.
Many of the controls refer to being applicable to sensitive data. Each organization will have a scheme for classifying their sensitive and important data and will determine the specific classifications to which the controls must be applied. Examples of classifications that may be in scope include:
- Personal Information (PI) / Sensitive Personal Data
- Personally Identifiable Information (PII)
- Client Identifiable Information
- Material Non-Public Information (MNPI)
- Specific Information Sensitivity Classifications (such as ‘Highly Restricted’ and ‘Confidential’)
- Critical Data Elements used for important business processes (including regulatory reporting)
- Licensed data
CDMC Key Controls
1. Data Control Compliance
2. Ownership Field
3. Authoritative Data Sources and Provisioning Points
4. Data Sovereignty and Cross-Border Movement
5. Cataloging
6. Classification
7. Entitlements and Access for Sensitive Data
8. Data Consumption Purpose
9. Security Controls
10. Data Protection Impact Assessments
11. Data Retention, Archiving and Purging
12. Data Quality Measurement
13. Cost Metrics
14. Data Lineage
| Major Stakeholder Group | CDMC Framework Stakeholder Roles | Primary CDM Requirement | Primary CDM Responsibility | Illustrative Planning Horizon | Ongoing Commitment and Review |
| --- | --- | --- | --- | --- | --- |
| Cybersecurity, Privacy, Legal and Compliance | Chief Privacy Officer / Head of Cyber / Head of Tech Risk | Privacy, security and technology risks are managed according to risk appetite. Cost is proportionate. Maintenance and controls are robust and sustainable. | Balance cloud data management requirements with a specific focus on privacy, security, information lifecycle management and integrity. Continuity controls are well-defined and followed. | 2-3 years | Annual review of CDM business cases, with any deviations communicated through quarterly exception reporting supplemented with ad hoc reports |
| Legal, Compliance & Audit | | Cloud data management conforms to legal and regulatory interpretation and fulfills organization compliance obligations and policies. | Legal rules on data sharing, restriction, and disposition are well-defined, implementable, and communicated to the control owners. | 2-3 years | Annual review of CDM business cases, with any deviations communicated through quarterly exception reporting supplemented with ad hoc reports |
1.1.1 CLOUD DATA MANAGEMENT BUSINESS CASES ARE DEFINED
Description
As an organization moves its data and operations to cloud environments, it is important to develop, communicate, cultivate, and support business cases for cloud data management. An effective cloud data management business case defines the objectives and expected outcomes of the implementation. It is vital to develop an entire cloud business case framework of metrics, measures and key performance indicators to articulate the value of cloud data management.
Objectives
- Define a standard process to develop and gain approval for cloud data management business cases, justifying what is needed to manage data in the cloud environment.
- Ensure cloud data management business cases include measures of the effectiveness of the corresponding cloud data management capabilities.
- Document cloud data management business cases to include all relevant business problem types for the organization and list the stakeholder responsible for each business case.
- Design measures, metrics, or key performance indicators with targets to enable the measurement of progress.
- Ensure cloud data management business case metrics and targets are specific, measurable, achievable, relevant, and time-based.
- Ensure cloud data management business cases detail elements of value such as new revenue generated, amount of cost reduction and any mitigated risks.
ADVICE FOR DATA PRACTITIONERS
To fully demonstrate the value of data management in the cloud, practitioners must develop a value realization framework that includes metrics, measures and key performance indicators for each business case. The framework should include expected outcomes already defined in the organization’s business, data, and cloud strategies.
The precision in the outcome estimates within each business case should be documented. Also, document the risks in failing to achieve the targeted outcomes and explicitly communicate these risks to sponsors and stakeholders. The accuracy and data quality of the metrics must faithfully reflect progress against these business cases. Each business case should quantify each outcome along its respective timeline.
Cloud data management business case standard
A business case must include metrics of the effectiveness of all cloud data management capabilities in use by the organization. Each metric must be specific, measurable, achievable, relevant, and time-based, and must align with the business problems being addressed. Gain approval on targets for each metric and identify the stakeholders responsible for achieving those targets. It should be possible to measure progress against each metric. Each element of the value realization framework must be included in the business case:
- Metrics dictionary – a library of measures that align with business outcomes or CDMC capabilities.
- Metrics accountability – document stakeholder accountability for each metric.
- Metrics traceability – document the correspondence of each business case outcome to the best practices of the organization or industry best practices.
- Outcome projections – document original targets and projections for revenue added, costs reduced, and risks mitigated.
- Assumption evidence – document the variables and assumptions (such as discount rates or estimates of regulatory fines) and how each was derived and included in calculations.
- Metrics tracking – document trends (not merely snapshots) accompanied by stated targets with timelines.
- Impact assessment – evidence of other cloud data management efforts already underway in the organization. Quantify mutually beneficial and any potential detrimental interactions that may result.
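A minimal sketch of one metrics-dictionary entry, tying a measure to an accountable stakeholder, a traceable outcome and a time-based target so that trends (not merely snapshots) can be tracked. All field and role names below are illustrative assumptions, not a CDMC-prescribed schema:

```python
# Hedged sketch of a metrics-dictionary record combining accountability,
# traceability, a target with a timeline, and a trend of observations.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    owner: str                 # metrics accountability: responsible stakeholder
    business_outcome: str      # metrics traceability: outcome the metric serves
    target: float
    target_date: str           # time-based target, e.g. "2025-12-31"
    observations: list = field(default_factory=list)  # (date, value) trend

    def on_track(self) -> bool:
        """True once the latest observation meets or exceeds the target."""
        return bool(self.observations) and self.observations[-1][1] >= self.target
```

Keeping the full observation list, rather than only the latest value, is what supports the "trends, not merely snapshots" requirement for metrics tracking.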
Practitioners should ensure that the implementation team and sponsor are transparent in resource consumption when reporting to stakeholders. Establish a baseline or point-of-reference against which to measure tangible value realized in the transition to managing data in the cloud. Create a method for isolating the value resulting from managing data in the cloud from other factors that may also affect revenue added, costs reduced, and risks mitigated. Document how the definition of value generated by managing data in the cloud might need to change as the target state approaches.
Pre-determine the critical junctures at which sunk costs incurred in the implementation phase exceed thresholds that require a review of project scope and progress against any value realized up to each critical juncture. Estimate projected new revenue generated by managing data in a cloud environment and compare that with existing revenue generated from the on-premises environment. Similarly, provide estimates of reduced costs that result from managing data in a cloud environment and compare with the costs from managing data on-premises. One example is the lower storage costs that are typical in a cloud environment.
Discount these estimates using a suitable time value of money. Consider any intermediate costs incurred to complete the transition (such as temporarily redundant data storage and contracting costs), and separate these one-time costs from any new maintenance costs expected to remain in the future state. Lastly, specify any risks that have been mitigated by managing data in a cloud environment and compare them with similar risks in the on-premises environment. For example, the compliance burden of GDPR/CCPA regulation is typically reduced when cloud-native tools are employed to track data lineage.
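The discounting step can be made concrete with a small worked example. The cash flows and the 8% discount rate below are purely illustrative figures, not benchmarks:

```python
# Hedged worked example: discounting projected cloud data management
# benefits against one-time transition costs. All figures are illustrative.

def npv(cash_flows, rate):
    """Net present value of per-year cash flows at a given discount rate.
    Year 0 is undiscounted."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

# Year 0: one-time transition costs (temporarily redundant storage,
# contracting). Years 1-3: net benefit = revenue added + costs reduced
# + risk losses avoided, relative to the on-premises baseline.
cash_flows = [-500_000, 250_000, 300_000, 300_000]
value = npv(cash_flows, rate=0.08)  # positive NPV supports the business case
```

Separating the year-0 one-time costs from the recurring net benefits mirrors the advice above to distinguish transition costs from ongoing maintenance costs.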
Suggested approach to the identification of success factors and measurements (and constructing the value realization framework): while many organizations will already have adopted value realization frameworks that can be adapted to suit data management in the cloud, the CDMC provides one potential approach to a structured framework that realizes the objectives outlined above:
Example #1: Value Realization Framework - Baseline Business Outcomes and Metrics
---- insert graphic ----
Example #2: Value Realization Framework - Deeper Realization of an Outcome Already Defined
---- insert graphic ----
Example #3: Value Realization Framework - Outcomes Realized Through Maturity of Multiple Capabilities
---- insert graphic ----
Example #4: Value realization framework - Outcomes Contributing to The Organization’s Broader Data/Cloud Strategy
---- insert graphic ----
Advice for cloud service and technology providers
Cloud service and technology providers should understand the data management business outcomes organizations are looking to achieve when migrating data to cloud environments. Providers should develop and communicate metrics that organizations can readily employ to optimize data management in the cloud environment. Monitoring and tracking capabilities should enable visibility into all costs incurred from managing data in the cloud environment.
In addition, a provider should offer tools and dashboards to automate a broad set of baseline metrics that demonstrate the benefits of managing data through the cloud service. Examples of such metrics include scale of data in cloud, % of data governed, % of data categorized, % of data profiled, % of data with lineage, scale of re-use, % of data measured and the number of access points enabled.
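Baseline metrics of this kind could be computed directly from catalog metadata. A hedged sketch follows, in which the catalog fields (`has_owner`, `classification`, `lineage_captured`) are illustrative assumptions rather than any provider's actual schema:

```python
# Hedged sketch: derive dashboard-style coverage percentages (% of data
# governed, % classified, % with lineage) from catalog asset records.

def coverage_metrics(assets):
    """Compute percentage coverage metrics over a list of catalog
    asset records, each a dict of metadata flags."""
    n = len(assets)
    def pct(flag):
        # Treat a missing or empty field as "not covered".
        return round(100 * sum(1 for a in assets if a.get(flag)) / n, 1)
    return {
        "pct_governed": pct("has_owner"),
        "pct_classified": pct("classification"),
        "pct_with_lineage": pct("lineage_captured"),
    }
```

A provider dashboard automating such percentages gives the organization a ready-made baseline against which business case targets can be tracked.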
Also, providers should showcase various case studies and benchmarks of quantitative and qualitative outcomes resulting from previous data management implementations in the cloud. These examples should include case studies on meeting regulatory requirements that help avoid pitfalls when data is managed appropriately in cloud environments.
Research and develop additional content on avoiding anti-patterns in data management design in the cloud that may result in unnecessary costs.
Questions
- Is there a standard process to develop and approve cloud data management business cases?
- Does each cloud data management business case include measures of the effectiveness for the corresponding cloud data management capabilities?
- Are cloud data management business cases structured to include all relevant business problems being addressed, and does each business case list the stakeholders responsible for achieving the targets?
- Have measures, metrics, or key performance indicators been designed with targets to measure progress?
- Are cloud data management business case metrics and targets specific, measurable, achievable, relevant, and time-based?
- Do cloud data management business cases detail elements of value such as new revenue generated, amount of cost reduction and risks mitigated?
Artifacts
- Value Realization Framework – including measures, metrics, or key performance indicators with targets to measure progress
- Cloud Data Management Business Case Standard – including the methodology and framework with standard accountability, assumptions, metrics, traceability, outcome projections and monitoring
- Repository of Cloud Data Management Business Cases
Scoring
Not Initiated
No formal standard cloud data management business cases exist.
Conceptual
No formal standard cloud data management business cases exist, but the need is recognized, and their development is being discussed.
Developmental
Formal standard cloud data management business cases are being developed.
Defined
Formal standard cloud data management business cases are defined and validated by stakeholders.
Achieved
Formal standard cloud data management business cases are defined and adopted by the organization.
Enhanced
The formal standard cloud data management business cases are established as part of business-as-usual practice with continuous improvement.
1.1.2 CLOUD DATA MANAGEMENT BUSINESS CASES ARE SYNDICATED AND GOVERNED
Description
Each cloud data management business case must be approved by an appropriate authority and sponsored by accountable stakeholders. Successfully managing data in cloud environments requires substantial support from both business and technology stakeholders within an organization. The interests of these groups must be aligned early and consistently represented through deployment.
Each cloud data management business case must be enforceable and periodically reviewed by sponsors throughout deployment and the cloud data management lifecycle. Reviews will ensure that the business cases meet requirements as the organization's objectives evolve and the stakeholders change.
Objectives
- Ensure cloud data management business cases consider the requirements of all key stakeholders.
- Obtain approval and support from all key stakeholders of cloud data management business cases.
- Conduct regular reviews of cloud data management business cases.
- Structure and version the cloud data management business cases to support an audit.
- Implement governance oversight to ensure that data migrated to, stored in or created in a cloud environment fulfills the requirements of both the cloud data management business cases and risk mitigation intentions of the organization.
ADVICE FOR DATA PRACTITIONERS
Cloud data management business cases must account for the priorities of the various stakeholders. Some of these priorities are complementary, and some are competing.
An organization should seek to use business cases to balance delivery and execution with risk management and sustainability. The risk-to-benefit appetite of each organization depends on the industry and regulatory environment in which it operates. It is important to consider the risk appetite with full transparency. An organization may choose which cloud data management considerations to address based on that appetite; however, it must always adhere to legal requirements and address these in its business cases.
An organization must conduct sufficient oversight of data management controls to ensure a suitable standard for data that will migrate, be stored in, or be created in the cloud. Oversight may occur through automated controls, workflow adjustments, governance reviews, tollgates or other means. Any actions taken should be proportionate to the risk appetite, regulatory environment and size of the organization.
Periodic business case reviews should compare the original business strategy, data strategy and cloud strategy on which the business case was founded against interim outcomes. Decisions on whether to accelerate or delay activities for a specific business case should depend on changes in cloud data management priorities.
Changes to the business cases must be approved by key stakeholders with sufficient authority and under appropriate governance. In addition, it is vital to obtain explicit approval from each stakeholder.
The table below is a list of potential stakeholders, though it is not an exhaustive list. Keep in mind that some organizations may not need each role. The specific roles and responsibilities depend on the business requirements and strategy of each organization. The organization should engage with human resources and vendors to ensure that proper data management and cloud skills are available to support cloud data management, and include appropriate funding in the business case. This list of stakeholders aims to help data practitioners ensure the major stakeholder groups and perspectives have been considered. In addition, planning timeframes are given for each stakeholder group.
----- INSERT TABLE -----
----- INSERT TABLE -----
----- INSERT TABLE -----
----- INSERT TABLE -----
----- INSERT TABLE -----
Advice for cloud service and technology providers
Cloud service and technology providers must understand and contribute to the organization's cloud data management business cases to help them achieve optimal business outcomes and minimize the risks of cloud data management.
Typically, providers have considerable cross-industry experience in helping organizations realize business value from cloud adoption. Understanding and providing input to the business cases lets the organization benefit from the provider's insight into what has worked well previously. While providers can offer considerable experience in what can work well, it is important that advice remains high-level and non-prescriptive, presented as considerations and challenges, to ensure the business case is truly driven and owned by the organization.
CSPs should provide appropriate automation to control any data migrated to, stored in or created in the cloud environment, supporting the organization's oversight of its business case control mechanisms.
Questions
- Have all key stakeholder requirements been considered and balanced when constructing the business cases?
- Have all key stakeholders approved all business cases, and are they aware of their support roles in the intended outcomes?
- Has the organization set the frequency at which the business cases should be reviewed?
- Has a structure for cloud data management business cases been defined that enables them to be audited?
- Does an oversight mechanism exist that is supported by appropriate controls and demonstrates that data created in, stored in or migrated to the cloud conforms to the requirements of the cloud data management business cases?
Artifacts
- Policy, Standard and Procedure – defining and operationalizing the management and governance of cloud data management business cases
- Cloud Data Management Stakeholder Matrix
- Cloud Data Management Business Case Template
- Cloud Data Management Business Case Approval Form
- Cloud Data Management Business Case Governance Forum Charter
Scoring
Not Initiated
No formal governance of cloud data management business cases exists.
Conceptual
No formal governance of cloud data management business cases exists, but the need is recognized, and its development is being discussed.
Developmental
The formal governance of cloud data management business cases is being developed.
Defined
The formal governance of cloud data management business cases is defined and validated by stakeholders.
Achieved
The formal governance of cloud data management business cases is established and adopted by the organization.
Enhanced
The formal governance of cloud data management business cases is established as business-as-usual practice with continuous improvement.
1.2 DATA OWNERSHIP IS ESTABLISHED FOR BOTH MIGRATED AND CLOUD-GENERATED DATA
The roles and responsibilities of data owners must be extended to instances of data in cloud environments. Data ownership must be specified for all data, whether migrated to the cloud from the on-premises environment or created in cloud environments.
1.2.1 DATA OWNER ROLE AND RESPONSIBILITIES ARE DEFINED
Description
Implementing the concept of data ownership requires defining the role and responsibilities of the data owner and ensuring the role is applied to data managed in the cloud environment and on-premises.
Objectives
- Define the roles and responsibilities of the data owner and mandate them through the data management policy.
- Extend data owner responsibilities to data hosted in cloud environments.
- Adapt and extend data owner responsibilities to any new data types generated by cloud service providers (CSPs).
- Determine if any data owner responsibilities will have more importance concerning data residing in a cloud environment.
- Define cloud technology support requirements for each relevant data owner role and responsibility.
Advice for data practitioners
The data owner role must be assigned to a senior business executive who has the necessary authority to perform the role. This required seniority ensures ongoing accountability, even when personnel changes occur. Data management policy should explicitly ensure that data ownership accountability belongs to the appropriate executive. In most organizations, responsibility for the execution of data ownership tasks will be delegated to supporting roles such as data stewards. The definition of the data owner role should extend to and clarify how execution responsibilities are delegated. This role definition should also be incorporated in and supported by the data management policy.
A data owner is accountable for the meaning, content, quality, distribution and storage of a given set of data or the contents of a data domain. The data owner must ensure that all data drawn by its data consumers meet fit-for-purpose criteria and align with organizational standards. Adopting cloud computing data management services can support a data owner with automated capabilities that are typically more effective and efficient than conventional systems.
The data owner has full responsibility for understanding the quality and scope of the content in a data domain. Cloud computing technology typically provides comprehensive, real-time data catalog and data lineage solutions. Rich metadata is available from many of these solutions. This metadata enhances the ability of the data owner to understand the data landscape and eases the execution of data ownership responsibilities.
Many data owners have responsibility for various on-premises applications that rest upon various platforms and legacy technologies. Lack of homogenization and transparency across these data domains makes applying granular control across all environments challenging. Many cloud environments can improve the standardization of functionality, granular controls and monitoring capabilities.
Cloud environments should provide standards for monitoring data and provide summaries for the entire data landscape. Data owners will use the monitoring dashboards to drill down to identify various sources of data quality and control failures. Such views can extend from data assets down to individual data elements.
Enhancements in data storage and management homogenization significantly improve the visibility and precision of data consumer utilization. Consequently, data owners can understand which data element controls require prioritization. Better controls improve the ability of the data owner to enforce data security and immutability.
A data owner should provide transparency about the content, location and consumption of their data. Cloud data management can help a data owner manage responsibilities, operate more efficiently, improve transparency and facilitate better systems integration.
Typically, a data owner must also solve data quality and manage control exceptions. In support of such tasks, the data owner should also have the ability to interact with an integrated workflow, direct a course of action or redirect to another data owner.
Advice for cloud service and technology providers
It is important to recognize that a data owner may not have a strong affinity for technology. This is especially true if the data owner comes from a business, finance, risk or other non-IT background. Such users should have resources available to navigate and interrogate interactive dashboards and perform some workflow tasks. Any technology competency beyond that expectation should be regarded as optional.
With these expectations in mind, a cloud service provider should:
- Provide dashboards, workflow tasks and task execution tracking.
- Provide corresponding training that does not require coding, tedious querying, or any IT knowledge.
- Provide the data owner with the ability to execute or manage responsibilities in the data domain.
- Where necessary, automate capabilities for the data owner to develop and maintain a data element list, definitions, data quality rules, controls, data lineage and enterprise data model integration.
- Provide intuitive, non-programmatic interfaces to interact with any automations.
- Provide some ability for data owners who have technical and coding expertise to extend or customize dashboards, workflows and task execution.
- Work with the organization to determine if any data owner responsibilities (such as sovereignty) have more importance in managing data in a cloud environment.
Questions
- Have data owner roles and responsibilities been defined?
- Have data owner responsibilities been extended to data management capabilities at the CSP?
- Does the data owner's responsibility include data that is generated by and stored at the CSP?
- Does the data owner's responsibility include all activities that have higher importance for managing data at the CSP?
- Does the CSP provide technology to support data owner roles and responsibilities?
Artifacts
- Data Management Policy, Standard and Procedure – defining and operationalizing data owner roles and responsibilities
Scoring
Not Initiated
Data owner roles and responsibilities are not defined by policy.
Conceptual
Data owner roles and responsibilities are not defined by policy, but the need is recognized, and the development is being discussed.
Developmental
Data owner roles and responsibilities defined by policy are being developed.
Defined
Data owner roles and responsibilities defined by policy are validated by stakeholders.
Achieved
Data owner roles and responsibilities defined by policy are established and adopted by the organization.
Enhanced
Data owner roles and responsibilities defined by policy are established as part of business-as-usual practice with continuous improvement.
1.2.2 DATA OWNERSHIP IS ESTABLISHED IN THE CLOUD
Description
Identifying and assigning ownership for data that resides in a cloud environment should follow the same guidelines as for on-premises data ownership. Ownership of all data elements in any data domain within a cloud environment is mandatory and specified by data management policy and standards.
It is essential to specify data ownership for all data categories:
- Source data – data migrated from on-premises data stores or other cloud environments, or data created within the cloud environment such as a system of record hosted in the cloud.
- Derived data – data that uses any existing input data to create new data. Whether generated in a cloud environment or elsewhere, derived data will most often consist of data generated from calculators, models, metrics, aggregations, return datasets and materialized views.
- Log data – data that tracks usage, activities and operations in a cloud environment. The owner of log data is typically the technology function, which is different from the operational data owner. Log files are critical for data privacy, compliance, auditing and organizational information barriers.
- Third-Party Data – data inbound to a cloud environment from an external source, such as public data, open data, client reference data, instrument data, and other counterparty data.
Objectives
- Ensure that data ownership is consistently assigned and maintained, whether the data resides on-premises or in a cloud environment.
- Gain approval and adopt cloud environment data ownership and accountability policy, standards and procedures that apply consistently across on-premises and cloud environments.
- Establish data ownership before any data consumer engages with the data.
- Track data ownership events and changes in each cloud environment according to data management policy and standards.
Advice for data practitioners
A cloud environment exhibits a shared responsibility model. Consequently, data practitioners should work with their cloud and technology providers to establish data ownership for all data and metadata within—or exported by—a data ecosystem. While some data management responsibilities belong to the cloud service provider (CSP), all data ownership must remain with the organization.
It is essential to manage data ownership, in accordance with the organization's data management policy, within processes that import or add new data into an on-premises or cloud data ecosystem. Develop and maintain an inventory of data to effectively manage data ownership assignments. Sufficiently document and maintain data ownership assignments as metadata and conduct periodic review and maintenance routines. It is also important to define ownership for both persistent and temporary data, such as data kept only for the duration of intermediate steps of a calculation.
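The inventory and periodic-review routine described above can be sketched in code. This is a minimal illustration, not a prescribed implementation: the record fields, names and the simple fixed review interval are assumptions chosen for clarity.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sketch of a data ownership inventory record with a
# simple periodic-review policy. Field names are illustrative only.
@dataclass
class OwnershipRecord:
    dataset: str                    # e.g. "sales.orders"
    owner: str                      # accountable senior business executive
    steward: str                    # delegated execution responsibility
    persistent: bool                # False for temporary/intermediate data
    last_reviewed: date
    review_interval_days: int = 365

    def review_due(self, today: date) -> bool:
        """A record is due for review once the interval has elapsed."""
        return today >= self.last_reviewed + timedelta(days=self.review_interval_days)

def records_due_for_review(inventory: list, today: date) -> list:
    """Return the subset of ownership records needing periodic review."""
    return [r for r in inventory if r.review_due(today)]
```

Note that ownership is captured for temporary data as well as persistent data, so intermediate calculation results do not fall outside the review routine.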
Advice for cloud service and technology providers
Cloud service and technology providers should ensure proper documentation of data ownership assignments for cloud data and metadata. This documentation should be created through automated processes. This service should support validation, maintenance and auditing of data ownership assignments.
Questions
- Is data ownership consistently assigned and maintained across both on-premises and cloud environments?
- Have policies, standards and procedures been defined, verified, sanctioned, published and adopted for cloud and on-premises data ownership assignment?
- Is assignment of data ownership required before data is available for consumption?
- Have technologies been selected that record and track data ownership for all cloud environments?
Artifacts
- Data Management Policy, Standard and Procedure – defining and operationalizing data owner roles and responsibilities
- Process Documentation – inclusive of the required assignment of data owner to data in the cloud
- Data Catalog Report
- Cloud data inventory with data owner identification
- Data Owner Log – reflecting assignment and changes over time
- Data Management Tool Stack – inclusive of automated tools to support the required assignment of data ownership in the cloud
Scoring
Not Initiated
Formal data ownership is not established in the cloud.
Conceptual
Data ownership is not established in the cloud, but the need is recognized, and the development is being discussed.
Developmental
Data ownership in the cloud is being developed.
Defined
Data ownership in the cloud is defined and validated by stakeholders.
Achieved
Data ownership in the cloud is established and adopted by the organization.
Enhanced
Data ownership in the cloud is established as part of business-as-usual practice with continuous improvement.
1.3 DATA SOURCING AND CONSUMPTION ARE GOVERNED AND SUPPORTED BY AUTOMATION
The organization must ensure that data is consumed from authoritative sources or authorized distributors, with data governance that manages the designation of this authority. Cloud platforms must provide automation to enforce consumption from authoritative sources and authorized distributors or highlight consumption from non-authoritative sources.
1.3.1 DATA SOURCING IS MANAGED AND AUTHORIZED
Description
A data source is an origination point for data that transfers into a primary system. Data sourcing is the act of locating and connecting to a data source, then ingesting data from that source. Data within a cloud environment may originate within that environment, an external cloud environment or on-premises environments. A data source may be one of several in a chain of data sources. An authoritative data source is a repository or system designated by a data management governing body as the primary or most reliable source for this information.
Objectives
- Formalize a classification scheme of authoritative data sources and their provisioning points.
- Obtain agreement on the usage requirements, system integrations and provisioning points for each authoritative data source.
- Educate stakeholders and data consumers about authoritative data sources.
- Establish procedures to identify, review and approve new authoritative data sources and their provisioning points.
- Enable discovery of each authoritative data source by authorized data domains, and capture metadata that includes a scope definition.
Advice for data practitioners
Managing the authorization of data sources is a function of data governance. Authorization and consumption of an authoritative data source should be standardized and be applied consistently across all organizational environments—whether in on-premises or cloud environments. Authorization and consumption may differ when comparing data sources that depend on data ingested into the cloud with data generated in the cloud.
A data management governing body designates a data source as authoritative when it is a definitive or standard source for one or more data domains. The use of an authoritative data source is typically governed by established policies of one or more organizations. The authority to make such a data source available for provisioning and consumption must be clear to all custodians and data consumers. To prevent the unauthorized proliferation of valuable data—and to ensure data integrity, validity, and security—it is essential to establish the responsibilities of data source administrators and data consumers.
The use of authoritative data sources may be constrained to a geography, product, business unit or time period. For an organization that accesses data from authoritative data sources, it is vital to establish processes supported by policy. These policies will ensure that authoritative data sources exhibit approved provisioning points and that each data source is identified, approved and utilized for approved application development. Each data source should be periodically reviewed for accuracy, compliance and continuing value to the organization.
Data that has been ingested into a cloud environment may originate from other data sources. If necessary, it should be possible to determine that these data sources are authoritative. Data source authorization status and scope should be recorded in a central data catalog visible to stakeholders.
A common data sourcing use case involves creating a new authoritative cloud environment data source that consolidates data from disparate on-premises and other cloud environments. In such cases, a new data source may be created within the existing cloud environment, and the consolidated data may not necessarily originate from authoritative data sources. In all cloud environment scenarios, using authoritative data sources must be explicitly required by policy and approved by the organization.
Any data that resides within a cloud environment or originates from a source external to the cloud environment should be subject to review to determine whether it is authoritative or not. Unless explicitly known at inception, any new data source should be designated as non-authoritative to ensure that a review occurs to confirm that the data source is authoritative. When practicable, automate data ingestion processes to send alerts when new data is created and trigger a review when necessary.
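The default-to-non-authoritative rule and review trigger described above can be sketched as follows. The names (`SourceStatus`, `register_source`, `review_queue`) are hypothetical, standing in for whatever catalog and workflow tooling the organization uses.

```python
from enum import Enum

class SourceStatus(Enum):
    NON_AUTHORITATIVE = "non-authoritative"
    AUTHORITATIVE = "authoritative"

# Illustrative in-memory stand-ins for a data catalog and a review workflow.
catalog = {}
review_queue = []

def register_source(source_id: str, known_authoritative: bool = False) -> SourceStatus:
    """Register a newly ingested data source. Unless explicitly known to be
    authoritative at inception, default it to non-authoritative and queue a
    governance review, as the advice above recommends."""
    status = (SourceStatus.AUTHORITATIVE if known_authoritative
              else SourceStatus.NON_AUTHORITATIVE)
    catalog[source_id] = status
    if status is SourceStatus.NON_AUTHORITATIVE:
        review_queue.append(source_id)  # alert stakeholders / trigger review
    return status
```

In practice the review queue would feed an approval workflow; the point of the sketch is that no source becomes authoritative without an explicit decision.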
Establish and conduct periodic reviews of all data sources. Such reviews should include existing and prospective authoritative data sources. These reviews should also consider whether existing authoritative data sources continue to satisfy organizational policies. Any sources that are no longer compliant should be removed.
Advice for cloud service and technology providers
Ensure data catalogs provide metadata tagging capabilities for identifying the status and scope of authoritative data sources. Set the default status of any new data sources to be non-authoritative and prompt stakeholders to determine the status of each.
Provide processes and controls to align authorized provisioning points in the cloud environment with each authoritative data source. Provide data source consumption reports that allow stakeholder review of authoritative data sources. Provide methods for easily discontinuing authoritative data source designation for sources that are no longer viable or compliant.
Offer functionality that automates data source authorization workflows initiated by change events and provides status visibility to all data consumers of authoritative data sources. Provide methods for verifying, making connections and consuming authoritative data sources.
Provide strategic advice for maximizing the value of managing authoritative data sources in the cloud environment.
Questions
- Has a classification system been formalized to approve authoritative data sources and their provisioning points?
- Has agreement been obtained on data use requirements, obligations and provisioning points for each authoritative data source?
- Are education initiatives in place for stakeholders and data consumers to create and maintain an understanding of authoritative data sources?
- Have procedures been established to identify, approve and review new authoritative data sources and provisioning points?
- Has metadata been captured and made available to discover authoritative data sources by data domains—including the scope of use for the data source?
Artifacts
- Data Standards – authoritative source methodology overview, data source classification scheme, requirements and obligations
- Developer Guide – instructions on how to discover authoritative data sources
- Communication Plan – briefing document that describes authoritative data sources in use by the organization
- Data Management Procedure – defining and operationalizing data source identification and review
- Data Catalog – directory of all active, authoritative data sources
Scoring
Not Initiated
No formal management and authorization of data sourcing exist.
Conceptual
No formal management and authorization of data sourcing exist, but the need is recognized, and the development is being discussed.
Developmental
Formal management and authorization of data sourcing are being developed.
Defined
Formal management and authorization of data sourcing are defined and validated by stakeholders.
Achieved
Formal management and authorization of data sourcing are established and adopted by the organization.
Enhanced
Formal management and authorization of data sourcing are established as part of business-as-usual practice with continuous improvement.
1.3.2 DATA CONSUMPTION IS GOVERNED AND SUPPORTED BY AUTOMATION
Description
Data consumption and usage from any environment are largely governed by sourcing from authoritative data sources—respecting all applicable legal, ethical and organization policy restrictions. Cloud platforms should enforce controls to ensure that data is consumed from authoritative data sources. Consuming applications must specify the required data and reference data catalog entries, while the cloud platform should automate the access and transfer of data from authoritative data sources.
Objectives
- Document in data sharing agreements all data consumption allowances and restrictions as required by the organization's policies.
- Ensure that each data access request includes metadata that specifies the intended use of the data.
- Ensure each requested data element can be mapped to an authoritative data source.
- Implement reporting to track the use of authoritative data sources and govern the use of non-authoritative data sources.
- Exploit metadata to automate data provisioning and consumption.
Advice for data practitioners
For many organizations, using a cloud platform can change the perception of automated data provisioning from being a best practice to becoming a necessity. Full automation requires rich metadata in data catalogs, facilitating access requests to authoritative sources and providing access to the data. For example, a data set entry in the catalog would include an API specification and location and either an endpoint or information for navigating the virtualization layer.
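The kind of catalog entry described above can be illustrated as a simple data structure. The keys, URLs and dataset names are assumptions made for the example, not a defined standard.

```python
# Illustrative shape of a data catalog entry rich enough to drive
# automated provisioning: an API specification plus an endpoint.
catalog_entry = {
    "dataset": "counterparty.reference",
    "authoritative": True,
    "provisioning_point": {
        "api_spec": "https://catalog.example.org/specs/counterparty-v2.yaml",
        "endpoint": "https://data.example.org/api/v2/counterparty",
    },
    "scope": {"domain": "counterparty", "regions": ["EU", "US"]},
}

def provisioning_endpoint(entry: dict) -> str:
    """Resolve the endpoint a consumer should call. Refusing to serve
    non-authoritative entries is one way automation can enforce sourcing
    from authoritative data sources."""
    if not entry["authoritative"]:
        raise PermissionError("source is not authoritative")
    return entry["provisioning_point"]["endpoint"]
```

A consuming application would look up the catalog entry by dataset name and call the resolved endpoint, never a hard-coded location.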
Automating correct data consumption requires a comprehensive taxonomy that specifies conditions for access, use, allowances, and restrictions.
Data access event logging should always be in place—both for auditing and governance purposes. APIs for data provisioning and consumption are a common method for enforcing logging and automating reporting. Tracking the use of non-authoritative sources clarifies the extent of data distribution and is especially important when consuming sensitive data. Refer to CDMC 3.2 Ethical Access, Use, & Outcomes of Data Are Managed.
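A minimal sketch of the access event logging described above might look like the following. The event fields and function name are assumptions; a real provisioning API would persist events to an audit store rather than an in-memory list.

```python
import logging
from datetime import datetime, timezone

# Illustrative in-memory audit trail; a real system would persist this.
audit_log = []

def log_access_event(consumer: str, dataset: str,
                     authoritative: bool, intended_use: str) -> dict:
    """Record every data access event for auditing and governance.
    Non-authoritative consumption is flagged so reporting can track
    the extent of such distribution."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "consumer": consumer,
        "dataset": dataset,
        "authoritative": authoritative,
        "intended_use": intended_use,
    }
    audit_log.append(event)
    if not authoritative:
        logging.warning("non-authoritative access: %s -> %s", consumer, dataset)
    return event
```

Because the event captures intended use alongside the source's authorization status, the same log supports both audit reporting and governance of non-authoritative consumption.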
Automated provisioning can be configured to provide additional control by preventing the consumption of non-authoritative sources. Automation also ensures that data lineage metadata is properly maintained.
Documenting best practices for creating and using provisioning and access APIs is critical to implementing and supporting the automation. Exploiting cloud computing capabilities for reporting on data storage, throughput volumes, user access and data usage can provide valuable insights to data owners.
Advice for cloud service and technology providers
Cloud service and technology providers should deliver capabilities that support the ability of data owners to track and control the distribution and consumption of data. Data owners should conduct various types of reviews and controls that correspond to the data classification.
Providers should furnish APIs to support automating the provisioning and consumption of data. These APIs should integrate with the data catalogs to enforce consumption from authorized sources and capture consumption events from non-authoritative sources. APIs should also be available to log provisioning and consumption events at a low level of detail. The logs should be available for audit and reporting purposes. Cloud service and technology providers should supply documentation on best practices for provisioning and access APIs and provide these documents to data practitioners to support implementation and automation.
Providers should offer integrations with workflow functionality for exception reporting and approval of consumption from non-authoritative data sources.
Questions
- Are allowances and restrictions regarding data consumption documented in data sharing agreements as required by the organization's policies?
- Does each data access request include metadata that specifies the intended use of the data?
- Can each requested data element be mapped to an authoritative data source?
- Is there reporting to track the use of authoritative sources and govern the use of non-authoritative sources?
- Has metadata been exploited to automate data provisioning and consumption?
Artifacts
- Data Management Policy, Standard and Procedure – defining and operationalizing data sharing agreements
- Data Sharing Agreements – including allowances and restrictions captured as metadata
- Data Use Taxonomy
- Data Catalog – mapping data elements to authoritative sources
- Data Catalog Reporting – with consumption information highlighting the use of authoritative and non-authoritative sources
- API Documentation – detailing integration with data catalogs and with guidance to support the implementation of automation
Scoring
Not Initiated
No formal governance and automated support of data consumption exist.
Conceptual
No formal governance and automated support of data consumption exist, but the need is recognized, and the development is being discussed.
Developmental
Formal governance and automated support of data consumption are being developed.
Defined
Formal governance and automated support of data consumption are defined and validated by stakeholders.
Achieved
Formal governance and automated support of data consumption are established and adopted by the organization.
Enhanced
Formal governance and automated support of data consumption are established as part of business-as-usual practice with continuous improvement.
Control 3: Authoritative Sources and Provisioning Points
Component
1.0 Governance & Accountability
Capability
1.3 Data Sourcing and Consumption are Governed and Supported by Automation
Control Description
A register of Authoritative Data Sources and Provisioning Points must be populated for all data assets containing sensitive data, or otherwise must be reported to a defined workflow.
Risks Addressed
Architectural strategy for an organization is not fully defined. Authorized sources have not been defined or suitably controlled. Data is duplicative and/or contradictory, resulting in process breaks, architectural inefficiencies, increased cost of ownership and accentuated operational risks on all dependent business processes.
Drivers / Requirements
An important responsibility of a data owner is to designate the authoritative data sources and provisioning points for a specific scope of data. Policy controls require a data asset to be identified as authoritative or not when it is shared.
Legacy / On-Premises Challenges
Identification and remediation of the use of non-authoritative sources or copies of data require significant manual effort.
Automation Opportunities
Benefits
Infrastructure that can run automated workflows to identify and retire non-authoritative data provides a cost savings opportunity by eliminating the manual effort involved in this work.
Summary
Data assets automatically tagged as authoritative or non-authoritative will greatly simplify policy compliance and eliminate the manual costs of controlling data sourcing and consumption.
Additional Documentation
This document is a constituent part of the CDMC™ framework focusing on the key controls for effective management of data risk in cloud, multi-cloud and hybrid environments. This section provides a summary of additional parts of the overall framework.
CDMC Framework
Full documentation of the 6 components, 14 capabilities and 37 sub-capabilities of the CDMC framework, along with the 14 controls presented in this document. This 150+ page document details the objectives of each sub-capability and presents best practice advice written from both the data practitioner and cloud service and technology provider perspectives. A set of questions, artifacts and scoring guidance for each sub-capability provide the basis for organizations to perform capability assessments.
Reference: CDMC Framework Version 1.1 – published September 2021
CDMC Controls Testing Procedures
Specifications of tests of the 14 key controls within the framework to form the basis of certification of cloud products and services against the framework.
Reference: CDMC Controls Testing Procedures V1.1 – to be published Q4 2021
CDMC Information Model
An ontology that draws on and combines related open frameworks and standards to describe the information required to support cloud data management. This provides a foundation for interoperability of data catalogs and automation of controls across cloud service and technology providers.
Reference: CDMC Information Model Version 1.1 – to be published Q4 2021
Data Management Business Glossary
A standard set of over 150 data management terms, with definitions and commentary for each.
Reference: https://www.edmcportal.org/glossary/
Feedback and Additional Information
Feedback on the document should be contributed via the Cloud Data Management Interest Community on EDMConnect: https://edmconnect.edmcouncil.org/clouddatamanagementinterestcommunity/home
For further information on the CDMC initiative please visit: https://edmcouncil.org/page/CDMC.
Any enquiries regarding EDM Council membership or CDMC Authorized Partnership should be directed to info@edmcouncil.org.