5.0 Data Quality Management (DQM)
Upper Matter
Introduction
The Data Quality Management function defines the goals, approaches and plans of action that ensure data content is of sufficient quality to support defined business and strategic objectives of the organization. The function should be developed in alignment with business objectives, measured against defined data quality (DQ) dimensions and based on an analysis of the current state of DQ. Data Quality Management is a series of processes across the full data supply chain to ensure that the data provisioned meets the needs of its intended consumers.
Achieving DQ requires an understanding of how data is sourced, defined, transformed, provisioned and consumed. DQ is not a process in itself but the degree to which data is fit-for-purpose for a given business process or operation.
Definition
The Data Quality Management (DQM) component is a set of capabilities covering data profiling, DQ measurement, defect management, root cause analysis and data remediation. These capabilities allow the organization to execute processes across the data control environment to ensure that data is fit for its intended purpose.
Scope
- Establish a DQM function within the Office of Data Management (ODM).
- Work with data management (DM) Program Management Office (PMO) to design and implement sustainable business-as-usual processes and tools for DQM.
- Execute DQM processes against business-critical data. DQM processes include profiling and grading, measurement, defect management, root-cause fix and remediation.
- Establish DQ metrics and reporting routines.
- Ensure that DQM governance is integrated into the Data Governance (DG) function.
Value Proposition
Organizations that formalize DQ responsibilities and embed them into daily routines and methodology achieve a sustainable, organization-wide data culture.
Organizations that effectively implement Data Quality Management and achieve the appropriate level of DQ across the data ecosystem realize a return on investment in several areas:
- Better risk management
- Enhanced analytics
- Better client service and product innovation
- Improved operational efficiencies
Overview
DQ is a broad conceptual term that must be understood in the context of how data is intended to be used. Perfect data is not always a viable objective. The quality of the data needs to be defined in terms that are relevant to the data consumers to ensure that it is fit for its intended purpose. The overall goal of DM is to ensure that data consumers have confidence in the data they receive, since they use this data to support their business functions. For them to make accurate decisions, the data must reflect the facts it is designed to represent, without the need for reconciliation or manual transformation.
The organization needs to develop a DQM strategy and establish the overall plans for managing the integrity and relevance of its data. One of the essential objectives is to create a shared culture of DQ stemming from executive management and integrated throughout the operations of the organization. To achieve this cultural shift, the organization must agree on both requirements and the measurement of DQ that can be applied across multiple business functions and applications. This will enable business sponsors, data producers, data consumers and technology stakeholders to link DQ management processes with objectives.
DQ can be segmented into dimensions (a sketch showing how several of them translate into programmatic checks follows this list):
- Accuracy: the degree to which the content reflects what it was originally intended to represent
- Completeness: the availability of required data attributes
- Coverage: the availability of required data records
- Conformity: alignment of data content with required standards
- Consistency: the degree to which the same data is represented uniformly within and across data stores
- Timeliness: the currency of content representation as well as whether the data is available/can be used when needed
- Uniqueness: the degree to which no record or attribute is recorded more than once
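To make these dimensions concrete, the sketch below expresses three of them as programmatic checks over a small set of customer records. It assumes a pandas DataFrame; the column names, the email format rule and the record content are illustrative assumptions, not prescriptions of this framework.

```python
import pandas as pd

# Illustrative customer records; column names and values are assumptions.
df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C002", "C004"],
    "country":     ["US", "GB", "GB", None],
    "email":       ["a@x.com", "b@y.com", "b@y.com", "not-an-email"],
})

# Completeness: availability of required data attributes.
completeness = 1 - df["country"].isna().mean()

# Uniqueness: no record is recorded more than once.
uniqueness = 1 - df.duplicated(subset=["customer_id"]).mean()

# Conformity: alignment of content with a required standard (an email format).
email_ok = df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False)
conformity = email_ok.mean()

print(f"completeness={completeness:.2f} "
      f"uniqueness={uniqueness:.2f} conformity={conformity:.2f}")
```

Each check yields a score between 0 and 1, which is what allows a dimension to be measured against agreed tolerances rather than discussed in the abstract.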
The identification and prioritization of data quality dimensions foster effective communication about DQ expectations and are an essential prerequisite of the DM initiative.
Creating a profile of the current state of DQ is an important aspect of the overall DQM function. A new profile should be created periodically and whenever the data is transformed. The goal is to assess patterns in the data and to identify anomalies and commonalities, establishing a baseline of what is currently stored in databases and how actual values may differ from expected values. Once the data profile is established, the organization needs to evaluate the data against the quality tolerances and thresholds defined by the DQ requirements. The evaluation also examines business requirements to validate that the data is fit-for-purpose.
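As an illustration of such a baseline, the sketch below computes simple per-column statistics (fill rate, cardinality, numeric ranges) that can be stored and compared across profiling runs to surface drift and anomalies. The dataset and the choice of statistics are assumptions for illustration only.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column baseline profile of a dataset."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "fill_rate": round(1 - s.isna().mean(), 3),  # completeness signal
            "distinct": s.nunique(),                     # cardinality
            "min": s.min() if pd.api.types.is_numeric_dtype(s) else None,
            "max": s.max() if pd.api.types.is_numeric_dtype(s) else None,
        })
    return pd.DataFrame(rows)

# Illustrative data; storing each run's profile lets expected and actual
# values be compared over time.
baseline = profile(pd.DataFrame({
    "notional": [1_000_000, 250_000, None],
    "ccy": ["USD", "USD", "EUR"],
}))
print(baseline)
```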
The purpose of this evaluation process is to measure the quality of the most important business attributes of the existing data and to determine what content needs remediation. A responsibility of the data producer and data consumer is to identify the data that is critical to the data consumer’s business process. Prioritizing the data based on criticality then informs the DQM function which attributes require a heightened level of control and quality review. The designation of criticality requires that the highest level of accuracy and DQ treatment is applied. The assessment process identifies the data that needs to be cleansed to meet data consumer requirements. Data cleansing should be performed against a predefined set of business rules to identify defects that can be linked to operational processes.
Data cleansing should be performed as close to the point of capture as possible. There should be clear accountability and a defined strategy for data cleansing to ensure that cleansing rules are known and to avoid duplicate cleansing processes at multiple points in the data lifecycle. The overall goal is to clean data once, at the point of capture, based on verifiable documentation and business rules, and to fix at the root cause the processes that allowed defective data into the system. Data corrections must be communicated to, and aligned with, all downstream repositories and upstream systems. It is important to have a consistent and documented process for issue escalation and change verification for both data producers and data vendors.
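A minimal sketch of this idea, assuming a small set of documented standardization rules applied once at the point of capture; the record shape and the rules themselves are hypothetical. The correction log supports the communication of corrections to downstream and upstream parties.

```python
# Documented, hypothetical standardization rules applied once at capture.
CLEANSING_RULES = {
    "country": lambda v: v.strip().upper() if isinstance(v, str) else v,
    "trade_date": lambda v: v.replace("/", "-") if isinstance(v, str) else v,
}

def cleanse(record: dict) -> tuple[dict, list[str]]:
    """Apply the documented rules; return the clean record plus a log of
    corrections so downstream and upstream parties can be informed."""
    clean, corrections = dict(record), []
    for field, rule in CLEANSING_RULES.items():
        if field in clean:
            before, after = clean[field], rule(clean[field])
            if after != before:
                corrections.append(f"{field}: {before!r} -> {after!r}")
                clean[field] = after
    return clean, corrections

record, log = cleanse({"country": " us ", "trade_date": "2024/01/31"})
print(record, log)
```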
It is also important to ensure that data meets quality standards throughout the lifecycle so that it can be integrated into operational data stores. This aspect of the DQM process involves identifying missing data, determining which data needs to be enriched and validating data against internal standards to prevent errors before data is propagated into production environments.
For DQ to be sustained, a strong governance structure with the highest level of organizational support from senior executive management must be in place. This structure supports DQM activities and ensures compliance with DQ processes. DQ processes must be documented, operationalized and routinely validated via DM reviews and formal audit processes.
DQ cannot be achieved through central control. Organization-wide DQ requires the commitment and participation of a broad set of stakeholders. DQ is the result of a series of business processes that create a data supply chain; therefore, stakeholders along that chain must be in place, authorized and held responsible for the quality of data as it flows through their respective areas. DQ requires coordinated organizational support, and DQM processes and objectives must be part of the operational culture of the organization if they are to be sustained and successful.
Processes, Tools, & Constructs
- Business Element/Data Element Construct
- Business Element, Business-Based Rules Construct
- Critical Data Element Criteria and Measurement Construct
- Data Quality Rules Construct
- Data Profiling Construct
- Quality Metrics and Dashboards
- Defect Management Construct
- Root Cause Analysis Construct
- Capability Optimization
- RACI Matrix
- Process Designs and End-to-End Process Integration
- Procedures Guide
- Process Performance Measurement
Core Questions
- Is it understood that poor quality data is an indication of a broken business process or technology?
- Is it understood that instituting a DQ system is a cultural shift that touches all aspects of business, operations and technology processes?
- Is the required training in place to sustain the DQM function?
- Are the necessary people and funding resources earmarked to implement and operate the DQM function?
- Are the necessary resources in place to provide organization-wide training to support a sustainable, DQ cultural change?
5.1 Data Quality Management
The DQM function's strategy and approach must be defined and approved by stakeholders. Roles and responsibilities must be established across stakeholders, with operational processes in place and auditable.
5.1.1 DQM Approach and Plan
Description
The strategy and approach must be defined for the DQM function and reflect the related vision and objectives of the Data Management Strategy (DMS). Once established, it must be formally empowered by senior management and its role communicated to all stakeholders.
5.1.2 DQM Roles and Responsibilities
Description
DQM requires a network of data stewards and subject matter experts to ensure data is properly captured, processed and delivered. Accountable parties must be identified, and the roles and responsibilities must be clearly communicated.
5.1.3 DQM Processes
Description
Formal processes must be established for the activities of the DQM function. These processes align with the organization's DM policy and standards and include the procedures, tools and routines required for steady-state operations.
5.1.4 Data Quality Management Process Audit
Description
DQM processes must be implemented in a manner that supports review and audit. Audit review processes must be established and, where appropriate, supported by reporting capabilities that streamline the process.
5.2 Data is Profiled and Measured
Profiling and measuring the data includes: 1) prioritizing the data in scope based on criticality and materiality; 2) defining and testing data quality rules based on business rules; and 3) measuring whether the data is fit-for-purpose.
5.2.1 Data Identification and Prioritization
Description
The data in scope as defined by the business objectives must be prioritized based on its criticality and materiality to the data consumer business process.
5.2.2 Data Quality Rules
Description
Data quality rules based on business rules must be defined and tested to confidently validate that the data is fit-for-purpose throughout its lifecycle.
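One way to keep rules testable and traceable to their business origin is to define each as a named, executable predicate paired with the business rule it derives from, as in this sketch; the rules and record shape shown are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DQRule:
    name: str
    business_rule: str                  # the business rule it derives from
    check: Callable[[dict], bool]       # executable data quality predicate

RULES = [
    DQRule("notional_positive", "Trade notional must be greater than zero",
           lambda r: r.get("notional", 0) > 0),
    DQRule("ccy_iso", "Currency must be a 3-letter alphabetic code",
           lambda r: isinstance(r.get("ccy"), str) and r["ccy"].isalpha()
                     and len(r["ccy"]) == 3),
]

def test_rules(records: list[dict]) -> dict[str, float]:
    """Pass rate per rule: a simple way to test rules before deploying them."""
    return {rule.name: sum(rule.check(r) for r in records) / len(records)
            for rule in RULES}

print(test_rules([{"notional": 5_000_000, "ccy": "USD"},
                  {"notional": -1, "ccy": "US1"}]))
```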
5.2.3 Data Profiling
Description
Data is profiled, and the results are analyzed and graded. The in-scope data must be profiled to determine the baseline quality of the data set. This analysis must include both a row-based analysis examining the accuracy of each record and a column-based statistical analysis. Metadata must also be reviewed as part of the analysis to ensure that the remaining data quality dimensions are accounted for and that the description and intended use of the data are properly defined. While the mechanism for performing checks should be automated, manual checks may occur provided they follow the same requirements and do not compromise data integrity.
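The sketch below combines the two kinds of analysis described above and converts the result into a grade; the data, thresholds and grading bands are assumptions for illustration.

```python
import pandas as pd

# Illustrative in-scope data.
df = pd.DataFrame({"qty": [10, -5, 30, None],
                   "status": ["NEW", "NEW", "ACK", "ACK"]})

# Row-based analysis: does each record, as a whole, satisfy its rules?
row_pass = df["qty"].notna() & (df["qty"] > 0)

# Column-based statistical analysis: summary statistics and distributions.
qty_stats = df["qty"].describe()
status_dist = df["status"].value_counts(normalize=True)

# Grading: convert the measured pass rate into a grade against tolerances.
pass_rate = row_pass.mean()
grade = "green" if pass_rate >= 0.95 else "amber" if pass_rate >= 0.80 else "red"
print(f"pass_rate={pass_rate:.0%} grade={grade}")
```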
5.3 Data Quality Maintenance
Monitoring and maintaining the data includes: 1) implementing data quality control points; 2) capturing DQ metrics to identify defective data; and 3) continuously monitoring the data.
5.3.1 Data Quality Controls
Description
Data control points must be developed to quantitatively assess the quality of data as it flows through business and technology processes.
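For example, a control point might be implemented as a quantitative gate that a batch must pass before flowing to the next business or technology process. The check and tolerance below are assumed for illustration.

```python
def control_point(batch: list[dict], min_pass_rate: float = 0.98) -> list[dict]:
    """Quantitative DQ gate: measure the batch and block it below tolerance."""
    passed = [r for r in batch if r.get("trade_id") is not None]  # assumed check
    pass_rate = len(passed) / len(batch) if batch else 1.0
    if pass_rate < min_pass_rate:
        raise ValueError(f"DQ control failed: pass rate {pass_rate:.1%} "
                         f"is below tolerance {min_pass_rate:.1%}")
    return batch

# A batch only reaches the next process if it clears the gate.
clean_batch = control_point([{"trade_id": "T1"}, {"trade_id": "T2"}])
```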
5.3.2 Data Quality Issues Management
Description
Control points along the data supply chain capture DQ metrics, which feed DQ dashboards used to identify defective data. DQ defects must be part of the issue management routine of the DM initiative. The DQ issue management process must track each issue to resolution and provide continuous stakeholder communication.
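The sketch below shows one minimal shape for a tracked DQ issue, carrying the fields needed for prioritization and the update trail needed for continuous stakeholder communication; the statuses and fields are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class DQIssue:
    issue_id: str
    description: str
    severity: str                  # e.g. "critical", "high", "low"
    status: str = "open"           # open -> in_remediation -> resolved
    updates: list[str] = field(default_factory=list)

    def advance(self, new_status: str, note: str) -> None:
        """Move the issue along its lifecycle, recording each update so
        stakeholders can be kept continuously informed."""
        self.status = new_status
        self.updates.append(f"{new_status}: {note}")

issue = DQIssue("DQ-104", "Null settlement dates in trade feed", "high")
issue.advance("in_remediation", "Root cause traced to an upstream mapping defect")
issue.advance("resolved", "Mapping corrected; historical records remediated")
```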
5.3.3 Continuous Data Quality Monitoring
Description
Data quality is monitored at control points. Control points must be established where data enters a business process or a consuming application. To achieve continuous monitoring, the data must be checked whenever it enters either type of control point. Notifications of discovered data quality issues should be enabled.
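A sketch of a monitoring hook at such a control point, with a pluggable notification callback; the check and the alert channel (here, simply print) are placeholders for whatever the organization uses.

```python
from typing import Callable

def monitor(batch: list[dict],
            check: Callable[[dict], bool],
            notify: Callable[[str], None]) -> None:
    """Run the DQ check on every batch entering a control point and
    send a notification whenever defects are discovered."""
    defects = [r for r in batch if not check(r)]
    if defects:
        notify(f"{len(defects)} of {len(batch)} records failed DQ checks")

# print stands in for a real alerting channel (email, ticketing, chat).
monitor([{"px": 101.5}, {"px": None}],
        check=lambda r: r.get("px") is not None,
        notify=print)
```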
5.4 Data Quality Remediation Management
Data remediation processes must be developed, documented and executed to resolve the most pressing data quality issues. The process must include correcting the existing data and performing a root-cause fix to eliminate future data defects, or formally accepting the defect.
5.4.1 Root Cause Analysis
Description
Data issues are logged with appropriate materiality or severity to enable prioritization, and should be logged irrespective of the suspected root cause. Based on the current-state analysis, remediation plans must be developed to address the most pressing issues. Timelines for ongoing DQ evaluation and maintenance must also be established.
5.4.2 Data Quality Remediation
Description
The root-cause analysis process must be documented. Data remediation must include both correcting existing defective data and determining the root cause of the issue to avoid the recurrence of defective data in the future.