How to Define Relevant and Meaningful Data Quality Indicators for your Organisation

We know that Data Quality Indicators (DQIs) are an invaluable tool for providing assurance over the quality of the data within your organisation. 

Used correctly, they give data owners, stewards, data consumers and management alike a view of whether their data can be trusted. 

But where do you start?

How do you set your tolerances?  Do you just adopt a strategy of starting with a low tolerance and gradually moving the target as quality hopefully improves?

How do you go about deciding how to weight your measurements to accurately reflect their importance to the business? 

Or do you just weight everything equally?

In this article, I’ll go through my approach to developing a set of indicators that are tailored to your business and add value, as opposed to merely ticking the proverbial box.

Firstly, though, let’s recap on the principal dimensions of quality. 

Data Quality Dimensions

DAMA has defined the following data quality dimensions:

  1. Completeness

  2. Accuracy

  3. Uniqueness

  4. Consistency

  5. Timeliness

  6. Validity
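
To make these less abstract, here is a minimal sketch, in Python, of how two of them (completeness and uniqueness) might be turned into simple measurements over a handful of hypothetical customer records. The field names and figures are illustrative assumptions rather than a prescription.

```python
# Illustrative sketch only: expressing two dimensions (completeness and
# uniqueness) as simple checks over hypothetical customer records.

records = [
    {"customer_id": "C001", "email": "a@example.com", "date_of_birth": "1980-01-01"},
    {"customer_id": "C002", "email": None,            "date_of_birth": "1975-06-30"},
    {"customer_id": "C002", "email": "c@example.com", "date_of_birth": None},
]

# Completeness: proportion of records where every required field is populated.
required_fields = ["customer_id", "email", "date_of_birth"]
completeness = sum(
    all(r.get(f) for f in required_fields) for r in records
) / len(records)

# Uniqueness: proportion of distinct customer_id values.
ids = [r["customer_id"] for r in records]
uniqueness = len(set(ids)) / len(ids)

print(f"Completeness: {completeness:.0%}")  # 33% in this toy data set
print(f"Uniqueness:   {uniqueness:.0%}")    # 67% in this toy data set
```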

Which dimensions do I use?

I’ve come across some organisations that felt they needed to apply each of these dimensions to every Critical Data Element identified. 

Case in point…

As an auditor, I would often be shown busy and, on the surface, comprehensive DQI dashboards. But when the business was asked to provide a rationale for how they arrived at the measurements, it turned out that they had just taken each of the standard dimensions and tried to “fit” them to their data.

Little to no thought was given to how the thresholds related to the usage of the data or how the weightings of each metric were arrived at. Very often they were all given an identical weighting.

Metrics for the sake of metrics don’t really add value. 

You don’t need to use every single dimension for each data element being measured. 

Instead, it’s best to think of the dimensions, thresholds and weightings as tools. 

For the DIY enthusiasts amongst us, would you feel the need to use every tool in your toolbox during each job? 

Of course not. 

The same is true with data quality dimensions. 

Used correctly, these tools will tell a story about your organisation’s ability to operate effectively with the data that it uses. 

That’s the utopia… how do you get there?

The Business Process

As with most things in data governance, it’s not about the data.

Instead, in this case, it’s really about the business process. 

Data doesn’t exist in a vacuum. 

It forms part of the critical processes that your organisation operates each day.

So, you need to ask yourself two questions:

1. How does each of my organisation’s core processes actually use data?

2. What is the impact if those processes are not able to operate effectively?

You need to know the answers to these questions to determine:

· Dimensions: which dimensions to use

· Thresholds: what thresholds to apply

· Weighting: how to weight them in a way that reflects their impact

An Example

Let me give you an example. 

My first introduction to data governance was in the context of the Solvency II Directive for insurance carriers. 

Prior to data being fed into statistical models, like those used to produce the Solvency Capital Requirement (SCR), there are processes in place to generate the parameters within which the calculation engine will operate. 

The Actuarial team takes, say, the claim reserve estimates and, using statistical techniques, produces a coefficient of variation (CoV). 

For those readers who are not statistically inclined (myself included), the best way to describe this is the way it was explained to me: the degree of wobble around the mean, or more formally, the standard deviation divided by the mean.
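
As a toy illustration of that calculation (the reserve figures below are invented, and this glosses over how actuarial reserve distributions are actually produced):

```python
import statistics

# Invented reserve outcomes for one line of business, purely for illustration (in £m).
reserve_estimates = [10.2, 9.8, 11.5, 10.9, 9.4]

mean = statistics.mean(reserve_estimates)
std_dev = statistics.stdev(reserve_estimates)

# Coefficient of variation: the "wobble" around the mean, expressed as a ratio.
cov = std_dev / mean
print(f"CoV = {cov:.1%}")  # roughly 8% for these made-up numbers
```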

The individual claim amounts booked into core systems don’t feature within the data that’s ultimately feeding the model. 

Instead, they feed just one stage in the process of deriving the parameters that are used as inputs. 

The first stage is the setting of reserves, which is a combination of statistical techniques and judgement. 

The second stage, as we’ve seen, is the CoV picks, which is also an exercise involving statistical calculations and the use of judgement.

So, at each stage of these processes, the figures feeding the model become more and more abstracted from the underlying premium and claim amounts from which they are derived. 

Thinking about the actual premium and claim values then, there simply was no need for all of them to be 100% accurate and complete. 

If, say, 10% of claim values were entered incorrectly, it’s unlikely that this would move the needle once you take into account the processes above.

This is probably why the Solvency II Directive only ever required that data be materially accurate and complete. 

So the key data risks do not really revolve around the incorrect processing of claim numbers, but rather around the way in which that data is aggregated prior to the setting of IBNR (incurred but not reported) reserves, the IBNR selection process itself, and then the process of parameterisation.

So, to deliver something of value, rather than producing a set of DQIs based on the accuracy and completeness of the underlying numbers, it is far more impactful to measure the final parameters.

In this case, it wasn’t really valuable just to place a basic measure ensuring that, for example, each line of business had a CoV; the model would have produced an error anyway if one were missing. 

But it was valuable to check that the CoVs for each line had been subject to a peer review, as the data policy mandated, and to use that as the measure of completeness.
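
As a rough sketch of what such a check might look like (the table of CoV picks and the peer_reviewed flag are hypothetical; in practice the evidence of review may live somewhere far less tidy):

```python
# Illustrative sketch only: completeness measured against the data policy
# ("every line of business has a peer-reviewed CoV"), not against raw field counts.

cov_picks = [
    {"line_of_business": "Motor",    "cov": 0.08, "peer_reviewed": True},
    {"line_of_business": "Property", "cov": 0.12, "peer_reviewed": True},
    {"line_of_business": "Marine",   "cov": 0.15, "peer_reviewed": False},
]

reviewed = sum(1 for pick in cov_picks if pick["peer_reviewed"])
completeness = reviewed / len(cov_picks)

# Illustrative threshold: the policy demands that every pick is reviewed.
status = "GREEN" if completeness >= 1.0 else "RED"
print(f"Peer-review completeness: {completeness:.0%} ({status})")  # 67% (RED)
```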

These were the factors that would move the needle; therefore, the dimensions used to measure quality had to reflect this. 

Determining How To Measure Quality For Your Process

The lesson here is that you need to understand how the business process is actually using the data to determine what you need to measure and how you will measure it. 

Then, when determining what your thresholds need to be, ask yourself what level of poor-quality data would compromise the process.

Once you have answered these questions you can take this a stage further and weight your measurements to get a more accurate view of your overall data quality and its impact on the process and organisation as a whole. 

To do this, think about how the process you’re looking at relates to the organisation’s higher-level objectives. I’m thinking here about things like the following:

· Quality of the product or service delivered by the organisation

· Satisfaction of customers and other stakeholders

· Public confidence in the organisation

 

How would these objectives be harmed if the process was not able to operate effectively due to poor data?
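
As a rough sketch of how that weighting might come together, an overall indicator could be a simple weighted average of the individual measurements. The metric names, results and weights below are all invented for illustration; in practice, the weights should come from the impact assessment described above.

```python
# Illustrative only: weights reflect an assumed view of business impact.
measurements = [
    # (measurement, result achieved, weight)
    ("CoV picks peer reviewed",        1.00, 0.5),
    ("Parameters within agreed range", 0.90, 0.3),
    ("Source claim data validity",     0.95, 0.2),
]

total_weight = sum(weight for _, _, weight in measurements)
overall = sum(result * weight for _, result, weight in measurements) / total_weight

print(f"Overall weighted DQI: {overall:.1%}")  # 96.0% with these made-up figures
```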

 

By asking yourself these questions, you will be able to define a set of Data Quality Indicators that provide value and true insight for your organisation, enabling management to determine where to focus its attention in resolving data quality issues.

In the next article, we’ll focus on data governance as an exercise in change management and how to determine capacity for change. 

Subscribe here to get future articles in this series.

--

Need Data Governance help? 

Book a call here to discover how we can support you.
