The Strategic Value of Data Lineage in Identifying Your Data Risks

Beyond Technical Tracking: A Comprehensive Approach

For many, data lineage is merely an out-of-the-box capability in ETL tools that shows where data originates and how it is transformed.

However, its potential extends far beyond this basic functionality. A detailed lineage, incorporating both the technical flows and the principal processes involved in creating, manipulating and augmenting your data, is a rich artefact that will enable you to:

1.        Proactively identify key data quality risks

2.        Assess and calibrate data control frameworks

3.        Facilitate interactions with auditors and regulators

A Cautionary Tale: The London Whale

A stark example of the risks associated with inadequate lineage can be seen in the now infamous 2012 London Whale case. 

Due to an error in an Excel spreadsheet used to model risk, one financial institution seriously underestimated the downside of its synthetic credit portfolio, a failure that contributed to losses of more than $6 billion.

Had the institution documented a comprehensive business lineage, one that mapped the relevant operations and enabled an assessment of the risk of error? One assumes not.

An Example of Comprehensive Lineage

Let’s take a look at what a robust lineage looks like. See the table below for a sample partial flow from the insurance industry covering the risk registration process:

Breaking the flow down into its key steps, and combining the technical and business lineage, makes it possible to identify the risks at each step, which in turn allows you to assess the control environment in place.

Key Components of a Robust Data Lineage

To build a comprehensive data lineage, you need to gather four critical pieces of information:

1.        Point of origination of the data being tracked

2.        Responsible business processes

3.        Existing control environments

4.        Technical data flow
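The four pieces of information listed above can be captured as one record per process step. The sketch below is a minimal illustration assuming a simple in-memory model; the `LineageStep` class, its field names and the sample steps are hypothetical, loosely based on the insurance example rather than taken from any particular lineage tool.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStep:
    """One step in a lineage flow, holding the four components (hypothetical model)."""
    name: str
    origin: str                    # 1. where the data used in this step comes from
    business_process: str          # 2. the responsible business process
    controls: list[str] = field(default_factory=list)  # 3. controls in place at this step
    source_system: str = ""        # 4. technical flow: where the data comes from...
    target_system: str = ""        #    ...and where it lands


def steps_without_controls(steps: list[LineageStep]) -> list[str]:
    """Return the names of steps with no documented control, i.e. candidate risk gaps."""
    return [step.name for step in steps if not step.controls]


# Hypothetical partial flow, loosely based on the insurance risk registration example.
flow = [
    LineageStep(
        name="Compile Front Sheet",
        origin="Broker's slip",
        business_process="Underwriting",
        controls=[],  # nothing yet checks that the Front Sheet is coded correctly
        source_system="Broker's slip",
        target_system="Underwriter Front Sheet",
    ),
    LineageStep(
        name="Register risk",
        origin="Underwriter Front Sheet",
        business_process="Risk registration",
        controls=["Four-eyes review of keyed data"],
        source_system="Underwriter Front Sheet",
        target_system="Policy administration system",
    ),
]

print(steps_without_controls(flow))  # ['Compile Front Sheet']
```

Keeping all four components on the same record means a step with a business process but no control stands out immediately.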

#1 Data Origin

Understanding the golden source of your data is fundamental. In our insurance example, risk registration data originates from an Underwriter Front Sheet, which is compiled from a broker's slip. You need to understand the source in order to assess its validity.

In this example, it’s evident that we have an immediate risk of the Front Sheet being miscoded.

#2 Business Process Understanding

Data is primarily created within business processes before being used and transformed downstream.  Therefore, a thorough understanding of these processes and how they interrelate is vital.

#3 Controls

Each business process should contain a series of controls. When mapping business processes, it’s crucial to document:

·        Where each control sits within the process

·        The scope of each control

·        Responsible parties
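A control that is missing any of these three details is hard to assess. As a minimal sketch, assuming controls are kept in a simple register, the check below lists which documentation fields are still blank for each control. The `ControlRecord` structure, control IDs and sample entries are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ControlRecord:
    """Documentation for one control (hypothetical structure)."""
    control_id: str
    location: Optional[str]  # where the control sits within the process
    scope: Optional[str]     # what the control checks
    owner: Optional[str]     # the responsible party


def documentation_gaps(controls: list[ControlRecord]) -> dict[str, list[str]]:
    """Map each control ID to the documentation fields that are still blank."""
    gaps = {}
    for control in controls:
        missing = [
            f.name
            for f in fields(control)
            if f.name != "control_id" and not getattr(control, f.name)
        ]
        if missing:
            gaps[control.control_id] = missing
    return gaps


register = [
    ControlRecord("C1", "After Front Sheet is compiled", "Coding of key fields", "Underwriting assistant"),
    ControlRecord("C2", "Before risk registration", None, None),
]

print(documentation_gaps(register))  # {'C2': ['scope', 'owner']}
```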

#4 Technical Lineage

You’ll need to understand how data flows across systems (horizontal lineage), as well as any transformations that occur along the way (vertical lineage).

Tracking both dimensions helps identify critical data touchpoints and the points where data may be lost or corrupted.
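One way to track both dimensions together is a minimal sketch that stores lineage as edges between (system, field) pairs: an edge that crosses systems is horizontal, and an edge that carries a transformation is vertical. The system names, field names and the FX conversion step below are hypothetical.

```python
from collections import deque
from typing import Optional

Node = tuple[str, str]  # (system, field)

# Hypothetical edges: (source node, target node, transformation or None for a straight copy).
EDGES: list[tuple[Node, Node, Optional[str]]] = [
    (("FrontSheet", "sum_insured"), ("PolicyAdmin", "sum_insured"), None),
    (("PolicyAdmin", "sum_insured"), ("PolicyAdmin", "sum_insured_gbp"), "FX conversion to GBP"),
    (("PolicyAdmin", "sum_insured_gbp"), ("DataWarehouse", "exposure_gbp"), None),
]


def downstream(start: Node) -> list[tuple[Node, Node, Optional[str]]]:
    """Breadth-first walk returning every edge reachable from `start`."""
    adjacency: dict[Node, list[tuple[Node, Optional[str]]]] = {}
    for src, dst, transform in EDGES:
        adjacency.setdefault(src, []).append((dst, transform))

    reached, queue, seen = [], deque([start]), {start}
    while queue:
        node = queue.popleft()
        for dst, transform in adjacency.get(node, []):
            reached.append((node, dst, transform))
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return reached


for src, dst, transform in downstream(("FrontSheet", "sum_insured")):
    horizontal = src[0] != dst[0]    # data moves from one system to another
    vertical = transform is not None  # data is transformed on the way
    print(f"{src[0]}.{src[1]} -> {dst[0]}.{dst[1]}  horizontal={horizontal}  vertical={vertical}")
```

Every edge the walk reaches is a touchpoint worth examining for loss or corruption; edges flagged as vertical are where transformation logic needs checking.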

Collaborative Approach to Building Lineage

Now that we have a clear grasp of the details needed to create a meaningful lineage, how do you collect and collate this information?

You’ll need to adopt a collaborative approach, leveraging:

·        Business Analysts and Architects

·        Information Technology team

·        2nd Line of Defence

Let’s take each one of these in turn. 

Stakeholder Contributions

Business Analysts and Architects

The Business Analyst or Architecture team will likely have a good grasp of the core processes within your organisation.

They should be able to give you a high-level understanding of the principal steps in each process and, crucially, of how the processes within a data flow interact with each other, as well as which systems they use.

Further details of the processes can then be obtained from the process owners and those who directly operate those processes.

The IT Team

Your IT team can help you overlay this with an understanding of how data flows through your firm's systems architecture, including any transformations applied along the way.

You’ll need to check whether any automated reconciliations are in place when data moves from one system to another. If there are, confirm how they work. You’ll want to find out:

·        Who: the individual responsible

·        What: data points checked & thresholds

·        How: the process, including exception handling
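To make the Who, What and How concrete, here is a minimal sketch of a record-level reconciliation, assuming both systems can export values keyed by a shared identifier such as a policy reference. The `Reconciliation` class, the owner, the tolerance and the sample data are all hypothetical; real reconciliation tooling will differ.

```python
from dataclasses import dataclass


@dataclass
class Reconciliation:
    """Documents one automated reconciliation between two systems (hypothetical)."""
    owner: str        # Who: the individual responsible
    data_point: str   # What: the data point checked...
    tolerance: float  # ...and the threshold (maximum absolute difference allowed)


def reconcile(rec: Reconciliation, source: dict[str, float],
              target: dict[str, float]) -> list[tuple[str, str]]:
    """Compare values by shared key and return exceptions for rec.owner to resolve (the How)."""
    exceptions = []
    for key, source_value in source.items():
        if key not in target:
            exceptions.append((key, "missing in target"))
        elif abs(source_value - target[key]) > rec.tolerance:
            diff = target[key] - source_value
            exceptions.append((key, f"difference of {diff:+.2f} exceeds tolerance"))
    # Records that appear downstream with no upstream counterpart are also exceptions.
    for key in sorted(target.keys() - source.keys()):
        exceptions.append((key, "unexpected in target"))
    return exceptions


rec = Reconciliation(owner="Operations data steward", data_point="gross premium", tolerance=0.01)
source = {"POL-001": 1000.00, "POL-002": 2500.00, "POL-003": 750.00}
target = {"POL-001": 1000.00, "POL-002": 2450.00, "POL-004": 300.00}

for key, reason in reconcile(rec, source, target):
    print(f"Route to {rec.owner}: {key} {rec.data_point} {reason}")
```

Returning exceptions as data, rather than only logging them, makes it easier to evidence how each one was handled and by whom.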

2nd Line of Defence

The Risk Management team will typically maintain a Risk & Control Matrix. This outlines the risks that they or the business teams have identified, together with the mitigating controls in place. As with the IT technical flows, it is a rich source of information to overlay on the process steps you have collected.
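As a minimal sketch of that overlay, assuming both the Risk & Control Matrix and your collected process steps can be keyed by a common step name, the join below surfaces three kinds of gap: risks with no mitigating control, process steps with no assessed risk, and matrix entries that do not map to any step you have captured. The step names, risk IDs and controls are hypothetical.

```python
# Hypothetical Risk & Control Matrix rows; in practice these come from the Risk team.
RCM = [
    {"risk_id": "R1", "step": "Compile Front Sheet", "risk": "Front Sheet miscoded", "controls": []},
    {"risk_id": "R2", "step": "Register risk", "risk": "Keying error on registration",
     "controls": ["Four-eyes review"]},
    {"risk_id": "R3", "step": "Load to warehouse", "risk": "Records dropped in transfer",
     "controls": ["Daily record-count reconciliation"]},
]

# Process steps collected from analysts and process owners (hypothetical).
PROCESS_STEPS = ["Compile Front Sheet", "Register risk", "Calculate exposure"]


def overlay(steps: list[str], rcm: list[dict]) -> dict[str, list[str]]:
    """Attach matrix rows to process steps and report the gaps found."""
    by_step: dict[str, list[dict]] = {step: [] for step in steps}
    unmapped = []
    for row in rcm:
        if row["step"] in by_step:
            by_step[row["step"]].append(row)
        else:
            unmapped.append(row["risk_id"])  # matrix refers to a step not yet in the lineage
    return {
        "risks_without_controls": [
            r["risk_id"] for rows in by_step.values() for r in rows if not r["controls"]
        ],
        "steps_without_assessed_risks": [step for step, rows in by_step.items() if not rows],
        "rcm_rows_not_in_lineage": unmapped,
    }


print(overlay(PROCESS_STEPS, RCM))
```

Each of the three lists is a prompt for a conversation: with the business about missing controls, with Risk about unassessed steps, and with whoever owns the flow about steps the lineage has not yet captured.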

If your organisation is subject to SOX, the key processes that feed into financial reporting should already be documented to a reasonable level of granularity, allowing you to supplement what you’ve gleaned from the Business Analyst and Architecture teams.

Your Compliance department is another source of information and help. They will likely have had to create flows for tracking personal data (PII), which you may be able to reuse for more detailed mapping.

Implementation Considerations

The amount of time it takes to build a detailed lineage for even one flow should not be underestimated. 

Whilst well worth it, it’s going to require investment. 

Not all processes are well documented.

Neither are all controls. 

Your IT department may have poor documentation standards.

All of this will slow you down. 

But don’t let these factors stop you.  If anything, they should spur you on. 

Contrary to what we’re often told, when it comes to our data flows, it’s what we don’t know that will hurt us!

Strategic Recommendation

Despite these challenges, developing a comprehensive data lineage is crucial. Start by mapping high-level process flows, identify the key handoff points, and then gradually add detail.

The resulting artefact will become an invaluable tool for managing data quality risks and enhancing organisational control frameworks.
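Identifying handoff points can start very simply. In this minimal sketch, each high-level step is assumed to be recorded with the team that owns it and the system it runs in, and a handoff point is anywhere consecutive steps change team or system. The `Step` type, team names and systems are hypothetical.

```python
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    team: str
    system: str


def handoff_points(flow: list[Step]) -> list[tuple[str, str, str]]:
    """Return (from_step, to_step, reason) wherever ownership or system changes."""
    handoffs = []
    for prev, curr in zip(flow, flow[1:]):
        reasons = []
        if prev.team != curr.team:
            reasons.append(f"team: {prev.team} -> {curr.team}")
        if prev.system != curr.system:
            reasons.append(f"system: {prev.system} -> {curr.system}")
        if reasons:
            handoffs.append((prev.name, curr.name, "; ".join(reasons)))
    return handoffs


flow = [
    Step("Receive broker's slip", "Underwriting", "Email"),
    Step("Compile Front Sheet", "Underwriting", "Spreadsheet"),
    Step("Check Front Sheet", "Underwriting", "Spreadsheet"),
    Step("Register risk", "Operations", "Policy admin system"),
    Step("Report exposure", "Finance", "Policy admin system"),
]

for handoff in handoff_points(flow):
    print(handoff)
```

Handoffs are natural places to look first for missing controls, since they are where data changes hands or systems.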

 

In the next article, we’ll focus on how to build a data control framework.

 

