Master Data Management - Basic Concepts
- mcb921
- Oct 5, 2022
- 4 min read
I wasn't going to talk about it, but given recent conversations, I felt compelled to do so.
What is Master Data Management (MDM)?
It is a data governance discipline that increases the quality of registration data by making use of data quality rules, business process flows and STEWARDSHIP, resulting in data that is better organized and controlled.
Clear? No?
OK, so have you ever registered on a website, then walked into a store owned by the same company and had to register all over again? That's because they lack good MDM.
Have you tried to build a report, only to find 20 versions of the same item? You are probably missing an MDM.
Have you extracted a list, like customer type, and found identical items mixed in lowercase and uppercase? There you go, MDM!
MDM Reference Architecture
There are several architecture patterns, but the main ones are these; anything else is a variation.
1 - Consolidation
This one usually serves analytical purposes. In this type of architecture the data is created/modified at the sources, MDM receives this data (usually in pull mode), applies the business logic and generates a golden record, but the data is not sent back to the sources; it goes to decision-making systems instead.

Benefits
360 view
Harmonized data
First step to a robust implementation
Faster implementation
Downside
Limited value
2 - Coexistence / Hybrid
Here we have an evolution of the consolidation model, where the data is sent back to the sources: whenever there is a change in any source it is propagated to all the others, but record creation/modification still happens at the source.

Benefits
Federated data
Single source of truth
Improved data quality
Downsides
Shared responsibility
Longest implementation
Complex workflows
Multiple touchpoints
3 - Centralized
In this case MDM holds the role of originator: all data is first created in MDM, then propagated downstream.

Benefits
Centralized responsibility (duh!)
Simple workflows
Fewer touchpoints
Downsides
System criticality
Change management
Longer implementation
Run overhead
Workflows
Sometimes forgotten, workflows are a critical piece of the MDM strategy. They govern the data lifecycle, dictating which attributes must be populated at a given step (e.g. creation), who must review and approve, and what the next step is.
Steps can be manual or automated, depending on the need, and workflows can have parallel paths.
For example, when a product is created, the commercial team and the finance team must each populate their respective attributes before the product is made available to consumers of MDM, as in the sketch below.
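As an illustration, here is a minimal sketch of how such a workflow could be described declaratively, written in Python for convenience; the step names, roles and required attributes mirror the product example above and are purely hypothetical, not the schema of any real MDM tool.

```python
# A hypothetical, declarative description of the product-creation workflow above.
# Each step lists the attributes that must be populated, who owns the step,
# and which step(s) come next; the two enrichment steps run as parallel paths.
product_creation_workflow = {
    "create": {
        "required_attributes": ["sku", "description"],
        "owner": "requester",
        "next": ["commercial_enrichment", "finance_enrichment"],  # parallel paths
    },
    "commercial_enrichment": {
        "required_attributes": ["list_price", "sales_channel"],
        "owner": "commercial_team",          # manual step
        "next": ["publish"],
    },
    "finance_enrichment": {
        "required_attributes": ["cost_center", "tax_code"],
        "owner": "finance_team",             # manual step
        "next": ["publish"],
    },
    "publish": {
        "required_attributes": [],
        "owner": "mdm_automation",           # automated step
        "next": [],                          # product becomes available to MDM consumers
    },
}
```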
Matching Rules
Rules are the core of any MDM system; they are what ensures unique records and data quality. Matching rules specifically are how the MDM knows that incoming records are part of the same entity, and matching is the first thing that happens inside the MDM engine.
They work in a very straightforward way:
Compare similarity across records (incoming and existing)
Generate a matching score (the higher, the more similar)
Compare score with thresholds (usually 3 ranges are defined)
Do something (accept, reject or suggest)
Comparing records can be done with several algorithms, from simple exact match to fuzzy match to AI match. This is where MDM tools differentiate themselves from each other: the more capable the rules, the better (theoretically) the data.
Defining the thresholds is the tricky part: we need to give the system some guidance on what to do. If the score, which is the level of confidence of the match, is lower than 50% we could say this is not a match; if it is higher than 90% it is a match; anything in between can be suggested for a steward to review.
Of course no rule is 100% correct, and false matches will happen; the key is fine-tuning for the lowest number of mistakes.
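To make those four steps a bit more concrete, here is a minimal sketch in Python using simple fuzzy string similarity from the standard library; the compared fields, weights and thresholds are made up for the example and would differ per implementation.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Fuzzy similarity between two strings, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(incoming: dict, existing: dict, weights: dict) -> float:
    """Weighted average similarity across the compared attributes."""
    total = sum(weights.values())
    return sum(
        similarity(str(incoming.get(f, "")), str(existing.get(f, ""))) * w
        for f, w in weights.items()
    ) / total

def decide(score: float, reject_below: float = 0.5, accept_above: float = 0.9) -> str:
    """Map the score to one of the three usual outcomes."""
    if score < reject_below:
        return "reject"   # treat as a new, distinct record
    if score > accept_above:
        return "accept"   # merge automatically into the existing entity
    return "suggest"      # send to a data steward for review

# Hypothetical rule: name weighs more than email
weights = {"name": 0.6, "email": 0.4}
incoming = {"name": "Jon Smith", "email": "jsmith@example.com"}
existing = {"name": "John Smith", "email": "jsmith@example.com"}

score = match_score(incoming, existing, weights)
print(round(score, 2), decide(score))
```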
Here is a simple example of a rule definition; this is an abstraction, the actual rule will need more parameters.

Let's see how our friend John Smith performs with these rules...

Transitive Match
This is the ability to match two otherwise unmatchable records through an iterative approach: record A matches B but doesn't match C, while record B matches C, so records A, B and C are the same entity, even though A and C never match directly, as in the sketch below.
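Here is a minimal sketch of this grouping step, assuming the pairwise match decisions have already been made; a union-find structure ties A, B and C together even though A and C never matched each other.

```python
def group_records(records, matches):
    """Group records transitively: any two records linked by a chain of matches end up together."""
    parent = {r: r for r in records}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path compression
            r = parent[r]
        return r

    for a, b in matches:
        parent[find(a)] = find(b)          # merge the two groups

    return {r: find(r) for r in records}

# A matches B and B matches C, but A and C never matched directly
print(group_records(["A", "B", "C"], [("A", "B"), ("B", "C")]))
# {'A': 'C', 'B': 'C', 'C': 'C'}  -> all three belong to the same entity
```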
Survivorship Rules
All right, so we have all the records grouped: we know which ones are versions of the same record and which belong to distinct groups, so what now?
We will define how the attributes must be populated for our golden record, meaning we must define a rule for each attribute (even if the rule is the same for all). We can specify a preferred source, so when the same attribute is populated in many sources we pick that one, or we can define advanced rules like "if the record type is X then take source B" or "pick the highest value among all sources", etc.
The point is that the survivorship rules will define which value of each attribute "survives" from which source.
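Here is a minimal sketch of the idea, assuming each matched group is just a dict of attribute values per source; the rule names and sources (crm, ecommerce) are illustrative, not taken from any specific tool.

```python
# Hypothetical survivorship rules applied to one matched group of records.
def prefer_source(values, source):
    """Take the attribute from the preferred source, else fall back to any non-empty value."""
    if values.get(source):
        return values[source]
    return next((v for v in values.values() if v), None)

def longest_value(values):
    """Pick the longest non-empty value across all sources."""
    candidates = [v for v in values.values() if v]
    return max(candidates, key=len) if candidates else None

# One survivorship rule per attribute (even if the rule is the same for all)
rules = {
    "name": longest_value,
    "email": lambda values: prefer_source(values, "ecommerce"),
}

# Attribute values per source, for one matched group of records
group = {
    "name": {"crm": "John Smith", "ecommerce": "Jonathan Smith"},
    "email": {"crm": "", "ecommerce": "jsmith@example.com"},
}

golden = {attr: rules[attr](values) for attr, values in group.items()}
print(golden)  # {'name': 'Jonathan Smith', 'email': 'jsmith@example.com'}
```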
Let's assume the following rules.

Continuing with my previous example, this would be a possible outcome: the MDM record is unique and complete, we added the email from e-commerce, picked the longest name and created a cross-reference to the source systems, exposed for downstream consumers.
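For illustration, the resulting golden record could look something like this; the MDM and source IDs are hypothetical.

```python
# A hypothetical golden record with cross-references back to the source systems,
# exposed for downstream consumers.
golden_record = {
    "mdm_id": "CUST-000123",
    "name": "Jonathan Smith",          # longest name survived
    "email": "jsmith@example.com",     # added from e-commerce
    "xref": {                          # cross-reference to the source records
        "crm": "CRM-8842",
        "ecommerce": "EC-17756",
    },
}
```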

Golden Record
After all of that, which should happen in a flash, you have your golden record set and are ready to share better-quality data.
I hope this gives at least a high-level perspective on how MDM works, let me know!