Master Data Management: What, Why, How & Who

Master data management (MDM) arose out of the necessity for businesses to improve the consistency and quality of their key data assets, such as product data, asset data, customer data and location data. Many businesses today, especially global enterprises, have hundreds of separate applications and systems (e.g., ERP, CRM) where data that crosses organizational departments or divisions can easily become fragmented, duplicated and, most commonly, out of date. When this occurs, accurately answering even the most basic but critical questions about a performance metric or KPI becomes difficult.

Basic questions such as “who are our most profitable customers?”, “what product(s) have the best margins?” or, in some cases, “how many employees do we have?” become tough to answer with any degree of accuracy.

Basically, the need for accurate, timely information is acute. As sources of data increase, managing it consistently and keeping data definitions up to date so all parts of a business use the same information is a never-ending challenge.

To meet this challenge, businesses turn to master data management (MDM).

What you’ll learn from this article:

This article explains what MDM is, why it is important, how to manage it and who should be involved, while identifying some key MDM management patterns and best practices. Specifically, it covers:

Let’s get started!

What is Master Data?

Most software systems have lists of data that are shared and used by several of the applications that make up the system.

For example:

A typical ERP system will have at the very least Customer Master, Item Master and Account Master data lists. This master data is often one of the key assets of a company. In fact, it’s not unusual for a company to be acquired primarily for access to its Customer Master data.

Rudimentary Master Data Definition

One of the most important steps in understanding master data is getting to know the terminology. To start, there are some very well understood and easily identified master data items, such as “customer” and “product.” Truth be told, many define master data simply by reciting a commonly agreed upon master data item list, such as: Customer, Product, Location, Employee and Asset.

But identifying the elements of data that MDM software should manage is much more complex and defies such rudimentary definitions. That has created a lot of confusion around what master data is and how it is qualified.

To give a more comprehensive answer to the question of “what is master data?”, we can look at the 6 types of data typically found in corporations:

  1. Unstructured Data: Data found in email, white papers, magazine articles, corporate intranet portals, product specifications, marketing collateral and PDF files.
  2. Transactional Data: Data about business events (often related to system transactions, such as sales, deliveries, invoices, trouble tickets, claims and other monetary and non-monetary interactions) that have historical significance or are needed for analysis by other systems. Transactional data are unit level transactions that use master data entities. Unlike master data, transactions are inherently temporal and instantaneous by nature.
  3. Metadata: Data about other data. It may reside in a formal repository or in various other forms, such as XML documents, report definitions, column descriptions in a database, log files, connections and configuration files.
  4. Hierarchical Data: Data that stores the relationships between other data. It may be stored as part of an accounting system or separately as descriptions of real world relationships, such as company organizational structures or product lines. Hierarchical data is sometimes considered a super MDM domain because it is critical to understanding and sometimes discovering the relationships between master data.
  5. Reference Data: A special type of master data used to categorize other data or used to relate data to information beyond the boundaries of the enterprise. Reference data can be shared across master or transactional data objects (e.g. countries, currencies, time zones, payment terms, etc.)
  6. Master Data: The core data within the enterprise that describes objects around which business is conducted. It typically changes infrequently and can include reference data that is necessary to operate the business. Master data is not transactional in nature, but it does describe transactions. The critical nouns of a business that master data covers generally fall into four domains and further categorizations within those domains are called subject areas, sub-domains or entity types.

The four general master data domains are:

  1. Customers: Within the customers domain, there are customer, employee and salesperson sub-domains.
  2. Products: Within the products domain, there are product, part, store and asset sub-domains.
  3. Locations: Within the locations domain, there are office location and geographic division sub-domains.
  4. Other: Within the other domain, there are things like contract, warranty and license sub-domains.

Some of these sub-domains may be further divided. For instance, customer may be further segmented based on incentives and history, since your company may have normal customers as well as premier and executive customers. Meanwhile, product may be further segmented by sector and industry. This level of granularity is helpful because the requirements, lifecycle and CRUD cycle for a product in the Consumer Packaged Goods (CPG) sector are likely very different from those for products in the clothing industry. The granularity of domains is essentially determined by the magnitude of differences between the attributes of the entities within them.

Deciding What Master Data to Manage

While identifying master data entities is pretty straightforward, not all data that fits the definition for master data should necessarily be managed as such. In general, master data is typically a small portion of all of your data from a volume perspective, but it’s some of the most complex data and the most valuable to maintain and manage.

So, what data should you manage as master data?

We recommend using the following criteria, all of which should be considered together when deciding if a given entity should be treated as master data.

In Summary…

While it is simple to enumerate the various master data entity types, it is sometimes more challenging to decide which data items in a company should be treated as master data.

Often, data that does not normally comply with the definition for master data may need to be managed as such and data that does comply with the definition may not.

Ultimately, when deciding on what entity types should be treated as master data, it is better to categorize them in terms of their behavior and attributes within the context of the business needs than to rely on simple lists of entity types.

Why Bother With Managing Master Data?

Because master data is used by multiple applications, an error in the data in one place can cause errors in all the applications that use it.

For example:

An incorrect address in the customer master might mean orders, bills and marketing literature are all sent to the wrong address. Similarly, an incorrect price on an item master can be a marketing disaster, and an incorrect account number in an account master can lead to huge fines or even jail time for the CEO, which is a career-limiting move for the person who made the mistake.

Real Life Master Data Example: Why You Need Master Data

A Typical Master Data Horror Story

A credit card customer moved from 2847 North 9th St. to 1001 11th St. North. The customer changed his billing address immediately but did not receive a bill for several months. One day, the customer received a threatening phone call from the credit card billing department asking why the bill had not been paid. The customer verified that they had the new address, and the billing department verified that the address on file was 1001 11th St. North. The customer asked for a copy of the bill to settle the account.

After two more weeks without a bill, the customer called back and found the account had been turned over to a collection agency. This time, the customer found out that even though the address in the file was 1001 11th St. North, the billing address was listed as 101 11th St. North. After several phone calls and letters between lawyers, the bill finally got resolved and the credit card company had lost a customer for life.

In this case, the master copy of the data was accurate, but another copy of it was flawed. Master data must be both correct and consistent. Even if the master data has no errors, few organizations have just one set of master data. Many companies grow through mergers and acquisitions, and each company that the parent organization acquires comes with its own customer master, item master and so forth.

This would not be bad if you could just union the new master data with the current master data, but unless the company acquired is in a completely different business in a faraway country, there’s a very good chance that some customers and products will appear in both sets of master data, usually with different formats and different database keys.

If both companies use the Dun & Bradstreet Number or Social Security Number as the customer identifier, discovering which customer records are for the same customer is a straightforward issue; but that seldom happens. In most cases, customer numbers and part numbers are assigned by the software that creates the master records, so the chances of the same customer or the same product having the same identifier in both databases are pretty remote. Item masters can be even harder to reconcile if equivalent parts are purchased from different vendors with different vendor numbers.

In summary:

Merging master lists together can be very difficult since the same customer may have different names, customer numbers, addresses and phone numbers in different databases. For example, William Smith might appear as Bill Smith, Wm. Smith and William Smithe. Normal database joins and searches will not be able to resolve these differences.

A very sophisticated tool that understands nicknames, alternate spellings and typing errors will be required. The tool will probably also have to recognize that different name variations can be resolved if they all live at the same address or have the same phone number.
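As a rough illustration of the idea, the sketch below resolves the name variations from the example using only the Python standard library. The nickname table, the 0.9 threshold and the record layout are assumptions made for illustration; a commercial matching engine would use far richer dictionaries and algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real tool would ship a much larger dictionary.
NICKNAMES = {"bill": "william", "wm": "william", "will": "william"}


def canonical_name(full_name: str) -> str:
    """Lowercase, strip punctuation and expand known nicknames."""
    tokens = [t.strip(".,").lower() for t in full_name.split()]
    return " ".join(NICKNAMES.get(t, t) for t in tokens if t)


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two canonicalized names."""
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


def _shared(rec_a: dict, rec_b: dict, key: str) -> bool:
    """True only when both records carry the same non-empty value for key."""
    value = rec_a.get(key)
    return value is not None and value == rec_b.get(key)


def same_customer(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Match when the names are close AND an address or phone number agrees."""
    names_close = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    corroborated = _shared(rec_a, rec_b, "address") or _shared(rec_a, rec_b, "phone")
    return names_close and corroborated
```

With this sketch, "Bill Smith" and "William Smithe" at the same address are treated as one customer, while two similar names with no shared address or phone number are left unmatched.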

The Benefits of Creating a Common Master Data List

While creating a clean master list can be a daunting challenge, there are many positive benefits to the bottom line that come from having a common master list, including:

  • A single, consolidated bill, which saves money and improves customer satisfaction
  • No concerns about sending the same marketing literature to a customer from multiple customer lists, which wastes money and irritates the customer
  • A cohesive view of customers across the organization, so users know, before they turn a customer account over to a collection agency, whether that customer owes money to other parts of the organization or, more importantly, whether that customer is another division’s biggest source of business
  • A consolidated view of items to eliminate wasted money and shelf space as well as the risk of artificial shortages that come from stocking the same item under different part numbers

Finally, the movement toward SOA and SaaS makes MDM a critical issue.

For example:

If you create a single customer service that communicates through well-defined XML messages, you may think you have defined a single view of your customers. But if the same customer is stored in five databases with three different addresses and four different phone numbers, what will your customer service return?

Similarly, if you decide to subscribe to a CRM service provided through SaaS, the service provider will need a list of customers for its database. Which list will you send?

For all of these reasons, maintaining a high quality, consistent set of master data for your organization is rapidly becoming a necessity. The systems and processes required to maintain this data are known as Master Data Management.

What is Master Data Management?

Master Data Management (MDM) is the technology, tools and processes that ensure master data is coordinated across the enterprise. MDM provides a unified master data service that delivers accurate, consistent and complete master data across the enterprise and to business partners.

There are a couple of things worth noting in this definition:

  1. MDM is not just a technological problem. In many cases, fundamental changes to business process will be required to maintain clean master data and some of the most difficult MDM issues are more political than technical.
  2. MDM includes both creating and maintaining master data. Investing a lot of time, money and effort in creating a clean, consistent set of master data is a wasted effort unless the solution includes tools and processes to keep the master data clean and consistent as it gets updated and expands over time.

Depending on the technology used, MDM may cover a single domain (customers, products, locations or other) or multiple domains. The benefits of multi-domain MDM include a consistent data stewardship experience, a minimized technology footprint, the ability to share reference data across domains, a lower total cost of ownership and a higher return on investment.

The 6 Disciplines of a Strong MDM Program

Given that MDM is not just a technological problem, meaning you can’t just install a piece of technology and have everything sorted out, what does a strong MDM program entail?

Before you get started with a master data management program, your MDM strategy should be built around these 6 disciplines:

  1. Governance: Directives that manage the organizational bodies, policies, principles and qualities to promote access to accurate and certified master data. Essentially, this is the process through which a cross-functional team defines the various aspects of the MDM program.
  2. Measurement: How are you doing based on your stated goals? Measurement should look at data quality and continuous improvement.
  3. Organization: Getting the right people in place throughout the MDM program, including master data owners, data stewards and those participating in governance.
  4. Policy: The requirements, policies and standards to which the MDM program should adhere.
  5. Process: Defined processes across the data lifecycle used to manage master data.
  6. Technology: The master data hub and any enabling technology.

Getting Started With Your MDM Program

Once you secure buy-in for your MDM program, it’s time to get started. While MDM is most effective when applied to all the master data in an organization, in many cases the risk and expense of an enterprise-wide effort are difficult to justify.

PRO TIP: It is often easier to start with a few key sources of master data and expand the effort once success has been demonstrated and lessons have been learned.

If you do start small, you should include an analysis of all the master data that you might eventually want to include in your program so that you do not make design decisions or tool choices that will force you to start over when you try to incorporate a new data source. For example, if your initial customer master implementation only includes the 10,000 customers your direct sales force deals with, you don’t want to make design decisions that will preclude adding your 10,000,000 web customers later.

Your MDM project plan will be influenced by requirements, priorities, resource availability, time frame and the size of the problem. Most MDM projects include at least these phases:

As you can see, MDM is a complex process that can go on for a long time. Like most things in software, the key to success is to implement MDM incrementally so that the business realizes a series of short-term benefits while the complete project is a long-term process.

Additionally, no MDM project can be successful without the support and participation of the business users. IT professionals do not have the domain knowledge to create and maintain high-quality master data. Any MDM project that does not include changes to the processes that create, maintain and validate master data is likely to fail.

The rest of this article will cover the details of the technology and processes for creating and maintaining master data.

How Do You Create a Master List?

Whether you buy an MDM tool or decide to build your own, there are two basic steps to creating master data:

  1. Cleaning and standardizing the data
  2. Matching data from all the sources to consolidate duplicates

Cleaning and Standardizing Master Data

Before you can start cleaning and normalizing your data, you must understand the data model for the master data. As part of the modeling process, you should have defined the contents of each attribute and defined a mapping from each source system to the master data model. Now, you can use this information to define the transformations necessary to clean your source data.

Cleaning the data and transforming it into the master data model is very similar to the Extract, Transform and Load (ETL) processes used to populate a data warehouse. If you already have ETL tools and transformations defined, it might be easier just to modify these as required for the master data instead of learning a new tool. Here are some typical data cleansing functions:

  • Normalize data formats: Make all the phone numbers look the same, transform addresses and so on to a common format.
  • Replace missing values: Insert defaults, look up ZIP codes from the address, look up the Dun & Bradstreet Number.
  • Standardize values: Convert all measurements to metric, convert prices to a common currency, change part numbers to an industry standard.
  • Map attributes: Parse the first name and last name out of a contact name field, move Part# and partno to the PartNumber field.

Most tools will cleanse the data that they can and put the rest into an error table for hand processing. Depending on how the matching tool works, the cleansed data will be put into a master table or a series of staging tables. As each source gets cleansed, you should examine the output to ensure the cleansing process is working correctly.
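A minimal sketch of such a cleansing pass might look like the following. The column names, the US-style 10-digit phone format and the shape of the error entries are assumptions made for illustration, not the behavior of any particular MDM or ETL tool.

```python
import re

# Hypothetical mapping from source column names to the master data model.
ATTRIBUTE_MAP = {"Part#": "PartNumber", "partno": "PartNumber"}


def normalize_phone(raw: str) -> str:
    """Reduce a US-style phone number to 10 digits; raise if that fails."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        raise ValueError(f"unrecognized phone number: {raw!r}")
    return digits


def cleanse(records: list) -> tuple:
    """Return (clean_rows, error_rows); failures are kept for hand processing."""
    clean, errors = [], []
    for record in records:
        try:
            # Map attributes: rename source columns to the master model's names.
            row = {ATTRIBUTE_MAP.get(k, k): v for k, v in record.items()}
            # Normalize data formats: make all phone numbers look the same.
            row["Phone"] = normalize_phone(row["Phone"])
            clean.append(row)
        except (KeyError, ValueError) as exc:
            errors.append({"record": record, "reason": str(exc)})
    return clean, errors
```

Rows that cannot be cleansed automatically are never dropped. They land in the error list together with the reason, which plays the role of the error table described above.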

Matching Data to Eliminate Duplicates

Matching master data records to eliminate duplicates is both the hardest and most important step in creating master data. False matches can actually lose data (two Acme Corporations become one, for example) and missed matches reduce the value of maintaining a common list.

As a result, the matching accuracy of MDM tools is one of the most important purchase criteria.

Some matches are pretty trivial to do. If you have Social Security Numbers for all your customers or if all your products use a common numbering scheme, a database JOIN will find most of the matches. This hardly ever happens in the real world, however, so matching algorithms are normally very complex and sophisticated. Customers can be matched on name, maiden name, nickname, address, phone number, credit card number and so on, while products are matched on name, description, part number, specifications and price.

PRO TIP: The more attribute matches and the closer the match, the higher the degree of confidence the MDM software has in the match.

This confidence factor is computed for each match, and if it surpasses a threshold, the records match. The threshold is normally adjusted depending on the consequences of a false match.

For example:

You might specify that if the confidence level is over 95 percent, the records are merged automatically, and if the confidence level is between 80 percent and 95 percent, a data steward should approve the match before they are merged.
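The thresholds in this example can be sketched as a small scoring routine. The attribute weights and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration; real MDM products compute confidence with their own, usually more elaborate, algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights (they sum to 1.0).
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}

AUTO_MERGE = 0.95    # above this, merge automatically
NEEDS_REVIEW = 0.80  # from here up to AUTO_MERGE, ask a data steward


def confidence(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-attribute similarity, in the range 0..1."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        left, right = rec_a.get(attr, ""), rec_b.get(attr, "")
        if left and right:  # a missing attribute contributes nothing
            ratio = SequenceMatcher(None, left.lower(), right.lower()).ratio()
            score += weight * ratio
    return score


def decide(score: float) -> str:
    """Map a confidence score to an action."""
    if score > AUTO_MERGE:
        return "merge"
    if score >= NEEDS_REVIEW:
        return "review"
    return "no_match"
```

Raising `AUTO_MERGE` sends more pairs to the data steward, which is the usual adjustment when a false match is costly.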

How Should You Merge Your Data?

Most merge tools merge one set of input into the master list, so the best procedure is to start the list with the data in which you have the most confidence and then merge the other sources in one at a time. If you have a lot of data and a lot of problems with it, this process can take a long time.
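The one-source-at-a-time procedure can be sketched as follows. The `match_key` function is a deliberately simple stand-in for a real matching engine, and the survivorship rule (values from the more trusted source win, later sources only fill gaps) is an assumption chosen for illustration.

```python
def match_key(record: dict) -> tuple:
    """Stand-in for a matching engine: exact match on normalized name + phone."""
    return (record["name"].strip().lower(), record["phone"])


def merge_sources(sources_by_trust: list) -> list:
    """Merge sources into one master list, most trusted source first."""
    master = {}
    for source in sources_by_trust:
        for record in source:
            key = match_key(record)
            if key not in master:
                master[key] = dict(record)  # first sighting becomes the master row
                continue
            # Already present from a more trusted source: only fill empty attributes.
            for attr, value in record.items():
                if master[key].get(attr) in (None, ""):
                    master[key][attr] = value
    return list(master.values())
```

Because earlier sources take precedence, the order of the list passed to `merge_sources` encodes how much confidence you have in each source.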

PRO TIP: You might want to start with the data from which you expect to get the most benefit once it’s consolidated and then run a pilot project with that data to ensure your processes work and that you are seeing the business benefits you expect.

From there, you can start adding other sources as time and resources permit. This approach means your project will take longer and possibly cost more, but the risk is lower. This approach also lets you start with a few organizations and add more as the project demonstrates success instead of trying to get everybody on board from the start.

Another factor to consider when merging your source data into the master list is privacy. When customers become part of the customer master, their information might be visible to any of the applications that have access to the customer master. If the customer data was obtained under a privacy policy that limited its use to a particular application, you might not be able to merge it into the customer master.

Because of implications around privacy, you might want to add a lawyer to your MDM planning team.

At this point, if your goal was to produce a list of master data, you are done. Print it out or save it to an external hard drive and move on. If you want your master data to stay current as data gets added and changed, you will have to develop infrastructure and processes to manage the master data over time.

The next section provides some options on how to do just that.

How Do You Maintain a Master List?

There are many different tools and techniques for managing and using master data. We will cover three of the more common scenarios here:

  1. Single copy: In this approach, there is only one master copy of the master data. All additions and changes are made directly to the master data. All applications that use master data are rewritten to use the new data instead of their current data. This approach guarantees consistency of the master data, but in most cases it’s not practical. That’s because modifying all your applications to use a new data source with a different schema and different data is, at least, very expensive. If some of your applications are purchased, it might even be impossible.
  2. Multiple copies, single maintenance: In this approach, master data is added or changed in the single master copy of the data, but changes are sent out to the source systems in which copies are stored locally. Each application can update the parts of the data that are not part of the master data, but they cannot change or add master data.

    For example:

    The inventory system might be able to change quantities and locations of parts, but new parts cannot be added and the attributes that are included in the product master cannot be changed. This reduces the number of application changes that will be required, but at a minimum the applications will have to disable functions that add or update master data. Users will have to learn new applications to add or modify master data and some of the things they normally do will not work anymore.

  3. Continuous merge: In this approach, applications are allowed to change their copy of the master data. Changes made to the source data are sent to the master, where they are merged into the master list. The changes to the master are then sent to the source systems and applied to the local copies. This approach requires few changes to the source systems. If necessary, the change propagation can be handled in the database so no application code is changed.

    On the surface, this seems like the ideal solution because application changes are minimized and no retraining is required. Everybody keeps doing what they are doing, but with higher quality, more complete data. However, this approach does have several issues:

    • Update conflicts are possible and difficult to reconcile: What happens if two of the source systems change a customer’s address to different values? There’s no way for the MDM software to decide which one to keep, so intervention by the data steward is required. In the meantime, the customer has two different addresses. This must be addressed by creating data governance rules and standard operating procedures to ensure that update conflicts are reduced or eliminated.
    • Additions must be remerged: When a customer is added, there is a chance that another system has already added the customer. To deal with this situation, all data additions must go through the matching process again to prevent new duplicates in the master.
    • Maintaining consistent values is more difficult: If the weight of a product is converted from pounds to kilograms and then back to pounds, rounding can change the original weight. This can be disconcerting to a user who enters a value and then sees it change a few seconds later.
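The rounding issue in the last point is easy to reproduce. The sketch below assumes both systems store weights to two decimal places, which is an illustrative choice; the conversion factor is the exact definition of the international pound.

```python
from decimal import Decimal, ROUND_HALF_UP

KG_PER_LB = Decimal("0.45359237")  # exact by definition
TWO_PLACES = Decimal("0.01")       # assumed storage precision in both systems


def lb_to_kg(pounds: Decimal) -> Decimal:
    """Convert pounds to kilograms, rounded to the stored precision."""
    return (pounds * KG_PER_LB).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


def kg_to_lb(kilograms: Decimal) -> Decimal:
    """Convert kilograms to pounds, rounded to the stored precision."""
    return (kilograms / KG_PER_LB).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


entered = Decimal("10.00")  # what the user typed, in pounds
stored = lb_to_kg(entered)  # what the master keeps, in kilograms
shown = kg_to_lb(stored)    # what the source system gets back, in pounds

print(entered, stored, shown)  # 10.00 4.54 10.01
```

The user entered 10.00 lb and sees 10.01 lb after the round trip. Common mitigations include keeping the originally entered value and unit alongside the converted one, or storing more precision in the master than any source system displays.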

In general, all these things can be planned for and dealt with, making the user’s life a little easier at the expense of a more complicated infrastructure to maintain and more work for the data stewards. This might be an acceptable trade-off, but it’s one that should be made consciously.

A Few Thoughts On Versioning and Auditing

No matter how you manage your master data, it’s important to be able to understand how the data got to the current state.

For example:

If a customer record was consolidated from two different merged records, you might need to know what the original records looked like in case a data steward determines that the records were merged by mistake and should really be two different customers. The version management should include a simple interface for displaying versions and reverting all or part of a change to a previous version.

The normal branching of versions and grouping of changes that source control systems use can also be very useful for maintaining different derivation chains and reverting groups of changes to a previous branch. Data stewardship and compliance requirements will often include a way to determine who made each change and when it was made.

To support these requirements, MDM software should include a facility for auditing changes to the master data. In addition to keeping an audit log, it should include a simple way to find the particular change you are looking for. MDM software can audit thousands of changes a day, so search and reporting facilities for the audit log are important.
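To make the versioning and auditing requirements concrete, here is a minimal sketch of an audited record store. The class and field names are hypothetical, and the in-memory log stands in for what would be a persistent, indexed audit table in a real system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    attribute: str
    old_value: Any
    new_value: Any
    changed_by: str       # who made the change
    changed_at: datetime  # when it was made (UTC)


class AuditedStore:
    """Keeps current master records plus an append-only log of every change."""

    def __init__(self) -> None:
        self.records: dict = {}
        self.log: list = []

    def set(self, record_id: str, attribute: str, value: Any, user: str) -> None:
        record = self.records.setdefault(record_id, {})
        self.log.append(AuditEntry(record_id, attribute, record.get(attribute),
                                   value, user, datetime.now(timezone.utc)))
        record[attribute] = value

    def history(self, record_id: str, attribute: Optional[str] = None) -> list:
        """Find the changes for a record, optionally narrowed to one attribute."""
        return [e for e in self.log if e.record_id == record_id
                and (attribute is None or e.attribute == attribute)]

    def revert_last(self, record_id: str, attribute: str, user: str) -> None:
        """Restore the previous value; the revert is itself audited."""
        entries = self.history(record_id, attribute)
        if not entries:
            raise KeyError(f"no changes recorded for {record_id}.{attribute}")
        self.set(record_id, attribute, entries[-1].old_value, user)
```

Recording the revert as a new log entry, rather than deleting the old one, preserves the full trail of who changed what and when.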

A Few Thoughts On Hierarchy Management

In addition to the master data itself, the MDM software must maintain data hierarchies, for example, bills of materials for products, sales territory structure, organization structure for customers and so forth. It’s important for the MDM software to capture these hierarchies, but it’s also useful for it to be able to modify the hierarchies independently of the underlying systems.

For example:

When an employee moves to a different cost center, there might be impacts to the Travel and Expense system, payroll, time reporting, reporting structures and performance management. If the MDM software manages hierarchies, a change to the hierarchy in a single place can propagate the change to all the underlying systems.

There might also be reasons to maintain hierarchies in the MDM software that do not exist in the source systems.

For example:

Revenue and expenses might need to be rolled up into territory or organizational structures that do not exist in any single source system. Planning and forecasting might also require temporary hierarchies to calculate “what if” numbers for proposed organizational changes. Historical hierarchies are also required in many cases to roll up financial information into structures that existed in the past, but not in the current structure.

For these reasons, a powerful, flexible hierarchy management feature is an important part of MDM software.
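The roll-up and “what if” scenarios above can be sketched with a simple child-to-parent mapping. The territory names and revenue figures are invented for illustration, and a real MDM hierarchy would also carry effective dates to support historical structures.

```python
from collections import defaultdict

# Hypothetical sales territory hierarchy: child -> parent (None marks the root).
PARENT = {
    "Global": None,
    "EMEA": "Global",
    "Americas": "Global",
    "UK": "EMEA",
    "Germany": "EMEA",
    "US": "Americas",
}


def roll_up(leaf_values: dict, parent: dict) -> dict:
    """Add each leaf amount to the leaf itself and to every ancestor above it."""
    totals = defaultdict(float)
    for node, amount in leaf_values.items():
        current = node
        while current is not None:
            totals[current] += amount
            current = parent[current]
    return dict(totals)


revenue = {"UK": 120.0, "Germany": 80.0, "US": 300.0}
totals = roll_up(revenue, PARENT)

# "What if" the UK were moved under Americas? Use a temporary copy of the
# hierarchy so the real structure stays untouched.
what_if = roll_up(revenue, {**PARENT, "UK": "Americas"})
```

Because the hierarchy is data rather than code, a proposed reorganization is just another mapping passed to the same roll-up.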

Who Should Be Involved in Your MDM Program?

Now that you understand the what and why, let’s talk about the who. There are several different ways to think about who to involve in an MDM program. First, let’s take a high-level look at three core roles:

  1. Data Governance: Individuals who drive the definition, requirements and solution. These users help administrators know what to create and data stewards know what to manage and how to manage it.

    Data governance users dictate to data stewards how data should be managed, including the processes for doing so, and then hold the data stewards accountable for following those requirements. Data governance users also dictate to administrators what to create during the implementation of the MDM solution, especially from a data matching and quality perspective.

    Data governance users also need to maintain a feedback loop from the MDM software to ensure everything is working as expected. This feedback covers the measurement perspective of the MDM program and might include information like:

    • How long does it take to onboard a new customer?
    • Is that process getting faster or slower?
    • How is the company doing compared to its SLA?
    • If there are any areas that are slipping, why is that happening?
    • How well is the data matching working?
    • How many business rules are failing from a data quality perspective?
  2. Administrators: Individuals in IT who are responsible for setting up and configuring the solution.
  3. Data Stewards: Boots-on-the-ground individuals responsible for fixing, cleaning and managing the data directly within the solution. Ideally, data stewards come from departments across the business, such as finance and marketing. Typically, the activities that data stewards take on within the MDM program are defined by data governance users.

Other MDM roles, which vary by organization and project type, can include:

Master Data Management Stakeholders:

Aside from the roles that execute and manage an MDM strategy, one of the keys to a successful MDM project is active commitment by the key stakeholders. The stakeholders for a typical MDM engagement include those representing both the business and IT. Active stakeholders usually include, but are not limited to, the following types of roles:

  • Business or IT Executive Sponsor
  • IT Project Lead
  • Subject-matter experts from the impacted Line-of-business
  • Data Stewards
  • IT delivery team

As MDM stakeholders are defined throughout an organization, it is critical to secure their engagement and commitment to the organization’s MDM journey. Through multiple implementations, Profisee has identified several “Health” indicators to help determine the MDM stakeholder impact:

Healthy Signs

  • Executive incentives tied to project results
  • Investments in change management and training
  • Subject matter experts dedicated full-time
  • The right sponsor is appropriately engaged and funded
  • Regular Steering Committee meetings are being held, decisions and actions are being taken in a timely fashion and are effective
  • All appropriate stakeholder groups are effectively represented and engaged

Unhealthy Signs

  • No executive sponsor visible
  • Resistance to new ideas
  • No “experts” available

Master Data Management Steering Committee

It’s recommended that management-level representation from the MDM stakeholders form a Steering Committee to facilitate cross-functional decision-making. Here are a few characteristics of an effective Steering Committee:

  • Be sized appropriately: big enough to represent the priority stakeholders, but small enough to quickly analyze key information and make decisions
  • Be focused on fast decision-making
  • Be a vehicle for removing organizational barriers and not simply a regular meeting for listening to reporting from the Project Team members
  • Not be a substitute for hands-on Sponsorship

Once the stakeholders are identified, the MDM Project Charter should include formation of a Steering Committee. Based on running hundreds of MDM projects, Profisee recommends the following roles participate in the Steering Committee. Note that there may be more than one team member per role, or some roles may not be applicable to a company’s organizational structure.


While it’s easy to think of master data management as a technological issue, a purely technological solution without corresponding changes to business processes and controls will likely fail to produce satisfactory results.

This article has covered the reasons for adopting master data management, the process of developing a solution, several options for the technological implementation of the solution and who should be involved along the way to make sure the program runs smoothly.

This article is an update of the original article titled “The What, Why, and How of Master Data Management” by Kirk Haselden and Roger Wolter, originally published in 2006. Special thanks to Roger and Kirk for their contributions, and for allowing Profisee to republish their article, with updates for today.

