Key terms

The following terms come up frequently in MDM, and you are likely to encounter them when working on MDM scenarios as a Salesforce Data Architect.

Harmonizing data

Data harmonization is the practice of taking data from multiple sources and processing it (often through machine learning and automation) into a standard, accurate, and comprehensive format, so that anyone interacting with the data can do so in the same way (using a common data language). For example, everyone refers to a product's identifier as a Stock Keeping Unit (SKU), rather than calling it a product code internally and a product externally.
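As a minimal sketch of the SKU example, the following Python snippet renames source-specific field names to one common name and normalizes the value. The field names (product_code, product, sku), the alias table, and the normalization rule are assumptions made for illustration; they do not come from any particular MDM product.

```python
# Hypothetical mapping from source-specific field names to the common vocabulary.
FIELD_ALIASES = {
    "product_code": "sku",  # name assumed to be used internally
    "product": "sku",       # name assumed to be used externally
    "sku": "sku",
}


def harmonize(record: dict) -> dict:
    """Rename known aliases to the common field name and normalize SKU values."""
    harmonized = {}
    for field, value in record.items():
        common_name = FIELD_ALIASES.get(field, field)
        if common_name == "sku":
            # Assumed normalization rule: trim whitespace and upper-case the code.
            value = str(value).strip().upper()
        harmonized[common_name] = value
    return harmonized


internal = {"product_code": " ab-123 ", "price": 10}
external = {"product": "AB-123", "price": 10}

# Both sources now describe the product in the same way.
assert harmonize(internal) == harmonize(external)
```

In practice the alias table and normalization rules would be owned by the business as part of data governance, rather than hard-coded.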

Consolidating data

As the name suggests, data consolidation is the practice of bringing data from several sources together into a centrally managed data master or hub. This is slightly different from the golden record: the golden record refers to the surfacing of information, whereas data consolidation refers to the actual management of the consolidated data (although it is perfectly acceptable to surface golden records from the MDM data hub).

Data survivorship

Data survivorship is concerned with what happens to data that is identified as a duplicate across the IT enterprise. Typically, data record survivorship can be categorized as follows:

  • A survival-of-the-fittest approach: The record deemed most suitable is taken as the one to use.
  • Forming a golden record: The suitably identified pieces of data (as determined by business rules on data quality) are brought together to form a single source of truth.
  • Context-aware survivorship: Duplicate data records aren’t actually altered, in that they aren’t merged into other records, archived, or deleted. This is typically used in enterprises that need to form different views of data (think different golden records depending on the context in which data is being viewed or interacted with). As you would expect, this approach involves an extra layer of data management and, therefore, may have performance concerns associated with it.
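The first two approaches can be sketched in Python as follows. This is an illustration only: the sample records, the "most complete, then most recent" fitness rule, and the "newest non-empty value wins" rule for the golden record are all assumed business rules, and real rules would be defined by the enterprise.

```python
from datetime import date

# Hypothetical duplicate records for the same customer, from two systems.
duplicates = [
    {"source": "crm", "updated": date(2023, 1, 10),
     "email": "a@example.com", "phone": None},
    {"source": "erp", "updated": date(2023, 6, 2),
     "email": None, "phone": "555-0100"},
]

DATA_FIELDS = ("email", "phone")


def completeness(record: dict) -> int:
    """Count how many data fields are populated."""
    return sum(1 for f in DATA_FIELDS if record[f] is not None)


def fittest(records: list) -> dict:
    """Survival of the fittest: keep one whole record.

    Assumed rule: most complete record wins, ties go to the most recent.
    """
    return max(records, key=lambda r: (completeness(r), r["updated"]))


def golden_record(records: list) -> dict:
    """Golden record: assemble the best value for each field.

    Assumed rule: take the most recently updated non-empty value.
    """
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    return {
        f: next((r[f] for r in newest_first if r[f] is not None), None)
        for f in DATA_FIELDS
    }


winner = fittest(duplicates)
golden = golden_record(duplicates)
```

Here the survival-of-the-fittest result is the whole erp record (losing the email address), whereas the golden record combines the email from one source with the phone number from the other. Context-aware survivorship would instead leave both records untouched and apply rules like these at read time, per context.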

Thresholds and weights

When matching data, weights are the values assigned to the attributes being compared between data records, and thresholds determine the action to be taken on those records. For example, a threshold may cause two records to be linked automatically, or flag two records for manual intervention. Weights are used to calculate the matching score that is compared against the thresholds. For example, a matching email address, postcode, and surname would give a higher score than a match purely on first name, because first names may be common across the data set. Such a high-scoring match may meet a threshold that automatically links the records together.
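The relationship between weights and thresholds can be sketched as a simple scoring function. The weight values, the threshold values, and the action names below are invented for illustration; actual values are tuned per data set.

```python
# Hypothetical weights: points awarded when a field matches exactly.
WEIGHTS = {"email": 50, "postcode": 20, "surname": 20, "first_name": 10}

# Hypothetical thresholds on the total score.
AUTO_LINK_THRESHOLD = 80
MANUAL_REVIEW_THRESHOLD = 40


def match_score(a: dict, b: dict) -> int:
    """Sum the weights of all fields that are populated and equal in both records."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )


def decide(score: int) -> str:
    """Map a matching score to an action using the thresholds."""
    if score >= AUTO_LINK_THRESHOLD:
        return "auto-link"
    if score >= MANUAL_REVIEW_THRESHOLD:
        return "manual-review"
    return "no-match"


rec_a = {"first_name": "Sam", "surname": "Lee",
         "email": "sam@example.com", "postcode": "AB1 2CD"}
rec_b = {"first_name": "Samuel", "surname": "Lee",
         "email": "sam@example.com", "postcode": "AB1 2CD"}
rec_c = {"first_name": "Sam", "surname": "Jones",
         "email": "sj@example.com", "postcode": "ZZ9 9ZZ"}

strong = decide(match_score(rec_a, rec_b))  # email + postcode + surname match
weak = decide(match_score(rec_a, rec_c))    # only the first name matches
```

With these assumed values, the email, postcode, and surname match scores 90 and is linked automatically, while the first-name-only match scores 10 and is treated as no match.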

Canonical modeling

A canonical data model is a way of modeling data in its simplest form. Typically, this is used in a middleware scenario in which data is modeled differently in each connected system. A canonical data model can be used as a common format that system-specific data models can be translated into or generated from.

An example of a canonical data model in use would be a customer master (Salesforce) in which a new contact must also be created in two different connected systems through a middleware layer. Because the two connected systems represent contacts in different formats, the middleware takes the Salesforce contact and converts it into a canonical (or simplest) format, from which it is transformed into the system-specific format of each system in which the contact needs to be created. New connected systems can immediately take advantage of this modeling logic.
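The middleware flow described here might look like the following Python sketch. FirstName, LastName, and Email are standard Salesforce Contact fields; the canonical class and the two target systems ("billing" and "support") with their formats are hypothetical, made up purely to show the shape of the pattern.

```python
from dataclasses import dataclass


@dataclass
class CanonicalContact:
    """Hypothetical canonical (simplest) representation of a contact."""
    first_name: str
    last_name: str
    email: str


def from_salesforce(contact: dict) -> CanonicalContact:
    """Convert a Salesforce Contact payload into the canonical format."""
    return CanonicalContact(
        first_name=contact["FirstName"],
        last_name=contact["LastName"],
        email=contact["Email"],
    )


def to_billing(contact: CanonicalContact) -> dict:
    """Transform to a hypothetical billing system's flat format."""
    return {
        "full_name": f"{contact.first_name} {contact.last_name}",
        "email_address": contact.email,
    }


def to_support(contact: CanonicalContact) -> dict:
    """Transform to a hypothetical support system's nested format."""
    return {
        "name": {"given": contact.first_name, "family": contact.last_name},
        "contact_email": contact.email,
    }


sf_contact = {"FirstName": "Ada", "LastName": "Lovelace",
              "Email": "ada@example.com"}
canonical = from_salesforce(sf_contact)
billing_payload = to_billing(canonical)
support_payload = to_support(canonical)
```

Connecting a further system then only requires one new transformation to or from CanonicalContact, rather than a separate mapping for every pair of systems.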