The following terms are commonly used in the field of MDM. As a Salesforce Data Architect, you may come across them when dealing with MDM scenarios.
Data harmonization is the practice of taking data from multiple sources and processing it (often through machine learning and automation) into a standard, accurate, and comprehensive format, so that anyone interacting with the data can do so in the same way (using a common data language). For example, a product might consistently be called a Stock Keeping Unit (SKU), instead of a product code internally and a product externally.
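The SKU example can be illustrated with a minimal sketch. The alias table, the field names, and the harmonize function are assumptions made for illustration only; they do not come from any particular MDM product.

```python
# Hypothetical aliases: different names that sources use for the same concept.
FIELD_ALIASES = {
    "product_code": "sku",  # internal name (assumed)
    "product": "sku",       # external name (assumed)
    "sku": "sku",
}


def harmonize(record: dict) -> dict:
    """Rename aliased fields to the common term and normalize SKU values."""
    harmonized = {}
    for key, value in record.items():
        common_key = FIELD_ALIASES.get(key.strip().lower(), key)
        if common_key == "sku" and isinstance(value, str):
            # Normalize formatting so the same product compares as equal.
            value = value.strip().upper()
        harmonized[common_key] = value
    return harmonized


internal = harmonize({"product_code": " ab-123 "})
external = harmonize({"Product": "AB-123", "price": 9.99})
```

After harmonization, both records expose the product under the same sku key with the same value, so downstream consumers can treat them alike.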
As the name suggests, data consolidation is the practice of bringing together data from several data sources into a centrally managed data master or hub. This is slightly different from the golden record: the golden record refers to the surfacing of information, whereas data consolidation refers to the actual management of consolidated data (although it is perfectly acceptable to surface golden records from the MDM data hub).
Data survivorship is concerned with what happens to data that is identified as a duplicate across the IT enterprise. Typically, data record survivorship can be categorized as follows:
When matching data, weights are the values given to matches between data records, and thresholds determine the action to be taken on those records. For example, a threshold may cause two records to be linked automatically, or flag two records for manual intervention. Weights are used to calculate the matching score that is compared against the thresholds. For example, a matched email address, postcode, and surname would give a higher score than matching two records purely on a first name, as first names may be common across the data set. Such a high-scoring match may meet a threshold that automatically links the records together.
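The interplay of weights and thresholds can be sketched as follows. The field names, weight values, and threshold values are all hypothetical and chosen only to mirror the example above; a real matching engine would tune them to its data and would typically use fuzzier comparisons than exact equality.

```python
# Hypothetical weights per matched field (illustrative values only).
FIELD_WEIGHTS = {"email": 50, "postcode": 20, "surname": 20, "first_name": 10}

# Hypothetical thresholds that decide the action taken on a candidate pair.
AUTO_LINK_THRESHOLD = 80
MANUAL_REVIEW_THRESHOLD = 40


def match_score(a: dict, b: dict) -> int:
    """Sum the weights of every field whose normalized values agree."""
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        left, right = a.get(field), b.get(field)
        if left and right and left.strip().lower() == right.strip().lower():
            score += weight
    return score


def decide(score: int) -> str:
    """Map a matching score to an action using the thresholds."""
    if score >= AUTO_LINK_THRESHOLD:
        return "auto_link"
    if score >= MANUAL_REVIEW_THRESHOLD:
        return "manual_review"
    return "no_match"


rec_a = {"first_name": "Sam", "surname": "Patel",
         "email": "sam@example.com", "postcode": "AB1 2CD"}
rec_b = {"first_name": "Samuel", "surname": "patel",
         "email": "SAM@example.com", "postcode": "AB1 2CD"}
rec_c = {"first_name": "Sam", "surname": "Jones",
         "email": "sj@example.com", "postcode": "ZZ9 9ZZ"}

strong = decide(match_score(rec_a, rec_b))  # email, postcode, and surname agree
weak = decide(match_score(rec_a, rec_c))    # only the first name agrees
```

With these assumed values, the pair that agrees on email, postcode, and surname crosses the automatic-link threshold, while the pair that shares only a first name falls below the manual-review threshold.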
A canonical data model is a way of modeling data in its simplest form. Typically, it is used in a middleware scenario in which data is modeled differently in each source system. A canonical data model can be used as a common format that system-specific data models can be generated from or fed into.
An example of a canonical data model in use would be a customer master (Salesforce) in which a new contact needs to be created in two different connected systems through a middleware layer. Because the two connected systems represent contacts in different formats, the middleware takes the Salesforce contact and converts it into a canonical (or simplest) format, from which it can be transformed into the system-specific format of each system in which the contact must be created. New connected systems can immediately take advantage of this modeling logic.
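The contact scenario can be sketched roughly as follows. The two target systems ("billing" and "support") and their payload shapes are hypothetical, as are the function names; only FirstName, LastName, and Email are standard Salesforce Contact fields. This is an illustration of the pattern, not a description of any specific middleware product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalContact:
    """Canonical (simplest) representation shared by all transformations."""
    first_name: str
    last_name: str
    email: str


def from_salesforce(record: dict) -> CanonicalContact:
    """Convert a Salesforce Contact record into the canonical format."""
    return CanonicalContact(
        first_name=record.get("FirstName") or "",
        last_name=record["LastName"],
        email=record.get("Email") or "",
    )


def to_billing(contact: CanonicalContact) -> dict:
    """Hypothetical target system that stores a single full-name field."""
    full_name = f"{contact.first_name} {contact.last_name}".strip()
    return {"full_name": full_name, "email_address": contact.email}


def to_support(contact: CanonicalContact) -> dict:
    """Hypothetical target system that stores split names and an email list."""
    return {
        "givenName": contact.first_name,
        "familyName": contact.last_name,
        "emails": [contact.email] if contact.email else [],
    }


sf_contact = {"FirstName": "Ada", "LastName": "Lovelace",
              "Email": "ada@example.com"}
canonical = from_salesforce(sf_contact)
billing_payload = to_billing(canonical)
support_payload = to_support(canonical)
```

Under this design, connecting a further system only requires one new transformation from CanonicalContact, rather than a new mapping from every existing system.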