In this chapter, we’ll begin our journey into the theory behind the credential syllabus, starting with Data Modelling and Database Design. You will learn how a Salesforce data model is architected, the design decisions behind it, and how a data model you may create in your job as a Salesforce Data Architect affects sharing and security. We’ll go over the Salesforce sharing model so that you have an overview of what is available when considering options around data access.
Next, we’ll look at the different types of objects (such as standard, custom, big, and external), and how they fit into a data model design strategy. While we’re there, we’ll cover the different relationship types so that you can understand when to apply them.
Turning to performance, we’ll cover data skew to provide you with an understanding of this concept in Salesforce. We’ll then cover diagramming techniques so that you can begin to effectively design and represent Salesforce data models.
In this chapter, we’ll be covering the following topics:
Salesforce abstracts the underlying data structure in its database for a given customer org by providing a faux RDBMS view of the objects within it. For example, take the diagram you find when viewing the Sales Cloud ERD at https://architect.salesforce.com/diagrams/template-gallery/sales-cloud-overview-data-model:
Figure 2.1 – The Salesforce Sales Cloud data model
As we can see, a contact record has an AccountId field, represented as a lookup relationship (which would be a foreign key in RDBMS terminology). Take this example for Account and Contact from Schema Builder for a brand-new Developer Edition org:
Figure 2.2 – Account and Contact represented in Schema Builder in a brand-new Developer Edition org
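That lookup can be traversed from the child record to its parent in a query using relationship (dot) notation. As an illustrative sketch, the following SOQL retrieves a handful of contacts along with the name of each contact's parent account (all field names here are standard Account/Contact schema):

SELECT Id, LastName, AccountId, Account.Name FROM Contact WHERE AccountId != null LIMIT 5

Here, AccountId returns the raw foreign key value, while Account.Name follows the relationship to the parent record, much as a JOIN would in a relational database.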
See for Yourself
Feel free to sign up for a (completely free) Developer Edition Salesforce org at https://developer.salesforce.com/signup in order to follow along with the examples in this book, including exploring Schema Builder.
Salesforce would have you think that Account and Contact exist as two separate database tables (as if there were a separate table for each standard or custom object in the schema). Indeed, it is perfectly normal to think about the objects in this way in your day-to-day work as a Salesforce Data Architect.
The truth is, however, that Salesforce has a few large, underlying database tables (one for objects, one for fields, one for data that maps to those objects and fields, pivot tables, and so on) that provide a virtual data structure for each org, materialized at runtime. One virtual storage area contains standard objects and standard fields, another contains standard objects and custom fields, and another contains custom objects and their custom fields. This is explained in further detail at https://developer.salesforce.com/wiki/multi_tenant_architecture. In essence, the standard objects/standard fields storage area is populated when a new org is provisioned, and the custom storage areas are added to whenever a new custom object or custom field is created (no matter whether that custom field is added to a standard or custom object). This structure can therefore be represented as follows, imagining a custom field is added to each of the standard Account and Contact objects:
Figure 2.3 – Representation of the Salesforce database (object data only) with a custom field added to the Account and Contact standard objects
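To make the idea of runtime materialization concrete, here is a minimal Python sketch (emphatically not Salesforce's actual implementation) of a metadata-driven store: a field-metadata table records which storage area holds each field, and a virtual record is assembled by consulting every relevant area. All names and values are invented for illustration.

```python
# Field metadata: which fields exist on which object, and which storage area holds them.
FIELDS = [
    {"object": "Account", "field": "Name", "store": "std_obj_std_fields"},
    {"object": "Account", "field": "Region__c", "store": "std_obj_custom_fields"},
]

# Data storage areas, keyed by (object, record ID). Values are field/value maps.
STORES = {
    "std_obj_std_fields": {("Account", "001A"): {"Name": "Acme"}},
    "std_obj_custom_fields": {("Account", "001A"): {"Region__c": "EMEA"}},
}

def materialize(obj: str, record_id: str) -> dict:
    """Assemble a virtual record by joining field metadata with each storage area."""
    row = {}
    for f in FIELDS:
        if f["object"] == obj:
            area = STORES[f["store"]]
            row[f["field"]] = area.get((obj, record_id), {}).get(f["field"])
    return row

print(materialize("Account", "001A"))  # {'Name': 'Acme', 'Region__c': 'EMEA'}
```

The point of the sketch is that no single physical "Account table" exists: the record a user sees is stitched together at runtime from the metadata and the separate storage areas, which is exactly why adding a custom field changes where data is read from.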
Let’s now add a custom object to see how the virtual storage is affected and what that change looks like in Schema Builder:
Figure 2.4 – Representation of the Salesforce database (object data only) with a custom object added to the org and a custom field added to that new custom object
Figure 2.5 includes both the custom object and the custom field in the schema:
Figure 2.5 – Schema Builder with the representation of the newly added custom object and custom field
Armed with the knowledge of how the underlying Salesforce data model is represented (which is important for performance tuning and the like, covered later in this book), we can now turn our attention to what standard objects are available in Salesforce. These objects have been created to address common use cases (such as accounts to represent companies and contacts to represent people), and therefore should be used as much as possible. This not only avoids reinventing the wheel, but also takes advantage of standard functionality that Salesforce has built for specific use cases, such as account teams. Account teams extend the sharing model for accounts to allow bespoke sharing among a team of users who will collaboratively work on an account. Similar functionality is available for cases and opportunities (called case teams and opportunity teams, respectively). This teams functionality is an example of a standard feature supported in the Salesforce data model that isn't extended to custom objects.
An exhaustive listing of the available objects in Salesforce and their fields is provided by the API documentation (available at https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_objects_list.htm), but some of the most commonly used objects are as follows:
Standard Object Data Model Diagrams
The Salesforce standard data models are available at https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/data_model.htm (remember, they show the logical representation of the objects and their relationships, not the physical database structure).
Now that we are aware of the standard objects, and of the three storage areas that facilitate standard objects/standard fields, standard objects/custom fields, and custom objects/custom fields, we can turn our attention to understanding how data queries work, paying particular attention to performance across these tables.
When issuing a query to the Salesforce database for data, whether via Salesforce Object Query Language (SOQL) or a REST API call, that request is turned into system-generated SQL for the underlying database. It is not possible to tune the generated SQL directly, but being mindful of how our data queries shape that SQL gives us something we can influence. It is within these parameters that we can optimize our data queries.
For example, consider this SOQL query for the first 10 account record names as they would appear in an alphabetically sorted list:
SELECT Name FROM Account ORDER BY Name ASC LIMIT 10
Given that we’re querying a standard field from a standard object, Salesforce will only need to retrieve data from the standard objects/standard fields table for Account.
When adding a custom field into the mix (which we'll assume is unimaginatively called CustomField__c for the purposes of this explanation), we're asking the underlying database query to perform a JOIN so that data from the standard objects/custom fields table can be retrieved. Therefore, a query as simple as this will cause data to be retrieved from more than one underlying database table:
SELECT Name, CustomField__c FROM Account ORDER BY Name ASC LIMIT 10
Now consider a third scenario. We are going to query a custom object (called CustomObj__c) that has a lookup to an account (via a field called Account__c), and also place the name of the account and the custom account field from the last example in the results. Note that in SOQL, a custom lookup field is traversed using the __r relationship suffix:
SELECT Name, Account__r.Name, Account__r.CustomField__c FROM CustomObj__c ORDER BY Name ASC LIMIT 10
We've now queried the custom objects/custom fields, standard objects/custom fields, and standard objects/standard fields tables, all to retrieve 10 rows' worth of data from just three fields!
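We can sanity-check that claim with a small Python sketch (a hypothetical helper, not a real Salesforce API) that classifies each queried field by the storage area described earlier and counts the distinct areas touched:

```python
def storage_area(object_is_custom: bool, field_is_custom: bool) -> str:
    """Return the virtual storage area holding a field's data, per the model above."""
    if object_is_custom:
        return "custom objects/custom fields"
    if field_is_custom:
        return "standard objects/custom fields"
    return "standard objects/standard fields"

# The three fields from the CustomObj__c query: (field, object is custom, field is custom)
queried = [
    ("CustomObj__c.Name",         True,  False),  # the row lives in custom object storage
    ("Account__r.Name",           False, False),  # standard field on a standard object
    ("Account__r.CustomField__c", False, True),   # custom field on a standard object
]

areas = {storage_area(obj_custom, fld_custom) for _, obj_custom, fld_custom in queried}
print(len(areas))  # 3
```

Each additional storage area in the set corresponds to another JOIN in the system-generated SQL, which is why seemingly trivial queries can fan out across the underlying tables.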
It is therefore best practice to design our data access queries to avoid joins as much as possible, ensuring we get the best performance we can. Given the way in which the underlying Salesforce data is structured, a well-designed data model (one that minimizes cross-table joins) means reports, dashboards, and list views will perform well.
Chapter 6, Understanding Large Data Volumes, describes the issues that can arise with large amounts of data (particularly with performance) and how to mitigate them.
With the Salesforce data model now introduced, let’s turn our attention to sharing and security, which affect how data is accessed.