Data Understanding and Visualization – MLS-C01 Study Guide

Data visualization is an art! No matter how much effort you and your team put into data preparation and preliminary analysis for modeling, if you don’t know how to show your findings effectively, your audience may not understand the point you are trying to make.

The problem is often worse when your audience consists of decision-makers. For example, if you choose the wrong set of charts to tell a particular story, people can misinterpret your analysis and make bad decisions.

Understanding the different types of data visualizations, and knowing which one fits each type of analysis, puts you in a good position to engage your audience and convey the information you want.

In this chapter, you will learn about several data visualization techniques. The following topics are covered:

  • Visualizing relationships in your data
  • Visualizing comparisons in your data
  • Visualizing compositions in your data
  • Visualizing distributions in your data
  • Building key performance indicators
  • Introducing QuickSight

You already know why you need to master these topics. Get started!

Visualizing relationships in your data

When you need to show relationships in your data, you are usually talking about plotting two or more variables in a chart to visualize their level of dependency. A scatter plot is probably the most common type of chart to show the relationship between two variables. Figure 5.1 shows a scatter plot for two variables, X and Y.

Figure 5.1 – Plotting relationships with a scatter plot

Figure 5.1 shows a clear relationship between X and Y: as X increases, Y also increases. In this particular case, you can say that there is a linear relationship between the two variables. Keep in mind that scatter plots can also reveal other types of relationships, not only linear ones. For example, you might find an exponential relationship between the two variables.
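To make this concrete, here is a minimal sketch of how a scatter plot like the one described above could be produced with Matplotlib. The data is synthetic and the names (make_linear_data, plot_scatter, the slope, intercept, and noise values, and the output file name) are assumptions made for illustration only; they are not taken from the figure itself.

```python
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless; omit in a notebook
import matplotlib.pyplot as plt


def make_linear_data(n=50, slope=2.0, intercept=1.0, noise=1.0, seed=42):
    """Return hypothetical (xs, ys) lists where y grows roughly linearly with x."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Gaussian noise keeps the points from falling on a perfect line.
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys


def plot_scatter(xs, ys):
    """Draw one point per (x, y) pair and label both axes."""
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_title("Relationship between X and Y")
    return fig, ax


xs, ys = make_linear_data()
fig, ax = plot_scatter(xs, ys)
fig.savefig("scatter.png")  # hypothetical output file name
```

If the points rise from the lower left to the upper right along a roughly straight band, the relationship is positive and approximately linear. Replacing the formula for ys with something like math.exp(0.5 * x) would instead produce the curved pattern typical of an exponential relationship.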

Another useful chart for showing relationships is the bubble chart. Like a scatter plot, it shows the relationship between two variables; in addition, it encodes a third variable in the size of each point.

Figure 5.2 is a bubble chart that illustrates an investment scheme, where the x axis is the annual rate, the y axis is the investment period, and the size of each bubble indicates the amount allocated to that investment option.

Figure 5.2 – Plotting relationships with a bubble chart
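A bubble chart of this kind can be sketched with the same Matplotlib scatter call by passing a list of marker sizes. The investment options below are made-up values chosen only to mirror the axes described for Figure 5.2; the scale_sizes helper, the max_area value, and the output file name are likewise assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical investment options: (annual rate in %, period in years, amount invested)
investments = [
    (2.0, 1, 1_000),
    (3.5, 2, 5_000),
    (5.0, 3, 10_000),
    (6.5, 5, 20_000),
    (8.0, 10, 50_000),
]


def scale_sizes(amounts, max_area=2000.0):
    """Map amounts to marker areas so that bubble area is proportional to the amount."""
    largest = max(amounts)
    return [max_area * amount / largest for amount in amounts]


rates = [rate for rate, _, _ in investments]
periods = [period for _, period, _ in investments]
amounts = [amount for _, _, amount in investments]
sizes = scale_sizes(amounts)

fig, ax = plt.subplots()
# The s argument is the marker area in points squared, one value per point.
ax.scatter(rates, periods, s=sizes, alpha=0.5)
ax.set_xlabel("Annual rate (%)")
ax.set_ylabel("Investment period (years)")
ax.set_title("Investment options (bubble size = amount invested)")
fig.savefig("bubble.png")  # hypothetical output file name
```

Scaling the marker area, rather than the radius, in proportion to the amount is a deliberate choice in this sketch: readers tend to judge bubbles by area, so scaling the radius would exaggerate the larger investments.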

Looking at Figure 5.2, you can see two relationships. The first is between the annual rate and the investment period: the longer the investment period, the higher the annual rate. The second is between the amount invested and the annual rate: the higher the amount invested, the higher the annual rate. A single bubble chart conveys both relationships at once, which makes it an effective way to present this type of analysis. Next, you will learn how to compare variables.