Humans are wired to look for connections and then use those connections to learn about the world around them. One way to notice connections is to look for a pair of variables that have a relationship. To learn how the variables are related, we want to control one of the variables and see whether the other variable changes. For example, if we notice that people who eat many calories also have a higher chance of having a heart attack, we might wonder whether lowering our calorie intake would improve our health.
One common mistake people make when using statistics is assuming that every relationship between two variables is causal. A scatter plot can show only that two variables are related; it cannot show why. To determine whether a change in one variable actually causes a change in the other (that is, whether the relationship is causal), we must understand the context better and rule out other explanations.
For example, we might expect to see a strong, positive relationship between the number of snowboard rentals and sales of hot chocolate during the months of September through January. This does not mean that an increase in snowboard rentals causes people to buy more hot chocolate, nor that increased sales of hot chocolate cause people to rent more snowboards. More likely, a third variable, such as colder weather, is driving both increases at the same time.
On the other hand, sometimes there is a causal relationship. A strong, positive relationship between hot chocolate sales and small marshmallow sales may reflect a real causal link: people who buy hot chocolate may want to add small marshmallows to the drink, so an increase in hot chocolate sales may actually cause the increase in marshmallow sales.
Using the correlation coefficient to find relationships is a very good way to notice that variables are connected. To determine whether a relationship is causal, the usual next step is to carefully design an experiment that isolates and precisely controls only one of the variables, then measures how changing it affects the other variable.
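An experiment of this kind can be sketched as a small simulation. In this hypothetical model (every number is invented for illustration), daily temperature affects both hot chocolate sales and snowboard rentals, and marshmallow sales depend on hot chocolate sales. The experimenter flips a coin each day to decide whether to offer a discount, so the discount is unrelated to the weather. Randomly discounting hot chocolate should raise average marshmallow sales on discount days, while randomly discounting snowboard rentals should leave average hot chocolate sales essentially unchanged.

```python
import random
import statistics


def run_experiment(treat_cocoa, n_days=2000, seed=7):
    """Simulate one randomized experiment under a hypothetical sales model.

    treat_cocoa=True:  randomly discount hot chocolate, measure marshmallow sales.
    treat_cocoa=False: randomly discount snowboard rentals, measure hot chocolate sales.
    Returns (mean outcome on discount days) - (mean outcome on other days).
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_days):
        temp = rng.uniform(-10, 15)           # confounder: assumed daily temperature
        discount = rng.random() < 0.5         # coin flip set by the experimenter, not the weather
        cocoa = 300 - 10 * temp + rng.gauss(0, 20)
        rentals = 200 - 8 * temp + rng.gauss(0, 15)
        if treat_cocoa and discount:
            cocoa += 80                       # assumed effect of a hot chocolate discount
        if not treat_cocoa and discount:
            rentals += 50                     # assumed effect of a rental discount
        # Causal link in this model: marshmallow sales follow hot chocolate sales.
        # Rentals never enter the hot chocolate formula, so they cannot move it.
        marshmallows = 0.3 * cocoa + rng.gauss(0, 5)
        outcome = marshmallows if treat_cocoa else cocoa
        (treated if discount else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


effect_cocoa = run_experiment(treat_cocoa=True)
effect_rentals = run_experiment(treat_cocoa=False)
print(f"Marshmallow sales change from a random cocoa discount:   {effect_cocoa:+.1f}")
print(f"Cocoa sales change from a random snowboard discount:     {effect_rentals:+.1f}")
```

Because the coin flip, not the weather, decides which days get the discount, cold and warm days end up spread roughly evenly across both groups, so a difference between the group averages can be attributed to the discount itself rather than to temperature.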