There is an association between two variables if they are statistically related to each other. This means that the value of one variable can be used to estimate the value of the other. An association can apply to categorical data or numerical data.
A categorical variable is a variable that takes on values that are divided into groups, or categories.
For example, color is a categorical variable that can take on the values red, blue, green, and so on.
In a causal relationship, a change in one of the variables causes a change in the other variable.
A correlation coefficient is a number between -1 and 1 that describes the strength and direction of a linear relationship between two numerical variables.
(Figure: three scatter plots whose correlation coefficients are close to 1, positive and closer to 0, and close to -1.)
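The correlation coefficient described here is usually computed as Pearson's r. The Python sketch below assumes that definition, since the glossary itself gives no formula. The function name `correlation_coefficient` and the two small data sets are hypothetical and are chosen only to show values near 1 and near -1.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired numerical data (assumed definition)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable has no spread")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets
print(correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 8, 10]))  # about 0.99 (close to 1)
print(correlation_coefficient([1, 2, 3, 4, 5], [10, 8, 5, 4, 2]))  # about -0.99 (close to -1)
```

Reversing the order of the y-values flips the sign of r without changing its size, which fits the idea that the coefficient reports both the direction and the strength of a linear relationship.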
Two numerical variables have a negative relationship if an increase in the data for one variable tends to be paired with a decrease in the data for the other variable.
(Figure: a scatter plot showing a negative relationship.)
Two numerical variables have a positive relationship if an increase in the data for one variable tends to be paired with an increase in the data for the other variable.
(Figure: a scatter plot showing a positive relationship.)
A relative frequency table is a version of a two-way table that shows how often data values occur in relation to a total. Each entry in the table shows the frequency of one response divided by the total number of responses in the entire table or by the total number of responses in a row or a column.
Each entry in this relative frequency table represents the proportion of all the textbooks that have the characteristics given by its row and column. For example, out of all 1,000 textbooks, the proportion of textbooks that are new and $10 or less is 0.025, or 2.5%.
frequency table
| | $10 or less | more than $10 but less than $30 | $30 or more | total |
|---|---|---|---|---|
| new | 25 | 75 | 225 | 325 |
| used | 275 | 300 | 100 | 675 |
| total | 300 | 375 | 325 | 1,000 |
relative frequency table
| | $10 or less | more than $10 but less than $30 | $30 or more |
|---|---|---|---|
| new | \(0.025 = \frac{25}{1000}\) | 0.075 | 0.225 |
| used | 0.275 | 0.300 | 0.100 |
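The relative frequencies above can be reproduced with a short Python sketch: each count from the frequency table is divided by the grand total of 1,000 textbooks. The dictionary layout and the function name `relative_frequencies` are illustrative choices, not part of the source.

```python
# Counts from the textbook frequency table.
# Column order: $10 or less, more than $10 but less than $30, $30 or more
counts = {
    "new": [25, 75, 225],
    "used": [275, 300, 100],
}

def relative_frequencies(table):
    """Divide every entry by the grand total of the whole table."""
    grand_total = sum(sum(row) for row in table.values())
    return {label: [count / grand_total for count in row] for label, row in table.items()}

rel = relative_frequencies(counts)
for label, row in rel.items():
    print(label, [f"{p:.3f}" for p in row])
# new ['0.025', '0.075', '0.225']
# used ['0.275', '0.300', '0.100']
```

Dividing by a row total or a column total instead would give the other kinds of relative frequency table mentioned in the definition.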
A residual is the difference between an actual data value and the value predicted by a model. It can be found by subtracting the \(y\)-value predicted by the linear model from the \(y\)-value for the data point.
On a scatter plot, the residual can be seen as the vertical distance between a data point and the best-fit line.
(Figure: a scatter plot with a best-fit line, where the lengths of the dashed segments show the residuals for each data point.)
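A minimal Python sketch of the residual calculation follows. It assumes the best-fit line is the least-squares line, a common choice that this glossary does not specify; the data points and the function names `least_squares_line` and `residuals` are hypothetical.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(points, slope, intercept):
    """Residual = actual y-value minus the y-value predicted by the line."""
    return [y - (slope * x + intercept) for x, y in points]

# Hypothetical data
data = [(1, 3.5), (2, 4.0), (3, 7.5), (4, 8.0)]
slope, intercept = least_squares_line(data)  # about 1.7 and 1.5
res = residuals(data, slope, intercept)
print([round(r, 2) for r in res])  # [0.3, -0.9, 0.9, -0.3]
```

A positive residual means the point lies above the line and a negative residual means it lies below; the vertical distance to the line is the absolute value of the residual.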
Two numerical variables have a strong relationship if the data is tightly clustered around the best-fit line.
A two-way table is a way of organizing data from two categorical variables in order to investigate the association between them.
This two-way table can be used to study the relationship between age group and cell phone ownership.
| | has a cell phone | does not have a cell phone |
|---|---|---|
| 10–12 years old | 25 | 35 |
| 13–15 years old | 38 | 12 |
| 16–18 years old | 52 | 8 |
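One common way to look for an association in this table is to compare row relative frequencies, dividing each count by its row total as described under relative frequency table. The Python sketch below does that for the cell phone counts; the function name `row_proportions` is a hypothetical choice.

```python
# Counts from the two-way table: (has a cell phone, does not have a cell phone)
table = {
    "10–12 years old": (25, 35),
    "13–15 years old": (38, 12),
    "16–18 years old": (52, 8),
}

def row_proportions(table):
    """Proportion of each row (age group) that has a cell phone."""
    return {group: has / (has + has_not) for group, (has, has_not) in table.items()}

for group, p in row_proportions(table).items():
    print(f"{group}: {p:.3f} have a cell phone")
# 10–12 years old: 0.417 have a cell phone
# 13–15 years old: 0.760 have a cell phone
# 16–18 years old: 0.867 have a cell phone
```

The proportion with a cell phone rises from about 0.42 to about 0.87 across the age groups, which suggests an association between age group and cell phone ownership. On its own, this does not show a causal relationship.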
A variable is a characteristic of individuals in a population that can take on different values.