Correlation and linear regression are two of the most commonly used techniques for investigating the relationship between two quantitative variables.
The goal of a correlation analysis is to see whether two measurement variables covary and to quantify the strength of the relationship between them, whereas regression expresses the relationship in the form of an equation.
For example, for students who take both a Maths test and an English test, we could use correlation to determine whether students who are good at Maths tend to be good at English as well, and regression to determine whether a student's English mark can be predicted from their Maths mark.
What a Scatter Diagram Tells Us
The starting point is to draw a scatter diagram: plot the data as points on a graph, with one variable on the X-axis and the other on the Y-axis, to get a feel for the relationship (if any) between the variables suggested by the data. The closer the points lie to a straight line, the stronger the linear relationship between the two variables.
Why Use Correlation?
We can use a correlation coefficient, such as the Pearson Product Moment Correlation Coefficient (r), to test whether there is a linear relationship between the variables and to quantify its strength. The value of r ranges from -1.0 to +1.0: r > 0 indicates a positive linear relationship, r < 0 indicates a negative linear relationship, and r = 0 indicates no linear relationship. The closer r is to -1.0 or +1.0, the stronger the linear relationship.
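As a minimal sketch of how r is computed, the Python below implements the usual textbook formula: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the Maths/English marks are hypothetical, invented only for illustration; they are not real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient of two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marks for five students (illustrative values only)
maths = [45, 60, 72, 80, 93]
english = [50, 58, 70, 75, 88]
r = pearson_r(maths, english)
print(f"r = {r:.3f}")
```

For these made-up marks r comes out close to +1, which is what a scatter diagram with points lying near a rising straight line would suggest.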
Why Use Regression?
In regression analysis, the problem of interest is the nature of the relationship itself between the dependent (response) variable and the independent (explanatory) variable. The analysis consists of choosing and fitting an appropriate model, usually by the method of least squares, with a view to exploiting the relationship between the variables to estimate the expected response for a given value of the independent variable. For example, if we are interested in the effect of age on height, then by fitting a regression line we can predict the expected height for a given age.
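The least-squares fit of a straight line y = a + bx has a closed form: the slope is b = Sxy / Sxx and the intercept is a = mean(y) - b·mean(x), so the fitted line always passes through the point of means. The sketch below assumes hypothetical helper names (`fit_line`, `predict`) and invented age/height figures purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the x values must not all be equal")
    b = sxy / sxx              # slope: change in y per unit change in x
    a = mean_y - b * mean_x    # intercept: line passes through (mean_x, mean_y)
    return a, b


def predict(a, b, x):
    """Expected response on the fitted line for a given x."""
    return a + b * x


# Hypothetical ages (years) and heights (cm) of five children
ages = [4, 6, 8, 10, 12]
heights = [102, 115, 127, 138, 150]
a, b = fit_line(ages, heights)
print(f"height = {a:.2f} + {b:.2f} * age")
print(f"predicted height at age 9: {predict(a, b, 9):.1f} cm")
```

Predictions like this are only trustworthy within the range of ages actually observed; extrapolating the line far beyond the data assumes the relationship stays linear there.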
Uses of Correlation and Regression
There are three main uses for correlation and regression.
· One is to test hypotheses about cause-and-effect relationships. In this case, the experimenter determines the values of the X variable and sees whether variation in X causes variation in Y; for example, giving people different amounts of a drug and measuring their blood pressure.
· The second main use for correlation and regression is to see whether two variables are associated, without necessarily inferring a cause-and-effect relationship. In this case, neither variable is determined by the experimenter; both are naturally variable. If an association is found, the inference is that variation in X may cause variation in Y, or variation in Y may cause variation in X, or variation in some other factor may affect both X and Y.
· The third common use of linear regression is estimating the value of one variable corresponding to a particular value of the other variable.
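For the first two uses, we usually want to know whether an observed r could plausibly have arisen by chance when there is no real association. The textbook approach is a t-test with t = r·√(n - 2) / √(1 - r²) on n - 2 degrees of freedom; the sketch below swaps in a permutation test instead, because it needs only the standard library. It shuffles the Y values to break any pairing with X and counts how often a shuffled |r| is at least as large as the observed one. The function names, the number of shuffles, and the drug-dose/blood-pressure figures are hypothetical assumptions for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def permutation_test_r(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for H0: no association between xs and ys."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    r_obs = pearson_r(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random re-pairing of y values with x values
        # small tolerance so ties with the observed |r| count as extreme
        if abs(pearson_r(xs, shuffled)) >= abs(r_obs) - 1e-12:
            extreme += 1
    # add-one correction keeps the estimated p-value strictly above 0
    p_value = (extreme + 1) / (n_perm + 1)
    return r_obs, p_value


# Hypothetical drug doses (mg) and systolic blood pressures (mmHg)
dose = [0, 5, 10, 15, 20, 25]
pressure = [150, 147, 141, 138, 132, 128]
r_obs, p_value = permutation_test_r(dose, pressure)
print(f"r = {r_obs:.3f}, permutation p-value = {p_value:.4f}")
```

A small p-value suggests the association is unlikely to be due to chance alone; as the second use above notes, it does not by itself show which variable, if either, causes the other.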
Corresponding to each observed frequency in an h × k contingency table, there is an expected (or theoretical) frequency that is computed under some hypothesis according to the rules of probability. These frequencies, which occupy the cells of a contingency table, are called cell frequencies. The total frequency in each row or each column is called a marginal frequency.
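A common hypothesis for a contingency table is that the row and column classifications are independent. Under that assumption the expected frequency of the cell in row i and column j is (row i total × column j total) / N, where N is the grand total. The sketch below computes the marginal frequencies, the expected cell frequencies, and the chi-square statistic Σ(O - E)² / E that compares observed and expected frequencies. The function names and the 2 × 3 table of counts are hypothetical.

```python
def marginal_totals(table):
    """Row totals, column totals and grand total of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def expected_frequencies(table):
    """Expected cell frequencies assuming rows and columns are independent."""
    row_totals, col_totals, n = marginal_totals(table)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table of observed counts
observed = [[20, 30, 10],
            [30, 20, 40]]
print(marginal_totals(observed))
print(expected_frequencies(observed))
print(round(chi_square_statistic(observed), 3))
```

Here the row totals are 60 and 90, each column total is 50 and N = 150, so every expected frequency in the first row is 60 × 50 / 150 = 20 and every one in the second row is 30.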
By: Lera Gay Bacay