All About Knowledge: Pearson's Correlation Coefficient

Pearson's Correlation Coefficient

In statistics, the Pearson product-moment correlation coefficient is a measure of the linear correlation between two variables X and Y. It takes a value between −1 and +1 inclusive, where +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences as a measure of the degree of linear dependence between two variables.

Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations.

Pearson's correlation coefficient when applied to a population is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient. The formula for ρ^[7] is:

$$\rho_{X,Y}= \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

where:

$\operatorname{cov}(X,Y)$ is the covariance of $X$ and $Y$
$\sigma_X$ is the standard deviation of $X$
$\sigma_Y$ is the standard deviation of $Y$
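
The definition above can be sketched directly in code. This is a minimal illustration using only the Python standard library, not a reference implementation: the function names (`population_covariance`, `population_std`, `pearson_rho`) and the age and blood pressure figures are made up for the example. It treats the data as a complete population, so it divides by n rather than n − 1.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def population_covariance(xs, ys):
    """cov(X, Y): average product of deviations from the means (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def population_std(values):
    """sigma: square root of the variance, i.e. of cov(X, X)."""
    return math.sqrt(population_covariance(values, values))


def pearson_rho(xs, ys):
    """rho = cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    sigma_x, sigma_y = population_std(xs), population_std(ys)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("rho is undefined when either variable is constant")
    return population_covariance(xs, ys) / (sigma_x * sigma_y)


# Hypothetical data: age in years and systolic blood pressure in mmHg.
ages = [25, 35, 45, 55, 65]
pressures = [118, 121, 127, 134, 139]
rho = pearson_rho(ages, pressures)
print(f"rho = {rho:.3f}")
```

For this made-up data the result is close to +1, because blood pressure rises almost linearly with age in the sample. Note that ρ is undefined when either variable is constant, since its standard deviation is then zero.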

The formula for ρ can also be expressed in terms of means and expectations. Since

$\operatorname{cov}(X,Y) = \operatorname{E}[(X-\mu_X)(Y-\mu_Y)]$ ^[7]

the formula for ρ can also be written as

$$\rho_{X,Y}=\frac{\operatorname{E}[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\sigma_Y}$$

where:

$\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, as above
$\mu_X$ is the mean of $X$
$\mu_Y$ is the mean of $Y$
$\operatorname{E}$ is the expectation.

The formula for ρ can also be expressed in terms of uncentered moments. Since

$\mu_X=\operatorname{E}[X]$
$\mu_Y=\operatorname{E}[Y]$
$\sigma_X^2=\operatorname{E}[(X-\operatorname{E}[X])^2]=\operatorname{E}[X^2]-\operatorname{E}[X]^2$
$\sigma_Y^2=\operatorname{E}[(Y-\operatorname{E}[Y])^2]=\operatorname{E}[Y^2]-\operatorname{E}[Y]^2$
$\operatorname{E}[(X-\mu_X)(Y-\mu_Y)]=\operatorname{E}[(X-\operatorname{E}[X])(Y-\operatorname{E}[Y])]=\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y]$

the formula for ρ can also be written as

$$\rho_{X,Y}=\frac{\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y]}{\sqrt{\operatorname{E}[X^2]-\operatorname{E}[X]^2}~\sqrt{\operatorname{E}[Y^2]- \operatorname{E}[Y]^2}}.$$
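
As a rough sketch, the uncentered-moment form translates into code that needs only five averages: E[X], E[Y], E[X²], E[Y²] and E[XY]. The helper names `expectation` and `pearson_rho_from_moments` are invented for this illustration, and the expectation is taken as a plain average over equally likely observations.

```python
import math


def expectation(values):
    """E[.] over equally weighted observations: a plain average."""
    return sum(values) / len(values)


def pearson_rho_from_moments(xs, ys):
    """rho from the uncentered moments E[X], E[Y], E[X^2], E[Y^2], E[XY]."""
    e_x = expectation(xs)
    e_y = expectation(ys)
    e_xx = expectation([x * x for x in xs])
    e_yy = expectation([y * y for y in ys])
    e_xy = expectation([x * y for x, y in zip(xs, ys)])

    numerator = e_xy - e_x * e_y                 # E[XY] - E[X]E[Y]
    denominator = (math.sqrt(e_xx - e_x ** 2)    # sqrt(E[X^2] - E[X]^2)
                   * math.sqrt(e_yy - e_y ** 2)) # sqrt(E[Y^2] - E[Y]^2)
    return numerator / denominator


# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
rho_moments = pearson_rho_from_moments(xs, ys)
print(rho_moments)
```

This form is convenient because all five averages can be accumulated in a single pass over the data. One caveat: when the means are large compared with the spread of the data, subtracting two nearly equal numbers can lose floating-point precision, so the centered form is often the safer choice in practice.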

Covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond to the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive.^[1] In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret, because it depends on the units in which the variables are measured. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.
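
A small sketch may make the point about sign and magnitude concrete. The height and weight figures below are invented for illustration, and `population_covariance` is a helper written for this example. The sign of the covariance follows the direction of the relationship, while its size changes by a factor of 100 when heights are switched from metres to centimetres.

```python
def population_covariance(xs, ys):
    """Average product of deviations from the means (divides by n)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: taller people tend to be heavier.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52, 58, 66, 75, 80]

cov_m = population_covariance(heights_m, weights_kg)

# Same people, heights expressed in centimetres instead of metres.
heights_cm = [h * 100 for h in heights_m]
cov_cm = population_covariance(heights_cm, weights_kg)

# Pair the heights with the weights in reverse order: opposite behavior.
cov_opposite = population_covariance(heights_m, list(reversed(weights_kg)))

print("covariance, metres:     ", cov_m)
print("covariance, centimetres:", cov_cm)
print("covariance, reversed:   ", cov_opposite)
```

The first two values are both positive but differ by a factor of 100, which is why the raw magnitude says little on its own. The third is negative because large heights are now paired with small weights.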

Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example, age and blood pressure. Pearson's correlation coefficient computed from a sample of data is usually written r, and it measures the strength of the linear association between the two variables.

The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity. The correlation coefficient should not be calculated if the relationship is not linear, because it can then badly understate how strongly the variables are related. For the purposes of correlation alone, it does not matter which variable is plotted on which axis. However, conventionally, the independent (or explanatory) variable is plotted on the x-axis (horizontally) and the dependent (or response) variable is plotted on the y-axis (vertically).
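
To see why the linearity check matters, consider a relationship that is perfectly predictable but not linear. In the sketch below (the data and the helper name `pearson_r` are made up for illustration), y is exactly x squared, yet the coefficient comes out as zero because the positive and negative halves of the curve cancel.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sum of products of deviations over the root of the
    product of the sums of squared deviations."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y depends on x exactly, but along a parabola rather than a straight line.
xs = list(range(-3, 4))        # -3, -2, ..., 3
ys = [x ** 2 for x in xs]      # 9, 4, 1, 0, 1, 4, 9

r_parabola = pearson_r(xs, ys)
print(r_parabola)
```

A scatter plot of these points would show the U shape immediately, whereas the coefficient alone would wrongly suggest that the variables are unrelated.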

The nearer the scatter of points is to a straight line, the higher the strength of association between the variables. The coefficient has no units, and its value does not change when either variable is converted to different measurement units.
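
The independence from measurement units can be checked numerically. The sketch below uses invented temperature and sales figures and a helper named `pearson_r` written for this example. Converting Celsius to Fahrenheit rescales and shifts the values, and the coefficient should stay the same up to floating-point rounding.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: daily temperature and ice cream sales.
temps_c = [14, 18, 21, 25, 29, 33]
sales = [120, 155, 170, 230, 260, 310]

# The same temperatures in Fahrenheit: F = C * 9/5 + 32.
temps_f = [c * 9 / 5 + 32 for c in temps_c]

r_celsius = pearson_r(temps_c, sales)
r_fahrenheit = pearson_r(temps_f, sales)

print(r_celsius, r_fahrenheit)
```

The invariance holds for any change of units of the form a·x + b with a > 0. Multiplying one variable by a negative number flips the sign of r but leaves its magnitude unchanged.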

Values of Pearson's correlation coefficient

Pearson's correlation coefficient (r) for continuous (interval level) data ranges from -1 to +1:

r = -1		data lie on a perfect straight line with a negative slope
r = 0		no linear relationship between the variables
r = +1		data lie on a perfect straight line with a positive slope

Positive correlation indicates that the two variables tend to increase or decrease together, whereas negative correlation indicates that as one variable increases, the other tends to decrease, and vice versa.
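
The extreme values can be reproduced with a few small, made-up data sets. In this sketch (the helper name `pearson_r` is an assumption of the example), points on a rising line give a value of approximately +1 and points on a falling line approximately −1, regardless of how steep the line is. Only the direction of the slope affects the sign.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]

r_gentle = pearson_r(xs, [0.1 * x + 1 for x in xs])    # rising, shallow slope
r_steep = pearson_r(xs, [50 * x - 7 for x in xs])      # rising, steep slope
r_falling = pearson_r(xs, [-3 * x + 10 for x in xs])   # falling line

# No linear trend: the deviations of y cancel out against those of x.
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])

print(r_gentle, r_steep, r_falling, r_none)
```

Because of floating-point rounding, the printed values for the straight lines may differ from ±1 in the last decimal places.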

Wednesday, 27 January 2016
