Covariance and Correlation

Covariance

Covariance is a measure of how much two random variables vary together.

  • If X and Y are independent then Cov(X, Y) = 0. Warning: the converse is false; zero covariance does not always imply independence.

Zero Covariance Doesn't Mean Independence

For example, let X be symmetric about 0 (say, uniform on [−1, 1]) and let Y = X². Then Cov(X, Y) = E[X³] − E[X]E[X²] = 0, even though Y is completely determined by X. The key point is that Cov(X, Y) measures the linear relationship between X and Y; the quadratic relationship between X and X² is completely missed by Cov(X, Y). A numerical check is sketched below.
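
A minimal numerical sketch of this example, assuming NumPy; the sample size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# X is symmetric about 0, so E[X] = 0 and E[X^3] = 0.
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2  # Y is a deterministic function of X, so the two are highly dependent.

# Cov(X, Y) = E[XY] - E[X]E[Y] = E[X^3] - 0 = 0 in theory;
# the sample estimate is close to 0 up to sampling noise.
print(np.cov(x, y)[0, 1])
```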

  • Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y) (a numerical check is sketched below).
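
A quick check of this identity (a sketch, assuming NumPy; the distributions and coefficients are arbitrary). The sample covariance is also bilinear, so the two sides agree up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(2.0, 3.0, n)
y = 0.5 * x1 - 1.5 * x2 + rng.normal(0.0, 1.0, n)

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

print(cov(x1 + x2, y))           # Cov(X1 + X2, Y)
print(cov(x1, y) + cov(x2, y))   # Cov(X1, Y) + Cov(X2, Y): same value
```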

Correlation

The units of covariance Cov(X, Y) are ‘units of X times units of Y’. This makes it hard to compare covariances: if we change scales then the covariance changes as well. Correlation is a way to remove the scale from the covariance.

Cor(X, Y) = ρ = \frac{Cov(X, Y)}{σ_x σ_y}
  • ρ is dimensionless (it’s a ratio!).

  • −1 ≤ ρ ≤ 1. Furthermore, ρ = +1 if and only if Y = aX + b with a > 0, and ρ = −1 if and only if Y = aX + b with a < 0.
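
A small sketch of these properties, assuming NumPy; the data and the linear coefficients are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 2.0, 100_000)

def corr(a, b):
    """Cor(a, b) = Cov(a, b) / (sigma_a * sigma_b)."""
    return np.cov(a, b)[0, 1] / (np.std(a, ddof=1) * np.std(b, ddof=1))

print(corr(x, 3.0 * x + 7.0))   # Y = aX + b with a > 0  ->  +1
print(corr(x, -0.5 * x + 1.0))  # Y = aX + b with a < 0  ->  -1

# Rescaling either variable leaves rho unchanged (it is dimensionless):
y = 2.0 * x + rng.normal(0.0, 1.0, x.size)
print(corr(x, y), corr(100.0 * x, 0.001 * y))  # same value twice
```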

The squared correlation ρ² tells us what fraction of the variance in one variable is explained by a linear relationship with the other. If ρ² = 0.7 between X and Y, then a linear function of X accounts for 70% of the variation in Y. A sketch of this interpretation follows below.
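
For simulated linear-plus-noise data, ρ² between X and Y matches the fraction of Var(Y) explained by the least-squares line. This is a sketch assuming NumPy; the coefficients and noise level are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.normal(0.0, 1.0, n)
y = 1.5 * x + rng.normal(0.0, 1.0, n)  # linear signal plus noise

rho = np.corrcoef(x, y)[0, 1]

# Fit the least-squares line y_hat = a*x + b and compute the explained variance.
a, b = np.polyfit(x, y, 1)
y_hat = a * x + b
r_squared = 1.0 - np.var(y - y_hat) / np.var(y)

print(rho ** 2)    # ~ 1.5^2 / (1.5^2 + 1) ~ 0.69
print(r_squared)   # essentially the same number
```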

Covariance & Correlation

Correlation as Dot Product between vectors
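
One way to read the formula above: after subtracting the means, the sample correlation is the cosine of the angle between the two data vectors, i.e. their dot product divided by the product of their lengths. A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=10_000)
y = 0.8 * x + rng.normal(size=10_000)

# Center both data vectors.
xc = x - x.mean()
yc = y - y.mean()

# Cosine of the angle between the centered vectors ...
cosine = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ... equals the Pearson correlation.
print(cosine, np.corrcoef(x, y)[0, 1])
```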

Correlation Doesn't Mean Causality

Correlation doesn't equal causation: if X and Y are highly correlated, it does not mean that either of them is the cause of the other.

Causation can only be established with causal reasoning tools such as probabilistic graphical models, not from correlation alone. An illustrative simulation with a hidden common cause is sketched below.
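
As an illustrative sketch (not from the source, assuming NumPy): a simulation where a hidden variable Z drives both X and Y, so the two end up strongly correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Hidden common cause (confounder): Z influences both X and Y.
z = rng.normal(0.0, 1.0, n)
x = 2.0 * z + rng.normal(0.0, 1.0, n)   # X depends on Z, not on Y
y = -2.0 * z + rng.normal(0.0, 1.0, n)  # Y depends on Z, not on X

# Strong (negative) correlation, about -0.8, with no causal link between X and Y.
print(np.corrcoef(x, y)[0, 1])
```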
