- [[linear regression closed-form calculus solution]], [[unbiased estimator]], [[variance of OLS estimates]], [[linear regression normal equation pros and cons]]
# Idea
The analytic approach to minimizing the cost function in [[linear regression]] is the "normal equation" method.
$y = X\beta + \varepsilon$
We try to find the column vector $\beta$ that [[linear combination of columns|linearly combines the columns]] of $X$ to reproduce the column vector $y$ as closely as possible. Each observation/row can be written as an equation as follows:
$\begin{gathered}
y_1=\beta_0+\beta_1 x_{11}+\cdots+\beta_k x_{1 k}+\varepsilon_1 \\
y_2=\beta_0+\beta_1 x_{21}+\cdots+\beta_k x_{2 k}+\varepsilon_2 \\
\vdots \\
y_n=\beta_0+\beta_1 x_{n 1}+\cdots+\beta_k x_{n k}+\varepsilon_n
\end{gathered}$
In matrix terms, shown here for the simple one-predictor case ([[linear combination of columns]]):
$\begin{aligned}
& {\left[\begin{array}{c}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{array}\right]=\left[\begin{array}{c}
\beta_0+\beta_1 x_1 \\
\beta_0+\beta_1 x_2 \\
\vdots \\
\beta_0+\beta_1 x_n
\end{array}\right]+\left[\begin{array}{c}
\varepsilon_1 \\
\varepsilon_2 \\
\vdots \\
\varepsilon_n
\end{array}\right]} \\
& {\left[\begin{array}{c}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{array}\right]=\left[\begin{array}{cc}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{array}\right]\left[\begin{array}{l}
\beta_0 \\
\beta_1
\end{array}\right]+\left[\begin{array}{c}
\varepsilon_1 \\
\varepsilon_2 \\
\vdots \\
\varepsilon_n
\end{array}\right]}
\end{aligned}$
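To make the design-matrix form concrete, here is a minimal R sketch (the seed and all values are made up purely for illustration):
```r
set.seed(1)                       # illustrative seed
x <- rnorm(5)                     # one feature, five observations
beta <- c(2, 0.5)                 # made-up beta_0 and beta_1
eps <- rnorm(5, sd = 0.1)         # noise term
X <- cbind(1, x)                  # the column of ones carries the intercept
y <- X %*% beta + eps             # matrix form of the system above
all.equal(as.vector(y), 2 + 0.5 * x + eps)  # TRUE: same as the row-wise equations
```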
The estimate of $\beta$ is given by
$
\hat{\beta}=\left(X^{\top} X\right)^{-1} X^{\top} y
$
## Parameter estimates
To find $\beta$, we need to eliminate $X$ from the right-hand side, which means inverting it. But only square (and nonsingular) matrices have inverses, and $X$ is typically rectangular (more observations than features). To work around this, we pre-multiply both sides by the transpose of $X$, ${X}^{\top}$ (treating the system as noise-free):
$y = X\beta$
${X}^{\top}y = {X}^{\top}X\beta$
Now ${X}^{\top}X$ is a square matrix, and it is invertible whenever $X$ has full column rank, so we can pre-multiply both sides by its inverse to eliminate ${X}^{\top}X$ from the right-hand side:
$({X}^{\top}X)^{-1}{X}^{\top}y = ({X}^{\top}X)^{-1}({X}^{\top}X)\beta$
$({X}^{\top}X)^{-1}{X}^{\top}y = \beta$
Thus,
$\beta = (X^{\top}X)^{-1}X^{\top}y$
$
\beta=\left[\begin{array}{c}
\beta_0 \\
\beta_1 \\
\vdots \\
\beta_k
\end{array}\right]=\left(X^{\top} X\right)^{-1} X^{\top} y
$
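In practice, it is numerically preferable to solve the normal equations $(X^{\top}X)\beta = X^{\top}y$ as a linear system rather than forming the inverse explicitly. A minimal base-R sketch (the simulated data, seed, and coefficients are illustrative):
```r
set.seed(123)                                # illustrative seed
X <- cbind(1, matrix(rnorm(20), nrow = 10))  # intercept column + two features
beta_true <- c(0.5, 0.1, 0.2)                # made-up coefficients
y <- X %*% beta_true + rnorm(10, sd = 0.1)
# crossprod(X) computes t(X) %*% X; solve(A, b) solves A %*% beta = b
# without ever forming A^{-1}
solve(crossprod(X), crossprod(X, y))
```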
Because $X^{\top}X$ is basically the [[covariance]] of $X$ (up to a factor of $n-1$, if each column of $X$ is demeaned), and $X^{\top}y$ is likewise the covariance of $X$ and $y$, the matrix solution is conceptually equivalent to:
$\frac{Cov(X,y)}{Cov(X)}$
When $X$ is a single column/feature, this covariance formula gives exactly the same estimate as the matrix solution!
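To see why, write the matrix products out as sums for a single demeaned column $x$ (so $\sum_i x_i = 0$ and the factors of $n-1$ cancel):
$\hat{\beta}=(x^{\top}x)^{-1}x^{\top}y=\frac{\sum_i x_i y_i}{\sum_i x_i^2}=\frac{\frac{1}{n-1}\sum_i x_i (y_i-\bar{y})}{\frac{1}{n-1}\sum_i x_i^2}=\frac{Cov(x,y)}{Var(x)}$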
## Code implementation
```r
b1 <- 0.10
b2 <- 0.20
b3 <- 0.30
# 10 observations, 3 features; no seed is set, so these draws vary per run
X <- matrix(rnorm(30, 0, 5), nrow = 10)
y <- X[, 1] * b1 + X[, 2] * b2 + X[, 3] * b3  # no intercept, no noise
X
#>              [,1]        [,2]       [,3]
#>  [1,]  -0.43493167 -5.43454408   1.766992
#>  [2,]   6.91142004 -9.13041506   3.634068
#>  [3,]   0.84245082  4.97640904   3.341305
#>  [4,]   4.11595474 -0.05930891 -12.121587
#>  [5,]  -1.10447299 -2.99814197  -1.176787
#>  [6,]  -5.14695827 -0.88973993   9.898167
#>  [7,]  -0.05462845 -2.12990671   3.983973
#>  [8,]  -6.12495578  4.98329388  -8.546381
#>  [9,] -12.98055694  3.63830354  -8.318344
#> [10,]   5.84561296 -8.63315298   2.455548
y
#>  [1] -0.60030424 -0.04472051  2.08191835 -3.23674227 -1.06311183  2.27680617  0.76374762 -2.17975107
#>  [9] -3.06589805 -0.40540497
# normal equation: beta = (X'X)^{-1} X'y
solve(t(X) %*% X) %*% (t(X) %*% y)
#>      [,1]
#> [1,]  0.1
#> [2,]  0.2
#> [3,]  0.3
# same coefficients from lm(); the intercept is 0 because y was built without one
round(coef(lm(y ~ X)), 2)
#> (Intercept)          X1          X2          X3
#>         0.0         0.1         0.2         0.3
```
```r
# one feature
b1 <- 0.13
X <- matrix(rnorm(10, 0, 5), nrow = 10)
y <- X[, 1] * b1  # exact linear relationship: no intercept, no noise
X
#>              [,1]
#>  [1,]  10.0285929
#>  [2,] -10.3528574
#>  [3,]  15.2787118
#>  [4,]  -1.3067530
#>  [5,]  -2.2719663
#>  [6,]   0.7878028
#>  [7,]   4.6669436
#>  [8,]   1.5141414
#>  [9,]  -9.7807511
#> [10,]   1.7676835
y
#>  [1]  1.3037171 -1.3458715  1.9862325 -0.1698779 -0.2953556  0.1024144  0.6067027  0.1968384
#>  [9] -1.2714976  0.2297989
# normal equation
solve(t(X) %*% X) %*% (t(X) %*% y)
#>      [,1]
#> [1,] 0.13
round(coef(lm(y ~ X)), 2)
#> (Intercept)           X
#>        0.00        0.13
# covariance form: Cov(X, y) / Var(X) recovers the same slope
cov(X, y) / cov(X)
#>      [,1]
#> [1,] 0.13
```
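The covariance form also generalizes to several features: the slope vector is $Var(X)^{-1}Cov(X,y)$, because `cov()` demeans each column internally. A quick sketch with fresh simulated data (the seed and coefficients are illustrative):
```r
set.seed(42)                          # illustrative seed
X <- matrix(rnorm(30, 0, 5), nrow = 10)
y <- X %*% c(0.10, 0.20, 0.30)        # same setup as the first block above
# slopes = Var(X)^{-1} Cov(X, y); matches the normal-equation solution
solve(cov(X)) %*% cov(X, y)           # ~ c(0.10, 0.20, 0.30)
```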
# References
- [The Mean and Variance of Estimated Regression Parameters in a Full Rank Gauss-Markov Linear Model - YouTube](https://www.youtube.com/watch?v=jyBtfhQsf44)
- https://online.stat.psu.edu/stat462/node/132/
- [r - How are the standard errors of coefficients calculated in a regression? - Cross Validated](https://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression/44841#44841)
- [How calculate variance-covariance matrix of coefficients for multivariate (multiple) linear regression? - Cross Validated](https://stats.stackexchange.com/questions/467306/how-calculate-variance-covariance-matrix-of-coefficients-for-multivariate-multi)