- [[linear regression closed-form calculus solution]], [[unbiased estimator]], [[variance of OLS estimates]], [[linear regression normal equation pros and cons]]

# Idea

The analytic approach to minimizing the cost function in [[linear regression]] is the "normal equation" method.

$y = X\beta + \varepsilon$

We try to find the column vector $\beta$ that [[linear combination of columns|linearly combines the columns]] in $X$ to best reproduce the column vector $y$. Each observation/row can be written as an equation as follows:

$\begin{gathered} y_1=\beta_0+\beta_1 x_{11}+\cdots+\beta_k x_{1 k}+\varepsilon_1 \\ y_2=\beta_0+\beta_1 x_{21}+\cdots+\beta_k x_{2 k}+\varepsilon_2 \\ \vdots \\ y_n=\beta_0+\beta_1 x_{n 1}+\cdots+\beta_k x_{n k}+\varepsilon_n \end{gathered}$

In matrix terms ([[linear combination of columns]]), shown here for a single feature:

$\begin{aligned} & {\left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right]=\left[\begin{array}{c} \beta_0+\beta_1 x_1 \\ \beta_0+\beta_1 x_2 \\ \vdots \\ \beta_0+\beta_1 x_n \end{array}\right]+\left[\begin{array}{c} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{array}\right]} \\ & {\left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right]=\left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right]\left[\begin{array}{l} \beta_0 \\ \beta_1 \end{array}\right]+\left[\begin{array}{c} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{array}\right]} \end{aligned}$

The estimate of $\beta$ is given by

$\hat{\beta}=\left(X^{\top} X\right)^{-1} X^{\top} y$

## Parameter estimates

To solve for $\beta$, we need to eliminate $X$ from the right-hand side, which means we need an inverse of $X$. But only square matrices can be inverted, and $X$ is usually rectangular (more observations than features). To fix this, we pre-multiply both sides by the transpose of $X$, $X^{\top}$. (The derivation below drops the error term and works with $y = X\beta$; the result is exactly the least-squares estimate, which the [[linear regression closed-form calculus solution]] derives by setting the gradient of the cost function to zero.)

$y = X\beta$

$X^{\top}y = X^{\top}X\beta$

Now that $X^{\top}X$ is a square matrix (and invertible, provided the columns of $X$ are linearly independent), we compute its inverse and pre-multiply both sides by it to eliminate $X^{\top}X$ from the right-hand side:

$(X^{\top}X)^{-1}X^{\top}y = (X^{\top}X)^{-1}(X^{\top}X)\beta$

$(X^{\top}X)^{-1}X^{\top}y = \beta$

Thus, with $k$ features plus an intercept,

$\hat{\beta}=\left[\begin{array}{c} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{array}\right]=\left(X^{\top} X\right)^{-1} X^{\top} y$

Because $X^{\top}X$ is basically the [[covariance]] of $X$ (up to a factor of $n-1$, if each column vector in $X$ is demeaned), and $X^{\top}y$ likewise the covariance of $X$ and $y$, the matrix solution is conceptually equivalent to the following:

$\frac{\operatorname{Cov}(X,y)}{\operatorname{Cov}(X)}$

Here $\operatorname{Cov}(X)$ is the variance-covariance matrix of $X$; when $X$ is a single column/feature it reduces to $\operatorname{Var}(X)$, and the covariance formula reproduces the matrix solution exactly (demonstrated in the code below).
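The same equivalence holds with multiple features: with an intercept in the model, the slope estimates equal $\operatorname{Cov}(X)^{-1}\operatorname{Cov}(X,y)$. A minimal sketch in base R (the seed, coefficients, and noise level are made up for illustration):

```r
set.seed(1)

# made-up ground truth: three features plus a little noise
b <- c(0.1, 0.2, 0.3)
X <- matrix(rnorm(30, 0, 5), nrow = 10)
y <- X %*% b + rnorm(10, 0, 0.1)

# covariance form of the slope estimates: Cov(X)^{-1} Cov(X, y)
solve(cov(X), cov(X, y))

# matches the slopes from lm(), which demeans implicitly by fitting an intercept
coef(lm(y ~ X))[-1]
```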
## Code implementation

```r
# true coefficients
b1 <- 0.10
b2 <- 0.20
b3 <- 0.30

X <- matrix(rnorm(30, 0, 5), nrow = 10)

# y is a noiseless linear combination of the columns of X,
# so the estimates below recover the coefficients exactly
y <- X[, 1] * b1 + X[, 2] * b2 + X[, 3] * b3

# (no seed was set, so your numbers will differ)
X
#>               [,1]        [,2]       [,3]
#>  [1,]  -0.43493167 -5.43454408   1.766992
#>  [2,]   6.91142004 -9.13041506   3.634068
#>  [3,]   0.84245082  4.97640904   3.341305
#>  [4,]   4.11595474 -0.05930891 -12.121587
#>  [5,]  -1.10447299 -2.99814197  -1.176787
#>  [6,]  -5.14695827 -0.88973993   9.898167
#>  [7,]  -0.05462845 -2.12990671   3.983973
#>  [8,]  -6.12495578  4.98329388  -8.546381
#>  [9,] -12.98055694  3.63830354  -8.318344
#> [10,]   5.84561296 -8.63315298   2.455548

y
#> [1] -0.60030424 -0.04472051  2.08191835 -3.23674227 -1.06311183  2.27680617  0.76374762 -2.17975107
#> [9] -3.06589805 -0.40540497

# normal equation: beta-hat = (X'X)^{-1} X'y
solve(t(X) %*% X) %*% (t(X) %*% y)
#>      [,1]
#> [1,]  0.1
#> [2,]  0.2
#> [3,]  0.3

# same estimates from lm(); the intercept is (correctly) estimated as 0
round(coef(lm(y ~ X)), 2)
#> (Intercept)          X1          X2          X3
#>         0.0         0.1         0.2         0.3
```

```r
# one feature
b1 <- 0.13

X <- matrix(rnorm(10, 0, 5), nrow = 10)
y <- X[, 1] * b1

X
#>              [,1]
#>  [1,]  10.0285929
#>  [2,] -10.3528574
#>  [3,]  15.2787118
#>  [4,]  -1.3067530
#>  [5,]  -2.2719663
#>  [6,]   0.7878028
#>  [7,]   4.6669436
#>  [8,]   1.5141414
#>  [9,]  -9.7807511
#> [10,]   1.7676835

y
#> [1]  1.3037171 -1.3458715  1.9862325 -0.1698779 -0.2953556  0.1024144  0.6067027  0.1968384
#> [9] -1.2714976  0.2297989

# normal equation
solve(t(X) %*% X) %*% (t(X) %*% y)
#>      [,1]
#> [1,] 0.13

round(coef(lm(y ~ X)), 2)
#> (Intercept)           X
#>        0.00        0.13

# single feature: Cov(X, y) / Var(X) gives the same slope
# (cov() of a one-column matrix is just its variance)
cov(X, y) / cov(X)
#>      [,1]
#> [1,] 0.13
```

# References

- [The Mean and Variance of Estimated Regression Parameters in a Full Rank Gauss-Markov Linear Model - YouTube](https://www.youtube.com/watch?v=jyBtfhQsf44)
- https://online.stat.psu.edu/stat462/node/132/
- [r - How are the standard errors of coefficients calculated in a regression? - Cross Validated](https://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression/44841#44841)
- [How calculate variance-covariance matrix of coefficients for multivariate (multiple) linear regression? - Cross Validated](https://stats.stackexchange.com/questions/467306/how-calculate-variance-covariance-matrix-of-coefficients-for-multivariate-multi)
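A practical caveat on the code above (see [[linear regression normal equation pros and cons]]): explicitly forming $(X^{\top}X)^{-1}$ is numerically less stable than solving the system directly, and `lm()` itself uses a QR decomposition rather than an explicit inverse. A minimal sketch with made-up data:

```r
set.seed(1)

# made-up data for illustration
b <- c(0.1, 0.2, 0.3)
X <- matrix(rnorm(30, 0, 5), nrow = 10)
y <- X %*% b

# solve the normal equations X'X beta = X'y without forming the inverse
solve(crossprod(X), crossprod(X, y))

# least-squares solution via QR decomposition (what lm() uses internally)
qr.solve(X, y)
```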