Understanding Interactive and Explainable Feedback for Supporting Non-Experts with Data Preparation for Building a Deep Learning Model

Introduction

This blog template supports parsing and displaying math equations. Math parsing is enabled by remark-math and rehype-katex. KaTeX and its associated font are included in _document.js, so feel free to use it on any page. [1]

Inline math can be included by enclosing the term between two $ symbols.

Math code blocks are denoted by $$.

If you intend to use the $ sign as a literal character rather than as a math delimiter, you can escape it (\$) or use the HTML entity (&dollar;). [2]

Inline or manually enumerated footnotes are also supported. Click on the links above to see them in action.

Deriving the OLS Estimator

Using matrix notation, let $n$ denote the number of observations and $k$ denote the number of regressors.

The vector of outcome variables $\mathbf{Y}$ is an $n \times 1$ matrix,

$$
\mathbf{Y} = \left[\begin{array}
  {c}
  y_1 \\
  \vdots \\
  y_n
\end{array}\right]
$$

The matrix of regressors $\mathbf{X}$ is an $n \times k$ matrix, where each row $\mathbf{x}'_i$ is the transpose of a $k \times 1$ vector $\mathbf{x}_i$,

$$
\mathbf{X} = \left[\begin{array}
  {ccc}
  x_{11} & \cdots & x_{1k} \\
  \vdots & \ddots & \vdots \\
  x_{n1} & \cdots & x_{nk}
\end{array}\right] =
\left[\begin{array}
  {c}
  \mathbf{x}'_1 \\
  \vdots \\
  \mathbf{x}'_n
\end{array}\right]
$$

The vector of error terms $\mathbf{U}$ is also an $n \times 1$ matrix.

At times it might be easier to use vector notation. For consistency, I will use the bold small x to denote a vector and capital letters to denote a matrix. Single observations are denoted by the subscript.
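
To make the notation concrete, here is a small made-up example (the numbers are purely illustrative and not from any real dataset) with $n = 3$ observations and $k = 2$ regressors, where the first regressor is a constant:

$$
\mathbf{Y} = \left[\begin{array}{c} 1 \\ 3 \\ 4 \end{array}\right], \quad
\mathbf{X} = \left[\begin{array}{cc} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{array}\right], \quad
\mathbf{x}_2 = \left[\begin{array}{c} 1 \\ 1 \end{array}\right], \quad
y_2 = 3
$$

Here $\mathbf{x}'_2 = [1 ~~ 1]$ is the second row of $\mathbf{X}$, and $y_2$ is the second element of $\mathbf{Y}$.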

Least Squares

Start:
$$
y_i = \mathbf{x}'_i \beta + u_i
$$

Assumptions:

  1. Linearity (given above)
  2. $E(\mathbf{U}|\mathbf{X}) = 0$ (conditional mean independence)
  3. rank($\mathbf{X}$) = $k$ (no perfect multicollinearity, i.e. full column rank)
  4. $Var(\mathbf{U}|\mathbf{X}) = \sigma^2 I_n$ (homoskedasticity)
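
Of these assumptions, the rank condition is the easiest to check mechanically. Below is a minimal sketch in plain Python, assuming the regressors are stored as a list of rows. The function name `matrix_rank` and the matrices `X_ok` and `X_bad` are made up for this illustration. It uses exact fractions so that a zero pivot is detected without rounding issues.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a non-zero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a linear combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical data: a constant plus one regressor (n = 3, k = 2).
X_ok = [[1, 0], [1, 1], [1, 2]]
# The second column is exactly twice the first, so the columns are collinear.
X_bad = [[1, 2], [2, 4], [3, 6]]

print(matrix_rank(X_ok))   # 2, equal to k, so the rank condition holds
print(matrix_rank(X_bad))  # 1, less than k, so the rank condition fails
```

With real floating-point data one would more likely rely on a numerical library and a tolerance, but the idea is the same: the rank of $\mathbf{X}$ must equal the number of columns $k$.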

Aim:
Find $\beta$ that minimises the sum of squared errors:

$$
Q = \sum_{i=1}^{n}{u_i^2} = \sum_{i=1}^{n}{(y_i - \mathbf{x}'_i\beta)^2} = (\mathbf{Y}-\mathbf{X}\beta)'(\mathbf{Y}-\mathbf{X}\beta)
$$
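
As a quick sanity check that the summation form and the matrix form of $Q$ agree, here is a small sketch in plain Python. The data and the two candidate coefficient vectors (`guess` and `better`) are hypothetical values chosen for illustration, and the function names are mine.

```python
from fractions import Fraction

# Hypothetical data: a constant plus one regressor (n = 3, k = 2).
X = [[1, 0], [1, 1], [1, 2]]
Y = [1, 3, 4]


def q_sum(X, Y, beta):
    """Q as the sum of squared errors, one observation at a time."""
    total = 0
    for x_i, y_i in zip(X, Y):
        u_i = y_i - sum(x * b for x, b in zip(x_i, beta))  # u_i = y_i - x_i' beta
        total += u_i ** 2
    return total


def q_matrix(X, Y, beta):
    """Q as (Y - X beta)'(Y - X beta)."""
    X_beta = [sum(x * b for x, b in zip(row, beta)) for row in X]  # X beta (n x 1)
    U = [y - fitted for y, fitted in zip(Y, X_beta)]               # Y - X beta
    return sum(u * u for u in U)                                   # U'U


guess = [1, 1]
better = [Fraction(7, 6), Fraction(3, 2)]

print(q_sum(X, Y, guess), q_matrix(X, Y, guess))    # 2 2
print(q_sum(X, Y, better), q_matrix(X, Y, better))  # 1/6 1/6
```

Both forms give the same value for any $\beta$, and different candidates give different values of $Q$, which is what makes it a sensible objective to minimise.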

Solution:
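Hints: $Q$ is a $1 \times 1$ scalar, so each of its terms equals its own transpose (in particular $\mathbf{Y}'\mathbf{X}\beta = \beta'\mathbf{X}'\mathbf{Y}$); for a symmetric matrix $A$, $\frac{\partial b'Ab}{\partial b} = 2Ab$.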
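
For readers who want to see where the derivative rule comes from, here is a short element-wise sketch. The indices $j$, $l$ and $m$ and the vector $c$ are introduced only for this illustration. Writing $b'Ab = \sum_{j}\sum_{l} a_{jl} b_j b_l$ and differentiating with respect to a single element $b_m$:

$$
\begin{aligned}
  \frac{\partial b'Ab}{\partial b_m} & = \sum_{l} a_{ml} b_l + \sum_{j} a_{jm} b_j = (Ab)_m + (A'b)_m \\
  \frac{\partial b'Ab}{\partial b}   & = (A + A')b = 2Ab \quad \text{when } A = A' \\
  \frac{\partial b'c}{\partial b}    & = c
\end{aligned}
$$

The rule applies to the quadratic term in $Q$ because $\mathbf{X}'\mathbf{X}$ is symmetric: $(\mathbf{X}'\mathbf{X})' = \mathbf{X}'\mathbf{X}$.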

Take the matrix derivative with respect to $\beta$:

$$
\begin{aligned}
  \arg\min_{\beta} Q & = \arg\min_{\beta} \mathbf{Y}'\mathbf{Y} - 2\beta'\mathbf{X}'\mathbf{Y} +
  \beta'\mathbf{X}'\mathbf{X}\beta \\
                     & = \arg\min_{\beta} - 2\beta'\mathbf{X}'\mathbf{Y} + \beta'\mathbf{X}'\mathbf{X}\beta \\
  \text{[FOC]}~~~0   & =  - 2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\beta}                  \\
  \hat{\beta}        & = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}                              \\
                     & = \left(\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}'_i\right)^{-1} \sum_{i=1}^{n} \mathbf{x}_i y_i
\end{aligned}
$$
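
The closed-form result can be checked numerically. The following is a minimal sketch in plain Python with exact fractions, using hypothetical data; all function names are mine. Rather than forming the inverse explicitly, it solves the linear system $\mathbf{X}'\mathbf{X}\hat{\beta} = \mathbf{X}'\mathbf{Y}$, which gives the same $\hat{\beta}$ whenever $\mathbf{X}'\mathbf{X}$ is invertible.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination (A square and non-singular)."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Raises StopIteration if A is singular (rank condition violated).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def normal_equations(X, Y):
    """Return X'X (k x k) and X'Y (k x 1) built from the columns of X."""
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    XtY = [sum(a * y for a, y in zip(ci, Y)) for ci in cols]
    return XtX, XtY


def ols(X, Y):
    """Matrix form: beta_hat solves (X'X) beta = X'Y."""
    XtX, XtY = normal_equations(X, Y)
    return solve(XtX, XtY)


def ols_sum_form(X, Y):
    """Summation form: accumulate sum_i x_i x_i' and sum_i x_i y_i row by row."""
    k = len(X[0])
    S_xx = [[Fraction(0)] * k for _ in range(k)]
    S_xy = [Fraction(0)] * k
    for x, y in zip(X, Y):
        for a in range(k):
            S_xy[a] += x[a] * y
            for b in range(k):
                S_xx[a][b] += x[a] * x[b]
    return solve(S_xx, S_xy)


# Hypothetical data: a constant plus one regressor (n = 3, k = 2).
X = [[1, 0], [1, 1], [1, 2]]
Y = [1, 3, 4]

beta_hat = ols(X, Y)
print(beta_hat)                        # [Fraction(7, 6), Fraction(3, 2)]
print(ols_sum_form(X, Y) == beta_hat)  # True

# First-order condition: -2 X'Y + 2 X'X beta_hat should be the zero vector.
XtX, XtY = normal_equations(X, Y)
gradient = [2 * sum(a * b for a, b in zip(row, beta_hat)) - 2 * c
            for row, c in zip(XtX, XtY)]
print(all(g == 0 for g in gradient))   # True
```

Exact fractions keep the example free of rounding noise; for real data a numerical linear algebra routine would be the more practical choice.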

Footnotes

  1. For the full list of supported TeX functions, check out the KaTeX documentation.

  2. $10 and $20.