Tips in programming (CS229)

Vectorization

Hypothesis

$$h_\theta(x)=\sum_{j=0}^n\theta_jx_j\tag{1}$$
$$h_\theta(x)=\boldsymbol{\theta}^\top\mathbf{x}\tag{2}$$

where

$$\boldsymbol{\theta}=
\begin{bmatrix}
\theta_0\\
\theta_1\\
\theta_2
\end{bmatrix}
$$

$$\mathbf{x}=
\begin{bmatrix}
x_0\\
x_1\\
x_2
\end{bmatrix}
$$

With the unvectorized equation $(1)$, we implement $h_\theta(x)$ in Octave as:

```matlab
prediction = 0.0;
for j = 1:(n+1)
  prediction = prediction + theta(j) * x(j);
end
```

To simplify the implementation and avoid an explicit for loop, the vectorized equation $(2)$ can be written as:

```matlab
prediction = theta' * x;
```

Here theta' denotes the transpose of the column vector theta, so theta' * x is the inner product $\boldsymbol{\theta}^\top\mathbf{x}$.
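
As a quick sanity check, here is a self-contained Octave sketch with made-up values for theta and x, showing that the loop and the vectorized form compute the same number:

```matlab
% Hypothetical inputs: n = 2 features plus the bias term x_0 = 1.
theta = [1; 2; 3];
x     = [1; 0.5; -0.2];
n     = 2;

% Unvectorized: accumulate theta_j * x_j term by term.
prediction_loop = 0.0;
for j = 1:(n+1)
  prediction_loop = prediction_loop + theta(j) * x(j);
end

% Vectorized: one inner product, no loop.
prediction_vec = theta' * x;

% Both display the same value, 1.4.
disp(prediction_loop);
disp(prediction_vec);
```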

C++ example

Unvectorized implementation

```cpp
// Here n is the length of theta and x, i.e. n + 1 in the notation above.
double prediction = 0.0;
for (int j = 0; j < n; j++)
    prediction += theta[j] * x[j];
```

Vectorized implementation

```cpp
// Assumes a linear algebra library; e.g., with Eigen vectors,
// theta.dot(x) computes the same inner product.
double prediction = theta.transpose() * x;
```

Gradient descent

Gradient descent update rules for $n=2$ features (all $\theta_j$ updated simultaneously):

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\,x_0^{(i)}$$
$$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\,x_1^{(i)}$$
$$\theta_2:=\theta_2-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\,x_2^{(i)}$$
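
As a sketch before vectorizing, one such update step can be written with explicit loops in Octave. Here X (the $m\times(n+1)$ design matrix whose first column is all ones), y (the $m\times 1$ label vector), m, n, and alpha are all assumed to exist; the names are illustrative:

```matlab
% One gradient descent step with explicit loops (illustrative sketch).
% temp holds the new values so every theta_j is updated simultaneously.
temp = zeros(n + 1, 1);
for j = 1:(n + 1)
  delta_j = 0.0;
  for i = 1:m
    % X(i, :) * theta is h_theta(x^(i)), a scalar.
    delta_j = delta_j + (X(i, :) * theta - y(i)) * X(i, j);
  end
  temp(j) = theta(j) - alpha * (1 / m) * delta_j;
end
theta = temp;
```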

Vectorized implementation

We can simplify the equations above to:

$$\theta:=\theta-\alpha\delta$$

Where

$$\delta_j=\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}$$

$$\boldsymbol{\delta}=
\begin{bmatrix}
\delta_0\\
\delta_1\\
\delta_2
\end{bmatrix}$$

$$\mathbf{x}^{(i)}=
\begin{bmatrix}
x_0^{(i)}\\
x_1^{(i)}\\
x_2^{(i)}
\end{bmatrix}$$

$\theta$ and $x^{(i)}$ are vectors, so $h_\theta(x^{(i)})=\theta^\top x^{(i)}$ is a scalar, and the error $h_\theta(x^{(i)})-y^{(i)}$ is a scalar too.

$(h_\theta(x^{(i)})-y^{(i)})\,x^{(i)}$ is therefore a scalar times a vector, i.e. a vector, and $\delta$ averages these vectors over the $m$ training examples.
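
With $\delta$ written this way, the whole update collapses to two Octave lines (same hypothetical X, y, m, and alpha as in the loop sketch above):

```matlab
% Vectorized gradient descent step: X' * (X * theta - y) sums the
% (h_theta(x^(i)) - y^(i)) * x^(i) vectors over all m examples at once.
delta = (1 / m) * X' * (X * theta - y);
theta = theta - alpha * delta;
```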

Vectorized implementations like these are much more efficient than explicit for loops when there are many features to consider.