# Origins

Neural networks are algorithms that try to mimic the brain.

In neural networks, the sigmoid function is usually called the activation function, and the parameters $\theta$ are called weights.

# Notations

$a_i^{(j)}$ is the activation of unit $i$ in layer $j$, and $\mathbf{\Theta}^{(j)}$ is the matrix of weights controlling the function mapping from layer $j$ to layer $j+1$.

(Slide source: https://www.coursera.org/learn/machine-learning/)

where $g(z)$ is the sigmoid function, $g(z)=\dfrac{1}{1+e^{-z}}$.
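A minimal Python sketch of $g(z)$; the name `sigmoid` is our own choice. It branches on the sign of `z` so that `math.exp` never receives a large positive argument, which would raise `OverflowError`.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid activation g(z) = 1 / (1 + e^(-z)); mathematically the result lies in (0, 1)."""
    if z >= 0:
        # Here e^(-z) <= 1, so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z); here e^z < 1
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

For example, `sigmoid(0.0)` returns `0.5`, and very large positive or negative inputs saturate toward 1 and 0.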

The inputs $x_1,x_2,x_3$ form the input layer, the unit $a_1^{(3)}$ forms the output layer, and the layer between them (layer 2 here) is called a hidden layer.

# Vectorized implementation of forward propagation

## Simplifying the equations

For the equations in the figure above, let
$$\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3=z_1^{(2)}$$
$$\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3=z_2^{(2)}$$
$$\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3=z_3^{(2)}$$
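Each of the three sums above is a dot product between one row of weights and the input. A plain-Python sketch, assuming a hypothetical list-of-rows layout where the 0-based `theta1[0]` holds $\Theta_{10}^{(1)},\dots,\Theta_{13}^{(1)}$, and `x` already contains the bias $x_0=1$:

```python
def hidden_preactivations(theta1, x):
    """Return [z_1^(2), z_2^(2), z_3^(2)], where z_i^(2) = sum over k of Theta_ik^(1) * x_k."""
    z = []
    for row in theta1:                   # one row of weights per hidden unit i
        total = 0.0
        for weight, x_k in zip(row, x):  # k = 0, 1, 2, 3 (x_0 is the bias term)
            total += weight * x_k
        z.append(total)
    return z


# Hypothetical weights, for illustration only
theta1 = [
    [0.1, 0.2, 0.3, 0.4],
    [-1.0, 0.5, 0.0, 2.0],
    [0.0, 0.0, 1.0, -1.0],
]
x = [1.0, 1.0, 2.0, 3.0]                # x_0 = 1, then x_1, x_2, x_3
z2 = hidden_preactivations(theta1, x)   # approximately [2.1, 5.5, -1.0], up to float rounding
```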

The equations in the figure then simplify to

$$a_1^{(2)}=g\left(z_1^{(2)}\right)$$
$$a_2^{(2)}=g\left(z_2^{(2)}\right)$$
$$a_3^{(2)}=g\left(z_3^{(2)}\right)$$

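$$h_{\Theta}(x)=a_1^{(3)}=g\left(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)}\right)$$

where $a_0^{(2)}=1$ is the bias unit of layer 2, playing the same role as $x_0$.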
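Before vectorizing, the whole hypothesis of this 3-3-1 network can be sketched in plain Python exactly as the per-unit equations read. This sketch assumes the output unit also receives a bias $a_0^{(2)}=1$; the names `g` and `h_theta` and the list-of-rows weight layout are illustrative choices, not taken from the course.

```python
import math


def g(z):
    """Sigmoid activation, branching on sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h_theta(theta1, theta2, x1, x2, x3):
    """Unvectorized forward pass of the 3-3-1 network, returning h_Theta(x) = a_1^(3).

    theta1: 3 rows of 4 weights; row i-1 holds Theta_i0^(1) .. Theta_i3^(1)
    theta2: 1 row of 4 weights, Theta_10^(2) .. Theta_13^(2)
    """
    x = [1.0, x1, x2, x3]  # x_0 = 1 is the bias unit
    # Hidden layer: a_i^(2) = g(Theta_i0^(1) x_0 + ... + Theta_i3^(1) x_3), i = 1, 2, 3
    a2 = [g(sum(w * x_k for w, x_k in zip(row, x))) for row in theta1]
    a2 = [1.0] + a2  # bias unit a_0^(2) = 1
    # Output layer: a_1^(3) = g(Theta_10^(2) a_0^(2) + ... + Theta_13^(2) a_3^(2))
    return g(sum(w * a_k for w, a_k in zip(theta2[0], a2)))
```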

## Vectorize

$$\mathbf{x}= \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

$$\mathbf{z}^{(2)}= \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}$$

$$\mathbf{z}^{(2)}=\mathbf{\Theta}^{(1)}\mathbf{x}\tag{1}$$
$$\mathbf{a}^{(2)}=g\left(\mathbf{z}^{(2)}\right)$$
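The two vectorized equations amount to one matrix-vector product followed by an element-wise sigmoid. A stdlib-only sketch, with plain lists of rows standing in for matrices (with NumPy arrays the product would simply be `theta1 @ x`); `matvec` and `layer` are hypothetical helper names. The length check makes a dimension mismatch fail loudly instead of being silently truncated by `zip`.

```python
import math


def sigmoid(z):
    """Scalar sigmoid g(z), branching on sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def matvec(theta, a):
    """Matrix-vector product Theta a, with Theta stored as a list of rows."""
    for row in theta:
        if len(row) != len(a):
            raise ValueError(f"row has {len(row)} weights but vector has {len(a)} entries")
    return [sum(w * a_k for w, a_k in zip(row, a)) for row in theta]


def layer(theta, a):
    """One vectorized step: z = Theta a, then a_next = g(z) element-wise."""
    z = matvec(theta, a)
    return z, [sigmoid(z_i) for z_i in z]


# Hypothetical 3x4 weights and an input with the bias x_0 = 1 included
theta1 = [[0.1, 0.2, 0.3, 0.4], [-1.0, 0.5, 0.0, 2.0], [0.0, 0.0, 1.0, -1.0]]
x = [1.0, 1.0, 2.0, 3.0]
z2, a2 = layer(theta1, x)
```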

Note that $\mathbf{a}^{(2)}$ and $\mathbf{z}^{(2)}$ here are both three-dimensional vectors, while $\mathbf{x}$ is four-dimensional because it includes the bias unit $x_0=1$.
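A quick dimension check confirms equation $(1)$: with three hidden units and a four-dimensional input (bias included), $\mathbf{\Theta}^{(1)}$ must be $3\times 4$.

$$\underbrace{\mathbf{z}^{(2)}}_{3\times 1}=\underbrace{\mathbf{\Theta}^{(1)}}_{3\times 4}\;\underbrace{\mathbf{x}}_{4\times 1}$$

More generally, writing $s_j$ for the number of units in layer $j$ (not counting the bias unit), $\mathbf{\Theta}^{(j)}$ has dimension $s_{j+1}\times(s_j+1)$; for the single output unit of this network that makes $\mathbf{\Theta}^{(2)}$ a $1\times 4$ matrix.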

If we take
$$\mathbf{a}^{(1)}=\mathbf{x}$$

equation $(1)$ can be rewritten as

$$\mathbf{z}^{(2)}=\mathbf{\Theta}^{(1)}\mathbf{a}^{(1)}$$

Finally, adding the bias unit $a_0^{(2)}=1$ to $\mathbf{a}^{(2)}$ (so that $\mathbf{a}^{(2)}$ becomes a four-dimensional vector) and defining $z^{(3)}=\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)}$, we obtain
$$\mathbf{z}^{(3)}=\mathbf{\Theta}^{(2)}\mathbf{a}^{(2)}$$
The hypothesis then becomes
$$h_{\Theta}(x)=\mathbf{a}^{(3)}=g\left(\mathbf{z}^{(3)}\right)$$
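The same two steps (prepend a bias unit, then $\mathbf{z}^{(j+1)}=\mathbf{\Theta}^{(j)}\mathbf{a}^{(j)}$ and $\mathbf{a}^{(j+1)}=g(\mathbf{z}^{(j+1)})$) repeat for every layer, so forward propagation can be sketched as a loop. This is a minimal stdlib-only sketch under our own conventions: `forward_propagate` is a hypothetical name, weights are lists of rows, and the caller passes raw features without $x_0$.

```python
import math


def sigmoid(z):
    """Scalar sigmoid g(z), branching on sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def forward_propagate(thetas, x):
    """Forward propagation through any number of layers.

    thetas: [Theta^(1), Theta^(2), ...], each a list of rows; every row of Theta^(j)
            needs len(a^(j)) + 1 weights because of the bias unit.
    x:      raw features [x_1, ..., x_n], WITHOUT the bias x_0.
    Returns a^(L) = h_Theta(x) as a list with one entry per output unit.
    """
    a = list(x)                          # a^(1) = x
    for j, theta in enumerate(thetas, start=1):
        a = [1.0] + a                    # add the bias unit a_0^(j) = 1
        for row in theta:
            if len(row) != len(a):
                raise ValueError(f"Theta^({j}) rows need {len(a)} weights, got {len(row)}")
        z = [sum(w * a_k for w, a_k in zip(row, a)) for row in theta]  # z^(j+1) = Theta^(j) a^(j)
        a = [sigmoid(z_i) for z_i in z]  # a^(j+1) = g(z^(j+1))
    return a
```

For the 3-3-1 network above, `forward_propagate([theta1, theta2], [x1, x2, x3])` returns a one-element list holding $h_\Theta(x)$. Adding the bias inside the loop mirrors how $a_0^{(2)}=1$ was appended before computing $\mathbf{z}^{(3)}$.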

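A neural network thus learns its own features: the output layer works like logistic regression, but its inputs are the hidden activations $\mathbf{a}^{(2)}$, which are themselves computed from $\mathbf{x}$ through the learned weights $\mathbf{\Theta}^{(1)}$.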