Neural Network (CS229)


Algorithms that try to mimic the brain.

In a neural network, the sigmoid function is usually called the activation function, and the $\theta$ parameters are called weights.


$a_i^{(j)}$ is the activation of unit $i$ in layer $j$, and $\mathbf{\Theta}^{(j)}$ is the matrix of weights controlling the function mapping from layer $j$ to layer $j+1$.

(Slide source:

where $g(z)=\frac{1}{1+e^{-z}}$ is the sigmoid function.
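As a quick illustration, the sigmoid can be written as a small Python function. This is a minimal sketch using only the standard library; the name `sigmoid` and the two-branch form (used to avoid overflow for large negative inputs) are choices made here, not something taken from the course material.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid g(z) = 1 / (1 + e^(-z)).

    The two branches are algebraically identical; splitting on the sign
    of z keeps math.exp from overflowing for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0.0))  # 0.5
```

The output always lies between 0 and 1, and $g(0)=0.5$.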

$x_1, x_2, x_3$ form the input layer, $a_1^{(3)}$ is the output layer, and the layers between the input layer and the output layer are called hidden layers.
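To make the layer structure concrete, the sketch below computes the shape of each weight matrix for a network with three input units, three hidden units, and one output unit. The hidden-layer size of three is an assumption read off the notation used in these notes, the extra column accounts for the bias unit in each layer, and `weight_shapes` is a hypothetical helper name.

```python
def weight_shapes(layer_sizes):
    """Return the (rows, cols) shape of each weight matrix Theta^(j).

    layer_sizes lists the number of units per layer WITHOUT bias units.
    Theta^(j) maps layer j (plus its bias unit) to layer j+1, so its
    shape is (units in layer j+1) x (units in layer j + 1).
    """
    return [
        (n_out, n_in + 1)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


# Assumed architecture: 3 inputs, 3 hidden units, 1 output.
print(weight_shapes([3, 3, 1]))  # [(3, 4), (1, 4)]
```

So $\mathbf{\Theta}^{(1)}$ would be a $3\times4$ matrix and $\mathbf{\Theta}^{(2)}$ a $1\times4$ matrix under these assumptions.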

Vectorized implementation of forward propagation

Simplifying the equations

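For the equations in the figure above, we let

$$z_i^{(2)}=\Theta_{i0}^{(1)}x_0+\Theta_{i1}^{(1)}x_1+\Theta_{i2}^{(1)}x_2+\Theta_{i3}^{(1)}x_3,\qquad i=1,2,3,$$

so that $a_i^{(2)}=g(z_i^{(2)})$. The equations can then be simplified to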




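$$
\mathbf{a}^{(2)}=g(\mathbf{z}^{(2)}),\qquad \mathbf{z}^{(2)}=\mathbf{\Theta}^{(1)}\mathbf{x} \tag{1}
$$

where

$$
\mathbf{x}=\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix},\qquad
\mathbf{z}^{(2)}=\begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}.
$$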


Note that $\mathbf{a}^{(2)}$ and $\mathbf{z}^{(2)}$ here are both three-dimensional vectors.
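The dimensions can be checked with a minimal pure-Python sketch of the hidden-layer step. The values in `x` and `theta1` are made-up numbers for illustration only, and `sigmoid` and `matvec` are hypothetical helper names; in practice a numerical library would normally handle the matrix product.

```python
import math


def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


# Illustrative values only; x already includes the bias unit x_0 = 1.
x = [1.0, 0.5, -1.0, 2.0]        # [x_0, x_1, x_2, x_3]
theta1 = [                       # Theta^(1), shape 3 x 4
    [0.1, 0.2, 0.3, 0.4],
    [-0.5, 0.0, 0.5, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]

z2 = matvec(theta1, x)           # z^(2) = Theta^(1) x
a2 = [sigmoid(z) for z in z2]    # a^(2) = g(z^(2)), applied element-wise

print(len(z2), len(a2))  # 3 3
```

Each row of `theta1` produces one component of `z2`, which is why a $3\times4$ weight matrix applied to a four-component input gives three hidden values.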

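If we take $\mathbf{a}^{(1)}=\mathbf{x}$, then $(1)$ can be rewritten as

$$\mathbf{z}^{(2)}=\mathbf{\Theta}^{(1)}\mathbf{a}^{(1)},\qquad \mathbf{a}^{(2)}=g(\mathbf{z}^{(2)}).$$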


Finally, by defining $z^{(3)}=\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)}$ and adding the bias unit $a_0^{(2)}=1$ to $\mathbf{a}^{(2)}$ (which makes $\mathbf{a}^{(2)}$ a four-dimensional vector), we obtain

$$z^{(3)}=\mathbf{\Theta}^{(2)}\mathbf{a}^{(2)}.$$

The hypothesis then becomes

$$h_\Theta(\mathbf{x})=a_1^{(3)}=g(z^{(3)}).$$

Neural network learning its own features!
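The whole forward pass described in this section can be sketched in a few lines of pure Python. This is an illustrative sketch, not the course's reference implementation: the function names `sigmoid`, `matvec`, and `forward` are hypothetical, the network is assumed to have three inputs, three hidden units, and one output, and the weights are placeholder values.

```python
import math


def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def forward(x, theta1, theta2):
    """Forward propagation through one hidden layer.

    x holds the raw inputs WITHOUT the bias unit; the bias units
    x_0 = 1 and a_0^(2) = 1 are added inside this function.
    """
    a1 = [1.0] + list(x)                    # a^(1) = x with x_0 = 1
    z2 = matvec(theta1, a1)                 # z^(2) = Theta^(1) a^(1)
    a2 = [1.0] + [sigmoid(z) for z in z2]   # a^(2) = g(z^(2)), plus a_0^(2) = 1
    z3 = matvec(theta2, a2)                 # z^(3) = Theta^(2) a^(2)
    return [sigmoid(z) for z in z3]         # hypothesis a^(3) = g(z^(3))


# Placeholder weights: with all zeros every z is 0, so the output is g(0).
theta1 = [[0.0] * 4 for _ in range(3)]      # shape 3 x 4
theta2 = [[0.0] * 4]                        # shape 1 x 4
print(forward([0.5, -1.0, 2.0], theta1, theta2))  # [0.5]
```

The hidden activations `a2` play the role of learned features: the output unit is a logistic regression on `a2` rather than on the raw inputs.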