Cost Function in Neural Networks (CS229)



$m$: number of training examples.


$L$: total number of layers in the network.


$s_l$: number of units (not counting the bias unit) in layer $l$ of the network. So $s_L$ is the number of units in the output layer.

For example, in a binary classification problem, $s_L=1$ and $y$ can only be $0$ or $1$ for the single output unit: the result indicates whether or not the input belongs to a specific class. In a multi-class problem with $K$ distinct classes, $y\in\mathbb{R}^{K}$, there are $K$ output units, so $s_L=K$ and $h_{\Theta}(x)\in\mathbb{R}^{K}$.

As a note, we only use the one-vs-all method when the number of classes is three or more, i.e. $K\ge3$ in a multi-class problem; for two classes a single output unit suffices.
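As an illustration of the $K$-dimensional labels described above, here is a minimal NumPy sketch (the `one_hot` helper is hypothetical, not part of the course code) that turns integer class labels into one-hot vectors for the $K$ output units:

```python
import numpy as np

def one_hot(y, K):
    """Convert integer class labels y (values 0..K-1) into one-hot rows.

    Each label becomes a K-dimensional vector with a single 1, matching
    the K output units of the network.
    """
    Y = np.zeros((len(y), K))
    Y[np.arange(len(y)), y] = 1  # set the column of each example's class
    return Y

# one_hot([0, 2, 1], 3) -> [[1,0,0], [0,0,1], [0,1,0]]
```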


$K=s_L$ also denotes the number of units in the output layer.


$\left(h_{\Theta}(x)\right)_i$: the $i^{th}$ output.

Cost function

Regularized Logistic regression

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_{\theta}(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

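As a concrete sketch of the regularized logistic regression cost in NumPy (the names `sigmoid` and `logistic_cost` are illustrative, not from the notes); note that the bias parameter $\theta_0$ is left out of the regularization sum:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix whose first column is all ones (bias),
    y: (m,) vector of 0/1 labels, lam: regularization strength lambda.
    """
    m = len(y)
    h = sigmoid(X @ theta)                       # predictions h_theta(x)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)  # skip bias theta[0]
    return cross_entropy + reg
```

With all-zero parameters every prediction is $0.5$, so the unregularized cost is $\log 2 \approx 0.693$, a handy sanity check.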

Generalization of Regularized Logistic regression

For $h_{\Theta}(x)\in\mathbb{R}^K$, i.e. $h_{\Theta}(x)$ is a $K$-dimensional vector,

$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log\left(h_{\Theta}(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_{\Theta}(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$


Same as above, $i$ starts from $1$: we don't regularize the bias weights $\Theta_{j0}^{(l)}$.
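Putting the pieces together, here is a minimal NumPy sketch of $J(\Theta)$ for a sigmoid feed-forward network (the `nn_cost` helper and its argument layout are assumptions, not course code); the bias columns are excluded from the regularization term exactly as noted above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Cost J(Theta) for a feed-forward network with sigmoid units.

    Thetas: list of weight matrices, Theta^(l) of shape (s_{l+1}, s_l + 1);
    X: (m, n) inputs; Y: (m, K) one-hot labels; lam: lambda.
    """
    m = X.shape[0]
    A = X
    for Theta in Thetas:                        # forward propagation
        A = np.hstack([np.ones((m, 1)), A])     # prepend bias unit
        A = sigmoid(A @ Theta.T)
    H = A                                       # (m, K) outputs h_Theta(x)
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # regularize all weights except the bias columns Theta[:, 0]
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return cost + reg
```

With all-zero weights every output is $0.5$, so the unregularized cost is $K\log 2$, which makes the double sum over $m$ examples and $K$ outputs easy to verify by hand.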