# One vs all classification

## Examples

- Email foldering: work $(y=1)$, friends $(y=2)$, family $(y=3)$, hobby $(y=4)$, …
- Medical diagnosis: not ill $(y=1)$, cold $(y=2)$, flu $(y=3)$, …
- Weather: sunny $(y=1)$, cloudy $(y=2)$, rain $(y=3)$, snow $(y=4)$, …

Figure source: Machine Learning course on Coursera by Andrew Ng

## How it works

Figure source: Machine Learning course on Coursera by Andrew Ng

There are three classes of data in the figure above, marked with different shapes and colors. First, we take class 1 (the triangles) and essentially create a new, fake training set in which classes 2 and 3 are both assigned to the negative class. In this new training set the triangles are positive examples $(y=1)$ and all other points are negative examples $(y=0)$, drawn as circles. Fitting a classifier to it gives $h_{\theta}^{(1)}(x)$, where the superscript $(1)$ stands for class 1, together with a decision boundary that separates class 1 from everything else.
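The relabeling step can be sketched in a few lines of Python. This is a minimal illustration, assuming labels are stored as plain integers; the helper name `one_vs_all_labels` and the toy label list are hypothetical.

```python
def one_vs_all_labels(y, positive_class):
    """Build the 'fake' binary labels for one class (hypothetical helper).

    Examples of `positive_class` become 1 (positive); every other class becomes 0 (negative).
    """
    return [1 if label == positive_class else 0 for label in y]


# Hypothetical multi-class labels, e.g. 1 = triangle, 2 = square, 3 = third class
y = [1, 2, 3, 1, 3, 2]
print(one_vs_all_labels(y, 1))  # [1, 0, 0, 1, 0, 0]
```

Calling the same helper with `positive_class=2` or `3` produces the training sets for the second and third classifiers.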

Next, we do the same thing for the squares (class 2): assign the squares to the positive class and everything else to the negative class, fit a classifier $h_{\theta}^{(2)}(x)$, and get a second decision boundary.

Finally, we do the same thing for the third class, fit a third classifier $h_{\theta}^{(3)}(x)$, and obtain the third decision boundary.

In summary, we fit three classifiers:

$$h_{\theta}^{(i)}(x)=P(y=i|x;\theta),i=1,2,3$$
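A minimal sketch of fitting all three classifiers, assuming each $h_{\theta}^{(i)}$ is an ordinary logistic-regression model $\sigma(\theta^{(i)T}x)$ trained with batch gradient descent. The toy data, the learning rate `alpha`, the iteration count, and all function names are illustrative assumptions, not taken from the course.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, x):
    # h_theta(x) = sigmoid(theta^T x); x[0] is the bias feature x0 = 1
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))


def train_logistic(X, y, alpha=0.1, iters=2000):
    """Fit one binary logistic-regression classifier with batch gradient descent."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for x, label in zip(X, y):
            err = hypothesis(theta, x) - label
            for j in range(n):
                grad[j] += err * x[j]
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta


def train_one_vs_all(X, y, classes):
    """Train one classifier h^(i) per class i: class i is positive (1), all others negative (0)."""
    return {i: train_logistic(X, [1 if label == i else 0 for label in y]) for i in classes}


# Hypothetical toy data: three clusters, each row is [x0 = 1, x1, x2]
X = [[1, 0, 4], [1, 1, 5], [1, -1, 5],       # class 1
     [1, 4, -2], [1, 5, -1], [1, 5, -3],     # class 2
     [1, -4, -2], [1, -5, -1], [1, -5, -3]]  # class 3
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

thetas = train_one_vs_all(X, y, classes=[1, 2, 3])
for i, theta in thetas.items():
    print(i, [round(t, 2) for t in theta])
```

Each call to `train_logistic` sees only a binary problem; the multi-class structure lives entirely in how `train_one_vs_all` relabels `y` before each fit.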

Each classifier $h_{\theta}^{(i)}$ is trained to recognize one class, the one indicated by its superscript $(i)$. To make a prediction on a new input $x$, pick the class $i$ that maximizes $h_{\theta}^{(i)}(x)$, i.e. predict $\arg\max_{i} h_{\theta}^{(i)}(x)$.

In other words, we run all three classifiers and go with whichever one is most confident, the one that most enthusiastically says it has the right class. Whichever value of $i$ gives the highest probability, we predict $y$ to be that value.
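The prediction rule can be sketched as follows, assuming each classifier is logistic regression with a parameter vector $\theta^{(i)}$ and that inputs include a bias feature $x_0 = 1$. The parameter values and the function name `predict_one_vs_all` are hypothetical and hand-picked for illustration, not learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_one_vs_all(thetas, x):
    """Return the class i whose classifier h^(i)(x) gives the highest probability."""
    probs = {i: sigmoid(sum(t * xj for t, xj in zip(theta, x)))
             for i, theta in thetas.items()}
    return max(probs, key=probs.get)


# Hypothetical, hand-picked parameters for three classes; x = [1, x1, x2]
thetas = {
    1: [-1.5, 0.0, 1.0],
    2: [-2.0, 1.0, -0.5],
    3: [-2.0, -1.0, -0.5],
}
print(predict_one_vs_all(thetas, [1, 0, 4]))    # 1
print(predict_one_vs_all(thetas, [1, 4, -2]))   # 2
print(predict_one_vs_all(thetas, [1, -4, -2]))  # 3
```

Because the sigmoid is monotonically increasing, taking the argmax over the probabilities picks the same class as taking the argmax over the raw scores $\theta^{(i)T}x$. Since the classifiers are trained independently, their outputs also need not sum to 1.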