Wednesday, 3 February 2016

Generative vs. discriminative


Generative means "based on P(x,y)" and discriminative means "based on P(y|x)," but I'm confused about several points:
  • Wikipedia (and many other hits on the web) classify things like SVMs and decision trees as discriminative. But these don't even have probabilistic interpretations. What does discriminative mean here? Has discriminative just come to mean anything that isn't generative?
  • Naive Bayes (NB) is generative because it captures P(x|y) and P(y), and thus you have P(x,y) (as well as P(y|x)). Isn't it trivial to make, say, logistic regression (the poster boy of discriminative models) "generative" by simply computing P(x) in a similar fashion (same independence assumption as NB, so that P(x) = P(x0)P(x1)...P(xd), where the MLEs for the P(xi) are just empirical frequencies)? See the sketch below.
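As a minimal sketch of that proposal (hypothetical illustration code, assuming binary features and scikit-learn; this is not a standard construction), one could keep logistic regression's P(y|x) and bolt on a naively factorized P(x) with frequency MLEs:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 0]])
    y = np.array([1, 1, 0, 0, 1, 0])

    clf = LogisticRegression().fit(X, y)       # discriminative part: P(y|x)

    freq = X.mean(axis=0)                      # MLE for P(x_i = 1): just frequencies

    def p_x(x):
        # P(x) = P(x0) P(x1) ... P(xd) under the NB-style independence assumption
        return np.prod(np.where(x == 1, freq, 1 - freq))

    x_new = np.array([1, 0])
    p_y_given_x = clf.predict_proba(x_new.reshape(1, -1))[0]  # [P(y=0|x), P(y=1|x)]
    print(p_y_given_x * p_x(x_new))            # a "joint" P(x,y) = P(y|x) P(x)

Mechanically this does yield a joint P(x,y) = P(y|x)P(x), but conditioning it on x just returns the same logistic P(y|x), so nothing is gained for classification.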

The fundamental difference between discriminative models and generative models is:
  • Discriminative models learn the (hard or soft) boundary between classes
  • Generative models model the distribution of individual classes
  • A generative model lets you evaluate the likelihood of new pairs (x,y); a discriminative model only lets you evaluate the likelihood of different values of y given a value of x.
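A minimal sketch of that contrast (assuming scikit-learn and SciPy are available; the Gaussian class-conditionals and the attribute names theta_, var_, class_prior_ are scikit-learn's GaussianNB conventions): the generative model stores P(y) and the parameters of P(x|y), so the joint P(x,y) can be rebuilt and scored, while logistic regression only exposes P(y|x):

    import numpy as np
    from scipy.stats import norm
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 1)),      # class 0: one feature
                   rng.normal(3, 1, (50, 1))])     # class 1
    y = np.array([0] * 50 + [1] * 50)

    gen = GaussianNB().fit(X, y)
    disc = LogisticRegression().fit(X, y)

    x_new = np.array([[1.5]])

    # Generative: P(x,y) = P(y) * P(x|y), rebuilt from the fitted parameters
    # (var_ is named sigma_ in older scikit-learn versions)
    p_x_given_y = norm.pdf(1.5, gen.theta_.ravel(), np.sqrt(gen.var_.ravel()))
    print("P(x, y) per class:", gen.class_prior_ * p_x_given_y)

    # Discriminative: only the conditional P(y|x) is on offer
    print("P(y | x):", disc.predict_proba(x_new)[0])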

The generative model also has MORE to learn, since (in theory) you can always marginalize out y (summing over y) to get p(x), and then dividing the joint p(x,y) by p(x) gives you p(y|x), the discriminative model.
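A tiny numeric check of that claim, using a made-up joint table over a discrete x and a binary y:

    import numpy as np

    # Hypothetical joint p(x,y): x in {0,1,2} (rows), y in {0,1} (columns)
    p_xy = np.array([[0.10, 0.05],
                     [0.20, 0.15],
                     [0.05, 0.45]])
    assert np.isclose(p_xy.sum(), 1.0)

    p_x = p_xy.sum(axis=1, keepdims=True)   # marginalize: p(x) = sum over y of p(x,y)
    p_y_given_x = p_xy / p_x                # condition:   p(y|x) = p(x,y) / p(x)
    print(p_y_given_x)                      # each row sums to 1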



Why not approach classification through linear regression?


Classification and Logistic Regression

As Andrew Ng explains it, with linear regression you fit a polynomial through the data. Say, in the example below, we're fitting a straight line through the {tumor size, tumor type} sample set:
[figure: a straight line fitted through the tumor-size samples]
Above, malignant tumors get 1 and non-malignant ones get 0, and the green line is our hypothesis h(x). To make predictions we may say that for any given tumor size x, if h(x) is greater than 0.5 we predict a malignant tumor; otherwise we predict benign.
It looks like this way we could correctly predict every single training sample, but now let's change the task a bit.
Intuitively it's clear that all tumors larger than a certain threshold are malignant. So let's add another sample with a huge tumor size, and run linear regression again:
[figure: the refitted line after adding a sample with a very large tumor]
Now our "h(x) > 0.5 → malignant" rule doesn't work anymore. To keep making correct predictions we would need to change it to h(x) > 0.2 or so - but that's not how the algorithm should work.
We cannot change the hypothesis each time a new sample arrives. Instead, we should learn it from the training data, and then (using the hypothesis we've learned) make correct predictions for data we haven't seen before.
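A minimal sketch reproducing that experiment with ordinary least squares (plain NumPy; the tumor sizes below are made-up illustration numbers):

    import numpy as np

    sizes  = np.array([1., 2., 3., 4., 5., 6.])
    labels = np.array([0., 0., 0., 1., 1., 1.])   # 1 = malignant, 0 = benign

    # Fit h(x) = a*x + b by least squares and threshold at 0.5
    a, b = np.polyfit(sizes, labels, deg=1)
    print((a * sizes + b) > 0.5)        # [F F F T T T] -- matches the labels

    # Add one huge, clearly malignant tumor and refit
    sizes2  = np.append(sizes, 30.)
    labels2 = np.append(labels, 1.)
    a2, b2 = np.polyfit(sizes2, labels2, deg=1)
    print((a2 * sizes2 + b2) > 0.5)     # the size-4 malignant tumor now falls
                                        # below 0.5: the cutoff no longer works

The single extreme (but correctly labeled) point drags the fitted line down, and that sensitivity of least squares to outliers is exactly what breaks the fixed 0.5 cutoff.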
Hope this explains why linear regression is not the best fit for classification problems! You might also want to watch the VI. Logistic Regression. Classification video on ml-class.org, which explains the idea in more detail.