Machine Learning Course Notes

Still at it. I am loving this course.

Today we move on to logistic regression, which is a fancy term for a technique used to predict binary outcomes.

Binary outcomes are super-risky evaluations because while math doesn’t like discrete data, humans love it. Think about medical evaluations: you’re either ‘sick’ or ‘not sick’ in your own mind, but according to mathematized science, you have a particular combination of abnormal scores on a blood test, etc. These combine to produce a binary evaluation, “sick”, but that’s only because we need to cross a decision boundary to take action (begin treatment).

Logistic regression tackles this in a few ways. First, it lets you decide where your decision boundary sits when evaluated against a series of inputs (blood cell count, let's say), and set an overall threshold for the evaluation. Say you assign a certain number of points to each input: 50 points per 100 red blood cells, -20 points if you work out every week, +10 points for every cigarette you smoke. Then we say: if this person has more than 750 points, we declare them sick.
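The point system above can be sketched as a weighted score plus a threshold. The weights (50, -20, +10) and the 750-point cutoff are the made-up numbers from the example, not values a fitted model would produce:

```python
# Toy version of the point system: a weighted sum of inputs plus a threshold.
# All weights and the 750 cutoff are the hypothetical numbers from the text.

def sickness_score(red_cells_per_100, workouts_per_week, cigarettes):
    return (50 * red_cells_per_100     # 50 points per 100 red blood cells
            - 20 * workouts_per_week   # -20 points per weekly workout
            + 10 * cigarettes)         # +10 points per cigarette

def is_sick(score, threshold=750):
    # Crossing the decision boundary triggers action ("begin treatment")
    return score > threshold
```

A patient with 16 hundred-cell units, 2 weekly workouts, and 1 cigarette scores 770 points and lands just over the line; drop the red cell count and the same person stays under it.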

Now this point system isn't perfect: there will be people we should have labeled sick at 300 points, and people who are actually fine at 1000 points. Logistic regression gets around this by imposing a non-linear cost for being wrong. When fitting the curve (and figuring out that 750 level), the algorithm is penalized much more heavily for a miss at 1000 points than for one at 500.
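That non-linear penalty is the log loss (cross-entropy) applied to a sigmoid of the score. Here's a minimal sketch; the 1/100 scaling of points into the sigmoid is an assumption for illustration, not something from the course:

```python
import math

def sigmoid(z):
    # Squashes any score into a probability strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, p):
    # Cross-entropy: the cost of a confident wrong prediction blows up
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hypothetical: map points to a probability, centered at the 750 threshold,
# with an assumed scale of 100 points per unit of log-odds.
p_500 = sigmoid((500 - 750) / 100)    # model leans "not sick"
p_1000 = sigmoid((1000 - 750) / 100)  # model is confident "sick"

# For a person who is actually fine (y = 0), a miss at 1000 points
# costs far more than a miss at 500:
loss_500 = log_loss(0, p_500)
loss_1000 = log_loss(0, p_1000)
```

With these numbers the loss at 1000 points comes out roughly thirty times the loss at 500, which is exactly the "penalized more heavily" behavior described above.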

Error in logistic regression is ALWAYS non-zero: the sigmoid never outputs exactly 0 or 1, so even a confident, correct prediction carries a small residual loss.
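A quick way to see this, under the same sigmoid/log-loss setup as before:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Even for an extreme score, the sigmoid stays strictly below 1,
# so the log loss of a correct positive prediction is tiny but never zero.
p = sigmoid(10.0)       # very close to 1.0, but not equal to it
loss = -math.log(p)     # small positive number, not 0
```

The loss shrinks toward zero as the score grows, but it never reaches it.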
