Today’s Fad

Here’s a quote from HN in response to the question: what’s the big deal with Machine Learning?

There’s this enormous focus on ‘web scale’ technologies. This focus necessarily invokes visualizing and making sense of terabytes and eventually even petabytes of data; conventional approaches would take thousands or millions of man hours to accomplish the same level of analysis that computers can perform in hours or days.

I totally agree. I’ve joined a few technology meetup groups here in NY and so far I’ve had interesting reactions to my field of expertise. I basically say that I build predictive models, but on small to mid-sized datasets. Cue disappointment.

Everyone is focused on predictive models that crunch BIG DATA. I’m taking a course on ML but I don’t do BIG DATA.

There are two kinds of big, you see. You can have a list of a billion addresses, but that doesn’t really qualify as BIG big. People can get their heads around what to do with a billion addresses: what regressions to run, what information can be reasonably gleaned from analyzing it.

BIG big is different. BIG refers to a HUGE number of parameters that might or might not be meaningful. Think about some problems that are common applications of ML:

For highly dimensional problems, such as text classification (i.e., spam detection) or image classification (i.e., facial detection), it’s almost impossible to hard code an algorithm to accomplish its goal without using machine learning. It’s much easier to use a binary spam/not spam or face/not face labeling system that, given the attributes of the example, can learn which attributes beget that specific label. In other words, it’s much easier for a learning system to determine what variables are important in the ultimate classification than trying to model the “true” function that gives rise to the labeling.
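
To make that concrete, here is a minimal sketch in Python of a learner working out for itself which attributes matter. It is only an illustration and not anything from the HN thread: the six messages are made up, the function names (features, train_perceptron, predict) are hypothetical, and a plain perceptron over a bag of words stands in for whatever a real spam filter would use.

```python
from collections import defaultdict

# Toy, made-up training set: (message, label) with 1 = spam, 0 = not spam.
TRAIN = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("cheap money offer free", 1),
    ("meeting moved to monday", 0),
    ("lunch on monday with the team", 0),
    ("notes from the team meeting", 0),
]


def features(text):
    """Bag of words: every distinct word is one attribute of the example."""
    return set(text.lower().split())


def train_perceptron(examples, epochs=10):
    """Learn one weight per word. Nobody says in advance which words matter."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = features(text)
            score = bias + sum(weights[w] for w in words)
            predicted = 1 if score > 0 else 0
            error = label - predicted  # +1, 0 or -1
            if error:
                # Nudge every word in a misclassified message toward its label.
                for w in words:
                    weights[w] += error
                bias += error
    return weights, bias


def predict(text, weights, bias):
    """Return 1 (spam) if the summed word weights are positive, else 0."""
    score = bias + sum(weights.get(w, 0.0) for w in features(text))
    return 1 if score > 0 else 0


weights, bias = train_perceptron(TRAIN)

if __name__ == "__main__":
    # Print the learned weights, most spam-like words first.
    for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{word:10s} {weight:+.1f}")
```

With six messages and a couple of dozen words you could write the rules by hand. The point is that the same loop runs unchanged when the vocabulary has ten thousand words, which is exactly where hand-written rules stop being practical.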

BIG means you don’t really know what you should do with the data. You kinda know what answer you want, but you can’t really hold a thousand or ten thousand different parameters in your head long enough to specify some kind of regression.

Now think about technology trends today. Computing power, bandwidth and memory capacity are all now cheap enough that computers can handle BIG better than humans can. THAT’s interesting.

In my professional life (sadly or happily, depending on how fired up I am to do ML), I don’t tend to get that many parameters. Insurance is rated on a very few rock-solid inputs and the rest is just sniffing out when someone is trying to screw you over.

But I’m intrigued, nonetheless. Woe to the ambitious one who doesn’t keep tabs on the cutting edge.

Here’s a link to yet another ML class.
