From Couch Potatoes to Neural Nets

How about we do no less than lay out the levels of modeling sophistication among humans.

Level 1. I don’t understand or care to understand how the world works. People magazine (ESPN for chicks) or ESPN (gossip rag for bros) and I’m done.

Level 2. The world is pretty simple. I don’t even bother to pretend to hide my political or cognitive biases. Whatever those are.

Level 3. The world is complicated. If you ignore nuance you lose. Our understanding needs to reflect that complexity so let’s build models that are complicated.

Level 4. The world is complicated but humans can’t handle that. Let’s use heuristics to make good enough decisions but get a lot more done.

Level 5. Hey level 4, you’re more like level 2 than you’d care to admit. Heuristics are fine but use them wisely, know your biases, pass ideological Turing tests. Succumb to your flaws, but consciously.

Balancing the need for complexity with a limited ability to understand it is the biggest challenge in my professional life. And at some frontiers of human knowledge this is the dominant problem (social systems like the business world are different). Some hope that the machines will save us. Computers’ superiority over us at chess is the beginning, they say.

Could be, I suppose, but we’re a long way away. Take neural networks, a very sexy topic of late. These are machine learning techniques loosely modeled on the human brain, the killer innovation being that the analysis is done by layers of nodes that all interact with each other.

The thing to take away from that is that we don’t know what the network is doing to the data in between the input and the output. We feed it data, tell it what the output should look like, and repeat millions of times. Eventually the program settles on the steps that minimize error against the target output. Again, remember, we don’t know what the program does to the data inside the network. And one of the candidate theories (that the network transforms the data in meaningful ways on its journey through the net) has now been discredited.
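To make the “feed it data, compare to the target, repeat” loop concrete, here’s a minimal sketch in plain numpy: a toy two-layer net trained by gradient descent on made-up data. The layer sizes, learning rate, and data are arbitrary stand-ins, not anyone’s real setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training data: 200 points, 5 input features, a binary target.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# A tiny two-layer net: 5 inputs -> 8 hidden nodes -> 1 output.
W1, b1 = rng.normal(scale=0.5, size=(5, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    # Forward pass. The hidden layer h is the part nobody can read off directly.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Compare the output to the target and nudge every weight to shrink the
    # squared error (plain gradient descent).
    grad_out = (p - y) * p * (1 - p)
    grad_W2 = h.T @ grad_out / len(X)
    grad_b2 = grad_out.mean(axis=0)
    grad_h = grad_out @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h / len(X)
    grad_b1 = grad_h.mean(axis=0)

    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("final training error rate:", np.mean((p > 0.5) != y))
```

Nothing in that loop ever tells you what the hidden layer h “means”; the weights just drift to whatever minimizes the error.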

What the discrediting means is that neural networks do not “unscramble” the data by mapping features to individual neurons in, say, the final layer. The information that the network extracts is just as much distributed across all of the neurons as it is localized in any single neuron.
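For a feel of the kind of probe behind that claim, here’s a rough sketch (my own toy version, not the experiment in the linked paper): train a small net, then ask whether any single hidden neuron picks out a class noticeably better than random directions through all of the neurons do. The dataset, layer size, and the class being probed are arbitrary choices; if the quoted result holds up, the two kinds of probe should land in the same ballpark.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier

# Toy setup: a small net on the sklearn digits data (purely illustrative).
X, y = load_digits(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X, y)

# Recompute the hidden-layer activations by hand (ReLU is the default).
H = np.maximum(0.0, X @ net.coefs_[0] + net.intercepts_[0])

def auc(scores, labels):
    """How well a 1-D projection of the hidden layer separates a class."""
    a = roc_auc_score(labels, scores)
    return max(a, 1 - a)  # the sign of the projection doesn't matter

target = (y == 3).astype(int)  # probe for an arbitrary class, the 3s

# Best single hidden neuron vs. best of an equal number of random directions.
best_neuron = max(auc(H[:, j], target) for j in range(H.shape[1]))
rng = np.random.default_rng(0)
best_random = max(auc(H @ rng.normal(size=H.shape[1]), target)
                  for _ in range(H.shape[1]))

print(f"best single neuron AUC:    {best_neuron:.2f}")
print(f"best random direction AUC: {best_random:.2f}")
```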

And here’s another result:

Since the very start of neural network research it has been assumed that networks had the power to generalize. That is, if you train a network to recognize a cat using a particular set of cat photos the network will, as long as it has been trained properly, have the ability to recognize a cat photo it hasn’t seen before.
Within this assumption has been the even more “obvious” assumption that if the network correctly classifies the photo of a cat as a cat then it will correctly classify a slightly perturbed version of the same photo as a cat. To create the slightly perturbed version you would simply modify each pixel value, and as long as the amount was small, then the cat photo would look exactly the same to a human – and presumably to a neural network.
However, this isn’t true.

So if they take a correctly recognized picture of a cat and change a few pixels, the computer no longer recognizes it. Ouch. See the link for pairs of images that are indistinguishable to the human eye but baffling to the net.
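You can see the mechanics of such a perturbation with a far humbler model than a deep net. The sketch below is my own toy reconstruction, not anything from the linked paper: a plain logistic-regression “cat detector” on fake data, where nudging every “pixel” by the same small amount in the right direction (the sign of the gradient, in the spirit of the well-known fast-gradient-sign trick) steps the example over the nearest decision boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Fake "images": 100-dimensional vectors; label 1 plays the role of "cat".
X = rng.normal(size=(500, 100))
y = (X @ rng.normal(size=100) > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Grab one example the model correctly calls a cat.
i = np.flatnonzero((clf.predict(X) == 1) & (y == 1))[0]
x = X[i]
w = clf.coef_[0]

# Shifting every "pixel" by -eps * sign(w) lowers the model's score by exactly
# eps * sum(|w|), so a small eps is enough to step over the nearest boundary.
score = clf.decision_function(x.reshape(1, -1))[0]
eps = 1.05 * score / np.abs(w).sum()
x_adv = x - eps * np.sign(w)

print("per-pixel change:", round(eps, 4), "(features have std 1)")
print("prediction before:", clf.predict(x.reshape(1, -1))[0],
      " after:", clf.predict(x_adv.reshape(1, -1))[0])
```

For a linear model the direction to the closest boundary can be worked out exactly, which is basically the point of the quote that follows.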

One last quote, because it is hilariously overcomplicated:

One possible explanation is that this is another manifestation of the curse of dimensionality. As the dimension of a space increases it is well known that the volume of a hypersphere becomes increasingly concentrated at its surface. (The volume that is not near the surface drops exponentially with increasing dimension.) Given that the decision boundaries of a deep neural network are in a very high dimensional space it seems reasonable that most correctly classified examples are going to be close to the decision boundary – hence the ability to find a misclassified example close to the correct one, you simply have to work out the direction to the closest boundary.
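That volume claim, at least, is easy to check numerically. A ball of radius 1 in d dimensions has volume proportional to r^d, so the fraction of its volume sitting within a thin shell of thickness eps of the surface is 1 - (1 - eps)^d, which races toward 1 as d grows (the 1% shell below is an arbitrary choice):

```python
# Fraction of a d-dimensional unit ball's volume lying within a thin shell
# of thickness eps of its surface: 1 - (1 - eps)**d.
eps = 0.01
for d in (2, 10, 100, 1000, 10000):
    shell = 1 - (1 - eps) ** d
    print(f"d = {d:>5}: {shell:.4f} of the volume is in the outer 1% shell")
```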

I think what all of this means is that they’re fragile as predictive devices. The net is capable of generating incredible complexity in its treatment of the data, with each node interacting with so many others. The term of art for this is overfitting: too high a ratio of predictive variables to training events, so the net ends up memorizing its training data rather than the underlying signal. This means that if you give it a new case it hasn’t seen before (i.e. if you actually use the damn thing) it will fail hopelessly.
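A caricature of that failure mode, with a polynomial standing in for the net and everything about the setup (degrees, sample size, noise level) made up: give a model roughly as many parameters as training points and it will typically nail the training data while doing far worse on points it hasn’t seen.

```python
import numpy as np

rng = np.random.default_rng(2)

# A simple underlying signal plus noise, with only 10 training points.
def truth(x):
    return np.sin(x)

x_train = rng.uniform(0, 3, size=10)
y_train = truth(x_train) + rng.normal(scale=0.1, size=10)
x_test = rng.uniform(0, 3, size=200)
y_test = truth(x_test) + rng.normal(scale=0.1, size=200)

for degree in (2, 9):
    # Degree 9 gives the fit as many parameters as training points:
    # the high predictive-variables-to-training-events regime described above.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```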

I am also reminded of Google’s driverless car. When I was learning machine learning, the neural networks section featured this project prominently as an example of NNs’ practical use. I’ve since learned that they abandoned this strategy ages ago and now literally program the routes and rules into the software. If it runs into a situation it doesn’t recognize or a route that hasn’t been programmed, it stops and waits for a human to take over. Driverless, sure, I guess. But not smart.

I think it was Robin Hanson who wrote something a while ago that stuck in my head, but I can’t find it on his blog to cite. From memory, he said that the more general the situation a system must adapt to, the more similar independently developed systems that adapt to it will look. Think about how eyes or wings evolved independently many times. Perhaps our brain, as an incredibly general device, really is as good as a system could possibly be at managing complexity? If we want better performance from another kind of mind, then we must necessarily sacrifice something enormous from its capability.

Another way of saying this is that of course neural networks are crap. If they were any good we would be able to understand them!
