Data Science

The Netflix competition will probably go down as the event that gave birth to the Data Science Era. Like all iconic events, there was absolutely nothing groundbreaking or new about it; it was just the first time a few trends came together in a public way: large-scale data, a public call for solutions, and a prominent, relatively recent startup disrupting an ‘evil empire’ kind of industry. And a bunch of money.

And the winner’s solution was never used:

If you followed the Prize competition, you might be wondering what happened with the final Grand Prize ensemble that won the $1M two years later. This is a truly impressive compilation and culmination of years of work, blending hundreds of predictive models to finally cross the finish line. We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.

To me it makes the whole thing an even better story: a cautionary tale about the difference between academic indulgence and commercial needs.

Perfect is often the enemy of good.

Quote Of The Day

Reproduced by hand from my dead tree copy of *What Technology Wants*:

Old World primates have full-color vision and an inferior sense of smell compared to their distant cousins the New World monkeys…

All, that is, except the howler monkey, which, in parallel to the Old World primates, has tricolor vision and a weak nose. The common ancestor to the howler and the Old World primates goes very far back, so howlers independently evolved tricolor vision. By examining the genes for full-color vision, biochemists discovered that both the howler and the Old World primates use receptors tuned to the same wavelengths, and they contain exactly the same amino acids in three key positions. Not only that, the diminished olfactory sense of both howler and apes was caused by the inhibition of the same olfactory genes, turned off in the same order and in the same details.

Talk about complex interactions. Is it possible for humans to figure this stuff out? What happens when computers figure this sort of thing out for us?

Bonus, from the previous page:

Biologist Richard Dawkins estimates that “the eye has evolved independently between 40 and 60 times in the animal kingdom… There are only so many ways to make an eye, and life as we know it may well have found them all.

Today In Unanswerable Questions

What is the value of something?

Here’s Pete Warden with the best explanation for what Facebook gets in Instagram. Many commentators have gotten caught up in the comparison of valuations between the New York Times and Instagram (both $1bn). Don’t be fooled, this is deep stuff.

The best answer for what something’s worth is the amount of money you can make from buying it. To a pure trader, most everything is inherently valueless; they simply buy and sell assets to flip them. Economically, all they do is provide liquidity. Markets *really* work only when interested parties transact with heterogeneous uses for the assets. One man’s trash is another man’s treasure.

To Facebook, Instagram is worth $1bn. To Facebook, the NYT is worth far less than $1bn because they would probably destroy quite a lot of value by buying it. To Microsoft or Yahoo!, Instagram may well be worth a lot less than $1bn. Heck, Instagram may well not be worth $1bn to Facebook, either, but Facebook *thinks* it is at the moment.

And that’s as satisfying an explanation as you can get. Talking about an abstract “price” for something is nonsense.

Mash of Links

They made their own clothes and built their own tools and worked on tiny farms. Lots of economic activity has moved from the home to the stock market since 1812. Via MR.

Next, I quote Yglesias, whose blog I’m really enjoying:

One of the most pernicious misunderstandings out there is that the prosperity of the United States in the postwar years indicates that there’s some meaningful alternative strategy for economic growth that doesn’t involve increased education and human capital. This idea is driven by the sense that back in the proverbial day there were great middle-class job opportunities out there for people who hadn’t gone to college, and so maybe what we really need to do is bring that kind of economy back…

America was far and away the best-educated country in the world during the postwar years

Great graph at the link.

My other favorite new blog is Science-Based Medicine. Here’s an excellent fact-filled rant:

It has been a stunning triumph of marketing and propaganda that many people believe that treatments that are “natural” are somehow magically safe and effective (an error in logic known as the naturalistic fallacy). There is now widespread belief that herbal remedies are not drugs or chemicals because they are natural.

The other major fallacy spread by the “natural remedy” industry is that if a product has been used for a long time (hundreds or thousands of years), then it must also be safe and effective because it has stood the test of time (this fallacy is referred to as the argument from antiquity)…

This first came to world-wide attention in the 1990s when a group of Belgian women who were taking Chinese herbs as part of a weight loss regimen developed end-stage kidney failure. The syndrome became known as Chinese Herbs Nephropathy, and it was soon discovered that aristolochic acid was likely the culprit…

It is also interesting to consider how aristolochia came to be used to aid in the birthing process – one of its most popular uses and the source of its name, which means “noble birth” in Greek. As with the traditional use of many herbs, it appears to be based entirely on sympathetic magic – the belief that a plant will be useful for an indication based upon what the plant looks like. In this case the flower of many aristolochia species looks like a birthing womb. The rest is anecdote, placebo effect, and confirmation bias – but no science.

Lest we get too self-congratulatory, scientific medicine isn’t always so scientific either.

Cringely on Best Buy, a company doomed to die:

Shopping at Best Buy last Christmas was a joke. Best Buy corporate was upset people were using their smart phones to do price comparisons in the stores. Think about that: Best Buy was upset that their customers were too smart, that they actually used the sort of technology Best Buy purported to sell. Worst of all, Best Buy completely missed the simple point that their prices were too high.

Reverse Engineers

Leaky, a car insurance comparison website, ran into a problem:

The problem? In order to compare the insurance prices you’d pay with different providers, Leaky was scraping the data directly from the insurance companies’ websites. It sounds like Traff wasn’t entirely surprised by the letters (“We understood their objections and complied with them,” he says now), but he thought Leaky would have more time to fly under-the-radar while it figured out the best way to get its data. However, the high-profile launch made that impossible, and the site went offline after four days.

The solution?

Now Leaky is back, and it’s offering price comparisons based on a new data source — the regulatory filings that car insurance companies have to file with the government. Using those filings, the company has created a model that predicts, based on your personal details, how much each insurance provider will charge.

I presume he means the rate filings insurers give to regulators (I smell an actuary in there somewhere!). This is a fascinating project but I’m pretty pessimistic.

The web startup model, as I see it, is to build something geeks love, piggyback on the free advertising in the startup press and wait to get bought out by someone who has the platform to actually bring your product to the masses.

Leaky is offering no product, though. They’re offering replica pricing. Oh, but it’s so close to the real thing!

That means Leaky is no longer getting its prices directly from the providers, but Traff says the new model is making predictions that fall within 3 percent of the actual prices.

First lesson in stats: means mask the tails of the distribution. There’s plenty of wiggle room in 3% average deviation (if that’s what he means) to make this product completely useless.
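To make the point concrete, here is a minimal sketch with made-up numbers (none of this is Leaky's data, and the 10% "useless" threshold is my own assumption). Two hypothetical sets of pricing errors share the same 3% average, but in one of them a tenth of the quotes are off by 21%.

```python
from statistics import mean

def summarize(abs_errors_pct, threshold_pct=10.0):
    """Return (mean absolute error, share of quotes off by more than threshold)."""
    share_bad = sum(e > threshold_pct for e in abs_errors_pct) / len(abs_errors_pct)
    return mean(abs_errors_pct), share_bad

# Hypothetical absolute pricing errors, in percent, for 100 quotes each.
uniform_errors = [3.0] * 100              # every quote is off by exactly 3%
tailed_errors = [1.0] * 90 + [21.0] * 10  # most are close, a few are far off

for name, errors in [("uniform", uniform_errors), ("tailed", tailed_errors)]:
    avg, share_bad = summarize(errors)
    print(f"{name}: mean error {avg:.1f}%, share off by more than 10%: {share_bad:.0%}")
```

Both sets report a 3% mean error, yet in the second one customer in ten gets a number that is nowhere near a real quote. The average alone can't tell you which world you're in.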

Car insurance is not unlike car manufacturing. I remember reading an interview with Carlos Ghosn where he was lamenting that the only way to make money is to have huge scale in auto manufacturing and the only way to get that scale is to kill your margins.

Online platforms, like manufacturing plants, are a colossal capital outlay. As soon as it’s up insurers need to pour money into advertising to get people to the site. Sure you’re cutting out the broker, but you need to pay Google and network TV to get the word out and promise (cross our hearts) that your deals are actually cheaper.

And the real cheap deals only come occasionally as a carrier grasps for market share. Leaky can’t predict that from the rate filing.

So the only way to improve on the existing model is to compare real quotes from real insurers. Online players killed the broker a long time ago; they aren’t going to let him back in now.

Humanity’s Greatest Challenge

And I don’t say that lightly.

Why haven’t we cured cancer? The short answer is that the latest uncovered Rumsfeld Unknowns are pretty scary.

Unfortunately, to paraphrase a later quote from Dr. Malcolm, cancer always finds a way. Well, pretty darned close to always, anyway. If I believed that cancer always finds a way, then I would also believe that cancer research and personalized therapy are futile endeavors. I do not believe that.

Even so, the reason we haven’t cured cancer yet is because we haven’t figured out how to overcome the power of evolution. Until we figure out a way to do that, we will continue to make only incremental progress.

The Fat Of The Land

Dane-o sent me this article on the possibility of a Student Debt bubble. I’m often wary of Zero Hedge: they are an odd mix of great analysis and sensationalist extrapolation.

The core of the story is that Jamie Dimon is pulling back hard on his student loan business. I’d divide my interest in this topic into a few areas: 1. If student loan defaults went bananas, should I be worried about systemic risk? 2. Are student loan defaults going bananas? 3. What does all this mean for higher education?

Ok, let’s start with 1. Is the student loan market systemically important?

First, how big is it? One estimate is $1 trillion (and that’s a high/overstated number), which is bigger than I’d expect. But my favorite view on the Great Recession is that even the housing collapse would have been a trivial shock if not for monetary tightening. And the US mortgage market is about $10 trillion.

And you can walk away from your mortgage, whereas lenders have “broad powers” to seek repayment on student loans. Bankruptcy isn’t an option. That means that lenders aren’t going to feel much pressure to write the suckers down. No solvency risk means no systemic risk.

So why exit?

Probably because though they won’t sink the ship, they aren’t lifting the boat. Much is made of the large number of delinquent student loans. And 27% at 30 days past due is big. But in mortgages, at least, a lot of loans are ‘cured’ between 30 and 90 days, after which they’re considered in default. And anyway, you can’t kill a student loan with bankruptcy.

Following the chain of links leads us to an American Banker piece:

The CFPB recently began accepting student loan complaints on its website.

“I think there’s going to be a lot of emphasis and focus … in terms of what is deemed to be fair and what is over the line with collections and marketing,” Petrasic says, warning that “the challenge for the CFPB in this area is going to be trying to figure out how to set consumer protection standards without essentially eviscerating availability of the product.”

Outstanding student debt, including private and federal loans, has topped $1 trillion, surpassing previous estimates, the CFPB reported earlier this month.

More regulation, uncertain cash flows. Sounds like a plain old fashioned crappy business. Not much of a story.

What matters, to me, is the fact that the government owns most of this business and appears to be picking up share as private firms rush out. Higher education reform will be high on the agenda in the coming budgetary armageddon. Those costs have to come down somehow.

The US is the richest and most productive economy in history. And it is spending an all-time record *proportion* of that all-time mass of wealth on health care and education.

Education is not the most productive sector in the world. It’s just the fattest in the world.

Review of *Jiro Dreams of Sushi*

My family used to go on long vacations when I was a kid. One summer we found ourselves in a little town out West watching some kind of small time rodeo/outdoor fair. We ate lunch watching a country band.

Culturally, we weren’t normally into that kind of thing and I remember asking my mom what the deal was. Her response: “They’re live and good at what they do. That’s always interesting to see.”

That about sums up what’s great about this movie. Jiro is good at what he does, maybe the best in the world. His restaurant is a case study in narrow but deep achievement. Only Sushi? No bathroom? 10 seats? Underground strip mall? Three Michelin Stars.

The secret is a workaholic monomania. Sushi is raw fish, rice and wasabi. Simple enough. You’ll be surprised, perhaps, to learn that Jiro’s apprentices progress on a geologic time scale: months of squeezing towels to start, 10 years before you’re allowed to cook the eggs.

Tyler Cowen sees some kind of employer cartel among Sushi Chefs: extorting labour from skilled apprentices for years with the promise of trade secrets at the end.

I prefer the explanation given in the opening sequence. There are no secrets, says Jiro’s son and heir Yoshikazu: being a great sushi chef simply requires enough tolerance for mind-numbing routine to never, ever lose focus. Squeezing towels? The apprentices might as well just train by staring contest.

Deep skills can be learned through vaguely related, trivial tasks, as we know:


Jiro isn’t rushed. Like all masters of craft, he has been doing it forever (75 years) and feels he’s learning all the time. He’s released many apprentices to open their own restaurants, even his younger son (there’s only room for one dauphin). Anyway, the greatest masters never stop being apprentices, Jiro included. That lesson, too, takes time to learn.

The length of this apprenticeship stands in meta irony to the production of the film itself. The director, David Gelb, is only 28. I’m no cinephile, so this comment sits a bit awkwardly in my mind, but I really enjoyed the camerawork in the film. The NYT reminds me of a particularly delightful scene, and it seems our young gun got some help:

Toward the end of the documentary, which comes out in New York on March 9, there is a spellbinding “concerto of sushi” in which nearly every item on the omakase menu at Sukiyabashi Jiro, Mr. Ono’s sliver of a restaurant, is captured in a loving close-up while Mozart soars on the soundtrack.

It looks simple enough — point a camera at the fish and start rolling — but it wasn’t.

Mr. Ono’s standards are so obsessively high that he wanted to shoot each handmade masterwork at “the supreme moment of deliciousness,” Mr. Gelb said, which happened to be the precise instant of its creation.

“It was important to Jiro that the sushi looked the way it was supposed to,” Mr. Gelb said.

This went beyond the fresh glistening hue of the fish. In the chef’s eyes, the scene had to incorporate even the gentle settling and merging of the fish and rice and sauce as each piece was placed onto a plate.

Take too long, Mr. Gelb said, and “we would’ve lost the soft landing, because it had already landed.”

The film touches on some of those lessons that are simple but hold enduring appeal. #1: the greatest are incalculably greater at their craft than the rest.

#2: But they don’t get that way by magic. There is not a single shot of Jiro’s home or life outside of the restaurant. And that is probably because, not being sushi, they don’t matter much to him.

For The Finance/Econ Geeks

Via Tyler Cowen we are sent to FT Alphaville for some serious brain damage. I have training in both bond markets and monetary macroeconomics but have no real practical experience in either, so I know only enough to be dangerous here. Beware.

With that, let’s of course talk about how bond markets affect monetary policy.

Banks borrow money. In the olden days, they’d borrow it from you and me (deposits) and lend it out for profit. Sometimes the banks would make enough bad loans that borrowers would all simultaneously freak out and try to withdraw all their money. Loans can’t be called that fast. Bank run… Bust.

A big part of Bernanke’s academic legacy is describing the nasty things that happen when an economy is starved of credit after an epidemic of bank runs.

Deposit insurance stops this problem because the olden days creditors (us) never have to worry about getting their money back. But these days there’s a new channel for bank runs: shadow banking.

Here’s how a shadow bank works: you have a treasury bond or some highly rated corporate debt but you WANT cash. So you borrow money overnight and pledge that bond as collateral then go make some money with that cash. That’s shadow banking.

The point of the FT post is that shadow banking is huge and absolutely TANKED during the crisis. There’s lots of interesting discussion in that piece on this point. Here’s a graph (there are many more)

This had the effect of sucking money out of the economy, perhaps not quite like what happened in the Great Depression but certainly more than the experts thought they saw in 08-09.

I really liked this quote:

6) How does this understanding of the crisis jive with Gary Gorton’s theory of what happened? Recall that Gorton’s story wasn’t just about collateral values plunging and haircuts rising in repo markets. It was that certain types of debt used as collateral flipped from being information-insensitive to information-sensitive. And this flip wasn’t limited to subprime MBS, which would have caused only minor stresses.

His argument wasn’t about the relative safety of the collateral as its value fell. It was about the collateral going from safe to unsafe just because lenders in repo markets realised that it could potentially fall. So they pulled out en masse and hiked haircuts even when the collateral wasn’t subprime-related, and voila you’ve got a crisis. This was partly an issue of repo market transparency, but it also reflects a different, more binary understanding of what triggered the crisis.

Summary: on a one-day time horizon, highly rated corporate paper wasn’t risky until it was. And if that stuff was used for repo loans then the repo system stops working.
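For anyone who wants the mechanics spelled out, here is a rough sketch of how a haircut hike drains funding. The collateral size and both haircut levels are hypothetical numbers I picked for illustration, not figures from the FT piece, and `repo_cash` is just a helper name I made up. The only rule assumed is the standard one: a repo lender advances the collateral's value minus the haircut.

```python
def repo_cash(collateral_value, haircut_bps):
    """Cash a lender advances against collateral after applying a haircut.

    haircut_bps is the haircut in basis points (100 bps = 1%).
    Integer arithmetic keeps the dollar amounts exact.
    """
    return collateral_value * (10_000 - haircut_bps) // 10_000

collateral = 100_000_000  # hypothetical: $100m of highly rated corporate paper

calm = repo_cash(collateral, 200)        # hypothetical 2% haircut in normal times
stressed = repo_cash(collateral, 2_000)  # hypothetical 20% haircut once lenders get nervous

print(f"funding in calm markets:     ${calm:,}")
print(f"funding in stressed markets: ${stressed:,}")
print(f"overnight funding gap:       ${calm - stressed:,}")
```

The collateral hasn't defaulted or even necessarily lost value here. The borrower simply has to find $18m by tomorrow morning, and so does everyone else holding similar paper.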

What nobody really understands, even now, is how important the repo system is. Even though Scott Sumner thinks it doesn’t matter.