I recently listened to an interview with Hal Varian, in which he advocated a common view: the importance of statistical analysis is increasing, and that trend isn’t likely to stop soon. Organizations and individuals should therefore concentrate on improving their analytical capabilities to stay ahead of (on?) the curve.
I’m on board only with a very specific version of this view. Since routine tasks can be automated, the only part of a statistical analysis that is really ‘hard’ is the pre-analytical part: sourcing, verifying and scrubbing the data. Once you have a crisp and polished database, the conclusions often leap off the screen.
Hal has an interesting way of describing competitive advantage:
[Having a] scarce factor of production that is highly complementary to something that is ubiquitous and cheap.
The real skill is extracting something that we can recognize as data from the mess of everyday life.
Over to Europe, where the courts just destroyed a reliable dataset for auto insurers, who now can’t rate policies based on the gender of the driver.
Now, gender is an awesome data point: it’s easy to record, hard to fake and damn good at predicting outcomes.
The problem with it is, of course, that correlation and causation aren’t the same thing. Male-ness may be correlated with a bad driving record, but it isn’t causing it. Besides, within reason, once a guy, always a guy. Where’s the incentive for de-risking dudeness?
As I read it, the courts are implying that insurers should only rate on causal variables, because then you’re rewarding or punishing for actual changes in risk.
I can see logic in that.
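For anyone who wants to poke at the proxy-versus-cause distinction, here is a toy simulation. Everything in it is made up for illustration: the `aggressive` flag stands in for whatever behaviour actually causes claims, and the probabilities are invented, not taken from any insurer's data. In this toy world gender never touches the claim probability directly, yet it still "predicts" claims because it is correlated with the behaviour.

```python
import random


def simulate(n=200_000, seed=0):
    """Generate (male, aggressive, claim) rows for a made-up population."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        male = rng.random() < 0.5
        # Assumption: the risky behaviour is more common among men.
        aggressive = rng.random() < (0.4 if male else 0.2)
        # Claims depend only on behaviour, never on gender directly.
        claim = rng.random() < (0.15 if aggressive else 0.05)
        rows.append((male, aggressive, claim))
    return rows


def claim_rate(rows, male=None, aggressive=None):
    """Share of drivers with a claim, optionally filtered by either flag."""
    picked = [
        claim
        for m, a, claim in rows
        if (male is None or m == male)
        and (aggressive is None or a == aggressive)
    ]
    return sum(picked) / len(picked)


rows = simulate()

# Rating on gender alone: men look riskier than women.
print("men:  ", round(claim_rate(rows, male=True), 3))
print("women:", round(claim_rate(rows, male=False), 3))

# Rating on the behaviour: the gender gap all but disappears.
for flag in (True, False):
    print(
        "aggressive" if flag else "calm",
        round(claim_rate(rows, male=True, aggressive=flag), 3),
        round(claim_rate(rows, male=False, aggressive=flag), 3),
    )
```

If an insurer could observe the behaviour, gender would add nothing in this setup, and a driver who calms down would actually see a lower rate. The catch, of course, is that the behaviour is much harder to record than gender.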