data – Page 4 – Not Unreasonable

Do Goaltenders Age Better?

Posted on June 4, 2011 by David Wright

I bet the average age of ice hockey goaltenders is the highest of any position in any sport outside of NFL field goal kicker. And if you adjust for some kind of ‘importance measure’ (say, share of team salary), they surely blow everyone else away.

Now, they don’t break ‘oldest athlete’ records and so have a smaller absolute range of ages than athletes in other sports, but the best seem (to me) to hit the performance wall much later.

I put together a spreadsheet of the ages of playoff goaltenders. Average age?

29 1/4 years old!

Median is a year younger.

Wow. Surely the highest.

Data is here. I did ages as days /365. Not perfectly accurate but I’m hardly about to start effing around with leap years.

I’m watching the game forchrissakes!
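For anyone curious how much the days / 365 shortcut actually costs, here is a rough sketch. The birthdate is made up for illustration (it is not from the spreadsheet), and `age_in_years` is just a name I picked.

```python
from datetime import date

def age_in_years(born: date, on: date, days_per_year: float = 365.0) -> float:
    """Age as elapsed days divided by a fixed year length."""
    return (on - born).days / days_per_year

# Hypothetical goaltender birthdate, not taken from the actual data.
born = date(1980, 1, 1)
on = date(2011, 6, 4)

approx = age_in_years(born, on)            # the days / 365 shortcut
adjusted = age_in_years(born, on, 365.25)  # crude leap-year allowance

print(f"days/365:    {approx:.3f}")
print(f"days/365.25: {adjusted:.3f}")
print(f"difference:  {(approx - adjusted) * 365:.1f} days")
```

The gap is about a week for a 31-year-old, which is noise next to an average of 29 and a bit.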

Suck it, IT!

Posted on June 2, 2011 by David Wright

Here is Celent:

…the use of Python and other scripting languages have long been used to clean up and prepare data. In both cases the question arises – is this an IT job or not?

The problem with insurance data is that it is often inputted into a system that isn’t built to help insurers manage risk, but rather to pass financial audit.

Questions like: “did you measure your income, cash in, cash out and claims liabilities correctly” matter.

Questions like: “how much money can I lose in scenario x” don’t matter.

It’s bizarre to think that an industry entirely concerned with risk management doesn’t introduce systems to manage risk.

The core issue is related to the quote above: systems are an “IT thing”, not a core competency of an insurance professional/risk manager. They’re tantalizingly close, though, and getting closer.

I’ve found myself desperately building skills once reserved for IT people. They’re effing useful.

The rest of this post describes my latest project (to help me think through the process).

Insurance management systems are really accounting systems with fields added in to record some extra policy data. Typically, the only field audited thoroughly is the premium field. For one thing, it’s the easiest one to audit because you have an independent data source (actual cash received) to check it against.

So we get these listings which have very accurate premium transaction numbers (hopefully those data aren’t scrubbed using a DIFFERENT system) and try to answer these kinds of questions:

How big are the limits offered by this company for different coverages?
What are the distributions of these limits?
What limits does this company offer to a single insured?
How many insureds does this company insure?
How many separate policies does this company write?
How many coverages does this company write per policy?
Can we link all the claims (separate database) to the policies?

A very important step is to establish what is an insured, what is a policy and what is a transaction. A mentor of mine drilled me with the mantra: “count everything once and ONLY once”.
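A toy version of that mantra, as a sketch only. The rows and field names (`insured`, `policy`, `coverage`, `premium`) are invented for illustration; real listings are messier and the keys are rarely this clean.

```python
from collections import defaultdict

# Hypothetical transaction listing: one row per premium transaction.
rows = [
    {"insured": "Acme Ltd", "policy": "P-001", "coverage": "Property", "premium": 1000.0},
    {"insured": "Acme Ltd", "policy": "P-001", "coverage": "Property", "premium": -200.0},
    {"insured": "Acme Ltd", "policy": "P-001", "coverage": "Liability", "premium": 500.0},
    {"insured": "Baker & Sons", "policy": "P-002", "coverage": "Liability", "premium": 750.0},
]

# Sets count each insured and each policy once, however many rows mention them.
insureds = {r["insured"] for r in rows}
policies = {r["policy"] for r in rows}

# Coverages per policy: again a set, so repeat transactions don't double count.
coverages_per_policy = defaultdict(set)
for r in rows:
    coverages_per_policy[r["policy"]].add(r["coverage"])

print(len(rows), "transactions")
print(len(insureds), "insureds")
print(len(policies), "policies")
for policy, coverages in sorted(coverages_per_policy.items()):
    print(policy, "has", len(coverages), "coverages")
```

Four transactions, two policies, two insureds. Of course this only works if the insured names are spelled the same way every time.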

So, one project I’m working on is to build a database analysis tool that fixes mistyped insured names.

The key concept is Levenshtein Distance.

The idea behind LD is to measure the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one word into another. It’s commonly used for spell-checking and for weeding out garbage in search engine terms.
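Here is a minimal sketch of the textbook dynamic-programming version of the distance, keeping only two rows of the table at a time. This is the standard algorithm, not necessarily what my tool will end up using.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i chars of a into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

"kitten" to "sitting" takes two substitutions and one insertion, hence 3.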

In my case, because these insured names are sometimes made up words or strange spellings of words (names of people or businesses), I want to run an LD analysis against the listing itself and tell the program to ask me when it thinks it’s found a mistake.

So what I want is a routine that builds a dictionary of all the separate words in the listing and tells me which words would get wiped out to build a more ‘efficient’ policy listing name.

I need to make sure I know how the original listing was structured so I can put it back together again, of course.

So here is the process:

Build a database of each unique word in the file
Discard one letter words (‘a’ and such)
Arrange the words in alphabetical order
Depending on how the words are arranged, compare each word to the last.
… this is as far as I’ve gotten.
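One way the steps so far might look in code. This is a sketch under assumptions: the sample names, the `candidate_typos` function and the `max_distance` threshold are all hypothetical, and the tokenizing rule (lowercase letters, digits and apostrophes) is just a guess at something reasonable.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance, two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]

def candidate_typos(names, max_distance=1):
    """Return alphabetically adjacent word pairs within max_distance edits."""
    # Step 1: every unique word in the listing.
    words = {w for name in names for w in re.findall(r"[a-z0-9']+", name.lower())}
    # Steps 2 and 3: discard one-letter words, sort alphabetically.
    ordered = sorted(w for w in words if len(w) > 1)
    # Step 4: compare each word to the one before it.
    return [(prev, cur) for prev, cur in zip(ordered, ordered[1:])
            if levenshtein(prev, cur) <= max_distance]

# Hypothetical insured names for illustration.
listing = ["Acme Plumbing Ltd", "Acme Plumbng Ltd", "A Baker & Sons"]
print(candidate_typos(listing))  # [('plumbing', 'plumbng')]
```

The output is a list of pairs for a human to confirm, not an automatic fix. One obvious weakness of comparing only alphabetical neighbours: a typo in the first letter lands far away in the sort order and never gets compared.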

Short Soccer Players

Posted on May 28, 2011 (updated May 29, 2011) by David Wright

Enjoyed watching the Champions League final this afternoon. Yay Barca.

One unoriginal observation I made was that the Barca players were shorter than the Man U players. A bit of googling shows that this question has had a healthy examination before.

So I figured I’d calculate, in a back-of-the-envelope way, what the average heights were of the teams:

Man U Mean: 1.79

Man U Median: 1.76

Barcelona Mean: 1.75

Barcelona Median: 1.73

I haven’t adjusted these figures for round-number bias as per the Freakonomics link and I’ve used the Wikipedia ‘starting lineup’, which has a different number of players for each team. And I’ve excluded keepers. Not really soccer players, anyway.

That said, I’m not surprised at the result and it jibes with the observation of the Spanish team being short, too.

Funny, though, isn’t it.

Sales Kills?

Posted on May 25, 2011 by David Wright

Robin Hanson has been blogging Ken Lee’s PhD dissertation and saved the best for last: Jobs Kill.

The big result: death rates depend on job details more than on race, gender, marriage status, rural vs. urban, education, and income combined!

He presents this table:

Two comments on the chart. As I understand it, the higher the factor above the more a job characteristic contributes to death. So, “Overall:Physical Demands”, at 1.699, is a big killer. Also, more stars means a higher statistical significance.

Ok, so I want to talk about “Context: Socially Challenging”. Here’s Ken Lee (this link may some day break) describing this factor a bit:

Work Context: Socially Challenging… has such attributes [such] as impact of decisions on others, frequency of conflict situations, stress tolerance, and dealing with physically aggressive or angry people.

One thing Robin has taught me (though this perhaps isn’t his insight) is that intelligence evolved to deal with the social complexity in our society. This means that jobs that are socially challenging are the jobs that tax the human mind more than any other kind of job.

The most socially complex jobs, in my opinion, are sales jobs. Remember, the best salespeople are those that are best at two things: one-on-one persuasion and accepting rejection, two incredibly socially stressful activities.

If my take away here is that salespeople have high mortality, then I completely buy it. It’s brutal work.

It would be cool to correlate these ‘death factors’ to wage and employee turnover in the occupations. I bet wage is related to status and turnover will be highest in low-status, high-danger jobs.

For The Insurance Geeks In The Crowd

Posted on May 24, 2011 by David Wright

This is a deeply powerful point about the insurance business:

There are ten lines in that graph.

The straight blue line is the 1-1 line, which is the measurement of a year’s performance 1 year out. This is a pure fudge figure because the insurer doesn’t have enough information to measure the cost yet.

The fact that this line is at 1.00 is important. 1.00 means that the insurer expects to pay out 100% of its premium in claims. Nominal Revenue = Nominal Cost. 10 years of interest makes this possible.
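To see why a 1.00 ratio isn’t a money loser, here is a back-of-the-envelope sketch. The 4% interest rate and the even ten-year payout pattern are assumptions I made up for the arithmetic, not figures from the graph.

```python
def present_value(payments, rate):
    """Discount a list of year-end payments (year 1, 2, ...) back to today."""
    return sum(p / (1 + rate) ** t for t, p in enumerate(payments, start=1))

premium = 100.0               # collected up front
claims = [10.0] * 10          # hypothetical: 100 of claims paid evenly over 10 years
rate = 0.04                   # hypothetical annual interest rate

nominal_ratio = sum(claims) / premium
discounted_ratio = present_value(claims, rate) / premium

print(f"nominal loss ratio:    {nominal_ratio:.2f}")
print(f"discounted loss ratio: {discounted_ratio:.2f}")
```

Under those assumptions the claims are worth about 81 in today’s money against 100 of premium, so the interest earned while waiting to pay is where the margin comes from.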

As you look back at the year over time (1-2, 1-3, etc), the amplitude of the ‘wave’ increases. This happens because, over time, insurers gain information about how well that year is going and absorb the volatility in the relationship between revenue and costs.

Workers’ Compensation business is the most ‘long tail’ of insurance businesses. This means that the claims costs of comp policies take the longest to resolve.

In fact, insurers have very little idea of the ultimate cost when they write a comp policy. Workers’ comp is notorious for this and many, many insurance companies avoid it entirely because of this uncertainty.

The cycle is present in all insurance businesses, though. Once people figure out they’re losing money, they pull capacity and rates go up. The difference with comp is that there is more risk of finding out too late.

Think of that realization like a tsunami. Out at sea, small waves look like big waves, because very few have enough power to displace the entire column of water from the ocean floor all the way to the surface. Good years and bad years and company-killing years look pretty similar.

But once the sea floor shortens up and you hit the shore, you find out how much energy was in the sucker.

And with comp, those suckers can be big.

Sticks and Stones Can Break My Bones But Names Break the Bank

Posted on May 23, 2011 by David Wright

From the dept of the absurd:

a judge granted a privacy injunction, meaning that English newspapers could not legally publish the name of a professional soccer player who allegedly had an affair, despite thousands of people who have reported the name on the social-networking site.

The reason, of course, is legal liability. The UK has some of the most spectacularly aggressive libel laws in the world.

The intended consequence is, presumably, that people play together more nicely, but the unintended consequences are more interesting. Libel Tourism, for one.

Another is that, because libel is by definition printed, and most print is online, the Internet is making for some interesting legal theatre, as in the case above.

Score another massive point for Web 2.0. Because big firms are a big fat target, people sue them.

But that cost isn’t borne by the firms. Oh, no. It’s paid by the insurance companies who, in turn, raise rates for the entire media industry.

I was working on an account today and noticed that liability insurance for UK Media Firms is about 6x more expensive than the equivalent for Australia and Canada (no data for the US in my little trove).

Twitterers don’t pay liability insurance for libel cover because they can’t realistically be sued for libel.

This is obviously a margin at which power is being transferred away from Big Media and whatever coalition is blocking libel reform in the UK.

Yay for liberty.

These Days in Big Ideas

Posted on May 20, 2011 by David Wright

Here’s a book I hope I take the time to read when it comes out: “The Changing Body: Health, Nutrition, and Human Development in the Western World Since 1700”, reviewed here in the NYT.

The book is an investigation into how and why people have gotten bigger over the last few hundred years. It mostly boils down to nutrition and health, of course, which is, in turn, related to technological improvements. Yawn, perhaps.

But I like these books because I like reading about how people put data together to prove big ideas. Not easy to do.

Demographics – One of my Favs

Posted on May 11, 2011 (updated May 12, 2011) by David Wright

This is my party, so I can tickle my bias if I want to.

Here’s a cool paper via Tyler Cowen:

Catastrophes and Reinsurance Stocks

Posted on May 7, 2011 by David Wright

I’ve put together some analysis for work that never got used, so I figured I’d throw it up here.

First, there’s a strange conundrum that confuses even the most enlightened financial observers. Here’s a quote from David Merkel’s blog:

Take Your Consensus and Shove it

Posted on April 30, 2011 by David Wright

Arnold discusses how analysis of aggregate data yields a lot of BS: