Meet Your Insurance Broker: Google

Insurance is the most expensive keyword money can buy.

This is GEICO and Progressive showing us that insurance distribution is extremely valuable. Those local brokers’ salaries are now being paid to Google engineers.

Advertising is expensive, though, and you need scale for this business model to work.

Direct, advertising-driven distribution works fine for US auto liability, but that is both the biggest and most standardized insurance market in the world. It is the exception that proves the rule that an insurance broker is an economically useful advocate.

How to Improve

This caught my eye this morning: “10 ways to improve your programming skills”

Since I’m learning how to program (has it been two months!?), and I want to get better at it, I should try to follow some of this advice. Here’s the list:

1. Learn a new programming language

Um. It’s all new!

2. Read a good, challenging programming book

Ok, bit advanced.

3. Join an open source project

Yeah, right.

4. Solve programming puzzles

Possible candidate here. Sounds like a lot of work, though.

5. Program

Got enough of that to do!

6. Read and study code

Ugh.. no time.

7. Hang out at programming sites and read blogs

No “hanging out”, but I read.

8. Write about coding

Hmm….

9. Learn low-level programming

Nope.

10. Don’t rush to StackOverflow. Think!

Meh.

-=-=-=-=-

So maybe writing about programming is the low-hanging fruit here.

Ok, so here’s what I did today at work.

There’s this company called AM Best, an insurance-specialist rating agency, like S&P but much narrower in focus. I go there periodically for financial information on our clients and markets and for the industry in general.

Anyway, I noticed that there is a press release archive going back to 2000. The thought struck me that it would be neat to have a database of all these press releases to crunch, to see if there are any patterns in the rating actions taken on companies.

THEN it would be neat to link these rating actions to stock prices, to see whether the ratings actually, um, you know, work.

For instance: how good of a predictor are they of default? Is there an immutable ‘snowball effect’ where a rated entity just keeps getting downgraded until it fails or merges with someone else?

So this project has been bubbling around in my head for a few weeks and this morning I finally had enough spare time in which to implement it.

I’ve put together a scraping routine (busily ‘scraping’ as I type) that is pulling down all 10,000+ press releases and dropping them into a database.
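
The skeleton looks something like the sketch below. The archive URL and the link filtering are placeholders (I’m not reproducing AM Best’s actual page layout here); the shape of the thing is just: fetch an index page, follow the links, drop each release into SQLite.

```python
import sqlite3
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Placeholder URL -- the real archive index and its HTML layout will differ.
ARCHIVE_INDEX = "http://example.com/press-releases?year=2000"

db = sqlite3.connect("press_releases.db")
db.execute("CREATE TABLE IF NOT EXISTS releases (url TEXT PRIMARY KEY, title TEXT, body TEXT)")

index_html = requests.get(ARCHIVE_INDEX).text
soup = BeautifulSoup(index_html, "html.parser")

# Assume each press release shows up as a plain link on the index page.
for link in soup.find_all("a", href=True):
    url = urljoin(ARCHIVE_INDEX, link["href"])
    if "pressrelease" not in url.lower():   # crude filter, adjust to whatever the links look like
        continue
    page = BeautifulSoup(requests.get(url).text, "html.parser")
    title = page.title.get_text(strip=True) if page.title else ""
    body = page.get_text(" ", strip=True)
    db.execute("INSERT OR IGNORE INTO releases VALUES (?, ?, ?)", (url, title, body))
    db.commit()
```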

I considered doing all the actual data mining today, too, but that’s going to take a bit too much time. I’m happy with just sitting on the data for now.

My next objectives:

1. Parse the text to figure out what the various categories of press releases are.

I know there are downgrades and upgrades of companies, but what about actions against subsidiaries only? What about debt ratings? Most of this crap is useless to me.

2. Figure out a system for identifying companies that matter.

There are going to be a ton of mergers, defaults, spin-offs and goodness knows what else going on that I’ll need to work out. That will be tricky.

3. Isolate the rating actions associated with corporates and build a more ordered database of actions over time.

This is the ‘real work’, obviously. How ironic that it’s going to be by far the easiest step once everything’s organized. Regular expressions, baby. Cinch. (A rough sketch of the regex idea follows this list.)

4. Figure out how money has been made or lost in this process.

I want to link these names to stock symbols and see if there is any perceived contagion (by the market) and, even more importantly, whether there is any ACTUAL contagion. I suspect not.
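
Back to steps 1 and 3 for a second: here’s the sort of regex pass I have in mind for sorting the press releases into buckets. The category names and patterns are guesses at what the headlines will look like, not AM Best’s real taxonomy.

```python
import re

# Guessed categories and patterns -- the real headlines will need tuning.
CATEGORIES = [
    ("upgrade",      re.compile(r"\bupgrad(?:es|ed|e)\b", re.I)),
    ("downgrade",    re.compile(r"\bdowngrad(?:es|ed|e)\b", re.I)),
    ("affirmation",  re.compile(r"\baffirm(?:s|ed)?\b", re.I)),
    ("under review", re.compile(r"\bunder review\b", re.I)),
    ("withdrawal",   re.compile(r"\bwithdraw(?:s|n|al)?\b", re.I)),
    ("debt rating",  re.compile(r"\b(?:debt|notes|bond) rating", re.I)),
]

def classify(headline):
    """Return the list of categories a headline matches (possibly empty)."""
    return [name for name, pattern in CATEGORIES if pattern.search(headline)]

print(classify("A.M. Best Downgrades Ratings of Example Insurance Group"))
# -> ['downgrade']
```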

The Devil You Don’t See

A few weeks ago I had lunch with a cat bond manager who was crowing about buying and selling a particular bond at a solid profit in a single day.

The bond was Mariah Re, which is at risk of being triggered by tornado losses this past quarter in the US. It hasn’t triggered yet, but it’s close.

My bond manager friend figured that, since Tornado Season is over, the fact that the bond was trading at a steep discount at the time meant it was a steal. And it was. He sold half of his position for a tidy little profit and figures he’ll ride the rest out at the great yield he’s secured.

He’s looking a bit less smart today.

He’s actually probably ok, but this made me once again appreciate the power of IBNR (incurred but not reported losses).

IBNR is a classic Rumsfeldian phenomenon:

[T]here are known knowns; there are things we know we know.
We also know there are known unknowns; that is to say we know there are some things we do not know.
But there are also unknown unknowns – the ones we don’t know we don’t know.

Claims happen, with uncertain cost. Ok, that’s the full set of unknowns.

Some claims get reported reasonably promptly, even if the ultimate cost is still uncertain. These are the known unknowns.

The unknown unknowns are the IBNR. It can be estimated, but you need lots of good data. Anyone who hasn’t looked at a lot of portfolios over a long period of time will struggle with this.
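
For the curious: the standard way actuaries get at it is a development (‘chain ladder’) triangle. Watch how the old years’ reported claims grew as they aged, and assume the young years will grow the same way. A toy sketch, with made-up numbers:

```python
# Toy chain-ladder sketch: cumulative reported claims by accident year (rows)
# and age in years (columns). The numbers are made up; recent years have
# fewer observed ages.
triangle = [
    [100, 150, 175, 180],   # oldest year, more or less fully developed
    [110, 165, 190],
    [120, 180],
    [130],                  # newest year, only one year of reporting so far
]

# Age-to-age factors: how much, on average, claims grew from one age to the next.
factors = []
for age in range(len(triangle[0]) - 1):
    rows = [r for r in triangle if len(r) > age + 1]
    factors.append(sum(r[age + 1] for r in rows) / sum(r[age] for r in rows))

# Project each immature year to 'ultimate' and back out the unreported piece.
for row in triangle:
    ultimate = row[-1]
    for f in factors[len(row) - 1:]:
        ultimate *= f
    print(f"reported {row[-1]:>6.0f}  ultimate {ultimate:>7.1f}  IBNR {ultimate - row[-1]:>6.1f}")
```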

Even then, it’s a horribly difficult concept to keep in your mind. People appear to be hard-wired to think that things in the past are in the past and can’t hurt us in the future.

Not so.

An Anonymous Rant Against A Professional Writer

PC360 gives us this. It’s so sticky with jargon as to be barely readable.

Let me summarize the (2,300 word) article:

Claims data can teach underwriters about where claims come from and expose new drivers of claims cost. Analyzing claims databases is a good way of testing new hypotheses but, for organizational reasons, most companies aren’t great at this.

Yawn. Could have been written at any point in the last 300 years.

Next is a big discussion about how automated computer programs can correlate variables without the burden of actually ‘understanding’ the data.

[shields up! BS ALERT!]

My old man once spent some time learning about a stock picking technique which, to be perfectly honest, looked like garbage to me. But sometimes it worked!

I’d argue it’s complete luck. As they say, “even a broken watch is right twice a day”.

Narrative validation is a powerful test for statistical conclusions: correlation is useless without a deep understanding of the causal mechanism. Unexplained, ‘dumb’ empirical relationships (which describes all too much of medical research, imo) are too unreliable for me to back with cash.

If you don’t know how it works, how on earth do you know when it breaks?

Standardization Begets Innovation

This is starting to seem quite important to me.

For the weekend project, I’ve been delving deeper and deeper into Python and have come across a distinction between “1998 HTML” and “2003 to today HTML”.

In 1998 there was much less standardization for coding websites. The HTML was poorly written or written by programs that produced sloppy code. An interesting consequence of this is that many older websites are harder to analyze with scraping programs.

The code today is massively improved. Today’s websites benefit from standards that get updated with ‘best practices’, which can spur automation of all kinds of functions.

The upshot is that information is becoming much easier to find, analyze and publish. And no new technology, just old technology maturing.

And we’re just getting started. A lot of websites now have something called an “API”, which is just a web address you can point your computer to and fling requests for information at. The idea is that regular browser websites are great for people, but computers don’t need all that stupid formatting. They just want data.
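
Concretely, the difference looks something like this (the endpoint and field names below are made up):

```python
import requests
from bs4 import BeautifulSoup

# The 'human' way: download the page and dig the number out of the markup.
html = requests.get("http://example.com/quotes/ABC").text
soup = BeautifulSoup(html, "html.parser")
price = soup.find("span", {"class": "price"}).get_text()   # hope the markup never changes

# The API way: ask for data, get data.
data = requests.get("http://example.com/api/quotes/ABC").json()
price = data["price"]   # made-up field name
```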

Well, for some reason, lots of websites offer different content through their API than through the website itself.

It’s bizarre, particularly because, with minimal-to-moderate effort, any industrious programmer can build a simple scraping routine and pull the data out of the ‘human’ interface. Why the hurdle? It’s just wasting time.

The hurdle’s there for cultural reasons having nothing to do with technology. These cultural blocks are preventing the sharing of information and so are preventing innovation.

And they’ll change, I think.

Much of my job is concerned with translating one system’s way of recording insurance information into another system’s ‘language’. There is no standardization for the way information is stored in insurance management systems. This makes for lots of people with jobs like mine, which are basically wasting time and money. My business isn’t about analyzing information, after all.

Standardizing data formatting is going to be the next dislocation in the economy. We already pity those poor suckers in the back office wrestling with legacy systems.

Eventually they will go the way of the typing pool.

Suck it, IT!

Here is Celent:

…the use of Python and other scripting languages have long been used to clean up and prepare data. In both cases the question arises – is this an IT job or not?

The problem with insurance data is that it is often inputted into a system that isn’t built to help insurers manage risk, but rather to pass financial audit.

Questions like: “did you measure your income, cash in, cash out and claims liabilities correctly” matter.

Questions like: “how much money can I lose in scenario x” don’t matter.

It’s bizarre to think that an industry entirely concerned with risk management doesn’t introduce systems to manage risk.

The core issue is related to the quote above: systems are an “IT thing”, not a core competency of an insurance professional/risk manager. They’re tantalizingly close, though, and getting closer.

I’ve found myself desperately building skills once reserved for IT people. They’re effing useful.

The rest of this post describes my latest project (to help me think through the process).

Insurance management systems are really accounting systems with fields added in to record some extra policy data. Typically, the only field audited thoroughly is the premium field. For one thing, it’s the easiest one to audit because you have an independent data source (actual cash received) to check it against.

So we get these listings, which have very accurate premium transaction numbers (hopefully those data aren’t scrubbed using a DIFFERENT system), and try to answer these kinds of questions (a rough sketch of how I’d slice the data follows the list):

  • How big are the limits offered by this company for different coverages?
  • What are the distributions of these limits?
  • What limits does this company offer to a single insured?
  • How many insureds does this company insure?
  • How many separate policies does this company write?
  • How many coverages does this company write per policy?
  • Can we link all the claims (separate database) to the policies?
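
Once the listing is in a flat file, most of those questions fall out of a few group-bys. Something like the sketch below, where the column names are my guesses at what such a listing might contain, not any particular system’s schema:

```python
import pandas as pd

# Assumed columns: insured, policy_number, coverage, limit, premium.
listing = pd.read_csv("policy_listing.csv")

# Limit distributions by coverage.
print(listing.groupby("coverage")["limit"].describe())

# Total limit offered to each insured.
print(listing.groupby("insured")["limit"].sum().sort_values(ascending=False).head(20))

# Simple counts: insureds, policies, and coverages per policy.
print(listing["insured"].nunique())
print(listing["policy_number"].nunique())
print(listing.groupby("policy_number")["coverage"].nunique().mean())
```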

A very important step is to establish what is an insured, what is a policy and what is a transaction. A mentor of mine drilled me with the mantra: “count everything once and ONLY once”.

So, one project I’m working on is to build a database analysis tool that fixes mistyped insured names.

The key concept is Levenshtein Distance.

The idea behind LD is to measure the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one word into another. Useful for catching typos in search terms, which is one of its most common uses.
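
For reference, the classic dynamic-programming version fits in a dozen lines (the textbook example: ‘kitten’ to ‘sitting’ takes three edits):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))   # 3
```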

In my case, because these insured names are sometimes made-up words or strange spellings of words (names of people or businesses), I want to run an LD analysis against the listing itself and tell the program to ask me when it thinks it’s found a mistake.

So what I want is a routine that builds a dictionary of all the separate words in the listing and tells me which words would get wiped out (folded into a near-identical one) to build a more ‘efficient’ set of names for the policy listing.

I need to make sure I know how the original listing was structured so I can put it back together again, of course.

So here is the process (with a rough sketch of the code below the list):

  1. Build a database of each unique word in the file
  2. Discard one letter words (‘a’ and such)
  3. Arrange the words in alphabetical order
  4. Compare each word to the one just before it in that order.
  5. … this is as far as I’ve gotten.
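
Here’s a rough sketch of steps 1 through 4 as I currently imagine them. The distance function is the same one sketched above, repeated so this runs on its own; the file name and the distance threshold are placeholders I’ll have to tune.

```python
from collections import Counter
import re

def levenshtein(a, b):
    """Same helper as the earlier sketch, repeated so this runs on its own."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

# 1. Build a tally of each unique word in the listing (file name is a placeholder).
with open("insured_names.txt") as f:
    words = Counter(re.findall(r"[A-Za-z]+", f.read().lower()))

# 2. Discard one-letter words ('a' and such).
words = {w: n for w, n in words.items() if len(w) > 1}

# 3. Arrange the words in alphabetical order.
ordered = sorted(words)

# 4. Compare each word to the one before it; flag near-misses for a human to review.
for prev_word, word in zip(ordered, ordered[1:]):
    if levenshtein(prev_word, word) == 1:      # threshold is a guess, tune it
        rare, common = sorted((prev_word, word), key=words.get)
        print(f"'{rare}' ({words[rare]}x) looks like a typo of '{common}' ({words[common]}x)")
```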


For The Insurance Geeks In The Crowd

This is a deeply powerful point about the insurance business:

There are ten lines in that graph.

The straight blue line is the 1-1 line, which is the measurement of a year’s performance 1 year out. This is a pure fudge figure because the insurer doesn’t have enough information to measure the cost yet.

The fact that this line is at 1.00 is important. 1.00 means that the insurer expects to pay out 100% of its premium in claims. Nominal Revenue = Nominal Cost. 10 years of interest makes this possible.
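
To put rough numbers on it: collect $100 of premium today, expect to pay out $100 of claims spread evenly over ten years, and invest the float in the meantime. At an assumed 4% yield (purely illustrative), the claims are only worth about $81 in today’s money, so a 100% nominal loss ratio still leaves a real margin.

```python
premium = 100.0
rate = 0.04                # assumed investment yield, purely illustrative
payments = [10.0] * 10     # $100 of claims paid evenly over ten years

pv_claims = sum(p / (1 + rate) ** t for t, p in enumerate(payments, start=1))
print(f"nominal loss ratio: {sum(payments) / premium:.0%}")   # 100%
print(f"economic loss ratio: {pv_claims / premium:.0%}")      # ~81%
```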

As you look back at the year over time (1-2, 1-3, etc.), the amplitude of the ‘wave’ increases. This happens because, over time, insurers gain information about how well that year actually went, and the true volatility in the relationship between revenue and costs shows up in the numbers.

Workers’ Compensation business is the most ‘long tail’ of insurance businesses. This means that the claims costs of comp policies take the longest to resolve.

In fact, insurers have very little idea of the ultimate cost when they write a comp policy. Workers’ comp is notorious for this and many, many insurance companies avoid it entirely because of this uncertainty.

The cycle is present in all insurance businesses, though. Once people figure out they’re losing money, they pull capacity and rates go up. The difference with comp is that there is more risk of finding out too late.

Think of that realization like a tsunami. Out at sea, big waves look like small waves, because a tsunami’s energy is spread through the entire column of water from the ocean floor to the surface rather than piled up at the top. Good years and bad years and company-killing years look pretty similar.

But once the sea floor rises up toward the shore, you find out how much energy was in the sucker.

And with comp, those suckers can be big.

Sticks and Stones Can Break My Bones But Names Break the Bank

From the dept of the absurd:

[A] judge granted a privacy injunction, meaning that English newspapers could not legally publish the name of a professional soccer player who allegedly had an affair, despite thousands of people having reported the name on the social-networking site.

The reason here is, of course, legal liability. The UK has some of the most spectacularly aggressive libel laws in the world.

The intended consequence is, presumably, that people play together more nicely, but the unintended consequences are more interesting. Libel Tourism, for one.

Another is that, because libel is by definition printed, and most print is online, the Internet is making for some interesting legal theatre, as in the case above.

Score another massive point for Web 2.0. Because big firms are a big fat target, people sue them.

But that cost isn’t borne by the firms. Oh, no. It’s paid by the insurance companies who, in turn, raise rates for the entire media industry.

I was working on an account today and noticed that liability insurance for UK Media Firms is about 6x more expensive than the equivalent for Australia and Canada (no data for the US in my little trove).

Twitterers don’t pay liability insurance for libel cover because they can’t realistically be sued for libel.

This is obviously a margin at which power is being transferred away from Big Media and whatever coalition is blocking libel reform in the UK.

Yay for liberty.

Insurance is a Commodity: A Continuing Series

I was reminded today of the way many insurance companies actually make a profit: Premium Financing.

The idea is that an insurer accepts some credit risk from the policyholder by not getting all the money up front but charges some insane interest rate on the loan (like, credit card interest insane).
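
The numbers below are made up, but the shape of the deal is something like this: a $1,000 policy, a quarter down, and the rest financed over nine monthly installments at a credit-card-style rate.

```python
premium = 1000.0
down_payment = 0.25 * premium
financed = premium - down_payment
monthly_rate = 0.24 / 12      # 24% APR, purely illustrative
n = 9                         # monthly installments

# Standard amortized-loan payment formula.
payment = monthly_rate * financed / (1 - (1 + monthly_rate) ** -n)
finance_charge = n * payment - financed

print(f"monthly payment: {payment:.2f}")
print(f"finance charge on a {premium:.0f} policy: {finance_charge:.2f}")
```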

The published rates are therefore cheap, generating much less money than would actually be needed to run the company. Insurance regulators can cheerfully pat themselves on the back, though, because rates are low.

Customers are obviously fine with it because, as with mobile phone plans, the deal appeals to their huge discount rates.

The trick, of course, is that because this is the only profit these companies ever make, they have to shield it from everyone, particularly reinsurers. Imagine a Joint Venture where the costs are split 50/50 but the revenue is split 60/40. Big problem.

And because most of the companies that work this way are small (for some reason), they are heavily reliant on reinsurance. So they desperately need this service (reinsurance), but simply cannot afford to pay full price for it.

These clients are the biggest pain in the ass. Honestly.