I gave GPT-4[1] the 2019 version of CAS Exam 9. This is the last exam in the Casualty Actuarial Society progression to a fellowship (FCAS), the highest designation available to Property & Casualty actuaries. You can see GPT’s answers here, the grading rubric here and GPT’s grades summarized here. You can see a YouTube video of me walking through some of this content here: https://youtu.be/dMvjku-4hUY.
GPT-4 failed pretty miserably[2] (19.75 / 52.5, about half the passing score of 38.5), but I think the score could improve with some better prompt engineering and the right plug-ins. This raises two questions:
- Can GPT-4 get all the way to a pass?
- If it passes, would that mean its capabilities match an FCAS?
I’d bet GPT-4 could pass some old exams now and with some clever hacking maybe it could get through this one, too. However, the exams are a moving target and they’ve been evolving away from GPT’s strengths for a generation. I think GPT will accelerate that evolution, strengthening the profession’s value proposition.
Let’s dive in!
These exams are hard, people[3]. The pass rate on the 2019 Exam 9 was 56% (338/601), and I like to point out that the 601 candidates who sat for this exam had mostly passed almost a dozen other exams with *similar pass rates*. In a world of grade inflation and decadence, most actuaries I speak to agree the exams have gotten harder over time. These are among the ultimate standardized tests, pushing candidates on analytical depth, domain knowledge and technical skill under a stressful time constraint. For goodness’ sake, grading the exams is a brutal exercise that itself demands all these skills!
My process was to use a pretty simple prompt, starting each question by saying “this is an actuarial exam question” and then pasting the question. Each question got a fresh instance of GPT-4. All answers were generated between March 23rd and 27th, 2023. As I was going through the answers I realized that I had made some mistakes in transcribing the questions, so I regenerated those answers[4]. For a couple of questions GPT refused to do calculations, so I experimented with more prompts like “you are an actuarial student” etc. (see notes), but I was unable to get it to show me the calculations, so I kept all the answers from the prompt above. I’m sure there are ways of engineering better prompts that would generate dramatically better answers for some questions. But even though there is certainly low-hanging fruit available to improve GPT’s score, I also think its successes and failures contain clues to some hard boundaries on its performance.
For analysis I’ll focus on questions 1, 6, 7, 18 and 19.
Let’s start with problems 6 and 18, the two times GPT got full marks. These two problems were very well defined, fairly straightforward formulations of well-known analytical models. These are the bread and butter of ‘easy’ exam questions, since they don’t really challenge the candidate on comprehension. You have to memorize a technique, notice the problem requires it and work it without error. There are probably lots of examples online for these techniques, and I think they are mostly worthless as analytical tools for a practicing actuary. If GPT embarrasses the profession into killing these questions, good riddance.
The next class of problem is one that GPT didn’t get correct, but I think the gap to cross is small. Examples here are problems 1 and 7. In both these situations GPT got confused by a complicated modeling process that was presented in a weird way. It missed the “trick”. I think prompt engineering and some kind of plug-in[5] that gives GPT a formal model structure to “plug and chug”, leaving it to interpret the results, will quickly improve its scores on these. Even humans don’t necessarily deeply understand the models we use, and GPT, interestingly, often attempted to build a model from first principles when it didn’t recognize some obscure actuarial terminology. It really didn’t work. Which brings me to the last category.
I think the failure on 19 is the most instructive about the limits of GPT. Here we have a toy model for calculating the capital requirements of an insurance company, and GPT completely whiffed. The model is coherent in the sense that it captures the underlying concepts of what a leverage model should cover, but the figures and mathematical structure are ridiculous in their simplicity. It is recognizable only to someone who “speaks math” and understands what models are “trying to do”. Novel, ad hoc, toy models are tools of an effective actuary who is distilling a process to capture basic ideas and communicate with colleagues. Mostly these models don’t exist in a formal sense, aren’t studied on the internet and are reflections of the quality of judgment of a very good (or bad) actuary. Building and manipulating these models is creative analytical work at its best. I don’t see how GPT figures these out without practicing as an actuary for a decade.
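To make the category concrete, here is a minimal sketch in Python of the *kind* of ad hoc toy leverage model I mean. The structure and the 50%/30% factors are entirely invented for illustration (this is not the actual exam question): capital is required as a crude function of premium and reserves, and the interesting work is judging whether the factors make sense, not the arithmetic.

```python
# Hypothetical toy leverage model -- the 0.5 and 0.3 factors are
# invented for illustration. The structure is deliberately crude;
# the point is knowing what such a model is "trying to do".
def required_capital(premium: float, reserves: float) -> float:
    """Required capital = 50% of written premium plus 30% of reserves."""
    return 0.5 * premium + 0.3 * reserves

def excess_capital(actual_capital: float, premium: float, reserves: float) -> float:
    """Surplus above (or shortfall below) the toy requirement."""
    return actual_capital - required_capital(premium, reserves)

print(required_capital(1000.0, 2000.0))        # 1100.0
print(excess_capital(1500.0, 1000.0, 2000.0))  # 400.0
```

A competent candidate sees instantly that the factors encode a view about the relative riskiness of new premium versus held reserves; GPT, in my test, couldn’t even get to the plug-and-chug stage.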
Implications for CAS Exams?
I predict the CAS exams will (and should!) continue to get ‘harder’ in the coming years, partly in response to GPT, which frankly is exposing the weakest parts of the exam system for what they are: wasteful memorization. But partly this has been the direction of the exam committee for a generation already, and I think this strategy has been deeply vindicated by GPT. Many of our cousin professions are asking deeper existential questions than actuaries need to. It’s hard to see what a lawyer does after we integrate even today’s version of GPT into the economy, much less whatever on earth we have in 2030. For actuaries there is a pretty clear path to continued relevance. My favorite parts of the exams were when a question made me examine my own knowledge and experience and integrate it with the syllabus material to produce something new, which aligns with the differentiating skill of the best actuaries. GPT won’t do that for a long while.
So let’s keep focusing there!
Questions and Answers
I gave GPT-4 the 2019 version of CAS Exam 9 which you can see here (including grading rubric).
- I started a new prompt window for each numbered question. The only prompt I used was to say “this is an actuarial exam question” and then paste the text below
- For some answers (Q1), GPT refused to perform calculations. After noticing this I tried a variety of different prompts, like “answer this as an actuarial student” and “answer this with a perfect answer and perform all calculations”, but it never gave me any calculations. So I kept the answers below, since they’re equivalent to the answers I saw when experimenting with different prompts
- I noticed that GPT has some very favorite techniques for analyzing problems, like VaR and the Sharpe ratio, and it pulled those in frequently. I think it lacks the ability to differentiate among more nuanced analytical techniques for highly adjacent problem spaces.
- GPT performs much better in essay-type questions that use words, instead of analytical techniques, to find numerical answers to problems
- Question 4: There is an idiosyncratic definition of the market price of risk in the syllabus which was misinterpreted by GPT. This is a bit of a trick the CAS pulled to write an exam question, I think
- In question 5 the syllabus departs from Investopedia’s analysis of CAPM vs APT. Why?
- On question 6 it got all the calculations right. I made an error and regenerated the response, which was really, really wacky… then I discovered another error, so I regenerated again and it got it right. How do the papers capture this kind of variability?
- In Q7, “tricks” in presenting problems really mess up its capabilities. Exams (and life!) are full of problem types that do not present themselves in a way that can be simulated on a test. Very complex interest rate term structures, or scrambled problem information that requires a highly generalized model of the operation of a market, totally mess it up.
- In a sense it feels like, to the extent our specified models are accurate, GPT can navigate problem spaces.
- In Q8, GPT totally missed the shortcut formulas and did not understand the concept of an insurance book that renews
- In Q9 the questions were pretty rote. The yield curve implications of prepayment risk were lost on GPT here, which is a layer deeper than the standard terminology. I would bet that this could be improved with prompt engineering.
- On question 14, GPT showed a solid understanding of the relationship between IRR, surplus and investment income, even though its calculation mechanics were wonky. It can interpret results, it seems, which is pretty neat.
- On 15 there is a volatile mix of hallucination and real answers, including problems with the time value of money
- On 16, there are some conceptual mistakes in figuring out timings of cash flows and more complex discounting than straight PV
- For 17, it lacks an understanding of the cash flow timing of investment income and loss payments. There’s a kind of real-world knowledge that actuaries have there that GPT doesn’t.
- In 18, the question was right in GPT’s wheelhouse and it nailed it.
- For 19, this was an ad hoc, arbitrary function for leverage, and GPT misunderstood both the point of the function and how to use it to calculate capital.
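For reference, the two techniques GPT kept reaching for are simple to state. Here is a minimal sketch in Python of historical VaR and the Sharpe ratio; the ten-period return series is made up purely for illustration:

```python
import statistics

def historical_var(returns, level=0.95):
    """Historical value at risk: the loss exceeded with probability (1 - level)."""
    sorted_returns = sorted(returns)  # worst outcomes first
    cutoff = int((1 - level) * len(sorted_returns))
    return -sorted_returns[cutoff]

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return per unit of return volatility."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Ten made-up periodic returns.
sample = [0.01, -0.02, 0.03, 0.05, -0.04, 0.02, 0.01, -0.01, 0.04, 0.00]
print(historical_var(sample))  # 0.04 -- the worst return in this short sample
print(sharpe_ratio(sample))
```

Both are perfectly respectable tools; the problem is that GPT reached for them even when the question called for something more specialized from the syllabus.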
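The time-value-of-money mechanics that tripped GPT up on 15 through 17 are, in isolation, straightforward: discount each cash flow by when it actually lands. A minimal sketch (the 40/35/25 payment pattern and the 5% rate are invented for illustration):

```python
def present_value(cash_flows, rate):
    """Discount (time_in_years, amount) pairs back to time zero."""
    return sum(amount / (1 + rate) ** t for t, amount in cash_flows)

# Losses of 100 paid out 40 / 35 / 25 at the ends of years 1-3.
loss_payments = [(1, 40.0), (2, 35.0), (3, 25.0)]
print(round(present_value(loss_payments, 0.05), 2))  # 91.44
```

The hard part for GPT wasn’t this arithmetic; it was knowing, from real-world practice, *when* premium, investment income and loss payments actually occur.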
1. What is GPT-4? The best way to find out is to ask GPT! Go to chat.openai.com, register for free and ask GPT what it is and how to use it. I’m not kidding! At the time of writing GPT-4 is the latest model and only available to paying subscribers. For many ‘normal’ questions GPT-3.5 is about 95% as effective.
2. How did I grade this? The exam has an examiner’s report attached that has sample answers and a grading rubric for each question. I followed that. There were definitely judgment calls to make on many questions, usually where GPT did something unusual but reasonable and I awarded part marks. From my experience comparing my self-graded practice exams to the real thing I’ve always been a bit generous!
3. I myself never sat for Exams 8 or 9, though I’ve read the 9 syllabus a few times to implement some of the ideas at work and later did a whole podcast series on the text that will likely supplant a big portion of the material. See here!
4. In doing this I noticed something many others see: that GPT generates radically different answers if you regenerate the response. Very minor changes in content sometimes resulted in completely new problem-solving strategies, sometimes doing weird stuff. GPT is an example of a model occasionally dismissively called a stochastic parrot. It repeats things it has ‘seen’ before. Since it has seen almost everything textual it needs to make choices, and those choices can change with each click of “Regenerate Response”. In this exercise I kept the first valid choice, so it is plausible that just by hitting regenerate response you could land on a better set of answers if you knew how to identify them in advance. But how might you know when to stop regenerating responses to a question you don’t know the answer to? Figure that out and you’ve got a good strategy for improving GPT’s score!
5. Plug-ins are a very important recent addition to GPT where it can access other models that allow it to use external software to take actions (book flights, hotels, etc), search the Internet or basically anything else. (I fear that this footnote will age very poorly!)