Word of the Day: Wormhole Finance

Wormhole Finance (n): a collection of trading strategies most useful when basic assumptions about reality are violated.

Be ready in case you find yourself in a bizarre alternate universe where fire and brimstone rain down from the skies. Forty years of darkness, earthquakes, volcanoes, the dead rising from the grave, human sacrifice, dogs and cats living together… mass hysteria!

Be ready… to assume geometric brownian motion.
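(And if the wormhole does drop you somewhere that geometric brownian motion actually holds, here's a minimal sketch in Python of what you'd be assuming; the drift, volatility and horizon below are made up:)

    import math, random

    # Under GBM: S_T = S_0 * exp((mu - sigma^2/2)*T + sigma*sqrt(T)*Z), Z ~ N(0,1)
    S0, mu, sigma, T = 100.0, 0.05, 0.2, 1.0   # made-up parameters
    terminal = [S0 * math.exp((mu - sigma ** 2 / 2) * T
                              + sigma * math.sqrt(T) * random.gauss(0, 1))
                for _ in range(100000)]
    print(sum(terminal) / len(terminal))   # should hover near S0*exp(mu*T) ~ 105.1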

Kids These Days (?)

Tyler Cowen sends us to this very interesting post by Steve Postrel on higher education:

My hypothesis is that it is precisely the dumbing down of U.S. education over the last decades that explains the increase in willingness to pay for education. The mechanism is diminishing marginal returns to education.

Typical graduate business school education has indeed become less rigorous over time, as has typical college education. But typical high school education has declined in quality just as much.

Postrel offers up four links to support the decline in rigor at all levels of education. Given that the entire argument rests on this point, we should follow those links:

1. Dumbing Down High School English. Opens up with this quote:

A recent ALSCW (Association of Literary Scholars, Critics, and Writers) study finds that

a fragmented English curriculum and a neglect of close reading may explain why the reading skills of American high school students have shown little or no improvement in several decades despite substantial increases in funds for elementary and secondary education by federal and state governments.

After that it’s all hand-waving. Disappointing, because Postrel is worried about the LEVEL of education while this link despairs over the rate of positive change. Not relevant.

2. Link to this book. I haven’t read it. In this review, they say the book refers to one study in particular (pdf here) with this abstract:

Using multiple datasets from four different time periods, we document declines in academic time investment by full-time college students in the United States between 1961 and 2004. Full-time students allocated 40 hours per week toward class and studying in 1961, whereas by 2004 they were investing about 26 to 28 hours per week.

Students spend less time studying. I believe that.

But one thing the study does not control for is the mix of majors. Are engineers less well educated than in the past? I sincerely doubt that. Cowen may not find answers to the slowdown in productivity growth in this line of inquiry.

3. The Math Wars. Short version: the quality of math instruction in high schools has declined. I suppose I buy that. But I don’t envy my grandparents’ math instruction. I am sympathetic to reformers here.

4. On MBAs. Here’s a relevant quote on the author’s memory of his MBA student days and what he thinks has happened:

So, we read 30-40 academic journal articles per class. We became capable of digesting their content and, thereby, able to access new ideas 10-20 years ahead of widespread practice. We traced the trajectories of core research streams and, thus, came to recognize that subtle thinking is required of complex issues. We jammed into Merton Miller’s class, not because he was entertaining or capable of summarizing complex ideas into exquisite 10-bullet lists, but because everyone knew he was a genius and felt damn lucky to sit in his presence and glimpse into his thinking about finance. Excerpts from books by Tom Peters and other management “gurus” were not viewed as examples of special wisdom but, more accurately, of sloppy, shallow, unsubstantiated pap. That was a bad-ass education — one that served us well throughout our careers, not just in our next jobs.

What happened? Well, Business Week rankings coupled with the “Northwestern Innovation.” BW rated schools on: (1) student satisfaction, (2) recruiter assessment, and (3) research ranking. Northwestern, which was not a contender back then, realized that moving (2) or (3) could only happen veeery slooowly. Item (1), on the other hand, well, that could be manipulated almost instantaneously. And thus began the race to the bottom of the toilet. As far as I can tell, anything approaching the education I got has long since been abandoned.

More hand-waving, mostly, but I accept his premise. I’d point out a few things:

  • Chicago is a special place. Not every school has super-duper-star instructors and highly motivated students.
  • MBA programs are a ridiculously easy target for this kind of argument. I happen to think quite a lot of management ‘theory’ is complete garbage. Economics can be nearly as bad. Rigor in these subjects can often obscure learning outcomes. Let stories be told where stories must be told.

I find the idea that declining high school quality drives the increase in the college premium persuasive. I’m more sympathetic with respect to math than to other subjects, though we must remain ever vigilant against cognitive bias. Complaining about “kids these days” is usually total crap.

Revolution’s Achilles Heel

Pete Warden didn’t ask us to square this circle, but he should have. Both quotes from his blog.

Quote 1:

Our tech community chooses its high-flyers from people who have enough money and confidence to spend significant amounts of time on unpaid work. Isn’t this likely to exclude a lot of people too?

…I look around at careers that require similar skills, like actuaries, and they include a lot more women and minorities. I desperately need more good people on my team, and the statistics tell me that as a community we’re failing to attract or keep a lot of the potential candidates.

Appreciate the shoutout to actuaries and all, but isn’t the simple solution to encourage more education in this field?

Quote 2 comes from the comments to his first post:

I’m a female who majored in computer science but then did not use my degree after graduating (I do editing work now). While I was great with things like red-black trees and k-maps, I would have trouble sometimes with implementations because it was assumed going into the field that you already had a background in it. I did not, beyond a general knowledge of computers. 

I was uncomfortable asking about unix commands (just use “man”! – but how do I interpret it?) or admitting I wasn’t sure how to get my compiler running. If you hadn’t been coding since middle school, you were behind. I picked up enough to graduate with honors, but still never felt like I knew “enough” to be qualified to work as a “true” programmer. 

How is this possible? Even people with degrees in the field can’t code? And this isn’t the first time I’ve come across a story of Comp Sci graduates who couldn’t program.

Actuaries aren’t the best comparison because so much of Actuarial Science builds on pre-existing math knowledge and adds insurance and finance training. Coding is more fundamental. I’d say an actuary is to a .NET (or whatever) programmer what a generalized ‘math geek’ is to a ‘programmer’.

There’s only one way to learn to code, and it’s not the easy way. Like math, or any other language for that matter, you’ve got to sit down and crank away, learning from your mistakes; few could call themselves mathematicians three years after picking up their first calculators.

Of course, you don’t need to master the coding equivalent of calculus to be useful any more than you need to take integrals to do your taxes.  But right now the whole programming ecosystem is starved of talent. Pete needs ninjas and everyone else needs front end web devs.

That means every kid in the world should figure out whether they like programming or not in a middle-school classroom.

Higgs and Stats

Every time there is some science news, I always hold my breath until SWAB comments. And on this issue Ethan Siegel does not disappoint. I highly recommend reading him if you’re interested in great science writing.

Anyway, I’ve been pretty confused about a lot of the statistics around the evidence for the Higgs boson. I’ll set this up first, though. Here’s Ethan:


[Ethan’s post includes a graph here: the E288 dimuon spectrum, with a dotted line for the expected background and a solid line for the observed signal.]

Back in 1976, there were only four quarks that had been discovered, but suspicions were incredibly strong that there were actually six. (There are, in fact, six.) If you look at the above graph, the dotted line represents the expected background, while the solid line represents the signal published here from the E288 Collaboration’s famous Fermilab experiment. Looking at it, you would very likely suspect that you’re seeing a new particle right at that 6.0 GeV peak, where there ought to be no background. Statistically, you can analyze the data yourself and find that you’d be 98% likely to have found a new particle, rather than a fluke. In fact, the particle was named (the Upsilon), but when they looked to confirm its existence… nothing!

In other words, it was a statistical fluke, now known as the Oops-Leon (after Leon Lederman, one of the collaboration’s leaders). The real Upsilon was found the next year, and you shouldn’t feel too bad for Leon; he was awarded the Nobel Prize in 1988.

But the lesson was learned. It takes a 99.99995% certainty in order to call something a discovery these days.

5 sigmas?! WTF?! That’s humongous. That says to me that they’re either using the wrong distribution or the number of observations is immensely higher than any dataset I’ve ever seen. Considering these are probably the most competent statisticians on earth, I have to assume the latter, but… seriously?! FIVE standard deviations?
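As a sanity check (mine, not Ethan’s), the sigma count is just a tail-probability conversion on the normal distribution, which scipy will do for you:

    from scipy.stats import norm

    # How many standard deviations does 99.99995% certainty correspond to,
    # reading it as a one-sided normal tail probability?
    certainty = 0.9999995
    print(norm.ppf(certainty))   # ~4.9: the particle physicists' '5 sigma' bar

So the five-sigma bar is a statement about the p-value of the excess rather than about the raw number of observations, though you do need a mountain of collisions before a real signal can clear it.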

I’d love to see the data.

Machine Learning Course Notes – Bittersweet

Finished this week’s exercises in a 5-hour marathon starting at 4:30am this morning. Today’s meta-lesson: implementation is way harder than reading slides and ‘kinda getting it’. My god is it hard to actually write a program that uses even what appear to be simple concepts.

So there are three tracks for this course: first is the spectator track (my term), where you just do the basic assignments (enough to be dangerous and spew plausible-sounding BS).

There’s the ‘advanced’ track, which I’ve chosen, which asks you to do some actual programming assignments (this morning’s marathon). Within the advanced track there are ‘extra credit’ assignments, which ask you to implement even more of the course material in Octave (a programming language). I haven’t gotten to the extra credit stuff. More on this later.

The final track is the ‘real’ track, where you pay real money, show up to class and all the rest. I read a discussion thread on the course website that speculates that my ‘advanced’ track covers about 40%-50% of the real course material. The real course is about 1.5x as long (3 months instead of 2), so let’s say we’re about 60%-75% of the pace of a real university course.

I’m starting to think it was a mistake to take two of these courses. I just don’t have enough time to learn everything I want to learn. I want to do the extra credit stuff, because what’s the point of reading the slides on stuff if you don’t REALLY get it? And my first crack at the extra credit stuff shows that I don’t REALLY get it.

And there are all these dudes (yes, all dudes) carpet-bombing the discussion boards who obviously REALLY get this stuff, while I only kinda get it. How many times in university did I wish I were smarter? That I had really learned the background material in high school like I should have, so I could have picked this up quicker?

Anyway, I’m done complaining. It’s just too time-costly for me to learn more of this right now, so I won’t. I wish it were different but that’s just too bad for me, isn’t it?

How Square Roots are Calculated in Quake

This has been sitting in my drafts folder, waiting for me to read the article, learn about it and summarize it here.

I took a quick scan today to make sure I wasn’t biting off more than I could chew when I stuck it in the queue. Unfortunately, I don’t think I understand it better than the linked-to writer, and I’m not interested in spending the time to become so.

Here’s the attention-grabbing part:

My Understanding: This incredible hack estimates the inverse square root using Newton’s method of approximation, and starts with a great initial guess.

The trick has to do with how floating point numbers are stored in a computer, something I’ve actually blogged about.
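The canonical Quake III version is in C; here’s my rough Python port of the same bit trick, just to watch it work (a sketch, not the shipped code):

    import struct

    def fast_inv_sqrt(x):
        # Reinterpret the float's bits as a 32-bit integer
        i = struct.unpack('<I', struct.pack('<f', x))[0]
        # The famous magic constant turns a shift-and-subtract on those bits
        # into a surprisingly good first guess at 1/sqrt(x)
        i = 0x5f3759df - (i >> 1)
        y = struct.unpack('<f', struct.pack('<I', i))[0]
        # One iteration of Newton's method sharpens the guess
        return y * (1.5 - 0.5 * x * y * y)

    print(fast_inv_sqrt(4.0))   # ~0.499, vs the true 0.5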

Who said math wasn’t useful!

Machine Learning: first impressions

Wow, this is pretty cool stuff. Some notes:

There are two kinds of machine learning: supervised and unsupervised.

Supervised learning is literally composed of running regressions on datasets you know something about. That’s it. They further break the regressions down into binary-variable regression (classification problems) and plain-old single/multi-variate regression. The point of regression, of course, is to estimate a smooth function that describes a lumpy dataset.
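To make that concrete, here’s a toy fit in Python (the course does everything in Octave; this is just my sketch, with made-up data):

    # Toy supervised learning: fit y = a*x + b to labelled points
    # by gradient descent on mean squared error.
    data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]   # made-up (x, y) pairs
    a, b, rate = 0.0, 0.0, 0.01
    for _ in range(5000):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
        a, b = a - rate * grad_a, b - rate * grad_b
    print(a, b)   # roughly a ~ 2.0, b ~ 0.05: the smooth function behind the lumps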

Unsupervised learning is where the sex is in this field. That’s Google News (clustering stories together that are ‘about’ the same topic) and various other kinds of data mining. The idea is that you get a dataset and ask the computer to find a pattern.
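The classic toy example is k-means clustering; here’s a bare-bones 1-D sketch (mine, not the course’s) that finds two clumps without being told what they are:

    import random

    points = [1.0, 1.2, 0.8, 9.7, 10.1, 10.3]   # made-up data, two obvious clumps
    centers = random.sample(points, 2)           # start from two random points
    for _ in range(10):
        clusters = [[], []]
        for p in points:
            # assign each point to its nearest center...
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # ...then move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    print(sorted(centers))   # roughly [1.0, 10.03]: the clumps, found unsupervised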

There were quizzes in this, too, at which I did distinctly better than yesterday with the databases. I chalk that up to a better teacher setting questions up. As always, there are instances of false precision leading to a binary result (WRONG ANSWER, you idiot!), when in the real world I’d probably have gotten away with my approach.

For instance, there was a question that asked us to identify the type of learning problem: one was predicting whether an email was spam/not spam (obviously a ‘classification’/binary problem) and the other was predicting how many goods in a warehouse would be sold or unsold within three months. I said that was also a classification problem, but the instructor thought it would be a normal regression problem. Could probably go either way, but I lose.

Finally, I am going to need to learn another programming language, Octave, which is apparently an open-source version of MATLAB (the days of building programming languages and selling them for money are long gone). Great.

Next up is a linear regression tutorial then a linear algebra lesson. I never took linear algebra, so I am distinctly not looking forward to the amount of time I’ll probably need to spend on this.

But I press on nonetheless.

Databases: Intro, Relational and XML (Hierarchical)

First day wasn’t so bad. I watched about 40 minutes of video at 1.2x or 1.5x (I’d probably have fallen asleep at 1x pace) and learned a bit.

There are four kinds of developers in the database programming world: builders, designers, programmers and administrators.

This course is not about (#1 above) building a database system, which would involve designing the database interaction with the physical system in C or Assembly or something crazy like that. Nor is it about (#4) maintaining a database in use, which involves optimizing resources, minimizing downtime and keeping things running.

What we’re learning about is choosing a database type, designing schemas, writing queries and incorporating the structure into a program. In other words, this is a database course for people who build database functionality into a program.

There are two choices for databases today: relational databases (in my case, SQLite or some other system that uses SQL as its query language) or hierarchical databases (XML, which means a flat text file formatted in a certain manner).

We were just getting into a description of what XML is when a quiz popped up in the lecture, which was neat.

The questions, however, were ridiculous, not least because I got them all wrong. Here are some things I learned from the quiz (honestly, none of this was discussed in the lecture):

  1. ALWAYS use relational databases when you can. In particular, when the data structure is fixed (a few ‘columns’ and lots of records), relational is the default. Why that is isn’t discussed (grr), but I suspect that relational databases are just much faster.
  2. XML is useful when the data is ‘hierarchical’, which means that data can be easily described as subsets of other parts of the dataset. The example they used in the question was a family tree (see the sketch below). For this question I chose ‘XML only’ when the right answer was ‘either XML or relational’. Again, relational is the default; only stray from it IF YOU HAVE A DAMN GOOD REASON TO.
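To make the contrast concrete, here’s the family-tree example both ways (my own toy version, using SQLite for the relational side since that’s what the course uses):

    import sqlite3
    import xml.etree.ElementTree as ET

    # Hierarchical (XML): the nesting itself encodes who descends from whom
    tree = ET.fromstring(
        "<person name='Grandma'><person name='Mom'>"
        "<person name='Me'/></person></person>")
    print([p.get('name') for p in tree.iter('person')])   # ['Grandma', 'Mom', 'Me']

    # Relational: the same facts flattened into rows with a parent_id column
    db = sqlite3.connect(':memory:')
    db.execute("CREATE TABLE people (id INTEGER, name TEXT, parent_id INTEGER)")
    db.executemany("INSERT INTO people VALUES (?, ?, ?)",
                   [(1, 'Grandma', None), (2, 'Mom', 1), (3, 'Me', 2)])
    print(db.execute("SELECT name FROM people WHERE parent_id = 1").fetchall())
    # [('Mom',)] -- Grandma's children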

I also learned what ‘relational algebra’ is. Turns out it isn’t as scary as it sounds. Algebra, I’m reminded, simply means a vocabulary of symbols for concepts. Relational algebra, therefore, can be thought of as a series of symbols that represent actions a database program performs (sigma selects rows, pi picks out columns, blah blah blah). I’m hoping it’s somewhat intuitive.
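To see what those symbols buy you, take the textbook example (mine; we haven’t gotten this far in the course yet): the expression pi_name(sigma_age>30(people)) is just a symbolic spelling of a plain SQL query.

    import sqlite3

    db = sqlite3.connect(':memory:')
    db.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    db.executemany("INSERT INTO people VALUES (?, ?)",
                   [('Ann', 34), ('Bob', 28), ('Cat', 41)])

    # sigma (selection) keeps the rows matching a predicate: WHERE age > 30
    # pi (projection) keeps only the named columns:          SELECT name
    print(db.execute("SELECT name FROM people WHERE age > 30").fetchall())
    # [('Ann',), ('Cat',)]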

I’m happy I got this out of the way early because seeing the words RELATIONAL ALGEBRA staring out at me from the syllabus was giving me the heebie-jeebies. The last thing I need to do is brush up on lots of complicated math.

It takes FOREVER to relearn math.

Fin de Poisson

Ok, probably the last post in this series. I’m finally feeling comfortable with Poisson.

Let’s recap, first: one, two, three and four.

So, recall the original code that sparked all this:

algorithm poisson random number (Knuth):
    init:
         Let L ← e^−λ, k ← 0 and p ← 1.
    do:
         k ← k + 1.
         Generate uniform random number u in [0,1] and let p ← p × u.
    while p > L.
    return k − 1.
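Translated into runnable Python (my direct transcription of the pseudocode above, plus a quick empirical check):

    import math, random

    def knuth_poisson(lam):
        # Multiply uniform randoms together until the product falls below
        # e^-lambda; the number of multiplications, minus one, is the draw.
        L, k, p = math.exp(-lam), 0, 1.0
        while p > L:
            k += 1
            p *= random.random()
        return k - 1

    draws = [knuth_poisson(4.0) for _ in range(100000)]
    print(sum(draws) / len(draws))   # should land close to lambda = 4.0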

I was confused about what the e is doing in there. I think I get it, now. Here is a link to a chapter that helped me with the following (a bit):

Imagine you’re sitting in a room with Danny DeVito and Arnold Schwarzenegger is in another (identical) room. We want Arnold to walk across the room ten times and see how many steps it takes each time. BUT we can’t get into Arnold’s room. What do we do?

Well, let’s say that these are special rooms. They have been designed so that Arnold will take on average ten steps to cross the room. Now, we also know that Danny’s stride is exactly half as long as Arnold’s. The calculation becomes easy! We tell Danny to go halfway across the room ten times and that’s our answer.

This conversion is the same idea. We know that each Poisson event takes a bit of time (length of Arnold’s strides) and that that time varies a bit. The trouble is that Arnold’s strides vary on an exponential distribution, which we can’t really model. We can model a uniform distribution easily (Danny’s strides), but we need to find a way to convert them.

We do that by picking a different distance.

Unfortunately, though, exponents really screw with your intuition here, which is why this site has been so helpful.

Think of an exponent as the amount of time a number (e) spends growing until it hits a target: say, 100. Ok, we can figure that out easily by taking ln(100) ≈ 4.6. But we want random numbers, which means that e does not equal 2.71; rather, its expected value is 2.71. But the target stays at 100, which means that our random number is actually the time e needs to grow to hit 100.

So we’ve got two random numbers now: e is random (input) and the (output) time is random. But we can’t do random e’s; it’s too hard. We CAN do random uniforms (0-1), but how do we pick our target?

Well, why don’t we figure out what the expected value of the uniform is (0.5) and tell it to grow for that time=4.6 we calculated? That’s our new target!

Now things get easy. We just get this new target and generate lots of uniform random numbers to see how many it takes to hit our new target. Each time we hit the target, we write down how many uniforms it took.

Voila, each of those counts is Poisson-distributed.
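For the record, here’s the identity underneath the trick (standard textbook stuff, in my own notation, not from the original code):

    p = u1 × u2 × … × uk < e^−λ
    ⇔ −ln(u1) − ln(u2) − … − ln(uk) > λ

Each −ln(u) is an exponentially distributed ‘stride’ with mean 1, so the loop is counting how many unit-rate arrivals fit inside a window of length λ, and that count is exactly Poisson(λ).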

Now, back to the derivation of Poisson for a sec:

The Poisson formula comes out of the binomial limit: P(k) = lim over n of C(n,k) × (λ/n)^k × (1 − λ/n)^(n−k). The two terms after the binomial coefficient are the ones to watch, and it’s the last one, (1 − λ/n)^(n−k), that I didn’t really understand before. Look what it’s saying!

It is 1 − λ/n, the probability of NO event on a single trial, ‘grown’ by the number of trials: (1 − λ/n)^n → e^−λ as n gets large, which is exactly where the e in the code comes from. In my examples above, I used events as opposed to probabilities of events. This makes no difference to the math, really; you just take the inverse of all your terms.

And now the code is clear: it just strings together a bunch of events until you hit the probability at which you know there can be no more events. And that probability is different when expressed as a uniformly distributed number than as an exponentially distributed one.