Bedouins Don’t Need Chainsaws, But You Need Programming

Great post out there called “Nobody Wants To Learn To Program.”

It’s true of course. Programming is a tool, and tools are defined by the problems they solve. And I see unsolved programming problems all around me. If more people understood the power it has, they would solve those problems for themselves and for me.

But it’s difficult to learn something unless it’s framed by an immediate use. The worst teachers don’t tie their lessons to things that matter. So often in school I felt like a Bedouin in a chainsaw course: WTF am I doing here?*

(That’s what’s so neat about Scott Adams’ idea of a school curriculum built (HA!) around building a house)

Anyway, I only started learning programming when I realized writing macros in Excel would make my work life easier. This past year I found a few more uses for it and have massively expanded my programming toolkit.

So now I’m a programming dork, by which I mean that learning about something for its own sake is interesting to me. But ‘regular’ teachers (who are also dorks) teaching beginners from that perspective is ridiculous and boring.

*In that Bedouins (which I know is an incorrect pluralization) live in the desert and there aren’t any trees there and a chainsaw course would be entirely concerned with cutting trees.

Expertise vs Scale

A little background:

I’ve dabbled a bit in web development and making your page look the same in different browsers is a pain. Most browsers, luckily, tend to ‘behave’ in similar ways when given instructions and even the differences eventually get ironed out in later versions.

But until they do, you’ve got to detect which browser your user has and call one of a few parallel (ie duplicated through hours of extra effort) implementations of your web page depending on the answer.

When you have not just a few different browsers, but also many many old versions of browsers out there, making a web page that substantially all web citizens can see and use is a time-consuming challenge.

Anyway, the worst offender in this respect is Internet Explorer 6. It’s notorious for interpreting instructions in a radically different way from other browsers and also for being incredibly long-lived.

So this announcement from Microsoft, that they’re auto-updating their browsers, is welcome. But what interests me is that this probably won’t solve the problem. Over to HN:

The article points out that MS will still provide blocking tools for companies. Corporations are the major source of IE6 browsers and I’m not sure this will have any impact on them. The best we can hope for is that high consumer adoption rates will force many more sites to drop IE6 support which might spur companies to finally test and upgrade.

One of the things that really blew my mind this year was a large ($40MM) software development project I became familiar with (a worldwide internal system for a multinational corporation) that concluded — in 2011 — and required MSIE 6. MSIE 6! Doesn’t even run on MSIE 7, much less any modern browser.
While I personally think that’s insane — if you are that specific (not to mention antiquated) with your browser requirements, why don’t you just code a native app? — I’ve also never developed software with a team larger than five, and certainly don’t know the nitty-gritty details about spreading the work over a dozen countries and hundreds of developers, the vast majority being low-cost Chinese and Indian coders. So I’m not judging (or at least I’m trying not to).
But my point is that Big Corporate just wants their freaky “web-based” apps to run predictably for the projected 6-year deployment timeframe and does not give one flying fuck about whether their staff can access the new hip and way-superior version of [whatever app is hot this week]. Unless said app had real business value to large enterprise, but then, if it did… it would probably support MSIE6.

I am persuaded by some of the recent arguments (John Siracusa’s maybe?) that both the innovation and the money in general-purpose computing industry have moved over to the consumer side of the equation, and that this change has put MS in a worse position than they’ve traditionally been in.

Very, very interesting thread. The idea is appealing: corporate customers who have built their own internal web-apps aren’t interested in updating something that works just so their employees can use fancy new websites.

I don’t think that last point holds, though. I’ve tried to find the original material he’s mentioning but failed. I do think consumer software is leaping ahead of corporate software, but I think it’s because of scale. Corporations build small-scale, customized solutions, and those are always going to move slowly. It takes just as much effort to build something for 100m users (on the general Internets) as it does for 10,000 (in your little company).

Awesome Things I’ve Learned Today

Image compression.

In the Machine Learning class, we just learned the k-means clustering technique. Sounds complicated but, as always in computer science and math, the jargon is harder to learn than the concepts.

The idea here is that you express each color in an image as a combination of red, green and blue. Now you can plot each of the colors on a 3d graph so that similar colors will be plotted near each other. Dark red and light red will both be in the high-red, low-green, low-blue part of the graph.

Now pick a number of colors that you want to compress the image to. I’ve picked 16.

Next you randomly drop 16 points into that plot of all the colors used. Each of the colors used will be close (in 3d space) to one of the dropped 16.

Here comes the clever part.

Now you have 16 groups of colors based on their closest random point. Find the mid-point (or average, or whatever you want to call it) of each group’s colors, move its point to that mid-point, then recalculate the groups. Here is an image of it working in 2d space.

Eventually you wind up grouping all of the original colors into ‘likeness’ groups. The mid-point of each likeness group is your new color and the image is compressed!

Here’s an image of my dog Max I just compressed in this fashion.
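
For the curious, here’s a minimal sketch of that procedure in Python. This isn’t the course’s code; numpy and every name in it are my own:

import numpy as np

def kmeans_quantize(pixels, k=16, iters=10, seed=0):
    # pixels: an (N, 3) array of RGB colors, i.e. the 3d plot above
    rng = np.random.default_rng(seed)
    # Randomly drop k points among the colors actually used
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Group every color with its closest dropped point (3d distance)
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each point to the mid-point of its likeness group, then repeat
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Replace every color with its group's mid-point: a 16-color image
    return centroids[labels].astype(np.uint8)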

Today In Things I Don’t Understand

Emacs, the development environment for gods.

Here is xkcd:

I find myself completely fascinated by computer programming. I admire people who are awesome at it. I follow the culture. I’m desperate to learn it.

I code in just about every waking moment not explicitly devoted to something else. I still lead a life, of some sort (wife, exercise routine, job), but few hobbies: the odd boxing match (5%), my blogroll (10%), this blog you’re reading (5%) and programming projects (80% of my free time). I am so frustrated with myself for not learning this years and years ago. Here I am in my 30s and scratching the surface of something immense. I will never be good at it… Never.

So. Emacs. I’ve read about it. Heck, I even downloaded the thing onto one of my computers and completely ran aground. “You see, chortles the silverback neckbeard, real software isn’t ‘INSTALLED’ (*spits*), it’s compiled… AFTER editing a few non-trivial text files to suit your system’s specific requirements. If you can’t handle that…”

Pathetic, he whispers dismissively, turns and waddles away.

THAT’s Emacs. That’s the C library for XML processing I strangled my mind with the other day. Heck, that’s just about everything you do with C, which I’m desperate to learn but simply cannot see how I do that without spending months of dedicated evenings/weekends studying it. I have too many priorities. I have too little time.

I feel stupid every single day.

Monte Carlo Simulation Implemented ENTIRELY in Excel (no VBA macros)

When I moved to NY from our Toronto office I was a bit hamstrung by needing to log into a remote server to use the stochastic software we tend to run our simulations with. Not that it didn’t work, but it was slow and a bit of a pain to use.

So the first thing I did was try to implement my own Monte Carlo simulation program. I wanted to be able to send it to people wholesale so I forced myself to use VBA.

It worked, but it was really really slow. I hate slow.

So my latest iteration is a relatively simple model written entirely in Excel formulas, basically only using the RAND() function, which MS greatly improved in Office 2003.

The attached file simulates Poisson random numbers with the famous Knuth algorithm (a lovely bit of math, by the way).
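
For flavor, here’s the same algorithm sketched in Python rather than Excel formulas (the names are mine, not the spreadsheet’s), just to show the math:

import math, random

def knuth_poisson(lam):
    # Knuth: multiply uniform draws together until the running
    # product falls below e^(-lambda); the number of draws minus
    # one is your Poisson(lambda) random number.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= L:
            return k - 1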

Next up will be adding normal, lognormal and beta distributions. With those, I can probably simulate just about anything I need to, really. I’m going to limit myself to Excel’s native (C) implementations of all of these functions, which are a ba-jillion times faster than anything you can handcode in VBA.
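
The trick underneath, for what it’s worth, is inverse-transform sampling: push a uniform RAND() through the distribution’s inverse CDF, which in Excel looks like NORMINV(RAND(), mu, sigma). A quick Python sketch of the same idea (the function names are mine):

import math, random
from statistics import NormalDist

def normal_draw(mu, sigma):
    # Inverse-transform sampling: a uniform draw pushed through
    # the normal distribution's inverse CDF gives a normal draw
    return NormalDist(mu, sigma).inv_cdf(random.random())

def lognormal_draw(mu, sigma):
    # A lognormal is just e raised to a normal
    return math.exp(normal_draw(mu, sigma))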

So why reinvent the wheel?

Today’s Fad

Here’s a quote from HN in response to the question: what’s the big deal with Machine Learning?

There’s this enormous focus on ‘web scale’ technologies. This focus necessarily invokes visualizing and making sense of terabytes and eventually even petabytes of data; conventional approaches would take thousands or millions of man hours to accomplish the same level of analysis that computers can perform in hours or days.

I totally agree. I’ve joined a few technology meetup groups here in NY and so far I’ve had interesting reactions to my field of expertise. I basically say my job is predictive models, but on small-mid-sized datasets. Cue disappointment.

Everyone is focused on predictive models that crunch BIG DATA. I’m taking a course on ML but I don’t do BIG DATA.

There are two kinds of big, you see. You can have a list of a billion addresses, but that doesn’t really qualify as BIG big.  People can get their heads around what to do with a billion addresses: what regressions to run, what information can be reasonably gleaned from analyzing it.

BIG big is different. BIG refers to a HUGE number of parameters that might or might not be meaningful. Think about some problems that are common applications of ML:

For highly dimensional problems, such as text classification (i.e., spam detection) or image classification (i.e., facial detection), it’s almost impossible to hard code an algorithm to accomplish its goal without using machine learning. It’s much easier to use a binary spam/not spam or face/not face labeling system that, given the attributes of the example, can learn which attributes beget that specific label. In other words, it’s much easier for a learning system to determine what variables are important in the ultimate classification than trying to model the “true” function that gives rise to the labeling.

BIG means you don’t really know what you should do with the data. You kinda know what answer you want, but you can’t really hold a thousand or ten thousand different parameters in your head long enough to specify some kind of regression.
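
To make that concrete, here’s a toy spam example in Python with scikit-learn (my own illustration, not from the thread). The learner weighs thousands of word-parameters you could never hold in your head:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set: every distinct word becomes a parameter,
# and the learner figures out which ones actually matter
emails = ["win cash now", "meeting at noon", "free prize win", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(emails)
model = LogisticRegression().fit(X, labels)

print(model.predict(vec.transform(["win a free prize"])))  # should print [1]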

Now think about technology trends today. Computing power, bandwidth and memory capacity are all now cheap enough that computers can handle BIG better than humans can. THAT’s interesting.

In my professional life (sadly or happily, depending on how fired up I am to do ML), I don’t tend to get that many parameters.  Insurance is rated on a very few rock solid inputs and the rest is just sniffing out when someone is trying to screw you over.

But I’m intrigued, nonetheless. Woe to the ambitious one who doesn’t keep a tab on the cutting edge.

Here’s a link to yet another ML class.

How to Delete All Named Ranges in VBA

When you’re copying spreadsheets around all over the place (particularly if you use the excellent advanced copy trick for isolating unique cells), you build up a gigantic store of named ranges. These are infuriating.

Well, luckily I’ve developed a little VBA sub that wipes them all out:

Sub deleteNames()
    ' Count down from the last name to the first, so that deleting
    ' an entry never shifts the index of the ones still to come.
    Dim counter As Long
    For counter = ActiveWorkbook.Names.Count To 1 Step -1
        ActiveWorkbook.Names(counter).Delete
    Next counter
End Sub

This made my friggen day.

Why I want to learn C and OMG is it Tough

For some reason I don’t think I ‘get’ object-oriented programming. I kinda know it has something to do with classes, whatever the hell those are.

I find I write (think) in a procedural style, which is why I’m so fascinated by C, which I understand to be a procedural programming language. So some day I’m going to finish learning C (I’ve got WAY too much on to really dive into it right now), which is a deep and difficult language.

And so this (part 2 here) is a fascinating way to learn exactly how C works. It’s super simple, but just complicated enough that it’s out of my reach right now.

Once I figure all this out, I’d like to learn C++ and read this.

We Need More APIs

THE most constant pain in my ass is getting equivalent datasets in radically different formats. Every single insurance company records the same friggen data in different ways.

And if that wasn’t enough, whoever downloads this data then gets their grubby little paws all over it (usually manually effing around with rows and columns in Excel… [vomit]) before sending it to me.

I want machine data, I do NOT want people data. Cleaning up these datapiles is infuriatingly complicated, and it would all be better if they’d just give me access to their machines.

I’m reminded of the Matrix.