Machine Learning: first impressions

Wow, this is pretty cool stuff. Some notes:

There are two kinds of machine learning: supervised and unsupervised.

Supervised learning literally consists of running regressions on datasets you know something about. That’s it. They further break the regressions down into binary variable regression (classification problems) and plain-old single/multi-variate regression. The point of regression, of course, is to estimate a smooth function that describes a lumpy dataset.
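To make "estimate a smooth function that describes a lumpy dataset" concrete, here is a minimal sketch of single-variable linear regression using the textbook least-squares formulas. The data points and the name fit_line are made up for illustration and are not from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "lumpy" observations scattered around y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted line will not pass through every point; it is the smooth summary of the lumpy data.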

Unsupervised learning is where the sex is in this field. That’s Google News (clustering stories together that are ‘about’ the same topic) and various other kinds of data mining. The idea is that you get a dataset and ask the computer to find a pattern.
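As a rough picture of "ask the computer to find a pattern", here is a tiny k-means clustering sketch. I have no idea what Google News actually runs; k-means is just a common clustering method swapped in for illustration. The points, the starting centroids and the function name kmeans are all assumptions.

```python
def kmeans(points, centroids, rounds=10):
    """Tiny k-means on 2D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # An empty cluster keeps its old centroid.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two obvious clumps, with no labels attached.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

Nobody tells the algorithm which points belong together; the grouping falls out of the distances alone.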

There were quizzes in this, too, at which I did distinctly better than yesterday with the databases. I chalk that up to a better teacher setting questions up. As always, there are instances of false precision leading to a binary result (WRONG ANSWER, you idiot!), when in the real world I’d probably have gotten away with my approach.

For instance, there was a question that asked to identify the type of regression problem: one was predicting whether email was spam/not spam (obviously a ‘classification’/binary problem) and the other was to predict how many of a warehouse of goods would be sold or not sold in three months. I said that was also a classification problem, but the instructor thought it would be a normal regression problem. Could probably go either way, but I lose.

Finally, I am going to need to learn another programming language, Octave, which is apparently an open-source version of MATLAB (the days of building programming languages and selling them for money are long gone). Great.

Next up is a linear regression tutorial then a linear algebra lesson. I never took linear algebra, so I am distinctly not looking forward to the amount of time I’ll probably need to spend on this.

But I press on nonetheless.

What (Meta Skills) Do Computers Teach Us?

Easy: programming. But what does that do for us?

I just read Alan Kay’s essay: The Computer Revolution Hasn’t Happened Yet. There’s also a video of the lecture on which this essay is based. Haven’t watched it yet.

Here’s the synopsis: when the boys at PARC in the 70s were inventing just about every major component of the personal computer we use today, they had gigantic aspirations. They had a printing press-style revolution in mind and Kay is unimpressed with humanity’s progress using those breakthroughs. He figures we’re still only scratching the surface of its power.  I agree.

Here’s how he rationalizes it:

One way to look at the real printing revolution in the 17th and 18th centuries is in the co-evolution in what was argued about and how the argumentation was done.

Increasingly, it was about how the real world was set up, both physically and psychologically, and the argumentation was done more and more by using and extending mathematics, and by trying to shape natural language into more logically connected and less story-like forms.

The point here is:

As McLuhan had pointed out in the 50s, when a new medium comes along it is first rejected on the grounds of “too strange and different”, but then is often gradually accepted if it can take on old familiar content. Years (even centuries) later, the big surprise comes if the medium’s hidden properties cause changes in the way people think and it is revealed as a wolf in sheep’s clothing.

So, the computer is going to literally change the way we think about and solve problems and this hasn’t really happened yet.

Big thought, that one. I like it a lot.

Kay would answer my questions at the beginning of this post as follows, perhaps: computers let us learn programming, which teaches us how to simulate stuff, to play with ideas.

He spends quite some time on his work with children learning science by programming computers to test out ideas of their own. To learn the best way one can learn: by failing. Or let’s dust off an old metaphor: the printing press let us learn by watching, the computer allows us to learn by doing.

If this is right, it means that tomorrow’s people will simply have a better intuitive grasp of difficult concepts: they’ll be smarter. Is it crazy to say that a pedagogy with computer games as its centerpiece will revolutionize education and the world? Sure sounds a bit crazy.

Kay laments that our society sees a computer and thinks ‘super-TV’. Ouch, but he’s right. Remember the One Laptop Per Child program? Kay’s affiliated with it, unsurprisingly. When I heard that I had a flashback to some of the commentary: “What on earth will kids do with a cheap computer when they don’t have water? Watch YouTube?” Imagine Kay’s exasperated reply: “they’d learn, you fool!”

Because they aren’t super-televisions. Oh, no.

Computer literacy was once learning to type. Mechanical skills?! How laughably 19th century. More recently it meant learning how to open a document in Windows: pshah, that’s like teaching one book in an English class. Better to teach the kid to write!

You pick up those basic skills as you go along. The point is that we can’t rely on everyone teaching themselves. Computer literacy means literally learning how to read from and write to computers. It is learning programming.

And it’s the future.

Databases: Intro, Relational and XML (Hierarchical)

First day wasn’t so bad. I watched about 40 minutes of video at 1.2x or 1.5x (I’d probably have fallen asleep at 1x pace) and learned a bit.

There are four kinds of developers in the database programming world: builders, designers, programmers and administrators.

This course is not about (#1 above) building a database system, which would involve designing the database interaction with the physical system in C or Assembly or something crazy like that. Nor is it about (#4) maintaining a database in use, which involves optimizing resources, minimizing downtime and keeping things running.

What we’re learning about is choosing a database type, designing schemas, writing queries and incorporating the structure into a program. In other words, this is a database course for people who build database functionality into a program.

There are two choices for databases today: relational databases (in my case, SQLite or some other system that uses SQL as its query language) or hierarchical databases (XML, which means a flat text file formatted in a certain manner).

We were just getting into a description of what XML is when a quiz popped up in the lecture, which was neat.

The questions, however, were ridiculous, not least because I got them all wrong. Here are some things I learned from the quiz (honestly, none of this was discussed in the lecture):

  1. ALWAYS use relational databases when you can. In particular, when the data structure is fixed (a few ‘columns’ and lots of records), relational is the default. Why this is isn’t discussed (grr), but I suspect that relational databases are just much faster.
  2. XML is useful when the data is ‘hierarchical’, which means that data can be easily described as subsets of other parts of the dataset. The example they used in the question was a family tree. For this question I chose ‘XML only’ when the right answer was ‘either XML or relational’. Again, relational is the default and only stray from it IF YOU HAVE A DAMN GOOD REASON TO.
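To see why the family tree answer was ‘either XML or relational’, here is a small sketch that stores the same made-up family both ways using Python's standard library. The names, the person table and the parent_id column are my own assumptions, not anything from the quiz.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hierarchical version: children are nested inside their parent element.
xml_doc = """
<person name="Ann">
  <person name="Bob">
    <person name="Dee"/>
  </person>
  <person name="Cat"/>
</person>
"""
root = ET.fromstring(xml_doc)
xml_children = [child.get("name") for child in root.findall("person")]

# Relational version: one flat table where each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO person (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cat", 1), (4, "Dee", 2)],
)
rows = conn.execute(
    "SELECT child.name FROM person AS child "
    "JOIN person AS parent ON child.parent_id = parent.id "
    "WHERE parent.name = ? ORDER BY child.id",
    ("Ann",),
).fetchall()
sql_children = [name for (name,) in rows]
conn.close()

# Both representations answer "who are Ann's children?" the same way.
print(xml_children, sql_children)
```

The nesting in XML becomes a self-referencing column in the table, so the hierarchy survives in either form.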

I also learned what ‘relational algebra’ is. Turns out it isn’t as scary as it sounds. Algebra, I’m reminded, simply means a vocabulary of symbols for concepts. Relational algebra, therefore, can be thought of as a series of symbols that represent actions a database program performs (pi projects the columns you want, sigma selects the rows you want, blah blah blah). I’m hoping it’s somewhat intuitive.
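A rough sketch of the two operators as plain Python functions, treating a table as a list of dictionaries: sigma (selection) keeps rows that pass a test, and pi (projection) keeps only certain columns and drops duplicate rows. The students table and the function names select and project are invented for illustration.

```python
def select(rows, predicate):
    """Sigma: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Pi: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Hypothetical table of students.
students = [
    {"name": "Amy", "major": "CS", "gpa": 3.9},
    {"name": "Bob", "major": "EE", "gpa": 3.4},
    {"name": "Cal", "major": "CS", "gpa": 3.6},
]

# pi_major(sigma_{gpa > 3.5}(students)): majors of students with a GPA above 3.5.
result = project(select(students, lambda r: r["gpa"] > 3.5), ["major"])
print(result)
```

Because every operator takes a table and returns a table, they nest inside each other like ordinary algebraic expressions.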

I’m happy I got this out of the way early because seeing the words RELATIONAL ALGEBRA staring out at me from the syllabus was giving me the heebie-jeebies. The last thing I need to do is brush up on lots of complicated math.

It takes FOREVER to relearn math.

Ok, I Did It

Fair warning to blog readers. I’m going to use this thing as a crutch for the Machine Learning and Database courses from Stanford, which I just finalized my enrollment in. Look out for lecture summaries and coursework as I think through the problems I encounter.

I’ve looked into each course and I am a bit skeptical about the Machine Learning stuff. Machine Learning is when you set a few algorithms loose on a gigantic dataset of uncertain value. Apparently humans do much better at guessing when they have even the slightest idea of what the conclusions should be.

As a big believer in intuition, this suggests that Machine Learning is of limited use. Am I wasting my time? Should I be spending these hours on my weekend project (which I shall not forget)?

We shall see. God I hope I have the dedication to stick to this…

Fight Review: Mayweather vs. Ortiz

Watching HBO’s replay (love having HBO). Here’s Bad Left Hook’s extensive coverage.

My notes:

Mayweather really can land those lead straight rights. Man he’s fast.

My enduring view of Mayweather is that he’s one of these athletes who make it look easy. One of the reasons he looks so good is that his opponents get discouraged and can’t figure out how to continue.

Defense/counter-attack is probably the most consistently successful strategy in all of sports. This bodes well for Mayweather’s chances against anyone.

On the other hand, Mayweather’s style of defense will wear down more quickly than others’ offensive powers. Speed kills, sure, but speed dies, too.

You know when a fighter reacts to getting hit they feel it. Floyd did that whole smile and “Come on, that didn’t hurt!” routine. The power got to him. Hm…

Saw Floyd’s two hands go up right before the headbutt. My god, Ortiz actually had a chance. What an idiot. An idiot inexperienced kid. A kidiot.

The knockout was pretty unsportsmanlike but not against the rules as many have said. I think Floyd went into that bro hug thinking, what in the hell is this? Fuck it, I’m going to hit him.

Now, knockouts are concussions and sometimes they don’t go away. When you get KO’d  you lose some ability to endure blows to the head forever. Arthur Abraham’s despicable foul against Andre Dirrell comes to mind. That guy will never be the same.

So let’s speculate about Pacquiao. Floyd chose his opponent well, here: Ortiz is a less-skilled version of Manny: probably not as quick and not as awkward and not as powerful. But a southpaw and close enough on everything else.

I suspect Manny won’t get frustrated with Mayweather who clearly isn’t going to knock a ready fighter out. With each passing day, Floyd’s abilities erode more quickly than Manny’s.

Five years ago, Floyd by a comfortable UD, I think.

Today? Or in two years when they actually meet? I think Manny hurts Floyd like Mosley did and like Ortiz did. But Manny won’t screw up the finish.

Manny Pacquiao by KO.

Build vs. Buy

Celent is releasing a report soon on build vs buy. I find this debate frustrating because I feel like it isn’t a difficult decision.

Insurance companies are made of three things: money, a processing system and an underwriting system. Money is money and at current regulatory margins I don’t believe there is an advantage to be gained for insurers having more or less of it. Underwriting systems exist to sniff out moral hazard so humans have to handle those (committees, referral, etc).

100 years ago, the processing was done by humans, too. Today, you choose between build or buy.

If you find yourself among the small minority of (re)insurers that write volatile, hard-to-price insurance, you don’t really need a rock-solid processing system. You probably write big deals and work more like a hedge fund than a retail bank. You can buy.

As for the rest: the risks are simple and the money isn’t yours, so your job is to provide a commodity on the cheap. If you aren’t building your own system, what on EARTH do you get paid to do?

Over time technology improves, making new systems better regardless of whether you build or buy. This will, however, rarely (never?) affect whether a company chooses picking clients or shaving margins as its business model.

Progress

Progress is a funny concept. For instance, here are some things that have undoubtedly progressed in human history:

  1. Math
  2. Physics
  3. Biology
  4. Chemistry
  5. Engineering

Nobody disagrees, right?

Here are some more things:

  1. 100m dash
  2. High Jump

Still with me, no doubt. It’s difficult to deny that the greatest sprinter in history is Usain Bolt (after all, we can see everyone’s times).

But what if I said that Aristotle would have been no more than a moderately successful professor at a decent college if he were alive today? Think of it this way: he was the smartest guy in a city-state of, what, 500,000 people (most of whom were slaves)?  Let’s say this total civilization had about 1,000,000 people in it.

What if I said that no tennis player in history could compete with today’s players? Tennis, like swimming, is a tricky example because technology is difficult to disentangle from ability, but is it not safe to say that if sprinters are better, so should tennis players be?

I once ran into Jonathon Power at a party in Toronto and shamelessly geeked out for a few minutes. He said something interesting: each generation of athletes is demonstrably better than the preceding generation. That is to say that if Power jumped into a time machine to the 50s, he’d have absolutely crushed everyone he faced. He’s a pretty confident guy and literally said as much.

The corollary, of course, is that he’d himself get crushed by today’s best. He wasn’t about to admit that, though.

So progress exists in bodies of knowledge, which makes sense, and in sports, which is easily measured, but what about something creative like music?

I’d say it’s probably uncontroversial that musicians today are much better technically than in years past, but are writers more creative?

My head starts hurting pretty quickly when thinking about this. Tyler Cowen is persuaded by this:

It’s glaringly obvious that all the astounding, time-space rearranging developments in the dissemination, storing and accessing of audio data have not spawned a single new form of music.

Which is ballsy coming from a guy that loves classical music. If you judged by his tastes, you’d have to conclude that stagnation is impossible: there has never been any progress!

Surely musical relativism can be taken to an extreme and appreciation of music is an intensely personal affair. I’m reminded a bit of Adam Smith’s speculation on the sympathy of sentiments, where he said that the goal of much of human interaction is to come to a common feeling on some topic (it’s raining out: man, don’t we just hate the rain!? Yeah!!).

Music seems to me to be something that we have a feeling that we should agree on but often don’t.

Anyway, I’d argue that cultural output cannot be immune from progress and therefore older music would be in some fashion demonstrably worse than newer music.

Here’s the thought experiment: if you tossed Mozart into a time machine and played a Britney album for him, or whatever, would he be blown away?

Some Amateur Speculation on The Stock Market

The economic disaster trade appears to be short commodities and long Treasuries.

Timing is everything, but when things turn around the USD is going to weaken and commodities prices are going to pick up. That’s when you would want to be long a Canadian or Australian Index Fund, I think.

The quickest way that happens, probably, is when either the Fed or the ECB credibly commits to an expansionary monetary policy. A Euro breakup counts because the Dmark skyrockets and the Italian, Greek, Portuguese and Spanish currencies plummet.

‘We’ need to destroy some wealth in those problem countries pronto.

Another Entrepreneur Talks Up His Book

Got here from Ben Casnocha. I don’t believe this for a moment:

TripAdvisor, the leading hotel and travel reviews site, will be spun out from its parent Expedia this month, and shareholders are giddy. With 50 million reviews and counting, the site is shaking the travel industry to its core. Underlying TripAdvisor’s success is a powerful long-term trend: ratings websites threaten to make many brands irrelevant.

I actually see the opposite effect, which is to reinforce brands’ power by making their experiences more consistent and reducing their monitoring costs.

Feedback is NOT disruptive technology for branded hotel chains or restaurants. Branding only works because it sets expectations, which are then either met or not met. If they are unmet, then the brand takes a hit. TripAdvisor just speeds up this process.

It’s an ugly bit of self-indulgence for marketing-types to think that brands are built by marketing. They’re built by experience. Super 8 didn’t get its image by buying ads; it got it by whipping small franchisees into competent low-end hotel owners and delivering some kind of central reservation system. And maybe tomorrow they get good at hiring summer interns to pump up the reviews of their in-chain hotels.

And never forget TripAdvisor’s business model: selling ads. To the branded hotel chains. It’s in TripAdvisor’s best interest to maximize its value to these core clients.

TripAdvisor is helping to kill some businesses, though: rating organizations as they previously existed and old media advertisers.

Advertisers have lost a client. This is one less ad at the Super Bowl because online ads are much more targeted and real information about the quality of an establishment is more easily sourced. And if their brand is powerful enough and their reviews are good enough, maybe they don’t even bother with advertising on TripAdvisor either.

And as I said before, TripAdvisor is an excellent new tool for big chains to use.

Let’s say your job is to monitor the Super 8 Brand across the US. You travel to hotels, perform inspections, chit-chat with the hotel owners and move on. That job is now dead. All an executive has to do now is sign up for a feed of reviews about its hotels and flag underperformers. Any organization that has the resources to actually take advantage of these tools will win and small franchisees are buying those resources.

Branded chains don’t care about mom & pop shops unless they can scale. And if they can scale, they’re a branded chain.

This Week in Science I Don’t Understand

First story is one everyone’s probably heard:

Sept. 23 (Bloomberg) — A neutrino beam appears to have moved faster than the speed of light in an experiment whose results need to be confirmed independently, CERN, the European Organization for Nuclear Research, said.

My go-to for understanding this kind of thing is Ethan Siegel, who concludes (at the end of an excellent post, which I recommend):

Now, something fishy and possibly very interesting is going on, and there will certainly be scientists weighing in with new analysis in the coming weeks. But in all the excitement of this group declaring that they observe neutrinos moving faster than the speed of light, don’t forget what we’ve already observed to much greater precision! And be skeptical of this result, and of the interpretation that neutrinos are moving faster than light, until we know more.

Ok, we wait for more.

Next up is on genetics:

The top line finding seems to be that Europeans and East Asians are closer to each other than either is to the Australian Aboriginal. I’ve seen this result before. But, a major issue which is resolved here with their methods is that Aboriginals are closer to East Asians than they are to Europeans!

Every time I read about this the divergence of humanity seems to get more and more complicated. Perhaps this shouldn’t be surprising. The blue dots in the image below are the source genomes that were sequenced to develop the flows in red and black.

More here. Neat!