How R Is Used

I don’t use R regularly, though I’m somewhat familiar with it. My work is 90% Excel (the lingua franca of my world) and 10% Python, which I just plain like.

Yet here is a paper evaluating R’s design and how it is *actually* used. Neat.

We assembled a body of over 3.9 million lines of R code. This corpus is intended to be representative of real-world R usage, but also to help understand the performance impacts of different language features. We classified programs in 5 groups. The Bioconductor project open-source repository collects 515 Bioinformatics-related R packages.

The Shootout benchmarks are simple programs from the Computer Language Benchmark Game implemented in many languages that can be used to get a performance baseline. Some R users donated their code; these programs are grouped under the Miscellaneous category. The fourth and largest group of programs was retrieved from the R package archive on CRAN.

Some excerpts of the results:

We used the Shootout benchmarks to compare the performance of C, Python and R. Results appear in Fig. 7. On those benchmarks, R is on average 501 times slower than C and 43 times slower than Python. Benchmarks where R performs better, like regex-dna (only 1.6 times slower than C), are usually cases where R delegates most of its work to C functions.

…Not only is R slow, but it also consumes significant amounts of memory. Unlike C, where data can be stack allocated, all user data in R must be heap allocated and garbage collected.

…One of the key claims made repeatedly by R users is that they are more productive with R than with traditional languages. While we have no direct evidence, we will point out that, as shown by Fig. 10, R programs are about 40% smaller than C code. Python is even more compact on those shootout benchmarks, at least in part, because many of the shootout problems are not easily expressed in R. We do not have any statistical analysis code written in Python and R, so a more meaningful comparison is difficult. Fig. 11 shows the breakdown between code written in R and code in Fortran or C in 100 Bioconductor packages. On average, there is over twice as much R code. This is significant as package developers are surely savvy enough to write native code, and understand the performance penalty of R, yet they would still rather write code in R.

…Parameters. The R function declaration syntax is expressive and this expressivity is widely used. In 99% of the calls, at most 3 arguments are passed, while the percentage of calls with up to 7 arguments is 99.74% (see Fig. 12).

…Laziness. Lazy evaluation is a distinctive feature of R that has the potential for reducing unnecessary work performed by a computation. Our corpus, however, does not bear this out. Fig. 14(a) shows the rate of promise evaluation across all of our data sets.
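For readers who haven't met lazy evaluation, here is a rough Python sketch of the idea: an argument is wrapped in a "promise" that runs only if something asks for its value, and then only once. The names `Promise`, `pick` and `promise_evaluation_rate` are my own inventions for illustration. This is not how R implements promises, and the rate function is only my guess at the kind of thing the paper's figure measures.

```python
class Promise:
    """A delayed computation that runs at most once (a loose analogy to an R promise)."""

    _UNSET = object()  # sentinel meaning "not evaluated yet"

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = Promise._UNSET

    @property
    def forced(self):
        """True once the delayed computation has actually been run."""
        return self._value is not Promise._UNSET

    def force(self):
        """Run the computation on first use and cache the result afterwards."""
        if not self.forced:
            self._value = self._thunk()
            self._thunk = None  # the thunk is no longer needed
        return self._value


def pick(flag, a, b):
    """Hypothetical function: only the argument that is needed gets forced."""
    return a.force() if flag else b.force()


def promise_evaluation_rate(promises):
    """Fraction of promises that were forced (an assumed, simplified metric)."""
    promises = list(promises)
    if not promises:
        return 0.0
    return sum(p.forced for p in promises) / len(promises)


calls = []  # records which thunks actually ran
a = Promise(lambda: calls.append("a") or 1)
b = Promise(lambda: calls.append("b") or 2)

result = pick(True, a, b)                 # forces a, never touches b
rate = promise_evaluation_rate([a, b])    # half of the promises were forced
print(result, calls, rate)
```

Laziness only saves work when some promises are never forced. If nearly every promise ends up evaluated, the bookkeeping is pure overhead.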

And the upshot:

The R user community roughly breaks down into three groups. The largest groups are the end users. For them, R is mostly used interactively and R scripts tend to be short sequences of calls to prepackaged statistical and graphical routines. This group is mostly unaware of the semantics of R, they will, for instance, not know that arguments are passed by copy or that there is an object system (or two)…

One of the reasons for the success of R is that it caters to the needs of the first group, end users. Many of its features are geared towards speeding up interactive data analysis. The syntax is intended to be concise.

Via LtU. Here is an interesting discussion on this related video.

Today In Too Good To Be True

Here’s a post by an “Appreneur” who appears to have made good money selling Apps:

“In just over two years, I’ve created and sold three app companies that have generated millions in revenue. Two months after launching my first company, one of my apps averaged $30,000 a month in profit. In December of 2010, the company’s monthly income had reached $120,000. In all, I’ve developed more than 40 apps and have had more than 35 million app downloads across the globe. Over 90 percent of my apps were successful and made money.”

And the secret…

Don’t hate; Emulate! When you follow in the footsteps of successful apps, you will have a better chance of succeeding because these apps have proven demand and an existing user base. This takes the guesswork out of creating great app ideas.

I can’t stress the importance of emulating existing apps enough. It’s easy for people to fall in love with their own idea, even if the market doesn’t show an appetite for it. But this is one of the costliest errors you can make.

Unfortunately, developers make this mistake all the time. They focus on generating original ideas and spend a lot of time and effort creating those apps. When it doesn’t work out, they go to the next untested idea, instead of learning from the market. Often times, they repeat this cycle until they run out of money and dismiss the app game. This doesn’t have to be your experience.

Considering the wellspring from which his inspiration comes, it’s amusing that he spends almost an entire step (#6 of 10) discussing the NDA:

You must protect your ideas, source code, and any other intellectual property. These are the assets that will build your business, so you need to have each potential programmer sign an NDA before you hire them. Yes, it’s rare to have an idea stolen, but it does happen.

So let’s just say the ideas aren’t that important. How about all that development and design?

Coding your own app, especially if you’re teaching yourself at the same time, will take too long. The likelihood of you getting stuck and giving up is very high. It will also be unsustainable over the long run when you want to create several apps at the same time and consistently update your existing apps. After all, the goal is to get your time back and escape the long hours of the rat race. Therefore, programmers will be the foundation of your business. They will allow you to create apps quickly and scale your efforts.

Hiring your first programmer will be a lengthy process.

And his apps? Well, he posts “wireframes” for one of his apps, and I found the real thing on iTunes:

Not too compelling. Here’s Chris Dixon:

A fundamental principle of business is that you do things in house that you think can give you a competitive advantage and outsource things that you don’t. At an early-stage technology company this means you do in house: product design, software and/or hardware development, PR, recruiting, and customer relations/community management. Ideally, most of these activities are led by founders.

All very sensible. But if this guy outsources ideas AND development AND the one app he referenced in the article appears to really suck (“An” Emoji app is also referred to, but there are many of these), what is making him all his money?

Finally, for those who’d like a copy of my NDA template (along with the checklist I use when hiring a new coder), email a copy of your receipt for App Empire, my comprehensive book on app development and marketing, to bonus (at) appempire.com. The book goes into depth on advanced marketing and monetization techniques, including how to put your business on cruise control (automate).

This calls for a new Word of The Day!

This Is Your Brain On Sports

My mother-in-law was over once while I was channel surfing, and when I came to rest on a boxing match she said, “Why would anyone want to watch this kind of brutality?” Sheepishly, I turned the channel.

There really is something a bit ridiculous about watching dudes punch each other in the head for fun. “But other sports are violent, too!” is usually my limp defense. Doesn’t even address the charge. If there were a way to limit the damage without really disrupting the sport, I’d support it.

BLH has a piece discussing some recent concussion research. Here’s the gist:

Preliminary results from a new brain study suggest that there might be a point of no return for some combatants. Essentially, there becomes a point where the brain can no longer repair itself and chronic traumatic encephalopathy (CTE) becomes inevitable. The symptoms of CTE include personality changes and general cognitive difficulties, much like Alzheimer’s disease.

So boxing is probably the most concussive of sports and it’s pretty easy, and accurate, to point the finger at that community first. But remember Ted Johnson, the subject of the NYT article about concussions in the NFL?

Asked for a prognosis of Mr. Johnson’s future, Dr. Cantu, the chief of neurosurgery and director of sports medicine at Emerson Hospital in Concord, Mass., said: “Ted already shows the mild cognitive impairment that is characteristic of early Alzheimer’s disease. The majority of those symptoms relentlessly progress over time. It could be that at the time he’s in his 50s, he could have severe Alzheimer’s symptoms.”

Ted has CTE. And Sidney Crosby missing almost a whole season’s worth of hockey over two years for “concussion-like symptoms”? Here’s an important part of the research cited by this article and BLH:

As part of an ongoing study on brain health, the researchers divided 109 licensed boxers and mixed martial artists into three groups: those who had fought for less than six years, six to 12 years or more than 12 years. Their average age was about 29.

Participants underwent MRI scans to measure their brain volume and tests of their thinking and memory.

“In those that fought less than six years, we didn’t find any changes,” Bernick said. For that group, he said, “the more you fought didn’t seem to make any differences in the size of brain structure or their performance on some of the tests like reaction time.”

But for the other two groups of boxers and combat athletes, “the greater number of fights, the sizes of certain volumes of the brain were decreasing,” he said. “But, it was only in those that fought more than 12 years that we could detect the changes in performance in reaction time and processing speed.”

Concussive sports are for the young only. Most people think of athletes playing a sport until their reactions slow, their strength wanes and they lose their speed.

The reality is that most athletes are ‘bubble’ players who only barely make their teams and retire after a season or two. Only the best of the best, who are overrepresented in our minds and on the sports pages, play until their bodies tell them to stop. And the reality for them is that the brain may be the first thing to go.

Forcing retirement from too many concussions would be a tragedy for the player and fans. Imagine if Crosby were forced to retire at age 23. Things like this will begin to happen. And rightly so.

It’s the concussion awareness era. If it’s true that the damage can be identified early enough to limit long term problems by forcing retirement then that’s what should happen.

Not From An Actuarial Textbook

One of the mainstays of actuarial education is thinking about the ways one might underwrite auto liability insurance. The most accurate predictor of risk should be miles driven; after all, the more you’re on the road, the more likely you are to get into an accident.

The problem with calculating the number of miles driven is that it’s simply impractical. You’d need an insurance company to install a mileage monitor in every car, scoffs the textbook, and that’s just too costly to do. Some day, perhaps…

Well well well, the day has arrived!

Telematics insurance relies on a databox the size of a mobile phone which is installed by the insurance company into your car. The box does not damage the car and will not affect the warranty; it uses less energy than a car radio so should not drain your car battery.

What Data Is Collected?

Data from this box is collected by GPS, enabling insurers to monitor:

  • At what times the car is used
  • The distance it travels at those times
  • Where the car is located
  • On what type of roads the car travels
  • Speed of travel
  • Braking behaviour of the driver
  • Direction and speed of travel before and after a collision
  • Force of impact in a collision

Buy High, Says Venture Capitalist

Here is a post by Steve Blank, a venture capitalist, identifying a fact:

Facebook takes our need for friendship and attempts to recreate that connection on-line.

Twitter allows us to share and communicate in real time.

Zynga allows us to mindlessly entertain ourselves on-line.

Match.com allows us to find a spouse.

At the same time these social applications are moving on-line, digital platforms (tablets and smartphones) are becoming available to hundreds of millions. It’s not hard to imagine that in a decade, the majority of people on our planet will have 24/7 access to these applications. For better or worse social applications are the ones that will reach billions of users.

Yet they are all less than five years old.

Here is his inspirational conclusion:

It cannot be that today we have optimally recreated and moved all our social interactions on-line.

It cannot be that Facebook, Twitter, Instagram, Pandora, Zynga, LinkedIn are the pinnacle of social software.

All of these things are true. And here’s his opening line: “The quickest way to create a billion dollar company is to take basic human social needs and figure out how to mediate them on-line.”

I think he needs to shift to past tense, there.

The most influential personal/enterprise software companies in the early 80s were Apple and Microsoft. And who are they today?

No need is ever satisfied perfectly, but there is such a thing as a big head start. Surely it’s more likely that the Instagram acquisition represents the end of the disruptive phase of this technology trend.

Innovation comes from working on a need that you have that isn’t yet satisfied. The best itches to scratch are ones people will pay you for, obviously. And as a general rule, the market price for something is typically about as much money as someone else can make with it.

Social media is a bit different, much like newspapers, radio, TV and other advertising-driven businesses were different. These are super-scalable goods with the ability for pinpoint market segmenting. All very exciting, but their economic function is simply to make insurance more efficient.

I’d be more inclined to think that the next wave of billionaires will attack the problem of process inefficiency more directly. History tells us that this usually happens by eliminating processes entirely.

My favorite disruption came soon after this:

In 1898, delegates from across the globe gathered in New York City for the world’s first international urban planning conference. One topic dominated the discussion. It was not housing, land use, economic development, or infrastructure. The delegates were driven to desperation by horse manure.

The horse was no newcomer on the urban scene. But by the late 1800s, the problem of horse pollution had reached unprecedented heights. The growth in the horse population was outstripping even the rapid rise in the number of human city dwellers. American cities were drowning in horse manure as well as other unpleasant byproducts of the era’s predominant mode of transportation: urine, flies, congestion, carcasses, and traffic accidents. Widespread cruelty to horses was a form of environmental degradation as well.

This Week In Space Travel

The price of a standard flight on a Falcon 9 rocket is $54 million. We are the only launch company that publicly posts this information on our website (www.spacex.com). We have signed many legally binding contracts with both government and commercial customers for this price (or less). Because SpaceX is so vertically integrated, we know and can control the overwhelming majority of our costs. This is why I am so confident that our performance will increase and our prices will decline over time, as is the case with every other technology.

The average price of a full-up NASA Dragon cargo mission to the International Space Station is $133 million including inflation, or roughly $115m in today’s dollars, and we have a firm, fixed price contract with NASA for 12 missions. This price includes the costs of the Falcon 9 launch, the Dragon spacecraft, all operations, maintenance and overhead, and all of the work required to integrate with the Space Station. If there are cost overruns, SpaceX will cover the difference. (This concept may be foreign to some traditional government space contractors that seem to believe that cost overruns should be the responsibility of the taxpayer.)

That’s Elon Musk.

And is this related?

Space exploration company Planetary Resources will be unveiled in a conference call on Tuesday, April 24th. Besides the audacious announcement, which promises to “overlay two critical sectors — space exploration and natural resources — to add trillions of dollars to the global GDP,” what makes this unique is its high-profile support group. The venture is backed by Google executives Larry Page and Eric Schmidt, director James Cameron, and politician Ross Perot’s son, among others.

Chances any of this matters for regular folk?

Clash of the Machines

Here is a great series on High Frequency Trading. I was most intrigued by this:

It’s important to note that market making is nothing new. In the era when stocks were traded in 1/8ths and 1/16ths, market making was done by humans working in the pit. A single human trader would often run a market making strategy on larger stocks with significant volume. Later on, from the 1980’s to the early 2000’s, human daytraders would often fill this role. To a much lesser extent they still do.

Automated trading systems have replaced these human market makers for a very good reason – cost. For a strategy (and note: this strategy works only for a few securities, no human can track hundreds of stocks mentally) to be worth a financial professional’s time and effort, it must generate at least $20-200k profit each year (this assumes a human smart enough to daytrade would work for $20k/year). In contrast, a single server in a data center can run hundreds of strategies at a cost closer to $50k/year, and they can do it faster and more accurately than any human.

…Suppose that at precisely 10:31:30:000 AM, new information becomes available which suggests that it will now be profitable to place a buy order at $20.07 – perhaps a press release has hinted that the price will go up, or a correlated security has just gone up in price. Because of this, both Mal and Jayne want to change the price on their orders to $20.07. Whoever happens to be fastest will rise to the top of the book:

This is why automated market making has morphed into high frequency trading, and why so much effort is poured into creating low latency systems. Whoever places their order first will be the most likely to trade.
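The "fastest wins" point can be sketched as a toy bid book ranked by price first and arrival time second. Everything here is my own illustration: `BidBook` and `BuyOrder` are hypothetical names, the $20.06 starting price is an assumption (the excerpt only gives the $20.07 target), and a real exchange's matching rules are more involved.

```python
from dataclasses import dataclass
from decimal import Decimal
from itertools import count


@dataclass(frozen=True)
class BuyOrder:
    trader: str
    price: Decimal
    arrival: int  # lower number means the order reached the book earlier


class BidBook:
    """Toy bid side of an order book using price-time priority."""

    def __init__(self):
        self._orders = {}      # one resting order per trader, for simplicity
        self._clock = count()  # stand-in for exchange timestamps

    def place(self, trader, price):
        # Placing or re-pricing an order gives it a fresh arrival time,
        # so a re-priced order joins the back of the queue at its new price.
        self._orders[trader] = BuyOrder(trader, Decimal(price), next(self._clock))

    def ranked(self):
        # Highest price first; among equal prices, earliest arrival first.
        return sorted(self._orders.values(), key=lambda o: (-o.price, o.arrival))

    def top(self):
        return self.ranked()[0].trader


book = BidBook()
book.place("Jayne", "20.06")  # assumed starting price, not from the article
book.place("Mal", "20.06")
before = book.top()           # Jayne arrived first at the same price

# New information arrives and both want $20.07. Mal's update lands first.
book.place("Mal", "20.07")
book.place("Jayne", "20.07")
after = book.top()
print(before, after)
```

In this sketch Jayne loses the top spot purely because her update arrived second. That is the whole incentive to shave latency.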

It’s interesting that progress in this market is defined as the degree to which machines talk to and understand each other.

The immediate ability to profit from technological advances means computers will be autonomously driving market liquidity before they’re driving cars.

Still In The Woods

I go straight to Calculated Risk for all my ‘real’ economic news, by which I mean the data and basic commentary. Their graphs are outstanding.

And those graphs are telling all kinds of still-nasty stories about the downturn we are still in.

Look at the housing starts:

Hopefully it’s becoming clear that the economic story is not about ‘we built too many houses’. It’s about lots of stuff (debt deleveraging, etc). Have a look at this. The single family housing starts are down, sure, but so are owner built and built for rent sales, which didn’t really pick up in the boom.

And I like the graph below because I’ve long had the impression that most apartment buildings were built in the 70s and 80s. And it’s true!*

When people talk about “infrastructure spending” think about all of the low hanging fruit that’s already been picked.

Let’s build some highways. Got ’em.

Let’s build some airports. Got them, too.

Ok, how about apartment buildings? Done, and, in any case, NIMBY!

Replacing these things is going to be much less accretive to growth than building them in the first place.

And of course the real story is employment.

*My wife and I recently moved and had trouble finding a place that both allowed our two dogs and was built in the last 10 years.