All good points. I think your last point nicely introduces a difference between the real estate and stock markets that is often overlooked. "Casual" investors in the stock market typically take a long position and rarely risk any more sophisticated trading (I'm including myself in this category). However, at least in this country (UK), by far the commonest way to get onto the housing "ladder" is to take out a substantial mortgage. So in effect, most of the trades on the real estate market are heavily geared, which I think means negative equity is a much commoner problem in this market than margin calls are in the stock market. This could well be a significant counterargument to the crowd that frequently claims investing in housing is safer than investing in the stock market. It turns out reality is (surprise, surprise) more complicated than that.
I realise you've already alluded to all of this, but I think it really bears spelling out in detail.
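To put rough numbers on the gearing point, here is a minimal sketch in Python. The figures (a 200,000 house, a 10% deposit, a 15% price fall) are invented purely for illustration, and the function names are my own. It also assumes the loan balance stays fixed, ignoring interest, repayments and fees.

```python
def equity_after_move(price, deposit, change_pct):
    """Owner's equity after the property value moves by change_pct percent.

    Simplifying assumption: the loan balance stays at price - deposit
    (no interest, repayments or fees are modelled).
    """
    loan = price - deposit
    new_value = price * (100 + change_pct) / 100
    return new_value - loan


def return_on_deposit_pct(price, deposit, change_pct):
    """Percentage gain or loss on the cash actually put in (the deposit)."""
    equity = equity_after_move(price, deposit, change_pct)
    return 100 * (equity - deposit) / deposit


if __name__ == "__main__":
    # Hypothetical figures: 200,000 house, 10% deposit, 15% price fall.
    price, deposit, change_pct = 200_000, 20_000, -15
    print("equity:", equity_after_move(price, deposit, change_pct))
    print("return on deposit (%):", return_on_deposit_pct(price, deposit, change_pct))
    # The same fall on an ungeared holding: the "deposit" is the full price.
    print("ungeared return (%):", return_on_deposit_pct(price, price, change_pct))
```

With these made-up numbers, a 15% fall leaves the geared buyer 10,000 in negative equity, a loss of 150% of the deposit. The same fall costs an ungeared holder 15%.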
R does support fully user-defined types, inheritance and polymorphic methods. You just have to want to use them enough to dig through the multiple OO implementations available as part of the core. The commonly used systems, S3 and S4 objects, don't exactly play nicely together. I personally lean towards S4 since it seems much cleaner, but a lot of legacy code still uses S3, so it looks like there won't be a rationalisation of these two systems any time soon. The Bioconductor R packages generally (but not exclusively) use S4, so check those out for examples.
I think it's worth pointing out somewhere in this thread (and here seems pertinent) that there are many branches of science which have already confronted the question of data and software disclosure, and have generally come to the conclusion that if you want to publish you should disclose everything. My own branch, biology, has for many years been sharing sequence, protein structure, microarray and high-throughput sequencing data freely at the point of publication. 9 million data points are a drop in the ocean; I'm currently working on a dataset with 5 billion data points, and even that's small compared to the cutting edge. Now, I'm not going to pretend it's perfect, since it's up to the journals to police their data disclosure policies, but the point often missed is that in return for disclosing your hard-won data, you get access to everyone else's data as well. That alone makes it worth it, since it speeds up the process of scientific discovery which is, after all, what we're all about.
The climate research community badly needs to get itself an international data repository along the lines of EMBL/GenBank, GEO/ArrayExpress, and PDB.
And yet, Twitter is still around and still relevant, which shows that you can get away with taking these short-cuts and still achieve the ultimate aim of your project. I see people getting bogged down in the details of which software architecture/model to use all the time (never mind sorting algorithms!), so much so that they lose sight of their objectives. What often happens is that someone (usually me) then does a quick end-run around them in <insert scripting language here> and we eventually move on. People wonder about the prevalence of dodgy scripts in the world today; I say this habit of programmers taking their eye off the ball is one of the reasons. Never underestimate the advantage of being first to market.
Just to address the rules local to the UK, this government website shows that bicycledriving.org is not an entirely reliable authority, at least in this case:
Note in particular the final sentence in rule 63.
The Independent picked up on this before Slashdot, and that's not unusual in my experience.
You cannot have a science without measurement. -- R. W. Hamming