Slashdot videos: Now with more Slashdot!
- 12.5% on income up to $14000
- 21% on income between $14000 and $48000
- 33% on income between $48000 and $70000
- 38% on income between $70000
So if you're earning $80000-$90000, you end up paying about 30% of it in taxes, which isn't too bad considering that we need a lot of infrastructure for the population that we've got (New Zealand is larger than the UK but has only 4M people).
I still think that things could be a lot easier than what we have with the current generation of RDBMS. As an example, Skype uses Postgres but they have to jump through a lot of hoops to make it work for them. For one thing, they can't just run SQL queries anymore, and they have to maintain the shards somehow (e.g. they probably need a way of balancing them). Backup/restore probably isn't viable for them either, so they must have implemented some form of redundancy. Another limitation is that with shards you need to route all queries through an indexing server which can also become a bottleneck. In short, this is a very difficult problem to solve.
The appropriate solution also depends on the structure of your data. For example, in my case we had a massive table with hundreds of millions of rows that dwarfed everything else, and we did relatively simple queries on the data. A more suitable dataset for RDBMS would have a lot of tables with roughly the same number of rows in them, where you run queries with lots of joins and filters.
I'm actually curious what the data in your 150TB database was like and what sort of hardware was required for it.
We also had two databases on one server, so the other thing that helped a lot was to run them on two separate servers. The largest table we had was clustered by one of the fields which made queries on that field fast. We didn't use autovacuuming and instead vacuumed overnight. A hardware upgrade also helped. We did some query profiling and made sure everything was indexed appropriately. None of this is rocket science of course, and just shows that as your database grows you have to get more and more involved in ensuring good performance.
We investigated vertical scaling with a better, more expensive server, and that would have helped for a while, but the database was projected to double in size in 1-2 years, so that would be no more than a stopgap measure. The conclusion I came to was that we had to move away from standard relational databases. One option was to use sharding (but I think sharding is a workaround for RDBMS limitations, so I don't like it that much), and the other option was to use something like a key-value store that can scale horizontally. Unfortunately, I didn't stay at the company long enough to implement this, so I can't tell you which of those would be a successful solution.
But here is another issue I thought of: backup. For our database it was 24 hours to do a full restore, which isn't practical. The only reasonable solution I know is to use replication, which is a nuisance with Postgres and adds maintenance overhead (keeping the schemas in sync). I'd prefer to have built-in redundancy. Again, I think you get that with Cassandra and MongoDB.
I guess in a few years we'll probably end up with something that combines good properties of both key-value stores (redundancy and scalability) and RDBMS (powerful query language, transactions).
Now how do you scale that if your database is still growing? Postgres doesn't have a decent clustering solution that I know of, so your options are either to roll your own, or to scale vertically. Both of those are expensive options.
Based on my experience, I don't think that relational databases are appropriate for really large databases, and at present the only realistic option is horizontal scaling which is a lot easier with things like Cassandra or MongoDB.