kpharmer - Slashdot User

Comment Re:Pros & Cons of non-relational solutions (Score 1) 423

by kpharmer on Friday July 03, 2009 @08:07AM (#28570283) Attached to: Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed

Large databases use a combination of range partitioning AND indexing. If you're lucky you've also got hash partitioning - to distribute your database across N servers. Partitioning is far more general and forgiving for purposes like this than indexing.
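
To make the distinction concrete, here is a minimal sketch of how a row might be routed under both schemes. The server count, the monthly boundaries, and all function names are made up for illustration; no particular database does it exactly this way.

```python
import bisect
import zlib
from datetime import date

# Hypothetical layout: 4 servers, monthly range partitions.
NUM_SERVERS = 4
# Exclusive upper bound of each range partition, sorted ascending.
# Anything on or after the last bound falls into a final overflow partition.
RANGE_BOUNDS = [date(2009, 2, 1), date(2009, 3, 1), date(2009, 4, 1)]


def hash_partition(key, num_servers=NUM_SERVERS):
    """Pick a server from a stable hash of the key (spreads load evenly)."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def range_partition(day):
    """Return the index of the range partition that holds `day`."""
    return bisect.bisect_right(RANGE_BOUNDS, day)


def partitions_for_range(start, end):
    """Partitions a query over [start, end] has to scan; the rest are skipped."""
    return list(range(range_partition(start), range_partition(end) + 1))
```

A query for Feb 10 through Mar 5 only has to touch two of the four range partitions, and that pruning works for any query that filters on the date, not just the ones you built an index for.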

And building trends in your application layer and then storing them in logs can work, but:
1. You still rely on figuring out in advance what you're going to need.
2. If you don't load those logs back into the relational database then you can't effectively join them to all the data there. You then must either log redundant data or have useless logs.
3. grep & sed or custom reporting code against your logs is a poor substitute for any standard reporting tool or custom code against your database.
4. Database features like partitioning, indexing, and parallelism make in-database aggregates faster than logs.
5. Did I mention automatic query rewrite? That's where the database automatically converts eligible queries against the base table to actually run against the summary tables...
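
A rough sketch of the summary-table idea, using SQLite from Python. SQLite does not do automatic query rewrite, so the routing here is done by hand in a hypothetical `trend` helper; the table and column names are also made up. The point is only that the summary table gives the same answer as the base table while holding far fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (day TEXT, msg_type TEXT);
    INSERT INTO messages VALUES
        ('2009-07-01', 'pm'), ('2009-07-01', 'pm'), ('2009-07-01', 'post'),
        ('2009-07-02', 'pm'), ('2009-07-02', 'pm'), ('2009-07-02', 'pm');
    -- Summary table: one row per (day, msg_type) instead of one per message.
    CREATE TABLE messages_daily AS
        SELECT day, msg_type, COUNT(*) AS n
        FROM messages GROUP BY day, msg_type;
""")

# The query as a user would write it against the base table...
BASE_QUERY = (
    "SELECT day, COUNT(*) FROM messages "
    "WHERE msg_type = ? GROUP BY day ORDER BY day"
)
# ...and the equivalent query against the summary table.
SUMMARY_QUERY = (
    "SELECT day, SUM(n) FROM messages_daily "
    "WHERE msg_type = ? GROUP BY day ORDER BY day"
)


def trend(msg_type, use_summary=True):
    """Daily counts for one message type, from either table."""
    sql = SUMMARY_QUERY if use_summary else BASE_QUERY
    return conn.execute(sql, (msg_type,)).fetchall()
```

With a database that supports query rewrite, the user only ever writes the first query and the optimizer decides on its own to run the second.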

Comment Pros & Cons of non-relational solutions (Score 5, Interesting) 423

by kpharmer on Thursday July 02, 2009 @06:54PM (#28565701) Attached to: Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed

Note that most of these solutions come from the interwebs, social networks, etc. And it isn't so much anti-SQL as it is anti-relational-database (SQL != RDB).

The basic premise is that we need different solutions that: can scale very high for very narrowly scoped reads & writes, don't need to perform ranged queries, reporting, etc., and don't need ACID compliance. And that may be the case. Sites like Slashdot, Facebook, reddit, Digg, etc. don't need the data quality that eBay needs.

On the other hand, eBay achieves scalability AND data quality with relational databases. And when I've worked with teams whose architectures scale massively and avoid the "relational trap" for "better solutions", they inevitably later regret the lack of data quality and the complete inability to actually get trends and analysis out of their data. It *always* goes like this:
Me: So, is this thing (msg type, etc) increasing?
Developer: No idea.
Me: Ok, so let's find out.
Developer: How?
Me: I don't know - typical approach - let's query the database.
Developer: It'll take four+ hours to write & test that query and then days to run. And when it's done we might find that we wrote the query wrong.
Me: What?!?
Developer: We had to do it this way, you can't report on 10TB databases anyhow.
Me: What?!? Are you on crack? There are dozens of *100TB* relational databases out there that people are reporting on.
Developer: Well, we probably don't need to know what that trend is anyhow.
Me: I'm outta here
