I work with a SQL database every day. I optimize weekly. I haven't done a project since the mid-1990s that didn't have some sort of SQL DB attached to it. I just wanted to see if there was something better. Apparently not.
Thanks. I at least got some pointers to things I wouldn't have otherwise considered through this thread, even if my original question was poorly thought out. I haven't looked at hybrids.
Really, I just want to know: if you were starting a new site that was mostly incoming data and might need to scale quickly, what choice would you make at the outset to make your future life more bearable?
I think we were just looking for an excuse to play with NoSQL solutions, rather than needing one. By segmenting customers across multiple databases, we will scale just fine. I was hoping for an elegant solution, but Copy, Rename, Repeat seems to work fine too.
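For anyone wondering what segmenting customers across databases looks like in practice, here is a minimal sketch. It is not our actual code: the table layout, the `shard_for` hash routing, and the use of in-memory SQLite as a stand-in for separate MySQL copies are all assumptions for illustration. A plain lookup table of customer to database would work just as well.

```python
import sqlite3
import zlib

# Hypothetical setup: every shard is an identical copy of the same schema.
# In-memory SQLite stands in for separate MySQL/PostgreSQL databases.
NUM_SHARDS = 4


def shard_for(customer_id, num_shards=NUM_SHARDS):
    """Map a customer to a shard with a stable hash (same answer on every run)."""
    return zlib.crc32(str(customer_id).encode("ascii")) % num_shards


def open_shards(num_shards=NUM_SHARDS):
    """Create one connection per shard, each with the same (assumed) table."""
    shards = []
    for _ in range(num_shards):
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE events (customer_id INTEGER, upc TEXT, qty INTEGER)"
        )
        shards.append(conn)
    return shards


def record_event(shards, customer_id, upc, qty):
    """Writes touch exactly one shard: the one that owns the customer."""
    conn = shards[shard_for(customer_id, len(shards))]
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", (customer_id, upc, qty))
    conn.commit()


def total_qty(shards, upc):
    """Aggregate query: ask every shard, then combine in the application."""
    total = 0
    for conn in shards:
        (subtotal,) = conn.execute(
            "SELECT COALESCE(SUM(qty), 0) FROM events WHERE upc = ?", (upc,)
        ).fetchone()
        total += subtotal
    return total
```

The trade-off is visible in `total_qty`: per-customer reads and writes stay cheap, but anything that spans customers has to fan out to every copy.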
All correct statements. Thanks.
POC is already running. It is heavily write-intensive just in testing and, having been on the receiving end of a firehose of data before, we really just wanted to investigate the options. The easy way to scale is to split customers across multiple copies of the database and link them for aggregate queries, but it seems like such a kludge. In the past I have logged everything to flat files and then imported those into the DB every 5 minutes or so, which helps with web layer scaling, but creates a lot of unintended issues managing the flat files. Thus, we are looking at the other options out there.
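The flat-file approach I mentioned can be sketched roughly like this. This is a simplified illustration, not what we ran back then: the JSON-lines format, the `.importing` suffix, and the `events` table are made up for the example, and SQLite stands in for the real database.

```python
import json
import os
import sqlite3


def log_event(log_path, event):
    """Web layer: append one JSON line per event; no database round trip."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def import_batch(log_path, conn):
    """Run on a timer (say every 5 minutes): rotate the file, then bulk-insert it."""
    if not os.path.exists(log_path):
        return 0  # nothing was logged since the last import
    batch_path = log_path + ".importing"
    # Rotate first so new events go to a fresh file while this batch loads.
    # Caveat: a leftover batch file from a crashed run would be overwritten here.
    os.replace(log_path, batch_path)
    with open(batch_path, encoding="utf-8") as f:
        rows = [
            (e["customer_id"], e["upc"], e["qty"]) for e in map(json.loads, f)
        ]
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    os.remove(batch_path)
    return len(rows)
```

The comment about leftover batch files is exactly the kind of file-management issue I mean: crashes mid-import, partial lines, and writers racing the rotation all need handling that a database would otherwise give you for free.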
Low-cost solution for underserved and emerging markets. What happens when we hit 20,000 customers? (22.4 billion data points per year)
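To put that number in perspective, here is the back-of-the-envelope arithmetic. The 365-day year and the evenly spread load are simplifying assumptions; real traffic will be burstier than the average.

```python
CUSTOMERS = 20_000
POINTS_PER_YEAR = 22_400_000_000  # 22.4 billion, from the estimate above

# Assumption: load is spread evenly over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

per_customer_per_year = POINTS_PER_YEAR // CUSTOMERS
per_customer_per_day = per_customer_per_year / 365
total_per_day = POINTS_PER_YEAR / 365
avg_writes_per_second = POINTS_PER_YEAR / SECONDS_PER_YEAR

print(f"per customer per year: {per_customer_per_year:,}")      # 1,120,000
print(f"per customer per day:  {per_customer_per_day:,.0f}")    # about 3,068
print(f"total per day:         {total_per_day:,.0f}")           # about 61.4 million
print(f"average writes/second: {avg_writes_per_second:,.0f}")   # about 710
```

An average of roughly 710 writes per second is manageable on its own; the peaks, and the multi-year table growth, are what worry me.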
Sorry, didn't mean to make buzzword soup. Here is what we have: Mobile apps -> PHP REST APIs -> some datastore (currently MySQL) -> PHP web for reporting. We are dealing with tracking physical products. We will keep MySQL for user management and the primary tables of product information (UPC, description, weight, etc.), but where to store everything else is what we are not sure about.
This is exactly for the stock inventory project. We currently have a MySQL backend and it seems to be working well. The current plan is to migrate to PostgreSQL in the next few months. We are expanding the project to have consumer-facing iPhone/Android apps, so scaling is ??? The consumer-facing app will query data looking for availability and then report to the retailers info on those queries (18 people looked for this product in your area last week and you were out). We are also starting to import data from 3rd party inventory systems. The long play here is automated stock and reorder management, but we are starting where there is less competition in the space.
Unfortunately, I am stuck in backwater USA, and my contacts from my Silicon Valley days have mostly cashed out or I've lost touch. We expect a few hundred thousand rows of data a day, but you never know. I've hit the hard limits on databases before, so I want to avoid that up front if possible.
Sorry about that. I just wanted you to have an understanding of what our complete stack looks like and what our connectivity issues would be. I am sure that some NoSQL has better support for PHP, and others have better support for Java. Also, just indicating that this is to support mobile apps, and the inherent unknown scaling issues that come from that.
Experience. I've personally been in the "we're doing 4 billion transactions a day and replicating that over multiple data centers" thing. Don't want to do that again. It's crazy expensive in hardware and effort.
Scaling is the #1 issue we are concerned about. The reports are not complex, but they do need to happen. We are also not stuck on Ruby (it is a pig, processor-wise), but the application is such that it is easy to scale the front end horizontally.
I think we are leaning toward SQL as it is something we all know. However, the alternatives needed to be investigated.
I quit a gig within the last year where the company was on DB2 (8) and the data was scattered. Their daily processes were pushing 22 hours to complete, and their chosen solution was just to delete historical data, so that they couldn't even tell their customers what happened the previous month. Of course, the same team had been building their PL/I code since the 1980s, so there was no way to get them unstuck without some executive decisiveness, and that wasn't happening. They wanted a data warehouse for business intelligence in their Oracle system. I signed a contract for GoldenGate with implementation and then walked. Really, the place was a mess. WebFocus for reporting, an Excel reporting team because they couldn't get data from WebFocus, a Java team where the last 2 architects had quit within a year, a stealth JasperReports team that had been working on the same goal for years with no deliverable, etc. Really, the only thing in the entire place that worked at all was the DB2, and on that alone they were making money hand over fist. It was an amazing scene.