
Chacham's Journal: OLTP vs Data Warehouse, Why not just have both?

This is something that I've wondered about before. I had a small interview with some people regarding some work, and the idea came up again. So, I've been thinking about it.

The issue is having two databases: one as an OLTP, and the other as a Data Warehouse. Since I worked with Oracle, and I like a good deal of the Oracle way of doing things, I'll go along with Oracle terms. Basically, OLTP is On-Line Transaction Processing. It is a database that is designed more for input than for output, and most queries are small. A good example would be an ATM. In such a case, the database should be normalized, for authoritative answers and less work on input.

On the other hand, a Data Warehouse is made more for output than for input, and most queries are large. A good example would be a data-mining firm. In such a case, the database should be denormalized, for quick answers and less work on output.
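To make that concrete, here's a minimal sketch with made-up tables: the OLTP side keeps customers and orders in separate normalized tables, while the warehouse side flattens them into one wide table so the big reports never have to join.

    -- OLTP side: normalized, each fact stored once, built for small fast writes
    CREATE TABLE customer (
      customer_id   NUMBER PRIMARY KEY,
      customer_name VARCHAR2(100)
    );

    CREATE TABLE orders (
      order_id     NUMBER PRIMARY KEY,
      customer_id  NUMBER REFERENCES customer,
      order_date   DATE,
      amount       NUMBER(10,2)
    );

    -- Warehouse side: denormalized, the customer name repeated on every row,
    -- so report queries scan one table instead of joining
    CREATE TABLE dw_customer_orders (
      order_id      NUMBER,
      order_date    DATE,
      amount        NUMBER(10,2),
      customer_id   NUMBER,
      customer_name VARCHAR2(100)
    );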

The problem is, many companies need both. There is always information going in, and every day a number of people want some serious reports. That's when the normalization fights come in, and the line between OLTP and Data Warehouse blurs beyond recognition.

What I've been wondering is, why not have both? The OLTP could be normalized and be the authoritative database, and the Data Warehouse could be updated nightly (or weekly). The only real loss is that the reports are working on data from yesterday. But that shouldn't be much of a problem.
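As a rough sketch of what that nightly update could be (the DW schema and table names here are hypothetical), a stored procedure that wipes the warehouse copy and reloads it from the authoritative OLTP table:

    -- Hypothetical nightly refresh: DW is a separate schema holding report copies
    CREATE OR REPLACE PROCEDURE refresh_warehouse AS
    BEGIN
      -- throw away yesterday's copy (TRUNCATE needs dynamic SQL inside PL/SQL)
      EXECUTE IMMEDIATE 'TRUNCATE TABLE dw.oltp_customer';

      -- reload from the authoritative OLTP table
      INSERT INTO dw.oltp_customer
        SELECT * FROM customer;

      COMMIT;
    END refresh_warehouse;
    /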

Comments:
  • Just as an FYI, and it's not definite, but I may be able to release something under the nerdfarm name that can help you with the data synchronization.

    It's a Perl system, basically a glorified job scheduler. It needs some more work, but it does cascading job schedules with failover. One major patch that I want to do is to embed the jobs in a framework that allows shared database connectivity. I'll find out, hopefully this week or next, about the status of releasing it GPL'd.
    • Sounds interesting, but why would I need it? (Enlighten me please. :-)

      The way I see it, at midnight a job could fire off a procedure that just runs a bunch of queries to throw report-worthy data into the Warehouse. Either INSERT INTO DW.OLTP_Customer SELECT * FROM Customer; or even CREATE TABLE DW.OLTP_Customer AS SELECT * FROM Customer; (so the tables don't take up any space that they don't need). Maybe even exp/imp? The data would be loaded into temp tables, just to get the data over. Then, the Warehous
      • Sounds interesting, but why would I need it? (Enlighten me please. :-)


        To control the link: it's just a high-capability cron/job-scheduling system.

        The way I see it, at midnight a job could fire off a procedure that just runs a bunch of queries to throw report worthy data into the Warehouse.

        The way I would do it is to dump the database from a specific time range, reducing overhead on the transaction system. I'm also picturing this on two different systems, which may not be the case here.

        Job s
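        Roughly, assuming the OLTP rows carry a last-modified timestamp (a hypothetical column), the time-range pull would look like:

            -- Hypothetical incremental pull: only rows touched yesterday,
            -- assuming customer has a last_modified DATE column
            INSERT INTO dw.oltp_customer
              SELECT *
                FROM customer
               WHERE last_modified >= TRUNC(SYSDATE) - 1
                 AND last_modified <  TRUNC(SYSDATE);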
        • reducing overhead on the transaction system.

          Good point. But, given there are no real UPDATEs, other than just dumping data, is that really going to be a burden? Nothing in the original DB will be changed, so there's no need for rollback images, and the warehouse would be closed to other transactions. I'd even remove all INDEXes first, so all it does is DELETE the old data.

          But an export/import might very well be quicker. Now that I think about it, the real concern would be the space the dump would take. If
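          As a rough sketch of that load path (the index and table names are made up): strip the index, clear the old copy, reload, then put the index back for the report queries.

              -- Hypothetical: drop the index so the reload is pure data movement
              DROP INDEX dw.oltp_customer_ix;

              -- clear yesterday's copy (TRUNCATE avoids the undo cost of DELETE)
              TRUNCATE TABLE dw.oltp_customer;

              -- direct-path insert from the OLTP side
              INSERT /*+ APPEND */ INTO dw.oltp_customer
                SELECT * FROM customer;
              COMMIT;

              -- rebuild the index for the report queries
              CREATE INDEX dw.oltp_customer_ix ON dw.oltp_customer (customer_id);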
          • But why can't DBMS_JOB do that? At least in Oracle, it can run a procedure that has failover and monitoring. It could easily update a table mentioning its progress, and, being inside the DB, I'd assume that it would be at least a little bit faster.

            I'm assuming a scenario of a lightweight, insert-style database (like BerkeleyDB or something). So you should be fine w/ Oracle.

            I'm not trying to put your idea down. I'm just trying to understand it. I appreciate you helping me here.

            I know. It's not my ide
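            For reference, a minimal sketch of scheduling a nightly refresh with DBMS_JOB (the refresh_warehouse procedure name is hypothetical, as sketched in the journal above):

                -- Hypothetical: run refresh_warehouse every night at midnight
                DECLARE
                  l_job BINARY_INTEGER;
                BEGIN
                  DBMS_JOB.SUBMIT(
                    job       => l_job,
                    what      => 'refresh_warehouse;',
                    next_date => TRUNC(SYSDATE) + 1,   -- tonight at midnight
                    interval  => 'TRUNC(SYSDATE) + 1'  -- and every midnight after
                  );
                  COMMIT;
                END;
                /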
  • Too much theory; not enough code!

    I understand the need for truly enterprise stuff in some applications, but what kind of system is it that needs huge reports like this on up-to-the-minute data (i.e., not already a day old or something)? You'd have to run a horse track in Hong Kong to need something like that, and even then your data set size could probably let you get away with Oracle on big iron.

    ~GoRK

    • Too much theory; not enough code!

      That's the way databases are. Mostly theory and design, with a few things like schemas, triggers, and stored procedures to make them work together. It means the design phase is the longest, and must be pretty much perfect.

      but what kind of system is it that needs huge reports like this on up-to-the-minute data

      None, really. However, the application doesn't pay the bills; the company does. And the suits there generally want it up-to-date, even if for insane reasons. However,
