
Questions about Database Implementation.

Metuchen asks: "I'm developing a program for Linux that will eventually be managing a medical practice. This program will be running on several terminals on the same computer, but they must all access the same database simultaneously (i.e. if one user were to update a record, the next time another user accesses that patient--on a different terminal--the correct information must be displayed). What would be the best way to implement such a program? I would prefer to stay away from writing everything to file after each update since the database will be updated frequently; for the same reasons, I would like to be able to keep everything in RAM without using a RAMdrive (since that may require up to one file per patient). Any suggestions?"
This discussion has been archived. No new comments can be posted.

  • by BluBrick ( 1924 ) <blubrick@ g m a i l.com> on Sunday January 23, 2000 @04:13AM (#1345740) Homepage
    Postgres, MySQL, or mSQL for varying degrees of "open source-ness". Or you may wish to go with one of the "big 5" commercial databases: Oracle, Informix, DB2, Ingres (what's the other one, Unidata?). Each of these DBMSes provides a rich API, basically SQL or a superset thereof, against which to write your application. With careful use of SQL statements, your app will be a lot more portable across DBMSes as well as across platforms. This could be a considerable advantage if your project grows beyond all expectations, or if a new client already has a SQL database. Hell, it might even work on MSSQL!
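
    To make the portability point concrete, here is a minimal sketch assuming a Python DB-API driver (the table, columns, and values are invented for illustration). Only the connect() call and the parameter placeholder style are driver-specific, so the same query code can be pointed at Postgres, MySQL, or a commercial engine:

      # Sketch only: sqlite3 stands in for whichever DB-API driver is used.
      import sqlite3

      conn = sqlite3.connect(":memory:")   # swap this line to change engines
      conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, dob TEXT)")
      conn.execute("INSERT INTO patients VALUES (1, 'Jane Doe', '1955-03-02')")

      def get_patient(conn, patient_id):
          # Plain SQL with no vendor extensions, so it ports across engines.
          cur = conn.execute("SELECT name, dob FROM patients WHERE id = ?", (patient_id,))
          return cur.fetchone()

      print(get_patient(conn, 1))          # ('Jane Doe', '1955-03-02')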


  • by Zurk ( 37028 )
    I've been writing a program using Java servlets for e-commerce systems. It's designed to be database-independent and platform-independent, and can be accessed from a text-mode browser (lynx). Suffice it to say, you can abstract the SQL out from the database and let it handle the locking on its own; it's fairly trivial with Java servlets.
  • Sounds to me like you just need a database backend with your own custom frontend. For the backend, I'd say go with one of the open source databases (e.g. MySQL, Postgres, etc.). But you could probably go with whatever you want.

    Regarding your concern that "if one user were to update a record, the next time another user accesses that patient--on a different terminal--the correct information must be displayed": that is no trouble at all; you'd have to go out of your way to make it a problem (e.g. by writing special caching into the client or something).

    As for your concerns about speed, etc.: let the backend deal with that. If you really feel a desire to help it along, you could access the database on a ramdrive and do regular copies to a physical disk. (You don't want to lose a day's changes if the server crashes.)

    I seriously doubt you'd need to use a ramdisk. As you mentioned, it would be run on several terminals, which means "not hundreds". And even if 10 users were pounding out records every few seconds, I suspect even a slow server could handle the writes.

    For the frontend, you can do basically whatever you like, from a command-line interface, to ncurses, to a web interface. (If you do a web interface, I recommend an Apache+PHP3+MySQL combo.)

    Have fun, and remember to do lots of input checks, and don't forget to escape the input string before passing it to the database (e.g. " becomes \").
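
    On that last point, a small hedged sketch assuming a Python DB-API driver (sqlite3 is only a stand-in, and the table is invented): rather than escaping quotes by hand, it is generally safer to pass user input as bound parameters and let the driver do the quoting.

      # Sketch: bound parameters instead of hand-rolled escaping.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")

      def add_patient(conn, name):
          # The ? placeholder binds the value; quotes inside `name` cannot break the statement.
          conn.execute("INSERT INTO patients (name) VALUES (?)", (name,))
          conn.commit()

      add_patient(conn, 'O"Brien, Robert')   # awkward input, handled safely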

  • For the backend, I'd say go with one of the open source databases (e.g. MySQL, Postgres, etc.)

    Minor nit: MySQL isn't open source. It's really nice inside a limited problem domain, but it lacks nested selects and other features that you may need. It's pretty speedy, though. Postgres is pretty good for servers that don't get hit too much, but it's too slow for heavily used systems.

    Sumner
  • This is sort of offtopic, but would you be willing to make the finished product available publicly? I've always thought medical office software would be an easy arena for Linux to excel in.

    BTW, have you checked out Freemed? [freemed.org] They are building free browser-based medical management software based on MySQL, and have a demo for you to try.

    Hope this helps.

  • It sounds like this is something you want to implement in the next year or two, so I'll assume that you don't want to re-invent the wheel by writing your own database engine.

    Use PostgreSQL or some other tested, reliable database. Put the data on mirrored & striped disks - RAID 0+1. Believe me, the extra dough is worth it, and performance will be fine.

    Read up on data design. Too many people assume that once the database software is in place, laying out the database structure is the easy part. Then, after the application is in place for a while, problems start cropping up, and it can become a real maintenance nightmare. This most often happens with applications written in Access or Excel/VBA by well-meaning accountant types, but can happen to anyone. A database with incomplete and/or inaccurate data is worthless.

    I recommend "Database Design for Mere Mortals", by Mike Hernandez, it offers a nice, practical methodology for data design, and is very clearly written.
  • It sounds like you may need a database engine doing most of the work.

    To allow multiple people to access the data, but make sure that everyone gets updated data, you will probably want to rely on transactions. PostgreSQL is the only open source database that I know of with support for transactions. All of the bigger commercial databases support them as well.

    Transactions basically use an intelligent locking scheme to keep people out of records that are being updated or accessed. These locks won't last long, and requests can be queued up so that, except for a slight delay (depending on the complexity of the transaction), it will not be noticeable to the end user.

    Transactions also have data integrity built in: in case of a power outage or crash, they will not be left only partially complete. They are an all-or-nothing operation.
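
    A minimal sketch of that all-or-nothing behaviour, using Python's built-in sqlite3 as a stand-in for a transactional backend (the table and values are invented for illustration):

      # Either the whole transaction commits or none of it does.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
      conn.execute("INSERT INTO patients VALUES (1, 'Jane Doe', 100)")
      conn.commit()

      try:
          conn.execute("UPDATE patients SET balance = balance - 40 WHERE id = 1")
          raise RuntimeError("simulated crash before commit")
          conn.commit()                    # never reached
      except RuntimeError:
          conn.rollback()                  # the pending update is undone

      print(conn.execute("SELECT balance FROM patients WHERE id = 1").fetchone())  # (100,)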

    The LAN Times Guide to SQL covers the SQL2 standard, with good depth on data integrity, transactions, etc.

    O'Reilly has a great book on Transact-SQL, a child of SQL2(?) that is used by Microsoft and Sybase.

  • by w3woody ( 44457 ) on Sunday January 23, 2000 @09:54AM (#1345747) Homepage
    Okay, there are two things you can do to solve this problem. First, you could use an "off-the-shelf" database engine. There are quite a few very good open source database engines out there, as well as a few good low cost and high cost engines.

    OTOH, I assume you are probably interested in rolling your own engine. So here are a few strategies that you may wish to consider if you are going to do it yourself.

    Overall, one of the most powerful metaphors for breaking down a problem into manageable pieces is the notion of "layers", or of a program "stack". That is, you break a large problem into a collection of software "layers", like the TCP/IP protocol stack--at the lowest level you have the piece of software that talks to the hardware. Above that, you have the thing which encapsulates the packets. Above that, you have the thing which handles routing. And so forth.

    It's a powerful notion because it allows you to break a complex problem such as "develop a database engine which provides transaction rollback and RAM caching" into a bunch of easy to understand units.

    For a database, eventually you are going to have to write the records to a file, and take the performance hit that entails. That's just life. And worse--if you want to have proper transaction rollbacks (meaning that if the power fails while you are in the middle of writing a record, you need a mechanism to recover from this), that means you are going to have to do *two* file writes--one to write some sort of "recovery" record, and one to write the record you are trying to save.

    A common way to handle this sort of transaction rollback is to write a record or group of records in three steps. Step (1), read the records you are replacing and write them into a "backup" or "rollback" file. The idea is that this "rollback" file contains a record of everything you changed in the database file, so that if the power fails while you are updating your database, you can "fix" the database by reversing the steps in your rollback file. (You should also checksum the rollback file so you can detect if the power failure occurred while creating the rollback file.) Step (2), you update the database by replacing records and by appending new records. Step (3), you delete the rollback file.
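
    A rough sketch of that three-step procedure in Python (the file names, record size, and layout are invented for illustration). On start-up, if the rollback file exists and its checksum verifies, the saved bytes would be written back before accepting new work:

      # Step 1: save the old bytes (checksummed); step 2: overwrite in place;
      # step 3: delete the rollback file once the update is durable.
      import os, struct, zlib

      RECORD_SIZE = 128

      def update_record(db_path, index, new_record, rollback_path="db.rollback"):
          assert len(new_record) == RECORD_SIZE
          offset = index * RECORD_SIZE

          with open(db_path, "rb") as db:              # step 1
              db.seek(offset)
              old = db.read(RECORD_SIZE)
          payload = struct.pack("<QI", offset, len(old)) + old
          with open(rollback_path, "wb") as rb:
              rb.write(struct.pack("<I", zlib.crc32(payload)) + payload)
              rb.flush(); os.fsync(rb.fileno())

          with open(db_path, "r+b") as db:             # step 2
              db.seek(offset)
              db.write(new_record)
              db.flush(); os.fsync(db.fileno())

          os.remove(rollback_path)                     # step 3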

    Now there are two common database formats that I've encountered out there. The first common database format is a simple file array of fixed-sized records. If each patient record (for example) is a fixed-sized record, you simply create an array of these records in your database.

    Of course this has a couple of disadvantages. First, each record is fixed--that means you can potentially waste a lot of space. Second, it's inflexible. And third, there really is no good way to handle adding "out of band" records--that is, records which store information that is not part of this simple array table. Things like hash indexes or B-Tree indexes wind up being stored in separate auxiliary files.
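
    For concreteness, a toy version of that first format (the field names and sizes are invented; every patient record occupies exactly 128 bytes, addressed by its array index):

      # Toy "array of fixed-size records" file format.
      import struct

      RECORD = struct.Struct("<I64s60s")     # id, name, phone: 4 + 64 + 60 = 128 bytes

      def write_record(db, index, patient_id, name, phone):
          db.seek(index * RECORD.size)
          db.write(RECORD.pack(patient_id, name.encode(), phone.encode()))

      def read_record(db, index):
          db.seek(index * RECORD.size)
          pid, name, phone = RECORD.unpack(db.read(RECORD.size))
          return pid, name.rstrip(b"\0").decode(), phone.rstrip(b"\0").decode()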

    The second common file format I've encountered is documented in "Transaction Processing: Concepts and Techniques" by Jim Gray and Andreas Reuter. (Very heavy, but excellent book.) The idea is that you break your file into a bunch of fixed-sized records whose size is roughly 4K to 32K. Each record contains in its header a sort of "page table of contents" which indicates the contents of this page. The body contains the data. The idea is that you fit oddly sized blocks of data within the fixed-size record, and mark where those are in the page table of contents so you can find each odd-sized block of data using the tuple (page_index, toc_index).

    What's good about this is that you can do all sorts of interesting tricks: if you need to replace a block of data with a larger block that no longer fits in the page, you can simply mark in the table of contents a "forwarding address"--that way, the (page_index, toc_index) refers to the same block--just forwarded to another page in the database. You can also cache these pages in memory as you read them--creating a higher-level cache which stores a handful of these pages can speed up access. And it does solve the problem of "how do I handle varchar records" and "where do I store my B-tree index records."
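
    A very reduced sketch of that slotted-page idea (the sizes and layout are invented; a real engine would add free-space checks, forwarding pointers, and overflow handling):

      # A page holds variable-sized items; a small table of contents at the
      # front records (offset, length) for each one, so an item is addressed
      # by the pair (page_index, toc_index).
      import struct

      PAGE_SIZE = 4096
      SLOT = struct.Struct("<HH")                     # offset, length

      def build_page(items):
          buf = bytearray(PAGE_SIZE)
          struct.pack_into("<H", buf, 0, len(items))  # item count in the header
          offset = PAGE_SIZE
          for i, item in enumerate(items):
              offset -= len(item)                     # data grows back from the end
              buf[offset:offset + len(item)] = item
              SLOT.pack_into(buf, 2 + i * SLOT.size, offset, len(item))
          return bytes(buf)

      def get_item(page_bytes, toc_index):
          offset, length = SLOT.unpack_from(page_bytes, 2 + toc_index * SLOT.size)
          return page_bytes[offset:offset + length]

      page = build_page([b"Jane Doe|1955-03-02", b"Robert O'Brien|1948-11-30"])
      print(get_item(page, 1))                        # b"Robert O'Brien|1948-11-30"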

    So this is the bottom level--creating code which handles your underlying database, and handles data recovery in case of a system failure.

    Now it sounds like you want to keep stuff in memory as long as possible in order to minimize I/O to the file. While that's nice, it has the problem that in a system failure, stuff in RAM can be lost. That's why most commercial databases write transactions to the database as soon as you indicate "END TRANS"--because the idea is that when you end a transaction, the data had better be recoverable in case of a system failure.

    I suspect you'll find that most people are looking up names and reading their status a lot more often than they are updating or changing patient information. So on top of your database file I/O engine, I would suggest two things. First, I would suggest a "write-through cache" of data records. The idea is that you keep around a collection of records in RAM, marking how old each record is, and updating that mark when a record is read--that way, records that are no longer being actively used eventually drop out of memory. When the user updates a record, you immediately write that record to the database, update its timestamp, and move on.
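
    A bare-bones write-through cache along those lines (the backend object and its read_record/write_record methods are placeholders for whatever record I/O layer sits underneath):

      # Reads are served from RAM when possible, writes always go straight
      # to disk, and stale entries age out of the cache.
      from collections import OrderedDict

      class WriteThroughCache:
          def __init__(self, backend, capacity=1024):
              self.backend = backend              # object with read_record/write_record
              self.capacity = capacity
              self.cache = OrderedDict()          # key -> record, oldest first

          def read(self, key):
              if key in self.cache:
                  self.cache.move_to_end(key)     # freshen the "last used" mark
                  return self.cache[key]
              record = self.backend.read_record(key)
              self._remember(key, record)
              return record

          def write(self, key, record):
              self.backend.write_record(key, record)   # write-through: disk first
              self._remember(key, record)

          def _remember(self, key, record):
              self.cache[key] = record
              self.cache.move_to_end(key)
              while len(self.cache) > self.capacity:
                  self.cache.popitem(last=False)       # drop the least recently used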

    Second, I would recommend that you find the fields that are commonly used to look up records (such as name, hospital patient ID, etc.), and create either a hash table or a B-tree structure which speeds up lookups by those fields. The reason is that when your database hits a few thousand records, you need a way to find records that is faster than reading the entire database into memory. A well-managed B-tree object will do the trick--it will let you find a patient while keeping the number of I/O reads you have to perform to O(log(n))--a big win when n is a few thousand.
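
    A toy in-memory version of the hash-table variant (field names invented); for prefix or range searches a sorted structure or a real on-disk B-tree would be the better fit:

      # Toy hash index: lookup field -> list of record offsets in the data file.
      from collections import defaultdict

      class NameIndex:
          def __init__(self):
              self.by_name = defaultdict(list)

          def add(self, name, offset):
              self.by_name[name.lower()].append(offset)

          def lookup(self, name):
              # Average O(1), instead of scanning the whole database file.
              return self.by_name.get(name.lower(), [])

      idx = NameIndex()
      idx.add("Jane Doe", 0)
      idx.add("Robert O'Brien", 128)
      print(idx.lookup("jane doe"))      # [0]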

    If you insist on living a risky life with your data, you can implement a "lazy write" scheme where you write the updated record after it "expires" in memory--and then, optimize your writes by scanning the other records you have in memory that need to be written with this expired record, and write them all together. (This can potentially be a win, especially if you have multiple records living on the same database page--you potentially reduce several page read/page write operations into one page read/page write op.)

    However, in my personal tinkering with these techniques in my own database engine, I find that these methods don't necessarily do the trick. That is, if you have 50 updated records, sooner or later you are going to have to do 50 database writes--and if these records are scattered through your database file, at best you will be able to collapse one or two of these records into a single write. So in a sense, you are risking losing a lot of data (losing 50 patient records sounds like a lot to me) in the hope that you may get a 2%-4% performance gain. Better to spend a few hundred more on a faster hard disk for your server computer.

    In short, I'd avoid any sort of lazy write scheme if possible.

    Notice that I haven't put in a word about the multi-user aspect of this. That's because I'm assuming that you're writing the database engine as a single process--that is, the database engine is a single process which owns the database and communicates with the database clients through a pipe or socket. This has the advantage that it simplifies the whole "record locking" and "race condition" aspect by having one process own the file. And that means on top of your "file I/O", "record cache", and "record search" layers you have a "client access" layer and you're done.

    Another approach is to implement record locking and create a database engine which allows multiple processes to access the same file. This requires that you be able to implement some form of record locking on the file--that is, that you can (either in the operating system, or simulated through a "record lock file") lock segments of your database file; that way, you can update one segment of the database while another process accesses other records in the same file. (File locking, by the way, is really only used AFAIK to determine whether the database page one process is attempting to read is currently being written over by another process.)
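
    On POSIX systems that second approach can lean on byte-range locks; a rough sketch (the record size and layout are invented, and this is Unix-only via fcntl):

      # Each writer locks only the byte range holding the record it is updating,
      # so other processes can keep working on other records in the same file.
      import fcntl, os

      RECORD_SIZE = 128

      def update_locked(path, index, new_record):
          offset = index * RECORD_SIZE
          fd = os.open(path, os.O_RDWR)
          try:
              # Block until we hold an exclusive lock on just this record's bytes.
              fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
              os.pwrite(fd, new_record, offset)
              os.fsync(fd)
          finally:
              fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
              os.close(fd)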

    This method isn't much harder to implement than "one database file/one process". It has the advantage that if you are on a box with multiple processors, multiple client access can nicely divide across those processors. However, it's my experience in tinkering with such things that this sort of technique can be harder to code and get right at the record I/O level. (Though it does eliminate all of that "keep track of which client I'm talking to" code.)

    Anyways, those are my thoughts on the matter. I do highly recommend the book "Transaction Processing", by the way, as it covers a lot of this (and more!) in incredible detail. The three lessons I've come away from the book with are: first, "layer your code"--that way, error checking can be handled in the same way that error checking is handled in TCP/IP stacks. Second, always have a way to "rollback" or "undo" the last operation (or "transaction"); that way, you can guarantee database consistency in the event of a catastrophic failure, as well as implement a good recovery mechanism in the case of a minor error that doesn't even affect higher-level code (such as waiting for a record to become unlocked). And third, you can speed a lot of things up by using a good tree structure to index your records.
  • I hate to say this, but from the questions you ask, I suspect you're asking for trouble. Implementing a transactional database solution without any experience is a tough road to take.

    > I would prefer to stay away from writing everything to file after each update...

    What happens if the server crashes? Any unwritten data is lost.

    > but they must all access the same database simultaneously...

    You need a multi-user transactional database. Oracle, Sybase, DB2, and SQL Server are good examples. MySQL is not, since it does not offer transactional support. If you don't know what a commit/rollback is, then you definitely need a good book.

    The 4 DBMS systems I mentioned are tuned for multi-user systems and each has its advantages (and drawbacks). My preference is Oracle or Sybase (as those are the ones I use on the job as a DBA).
  • *shrug*

    I only outline the above because it's interesting, because sometimes you don't need Oracle-SQL to store a few records, and because, frankly, a hell of a lot of people write applications which use a primitive "database" that simply consists of an array of records. And it wouldn't hurt to add a couple of layers to the low-level record I/O routines in such an application to make it a touch more bulletproof.

    Me, I'm not a big believer in using a cannon to kill a fly. Using a full-on DB engine like Oracle or Transact-SQL just to store an array of records is a wee bit overkill...
  • I think the thing you're missing here is that the term "database" doesn't necessarily apply only to a SQL server performing transactions across a wire. Granted, in this case, the person doing this project should probably invest in Oracle--after all, that's what he's doing. But I did suggest that at the top of my post.

    Thing is, a "database" is not just SQL running remotely. Did you ever develop an application that loaded and saved data in a file? Well, that could also be considered a "database". Transaction processing doesn't just apply to traditional databases, either--they're a strategy to prevent your user from losing data just because he hit "save" at the wrong time.

    Further, whatever happened to curiosity? Whatever happened to the hacker mentality of wanting to know how something works and tinkering with something that's interesting? It does surprise me how many people, instead of saying "I reinvented the wheel because I thought it would be cool" (hacker mentality), are saying "why the hell did you waste your time doing 'x'?"

    Besides, once the guy realizes all that's involved, perhaps he can be the one who makes the mindful decision as to how to perform his employer's wishes?
  • by Anonymous Coward
    Oracle only tells you that an update, deletion, or insertion has successfully occurred once it has written it to disk. The data is changed in RAM anyway, so there is no performance penalty (i.e. the next reader will read it from RAM, that's for sure). Now why does Oracle write the changes completely to disk? It's so that if you have an instance crash, e.g. an instant power cut, you will never have to worry about any updates that were made in RAM but hadn't been written out to disk. Oracle is also fault tolerant in that it eventually writes RAM contents out to the datafiles. Thus you have two separate stores of all DML (data manipulation language) SQL statements on separate disk files (which should be on separate SCSI chains in case you get an error-producing SCSI controller). There are lots more issues to consider, but it's best to take things one step at a time.

    Price shouldn't worry you with Oracle. If it is a serious app at all, then the clients will have no problem paying for Oracle. Also, the licensing scheme for Oracle these days is simple and cheaper than it has been. In the U.K. the cost is either a) unlimited users/connections at £50 per power unit (50MHz of CPU power on the server), e.g. a PIII at 500MHz is 10 power units, therefore £5000; or b) based on the number of named users, i.e. separately identified user accounts. This costs £100 per named user; however, there is a price floor which is dependent on the power of your CPU. In our case we only needed 5 named users, but because the server (NetWare 4.11) had a 350MHz CPU we had to buy 350/50 = 7 * £100 user licences. The £5000 or £700 amounts give you the right to use the server until you die / no one gets ill anymore / the proton-death of the universe. The only ongoing cost (if you take it) is a support contract, which is 22% of the one-off amount, payable each year.

    Just one thing: Oracle 8i on Linux is beta. Go for Oracle 8.0.5. In the world of RDBMSes you can make false economies. Oracle didn't get to be a company with 9.6 billion dollars of revenue a year for nothing, you know.
  • It sounds like you aren't quite understanding how a database works. The database will handle all the locking necessary to enforce serializability (looks like one thing happens at a time). Any DB will have a buffer pool which acts as a cache to allow faster reads & writes. Higher end DBs will have recovery features such as write ahead logging that allow for reliability while still performing well.

    I suggest you get a good book that will explain SQL and the concepts of transactions.


  • There are books full of ways to do this for different scenarios (many vs. single simultaneous transactions, complex vs. simple transactions, fine vs. non-fine locking granularity, tolerance for data loss or transaction failure, single-phase commit vs. two-phase, etc. etc.). For each technique, a different writing process and recovery algorithm are required.

    In general, the algorithms use some sort of cache to delay having to write to disk on every operation in a transaction (e.g. ten updates to the same block will only need one write to disk). When a transaction is finished, most (but not all) methods allow cache pages to remain "dirty" by writing the changes to a log in compressed form. Dozens of changes to different blocks can be packed into one log block, which is written in a single, atomic operation upon transaction commit--no other I/O is needed.
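
    One way to picture that commit-time log write (all names and the log format here are invented for illustration): changes accumulate in RAM, the only I/O forced at commit is a single append-and-sync of the change log, and the dirty cache pages can be flushed to the data file later.

      # Sketch of commit-time logging.
      import json, os

      class SimpleLog:
          def __init__(self, path="changes.log"):
              self.path = path
              self.pending = []                  # changes made by the open transaction

          def record(self, block_no, new_bytes):
              self.pending.append({"block": block_no, "data": new_bytes.hex()})

          def commit(self):
              with open(self.path, "ab") as log:
                  log.write(json.dumps(self.pending).encode() + b"\n")
                  log.flush()
                  os.fsync(log.fileno())         # the transaction is durable here
              self.pending.clear()               # dirty cache pages can be flushed later

      log = SimpleLog()
      log.record(17, b"new contents of block 17")
      log.record(42, b"new contents of block 42")
      log.commit()                               # one append + fsync covers both changes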

    The main differences among these algorithms are in how they handle the cache - some allow dirty blocks to be flushed at any time, while some only allow flushing after a transaction is committed, etc. The tradeoff is that the techniques which are the most flexible in caching require the most complicated recovery procedures.

    Other methods include writing all the affected blocks out of the cache on transaction commit (the method used by transmeta BTW), or even writing each transaction to new blocks and changing a pointer to them on commit (shadowing). None of these would give much performance boost in your situation, IMHO.

    If you want some more info on different techniques, I would recommend a book called "Relational Databases and Knowledge Bases" by Gardarin and Valduriez. This is a good survey of all aspects of database construction - reliability, concurrency control, query processing, and so on, and it weighs less than five pounds :-)

  • You're writing software for a medical practice, and you want to use your own flat-file mechanism rather than a proven database? This is highly irresponsible and unprofessional; you are letting your own Linux bigotry stand in the way of your duty of care.
