30+ GB Databases On Unix?

Posted by Cliff on Wed Jul 26, 2000 06:47 AM
from the that's-one-large-database-you-got-there-pardnah dept.
CaptainZapp asks: "A customer of mine runs a ~30 GB data warehouse on a Sybase SQL Server Database. Now their business requires a mirror of this database in a different location. An offer by a reputed U/X vendor for the hardware turns out to be about five times as expensive as when you get a reasonable x86 box, with the necessary amount of disk space and, say, 1 gig of memory. What does the esteemed Slashdot community think: Is Unix capable of handling a database of this size and what other terrible pitfalls do you foresee?" He's not worried about "mission-critical" here, he's just wondering if it's possible.

"Now, the database is not mission critical (which doesn't mean it's not a major pain to reload it), so the issue if raw devices are supported is not too relevant. Further, and even more important, this is a major chance to convince a global player of the capabilities of Linux.

All that said, I'm aware that some of you readers have a quarter terabyte of disk space at your disposal. But that's also not the issue at hand. The question is whether it is feasible to run an industrial-strength database of 30 - 40 GB in size with all its consequences (uptime, maintainability, dumps, etc...) in a Linux / Intel environment."

This discussion has been archived. No new comments can be posted.
  • No, only Microsoft SQL Server can do it. Period. by Anonymous Coward (Score:1) Wednesday July 26 2000, @02:22AM
  • Re:Three words: by Anonymous Coward (Score:1) Wednesday July 26 2000, @02:47AM
  • Sybase on Linux by Anonymous Coward (Score:1) Wednesday July 26 2000, @03:17AM
  • Re:Raid 5 for a database? You must be kidding. by Anonymous Coward (Score:1) Wednesday July 26 2000, @04:01AM
  • it works fine by Anonymous Coward (Score:1) Wednesday July 26 2000, @04:10AM
  • 64-bit Hardware by Mike Hicks (Score:2) Wednesday July 26 2000, @03:57AM
  • Re:30Gb databases by Phroggy (Score:1) Wednesday July 26 2000, @03:43AM
  • Re:Clarity of Expression by Phroggy (Score:1) Wednesday July 26 2000, @04:01AM
  • Re:30Gb databases by Phroggy (Score:1) Wednesday July 26 2000, @04:59AM
  • Re:You really mean 30 GB Database on Linux by Stefan (Score:1) Wednesday July 26 2000, @03:16AM
  • Sybase caveats and a new free version by emil (Score:1) Wednesday July 26 2000, @03:32AM
  • Sybase is just plain cleaner by emil (Score:1) Wednesday July 26 2000, @03:50AM
  • Re:Clarity of Expression by Matthew Weigel (Score:1) Wednesday July 26 2000, @11:56AM
  • Re:Of course, yes... Wait, of course, it depends. by riffraff (Score:1) Wednesday July 26 2000, @02:58AM
  • please clarify by soellman (Score:1) Wednesday July 26 2000, @07:36AM
  • Large databases by nerdin (Score:1) Wednesday July 26 2000, @11:56AM
  • by Zachary Kessin (1372) <zkessin@kessin.com> on Wednesday July 26 2000, @01:54AM (#904992) Homepage Journal
    Machines designed to deal with very large databases tend to be more expensive than your average desktop, even if it has the same amount of disk. They tend to be built for reliability, stability and speed, all of which cost money.

    My advice: don't skimp on buying the box. You will probably lose anything you save to admin costs on a cheap and not very good box.

    The Cure of the ills of Democracy is more Democracy.

  • Re:You really mean 30 GB Database on Linux by tzanger (Score:1) Wednesday July 26 2000, @08:54AM
  • Re:You really mean 30 GB Database on Linux by tzanger (Score:2) Wednesday July 26 2000, @05:41AM
  • by BadlandZ (1725) on Wednesday July 26 2000, @02:00AM (#904995) Homepage Journal
    I have absolutely NO experience with Sybase on Linux, but Sybase has claimed they support Linux, and they are planning on being at LinuxWorld, so it's worth calling them about it. (They seem to be trying to hire Linux techs pretty aggressively!)

    SQL database at 30G, sure. I would say call Sybase Inc. first, then VA Linux second, and get the answers straight from the people who are most likely to give you a usable product. Get your prices, then compare.

    I'd be more worried about the differences in _how_ you're going to mirror the data (connection speeds, transfer methods, how frequently) and whether Sybase garbles things when going from a database on one OS to another (unlikely, but possible).

    I'm sure Oracle for Linux will be mentioned, because there are many claims that it will handle such a situation. But your problem there is going from Sybase to Oracle, not from another OS to Linux. Keep in mind that not all "SQL" databases are identical: the SQL may be, but the extensions provided by the manufacturer won't be.

  • It's what you do with it that counts! by stephend (Score:1) Wednesday July 26 2000, @04:07AM
  • Take a look at Deja.com (aka deja.com)
    All of that is run off of an oracle database..

    The Database is HUGE!
    /dev/rd/c0d0p1 71706488 41278452 29710576 58% /v/10

    41GIG
    As long as you have the right indexes... you're all set..

    ChiefArcher
  • Hardware/Software by citmanual (Score:1) Wednesday July 26 2000, @01:59AM
  • Re:Size is not the issue by Amphigory (Score:2) Wednesday July 26 2000, @05:46AM
  • Re:Credibility dropping fast by Chang (Score:1) Wednesday July 26 2000, @03:10AM
  • Re:Large Production Databases by Tet (Score:2) Wednesday July 26 2000, @02:00AM
  • At work... by Palin (Score:1) Wednesday July 26 2000, @03:47AM
  • Re:raw partitions by peter (Score:1) Wednesday July 26 2000, @04:10AM
  • Re:30Gb databases by peter (Score:1) Wednesday July 26 2000, @04:19AM
  • Re:No remote NT management? wtf? by Mawbid (Score:2) Wednesday July 26 2000, @03:40AM
  • Raid 5 for a database? You must be kidding. by Nicolas MONNET (Score:1) Wednesday July 26 2000, @03:22AM
  • Re:Absolutely Raid 5 for Data Warehousing systems by Nicolas MONNET (Score:2) Wednesday July 26 2000, @05:12AM
  • Re:Oracle officially recommends against RAID ... by Nicolas MONNET (Score:2) Wednesday July 26 2000, @12:12PM
  • Re:Oracle officially recommends against RAID ... by Nicolas MONNET (Score:2) Wednesday July 26 2000, @09:14PM
  • seek times are dramatically improved in most (if not all) RAID levels

    Seek time is not going to be any better with mirroring, for one. Two heads reading the same data won't go faster than one head, will they?

    As for striping, it usually won't make any kind of difference, since data access will be randomly spread over the disk. So there you go.

    NOW, smartly organizing the database WITHOUT striping across several disks *will* make seek times faster; actually, it will require less seeking. A typical Oracle installation (as recommended by Oracle) will have, for example, the software on one disk, the indexes on another, and the actual data on a third.

    One DB transaction typically requires at least one index lookup and one data retrieval, which are unlikely to reside close to each other on one disk. When they're separated onto two disks, subsequent queries will have less seek time.

    Now, since I was right, will you give me my karma back? ;)

  • Re:Interoperability and limits by Johann (Score:1) Wednesday July 26 2000, @11:30AM
  • Veritas for remote replication by mzito (Score:1) Wednesday July 26 2000, @02:47AM
  • 160 Gigabyte database by mzito (Score:1) Wednesday July 26 2000, @02:58AM
  • Re:Sybase is just plain cleaner by myconid (Score:1) Wednesday July 26 2000, @10:23AM
  • Re:~30Gb Sybase Database by Doctor Memory (Score:1) Wednesday July 26 2000, @05:47AM
  • Re:Large databases by Optic (Score:1) Wednesday July 26 2000, @03:09AM
  • Re:You really mean 30 GB Database on Linux by Espressoman (Score:1) Wednesday July 26 2000, @03:41AM
  • Re:Clarity of Expression by Cato (Score:2) Wednesday July 26 2000, @09:03AM
  • Re:Oracle officially recommends against RAID ... by Speed Racer (Score:1) Wednesday July 26 2000, @09:41AM
  • by ansible (9585) on Wednesday July 26 2000, @04:47AM (#905020) Homepage Journal

    The problem with a caching controller is that unless it's well engineered (with its own battery backup), you are more likely to run into filesystem corruption in the case of a power failure or OS crash.

    A standard filesystem (such as ext2) on top of RAID5 will never be fast for small writes.

    NetApps get around this because the WAFL filesystem is explicitly designed to sit atop a RAID4 drive array.

    And there is a difference between RAID10 and RAID0+1.

    RAID10 is a stripe of mirrors. Each pair of disks stores the same information (RAID1), and a stripe is created over those mirrors. This can tolerate multiple drive failures as long as at least one drive from each mirror is working.

    RAID0+1 is a mirror of stripes. Two stripes are created (RAID0), each with half the total of disks. These stripes are then mirrored (RAID1). The problem here is that if a drive goes out, it takes out the entire stripe. If a drive in the other stripe goes out before the rebuild is complete, you're hosed.

    Normally RAID systems (like RAID5) can't tolerate more than 1 drive failing at the same time. However, RAID10 provides more protection than RAID0+1, at the same price.
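
    The difference between the two layouts can be checked by brute force. The following is only a sketch under simplifying assumptions that are not in the post above: disks are numbered from 0, RAID10 mirrors are the adjacent pairs (0,1), (2,3), and so on, RAID0+1 stripes are the first and second half of the disks, and no rebuild happens between failures. All function names are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed, n_disks):
    """Stripe of mirrors: data survives while every mirror pair
    (0,1), (2,3), ... still has at least one working disk."""
    return all(not ({a, a + 1} <= failed) for a in range(0, n_disks, 2))


def raid01_survives(failed, n_disks):
    """Mirror of stripes: one failed disk takes out its whole stripe,
    so data survives only while at least one stripe is untouched."""
    half = n_disks // 2
    stripe_a = set(range(half))
    stripe_b = set(range(half, n_disks))
    return not (failed & stripe_a) or not (failed & stripe_b)


def surviving_double_failures(survives, n_disks):
    """Return (survivable, total) over all possible two-disk failures."""
    pairs = [set(p) for p in combinations(range(n_disks), 2)]
    return sum(survives(p, n_disks) for p in pairs), len(pairs)


if __name__ == "__main__":
    for n in (4, 8):
        print(n, "disks:",
              "RAID10", surviving_double_failures(raid10_survives, n),
              "RAID0+1", surviving_double_failures(raid01_survives, n))
```

    With the same number of disks, the RAID10 layout survives more of the possible two-disk failures than RAID0+1 in this model, which is the point being made above.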

  • Re:Three words: by rnturn (Score:2) Wednesday July 26 2000, @03:43AM
  • Re:Three words:with three words by rnturn (Score:2) Wednesday July 26 2000, @03:57AM
  • Re:The question changed by rnturn (Score:2) Wednesday July 26 2000, @05:19AM
  • Re:raw partitions (Score:5)

    by rnturn (11092) on Wednesday July 26 2000, @04:46AM (#905024)
    ``oracle uses its own raw partitions/filesystem to store its data. this speeds up oracle''

    It doesn't have to manage its own disk space. And it may, under certain conditions, provide better performance. We have been moving away from raw data partitions. This comes after running some benchmarks of a large table residing on raw partitions vs. the same data residing in tables in a filesystem. The performance was actually better while accessing the data in the filesystem. We're talking 10+% better performance, not just a few percent. Our experience, based on our benchmarks and discussions with Oracle technical people, is that the preference for using raw data partitions was based on performance tests using older versions of UNIX and less capable filesystems. Of course, your mileage may vary.

    Aside from performance, if your database changes frequently, adding and deleting tablespaces is a major pain (with long downtime) when you're using raw data partitions but is a snap when you're using filesystems for data. If your database is fairly static, raw partitions might buy some little bit of performance but, again, at the expense of manageability. IMHO, raw data partitions just aren't worth it. Even if comparative performance were a wash, the easier means of managing the database weighs in favor of filesystems.


    --

  • running Oracle on NFS by jonbrewer (Score:1) Wednesday July 26 2000, @06:26AM
  • Re:Wouldn't go with Linux myself - fallacy by Lumpy (Score:1) Wednesday July 26 2000, @02:44AM
  • Re:Three words:with three words by arivanov (Score:2) Wednesday July 26 2000, @07:25AM
  • by arivanov (12034) on Wednesday July 26 2000, @03:30AM (#905028) Homepage
    Very bad idea. Or maybe even "Stupidity is limitless"

    1. In case you have not noticed, Oracle legal has gone around every single site that had Oracle vs X benchmarks (X = MySQL, Sybase, Informix) and made them drop them. This is actually possible under the 8.0x EULA. Actually, just read the EULA. It is a masterpiece in itself. You are not allowed to benchmark the product and not allowed to question the fact that it is fscking slow and not ANSI compliant. Besides, if I were you I would not buy something whose manufacturer intentionally disallows fair comparison with other products. That is enough to say fsck this, at least for me...

    2. The original database is on Sybase. Sybase is at least more or less syntactically ANSI SQL compliant. Oracle is as far from ANSI as it gets. It is a good guess that it will take you ages to port the bloody thing. And porting it will be more expensive than the "expensive" hardware.

    3. I would see if the database design is implementable under PostgreSQL or MySQL on an Alpha. Alpha is cheap. A reasonably good Alpha is under $5000. Storage will be $1000 more. This is as much as an appropriate x86 box. PostgreSQL does not have a 2GB database limit anyway, as it splits database files. MySQL does not have this limit on Alpha because the platform is 64-bit. Your problems are the key limitation/LOB interface for PostgreSQL and transactions for MySQL.

    4. If neither of the solutions in 3 is implementable, you have to open your wallet wide and buy Informix for Intel or DB2 for Intel. Both of them work and are ANSI compliant. By the way, the DB2 for Intel Linux developer edition is free. Free, period. No expiration. So you can actually see if the database will work. And they match Oracle on some benchmarks, and DB2 beats the crap out of it when it comes to real scalability and clustering.

  • Re:Three words: by lcase (Score:1) Wednesday July 26 2000, @03:53AM
  • kdb by muchandr (Score:1) Wednesday July 26 2000, @03:03AM
  • Ever considered Adabas/D? by Paranoid (Score:1) Wednesday July 26 2000, @05:27PM
  • wrong question (Score:4)

    by jetson123 (13128) on Wednesday July 26 2000, @02:54AM (#905032)
    Of course, UNIX can handle it, probably better than just about anything else out there. Linux isn't UNIX, of course, and whether Linux can handle it is a different question. It probably can if you find the right software (I'd give DB2 a try).

    But why ever would you replicate a database to a different kind of server? If the original database runs on Sybase SQL on whatever, then the obvious answer is to replicate it to an identical setup. Anything else, whether mission critical or not, is just going to be a lot more work, training, and maintenance.

  • two words for you: by um... Lucas (Score:1) Wednesday July 26 2000, @03:42AM
  • Multi CPU by Hammer (Score:1) Wednesday July 26 2000, @04:55AM
  • Re:No, only Microsoft SQL Server can do it. Period by bruceg (Score:1) Wednesday July 26 2000, @05:49AM
  • Am I missing something here? by dreamt (Score:1) Wednesday July 26 2000, @02:09AM
  • Re:Three words: by Sensor (Score:1) Wednesday July 26 2000, @02:18AM
  • Re:No, only Microsoft SQL Server can do it. Period by Dissenter (Score:1) Wednesday July 26 2000, @08:16AM
  • Re:Clarity of Expression by Felinoid (Score:2) Wednesday July 26 2000, @04:26AM
  • Re:Clarity of Expression by Felinoid (Score:2) Wednesday July 26 2000, @04:30AM
  • Re:No remote NT management? wtf? by fcw (Score:1) Wednesday July 26 2000, @04:21AM
  • Silly question! by MeanGene (Score:1) Wednesday July 26 2000, @03:36AM
  • Re:Absolutely Raid 5 for Data Warehousing systems by Surak (Score:2) Wednesday July 26 2000, @06:59AM
  • Re:No remote NT management? wtf? by rm -rf /etc/* (Score:1) Wednesday July 26 2000, @04:28AM
  • Re:No, only Microsoft SQL Server can do it. Period by CristianoMonteiro (Score:1) Wednesday July 26 2000, @08:13AM
  • Re:Of course, yes... Wait, of course, it depends. by Cedric C. Girouard (Score:1) Wednesday July 26 2000, @03:35AM
  • More info by _Spirit (Score:1) Wednesday July 26 2000, @02:02AM
  • Re:Custom built machines by _Spirit (Score:1) Wednesday July 26 2000, @02:07AM
  • Re:No remote NT management? wtf? by FascDot Killed My Pr (Score:1) Wednesday July 26 2000, @03:13AM
  • Re:No remote NT management? wtf? by FascDot Killed My Pr (Score:1) Wednesday July 26 2000, @03:52AM
  • by FascDot Killed My Pr (24021) on Wednesday July 26 2000, @02:04AM (#905051)
    The title and summary say "Can Unix handle it?" while the "below the fold" area asks "Can Linux/Intel handle it?".

    I'd say the answer to the first question is a resounding "duh!". The answer to the second is a resounding "probably".

    I found Oracle on Linux to be quite usable and nice (except for lame non-readline-enabled interactive tools) and fairly fast. But there is something...incongruous about spending $2000 on hardware, $2000 on Oracle and then using a free OS (that you WILL have to tweak to optimize).

    Other tidbits:
    1) Do NOT, I repeat NOT NOT NOT use Oracle on NT. The (evaluation) version I tried sucked BIG TIME. The bulk loader didn't properly support all the file formats it was supposed to and I was able to repeatedly crash the box by mistyping field names into the table creator GUI. Add all the problems of NT (no real remote management, etc) and you have yourselves the makings of a nightmare.

    2) Raw devices are for more than recovery. They also help in the speed department. If you are going to be loading 30+ GB of data multiple times (this is a backup, right?) you are going to want speed. IIRC, ~100MB took about 5 minutes to bulk load (raw, not insert) on Oracle for Linux. That's 25 hours of load time for 30 GB.

    3) Can't you take the backups from your primary DB and load them as restores to the backup DB? That would save tons of time and effort (up front AND ongoing).
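
    The 25-hour figure in point 2 follows from simple scaling. A minimal sketch, assuming the load time grows linearly with data size and counting 1 GB as 1000 MB (both are assumptions; the function name is hypothetical):

```python
def bulk_load_hours(total_mb, sample_mb=100, sample_minutes=5):
    """Extrapolate a measured bulk-load time linearly to a larger data set."""
    return total_mb / sample_mb * sample_minutes / 60


if __name__ == "__main__":
    # ~100 MB in ~5 minutes, scaled up to 30 GB (taken as 30,000 MB)
    print(bulk_load_hours(30_000))  # 25.0
```

    Real loads rarely scale perfectly linearly (index builds, log growth and disk layout all matter), so this is best read as a rough order-of-magnitude estimate.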
    --
    Give us our karma back! Punish Karma Whores through meta-mod!
  • Yes by Dacta (Score:2) Wednesday July 26 2000, @02:04AM
  • How I'd do it. by Genady (Score:2) Wednesday July 26 2000, @04:09AM
  • try a clean design by FonkiE (Score:2) Wednesday July 26 2000, @02:36AM
  • Re:try a clean design by FonkiE (Score:2) Wednesday July 26 2000, @04:35AM
  • IBM can handle it! by raffe (Score:1) Wednesday July 26 2000, @02:13AM
  • by DamageBoy (28870) on Wednesday July 26 2000, @02:03AM (#905057)
    Unix systems handle the largest databases known to mankind
    as we speak.
    Databases managed by Unix systems have been known to be in
    the vicinity of 2-6TB.

    Your question seems to refer to Unix-on-x86 databases of
    that size.
    Of course, running Unix on x86 systems usually boils
    down to running Linux...

    Linux is officially supported by Oracle, Informix and,
    I think, even Sybase, although I'm not completely sure
    about that.

    Obviously running it on the same RDBMS would be easier
    to accomplish, so you'd probably want Sybase to support Linux.

    You'd also want RAID 5, preferably in hardware that is supported
    by Linux.

    You'd probably want to use some sort of journaling file system.
    I myself have no problem trusting the beta versions of ReiserFS.
    I've also run Oracle on them without any problem.

    If you feel reluctant to use bleeding-edge kernel patches
    in a production environment, I can only recommend that you use
    SMALL ext2 partitions to avoid catastrophic FSCK times, and let
    Oracle / the RDBMS do its magic in managing a single 30GB database
    over smaller files...

  • Re:Size is not the issue by dublin (Score:2) Wednesday July 26 2000, @10:19AM
  • Re:Raid 5 for a database? You must be kidding. by sbeitzel (Score:1) Friday July 28 2000, @10:42AM
  • Of course you can do it by sbeitzel (Score:2) Wednesday July 26 2000, @05:36AM
  • I'm using Sybase ASE 11.9.2 as my company's database, and running it on Red Hat Linux 6.2. We've found that using raw partitions can work, but with this version of Sybase the largest you can get a partition is 2GB so you have to distribute your database across several devices. That's no big deal, though.
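
    To get a feel for how many devices that means, here is a small sketch. It assumes a 2GB device is exactly 2048 MB and that the database can be split evenly across devices; the function names are hypothetical and this is not a Sybase tool.

```python
import math


def devices_needed(db_mb, max_device_mb=2048):
    """Number of devices of at most max_device_mb needed to hold db_mb."""
    return math.ceil(db_mb / max_device_mb)


def device_sizes(db_mb, max_device_mb=2048):
    """One possible layout: full-size devices plus a smaller last one."""
    full, rest = divmod(db_mb, max_device_mb)
    return [max_device_mb] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(devices_needed(30 * 1024))  # devices for a 30 GB database
    print(devices_needed(40 * 1024))  # devices for a 40 GB database
```

    Under these assumptions a 30 - 40 GB database needs between 15 and 20 devices, before any room for growth, logs or tempdb is added.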

    Now, if you wanna talk about performance...get yourself a RAID and use a multiprocessor system. Sybase understands SMP systems and the RAID will help you on your I/O.

  • Re:Why not Sybase on Linux? by jackmama (Score:1) Wednesday July 26 2000, @02:17AM
  • Re:Size is not the issue by ajs (Score:2) Thursday July 27 2000, @02:38AM
  • by ajs (35943) <ajs AT ajs DOT com> on Wednesday July 26 2000, @02:14AM (#905064) Homepage
    I run a database of this size, and it's not a challenge. Cost is very high, but that's mostly because a database of that size is one that you cannot afford to have to restore.

    I currently use a Sun architecture, but I know of sites that use Intel/Linux, HP PA-RISC and even (may all the little gods help you) Intel/MS SQL Server, which does have its place in non-mission-critical settings where you're never going to have a good DBA.

    I can seriously recommend the Network Appliance Filer for back-end storage. Their claim that their network-attached storage array is faster than local disk sounds silly on the face of it, but there are good and valid reasons that it's true (mostly due to their journaling and caching strategy which is highly optimized for NFS). The Filer makes databases a lot easier to manage. For example, the Filer can make an online backup in less than 5 seconds, no matter how much data you have!

    Back to your original point: 30GB is small, don't sweat it. But, don't cut corners either!
  • Re:You really mean 30 GB Database on Linux by Tower (Score:1) Wednesday July 26 2000, @03:25AM
  • Re:Custom built machines by Tower (Score:1) Wednesday July 26 2000, @03:34AM
  • 2GB filesize limit by antiher0 (Score:1) Thursday July 27 2000, @01:54AM
  • Re:You really mean 30 GB Database on Linux by nakaduct (Score:1) Wednesday July 26 2000, @07:21AM
  • Re: 30 gig no problem in HP/UX at least by thing12 (Score:1) Wednesday July 26 2000, @02:15AM
  • Re:Custom built machines by thing12 (Score:1) Wednesday July 26 2000, @02:18AM
  • Re:The question changed by thing12 (Score:1) Wednesday July 26 2000, @02:24AM
  • Re:The question changed by thing12 (Score:1) Wednesday July 26 2000, @02:37AM
  • Re:The question changed by thing12 (Score:1) Wednesday July 26 2000, @06:50AM
  • Re:Interoperability and limits by Starselbrg (Score:2) Wednesday July 26 2000, @06:31PM
  • Re:Clarity of Expression by superlame (Score:1) Wednesday July 26 2000, @03:17AM
  • Tiny! by RallyDriver (Score:1) Wednesday July 26 2000, @06:43PM
  • 30GM...Try 2 Terra-Bytes by MooseMunch (Score:1) Wednesday July 26 2000, @02:50AM
  • Wait for next kernel release by La Camiseta (Score:1) Wednesday July 26 2000, @08:46AM
  • 30gig is SMALL by CountZer0 (Score:2) Wednesday July 26 2000, @05:15AM
  • clarification by theonetruekeebler (Score:1) Thursday July 27 2000, @05:19AM
  • Whenever you build a database, you must look at how it will be used before you make physical layout decisions. The Asker here specified a data warehouse, which to me implies a DB which will be written to once then read hundreds of times afterwards. With an R/W ratio that high, write performance is only a minor consideration compared to read performance. While RAID-10 would give great all-around performance, for read access it won't do an awful lot better than RAID-5, at just over half the hardware cost.

    So for a data warehouse I would not hesitate to do RAID-5.
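
    The "just over half the hardware cost" remark can be checked with a quick disk count. This sketch assumes identical disks, a single RAID-5 group with one disk's worth of parity, and ignores controllers and enclosures; none of these details come from the post, and the function names are hypothetical.

```python
def raid5_disks(data_disks):
    """RAID-5: usable capacity of N disks costs N + 1 disks."""
    return data_disks + 1


def raid10_disks(data_disks):
    """RAID-10: every data disk is mirrored, so N usable disks cost 2N."""
    return 2 * data_disks


def cost_ratio(data_disks):
    """RAID-5 disk count as a fraction of the RAID-10 disk count."""
    return raid5_disks(data_disks) / raid10_disks(data_disks)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, raid5_disks(n), raid10_disks(n), cost_ratio(n))
```

    The ratio is (N + 1) / 2N, which approaches one half from above as the array grows.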

    As for mirroring, I can't speak for Sybase, but Oracle supports a wide variety of mirroring and networked DB options. I would look into something akin to snapshots, which are read-only copies of a master database. Designate one copy of the DB as the write-to master, and snapshot it over. Of course, this all depends on why you're mirroring. If you are doing this for redundancy in the event of catastrophe, look at your loss tolerance and acceptable downtime. You could do something as simple as making a copy of the database remotely, then copying over your redo logs at every log switch. Then if your database fails, use the redo logs to roll your remote database forward, and bring it online.

    World of possibilities.

    --

  • Re:The question changed by CigarBuff (Score:1) Wednesday July 26 2000, @03:45AM
  • Re:"Not mission critical?" by be-fan (Score:2) Wednesday July 26 2000, @07:49AM
  • Yes, but by strombrg (Score:1) Wednesday July 26 2000, @05:28AM
  • Solution Found! by Danborg (Score:1) Wednesday July 26 2000, @03:09AM
  • Re:Solution Found! by Danborg (Score:1) Wednesday July 26 2000, @03:45AM
  • Re:You really mean 30 GB Database on Linux by ostiguy (Score:2) Wednesday July 26 2000, @05:23AM
  • Re:raw partitions by java.bean (Score:1) Wednesday July 26 2000, @02:45AM
  • Re:Sybase is just plain cleaner by java.bean (Score:1) Wednesday July 26 2000, @04:06AM
  • Re:30Gb databases by java.bean (Score:2) Wednesday July 26 2000, @02:07AM
  • Re:The question changed by jaclu (Score:1) Wednesday July 26 2000, @08:44AM
  • Re:You really mean 30 GB Database on Linux by e. boaz (Score:1) Wednesday July 26 2000, @07:30AM
  • Re:Some minor problems to look out for... by matthead (Score:1) Wednesday July 26 2000, @03:35AM
  • Re:No problem by matthead (Score:1) Wednesday July 26 2000, @03:41AM
  • Re:Three words:with three words by bradleyjg (Score:1) Wednesday July 26 2000, @05:32PM
  • Re:Three words: by Paranoid Diatribe (Score:1) Wednesday July 26 2000, @07:16AM
  • PC hardware does this, easy by rlglende (Score:1) Wednesday July 26 2000, @05:07AM
  • Of course Linux/Unix can handle 30 GB by fence (Score:1) Wednesday July 26 2000, @10:24AM
  • Look better for a decent UNIX offer by Baki (Score:1) Wednesday July 26 2000, @03:40AM
  • Re:Journaling File System by n3bulous (Score:1) Wednesday July 26 2000, @03:50AM
  • The biggest data warehouse today... by Cushman (Score:2) Wednesday July 26 2000, @05:17AM
  • Re:Of course it can by MindOpen2 (Score:1) Wednesday July 26 2000, @05:01AM
  • Why not Sybase on Linux? by JonK (Score:1) Wednesday July 26 2000, @01:55AM
  • Re:Large Production Databases by JonK (Score:1) Wednesday July 26 2000, @02:05AM
  • Re:Custom built machines by JonK (Score:1) Wednesday July 26 2000, @02:08AM
  • Re:Wouldn't go with Linux myself by JonK (Score:1) Wednesday July 26 2000, @02:15AM
  • Re:You really mean 30 GB Database on Linux by JonK (Score:1) Wednesday July 26 2000, @02:57AM
  • Re:No.. Three words by JonK (Score:1) Wednesday July 26 2000, @03:09AM
  • Re:not totally irrelevant by JonK (Score:1) Wednesday July 26 2000, @05:32AM
  • Re:Interbase? by JonK (Score:1) Wednesday July 26 2000, @05:46AM
  • Re:Excuse me! by JonK (Score:1) Thursday July 27 2000, @12:39AM
  • Two words (Score:4)

    by JonK (82641) on Wednesday July 26 2000, @02:23AM (#905112)
    Bad Idea.

    Changing RDBMSs is a Really Painful Experience and one to be avoided at all costs if possible: it makes changing OSes look trivial (hell, even upgrading from one point release to the next can be a world of pain). If the data's already on Sybase then for god's sake keep it on Sybase. Go for Sybase on Linux, Sybase on SCO, Sybase on NT or whatever but remember: it's a RDBMS and the underlying platform is effectively irrelevant (pauses for flames as thousands of enraged Slashdotters start to spout off and steam at the ears)
    --
    Cheers

  • SUN SPARC E3500 by $nyper (Score:1) Wednesday July 26 2000, @12:40PM
  • MySQL thinks very highly of itself.... by .havoc (Score:1) Wednesday July 26 2000, @02:58AM
  • Re:MySQL and data warehousing don't go together by .havoc (Score:1) Wednesday July 26 2000, @06:02AM
  • As I remember from old documentation by mr (Score:1) Wednesday July 26 2000, @03:24AM
  • I've got customers running .5TB DBs on FreeBSD by ericr (Score:1) Wednesday July 26 2000, @04:11AM
  • Re:More on Partition Sizes by bluetoad (Score:1) Wednesday July 26 2000, @01:44PM
  • Re:Why not Sybase on Linux? by steelhawk (Score:1) Wednesday July 26 2000, @02:12AM
  • Re:Why not Sybase on Linux? by steelhawk (Score:1) Wednesday July 26 2000, @03:05AM
  • 30G is nothing by wharfrat (Score:1) Wednesday July 26 2000, @02:37AM
  • For real power... by TrailerTrash (Score:1) Wednesday July 26 2000, @04:56AM
  • Custom built machines by Relic (Score:1) Wednesday July 26 2000, @02:02AM
  • Re:Clarity of Expression by debaere (Score:2) Wednesday July 26 2000, @02:40AM
  • Re:Yes (Score:3)

    by bero-rh (98815) <bero.redhat@com> on Wednesday July 26 2000, @02:07AM (#905125) Homepage
    You might have to partition the DB into multiple files to get around the 2GB file size limit on Linux

    Or patch the kernel so it doesn't have the limit.
    Patches for this are available; if you don't want to build your own kernel, get the Red Hat Linux Enterprise Edition, which has this patch by default.
  • 30GB is a no-brainer by Ora*DBA (Score:1) Wednesday July 26 2000, @03:49AM
  • I personally see two issues here... by PromethiumInfrmation (Score:1) Wednesday July 26 2000, @03:40AM
  • Re:You really mean 30 GB Database on Linux by Omega996 (Score:1) Wednesday July 26 2000, @02:29AM
  • Re:You really mean 30 GB Database on Linux by Omega996 (Score:1) Friday July 28 2000, @10:54AM
  • Linux vs commercial unix vs GatesWare by lehmann (Score:1) Wednesday July 26 2000, @02:58AM
  • Re:Clarity of Expression by katarn (Score:1) Wednesday July 26 2000, @05:58AM
  • Fast Oracle on Linux by Rebar (Score:1) Wednesday July 26 2000, @05:06AM
  • Re:Absolutely Raid 5 for Data Warehousing systems by Rebar (Score:1) Wednesday July 26 2000, @05:56AM
  • Re:raw partitions by Rebar (Score:1) Wednesday July 26 2000, @03:51PM
  • Re:Absolutely Raid 5 for Data Warehousing systems by Rebar (Score:1) Thursday July 27 2000, @03:19AM
  • by Rebar (110559) on Wednesday July 26 2000, @04:42AM (#905136)
    Yes Raid 5, in hardware thankyouverymuch.

    Like most everyone else, you are assuming all databases are OLTP systems. Data warehousing or data analysis, on the other hand, requires MASSIVE data transfer rates (mostly read activity), and Raid 5 with large stripe sizes and multiple arrays works really well for this type of database. Most queries against the roughly 3TB database I currently work on run in several minutes, passing somewhere under 100GB of data each, and if we had used OLTP tactics (indexes to join everything, small block size for low latency reads, etc) to tune the database, they would run in days or hours instead of minutes. Aggregate I/O rates on this monster can exceed 500MBytes/second.
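
    Those two figures are consistent with each other, as a rough calculation shows. The sketch assumes a purely sequential scan at a constant aggregate rate and counts 1 GB as 1000 MB; the function name is hypothetical.

```python
def scan_minutes(data_gb, rate_mb_per_s):
    """Time to stream data_gb of data at a sustained rate, in minutes."""
    return data_gb * 1000 / rate_mb_per_s / 60


if __name__ == "__main__":
    # Just under 100 GB per query at roughly 500 MB/s aggregate
    print(round(scan_minutes(100, 500), 1))
```

    At 500 MB/s, 100 GB streams in 200 seconds, a little over three minutes, which fits "several minutes" per query.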

    As to the original question, can Linux handle a 30 GB database, my answer would be "Yes, but it will hurt". Ever try staging more than 2GB of data on ext2? Ever try moving more than 1GB of data on ext2 with less than a 4KB block size? It hurts!

    Someone please tell me that I will be able to use large files painlessly on Linux sometime. Until then, run large databases on name brand UNIX servers with name brand UNIX. Linux on x86 is good at a lot of things, but a large database isn't one of them YET.

    SQL> select sum(bytes) from dba_data_files;

    SUM(BYTES)
    ----------
    2.9003E+12

    And every byte is on RAID 5.

  • Re:Why not Sybase on Linux? by -brazil- (Score:1) Wednesday July 26 2000, @01:58AM
  • Re:No.. Three words by -brazil- (Score:1) Wednesday July 26 2000, @10:45PM
  • Three words: by -brazil- (Score:2) Wednesday July 26 2000, @01:52AM
  • raw partitions by emir (Score:1) Wednesday July 26 2000, @02:41AM
  • Re:Clarity of Expression by emir (Score:1) Wednesday July 26 2000, @06:46AM
  • Re:Clarity of Expression by emir (Score:2) Wednesday July 26 2000, @03:20AM
  • not totally irrelevant by GodOfHellfire (Score:1) Wednesday July 26 2000, @04:33AM
  • Not a problem... by digitalhermit (Score:1) Wednesday July 26 2000, @04:53AM
  • by tjwhaynes (114792) on Wednesday July 26 2000, @02:32AM (#905145)

    Look - 30GB database? Let's just look at the necessities first and then we'll get down to a choice of vendor (because you are going to want a reasonably heavyweight database server for this).

    30GB of data. Okay - so you aren't mission critical. Even so, with that amount of data, you probably want a hot-swappable redundant system such as RAID if availability means anything to you. But these days you have lots of choices for RAID, including software RAID under Linux. I'd probably still go for a hardware solution for RAID, but that is because I'm not clued up on how robust and failure-proof the Linux RAID is when one of the disks dies. If you don't care about redundancy, 40GB drives are easily found. For performance reasons you might want to find four drives of say 15GB each so that random access to the drives can be done in near parallel, especially if you stripe the drives, but that is yet another option.

    Accessing 30GB of data is going to require some reasonable memory space - think 512MB minimum and work up from there. Of course, you could run it on far far less (say 80MB) but you will pay a performance penalty - the database products I know about have plenty of tricks up their sleeves if they have spare memory to play with, and resort to paging out to disk when things get tight.

    The choice of software is important too. I'll declare my biases up front and say go for DB2 Universal Database, partly because I work on it and I like it. Your other choices are Oracle, obviously, and there are a host of other database vendors out there for Unix systems across the board. DB2 UDB is easier to administer and looks to be faster than Oracle, as well as generally being cheaper to deploy. As far as functionality goes, everybody nowadays claims SQL92 conformance. SQL99 core conformance isn't too much to hoot about, as it's basically SQL92. The SQL99 spec is far more modular than the SQL92 spec, so it's easier to match the base functionality and then add on SQL99 conformance for, say, the multimedia extensions, later.

    So the answer to your question is yes - it is possible to deploy a 30GB database on Unix. And it is definitely possible to deploy the same database on Linux - both IBM and Oracle have versions of their databases on Linux.

    Cheers,

    Toby Haynes

  • "Not mission critical?" by rob_from_ca (Score:1) Wednesday July 26 2000, @07:08AM
  • Re:Multi CPU by fsck (Score:1) Wednesday July 26 2000, @06:08AM
  • Credibility dropping fast by gammatron (Score:1) Wednesday July 26 2000, @02:42AM
  • Re:Credibility dropping fast by gammatron (Score:1) Wednesday July 26 2000, @07:15AM
  • 30Gb databases by LinuxGrrl (Score:1) Wednesday July 26 2000, @01:56AM
  • Re:No, only Microsoft SQL Server can do it. Period by SuiteSisterMary (Score:1) Wednesday July 26 2000, @03:08AM
  • Re:Excuse me! by Denix (Score:1) Thursday July 27 2000, @01:05AM
  • 30G no problem. Heres some advice by grantsucceeded (Score:1) Wednesday July 26 2000, @06:34PM
  • Re:No remote NT management? wtf? by evilgrin (Score:1) Wednesday July 26 2000, @03:29AM
  • Re:No remote NT management? wtf? by evilgrin (Score:1) Wednesday July 26 2000, @10:09AM
  • No remote NT management? wtf? by evilgrin (Score:2) Wednesday July 26 2000, @02:59AM
  • Re:30G is nothing by mrfiddlehead (Score:1) Wednesday July 26 2000, @03:55AM
  • Kernel enhancements in 2.4 by mauryisland (Score:1) Wednesday July 26 2000, @04:00AM
  • Are you kidding? by dilyard (Score:1) Wednesday July 26 2000, @03:42AM
  • hardware raid5 by ArchieBunker (Score:1) Wednesday July 26 2000, @04:50AM
  • IRS by Dungeon Dweller (Score:2) Wednesday July 26 2000, @05:45AM
  • Sybase/Solaris still affordable... by ironduke-particle (Score:1) Wednesday July 26 2000, @10:00AM
  • Re:30G??? Try 10T...or 130TB by gowdy (Score:1) Wednesday July 26 2000, @04:36PM
  • Re:Uuups, a few clarifications by Tassach (Score:2) Friday July 28 2000, @11:28AM
  • Interoperability and limits by Gruturo (Score:1) Wednesday July 26 2000, @10:35AM
  • Re:Interoperability and limits by Gruturo (Score:1) Thursday July 27 2000, @05:24AM
  • Re:Solution Found! by richardbowers (Score:1) Wednesday July 26 2000, @03:34AM
  • 30Gb+ by drfrog (Score:1) Wednesday July 26 2000, @08:32AM
  • Re:You really mean 30 GB Database on Linux by mccrohan (Score:1) Wednesday July 26 2000, @04:15AM
  • Database Mirror Utility by Sherman Peabody (Score:1) Wednesday July 26 2000, @03:42AM
  • Clarity of Expression by Gothmolly (Score:2) Wednesday July 26 2000, @01:59AM
  • Second on the Filer by The Big Bopper (Score:1) Wednesday July 26 2000, @03:15AM
  • Re:Three words:with three words by Ian-K (Score:1) Wednesday July 26 2000, @04:40AM
  • Re:Two words by john_many_jars (Score:1) Wednesday July 26 2000, @04:38AM
  • Re:Of course it can by john_many_jars (Score:2) Wednesday July 26 2000, @04:34AM
  • 2Gb filesize == old information by cthulhubob (Score:1) Wednesday July 26 2000, @04:14AM
  • Re:Custom built machines by Pinball Wizard (Score:2) Wednesday July 26 2000, @05:34AM
  • what about other databases? by Thu Anon Coward (Score:1) Wednesday July 26 2000, @08:59AM
  • Re:what about other databases? by Thu Anon Coward (Score:1) Thursday July 27 2000, @10:20AM
  • 30GB on the x86 platform is doable. by iamabot (Score:1) Wednesday July 26 2000, @06:03AM
  • Re:Silly question! by johnlcallaway (Score:1) Wednesday July 26 2000, @04:33AM
  • by johnlcallaway (165670) on Wednesday July 26 2000, @02:59AM (#905182)
    I hear this question a lot, and I am really tired of it. It doesn't matter how big the database is, but how much it is going to be used. If I created a multi-terabyte database (can you say p0rn?) that only had one user, sure Linux/Intel could handle it. But throw it up onto a network with millions of requests per hour, and the equation shifts. Could you build a Lintel box to support it?? Let's see....

    Here are the priority items for any database box --
    • Memory. Databases love memory for cache, logs, etc. If you can keep your entire database in memory, disk speed becomes irrelevant after the first data access and for writes. If your box only supports a couple of gigs of memory, move on. We have boxes with 4GB of memory, and our DBA wants more.
    • Disk Bandwidth. The more disk bandwidth the better. Several little disks scattered about multiple SCSI controllers will usually perform better than comparable aggregated large disks. Don't even think about using IDE/EIDE.
    • IO Speed. The faster your disks, the better (Duh...) Again, disk size can play second fiddle to disk access times. I would rather have many small, fast disk drives than one large, slow one.
    • CPU speed. Did you notice this was last??? Face it, if you can't keep it in memory and your disks aren't fast enough for your processor(s), then the CPU speed isn't as relevant.
    • Network bandwidth. Most computers do not have issues here. However, there is overhead in pushing data over a network, and the more data you push, the more network bandwidth you need to respond to requests.
    It is also a good idea to separate application/web servers from database servers. All modern databases support the ability to service database requests over a network. Providing a dedicated network solely for database activity that is separate from the user network is common in most shops now to support the data movement from database servers to the app servers.

    The game all sys admins and DBAs play is finding the current bottleneck. There is always a limiting factor for performance, and it can usually be tied to one of the above items.

    Determining a configuration to support a database is not easy. You need to gather usage predictions, such as number of concurrent users, read rates, update rates, log projections, and make a guess. You also need to know your target audience and how they access it. A million requests spread over 24 hours is not the same as a million requests in a short period.
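    A quick back-of-the-envelope sketch of that last point. The one-hour "short period" is an assumed figure picked purely for illustration, and the function name is hypothetical.

```python
def requests_per_second(total_requests: int, window_hours: float) -> float:
    """Average request rate if total_requests arrive evenly over window_hours."""
    return total_requests / (window_hours * 3600)


# Same million requests, two very different loads on the box.
spread_over_day = requests_per_second(1_000_000, 24)  # evenly over 24 hours
packed_in_hour = requests_per_second(1_000_000, 1)    # assumed one-hour burst

if __name__ == "__main__":
    print(f"spread over 24h: {spread_over_day:.1f} req/s")
    print(f"packed into 1h:  {packed_in_hour:.1f} req/s")
    print(f"ratio:           {packed_in_hour / spread_over_day:.0f}x")
```

    Even this crude average shows a 24x difference in sustained load, and real traffic is usually burstier than an even spread.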

    This is only a sig, this is only a sig.....
  • Re:Three words: by professionalGeek (Score:1) Wednesday July 26 2000, @02:03AM
  • BSD by Britz (Score:1) Wednesday July 26 2000, @02:54AM
  • Storage Considerations by ccGecko (Score:1) Wednesday July 26 2000, @12:49PM
  • Re:two more words: by stephenbooth (Score:1) Wednesday July 26 2000, @02:17AM
  • Re:Two words by stephenbooth (Score:1) Wednesday July 26 2000, @03:17AM
  • Not a hard questions to answer by xtheunknown (Score:1) Wednesday July 26 2000, @06:33AM
  • Re:Raid 5 for a database? You must be kidding. by InsaneGeek (Score:1) Wednesday July 26 2000, @04:46AM
  • Re:Interbase? by j-pimp (Score:1) Wednesday July 26 2000, @08:11PM
  • Re:You really mean 30 GB Database on Linux by duffbeer703 (Score:1) Wednesday July 26 2000, @02:05PM
  • Re:Three words: by RedFang (Score:1) Wednesday July 26 2000, @11:34AM
  • Informix for Sinix by romanm (Score:1) Wednesday July 26 2000, @03:14AM
  • Uuups, a few clarifications by CaptainZapp (Score:1) Wednesday July 26 2000, @05:47AM
  • Re:30Gb databases by Moderation abuser (Score:2) Wednesday July 26 2000, @01:59AM
  • Re:Three words: by bebopkim (Score:1) Wednesday July 26 2000, @02:55PM
  • Re:two more words: by oliverthered (Score:2) Wednesday July 26 2000, @02:03AM
  • Journaling File System by FlyingElvis (Score:1) Wednesday July 26 2000, @02:31AM
  • You said it... by mirko (Score:1) Wednesday July 26 2000, @02:12AM
  • Interbase? by Cliffton Watermore (Score:1) Wednesday July 26 2000, @03:38AM
  • Excuse me! by Cliffton Watermore (Score:1) Wednesday July 26 2000, @07:44AM
  • Re:Excuse me! by Cliffton Watermore (Score:1) Wednesday July 26 2000, @09:47PM
  • No.. Three words by twisteddk (Score:1) Wednesday July 26 2000, @02:59AM
  • Re:try a clean design by twisteddk (Score:1) Wednesday July 26 2000, @03:25AM
  • by twisteddk (201366) on Wednesday July 26 2000, @02:56AM (#905205)
    And you also quite nicely manage to change the question yourself.
    Nobody said anything about Oracle. No wait.. I take that back... But the person posing the QUESTION didn't say anything about using Oracle for a DB. The question actually stipulates a Sybase DB!

    But anyway, to answer the question posed in the first place: Yes, you COULD probably handle a UX/NT transition of the data, but try not to change database as this often screws with the data. Not all tables are stored identically in all databases (probably one of the reasons there is more than one supplier of databases). So for God's sake.. Even IF you want to have a backup/mirror on the UX box, make sure you run the same DB.

    But still, it sounds like you want to "exchange" the UX box for an Intel machine running Linux.. Am I right?
    If this is indeed the case, yes even a 30 (or 50 gig for that matter) DB is possible. The major pitfalls in this scenario are (I've been there myself):
    1. Physical space for disks.
    If you go buy an Intel machine you limit yourself to about 3-4 SCSI controllers, and unless you go and buy a shitload of external connectivity (cabinets and such), which can be a pain, you're often limited to about 8-10 disk drives, so size your DB with some future expansion in mind.

    2. Backup solution
    Make sure to have a decent and FAST backup. I've not yet been able to run parallel backups on Linux (maybe I'm just not very good at configuring it), and it DOES take a while to back up 30 gigs, even on a DLT, so if the client wants high availability, take this into consideration. However, in your situation, this might be redundant, since this DB WILL be the mirror (but the point should be handled otherwise).

    3. High Availability.
    Your client might want the DB to be accessible at ALL times, and we all know that when a PSU or CPU goes in an NT box, the machine is pretty darned worthless. And getting a decent service level on an Intel box is almost impossible (usually 24 hours is as good as it gets). Also you should consider if this mirror should be used as a fail-over in case of whatever.

    4. Remote access.
    Remote servicing is a bit easier though, as you can easily set up Telnet or whatever. However, you can get some good remote programs for Windows machines also, just not AS good. But this should only factor in if you need to access the machine frequently. If the choice is between UNIX/Linux, it's the same diff. But if it's UX/NT, then think about it for a while.

    5. Maintenance
    Maintenance is a BIT heavier, especially when the machine gets older, but for the first year or two, who gives a S***. Also, whatever people might say of UNIX hard drives, they're EXACTLY the same as the ones sold for Windows machines. They're just formatted differently. So you will save a bundle on the costs of acquisition, which should cover for the added maintenance of trading in old components that can no longer hack it (MBs, networking cards, SCSI controllers, RAM etc.), all of the components which are NOT the same :)

    6. Choosing the right hardware.
    You might want to make sure to spend a few more dollars on the right hardware. Whatever people might say, the UNIX boxes are most often put together with the best of hardware: ECC RAM, redundant controllers, and hot-swap drives (and sometimes other pieces can also be swapped whilst power is still on). DON'T save more money here than absolutely necessary. A good point to make would be: It's basically the same hardware, only the software is different.
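    To put a rough number on the backup point (item 2) above, here is a small estimate of how long it takes to stream 30 gigs to tape. The 5 MB/s sustained rate is an assumption used as a placeholder, not a measured figure for any particular drive, and the "two drives" case simply assumes parallel backups scale perfectly, which they rarely do.

```python
def backup_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to stream size_gb gigabytes at rate_mb_per_s megabytes/second."""
    seconds = (size_gb * 1024) / rate_mb_per_s
    return seconds / 3600


ASSUMED_TAPE_RATE_MB_S = 5.0  # placeholder sustained rate, not a vendor spec

single_drive = backup_hours(30, ASSUMED_TAPE_RATE_MB_S)
two_drives = backup_hours(30, 2 * ASSUMED_TAPE_RATE_MB_S)  # idealised parallel backup

if __name__ == "__main__":
    print(f"one drive:  {single_drive:.2f} hours")
    print(f"two drives: {two_drives:.2f} hours")
```

    Plug in whatever rate your own drive and network actually sustain; the point is only that the backup window grows linearly with the database.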

    These are my thoughts/experiences on this matter. As for "FascDot Killed My Pr", I REALLY have to say: I've been running an Oracle DB (8.04) on NT for over two years now, not a single glitch yet. And YES, it's a development DB, so there ARE active users. And installation was as sweet as pie. The only major flaw in my opinion is the inability of older Oracles to "bundle together": you could not have more than one DB of the same major release installed on one system, so you have to add another logical DB to the existing one, or install a different major release version of Oracle as the second DB. But that's SUPPOSEDLY done away with in version 8 and up (not that I'm not having problems with it anyway).

  • Works by rxmd (Score:1) Wednesday July 26 2000, @03:08AM
  • More details: Size, Users, Purpose by rxmd (Score:1) Wednesday July 26 2000, @03:15AM
  • MySQL and data warehousing don't go together by rxmd (Score:1) Wednesday July 26 2000, @03:18AM
  • HOWTO: Oracle on FreeBSD by rxmd (Score:1) Wednesday July 26 2000, @04:21AM
  • Re:HOWTO: Oracle on FreeBSD by rxmd (Score:1) Wednesday July 26 2000, @12:10PM
  • Re:No, only Microsoft SQL Server can do it. Period by dodo-lodo (Score:1) Wednesday July 26 2000, @03:06AM
  • DATAFLEX by freediver211 (Score:1) Wednesday July 26 2000, @02:42AM
  • Unix or Linux? by photon317 (Score:1) Wednesday July 26 2000, @06:59AM
  • Re:Interbase? by ScuzzMonkey (Score:1) Wednesday July 26 2000, @04:49AM
  • Re: 30 gig no problem in HP/UX at least by HP-UX'er (Score:1) Wednesday July 26 2000, @02:56AM
  • Re:50GB on Linux? by TwoFlower69 (Score:1) Wednesday July 26 2000, @01:52AM
  • ~30Gb Sybase Database by .foreward (Score:2) Wednesday July 26 2000, @03:20AM
  • Quick Other-side... by Pyre (Score:2) Wednesday July 26 2000, @05:30AM
  • Re:No, only Microsoft SQL Server can do it. Period by j_skillz (Score:1) Wednesday July 26 2000, @02:31AM
  • PostgreSQL certainly will by Karora (Score:1) Wednesday July 26 2000, @11:56AM
  • Re:Are you kidding? by mheaney (Score:1) Wednesday July 26 2000, @04:53AM
  • sybase ase on linux, considerations. by scroe (Score:2) Wednesday July 26 2000, @06:01AM
  • Nobody heard of Progress RBDMS? by LinuxBuddha (Score:1) Wednesday July 26 2000, @06:41AM
  • Re:Three words:with three words by jaraco (Score:1) Wednesday July 26 2000, @01:18PM