Increasing the Transfer Rate?

Nintendork asks: "I recently started a new job as a resident computer geek and am analyzing the performance of our SQL server. I did quite a bit of research and would like an opinion from the Slashdot community on my proposed solution for increasing the STR (Sustained Transfer Rate) from the server to the workstations. The server (Compaq ProLiant ML530) has 16 10,000 RPM drives with an average STR of ~43MB/sec. per drive. 14 are used for two RAID 5 logical drives (7 physical drives per logical). The remaining 2 drives are backup drives in case one fails. Currently, they're all connected to a Compaq fibre RA4000 adapter. It runs at 100MB/sec. from what I could find in a jungle of fibre information. Reasoning tells me I have a huge bottleneck at the fibre adapter and the 100baseT NIC. I should also mention that the server has 2 PCI buses. One runs at 64-bit and 66MHz and has 2 PCI slots. My proposed setup would be to back up all the data and create a new array with a few hardware modifications. Take out the fibre adapter and use two dual-channel 64-bit 66MHz Ultra160 adapters in the two 64-bit 66MHz PCI slots (4 drives per channel). Take out the 100baseT NIC and start a gigabit backbone." Would this significantly increase performance? Read on if you want to check out the numbers on the new setup.

"From what I've learned thus far, the proposed setup would be a blazingly fast file server approaching ludicrous speed. Let me break it down. Data can be read from the drives at a STR of ~602MB/sec. (~43MB/sec. * 14 drives). Each Ultra160 channel has a STR of 132MB/sec. This provides a bearable bottleneck that reduces the overall STR to ~528MB/sec. (132MB/sec. * 4 channels). The 64-bit 66Mhz PCI bus has a STR of 528MB/sec., which is an exact match for the 4 ultra160 channels! From there, I assume the data goes out the NIC, which is on a gigabit backbone. This would provide a STR of ~528MB/sec. to the workstations. Unless I'm missing something such as a possible bottleneck between the PCI bus and the NIC, my reasoning makes gosh darned perfect sense!

Thanks in advance for any insight you all can provide on this issue."
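
A quick back-of-the-envelope sketch of that chain, using the figures from the post (the per-drive and per-channel numbers are the submitter's; Ultra160's nominal ceiling is actually 160MB/sec, and the gigabit NIC is rated in bits, not bytes). It's a rough ceiling calculation that assumes perfectly parallel sequential I/O, not a prediction of real query throughput:

# Rough ceilings for each stage of the proposed data path, in MB/s.
# Figures are taken from the post; real query loads will sit far below them.

def pci_bandwidth_mb_s(width_bits, clock_mhz):
    """Theoretical PCI bandwidth: width in bytes times clock in MHz."""
    return width_bits / 8 * clock_mhz

drives = 14
drive_str = 43.0                  # MB/s per drive (post's figure)
ultra160_channel = 132.0          # MB/s per channel (post's figure; nominal Ultra160 is 160)
channels = 4

stages = {
    "disks":       drives * drive_str,              # ~602 MB/s
    "scsi":        channels * ultra160_channel,     # ~528 MB/s
    "pci 64/66":   pci_bandwidth_mb_s(64, 66),      #  528 MB/s
    "gigabit nic": 1000 / 8.0,                      #  125 MB/s -- bits, not bytes
}

bottleneck = min(stages, key=stages.get)
print(stages)
print("bottleneck:", bottleneck, "at", stages[bottleneck], "MB/s")
# The network caps the chain at ~125 MB/s theoretical (less in practice), not 528.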

  • Ditch Raid5. (Score:4, Informative)

    by AntipodesTroll ( 552543 ) on Friday February 08, 2002 @08:00AM (#2973366) Homepage
    Your first priority is to ditch Raid5.
    By the looks of your post, you have money to spend, so invest in more disks and go RAID 0+1. You'll notice a speed increase right there. If you're worried about PCI latency, get a system with 2 or 3 PCI buses.

    In fact, whoever set up this configuration needs a slap, if they were going for performance. RAID 5 has more and more overhead penalty the more disks you add to a stripe set. Even knocking the sets back to 4 or 3 disks would help; I would never use more than 4 disks in a RAID 5 set.

    That said, if your app is mostly db reads, 0+1 on good controllers should give you 4x the average disk throughput, thanks to striping and round-robin mirror reads. Also, make sure your OS and filesystem are well tuned. Often you see people spending money on hardware because they haven't bothered to optimise the software. (Kernel variables, filesystem tuning, etc.)
    • your .sig (Score:1, Funny)

      by Anonymous Coward
      RMS-free software is like a Kenneth-Starr-free blowjob.

      I don't get it.
    • Re:Ditch Raid5. (Score:3, Informative)

      by Insightfill ( 554828 )
      Agreed. RAID 5 does fairly well in the "reads" dept., but starts taking serious hits in the write department, because every little 20K MS Word document needs to have its parity written off to the side. Not as big a deal when the files are big enough, but it can be a pain.

      Check out:
      http://www.cs.washington.edu/homes/savage/
      for the work of Stefan Savage, a member of the UCSD faculty who's done some good work in the field. Scroll down to near the bottom for the paper on "AFRAID" and see what I mean. The idea is that if we delay the parity writes long enough to queue them up, we start approaching the speed of RAID 0. There's possible data loss here, but his premise is also that the MTBF rate for most drives makes this less of an issue than power supply failure, controller failure, etc. (RAID was developed back when MTBF was closer to 20K hours.) A rough sketch of the I/O counts involved is at the end of this comment.

      He has some great work on DoS attacks, too. See the write-up at Ars:
      http://www.arstechnica.com/reviews/2q00/networking/networking-1.html
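
      A rough sketch of the I/O arithmetic behind the parity penalty and the deferred-parity idea (textbook read-modify-write counts, not measurements of any particular controller):

      # Disk I/Os for a single small (partial-stripe) write.
      # RAID 5 read-modify-write: read old data, read old parity,
      # write new data, write new parity = 4 I/Os.
      # A mirrored write (RAID 1 / 0+1 / 10) is 2 I/Os, one per copy.
      # Textbook counts only; write-back caches and full-stripe writes change the picture.

      def raid5_small_write_ios(writes=1):
          return 4 * writes     # read data, read parity, write data, write parity

      def mirrored_write_ios(writes=1):
          return 2 * writes     # write both copies

      saves = 1000              # e.g. a thousand small Word-document saves
      print("RAID 5 :", raid5_small_write_ios(saves), "disk I/Os")
      print("RAID 10:", mirrored_write_ios(saves), "disk I/Os")
      # Deferring the parity writes (the AFRAID idea) batches the extra I/Os in the
      # background, so the foreground write cost approaches the RAID 0 case.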

      • Agreed again. :-) But you may also consider RAID-50 (a RAID-0 of RAID-5's). Further, the 12MB/s (100mbps) network card is probably your bottleneck. Gigabit would be best, but FEC of 4 100mbps would give you 400mbps, or about 50MB/s.
    • If the data is important it may be dangerous to ditch the raid 5...
      • I would think if you're doing raid 0+1 (stripe sets mirrored), then the mirror would take over - just make sure you have separate controllers (don't want one to totally fry and kill data on both sets)...

        Also, if the database is huge (and the company values their data), I'm sure they're doing at least nightly incrementals (with weekly fulls, possibly?) -- sure, you lose a day's work, but the mirror sets should keep you running without loss.

        Besides, a fire would take out a raid5 set just as easily as a raid0+1 set...
  • Benchmark your drive performance from the server to the drives. Maybe use bonnie++ [sourceforge.net] to find out how much bandwidth you can get from your server [compaq.com] to the drives.

    Wouldn't hurt to max the machine out in SDRAM and add gig ethernet too.

  • One thing to keep in mind, though, is how much this increase is needed. I don't see why you wouldn't get the 500% performance increase. But if your users are only using 25 Mbps of the current setup, and there's no reason that you know of for this usage to drastically increase soon, then you might not be able to justify this change just yet.
  • by swillden ( 191260 ) <shawn-ds@willden.org> on Friday February 08, 2002 @09:48AM (#2973643) Journal

    It may be too early in the morning, and I'm just not thinking straight yet, but you calculate you'll have an STR of 528MB/s through the PCI bus, and you're trying to shunt that data onto a 1000Mb/s network (notice the small b). 1000Mb/s is something less than 125MB/s, and that's complete saturation, which Ethernet doesn't handle well. I'd be surprised if you get better than 100MB/s over that network. Sounds nice to me, but then I do fine with an 11Mb wireless network myself :-)

    You also called the box running this stuff a SQL server. Relational databases are very, very slow. I don't know what's under the hood of a ProLiant ML530 (and I'm too lazy to look it up) but I doubt it has the nads to process 528MB/s of SQL queries.

    • You beat me to my thought.... 1gig connection is definitely less than the 528MB/s he's looking to attain... I hadn't thought of the underlying processes tho.

      If this person truly needs that much bandwidth, I'd say it'd probably be best to have a cluster of machines to do the work, although that's just from a speed issue. His database would have to be able to scale to a cluster at that point.
    • I forgot to divide by 8! Change of plan: Wait for the 10 gigabit NICs to be released to the public. I guess I could place it on the 64-bit 33 Mhz bus.
      • I forgot to divide by 8! Change of plan: Wait for the 10 gigabit NICs to be released to the public. I guess I could place it on the 64-bit 33 Mhz bus.

        For a 10 gigabit NIC you would need a much faster local bus. Better at that point to change out the underlying motherboard and processors while you're at it. People just don't understand numbers and their importance.

        This guy needs to really figure out where his true bottleneck is before changing things. Looking at the setup, the two optimizations I see are a gigabit NIC (or multiple 100Mbit NICs) and switching to RAID 0+1. That's it. Without really looking at actual performance numbers and profiling, I wouldn't touch it.

        Performance tuning for the sake of performance tuning is just asking for trouble. Do it when you know you need the extra performance or know you will need it soon. My bet is he could get all his DB/disk speedup via tuning the DB. He could get the communication speedup by going with a 64-bit/66MHz gigabit NIC for talking to the users.

        From the sounds of the system layout it looks like the original DB wasn't set up properly. Only the data segments of the DB should be on RAID 5, and then only if you don't have the hardware to go RAID 0+1. The rest should be on RAID 0+1 or just striped. I also bet there aren't separate RAIDs for data, index, before and after image, and log files. Separate off the before image, after image, and log files first, then separate out the indexes, each to their own RAID. The indexes should just be on the fastest RAID 0+1 array you can afford. The before and after image RAIDs should be in the same speed class as the main data RAID. The striping on them should be short as they are only accessed sequentially. The data and index segments need fast seek and low latency to perform best.

        First thing is find out where the real bottleneck is. Don't just swap and hope you solve it. You don't want to swap something out to find out the replacement isn't stable.

    • Compaq ML530 can host up to 4-way Xeon processors and shittons of RAM. It's got drive cages (hotswap) for up to 14 drives - but that's not the fiber he's looking for.
      I'd get a dual port gig-e NIC in one of those 66Mhz PCI slots, and a badass fibercard in the other.
      OR, put two fiber cards in the 66Mhz PCI slots, and fill the rest with load-balanced 100BaseTX cards.
  • I'd echo the move-to-RAID-0+1 statement wholeheartedly. You could change very little and still increase sustained performance. In my own testing (moving 1GB of small files and a 1GB file separately), I'm reducing the move time by about 30-percent. That's on a DG Clariion, not a Compaq array, so ymmv.

    I don't claim to be an expert on fibre, but that speed seems slow. I don't know if the limitation is the array or the card, but it seems that would be the most promising way to eliminate the backlog. Sure, the 160MB/sec scsi controllers perform well, and that would be a fine upgrade. But 200MB/sec dual fibre connections (with failover support, hopefully) would be even better.
  • database? (Score:1, Insightful)

    by Anonymous Coward
    Great; you can transfer the data fast. Assuming you have any more data to transfer.

    Are the queries optimized for the database? The right indexes in use, etc.? With bad index choices you could have any disk array and any network configuration you want, and it could still be slow.

    A machine running a database isn't a fileserver (and shouldn't be). Does it have enough memory to do its work? Speeding up the disks is great, but you have to analyze what the machine is actually doing with the data it is reading.
    • Re:database? (Score:4, Insightful)

      by Bryan Andersen ( 16514 ) on Friday February 08, 2002 @03:20PM (#2975742) Homepage

      Yep.

      Look and find the real bottleneck. Check which queries are being done all the time. Are there indexes for them?

      I've run into query nightmares quite often. One DB I helped tune kind of had an index the DB could use to narrow down the search, but it still left over a thousand records to search through to find the needed one. They were calling this query a few times a minute during business hours. A simple index addition and they went from 100% saturation on the disk IO system to less than 1%. Sounds extreme, but it isn't. Bad indexing can be a real killer. So can poor selection criteria in a query. Both were at fault in this case; it's just that the DB query optimizer could deal with the poor selection criteria once it had a better-fitting index. Bad indexing and queries cause the DB to build temporary indexes which usually can only be used for that specific query run. This takes disk IO, memory, and CPU time that could be better spent elsewhere. It also makes for much duplicated effort.

      Another optimization fuckup. One place decided what they needed was more RAM. Users weren't getting their queries back fast enough, and the network wasn't saturated. It turned out that they already had over three times more RAM than the total size of their DB. Net effect: one happy hardware salesman and no DB speedup. They had a CPU bottleneck. Going to an eight-way system and distributing the RAM they had across the six new CPUs changed their problem over to one of network access speed. Adding three more 100Mbit NICs solved the new problem and gave them some breathing room. Disk IO still wasn't an issue, though write speed was looking like it would be the next bottleneck. They went from about 12% of max write throughput to close to 65% during only a few nightly batch jobs. Even then it would only be an issue if they weren't done by morning. The rest of the time they spent near 0.1% disk utilization. Reads? What reads? After the whole DB got cached into RAM, the system only wrote to the disks. Consider that when you are evaluating benchmarks.

      Most databases I see are horrors when it comes to indexing and physical database layout. These are the ones put together by professionals. The ones put together by amateurs make the professionals look like saints. If you put the before image, after image, and data files on the same disk subsystem, they all fight for IO time and you lose speed. You'll be about half as fast. The before image and after image files can overlap their writes, but the data file can't have writes done to it until the before image data is written to disk. The after image data has to wait till the data segment writes have completed. Splitting them out onto their own disks allows for a significant increase in speed. The before image writes for one query can now overlap the data writes of the previously processed query. The after image writes can also be overlapped. Also get additional disk subsystems each for your log files and index files. Spread the IO out. This also provides better data integrity. If your data segment's disk subsystem quits, you can rebuild the DB up to the last commit from a backup and the after image file.

  • by fooguy ( 237418 ) on Friday February 08, 2002 @11:07AM (#2974099) Homepage
    Like the other posters said, start with ditching RAID 5 for RAID 10/0+1 (depending on your preference). RAID 10 (Mirror + Stripe) is my preference because of the higher redundancy - if one disk dies the whole stripe doesn't drop out. RAID 0+1 is faster but slightly less redundant. Either way, the parity overhead generated by RAID 5 is the death of a database.

    Your controllers are pretty fast; it's more likely your software config or your network. Are you running MS SQL Server, or something else? MS SQL Server requires some pretty specific tuning to get good performance (like telling it *not* to use all the RAM). How well are your objects tuned?

    How about your OS? What filesystem are the datafiles living on? Oracle supports RAW partitions, which allows you to eliminate the OS overhead from the database. We're testing the performance now of Oracle on different Solaris filesystems. You'd be surprised at the differences between UFS, UFS with Logging, and the Veritas File System.

    While I won't deny that gigabit ethernet is fast, it's kind of expensive (especially if your network infrastructure isn't equipped to handle it). If cost is a concern, and you're in a switched environment (no hubs), you can add more 100Mb Ethernet NICs and trunk them for more bandwidth. In reality, I've never choked a 100Mb connection unless something was wrong (like end users writing nasty Crystal Reports).
    • Now, I'm not going to claim to be a RAID expert, I'm just a beginner, but it's my understanding that RAID 10 is NOT more reliable/redundant than RAID 0+1.

      Mylex agrees [mylex.com] that in a RAID 0+1, up to TWO drives can go out, as long as they are non-adjacent members of an array. How RAID 10 could be more redundant, I don't know.
      • Actually, that's easy. Here are 2 layouts

        A A A A A : RAID 0/1
        B B B B B
        and
        A B C D E : RAID 1/0
        A B C D E

        In the first case, I can lose any number of A -OR- any number of B and still be good, but if I lose 1 A and 1 B...toasted data.

        In the second case, unless I lose a mirrored pair (Both A's), I don't lose data. I can lose A1, B2, C2, D1, and E2 (or any other non-paired disks) and still operate without data loss.

        Now, you wanna get really kicking? Add in plaid striping so your filesystems are built with block1=A, block2=B, block3=C, block4=D, block5=E, block6=A....etc. Now, if you do any reads or writes, you share your load across EVERY spindle in your hard drive set. Suddenly, your Fibre connection to the RAID array may be your bottleneck. (PS. From real-world experience using multiple 30-disk (2x14 + 2 hot spare) Clariions for a bruiser of a server.)

        And the more drives you run, the less 0/1 makes sense... do you really want to rebuild that 50-disk RAID 0 from your mirrored set of 50 during production hours, and cause the kind of load associated with a rebuild, since you only lost 1 disk? Or would you rather rebuild ONE disk from its DIRECT mirror and then sync it in with the rest of the 50 disks in the RAID 0 group?

        In real world, nothing beats a RAID 1/0 Plaid stripe monster. And nothing beats running 'ls' and seeing 112 disks light up.
        -- My personal bias, YMMV --
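
        To put rough numbers on that, here's a toy enumeration of double-disk failures for a hypothetical 5+5 layout like the one above; it sketches the combinatorics only, not any particular controller's behaviour:

        # With 10 disks arranged as above, which two-disk failures lose data?
        # RAID 0+1 (mirror of two 5-disk stripes): fatal if one disk dies in EACH stripe.
        # RAID 1+0 (stripe of five mirrored pairs): fatal only if BOTH halves of one pair die.
        from itertools import combinations

        n = 5                                     # disks per stripe / number of mirrored pairs

        def raid01_fatal(a, b):
            return (a < n) != (b < n)             # disks 0-4 = stripe A, 5-9 = stripe B

        def raid10_fatal(a, b):
            return a // 2 == b // 2               # disks 2k and 2k+1 form a mirrored pair

        pairs = list(combinations(range(2 * n), 2))   # 45 possible double failures
        print("RAID 0+1 fatal:", sum(raid01_fatal(a, b) for a, b in pairs), "of", len(pairs))  # 25
        print("RAID 1+0 fatal:", sum(raid10_fatal(a, b) for a, b in pairs), "of", len(pairs))  # 5
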
  • 528MB/sec? Hello? On gigabit ethernet you can maybe get 70% of max throughput if you're lucky. 1000Mbits/sec works out to 125MBytes/sec, and that's sustained, which you ain't gonna get.
    You would be lucky to get 100MB/sec, which your current setup delivers anyway.
    My advice: get a copy of Oracle or some other non-crappy database (PostgreSQL), optimise the heck out of your queries (use a connection pool, dammit), get 2-3 $50 100Mb/sec network cards and drop them in the server (I use 3Com 3c905b's... accept no substitutes... the 905c's are crappy). Then do IP traffic load balancing across the ethernet cards (Linux can do it easily with a kernel compile, Solaris needs Sun trunking software, all other OSes are irrelevant to me). And please tell me you tuned the OS properly (especially that damn SYN packet problem which I keep seeing other clueless admins whine about) for large numbers of connections.
    • Gigabit ethernet is switched, and in this scenario I could see routing packets to only one or two middleware servers... in that case you would get higher utilization, I think.

      If 500 MB/sec of data transfer is needed for whatever application this guy is running, and he is the guy in charge of planning for it, the question is moot since the company will be out of business real soon.
  • by Twylite ( 234238 ) <twylite AT crypt DOT co DOT za> on Friday February 08, 2002 @11:48AM (#2974387) Homepage

    First, you need to consider that because of data locality, you are never going to reach the STR you are pondering. Queries would have to magically occur in such a way that the disk load was perfectly balanced between all drives. Your worst-case STR is therefore 43MB/sec, ASSUMING you benchmarked the STR on random access rather than sequential access.

    Second, as many others have said, use RAID 0+1; RAID 5 has overheads that (can) involve other drives in the chain.

    Third, your Gb ethernet is gigaBIT. That provides for a maximum throughput of 125Mbyte/sec on a switched network. To improve on this you could use multiple Gb NICs ... but they are on the PCI bus, along with the RAID adapters. This doesn't necessarily halve performance (it could in the worst case scenario), but the degradation depends on the size of the request/response versus the data that must be retrieved and processed.

    Fourth, you have a couple of more obvious software overheads. Issuing a read or write takes time. Software has to interpret the query, formulate a strategy to come up with a solution, and make read requests. Those requests are processed by file system handlers, which translate them into raw disk operations. This means your file system and database software are adding a lot of overhead and latency that will reduce the STR.

    Basically, you would want a heck of a lot of memory and a vast amount of processing power to keep up with the potential of the hard drives.

    • Now, here's something that's been bothering me... He said he's got a 500MB/s STR on the drives. Now, I'm not an expert on DB design, but I really doubt he needs 500MB/s of bandwidth to answer queries.
      • You're absolutely right. I was considering whether or not to raise the point about multiple NICs, but then for a fault tolerant system you should have them anyway from a redundancy point of view.

        The question of necessity (in terms of bandwidth) comes down to the client-server design rather than the database design. The database server may provide lots of information lists to the client (say, a customer list, and then a query will return detailed customer information) which will require a huge amount of bandwidth if there are a lot of clients. OTOH the clients may be primarily interested in correlative queries, in which the database server is doing a lot of processing to come up with a relatively small answer.

        I think the main issue was that the original poster seemed to make the mistake of thinking gigaBYTE ethernet, not gigaBIT, and implying that the networking would not be a bottleneck to the disks.

  • by duffbeer703 ( 177751 ) on Friday February 08, 2002 @11:56AM (#2974420)
    RAID-5 and relational databases are a dangerous mixture. Not only does RAID 5 give you a 50% performance hit, but there are cases where data will be lost or corrupted without you ever hearing about it.

    In the event of partial media failure over time with one or more disks in a RAID set, errors can be introduced into your data that will not be detected by parity checks. Once the drive runs out of sectors to remap, you'll eventually have data that cannot be reconstructed by the ECC code on the drive.

    Also, in the event of total drive failure, the rebuilding process performed automatically by the controllers can reduce overall performance by up to 85%.

    RAID 10 is the way to go. Not only do you get the highest possible level of performance and redundancy, but you suffer no performance hit during a single failure.

    Don't read this post and scoff "I've never had drives break like that". I've worked in some large data facilities (i.e. 400-500 terabytes of storage) and have seen entire defective batches of 200 brand-new disks. Although hardware failures happen much less than they did in the past, they can and do happen every day.

    So my advice to you:

    1. Keep your current Fibre-channel configuration.
    2. Buy more drives than you need, max out your array.
    3. Backup the data, ditch RAID-5 and build RAID 10 volumes.
    4. Reload the data, carefully plan where your busy tables and transaction logs are to avoid hot disks.
    5. Conduct a thorough analysis of how your data is accessed and rearrange the volumes accordingly. Re-analyze everything every quarter.

    You have reached the point of maximal return looking at your performance issues from the POV of a system administrator. You need to get a very smart DBA or start reading at this point. Designing your physical layout around your queries is the only way to pull significant performance increases out of your system (except for getting rid of RAID-5).

    Also, your performance expectations are too high for x86 equipment. You are never going to push out 100MB/sec from a database, even with trivial queries and optimized tables.

  • Nobody is complaining about performance. The reason I'm doing this research is mostly for my own educational purposes, but if I can increase the performance for relatively little cost, great! I picked this server because it's our only high-end server. So much money was put into buying a "server in a box" and I'd like to get a better understanding of everything involved. While I'm on the topic of self-education, last night I started reading the PC Guide [pcguide.com]. I would appreciate suggestions on additional reading material that can help me understand PC architecture, performance, etc. For those interested in security, I also downloaded the Rainbow Series and the CCITSE v2.1 for later. <Big Grin>

    In my original post I was thinking giga"byte" when I picked the NIC (my bad). I could wait for the 10 gigabit NICs to be released, or I could team up several 100baseT NICs as another reader suggested (thanks!). I should have thought of that, but nobody's perfect. I also forgot that the NIC(s) need a slot. I would have to place them on the 64-bit 33MHz bus, which would further decrease the overall STR from the original plan. I double-checked the specs on the mobo. The server has 3 PCI buses, not 2 (again, my bad). There's the 64-bit 66MHz bus with 2 PCI slots, a 64-bit 33MHz bus with 5 PCI slots, and a 32-bit 33MHz bus with 1 PCI slot. It's got 2 P3 800 Xeons and 1.5 GB of memory. It's running NT4 (SP6a and post patches) and SQL 7 (SP3 and post patches).

    I'm extremely green in the database arena. In fact, it was just a month or so ago that I learned Access. Don't laugh! Again, suggested reading material is appreciated. I'm sure we all have an area in computers that we ignored completely until it was forced into our lives. My stronger areas and the focus of my career path are networking and security (System, network, physical, etc.).

    In regards to the software our company runs, please don't bash MS. It won't help me learn anything I don't already know. I don't agree with their business practices and think that open source is the way all software should be for the good of mankind and progress. On the other hand, I did NT4 server support for Microsoft (under one of their outsourcers) and prior to that, Windows 2000 Professional support. I have a firm belief that they're great products if you know how to use them properly. The same goes for XP (I flunking HATE 95/98/ME). When it comes to vulnerabilities and exploits, the only flaw is the administrator who doesn't install patches and doesn't understand why a properly configured firewall is a good idea. There aren't many worms or hackers that exploit unannounced vulnerabilities.

    • Your last paragraph makes me think you're a troll.

      In case you aren't: get the book "An Introduction to Database Systems" by Date.

      Also look at the comp.databases.informix newsgroup. There is a lot of good high-level (and troll-free) discussion of database concepts and issues in addition to Informix-specific stuff.
  • Reads from Raid 5 spread over lots of disks tend to be pretty quick (quicker than Raid 0+1 / 10) since you've got more spindles active. There very well may be some kind of highly optimized raid 0+1 style controllers that utilize all spindles for reads -- that would negate the raid 5 advantage there, but I haven't seen such a controller yet.

    Writes are what give you the 4x performance hit. For a read-intensive database (e.g.: datamart, datawarehouse) it might make sense to trade off some write performance for read performance.

    As far as the question goes, it doesn't make a lot of sense. If you're worried about raw, sustainable transfer rates (e.g.: ftp downloads) then your tuning options are very different than if you are worried about how many transactions per second your DB is able to handle.
    • RAID 1/0 with plaid striping (the process of saying block1=diskA, block2=diskB, block3=diskC, block4=diskA) has got RAID 5 beat down for both writes AND reads, for exactly the reason you stated: "more spindles active". Standard RAID 1/0 does say 'fill disk A, then overflow into B, then into C', but that is what plaid striping is for.
      If you have the money for the hard drives, since we are still talking a RAID 1 (read: double your hard drives, double your fun) based solution, then there is NO reason not to use 1/0 plaid. Why accept the myth that the hard drive HAS to be the bottleneck for transactions? And there is nothing quite like running an 'ls' and seeing a whole panel of drives light up.
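
      A minimal sketch of the round-robin block-to-spindle mapping being described, using a hypothetical set of five mirrored pairs (the block1=diskA, block2=diskB pattern is the parent's; everything else is just an example):

      # Round-robin ("plaid") placement: consecutive logical blocks land on
      # consecutive mirrored pairs, so a large sequential read or write touches
      # every spindle instead of filling pair A before spilling onto pair B.

      mirrored_pairs = ["A", "B", "C", "D", "E"]     # example: five mirrored pairs

      def pair_for_block(block_no):
          """Which mirrored pair holds a given 1-based logical block."""
          return mirrored_pairs[(block_no - 1) % len(mirrored_pairs)]

      print([pair_for_block(b) for b in range(1, 13)])
      # -> ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B']
      # Reads can additionally alternate between the two halves of each mirror,
      # which is where the round-robin mirror-read gain comes from.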
