
Comment Re:"they should have used ZFS or btrfs" (Score 5, Interesting) 304

Dubious backups? Depends. We had a 6TB cluster that was notoriously difficult to back up. This went on for years: it took too long, failures caused issues downstream, and so on. Then someone took a moment to realise that the application could not actually re-use that 6TB of data if it was restored - once the data came in it was processed and archived. To recover the application, all they had to do was back up a few gig of config and binaries, and restart slurping data from upstream again. Voila - the backup was stripped down to almost nothing, 6TB a day less data to back up, and next to no failures because the backup was now so quick.
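
For what it's worth, the stripped-down backup amounted to little more than something like this (paths are hypothetical - the point is what gets excluded):

    # Back up only config and binaries; the 6TB of processed data is
    # deliberately excluded because the app re-slurps it from upstream.
    tar cf - /opt/app/etc /opt/app/bin | gzip > /backup/app-$(date +%Y%m%d).tar.gz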

Then there is the case of an application where the vendor and the application developer signed off on a backup solution using a daily BCV snapshot. What they failed to tell us was that the application held data not only in a database, but also in a 6GB binary blob file buried deep in the application filesystem. If the database and the binary were out of sync in any way, it could mean missed or replayed transactions, or a generally inconsistent application. As this was an order management platform, that was bad. You can guess the day we found out about this dependency... yup, data corruption. Bad vendor advice screwed the binary file, and all we had to go on was a backup some 23 hours old, in which the database had been backed up an hour after the application. Because of a corresponding database SNAFU, the recovery point was actually another day before that, with the database having to be rolled forward. It was at this point we found out that, despite the signed-off backup solution, the vendor's documented recommendation (which was never supplied to us) was that the only good backup was a cold application backup - not possible on a core order platform. Thankfully, after some 56 hours of solid work the application vendor managed to help sort the issue out and the restore from backup was not actually needed. The backups were never really tested because the DR solution worked on SRDF - data corruption was never really part of the DR design consideration (at a very high level, not just on this platform).

So there you have it. Two dubious Enterprise backups - one not needed, the other not usable.

Comment Re:Their site... (Score 4, Insightful) 454

Because unless they state that they are only publishing positive reviews, it is misleading to show that all feedback from "users" is positive. Filtering out the negatives is deceptive because it portrays the product as good based on what is supposedly unbiased user feedback, as opposed to vendor advertising.

For advertising, yes, of course you only show positive reviews - it stands to reason to choose whatever supports the product (a movie, etc.).

Comment Re:Virtualisation missing (Score 1) 113

The T-series (the sun4v platform, actually) has LDoms, which are very similar to LPARs but a bit more simplistic in their implementation. You virtualise storage and networking in a control domain (i.e. like a VIO server) and create domains out of the available threads and memory on the box, so you can run an individual OS in each LDom. It now even has live migration, where you can move a running LDom between two machines (akin to VMware VMotion, or the LPAR equivalent whose name escapes me now).
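
To give a flavour of it, carving a guest domain out of a control domain goes roughly like this (device names and sizes are made up, but the ldm subcommands are the real ones):

    # On the control domain: define virtual disk and network services,
    # then build a guest from spare threads and memory.
    ldm add-vds primary-vds0 primary
    ldm add-vsw net-dev=e1000g0 primary-vsw0 primary
    ldm add-domain ldg1
    ldm add-vcpu 8 ldg1
    ldm add-memory 8G ldg1
    ldm add-vdsdev /dev/dsk/c1t1d0s2 vol1@primary-vds0
    ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1
    ldm add-vnet vnet1 primary-vsw0 ldg1
    ldm bind ldg1
    ldm start ldg1
    # And the live migration mentioned above:
    ldm migrate ldg1 root@other-box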

I see zones and LDoms complementing each other, though. Zones are great for "environment" isolation, where you can make multiple copies of the same application in zones using cloning and the integration with ZFS. You make an LDom to separate any applications that have real OS versioning restrictions, and put zones within the LDoms to separate environments.

That way you can patch a particular application as you want, but you can easily provide new/fresh/cloned application environments using zones.
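
To illustrate, cloning a zone is about this quick (zone names are hypothetical; with a zonepath on ZFS, zoneadm clone takes a ZFS snapshot/clone under the covers, so it is near-instant):

    # Export the source zone's config, adjust the zonepath, import it,
    # then clone and boot the copy.
    zonecfg -z appzone1 export -f /tmp/appzone2.cfg
    # (edit /tmp/appzone2.cfg: set zonepath=/zones/appzone2)
    zonecfg -z appzone2 -f /tmp/appzone2.cfg
    zoneadm -z appzone2 clone appzone1
    zoneadm -z appzone2 boot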

Comment Re:Not a good idea to publish this (Score 1) 113

What workload needs 64 POWER or SPARC procs anymore? More often than not, anything that needs that much CPU is horizontally scalable anyway, in which case buying twice as many T-series boxes would be cheaper. Most of the time the reason you have boxes with 64 CPUs installed is for partitioning with LPARs or domains.

And for scaling, it all depends what you are doing with the box. We have an application which consumed a full 48-core E6900 (i.e. the box was 100% on CPU) because it ran all its components on the one box. We moved a component that made up 50% of the load onto a single T5240, which then ran at only 15% CPU with improved response time (granted, it was a Borland Java application, which suited a T-series box a lot better).

For the cost of an E6900 uniboard we could buy 2-3 T5240s to replace the E6900, and those T5240s would handle about six times the load (if half the E6900's load uses 15% of one T5240, a single T5240 has headroom for roughly 3x the whole E6900, so two boxes gets you to about 6x).

Comment Re:Pathetic accusations (Score 2, Interesting) 189

As I recall, it was something to do with the routers: if they lost power, they lost their configuration - a measure to make sure that if gear was stolen, it didn't come back up with any of the secure network's details.

From memory, someone viewed this as him setting up some sort of time bomb rather than good security practice, and charged him as such.

Comment Re:That shows a serious lack of initiative (Score 1) 1164

Well, a fallible point in your logic is the assumption that the creator was "god", whereas creationism in its pure form could/should pertain to any "creator". It doesn't; it presumes the certainty of "god" as the creator and steps off from there. Using your argument, I could ask you to present evidence of this "god" as the creator.

Don't get me wrong - there is nothing that proves evolution outright, except that what it is based on is provable and demonstrated in nature. It does not necessarily complete the picture back to the dawn of time, but the pattern is so consistent and simple that it requires far fewer leaps of "faith" to paint the picture.

I see it more as the arrogance of "man" as a species to presume we could not have evolved from apes or supposedly lower species, which also fits with the invalid belief that we have "dominion" over all life on earth. Maybe we are doing pretty well now with this whole "technology" thing since we started out throwing rocks, but nature has shown time and again that we are not in charge. The belief that we are made in the image of "god" serves more to comfort us into believing we are actually special, not just along for the ride.

But good on you for keeping an open mind.

Comment Re:So, what is the status of btrfs? (Score 4, Informative) 241

For hardware support it really depends which segment of the market you are arguing about. If you are talking about white-box, low-end, mostly self-supported stuff, then no doubt Linux wins hands down. But as a sysadmin I find Linux to be one of the most painful platforms to work on compared to Solaris or AIX - predominantly because of the lack of standardised, stable and properly supported management interfaces.

Fibre Channel support is a joke. Sure, for the most part you can dynamically bring stuff in and out, and udev goes a short way towards bringing some consistency. The problem is that when something goes wrong you are left with pretty much just rebooting - the messages tell you nothing. Is the device there or not? Usable details are buried away in /proc and /sys and are typically only useful to developers. Solaris and AIX have cfgadm/cfgmgr, lsdev and friends to tell you what state things are in or what has happened, with useful and informative error messages (typically). So far on RHEL 3/4/5 all I ever see is odd octal dumps from drivers when errors occur, and weird hangs and IO errors when devices break. It gets worse as you change fibre drivers and versions: options which exist in one disappear in others, and vendor drivers add customisations which cause other issues.
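
To make the comparison concrete (host numbers are illustrative):

    # Solaris: one command shows every attachment point and its state.
    cfgadm -al
    # AIX: device states are just as visible.
    lsdev -Cc disk
    # Linux (RHEL-era): you go digging per-host under /sys and
    # trigger rescans by hand.
    cat /sys/class/fc_host/host1/port_state
    echo "- - -" > /sys/class/scsi_host/host1/scan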

The lack of stability in terms of being able to do things between versions gets me as well. On AIX or Solaris, you write a script once - say for Solaris 8 - and it just works going forward to other versions. Solaris 10 changes things a bit, but for the most part you can still poke around in the same places, or the same way, to get information back. In short, they tend not to break things that work.

Linux goes the other way: a change is made, and that's that - it seems to be up to you to either track it or figure it out. You find yourself having to customise things for many, many variations of platform - not just major versions, but minor versions as well. Config file locations change, the way those files are defined changes, and so on.

Don't get me wrong - I got into UNIX via Linux, and I won't dispute its strength in drivers or community, but that community is not "Enterprise" focused. It's why I use it for my PVR and not my file server. The rapid changes in Linux are why the DVB-T cards I got became supported so quickly after the hardware changed. I get the differences, but it's not one size fits all.

Comment Re:So, (Score 1) 241

Read the ZFS white paper. Just because the disk checks its blocks doesn't prevent other sources from corrupting, overwriting or generally tampering with data. For example, say your el cheapo fibre card corrupts one bit in every 2 billion writes - on disk it's fine, SMART never sees it, nothing ever complains. When ZFS reads the corrupted blocks it will see a checksum error, and repair them if necessary.
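
A scrub surfaces exactly this kind of silent damage (pool name hypothetical):

    # Walk every block in the pool, verify checksums, and repair
    # from the good side of the mirror where possible.
    zpool scrub tank
    zpool status -v tank    # the CKSUM column shows what was caught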

It also doesn't cover the case of deliberate/administrative corruption, such as accidentally overwriting the wrong disk. With a normal mirrored device you could read off either side; it simply returns the data, and the data would be blockwise "correct". With ZFS, again, you would see the failures, and where possible it could correct them. In fact, this is how the early demonstrations of ZFS worked - simply using dd to clobber one half of the mirror and watching it fix itself.
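
That demo is easy to reproduce with scratch files (names hypothetical - do not try this on disks you care about):

    # Build a mirrored pool on two scratch files, clobber one side,
    # then scrub and watch ZFS repair from the surviving copy.
    mkfile 128m /var/tmp/d0 /var/tmp/d1
    zpool create demo mirror /var/tmp/d0 /var/tmp/d1
    cp /etc/motd /demo/
    dd if=/dev/urandom of=/var/tmp/d1 bs=1024k count=32 conv=notrunc
    zpool scrub demo
    zpool status -v demo    # checksum errors on d1, data still readable
    zpool destroy demo      # clean up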

And for ZFS I would absolutely recommend ECC memory - for exactly the case I hit where a blown capacitor caused random memory errors on a motherboard, and any new or modified file would throw up checksum errors when re-read. Without ZFS I probably would not have known until I got some weird panics from corrupted metadata or similar.

Comment Re:Well I suppose... (Score 3, Insightful) 370

Really? What sort of test was it?

We took a Java application off an E6900 where it was using 35% of 48 1.35GHz US-IV cores. We put it on a T5240 with 16 1.4GHz cores and saw it use only 14% of the machine, with improved user response time.

We also ran a database benchmark for some tests we were doing between some AIX and Linux boxes, and threw it at a T5240 running Oracle 11g for comparison. Because it was a predominantly single-threaded operation it ran slower than on the 2.2GHz POWER5 LPAR, but the overall difference was about the same ratio as the difference in clock speeds. The thing to note was that the machine was only a few percent utilised, so we could have run another 16 or so instances and coped easily.

These machines are workhorses. Granted, you need the right workload, but highly parallel/highly transactional work like Java web applications or web serving absolutely flies.
