Comment Re:Makes no sense... (Score 1, Informative) 464
Speaking as a systems admin for compute services, and one of two who look after HPC equipment at the university I work at, I'd say the following.
Why would you move the modelers to Linux from Solaris? There is no real advantage....
Moving compute work away from the Solaris/SPARC combination isn't only an advantage for us from a cost perspective; it's also an advantage for compute capability and, most of all, for what users require. One of the big applications used by the scientists I look after shared compute resources for is MATLAB. As of R14 (v7), MATLAB only supports Linux/x86_64 for 64-bit computing, so if you want to do analysis that uses more than 4GB of RAM you're limited to either Linux/Opteron or Linux/Intel with EM64T. Incidentally, we're running it on a V40z, which, while it's a fairly impressive piece of hardware, does have some annoying quirks. One that springs to mind is that the LOM card is absolute rubbish.
Sure a Beowulf cluster is a nice piece of hardware, but hardware can only compensate a bit for programmer productivity...
I'd mention that we don't use Beowulf clustering within mid-range compute. On the HPC platform we do have an SGI Altix, which gives us one major advantage over virtually any other system: a single system image. With our current system we're able to address 64GB of RAM from any one of the 32 processors in the system. Clusters really don't have this capability without taking major performance hits when MPI etc. has to reach out across the interconnect to remote memory; the ccNUMA NUMAlink interconnect running at 20-40Gbit/sec has some massive advantages. It's also worth noting that the 32P Altix cost 100k less than an 8P Sun Fire 6800, not to mention the massive performance advantages the Itanium 2 processors have. While most will go "eh?" at that statement, we've found that Itanium 2s chew through large datasets like soft butter. The speed of these processors can be hard to realise at times because a lot of the optimisation occurs at the compiler level or through the addition of the Intel Math Kernel Library.
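To illustrate the single-system-image point: on a machine like the Altix, ordinary tools see every processor and the whole memory pool at once, with no MPI involved just to reach remote memory. A minimal sketch (the figures reported will obviously vary by machine):

```shell
#!/bin/sh
# On a single-system-image machine, the standard /proc interfaces report
# all CPUs and all RAM as one system -- no message passing needed just
# to see or address memory attached to another node's processors.
grep -c '^processor' /proc/cpuinfo   # total CPU count in the one image
grep '^MemTotal' /proc/meminfo       # one figure for the whole machine
```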
But it *will* set them back in productivity while they move to different compilers and adapt the execution of the program to the Beowulf environment.
This may not be true for all areas, but for us moving from Solaris to Linux wasn't that big an issue compiler-wise. Most of the users were already using GCC on the Sun systems, either because the Sun Forte (SUNWspro) compiler was unable to deal with the majority of the code being developed by their peers at other universities, or due to issues linking against external libraries for applications such as netcdf/hdf/grads/etc. A large proportion of code these days is written by users on the cheapest possible platform, which is generally an x86 system running Linux, and that's what the code is tested on. Few have access to Sun/SGI/IBM/SuperDome-style equipment as development machines, so the big users end up either having to rewrite large chunks of the libraries or their code, or simply move to the more commonly accepted compiler and toolchain.
You mention Trusted Solaris many times. Trusted Solaris doesn't ship with hardware as standard, and on many systems shipping today Solaris 10, which does include a large majority of the Trusted feature set, is being removed and replaced with Solaris 9. Solaris 10 isn't yet accepted within the VAR community as a "supported platform" for the wide range of applications run by scientists, or even for more mainstream corporate applications. While we do run some systems with S10, it's been for specific features such as zones, so we could run multiple squid processes on a single hardware platform.
As for the original article: in the environment of a shared compute facility there should be no reason for a user to need root access to a machine. If they want an application installed for multiple users to access, they should be handing the app to the administration team to install and maintain. One thing that does annoy me within my environment is that the support contracts for the majority of scientific applications are "owned" by a scientist turned computer support officer. This can make sourcing new versions of applications, and getting contract details to log bugs with the vendors, a pain.
If users are somehow managing to screw up file permissions then it sounds like an intro-to-Unix manual needs to be knocked up for them to read over. Sure, I have the occasional user dig themselves a nice little hole, but for the most part, as long as their applications work and the system runs, they're happy.
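When a user does dig that hole, the fix is usually a couple of one-liners rather than a manual. A sketch of repairing a shared project tree (the /tmp/project path is hypothetical; mode 2770 keeps the setgid bit on directories so new files inherit the group):

```shell
#!/bin/sh
# Hypothetical shared tree; simulate a user clobbering group access.
mkdir -p /tmp/project/data
touch /tmp/project/data/run.cfg
chmod 0600 /tmp/project/data/run.cfg
# Repair: group rwx + setgid on directories, group rw on files.
find /tmp/project -type d -exec chmod 2770 {} +
find /tmp/project -type f -exec chmod 0660 {} +
ls -l /tmp/project/data/run.cfg      # now -rw-rw----
```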
Other things would be to try and keep a good global environment by default. Trouble occurs when people have heavily modified their login scripts to source specific bits and pieces, and as versions of libraries are upgraded they're still referencing the old ones in their local .cshrc/.bashrc files. It also helps to have aliases set up to make rm = rm -i for all users, and other simple things like df = df -t ext3 to reduce the possibility of deadlocking on stale NFS mounts.
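Those defaults can live in one site-wide profile fragment instead of each user's dotfiles. A sketch, assuming a conventional /etc/profile.d/site.sh location on Linux (the exact output format of the `alias` builtin varies between shells):

```shell
#!/bin/sh
# Site-wide defaults, sourced at login for everyone -- so an upgrade
# only needs to touch this one file, not hundreds of .cshrc/.bashrc
# copies scattered through home directories.
alias rm='rm -i'         # prompt before deleting
alias df='df -t ext3'    # list only ext3, so df never touches (and
                         # never hangs on) a stale NFS mount
alias rm                 # show the definition users will pick up
```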