Sun Microsystems

Sun Releases Grid 5.2 for Linux 90

Linux_ho writes "Grid 5.2 is a distributed processing engine that runs on Solaris, and now Linux. Apparently it has been released under "an industry-accepted open source license," but I couldn't find out which one. The product is designed to make use of the spare cycles of any idle Solaris or Linux machines on your network. Sun mentioned in the press release that it can be used for frame rendering, but I bet you can come up with some other interesting applications. Here's the FAQ."
This discussion has been archived. No new comments can be posted.

  • by Wills ( 242929 ) on Friday February 02, 2001 @06:35AM (#461899)
You might want to check out LAM 6.3.2 [nd.edu], an implementation of MPI that has been around a lot longer than Grid. MPI is a library for writing parallel programs that execute across groups of workstations.
  • by mliggett ( 144093 ) on Friday February 02, 2001 @07:19AM (#461900) Homepage
    If you have a heterogeneous environment and are interested in harvesting spare cycles, then Condor [wisc.edu] might be a better bet than Sun's Grid. Condor is free as in beer, but the license does not cover source, and source is not freely available at this time. Condor works on more UNIX variants and also works (in limited fashion) on NT. I have done a little work with Condor and it is a very advanced and well-put-together system.

It is very easy for each machine owner to restrict or set preferences on which jobs will run on his machine, for each job to express preferences for certain machine attributes, and for the queueing system to distribute net CPU time fairly across all active users of the system. All of this works through a very simple C-like language in which you express your desires.

  • It has been removed. Thanks to whoever removed the gross picture that I responded to...
  • Beowulf clusters might speed things up at a lower cost.

  • > This doesn't look like the GPL I know...

    They never claimed it was under the GPL. The press release states, "Sun will distribute Sun Grid Engine software under an industry-accepted open source license in order to accelerate the adoption of the distributed computing model."

    Take note that "industry-accepted" is an adjective that means absolutely nothing other than there are other applications distributed under the same license. Also take note that, "in order to accelerate the adoption of the distributed computing model," roughly translates to, "so more people will use it."

    If you go to the "Download" page, you'll find the license statement, as in all other Sun downloads. It says:


    The product is free and there is no software enforcement that restricts distribution. The product is issued under a standard Sun Binary Code License.

    As many of you already know, the SBCL is Sun's version of Open Source that allows them to control the modifications done to the product.
  • And it HAS been removed. So quit griping. Oh, wait, I see you are the guy who posted the picture of yourself with your fist up your butt, and you are disappointed that no one can see it now... Right?
  • It sounds similar in concept to Condor [wisc.edu], being developed by the CS department at the University of Wisconsin-Madison. Unfortunately, the Condor folks aren't letting their source code out except in special cases, and then you have to beg them for it.
  • It isn't GPL. From the licensing section of the download page:

    The product is free and there is no software enforcement that restricts distribution. The product is issued under a standard Sun Binary Code License.

  • It's still there. They don't remove posts, they moderate them down. Right now it's at -1: Troll. You probably don't see it because your threshold is set to 0.
  • You want to see pictures of fat naked men with their fists up their Butts. Sorry, I didn't mean to piss off a bunch homo's who NEED to see this kind of disgusting image.

    I did not realize that Slashdot is where men go to see naked men doing lewd acts to each other.

    To each their own, but that is certainly not for me!
  • they just lie and tell you what you want to hear then they do what they wanted anyway.
  • I think this would be a great idea. Perl would be perfect for this. Consider:

    • Perl is widely ported to many many architectures.
    • Perl has a large library of existing code, including many high-performance math libraries, many of which are already highly portable and fast, being written in C or C++ via XS. (The fact that Perl is slower than one would like for scientific computation is mitigated by the fact that all compute-intensive operations could be implemented by simply calling into C++ libraries.)
    • Perl has a number of schemes for executing scripts in secure "boxes," such as the Safe module, which limit the types of operations available to those which may not modify the host computer.
    • Perl allows one to combine program code and data into a single stream, by using the __DATA__ filehandle.

    Really, this could be implemented very quickly: clients could simply HTTP GET data blocks (complete Perl scripts) and HTTP POST the results.

    Data integrity could be ensured by randomly testing clients, sending them data blocks for which the answers are already known. The ratio of known-answer test blocks to real work blocks could be varied with the perceived level of trust in the Hive.
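    That known-answer auditing idea is easy to sketch. Here is a minimal Python illustration of the server-side bookkeeping; the class, the trust numbers, and the probabilities are all invented for the example, not from any real implementation:

```python
import random

class ClientRecord:
    """Tracks how much the server trusts one client (hypothetical scheme)."""
    def __init__(self):
        self.trust = 0.0  # 0.0 = untrusted, 1.0 = fully trusted

    def audit_probability(self):
        # Untrusted clients get mostly known-answer blocks; trusted ones few.
        return 0.5 - 0.4 * self.trust

def pick_block(client, work_queue, known_answer_blocks, rng=random):
    """Return ("audit", block) or ("work", block) for this client."""
    if rng.random() < client.audit_probability():
        return ("audit", rng.choice(known_answer_blocks))
    return ("work", work_queue.pop(0))

def record_audit(client, returned, expected):
    """Update trust after a known-answer block comes back."""
    if returned == expected:
        client.trust = min(1.0, client.trust + 0.1)
    else:
        client.trust = 0.0  # failed an audit: distrust completely
```

    The only real design decision here is how fast trust rises versus how hard it falls; the numbers above just make failure much more expensive than success.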

  • by Dr.Dubious DDQ ( 11968 ) on Friday February 02, 2001 @08:34AM (#461911) Homepage

    At least, it doesn't look that way to me, browsing Sun's documentation.

    The nice thing about Mosix is that it automatically handles migrating any existing process to wherever it thinks the process will finish quickest. Last time I played with Mosix, I put it on an old P-100 laptop with no L2 cache and a K6-2/350. When not connected to the network, using LAME to generate MP3s on the P-100 went at about 15% of "real-time" speed. Connected to the network, it went at about 85%, since the process could be automatically migrated to the K6-2. (It goes at about 105% or so when run "natively" directly on the K6-2/350.) I suspect the difference would have been even more dramatic had I been running on something faster than a 10BaseT hub for the networking. Other than the kernel patch for Mosix, all of the software on both systems was standard "off the shelf" Linux software. (Compiling, I noticed, also went substantially faster, using plain old GCC, for example.)

    It looks like with "gridware" you actually have to "submit a job" to the handler using a separate program. I can't tell for sure just browsing the documentation whether you can submit ANY process, but it does look like it has to be done 'manually' in any case.

    Gridware looks pretty neat, but I get the impression it'll be of more use to technical people who have need to distribute particular types of jobs, and have the resources to set up a "compute farm", and have technical enough users to make good use of it. Other than the installation, Mosix doesn't seem to have this limitation (but on the other hand, Mosix is Linux only [I think even ix86 Linux only, but don't quote me on that] and requires patching the kernel).

    Now, if Mosix would get a 2.4 patch out, I could get it set up again at home...

    "They have strategic air commands, nuclear submarines, and John Wayne. We have this"
  • by Anonymous Coward
    There are a lot of features I wish they'd implement in the core kernel, but that unfortunately will probably be rather slow in coming: ACLs, XFS, JFS, ext3, etc. (They finally did ReiserFS.) I suspect Mosix will be even further down the list.
  • when I was trying to download StarOffice for a Solaris machine at work. I was going to look into it, but by the time I had filled out Sun's humongous registration form multiple times, tried to download StarOffice multiple times, and gotten a message that they wouldn't approve my download for export, I had lost all interest in Sun. The funny thing is that I'm in the US and work for a large US company. Sorry, Sun, but you have got to get rid of the Microsoft-style tactics before I'll come back.
  • yup. and Condor can also be combined with GNU Queue, which also does distributed job processing via shell scripts, for finer-grained control of which servers process which jobs.
  • by Anonymous Coward
    Mandrake (I'm pretty sure) actually ships a MOSIX-ised kernel. It's NOT the kernel that's installed at first, but the MOSIX-ised one is just an rpm away after you've initially installed. I don't think anyone has yet taken this in the direction of a standardized commercial application, as in "Redhat Professional Gold2000 Webfarm-in-a-Box". If they have, I haven't heard the tests and success stories yet. Linux Virtual Server, which works at a protocol level (versus MOSIX's general-purpose process migration), has received more attention as a high-availability option and a method to get heavy traffic accommodated through load balancing and fail-over.

    There is also TurboLinux's EnFuzion clustering, which is somewhat like MOSIX's general-purpose clustering tech and is in fact already being used by, for example, J.P. Morgan in NYC and London at "kilonode" strength for derivatives analysis. Actually, it simply distributes loads of data from the same set to be processed in an identical way by any given number of hosts on your network. So it isn't a for-real supermachine the way MOSIX is (many boxen, shared mind). Why more isn't known about TurboLinux's product isn't clear to me. I suppose it could be because EnFuzion runs not only on Linux but also on NT (as well as Aches, Irritix, Horsepucks, and Slowaris), and Linux people aren't hot for a solution that makes NT look good. Ever. What's worse, it's not GPLed.

  • Yeah... That'll solve all their problems. Everyone knows that goatse.cx is the only gross picture on the Internet.
  • by volsung ( 378 ) <stan@mtrr.org> on Friday February 02, 2001 @06:45AM (#461917)
    I've had an interest in a distributed computing framework that would allow people to create a project like seti@home, distributed.net, and the like, without having to solve the same cross-platform issues every single time.

    The basic idea is to have a virtual machine (of sorts) that provides an API friendly to algorithm implementation (lots of math and data-manipulation functions). The virtual machine can limit both CPU utilization and memory/disk usage by the actual distributed program. The program is written in a scripting language (grab your favorite one) that can be compiled on the fly. The API functions would be implemented in the fastest possible way for each platform.

    You could design the virtual machine so that users could easily add programs to it for background execution. The client's security would be ensured by the resource limits enforced by the virtual machine and the lack of "dangerous" features in the scripting language.

    I never was able to solve the data integrity issue in a satisfactory way, though. Rogue clients in this scheme could always submit bogus results to the server. That's not catastrophic, but it means that the distributed platform could not be used in an uncontrolled environment like the Internet. If anyone has some ideas on how to solve this problem, feel free to post or email me. (Or you could go patent them and maybe make yourself some money.)

    Oh yeah, I also thought that "Hive" would be a cool name for such a program. :)
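    For the CPU/memory-limiting half of that virtual machine, you don't even need a custom VM on Unix. Here is a rough Python sketch using process resource limits; the function name and the specific ceilings are my own invention, and a real sandbox would also have to restrict what the code can do, not just how much it uses:

```python
import os
import resource

def run_limited(fn, cpu_seconds=5, max_bytes=1 << 30):
    """Run fn() in a forked child capped by hard CPU-time and address-space
    limits (Unix only); return True iff it finished normally."""
    pid = os.fork()
    if pid == 0:  # child: apply the ceilings, then run the untrusted work
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
        try:
            fn()
        except BaseException:
            os._exit(1)  # includes MemoryError from hitting RLIMIT_AS
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

    A work unit that tries to grab far more memory than the ceiling simply fails inside the child without touching the parent.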

  • This is an open forum and people are allowed to post what they want, whether you feel it's appropriate or not.

    You should set your threshold to +1. The post you are referring to has been moderated down to -1: Flamebait. Since it was posted by an AC, it would have started out with a score of 0. If you had your threshold set to +1, you would never even have seen the post, along with all the other AC posts, flamebait, etc.

  • How about "A community-accepted open source license" instead?

  • by sql*kitten ( 1359 ) on Friday February 02, 2001 @06:49AM (#461920)
    Beowulf clusters might speed things up at a lower cost.

    Which means buying a roomful of kit to build one out of. Grid is designed to run jobs on your existing hardware while it's idle - the rest of the time, they're all still general purpose, interactive workstations running regular applications. The Beowulfs of which I am aware use dedicated hardware.

  • These days I don't think the bottleneck for the average computer network is the CPU. More likely it's the FSB or the PCI bus. Disk access is also a bottleneck, though not as much these days.
  • That sounds like a rather long oxymoron.

  • I like the XML idea. Transparent conversion from structures (in the C sense) to XML would be a must for the Hive API.

    Are there any scripting languages that could be easily modified to serve as the basis for such a system? Candidates should be:

    • Securable - Memory management, package access, and resource access (disk and network) should be strictly controllable.
    • Fast - On-the-fly compilation that may cause a performance hit at the start of processing, but in the long term performance approaches that of compiled code.
    • Extensible - Ability to link to native C/assembly/whatever libraries that implement the API.
    • Nice to program - Syntax is functional and/or nice to use.
    • Modifiable - Can be modified to add any of the above features which are missing.

    I'm leaning toward Python simply because Guile will scare away potential users and Perl would be too hard to lock down and/or modify. Does anyone know about how Python scores on the above criteria?
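    For what it's worth, here is roughly what the "lock down the namespace" approach looks like in Python. Fair warning: stripping builtins like this is NOT a real security boundary in Python (determined code can often escape it), so treat this purely as an illustration of the limited-API idea, with made-up names:

```python
import math

# A curated namespace: only math-flavored operations are reachable.
SAFE_GLOBALS = {
    "__builtins__": {"abs": abs, "min": min, "max": max,
                     "range": range, "len": len, "sum": sum},
    "sqrt": math.sqrt,
}

def run_script(source):
    """Execute a script with a curated namespace; by convention the script
    assigns its answer to a variable named `result`."""
    env = dict(SAFE_GLOBALS)
    exec(source, env)
    return env.get("result")
```

    Anything outside the curated set, like `open`, simply isn't a defined name inside the script, which is the flavor of restriction the criteria above ask for.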

  • Has anyone actually tried to download the Linux version yet? After you get through everything (give name, log in, etc.), the button at the end is just a pretty GIF. BROKEN!
  • You need trusted server(s) to control the jobs.

    You don't. You need knowledge of clients sufficient for your task, and you can deal out the job to them as necessary. Think of... uh, what's the name of that distributed "Napster replacement" file-sharing system again? A network of clients, each advertising its available resources (processing power, memory, storage; a price for those?).

    two or more distant clients and compare the results.

    If there's a virus/worm rampant changing the way the clients compute, N results can be equal but still wrong :( Need workaround for this. Then again, the risk is always there even with "trusted" environments.

    Jobs should still be anonymous and abstract, even if the client owner should have the right to see what kind of things are happening on his machine. An obvious job should be split into less obvious parts.

    "The Net is vast."

    With this I meant there are always more clients available elsewhere than you have on your own network :)

  • this is to inform you that you are infringing on copyrights, trademarks and patents held by apple computer inc of cupertino california. the term mosx refers to mac os x and that abbreviation, similar variations, and colored plastic are the sole intellectual property of apple. please cease and desist immediately
  • Gridware is definitely an also-ran in the distributed environment at the moment. Companies who use this type of product generally go for LSF [platform.com], a commercial product. Many big IC design houses use LSF to harvest the spare cycles from workstation CPUs and to run their compute farms. Sun is one of those companies. If you want a GPL equivalent, go for GNQS [gnqs.org].
  • Good info. I was moving away from Perl because I didn't think it would be easily securable. I hadn't considered the Safe module, however.
  • As a follow-up: Python also has a feature for creating a "sandbox" for Python programs called RExec (for "restricted execution").
    GridEngine also comes with a "grid-enabled" interactive tcsh, so you can have an interactive shell running which is actually spawning work all over the compute farm, as resources are available.

    Sweet! How would that work?

    tcsh% hostname
    tcsh% hostname
    tcsh% hostname
    tcsh% hostname
  • Could this maybe be used at universities, where there is a need for massive computing jobs? Our computer halls are usually less than 50% full during the day, and for most of the night they're basically empty. That seems like a waste of cycles. Couldn't this be used for academic research or SETI-like activity? I'd rather trust data from a known source than data from whatever crackhead on the Internet. Maybe the schools could even sell off CPU time? That'd be great: hundreds of SPARCs and PCs chugging away all night, every night... well, maybe not in California.
  • I'm doing a lot of replying to myself, but I keep finding good info:

    This guy explains how to convert XML DOM objects to Python objects: A Closer Look at Python's [xml.dom] Module [gnosis.cx]

  • I never was able to solve the data integrity issue in a satisfactory way, though. Rogue clients in this scheme could always submit bogus results to the server. That's not catastrophic, but it means that the distributed platform could not be used in an uncontrolled environment like the Internet. If anyone has some ideas on how to solve this problem, feel free to post or email me. (Or you could go patent them and maybe make yourself some money.)

    One way to solve this is to submit the same work unit to multiple clients, each chosen at random from the pool of available clients. The results from all the clients are then compared, and the result is accepted only if they all agree. This way a small number of rogue clients will be unlikely to produce wrong results. (This of course assumes that rogue clients are not widespread, and it doesn't guarantee that bad data won't be returned.)

    Additionally, you could have the work unit itself perform computations which are unique to each work unit. This is mixed with the actual work you care about in such a way as to make it very difficult to perform one computation without the other. For example, you toss in some data, at random, which when processed returns a known result. This data is mixed with data on which you actually want to have some computation performed. Since the client (and the server for that matter) won't know what part of the data is what, if any of the computation is not performed correctly then it will be detected by the server.
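    The first scheme (same unit to several randomly chosen clients, accept only on agreement) fits in a few lines of Python. Here `clients` maps names to callables standing in for real network calls to those machines, which is my own simplification:

```python
import random
from collections import Counter

def verify_unit(unit, clients, k=3, rng=random):
    """Run one work unit on k randomly chosen clients and accept the
    result only if all k agree; otherwise return None (redo the unit)."""
    chosen = rng.sample(list(clients), k)
    results = [clients[name](unit) for name in chosen]
    value, votes = Counter(results).most_common(1)[0]
    return value if votes == k else None  # unanimity required
```

    Requiring unanimity rather than a majority trades throughput for safety; with untrusted Internet clients that is probably the right trade.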

  • If you read http://www.sun.com/gridware [sun.com], you will note that it says:

    Sun will distribute Sun Grid Engine software under an industry-accepted open source
    license in order to accelerate the adoption of the distributed computing model.

    Note that they haven't distributed it under an open source license yet.
  • by Anonymous Coward
    For security reasons, I think Perl is too powerful a language. One would really want a scripting language that is pretty much math-oriented, and that's it. Let the interpreter/compiler figure out whether to make a 1000x1000 array a file in /tmp or a buffer in memory. The distributed algorithm will be easier to write and easier to maintain. The client is harder to subvert into becoming a remote cracking station for some 31337 cracker, since it can only do math-like functions. And my general level of sanity increases.

    What makes perl the first choice for you, is also what makes it the last choice for me. Kinda ironic, isn't it.
  • by bluelip ( 123578 ) on Friday February 02, 2001 @05:59AM (#461936) Homepage Journal
    Yes, I know it's from Sun so it's probably stable. I'd hate to see it crash.

    Grid-Lock sucks.

    C'mon guys, it's a Friday
  • There is a timeline at the end of the announcement that says the source will be released anytime soon (2001 ;). This is only a 'binary free' release. And the other thing: how does this relate to PVM, MPI, and other parallel libraries?
  • by gus goose ( 306978 ) on Friday February 02, 2001 @06:07AM (#461938) Journal
    Grid is a push-based system, monitoring the activity on a set of servers and pushing work to the more idle ones.

    This is great, but I believe more in the SETI@home approach: let the idle servers pull work down.

    Everyone who has worked with distributed computing knows that the application really has to be designed with careful attention to the distribution model. How about a more generic solution: say, an XML-based data and programming unit (in a language with multi-platform capabilities like Java or Perl) queued on a controlling server, with a farm of slave servers pulling down a unit during idle time. It could be something similar to:

    nice -19 jobpoller --controlhost=control.server.com

    Picture this as a backend to a website processing CGI, etc.

    Anyone interested in forming a subscription based distributed computing project with me drop me a mail...
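    The pull model described above is pleasantly small to prototype. A sketch of one poll iteration in Python, with the HTTP GET/POST to the control host abstracted into callables (all the function and field names here are invented for illustration):

```python
def poll_once(fetch_unit, run_unit, post_result, is_idle):
    """One iteration of a pull-model worker: if the local machine is idle,
    fetch a work unit from the control host, run it, and post the result.
    Returns True iff a unit was processed."""
    if not is_idle():
        return False                 # busy: don't steal cycles
    unit = fetch_unit()              # e.g. an HTTP GET of an XML-described unit
    if unit is None:                 # queue empty on the control server
        return False
    post_result(unit["id"], run_unit(unit["payload"]))
    return True
```

    A real jobpoller would wrap this in a loop with a sleep between polls, running under `nice -19` as in the command line above.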

  • Nice to see, especially the comments at the bottom about the road to widespread usage including not just open source release but also the standardisation of an API to allow compatible implementations.

    Amusingly, the biggest-horsepower compute farm in the company is likely to be the collection of people using office apps. You can't easily buy processors below around 700MHz these days, and desktop apps do little but wait for keypresses.

    The only real impact people would notice would be in memory footprint. Those high-res wallpapers and ten copies of Excel (or StarOffice, whatever) like to slurp up RAM and stay resident. So unfortunately, it is unlikely to go unnoticed by those donating their cycles... which is the real killer.
  • by SEWilco ( 27983 ) on Friday February 02, 2001 @06:10AM (#461940) Journal
    Well, there's no mention of an interface for Juno distributed processing...
  • This so-called 'open-source' license will merely be another proprietary ownership by Sun deal. Just like Java. You know the old refrain, "It is open but Sun owns it, Sun controls it, Sun makes money from it, and Sun takes your development efforts and sells them back to you!"
  • Not my homepage, but I totally agree with its stance on Verizon.
  • I was waiting for someone to make that comparison. MOSIX shares memory and CPU cycles across machines in a network, but as far as I know, MOSIX is quite platform-dependent (it is a Linux kernel patch right now). I'm also not sure if they have released a 2.4.x kernel patch yet, since I haven't seen any announcements on freshmeat or Linux Today about it.

    Does anyone know how cross-platform this is, or if the Linux and Solaris boxes aren't able to talk to each other? The main advantage over MOSIX that this seems to have is the cross-platform thing. Other than that, they (at least to the casual observer) seem to be in the same arena.

    (As an aside, I've been watching MOSIX to see if any major distributions are going to capitalize on it. Definitely a killer feature, but I hope it is eventually brought into the kernel proper....)


  • This doesn't look like the GPL I know...

    Sun Grid Engine 5.2.2

    Sun Microsystems, Inc.

    Binary Code License Agreement READ THE TERMS OF THIS
    a non-exclusive and non-transferable license for the
    internal use only of the accompanying software and
    documentation and any error corrections provided by Sun
    (collectively "Software"), by the number of users and the
    class of computer hardware for which the corresponding fee
    has been paid.

    2. RESTRICTIONS Software is confidential and copyrighted.
    Title to Software and all associated intellectual property
    rights is retained by Sun and/or its licensors. Except as
    specifically authorized in any Supplemental License Terms,
    you may not make copies of Software, other than a single
    copy of Software for archival purposes. Unless enforcement
    is prohibited by applicable law, you may not modify,
    decompile, or reverse engineer Software. You acknowledge
    that Software is not designed, licensed or intended for use
    in the design, construction, operation or maintenance of any
    nuclear facility. Sun disclaims any express or implied
    warranty of fitness for such uses. No right, title or
    interest in or to any trademark, service mark, logo or trade
    name of Sun or its licensors is granted under this

    3. LIMITED WARRANTY. Sun warrants to you that for a period
    of ninety (90) days from the date of purchase, as evidenced
    by a copy of the receipt, the media on which Software is
    furnished (if any) will be free of defects in materials and
    workmanship under normal use. Except for the foregoing,
    Software is provided "AS IS". Your exclusive remedy and
    Sun's entire liability under this limited warranty will be
    at Sun's option to replace Software media or refund the fee
    paid for Software.


    In no event will Sun's liability to you, whether in
    contract, tort (including negligence), or otherwise, exceed
    the amount paid by you for Software under this Agreement.
    The foregoing limitations will apply even if the above
    stated warranty fails of its essential purpose.

    6. TERMINATION. This Agreement is effective until
    terminated. You may terminate this Agreement at any time by
    destroying all copies of Software. This Agreement will
    terminate immediately without notice from Sun if you fail to
    comply with any provision of this Agreement. Upon
    Termination, you must destroy all copies of Software.

    7. EXPORT REGULATIONS. All Software and technical data
    delivered under this Agreement are subject to US export
    control laws and may be subject to export or import
    regulations in other countries. You agree to comply
    strictly with all such laws and regulations and acknowledge
    that you have the responsibility to obtain such licenses to
    export, re-export, or import as may be required after
    delivery to you.

    being acquired by or on behalf of the U.S. Government or by
    a U.S. Government prime contractor or subcontractor (at any
    tier), then the Government's rights in Software and
    accompanying documentation will be only as set forth in this
    Agreement; this is in accordance with 48 CFR 227.7201
    through 227.7202-4 (for Department of Defense (DOD)
    acquisitions) and with 48 CFR 2.101 and 12.212 (for non-DOD

    9. GOVERNING LAW. Any action related to this Agreement
    will be governed by California law and controlling U.S.
    federal law. No choice of law rules of any jurisdiction
    will apply.

    10. SEVERABILITY. If any provision of this Agreement is
    held to be unenforceable, this Agreement will remain in
    effect with the provision omitted, unless omission would
    frustrate the intent of the parties, in which case this
    Agreement will immediately terminate.

    11. INTEGRATION. This Agreement is the entire agreement
    between you and Sun relating to its subject matter. It
    supersedes all prior or contemporaneous oral or written
    communications, proposals, representations and warranties
    and prevails over any conflicting or additional terms of any
    quote, order, acknowledgment, or other communication between
    the parties relating to its subject matter during the term
    of this Agreement. No modification of this Agreement will
    be binding, unless in writing and signed by an authorized
    representative of each party.

    For inquiries please contact: Sun Microsystems, Inc. 901
    San Antonio Road, Palo Alto, California 94303


    These supplemental license terms ("Supplemental Terms") add
    to or modify the terms of the Binary Code License Agreement
    (Collectively, "the Agreement"). Capitalized terms not
    defined in these Supplemental Terms shall have the same
    meanings ascribed to them in the Agreement. These
    Supplemental Terms shall supersede any inconsistent or
    conflicting terms in the Agreement, or in any license
    contained within the Software.

    and agree that you must first obtain a separate license from
    Sun Prior to reproducing or modifying any portion of the

    2. NO SUPPORT. Unless you have entered into a separate
    support agreement with Sun, Sun is under no obligation to
    support the Software or to provide to you any updates or
    error corrections (collectively referred to as "Software
    Updates"). If Sun provides any Software Updates, whether
    pursuant to a support agreement or otherwise, at its sole
    discretion, the Software Updates will be considered part of
    the Software and subject to the terms of this Agreement.

    3. BACK-UP. You have the sole responsibility to protect
    adequately and backup your data and/or equipment used in
    connection with the Software. Sun shall not be liable for
    any lost data, re-run time, inaccurate output, work delays
    or lost profits resulting from your use of the Software.

    4. TRADEMARKS AND LOGOS. You acknowledge and agree as
    between you and Sun that Sun owns the Sun, Solaris, and Sun
    Grid Engine 5.2.2 trademarks, and all Sun, Solaris and Sun
    Grid Engine 5.2.2 related trademarks, service marks, logos
    and other brand designations ("Sun Marks") and you agree to
    comply with the Sun Trademark and Logo Usage Requirements
    currently located at http://www.sun.com/policies trademarks.
    Any use you make of the Sun Marks inures to Sun's benefit.

    For inquiries please contact: Sun Microsystems, Inc. 901
    San Antonio Road, Palo Alto, California 94303
  • Programs do not crash on Linux!

    Give me 50 pushups private bluelip!

    Soldier! What are you doing with Microsoft Windows?!
  • Sorry sir!

    Won't happen again! Don't know what could possibly come over me. Must've been that liverwurst last night.
  • True, but he said "we spend lots of money on the newest and fastest machines," so I'm assuming it's at his business. Of course, he'd still have to rewrite the code to run distributed either way. I wonder if there's a distributed system that could leverage existing multi-threaded code? Then you'd get the advantages of running distributed without having to rewrite.
  • Means accepted by Sun Microsystems.
  • by Pogie ( 107471 ) on Friday February 02, 2001 @06:58AM (#461949)

    The GridEngine system from Sun is an LSF-type batch-queueing/load-balancing system. Sun bought Gridware, previously an independent German company, and is looking to bundle the GridEngine with its workstations, promoting the 'spare cycle' idea.

    In answer to your question, yes, GridEngine can run anything. It isn't an MPI-type implementation that requires you to modify your code. GridEngine allows you to set up multiple execute resources based on processor type, OS, memory, disk, I/O, run-queue usage, or really any heuristic you want to implement. You submit your job with whatever resource requirements you need, and GridEngine runs it on the available resource that meets them. There's also a product called GRD, available around Q3, which allows you to further allocate resources on a more policy-style basis. This piece will be a licensed add-on, but it gives the enterprise the ability to divvy up compute-farm resources on the basis of users and groups, etc.

    GridEngine also comes with a "grid-enabled" interactive tcsh, so you can have an interactive shell running which is actually spawning work all over the compute farm as resources are available. There's also an "enabled" make, which does the same thing for builds.

    It's pretty neat, but I think it's more effective in a dedicated compute-farm type of installation than a "let's use spare desktop cycles" kind of installation.
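    The resource-matching step described above (submit with requirements, run on a host that satisfies them) reduces to something like the following sketch. The attribute names and the "least loaded wins" tie-break are my guesses at the flavor of it, not GridEngine's actual algorithm:

```python
def pick_host(job_reqs, hosts):
    """Match a job's resource requests against advertised host attributes
    and return the least-loaded host that satisfies them, or None.
    hosts: {name: {"arch": str, "mem_free": int, "load": float}}"""
    candidates = [
        (attrs["load"], name)
        for name, attrs in hosts.items()
        # missing requirements act as wildcards
        if attrs["arch"] == job_reqs.get("arch", attrs["arch"])
        and attrs["mem_free"] >= job_reqs.get("mem_free", 0)
    ]
    return min(candidates)[1] if candidates else None
```

    The interesting part is that "resources" are just key/value predicates, which is why a scheduler like this can match on any heuristic you care to advertise.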

  • ... Man, imagine a Beowulf cluster of these things!

  • Wonder if there's a distributed system that could leverage existing multi-threaded code?

    Threads are paths of execution through the same process context. It's very difficult to distribute them without a single system image, which is what Irix [sgi.com] can do. In general, if you want to distribute a task across multiple machines, each part of the task must be able to run self-contained, and be incorporated within the overall result of the computation. This would work very well for, say, rendering, because you can simply give each node a frame to do, then assemble all the frames into a movie. But it would be very difficult to break a scene apart into objects, render each one of those on a different machine and then incorporate them into a new image, because the different machines wouldn't be able to compute the effects of a shadow of one object on another.
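    To make the frame-by-frame idea concrete, here's a toy sketch (the node names and the round-robin assignment are purely illustrative; a real system like Grid decides placement from load, not round-robin):

```python
def assign_frames(frames, nodes):
    """Round-robin each self-contained frame job to a render node.

    Each frame can be rendered independently, so no node ever needs to
    talk to another -- exactly the property that makes rendering easy
    to distribute.
    """
    schedule = {node: [] for node in nodes}
    for i, frame in enumerate(frames):
        schedule[nodes[i % len(nodes)]].append(frame)
    return schedule

# 10 frames spread across 3 hypothetical render nodes
schedule = assign_frames(list(range(10)), ["node-a", "node-b", "node-c"])
```

    Assembling the rendered frames back into a movie is then a single serial pass at the end.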

  • How about EnFuzion [turbolinux.com] from TurboLinux? I've never had the opportunity to use it, but I came across it while doing related research for a financial services client who needed to perform some pretty hairy analyses. Said analyses could wait until after the markets closed and so could take advantage of literally thousands of otherwise idle machines. (The client ultimately decided to outsource the analyses, so we never pursued this approach.)

    Key features of EnFuzion (these are lifted from a product marketing page so I can't comment on their veracity) include:

    • No need to rewrite or modify code.
    • Maximize system performance with multiple load monitoring and resource-sharing options.
    • Highly secure: EnFuzion's advanced security features bolster the standard security provided by Unix and Windows NT systems.
    • Easily integrate EnFuzion into your applications with EnFuzion's application programming interface (API).
    • Monitor and control EnFuzion execution using a familiar language such as C, Java, Perl, Bash, Tcl/Tk, etc.
    • Easily integrate EnFuzion with any shell or scripting language.
    On the surface this would seem to fit wiredog's needs.

  • My initial aspirations were for a trusted network, the server farm being on a dedicated and trusted net. SSL with careful access control could provide enough security over public nets to be useful.

    Hive would be a good name, with clients (servers) running 'drones'.

    Applications I was thinking of, as well as certain CGI, were things like spell checking, things similar to Babelfish, creating thumbnails from images, etc. Each unit packaged in an XML structure, with the code to support the function. Again, having trusted code is paramount. Easiest to use in a controlled environment.

  • Simply put, Sun is going to make Grid a core part of their software offerings so that all their own customers can use it pretty easily.

    Along with that, the Sun people are *very* committed to making Grid available everywhere. They are committed to supporting as many architectures as the code can be ported to.

    I'm not going to name my company because I'm not sure if I'm allowed to yet but we are one of the groups gearing up to offer Grid support for Linux and all other non-Sun hardware architectures.

    I really see Grid not competing with code like PBS and Condor. It will probably annoy the Platform Computing people who make LSF because they charge so much $$ per CPU that LSF simply does not make sense on clusters built from commodity hardware.

    Just my $.02


  • This could give rise to a whole new breed of Seti@homers: those of us who either work at huge corporations with tons of extra hardware lying around after all the Y2K redundancy, like mine [sprint.com], or at little dotcoms with more hardware than customers.

    Now we'll show you some real power, silly @homers. A Beowulf cluster of E10K... drool.

  • by GoodFastCheap ( 209947 ) on Friday February 02, 2001 @08:18AM (#461956)
    I guess this is news... maybe not, though since the Condor Project [wisc.edu] has been available for a whole lot [wisc.edu] of platforms for quite some time now. (Yes, Linux is supported.)
  • Are you kidding me? Do you use Linux? Programs crash all the time. The platform stays stable, and at worst (most of the time) restarting X is all that's needed (read: Netscape). Even some apps that claim "stable" status crash every now and then. Be a zealot, just don't be a foolish one. /rant
  • I'd agree with you on that. I'd also add that the x86 architecture is quite possibly a part of the issue - it's, shall we say, over-evolved. Motorola makes a much better-designed CPU, and its applications prove it.

    No, I don't like Macs. At least those with OS X.


  • by dsouth ( 241949 ) on Friday February 02, 2001 @10:22AM (#461959) Homepage
    Gridware isn't all that new, and it isn't a reaction to Mosix or SETI@home.

    Batch systems have been around a long time in the HPC world. Gridware was originally developed by GENIAS Software GmbH. GENIAS produced a batch scheduler called Codine, which was a commercial version of DQS [fsu.edu]. In fact, Sun's Grid Engine FAQ [sun.com] even states that Sun Grid Engine is a new name for CODINE.

    Of course, DQS/Codine/Grid isn't the only batch-scheduling/cycle-scavenging game around. Other players are:

    • Condor [wisc.edu]
    • openPBS [mrj.com] and its commercial version PBS Pro [pbspro.com]
    • Load Leveler [ibm.com] (which IIRC is IBM's commercial implementation derived from Condor)
    • LSF [platform.com] which is the product Sun was previously co-marketing until they purchased Gridware (probably because of the high per CPU cost of LSF).
    • and lots of others that I've forgotten, many based on the once-common NQS/NQE batch system.
    • There are also systems like Legion [virginia.edu] that represent a sort of "next step" computing environment.

    Many of these predate newcomers like SETI@home and Mosix by several years. Most also provide hooks into parallel computing APIs like MPI [anl.gov], PVM [ornl.gov], OpenMP [openmp.org], or something similar.

    Batch scheduling and cycle-scavenging are old concepts. Having wasted my graduate-school years submitting large quantum chem jobs to Crays, I'm glad to see lots of groups continuing to squeeze every useful cycle out of existing hardware. Sun's recent announcements are just the latest update to an old product---not a new idea, and not a Mosix/SETI rip-off.

  • Make that 'those without OS X or linux.' Bloody maldafication. :)


  • FYI - Condor [wisc.edu] supports all of the platforms that you mention in your post. I don't know how useful it will be in your particular situation, though.
  • Don't get the idea I'm bashing Sun for NOT GPLing it; I was actually replying to an earlier poster's statement that it was "probably GPL".
  • Whose definition of "Open Source" is it?

    The definition belongs to the Open Source Initiative (OSI).

    If they can't decide on a known license, force them to post the license as "Unknown".

    That's putting the onus on Freshmeat to keep track of OSI's approvals and update their own list. It's far easier to just list the big five or six and group the rest under "Open Source". There's no need to worry. Freshmeat checks their submissions, and if anyone is lying it doesn't get put up.

    Even writing a custom license is better than using the generic term "Open Source".

    It's not open source unless it uses an OSI-approved license. It would have been nice if OSI had gotten their trademark on Open Source, but that changes nothing. "Open Source Software" is a much more specific term than "Free Software". While Microsoft can get away with calling IE "free software", it would be an outright lie for them to call it "open source software".

    I have confidence that if Sun is using an "industry accepted Open Source license" it will be one approved by the OSI.
  • Well, it depends what kind of network you are referring to. For a network of number-crunching computers, yes, the network is the bottleneck. For a network of web-serving computers, disk access usually comes into play. If it's a network of programs that generate a lot of chatty traffic (Windows), then the network is your bottleneck :) (though in the specific case of Windows, everything seems to be your bottleneck..)
  • I was wondering if it's part of this [web.cern.ch].

  • Random checks won't be very good at detecting bad clients. They could introduce random errors on only 1/1000 of the returned data -- and if only 1/1000 of the data is checked, the probability that a given item is both corrupted and checked drops to 1e-6.

    The only way of getting reasonable results from untrusted clients is to have the results computed independently by N clients. You could also first try to get the problem solved, get a preliminary answer to play with, and then continue with the verification.
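    A minimal sketch of the N-client idea, assuming a strict majority vote decides acceptance (the vote threshold and the resubmission policy are choices the scheduler would have to make):

```python
from collections import Counter

def majority_result(results):
    """Accept a work unit only if a strict majority of the N
    independently computed copies agree; otherwise return None so the
    scheduler can reissue the unit."""
    value, votes = Counter(results).most_common(1)[0]
    return value if votes * 2 > len(results) else None

# Three honest clients agree; a single rogue can't win a 3-way vote.
accepted = majority_result([42, 42, 42])
rejected = majority_result([42, 7, 13])
```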


  • What I was wondering about: can this thing run anything, or do programs need to be written for it? I couldn't find a clear answer to that skimming through the website. However, an example job from the manual does suggest so:
    #!/bin/sh
    # This is a simple example of a CODINE batch script
    # Print date and time
    date
    # Sleep for 20 seconds
    sleep 20
    # Print date and time again
    date
    # End of script file

    If it can truly run just any program, it would be really cool.
  • I'm not sure if it's GPL'd, but it may have to be. UT Austin did some work on distributed memory allocation that blew the current Solaris implementation out of the water. Fortunately, that work was GPL'd. Go Blum-Foe! ;-) The GPL Virus at Work.
  • The applications of such a package are numerous. It is good to see that there is an open source life for a project of this nature. Granted, you do have to design carefully to properly use the distributed nature of the software. However, many problems lend themselves to this approach. Any CS algorithm nerds would know that anything requiring "dynamic programming" falls under the domain of this software. Anyway, a few friends and I were going to write a similar package approximately a year ago, only to find out that there were already people working on the project. Being lightweights in industry at the time, it didn't make sense to pursue. However, I will definitely peruse the source of this one. Any possibility that this is a Sun product without a GUI? Their software is so damn stable, but what's the deal with Forte and Star? Are they trying to be telepathic? I can finish my own damn word without you reminding me how it might finish. All those event listeners bog down the run-time speed. Maybe we'll get lucky and there'll be no GUI this time.
  • by hoegg ( 132716 ) <ryan...hoegg@@@gmail...com> on Friday February 02, 2001 @06:18AM (#461970) Journal
    1. Sun Microsystems is a large corporation.
    2. Microsoft is putting a lot into its .NET initiative.
    3. Sun has been nurturing Java for over 4 years.
    4. Both .NET and Java provide platforms for distributed processing in desktop applications.

    My opinion is that this is the beginning of an enterprise computing paradigm that Sun hopes will give Java an edge in the desktop market, after Microsoft's 15-year reign.

    Imagine an entire office of computers efficiently sharing resources. I get up for coffee, my cycles are used for my co-worker's application compile. He goes to lunch, his cycles are mine for Unreal Tournament.

    I think it's got potential.

  • I guess I'll be the n-th person to point here [sun.com] where it says:

    The product is free and there is no software enforcement that restricts distribution. The product is issued under a standard Sun Binary Code License.

    Now who'll come up to explain what that means?

  • Threaded applications use shared memory to communicate between threads. Distributed shared memory schemes that aren't carefully tailored to the application they're designed for are very slow right now.

    An application currently has to be split up in such a way that the different parts communicate with each other as little as possible, because communication overhead is large.

    Even then, it helps if they communicate with each other in a way in which the results arrive at the computer that needs them before they're asked for, which is even harder.

    SETI@Home works because there's virtually no communication between computers. The only thing that could make SETI@Home faster is to start downloading a new data set just before the current one is finished so you never have a break in doing the hard, CPU intensive mathematics.
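    The prefetching idea can be sketched like this (fetch_unit and process_unit are stand-ins for the real download and number-crunching; the only point is that the fetch of unit i+1 overlaps the processing of unit i):

```python
import queue
import threading

def run_worker(fetch_unit, process_unit, total_units):
    """Keep one work unit buffered so the CPU never idles on the network."""
    buffered = queue.Queue(maxsize=1)

    def fetcher():
        for i in range(total_units):
            buffered.put(fetch_unit(i))   # blocks while a unit is already waiting
        buffered.put(None)                # sentinel: no more work

    threading.Thread(target=fetcher, daemon=True).start()

    results = []
    while (unit := buffered.get()) is not None:
        results.append(process_unit(unit))
    return results

# trivial stand-in workload: "processing" squares each fetched unit
results = run_worker(lambda i: i, lambda u: u * u, 5)
```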

  • To use a Beowulf cluster, you'd have to rewrite it using MPI or PVM. I've seen some systems that use distributed memory, basically adding code to the VM that queries the network on a page fault. When I was looking at it, it used gigabit-range bandwidth and was still kinda slow. Either way, you'd have to make the task parallelizable, which is easier said than done (think rewrite). What this *would* help with is if you have an office of machines (Sun/Linux, remember) and you have many models to run, or you can split the models you have into multiple independent models. You wouldn't have to change too much in that case.
  • I never was able to solve the data integrity issue in a satisfactory way, though. Rogue clients in this scheme could always submit bogus results to the server. That's not catastrophic, but it means that the distributed platform could not be used in an uncontrolled environment like the Internet.

    Same solution as in HA problems: redundancy to avoid a SPF. You need trusted server(s) to control the jobs. They can give the same job out to two or more distant clients and compare the results. Comparison can be done by producing MD5s in a second set of clients to reduce server load.

    The servers should not handle the programs or data, just schedule work and tell clients whom to ask for what, and when.

    I'm sorry: double, even quadruple work, but you can't have both an open and a trusted environment. Then again, who cares? "The Net is vast."

    The Internet will slowly turn into a big, amorphous pool of processing power and storage. I'm happy to see that other people are giving thought to this as well.

    A GNU project for a job-distribution VM would be essential in standardizing it.
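    The MD5-comparison step described above might look something like this (hashlib stands in for whatever the second set of verification clients would actually run; the result bytes are made up):

```python
import hashlib

def digest(result_bytes):
    """A client ships back an MD5 of its result, so the scheduler can
    compare large outputs without ever moving them."""
    return hashlib.md5(result_bytes).hexdigest()

def results_agree(digests):
    """Accept the work unit only if every redundant copy hashed the same."""
    return len(set(digests)) == 1

client_a = digest(b"computed output")
client_b = digest(b"computed output")
rogue = digest(b"tampered output")
```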
  • IIRC, Irix achieves this with a NUMA (Non-Uniform Memory Access) architecture. This basically means that all memory has a home, and accessing memory that's not local to your current CPU requires a network request for each access. Most NUMA machines have a dedicated internal network just for memory, but it still needs applications that are designed to use it in order to work, which, of course, is your point. :-)

  • Typical! I got my Mosix cluster going yesterday, and then Sun announces this!

    Mosix involves kernel patches - does this Grid thing?

    Mosix can migrate most processes, but there are some restrictions on threads, shared memory etc.

    There are other queueing systems with process migration, checkpointing, etc. I'm sure a search on the new-look freshmeat.net will find them for you. As usual, Open Source was there first...

  • Let the server stir some known info into the unprocessed data chunk before it goes to the client. Say, 0.1% 'checked' data and 99.9% real, 'unchecked' data. The client must be unable to distinguish checked from unchecked data. The server side knows how the processed checked data should look, by having processed it before sending. After the client's results are received, the server compares the checked data, and if it looks right, the whole processed chunk is accepted.

    Of course, this approach is only useful when data chunks can be 'sliced' thinly enough to add checked data at random positions, and it requires the server to process the checked data itself. Even with a happy 1/1000 ratio of checked/unchecked data, it takes a considerable amount of computing power on the server to run (20x as powerful as an average client for 2e+4 clients). This approach tends to make chunks bigger, too.
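    A toy version of the scheme, assuming the client's "work" is just squaring numbers (the seed-derived positions are the server's private knowledge; everything here is illustrative):

```python
import random

def spike_chunk(real, checked, seed):
    """Server side: splice pre-verified 'checked' items into the chunk
    at positions derived from a private seed, indistinguishable from
    real data to the client."""
    rng = random.Random(seed)
    chunk = list(real)
    positions = sorted(rng.sample(range(len(real) + len(checked)), len(checked)))
    for pos, item in zip(positions, checked):
        chunk.insert(pos, item)
    return chunk, positions

def verify(returned, positions, expected):
    """Server side: accept only if every checked position came back
    with the answer the server precomputed."""
    return all(returned[p] == e for p, e in zip(positions, expected))

chunk, positions = spike_chunk([2, 3, 4], [10], seed=7)
honest = [x * x for x in chunk]   # a well-behaved client squares everything
cheating = [0 for _ in chunk]     # a rogue client returns garbage
```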

  • by Anonymous Coward
    Condor, a distributed system developed by the University of Wisconsin-Madison, does everything Sun's Gridware does and is supported on 15 platforms. Moreover, a glance at Condor's website (www.cs.wisc.edu/condor) reveals that this 15-year-old system has been adopted by a lot of people, notably NASA and NCSA.
  • I work at a small 3D animation shop with a small budget, and as such we have to try to squeeze as many rendering cycles as possible out of the hardware we have. I'm running a rendering farm made up of 6 Intel boxes running Linux and a couple of old Alphas running DEC Unix. However, all our workstations run WinNT or 2000.

    What I would really like to see would be something that I could install on these Windows workstations so that the Linux render servers could use their spare cycles when they aren't being used. Something like mosix [mosix.org], but with Windows clients, would be amazingly useful for us. Does anybody know if such a program exists?

  • Well, do not hope for a quick release of the source. I had information from a developer last December that they had no idea when they would be able to release the sources, and he indicated that it would almost certainly not be before June.
  • Condor is not like Grid Engine (which is in fact Codine/GRD)

    Condor uses idle workstations to do some work.

    Grid Engine includes a batch management system with priorities, different queues, calendars, ...

    It is often used to manage high performance clusters.
  • I used to work on a project at UCSB called Javelin [ucsb.edu]: "Javelin is a Java-based infrastructure for global computing". It's presently a bit more academic than practical, but it seems to fit the bill of what you're looking for fairly precisely. It's a bit better than, for example, seti@home in that it supports more tightly coupled computations (e.g., branch-and-bound). Currently, Javelin supports:
    • piecework computations, where a large chunk of work can be split into smaller chunks, and
    • branch-and-bound computations, like the travelling salesman problem
    Work is in progress, though, on a version with a more general computational model that supports computations with arbitrary DAGs for task creation and data dependencies (a la Cilk [mit.edu]).

    It's highly fault tolerant (uses eager scheduling) and load balanced (uses work stealing).

  • And the other thing - how does this relate to PVM, MPI, and other parallel libraries?

    MPI, PVM, et al are libraries that parallelized programs can use for inter-node communication.

    Sun's Grid Engine is basically a queuing system. To quote Sun's website: "The basis for load management is the batch queuing mechanism. In the normal operation of a cluster, if the proper resources are not currently available to execute a job, then the job is queued until the resources are available. Load management further enhances batch queuing by monitoring host computers in the cluster for load conditions allowing additional utilization of resources."

    Grid Engine, however, seems to be designed to run single-node computations; the website mentions nothing about queuing a job for execution on N machines. This is a major distinction from queuing systems in traditional parallel machines and clusters, where you tell the queuer to run your job on N nodes, and when N nodes are available, it runs N copies of your program simultaneously on those nodes.

    So, your PVM/MPI/etc. programs won't be able to run in parallel on Grid Engine, because traditional parallel programs assume a tightly coupled network of processors (or processes) all running your code simultaneously, and Grid Engine doesn't provide for this. Grid Engine lends itself more to things that machines can do independently, like seti@home.
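    In other words, Grid Engine fits workloads you can pre-split into self-contained pieces. A rough sketch of that decomposition (the chunking and the per-chunk "job" are purely illustrative):

```python
def make_jobs(params, chunk_size):
    """Partition a parameter sweep into self-contained chunks; each
    chunk could be queued as an ordinary single-node batch job with no
    inter-node communication."""
    return [params[i:i + chunk_size] for i in range(0, len(params), chunk_size)]

def run_job(chunk):
    # stand-in for whatever one node would compute on its chunk
    return [p * 2 for p in chunk]

jobs = make_jobs(list(range(10)), 4)
merged = [result for job in jobs for result in run_job(job)]
```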

  • Odds are there is a dual license with that one and this one [openoffice.org]. I would think that that one is for PHBs who don't want open source and the good one is for the rest of us. This is fairly common, and it would make sense that, once they have written a license in-house and had some of the big names in OSS bless it, they will stick with it.
  • by prisoner ( 133137 ) on Friday February 02, 2001 @06:23AM (#461985)
    do programs have to be written specifically to take advantage of this? I do lots of groundwater modeling, and the models can take forever to run. Consequently, we spend lots of money on the newest and fastest machines. It would be nice to use something like this with our models "as-is".
  • I don't think that we need to ban that poster, but I do think that the /. staff needs to put http://www.goatse.cx/ in their "lameness filter". (For those who don't know, the lameness filter blocks people from posting "first post" and posting like 3 times in a minute.)
  • can be found here [cnn.com]

  • This is great, but I believe more in the Seti@home approach: let the idle servers pull work down.

    That won't work in this case, because one thing Grid allows you to do is submit many jobs to the network service it manages, and assign a priority to each one, such that high priority tasks take precedence over low priority tasks. So the system as a whole needs to be able to package and distribute tasks based on criteria other than when they were submitted to the system, and other than their overall size. Seti-like approaches aren't flexible enough for this, because once a node has started a job, it will complete it before submitting its results and requesting more work.
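    The priority behavior described above boils down to something like this (a toy scheduler, not Grid's actual implementation; ties fall back to submission order):

```python
import heapq
import itertools

class PriorityScheduler:
    """Dispatch the highest-priority job first; a lower number means
    higher priority, and equal priorities run in submission order."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()

    def submit(self, priority, job):
        # the counter breaks ties so heapq never compares job payloads
        heapq.heappush(self._heap, (priority, next(self._order), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

sched = PriorityScheduler()
sched.submit(5, "nightly-report")
sched.submit(1, "urgent-render")
sched.submit(5, "log-rotate")
dispatched = [sched.next_job() for _ in range(3)]
```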
