Hardware

Science Grid Genesis 166

Cranial Dome writes "According to this CNET.com story, the Department of Energy (DOE) is working to interconnect the first two computers that will form the genesis of the DOE Science Grid, a virtual supercomputing system which will eventually encompass many more systems at several locations. The larger of the two machines is the DOE National Energy Research Scientific Computing Center's (NERSC) IBM RS/6000 SP, a distributed-memory machine with 2,944 compute processors. This machine, together with a smaller 160-processor Intel system, will make up a combined 3,328-processor Unix system with 1.3 petabytes(!) of storage space. And this is only the beginning..."
This discussion has been archived. No new comments can be posted.

  • A Beowulf cluster of DOE Science Grids? Maybe if each country got one, and they linked them together... *G*
    • by Anonymous Coward
Can you imagine a Beowulf cluster of DOE Science Grids SHOVED UP YOUR ASS?!!!!


      (I think Stanislaw Lem wrote about that, IIRC, the story was "The First Sally, or The Trap of Gargantius".)
  • by alen ( 225700 ) on Friday March 22, 2002 @12:01PM (#3207424)
    I guess it's going to be enough space for a full install of the latest Red Hat distro.
  • by ch-chuck ( 9622 ) on Friday March 22, 2002 @12:01PM (#3207426) Homepage
AOL/TW starts mailing out free sign-up DVDs to access their portal to the Science Grid. Within days, messages start appearing in highly technical discussion forums that simply state "Me Too!".

  • ...the large amounts of hardware scattered across the country being linked together to solve (process?) larger problems. So much hardware sitting on desks goes largely unused; this seems to me like the next logical step in computational resources.
    • And this will be going to science, which I think means that it won't be monopolized by military shit. I do hope that they try some cool stuff on this thing. One project using this grid is the Supernova Cosmology Project [lbl.gov] at Lawrence Berkeley National Laboratory. They are sending a bunch of data for image processing. It is wonderful that there will be a giant supercomputer for this stuff.

      An interesting next step might be to have a "Science Grid@home" program that people can run as a screen saver on their PCs, or something. Not for every project, but a little extra programming might be justified to soak up all those unused CPU cycles.

  • 1.3 petabytes (Score:1, Interesting)

    Even the Internet Wayback Machine [archive.org] with its 10 billion web pages can claim only 100 TB (0.1 PB). We could fit thirteen such archives on it.

    A use for this type of power and storage is simulating nuclear detonations. It's possible we no longer have to actually detonate nukes on a test basis.
    • And then there would be no need to build nuclear warheads...
      Everybody would start simulations, and the one with the worst damage loses...

      Rrr... sorry
      • I seem to remember a Star Trek episode about this - you know, one of the tedious, moralistic, crappy ones.

        The long-running war was damaging their artwork and architecture, so instead they ran a simulation, and whenever a strike was successful the computer said how many died, and then the government rounded up that many people and killed them (in some non-gory, painless ST kinda way).

        Please God, take me back to the good old days with Bones using a remote control to make Spock walk because his brain had been stolen, or Kirk doing horse impressions while being ridden by a midget. Oh, yeah.

      • Everybody would start simulations, and the one with the worst damage loses...

        hehehe, I can see it now, some world leader halfway around the world simulates a nuclear explosion...

        "no fucking way you hit me with that last nuke, I was right behind you. This guy's using a fuckin bot!"

        bah, it's payday, I'm in a weird mood :)
    • They are already doing this with the ASCI White supercomputer at Lawrence Livermore. Personally I would prefer to do away with the things entirely and use the money and computing power for something constructive.
    • At some point actual testing will need to be done again, if only to ensure the continued validity of their models. Every day they will be working with increasingly old base data... Otherwise, what they are doing is sort of like load-testing a steel-frame bridge with the assumption that no corrosion is occurring, and being very surprised when it fails, because their (inaccurate) models said it wouldn't...
  • by Ecyrd ( 51952 ) on Friday March 22, 2002 @12:08PM (#3207467)
    According to this paper [microsoft.com], an entire human life takes roughly a petabyte of storage.

    Using the current prices, this amounts to roughly 150.000€. It's not that impossible to store your entire life on a single computer anymore. These guys show that such a thing can be built.
    • According to this paper [microsoft.com], an entire human life takes roughly a petabyte of storage.

      Looks like interesting times for AI researchers. Does AI require as many transistors as the brain has neurons? Does it require the same amount of storage and information? Is there something else needed? Looks like we're soon to answer at least one of these.
      • Well, I think people are starting to realize that it isn't the number of neurons (computation/storage nodes) involved; it's the number of interconnects. So 4 nodes in series doesn't get you much, but 4 nodes in a grid does. This is what I remember from a friend's PhD work on AI. (At the time she was doing her work I was still in high school, so my technical understanding is limited to this rough outline.)
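        A quick bit of graph counting shows why the interconnects dominate; a minimal sketch comparing a simple chain with full pairwise connectivity (plain combinatorics, not anything from her thesis):

        ```python
        # Links by topology for n nodes: a chain has n-1 links, while
        # full pairwise connectivity has n*(n-1)/2. Plain combinatorics.
        for n in (4, 100, 10_000):
            chain = n - 1
            full = n * (n - 1) // 2
            print(f"n={n:6d}: chain={chain:9d}  fully connected={full:12d}")
        # The gap grows quadratically: connectivity, not node count, dominates.
        ```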
    • Using the current prices, this amounts to roughly 150.000€

      Okay, so would that be €150,000 or €150 + 000 cents? I'm guessing you meant the former, but that's still completely "impossible" for most everybody.
      • 150.000€, that is roughly, what, $120,000? Yeah, we use a comma to separate decimals, and a dot to separate thousands - sometimes you get confused when writing English :-).

        Note that the price actually gets spread out throughout all your life. If you started now, you'd need only about $5000 every year to buy the necessary hard drives. And considering the speed at which prices have been going down per Megabyte, it is likely that the original estimate of $120,000 is the upper bound, and the REAL price is a lot lower.
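        For the curious, here is the back-of-the-envelope version of that spread-out cost. The starting price per GB, the 25-year horizon, and the 18-month price-halving period are all assumed round numbers, not figures from the paper:

        ```python
        # Assumed round numbers: ~1 PB total, bought evenly over 25 years,
        # starting at $0.12/GB (~$120,000/PB) and halving every 18 months.
        TOTAL_GB = 1_000_000
        YEARS = 25
        START_PRICE_PER_GB = 0.12
        HALVING_YEARS = 1.5

        total = 0.0
        for year in range(YEARS):
            price = START_PRICE_PER_GB * 0.5 ** (year / HALVING_YEARS)
            total += (TOTAL_GB / YEARS) * price

        print(f"~${TOTAL_GB / YEARS * START_PRICE_PER_GB:,.0f} in year one")  # ~$4,800
        print(f"~${total:,.0f} total over {YEARS} years")                     # ~$13,000
        ```

        Which backs the hunch above: the $120,000 buy-it-all-today figure is very much an upper bound.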
    • Holy shit, those guys are big-time BS artists. That's the kind of stuff guys at Microsoft are doing. I can come up with shit like that by the bucket. For example:


      Suppose my sensory input is on the order of 100 KB/sec (that's pretty conservative, since the visual input alone is probably more than that, say 10 frames/sec at 10 KB per frame).

      If I live to 50 I will have been awake for roughly 10^9 secs. Thus my total informational inflow is around 100 KB/sec * 10^9 sec = 10^11 KB, call it a tenth of a petabyte, right in the paper's ballpark.


      Look, I have a paper right there. Just mix in some inane ramblings about digital immortality and voilà! Too bad I don't work at Microsoft.
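      For what it's worth, the corrected envelope math, using only the assumed inputs above (100 KB/s of sensory inflow, ~10^9 waking seconds):

      ```python
      # The commenter's assumed inputs, nothing from the actual paper.
      rate_bytes_per_sec = 100 * 1024        # 100 KB/s of sensory inflow
      waking_seconds = 1e9                   # ~50 years of waking life
      total_pb = rate_bytes_per_sec * waking_seconds / 1e15
      print(f"{total_pb:.2f} PB")            # ~0.10 PB, same order as the paper's 1 PB
      ```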

  • by gwizah ( 236406 ) on Friday March 22, 2002 @12:10PM (#3207484) Homepage
    Well it seems as though we may now know what Sony Engineers mean by "Distributed Computing"

    Seriously though, what type of security system is the DOE building into this, which is essentially a large mainframe? It's understandable to be worried when the DOE handles things such as nuclear secrets, which sometimes slip into the hands of certain researchers as if they were picking them up at a drive-through.

    I'm curious to see how the data will be encrypted/decrypted along such a vast system.
  • Sweet, first the military creates the internet. Not to be outdone, the DOE creates the... electronet? Does this sound familiar to anyone else though?

    I suppose it wouldn't have the same reach, as it isn't grounded in scientists and universities the way the original was. Wishful thinking, I suppose.
  • Whoo! (Score:3, Interesting)

    by Accipiter ( 8228 ) on Friday March 22, 2002 @12:12PM (#3207501)
    Remember back in '69 when a few government agencies and universities put together a small little network called "ARPANet"?

    It started off with something like four nodes. Look where it is today.
    • This is actually the new method of invention... every few years the government invents a new and better kind of network, we take it over, they get pissed and decide to make an even better one, and the whole process starts over again. Progress!
    • Ummmmm, yeah. Just look at where it is today. Maybe we'd better pull the plug on the Science Grid now while we still have a chance.

      :)
  • my god... i want to pet it and stroke it and i want it to have my babies... -- but seriously, how big is a petabyte exactly? i guess i could do a google search on it?
    • A petabyte is 1024 terabytes. A terabyte is 1024 gigabytes. So you're looking at 1,363,149 gigs of storage.

      That's about enough to hold two and a half millennia of MP3's, or the Microsoft DirectX SDK.
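      To double-check both figures (the 128 kbps MP3 bitrate is an assumption on my part; the thread doesn't specify one):

      ```python
      # 1.3 PB in binary units, plus a rough MP3-capacity estimate.
      # The 128 kbps bitrate is an assumed typical rate, not from the thread.
      gigs = 1.3 * 1024 * 1024                       # PB -> GB (binary)
      print(f"{gigs:,.0f} GB")                       # 1,363,149 GB

      mp3_bytes_per_sec = 128_000 / 8                # 128 kbps
      seconds = 1.3 * 1024**5 / mp3_bytes_per_sec
      years = seconds / (365.25 * 24 * 3600)
      print(f"~{years:,.0f} years of MP3s")          # ~2,900 years: millennia indeed
      ```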
  • 2944 + 160 = 3328? I just wonder... these must be those first-generation Pentiums with faulty math anyway.
    • I was wondering if I was the only one that saw that... I suppose if you put all the processors in a room, dim the lights, some soft music... who knows?
  • SETI (Score:2, Funny)

    by Yoda2 ( 522522 )
    I wonder if they'll run the SETI [berkeley.edu] client on it during non-peak times. We could find nothing that much faster!
    • It has been said that there are two possibilities, both equally mind-boggling: that we are alone in the universe, and that we are not. The fact is, negative results are loaded with implications, just as they are in any other field of science.
      • I want to believe that we are not alone, I'm just not sure that SETI is how we're going to make contact.
  • And what Operating System will DOE be running on this state-of-the-art, bleeding-edge, faster-than-God-intended computer?

    Windows NT 2k2 (laugh thee not, M$ doth speak of such a beast)...

    Oh, and don't forget that wonderful .Net accessibility, where 16-y/o girl geeks can write C# virii to prove women really do hate M$ as much as their fellow male geeks.

    -TG, more power = faster virii production, woohoo!

    PS: In all seriousness (ack, there goes my Funny)... It would be cool if we could put this bad boy to work on some nasty stuff, like Superstring Theory, Proteins, and other Monstrously Huge Data Crunching Projects. But somehow I get the feeling this is going to be a toy for atom-smashers... never something practical or real-world.
    • Now that you mention it, I'm extrapolating how long until Norton Anti-Virus takes up 1.3 petabytes.....

      For Windows- 2008
      Everyone else- 12,234

      *schwing!*
    • Microsoft has always pondered why nobody uses Windows on ultra-high-end hardware such as this. One reason is that organizations that get this kind of hardware want extreme customizability. Microsoft would have to allow these organizations reasonable access to their source code. Even if Microsoft were to do this, the terms would be highly strict, so most people figure, meh, the hell with it.

      Another point is the fact that Windows has only been released on a handful of architectures. To have systems such as this, you need support for ungodly amounts of memory. The best platform for Windows at this point is x86, which is limited without more hacks than are worth the time and money.

      Even with Windows NT on Alpha, Windows didn't come close to tapping the full potential of the architecture. At the time Windows NT was the core product for MS servers, MS had a different agenda. Now that the Itaniums are coming, it's a good bet that MS may want to try their hand at this market... but I don't think they'll get far.

  • Code cracking becomes boring and distributed.net [distributed.net] closes up shop

  • yeah but... (Score:1, Funny)

    by Misfit ( 1071 )
    what kind of Graphics Card does it have?

    Fat lot of good all that super computing is going to do you if your frame rate sucks. You'll be fragged in minutes.

    Misfit
  • by zapfie ( 560589 )
    Some time after, the DOE discovers the machine is being abused by employees for personal use: it's running 42 Quake servers, hosting the world's biggest pr0n archive, holding the top ranking on distributed.net, and taking on Kasparov, all at the same time. One official was quoted as saying vague stuff and not really making any sense.
  • The scheme of it all (Score:5, Informative)

    by fruey ( 563914 ) on Friday March 22, 2002 @12:38PM (#3207681) Homepage Journal
    Go to the link about the actual project. Look at the PDF. It explains things quite well, it's a wicked thang that is happening...

    Here, for the lazy, are some of the objectives:

    • Computational modeling, multi-disciplinary simulation, and scientific data analysis with a world-wide scope of participants and the use of computing and data resources at many sites.
    • High Energy Physics data analysis that involves hundreds of collaborators and tens of institutions providing data and computing resources
    • Observational cosmology that involves data collection from a world-wide collection of instruments, analysis of that data to re-target the instruments, and subsequent comparison of the observational data with simulation results
    • Climate modeling that involves coupling simulations running on different supercomputers
    • Real-time data analysis and collaboration involving on-line instruments, especially those that are unique national resources
    • Generation, management, and use of very large, complex data archives that are shared across global science communities, e.g. high energy physics data, earth environment data, human genome data
    • Collaborative, interactive analysis and visualization of massive datasets, e.g. DOE's Combustion Corridor project
    • Multi-disciplinary R&D that integrates the computing and data aspects of the different scientific disciplines.

    Thus, the applications are enormous. Not that you couldn't do it distributed across desktops à la SETI, but here we're talking data integrity, and let's not forget that even SETI has a kick-ass centralised server setup or the whole thing wouldn't work anyway.

    But especially interesting is the document filename:

    DOE_Science_Grid_Collaboratory_Pilot_Proposal_03_14.nobudget.pdf

    Now, who can get me the version WITH the budget? I want it. Hehe.

    • I'd like to see this type of massive computing power used for a comprehensive effort to map the human brain (in an undertaking similar to the Human Genome Project.) Large numbers of optically scanned brain slices (or high-res MRI data) could be input, and abstract representations of the nerve cell connections could be generated. Then a massive effort to simulate and explore large chunks of the brain could begin using this behemoth. I wonder if something like this could be in the works in the near future. Anyone have any information about this?
      • IANAB (biologist), but I think the problem is that nobody understands exactly how the brain works. Yes, we know that there are these neurons sending electrical signals to each other, but I don't think there is any theory on how this ultimately gives rise to the cognitive processes in the brain. Not that I'm saying supercomputers would be useless in brain research; this article [wired.com] mentions some IBM guy planning to simulate how the "electric storms" during an epileptic seizure propagate, or something like that.
        • >>I don't think there is any theory on how this ultimately gives rise to the cognitive processes in the brain...
          This is definitely true, but the simulation of large neural networks modeled faithfully after sections of the animal brain would at least give us a way to start quantifying information processing capabilities of various sections of the brain. This would be a start to developing a theory about how these sections work together to form cognitive processes. This is starting to drift way off-topic, but oh well..
    • While I think this is an interesting experiment in pooling parallel resources, there are also enormous challenges involved.

      Anyone who has ever used a parallel machine quickly realizes that in most "interesting" problems, a great deal of inter-processor communication is involved. Even apparently "trivially" parallelizable tasks, such as a CG ray-tracing of a shot from a movie scene, often carry bottlenecks which limit their degree of parallelization. For instance, in the ray-tracing case, even though each ray can indeed be traced independently of the rest, each processor must store the 3D volumetric model it is rendering in memory. Eventually the size of the volumetric model exceeds the memory capacity of the processor, and rays must then be swapped among processors. The same limitations apply to any number of other tasks -- data mining (where one needs to search for correlations in a huge volume of data, too large to be stored on a single processor), simulation (where hyperbolic, or even more bandwidth consumptive, parabolic or elliptic PDEs are often solved), etc...

      Achieving good load balance in parallel applications is a key challenge in computational science today. It's quite fair to say that on the current generation of IBM SP2s, which are the most common architecture in high-end computing, the parallel performance for most applications is poor at best. Slapping on an additional machine, with an even tighter bottleneck over the network between them, is not going to magically solve any problems. It is going to push the state of the art for a very LIMITED set of applications a bit further, but a lot more work at the hardware and algorithmic levels needs to be done before MOST applications can really benefit from the scale of these machines.

      Bob
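      To make the communication point concrete, here is a toy speedup model in the spirit of Amdahl's law with a communication term bolted on. The serial fraction and per-processor communication cost are invented illustration values, not measurements of any real machine:

      ```python
      # Normalized runtime: serial fraction + parallel share + communication
      # overhead that grows with processor count. Constants are illustrative.
      def speedup(p, serial_frac=0.05, comm_cost=0.001):
          t = serial_frac + (1 - serial_frac) / p + comm_cost * p
          return 1.0 / t

      for p in (1, 16, 256, 3328):
          print(f"{p:5d} processors -> speedup {speedup(p):6.2f}")
      # 1 -> ~1.00, 16 -> ~7.98, 256 -> ~3.23, 3328 -> ~0.30: past the sweet
      # spot, adding processors makes the job slower, which is the point above.
      ```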
  • I couldn't find how they plan on interconnecting the nodes... I've always thought setups like this were rather hindered by their ability to pass messages quickly between nodes. If it's just standard slow WAN link like a T1, I suppose this would end up becoming more like a distributed.net model, and less an actual 'supercomputer' like the headlines imply. If I'm correct, there's a rather large difference in the applications.
    • Try 40Gbit for the TeraGrid project between Chicago & LA. However, most of the system is likely to be on 10Gbit (and, in the near future, 20Gbit) backbones.

  • Quick, someone tell Linda Hamilton to head for the mountains! Her unborn child will be the only one to stop all of this madness!

  • To be really effective, all existing and future US government computers oughta be networked to this or a similar system. I think it would be a real boon:

    1) could reduce the future (taxpayer) costs for "supercomputer grade" applications.
    2) could be applied to help solve socio-economic problems in addition to the 'hard' sciences
    3) would get "bang for the taxpayer's buck" by utilizing the idle horsepower of publicly purchased computers

    I do think, however, that they should employ a commercially available distributed computing platform, such as that from www.ud.com. I don't feel that tax dollars need be spent on duplicate research in that area.

    -
    • Most of the university and research sites in the UK and Europe are already doing this. Of course, we have the LHC going online in 2007, which means we need to, since the amount of data it will generate (10+ PB a year), as well as our other experiments, makes today's stuff look kinda small.
  • Pretty soon we'll have three of these in a big underground control room that make our every decision. We'll call them Melchior, Balthasar, and Caspar. But then we'll need little kids to fight to save us from attacking angels in giant part-human machines...

    sigh... oh well, i guess Evangelion is getting a little closer though.
    • Sometimes I wonder if, in the distant future, the creators of things like this will give them names like you said, simply because that's what history called them. Sort of like creating your own history, if you will.

      Like creating a time machine, going back to the Bible's time, and walking around telling everyone I'm Jesus. Because of that, there really was a Jesus...me!

      Then my brain starts to hurt from all of this time-travel paradox thought, and I think about something else.
  • ...if they used it to run a simulation of climate and discovered that the Science Grid was responsible for global warming.

    (insert your comments about how hot Company X's chips run below)

  • by pridkett ( 2666 ) on Friday March 22, 2002 @01:25PM (#3208024) Homepage Journal

    It's a little surprising that this got posted and all, because it's not all that earth-shattering news, but I'll provide some additional information about grids in general.

    There are a wide variety of systems like this that are either currently available or are being developed. Among them are Particle Physics Data Grid [ppdg.net], NEESGrid [neesgrid.org] and various European [eurogrid.org] and Asian [apgrid.org] counterparts.

    The basic premise is to allow access to various resources you don't have at your desktop. This is not to be confused with putting all these computers together, forking a process a billion times, and having it run all over the globe. It's more like saying: I have a process that requires 128 processors and 4GB of RAM; go find it a home and run it for me.
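    In other words, the grid does matchmaking between a job's stated requirements and what each site advertises. A toy sketch of that idea (site names and numbers are invented; real grids do this through Globus/Condor machinery, not code like this):

    ```python
    # Toy resource broker. Sites and figures are invented for illustration.
    sites = [
        {"name": "site-a-sp",    "free_cpus": 512, "free_ram_gb": 256},
        {"name": "site-b-intel", "free_cpus": 96,  "free_ram_gb": 48},
    ]

    def find_site(cpus_needed: int, ram_gb_needed: int):
        """Return the first site that can satisfy the job's requirements."""
        for site in sites:
            if site["free_cpus"] >= cpus_needed and site["free_ram_gb"] >= ram_gb_needed:
                return site["name"]
        return None

    print(find_site(128, 4))  # -> 'site-a-sp': send the 128-CPU job there
    ```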

    Most of the systems use Globus [globus.org], which is pretty much the de facto standard. There are other systems out there, such as Legion [virginia.edu] and Condor [wisc.edu], which serve slightly different purposes.

    I've also seen some issues about security raised, so I'll mention them quickly. Globus is built upon an API called GSS-API (Generic Security Services); I believe it will soon have (if it doesn't already have) an RFC published. This is a layer on top of various other security systems that may be local to the server running it. It can use Kerberos or PKI to do encryption across the network (don't flame me if this is wrong, I'm not a security expert).

    When I wish to start using the grid, I start up my proxy, which takes care of all authentication for me. Then my proxy connects to the gatekeeper on the remote machine, which authenticates me based on my private key and then authorizes me via a mapping (usually just a text file). The task is then executed by the gatekeeper via the mapping on the remote machine. Input and output can be redirected over a secure layer if you so desire.

    My certificate is issued by an authority, in this case the Globus CA. The nice thing is that if you want to set up a grid of your own computers, you can get a cert from them too. Install Globus and it will tell you how.

    Certificates also allow you to get access to data. This allows me as a user A to run program B at site C providing results to user D at site E for a period of time F.
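    Sketched as code, the proxy -> gatekeeper -> mapping flow above looks roughly like this. Every name here is invented to mirror the description; the real mechanism is GSI proxy certificates plus the gatekeeper's map file, not this code:

    ```python
    # Hypothetical sketch of the gatekeeper flow described above.
    GRID_MAP = {  # certificate subject -> local account ("usually just a text file")
        "/O=Grid/OU=GlobusCA/CN=Alice Example": "alice",
    }

    def verify_signature(subject: str, signed_request: bytes) -> bool:
        """Stand-in for real public-key verification against the CA chain."""
        return signed_request.startswith(subject.encode())

    def gatekeeper_submit(subject: str, signed_request: bytes, task: str) -> str:
        # 1. Authenticate: the user's proxy signed this request.
        if not verify_signature(subject, signed_request):
            raise PermissionError("authentication failed")
        # 2. Authorize: map the certificate subject to a local account.
        user = GRID_MAP.get(subject)
        if user is None:
            raise PermissionError("no map entry for this subject")
        # 3. Execute as that local user (only reported here, not performed).
        return f"running {task!r} as {user}"

    subject = "/O=Grid/OU=GlobusCA/CN=Alice Example"
    print(gatekeeper_submit(subject, subject.encode() + b"|signed", "ls"))
    ```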

    It's all terribly neat and remarkably easy to install on your favorite Linux or Solaris box. It's also fairly easy to write programs to utilize the Grid thanks to the various CogKits [cogkits.org] for Python, Java and Perl.

    • These grids are all great and wonderful as far as peak performance is concerned, but I'm wondering how the latency associated with long-haul networks affects performance for the range of applications that are not embarrassingly parallel.
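      One way to size that worry is to compare the time for a single message exchange on a machine-room interconnect versus a cross-country link. All numbers below are assumed round figures (the 40Gbit matches the TeraGrid figure mentioned elsewhere in the thread):

      ```python
      # Time for one message exchange: latency + bytes / bandwidth.
      # Both parameter sets are assumed round figures, not measurements.
      def exchange_time(latency_s, bandwidth_bytes_s, msg_bytes):
          return latency_s + msg_bytes / bandwidth_bytes_s

      MSG = 1_000_000  # say, a 1 MB boundary exchange
      local = exchange_time(10e-6, 500e6, MSG)    # ~10 us switch, 500 MB/s
      wan   = exchange_time(30e-3, 40e9 / 8, MSG) # ~30 ms coast-to-coast, 40 Gbit/s
      print(f"local: {local*1e3:.2f} ms   WAN: {wan*1e3:.2f} ms")
      # ~2 ms vs ~30 ms: over the WAN, speed-of-light latency dominates, so
      # chatty, tightly coupled codes suffer even on a very fat pipe.
      ```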

  • Did we get the skynet^H^H^H^H^H Grid achieves consciousness at 2pm EST joke yet?
  • Two massive computers joining....

    DOE National Energy Research Scientific Computing Center's (NERSC) IBM RS/6000 SP, a distributed memory machine with 2,944 compute processors (WINTERMUTE), together with a smaller 160 processor Intel system (NEUROMANCER), will make up a combined 3,328 processor Unix system... And this is only the beginning... expect alien artificial intelligence to be contacted very soon.

  • Supercomputers are mostly a Government pork program. Notice that there are very, very few of them in the private sector. It doesn't make sense to have a supercomputer unless you have single problems that require large amounts of time on it. Supercomputers aren't economic as crunch engines - they cost more per MIPS than good desktop machines. That's because they're low-volume, hand-built machines.

    This is the fallacy of "supercomputer centers" and "supercomputer networks". You don't want 1% of a supercomputer; you want a machine of your own.

    There was a time when sharing big number-crunching machines made sense. Until the mid-1980s, there were commercial scientific computing service bureaus running big iron and selling CPU time. They're all gone, along with Control Data Corporation, Cray, and the commercial market for supercomputers.

    If you really want a shared big engine cheap, cut a deal with a big hosting provider for off-hours time on the server farm. Set up a Beowulf cluster of a thousand rack-mounted 1U servers, crunching from midnight to 6AM every night. All you'd really need to do is negotiate a bulk buy of offpeak-only shell accounts. All the machines are identical and the cluster has lots of internal bandwidth, so you can get real coordinated work done, not just the low-bandwidth stuff like SETI and cryptanalysis.

    • They're all gone, along with Control Data Corporation, Cray, and the commercial market for supercomputers.

      Tell that to IBM...

      Set up a Beowulf cluster of a thousand rack-mounted 1U servers

      Clusters have their own set of issues and problems.

      This is the fallacy of "supercomputer centers" and "supercomputer networks". You don't want 1% of a supercomputer; you want a machine of your own.

      But everyone can't have a machine of their own that processes huge parallel jobs. You have to buy one and share it between many users. So while you may only get 1% of a supercomputer's time, during that 1% of time you can use 10-100% of its power. Considering the type of jobs we're talking about, that's a hell of a lot better than having a regular desktop crunching 100% of the time. It could take months to complete a job that could be done in an hour on a supercomputer, and waiting months for each step during your research would really suck.

      The fact remains, supercomputers are not dead. They're still widely in use and people are still buying them for good reason.
  • Globus? Let me guess, they'll be using it to run simulations of terraforming lifeless asteroids.

    NERSC? Designed and built by the dot-com effluvia of the 1990s (Eugenics Wars)

    But the good news is, Nimoy will have a final resting place when he dies.


  • GRID computing is the current sexy term in scientific computing, but it's something so vague that it can mean all things to all people. Which is perhaps why it's suddenly so popular: everyone can get their pet project funded.

    To some people it means actual hardware: routers, fibre, supercomputers, that sort of thing. Certainly in the UK and Europe this group consists mostly of particle physicists; see the GridPP [gridpp.ac.uk] project homepage for details of what's going on there... mostly the particle physicists seem to have ridiculous amounts of data on their hands (petabytes/day) that they have to ship. Fun stuff!

    To the astronomical community it means software: virtual observatories, data mining, and intelligent agents. In the UK and Europe, have a look at the AstroGrid [astrogrid.ac.uk] and AVO [eso.org] projects. Some of us are talking about hardware too: the project I'm working on, for instance, eSTAR [estar.org.uk], is putting robotically operated telescopes onto the GRID. However, even here the main focus of the project is on the fun stuff we can do with the software; intelligent agents and data mining spring immediately to mind. In the US, the NVO [us-vo.org] is the main focus of GRIDs for the astronomers there...

    Al.
  • Think they'll let me run WinMX on it? With a petabyte of storage I could share a whole lot of DivX.
  • ...is nowhere near as good as Neon Genesis Evangelion or Serial Experiments Lain. The dubbing totally sucked!
  • So, um, how fast can this thing compile a kernel? ;)
