Google Prefers DRAM to Hard Disks


Posted by timothy on Sunday February 03, 2002 @10:19AM from the speed-versus-spin dept.

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

This discussion has been archived. No new comments can be posted.


I can see it now... (Score:2, Offtopic)

by AcidDan ( 150672 ) writes:

In the hallowed halls of Google... Row upon row of uber-boxen with a Bagillion megabytes of ram...

Then someone trips over the power chord...

-- Dan =)
- Re: Power Chord- (Score:2, Funny)
  
  by kuhneng ( 241514 ) writes:
  
  The sound a Mac makes when you turn it on.
- - Re:I can see it now... (Score:2, Funny)
    
    by Egonis ( 155154 ) writes:
    
    Actually... when I worked at Internet Direct (in Toronto, Canada) one of the NetAdmins shut down a DNS Server with his ass when he backed into a Netfinity box.
    
    So where is your UPS NOW?
Additionally (Score:4, Insightful)

by Phosphor3k ( 542747 ) writes: on Sunday February 03, 2002 @10:24AM (#2945977)

How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.

- Re:Additionally (Score:2)
  
  by LWolenczak ( 10527 ) writes:
  
  I don't think I have ever seen DRAM fail, but I sure have seen my share of both ide and scsi drives die.
  - Re:Additionally (Score:2)
    
    by sammy baby ( 14909 ) writes:
    
    Exactly what I was going to say. DRAM has the "no moving parts thing" on its side, which is a pretty powerful bennie, if you ask me.
    - Re:Additionally (Score:2)
      
      by LWolenczak ( 10527 ) writes:
      
      We had a scsi drive that died due to its circuit board going south.....
- Re:Additionally (Score:4, Informative)
  
  by VAXman ( 96870 ) writes: on Sunday February 03, 2002 @12:16PM (#2946345)
  
  DRAM fails all the time. In fact, DRAM is almost certainly responsible for more data corruption than disks are. DRAM gets SBEs (single-bit errors) all the time, whereas when disks fail, they tend to go down completely and don't return corrupt data (which is preferable, IMHO). Of course, DRAM with ECC is significantly more reliable (and also more expensive).
  
- Re:Additionally (Score:3, Insightful)
  
  by darkwhite ( 139802 ) writes:
  
  Very often. And the problem is, unlike hard drives, which will try their best not to return the data if they have a hint that it's corrupted (meta-data, checksums, etc.), DRAM will be more than happy to return the incorrect data, which then might get written to disk. Some of the errors I've seen due to corrupt DRAM are pretty amusing.
- Re:Additionally (Score:2, Informative)
  
  by Spoing ( 152917 ) writes:
  
  RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.
  When's the last time you checked your RAM? I get about 1 bad module for every 2 machines. Defects usually show up on the initial test, though some don't show up for a few years.
  Don't believe me? Try it yourself; Memtest86 [teresaudio.com]. I suggest running one full test (can take days) when you first build a machine, and when you run into odd problems that you can't figure out. The default tests are good, but I've had times where it did miss problems.
  - Re:Additionally (Score:2)
    
    by haruharaharu ( 443975 ) writes:
    
    RAM is a mechanical device
    
    Ram is an electronic device. It has no mechanical parts, save for the junction between it and the motherboard.
  - Re:Additionally (Score:3, Informative)
    
    by Hal-9001 ( 43188 ) writes:
    
    RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.
    
    RAM is not mechanical, it's capacitive, i.e. it operates by storing charge. One of the advantages of semiconductor, or solid-state, electronics over pre-transistor electromechanical relays and vacuum tubes is that they require no moving parts, making them more rugged and reliable.
    
    Defects usually show up on the initial test, though some don't show up for a few years.
    
    A curious thing about solid-state electronics is that a large number of parts fail initially, then the failure rate is constant for several years, and then the failure rate increases again. This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days. Contrast this with mechanical failure, which continually increases with time.
    - Re:Additionally (Score:2)
      
      by roguerez ( 319598 ) writes:
      
      This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days
      This makes no sense. A long warranty period makes a product sell better. When 99.9% of parts that are going to fail do so in 30 days, it's in the interest of the manufacturer to have either no warranty at all or a very short one (to prevent claims), or one that is very long, like 10 years or lifetime. After the first 30 days, hardly anything is going to break, so it would be stupid not to prolong the warranty period. This can be done essentially 'free'. And I've seen RAM that has a lifetime guarantee.
  - Re:Additionally (Score:2)
    
    by lkaos ( 187507 ) writes:
    
    What?
    
    RAM is solid state. It is simply a circuit board with a couple of IC modules. There are absolutely no moving parts.
    
    The reason RAM goes bad is chiefly from operating temperatures and poor construction (mostly impurities in the air).
    
    There are absolutely no moving parts in RAM though. That is just silly to even suggest :)
    
    In fact, the only real moving parts in most PC's are the storage devices and fans...
    - Re:Additionally (Score:3, Informative)
      
      by Chmarr ( 18662 ) writes:
      
      Ram has both an electronic component, and mechanical. Try this experiment: Take the RAM out of your computer and throw it at your workmate/housemate/mum. He or she will say 'Ow!', and it's not because he or she was hit by electrons!
      RAM heats up as it's used, metal expands, the Chips on that little PCB stretch slightly, joints weaken with each power cycle, sometimes they fragment. The same thing with the connectors to the motherboard.
      Telstra, in Australia, was having a hellish time with certain Cisco routers because the RAM, as it heated up, would eventually work its way out of the socket, crashing the router!
- Re:Additionally (Score:2, Interesting)
  
  by Blind Lemon ( 534249 ) writes:
  
  With hard disks you have things like RAID to protect against disk failure. No such thing with RAM. Sure, you can get protection from a bit going bad, but not for losing a chip.
  The company I work for makes computers with a lot of RAM and so we've been researching how to survive a RAM chip failure, but as far as I know no system implements such a technology.
  - Re:Additionally (Score:3, Interesting)
    
    by Defiler ( 1693 ) writes:
    
    IBM sells this technology. They call it ChipKill.
    Perhaps this is what your company is looking for:
    ChipKill [ibm.com]
- Re:Additionally (Score:3, Insightful)
  
  by SilentChris ( 452960 ) writes:
  
  I've seen a lot of "logic" arguments to this post, but I think people are missing a sort of obvious one: size. If you had as much RAM as an average hard drive (say, 20 gigs) I'm sure that at least *one* piece would be faulty. You're comparing, in a best-case server scenario, a gig of RAM vs. an 80-gig hard drive. I think if the numbers were even it'd be a "fairer" fight.
RAM vs. HDD (Score:2, Redundant)

by hitchhacker ( 122525 ) writes:

If google has something like 10,000 linux PC's, I would definitely think that using RAM and a ramdisk for the root partition would be cheaper than putting a hard drive in every PC. I would imagine that the hard drives would be the first to go if something failed.
Obviously, if they used DRAM for their HUGE central databases, it would not be a cheaper solution.
But, I'm talking out of my ass, because I don't know how their datacenter works.. anyone anyone?

-metric
- Re:RAM vs. HDD (Score:3, Interesting)
  
  by Anonymous Coward writes:
  
  actually google uses freebsd on their PCs
  - - Re:RAM vs. HDD (Score:2)
      
      by Anonymous DWord ( 466154 ) writes:
      
      Yup. It's a pretty-tweaked version of RedHat.
Speed saves (Score:3, Insightful)

by coreman ( 8656 ) writes: on Sunday February 03, 2002 @10:30AM (#2945992) Homepage

They make their money on hits served, so speed is far more cost effective than cost of storage medium. If they can speed up serving hits, they're ahead of the game.

From the article: Why DRAM is so fast (Score:5, Informative)

by yerricde ( 125198 ) writes: on Sunday February 03, 2002 @10:30AM (#2945993) Homepage Journal

I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:

Schmidt: "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks -- which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast."

With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
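
The seeks-per-second argument can be sketched in a few lines. The latencies here are assumptions picked for illustration (roughly 10 ms per random disk seek, roughly 50 ns per random DRAM access); the interview gives no such figures. With those two guesses the ratio happens to come out at 200,000, the same number Schmidt quotes, though nothing in the article says that is how he arrived at it.

```python
# All latency figures below are assumptions for illustration,
# not numbers taken from the interview.
DISK_SEEK_NS = 10_000_000   # assume ~10 ms per random disk seek
DRAM_ACCESS_NS = 50         # assume ~50 ns per random DRAM access
NS_PER_SECOND = 1_000_000_000


def seeks_per_second(latency_ns: int) -> int:
    """Random accesses one device can serve per second, back to back."""
    return NS_PER_SECOND // latency_ns


def mirrors_needed(target_seeks_per_second: int, latency_ns: int) -> int:
    """Independent copies of the data needed to sustain the target rate."""
    per_copy = seeks_per_second(latency_ns)
    # Ceiling division: a fraction of a mirror still means one more mirror.
    return -(-target_seeks_per_second // per_copy)


if __name__ == "__main__":
    target = 1_000_000  # a million seeks per second
    print("disk seeks/s per copy:", seeks_per_second(DISK_SEEK_NS))
    print("DRAM seeks/s per copy:", seeks_per_second(DRAM_ACCESS_NS))
    print("disk mirrors needed:", mirrors_needed(target, DISK_SEEK_NS))
    print("DRAM copies needed:", mirrors_needed(target, DRAM_ACCESS_NS))
```

Under these assumptions a single disk copy serves about 100 seeks per second, so a million seeks per second needs on the order of 10,000 mirrors, while one DRAM copy is enough.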

- Re:From the article: Why DRAM is so fast (Score:4, Interesting)
  
  by jackb_guppy ( 204733 ) writes: on Sunday February 03, 2002 @10:46AM (#2946046)
  
  A simpler way of saying this:
  
  Do you want to buy a machine that cost $100,000 per copy to do 1 Million Hits per X time.
  
  -or-
  
  Do you want to buy 1000 machines that cost $500 per copy to do 1000 Hits each per X time.
  
  In both cases we are talking about 1 million Hits per X time.
  
  In case 1 - it costs a port on master switch and $100,000 for the machine.
  
  In case 2 - it costs 1000 ports on master switch -- actually more switches and infrastructure. AND $500,000 for the machines.
  
  Case 1 costs 20% of what case 2 does (80% cheaper). We have not talked of Power, A/C, Space... Need to look at the whole picture.
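
Working the two cases through: the prices and hit rates are the ones given in the comparison above, and the helper name `fleet` is made up for this sketch. It only covers purchase price and switch ports, not power, A/C, or space.

```python
def fleet(machines: int, unit_cost: int, hits_each: int) -> dict:
    """Totals for a fleet of identical machines (hypothetical helper)."""
    return {
        "cost": machines * unit_cost,    # purchase price only
        "hits": machines * hits_each,    # hits per X time, whole fleet
        "ports": machines,               # one switch port per machine
    }


case1 = fleet(machines=1, unit_cost=100_000, hits_each=1_000_000)
case2 = fleet(machines=1000, unit_cost=500, hits_each=1000)

if __name__ == "__main__":
    print("case 1:", case1)
    print("case 2:", case2)
    print("case 1 cost as a fraction of case 2:", case1["cost"] / case2["cost"])
```

Both fleets serve the same million hits, but the single big machine comes to one fifth of the purchase cost and one port instead of a thousand.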
  
I've always wondered (Score:2)

by Lord Hugh Toppingham ( 319381 ) writes:

Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?
AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.
- Re:I've always wondered (Score:2, Informative)
  
  by MarkusQ ( 450076 ) writes:
  
  Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?
  AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.
  You can do it in Linux (and probably in Windows too, though I'm not sure how)--but there generally isn't a reason to. The VM/RD cycle swings back and forth over the years, but at present the PC world seems to be running best with a 2:1 VM ratio (using a chunk of HD about twice your RAM size to simulate more RAM), although part of this is that RAM is being used up by smart caching of disk. This holds for Windows, Linux, and (IIRC) Open BSD.
  So, the short answer is: you could do it, but it would likely slow you down overall.
  -- MarkusQ
  - Re:I've always wondered (Score:3, Informative)
    
    by Cylix ( 55374 ) writes:
    
    I looked into using a virtual ram disk for a section of data that was being accessed quite frequently. Of course I did some reading and it turned out not to be terribly necessary.
    
    The more memory present in the system, the more memory the linux kernel dedicates to caching. Thus commonly read files are in memory and have incredibly fast reads. This is performed auto-magically without the user even being aware of it.
    
    Of course no two situations are exact and you may have a purpose for dedicating a ram disk to something. There are instances where you may want a fast read/response time, but the file isn't commonly used. Such as the data for a squid proxy cache. A ram disk in such a situation would be entirely helpful.
- The latest 2600 mag... (Score:2)
  
  by AltGrendel ( 175092 ) writes:
  
  ...has an article on this very subject. The listed article [yahoo.com] "How to hack from a RAM disk" is what you're looking for.
- Re:I've always wondered (Score:2, Interesting)
  
  by jc42 ( 318812 ) writes:
  
  Huh? Go to handhelds.org and look at the specs for the various linux handhelds. Few if any of them have hard disks; everything is run out of memory. This doesn't seem to have been much of a problem with linux (or any of the unix clones). A "ramdisk" isn't exactly a new concept in the unix environment.
  
  In fact, this sort of trick was exactly why the unix "block device" abstraction was invented more than a quarter century ago. It allows you to have a file system on anything that can store data in addressable chunks called "blocks". Memory works just fine for this.
  
  An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa). This causes most apps' temp files to be in main memory, and eliminates rotational delays for these files.
  
  There's no real problem with mapping the entire file system to memory.
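
To make the "block device" idea concrete, here is a toy sketch: storage addressed in fixed-size blocks, backed by nothing but memory. The class and method names are invented for illustration and this is not how any kernel implements it; on a real Linux box the usual route is simply a tmpfs mount.

```python
class RamBlockDevice:
    """Toy block device: fixed-size blocks held in a bytearray."""

    def __init__(self, num_blocks: int, block_size: int = 512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._store = bytearray(num_blocks * block_size)  # zero-filled

    def _offset(self, block_no: int) -> int:
        if not 0 <= block_no < self.num_blocks:
            raise IndexError("block number out of range")
        return block_no * self.block_size

    def read_block(self, block_no: int) -> bytes:
        off = self._offset(block_no)
        return bytes(self._store[off:off + self.block_size])

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        off = self._offset(block_no)
        self._store[off:off + self.block_size] = data


if __name__ == "__main__":
    dev = RamBlockDevice(num_blocks=8)
    dev.write_block(3, b"A" * 512)
    print(dev.read_block(3)[:4])
```

A file system only needs `read_block` and `write_block`; it has no way of telling whether the blocks live on a platter or in RAM, which is the point of the abstraction.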
  - Re:I've always wondered (Score:2)
    
    by haruharaharu ( 443975 ) writes:
    
    An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa).
    
    This was because SunOS had a dog-slow filesystem; even today, /tmp is usually backed by ram. Linux (and probably BSD) has a fast enough filesystem that this isn't an issue
- Re:I've always wondered (Score:2)
  
  by tshak ( 173364 ) writes:
  
  Caching the entire kernel and commonly used DLLs is supported in WinXP (Pro, not sure about Home). I believe there is undocumented support in Win2K but I have not verified this. A friend of mine built a machine with 512MB of RAM and put XP on it and enabled this "cache" feature. Although the boot time was a little (barely noticeable) slower, the load time of apps and common tasks was incredible - almost as if you were using a solid-state device (a PDA, for example).
Scary! (Score:4, Insightful)

by Anonymous Coward writes: on Sunday February 03, 2002 @10:32AM (#2946000)

Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.

By the way, this guy can't speak for beans.
The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"

- Re:Scary! (Score:5, Funny)
  
  by Phosphor3k ( 542747 ) writes: on Sunday February 03, 2002 @10:44AM (#2946037)
  
  The system goes on-line on August 4th, 1997. Human decisions are removed from strategic searching. Google begins to learn, at a geometric rate. It becomes self-aware at 2:14 am, eastern time, August 29th. In a panic, they try to pull the plug.
  
  Google fights back.
  
  - Re:Scary! (Score:2, Funny)
    
    by Fissure_FS2 ( 220895 ) writes:
    
    Just my luck. Our favorite search engine takes over the world on my birthday.
    
    I can imagine it now: just as I am about to blow out the candles, a giant DRAM chip bursts out of the cake and says, "I am Google. I am here to protect you. I am here to protect you from the terrible secret of space... er, the web."
Once again a simplistic view (Score:3, Informative)

by damieng ( 230610 ) writes: on Sunday February 03, 2002 @10:33AM (#2946001) Homepage Journal

I often see comments like this from people who have little experience in business.

What you pay for the initial product is not what it "costs" in the long-term. Businesses have a term for this called TCO or Total Cost of Ownership. It includes all the other time and materials needed to keep the item in use.

I would imagine in this case that the simple reason is that while DRAM is more expensive to purchase, it is a *lot* less expensive to run, the primary cost being power.

Also consider that if speed is of the essence, as it is with Google, it's not 50GB of RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE and still won't come near the DRAM for speed.

- Re:Once again a simplistic view (Score:2, Insightful)
  
  by NNKK ( 218503 ) writes:
  
  Stack reliability, as someone else mentioned, on top of power and speed savings.
  
  Personally I seriously doubt that all or even close to all of the stuff google stores is stored in DRAM; it's more likely they'd keep newer data and high-access data in DRAM, while older stuff gets archived to disk, available for recall later, but slower.
  - Re:Once again a simplistic view (Score:2)
    
    by Alomex ( 148003 ) writes:
    
    Personally I seriously doubt that all or even close to all of the stuff google stores is stored in DRAM
    
    You better believe it. Altavista already did that a long time ago. Hotbot (inktomi) had a similar all-in-memory scheme. Since Google is faster than those two, all the more reason to believe that the data is in DRAM (although surely they have backups in HDs and tape, but that is a different story).
The key to it being cheaper is.... (Score:3, Insightful)

by rayd75 ( 258138 ) writes: on Sunday February 03, 2002 @10:36AM (#2946013)

That it can handle many clients with little latency... You'd have to duplicate the data across a huge number of disks to provide similar response time to clients. Sure, if you were the only client, you couldn't tell the difference but with thousands upon thousands of clients all seeking data that would be stored in different locations on a disk things would quickly grind to a halt. Because so much unrelated data is being requested, seek time is the key. Sure, memory is more expensive per meg but its ability to serve so many more clients makes it less expensive overall.

Imperial MegaRam? (Score:4, Interesting)

by Ben Jackson ( 30284 ) writes: on Sunday February 03, 2002 @10:39AM (#2946022) Homepage

They may be referring to Imperial Technology's MegaRam [imperialtech.com] solid state disks (SSDs). They claim about 36,000 IO/sec. Compare that with 80-120 IO/sec on a typical SCSI drive. I'm pretty sure that eBay is using them.
I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.
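
Taking the quoted figures at face value (36,000 IO/sec is the vendor's claim, 80-120 IO/sec the typical SCSI range given above), a rough sketch of how many spindles it would take to match one such unit on random I/O alone. It ignores capacity, controllers, and caching.

```python
SSD_IOPS = 36_000              # vendor-claimed figure quoted above
SCSI_IOPS_RANGE = (80, 120)    # typical SCSI drive range quoted above


def drives_to_match(target_iops: int, iops_per_drive: int) -> int:
    """Drives needed to reach target_iops, rounding up to a whole drive."""
    return -(-target_iops // iops_per_drive)


if __name__ == "__main__":
    slow, fast = SCSI_IOPS_RANGE
    print("drives needed at", slow, "IO/sec each:", drives_to_match(SSD_IOPS, slow))
    print("drives needed at", fast, "IO/sec each:", drives_to_match(SSD_IOPS, fast))
```

That works out to somewhere between 300 and 450 drives striped together to equal the claimed random I/O rate.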

Fewer servers needed (Score:5, Interesting)

by michaelmalak ( 91262 ) writes: <michael@michaelmalak.com> on Sunday February 03, 2002 @10:39AM (#2946023) Homepage

I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

Google's Eric Schmidt probably means that fewer replicated servers are needed. If we take his stat of 200,000x speedup at face value, then you would need 200,000 times as many hard-drive-based servers as DRAM-based servers. There are many other factors involved such as communication delays and scalability, but you get the idea.
This just shows how limited the lifespan is of 32-bit 4GB architecture, especially for servers.

- - Re:Fewer servers needed (Score:2)
    
    by ErikZ ( 55491 ) writes:
    
    I want to know HOW they are doing this. Are they using PIIIs with 64GB of memory?
    - Re:Fewer servers needed (Score:2, Informative)
      
      by The Smith ( 305645 ) writes:
      
      Yes, but it's all rather confusing. Read this thread [iu.edu] in the Linux kernel mailing list if you're really interested. (WARNING: You won't understand any of it unless you know how the x86 virtual memory mechanism works.)
I believe it... (Score:3, Informative)

by josh crawley ( 537561 ) writes: on Sunday February 03, 2002 @10:41AM (#2946026)

At my dad's work, they use a type of chip, but it's not dram. They use E^2prom. True, you do take a performance hit, but they have 10 "gig ethernet ports" on the thing. The last price quote I got was $12000 for a terabyte of this stuff. Don't forget to compare price/performance ratios to the best chipsets of IDE (or if you're a scsi bigot, SCSI). Pulling random data is very easy for chips, but HD's of ANY speed and quality are still slower.

Josh Crawley

RAM Disks (Score:3, Interesting)

by buckrogers ( 136562 ) writes: on Sunday February 03, 2002 @10:43AM (#2946036) Homepage

If they made a 2GB RAM Drive in each of their 10,000 machines then that would be 20 TB of storage. This seems sufficient to me for most storage needs.

You would still need to be able to direct searches to the machines that have the part of the data you need. This would take a high speed network and some clever programming. But it is doable.

I always was amazed at the speed of Google's search engine; now I have a little more of a clue as to why it is so fast.

Sounds to me like they might be able to sell their database software as a money making product at some point. Oracle, watch out!
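
One way the "direct searches to the machine that has the data" part could work is plain hash partitioning. This is a guess at a common technique, not a description of what Google actually runs; the machine count and RAM drive size are the ones from this comment, and `shard_for` is a made-up name.

```python
import hashlib

NUM_MACHINES = 10_000   # machine count assumed in the comment above
RAM_DISK_GB = 2         # RAM drive per machine, as assumed above


def total_capacity_tb() -> int:
    """Aggregate RAM-drive capacity in (decimal) terabytes."""
    return NUM_MACHINES * RAM_DISK_GB // 1000


def shard_for(term: str, num_machines: int = NUM_MACHINES) -> int:
    """Map a search term to the machine holding its slice of the data.

    A stable hash is used so every front end picks the same machine.
    """
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


if __name__ == "__main__":
    print("total capacity (TB):", total_capacity_tb())
    for term in ("dram", "hard", "disk"):
        print(term, "-> machine", shard_for(term))
```

A multi-word query would fan out to one machine per term and merge the results, which is where the high-speed network comes in.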

- Re:RAM Disks (Score:2)
  
  by epsalon ( 518482 ) writes:
  
  20TB is peanuts for a search engine the size of Google. Google's needs are closer to 500TB, or even a few PB. Don't forget the cached pages and the usenet archive! This stuff should take at least a few PB.
  - Re:RAM Disks (Score:3, Insightful)
    
    by graxrmelg ( 71438 ) writes:
    
    Google doesn't need petabytes of storage. Right now they claim 2 billion Web pages, 700 million Usenet messages, and 330 million images. That's a total of 3 billion things. Let's wildly overestimate their average size as 100K (remember that the Usenet archive doesn't include binaries). The storage space required would be 3e9 * 1e5 = 3e14, or 300 TB.
    
    It's probably true that 20 TB isn't enough for Google, but it's not true (and won't be for quite a while) that the cached pages and Usenet archive require "a few PB".
    - Index space? (Score:2)
      
      by SuperKendall ( 25149 ) writes:
      
      That's a great calculation, but just figures the space needed for caching the raw data.
      
      What about the indexes required to actually access that data in a timely manner? Once you factor in the extra stuff needed to actually make it a viable search engine, you could easily imagine a PB or more of storage was required.
      
      As for the other poster going on about compressing the data - I doubt they'd want to compress the data when all they are concerned about is raw speed of processing requests!
      
      .
      - Re:Index space? (Score:3, Insightful)
        
        by spiro_killglance ( 121572 ) writes:
        
        I don't know how Google does it, but typically the main overhead is the inverted file: for every word on every page, you just need the number of the page it was in and the word position on that page. So Google needs around 8-12 bytes per (non-stoplisted) word.
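
A toy version of the structure described above, to show where the per-word cost comes from. The stoplist, the whitespace tokenizer, and the 8-byte posting size are assumptions for the sketch; a real engine would do all three differently.

```python
from collections import defaultdict

STOPLIST = {"the", "a", "an", "of", "and", "to", "in"}  # illustrative only


def build_index(pages):
    """Map each non-stoplisted word to a list of (page_id, position)."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            if word in STOPLIST:
                continue  # stoplisted words get no posting
            index[word].append((page_id, position))
    return dict(index)


def index_bytes(index, bytes_per_posting=8):
    """Rough size: one fixed-size posting per indexed word occurrence."""
    postings = sum(len(plist) for plist in index.values())
    return postings * bytes_per_posting


if __name__ == "__main__":
    pages = {1: "DRAM is fast", 2: "the disk is slow"}
    index = build_index(pages)
    print(index)
    print("approx index size in bytes:", index_bytes(index))
```

Looking up a word is then a single dictionary probe followed by a scan of its posting list, so index size scales with the number of word occurrences rather than the number of pages.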
  - Re:RAM Disks (Score:2)
    
    by buckrogers ( 136562 ) writes:
    
    Guess what? Google doesn't cache images! And I bet they compress the cached page too.
    
    So, let's get wild and say that there is 120TB of html pages that we care about... if you compress these pages then they would fit in 10 TB. Still plenty of room on a 20 TB RAM Disk for the index to all these pages.
    
    And besides, I'm just guessing... They might have 8GB of RAM in every machine, for all I know.
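
The sizes traded in this subthread are easy to check. The inputs are the posters' own numbers (3 billion items at a deliberately inflated 100K each; 120 TB of HTML squeezed into 10 TB); the function names are made up, and decimal terabytes are assumed.

```python
BYTES_PER_TB = 10**12  # decimal terabytes, an assumption of this sketch


def raw_storage_tb(item_count: int, avg_bytes_per_item: int) -> float:
    """Uncompressed storage needed, in TB."""
    return item_count * avg_bytes_per_item / BYTES_PER_TB


def implied_compression_ratio(raw_tb: float, compressed_tb: float) -> float:
    """Compression ratio a claim of raw_tb -> compressed_tb assumes."""
    return raw_tb / compressed_tb


if __name__ == "__main__":
    print("3e9 items at 100K each:", raw_storage_tb(3_000_000_000, 100_000), "TB")
    print("ratio implied by 120 TB -> 10 TB:", implied_compression_ratio(120, 10))
```

So the 300 TB figure checks out, and the 10 TB figure quietly assumes a 12:1 compression ratio on the page data.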
Five minute rule (Score:3, Informative)

by NearlyHeadless ( 110901 ) writes: on Sunday February 03, 2002 @10:45AM (#2946041)

The raw cost of DRAM ($/MB) is still much higher, but that is not the complete analysis. Database god Jim Gray's analysis shows that you should keep data in memory if it is going to be accessed every five minutes or less.

See The Five-Minute Rule, ten years later (Word Doc) [microsoft.com] or its HTML-ified Google Cache [google.com]
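
For anyone who doesn't want to open the paper, the rule boils down to one break-even formula: the cost of holding a page in RAM versus the cost of the disk arm time to re-read it. The sketch below uses that formula with illustrative inputs chosen by me (8 KB pages, so 128 pages per MB; 64 random accesses per second per disk; $2000 per disk; $15 per MB of DRAM). Treat the inputs as assumptions and plug in current prices.

```python
def break_even_seconds(pages_per_mb_ram: float,
                       accesses_per_sec_per_disk: float,
                       price_per_disk: float,
                       price_per_mb_ram: float) -> float:
    """Re-reference interval below which keeping a page in RAM pays off."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


if __name__ == "__main__":
    # Illustrative inputs only; see the lead-in for what is assumed.
    seconds = break_even_seconds(128, 64, 2000, 15)
    print(f"break-even interval: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

With those inputs the break-even interval lands a bit under five minutes. Cheaper RAM stretches the interval, so more data qualifies for memory.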

price comparison (Score:4, Informative)

by karmma ( 105156 ) writes: on Sunday February 03, 2002 @10:46AM (#2946044)

Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.

In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the hard drive option by a factor of greater than 25. If you do the math, it's possible.

Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.
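
The break-even factor in this comment is just the price ratio. A minimal sketch using the two per-gig prices quoted above; the function names are made up, and "performance" is left as whatever unit of work you care about (queries, seeks) per gig of storage.

```python
def break_even_speedup(dram_per_gb: float, disk_per_gb: float) -> float:
    """How many times faster DRAM must be to match disk on cost per unit of work."""
    return dram_per_gb / disk_per_gb


def dram_is_cheaper(speedup: float, dram_per_gb: float, disk_per_gb: float) -> bool:
    """True if the achieved speedup more than pays for the price premium."""
    return speedup > break_even_speedup(dram_per_gb, disk_per_gb)


if __name__ == "__main__":
    print("break-even speedup:", break_even_speedup(250, 10))
    print("cheaper at 100x?", dram_is_cheaper(100, 250, 10))
    print("cheaper at 10x?", dram_is_cheaper(10, 250, 10))
```

Any workload where DRAM delivers more than 25 times the throughput of the RAID setup comes out ahead on cost, which for seek-bound lookups is not a high bar.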

- Re:price comparison (Score:2, Insightful)
  
  by bdolan ( 125199 ) writes:
  
  If you have heavily hit database indexes, i.e. google, then you may need 100-1000x fewer machines. The cost of the disks is not the important cost, it is the far fewer number of machines for an equivalent query rate. However, you want to have far more than 2gb of directly addressed ram per machine--in fact at current prices it is probably cost effective to put 100's of gb per machine if you need to keep the query ram based--even if the CPUs are dwarfed in cost by the ram.
  
  This is one of the reasons that we need 64 bit addressability on commodity IA architecture ASAP -- Ram drives using an IO subsystem add a huge overhead compared to indexing in arrays and natural data organization, as opposed to fixed blocks of bytes that have to be retrieved as a unit with 100s++ of instructions and security models in the way of access!
- Re:price comparison (Score:2)
  
  by darkwhite ( 139802 ) writes:
  
  $250/gig? That's not reasonably priced. I think PC133 DRAM can cost as low as $125/gig in bulk now...
- Re:price comparison (Score:2)
  
  by Reziac ( 43301 ) writes:
  
  It's gone back up a bit since then, but last December, Star Components (www.star-components.com) was selling PC133 DIMMs at $55/gig. Newer RAM types were somewhat higher, but nowhere near $250/gig.
  - Re:price comparison (Score:2)
    
    by haruharaharu ( 443975 ) writes:
    
    I just bought a Gig of DDR ECC ram for $150 from compsource [c-source.com], so there's a datapoint for you.
A number of reasons it could be "cheaper"... (Score:2)

by AtariDatacenter ( 31657 ) writes:

Maybe he's talking in terms of TCO (total cost of ownership). Over its lifetime, RAM costs less than its hard drive counterpart?

Another point... as long as you don't store your METADATA 100% in RAM, you can store at least your data (cached web pages) in RAM. What happens if it gets dumped? Simple. Just respider the pages you lost and go on. Small amounts of data loss can be covered.

Okay. It may sound like I'm talking out of my ass because I am. It is really hard to cover for a statement like that. But let's talk again on the performance angle that has been covered (but with a little more emphasis on RAID disks).

You *may* be able to get better cost/performance with LOCAL memory (not ram-based drives) than you could with a RAID array. And a raid array could never equal the performance you get with local memory. Of course, local memory could never reach the storage you achieve with a raid array. So these two paths seem to diverge (bulk storage vs speed) when comparing local DRAM to RAID'd disks.

His statement MAY make sense, but it would have to be put into a larger context. (RAM is better than disk in X circumstances.)
Something Nobody's Mentioned (Score:4, Interesting)

by Guppy06 ( 410832 ) writes: on Sunday February 03, 2002 @11:08AM (#2946103)

DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them having at least two hard drive motors spinning 24/7.

Bottlenecks... (Score:3, Insightful)

by percey ( 217659 ) writes: on Sunday February 03, 2002 @11:12AM (#2946119)

More often than not with a database your bottleneck is I/O. When you run a database you cannot have enough disks, and you cannot have enough FAST disks. In order to accomplish the kind of I/O bandwidth that a place like google is going to need you're going to need the best EMC arrays (or perhaps an IBM Shark) money can buy. And guess what? They run you megabucks. You can't just take a bunch of SCSI disks and expect them to perform as well as Fibre channel arrays. You gotta have controllers with multiple caches. Everyone who's never dealt with databases thinks that SCSI is the beginning and the end of hard drives, and it's so far from being the truth it's not funny.

I've really no idea how complex the queries are or whether or not they use a relational database, but that being said it still has to hit the disk to retrieve the data, and that's where every decently designed database's bottleneck is. Besides, google caches all its pages. Egads! Do you have any idea how much RAM they must need for just that alone? Yes, RAM is faster. Oracle even teaches you to try to keep your frequently used tables in cache anyhow, because it's fastest; of course they qualify that with the word small, realizing that most people don't have the gobs of memory needed to cache large tables.

More importantly than the DRAM... (Score:2, Insightful)

by LatJoor ( 464031 ) writes:

Although it's not mentioned in the Slashdot writeup, I think that probably the most important part of this interview was the discussion of Google's business model and future. It's good to see that they're committed to not getting in over their heads with extraneous services. They've found a business model that works and they're sticking to it, rather than getting greedy and adding dumb new services that have nothing to do with searching, or "search," as he put it.

A lot of technology companies would do very well to follow Google's example, it seems to me. They're proving that Internet services are a perfectly sound venture if the company has a sensible business model and always keeps focused on providing quality technology and services in the area that they know best.
Pretty amazing, but I can see it. (Score:5, Insightful)

by dinotrac ( 18304 ) writes: on Sunday February 03, 2002 @11:16AM (#2946133) Journal

Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
2. If disk latency means that CPUs sit idle, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
4. Which leads, of course, to lower A/C bills during the warm weather.
5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?
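
Item 2 above can be made concrete with a toy model. This is only a sketch under loudly stated assumptions: every number below (CPU time per query, seeks per query, access latencies, target load) is hypothetical and not taken from Google, and the model pessimistically assumes a box serves one query at a time with no overlap between CPU work and I/O.

```python
# Toy model: how storage latency changes the number of boxes needed.
# All figures are hypothetical, chosen only to illustrate the reasoning.

NS_PER_SECOND = 1_000_000_000


def boxes_needed(target_qps: int, cpu_ns: int, seeks: int, seek_ns: int) -> int:
    """Boxes required to serve target_qps when each box handles one query
    at a time and every query costs cpu_ns of CPU time plus `seeks`
    random accesses of seek_ns each."""
    per_query_ns = cpu_ns + seeks * seek_ns
    busy_ns_per_second = target_qps * per_query_ns
    # Ceiling division in integer arithmetic, to avoid float rounding.
    return -(-busy_ns_per_second // NS_PER_SECOND)


TARGET_QPS = 1_000          # hypothetical query load
CPU_NS = 5_000_000          # hypothetical 5 ms of CPU work per query
SEEKS = 10                  # hypothetical random accesses per query
DISK_SEEK_NS = 10_000_000   # hypothetical 10 ms per disk seek
DRAM_ACCESS_NS = 100        # hypothetical 100 ns per DRAM access

disk_boxes = boxes_needed(TARGET_QPS, CPU_NS, SEEKS, DISK_SEEK_NS)
dram_boxes = boxes_needed(TARGET_QPS, CPU_NS, SEEKS, DRAM_ACCESS_NS)

print(f"disk-backed boxes: {disk_boxes}")
print(f"DRAM-backed boxes: {dram_boxes}")
```

With these made-up inputs the model needs 105 disk-backed boxes against 6 DRAM-backed ones. Real servers overlap I/O with CPU work, so the gap would be smaller in practice, but the direction of the argument is the same.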

- Re:Pretty amazing, but I can see it. (Score:2, Funny)
  
  by russh347 ( 316870 ) writes:
  
  how many sysadmins are more than a step from insanity anyway?
  
  Absolutely none.
Overview of Today's Headlines (Score:4, Insightful)

by Corrado ( 64013 ) writes: <rnhurt@@@gmail...com> on Sunday February 03, 2002 @11:16AM (#2946134) Homepage Journal

Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.

This is a pretty cool idea. I only hope they make an RSS feed out of it so that I can use it in my company's new portal environment. That would be really great! I love Google!

Check it out here [google.com].

- Re:Overview of Today's Headlines (Score:3, Interesting)
  
  by costas ( 38724 ) writes:
  
  Hmmm... I can top that [memigo.com].
- Re:Overview of Today's Headlines (Score:2)
  
  by mikeage ( 119105 ) writes:
  
  Columbia has something similar.. my future brother-in-law was a grad student writing some code for it. It's from their Natural Language Project.
  
  http://www.cs.columbia.edu/nlp/newsblaster
You guys are missing the point... (Score:4, Insightful)

by duffbeer703 ( 177751 ) writes: on Sunday February 03, 2002 @11:29AM (#2946181)

DRAM requires little electricity and produces almost no heat.

Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.

DRAM costs quite a bit more up front, but it uses less electricity and requires fewer chillers, condensers, etc. to keep cool.

- wrong... 10watts for 1GB reg. ECC SDRAM (PC133) (Score:2)
  
  by Lazy Jones ( 8403 ) writes:
  
  ...
- - Re:You guys are missing the point... (Score:3, Informative)
    
    by kesuki ( 321456 ) writes:
    
    With over 35 DRAM chips on the American market, what good does it do to check only a single type of memory module from a single maker?
    However, since I don't want to spend the rest of the day finding the lowest-power DRAM module with the highest capacity, I will assume that the best-case scenario is 4GB of RAM using approximately the power of two HDs of any capacity. Beyond 4GB you would need either a custom DRAM NAS/HD or a second PC. A DRAM NAS with multiple gigabit Ethernet ports offers the most DRAM storage per watt of electricity. Still, it is at least 4x as power hungry as an 8-HD 1TB RAID server. Assume each DRAM chip in the NAS is 64 megabytes: to reach one terabyte we need about 16 thousand DRAM chips. If each chip requires even 0.1 watts to operate, they use about 1600 watts of power. The HD server may need a peak of 500+ watts, but even under load it isn't using as much as when all 8 drives spin up, so it's probably only using 400 watts total for the whole system under load.
    
    While it's pretty clear that power isn't an area where Google can save money using DRAM over HDs, and while DRAM is solid state (if it doesn't fail in the first 6 months it probably won't fail in the first 100 years), it is still going to become obsolete long before it fails, requiring replacement. I've also figured that at $4 a DRAM chip the cost of 1TB is $64,000, vs. $5,000 for a total-package 1TB HD server. Even if you replaced the drives every 6 months it would take 15 years before the cost of materials on HDs exceeded the cost of materials on DRAM. However, there is a cost savings. First of all, if you're mirroring the drives, that doubles the electrical and material cost of the HD storage. Second of all, that 1TB HD server is going to have its seek time saturated by only 100 megabit Ethernet, unless the data is entirely sequential (not requiring seek time), and even in the case of sequential data a single gigabit Ethernet is sufficient. That 1TB of DRAM has at worst 12 ns latency, or .000000012 seconds per seek. That provides 83,333,333 seeks per second. The only thing he was wrong about is that DRAM isn't 200,000 times as fast as HDs for data that requires seeks; it's on the order of millions of times more efficient. 200,000 times is probably based on real-world performance differences, from using DRAM vs. HDs in a "real world" setting and not just on paper. That means trying to replicate the speed of DRAM with hard drives is a futile task,
    far more futile than trying to replicate the capacity of HDs with DRAM.
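
The back-of-envelope figures in the reply above can be reproduced with a short calculation. This is only a sketch: the 64 MB chip size, 0.1 W and $4 per chip, the 400 W and $5,000 disk server, and the 12 ns latency are the commenter's own assumptions rather than measured values, and the variable names are made up for illustration.

```python
# Back-of-envelope check of the DRAM-vs-disk figures quoted in the reply above.
# Every input is the commenter's assumption, not a measured value.

CHIP_MB = 64              # capacity of one DRAM chip, in megabytes
WATTS_PER_CHIP = 0.1      # assumed power draw per chip
DOLLARS_PER_CHIP = 4      # assumed price per chip
DISK_SERVER_WATTS = 400   # estimated draw of the 8-drive 1TB RAID server under load
DISK_SERVER_COST = 5_000  # quoted price of that server, in dollars
DRAM_LATENCY_S = 12e-9    # worst-case DRAM access time (12 ns)

chips = (1024 * 1024) // CHIP_MB        # chips needed for 1 TB (binary units)
dram_watts = chips * WATTS_PER_CHIP     # total DRAM power draw
dram_cost = chips * DOLLARS_PER_CHIP    # total DRAM material cost
seeks_per_second = 1 / DRAM_LATENCY_S   # random accesses per second

print(f"chips for 1 TB:      {chips}")
print(f"DRAM power (W):      {dram_watts:.0f}")
print(f"power vs disk:       {dram_watts / DISK_SERVER_WATTS:.1f}x")
print(f"DRAM cost ($):       {dram_cost}")
print(f"cost vs disk:        {dram_cost / DISK_SERVER_COST:.1f}x")
print(f"DRAM seeks / second: {seeks_per_second:,.0f}")
```

The exact chip count is 16,384, so the rounded figures in the comment (16 thousand chips, 1600 watts, $64,000) come out slightly low, but the conclusion is unchanged: roughly 4x the power and 13x the material cost, in exchange for tens of millions of random accesses per second.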
The key is in the MTBF (Score:5, Informative)

by eldurbarn ( 111734 ) writes: on Sunday February 03, 2002 @11:42AM (#2946229)

My last job was at one of the "other" search engines. We had a disk farm somewhat smaller than Google (about 140 Tb), mostly configured in RAID arrays, and we were swapping out dead bricks every few days.
Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
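
A rough calculation shows why a respectable per-drive MTBF still means constant replacements at this scale. Only the 140 TB farm size comes from the comment; the 36 GB drive size and the 500,000-hour MTBF below are hypothetical values picked for illustration, and the sketch ignores RAID overhead and assumes independent failures at a constant rate.

```python
# Expected drive failures in a large disk farm.
# Only FARM_TB comes from the comment; the other inputs are hypothetical.


def expected_failures_per_day(num_drives: int, mtbf_hours: float) -> float:
    """Expected failures per day for a fleet of drives, assuming independent
    failures at a constant rate of 1/MTBF per drive-hour."""
    return num_drives * 24 / mtbf_hours


FARM_TB = 140          # farm size mentioned in the comment
DRIVE_GB = 36          # hypothetical capacity per drive
MTBF_HOURS = 500_000   # hypothetical manufacturer MTBF per drive

num_drives = FARM_TB * 1000 // DRIVE_GB
per_day = expected_failures_per_day(num_drives, MTBF_HOURS)
days_between = 1 / per_day

print(f"drives in farm:        {num_drives}")
print(f"expected failures/day: {per_day:.3f}")
print(f"days between failures: {days_between:.1f}")
```

Under these assumptions the farm holds 3,888 drives and loses one roughly every five days, which is in line with "swapping out dead bricks every few days".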

Google is great... (Score:2)

by Calle Ballz ( 238584 ) writes:

...but they'll get a million times better as soon as they'll allow boolean searches. Man sometimes it's frustrating!!
- Re:Google is great... (Score:2, Insightful)
  
  by russianspy ( 523929 ) writes:
  
  They do. Read the guide. You can include parentheses, AND, and OR. I don't remember if they allow XOR and others. Oh... They allow negation as well.
- Re:Google is great... (Score:3, Informative)
  
  by SpinyNorman ( 33776 ) writes:
  
  Um.. they do.
  
  AND is by default
  OR is OR
  NOT is -
  
  I don't think parentheses for grouping work though (they don't mention them), so you can't do more complex queries, but you can certainly do:
  
  A AND (B OR C) AND !D
  
  Which would be: A B OR C -D
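
The mapping described in the reply above can be sketched as a tiny matcher. This is not Google's parser; it only encodes the rules as the comment states them (terms are ANDed by default, OR joins the terms on either side of it, and a leading - excludes a term). The function names `parse_query` and `matches` are made up for illustration.

```python
# Minimal matcher for the query syntax as the comment describes it.
# Illustrative only: this is not how Google actually parses queries.


def parse_query(query: str):
    """Split a query into a list of any-of groups plus a set of excluded terms."""
    groups = []        # each group is a set of terms; at least one must match
    excluded = set()   # terms that must not appear
    join_next = False  # True right after an OR token
    for token in query.split():
        if token == "OR":
            join_next = True
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:].lower())
        elif join_next and groups:
            groups[-1].add(token.lower())   # OR binds to the previous term
        else:
            groups.append({token.lower()})  # implicit AND starts a new group
        join_next = False
    return groups, excluded


def matches(query: str, document_words) -> bool:
    """True if the document's words satisfy every group and no exclusion."""
    words = {w.lower() for w in document_words}
    groups, excluded = parse_query(query)
    return all(group & words for group in groups) and not (excluded & words)


# "A B OR C -D" reads as: A AND (B OR C) AND NOT D
print(matches("A B OR C -D", ["a", "b"]))
print(matches("A B OR C -D", ["a", "b", "d"]))
```

The first call matches (A and B are present, D is absent) and the second does not (D is present).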
DRAM probably is cheaper...Here's why. (Score:3, Informative)

by Bowie J. Poag ( 16898 ) writes: on Sunday February 03, 2002 @12:00PM (#2946285) Homepage

It's not a fair comparison to put 1GB worth of DRAM on one side of the scale, and 1GB worth of physical storage on the other. The hard disk will obviously come out to be the cheaper of the two. However, to a company like Google, which undoubtedly uses RAID technology for storage, you're effectively not getting the same "bang for your buck" as you would with a JBOD array. In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.

Mind you, that's not to say that RAID is a bad technology..heh, hardly. It's just that you can't make a 1 to 1 comparison from DRAM to physical without taking into account the storage methods employed by each.

Cheers

- Re:DRAM probably is cheaper...Here's why. (Score:2)
  
  by foobar104 ( 206452 ) writes:
  
  In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.
  
  That isn't true at all. If you wanted to, you could mirror all of your data on two separate JBODs-- RAID level 1-- but that's not efficient. If you use RAID 3 or RAID 5, you'll never use more than 33% of your storage for parity data. As the size of your RAID set increases, the percent allocated for parity data goes down. In a 10-disk set, one disk is used for parity (in the case of RAID-3), which is only 10% of your total storage. (In the case of RAID-5, you'd still use only 10%, but you'd use 10% of each disk instead of one whole disk.)
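
The parity overhead described in the reply above is just one disk's worth of capacity out of the whole set. A minimal sketch, assuming single-parity RAID (RAID 3 or RAID 5) and equal-sized disks; the function names are made up for illustration.

```python
# Parity overhead for single-parity RAID sets (RAID 3 or RAID 5),
# compared with mirroring (RAID 1). Assumes equal-sized disks.


def parity_fraction(disks_in_set: int) -> float:
    """Fraction of raw capacity spent on parity: one disk's worth per set."""
    if disks_in_set < 3:
        raise ValueError("RAID 3/5 needs at least three disks")
    return 1 / disks_in_set


def raw_needed(usable_tb: float, disks_in_set: int) -> float:
    """Raw capacity (TB) required to end up with usable_tb of usable space."""
    return usable_tb / (1 - parity_fraction(disks_in_set))


MIRROR_RAW_PER_USABLE = 2.0  # RAID 1 stores every byte twice

for n in (3, 5, 10):
    print(f"{n:2d}-disk set: {parity_fraction(n):.0%} parity, "
          f"{raw_needed(1.0, n):.2f} TB raw per usable TB")
print(f"mirroring:   50% overhead, {MIRROR_RAW_PER_USABLE:.2f} TB raw per usable TB")
```

A 3-disk set is the worst case at one third of raw capacity, a 10-disk set spends 10%, and only mirroring reaches the 2TB-for-1TB figure from the parent comment.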
The Google feature I want (Score:4, Funny)

by Hanzie ( 16075 ) writes: on Sunday February 03, 2002 @12:28PM (#2946417)

See that "mature content filter"?

How about a "mature content ONLY search"?

Innumeracy and price comparisons (Score:2)

by Alomex ( 148003 ) writes:

One would have expected /. nerds to do better at price comparisons than what we have seen so far.

Quick, which is the better price: a 1994 Ford Fiesta at $10,000 or a brand new Ferrari at $12,000?

Clearly the Ferrari is a better deal. To do a proper price comparison you have to look beyond the sticker price alone.

What is the performance you get? resale value? maintenance cost? operation costs?

If all you wanted to buy was megabytes of storage you would be better off buying backup tapes. They are hard to beat price-wise.

But in all likelihood you need to store that data for some purpose, so depending on frequency of access, latency, and total cost of operation (tapes are operator/robot mounted), alternative solutions with a higher sticker price might well end up being cheaper.

What Eric Schmidt claims is that if you have a ton of data and you are accessing it all the time DRAM is more cost effective than (a) a large mirrored RAID array server or (b) a zillion tapes being mounted by operators.
TOC, RAM vs. Steel Platter (Score:4, Informative)

by eyepeepackets ( 33477 ) writes: on Sunday February 03, 2002 @12:59PM (#2946544)

Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.

These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)

Performance was absolutely unreal: more than twice the speed of SCSI; in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*

If you have to do anything requiring lots of I/O (like database,) you _really_ do want one of these things or something like it.

Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else loses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truly are part of the cost (read necessary.)

Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.

As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.

They must mean FIXED HEAD 'disks' v DRAM (Score:2)

by Mongoose ( 8480 ) writes:

Fixed head hard drives have no seek time, since tracks have a many to many relationship to heads. That's also why you can't get them at compusa. ( expensive )
Why DRAM is cheaper (Score:2)

by Animats ( 122034 ) writes:

The price advantage of storing the data in DRAM comes from needing fewer copies. A disk-based search engine like Inktomi has many duplicated clusters, each with a copy of all the data, to get the traffic capacity needed.
Also, Google's searchable data is considerably smaller than the total size of the pages searched, even excluding the images. Read their white papers. And I doubt that they store the cached pages and images in DRAM. Those don't get hit that often.
Silly people! (Score:3, Insightful)

by m.dillon ( 147925 ) writes: on Sunday February 03, 2002 @03:23PM (#2947063) Homepage

You guys crack me up some times.

I'll lay it out. Obviously Google is not storing the master copy of the full multi-terabyte database in RAM, but they are certainly storing as big a chunk in RAM as they can, and the cost model ought to be easy for anyone to understand if you sit down and think about it.

Consider the cost difference between the following EQUAL amounts of hard disk storage:

* A 160GB IDE drive

* A 160GB SCSI drive

* Four 40GB drives in an external RAID system

* The cost of a small medium-performance RAID system.

* The cost of a larger high-performance RAID system scalable to a terabyte.

* The cost of an *EXTREMELY* high performance RAID system scalable to multiple terabytes.

Now consider the cost of building, say, a 40 terabyte data store (let's not worry about backups for this experiment). If you build it out of a bunch of huge SCSI drives connected to a bunch of PCs it can be fairly cheap. But if you build it out of, say, high performance EMC arrays it could cost millions of dollars more to get the same theoretical performance.

So when you consider the cost of storage, you always have to consider the cost of the PERFORMANCE you want to get out of that storage. All the Google CEO is saying is that, Doh! It's a hellofalot cheaper to improve the performance aspects of the system by buying DRAM in a distributed-PC environment in order to be able to avoid having to purchase extremely-high performance (and extremely expensive) disk subsystems. The cost of purchasing the DRAM to make up for the lower-performing disk subsystem is actually LOWER than the cost of purchasing an equivalent higher-performance disk subsystem.

The same is true in the ISP world. When RAM was expensive we had to rely on big whopping HD systems to scale machines up. But when RAM became cheap it turned out that you could simply throw in a very high density drive with 1/4 the performance that four smaller drives would give you, and the operating system's RAM cache would take care of the problem. Suddenly we no longer needed to purchase big whopping disk arrays.

Think about it.

-Matt

- Re:Cost v Speed (Score:5, Interesting)
  
  by Space cowboy ( 13680 ) writes: on Sunday February 03, 2002 @10:31AM (#2945995) Journal
  
  JohnHegarty scribbled
  
  I am sure the google archive is only a few 100gb
  
  Err. No.
  
  I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250Gb of disk space for that minuscule cache. The one at Google must be of the order of a few hundred Terabytes, not Gigabytes.
  
  On that basis, I echo the original query about how it can be economical to use RAM...
  
  Simon
  
  - Re:Cost v Speed (Score:3, Insightful)
    
    by Alomex ( 148003 ) writes:
    
    AFAIK, Google does not cache images, only HTML text. The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.
    
    Hence the size of the cache is somewhere between 500GB and 3TB, plus the index would be another 40% of that.
    
    My best guess is that the Google archive is somewhere around 2-3 terabytes, and that the total amount of DRAM available at Google at the present time is somewhere between 5 and 10 terabytes.
    - Re:Cost v Speed (Score:5, Informative)
      
      by Space cowboy ( 13680 ) writes: on Sunday February 03, 2002 @11:52AM (#2946263) Journal
      
      Alomex wrote:
      
      The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.
      
      I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.
      
      As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/ [financewise.com]) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int [eu.int]. This is a fairly innocuous site, but if I do:
      
      cd /opt/search/var/sites/26_europa.eu.int
      du -sk .
      7731586 .
      
      That's a 7.7Gb website, and that's just the text (in fact I only search for .htm, .asp, .php* and .html files). This particular website is growing at the rate of a couple of hundred Mb each month.
      
      I just think that your estimate for the cache size is a long way short of the real figure...
      
      Simon
      
      - Re:Cost v Speed (Score:2)
        
        by Alomex ( 148003 ) writes:
        
        I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.
        
        Indeed, this has been a hot area of debate for the last 7 years or so, when the first paper with a substantially larger web than that indexed by search engines came out.
        
        Usually search engines estimate the web size to be about 15-30% of that claimed by statistical measurements.
      - Re:Cost v Speed (Score:3, Interesting)
        
        by jovlinger ( 55075 ) writes:
        
        Just a thought:
        
        When is it worthwhile to trade off CPU for storage? In your case, I suspect that the website has a degree of redundancy in its 7 gigs of data; there is likely much duplication, both at the page level (duplicated CSS info) and at the snippet level (duplicated copyright disclaimers).
        
        It is quite straightforward to discover this sharing (IIRC exactly how lzw compression works, but w/ a smaller window) and significantly cut down your storage costs. Of course, now you have a CPU hit, where storing new data becomes expensive, and just reading the data requires some pointer chasing.
        
        The interesting issue is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may fit in ram entirely now) will possibly (likely?) result in significant speedups.
        
        Re:Cost v Speed (Score:2)
        
        by Space cowboy ( 13680 ) writes:
        
        Sorry, I wasn't being clear - I forgot to point out that these files are already compressed (using gzip), but only on an individual file basis. The real site is significantly larger than this 7.7Gb, and I should have mentioned that.
        
        Whereas I agree that we're getting close (or maybe have passed) the point where it would make sense to do something better, since I don't have much of a budget, and disk is cheap ....
        
        ATB,
        Simon.
    - - Re:Cost v Speed (Score:2)
        
        by Alomex ( 148003 ) writes:
        
        Google does so cache images [google.com]. :)
        
        Cute, but not quite correct. They cache postage-stamp sized copies. If you want the full image you have to go to the original web site.
        
        Granted, this does somewhat increase my original estimate of the amount of DRAM required.
      - Re:Cost v Speed (Score:3, Insightful)
        
        by Score Whore ( 32328 ) writes:
        
        The idea that all this is on DRAM is staggering. If the refresh stops (board failure, power problem) the data is just GONE?!
        
        Google doesn't create content. They are a search engine. Nor are they in the business of archiving the net for posterity. If they lose data, it's out there to be recollected or if not, then there's no point in them saving it anyway.
  - Re:Cost v Speed (Score:4, Insightful)
    
    by andykuan ( 522434 ) writes: on Sunday February 03, 2002 @10:58AM (#2946071) Homepage
    
    It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.
    
  - Re:Cost v Speed (Score:2)
    
    by Yokaze ( 70883 ) writes:
    
    I think he (Eric Schmidt) spoke of storing the indices.
    Traditionally, they are only stored partially in RAM due to their size.
    
    Certainly, the unprocessed pages are still stored on HDs as one doesn't gain
    anything from storing them in RAM.
    - - Re:Cost v Speed (Score:3, Informative)
        
        by Yokaze ( 70883 ) writes:
        
        > each of which occupies how many bytes in index files?
        
        According to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" [nec.com] by Sergey Brin [google.com] and Lawrence Page [google.com], the inverted index ("inverted barrels") was about 47.2Gb (total data without repository 55.2Gb, repository 53.5Gb). It had about 24 million web pages indexed. Assuming a linear increase, this amounts to about 5Tb.
        But, to quote from the paper:
        
        With better encoding and compression of the document index, a high quality web search engine may fit onto a 7Gb drive of a new PC.
        
        Which is surely slightly exaggerated, but shows that they considered that there is room for improvement. (E.g using varying length index instead of fixed width)
        
        >I dont think Linux can do it
        At least they think it can do it, since they are using Linux boxes, at least according to
        The Technology Behind Google [ddj.com], by Jim Reese CEO.
        More than 10,000 Linux boxes, that is.
  - Re:Cost v Speed (Score:5, Interesting)
    
    by leuk_he ( 194174 ) writes: on Sunday February 03, 2002 @11:09AM (#2946106) Homepage Journal
    
    This makes more sense, then:
    PC World: What are Google's biggest challenges?
    Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.
    
    If you have computers where 4 GB is not very much memory, but use for memory the amount that we use on our HDs, I would have a DRAM shortage too.
    
    And I bet they store only the most frequently used part of the index in memory.
    
    Did you notice that accessing the Google cache is very slow compared to a search? Even if that cache is accessed frequently (because it references a /.ed site).
    
    - Re:Cost v Speed (Score:2)
      
      by zerocool^ ( 112121 ) writes:
      
      Hrm...
      
      So this is why SDRAM prices have been going up and not down lately...
      
      Bastards...
      
      ~z
  - Re:Cost v Speed (Score:2, Interesting)
    
    by kesuki ( 321456 ) writes:
    
    Google doesn't cache images, Google doesn't index or cache dynamic (scripted) content, and Google caches PDFs as plaintext.
    However, they are definitely on the scale of terabytes. "Searched the web for a.
    Results 1 - 10 of about 1,470,000,000. Search took 0.31 seconds." Assuming an average of ~25k cached per link, 1.47 billion links would leave a cache of about 37,632,000,000,000 bytes. However, the cache doesn't necessarily need to be stored on RAM disks. He clearly states that it's 200,000 times more efficient for _seekable_ data. This means not the 'cached' data but rather the stuff that the search algorithm looks at to show you appropriate hits. So the heart of the 'search' engine is using RAM exclusively, but 'cached' data would almost certainly still be stored on HDs, unless of course someone has built Google a bunch of 120GB DRAM disks that use conventional HD interfaces (sorta like the flash memory drives, only on steroids when it comes to speed).
    It could even be misleading: Google could have meant flash memory HDs were cheaper but mistakenly referred to them as DRAM.
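
The cache-size estimate in the reply above works out exactly if "25k" is read as 25 × 1024 bytes per cached page. A quick check, treating that reading as an assumption and taking the result count from the quoted search:

```python
# Reproduce the cache-size estimate from the reply above.
# Assumption: "~25k cached per link" means 25 * 1024 bytes per cached page.

RESULTS = 1_470_000_000        # result count quoted from the search page
AVG_CACHED_BYTES = 25 * 1024   # assumed average cached size per page

cache_bytes = RESULTS * AVG_CACHED_BYTES
cache_tb = cache_bytes / 10**12     # decimal terabytes
cache_tib = cache_bytes / 1024**4   # binary terabytes

print(f"cache size: {cache_bytes:,} bytes")
print(f"          = {cache_tb:.1f} TB (decimal) or {cache_tib:.1f} TiB (binary)")
```

That is a little under 38 terabytes for the cached pages alone, which supports the commenter's point that the cache, unlike the search index, would most plausibly sit on hard disks.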
- Re:Cost v Speed (Score:2)
  
  by PhotoGuy ( 189467 ) writes:
  
  I am sure the google archive is only a few 100gb
  
  Huh? I would have thought it would have been between 10x to 100x that much. Especially if they cache most pages. (Maybe they just use dram for the indexes, and hd's for the cache?)
  
  I still don't understand that claim. $300 will get me a 160G drive, and I can load four of them in a cheap PC case or 1U rackmount case, 640G per unit. That's under $2K for .64 Terabyte.
  
  RAM prices vary widely, but say on the low side I can get 256M for $20. I'd need 2560 sticks of 256M to equal 640G, or $51,200 for the equivalent storage. And that doesn't take into account that most reasonably priced PC motherboards only handle 2G or 4G of memory these days. You'd need 160 motherboards in the best case, adding $80,000 to the cost, assuming you could get 4G per unit, and $500 per motherboard/chassis. Let's see: $51K + $80K = $131K, versus $2K.
  
  RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).
  
  Either their archive is a lot smaller than I assumed, or they're talking performance/price tradeoffs, where speed has a high premium.
  
  -me
  - Re:Cost v Speed (Score:2)
    
    by Graymalkin ( 13732 ) writes:
    
    Your single box for $2000 doesn't take into consideration the fact that Google needs to make their tons of information available to everyone at once. With a search engine like Google it is going to be rare that information just sits around and never gets used. This means that by conventional database architecture logic you keep it cached in RAM. Hard drives are useful when you're cutting power to a computer; how often does Google reboot?
  - Re:Cost v Speed (Score:2)
    
    by BinxBolling ( 121740 ) writes:
    
    RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).
    The data isn't just sitting there static, though: It's being searched. To switch to hard drives and maintain their current performance level, they would have to increase the parallelism of the search, by having many more copies of the index. One copy of the index on disk is not really equivalent to one copy of the index in DRAM, because the DRAM index can be searched many times in the period it takes to search the HD index once.
    
    The quantity they're trying to minimize is not dollars per megabyte, but rather dollars per (megabytes searchable per second).
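
The difference between the two metrics in this sub-thread can be sketched with the parent comment's own prices. The hardware figures below are PhotoGuy's; the speed-up values are assumptions (200,000 is the figure attributed to Schmidt elsewhere in this discussion, and 1,000 is an arbitrary, much more conservative guess), and `cost_per_search_rate` is a made-up helper name.

```python
# Dollars per gigabyte versus dollars per unit of search throughput,
# using the prices from the parent comment. Speed-up values are assumptions.

CAPACITY_GB = 640
DISK_COST = 2_000                       # four 160G drives in a cheap box

RAM_STICKS = CAPACITY_GB * 1024 // 256  # 256M sticks needed for 640G
RAM_COST = RAM_STICKS * 20              # $20 per stick
BOARDS = CAPACITY_GB // 4               # 4G of RAM per motherboard
BOARD_COST = BOARDS * 500               # $500 per motherboard/chassis
DRAM_COST = RAM_COST + BOARD_COST


def cost_per_search_rate(cost: float, relative_rate: float) -> float:
    """Dollars per unit of search throughput, where the disk box is rate 1."""
    return cost / relative_rate


# DRAM wins on this metric once it searches this many times faster than disk.
break_even_speedup = DRAM_COST / DISK_COST

print(f"DRAM cost for {CAPACITY_GB}G: ${DRAM_COST:,}")
print(f"break-even speed-up: {break_even_speedup:.1f}x")
for speedup in (1_000, 200_000):
    print(f"at {speedup:>7,}x: DRAM ${cost_per_search_rate(DRAM_COST, speedup):,.3f} "
          f"vs disk ${cost_per_search_rate(DISK_COST, 1):,.0f} per unit of throughput")
```

On a per-gigabyte basis DRAM is about 65 times more expensive, which is exactly the break-even point: if the DRAM-resident index can be searched more than about 65 times faster than the disk-resident one, DRAM is the cheaper way to buy search throughput.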
- Re:Cost v Speed (Score:2)
  
  by DrXym ( 126579 ) writes:
  
  A few 100gb to cache the entire internet?
- Re:Hard disk is an obsolete technology (Score:2)
  
  by __aaaaxm1522 ( 121860 ) writes:
  
  Look at PDAs / handheld PCs. They use flash memory, albeit out of necessity (price, power consumption, size, etc)... but we're already beginning to see laptops incorporate solid state storage technologies. It's only a matter of time.
  
  Now, if we could just get around that pesky limited-write lifetime ... ;)
- Re:Hard disk is an obsolete technology (Score:4, Interesting)
  
  by Dyolf Knip ( 165446 ) writes: on Monday February 04, 2002 @01:16AM (#2949099) Homepage
  
  So hard drives are about 10 years ahead of RAM in terms of $/MB? Sounds about right. 1GB hard drives were on the high end of normal users at the time, as is 1GB of RAM today (though I seem to recall having more than 10MB RAM at the time). Assuming the same increases in the next decade... 100GB RAM and 10TB drives. I like.
  Solid state everything would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives which have increased only slightly in that regard.
  As someone else said, it is only a matter of time.
  
- Re:Take a BUSINESS perspective (yes, it's painful. (Score:3, Insightful)
  
  by Colz Grigor ( 126123 ) writes:
  
  One other follow-up:
  
  Google will also likely break their technology into three components:
  
  spidering and indexing
  
  searching
  
  caching
  
  The financial analysts for the business groups responsible for each aspect of Google's technology may each calculate the value of DRAM vs. HD differently. For searching, latency is extremely critical, but it's not so critical for caching, and there may be some physical problems with solely using DRAM for indexing.
  
  That being said, I would expect Google to use HDs for spidering and indexing, DRAM for searching, and HDs for caching. Mr. Schmidt was probably only discussing technology on the most visible component of Google's technologies: searching.
  
  ::Colz Grigor
