Google's Academic TB Swap Project
eldavojohn writes "Google is transferring data the old fashioned way — by mailing hard drive arrays around to collect information and then sending copies to other institutions. All in the name of science & education. From the article, 'The program is currently informal and not open to the general public. Google either approaches bodies that it knows has large data sets or is contacted by scientists themselves. One of the largest data sets copied and distributed was data from the Hubble telescope — 120 terabytes of data. One terabyte is equivalent to 1,000 gigabytes. Mr. DiBona said he hoped that Google could one day make the data available to the public.'"
Should we be continuing this fallacy? (Score:3, Informative)
Uhh, no it isn't. It's really 0.9765625 terabytes.
Re:Should we be continuing this fallacy? (Score:5, Funny)
Re: (Score:2, Informative)
1000GB = 0.9765625 TB, not 1TB.
Nope (Score:3, Informative)
The disk manufacturers define a gigabyte as 1000 megabytes, a megabyte as 1000 kilobytes, and a kilobyte as 1000 bytes.
The OS measures a gigabyte as 1024 megabytes, a megabyte as 1024 kilobytes, and a kilobyte as 1024 bytes.
Why? Because when you're buying a drive, 750 Gigs sounds bigger than 698.5 gigs.
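For anyone who wants to check that number, here's a quick Python sketch. The function name is made up for illustration, and it assumes the label counts 10^9 bytes per GB while the OS divides by 2^30.

```python
def labeled_gb_to_binary_gb(labeled_gb: float) -> float:
    """Convert a drive-label size (1 GB = 10**9 bytes) to binary gigabytes (2**30 bytes)."""
    total_bytes = labeled_gb * 10**9
    return total_bytes / 2**30

# A drive sold as 750 GB, as the OS would report it.
print(round(labeled_gb_to_binary_gb(750), 1))  # 698.5
```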
Not according to NIST (Score:4, Interesting)
If you want to use the binary values, you might as well use the correct "tebi" prefix. NIST [nist.gov] says you should, and it looks like the IEC, IEEE and BIPM agree.
Re: (Score:2, Informative)
A tebibyte (a contraction of tera binary byte) is a unit of information or computer storage, abbreviated TiB.
1 tebibyte [wikipedia.org] = 2^40 bytes = 1,099,511,627,776 bytes = 1,024 gibibytes
The tebibyte is closely related to the terabyte, which can either be an (inaccurate) synonym for tebibyte, or refer to 10^12 bytes = 1,000,000,000,000 bytes, depending on context.
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:2, Insightful)
see: http://en.wikipedia.org/wiki/Tebibyte [wikipedia.org]
* 1 Terabyte = 1000 Gigabyte
* 1 Tebibyte = 1024 Gibibyte
Bark! Bark! Bark! (Score:5, Funny)
Re:Bark! Bark! Bark! (Score:5, Funny)
I'm sorry, that's wrong too:
* 1 byte == 2 nibbles
* 1 byte != 1 bite
--
Byte nazi police, proudly serving since 2^1025
Re:Should we be continuing this fallacy? (Score:5, Insightful)
Re: (Score:2, Insightful)
Re:Should we be continuing this fallacy? (Score:4, Insightful)
So if you buy a set for RAID one day, the next day they may no longer stock the drive you need and your vital information is put at unnecessary risk because... what, because the hard drive manufacturers can't decide whether they want to screw you out of 7% (using 1 GB = 1 billion bytes) or 5% (using 1 GB = 1 million kilobytes, which they curiously agree equals 1.024 billion bytes). What a coincidence that KB is 2^10, but GB is 10^9?
Think about that for a moment before you lambast the argument for proper labeling of drives.
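The 7% and 5% figures can be checked with a few lines of Python. This is only a sketch: the names are hypothetical, and it assumes the baseline is a binary gigabyte of 2^30 bytes.

```python
BINARY_GB = 2**30  # bytes in a gigabyte as most OSes count it

def shortfall_percent(bytes_per_labeled_gb: int) -> float:
    """Percent of a binary gigabyte that is missing from one labeled gigabyte."""
    return (1 - bytes_per_labeled_gb / BINARY_GB) * 100

decimal_gb = 10**9       # 1 GB = 1 billion bytes
mixed_gb = 10**6 * 1024  # 1 GB = 1 million kilobytes of 1024 bytes each

print(f"{shortfall_percent(decimal_gb):.1f}%")  # 6.9%
print(f"{shortfall_percent(mixed_gb):.1f}%")    # 4.6%
```

So "7%" and "5%" are the rounded-up versions of roughly 6.9% and 4.6%.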
Re: Google's Academic TB Swap Project (Score:2)
Re: (Score:2)
First, as taught in any school book and computer manual through history (see Apple, Amiga, Microsoft, Commodore): 1024 bytes = 1Kilobyte, 1024 Kilobyte = 1 Megabyte etc. because the computer could only calculate in exponents of 2 (1 and 0) and 20MB (20480 kilobyte) was about the largest size hard drive you could get.
A Kilobyte is 1024 (2^10) bytes. A Megabyte is 1024 Kilobytes or 1,048,576 bytes (2^20) and a Gigabyte is 1024 Mega
Re: (Score:2)
When I see a power of 2 next to the units, I expect the units to be in a power of 2 too.
Large datasets (Score:5, Informative)
This is becoming more and more the norm in scientific research and Google's work is quite welcome.
Re:Large datasets (Score:4, Funny)
Latency may leave something to be desired though :)
Re: (Score:2)
Never underestimate the bandwidth of a lorryload of backup tapes traveling at 60 miles an hour.
Close enough.. This is attributable to Andy Tanenbaum according to http://www.bbc.co.uk/dna/h2g2/A678576 [bbc.co.uk] (and one of his books I read).
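To put a rough number on that, here's a small Python sketch of the effective bandwidth of a shipment. The 120 TB figure is the Hubble data set from the summary; the 24-hour door-to-door transit time and the function name are assumptions of mine, not anything from the article.

```python
def effective_gbit_per_s(terabytes: float, transit_hours: float) -> float:
    """Average throughput, in gigabits per second, of shipping `terabytes` (10**12 bytes each)."""
    bits = terabytes * 10**12 * 8
    seconds = transit_hours * 3600
    return bits / seconds / 10**9

# Assumption: the 120 TB Hubble set arrives 24 hours after pickup.
print(round(effective_gbit_per_s(120, 24), 1))  # 11.1
```

It ignores the time needed to copy the data onto and off the drives, so treat it as an upper bound.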
Another ontopic remark.
Google either approaches bodies that it knows has large data sets
I know people who also approach bodies that they know have large 'data sets', but that doesn't get them a lot of 'bandwidth' ;)
In Other News (Score:5, Funny)
Mod parent up (Score:3, Informative)
Here's what happened when I FedExed my RMA to Newegg, packed very carefully. Note the bent motherboard - I didn't even know you could do that. The good news is that FedEx paid part of my claim ... they paid $100 plus the $8.33 that the FedEx store charged me to fax in the claim forms. The bad news is that they did not refund my original shipping or pay more than $100 on the over $280 of damage that they did. It also took about 4 hours of phone calls to even convince FedEx that I was not the seller, and
Re: (Score:2)
Did you buy additional insurance over the $100 you get by default?
Re:Mod parent up (Score:5, Informative)
Re: (Score:2)
Why? You said they did pay your $100 claim after all.
No idea what you're talking about. You generally fill out the form yourself, and select what insurance you want.
Re: (Score:2)
Re: (Score:3, Informative)
I still don't know where you get that idea. Insurance is meant to handle any kind of damage, including being completely destroyed in plane crashes, car accidents, train derailments, theft, loss, and anythi
Re: (Score:2)
I use the word "intentional" because it wouldn't surprise me if the kind people at my local Fe
FedEx, UPS, insurance. (Score:2)
A while back I bought a radio-controlled airplane, pre-assembled. It came in a big box, most of which contained the wing. So it was fairly fragile, but well packed, in tri-wall. Got it sent UPS, with insurance for the full val
Re: (Score:2)
Sorry to nitpick, but this scam has been around for ages - you broke something, oh no! I'll send it to myself and pretend UPS did it. Hell, I even saw it in Seinfeld. Not that you were doing this, but what you tried is pretty suspicious to an outside observer.
They need SOME proof of value or even that the box was actually full to fight this type of fraud, and the
Re: (Score:2)
Re: (Score:2)
I don't know if, in this age, this is wise. With so many corporations buying up major parts of our lives like food, communications, salaries, and transportation, I would challenge you to take a look at the structure of the different entities that affect you daily. The unfortunate fact is that every decision you make needs to be researched to find the most appropriate course of action based on who is behind the marketing. Su
Re: (Score:2)
Good Hard Padded Cases (Score:2)
Re: (Score:3, Insightful)
I remember an article I read on this, I think back in the year 2000. There was a research scientist who built a standardized platform (that is to say, a specific PC case with a certain number of hard drive bays, and certain network cards) so that he could exchange data with other universities. They would fill up the data on the networked PC, and they could ship it to any of the participating projects, knowing that they'd get back the same hardware in return.
I remember at the time thinking it was just one of
Re: (Score:3, Insightful)
Re: (Score:2)
Even using the full bandwidth between Internet2-connected unis, it would still take 2~3+ days to transfer 250TB of data.
10Gb/s is close to the max you can do with one frequency. That will all change once they start pumping multiple colors down their fiber. Their bandwidth will explode & Go
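The 2~3 day estimate holds up if you take the 250 figure as terabytes (10^12 bytes each) and assume the link runs flat out at 10 Gb/s with no protocol overhead. A minimal Python sketch, with a made-up function name:

```python
def transfer_days(terabytes: float, link_gbit_per_s: float) -> float:
    """Days needed to push `terabytes` (10**12 bytes each) over a saturated link."""
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / 86400

# 250 TB over a fully used 10 Gb/s link.
print(round(transfer_days(250, 10), 1))  # 2.3
```

Real transfers would be slower, since a shared link rarely sustains its full rate.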
Re: (Score:2)
Re: (Score:2)
http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 [acmqueue.org]
Chris
Re: (Score:3, Informative)
This concept has also been applied to such things as the Sloan Digital Sky Survey [sdss.org]. Astronomers do tend to generate a lot of data with large surveys such as this.
Re: (Score:2)
rsync (Score:2)
Re: (Score:2)
The article and the GP are about sending large amounts of data, as in terabytes. In this discussion, 8 GB is tiny, and is easily downloaded much faster than even express mail. Besides, rsync won't really help if all your data is unique (such as astronomical data). Rsync really helps when very little of your data set changes between updates, such as ba
Re: (Score:2)
Re: (Score:2)
Praise the Google, don't point out they are just doing the same thing as everyone else.
Google is watching.
Never underestimate... (Score:2)
Oblig. (Score:2)
Still very much applies today.
Ryan Fenton
Re: (Score:2)
The page you linked to had a smart idea. Rather than just have the raw disks, create some sort of architecture inside to allow for rapid transmission of the data from the vehicle upon arrival. I could see specialized vehicles that have been hardened against an accident with an inverter to power the drives that have external fiber optic ports hooked up to massive, high speed RAID arrays to rapidly dump the contents to another system at the location and upload content for the next destination.
Then a GPS syst
How long until... (Score:2)
Re: (Score:2)
I assumed the Bugs Bunny interpretation as well. There, now you have at least two instances.
so.. (Score:3, Interesting)
Re: (Score:2, Flamebait)
Who's going to own the data?
As always the people of the world own the data. The copyright holders are, however, given a short term monopoly on making copies of it, with certain exceptions.
I hope Google isn't going to say they do, like they want to with the old books they're scanning.
Google has not, as far as I know, claimed "ownership" or even copyright on anything they've scanned. They have, however, created their own database of metadata about the works, which they use to enable people to more easily find specific items in the original data.
Every time you download a Hubble picture, will it have a Google watermark?
Umm, maybe. Why do I care if they add watermarks to it? If they are in the way
Re: (Score:2)
Umm, maybe. Why do I care if they add watermarks to it?
Re: (Score:3, Interesting)
For example, Google does not own the copyright on out-of-copyright books that it scans in (nobody does, by definition.) At best, it might own the copyright on the scan that it did, but that's really unlikely--copyright protects creative expression and a straight scan doesn't add any.
However, they probably have some rights under unfair competition law because they have gone through a lot of work
Re: (Score:3, Informative)
I'm not so sure that the result is theirs, necessarily. They'd need to properly attribute it. Many science archives have rules about how to properly attribute their work.
Don't get me wrong -- many of the scientists want people to use their data (eg, see The Astronomer's Data Manifesto [ivoa.net]), but they also want to know who's using it, because it's how they justify the value of their projects, and the costs incurr
Re: (Score:2)
Now, what I've done would reasonably upset you, but there is no law (at least in the US) that requires me to attribute your ideas to you. In fact, under those facts, I completely own the copyright in my article and you have no legal remedy. Now, there may be repercussions--I
Re: (Score:2)
I really don't like the idea of a "private" (yes, I know it's publicly traded) company having control of this public information.
You do know many government agencies already outsource IT and other projects to "private" companies who have all this government generated information, right?
The data was paid for by tax payers. Google will inevitably make money from this otherwise they wouldn't be doing it.
Yeah, and right now Microsoft makes money off of selling them the OS and office suite. This isn't a question of if the government will be paying for the ability of their employees to do word processing, it is just a matter of how much and which companies will be getting the money. I don't trust Google any less than I do MS, who currently supplies
Re: (Score:2)
There's destroying and then there's locking away. There are people pushing for laws that say one person's copy of a public domain work is copyrighted by that person for the typical term and that no one else may make a copy from that copy without permission. It's specifically about granting broadcasters copyright over their rebroadcast of a public domain work, but i
Never underestimate ... (Score:3, Interesting)
Looks like Google is hoarding data. Seems they at least are equating information with power and money. And them that has the power and money makes the rules.
Re: (Score:2)
They're not hoarding the data. They're storing it online in open formats, at least according to the article.
Re: (Score:2)
Other Uses for Mass Data Transfer (Score:4, Funny)
Barney: Oh ho, oh yeah, you had a good laugh, Moe.
Moe: The results came back today. (reading a printout) You owe me seventy billion dollars.
Barney: Huh?
Moe: No, wait, wait, wait, that's for the Voyager spacecraft. Your tab is fourteen billion dollars.
Hubble Data (Score:2, Funny)
Re: (Score:2)
"Star. Star. Star. Damnit, star. Star. God this sucks. Star. Star. Space ship. Star. Star. Star. God nothing but fucking stars! Fuck hubble, useless piece of shit!"
Re: (Score:3, Funny)
Re: (Score:2)
I hope they're not using ... (Score:2)
Isn't TB... dangerous? (Score:2, Redundant)
Re: (Score:2)
I'm glad I wasn't the only one!
Dangerous precedent (Score:2)
Units... (Score:2)
One terabyte is equivalent to 1,000 gigabytes.
Hey, where do you think you are? This is Slashdot! Everyone knows that! What people here want to know is how much that is in Libraries of Congress...
The only thing you get by saying that is a flamewar between 10 kinds of people: those who count only in MB (and disagree with you) and those who count in both MB and MiB (and agree with you)!
For my take on the issue, see this previous post [slashdot.org] of mine.
Re: (Score:2)
Re: (Score:2)
we use binary units because formatted capacity is measured in binary units.
It seems you haven't read the previous post I linked to. Please do :)
:) . My server at home is a FreeBSD, I launched fdisk and it reports size in "Meg", neither MB nor MiB. So I can't say :) What command did you ente
Your statement is wrong. The correct statement would be "we use binary units because some OSes report formatted capacity in binary units".
Proof I've read your post in its entirety is that I was going to write "MS Windows" (like I did in the aforementioned post) instead of "some OSes"
As the old saying goes (Score:2)
-Andrew Tanenbaum
Re: (Score:2)
Google != Open Source (Score:2)
I call B.S. "Lack of engineering time" is why we haven't seen the source to the core search engines or gmail?
So you're saying "Microsoft = open source", then? (Score:2)
If the average Slashdotter applied the same flawed logic to Microsoft, you'd have to say they're big open source sponsors too. After all, Microsoft has released GB of free source code for utilities, etc. for decades. Sure, the code mostly only works with their proprietary "family jewels" (the OS and development tools), but why quibble?
Now I just my own PB HD. (Score:2)
I can only hope that bandwidth can keep up. How long would it take to transfer a 120 TB BitTorrent file over either cable or DSL?
Well, maybe we'll have small TB USB flashd
Just waiting for the day... (Score:2)
...that a researcher sends them all the printouts of his/her data... on greenbar...
So... (Score:2)
...why not tapes? (Score:4, Interesting)
I'm not criticizing or anything; just curious is all.
120Tb is 100 SAIT tapes (Score:2)
Re: (Score:2)
There are basically two reasons one would choose to use HDDs over tapes: compatibility and price.
Compatibility: Sure, one scientific institution may have standardized on a specific type of tape, but what about all the rest? Pretty much everyone in the world can read a standard HDD formatted with a well known filesystem.
Price: what is the cost of HDDs vs. tapes per gigabyt
Re: (Score:2, Interesting)
Besides, using HDD for transfer means immediate access to the same data on the other end at speeds that tape backup systems can't match. It's also worth noting that data sets that large are usually stored on large RAID systems like this one from LSI Logic, http [lsilogic.com]
Re: (Score:3, Interesting)
The "TeraScale SneakerNet" paper posted earlier [arxiv.org] anticipates and answers that. They ship a fully assembled computer with processor, RAM, OS and network interface. Plug it in to the wall, plug it in to the network and assuming you had previously agreed on a networking protocol, you're rolling as soon as it boots! No restoration, no decompressing, immediate access to the data.
Does anyone have a Linux distro for this specific purpose? Preferably tiny enough to fit onto a USB key and optimized for bandwidth, p
Re:1TB = 1024 GB (Score:5, Insightful)
Why is a Kilobyte 1024 bytes, if "Kilo" means 1000, both according to the SI and the Greeks (Kilo is derived from khilioi)? If 1 kg = 1000g, 1 kV = 1000V, 1 km = 1000m, why should hard disks break the pattern?
When we're talking about addressable computer memory, approximating the kilobyte to 1024 is a convenience, but since Terabyte gives such a huge error, and makes absolutely no sense for data transfer or disk sizes, it's really time we stopped this illogical naming convention just because some engineers found a term convenient 40 years ago.
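To see how the gap grows with each prefix, here's a short Python sketch (the names are just for illustration). It compares 1024^n against 1000^n for kilo through tera.

```python
PREFIXES = ["kilo", "mega", "giga", "tera"]

def binary_excess_percent(power: int) -> float:
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100

for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {binary_excess_percent(power):.1f}%")
# kilo: 2.4%
# mega: 4.9%
# giga: 7.4%
# tera: 10.0%
```

At kilo the difference is easy to shrug off; at tera it is about a tenth of the total.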
Re: (Score:2, Interesting)
But I have long since buried my problem with using the SI prefix with byte to mean a power of 2; actually, I'm not sure I ever had one, I just accepted it. I am happy with 1024B=1KB, 1024KB=1MB, 1024MB=1GB and 1024GB=1TB. The usable space is lower in the case of non-volatile storage anyway; 1TB never means 1024GB and might be closer to 1000GB (I don't know).
Re: (Score:2)
As for why it's different for disks to RAM, disk manufacturers discovered a long time ago that they could make more money by using SI rather than binary measures for disk size, because it ar
Re: (Score:3, Insightful)
Real geeks have no problem with overloading.
Re: (Score:3, Informative)
Re: (Score:2)
Yes, it's so funny when all these guys just keep arguing why 1024bytes should really be 1000bytes because they don't want to care that it's history, it's practical, it wor
Re: (Score:2, Insightful)
If we want to worry about that then use KiB and MiB. But that doesn't make a huge amount of sense. 1KiB = 400h bytes. 1MiB = 100000h bytes. Powers of 256 would make a lot more sense.
Re: (Score:3, Funny)
You meant 'terabyte', not 'tarabyte'. If you're going to correct someone, do it right.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Like days of old (Score:4, Interesting)
According to what I'm told every time I watch a DVD, these scholars were in fact stealing books.
Re: (Score:2)
They're working on that part of the problem by subjecting two trucks of hard drives to quantum entanglement.