File modification date.
Yes, modification time and file size are the two things that rsync checks by default.
Now, while reflecting on the funny mod someone gave you, what operations need to be done to retrieve that information for each file? And furthermore, how does that number of operations scale with regard to the number of files in your working set?
Not that it matters in this case anyhow. Moving the problem to the desktops would multiply the original problem, not resolve it.
Have you ever actually used rsync on a decent-sized file set? Determining the changed file set requires significant disk activity.
It's a certain win when compared to just blindly transferring everything. But if you think that rsyncing 20 changed files in a 100-file working set is the same as rsyncing 20 changed files out of a 2,000,000-file working set, you are very, very wrong.
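To make the scaling point concrete, here is a minimal sketch of an rsync-style "quick check" (compare modification time and size against a previous snapshot). It is not rsync's actual code; `scan` and `changed_files` are hypothetical helpers. The point it shows: the number of metadata lookups equals the number of files in the tree, no matter how few of them changed.

```python
import os
import tempfile

def scan(root):
    """Stat every regular file under root.

    Returns ({relative_path: (mtime_ns, size)}, number_of_stat_calls).
    """
    snapshot = {}
    stat_calls = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # one metadata lookup per file, changed or not
            stat_calls += 1
            snapshot[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return snapshot, stat_calls

def changed_files(old, new):
    """Paths in `new` whose (mtime, size) differ from `old`, including new files.

    Deleted files are not reported; that would need a separate pass over `old`.
    """
    return sorted(p for p, sig in new.items() if old.get(p) != sig)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for i in range(100):
            with open(os.path.join(root, f"f{i}.txt"), "w") as fh:
                fh.write("x")
        before, _ = scan(root)
        # Modify 20 files; appending changes the size, so coarse mtime
        # resolution cannot hide the change.
        for i in range(20):
            with open(os.path.join(root, f"f{i}.txt"), "a") as fh:
                fh.write("y")
        after, calls = scan(root)
        print(len(changed_files(before, after)), "changed,", calls, "stat calls")
```

Scale the directory to 2,000,000 files and the 20 changed files still cost 2,000,000 stat calls, each of which may hit the disk when the metadata is not cached.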
And that is completely aside from the absolute insanity of suggesting that you replicate the full contents of the file server to every desktop, which others have already covered.
Somehow, I think you misunderstood what this article is about.
Given the very frequent mention of 'disk based storage', and how flash is so much better, I'm not sure that I did.
It's not about SSD.
No, it's not about SSDs. That is the problem: it reads like they have never heard of them.
Memcached prevents Facebook's disk-based databases from being overwhelmed by a fire hose of millions of simultaneous requests for small chunks of information.
flash memory has much faster random access than disk-based storage
Each FAWN node performs 364 queries per second per watt, which is a hundred times better than can be accomplished by a traditional disk-based system
Swanson's goal is to exploit the unique qualities of flash memory to handle problems that are currently impossible to address with anything other than the most powerful and expensive supercomputers on earth
Swanson's own high-performance, flash-memory-based server, called Gordon, which currently exists only as a simulation...
I'm not saying that a wide array of low-power nodes is a bad idea. But unless they address the current state of technology, rather than a conveniently quaint world in which using flash as your primary storage makes you some sort of innovator, it's hard to take them seriously.
"you could very easily run a small website on one of these servers, and it would draw 10 watts," says Andersen--a tenth of what a typical Web server draws.
And how does that per-website energy usage compare to a normal server, using SSDs, and running enough virtualized instances (or just virtual domains) to match the per-website performance offered by a single FAWN node?
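A back-of-envelope way to frame that question: divide a server's draw by the number of sites it hosts. The 10 W and implied ~100 W figures come from the quote above; the consolidation level of 50 sites is purely hypothetical, as is the assumption that an SSD-backed server still draws about 100 W and can actually match per-site performance.

```python
# From the article: ~10 W for a FAWN node running one small site,
# "a tenth of what a typical Web server draws" (so roughly 100 W implied).
FAWN_NODE_WATTS = 10.0
TYPICAL_SERVER_WATTS = 100.0  # assumption: SSD-backed server draws about the same

def watts_per_site(server_watts: float, sites_hosted: int) -> float:
    """Average power attributed to each site when one server hosts many."""
    if sites_hosted <= 0:
        raise ValueError("sites_hosted must be positive")
    return server_watts / sites_hosted

def break_even_sites(server_watts: float, node_watts: float = FAWN_NODE_WATTS) -> float:
    """Sites a conventional server must host to match one-site-per-node power."""
    return server_watts / node_watts

# Hypothetical consolidation level: 50 small sites on one server.
print(break_even_sites(TYPICAL_SERVER_WATTS))
print(watts_per_site(TYPICAL_SERVER_WATTS, 50))
```

Under these assumptions the conventional server breaks even at 10 sites and comes out ahead beyond that, which is why the per-website comparison matters more than the per-box one.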
You need to address the actual state of things, and not the strawman of what computing was 6 years ago (or however long) when the project was started. While they've been working, the world hasn't been standing still, and you cannot pretend that spinning disks are the only thing going.
Perhaps I'm being too harsh and it's a failing of TFA and not the original researchers. Given that a dual-core Atom 330 draws about 8 watts, it is entirely reasonable that you could build a very efficient cluster out of a whole mess of them and a few SSDs, and produce something like what you insist the article was about. That would be interesting, provided that it compared favourably against similarly state-of-the-art systems, of course.
Intel X25-E, 2.6 watts, 3300 Write IOPS, 35000 read IOPS*. So only one or two orders of magnitude more efficient...
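The per-watt comparison works out as follows, using only the figures quoted in this thread. Caveat: the X25-E number counts the drive alone (no host CPU, RAM, or PSU), and FAWN's figure is key-value queries rather than raw IOPS, so this is a rough upper bound on the SSD's advantage, not a like-for-like benchmark.

```python
# Figures as quoted above.
FAWN_QUERIES_PER_SEC_PER_WATT = 364

X25E_WATTS = 2.6
X25E_READ_IOPS = 35_000
X25E_WRITE_IOPS = 3_300

def per_watt(ops_per_sec: float, watts: float) -> float:
    """Operations per second delivered per watt of power drawn."""
    return ops_per_sec / watts

read_ratio = per_watt(X25E_READ_IOPS, X25E_WATTS) / FAWN_QUERIES_PER_SEC_PER_WATT
write_ratio = per_watt(X25E_WRITE_IOPS, X25E_WATTS) / FAWN_QUERIES_PER_SEC_PER_WATT

print(f"reads: {read_ratio:.1f}x, writes: {write_ratio:.1f}x")
```

That is about 13,500 read IOPS per watt (roughly 37 times FAWN's figure) and about 1,270 write IOPS per watt (roughly 3.5 times).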
And though no prices are given in the article for the FAWN, at $800 for the X25-E it's probably less expensive too. Particularly if you include setup and administration costs.
Not a bad idea in general, and not a bad idea in specific for 5 years ago, but pathetically outclassed in every area by a high end modern SSD.
I mean, those numbers sound small, but even I have no clue how many IO requests I am making right now... is ten cents per million a good price or a bad price? Dunno! Is a penny per 10,000 GETs a good price? Probably--that is ten bucks for 10 million requests, right?
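The arithmetic checks out. A small helper (hypothetical, for illustration) using `Decimal` to avoid floating-point rounding on money:

```python
from decimal import Decimal

def request_cost(requests: int, price: str, per: int) -> Decimal:
    """Dollar cost of `requests` operations billed at `price` dollars per `per` operations."""
    return Decimal(requests) / Decimal(per) * Decimal(price)

# "a penny per 10,000 GETs" -> $10 for 10 million requests
penny_rate = request_cost(10_000_000, "0.01", 10_000)
# "ten cents per million" -> $1 for 10 million requests
dime_rate = request_cost(10_000_000, "0.10", 1_000_000)
print(penny_rate, dime_rate)
```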
It can add up fast.
My company provides an offsite backup solution, and we've been using S3 as our primary storage backend since a few months after S3 went live. It is not unusual for us that the per-op costs are greater than the actual data storage or transfer costs. It is worth noting however that our use of S3 is quite non-standard. We do some pretty extensive verification to catch bitrot should it ever occur, as well as some fairly convoluted data processing to minimize actual transfer overhead for updated files.
So the answer is that it really depends. If you just throw data up there all at once and hope it sticks, those additional costs won't matter. However, if you want to build something more complicated that doesn't just blindly trust S3, or that does efficient data updating, then yes, those costs do matter quite a lot.
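A minimal cost model shows how request charges can overtake storage charges once you periodically verify every object. Everything here except the penny-per-10,000-GETs rate quoted earlier is hypothetical: the object count, object size, verification frequency, and the storage price are placeholders, not the poster's real workload or anyone's current price list. It also assumes each verification is billed as one GET and ignores transfer charges.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    objects: int
    avg_object_bytes: int
    verifications_per_object_per_month: int

GET_PRICE_PER_10K = 0.01           # "a penny per 10,000 GETs", as quoted earlier
STORAGE_PRICE_PER_GB_MONTH = 0.15  # hypothetical placeholder price

def monthly_costs(w: Workload) -> tuple[float, float]:
    """Return (storage_dollars, request_dollars) for one month."""
    gb_stored = w.objects * w.avg_object_bytes / 1e9
    storage = gb_stored * STORAGE_PRICE_PER_GB_MONTH
    # Assumption: each verification costs one GET-priced request.
    requests = w.objects * w.verifications_per_object_per_month
    request_cost = requests / 10_000 * GET_PRICE_PER_10K
    return storage, request_cost

# Hypothetical: 1M objects of ~100 KB, each verified daily.
storage, ops = monthly_costs(Workload(1_000_000, 100_000, 30))
print(f"storage ${storage:.2f}, requests ${ops:.2f}")
```

With these made-up inputs, 100 GB costs about $15 to store while 30 million verification requests cost $30. Small objects checked often push the balance further toward per-op charges; large objects touched rarely push it the other way.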
Those of you who work at companies that are entirely family- or employee-owned: do you feel that your company is in better shape than publicly traded corporations?
I am co-founder of a small company that's been around for a few years now. While we have certainly noticed the recession, we continue to grow and our monthly revenue is the highest it has ever been.
We are debt free, and actually appreciating the slowdown a little bit as it gives us time to step back and take a longer view of our product development.
"The following is not for the weak of heart or Fundamentalists." -- Dave Barry