Even their own math on the auxiliary costs doesn't add up. I outline what I think are the inconsistencies in a peer thread to this one.
S3 is redundant. They compared themselves to S3.
I came back from a genetic research conference a couple of months ago where an IT professional at Argonne National Labs spoke about his efforts to build a compute and storage capability there to support the needs of the genetic sequencing community at the lab. Unlike BackBlaze, which has about 200-300 machines, he was managing over 250,000 cores (so, what, maybe 15,000 machines?) and said that it was getting to the point where it would be cheaper for him to move to a cloud service provider.
The lesson I took away from that is that at the small scale, you can probably do it more cheaply, but as you scale up, the larger outfits end up offering a better deal, which isn't all that surprising.
I thought that might be an interesting data point, since you basically said the same thing in your last paragraph.
You didn't read the article.
Not only did I read the article, I read their blog post. And the original blog post they had about their 67 TB pod, back when they first wrote it.
You know how I know that? They explicitly state that they don't have any costs for replacement hard drives over 3 years, because they're *all under warranty* for 3 years. When a drive fails, they get a new one from the manufacturer, no questions asked.
There's no such thing as not having any costs. They employ Sean to replace drives and build pods full time. That's a cost, and it's not included in their charts at the bottom of the blog post. Bandwidth isn't either, and they never claim it is. Read on.
And yes, actually, they do include bandwidth in the cost. They disclose it within the same paragraph.
And this is where it gets interesting. In the paragraph you refer to, they say it costs $2100 for them to run a rack of 10 pods for one month. If each pod has 135TB, and we define 3PB to be 3000TB, then we need 23 pods to get 3PB (this assumes all they use is their new pod design, which isn't the case; they also use their less dense legacy design). But let's not get off track.
So, 2.3 racks (23 pods total), each costing $2100 a month to operate in terms of *space rental*, *power* and *bandwidth*. This is the cost that includes bandwidth. So, assuming we run 2.3 racks for 36 months, we get $2100 * 36 * 2.3 = $173,880, just to operate 3PB for 3 years. That doesn't even include the cost to build them, or Sean's salary. And yet, somehow, at the bottom of their post, they assert that it costs a *total* of $96,000 to build and operate 3PB for three years. Odd, no?
My point is that the costs they quote at the bottom of their blog post are inaccurate. Even if you take out bandwidth, which they state is roughly 1/3 of their operating costs, we're still talking about $116k of operating costs, above and beyond the cost to build the machines.
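For anyone who wants to check the arithmetic, here's a rough sketch in Python. The inputs ($2100 per rack per month, 10 pods per rack, 135TB per pod, 3PB taken as 3000TB, bandwidth at roughly 1/3 of operating costs) are the figures quoted above. The function and variable names are mine, and it assumes a partial rack is billed pro rata, which is what the 2.3-rack figure implies.

```python
import math


def operating_cost_usd(target_tb=3000, pod_tb=135, pods_per_rack=10,
                       rack_month_usd=2100, months=36):
    """Return (pods needed, operating cost in USD) for the given capacity.

    Assumes a partially filled rack is billed pro rata (e.g. 2.3 racks).
    """
    # Round up: you can't deploy a fraction of a pod.
    pods = math.ceil(target_tb / pod_tb)
    # Multiply first, divide last, so the intermediate values stay exact.
    cost = rack_month_usd * months * pods / pods_per_rack
    return pods, cost


pods, total = operating_cost_usd()
# Assumption taken from the post: bandwidth is roughly 1/3 of operating cost.
without_bandwidth = total * 2 / 3

print(f"pods needed: {pods}")
print(f"operating cost over 36 months: ${total:,.0f}")
print(f"operating cost without bandwidth: ${without_bandwidth:,.0f}")
```

None of this includes the cost to build the pods or any labor, which is the whole point.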
Even after you factor all that in, they still aren't beginning to offer the service that Amazon or Google does. Again, my point is they are misleading folks when they compare their product to Amazon as though they have the same features. They don't.
First: ALL Marketing is misleading. That is what marketing does. Accentuate the positive, eliminate the negative. So complaining about that is just idiotic.
OK, there's also the concept of truth in advertising. I sense you want to argue about this, but false comparisons are different than "accentuating the positives".
Second: You could have a couple dozen Backblaze units, pay for a tech to monitor them 24/7/365 and replace all the drives twice over for what Amazon charges for the same thing. Sure, that doesn't include the cost for premises and high-speed Internet to multiple locations. But still, that is aggregated with all the other clients.
Amazon doesn't charge for one data center with a couple dozen rack-mounted machines being watched by one tech. If that's what you think S3 is, you are mistaken.
Third: what are you paying for in the "cloud", I mean besides ethereal concepts? Does Amazon tell you how they do things? You probably know less about Amazon's (and the others') setup, so you're comparing something you know something about (not everything) versus something you know almost nothing about, and then complaining that they aren't doing it in a comparable way. You don't know.
Actually, I know quite a bit. Even if you ignore things I learned on the job, if you just read their basic literature about what they offer, you can see that they use Dynamo on the back end and that they have infrastructure "designed to provide 99.999999999% durability and 99.99% availability of objects over a given year", as well as "sustain the concurrent loss of data in two facilities". I happen to know how they derived those numbers, but that's not a useful discussion...the numbers were honestly derived, even if they can't empirically show them to be correct (the service simply hasn't been around long enough for that). The point is that S3 is in an entirely different ballpark from what BackBlaze offers.
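To put that durability figure in perspective, here's a back-of-the-envelope sketch. It assumes (my simplification, not Amazon's stated model) that "99.999999999% durability over a given year" can be read as an independent annual loss probability of 1e-11 per object. The object count is a made-up example, and the names are mine.

```python
# Assumption: 99.999999999% annual durability means each object has an
# independent 1e-11 chance of being lost in any given year.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_objects_lost_per_year(object_count,
                                   loss_probability=ANNUAL_LOSS_PROBABILITY):
    """Expected number of objects lost per year under the assumption above."""
    return object_count * loss_probability


def years_per_lost_object(object_count,
                          loss_probability=ANNUAL_LOSS_PROBABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count, loss_probability)


objects = 10_000_000  # hypothetical number of stored objects
print(f"expected losses per year: {expected_objects_lost_per_year(objects):.1e}")
print(f"about one lost object every {years_per_lost_object(objects):,.0f} years")
```

Whether the real-world number holds up is a separate question, but that's the scale of the claim.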
Fourth: Your basic assumption is that Backblaze has no contingency for drive replacement, which is false. Since these are "new" drives there might be insufficient data about failure rates and therefore the actual cost of replacement (never mind warranties) or having drives in both Hot and Cold Spare setups. I'm sure that Backblaze in their $5/MO service figures what it costs to store data, have spares, and keep the datacenter running and profitable. Even if they double the cost to $10, it still puts the others to shame.
No, my basic assumption is that they don't factor in the costs for drive replacement. They say in their blog post that they have drives that are out of warranty. They don't include the costs to replace them in their analysis. Even for the drives that *are* under warranty, they don't factor in the cost of identifying them, ordering replacements from the manufacturer, and replacing the actual drive. In other words, I'm asserting that the dominating cost isn't necessarily the drive, it's the labor. They even say that the "hidden costs" of doing all this are the labor. They don't discuss what they pay Sean, but they aren't factoring it in to their graphs at the bottom of the post.
Have you compared the data loss rates for the last three years between Amazon and Backblaze?
No, I'm asking them to.
Can you even compare or is that data held secret (see point 1b).
S3 has a service level agreement for the durability of the data. I haven't seen this from BackBlaze. I still maintain that the services aren't comparable in any meaningful way.
My point here, is that you're pulling shit out of your ass and thinking it doesn't stink.
I'm not sure what you're trying to say here.
Even if it isn't directly comparable, it is at least in the realm of consideration, EVEN if everything you said is true. And at 10 times less in cost, that can buy a lot of redundancy. It is just a matter of perspective.
Completely true. BackBlaze offers what seems to be a good product for a cheap price. They fill a niche. What I'm saying is that their product is so different from S3 that I'm surprised they would ever make the comparison.
I didn't mean to assert that BackBlaze's product has no place in the market. Heck, I'd love to build a couple of their pods for home use. My only point was that it is a product that has very different strengths than S3. The only reason I picked S3 in the comparison is because *they* picked S3 when they decided to discuss pricing.
I'm not sure I ever said anything about having to think about it. I can tell you that S3 runs at least triple redundancy (enough to survive the loss of two data centers simultaneously). That's a very different product from what BackBlaze is selling.
My problem with Backblaze is their marketing is very misleading...they pit these storage pods up against cloud storage and assert that they are "cheaper", as though a storage pod is anything like cloud storage. It isn't. Sure, there's the management software issue that's already been mentioned, but they do no analysis on redundancy, power usage, security, bandwidth usage, cooling, drive replacement due to failure, administrative costs, etc. It's insulting to anyone who can tell the difference, but there are suits out there who read their marketing pitch and decide that current cloud storage providers like Google and Amazon are a rip off because "Backblaze can do the same thing for a twentieth the price!" It's nuts.
You can see this yourself in their pricing chart at the bottom of their blog post. They assert that Backblaze can store a petabyte for three years for either $56k or $94k (if you include "space and power"). And then they compare that to S3 costing roughly $2.5 million. In their old graphs, they left out the "space and power" part, and I'm sure people complained about the inaccuracies. But they're making the same mistake again this time: they're implicitly assuming the cost of replicating, say, S3, is dominated by the cost of the initial hardware. It isn't. They still haven't included the cost of geographically distributing the data across data centers, the cost of drive replacement to account for drive failure over 3 years, the cost of the bandwidth to access that data, and it is totally unclear if their cost for "power" includes cooling. And what about maintaining the data center's security? Is that included in "space"?
On a side note, I'd be interested to see their analysis on mean time between data loss using their system as it is priced in their post.
You could say the Backblaze is serving a different need, so it doesn't need to incur all those additional costs, and you might be right, but then why are they comparing it to S3 in the first place? It's just marketing fluff, and it is in an article people are lauding for its technical accuracy. Meh.
Thanks so much for taking the time to write this. Great points all around. Well played.
You're glossing over the (very important) point about what exactly the rules in the store are. Sure, from a legal perspective, you can say "Their store, their rules" and be done with it. But from a practical perspective it matters a lot if they have a rule that says you can't re-implement the functionality present in current (or planned) official applications, as Apple had/has (I'm not current on their stance in this regard). Google, on the other hand, goes out of their way to point out that you can replace many components of the underlying system, like the contacts, email, dialer, and home applications. To say that both systems have rules, and therefore they are both the same is disingenuous; it would be a bit like saying "Well, the United States has laws, and so does North Korea, so they're basically the same."
Which brings me to my second point. For years, no one really knew what the rules were in Apple's store. Apps that developers had spent months on were magically rejected. There was no transparency. Compare with Google's store, which has very clear, transparent rules that developers must agree to before they can ever submit an application for inclusion in the market. I hear that Apple has tried to increase transparency recently, but I don't know to what extent they have succeeded.
If you consider both those differences, and then pile on the fact that even if Kongregate fails to abide by the rules, even though they agreed to them before submitting their application, Android users can still visit their website and install the application that way, you have a system that by all measures is more "open" than Apple's.
And all of that ignores the fact that the operating system itself is open source and can be (and is!) used and modified freely by dozens (or hundreds) of companies around the globe.
All that is to say: the differences between the two systems are deep and far-reaching, not merely the fact that applications can be installed from a website on Android, but not iOS.
"...use the knowledge you gained on the job to get a better one."
Laughable, indeed. That's called "experience".
His point was that the app store is governed - you have no idea whether Apple will ever let your app make it into the actual store (see Google Voice). They might perfectly well say that it is an exact duplicate of a for-pay app, and it is therefore rejected. Remember, Apple takes 30%, so it's in their best interest to put more apps behind the pay wall. And we already know that Apple can reject any app for any reason (FCC inquiries notwithstanding).