You should absolutely be getting more bandwidth than that; you might contact our support to see what's up. We have students at universities hitting 100 Mbits/sec upload rates, plus I suspect a few engineers in datacenters are getting even higher. We do not inherently throttle, although we use RAID6 with groups of 15 drives, so you are probably rate limited to 1 Gbit/sec by either the 1 Gbit/sec network card in the pod, or ?? which is the disk drive transfer rate.
Backblaze employee here. By the way, we're not "Linux-unfriendly", every single last datacenter machine is running Debian, that's like 950 machines! Most laptop customers use Windows or Mac so we did those versions first, and we're trying to get the Linux client finished up, it just got pushed down in priority a few times, but we don't mean it as a slight against Linux.
About CrashPlan - I have ALWAYS liked CrashPlan, and I think they are great and people should certainly consider CrashPlan if it fits their needs. You might also consider Carbonite and Mozy, I think these are all good products with a few tradeoffs here and there. Backblaze isn't perfect for all customers, for example, we don't yet have a Linux client. I also believe Mozy has a better small business administration portal than Backblaze, if that's what you need.
Backblaze does support incremental backups, but it is a fairly simplistic incremental. For any file less than 30 MBytes there are no partial files, we just push a whole new copy to a whole new location in our datacenter. For any file more than 30 MBytes, we break the file into 10 MByte "chunks" and push each individual chunk if that chunk has changed. So the WORST thing you can do is prepend a single byte to the large file - this essentially causes every single 10 MByte chunk to change (slide to the right?) and so we have to retransmit the entire thing.
For a lot of programs dealing with large files, they tend to append bytes to the end of their file formats, which works great. If it is an entire bootable computer image, a lot of stuff will probably not move around (like huge swaths of binaries sitting in that computer image) and a lot of stuff WILL move around that will "accidentally" be backed up.
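To make the prepend-versus-append difference concrete, here is a minimal sketch of fixed-offset chunking. It is not Backblaze's actual client code: the function names are made up for illustration, comparing chunks by SHA-256 hash is an assumption, the chunk size is scaled down from 10 MBytes to 10 bytes so the demo stays tiny, and the 30 MByte threshold is ignored.

```python
import hashlib

CHUNK_SIZE = 10  # stand-in for 10 MBytes, scaled down to bytes for the demo


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Hash each fixed-offset chunk of the file (hashing is an assumption)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Return indexes of chunks in `new` that differ from `old` or are brand new."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


original = bytes(range(100))      # 10 chunks of 10 distinct bytes each
appended = original + b"\x00"     # one byte added at the end
prepended = b"\x00" + original    # one byte added at the front

print(changed_chunks(original, appended))   # [10]: only the new tail chunk
print(changed_chunks(original, prepended))  # [0, 1, ..., 10]: every chunk shifted
```

Appending touches only the last chunk, while prepending one byte shifts every chunk boundary, so all 11 chunks look different and would be resent.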
One final hint: by default TrueCrypt specifically thinks changing the modification time is "leaking information". Make sure you check the checkbox that tells TrueCrypt to also update the last modified time whenever it changes the image. Backblaze uses that as a hint to go examine every byte in the file to see if it should be retransmitted.
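Here is a rough sketch of why a preserved timestamp hides changes from a backup client that uses the modification time as its hint. The `needs_rescan` helper and the file name are hypothetical, not taken from the Backblaze client; the demo fakes the "preserve timestamp" behavior with `os.utime`.

```python
import os
import tempfile


def needs_rescan(path: str, last_seen_mtime_ns: int) -> bool:
    """Hypothetical hint check: rescan the file only if its mtime moved."""
    return os.stat(path).st_mtime_ns != last_seen_mtime_ns


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "volume.img")
    with open(path, "wb") as f:
        f.write(b"A" * 64)
    st = os.stat(path)
    seen = st.st_mtime_ns  # what the backup client remembered

    # Contents change, but the old timestamp is put back afterwards
    # (simulating an encryption tool that preserves the modification time).
    with open(path, "r+b") as f:
        f.write(b"B")
    os.utime(path, ns=(st.st_atime_ns, seen))
    preserved = needs_rescan(path, seen)

    # Same file, but now the timestamp is allowed to move forward one second.
    os.utime(path, ns=(st.st_atime_ns, seen + 1_000_000_000))
    updated = needs_rescan(path, seen)

print(preserved)  # False: the change is invisible to the mtime hint
print(updated)    # True: the client knows to go examine the file
```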
I think this would be really awesome. Here's where it gets neat: we already have an app running on hundreds of thousands of desktop and laptop computers! (Our "online backup application" involves a tiny service that runs to send your files to the datacenter through HTTPS.) So if we just updated the client with a small amount of statistics tracking (and maybe a nice checkbox to opt in or out) then we could immediately start collecting info.
Sort of related: A few years ago I played an online 3D video game (can't remember which one, might have been Quake?) where you could report your graphics card and RAM configuration to the server, and the server would list the aggregate statistics. So there is some precedent for this kind of data collection and publication.
We do use AES to encrypt the files. We used a well known design where we use the public key to encrypt the AES256 key and FEK, then we use the AES key to symmetrically encrypt the file. Then we can use the passphrase to encrypt the private key. So it's kind of an onion: you use the passphrase to decrypt the private key, which is then used to decrypt the AES key and FEK, which is then used to decrypt the file. (We didn't invent this flow, it is used in several encrypted filesystems because it's a great design.) This way it is FAST (symmetric AES) plus has the total awesomeness of pub/private keys and all they imply (the idea that you can encrypt data with the public key that nobody listening can decrypt because they don't have the private key is really quite powerful).
We then use HTTPS to post this data from your laptop to our datacenter. From time to time this "double encryption" of both encrypting on the client and sending the already encrypted data through HTTPS anyway has helped keep our customers safe when HTTPS has been broken for a little while.
> with passphrases in case the PC is hacked. That's how important keeping the
> private key completely private is.
The flaw in your design is that when the PC dies, you can no longer decrypt the backup because you just lost the private key.
Some online backup companies in the past have solved this by having you store your private key in yet a 3rd party "escrow" location, so you don't have the only copy and yet the company with your backup data does not have the private key either. In essence that is what Backblaze does, just in an "easy to use" way. We store the private encryption keys on one particular server, completely separate from your data. The data is all on "pods". Is it as secure? I don't think anybody can claim 100% security, but we do the very best job we can.
I leave you with the following thought -> if you would use encryption (like TrueCrypt) on your most sensitive data, *THEN* back up the TrueCrypt image to Backblaze, even if Backblaze wanted to read your data or if the NSA put their processing power on it and cracked your passphrase, they would have nothing, because you encrypted it BEFORE it was encrypted by Backblaze and sent through HTTPS to our servers. Maybe that would allow you to sleep soundly at night?
Hopefully "the truth" is a valid defense?
You might be surprised how little discount we get. On our last purchase of 4 TByte Hitachi drives (960 drives in one purchase) we paid $135 each before tax and shipping. "B&H Photo" sometimes wins the bid (I don't know how or why), but you can basically get that same price within a couple bucks in units of 1 or 2 from their website. Note: we have no affiliation with B&H other than being satisfied customers, and B&H do not win the bid every time.
With that said, if anybody knows how to get more than $2 off "retail" please PLEASE let us know!!
Disclaimer: I'm a Backblaze engineer who wrote a lot of that code.
Your statement is a bit misleading, there are two levels of security in Backblaze. Data is always encrypted, and the "private key" is a totally standard OpenSSL PEM file that yes, we store for you. By default, this PEM file is secured by a passphrase that Backblaze knows, so your data is essentially only secured by your email address and password and you can recover your password by email. This is pretty light security (if somebody has access to your email they can recover your password), so it's best for backups of stuff you wouldn't mind too much if somebody got ahold of it, like say pictures of your cat. Don't laugh, I back up my public website on Backblaze servers. There is valuable data in the world that does not need encryption: info you don't want to lose but that is ALSO publicly readable.
So if you are concerned at all about security, you can set your own personal "passphrase" on that PEM file that Backblaze absolutely never writes to disk - we don't store it. But if you do this you MUST remember that passphrase or your data is GONE. Without that passphrase, nobody will ever retrieve your data, not you, not the US government, not the NSA, NOBODY. You cannot "recover" that passphrase, and we don't know it. This is a good mode of security if you would be arrested on the spot for the contents of your files if the NSA got ahold of your data, because we really don't think it is breakable.
We do these drive statistics and observations originally for our own selfish internal reasons - this is information that is important for running our business. When we then release this kind of information, the info release is largely because it helps people hear about our company (and also maybe a little of "good for humanity" motivation thrown in there, we're Slashdot kind of people, we work in technology in Silicon Valley). But let me be clear: the information is as accurate as we can possibly make it, and we aren't pulling any punches and we aren't "in bed with" any drive manufacturers. I see this as a WIN-WIN. You get accurate and free information, and a few people hear our company name and look into what we do and maybe we gain a few customers. These posts are often written by the engineers working on the system, who are trying to be as straightforward and non-marketing as they can be.
Are you sure? An average iPhone JPEG is only 2 MBytes or so, right? That means your wife is taking 50,000 photos a month? That's 2 photos per minute every minute she is awake, if I did the math correctly.
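For anyone who wants to check that math, here is a quick sketch. The 30-day month and 16 waking hours per day are my assumptions; the 2 MByte photo size and the 50,000 photos a month are the figures above.

```python
PHOTO_MBYTES = 2            # average iPhone JPEG size from the comment
PHOTOS_PER_MONTH = 50_000   # photo count from the comment
DAYS_PER_MONTH = 30         # assumption
WAKING_HOURS_PER_DAY = 16   # assumption

# Total data implied by that many photos, in GBytes (1 GByte = 1000 MBytes here).
gbytes_per_month = PHOTOS_PER_MONTH * PHOTO_MBYTES / 1000

# Photos per waking minute over the whole month.
waking_minutes = DAYS_PER_MONTH * WAKING_HOURS_PER_DAY * 60
photos_per_minute = PHOTOS_PER_MONTH / waking_minutes

print(gbytes_per_month)             # 100.0
print(round(photos_per_minute, 2))  # 1.74, i.e. roughly 2 per waking minute
```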
> They've repeatedly published their research openly... just in case anyone cares.
"Research" sounds too official, more like "observations in our environment", but THANK YOU for the kind words. What baffles me is why nobody else publishes these sorts of drive statistics. Why is Amazon silent? Why doesn't Google name drive names and failure rates? And if the answer is: "Google gets a great price on drives in exchange for their silence" then why hasn't Backblaze been offered a deal to keep quiet yet?! I'm serious, how big do you have to get before you get the better prices on drives? We essentially pay "retail".
Or maybe Uber is that much better than the old days (the taxi situation 10 years ago)? Seriously, when I hear person after person rave about how a service or restaurant is good or convenient, I give it a try. So I tried Uber, and it was wonderful. Now I've had better and worse Uber rides, I'm no Uber shill. But overall it simply is better than taxi service was 10 years ago, it solves ALL my main complaints.
Now I've heard the taxi services admit they had dropped the ball and they are addressing their issues, I even heard they have smartphone apps now. Well to some extent: screw them! I'm loyal to Uber now. Taxis made their bed, they can lie in it and die as far as I'm concerned. As long as every time I call an Uber it shows up on my smartphone and I can watch it approach me - I'm ordering Uber. Now, if Uber service starts sucking as bad as taxis did then I'll evaluate my choices again at that point.