Comment: Re:Dedup is just a marketing word.... (Score 3, Informative) 306

by m.dillon (#38590670) Attached to: Ask Slashdot: Free/Open Deduplication Software?

For our production systems it depends 100% on the actual amount of duplicated data, since bulk data reads are needed to verify the duplication. The number of passes is almost irrelevant because they primarily scan meta-data N times, not bulk data (duplicated bulk data only has to be verified once).

The meta-data can be scanned much more quickly than the verification of duplicated bulk data because the meta-data is laid out on the physical disk fairly optimally for the B-Tree scan the de-dup code issues. So meta-data can be read from the hard disk at 40 MBytes/sec even without the use of an SSD to cache it. Of course, with DFly's swapcache and the meta-data cached on the SSD that scan runs at 200-300 MBytes/sec.

But in contrast, the bulk reads used to validate the duplicate data just aren't going to be laid out linearly on the disk. There's a lot of skipping around... so the more actual duplicate data we have the larger the percentage of the disk's surface we have to read to verify it.

This is an area which I could further optimize in HAMMER's dedup code. Currently I do not sort the bulk data block numbers when running the data verification pass. Not only that, but I am scanning a sorted CRC list, so the bulk data offsets are going to be seriously unsorted. Sorting them would definitely improve performance, probably quite a bit, but it still would not be anywhere near the 40 MBytes/sec the meta-data scan can achieve off the platter. It would not be a whole lot of programming, probably a day's work. It currently isn't at the top of my list though.
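To illustrate why the read order matters, here is a minimal Python sketch. It is not HAMMER's code: the candidate list, the block offsets, and the "sum of jumps" cost model are all assumptions made up for the example. It compares the head travel when blocks are read in CRC order against reading the same blocks sorted by offset. A real implementation would also have to buffer or otherwise pair up the blocks for comparison, which this ignores.

```python
def seek_distance(read_order):
    """Sum of absolute jumps between consecutive block offsets."""
    return sum(abs(b - a) for a, b in zip(read_order, read_order[1:]))


def crc_order(candidates):
    """Read order when walking candidate groups sorted by CRC."""
    return [off for _crc, offsets in sorted(candidates) for off in offsets]


def offset_order(candidates):
    """Read order after sorting every block offset ascending."""
    return sorted(crc_order(candidates))


# Hypothetical candidates: (crc, [block offsets that share that CRC]).
candidates = [(0x11, [900, 10]), (0x22, [500, 20]), (0x33, [700, 30])]

unsorted_cost = seek_distance(crc_order(candidates))
sorted_cost = seek_distance(offset_order(candidates))
```

With these made-up offsets the CRC-ordered walk bounces back and forth across the disk, while the offset-ordered walk moves in one direction only.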

What this means, in summary (and even with semi-sorting of the bulk data blocks), is that one can use a bounded amount of ram without really affecting the efficiency of the off-line de-duplication.

-Matt

Comment: Re:Dedup is just a marketing word.... (Score 4, Informative) 306

by m.dillon (#38590374) Attached to: Ask Slashdot: Free/Open Deduplication Software?

Well, I can tell you why the option is there... it's not because of collisions, it's there to handle the case where there is a huge amount of actual duplication where the blocks would verify as perfect matches. In this case the de-duplication pass winds up having to read a lot of bulk-data to validate that the matches are, in fact, perfect, which can take a lot of time versus only having to read the meta-data.

Just on principle I think it's a bad idea to just trust a checksum, cryptographic hash, CRC, or whatever. Corruption is always an issue... even if the filesystem code itself is perfect and even if the disk subsystem is perfect there is so much code running in a single address space (i.e. the KERNEL itself) that it is possible to corrupt a filesystem just from hitting unrelated bugs in the kernel.

Not to mention radiation flipping a bit somewhere in the cpu or memory (even for ECC memory it is possible to get corruption, but the more likely case is in the billions of transistors making up a modern cpu, even with parity on the L1/L2/L3 caches).

Hell, I don't even trust IP's stupid simple 1's complement checksum in HAMMER's mirroring protocols. Once during my BEST Internet days we had a T3 which bugged out certain bit patterns in a way that actually got past the IP checksum... we only tracked it down because SSH caught it in its stream and screamed bloody murder.

If you de-duplicate trusting the meta-data hash, even a big one, what you can end up doing is turning 9 good and 1 corrupted copies of a file into 10 de-duped corrupted copies of the file.

I'm sure there are many data stores that just won't care if that happens every once in a while. Google's crawlers probably wouldn't care at all, so there is definitely a use for unverified checks like this. I don't plan on using a cryptographic hash as large as the one ZFS uses any time soon, but being able to optimally de-dup with 99.9999999999% accuracy is a reasonable argument for having one that big.

-Matt

Comment: Re:Dedup is just a marketing word.... (Score 3, Informative) 306

by m.dillon (#38590120) Attached to: Ask Slashdot: Free/Open Deduplication Software?

Yes, this is correct.

For on-line de-duplication the most optimal case in my view is to only de-dup data which may already be present in the buffer cache from prior recent operations, so the on-line dedup only maintains a small in-kernel-memory table of recent CRCs. This catches common operations such as file and directory tree copying fairly nicely.

The off-line dedup catches everything using a fixed amount of memory and multiple passes (if necessary) on the meta-data, then bulk data reads only for those blocks which appear to be duplicates to verify that they are exact copies.

I've run dedup on a 2TB backup from a VM with as little as 192MB of ram and it works. A preferable setup would be to have a bit more memory, like a gigabyte, but more importantly to have an SSD large enough to cache the filesystem meta-data. A 40G SSD is usually enough for a 2TB filesystem. That makes the off-line dedup quite optimal. It also helps other maintenance and administrative operations on the large filesystem, such as du, find, ls -lR, cpdup, even a smart diff, let alone rsync or other things one might want to run... it makes all of that go screaming fast without having to waste money buying a bigger system or waste money on excessive energy use.

-Matt

Comment: Re:Dedup is just a marketing word.... (Score 1) 306

by m.dillon (#38589822) Attached to: Ask Slashdot: Free/Open Deduplication Software?

Another side note on DragonFly's HAMMER: de-duplication is implemented both as a daily pass AND can also be enabled for live writes. The daily pass can find all duplicate blocks. The live dedup uses a small fixed in-kernel-memory LRU style record of recent data block CRCs to find de-duplication candidates during live writes. Performance impact is minimal either way as recently recorded CRCs also tend to still have their data in the buffer cache.

The live-dedup mostly exists to get some up-front deduplication when someone, say, does a 'cp' or 'cp -r' or something like that. The real catch-all is the daily pass.
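A minimal Python sketch of the live-dedup idea follows. It is not the in-kernel implementation: the class name `RecentCrcTable`, the `read_block` callback, and the use of `zlib.crc32` are assumptions for illustration. It keeps a small fixed-size LRU table of recent block CRCs and only reports a match after comparing the actual data.

```python
import zlib
from collections import OrderedDict


class RecentCrcTable:
    """Fixed-size LRU map from a block's CRC to the block number holding it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # crc -> block number, oldest first

    def find_or_record(self, data, block_no, read_block):
        """Return an existing block with identical data, or None.

        On a miss (or a CRC collision) the new block is recorded instead.
        read_block(block_no) is a hypothetical callback returning stored bytes;
        in the scenario described above it would usually hit the buffer cache.
        """
        crc = zlib.crc32(data)
        candidate = self.entries.get(crc)
        if candidate is not None and read_block(candidate) == data:
            self.entries.move_to_end(crc)      # refresh LRU position
            return candidate
        self.entries[crc] = block_no           # record (or replace on collision)
        self.entries.move_to_end(crc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used CRC
        return None
```

Because the table is small, anything it misses (for example a block whose CRC was already evicted) is simply left for the daily pass to find.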

One interesting side effect of having de-duplicated backups is that we don't have to make a huge effort to avoid duplicate data in developer shell accounts. Developers have tons of git repos and fully checked out source trees all over the place and it doesn't bloat our backups all that much. This makes developers' lives easier too as they just don't have to worry about having lots of copies of things lying around.

Plus we are also backing up multiple machines' filesystems to the same backup filesystem and there's a lot of duplication on each machine which gets de-duplicated since the backups are all going to one target filesystem. It's a great feature just for that. I'm getting something like a 3.5:1 de-duplication ratio on our current aggregated backups. 4-5 TB of data winds up collapsing to around 700G or so on the backup system, without compression.

-Matt

Comment: Re:Dedup is just a marketing word.... (Score 5, Informative) 306

by m.dillon (#38589414) Attached to: Ask Slashdot: Free/Open Deduplication Software?

All dedup operations have a trade-off between disk I/O and memory use. The less memory you use the more disk I/O you have to do, and vice versa.

Think of it like this: You have to scan every block on the disk at least once (or at least scan all the meta-data at least once if the CRC/SHA/whatever is already recorded in meta-data). You generate (say) a 32 bit CRC for each block. You then [re]read the blocks whose CRCs match to determine if the CRC found a matching block or simply had a collision.

The memory requirement for an all-in-one pass like this is that you have to record each block's CRC plus other information... essentially unbounded from the point of view of filesystem design and so not desirable.

To reduce memory use you can reduce the scan space... on your first pass of the disk only record CRCs in the 0x0-0x7FFFFFFF range, and ignore 0x80000000-0xFFFFFFFF. In other words, now you are using HALF the memory but you have to do TWO passes on the disk drive to find all possible matches.
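The fixed-range trade-off can be sketched in a few lines of Python. This is only an illustration: the function name `scan_fixed_ranges` and the in-memory list of `(block_id, crc)` pairs standing in for the on-disk meta-data are assumptions. Each pass records only the CRCs that fall inside its slice of the 32-bit space, so more passes mean a smaller table.

```python
from collections import defaultdict

CRC_SPACE = 1 << 32  # assumes a 32-bit CRC, as in the text


def scan_fixed_ranges(blocks, num_passes):
    """Return (matches, peak_table_size) for a scan split into equal CRC ranges.

    blocks: list of (block_id, crc) pairs standing in for the meta-data.
    """
    matches = {}
    peak = 0
    for p in range(num_passes):
        lo = p * CRC_SPACE // num_passes
        hi = (p + 1) * CRC_SPACE // num_passes
        table = defaultdict(list)         # crc -> block ids seen this pass
        for block_id, crc in blocks:      # one full meta-data scan per pass
            if lo <= crc < hi:
                table[crc].append(block_id)
        peak = max(peak, len(table))
        matches.update({c: ids for c, ids in table.items() if len(ids) > 1})
    return matches, peak
```

With `num_passes=2` the ranges are exactly 0x0-0x7FFFFFFF and 0x80000000-0xFFFFFFFF. The same candidate matches come out either way; only the peak table size and the number of scans change.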

The method DragonFly's HAMMER uses is to allocate a fixed-sized memory buffer and start recording all CRCs as it scans the meta-data. When the memory buffer becomes full DragonFly dynamically deletes the highest-recorded CRC (and no longer records CRCs >= that value) to make room. Once the pass is over another pass is started beginning with the remaining range. As many passes are taken as required to exhaust the CRC space.
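Here is a rough Python sketch of that bounded-buffer scheme, written from the description above rather than from HAMMER's source. The names `scan_bounded` and `max_crcs` are hypothetical, the "buffer" is bounded by the number of distinct CRCs rather than by bytes, and a list of `(block_id, crc)` pairs stands in for the meta-data scan.

```python
CRC_SPACE = 1 << 32  # assumes a 32-bit CRC


def scan_bounded(blocks, max_crcs):
    """Return (matches, passes) using a table of at most max_crcs distinct CRCs.

    blocks: list of (block_id, crc) pairs standing in for the meta-data.
    """
    if max_crcs < 1:
        raise ValueError("max_crcs must be at least 1")
    matches = {}
    passes = 0
    lo = 0                                    # start of the remaining CRC range
    while lo < CRC_SPACE:
        hi = CRC_SPACE                        # exclusive limit, shrinks when full
        table = {}                            # crc -> block ids seen this pass
        for block_id, crc in blocks:
            if not (lo <= crc < hi):
                continue
            table.setdefault(crc, []).append(block_id)
            if len(table) > max_crcs:
                hi = max(table)               # drop the highest recorded CRC and
                del table[hi]                 # stop recording CRCs >= that value
        matches.update({c: ids for c, ids in table.items() if len(ids) > 1})
        passes += 1
        lo = hi                               # next pass covers what was dropped
    return matches, passes
```

The number of passes adapts to the data: if everything fits in the table the scan finishes in one pass, otherwise each pass covers as much of the CRC space as the table can hold. The groups returned are only candidates and still need the data comparison before anything is merged.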

Because HAMMER stores a data CRC in meta-data the de-dup passes are mostly limited to just meta-data I/O, plus data reads only for those CRCs which collide, so it is fairly optimal.

This can be done with any sized CRC but what you cannot do is avoid the verification pass... no matter how big your CRC is or your SHA-256 or whatever, you still have to physically verify that the duplicate blocks are, in fact, exact duplicates, before you de-dup their block references. A larger CRC is preferable to reduce collisions but diminishing returns build up fairly quickly relative to the actual amount of data that can be de-duplicated. 64 bits is a reasonable trade-off, but even 32 bits works relatively well.
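The verification step might look something like the following Python sketch. The function `verify_groups` and the `read_block` callback are hypothetical names, and holding every block of a group in memory is a simplification. It takes groups of blocks whose CRCs match and keeps only the sets whose bytes are really identical.

```python
def verify_groups(candidates, read_block):
    """Split each CRC candidate group into sets of byte-identical blocks.

    candidates: dict mapping crc -> list of block ids sharing that CRC.
    read_block: hypothetical callback returning the bytes of a block id.
    """
    verified = []
    for block_ids in candidates.values():
        by_content = []                      # list of (data, [block ids])
        for block_id in block_ids:
            data = read_block(block_id)      # bulk read: the expensive part
            for known, ids in by_content:
                if known == data:            # full compare, not just the CRC
                    ids.append(block_id)
                    break
            else:
                by_content.append((data, [block_id]))
        verified.extend(ids for _data, ids in by_content if len(ids) > 1)
    return verified
```

A block that merely collides on the CRC ends up in a group of its own and is dropped, so only true duplicates have their references merged.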

In any case, most deduplication algorithms are going to do something similar unless they were really stupidly written to require unbounded memory use.

-Matt

Comment: You guys wouldn't make good investors (Score 2) 435

by m.dillon (#38518148) Attached to: Prospects Darken For Solar Energy Companies

Because, really, virtually nobody here actually understands what is going on. So I will throw some light on the matter.

There are two big problems with solar energy in the U.S. right now.

The #1 problem is that the U.S. is in the middle of a natural gas bonanza that started about a decade ago but really only started ramping up around 2007ish... and kept ramping up right on through the crash and is still ramping up today, with no end in sight. This has caused domestic NG production to go through the roof and domestic NG prices to fall through the floor.

Power companies switching from coal are going right to natural gas, which has 30% (or better) lower emissions and low prices for at least the next decade. It will take that long for our LNG export infrastructure to ramp up enough for world markets to relieve the downward pressure on domestic natural gas prices. In any case, just switching to NG allows power companies to meet EPA requirements for probably the next decade (or longer), and that's without any magic technology to make it even cleaner.

Natural gas used to only be used for peaking plants, except in California where it started to also be used for base load earlier than other parts of the country. Now natural gas is being used for base load across the board (along with nuclear). Nearly all of those coal plants undergoing decommissioning are being replaced with natural gas plants, not solar or anything else.

The #2 problem with solar energy is China's overproduction of solar panels. China has no problem destroying their environment with the chemical leavings from the production of solar panels. Chinese companies are facing the same problems that U.S. companies are facing with prices plunging, but they have a much lower cost of production. The result is that non-Chinese companies basically can't compete (under ANY circumstances)... because we aren't willing to destroy our environment like the Chinese are.

This has created a massive drop in price for the bulk panels that large businesses (like Google for example) can purchase. Consumers cannot get at these prices, we just don't buy enough panels, but even so prices are going down for us too.

This, in turn, has created a huge problem for solar thermal power companies, because the price of bulk panels has dropped below or near par to the cost of constructing a solar thermal (mirror based) energy plant. This makes the ongoing cost of a solar thermal plant, which requires significant maintenance and has parts with limited life spans due to thermal cycling, higher than the near zero running cost of an installed conventional solar panel closer to the buyer (typically on the roof of the business premises).

Numerous other factors are also creating issues. Germany was the single biggest purchaser of solar panels for the last decade due to massive government subsidies. Those subsidies are now winding down, which only makes the market glut worse.

And that's the problem in a nutshell. Even the most environmentally minded person has to realize by now that it is impossible to go from coal to solar in one step. Natural gas is the only thing we have in-country that is even remotely capable of replacing coal at the generation levels required. I will applaud government investment in solar energy but anyone who thinks that solar can actually replace fossil fuel is fooling themselves.

-Matt

Comment: Re:Cringely again... (Score 5, Interesting) 124

by m.dillon (#38247416) Attached to: Bufferbloat: Dark Buffers In the Internet

Well, you definitely CAN tell when one or more buffers along the path begins to fill up, because latency increases. Packet loss is not necessary and, in fact, packet loss just makes the problem worse since many TCP connections implement SACK now and can keep the bandwidth saturated even in the face of packet loss.

The ideal behavior is probably not to start dropping packets immediately... eventually, sure, but definitely not immediately. Ideally what you want to do is to attempt to shift the problem closer to the edges of the network where it is easier to fairly apportion bandwidth between customers.

Send-side bandwidth limiting is very easy to implement since TCP already has a facility to collect latency information in the returned acks. I wrote a little beastie to do that in FreeBSD many years ago, and I turn it on in DragonFly releases by default.
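As a very rough illustration of the send-side idea (not the FreeBSD/DragonFly code; the class `InflightLimiter`, its parameters, and the smoothing constants are all assumptions), a sender can estimate the path's bandwidth-delay product from ack timing and cap the data in flight near that value plus a little slop, instead of letting the window grow until router buffers fill:

```python
class InflightLimiter:
    """Cap data in flight near (estimated bandwidth) x (minimum observed RTT)."""

    def __init__(self, slop_bytes):
        self.slop_bytes = slop_bytes  # small allowance so some buffering remains
        self.min_rtt = None           # seconds; approximates the unloaded latency
        self.bandwidth = None         # bytes/sec, exponentially smoothed

    def on_ack(self, acked_bytes, rtt, interval):
        """Update estimates from one ack covering acked_bytes over interval secs."""
        if self.min_rtt is None or rtt < self.min_rtt:
            self.min_rtt = rtt
        sample = acked_bytes / interval
        if self.bandwidth is None:
            self.bandwidth = sample
        else:
            self.bandwidth = 0.875 * self.bandwidth + 0.125 * sample

    def window(self):
        """Bytes the sender should allow in flight right now."""
        return int(self.bandwidth * self.min_rtt) + self.slop_bytes
```

The point of using the minimum RTT is that when latency rises because a buffer somewhere is filling up, the computed window does not grow with it. The slop term reflects the goal stated below of unloading the buffers rather than eliminating buffering entirely.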

The purpose of the feature is not to completely remove packet buffering from the network, because doing so would put the sending server at a severe disadvantage versus other servers that do not implement similar algorithms (which is most of them).

The purpose is to unload the buffers enough such that the algorithms in the edge routers aren't overloaded by the data and can do a better job apportioning bandwidth between streams.

Our little network runs this coupled with fair queueing in both directions... that is, we not only control the outgoing bandwidth, we also pipe all the incoming bandwidth through a well connected colo and control that too, before it runs over the terminal broadband links. This allows us to run FAIRQ in both directions in addition to reserving bandwidth for TCP acks and breaking down other services. FAIRQ always works much better when links are only modestly overloaded and not completely overloaded. Frankly we don't have much of a choice, we HAVE to do this because our last-leg broadband links are 100% saturated in both directions 24x7. Anything short of that and even a single video stream screws up the latency for other connections beyond hope.

This sort of solution works great near the edges.

For the center of the network, frankly, I think about the best that can be done is modest buffering and RED and then trying to reduce the load on the buffers in the center with algorithms run on the edges (that can sense end-to-end latency). The modest buffering is needed for the edge algorithms to be able to operate without bits of the network having to resort to dropping packets. In other words, you want the steady state load for the network to not have to drop packets. Dropping packets should be reserved for the case where the load changes too quickly for the nominal algorithms to react. That's my opinion anyhow.

-Matt

Comment: Re:Somebody help me out, seriosuly (Score 1) 521

by m.dillon (#37547748) Attached to: Amazon Kindle Fire Surfaces

You are absolutely right. Plus the iTouch is only $200 brand spanking new for the 8GB generation 4 model. Front and rear-facing cameras, microphone, accelerometer and Gyro. The iTouch doesn't have a GPS or compass (awwww) or 3G, so it is feature-comparable. The display has just about as many pixels but of course is considerably smaller. But yah, it does fit in the pocket and for anything larger that doesn't I'd rather have an iPad's bigger display anyhow.

To be fair, trying to read a book with an iPhone or iTouch would not be fun, but neither do I consider a 7" active display fun for reading. The only reason I have a Kindle at all is for the liquid paper display for outside reading. I'm sure many people will love the color Kindle for reading but that's not a reason for buying it vs a liquid paper kindle.

Amazon's fire is already right smack in the middle of a squeeze between Apple's small and Apple's large.

p.s. I'm not against Amazon per se, but as an investor I just don't find their tablet compelling.

-Matt

Comment: Re:Doesn't even compete with the iPad 1. (Score 1) 521

by m.dillon (#37547636) Attached to: Amazon Kindle Fire Surfaces

Compare it against a 16G iPad 1 on Amazon then (I don't know of any 8G iPads). Around $350, wifi-only (about the same as on eBay). Around $400 for Wifi+3G.

That's $150 more for a MUCH nicer screen, double the storage, and a microphone (and thus Skype too). $200 more for 3G on top of that.

Here's the problem: Consumers who buy these things are already paying between $150-$250/mo for their phone+internet+TV services. So you're talking, literally, a 1-2 months difference in price against a readily available iPad 1 in terms of consumer cash flow.

Worse, we are talking about NO difference in price against a brand new iTouch (iPod touch) with its smaller screen but also with a microphone AND front-facing camera (meaning video calls can be done with it).

The reason Apple is making money hand over fist with their products is that they sell them in the right form factor at the right price point.

Now is Amazon selling their pad at the right price point? I personally don't think they are. Why would I want to get an Amazon pad versus a brand new 4th generation iPod touch? Let alone a used iPad 1. All the Amazon pad has going for it is a screen that's a little bigger but still not big enough. I don't think so.

-Matt

Comment: Re:DOA? (Score 1) 521

by m.dillon (#37547398) Attached to: Amazon Kindle Fire Surfaces

Apple has something like 85% of the pad market. Most of the Android tablet vendors have volumes in the 500,000 - 800,000 range (each) whereas Apple's volumes are in the tens of millions.

The market is something like 12 million pads this year and expected to be something like 40 million pads next year, if I remember correctly. It's a very big pie.

Both the Nook and Amazon appear to have moved away from liquid paper for their color displays. I don't have a nook but I gotta say that the Amazon B&W Kindle w/liquid paper is really wonderful when reading outside in bright sunlight, and just fine everywhere else. It's the ONLY reason why I have a kindle. I bring it along on vacations and don't even bother to bring a charging cord.

If these color devices don't have wonderful visibility in bright sunlight then I have no interest in buying them relative to buying an iPad. It's the only reason why I read books on the kindle in the first place!

Amazon might want their new gadget to compete in the Nook space, and I'm sure it will do well in that space, but if it is large enough to be considered a pad and yet can't compete in the Pad space, and its display isn't as readable as their B&W liquid paper display, then their volume improvements will only be incremental at best.

Customers often view products very differently than companies would like them to, and I think that is going to be the case for the Amazon pad.

-Matt