
Comment Re:not in the field, eh? (Score 1) 634

I did give a proviso for run-time alias checks in my comment above. Our compiler will also generate a run-time check for that, with a small code-size and run-time cycle penalty. The FORTRAN equivalent doesn't need the alias check.

I'd expect ICC to be very aggressive, given that Intel has one of the largest (if not the largest) paid, full-time compiler teams in the world.

So how about all of FORTRAN's other nifty features, such as array slices? To get the same functionality in C / C++, you have to put explicit strides and bounds everywhere, and sometimes checks to reverse loop directions. In FORTRAN, you can write things like "A(1:100,1:200:2) = B(101:300:2,51:150)", and the compiler is free to choose the best way to do it.
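To make that concrete, here's roughly what that one-line slice assignment turns into as explicit C++ loops. This is my own sketch: the 0-based, row-major layout and the array dimensions are assumptions made purely for illustration.

// Hypothetical dimensions; 0-based, row-major C++ arrays standing in for
// the 1-based, column-major FORTRAN arrays A and B.
constexpr int NA1 = 100, NA2 = 200;   // A is 100 x 200
constexpr int NB1 = 300, NB2 = 150;   // B is 300 x 150

void slice_assign(double (&A)[NA1][NA2], const double (&B)[NB1][NB2])
{
    // FORTRAN: A(1:100, 1:200:2) = B(101:300:2, 51:150)
    // Both slices are 100 x 100; the programmer spells out every stride and
    // offset by hand, and the compiler still has to prove A and B don't alias.
    for (int i = 0; i < 100; ++i)
        for (int j = 0; j < 100; ++j)
            A[i][2*j] = B[100 + 2*i][50 + j];
}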

In C / C++, you leave it to the programmer to dictate the loop explicitly and hope the compiler can figure out what you're doing. In my experience, real-world programmers get unusually creative with this task and produce awful code. If you write clearly enough and pay enough attention to compiler vectorization reports or other feedback, maybe the compiler + user eventually figures it out. Realistically, most programmers aren't that sophisticated, and even among the ones who are, not all have the time or inclination.

Looking back to my array slice example: Now take those slices across function call boundaries in both languages and see how much work the programmer has to do in each language...

My point is, the more work the programmer has to do to help the compiler succeed, the stronger the evidence that the language is a poor fit for the problem domain. FORTRAN can make it easier for compiler writers because they start with a higher-level specification of what the programmer is trying to achieve. FORTRAN also makes it easier for programmers because they can stop at that higher-level specification of what they're trying to achieve.

Comment Re:not in the field, eh? (Score 1) 634

I'd argue that if you're programming in processor-specific intrinsics, you're not really programming in C++ any more. Standard C and C++ semantics for pointers and arrays really get in the way of autovectorization. So, you have to go to language extensions and kludges (like C99's restrict keyword) to throw the compiler a bone.
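For instance, here's a minimal sketch of what "throwing the compiler a bone" looks like. The __restrict spelling is a common compiler extension in C++ (C99 spells it restrict), and the alignment/trip-count remarks stand in for whatever hint mechanism a particular compiler actually provides.

// A vectorization-friendly loop: __restrict promises the compiler that y and
// x never alias, which is the guarantee FORTRAN semantics give it for free.
void saxpy(float* __restrict y, const float* __restrict x, float a, int n)
{
    // Further (unchecked) promises a compiler might want as hints:
    // n is a multiple of the vector width, pointers suitably aligned.
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}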

Sure, tricks such as whole-program analysis and run-time alias tests can help the compiler find the guarantees it needs to have in order to vectorize. The fact of the matter (and I heard this straight from the mouths of my employer's vectorizing compiler team members) is that stock FORTRAN is simply much friendlier than stock C/C++ for this due to those semantic differences.

Our compiler will autovectorize C code if you pass it enough hints, such as minimum loop trip counts, pointer alignment, pointer aliasing guarantees (a.k.a. restrict), and so forth. Even then, there are limits to what it can do. We offer processor-specific intrinsics so you can vectorize the code yourself.

Once you start coding in vector intrinsics, you're taking the vectorization out of the compiler's hands and doing it yourself. Each of those intrinsics usually maps directly to an instruction or small sequence of instructions, so there's little left for the compiler to figure out. The compiler then just schedules and register-allocates the code, and handles the non-vector bits around the edges. Sure, you still compile with the C++ compiler, but the C++ compiler is no longer providing the vectorization: You are.
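For illustration, here's the same sort of loop hand-vectorized with x86 SSE intrinsics. This is my own example, not tied to any particular vendor's toolchain; each intrinsic maps more or less one-to-one onto an instruction, so the vectorization decisions are all the programmer's.

#include <xmmintrin.h>  // SSE intrinsics

// Hand-vectorized y[i] += a * x[i]; assumes n is a multiple of 4.
void saxpy_sse(float* y, const float* x, float a, int n)
{
    __m128 va = _mm_set1_ps(a);               // broadcast a into all 4 lanes
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);      // load 4 floats from x
        __m128 vy = _mm_loadu_ps(y + i);      // load 4 floats from y
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_storeu_ps(y + i, vy);             // store 4 results
    }
}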

Programming

Why Scientists Are Still Using FORTRAN in 2014 634

New submitter InfoJunkie777 (1435969) writes "When you go to any place where 'cutting edge' scientific research is going on, strangely the computer language of choice is FORTRAN (FORmula TRANslation), the first commonly used computer language, invented in the 1950s. No language since has been able to match its speed. But three new contenders are explored here. Your thoughts?"

Comment Re:Terabyte flash drives are 10% overprovisioned (Score 1) 264

I wasn't actually thinking of DoS'ing, but I guess that's a valid concern. If a particular write pattern could crap a server, then you may have to worry about a user doing that to your server. I was just putting my "DV engineer" hat on, and trying to think of how I'd break an SSD in the minimum number of writes. It's the kind of analysis I'd hope the engineers who come up with lifetime specs use to make those specs bulletproof. For example, X years at YY MB/day even if you're writing like an a**hole. ;-)

I don't have a formalized attack against any particular drive, manufacturer or filesystem.

For a multi-user system, just a thought: Could you address it with quotas? If a given user can't write to more than X% of the filesystem, you can bound the "badness" of their behavior.

Comment Re:Oh goody (Score 1) 264

I'm not challenging the 30 day number, to be sure.

It's not entirely true that write amplification won't appear to speed up the rate at which an SSD erases sectors. SSDs generally have multiple independent flash banks, and each can process an erasure independently of the others. To maximize your erasure rate, you need a pattern of writes that triggers erasures across all banks as often as possible. Each bank splits its time among receiving data to write, committing write data to flash cells, and erasing flash cells. (My assumption is that a given bank can only be doing one of these operations at a time, which was certainly true for the flash devices I programmed.)

Consider a host sending a stream of writes as fast as it can. The writes will land on the drive as fast as the SSD controller can process them and direct them to flash cells. If there are any bottlenecks in that path, such as generating ECC codes and allocating physical blocks in the FTL, they will slow down the part of the duty cycle devoted to receiving and committing write data.

A "friendly" write stream would minimize the number of GC cycles the SSD performs, and thus the amount of write amplification that occurs. Thus, the total number of writes to the SSD media is at most slightly larger than what the PC sends, and the "receive-write" portion of the "receive-write-erase" cycle gets lengthened by whatever bottlenecks might be in the PC-controller-flash path. A "hostile" write stream triggers a larger number of GC cycles to migrate sectors. It seems reasonable to me that an on-board chip-to-chip block migration might be quite a bit faster than receiving data from the PC. For one thing, you don't necessarily need to recompute ECC. The block transfer itself could be handled by a dedicated DMA-like controller transferring between independent banks in parallel with other activity. So, generating more write data locally to the SSD could reduce the time spent in the receive-write portion of the receive-write-erase cycle, so you can spend a greater percentage of your time erasing as opposed to receiving or writing.

It seems a little counter-intuitive, but it's in some ways similar to getting a super-linear speedup on an SMP system, which is indeed possible with the right workload. How? By keeping more of the traffic local.
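To put toy numbers on that intuition (every timing below is invented purely to show the direction of the effect, not measured from any drive):

#include <cstdio>

// Made-up timings to illustrate the duty-cycle argument above: an erase
// block's worth of data takes longer to arrive over the host interface than
// to migrate chip-to-chip inside the drive.
int main()
{
    const double t_host_fill  = 4.0; // ms to receive+commit one block from the PC (assumed)
    const double t_local_move = 1.0; // ms to migrate one block internally (assumed)
    const double t_erase      = 2.0; // ms to erase one block (assumed)

    // Friendly stream: every erased block was filled with host data.
    double friendly = t_erase / (t_host_fill + t_erase);
    // Hostile stream: most erased blocks were filled by internal GC migration.
    double hostile  = t_erase / (t_local_move + t_erase);

    std::printf("fraction of time spent erasing: friendly %.0f%%, hostile %.0f%%\n",
                100 * friendly, 100 * hostile);
    return 0;
}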

The main effect of write amplification, though, is on the SSD wear specs themselves, as I said. They're stated in terms of days/months/years of writes at a particular average write rate. So really, when you multiply that out, they're specified in terms of total writes from the PC. There's at least one flash endurance experiment out there showing that drives often exceed their rated maximum total writes by very large factors. One reason for that, I suspect, is that the testers aren't sending challenging enough write patterns to the drive to trigger worst-case (in terms of bytes written, not wall-clock time) failure rates.
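Multiplying a spec like that out, with purely hypothetical numbers:

#include <cstdio>

// Hypothetical wear spec: X years at YY GB/day, multiplied out into total
// host writes. The numbers here are examples, not any vendor's rating.
int main()
{
    const double gb_per_day = 20.0;  // assumed rated daily write volume
    const double years      = 5.0;   // assumed rated lifetime
    const double total_tb   = gb_per_day * 365.0 * years / 1024.0;
    std::printf("rated total host writes: ~%.1f TB\n", total_tb);  // ~35.6 TB
    return 0;
}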

Comment Re:Terabyte flash drives are 10% overprovisioned (Score 1) 264

OK, I see that SandForce has on-the-fly compression tech, which I imagine would help more reasonable workloads. (Although, if your workload involves a lot of compressed video or images, that compression tech won't buy you anything.)

The point of my thought experiment, though, was how I would construct a maximally bad workload, and it's pretty easy to nullify compression with incompressible data.

Comment Re:Terabyte flash drives are 10% overprovisioned (Score 1) 264

Why exactly would a 4TB drive bother with compression? It won't improve any benchmarks. Someone like me trying to break the media would just write incompressible garbage anyway.

A large overprovisioning pool does help stay in the dynamic wear leveling paradigm longer. If the drive performs any amount of static wear leveling, though, where it can get the rest of the sectors into the fun and there's a limited aging ratio / aging difference between the most-erased sector and least-erased sector, then the size of the overprovisioning space doesn't matter too much to this attack—rather, the total media size matters.

The point of this attack is to limit the SSD's ability to separate disk sectors into hot and cold effectively, so you can force the maximum number of erasures through garbage collection cycles. As long as the overprovisioned space is small relative to the ratio of erase sector size (likely something huge, like 512K bytes) to exposed sector size (512 bytes or 4K), you can force a lot of erasures just for GC compaction. The drive will still spread these erasures out as much as possible. With the blended dynamic/static wear leveling model, you'll end up aging all of the sectors roughly evenly.

Once you get one sector to fail and get taken out of service, you're close to making as many more fail as you like, since the SSD did everything it could to age sectors evenly. The additional delta work you need to get it to fail completely isn't very big.

In case my GC point wasn't clear: Consider a simple SSD with a capacity of 32 "blocks", of which it advertises a capacity of 24 to the user, leaving 8 for overprovisioning. Suppose that filesystem blocks get grouped into erase blocks with a 4:1 ratio. Now suppose I've filled that disk up to capacity with a linear series of writes. You might represent it like so, with letters representing FS blocks with data, dashes representing empty clean FS blocks that the FTL can write to, and the groups of 4 separated by dots representing the erase blocks:

abcd.efgh.ijkl.mnop.qrst.uvwx.----.----

Now suppose I write to a, e, i, and m. I need to migrate these FS blocks to new locations to absorb the writes. Their old locations are freed, but remain dirty (cannot be rewritten). After these 4 writes, my flash might look like this (the '#' indicates a free-but-dirty FS block):

#bcd.#fgh.#jkl.#nop.qrst.uvwx.aeim.----

Before this SSD can execute my next rewrite, the flash needs to perform a GC cycle to ensure it always has a place to migrate data for GC. So, before processing another rewrite, it performs a GC cycle, migrating blocks until it has at least two clean erase sectors. For the sake of argument, let's assume it employs a simple round-robin scheme to spread the GC over the media evenly:

##cd.#fgh.#jkl.#nop.qrst.uvwx.aeim.b--- Migrate 'b'.
###d.#fgh.#jkl.#nop.qrst.uvwx.aeim.bc-- Migrate 'c'.
####.#fgh.#jkl.#nop.qrst.uvwx.aeim.bcd- Migrate 'd'.
----.#fgh.#jkl.#nop.qrst.uvwx.aeim.bcd- Erase first sector.
gh--.####.#jkl.#nop.qrst.uvwx.aeim.bcdf Migrate 'f', 'g', and 'h'.
gh--.----.#jkl.#nop.qrst.uvwx.aeim.bcdf Erase second sector.
ghjk.l---.####.#nop.qrst.uvwx.aeim.bcdf Migrate 'j', 'k', 'l'.
ghjk.l---.----.#nop.qrst.uvwx.aeim.bcdf Erase third sector.
ghjk.lnop.----.####.qrst.uvwx.aeim.bcdf Migrate 'n', 'o', 'p'.
ghjk.lnop.----.----.qrst.uvwx.aeim.bcdf Erase fourth sector.

Now I continue with my dickish attack, and rewrite q, u, a, and b:

ghjk.lnop.quab.----.#rst.#vwx.#eim.#cdf

Oh, hey, that'll trigger another GC cycle to free up a sector:

ghjk.lnop.quab.rst-.####.#vwx.#eim.#cdf Migrate 'r', 's', 't'
ghjk.lnop.quab.rst-.----.#vwx.#eim.#cdf Erase fifth sector
ghjk.lnop.quab.rstv.wx--.####.#eim.#cdf Migrate 'v', 'w', 'x'
ghjk.lnop.quab.rstv.wx--.----.#eim.#cdf Erase sixth sector
ghjk.lnop.quab.rstv.wxei.m---.####.#cdf Migrate 'e', 'i', 'm'
ghjk.lnop.quab.rstv.wxei.m---.----.#cdf Erase seventh sector
ghjk.lnop.quab.rstv.wxei.mcdf.----.#### Migrate 'c', 'd', 'f'
ghjk.lnop.quab.rstv.wxei.mcdf.----.---- Erase eighth sector

Keep up the pattern, writing to the first FS block in each erase block, and you'll maximize the number of erasures due to GC. Sure, all those erasures take time, but you bring the drive to failure in a minimal number of writes; the time to failure becomes a function of the total number of erasures the media can tolerate. If the media has any parallelism between flash modules (so that it can execute multiple erasures in parallel), you may even be able to get that working in your favor, assuming your explicit goal is to make the drive fail as quickly as possible.
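If you want to play with the walkthrough above, here's a small C++ simulation of this toy 8-erase-block drive using the same round-robin GC policy. It's my own sketch of the scheme I described, not anyone's real FTL; the GC trigger (keep two erase blocks' worth of clean pages) and the skip-if-nothing-to-reclaim tweak are simplifying assumptions. It prints how many media writes and erasures the attack pattern costs per host write.

#include <array>
#include <cstdio>

// Toy model: 8 erase blocks x 4 pages, 24 advertised FS blocks (a..x).
constexpr int kBlocks = 8, kPagesPerBlock = 4;
constexpr int kPages = kBlocks * kPagesPerBlock;  // 32 physical pages
constexpr int kLogical = 24;                      // 24 advertised FS blocks

enum State { Clean, Live, Dirty };
std::array<State, kPages> st{};    // physical page states
std::array<int, kPages> owner{};   // logical block stored in each physical page
std::array<int, kLogical> where{}; // logical -> physical mapping
long hostWrites = 0, migrations = 0, erases = 0;
int gcCursor = 0;                  // round-robin GC position

int cleanPages() { int n = 0; for (State s : st) n += (s == Clean); return n; }

// Allocate the first clean page, never inside the block being collected.
int alloc(int excludeBlock) {
    for (int p = 0; p < kPages; ++p)
        if (p / kPagesPerBlock != excludeBlock && st[p] == Clean) return p;
    return -1;  // can't happen given the GC trigger below
}

void place(int logical, int excludeBlock) {
    int p = alloc(excludeBlock);
    st[p] = Live; owner[p] = logical; where[logical] = p;
}

void gc() {
    // Reclaim round-robin until two erase blocks' worth of clean pages exist.
    while (cleanPages() < 2 * kPagesPerBlock) {
        int b = gcCursor; gcCursor = (gcCursor + 1) % kBlocks;
        bool hasDirty = false;
        for (int i = 0; i < kPagesPerBlock; ++i)
            hasDirty |= (st[b * kPagesPerBlock + i] == Dirty);
        if (!hasDirty) continue;  // nothing to reclaim in this erase block
        for (int i = 0; i < kPagesPerBlock; ++i) {
            int p = b * kPagesPerBlock + i;
            if (st[p] == Live) { st[p] = Dirty; place(owner[p], b); ++migrations; }
        }
        for (int i = 0; i < kPagesPerBlock; ++i) st[b * kPagesPerBlock + i] = Clean;
        ++erases;
    }
}

void hostWrite(int logical) {
    if (cleanPages() <= kPagesPerBlock) gc();  // keep room before absorbing the write
    ++hostWrites;
    st[where[logical]] = Dirty;                // old copy becomes free-but-dirty
    place(logical, -1);
}

int main() {
    for (int l = 0; l < kLogical; ++l) place(l, -1);  // fill the drive: a..x

    // The attack: each pass rewrites whatever FS block currently occupies the
    // first live page of every erase block.
    for (int pass = 0; pass < 1000; ++pass)
        for (int b = 0; b < kBlocks; ++b)
            for (int i = 0; i < kPagesPerBlock; ++i) {
                int p = b * kPagesPerBlock + i;
                if (st[p] == Live) { hostWrite(owner[p]); break; }
            }

    std::printf("host writes %ld, GC migrations %ld, erasures %ld\n",
                hostWrites, migrations, erases);
    std::printf("media writes per host write: ~%.2f\n",
                double(hostWrites + migrations) / double(hostWrites));
}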

Comment Re:Oh goody (Score 1) 264

The point of write amplification is solely to get more sectors into the erasure party. A single small write forces the SSD to migrate full sector A to empty sector B.

If you could send direct physical sector erasure and physical write commands to the media, you could just tell it to erase each sector and rewrite its first byte repeatedly until that sector failed, and then march to the next sector.

But, you don't have the opportunity to do that. Instead, you must interact at the filesystem level, and there's an FTL between the file system and the media. So, if your goal is to ruin the media quickly with those two layers between you, you want to minimize the FTL's ability to filter writes out and reduce the required number of erasures.

I'm aware that erase times for sectors are huge, and will slow your I/O rate accordingly. You're right that write amplification doesn't necessarily shorten the calendar days to failure, as a different write pattern may have triggered the same number of erasures in the same time frame with a larger number of writes. But writes aren't free either, so there's at least some, err, benefit to minimizing the number of writes required to ruin your SSD, if your goal is to ruin it as quickly as possible.

Comment Re:Oh goody (Score 1) 264

If you want to write once and forget it, you can fill the thing right to the top of the advertised capacity, and you don't have to worry about failure due to wear. Instead, you have to worry about failure due to electrons migrating off the bits. So, you need to refresh all the bits every so often, much like DRAM, only with a much slower refresh interval. Even if you refresh all the bits on the drive once a day, if you do so in a nice, orderly manner, I'd imagine you won't reach the rewrite limit for the drive in your lifetime.

Still, I'm not sure I'd choose an SSD for that.

Comment Re:Oh goody (Score 1) 264

Yes, I know what Stacker, DoubleSpace and DriveSpace were as technical implementations. My point is mainly that modern Windows still offers a mechanism to compress files on a live filesystem; it just lets you select at folder granularity rather than whole-disk granularity. I didn't think the fact that they're not implemented at the same layer of the stack was relevant here.

Stacker et al worked at the sector level so that you didn't need to modify DOS or the umpteen programs that made use of sector-level access to the filesystem, and insisted on that level of access in order to function. Copy protection schemes and disk editors both relied on it. (Defraggers too, although defragging a compressed volume is... crazy.) Databases may have as well; I'm not certain. Windows NT forces programs through a narrower, more controlled window of APIs to access the file system.

I actually had a Stacker'd hard drive back in the day and had read up on all the tech, so it's not like I'm unfamiliar with it.

Comment Re:Oh goody (Score 1) 264

OK, I just checked on my WinXP box: you can right-click a folder, go to "Properties", click "Advanced", and there's an option to "Compress to save disk space." I'm too lazy to go get my Win7 laptop to see if that's still there.

So, some version of TroubleSpace...err...DoubleSpace...err...DriveSpace survived beyond Win98.

Comment Re:Oh goody (Score 3, Informative) 264

If you know something about the drive's sector migration policies, in theory you could construct a worst-case amplification attack against a given drive. Leverage that against the drive's wear leveling policies. But, that seems rather unlikely.

Flash pages retain their data until they're erased. You can write at the byte level, but you must erase at the full page level. You can't rewrite a byte until you erase the page that contains it. That's the heart of the attack: Rewriting sectors with new data. You can't rewrite a sector in-place. You mark the old location as "dirty but free", and write the new data to a new location. The SSD can't reclaim the dirty-but-free sectors for writing until they're erased.

Thus, the basic idea goes something like this: Fill the disk to 99.9% full. Then, selectively rewrite individual sectors, forcing the sector to migrate to a new flash page. Wash, rinse, repeat until the drive fails.

If the drive only performs dynamic wear leveling, all subsequent rewrites will erase and reuse only among the free space. (Note: This free space includes all of the space the drive reserves to itself for dynamic wear leveling purposes.) Now all you need to do is reach the erase/rewrite limit among the available dynamic wear leveling pool, which is significantly smaller than the full drive capacity. You can achieve this by rewriting a small subset of sectors until the disk falls over.

Modern drives perform a blend of dynamic and static wear leveling. Dynamic wear leveling only erases/rewrites among the "free" space. Static wear leveling gets otherwise untouched sectors into the fray by wear leveling over all sectors. This blended approach defers static wear leveling until it becomes absolutely necessary. The flash translation layer (FTL) detects when the wear difference between sectors gets too imbalanced, and migrates static sectors into the worn regions and wear-levels over the previously "static" sectors.

A successful attack would take this into account and attempt to keep track of which sectors would be marked "static" vs. "dynamic". It would also predict how the static sectors were grouped together into pages, so it could cherry-pick and inflict the maximum damage: all it needs to do is write to a single sector in each static flash page (creating a bunch of unallocated "dirty-but-free" holes), continuing until the SSD is forced into a garbage collection cycle. That GC cycle then has to touch all the static pages (or at least a significant fraction) to compact the holes away and make space available for future writes.

If you can keep that up, you can magnify your writes by the ratio between the page size and the sector size. If you have 512-byte sectors and 512K-byte pages, the amplification factor is 1024.

But, as I suggested above, to achieve this directly, you need to have some idea of how the SSD marks things static vs. dynamic. Without such knowledge, you have to approximate.

I imagine if you really wanted to kill an SSD without any knowledge of its algorithms, you could do something simple like rewrite every allocated sector in an arbitrary order, shuffling the order each time. SSD algorithms assume a distribution of "hotness" (i.e. some sectors are "hot" and will be rewritten regularly, while most are "cold" and will be rewritten rarely if ever), so rewriting all sectors in a random order will cause rather persistent fragmentation, recurring GC cycles, and pretty noticeable amplification.

You wouldn't get to the 40 day mark, but if you started with a mostly full SSD, you might get to a few months.

That's my back-of-the-napkin, "I wrote an FTL once and had to reason through all this" estimate.
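For completeness, a minimal sketch of that shuffled-rewrite loop. The device path, block size, and block count are placeholders; point it only at a file or a disposable device you genuinely intend to wear out.

#include <algorithm>
#include <fcntl.h>
#include <random>
#include <unistd.h>
#include <vector>

int main()
{
    const char* path = "/dev/sdX";      // placeholder target, not a real device name
    const size_t kBlock  = 4096;        // assumed write granularity
    const size_t kBlocks = 1000000;     // assumed device size / kBlock

    int fd = open(path, O_WRONLY);
    if (fd < 0) return 1;

    std::vector<size_t> order(kBlocks);
    for (size_t i = 0; i < kBlocks; ++i) order[i] = i;

    std::vector<char> junk(kBlock, 0);
    std::mt19937_64 rng(12345);

    for (;;) {                          // each pass rewrites everything in a new order
        std::shuffle(order.begin(), order.end(), rng);
        for (size_t b : order) {
            // Incompressible-ish junk so on-the-fly compression doesn't help the drive.
            for (char& c : junk) c = static_cast<char>(rng());
            if (pwrite(fd, junk.data(), kBlock, static_cast<off_t>(b * kBlock)) < 0)
                return 1;
        }
        fsync(fd);                      // push it past the page cache each pass
    }
}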

Comment Re:It's great, but we try not to use it. (Score 1) 435

I actually use C++ for embedded programming because, when used with care, it can do a better job than C for a number of things. I use template metaprogramming to compute various things at compile time, such as register initialization values and whatnot. Sure, I can do the same with #define and a boatload of macros, but that has its own issues. Not only are macros messy in their own way, they don't provide a good way to sanity-check your settings. With templates and types done right, I can get the compiler to sanity-check my settings at compile time. I don't know how many times I've chased down a bug due to swapped macro parameters that could have been caught at compile time with some type checking / trait checking.
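A tiny example of the flavor of thing I mean; this is my own illustration, not code from my library, and the UART field names and layout are invented:

#include <cstdint>

// A register field carries its offset and width in its type, so a value that
// doesn't fit fails at compile time and the final register value is a constant.
template <unsigned Offset, unsigned Width>
struct Field {
    template <std::uint32_t V>
    static constexpr std::uint32_t make()
    {
        static_assert(V < (1u << Width), "value does not fit in field");
        return V << Offset;
    }
};

// Hypothetical UART register layout, purely for illustration.
using Divisor = Field<0, 12>;
using Parity  = Field<12, 2>;
using Enable  = Field<14, 1>;

constexpr std::uint32_t kUartCfg =
    Divisor::make<1042>() | Parity::make<1>() | Enable::make<1>();
// Divisor::make<5000>() would be a compile-time error: 5000 needs 13 bits.

static_assert(kUartCfg == (1042u | (1u << 12) | (1u << 14)), "sanity check");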

I've written an entire C++ based support library just for this purpose. One of its goals is extreme compactness and cycle efficiency, since the code often needs to run in RTL simulation. Software RTL simulation of a large SoC runs in the 10s to 1000s of cycles per second, so cycle efficiency is at an extreme premium.

What my library largely replaces is other C and assembly code that (often hamfistedly) computes everything at run time, and so my code can handily beat that.

I haven't quite hit the nirvana of generating an entire MMU page tree from a compact memory map description using templates (I have a perl script for that), but it sure beats spending 100,000s of cycles or more computing it at run time when that translates to hours of sim time. (Fun fact: Some rather popular modern processors run really slow until you turn the MMU on, because they can't cache any data until you do.)

I have, however, written dynamic code generators that use templates and function overloading to resolve as much of the opcode encoding as possible at compile time, so that the run-time portion usually is just a "store constant", or maybe a quick field insert into a constant followed by a store. Those can pump opcodes to memory as fast as an opcode per cycle (and in some special cases, faster), which is pretty darn good. Again, it's all typechecked as much as possible at compile time, to minimize or eliminate the possibility that I generate invalid instructions.
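Here's a sketch of the compile-time-encoding idea for a made-up 32-bit ISA; the field layout and mnemonic are invented, but it shows how the encoder folds to a constant and the run-time work collapses to a store:

#include <cstdint>

// Opcode and register fields are template parameters, so they're range-checked
// and folded into a single 32-bit constant at compile time.
template <std::uint32_t Op, unsigned Rd, unsigned Rs, std::int32_t Imm>
constexpr std::uint32_t encode()
{
    static_assert(Rd < 32 && Rs < 32, "bad register number");
    static_assert(Imm >= -32768 && Imm < 32768, "immediate out of range");
    return (Op << 26) | (Rd << 21) | (Rs << 16) |
           (static_cast<std::uint32_t>(Imm) & 0xFFFFu);
}

// Run-time cost: one store per emitted opcode; the encoding already happened.
inline std::uint32_t* emit_addi_r1_r2_42(std::uint32_t* p)
{
    *p++ = encode<0x08, 1, 2, 42>();   // fictional "ADDI r1, r2, #42"
    return p;
}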

Comment Re:STL is painful to use (Score 1) 435

Suppose you want to determine if a collection c contains an element e. In any other language, you'd write something like c.contains(e).

I have good news for you! Sure, you still need to provide begin() and end() to specify a range, but it's a step forward. And, with the new non-member begin() and end() you can even use it on plain arrays.
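The pieces in question look something like this; this is my own sketch of the idiom using C++11's non-member std::begin/std::end:

#include <algorithm>
#include <iterator>
#include <vector>

template <typename C, typename T>
bool contains(const C& c, const T& e)
{
    // Still begin()/end() plumbing, but written once and reusable everywhere,
    // including plain arrays, thanks to the non-member std::begin/std::end.
    return std::find(std::begin(c), std::end(c), e) != std::end(c);
}

// Usage:
//   std::vector<int> v{1, 2, 3};
//   int a[] = {4, 5, 6};
//   contains(v, 2);   // true
//   contains(a, 7);   // false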

Yeah, you still have to put all the pieces together yourself, but the pieces are a bit more uniform now and there are usually fewer of them to worry about. (Especially now with auto.)
