Can someone please explain... how you spend 3-4B
Is that the cost of the Russian Taxi service plus the SpaceX vacuum-friendly FedEx truck deliveries?
Insulator breakdowns on circuit boards happen less often these days, but they are still prevalent in electrolytic caps and anything with windings (transformers, inductors, DC motors, etc.). It can take 20-50 years to happen, depending on conditions, and the failure mode depends on conditions too.
Generally speaking, any component with an insulator which is getting beat up is subject to the issue.
Circuit boards got a lot better as vendors switched to solid-state caps. Electrolytics tend to dry out, and little arc-throughs punch holes in the insulator over time (running them at less than half their rated voltage goes a long way toward lengthening their lives, which is why you usually see voltage ratings much higher than the voltages actually run through them).
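As a rough illustration of that derating rule of thumb, here's a sketch that picks the smallest standard electrolytic voltage rating giving at least 2:1 headroom over the rail. The rating table and the 50% derating factor are illustrative assumptions, not component specs:

```python
# Hypothetical sketch: choose a cap voltage rating so the rail sits at or
# below half the rating (the ~2:1 derating rule of thumb mentioned above).
STANDARD_RATINGS_V = [6.3, 10, 16, 25, 35, 50, 63, 100]  # common standard values

def pick_cap_rating(rail_voltage, derate=0.5):
    """Return the smallest standard rating such that rail <= derate * rating."""
    for rating in STANDARD_RATINGS_V:
        if rail_voltage <= rating * derate:
            return rating
    raise ValueError("rail voltage too high for this rating table")

# Example: a 12 V rail gets a 25 V cap under 50% derating.
```

Running a 12 V rail on a 16 V cap would violate the rule (12 > 8), which is why you so often see 25 V caps on 12 V rails.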
The insulating coatings on wires used for windings have gotten better. Typically, shorts develop over time and change the value of the inductance (or the voltage ratio for a transformer) and other parameters until the component is so far out of spec that it stops doing its job properly. DC motors will get weaker, etc.
Just so happens I have an Intel 750 in the pile, here's the issue that the linux NVMe code had to work around:
nvme3: mem 0xc7310000-0xc7313fff irq 40 at device 0.0 on pci4
nvme3: mapped 32 MSIX IRQs
nvme3: NVME Version 1.0 maxqe=4096 caps=0000002028010fff
nvme3: Model INTEL_SSDPEDMW400G4 BaseSerial CVCQ535100LC400AGN nscount=1
nvme3: Request 64/32 queues, Returns 31/31 queues, rw-sep map (31, 31)
nvme3: Interrupt Coalesce: 100uS / 4 qentries
nvme3: Disk nvme3 ns=1 blksize=512 lbacnt=781422768 cap=372GB serno=CVCQ535100LC400AGN-1
If I run a randread test on uncompressed data using block sizes from 512 to 131072 bytes, you can see the glitch that occurs at 65536 bytes. I will use a deep queue (128 threads, around QD4 per HW queue but considered to be QD128 globally), so this is the absolute limit of the device's performance. Look at what happens when the block size transitions from 32768 to 65536 bytes. That's the firmware screwup that the Linux folks worked around. No other NVMe vendor has this issue:
487912/s avg=262.34uS bw=249.81MB/s lo=60.69uS, hi=2693.38uS stddev=101.09uS
488698/s avg=261.92uS bw=250.14MB/s lo=44.12uS, hi=2693.58uS stddev=101.79uS
489023/s avg=261.75uS bw=250.37MB/s lo=54.44uS, hi=2629.42uS stddev=98.31uS
485963/s avg=263.39uS bw=497.62MB/s lo=45.28uS, hi=2593.95uS stddev=91.90uS
486353/s avg=263.18uS bw=497.98MB/s lo=60.05uS, hi=1268.07uS stddev=89.05uS
486312/s avg=263.21uS bw=497.97MB/s lo=62.99uS, hi=1131.01uS stddev=89.04uS
459915/s avg=278.31uS bw=941.89MB/s lo=61.83uS, hi=1244.34uS stddev=95.07uS
459681/s avg=278.45uS bw=941.33MB/s lo=68.47uS, hi=2890.23uS stddev=99.12uS
458907/s avg=278.92uS bw=939.81MB/s lo=67.12uS, hi=2838.20uS stddev=110.08uS
442539/s avg=289.24uS bw=1812.62MB/s lo=75.33uS, hi=2985.67uS stddev=154.67uS
444166/s avg=288.18uS bw=1819.13MB/s lo=76.80uS, hi=2618.38uS stddev=145.94uS
443966/s avg=288.31uS bw=1818.44MB/s lo=73.81uS, hi=2854.27uS stddev=146.88uS
248658/s avg=514.76uS bw=2036.99MB/s lo=81.98uS, hi=3809.30uS stddev=321.11uS
249693/s avg=512.63uS bw=2045.32MB/s lo=84.38uS, hi=3278.75uS stddev=317.38uS
247367/s avg=517.45uS bw=2026.38MB/s lo=86.12uS, hi=3032.98uS stddev=323.87uS
124276/s avg=1029.97uS bw=2036.11MB/s lo=115.63uS, hi=3886.27uS stddev=558.13uS
124526/s avg=1027.90uS bw=2040.07MB/s lo=118.72uS, hi=3894.09uS stddev=574.04uS
125651/s avg=1018.69uS bw=2058.63MB/s lo=109.03uS, hi=3843.91uS stddev=550.71uS
62540/s avg=2046.68uS bw=2049.30MB/s lo=137.03uS, hi=6263.58uS stddev=1148.11uS
63146/s avg=2027.05uS bw=2068.84MB/s lo=157.29uS, hi=5875.07uS stddev=1134.63uS
62563/s avg=2045.95uS bw=2050.01MB/s lo=147.76uS, hi=6244.51uS stddev=1285.00uS
4431/s avg=28887.12uS bw=290.39MB/s lo=195.41uS, hi=59137.97uS stddev=-34838.70uS
4598/s avg=27835.93uS bw=301.31MB/s lo=214.55uS, hi=59121.64uS stddev=-33681.49uS
4541/s avg=28186.97uS bw=297.59MB/s lo=257.66uS, hi=61115.02uS stddev=-34015.71uS
1679/s avg=76235.18uS bw=220.07MB/s lo=66136.55uS, hi=94294.59uS stddev=-70453.03uS
1696/s avg=75465.57uS bw=222.28MB/s lo=65954.03uS, hi=96303.18uS stddev=-70093.39uS
1687/s avg=75872.61uS bw=221.11MB/s lo=64842.69uS, hi=92678.51uS stddev=-70288.51uS
See what happened? Everything was going dandy, the device maxes out at around 2 GBytes/sec with a block size of 8192 bytes, which is *very* good, and it stays there through 32768 bytes. But then we use a block size of 65536 bytes and BOOM, the device implodes... bandwidth drops to a mere 300 MBytes/sec. And at 131072 it drops to 220 MBytes/sec.
That's the story behind that quirk. Intel messed up (as in SERIOUSLY messed up) on this and a few other older models. And as I said, no other NVMe vendor has this problem. What's worse is that Intel is still selling this crap without fixing the problem. It's like fire-and-forget... they come out with a device and then forget to support it or fix bugs.
Now personally speaking, I would never put a quirk like that in MY NVMe driver to fix a vendor screwup of that proportion. If Linux had not added that quirk, Intel would probably have been shamed into fixing the problem in those models. Who in their right mind screws up DMA at block sizes as small as 65536 and 131072 bytes? Apparently Intel.
I'll bet a lot of you folks didn't know this bug existed. If you look at all the media reviews of the 750, nowhere will you see them using a block size of 65536 bytes or larger (or if they are, they probably don't realize that the NVMe driver in the OS has severe hacks in it to make it work).
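For reference, here is a rough Python sketch of the same kind of block-size sweep. It is only an approximation of the tool used for the numbers above (which ran 128 threads against the raw device); reads through a regular file will hit the page cache, so for real measurements you would point it at a raw device node and use many threads:

```python
import os
import random
import time

def randread_sweep(path, block_sizes, reads_per_size=256):
    """Random aligned reads at each block size; returns {block_size: MB/s}."""
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for bs in block_sizes:
            nblocks = size // bs
            start = time.perf_counter()
            for _ in range(reads_per_size):
                offset = random.randrange(nblocks) * bs  # block-aligned offset
                os.pread(fd, bs, offset)
            elapsed = time.perf_counter() - start
            results[bs] = reads_per_size * bs / elapsed / 1e6
    finally:
        os.close(fd)
    return results

# On an affected 750 you would expect the 65536 and 131072 results to crater
# relative to 32768, exactly as in the numbers above.
```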
Here's a 4K QD1 test. The 750 does have some decent qualities. Around 12K IOPS at QD1 (note that all NVMe vendors can typically do 10K-15K IOPS at QD1). More importantly, a low standard deviation.
11971/s avg= 83.53uS bw=49.03 MB/s lo=37.81uS, hi=229.67uS stddev=8.85uS
11932/s avg= 83.81uS bw=48.87 MB/s lo=62.02uS, hi=435.44uS stddev=9.28uS
11977/s avg= 83.49uS bw=49.06 MB/s lo=39.86uS, hi=446.53uS stddev=9.41uS
11935/s avg= 83.79uS bw=48.89 MB/s lo=61.53uS, hi=586.35uS stddev=10.80uS
11922/s avg= 83.88uS bw=48.83 MB/s lo=62.17uS, hi=363.81uS stddev=9.12uS
11934/s avg= 83.79uS bw=48.88 MB/s lo=61.95uS, hi=1422.06uS stddev=15.20uS
That's very good.
This Intel 750 has other problems. The second biggest one is that the more queues one configures, the lower the performance at EVERY test level. Again, no other NVMe vendor does anything that stupid.
The consumer 600P has its own terrible issues, but block size is not one of them. The 600P does fine at 65536 and 131072 bytes.
Now you know the real history here.
Certainly faster writing. Read speed is about the same for the EVO (on real blocks of uncompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').
XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ethos: (1) More durability, up to 33,000 rewrites apparently (many people have had to calculate it, Intel refuses to say outright what it is because it is so much lower than what they originally said it would be). (2) Lower latency.
So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have one around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.
This is called the QD1 latency. It does not limit the full bandwidth of the device, since you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O, so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) and not QD1.
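The relationship between queue depth, per-command latency, and transaction rate is just Little's law (IOPS ≈ queue depth / latency). A quick sanity check, using the 87uS figure above and an assumed ~90uS at QD4:

```python
# Little's law: concurrency = rate * latency, so rate = QD / latency.
def iops(queue_depth, latency_s):
    return queue_depth / latency_s

qd1 = iops(1, 87e-6)   # ~11.5K IOPS at QD1 with an 87 uS read latency
qd4 = iops(4, 90e-6)   # ~44.4K IOPS at QD4 if latency only rises to ~90 uS
```

Those two numbers line up with the measured QD1 and QD4 runs on the 960 EVO.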
Here's the 960 EVO and some randread tests on it at QD1 and QD4.
nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1
(/dev/nvme1s1b is a partition filled with uncompressible data)
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
11458/s avg= 87.28uS bw=46.92 MB/s lo=68.50uS, hi=154.20uS stddev=7.01uS
11469/s avg= 87.19uS bw=46.98 MB/s lo=69.97uS, hi=151.97uS stddev=6.95uS
11477/s avg= 87.13uS bw=47.01 MB/s lo=69.31uS, hi=158.03uS stddev=7.03uS
And here is QD4 (really QD1 x 4 threads on 4 HW queues):
44084/s avg= 90.74uS bw=180.57MB/s lo=65.17uS, hi=237.92uS stddev=16.94uS
44205/s avg= 90.49uS bw=181.05MB/s lo=65.38uS, hi=222.21uS stddev=16.56uS
44202/s avg= 90.49uS bw=181.04MB/s lo=65.19uS, hi=221.48uS stddev=16.72uS
44131/s avg= 90.64uS bw=180.75MB/s lo=64.44uS, hi=245.91uS stddev=16.81uS
44210/s avg= 90.48uS bw=181.08MB/s lo=63.73uS, hi=232.05uS stddev=16.74uS
So, as you can see, at QD1 the 960 EVO is doing around 11.4K transactions/sec and at QD4 it is doing around 44K transactions/sec. If I use a larger block size you can see the bandwidth lift off:
19997/s avg=200.03uS bw=655.26MB/s lo=125.02uS, hi=503.26uS stddev=55.24uS
20090/s avg=199.10uS bw=658.23MB/s lo=124.62uS, hi=522.04uS stddev=54.83uS
20034/s avg=199.66uS bw=656.47MB/s lo=123.63uS, hi=495.74uS stddev=55.59uS
20008/s avg=199.92uS bw=655.62MB/s lo=123.50uS, hi=500.24uS stddev=55.92uS
20034/s avg=199.66uS bw=656.47MB/s lo=125.17uS, hi=488.30uS stddev=55.02uS
20000/s avg=200.00uS bw=655.35MB/s lo=123.19uS, hi=504.18uS stddev=55.98uS
And if I use a deeper queue I can max out the bandwidth. On this particular device, random blocks of uncompressible data at 32KB top out at around 1 GByte/sec. I'll also show 64KB and 128KB:
32989/s avg=1940.03uS bw=1080.98MB/s lo=1396.85uS, hi=3343.49uS stddev=291.76uS
32928/s avg=1943.62uS bw=1078.84MB/s lo=1386.21uS, hi=3462.96uS stddev=297.14uS
33012/s avg=1938.67uS bw=1081.73MB/s lo=1371.41uS, hi=3676.83uS stddev=290.64uS
33217/s avg=1926.70uS bw=1088.44MB/s lo=1385.18uS, hi=3344.11uS stddev=282.63uS
14739/s avg=4342.19uS bw=965.93MB/s lo=3189.96uS, hi=6937.79uS stddev=466.51uS
14813/s avg=4320.60uS bw=970.67MB/s lo=3273.82uS, hi=6327.81uS stddev=442.92uS
14991/s avg=4269.15uS bw=982.43MB/s lo=3205.54uS, hi=6355.74uS stddev=432.94uS
8052/s avg=7948.27uS bw=1055.38MB/s lo=6575.48uS, hi=9744.12uS stddev=496.41uS
8150/s avg=7853.00uS bw=1068.01MB/s lo=6540.51uS, hi=9496.64uS stddev=465.37uS
7986/s avg=8013.88uS bw=1046.72MB/s lo=6446.20uS, hi=9815.01uS stddev=518.95uS
Now the thing to note here is that with deeper queues the latency also goes up. At QD4 the latency is around 200uS, for example.
Where Optane (aka XPoint) 'wins' is on latency. I don't have an Optane device to test yet, but Intel is claiming an average latency of 10uS at QD1 over the NVMe interface (over a direct DDR interface it will of course be much faster). That's the 'win'. But it's completely irrelevant for the consumer case, because the consumer case is multi-block transfers and filesystems always do read-ahead (i.e. at least QD4). A disk cache does not need 10uS latency to be effective.
And who the hell do you think I am, mister Anonymous Coward?
So, as I thought, you don't understand either that commit or the commit later on that simplified it (159b67d7).
It's not a stripe-size limitation per se, it's just a limitation on the maximum physical transfer size per I/O request, which for 99.9% of the NVMe devices out in the wild will be >= 131072 bytes and completely irrelevant for all filesystem I/O and even most softRAID I/O.
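In driver terms, a max-transfer-size limit just means oversized requests get split into multiple physical I/Os. A hypothetical sketch of that splitting logic (the 32768-byte clamp is an illustrative assumption, not the exact value the Linux quirk uses):

```python
def split_request(offset, length, max_xfer=32768):
    """Split one logical I/O into (offset, length) chunks of <= max_xfer bytes."""
    chunks = []
    while length > 0:
        n = min(length, max_xfer)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks

# Under a 32768-byte clamp, a 131072-byte read becomes four physical transfers.
```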
More to the point, that particular commit does not apply to the 600P at all. It applies to several older Intel datacenter SSDs as well as the 750 series and it exists because Intel really screwed up the firmware on those devices and put crazy stupid low limitations on physical transfer size. Then they carefully designed tests that didn't hit those limitations to sell the devices.
The 750, for example, loses a huge amount of performance with a block size >= 65536 bytes. Intel may not have advertised the mistake, but that is a limitation that doesn't exist in the 600P, nor does it exist on ANY OTHER NVMe SSD IN EXISTENCE. Only a complete idiot creates an NVMe device which can't handle block transfers of 65536 or 131072 bytes without losing massive performance. Intel = 65536 bytes.
This was a well-known bug in these particular models.
In any case, even for these models, this particular quirk has no effect on block I/O tests at block sizes below 65536 bytes. And, as I mentioned already, NO OTHER NVMe VENDOR has such absurdly low limits or such massively disconnected performance metrics when you exceed them. And even Intel fixed the issue on the 600P.
This just points to the idiocy inside Intel. And it shows your stupidity as well, believing that a little quirk like this somehow affects the entire NVMe space (or even the entire Intel NVMe space), which it doesn't. These sorts of quirks exist for all manner of hardware, not just NVMe, to work around poor, buggy implementations.
And, of course, any Linux or BSD operating system will use all available memory for cache data from storage anyway. I guess Windows needs a little more help to do that.
This certainly shows up in, for example, Chrome startup times. It takes around 4 seconds from a hard drive, uncached, 1 second from a SSD, 1 second from an NVMe drive, and presumably 1 second from any other form of storage, because Chrome itself needs a bit of CPU time to initialize itself, not to mention the time it takes to load a tab (minimum 0.5 seconds).
So honestly, once one transitions from the HDD to a SATA SSD, where the difference is noticeable, any further transitions (SATA SSD -> NAND NVMe SSD -> XPoint NVMe SSD -> XPoint DDR) are not likely to be noticeable, even without a RAM cache.
I think Intel's ENTIRE marketing effort revolves around Windows' slow startup times. Or more to the point, Windows tends to seek the storage device a lot while starting up, which is *very* noticeable if you have a hard drive, but mostly irrelevant if you have any sort of SSD.
Since one can accomplish the same thing simply by purchasing a small SSD, I just don't see them being able to make the case that it is 'easier' as a disk-caching substitute versus someone coming to the realization that their time and data are valuable enough to actually spend a bit more money buying some native SSD storage in the first place.
The advent of the cloud is also making local mass storage less and less relevant. Here I'm not talking about those of us who insist on having our own local archives (mine is getting close to 4TB now, with another 4TB in each of two backup locations, so... that's 12TB of storage for me). I'm talking about 'normal' people, who are using cloud storage more and more often. They won't need Intel's ridiculous 'solution' either (not even mentioning the fact that a normal NAND NVMe SSD caching a HDD is a better fit for the problem they are marketing against than their Optane junk).
Motherboard vendors are just now, finally, starting to put M.2 connectors on the motherboard. Blame Intel for the slow rate of adoption. Intel came out with three different formats, all basically incompatible with each other, and created mass confusion.
But now, finally, mobo vendors are settling on a single PCIe-only M.2 format. Thank god. They are finally starting to put one or more M.2 slots and finally starting to put on U.2 connectors for larger NVMe SSDs. Having fewer SATA ports on the mobo is no longer a marketing issue. I've seen many more mobos recently with just 2-4 SATA ports.
It would depend on the relative latency and other characteristics. XPoint is definitely not it, because XPoint can't handle unlimited writing. But suppose in some future we do have a non-volatile storage mechanism that has effectively unlimited durability, like RAM, but which is significantly more dense, like XPoint.
In that situation I can see systems supporting a chunk of that sort of storage as if it were memory.
Latency matters greatly here for several reasons. First, I don't think XPoint is quite fast enough, at least not yet. The problem with any sort of high-latency storage being treated like memory at the HARDWARE level is that the latency creates massive stalls on the CPU. DRAM today causes huge many-clock stalls on a CPU. These stalls are transparent to the operating system, so the operating system cannot just switch to another thread or do other work during the stall. The stall effectively reduces the performance of the system. This is the #1 problem with treating any sort of storage technology as if it were memory.
The #2 problem is that memory is far easier to corrupt than storage (which requires a block transaction to write). I would never want to map my filesystem's entire block device directly into memory, for example. It's just too dangerous.
The solution that exists today is, of course, swap space. You simply configure your swap on an SSD. The latencies are obviously much higher than they would be for a HW XPoint style solution, around 50-100uS to take a page-fault requiring I/O from a NVMe SSD, for example.
The difference though is that the operating system knows that it is taking the page-fault and can switch to another runnable thread in the mean time, so the CPU is not stalled for 50-100uS. It's doing other work. Given enough pending work, the practical overhead of a page-fault in terms of lost CPU time is only around 2-4uS.
In an XPoint-like all-hardware solution, the CPU will stall on the miss. If the XPoint 'pagein' time is 1-2uS, then the all-hardware solution winds up only being about twice as good as the swap-space solution in terms of CPU cycles. Of course, the all-hardware solution will be far better in terms of latency (1-2uS versus 50-100uS).
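Reducing the comparison above to arithmetic (midpoints of the quoted ranges assumed):

```python
# Software page-fault to NVMe swap: the request waits ~50-100 uS, but the CPU
# only burns ~2-4 uS because the OS runs other threads during the wait.
SW_FAULT_CPU_US = 3.0        # midpoint of the 2-4 uS figure above
SW_FAULT_LATENCY_US = 75.0   # midpoint of the 50-100 uS figure above

# Hardware XPoint miss: the CPU stalls for the full access time.
HW_MISS_CPU_US = 1.5         # midpoint of the 1-2 uS figure above

cpu_advantage = SW_FAULT_CPU_US / HW_MISS_CPU_US          # ~2x in CPU cycles
latency_advantage = SW_FAULT_LATENCY_US / HW_MISS_CPU_US  # ~50x in raw latency
```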
But to really work in this format the non-volatile memory needs to have a nearly unlimited write capability. XPoint does not. XPoint only has around 33,000 write cycles of durability per cell (and that's being generous). It needs to be half a million at a minimum and at least 10 million to *really* be useful.
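To put the 33,000-cycle figure in perspective, a back-of-envelope endurance estimate. The 16 GB module size is a hypothetical example, and perfect wear leveling is assumed:

```python
def lifetime_writes_tb(capacity_gb, write_cycles):
    """Total TB writable before wear-out, assuming ideal wear leveling."""
    return capacity_gb * write_cycles / 1024

# 16 GB * 33,000 cycles ~= 515 TB total -- plenty for a read-mostly disk
# cache, but nowhere near DRAM-like endurance if used as main memory.
```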
Maybe you should point me at the commitid you are referring to, then I can address your comment more directly. I can tell you straight out, even without seeing it, that you are probably misinterpreting it.
Intel devices have quirks, but I think you are mixing apples and oranges here. All modern filesystems have used larger alignments for ages. The only real issue was that the original *DOS* partition table offset the base of the slice the main filesystem was put on by a weird multiple of 512 bytes which was not even 4K-aligned.
This has not been an issue for years. It was fixed long ago on DOS systems and does not exist at all on EFI systems. Regardless of the operating system.
At the same time, all SSDs past the second generation became sophisticated enough that they really stopped caring about alignment for most practical use cases.
Where Intel does mess up depends on the device. In the 600P's case, the firmware is poorly designed in many respects. In other cases, such as with the 750, performance implodes with large block sizes (64KB or higher). This just makes the device less worthy, because frankly NO OTHER SSD VENDOR has these sorts of idiotic problems.
All of that said, insofar as operating systems go, these storage-level devices have no real visibility into, understanding of, or optimizations for one particular filesystem versus another. So for all practical situations, there is NO raw performance difference between Windows, MacOS, Linux, or any of the BSDs for these storage-level devices. They are completely OS-agnostic and have always been.
Right. They are trying to market it as something cool and new, which would be great except for the fact that it isn't cool OR new. A person can already use ANY storage device to accelerate any OTHER storage device. There are dozens of 'drive accelerators' on the market and have been for years. So if a person really wanted to, they could trivially use a small NAND flash based NVMe SSD to do the same thing, and get better results because they'll have a lot more flash. A person could even use a normal SATA SSD for the same purpose.
What Intel is not telling people is that NOBODY WILL NOTICE the lower latency of their XPoint product. At (I am assuming for this product) 10uS the Intel XPoint NVMe is roughly 1/6 the latency of a Samsung NVMe device. Nobody is going to notice the difference between 10uS and 60uS. Even most *server* workloads wouldn't care. But I guarantee that people WILL notice the fact that the Intel device is caching much less data than they could be caching for the same money with a NAND-based NVMe SSD or even just a SATA SSD.
In other words, Intel's product is worthless.
I think you are a little confused by Intel marketing speak. Actually, you are a lot confused.
Smoke. Total and complete nonsense. Why would I want to buy their over-priced Optane junk versus a Samsung 951* or 960* NVMe drive? Far more storage for around $115-$130, 1.4 GBytes/sec consistent read performance, decent write performance, and decent durability.
P.S. the Intel 600P NVMe drive is also horrid, don't buy it.
The DC-X was successful except for the idiot who didn't connect the landing leg.
Human error is not the same as a fundamental flaw in the program.
If the DC-X had six landing legs instead of 4, one could have failed (like it did) without the thing tipping over and exploding. It could have also landed on rough terrain, both on Earth, and off (moon landing, anyone?) without needing a relatively flat place to land.
Making it small didn't really serve any purpose, other than to save some money up front, and McDonnell Douglas, at the time, was pretty much printing money (which is what made it such an attractive target for a takeover).
Scale models are useful when they fail, though.
They are useful in limited realms, given that what's being tested is not the final product. We should probably be designing things to not fail.
The amount of beauty required to launch 1 ship = 1 Millihelen