Comment This particular speedup (Score 1) 121

I'm not 100% sure, but I think this particular speedup was due to an issue with non-temporal writes to memory. Such instructions are used in heavily optimized game code but are not generally used in critical paths elsewhere. They are also known to be highly temperamental instructions, even across Intel CPUs. The Ryzen box was synchronizing the memory writes across all cores, which imploded some of the heavily optimized algorithms.
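For the curious, this is roughly what a non-temporal store looks like using the SSE2 intrinsics in C. It's a generic sketch of the technique, not code from any particular game engine; the function and buffer here are hypothetical:

#include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_sfence */
#include <stddef.h>

/* Zero-fill a buffer with non-temporal (streaming) stores, which
 * bypass the cache hierarchy.  dst must be 16-byte aligned.  How
 * these stores interact with the write-combining buffers differs
 * across microarchitectures, which is exactly the sensitivity
 * described above. */
void fill_nt(void *dst, size_t bytes)
{
    __m128i v = _mm_set1_epi32(0);
    char *p = dst;
    for (size_t i = 0; i + 16 <= bytes; i += 16)
        _mm_stream_si128((__m128i *)(p + i), v);
    _mm_sfence();   /* make the streamed stores globally visible */
}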

So far my tests with a 1700X show Ryzen to be an excellent performance CPU; it holds up well against nearly all of Intel's offerings. It still runs a bit hotter than Intel in my tests, but the power consumption is significantly better than past AMD CPUs. It's a lot closer to Intel now.

More importantly, Intel's fab advantage is dissipating fairly quickly as other fabs catch up. The combination of a modern CPU design and competitive third-party fabs puts AMD in a good position to compete from this point forward.

As AMD has shown just in the past few days, Ryzen can definitely be competitive and even more so as game devs begin to make Ryzen-optimized builds available.

-Matt

Comment VPNs kinda sorta ... they will help, a little. (Score 4, Informative) 141

I've been running an openvpn link from my home to our colo for years. I also have it set up on all my devices so I can use it while traveling. Some of our DFly devs also use it when they are traveling. Here's my cumulative wisdom on the matter:

Generally speaking it works quite well. I use a medium-numbered port, but I also have a server running on port 443, because the many weird networks one runs through when traveling often block most ports yet usually leave the https port open.

* Use UDP for the transport when running openvpn over a broadband link. This provides the most consistent experience.

* Use TCP for the transport for connections from mobile devices. This provides the most consistent experience there. There are several reasons for this, not the least of which is that telco infrastructure seems to deprioritize UDP heavily versus other traffic. TCP is also a lot easier to run on the server side if you potentially have many devices connecting in, because you can run a single server instance.

* Configure a smaller MSS (I use 1300) so the encapsulated packets don't get fragmented by the transport. This is very important.

* Configure a relatively frequent keepalive in openvpn over a WAN link (I use 1s/10s), but a less frequent one over mobile (I use 20s/120s). This is particularly important on mobile because cell tower hand-offs can cause long disruptions, and you don't want to drop the VPN link in such circumstances if you can help it. DO NOT DISABLE THE KEEPALIVE. Always have an openvpn keepalive configured, particularly over TCP, because TCP's connection backoff can prevent your sessions from recovering, or make them take a long time to recover, if one direction or the other is not actively sending data (as with most web connections, downloads, streaming, etc). A minimal config sketch follows below.
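Here's roughly what those settings look like in a client config. A minimal sketch only: the hostname, port, and certificate paths are hypothetical placeholders, and the option set you actually need will differ:

# client.ovpn -- sketch of the settings discussed above
client
dev tun
proto udp                      # switch to tcp-client for mobile devices
remote vpn.example.com 1194    # placeholder host/port; 443 for the TCP fallback server
mssfix 1300                    # keep encapsulated packets from fragmenting
keepalive 1 10                 # WAN link: ping every 1s, restart after 10s
;keepalive 20 120              # mobile: less frequent, survives tower hand-offs
ca ca.crt
cert client.crt
key client.key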

I personally like 'OpenVPN Connect' on iOS (which I use to connect to our project colo). And of course I run openvpn on all the DragonFly boxes, including my laptop.

--

Reliability of the VPN depends entirely on the path between your location and the VPN server. The packet must travel this path in addition to the path from the VPN server to the nominal destination, and even in the best of circumstances that roughly doubles the chances of something going wrong.

I've had a number of outages at home where my cable link was still operational but the cable company's path to the VPN server was having problems. Recovery times are also longer, because not only does the dead network have to revive, the openvpn setup then has to reconnect and renegotiate.

--

Commercial services are going to be hit or miss. VPN'ing your broadband link might be problematic, and you have no real visibility into what the commercial service is doing with your data. That said, they are probably going to be a lot better than trusting your data to the telco and the wifi hot-spots you connect from when you are mobile.

Netflix and other video streaming providers will often block commercial VPN IPs from their services outright. Generally speaking, using a commercial service for high-bandwidth connections is really hit-or-miss; you are using their bandwidth as well as your own.

When using a VPN, you are also bypassing any special deals your broadband provider has made with the likes of YouTube, Netflix, etc. Remember that if some of your cell bandwidth is supposed to be free (zero-rated), it won't be over the VPN.

--

In terms of security, it's a mixed bag. The VPN will secure your traffic from your immediate ISP/telco (aka Comcast, AT&T), and that's actually very important. However, you are not anonymous, and once your traffic reaches the egress point it's up for grabs by any network it flows through; in particular, the target web page or whatever might be doing its own data collection.

But the telco data collection is MUCH more valuable to third parties than the destination's own data collection, and the VPN link at least protects you from that.

The VPN will not do a whole lot for your internal network security. If someone breaks into an IoT device on your home network you are pretty much screwed. The best defenses here are (A) to not use IoT devices in your home, or at least disable their internet access for the most part, and (B) to have a router in between your cable modem / U-verse device and your home network:

cable modem -> home router -> home network + WIFI router

I run all the NAT and openvpn stuff on my home router, so a compromised cable modem has no access to my home network. I also segregate the wired ethernet's IP space from the wireless router's IP space, and firewall the IPs, so nothing on the wireless side can fake my wired IPs.
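That last anti-spoofing rule is a one-liner in pf, which the BSDs ship with. A rough sketch only; the interface name and subnet are hypothetical placeholders:

# /etc/pf.conf fragment -- drop wifi-side packets spoofing wired IPs
wired_net = "10.0.1.0/24"    # placeholder wired subnet
wifi_if   = "em1"            # placeholder wireless-facing interface
block in quick on $wifi_if from $wired_net to any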

More on the IoT devices. Obviously things like a printer or AppleTV have to be on the wifi network. But your refrigerator, 'smart' TV, Blu-ray player, receiver, and other junk do not. And you can further segregate the wifi devices by running several different WIFI SSIDs with different passwords. I don't go quite that far, even though my printer is almost certainly vulnerable to a LAN hack.

-Matt

Comment Flash... (Score 2) 230

What did it in should be obvious... one security exploit after another, non-stop, for over 8 years. HTML5 might have been the final nail in the coffin, but Flash really did itself in.

When Flash was originally conceived by Macromedia, very little thought went into security, because at the time security wasn't a big issue (the Internet was still fairly small compared to today, and hackers had not yet ramped up on a large scale). The entire codebase was inherently insecure and trusting of whatever Flash content was handed to it.

In all that time, from the moment the first Flash product went out the door right up to today, nobody did more than basic hand-waving around the security problems. I'm sure they will claim that they tried... but no, they really didn't.

In the end, people finally got tired of the endless stream of security exploits.

-Matt

Comment Re:Yeah, but no (Score 1) 109

Dissecting the test output:

11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS

That means the average latency is 85uS (averaged over all reads); the lowest latency measured was 66uS and the highest was 140uS. Another important metric is the standard deviation... that is, how 'tight' the access times cluster around that 85uS average. In this case, a standard deviation of 7.5uS is very good.
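For reference, the standard deviation reported here is just the usual formula over the per-read latencies $t_i$ (whether randread uses the population or sample form is a detail; over millions of reads it makes no difference):

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(t_i - \bar{t}\bigr)^2}, \qquad \bar{t} = \frac{1}{N}\sum_{i=1}^{N} t_i$$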

Comparing this to Optane: what Intel has stated is that the average latency over all reads for Optane NVMe will be around 10uS. They have also stated that the standard deviation will be much tighter. So those are the comparable numbers.

But here's the real problem... you ask whether Optane will beat a PCIe SSD as a HDD cache in actual real-world desktop circumstances. I will add 'at the same price point'. The answer to that is going to be 'no'. The reason is that you can buy 4x to 8x as much NAND NVMe-based storage as Optane NVMe storage for the same price.

So instead of having a 32GB Optane cache, you could have a 128GB-256GB NAND SSD cache for the same price. That *completely* trumps Optane, no matter how low Optane's latency is, for this use case.

-Matt

Comment Re:Optane is cool (Score 1) 109

Insulator breakdowns on circuit boards happen less often these days, but they are still prevalent in electrolytic caps and anything with windings (transformers, inductors, DC motors, etc), though it can take 20-50 years to happen and depends on conditions. The failure mode varies as well.

Generally speaking, any component with an insulator which is getting beat up is subject to the issue.

Circuit boards got a lot better as vendors switched to solid-state caps. Electrolytics tend to dry out, and little arc-throughs punch holes in the insulator over time (running them at less than half their rated voltage goes a long way toward lengthening their lives, which is why you usually see voltage ratings much higher than the voltages actually run through them).

The insulating coatings on wires used for windings have also gotten better. Typically shorts develop over time and change the inductance (or the voltage ratio of a transformer) and other parameters until the component is so far out of spec that it stops doing its job properly. DC motors get weaker, etc etc.

-Matt

Comment Re:Intel is blowing (Score 1) 109

It just so happens I have an Intel 750 in the pile; here's the issue that the Linux NVMe code had to work around:

nvme3: mem 0xc7310000-0xc7313fff irq 40 at device 0.0 on pci4
nvme3: mapped 32 MSIX IRQs
nvme3: NVME Version 1.0 maxqe=4096 caps=0000002028010fff
nvme3: Model INTEL_SSDPEDMW400G4 BaseSerial CVCQ535100LC400AGN nscount=1
nvme3: Request 64/32 queues, Returns 31/31 queues, rw-sep map (31, 31)
nvme3: Interrupt Coalesce: 100uS / 4 qentries
nvme3: Disk nvme3 ns=1 blksize=512 lbacnt=781422768 cap=372GB serno=CVCQ535100LC400AGN-1

If I run a randread test on uncompressed data using block sizes from 512 to 131072 bytes, you can see the glitch that occurs at 65536 bytes. I am using a deep queue (128 threads, around QD4 per HW queue but QD128 globally), so this is the absolute limit of the device's performance. Look at what happens when the block size transitions from 32768 to 65536 bytes. That's the firmware screwup that the Linux folks worked around. No other NVMe vendor has this issue:

xeon126# randread /dev/nvme3s1b 512 100 128
device /dev/nvme3s1b bufsize 512 limit 16.000GB nprocs 128
487912/s avg=262.34uS bw=249.81MB/s lo=60.69uS, hi=2693.38uS stddev=101.09uS
488698/s avg=261.92uS bw=250.14MB/s lo=44.12uS, hi=2693.58uS stddev=101.79uS
489023/s avg=261.75uS bw=250.37MB/s lo=54.44uS, hi=2629.42uS stddev=98.31uS
^C
xeon126# randread /dev/nvme3s1b 1024 100 128
device /dev/nvme3s1b bufsize 1024 limit 16.000GB nprocs 128
485963/s avg=263.39uS bw=497.62MB/s lo=45.28uS, hi=2593.95uS stddev=91.90uS
486353/s avg=263.18uS bw=497.98MB/s lo=60.05uS, hi=1268.07uS stddev=89.05uS
486312/s avg=263.21uS bw=497.97MB/s lo=62.99uS, hi=1131.01uS stddev=89.04uS
^C
xeon126# randread /dev/nvme3s1b 2048 100 128
device /dev/nvme3s1b bufsize 2048 limit 16.000GB nprocs 128
459915/s avg=278.31uS bw=941.89MB/s lo=61.83uS, hi=1244.34uS stddev=95.07uS
459681/s avg=278.45uS bw=941.33MB/s lo=68.47uS, hi=2890.23uS stddev=99.12uS
458907/s avg=278.92uS bw=939.81MB/s lo=67.12uS, hi=2838.20uS stddev=110.08uS
^C
xeon126# randread /dev/nvme3s1b 4096 100 128
device /dev/nvme3s1b bufsize 4096 limit 16.000GB nprocs 128
442539/s avg=289.24uS bw=1812.62MB/s lo=75.33uS, hi=2985.67uS stddev=154.67uS
444166/s avg=288.18uS bw=1819.13MB/s lo=76.80uS, hi=2618.38uS stddev=145.94uS
443966/s avg=288.31uS bw=1818.44MB/s lo=73.81uS, hi=2854.27uS stddev=146.88uS
^C
xeon126# randread /dev/nvme3s1b 8192 100 128
device /dev/nvme3s1b bufsize 8192 limit 16.000GB nprocs 128
248658/s avg=514.76uS bw=2036.99MB/s lo=81.98uS, hi=3809.30uS stddev=321.11uS
249693/s avg=512.63uS bw=2045.32MB/s lo=84.38uS, hi=3278.75uS stddev=317.38uS
247367/s avg=517.45uS bw=2026.38MB/s lo=86.12uS, hi=3032.98uS stddev=323.87uS
^C
xeon126# randread /dev/nvme3s1b 16384 100 128
device /dev/nvme3s1b bufsize 16384 limit 16.000GB nprocs 128
124276/s avg=1029.97uS bw=2036.11MB/s lo=115.63uS, hi=3886.27uS stddev=558.13uS
124526/s avg=1027.90uS bw=2040.07MB/s lo=118.72uS, hi=3894.09uS stddev=574.04uS
125651/s avg=1018.69uS bw=2058.63MB/s lo=109.03uS, hi=3843.91uS stddev=550.71uS
^C
xeon126# randread /dev/nvme3s1b 32768 100 128
device /dev/nvme3s1b bufsize 32768 limit 16.000GB nprocs 128
  62540/s avg=2046.68uS bw=2049.30MB/s lo=137.03uS, hi=6263.58uS stddev=1148.11uS
  63146/s avg=2027.05uS bw=2068.84MB/s lo=157.29uS, hi=5875.07uS stddev=1134.63uS
  62563/s avg=2045.95uS bw=2050.01MB/s lo=147.76uS, hi=6244.51uS stddev=1285.00uS
^C
xeon126# randread /dev/nvme3s1b 65536 100 128
device /dev/nvme3s1b bufsize 65536 limit 16.000GB nprocs 128
    4431/s avg=28887.12uS bw=290.39MB/s lo=195.41uS, hi=59137.97uS stddev=-34838.70uS
    4598/s avg=27835.93uS bw=301.31MB/s lo=214.55uS, hi=59121.64uS stddev=-33681.49uS
    4541/s avg=28186.97uS bw=297.59MB/s lo=257.66uS, hi=61115.02uS stddev=-34015.71uS
^C
xeon126# randread /dev/nvme3s1b 131072 100 128
device /dev/nvme3s1b bufsize 131072 limit 16.000GB nprocs 128
    1679/s avg=76235.18uS bw=220.07MB/s lo=66136.55uS, hi=94294.59uS stddev=-70453.03uS
    1696/s avg=75465.57uS bw=222.28MB/s lo=65954.03uS, hi=96303.18uS stddev=-70093.39uS
    1687/s avg=75872.61uS bw=221.11MB/s lo=64842.69uS, hi=92678.51uS stddev=-70288.51uS
^C

See what happened? Everything was going along just dandy; the device maxes out at around 2 GBytes/sec with a block size of 8192 bytes, which is *very* good, and it stays there through 32768 bytes. But then we use a block size of 65536 bytes and BOOM, the device implodes... bandwidth drops to a mere 300 MBytes/sec. And at 131072 bytes it drops to 220 MBytes/sec.

That's the story behind that quirk. Intel messed up (as in SERIOUSLY messed up) on this and a few other older models. And as I said, no other NVMe vendor has this problem. What's worse is that Intel is still selling this crap without fixing the problem. It's like fire-and-forget... they come out with a device and then forget to support it or fix bugs.

Now, personally speaking, I would never put a quirk like that in MY NVMe driver to fix a vendor screwup of that proportion. If Linux had not added that quirk, Intel would probably have been shamed into fixing the problem in those models. Who in their right mind screws up DMA at block sizes as small as 65536 and 131072 bytes? Apparently Intel.

I'll bet a lot of you folks didn't know this bug existed. If you look at all the media reviews of the 750, nowhere will you see them using a block size of 65536 bytes or larger (or if they do, they probably don't realize that the NVMe driver in the OS has severe hacks in it to make it work).

Here's a 4K QD1 test. The 750 does have some decent qualities: around 12K IOPS at QD1 (note that most NVMe devices can typically do 10K-15K IOPS at QD1) and, more importantly, a low standard deviation.

xeon126# randread /dev/nvme3s1b 4096 100 1
device /dev/nvme3s1b bufsize 4096 limit 16.000GB nprocs 1
  11971/s avg= 83.53uS bw=49.03 MB/s lo=37.81uS, hi=229.67uS stddev=8.85uS
  11932/s avg= 83.81uS bw=48.87 MB/s lo=62.02uS, hi=435.44uS stddev=9.28uS
  11977/s avg= 83.49uS bw=49.06 MB/s lo=39.86uS, hi=446.53uS stddev=9.41uS
  11935/s avg= 83.79uS bw=48.89 MB/s lo=61.53uS, hi=586.35uS stddev=10.80uS
  11922/s avg= 83.88uS bw=48.83 MB/s lo=62.17uS, hi=363.81uS stddev=9.12uS
  11934/s avg= 83.79uS bw=48.88 MB/s lo=61.95uS, hi=1422.06uS stddev=15.20uS

That's very good.

--

This Intel 750 has other problems as well. The second biggest is that the more queues one configures, the lower the performance at EVERY test level. Again, no other NVMe vendor does anything that stupid.

The consumer 600P has its own terrible issues, but block size is not one of them. The 600P does fine at 65536 and 131072 bytes.

Now you know the real history here.

-Matt

Comment Re:Yeah, but no (Score 3, Informative) 109

Certainly faster writing. Read speed is about the same as the EVO's (on real blocks of incompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').

XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ecosystem: (1) more durability, up to 33,000 rewrites apparently (people have had to calculate it themselves; Intel refuses to state it outright because it is so much lower than what they originally said it would be), and (2) lower latency.

So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.

This is called the QD1 latency. It does not translate to the full bandwidth of the device, because you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O, so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) rather than QD1.
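As a quick sanity check on the numbers that follow (simple arithmetic, nothing more), throughput at a given queue depth is bounded by queue depth divided by average latency:

$$\text{IOPS} \approx \frac{QD}{\text{avg latency}}: \qquad \frac{1}{87\,\mu\text{S}} \approx 11.5\text{K/s at QD1}, \qquad \frac{4}{90\,\mu\text{S}} \approx 44\text{K/s at QD4}$$

which matches the measurements below almost exactly.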

Here's the 960 EVO and some randread tests on it at QD1 and QD4.

nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1

(/dev/nvme1s1b is a partition filled with incompressible data)

xeon126# randread /dev/nvme1s1b 4096 100 1
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 1
  11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
  11458/s avg= 87.28uS bw=46.92 MB/s lo=68.50uS, hi=154.20uS stddev=7.01uS
  11469/s avg= 87.19uS bw=46.98 MB/s lo=69.97uS, hi=151.97uS stddev=6.95uS
  11477/s avg= 87.13uS bw=47.01 MB/s lo=69.31uS, hi=158.03uS stddev=7.03uS

And here is QD4 (really QD1 x 4 threads on 4 HW queues):

xeon126# randread /dev/nvme1s1b 4096 100 4
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 4
  44084/s avg= 90.74uS bw=180.57MB/s lo=65.17uS, hi=237.92uS stddev=16.94uS
  44205/s avg= 90.49uS bw=181.05MB/s lo=65.38uS, hi=222.21uS stddev=16.56uS
  44202/s avg= 90.49uS bw=181.04MB/s lo=65.19uS, hi=221.48uS stddev=16.72uS
  44131/s avg= 90.64uS bw=180.75MB/s lo=64.44uS, hi=245.91uS stddev=16.81uS
  44210/s avg= 90.48uS bw=181.08MB/s lo=63.73uS, hi=232.05uS stddev=16.74uS

So, as you can see, at QD1 the 960 EVO does around 11.4K transactions/sec, and at QD4 around 44K transactions/sec. If I use a larger block size, you can see the bandwidth lift off:

xeon126# randread /dev/nvme1s1b 32768 100 4
device /dev/nvme1s1b bufsize 32768 limit 16.000GB nprocs 4
  19997/s avg=200.03uS bw=655.26MB/s lo=125.02uS, hi=503.26uS stddev=55.24uS
  20090/s avg=199.10uS bw=658.23MB/s lo=124.62uS, hi=522.04uS stddev=54.83uS
  20034/s avg=199.66uS bw=656.47MB/s lo=123.63uS, hi=495.74uS stddev=55.59uS
  20008/s avg=199.92uS bw=655.62MB/s lo=123.50uS, hi=500.24uS stddev=55.92uS
  20034/s avg=199.66uS bw=656.47MB/s lo=125.17uS, hi=488.30uS stddev=55.02uS
  20000/s avg=200.00uS bw=655.35MB/s lo=123.19uS, hi=504.18uS stddev=55.98uS

And if I use a deeper queue I can max out the bandwidth. On this particular device, random reads of incompressible data at 32KB top out at around 1 GByte/sec. I'll also show 64KB and 128KB:

xeon126# randread /dev/nvme1s1b 32768 100 64
device /dev/nvme1s1b bufsize 32768 limit 16.000GB nprocs 64
  32989/s avg=1940.03uS bw=1080.98MB/s lo=1396.85uS, hi=3343.49uS stddev=291.76uS
  32928/s avg=1943.62uS bw=1078.84MB/s lo=1386.21uS, hi=3462.96uS stddev=297.14uS
  33012/s avg=1938.67uS bw=1081.73MB/s lo=1371.41uS, hi=3676.83uS stddev=290.64uS
  33217/s avg=1926.70uS bw=1088.44MB/s lo=1385.18uS, hi=3344.11uS stddev=282.63uS

xeon126# randread /dev/nvme1s1b 65536 100 64
device /dev/nvme1s1b bufsize 65536 limit 16.000GB nprocs 64
  14739/s avg=4342.19uS bw=965.93MB/s lo=3189.96uS, hi=6937.79uS stddev=466.51uS
  14813/s avg=4320.60uS bw=970.67MB/s lo=3273.82uS, hi=6327.81uS stddev=442.92uS
  14991/s avg=4269.15uS bw=982.43MB/s lo=3205.54uS, hi=6355.74uS stddev=432.94uS

xeon126# randread /dev/nvme1s1b 131072 100 64
device /dev/nvme1s1b bufsize 131072 limit 16.000GB nprocs 64
    8052/s avg=7948.27uS bw=1055.38MB/s lo=6575.48uS, hi=9744.12uS stddev=496.41uS
    8150/s avg=7853.00uS bw=1068.01MB/s lo=6540.51uS, hi=9496.64uS stddev=465.37uS
    7986/s avg=8013.88uS bw=1046.72MB/s lo=6446.20uS, hi=9815.01uS stddev=518.95uS

--

Now the thing to note here is that with deeper queues the latency also goes up. At QD4 the latency is around 200uS, for example.

Where Optane (aka XPoint) 'wins' is on latency. I don't have an Optane device to test yet, but Intel is claiming an average latency of 10uS at QD1 over the NVMe interface (over a direct DDR interface it will of course be much faster). That's the 'win'. But it's completely irrelevant for the consumer case, because the consumer case involves multi-block transfers, and filesystems always read ahead (i.e. at least QD4). A disk cache does not need 10uS latency to be effective.

-Matt

Comment Re:Intel is blowing (Score 1) 109

And who the hell do you think I am mister Anonymous Coward?

So, as I thought, you don't understand either that commit or the later commit that simplified it (159b67d7).

It's not a stripe-size limitation per se; it's just a limitation on the maximum physical transfer size per I/O request, which for 99.9% of the NVMe devices out in the wild will be >= 131072 bytes and completely irrelevant for all filesystem I/O and even most softRAID I/O.

More to the point, that particular commit does not apply to the 600P at all. It applies to several older Intel datacenter SSDs as well as the 750 series, and it exists because Intel really screwed up the firmware on those devices and imposed crazily low limits on physical transfer size. Then they carefully designed tests that avoided those limits in order to sell the devices.

The 750, for example, loses a huge amount of performance with block sizes >= 65536 bytes. Intel may not have advertised the mistake, but that is a limitation that doesn't exist in the 600P, nor in ANY OTHER NVME SSD IN EXISTENCE. Only a complete idiot creates an NVMe device which can't handle block transfers of 65536 or 131072 bytes without losing massive performance. Intel = 65536 bytes.

This was a well known bug in these particular models.

In any case, even for these models, this particular quirk has no effect on block I/O tests using block sizes below 65536 bytes. And, as I mentioned already, NO OTHER NVME VENDOR has such absurdly low limits or such massively disconnected performance metrics when you exceed them. And even Intel fixed the issue on the 600P.

This just points to the idiocy inside Intel. And it shows your stupidity as well, believing that a little quirk like this somehow affects the entire NVMe space (or even the entire Intel NVMe space), which it doesn't. These sorts of quirks exist for all manner of hardware, not just NVMe, to work around poor, buggy implementations.

-Matt

Comment Re:Intel is blowing (Score 1) 109

And, of course, any Linux or BSD operating system will use all available memory to cache data from storage anyway. I guess Windows needs a little more help to do that.

This certainly shows up in, for example, Chrome startup times. It takes around 4 seconds from a hard drive, uncached; 1 second from a SSD; 1 second from a NVMe drive; and presumably 1 second from any other form of storage, because Chrome itself needs a bit of CPU time to initialize, not to mention the time it takes to load a tab (minimum 0.5 seconds).

So honestly, once one transitions from the HDD to a SATA SSD, where the difference is noticeable, any further transitions (SATA SSD -> NAND NVMe SSD -> XPoint NVMe SSD -> XPoint DDR) are not likely to be noticeable, even without a ram cache.

I think Intel's ENTIRE marketing effort revolves around Windows' slow startup times. Or more to the point, Windows tends to seek the storage device a lot while starting up, which is *very* noticeable if you have a hard drive but mostly irrelevant if you have any sort of SSD.

Since one can accomplish the same thing simply by purchasing a small SSD, I just don't see them being able to make the case that it is 'easier' as a disk-caching substitute versus someone coming to the realization that their time and data are valuable enough to spend a bit more money on some native SSD storage in the first place.

The advent of the cloud is also making local mass storage less and less relevant. Here I'm not talking about those of us who insist on having our own local archives (mine is getting close to 4TB now, with another 4TB in each of two backup locations, so... that's 12TB of storage for me). I'm talking about 'normal' people, who are using cloud storage more and more often. They won't need Intel's ridiculous 'solution' either (not to mention that a normal NAND NVMe SSD caching a HDD is a better fix for the problem they are marketing against than their Optane junk).

-Matt

Comment Re:Being confused... (Score 1) 109

Motherboard vendors are just now, finally, starting to put M.2 connectors on the motherboard. Blame Intel for the slow rate of adoption: Intel came out with three different formats, all basically incompatible with each other, and created mass confusion.

But now, finally, mobo vendors are settling on a single PCIe-only M.2 format. Thank god. They are finally starting to include one or more M.2 slots, and finally starting to add U.2 connectors for larger NVMe SSDs. Having fewer SATA ports on the mobo is no longer a marketing issue; I've seen many more mobos recently with just 2-4 SATA ports.

-Matt

Comment Re:Thanks for the ad, I guess, but you missed some (Score 2) 109

It would depend on the relative latency and other characteristics. XPoint is definitely not it, because XPoint can't handle unlimited writing. But let's say that in some future we do have a non-volatile storage mechanism that has effectively unlimited durability, like ram, but which is significantly more dense, like XPoint.

In that situation I can see systems supporting a chunk of that sort of storage as if it were memory.

Latency matters greatly here, for several reasons. First, I don't think XPoint is quite fast enough, at least not yet. The problem with any sort of high-latency storage being treated like memory at the HARDWARE level is that the latency creates massive stalls on the CPU. Even DRAM today causes huge many-clock stalls. These stalls are transparent to the operating system, so the operating system cannot just switch to another thread or do other work during the stall; the stall effectively reduces the performance of the system. This is the #1 problem with treating any storage technology as if it were memory.

The #2 problem is that memory is far easier to corrupt than storage (which requires a block transaction to write). I would never want to map my filesystem's entire block device directly into memory, for example. It's just too dangerous.

The solution that exists today is, of course, swap space. You simply configure your swap on an SSD. The latencies are obviously much higher than they would be for a HW XPoint-style solution: around 50-100uS to service a page fault requiring I/O from an NVMe SSD, for example.
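On the BSDs that's a one-liner (the partition name here is a hypothetical placeholder):

swapon /dev/nvme0s1b     # enable swap on an NVMe partition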

The difference, though, is that the operating system knows it is taking the page fault and can switch to another runnable thread in the meantime, so the CPU is not stalled for 50-100uS; it's doing other work. Given enough pending work, the practical overhead of a page fault in terms of lost CPU time is only around 2-4uS.

In an XPoint-like all-hardware solution, the CPU will stall on the miss. If the XPoint 'pagein' time is 1-2uS, the all-hardware solution winds up being only about twice as good as the swap-space solution in terms of CPU cycles. Of course, the all-hardware solution is far better in terms of latency (1-2uS versus 50-100uS).
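To make that arithmetic explicit, using the figures above:

$$\frac{\text{CPU cost (swap)}}{\text{CPU cost (HW)}} \approx \frac{2\text{-}4\,\mu\text{S}}{1\text{-}2\,\mu\text{S}} \approx 2\times, \qquad \frac{\text{latency (swap)}}{\text{latency (HW)}} \approx \frac{50\text{-}100\,\mu\text{S}}{1\text{-}2\,\mu\text{S}} \approx 25\text{-}50\times$$

The 50-100uS of swap wall time is hidden by running other threads, which is why the CPU-cycle ratio is so much smaller than the latency ratio.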

But to really work in this form, the non-volatile memory needs to have nearly unlimited write durability, and XPoint does not. XPoint only has around 33,000 write cycles per cell (and that's being generous). It needs half a million at a minimum, and at least 10 million to *really* be useful.
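To put that 33,000-cycle figure in perspective, a back-of-the-envelope calculation (assuming a hypothetical 32GB module and perfect wear leveling):

$$32\,\text{GB} \times 33{,}000 \approx 1.06\,\text{PB written total}; \qquad \frac{1.06 \times 10^{15}\,\text{B}}{100\,\text{MB/s}} \approx 1.06 \times 10^{7}\,\text{s} \approx 122\ \text{days}$$

That is, a memory-like workload writing a sustained 100 MB/s would wear the whole module out in about four months.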

-Matt

Comment Re:Intel is blowing (Score 2) 109

Intel devices have quirks, but I think you are mixing apples and oranges here. All modern filesystems have used larger alignments for ages. The only real issue was that the original *DOS* partition table offset the base of the slice the main filesystem was put on by a weird multiple of 512 bytes which was not even 4K-aligned.

This has not been an issue for years. It was fixed long ago on DOS-partitioned systems and does not exist at all on EFI systems, regardless of the operating system.

At the same time, all SSDs past the second generation became sophisticated enough that they really stopped caring about alignment for most practical use cases.

Where Intel does mess up depends on the device. In the 600P's case, the firmware is poorly designed in many respects. In other cases, such as with the 750, performance implodes at large block sizes (64KB or higher). This just makes the devices less worthy, because frankly NO OTHER SSD VENDOR has these sorts of idiotic problems.

All of that said, insofar as operating systems go, these storage-level devices have no real visibility into, understanding of, or optimizations for one particular filesystem versus another. So for all practical purposes, there is NO raw performance difference between Windows, MacOS, Linux, or any of the BSDs on these storage-level devices. They are, and always have been, completely OS-agnostic.

-Matt

Comment Re:Intel is blowing (Score 3, Informative) 109

Right. They are trying to market it as something cool and new, which would be great except for the fact that it isn't cool OR new. A person can already use ANY storage device to accelerate any OTHER storage device; there have been dozens of 'drive accelerators' on the market for years. So if a person really wanted to, they could trivially use a small NAND-flash-based NVMe SSD to do the same thing, and get better results because they'd have a lot more flash. A person could even use a normal SATA SSD for the same purpose.

What Intel is not telling people is that NOBODY WILL NOTICE the lower latency of their XPoint product. At (I am assuming for this product) 10uS, the Intel XPoint NVMe has roughly 1/6 the latency of a Samsung NVMe device. Nobody is going to notice the difference between 10uS and 60uS; even most *server* workloads wouldn't care. But I guarantee that people WILL notice that the Intel device is caching much less data than they could cache for the same money with a NAND-based NVMe SSD or even just a SATA SSD.

In other words, Intel's product is worthless.

-Matt
