Huh? This sounds like nonsense. Operating systems already cache frequently used data in ram.
-Matt
That isn't correct. The queue depth for a normal AHCI controller is 31 (assuming 1 tag is reserved for error handling). It only takes a queue depth of 2 or 3 to reach maximum linear throughput.
Also, most operating systems are doing read-ahead for the program. Even if a program is requesting data from a file in small 4K read() chunks, the OS itself is doing read-ahead with multiple tags and likely much larger 16K-64K chunks. That's assuming the data hasn't been cached in ram yet.
For writing, the OS is buffering the data and issuing the writes asynchronously so writing is not usually a bottleneck unless a vast amount of data is being shoved out.
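To make both points concrete, here's a minimal sketch (POSIX-flavored; the file names are made up for illustration) of what this looks like from userland: a read-ahead hint on the read side, and writes that return as soon as they're buffered unless you explicitly sync:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;
    int fd;

    /* Reading: hint sequential access so the kernel ramps up its
     * read-ahead window; each small read() is then served from
     * pages the kernel already fetched ahead of us. */
    fd = open("bigfile.dat", O_RDONLY);
    if (fd < 0) { perror("open bigfile.dat"); return 1; }
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;               /* consume the data */
    close(fd);

    /* Writing: write() just dirties the page cache and returns;
     * the kernel flushes asynchronously.  Only an explicit
     * fsync() actually waits on the media. */
    fd = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open out.dat"); return 1; }
    memset(buf, 0, sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != sizeof(buf))
        perror("write");
    fsync(fd);          /* force it out; normally you wouldn't bother */
    close(fd);
    return 0;
}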
-Matt
Actually, large compiles use surprisingly little actual I/O. Run a large compile... e.g. a parallel buildworld or a large ports bulk build or something like that while observing physical disk I/O statistics. You'll realize very quickly that the compiles are not I/O constrained in the least.
'Most' server daemons are also not I/O constrained in the least. A web server can be IOPS-constrained when asked to load, e.g., tons of small icons or thumbnails. If managing a lot of video or audio streams a web server typically becomes network-constrained, but the IOPS will be high enough to warrant at least a SATA SSD rather than a HDD.
Random database accesses are I/O constrained if not well-cached in ram, which depends on the size of the database too, of course. Very large databases which cannot be well cached are the best suited for PCIe SSDs. Not a whole lot else.
-Matt
I mean, why would anyone think images would load faster? The cpu is doing enough transformative work processing the image for display that the storage system only has to be able to keep ahead of it... which it can do trivially at 600 MBytes/sec if the data is not otherwise cached.
Did the author think that the OS wouldn't request the data from storage until the program actually asked for it? Of course the OS is doing read-ahead.
And programs aren't going to load much faster either, dynamic linking overhead puts a cap on it and the program is going to be cached in ram indefinitely after the first load anyway.
These PCIe SSDs are useful only in a few special, mostly server-oriented cases. That said, it doesn't actually cost any more to have a direct PCIe interface versus a SATA interface, so I expect these things are here to stay. Personally, though, I prefer the far more portable SATA SSDs.
-Matt
The author must not know the difference between the real thing and the rebrand. I would never buy Kingston anything. They just slap random components onto those boards. There are hundreds of rebranders in the SSD space but only a handful of real companies. Kingston isn't one of them.
-Matt
Well, except that it isn't a mere month. Unpowered data retention is around 10 years for relatively unworn flash and around 1 year for worn flash. Powered data retention is almost indefinite (doesn't matter if the data is static or not). The modern SSD controller will rewrite blocks as the bits leave the sweet zone.
The main benefit, though, is that SSD wear is essentially based on how much data you've written, which is a very controllable parameter and means, among other things, that even a SSD which has been sitting on a shelf for a long time and lost its data can still be used for fresh data (TRIM wipe + newfs). I have tons of SSDs sitting on a shelf ready to be reused when I need them next. I can't really do that with HDDs and still expect them to be reliable.
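For the curious, the 'TRIM wipe' step boils down to a single whole-device discard. Here's a minimal Linux-flavored sketch of it (on the BSDs you'd follow up with newfs, on Linux with mkfs; the util-linux blkdiscard(8) tool does the same thing). The device path is just an example, and this destroys all data on it, obviously:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>           /* BLKGETSIZE64, BLKDISCARD */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    uint64_t size, range[2];
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Ask the kernel for the device size, then discard (TRIM)
     * every block on it.  The SSD is left logically empty and
     * ready for a fresh filesystem. */
    if (ioctl(fd, BLKGETSIZE64, &size) < 0) { perror("BLKGETSIZE64"); return 1; }
    range[0] = 0;
    range[1] = size;
    if (ioctl(fd, BLKDISCARD, &range) < 0) { perror("BLKDISCARD"); return 1; }
    close(fd);
    return 0;
}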
Hard drives have a relatively fixed life whether powered or not. If you have a modestly used hard drive and take it out and put it on a shelf for a year, chances are it either won't be able to spin up after that year or it will die relatively quickly (within a few weeks, possibly even faster) once you have spun it up. So get your data off it fast if you can.
So SSDs already win in the data retention and reliability-on-reuse department.
-Matt
I don't understand what you mean by 'non graphics competitors'. Intel, AMD, and ARM cpu offerings already have integrated GPUs with dual-head capability (and have for a few years now). There are no non graphics competitors.
Currently the best open source kernel and driver compatibility is with the Intel and AMD integrated GPUs. That's what all the KMS work was responsible for giving us. The performance of integrated GPUs has increased steadily over the last few years and has reached a point now where most 3D games will run with modest (but not high-end) settings, and *all* 2D (aka desktop operations) will run faster than you can blink.
I splurged on a mid-range card for my Windows gaming box, but all my workstations just use the cpu-integrated gpus these days for dual-head operation. And they're nice and quiet and fast.
-Matt
Well, nobody with a laptop is really going to notice much of a difference, because frankly there isn't a whole lot of software that actually needs that kind of performance over the ~550 MBytes/sec that can already be obtained with SATA-III. Certainly nothing that would be run on a laptop anyway.
It's just using the PCI-e lanes on the M.2 connector instead of the SATA-III lanes. This isn't a magical technology. There's a loss of robustness and portability that gets traded off. It does point to SATA needing another few speed bumps, though. The fundamental serial link technology used at the physical level by PCI-e and SATA is almost identical. The main difference is that SATA is designed for cabling while M.2 is not (at least not M.2's PCI-e lanes).
-Matt
Nobody does message passing for basic operations. I actually tried to asynchronize DragonFly's system calls once but it was a disaster. Too much overhead.
On a modern Intel cpu a system call runs around 60ns. If you add a message-passing layer with an optimized path to avoid thread switching that will increase to around 200-300ns. If you actually have to switch threads it increases to around 1.2µs. If you actually have to switch threads AND save/restore the FPU state now you are talking about ~2-3µs. If you have to message pass across cpus then the IPI overhead can be significant... several microseconds just for that, plus cache mastership changes.
And all of those times assume shared memory for the message contents. They're strictly the switch and management overhead.
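If you want to reproduce the baseline number yourself, here's a minimal Linux-flavored sketch. It invokes syscall(2) directly so the libc wrapper can't cache the result, and just averages over a big loop:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const long N = 10 * 1000 * 1000;
    struct timespec t0, t1;
    double ns;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < N; i++)
        syscall(SYS_getpid);    /* one kernel entry/exit per call */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per syscall\n", ns / N);
    return 0;
}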
So, basically, no operating system that is intended to run efficiently can use message-passing for basic operations. Message-passing can only be used in two situations:
(1) When you have to switch threads anyway. That is, if two processes or two threads are messaging each other. Another good example is when you schedule an interrupt thread but cannot immediately switch to it (i.e. preempt the current thread). If the current thread cannot be preempted then the interrupt thread can be scheduled normally without imposing too much overhead versus the alternative.
(2) When the operation can be batched. In DragonFly we successfully use message-passing for network packets and attain very significant cpu localization benefits from it. It works because packets are batched on fast interfaces anyway. By retaining the batching all the way through the protocol stack we can effectively use message passing and spread the overhead across many packets. The improvement we get from cpu localization, particularly not having to acquire or release locks in the protocol paths, then trumps the messaging overhead.
#2 also works well for data processing pipelines. A rough sketch of the batching idea follows.
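This is just the shape of the idea, not DragonFly's actual code: hypothetical names, pthreads instead of kernel primitives, and message payloads assumed to live in shared memory as noted above. The point is that a single lock round-trip detaches the entire pending batch, so the locking cost is spread across every message in it:

#include <pthread.h>
#include <stddef.h>

struct msg {
    struct msg *next;
    /* payload lives in shared memory, per the above */
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static struct msg *q_head;      /* LIFO for brevity; a real queue
                                 * would preserve arrival order */

/* Producer side: queue one message (e.g. per received packet). */
void msg_send(struct msg *m)
{
    pthread_mutex_lock(&q_lock);
    m->next = q_head;
    q_head = m;
    pthread_mutex_unlock(&q_lock);
}

/* Consumer side: one lock round-trip detaches everything pending,
 * then the whole batch is processed with no further locking. */
void msg_run_batch(void (*handler)(struct msg *))
{
    struct msg *batch, *m;

    pthread_mutex_lock(&q_lock);
    batch = q_head;
    q_head = NULL;
    pthread_mutex_unlock(&q_lock);

    while (batch != NULL) {
        m = batch;
        batch = m->next;
        handler(m);
    }
}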
-Matt
Well... basic procedures using only MOV/CMP/JMP are not something that even Linux really needs to code in assembly. What is being talked about here is primarily the trap, exception, syscall, signal trampoline, and interrupt entry and exit mechanisms. Also, thread-switch code can get pretty complex because there is a lot more hardware state involved than just the basic register set. When you start having to deal with SWAPGS and MSR registers, you've really gone down the rabbit hole.
-Matt
It's not a major refresh, only a modest one, and it doesn't really fix the readability issues (which would require a complete rewrite). Linux assembly is a mostly unreadable, badly formatted, macro-happy mess. The assembly in the BSDs is much more elegant and minimalistic.
-Matt
The core of the issue has nothing to do with going off-grid and everything to do with matching production from renewable sources to the actual load on the grid. Without that we get into the situation that Germany finds itself in, which is twofold: (1) electricity prices fall to zero during the day due to all the solar, and as subsidies go away the owners can't make money from providing power to the grid, and (2) the base-load differential between day and night is so great that the traditional generation (i.e. coal) cannot run continuously at critical mass and so becomes extremely inefficient and uneconomical. So coal power generation companies in Germany are also going bankrupt.
Ultimately consumers with PV systems will be forced to pay spot rates and feel the pain. This is already beginning to happen in many parts of the country... where day-time electricity rates are lower but the buy-back is also lower, and night-time rates are higher and have a higher buy-back.
The idea with using the electric car battery (or some other form of temporary storage) is to use it to store energy when prices are cheap and inject it into the grid when prices are expensive. This also has the side effect of reducing the base-load differential between day and night, so other generation sources such as nuclear and coal can operate efficiently (and thus profitably) to make up the difference.
There is nothing nefarious going on. Really, going entirely off-grid is not something anyone should be trying to do unless they actually live somewhere with a flaky grid (or no grid). And the reality is that electricity prices are going to fluctuate even more between day and night, or rainy vs not, or windy vs not, as more renewable energy sources are brought online.
-Matt
It depends on a lot of things, but one thing is for certain: swap on a SSD greatly improves system responsiveness when you have a lot of open applications (on any OS: Windows, Mac, Linux, BSD). Being able to page out anonymous memory to fast swap is a big deal. Nominal file storage on a SSD greatly improves program startup, boot times, and photo and document handling. I've found, though, that it's really having swap space on the SSD that makes the biggest difference. I have a multitude of machines ranging from 1GB to 32GB of ram, with various cpus.
My 2GB Haswell-based Chromebook is snappy for the tasks I use it for (of course, I replaced the 16G SSD with a 128G SSD and run DragonFly on it). Heck, I even still have an old pre-Haswell netbook, and throwing a SSD into that made it usable again (but these days I only use it for legacy testing). I'm really happy with Intel's Haswell-or-later laptop cpus.
There are some caveats. Firefox has huge gaping memory leaks and a horrible memory footprint, so leaving it open in the background for a few days usually builds it up to around ~2GB VSZ and 1.5GB RSS (and it keeps growing; on my 8GB workstation I've let it grow well past 4GB before closing out all the windows and reopening it). If it just paged all that leaked memory out it wouldn't be a problem, but it's so fragmented internally that it winds up touching most of the footprint all the time under normal operation. In this situation, having a bit more memory on the workstation or laptop does help quite a bit.
Another caveat is of course any heavy cpu workload, such as batch photo or video processing or large compiles. But nobody in their right mind runs that kind of workload on a laptop anyway, so... maybe not so much of an issue.
Other than the browser, there isn't really a whole lot that eats memory to the point where you'd notice it. And beyond photo/video processing, only large compiles really load down these modern cpus enough to be noticeable.
-Matt
/* "encryption" that maps every character to NUL */
char
EncryptChar(char x)
{
    return 0;
}
I started serious programming (at around age 14) on the PET. First in BASIC, but once I found out you could break into a machine language monitor by wiring up an NMI button (we called it the two-button salute), there began my machine coding. In hex, directly. I didn't even have a relocator at the beginning. It was a year before I could buy the expansion rom that added disassembly and relocation features to the machine language monitor.
Ultimately I wrote an assembler too. I don't think I have any of that code any more; it's been lost in time. Kinda makes me sad.
The PET's 8-bit IEEE-488 bus was pretty awesome. The PET had a 1 MHz 6502. The external floppy drive also had a 1 MHz 6502 in it, and you could reprogram it. So one of my many projects was to speed up the data transfer between the two by synchronizing the processors with a series of handshakes and then pushing or pulling the sector data without any further handshakes (using processor timing alone).
My friend did the same thing for the C64's serial interface (which didn't even have a UART) and sold a product called '1541 Flash!' that sped up the serial interface. Basically a little messing around at the beginning of each transfer to get the two sides synchronized to within 1 clock cycle of each other, then pushing/pulling bits as fast as the little cpus would go without any further handshaking. The clocks would stay synchronized long enough to copy a whole sector.
Other projects on the PET... it had a character generator rom which I replaced with a static ram. So when I powered it up I had to load a copy of the original rom into the ram blindly (the display was total garbage until then, since the ram came up uninitialized).
The PET had a built-in CRT screen, but the key was that the video data input for the screen was actually a TTL signal! So I could pop the wire off the connector and use the screen like a digital oscilloscope to probe various TTL-based projects (as well as the PET's motherboard itself).
Another project... the character generator rom had something called quarter-block graphics: basically 16 characters covering all 16 combinations of four quarter-blocks (2x2), so you could get (I think) 80x50 block graphics out of the 40x25 screen. I spent many hours optimizing the machine code for a pixel pusher.
I got so good at writing editors from scratch that once, when I went to computer camp and forgot to bring the tape, I rewrote the text editor in machine code in less than an hour.
I met Richard Garriott at that camp too; we were both on staff. He was working on (I think) Ultima II at the time (on an Apple II, I think) and had an awesome ram disk for storing code temporarily while he was working on it. Once his computer stopped responding, and after unsuccessfully trying to resurrect it he finally gave up and power-cycled it, losing his work in the ram disk. It turned out he had accidentally disconnected the keyboard and the computer was actually fine. Oh well! Richard taught a class at that camp on human-interface parsing... basically he had people write a dungeon game where you typed in what you wanted to do in English. Primitive, of course, but the kids had a blast.
I wrote a centipede game in machine code, incredibly fast and awesome (on the last level the centipede was invisible and only blinked into existence for a second or two every few seconds), and submitted it to Cursor magazine. They rejected it because they thought I had stolen it from someone.
The 6502 had two awesome indirect EA (effective address) modes, (TABLE,X) and (ADDR),Y, along with the standard modes.
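For anyone who never wrote 6502 code, here's roughly what those two modes compute, sketched in C (the memory array and function names are mine, just for illustration):

#include <stdint.h>

static uint8_t mem[65536];      /* stand-in for the 64K address space */

/* LDA (TABLE,X) -- indexed indirect: X selects a 2-byte pointer out
 * of a table in zero page, then we load through that pointer. */
static uint8_t lda_indexed_indirect(uint8_t table, uint8_t x)
{
    uint8_t zp = (uint8_t)(table + x);  /* wraps within zero page */
    uint16_t ea = mem[zp] | (mem[(uint8_t)(zp + 1)] << 8);
    return mem[ea];
}

/* LDA (ADDR),Y -- indirect indexed: fetch a 2-byte base pointer from
 * zero page, then add Y to form the effective address.  Great for
 * walking a buffer through a pointer. */
static uint8_t lda_indirect_indexed(uint8_t addr, uint8_t y)
{
    uint16_t base = mem[addr] | (mem[(uint8_t)(addr + 1)] << 8);
    return mem[(uint16_t)(base + y)];
}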
Decimal mode wasn't as interesting; I wound up not using it for display conversion at all.
The 6522 I/O chip was an incredibly powerful chip for its time, with multiple timers and timer-driven events. It had a few bugs, too.
I remember all the unsupported machine codes the 6502 had. It was a hardwired cpu so all instruction codes did *something* (even if it was mostly to just crash the cpu). LDAX was my favorite. Mostly, though, the hidden codes were not very useful.
The list goes on. Twas an awesome time, a time before PCs took over the world.
-Matt
HOLY MACRO!