I saw a recent review of a smartphone that had two screens, one LCD and one eInk. The modern eInk display is able to get a high enough refresh rate for interactive use and doesn't drain the battery once it has finished updating. The screen that I'd love to see is eInk with a transparent OLED on top, so that text can be rendered with the eInk display and graphics / video overlaid on the OLED. The biggest problem with eInk is that the PPI is not high enough to make colour versions yet. You get 1/3 (or 1/4 if you want a dedicated black) of the resolution when you make them colour, so you're going to need at least 600PPI to make them plausible.
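To put rough numbers on the 600PPI figure, here is a back-of-the-envelope sketch. It assumes that "1/3 or 1/4 of the resolution" means the pixel count, and that the monochrome cells tile evenly into colour pixels, so the linear PPI drops by the square root of the number of cells per pixel. The function name is made up for illustration.

```cpp
#include <cmath>

// Effective linear PPI when each colour pixel is built from
// `cells_per_pixel` monochrome cells of a panel with `native_ppi`.
// Assumption: the cells tile evenly, so pixels per square inch drop
// by cells_per_pixel and linear PPI drops by its square root.
double effective_colour_ppi(double native_ppi, int cells_per_pixel) {
    return native_ppi / std::sqrt(static_cast<double>(cells_per_pixel));
}
```

Under those assumptions, a 600PPI panel with four cells per pixel comes out at 300PPI in colour, which is about where LCDs are now, and with three cells per pixel it is roughly 346PPI.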
The other problem that they've had is that LCDs have ramped up the resolution. My first eBook reader had a 166PPI eInk display. Now LCDs are over 300PPI but the Kindle Paperwhite is only 212PPI, so text looks crisper on the LCD than the eInk display, meaning that you're trading different annoyances rather than having the eInk be obviously superior. With real paper you get (at least, typically a lot more than) 300DPI and no backlight.
Your neurones also become very complex, as they all need to be network nodes with store and forward, and they have to handle multiple inputs every cycle. Consider a node in the middle: in the first cycle it can be signalled by 8 others, in the next it can be signalled by 12, and so on. The exact number depends on how you wire the network, but for a flexible implementation you need to allow this.
What's the justification for compilation unit boundary? It seems like you could expose the layout of the struct (and therefore any compiler shenanigans) through other means within a compilation unit. offsetof comes to mind.
That's the granularity at which you can do escape analysis accurately. One thing that my student explored was using different representations for the internal and public versions of the structure. Unless the pointer is marked volatile or any atomic operations occur that establish happens-before relationships that affect the pointer (you have to assume functions that you can't see the body of contain operations), C allows you to do a deep copy, work on the copy, and then copy the result back. He tried this to transform between column-major and row-major order for some image processing workloads. He got a speedup for the computation step, but the cost of the copying outweighed it (a programmable virtualised DMA controller might change this).
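A minimal sketch of the copy, work, copy back idea, done by hand rather than by a compiler. The Image type and the column-wise running sum are hypothetical stand-ins, not the workloads my student actually used.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical public layout: row-major, (r, c) at data[r * cols + c].
struct Image {
    std::size_t rows, cols;
    std::vector<float> data;
};

// Deep copy into column-major order: (r, c) goes to out[c * rows + r].
std::vector<float> to_column_major(const Image& img) {
    std::vector<float> out(img.data.size());
    for (std::size_t r = 0; r < img.rows; ++r)
        for (std::size_t c = 0; c < img.cols; ++c)
            out[c * img.rows + r] = img.data[r * img.cols + c];
    return out;
}

// Copy the column-major working buffer back into the public layout.
void from_column_major(const std::vector<float>& cm, Image& img) {
    for (std::size_t r = 0; r < img.rows; ++r)
        for (std::size_t c = 0; c < img.cols; ++c)
            img.data[r * img.cols + c] = cm[c * img.rows + r];
}

// Example column-wise computation: running sum down each column.
// In the working copy every column is contiguous in memory.
void prefix_sum_columns(Image& img) {
    std::vector<float> cm = to_column_major(img);
    for (std::size_t c = 0; c < img.cols; ++c)
        for (std::size_t r = 1; r < img.rows; ++r)
            cm[c * img.rows + r] += cm[c * img.rows + r - 1];
    from_column_major(cm, img);
}
```

The two copy loops are the cost that has to be won back by the faster inner loop, which is exactly where the experiment lost out.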
I suppose you could do that in C++ with template specialization. In fact, doesn't that happen today in C++11 and later, with movable types vs. copyable types in certain containers? Otherwise you couldn't have vector >. Granted, that specialization is based on a very specific trait, and without it the particular combination wouldn't even work.
The problem with C++ is that these decisions are made early. The fields of a collection are all visible (so that you can allocate it on the stack) and the algorithms are as well (so that you can inline them). These have nice properties for micro optimisation, but they mean that you miss macro optimisation opportunities.
To give a simple example, libstdc++ and libc++ use very different representations for std::string. The implementation in libstdc++ uses reference counting and lazy copying for the data. This made a lot of sense when most code was single threaded and caches were very small but now is far from optimal. The libc++ implementation (and possibly the new libstdc++ one - they're breaking the ABI at the moment) uses the short-string optimisation, where small strings are embedded in the object (so fit in a single cache line) and doesn't bother with the CoW trick (which costs cache coherency bus traffic and doesn't buy much saving anymore, especially now people use std::move or std::shared_ptr for the places where the optimisation would matter).
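For anyone who hasn't seen the short-string optimisation, here is a toy version. This is not how libc++ actually lays out std::string (the real one is considerably cleverer about packing); the class name and the 15-character threshold are arbitrary choices for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Toy string with a small-buffer optimisation. Illustrative only.
class SmallString {
    static constexpr std::size_t kInline = 15; // longest inline string
    std::size_t len_;
    union {
        char inline_[kInline + 1]; // used when len_ <= kInline
        char* heap_;               // used otherwise
    };
public:
    explicit SmallString(const char* s) : len_(std::strlen(s)) {
        if (is_inline()) {
            std::memcpy(inline_, s, len_ + 1);
        } else {
            heap_ = new char[len_ + 1];
            std::memcpy(heap_, s, len_ + 1);
        }
    }
    // Copies are always deep: no reference counting, no copy-on-write.
    SmallString(const SmallString& o) : SmallString(o.c_str()) {}
    SmallString& operator=(const SmallString&) = delete; // omitted for brevity
    ~SmallString() { if (!is_inline()) delete[] heap_; }

    bool is_inline() const { return len_ <= kInline; }
    std::size_t size() const { return len_; }
    const char* c_str() const { return is_inline() ? inline_ : heap_; }
};
```

Short strings never touch the allocator and copying one is a plain memcpy of the object, with no shared reference count for multiple cores to fight over.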
In Objective-C (and other late-bound languages) this optimisation can be done at run time. For example, if you use NSRegularExpression with GNUstep, it uses ICU to implement it. ICU has a UText object that implements an abstract text thing and has a callback to fill a buffer with a run of characters. We have a custom NSString subclass and a custom UText callback which do the bridging. The abstract NSString class has a method for getting a range of characters. The default implementation gets them one at a time, but most subclasses can get a run at once. The version that wraps UText does this by invoking the callback to fill the UText buffer and then copying. The version that wraps in the other direction just uses this method to fill the UText buffer. This ends up being a lot more efficient than if we'd had to copy between two entirely different implementations of a string.
Similarly, objects in a typical JavaScript implementation have a number of different representations (something like a struct for properties that are on a lot of objects, something like an array for properties indexed by numbers, something like a linked list for rare properties) and will change between these representations dynamically over the lifetime of an object. This is something that, of course, you can do in C/C++, but the language doesn't provide any support for making it easy.
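Doing it by hand in C++ looks something like the sketch below: a property bag that starts as a flat vector and migrates to a hash table once it grows. The class name, the int-valued properties and the threshold of 8 are all made up for illustration, and a real JavaScript engine is far more elaborate than this. It needs C++17 for std::variant.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical property bag: a flat vector while small, a hash
// table once it grows past kThreshold (an arbitrary cut-off).
class PropertyBag {
    using Flat = std::vector<std::pair<std::string, int>>;
    using Hashed = std::unordered_map<std::string, int>;
    static constexpr std::size_t kThreshold = 8;
    std::variant<Flat, Hashed> rep_; // starts out as an empty Flat
public:
    void set(const std::string& key, int value) {
        if (auto* flat = std::get_if<Flat>(&rep_)) {
            for (auto& kv : *flat)
                if (kv.first == key) { kv.second = value; return; }
            if (flat->size() < kThreshold) {
                flat->emplace_back(key, value);
                return;
            }
            // Too big for linear search: migrate to the hashed form.
            Hashed h(flat->begin(), flat->end());
            rep_ = std::move(h);
        }
        std::get<Hashed>(rep_)[key] = value;
    }
    // Returns nullptr if the key is absent.
    const int* get(const std::string& key) const {
        if (auto* flat = std::get_if<Flat>(&rep_)) {
            for (auto& kv : *flat)
                if (kv.first == key) return &kv.second;
            return nullptr;
        }
        auto& h = std::get<Hashed>(rep_);
        auto it = h.find(key);
        return it == h.end() ? nullptr : &it->second;
    }
    bool is_hashed() const { return std::holds_alternative<Hashed>(rep_); }
};
```

Callers only ever see set and get, so the representation can change underneath them, but all of the switching logic has to be written and maintained by hand.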
In general though, there are some constructs that it is easy for a JVM to map efficiently to modern hardware and some that are hard. For example, pointer chasing in data is inefficient in any language and there's little that the JVM can do about it (if you're lucky, it might be able to insert prefetching hints after a lot of profiling). Cache coherency can still cause false sharing, so you want to make sure that fields of your classes that are accessed in different threads are far apart and ones accessed together want to be close - a JVM will sometimes do this for you (I had a student work on this, but I don't know if any commercial JVM does it).
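In a language without a managed runtime you have to do this layout by hand. A minimal C++ sketch of the idea follows; the struct names are hypothetical and the 64-byte cache line is an assumption that holds on most current x86 and ARM parts but is not queried from the hardware.

```cpp
#include <atomic>
#include <cstddef>

// 64 bytes is a common cache-line size, assumed here rather than queried.
constexpr std::size_t kCacheLine = 64;

// Counters updated by different threads: each gets its own cache
// line, so a write to one does not invalidate the other.
struct PerThreadCounters {
    alignas(kCacheLine) std::atomic<long> produced{0};
    alignas(kCacheLine) std::atomic<long> consumed{0};
};

// Fields that are read together on the same thread: kept adjacent
// so that one cache-line fill brings in all of them.
struct Point {
    float x, y, z;
};
```

The cost is memory: PerThreadCounters takes two full cache lines to hold two counters, which is the kind of trade-off a JVM could make from profiling data rather than guesswork.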
Heck, in C / C++, such a transformation is actually illegal
Actually, it isn't if the compiler can prove that the layout is not visible outside of the compilation unit. I did have a student work on this, but the performance gains were negligible in most C code because complex data structures tend to leak across compilation unit boundaries (this may be less true with LTO). Even then, if you can recognise data structures that are bad then you can probably teach programmers not to use them, or put them in a standard library where their implementations can be easily changed.
It's much more interesting in environments with on-the-fly compilation, because then you can adapt data structures to how they are actually used. Even then, you can do it outside of the compiler (for example, the NeXT implementations of the Objective-C collection classes would switch between a few different internal representations based on the data that you put in them).
We have a *lot* of neurons with a *lot* of connections.
The second part is the important one. Neurones in the human brain have an average of 7,000 connections to other neurones. That's basically impossible to do on a silicon die, where you only have two dimensions to play with and paths can't cross - you end up needing to build very complex networks-on-chip to get anywhere close.