So, they make this announcement right before the new Top500 list is unveiled at the SuperComputing conference... Which clearly means that once again there will be no US system in the Top1 position, right?
Personal and business reasons are actually opposite. These people are being fired.
It is really surprising that neither the linked ExtremeTech article nor the Slashdot summary cites the original source. This research was presented at HPCA'13 in a paper titled "Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures", by Emily Blem et al., from the University of Wisconsin's Vertical Research group, led by Dr. Karu Sankaralingam. You can find the original conference paper on their website.
The ExtremeTech article indicates that there are new results with some additional architectures (MIPS Loongson and AMD processors were not included in the original HPCA paper), so I assume that they have published an extended journal version of this work, which is not yet listed on their website. Please add a comment if you have a link to the new work.
I do not have any relation with them, but I knew the original HPCA work.
This is why we use the terms "Instruction Set Architecture" to define the interface to the (assembler) programmer, and "microarchitecture" to refer to the actual internal implementation. ISA is not bullshit, unless you confuse it with the internal microarchitecture.
This is a real pity for the TM community. This is not the first chip with transactional memory support in hardware: The Sun Rock was announced to have hardware TM support, and the IBM Blue Gene/Q Compute chip also supports it. Unlike other proposals for unbounded transactional memory, all these systems employ Hybrid Transactional Memory (ref, ref, ref), in which restricted hardware transactions are designed to correctly coexist with unbounded software transactions, so a software transaction can be started in case a hardware transaction fails for some unavoidable issue (such as lack of cache size or associativity to hold speculative data from the transaction, not because of a conflict). Note that, in any case, very large transactions should arguably be very uncommon, since they would significantly reduce performance (similar to very large critical sections protected by locks).
The problem with a hardware implementation of transactional memory is that it is not simply a new set of instructions which are independent from the rest of the processor. HTM involves multiple aspects, including multiversion caching for speculative data; allowing for the commit of speculative (transactional) instructions, which could later be rolled back (note that in any other speculative operation, such as instructions after a branch prediction, the speculation is always resolved before the instruction commits because the branch commits earlier); a tight integration with the coherence protocol (see LogTM-SE for an alternative to this very last issue, but still...); a mechanism to support atomic commits in the presence of coherence invalidations... From the point of view of processor verification, this is a complete nightmare because these new "extensions" basically impact the complete processor pipeline and coherence protocol, and verifying that every single instruction and data structure behaves as expected in isolation does not guarantee that they will operate correctly in the presence of multiple transactions (and non-transactional conflicting code) in multiple cores. There are some formal studies such as this or this, and the IBM people discuss the verification of their Blue Gene TM system in this paper (paywalled).
As some others commented before, the nature of the "bug" has not been disclosed. However, since it seems to be easy to reproduce systematically, I would expect it to be related to incorrect speculative data handling in a single transaction (or something similar), rather than races between multiple transactions.
Regarding the alternatives, Intel cannot simply remove these instruction opcodes because existing code would fail. I assume that the patch will make all hardware transactions fail on startup, with a specific error (EAX bit 1 indicates if the transaction can succeed on a retry; setting this flag to 0 should trigger a software transaction). In that case, execution continues at the fallback routine indicated in the XBEGIN instruction, which should begin a software transaction. Effectively, this will be similar to a software TM (STM) with additional overheads (starting the hardware transaction and aborting it; detecting conflicts with nonexistent hardware transactions) that would make it slower than a pure STM implementation.
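To make the fallback path concrete, here is a minimal sketch in Python (a model, not real RTM code). The retry bit position is the documented one; everything else (the function names, the lock-based stand-in for an STM, and the "always abort with status 0" behaviour of the patched chip) is my assumption, since the actual patch has not been detailed.

```python
import threading

XABORT_RETRY = 1 << 1  # EAX bit 1: the transaction may succeed on a retry

_stm_lock = threading.Lock()

def patched_hw_attempt(body):
    """Hypothetical patched chip: abort at start, retry bit clear."""
    return False, 0  # (committed, abort status)

def software_transaction(body):
    """Stand-in for an STM: simply runs the body under one global lock."""
    with _stm_lock:
        body()

def run_transaction(body, hw_attempt, software_fallback, max_retries=3):
    """Try the hardware path first; fall back when a retry cannot help."""
    for _ in range(max_retries):
        committed, status = hw_attempt(body)
        if committed:
            return "hardware"
        if not status & XABORT_RETRY:
            break  # retry bit clear: go straight to the fallback routine
    software_fallback(body)
    return "software"
```

With `patched_hw_attempt`, every call pays for one useless hardware attempt before taking the software path, which is the extra overhead mentioned above.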
They are right in doing so. There are letters in different alphabets whose shapes are very, very similar -- or in fact they are drawn exactly the same, depending on the font used.
This can be exploited for interesting uses. For example, "E" and "Ε"** are respectively the Latin "e" and the Greek "epsilon" vowels, but they are indistinguishable in caps, at least in the Arial font. The second one is Unicode code point U+0395. My name has an "E" in it, and for my email signature I spell my name using the traditional Latin letter from the keyboard when the email is important and should be archived. By contrast, when the email is mostly irrelevant for future use (such as meeting arrangement emails, which are useless after the meeting takes place) I spell my name using the Greek epsilon letter (hint: 395 followed by Alt+X in most Windows programs). There is no obvious difference for the receiver, but a search tool can be used to quickly find all sent emails which can be deleted safely.
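A small Python sketch of why the search works: the two letters look alike but are different code points, so a plain substring test tells them apart. The signature text and the "archive"/"disposable" labels are placeholders of mine, not anything a mail client provides.

```python
import unicodedata

LATIN_E = "E"        # U+0045, typed from the keyboard
GREEK_E = "\u0395"   # U+0395, Greek capital epsilon

def signature_kind(signature):
    """Classify a signature by which 'E' it contains (hypothetical labels)."""
    if GREEK_E in signature:
        return "disposable"
    return "archive"

# Show that the two look-alike letters are distinct characters.
for ch in (LATIN_E, GREEK_E):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```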
While the previous is a somewhat "legit" use, in general any word which combines letters from different alphabets could be used to confuse and trick the receiver, for example by creating an email account which reads exactly the same as the one from another person. There is a nice image of 5 letters a-b-c-d-e in different alphabets in the linked post. I agree with Google in preventing such combinations for email accounts. It would be interesting to know the exact policy used to forbid account names, which is not detailed.
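Since the real policy is not detailed, the following is only a guess at the simplest possible check: flag any name whose letters come from more than one script. Python's standard library has no script property, so this sketch uses the first word of each character's Unicode name as a rough stand-in; both function names are mine.

```python
import unicodedata

def scripts_used(text):
    """Rough script guess: first word of each letter's Unicode character name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }

def looks_mixed(name):
    """True when the letters of `name` appear to come from several scripts."""
    return len(scripts_used(name)) > 1
```

For instance, `looks_mixed("ex\u0430mple")` is true because U+0430 is the Cyrillic "a", while the all-Latin "example" passes. A real policy would need to be smarter, since some legitimate names do mix scripts.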
** At the time of writing, these two letters look exactly the same. Classic Slashdot lacks Unicode support and does not represent the Greek Unicode letter from my comment. I tried logging into Slashdot Beta (first time, I swear it!!) and it seems to represent a different letter... Please try this on your own computer!
Note that gigawatts are power units; gigawatt-hours are energy units, and "gigawatts per hour" is wrong and misleading. I would expect the editor to correct such basic mistakes, even though they come from the linked article.
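The relation is just energy = power x time. A trivial Python sketch, with made-up numbers and function names of my own:

```python
def energy_gwh(power_gw, hours):
    """Energy (GWh) delivered by a constant power (GW) over a time (h)."""
    return power_gw * hours

def average_power_gw(energy_in_gwh, hours):
    """Average power (GW) needed to deliver an energy (GWh) over a time (h)."""
    return energy_in_gwh / hours
```

So a plant running at 2 GW for 3 hours produces 6 GWh; dividing gigawatts by hours gives a ramp rate, not an amount of energy.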
Some knowledge about multicore cache coherence here. You are completely right, Slashdot's summary does not introduce any novel idea. In fact, a cache-coherent mesh-based multicore system with one router associated with each core was brought to market years ago by a startup from MIT, Tilera. Also, the article claims that today's cores are connected by a single shared bus -- that's far outdated, since most processors today employ some form of switched communication (an arbitrated ring, a single crossbar, a mesh of routers, etc).
What the actual ISCA paper presents is a novel mechanism to guarantee total ordering on a distributed network. Essentially, when your network is distributed (i.e., not a single shared bus, which covers basically most current on-chip networks) there are several problems with guaranteeing ordering: i) it is really hard to provide a global ordering of messages (like a bus) without making all messages cross a single centralized point which becomes a bottleneck, and ii) if you employ adaptive routing, it is impossible to provide point-to-point ordering of messages.
Coherence messages are divided into different classes in order to prevent deadlock. Depending on the coherence protocol implementation, messages of certain classes need to be delivered in order between the same pair of endpoints, and for this, some of the virtual networks can require static routing (e.g. Dimension-Ordered Routing in a mesh). Note that a "virtual network" is a subset of the network resources which is used by the different classes of coherence messages to prevent deadlock. Static routing is a remedy for the second problem. However, a network that provided global ordering would allow for potentially huge simplifications of the coherence mechanisms, since many races would disappear (the devil is in the details), and a snoopy mechanism would be possible -- as they implement. Additionally, this might also impact the consistency model. In fact, their model implements sequential consistency, which is the most restrictive -- yet simple to reason about -- consistency model.
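For those unfamiliar with Dimension-Ordered Routing, here is a minimal Python sketch of XY routing in a 2D mesh (my own illustration, not code from the paper). The route is a pure function of the two endpoints, so two messages between the same pair always visit the same routers; assuming FIFO buffers along the way, one cannot overtake the other.

```python
def xy_route(src, dst):
    """Routers visited from src to dst in a 2D mesh: X dimension first, then Y.

    src and dst are (x, y) router coordinates.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:           # travel along X until the column matches
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:           # then travel along Y until the row matches
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

With adaptive routing, by contrast, two messages could take different paths with different delays, which is exactly how point-to-point ordering gets lost.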
Disclaimer: I am not affiliated with their research group, and in fact, I have not read the paper in detail.
Just curiosity... The bandwidth required to do this should be enormous, how did they implement it? Are the trunk switches compromised and they locally record every conversation, and later send it to the USA? Did they install dedicated fibers to do this? TFA lacks any details.
Invent anything you want to sell, name it after bitcoin and you receive free advertising on Slashdot's front page!
This post is incomplete because of problems with html markings. Please see the complete post below (and mod this one down!)
(sorry for the duplicated posting; the previous one was cut because of problems with the html marks)
In order to obtain a 90% reduction in the energy bill, cooling must account for 90% of the power of the DC. This implies a PUE >= 10. As a reference, 5 years ago virtually any DC had a PUE lower than 3. Nowadays, a PUE lower than 1.15 can be obtained easily. As a reference, Facebook publishes the instantaneous PUE of one of its DCs in Prineville, which at the moment is 1.05. This implies that any savings in cooling would reduce the bill, at most, by a factor of 1.05 (1/1.05 = 0.9523), i.e. by less than 5%.
On the other hand, I believe that this is not the first commercial offer for a liquid-cooled server. Intel was already considering it two years ago, and the idea has been discussed in other forums for several years. I can't remember right now which company was actually selling these solutions, but I believe it was already on the market.
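The arithmetic, as a short Python sketch. It assumes the best case for the vendor: all non-IT overhead is cooling and the new system removes it entirely. The function names are mine.

```python
def max_saving_fraction(pue):
    """Largest possible cut in the energy bill if all overhead disappears.

    PUE = total power / IT power, so the IT share of the bill is 1 / PUE.
    """
    return 1.0 - 1.0 / pue

def pue_needed(saving_fraction):
    """Minimum PUE a data center must have for a given saving to be possible."""
    return 1.0 / (1.0 - saving_fraction)
```

`max_saving_fraction(1.05)` comes out at about 0.048, while `pue_needed(0.9)` is 10 (up to floating-point rounding), matching the figures above.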