Comment Re:Morse Code (Score 1) 620

Oh, wait, you didn't need to pass a test for that.

I'm just trying to think how that would have been possible. I think back then there was a medical exception you could plead for. I didn't. I passed the 20 WPM test fair and square and got K6BP as a vanity call, long before there was any way to get that call without passing a 20 WPM test.

Unfortunately, ARRL did fight to keep those code speeds in place, and to keep code requirements, for the last several decades that I know of and probably continuously since 1936. Of course there was all of the regulation around incentive licensing, where code speeds were given a primary role. Just a few years ago, they sent Rod Stafford to the final IARU meeting on the code issue with one mission: preventing an international vote for removal of S25.5. They lost.

I am not blaming this on ARRL staff and officers. Many of them have privately told me of their support, including some directors and their First VP, now SK. It's the membership that has been the problem.

I am having a lot of trouble believing the government agency and NGO thing, as well. I talked with some corporate emergency managers as part of my opposition to the encryption proceeding (we won that too, by the way, and I dragged an unwilling ARRL, who had said they would not comment, into the fight). Big hospitals, etc.

What I got from the corporate folks was that their management was resistant to using Radio Amateurs regardless of what the law was. Not that they were chomping at the bit waiting to be able to carry HIPAA-protected emergency information via encrypted Amateur radio. Indeed, if you read the encryption proceeding, public agencies and corporations hardly commented at all. That point was made very clearly in FCC's statement - the agencies that were theorized by Amateurs to want encryption didn't show any interest in the proceeding.

So, I am having trouble believing that the federal agency and NGO thing is real because of that.

Comment Re:Commission (Score 1) 634

Google routinely contacts everyone who has been through their hiring process before. I applied when I was a PhD student and was rejected, but started getting calls from them after six months, and got them every six months after that. When I was a bit bored, I let them interview me again (a free trip to Paris to visit friends rather than to California, in my case; and since I stayed with friends instead of in a hotel, they paid for a nice meal out to thank my friends rather than for a hotel room). I turned them down that time, but they still call me every few months. Saying yes on those calls is basically the same as reapplying - it just puts you back into step 1 of the hiring process, and they still want you to send them an up-to-date CV and other things.

Comment Re:The 19 year old is a lunatic (Score 1) 150

At a single core, we have a 128KB multibanked scratchpad memory, which you can think of as just like an L1 cache but smaller and lower latency. We have one cycle latency for a load/store from your registers to or from the scratchpad

Note that a single-cycle latency for L1 is not that uncommon in in-order pipelines - the Cortex A7, for example, has single-cycle access to L1.

That scratchpad is physically addressed, and does not have a bunch of extra (and in our opinion, wasted) logic to handle address translations,

The usual trick for this is to arrange your cache lines such that your L1 is virtually indexed and physically tagged, which means that you only need the TLB lookup (which can come from a micro-TLB) on the response. If you look at the cache design on the Cortex A72, it does a few more tricks that let you get roughly the same power as a direct-mapped L1 (which has very similar power to a scratchpad) from an associative L1.
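
To make the VIPT point concrete, here's a back-of-the-envelope sketch in C (the sizes are purely illustrative, not from any particular core): as long as all of the index bits fall inside the page offset, the virtual and physical index are the same bits, so the set lookup can start before the TLB answers.

    #include <stdio.h>

    /* Illustrative numbers only: 4KB pages, 64-byte lines, 32KB L1. */
    #define PAGE_BITS   12          /* page offset bits (4KB pages) */
    #define LINE_BITS   6           /* 64-byte cache lines          */
    #define CACHE_SIZE  (32 * 1024) /* 32KB L1                      */
    #define WAYS        8           /* associativity                */

    int main(void) {
        unsigned sets = CACHE_SIZE / (WAYS << LINE_BITS);
        unsigned index_bits = 0;
        while ((1u << index_bits) < sets) index_bits++;

        /* If line offset + index bits fit inside the page offset, the
           index taken from the virtual address is identical to the one
           taken from the physical address, so the lookup only needs the
           TLB (or micro-TLB) result for the tag compare. */
        printf("sets=%u, index bits=%u, VIPT-safe=%s\n",
               sets, index_bits,
               (LINE_BITS + index_bits <= PAGE_BITS) ? "yes" : "no");
        return 0;
    }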

If the address requested by a core is not in its own scratchpad's range, it goes to the router and hops on the NoC until it gets there... with a one cycle latency per hop

To get that latency, it sounds like you're using the NoC topology that some MIT folks presented at ISCA last year. I seem to remember that it was pretty easy to come up with cases that would overload their network (propagating wavefronts of messages) and end up breaking the latency guarantees. It also sounds like you're requiring physical layout awareness from your jobs, bringing NUMA scheduling problems from the OS (where they're hard) into the compiler (where they're harder).
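
To put the per-hop cost in perspective, here's a trivial sketch (assuming a plain 2D mesh with dimension-ordered routing, which may or may not be what they actually use):

    /* Hop count between two tiles on an N x N mesh with XY routing.
       At one cycle per hop, a corner-to-corner access on an 8x8 mesh
       costs 14 cycles each way before you even touch the scratchpad. */
    static inline unsigned mesh_hops(unsigned x0, unsigned y0,
                                     unsigned x1, unsigned y1) {
        unsigned dx = (x0 > x1) ? x0 - x1 : x1 - x0;
        unsigned dy = (y0 > y1) ? y0 - y1 : y1 - y0;
        return dx + dy;   /* Manhattan distance = hops = cycles */
    }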

Building a compiler for this sounds like a fun set of research problems (if you're looking for consultants, my rates are very reasonable! Though I have a different research architecture that presents interesting compiler problems to occupy most of my time).

Oh, one more quick question: Have you looked at Loki? The lowRISC project is likely to include an implementation of those ideas and it sounds as if they have a lot in common with your design (though also a number of significant differences).

Comment Re:Linux, on the other hand... (Score 1) 405

"Acer Aspire One can't do ... compositing".

Um, yes it can. It can also do 3D -- most of the Aspire Ones, anyway. The line started with the Intel 945GSE Express; later, some used the ATI Radeon 4225.

The AAO D270 has an Atom N2600 (or N2800) with Intel GMA 3600/3650 (PowerVR SGX 545) graphics, and that one doesn't do 3D under Linux.

So, for use with Linux, avoid the D270 (use a D257 instead), and 3D and compositing will work just fine.

(owner of 5 of these, running Linux).

Comment Re:Morse Code (Score 1) 620

The Technician Element 3 test wasn't more difficult than the Novice Element 1 and 2 together, so Technician became the lowest license class when applicants stopped having to take Element 1.

The change to 13 WPM was in 1936, and was specifically intended to reduce the number of Amateur applicants. It was 10 WPM before that. ARRL asked for 12.5 WPM in their filing; FCC rounded the number up because they felt it would be difficult to set 12.5 on the Instructograph and the other equipment available for code practice at the time.

It was meant to keep otherwise-worthy hams out of the hobby. And then we let that requirement keep going for 60 years.

The Indianapolis cop episode was back in 2009. It wasn't the first time we've had intruders and it won't be the last, and if you have to reach back that far for an example, the situation can't be that bad. It had nothing to do with code rules or with NGOs getting their operators licensed.

A satphone is less expensive than a trained HF operator. Iridium costs $30 per month and $0.89 per minute to call another Iridium phone. That's the over-the-counter rate; government agencies get a better rate. And the phone costs $1100 (again retail, not the government rate), which is less than an HF rig with antenna and tower will cost any public agency to install.

You think it's a big deal to lobby against paid operators because there will be objections? How difficult do you think it was to reform the code regulations? Don't you think there were lots of opposing comments?

And you don't care about young people getting into Amateur Radio. That's non-survival thinking.

Fortunately, when the real hams go to get something done, folks like you aren't hard to fight, because you don't really do much other than whine and send in the occasional FCC comment. Did you know I even spoke in Iceland when I was lobbying against the code rules? Their IARU vote had the same power as that of the U.S., and half of the hams in the country came to see me. That's how you make real change.

Comment Re:The 19 year old is a lunatic (Score 2) 150

Prefetching in the general case is non-computable, but a lot of accesses are predictable. If the stack is in the scratchpad, then you're really only looking at heap accesses and globals for prefetching. Globals are easy to statically hint and heap variables are accessed by pointers that are reachable. It's fairly easy for each function that you might call to emit a prefetch version that doesn't do any calculation and just loads the data, then insert a call to that earlier. You don't have to get it right all of the time, you just have to get it right often enough that it's a benefit.
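
As a toy illustration of what I mean by a prefetch version of a function (just a sketch using GCC's __builtin_prefetch; the names and structure are made up):

    #include <stddef.h>

    struct node { struct node *next; double payload[6]; };

    double process(const struct node *n);   /* the real work */

    /* Compiler-emitted "prefetch clone": same traversal as the real
       function, but it only chases the pointers and issues hints. */
    static void walk_prefetch(const struct node *n) {
        while (n) {
            __builtin_prefetch(n->payload, 0 /* read */, 3 /* keep */);
            n = n->next;
        }
    }

    double walk(const struct node *head) {
        walk_prefetch(head);   /* inserted early, ahead of the real pass */
        double sum = 0.0;
        for (const struct node *n = head; n; n = n->next)
            sum += process(n);
        return sum;
    }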

For prefetching vs eviction, it's a question of window size. Even with no prefetching, most programs exhibit a lot of locality of reference and so caches work pretty well - it doesn't matter that you take a miss on the first access, because you hit on the next few dozen (and in a multithreaded chip, you just let another thread run while you wait); but if you're evicting data too early, then it's a problem. A combination of LRU / LFU works well, though all of the good algorithms in this space are patented. Although issuing prefetch hints is fairly easy, the reason that most compilers don't is that there's a good chance of accidentally pushing something else out of the cache. That said, if they're targeting HPC workloads, then just running them in a trace and using that for hinting would probably be enough for a lot of things.
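
For anyone who hasn't looked at this area, the LRU half really is as simple as it sounds; here's a minimal age-counter version over one cache set (illustrative only - the patented combined schemes are considerably more involved):

    #define WAYS 8

    struct way { unsigned long tag; unsigned age; int valid; };

    /* Pick a victim in one set: prefer an invalid way, otherwise the
       least recently used one (largest age). */
    static unsigned choose_victim(struct way set[WAYS]) {
        unsigned victim = 0;
        for (unsigned i = 0; i < WAYS; i++) {
            if (!set[i].valid) return i;
            if (set[i].age > set[victim].age) victim = i;
        }
        return victim;
    }

    /* On every access, reset the touched way's age and age the rest. */
    static void touch(struct way set[WAYS], unsigned hit_way) {
        for (unsigned i = 0; i < WAYS; i++) set[i].age++;
        set[hit_way].age = 0;
    }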

I heard a nice anecdote from some friends at Apple a while ago. They found that one of their core frameworks was getting a significant slowdown on their newer chip. The eventual cause was quite surprising. In the old version, they had a branch being mispredicted, and a load speculatively executed. The correct branch target was identified quite early, so they only had a few cancelled instructions in the pipeline. About a hundred cycles later, they hit the same instruction and this time ran it correctly. With the new CPU, the initial branch was correctly predicted. This time, when they hit the load for real, it hadn't been speculatively executed and so they had to wait for a cache miss.

Also, if you're trying to create a parallel system with manual caches... good luck. Cache coherency is a pain to get right, but it's then fundamental to most modern parallel software. Implementing the shootdowns in software is going to give you a programming model that's horrible.

And finally there's the problem that doing it in software makes it serial. The main reason that we use hardware page-table walkers in modern CPUs is not that they're much better than a software TLB fill, it's that it's much easier to make them run completely asynchronously with the main pipeline. The same applies to caches.

Comment Re:The 19 year old is a lunatic (Score 1) 150

Whether he can actually produce a compiler that will insert the necessary memory fetch instructions at compile time in an efficient manner remains to be seen

That's not the hard bit of the problem. Compiler-aided prefetching is fairly well understood. The problem is the eviction. Having a good policy for deciding when data won't be referenced in the future is hard. A simple round-robin policy on cache lines works okay, but part of the reason that modern caches are complex is that they try to have cleverer eviction strategies. Even then, most of the die area used by caches is the SRAM cells - the controller logic is tiny in comparison.

Comment Re:The 19 year old is a lunatic (Score 1) 150

"Virtual Memory translation and paging are two of the worst decisions in computing history"

He's not completely wrong there. Paging is nice for operating systems isolating processes and for enabling swapping, but it's horrible to implement in hardware and it's not very useful for userland software. Conflating translation with protection means that the OS has to be on the fast path for any userland changes and means that the protection granule and translation granule have to be the same size. The TLB needs to be an associative structure that can return results in a single cycle, which makes it hard to scale. Larger pages help (though then you make the protection granule even larger), but the amount of physical memory that the TLB can cover has dropped with each successive generation since paging was first introduced into microprocessors.
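
The coverage problem is easy to put numbers on (entry count and memory size here are illustrative, not for any particular part):

    #include <stdio.h>

    int main(void) {
        unsigned long long page = 4ULL << 10;   /* 4KB pages           */
        unsigned long long tlb  = 1536;         /* L2 TLB entries, say */
        unsigned long long dram = 32ULL << 30;  /* 32GB of RAM         */

        unsigned long long reach = tlb * page;  /* memory the TLB covers */
        printf("TLB reach: %llu MB of %llu GB (%.3f%%)\n",
               reach >> 20, dram >> 30, 100.0 * reach / dram);
        return 0;
    }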

"Introduction of hardware managed caching is what I consider 'The beginning of the end'"

I don't completely agree with this, but given the amount of effort that people writing high-performance code (and compilers) have to spend understanding the hardware caching policy and working around it, I'm not completely convinced that it's a win in the HPC arena - you end up spending almost as much time fighting the cache as you would working with a hardware scratchpad. I'm still a fan of single-level stores as a programmer abstraction though.

Comment Re:Not sure whats more impressive... (Score 1) 150

I'm hoping that there's a million missing there. Are you just planning on selling IP cores? When I talked to a former Intel Chief Architect a few years ago (hmm, about 10 years ago now), he was looking at creating a startup and figured that $60m was about the absolute minimum to bring something to market. From talking to colleagues on the lowRISC project and at ARM, $1-2m is just enough to produce a prototype on a modern process, but won't get you close to mass production. Do you plan on raising more money or partnering with someone else for production?

Comment Re:Not sure whats more impressive... (Score 2) 150

When it comes to being better than a GPU for applications, you have to remember GPUs have abysmal memory bandwidth (due to being limited by PCIe's 16GB/s to the CPU)

That's a somewhat odd claim. One of the reasons that computations on GPUs are fast is that they have high memory bandwidth. Being hampered by using the same DRAM as the CPU is one of the reasons that integrated GPUs perform worse. If you're writing GPU code that's doing anything other than initial setup over PCIe, then you're doing it badly wrong.

That said, GPU memory controllers tend to be highly specialised. The nVidia ones have around 20 different streaming modes for different access patterns (I think the new version has a programmable prefetcher - Intel is also adding one), but if your memory access patterns are data dependent then GPUs can suck.

after you run out of data in the relatively small memory on the GPU

Not really. If you're doing big workloads on a GPU, your overflow isn't main memory over PCIe, it's the next GPU along over a much faster interconnect. And even with PCIe, most of the latency comes from the protocol and not the physical interconnect - you can get a lot more speed out of the PCIe hardware if you don't need all of the features of the PCIe bus.

The DARPA grant is specifically for continued research and work on our development tools, which are intended to automate the unique features of our memory system. We have some papers in the works and will be talking publicly about our *very* cool software in the next couple of months.

Where have you sent them? I'll keep an eye out.

Your mention of the Mill and running existing code well, I had a pretty good laugh

You certainly wouldn't be alone there.

stack machines are notorious for having HORRIBLE support for languages like C

That's not really true (not sure what the relevance to The Mill is though - it's not a stack machine). Algol support for stack machines became pretty good (C wasn't really popular until stack machines had largely died out, but the back end of a C compiler is not that different from the back end of an Algol compiler). The reason that stack machines died is that it's basically impossible for the hardware to extract ILP from a stack ISA. That's less of an issue if your throughput comes from thread-level parallelism. There are some experimental architectures floating around that get very good i-cache usage and solid performance from a stack-based ISA and a massive number of hardware threads.

Comment Re:Spoilers (Score 1) 70

Alien also managed the suspense well at the time by having the least-well-known actor be the survivor. Of course, if you've seen trailers for any of the later ones, then you expect her to be fine, so this didn't last much beyond the original release.

Comment Re:Beautifully put (Score 1) 250

I think we are disagreeing on facts here

Yes, you're talking about projects that I worked on and actions of people that I collaborate with and claiming that things happened very differently to how I remember them.

Yes a lot of their changes weren't merged in the end but that was because the GCC developers didn't like the direction they were going. Early on they got merged. When really matters.

Most of Apple's changes were in Objective-C (not merged), blocks (not merged) and early work on PowerPC autovectorisation (also mostly not merged, as GCC contributors at IBM blocked the AltiVec support from going in while it was a Freescale-only feature). About the only thing that was routinely merged was the Mach-O support, and that was largely useless because the corresponding binutils changes often weren't merged, so it couldn't be used without Apple's build of the rest of the toolchain.

As for Sony I was talking PS3 they used Apple's early work on GCC.

Sorry, I meant PS3 - the SPU stuff (where the PS3 actually got decent performance) was all new. The PPU stuff may have used a little of the Apple work, but very little. Apple did not do the PowerPC bring-up; that was done mostly by IBM folks (who were shipping GCC on AIX on PowerPC, and later Linux on PowerPC). Apple only did the vectorisation work because Freescale wasn't contributing to GCC.

How does that disprove anything? The question is cooperation. Webkit clearly has lots of people working with it. That's what GPL does

Because the GPL was completely irrelevant to this. Ignoring the fact that it's LGPL, for a moment (and that every iOS device is violating the LGPL, because stuff links WebKit and doesn't allow the recipient of the code to re-link the code against their own build of WebKit), the sequence of events was:

  1. KHTML was open.
  2. Apple created a proprietary fork of KHTML.
  3. To comply with the letter of the license, Apple did code dumps of the KHTML-derived code on every binary release.
  4. No one used the Apple code - even the KHTML developers couldn't work out how to merge the changes, because they were given a big blob containing about six months of work from multiple developers with no revision history.
  5. Other companies (amusingly, Nokia was one of the leaders here) approached Apple and offered to contribute to WebKit if it were developed in the open.
  6. Apple creates a public svn repository for WebKit.
  7. WebKit becomes a successful open source project.

The LGPL was in no way responsible for WebKit becoming a successful open source project, that happened solely because external contributors with deep pockets approached Apple and offered to devote engineering manpower if Apple would collaborate with them. If the license had been BSDL, then the sequence would likely have been:

  1. KHTML was open.
  2. Apple created a proprietary fork of KHTML.
  3. Other companies approached Apple and offered to contribute to WebKit if it were developed in the open.
  4. Apple creates a public svn repository for WebKit.
  5. WebKit becomes a successful open source project.

The steps that were enforced by the license were completely irrelevant to WebKit's success.
