I'm not really familiar with AMBA. However, if for whatever reason AMBA does not scale, one could simply architect a system where the reconfigurable fabric interfaces with multiple AMBA islands.
It wouldn't be the first architecture to use FPGAs to support cooperation of processors. The Cray XD1 is one example: it had a mix of Opterons and Virtex FPGAs, some of which were available for compute, others solely for interconnect. On a side note, the Intel Paragon used Xilinx FPGAs to control the LEDs on the doors, back when supercomputers were more fun to watch.
I agree; I don't think Moore's law has been focused on sequential or even single-threaded performance for quite a while. I do think things are getting more interesting. Clock and voltage scaling seem to be very slow if not entirely stalled, while density and die size still seem to be scaling nicely.
I'd also like to point out the trend toward lower-power devices. I wonder what the trend in the balance of compute and energy is for things like laptops and cell phones. The laptop trend seems to be a slow decrease in power consumption, with whatever compute fits in that power budget. Cell phones seem a bit more confusing. I would expect to see the battery life of smartphones increase with each new generation, but there seems to be an obsession with computational power instead. On the upside, at least cell battery life doesn't seem to be getting much worse. I suppose people might actually reject a phone that fails to survive a normal day of use.
All the early MPEG-4 accelerators I saw were implemented in FPGAs. Of course, much of that was encoders rather than decoders, since encoding is the harder problem. Now you can buy cheap MPEG-4 ASIC/IP-core accelerators. Those are still going to be much more energy efficient than using the array of general-purpose cores on a GPU.
As for implementing GPU pipelines on FPGAs, it has been done: http://hackaday.com/2008/05/21/open-graphics-card-available-for-preorder/ I'm sure I've seen other research projects, or maybe just people screwing around and implementing GPU pipelines "because we can". It's also a convenient solution for educational purposes. But no, if you want to make an efficient GPU for general use, it does not make sense to map GPU logic onto the FPGA fabric. You would lose roughly an order of magnitude in clock speed, and doing it that way you completely toss away the benefits of the FPGA architecture.
I think you might have a skewed impression of how complex MPEG-4 encoding and decoding is, and how much area it consumes. Also, the comparison of FPGA logic cells and "gates" in a GPU is a bit faulty. In terms of raw transistor count the largest FPGAs tend to be a little ahead. That "million" or so logic elements in an FPGA does not translate to simple logic gates or transistors. The logic cells are multiple-input lookup tables that are used to evaluate arbitrary Boolean functions. How many traditional gates can you replace with a single 4-input lookup table? What about an 8-input LUT? The answer does depend on the logic you are mapping, but it's almost never a 1:1 mapping.
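A quick back-of-the-envelope way to see it, just counting function spaces rather than giving a real area model (in R, since that's what I reach for this kind of arithmetic):

functions_per_lut = function(k) 2^(2^k)   # distinct Boolean functions a k-input LUT can be configured as
functions_per_lut(4)                      # 65536 possible functions of 4 inputs in one cell
functions_per_lut(8)                      # roughly 1.2e77 for an 8-input LUT

One cell routinely absorbs a whole cluster of 2-input gates; the exact ratio depends entirely on the logic being mapped.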
Also, FPGAs do have RAM, fixed logic cores (DSP blocks/multipliers, etc.), and even conventional processor cores. While it's true that however big the array, someone will have a problem that won't fit, you can put an awful lot on a modern FPGA.
As for your final thought about fixed silicon: not necessarily. Look at this fellow's research: http://cas.ee.ic.ac.uk/people/nachiket/ He goes into why CPUs and GPUs are slow at running SPICE circuit simulations. Despite running at a fraction of the clock speed, his FPGA implementation completes the simulations faster and consumes much less power than the CPU or GPU. True, a fixed-logic accelerator specifically designed to implement the algorithm would be faster, but how many special-purpose fixed accelerators do you want to put on your chip? What if the implementation can benefit from dynamically adapting to the current problem? Sometimes it really is more efficient to provide reconfigurable logic and load in the best implementation you have for each problem. Dynamic hardware acceleration is likely one of the reasons Intel is producing Atom-FPGA combos. There are ongoing research projects examining the benefits for mobile computing devices. Transistors are cheap, but people want to use cell phones for all sorts of strange things, and there's always something new on the horizon.
Reconfigurable logic can be virtualized to get around the area limitations. Have a look at the SCORE publications for research on that topic: http://www.seas.upenn.edu/~andre/compute_models.html
Tabula is a new FPGA company that implements time-multiplexed logic to extend the effective size of the computation you can fit in a given amount of silicon: http://www.tabula.com/ Their products are still statically scheduled and not really amenable to the full virtualization of the SCORE model, but it's a real product and you can buy one today.
There's a big space between the fully spatial FPGA and the fully temporal CPU, and we've been seeing that space fill in slowly over time. On the CPU side, we've seen cores handle more operations per cycle, hyperthreading, and now multi-core as the default configuration. GPUs are now composed of hundreds or thousands of execution units that are simpler than CPU cores but more complex than the logic blocks in FPGAs.
There are problems best suited to each of these architectures. When you play a graphics-intensive game, you expect the GPU to handle the stuff it's good at and the CPU to handle the bits it's good at. FPGAs are just a little more obscure. But hybridization does make sense. That's why we've seen PowerPC and ARM cores embedded in reconfigurable fabrics, and now Intel putting FPGAs in the same package as its CPU dies. We're long past the point of saying that any of these is irrelevant because it is not the optimal solution for all problems.
To add to that, regarding your comment on idle resources: we're also hitting thermal limits. Yes, we can still put more and more transistors on a chip, but we can't switch them all simultaneously at full speed without frying the chip. Increasing cache size and core count helps. But if you're going to have more area than can be used simultaneously, it makes sense to add different resources that handle different tasks more efficiently (in energy and latency). That's part of why Intel, AMD, and NVIDIA are all mixing GPU and CPU cores on die. If the Atom+FPGA combo works out well, I would expect to see regions of reconfigurable fabric directly on die in the not too distant future.
"a spreadsheet application will deliver results a lot faster."
Not really, particularly if you have the data already entered.
Running:
R
data = read.csv("data.csv")
hist(data[[1]])   # hist() wants a numeric vector, so plot a column rather than the whole data frame
takes far less time than selecting your columns, dragging the mouse over to the graph button, selecting the region for your plot, and then trudging through a multi-stage wizard. Even if you actually want to type some data into a spreadsheet, it's frequently faster to save the table and load it up in R or gnuplot to graph it. And if you do want something like a histogram or a boxplot, Excel doesn't stand a chance (Gnumeric at least supports boxplots).
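As a minimal sketch of the boxplot case (the file name here is made up, and I'm assuming one numeric column per group):

data = read.csv("groups.csv")   # hypothetical table: one numeric column per group
boxplot(data)                   # boxplot() takes the data frame directly and draws one box per column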
I might accept that creating a slightly prettied-up graph might be a little quicker in a GUI spreadsheet. But for quick-and-dirty graphs, and for higher-quality ones, spreadsheets are slower, if they work at all. Once you start encoding your style preferences in little scripts that you load before graphing, you'll find even higher-quality graphs take less time than mediocre graphs from a spreadsheet. And really, there's something satisfying about tweaking one line in a single file and having that automatically update the style of 20 graphs in an article.
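Something as small as this goes a long way (the file name and the particular settings are just placeholders):

# style.R -- shared plotting defaults; source("style.R") at the top of each graphing script
par(las = 1, bty = "l", family = "serif")    # horizontal tick labels, open L-shaped box, serif text
palette(c("black", "darkred", "steelblue"))  # one colour scheme for every figure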
I generally find that when plotting, if I'm doing it once it's a coin toss whether to write a script or manipulate the data and plot manually; twice, and scripting definitely breaks even; more than that, and scripting just gets more and more valuable. R (and many other environments) saves your history, so if you decide a day later that you should have just written a script, it's already there; you just need to copy the commands out of the history file. In Excel, well, at least you learned from experience what to do the next day.
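In an interactive R session, that recovery is literally one call (the output file name is just an example):

savehistory("yesterdays_plots.R")   # write the session's commands to a file you can trim down into a script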
As I see it there are two reasons to graph in a spreadsheet. First, if you're actually working in a spreadsheet and just want a quick look at some data (not debating the merits of that; separate discussion). Second, when you're not sure what you want and are unfamiliar with the tools available, a GUI gives you something to poke at blindly with a mouse. In that second case, I think one should accept the pitfalls of ignorance with an intent to learn more and improve. Stubbornly grasping your spreadsheets, knowing there's a better world out there, will only hurt you in the long run.
"Experience has proved that some people indeed know everything." -- Russell Baker