Hardware

Low-cost Reconfigurable Computing (FPGA's) 165

Anonymous Coward writes: "People at the Chinese University of Hong Kong have developed a reconfigurable computing card which uses an SDRAM memory slot instead of the PCI bus. Measurements in the paper show greatly improved bandwidth and latency - why aren't more people using this idea?"
This discussion has been archived. No new comments can be posted.

  • Someone want to post text, pdf, or ... something?!
    • by daniel_isaacs ( 249732 ) on Sunday November 04, 2001 @04:06PM (#2519582) Homepage
      http://danisaacs.com/temp/fccm01_pilchard.pdf

      Be gentle. And mirror and post mirrors, please. Bandwidth costs, and I'm poor.
    • Here is a PDF version [msbnetworks.net]. Download all you want - I'm not metered (but it's also only 384kbps :) )
    • 2 Comments: (Score:2, Informative)

      by N Monkey ( 313423 )
      Comment 1: Re: Linked directly to Postscript?

      Try using Ghostscript/GSview, which will display the Postscript file directly. (A quick search on Google gave the following link, which should be useful.)
      GhostScript [wisc.edu]

      Comment 2: I don't think this is ever really likely to be part of a 'consumer' system. FPGAs are great for
      1) prototyping circuits that will later be implemented as an ASIC, where the cost of "respinning" a chip is extremely high, or
      2) situations where the system is only produced in very small numbers.

      The main problems with FPGAs are that they are
      1) expensive,
      2) relatively small in terms of the gates they can implement, and
      3) limited to clock speeds probably about an order of magnitude lower than an equivalent ASIC.

      For many situations a multi-CPU system may be a much better option, and I certainly think they'd be impractical for a mass-produced system.

      Simon
  • RAM-slot FPGAs (Score:4, Insightful)

    by Frothy Walrus ( 534163 ) on Sunday November 04, 2001 @04:01PM (#2519560)
    the idea of FPGA computing has been around for a little while at least (look here [io.com] for examples). i think Scientific American even wrote about "configurable computers" in 1997 or so. why aren't they more popular, then?

    modern processors are well-adapted to general computing tasks.

    FPGAs (read: custom iron) might be good for a few specialized tasks (breaking 3DES, for instance), but most of us will be a lot happier on our UltraSparcs and Athlons and G4s.
    • Re:RAM-slot FPGAs (Score:2, Interesting)

      re-configurable computing is very useful in embedded systems with somewhat limited resources and real-estate. Satellite computers, for example.

      It is very useful to have a chip do data gathering for a while, then reconfigure it to do a DFT on the data, then reconfigure it again to spit the results back to earth through telemetry.
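      A minimal sketch of the DFT step the parent describes (the function name and sample data are made up for illustration - on the reconfigurable card, the inner loop body is what the FPGA would be rewired to compute in parallel):

      ```python
      import cmath

      def dft(samples):
          """Naive discrete Fourier transform: X[k] = sum_n x[n] * e^(-2*pi*i*k*n/N)."""
          n = len(samples)
          return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                      for i, x in enumerate(samples))
                  for k in range(n)]

      # An impulse transforms to a flat spectrum:
      print([round(abs(c), 6) for c in dft([1.0, 0.0, 0.0, 0.0])])  # → [1.0, 1.0, 1.0, 1.0]
      ```

      A real satellite payload would use an FFT rather than this O(N²) form, but the point stands: the same silicon can be a data logger one hour and a transform engine the next.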
    • wouldn't you be willing to sacrifice a RAM slot to make your computer excel at a particular task? maybe fast mp3 enc/dec, genome grinding, SETI@home, Photoshop filters, etc? i mean, i don't see the Linux drivers coming around anytime soon, but that's no reason to shun this development; we have to embrace the future sometime... maybe we can even make a Linux card, and since these are _Field-Programmable_ Gate Arrays, we could even update kernels when necessary!
      • The only thing slowing Linux drivers is availability.

        I can see these devices being included under the SMP concept. And guess what? Now you can have two processors of different architectures!

        Admittedly, the custom processor would have to be somewhat simple, depending on the size of the FPGA.
      • on the contrary, the *only* available drivers seem to be for linux. From the article:

        "A simple Linux device driver was developed which allows user mode programs to access the Pilchard hardware. Although this driver was tested only with Linux Kernel 2.2.17, ports to other operating systems and Linux versions should be trivial."

        I have always found that for research-oriented stuff like this, Linux is the primary development platform.
      • i don't see the Linux drivers coming around anytime soon

        did you even read the paper? One of the design goals was "uses the Linux operating system" (see page 2)

    • FPGAs (read: custom iron) might be good for a few specialized tasks (breaking 3DES, for instance), but most of us will be a lot happier on our UltraSparcs and Athlons and G4s.

      As greater amounts of processing power become available, we will start doing more complicated things with it in our homes; imagine generalized, easy-to-use programs which do (for example) CFD tasks. Anyone could roll their own body kit for a car, or a windshield for a recumbent bicycle, et cetera. With reconfigurable computing we can make better use of our computing resources, and optimize for such tasks.

      OTOH, there's really no need for this kind of technology any more. Processing power is getting cheaper and cheaper all the time. Might as well just use it. In addition, multiprocessor support is growing, and multiprocessor systems are getting cheaper.

      • or a windshield for a recumbent bicycle

        That was my final year project in University. :-)
      • Processing power is getting cheaper and cheaper all the time.

        Yeah, and we've reached the point where you can break a motherboard in a tower case under the weight of your heatsink and fan.

        If you ask me, this would be the perfect solution for gaming consoles. A console includes one of these, and the game comes with a configuration in mind.

        Imagine decent AIs in games becoming more common, or on-the-side graphics rendering. How about easy emulation of other console architectures?

        Remember the Final Fantasy movie? The skin on the people didn't look quite right, because the movie developers hadn't figured out how to quickly calculate the light diffusion and reflection in human skin. This neat little device would have solved their problem.
        • Remember the Final Fantasy movie? The skin on the people didn't look quite right, because the movie developers hadn't figured out how to quickly calculate the light diffusion and reflection in human skin. This neat little device would have solved their problem.

          So would simply having more processing power, and the additional power is coming quickly. This neat little device might or might not have solved their problem. It depends on whether it's a matter of having to do an awful lot of things CPUs are already good at, or something that is easily solved with custom logic.

          Also, you can only do so much custom logic at once. I used to work for a company which made (among other things) a chip which was used to model a very well-known Intel CPU in hardware, though at a fraction of the full speed. These chips had a very large die (close to the maximum size for a die at the time, actually, and this was not so long ago) and it took eight of them to model the chip. Even if you improved this by a factor of ten, it's probably still cheaper just to throw more CPUs at the problem, especially with the new bus standards cropping up, making multiprocessing easier and cheaper.

          • Re:RAM-slot FPGAs (Score:2, Interesting)

            by mmol_6453 ( 231450 )
            Basically what they needed to do was an extension of ray-tracing. This device would have gone a long way in making it faster.

            Picture all the algorithms required to ray-trace one pixel. Now picture all those algorithms made into one small portion of an FPGA. Now picture many many such portions occupying a single FPGA.

            Suddenly you have a device that encompasses your entire process, and can execute that process much faster than a normal CPU can. To give you an idea of the scale involved, picture a thousand CPU instructions compressed into one clock cycle.

            Kinda cool, huh?
            • The problem with using FPGAs for ray-tracing is that, when you fire a ray into the scene, you don't know what it's going to hit. If it hits something, then typically you examine illumination of that spot from all the light sources, and spawn another ray if you're doing reflection, transmission, or diffuse interreflection.

              The problem is, you can't parallelize this easily:

              for each pixel in image
                  for each primitive in scene
                      if ray through pixel hits primitive
                          for each light
                              if ray from light to hit point doesn't hit something else first, calculate illumination
                          calculate reflected color based on material
                          write color to image

              so let's say we equip each node of an FPGA with logic that evaluates this loop. If we have 1,000 of these nodes, that means we can render 1,000 pixels at the same time, right?

              wrong. The scene is going to be far too large to store in each FPGA node. So, each node is going to have to wander down the list of primitives in RAM to do its intersection tests. That is not fast.

              Now sure, you can set things up so that all the nodes listen to one broadcast bus and all the primitives in the scene are streamed past; any node that finds a hit remembers it. After that you stream the light sources, letting nodes calculate illumination at the hit, then let them process the material. Most likely they have to do some texture lookups here.

              So sure, that's a way of reshuffling the loop order and doing a lot of tests in parallel, but the real truth is, if you use some sort of spatial hierarchy on a general-purpose CPU, it will be much faster.

              Traditional Beowulf clusters are typically much better for this sort of thing, because they can usually store a significant portion of the scene description locally, so there's no communication overhead to limit the parallelism.

              The deal with Final Fantasy is they didn't take into account subsurface scattering. Only recently have good models for that surfaced, and the computation time is prohibitive.
        • Nvidia and ATI have already come out with programmable subunits (vertex and pixel shaders). The developer can write small programs that are applied to each vertex/pixel that comes through the pipe. This is not quite as flexible as a full-blown FPGA, but is a more likely direction for future consoles because it is an easier platform to learn. The XBox already has the Nvidia part with these features.
    • Re:RAM-slot FPGAs (Score:4, Informative)

      by jbuhler ( 489 ) on Sunday November 04, 2001 @05:21PM (#2519873) Homepage
      > modern processors are well-adapted to general computing tasks.

      Rather, "modern processors are well-adapted to general *serial* computing tasks." If you have a computation with an embarrassing amount of low-level parallelism (e.g. applying a filter to an image), you can either hope that streaming SIMD will come to your rescue, or you can burn an FPGA with an embarrassing number of parallel computation paths that implements the desired function. The FPGA would already win in many real-world computations, were it not for the fact that it's limited by the cost of getting the data on and off the chip over a slow data bus.
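      The image-filter case above can be sketched in plain Python (illustrative names, not from the paper): each output pixel depends only on its own input pixel, so every loop iteration is independent - exactly the low-level parallelism an FPGA exploits by instantiating many copies of the per-pixel datapath.

      ```python
      # Toy per-pixel threshold filter. Every output pixel is a function of
      # one input pixel, so all iterations are independent: a CPU walks them
      # serially (or a few at a time with SIMD), while an FPGA could lay down
      # one small comparator circuit per pixel and evaluate them all at once.
      def threshold_filter(image, cutoff):
          """Map each pixel to 255 if it is at or above cutoff, else 0."""
          return [[255 if px >= cutoff else 0 for px in row] for row in image]

      image = [
          [ 10, 200,  90],
          [130,  40, 250],
      ]
      print(threshold_filter(image, 100))  # → [[0, 255, 0], [255, 0, 255]]
      ```

      The bottleneck the comment names is visible even here: the FPGA only wins if the pixels can be fed on and off the chip fast enough, which is exactly why a memory-slot interface is interesting.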
      • well, modern processors are well adapted in that they have lots of tricks to get things done quickly. they exploit the physical silicon they are composed of, while FPGAs can only emulate that, at a speed cost. an example is a native CMOS AND gate, which needs only a handful of transistors and switches very quickly, versus the FPGA AND gate, which is built from lookup-table logic using definitely far more transistors.

        making them embarrassingly parallel would simply _kill_ serial processors speed-wise, but only where the process is parallel to begin with. in theory, you could make an equivalently parallel CMOS (non-reconfigurable) processor, and jack the clock up much higher than an FPGA would tolerate.
    • I still have some papers from Heriot-Watt University somewhere regarding reconfigurable supercomputers from the late 1980s. The most interesting apps I have seen to date are GAs (genetic algorithms) implemented with FPGAs. Using the memory subsystem is certainly a good way of interfacing them. One thing I am surprised about is that I haven't seen the same thing done with content-addressable memories; if anyone has seen anything like that I'd like to hear about it.
    • Many people seem to be reading this the wrong way. They should not be used to replace CPUs, but rather as another add-on unit (like the FPU). Special apps written to support an FPGA unit would reconfigure it, and then use it for their specific tasks.

      So think FPGA co-processor, not CPU replacement.

      • Spot on, think about embedded computing apps not desktops.

        An embedded system is usually plugging away at one task, but occasionally needs to do something different. Given the choice between increasing board space, having hardware powered when it's not needed, or installing a faster, more powerful micro just to handle that one intermittent task, using the FPGA as a co-pro or even as the processor (plenty of IP cores around can run embedded apps) can make a lot of sense.

        For example, how about encryption? Gather data over weeks working as a slow, dumb 4- or 8-bit micro; when it's time to send, switch the FPGA from its data-gather-and-compress functions to highly parallel encryption, send the data securely, then go back to gathering in serial-processing mode.
        Not a real app yet AFAIK, just thinking
    • Re:RAM-slot FPGAs (Score:3, Insightful)

      by Nindalf ( 526257 )
      modern processors are well-adapted to general computing tasks.

      This is a completely meaningless statement, because there's no such thing as a general computing task. Today's popular uses for computers developed as a result of the hardware's capabilities (which influenced the hardware's design, in an evolutionary feedback loop). We are only beginning to explore the uses of digital microcircuitry.

      Modern processors and modern programming methods are well adapted to each other, so one should expect that unorthodox hardware would be difficult to program and give poor results. We just don't have the experience for it.

      However, it becomes increasingly hard to get a consistent return on larger and larger transistor counts with serial execution programming. Random memory accesses and conditional branching are discouraged in favor of "predictable" memory accesses and instruction execution, and greater and greater sacrifices of the illusion of serial execution are made in favor of efficiency. The advantages of parallelism grow as the chips grow, and reconfigurability at the level of gate logic is the natural extreme we will likely tend toward as we figure out how to handle trillions of transistors in one device.

      Can you really imagine current design trends extrapolated to instruction pipelines millions deep? Serial execution does not scale infinitely.
    • I think the most fascinating part of this article is by far the use of DIMM slots for an FPGA. I think many people missed this point entirely and instead simply focused on the applications of FPGAs. Take a moment and re-read the first page of the paper... then think about it...
  • Catalina Research offers a card line called the "Chameleon" based on Virtex-E series FPGAs. This allows the card to provide millions of reconfigurable system gates that the user can apply. Connected to the FPGAs are independent blocks of ZBT static RAM which can support a single read or write operation per cycle. The Chameleon products have three high-speed I/O daughtercard sites compatible with the industry PMC standard.

    Chameleon VME Block Diagram [catalinaresearch.com]
  • Yeah but.... (Score:1, Insightful)

    by Britano ( 183479 )
    If this were so cool and such an awesome way to do computing, then why do we even have the PCI standard? They should make motherboards with 6 SDRAM slots instead of 6 PCI slots. They would help out SETI@Home!
  • by Anonymous Coward
    StarBridge Systems [starbridgesystems.com] uses these to make their supercomputers that run in a standard ATX case... And I think they charge around $15 million for the low-end models. This info came from a Slashdot story but I can't remember when, and I don't have time to verify. I just remember CmdrTaco being very sceptical.
    • I remember reading about StarBridge a while ago; the general consensus on Slashdot was that it was probably a scam, or at least an operation with a dim chance of success. The benchmarks they touted did things like compare 4-bit integer adds (on their machine) to full-blown 64-bit operations on IBM iron.

      Like I said, most people didn't buy into it, but the company is still around a year or so later, so who knows. Maybe they are selling some systems. Either way, they certainly aren't making much of an impact on anything in the computing world. Let's not forget that $15 million would get you a fuck of a lot of conventional CPU as well. It's enough to buy 10,000 high-end PCs and network 'em together (resulting in about the biggest Beowulf cluster in existence). And that wouldn't require any new programming technology.

      I think these guys also claimed that they'd sell these things for $3-4k...
  • by fudboy ( 199618 ) on Sunday November 04, 2001 @04:07PM (#2519584) Homepage Journal
    I have also wondered why more people aren't using the memory bus for peripherals. For instance, the VGA adaptor would greatly benefit from that interface (3d work, video games); using that bus as a network connection in a renderfarm would probably be nice too. Seriously, the PCI bus can only offer so much (132 MB/s), which is certainly going to be a problem with anything faster than gigabit ethernet... Meanwhile, modern memory buses are upwards of 4.8 GB/s. Imagine multiple machines strung together with that kind of bandwidth between them!

    Another question I've had bouncing around in the back of my head is why no one uses MPEG decoder circuitry for MP3 playback? All the players I've tried, Windows or Linux, take 10-30% of the CPU for normal playback operation. This is unacceptable when working in big apps like 3D Studio Max, make-ing a big app, or running big scripts. I have an old MPEG decoder card from a Creative DVD kit, and I believe my GeForce has MPEG decode acceleration... How much trouble would it be to write a driver for Winamp that uses preferred devices like that?
    • by Christian Smith ( 3497 ) on Sunday November 04, 2001 @04:26PM (#2519668) Homepage

      I have also wondered why more people aren't using the memory bus for peripherals... Seriously, the PCI bus can only offer so much (132 MB/s), which is certainly going to be a problem with anything faster than gigabit ethernet


      Because the memory bus is a memory bus, and NOT a peripheral bus! Peripheral busses have things like interrupts, address space configuration, buffering, bridging, hot-plugging, and long-term stability that memory busses are simply not designed for.

      How would you like it if you couldn't use the latest whizz bang 8.4GB/s memory technology because some peripheral you bought a year earlier needs to be on a 4.8GB/s memory interface?

      Anyway, PCI v2.2 (?) offers 512MB/s in 64 bit 66MHz mode. And then there's PCI-X...

      And show me a game that is PCI/AGP bandwidth-limited once textures are uploaded to the GFX card anyway. Memory is cheap, use it...


      Meanwhile, modern memory buses are upwards of 4.8 GB/s. Imagine multiple machines strung together with that kind of bandwidth between them!

      Unfortunately, those pesky laws of physics (like the speed of light) come in and put paid to schemes like this. While it may be possible to get that bandwidth between machines, the latency becomes a problem. Certainly not feasible as a memory bus.
        • Because the memory bus is a memory bus, and NOT a peripheral bus! Peripheral busses have things like interrupts, address space configuration, buffering, bridging, hot-plugging, and long-term stability that memory busses are simply not designed for.

          How would you like it if you couldn't use the latest whizz bang 8.4GB/s memory technology because some peripheral you bought a year earlier needs to be on a 4.8GB/s memory interface?

        it's all in the controller. Perhaps abandoning the other busses and rigging up the interrupts for a single bus would be best? Also, it seems that having several memory busses would solve the problems of speed dependencies. Multiplexing the south bridge to 2 or 4 separate channels should do it.
        • Anyway, PCI v2.2 (?) offers 512MB/s in 64 bit 66MHz mode. And then there's PCI-X...

          And show me a game that is PCI/AGP bandwidth limited once textures are uploaded to the GXF card anyway. Memory is cheap, use it...


        The slowdowns in heavy geometry transforms are still a bottleneck coming from the processor, even with hardware T&L. And who's to say game programming techniques wouldn't take off with so much more flexible pathways to design against?
        • Unfortunately, those pesky laws of physics (like the speed of light) come in and put paid to schemes like this. While it may be possible to get that bandwidth between machines, the latency becomes a problem. Certainly not feasable as a memory bus.

        Speaking of the 'speed of light', you could use actual fiber optic network cables much nearer their capacity with a bus that fast dumping straight into the RAM, cutting out several steps (which is where the latency comes from in the first place) along the way. This would make clustered systems fly, and open up altogether new techniques as well.
        • it's all in the controller. Perhaps abandoning the other busses and rigging up the interrupts for a single bus would be best? Also, it seems that having several memory busses would solve the problems of speed dependencies. Multiplexing the south bridge to 2 or 4 separate channels should do it.

          I don't know if you realize this or not, but that's exactly the way PCs currently work, except the busses are different for different things: PCI bus, memory bus, CPU/cache bus, ISA bus, IDE bus, etc. Making all of the busses use the same interface would be insane - what's the point of having a 4.8GB/sec modem port? And with the huge memory caches on video cards these days (32-64MB), you don't need all that much bandwidth (but AGP 4x provides plenty).

          Modern PCs use different busses for different reasons; there's a lot more to consider than pure speed.

          Speaking of the 'speed of light', you could use actual fiber optic network cables much nearer their capacity with a bus that fast dumping straight into the RAM, cutting out several steps (which is where the latency comes from in the first place) along the way. This would make clustered systems fly, and open up altogether new techniques as well.

          Fiber optic memory?? Ai-ya! First of all, signals in wire travel at about 66% of the speed of light, and secondly, even then you won't overcome the lag. Current RAM technology has lag measured in nanoseconds. And that lag needs to be absolutely constant.

          A simpler design isn't always the best design.
      • Not that I'm recommending using the memory bus for 3d, but...

        PCI/AGP are great for uploading static textures, on that you are correct.

        However, there's more data than that to saturate the bus:
        * Procedural textures
        * Vertex cloud animation (bones aren't always appropriate!)
        * Swapping textures when insufficient video ram is available.

        Any of these can cause bus saturation. While many games are following the Half-Life model (static everything, use matrix driven hierarchical bones animation), this creates pretty bland worlds.

        If you want realistic water, more organically animated content, or more subtle animations, this bandwidth becomes critical. Vertex/pixel shaders regain some of this by allowing processing to be moved back into the 3D GPU, but that only works for inherently procedural and low-order polynomial effects - data-driven or more complex procedurals still need to upload obscene amounts of data!

        I should also mention that accelerator card drivers optimize their pipelines for static textures; Unreal ran into this problem badly, and it continues to this day.
    • by Anonymous Coward
      The memory bus is very poorly suited for stringing anything together; there are very strict assumptions on trace lengths from the connector to the memory chip, and more such restrictions, in order to get the high bandwidth. If you have something you can connect to it locally, like these FPGAs, it's just about possible, but trying it with the highest-speed memory standards would be a formidable task.

      Something like HyperTransport is a lot more suited for high-bandwidth clustering; unfortunately AMD has not designed a port for it... it's only for backplane use. Parallel unidirectional LVDS connections with forwarded clocks are the most balanced solution for high-bandwidth interconnects, and they're easy to use over cables (if you can solve the latency mismatch problems, which is possible with tapped delay lines). Intel's serial stuff is just plain icky: high latency and expensive silicon.

      But the powers that be have always resisted a cheap high-bandwidth non-local interconnect. SCI has been kept down by the man... and although HyperTransport is alike in many ways, for some reason there isn't a specification for cable connections in the works.

      The industry does not want us to have cheap clusters with the same interconnect bandwidth as the ultra-expensive heavy iron; there is too much money at stake...
    • The MPEG audio layer uses different transforms than the video layer. Even MPEG decoder cards often do audio decode in software (and the audio usually isn't MPEG audio anyway, but AC-3).

      I believe that some cards from Philips claimed MP3 acceleration, and there is no reason why the SoundBlaster Live chipset couldn't be programmed to do the same. (Philips, iirc, doesn't support Linux, btw.)
    • You seriously need to look at your choice of MP3 player. I only use 1-3% on my Celeron 300 with Winamp or XMMS.
    • My Celeron 300 only used 1-2% in WinAmp... now that I have a 900, it's going to be even less. Perhaps you either need a CPU upgrade or a better player
    • by Waffle Iron ( 339739 ) on Sunday November 04, 2001 @05:25PM (#2519887)
      I have also wondered why more people aren't using the memory bus for peripherals.

      Been there, done that. Most PCs prior to the 386 models used the ISA bus for both peripherals and memory. The buses were separated out in modern PCs for a reason: the laws of physics. At today's speeds, a memory bus can't be more than an inch or two in length. And if you use your one free memory slot for I/O, you have no more memory expansion capability.

      Another question I've had bouncing around in the back of my head is why no one uses MPEG decoder circuitry for MP3 playback? All the players I've tried, windows or linux, take 10-30% of the CPU for normal playback operation. This is unacceptable when working in big apps like 3DStudio Max, make-ing a big app or running big scripts.



      I'm curious, what hardware are you rendering on, where the cpu usage for decoding an mp3 stream takes up 10-30% of your cpu? Running winamp with the mini-vis set to 70fps, and checking task manager in NT4Wks reports that winamp uses 0-1% of my cpu. This can actually be taken to read 1-2%, since I am running a dual-processor pentium 2 at 233mhz, with 256mb of 60ns ram, an ISA sb64, and an old pci TNT using old detonator drivers, since the new detonators break avi playback for me.


      As an alternative test, I fired up MXaudio on my SGI Indigo, which has a 100mhz r4000 cpu and ELAN graphics, and to decode a 256kilobit mp3 stream, it takes 35% of the cpu. (Not bad considering the age of the machine)


      Although I am a sick bastard and raytrace images on the Indigo and my 486sx laptop, I hope you have a slightly more powerful machine for 3DStudio. Perhaps the huge amount of cpu usage for your mp3 player is due to poorly written sound card drivers? I would seriously look into this.

    • by pacc ( 163090 )
      It seems like the nForce chipset from Nvidia works somewhat like that. The graphics controller is integrated into the chipset and uses the system memory, but since you have dual memory buses (and current AMD processors don't need more than one 266 MHz bus) the chipset can have the same bandwidth to the system memory as the processor.

      This only makes the point that the processor should be able to use the bandwidth better, and that the 8x AGP bus the chipset is getting is 16 times faster than the PCI bus you are referring to.

      Let's hope that the next generation of point-to-point data buses is open enough to make adding an extra co-processor as easy as adding more storage.
    • For instance, the VGA adaptor would greatly benefit from that interface (3d work, video games),

      Um... just what do you think AGP is?
  • by mj6798 ( 514047 ) on Sunday November 04, 2001 @04:08PM (#2519587)
    This may help a little, but in general, people haven't figured out how to make FPGA-based computing sufficiently useful, cheap, and easy in order for it to catch on. Programming an FPGA is still rather hard and the architecture limits severely what you can do. And there is the chicken-and-egg problem with the boards: if you write software for them, few people can run it, and few people are motivated to buy a board because there is no software that uses it. Right now, you are probably a lot better off buying a dual processor board or a cluster than an FPGA add-on.
    • With no prior knowledge of verilog, I followed Jan Gray's articles in Circuit Cellar and extended the design on my own. Verilog is superficially similar to 'C', so long as you remember that each "function" will operate in parallel.

      Boards are cheap (approx $110 US for a 200k gate chip that could easily hold 4 processors and a lot more).

      My own direction is interfacing stuff to my own processor that is based heavily on Jan's design. It's for purely personal use, and saying that you are running code (assembly language only :-) on your own processor rates up there, as far as geekdom is concerned. At least to me, but maybe I'm biased :-))

      Jan's site is at fpgacpu.org [fpgacpu.org] if you're interested. There are lots of details about all sorts of issues on the site. Some technical, some not so technical. Have a look under GR CPU's or XSOC :-)

      Simon.
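      The "each function operates in parallel" point about Verilog can be sketched in Python (a toy model of HDL clocking, not real Verilog semantics): every block reads the register values from the *previous* clock cycle, and all registers update simultaneously on the clock edge, unlike sequential C where each statement sees the effects of the one before it.

      ```python
      # Toy model of HDL register semantics. Two "parallel blocks": a
      # counter, and a shadow register that follows it. Because both blocks
      # read the OLD state and the new state is committed all at once, the
      # shadow always lags the counter by one cycle - just as two Verilog
      # always-blocks would behave, and unlike two sequential C statements.
      def clock_tick(state):
          return {
              "count": (state["count"] + 1) % 16,  # block 1: increment
              "shadow": state["count"],            # block 2: sees old count
          }

      state = {"count": 0, "shadow": 0}
      for _ in range(3):
          state = clock_tick(state)
      print(state)  # → {'count': 3, 'shadow': 2}
      ```

      Building the whole next state from the whole previous state is the mental shift Jan Gray's articles teach; once it clicks, the superficial C-likeness of Verilog stops being a trap.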
    • .. I feel this is a great idea!

      Come on, if FPGA CPUs catch on in a big way, I can start writing my own paychecks and stop designing these boring circuit boards, just concentrate on Logic design.

      Furthermore, FPGA is embarrassingly *slow* technology. ASIC is the real custom/screaming-fast stuff. In an embedded environment, you usually use an FPGA for slow custom computational stuff; if you need 1-clock-cycle latency or 6GB/s bandwidth or anything like that, you need an ASIC. Just look at those motherboard chipsets - they're not FPGAs, no way..

      From an R&D point of view, FPGA is really nice to work with, tho. You can play around with it to your heart's content until you get it right, and there's nothing to stop you from burning/flashing/uploading new logic code every day if you want to. There are FPGA-to-ASIC tools which give you the capability to tape out that groovy logic design into a mass-market ASIC once you're sufficiently sure you got it right. Naturally, it's as slow as the FPGA would be, since otherwise it'd screw up your logic timings..
    • If you can write in C type languages you can design on FPGAs.
      A Handel-C (C with extensions for parallelism and timing models etc.) to FPGA development environment called DK1 was released by a UK company, Celoxica [celoxica.com], earlier this year.
      There's an eval download for DK1 on their site.

  • by baptiste ( 256004 ) <mike AT baptiste DOT us> on Sunday November 04, 2001 @04:09PM (#2519592) Homepage Journal
    Here is a PDF version [msbnetworks.net]
  • PDF AVAILABLE (Score:1, Redundant)

    by mduell ( 72367 )
    PDF (thanks to ps2pdf.com) available at http://homer.artificialcheese.com/fccm01_pilchard.pdf (I'm not putting an HTML link in for a reason, I don't want everyone to get it from me)
    PLEASE MIRROR! I don't have nearly enough bandwidth to withstand the /. effect!
    • PDF (thanks to ps2pdf.com) available at http://homer.artificialcheese.com/fccm01_pilchard.pdf (I'm not putting an HTML link in for a reason, I don't want everyone to get it from me)
      PLEASE MIRROR! I don't have nearly enough bandwidth to withstand the /. effect!

      Done - mirror here [cam.ac.uk] Should be enough bandwidth - couple of megabits available - if not, I'll move it to a bigger box next door...

    • Would whoever decided to waste a mod point rating this guy's post redundant please look at the timestamps on the earlier posts? It's entirely reasonable that the other links were posted while this guy was still writing his up.

      Try to be a bit less anxious to punish.
      • It may not be his fault, but it's still redundant. Moderation is to improve the reader's experience, and we don't need to see 10 different comments with links to identical PDF files.
        • Moderation is to improve the reader's experience, and we don't need to see 10 different comments with links to identical PDF files.

          Yes, that sounds acceptable. Just hate to see someone lose karma for trying to help out.

  • Speed and gates... (Score:4, Insightful)

    by tcc ( 140386 ) on Sunday November 04, 2001 @04:12PM (#2519612) Homepage Journal
    FPGA technology to replace (or more like having a "flashable" version of) current processors could be a great leap in computing. It would mean having a "soft-hardware upgrade"; microcode or silicon bugs could be addressed. But there would probably be the downside of everything else in the computing industry: companies would release buggy stuff, betas would go around like current drivers :), etc etc.

    All this said, unless some big breakthrough happens, we won't see our Athlon or Pentium IV systems replaced by these. The 2 main limitations of FPGAs are the number of available gates and the speed at which they operate.

    While they've managed to increase the number of gates to something quite big (last time I read about this I think it was in the low millions? 1 or 2, but I can't be sure), this is enough to "emulate" microcontrollers or lower-end processors, but not enough for higher-end microprocessors. While eventually they will catch up and maybe someone will do his thesis on emulating an Athlon off FPGA stuff, by that time we'll be at the 2nd or 3rd rev of post-Hammer processors, so it will look like being able to emulate a 486 today (granted, there could be some use in that, but none comes to mind right now.. parallel processing? 1 Athlon can replace a zillion 486s...). Also, the development of microprocessors is going at a faster pace than FPGA technology. I am not saying this couldn't happen, but it would need a serious bump in the fab process and technology to be able to reach GHz speeds, and probably a few hundred million gates.

    Still, it's a very interesting technology.
    • Believe it or not, they are used for CPU design. The folks at ZWorld [zworld.com] designed their Rabbit CPU [rabbitsemiconductor.com] architecture using FPGAs and then created the chip from that design (vs the usual prototyping on silicon over and over). It's not uncommon. Now, using FPGAs in real-world 32-bit CPU scenarios for Windows is another thing :)
      • Using FPGAs for prototyping is definitely a good thing, and is one of the areas where you can get hardware with FPGAs to go in a PC.

        You can get PCI cards with FPGAs that interface to a digital simulator (like ModelSim or QuickSim). These are rather nice since they shorten simulation times a hundredfold.

        As for a reconfigurable device in a household device, they will certainly not be used as microprocessors; that would be a criminal waste!

        Rather, you would implement the time-critical part of your algorithm in synthesizable code (RTL code) and dump that to the FPGA. There would not really be any need to send programmable circuits to such devices; you already have one of those: your CPU.
    • by Suslik ( 59646 )

      FPGAs are great in prototyping when you want to produce a relatively small number of devices in a relatively short time. Problem is, a conventional microprocessor is always going to beat an FPGA at a comparable VLSI design level in terms of flexibility, and an ASIC will draw less power and perform a specific task more quickly. Limiting factor for an ASIC is cost (minimum production runs often in thousands) and design-production latency.

      If you want to emulate an Athlon, use an Athlon. :-)

      Stuff like high-volume encryption (e.g. Rijndael) is well suited to FPGAs because you've got a shedload of data coming in for a relatively simple series of calculations. Note that the AES contestants were evaluated partly on their ease of implementation in FPGA-like devices.

      This SDRAM link might be a useful thing to increase bandwidth between a conventional microproc/bus/memory system and multiple FPGAs, bumping up performance by a factor of 2-3 maybe (on bandwidth-limited computations), but it won't change the world. IMHO, obviously.

      Check out my review of the FPL 2000 [suslik.org] and FPGA 2001 [suslik.org] conferences for summaries of the current research in FPGAs.

      Adrian

  • Configurable logic boards have been around for a while; the main problem is cost. Last I heard, you could get one that could be configured into a 386 for about $2000, but that was a few years ago. It would be great if they could do stuff like program the chip into a DivX decoder when you play a DivX, or a processor dedicated to executing the 3D graphics engine when playing a game.
    • For all intents and purposes, new graphics and sound cards are programmable to do specific operations. The gate logic doesn't change -- it's a microcode fix. It ends up being basically the same thing in the end for them; as long as you don't try to coax them to do other things (like analog logic), they'll be just as good.
      • In normal CPU design, you have to take into account the distances between various parts of the CPU. The greater the distance, the slower (and hotter) the CPU.

        That's just fine when you can control exactly where the transistor goes on the die, but FPGAs throw a curveball in:

        The components in an FPGA are, IIRC, arranged in one big, massive grid. While it's still possible, controlling what a transistor does by location is going to be much more difficult and time-consuming in the development process.

        Don't forget that one FPGA is different from another, so you can't use the same 'ROM image' on different hardware. That's going to impact portability and development time.

        Finally, don't forget that no matter how encrypted or secure the ROM image is before it gets flashed, it still has to be put into the hardware raw. Just build a virtual machine to intercept the flash data, and voila! You now have your (or your competitor's) CPU layout in a semi-readable format. Now to run it through an FPGA emulator...

        When (not if) this all comes about, you'll probably have hackers trying to tweak the trace lengths in their CPUs.

        What Linux is to operating system kernels, a future hacker group will be to CPUs.
        • I agree that it is difficult to control where components get placed in an FPGA. But with the sizes of these things getting so big, it is hard to make a design that you will be hand placing and routing. Usually, we let the tools do this for us and just concentrate on the HDL.

          Yes, portability is a big issue, but at least there is the hope of porting between similar architectures (such as between a Xilinx 4000-series and the Virtex series). In the case of ASICs, well, what does portability really mean? It's completely fabrication dependent.

          Finally, regarding security of ROM images: I know that Xilinx keeps the format and interpretation of the bitstream proprietary and confidential. This doesn't mean that it is impossible to figure out, just more difficult than inserting a VM and "voila!".

          I for one welcome the chance to re-design the processor in my computer :>.
  • I wasn't able to read what the PostScript file said (nor do I have a PDF reader on this system), but from the description in the Slashdot posting it almost sounds like what the Panda Project team was trying to do. Maybe someone who was able to read the article could tell me if this "new" idea is similar to the Panda Project.
  • by Skapare ( 16644 ) on Sunday November 04, 2001 @04:19PM (#2519640) Homepage

    Using memory slots for devices is a bad idea. The interface is not designed for devices. There are no IRQ lines. The address space can be configured by the chipset to fall anywhere in the address space of the whole machine (your device may end up starting at 0). The address space may even be interleaved with other memory devices in other slots. And the next generation of memory will use a whole different interface, and most new motherboards will soon migrate to it with little concern for backward compatibility.

    • The interface is not designed for devices.

      Perhaps you didn't read the article so carefully, but they seem to have overcome some of the difficulties, and they also aren't purporting this as a general solution to all computing woes ever. This device is a prototype and it currently is only setup to work on one motherboard type. What this does demonstrate is that for some applications (such as cryptography) this can be useful. The article specifically states that it can be useful for education, research, and a few other very focused tasks.

      I can see an application where this is an aspect of the totally secure machine where all RAM is encrypted, and the only place that unencrypted data lies is on the silicon of the processor itself.

      They aren't saying that the next sound cards should be made as DIMM socketed FPGAs. FPGAs only have a niche market currently, and almost none of the applications are for the average home user.
    • Depends on what you're trying to do with the FPGA. Are you modeling a new chip, and you want the ability to poke around anywhere in the insides? Or are you modeling an I/O device ASIC that needs to have lots of inputs and outputs? For the latter, probably a memory interface is bad. For the former, maybe it's better. Are you planning to use it for a processor adjunct, like an MPEG encoder? Maybe a memory bus connection is just about right. How much do you need to interface with the outside world? Is your primary application "A Grad-Student Project That Enables Other Grad-Student Projects"? In that case, a memory-bus interface would be cool, and if nobody's done it for the last 3-5 generations of processor/bus architecture, that makes it even cooler :-)

      If you're trying to explore new coprocessor architectures, it's an interesting thing to try - certainly better than hanging coprocessors out on a PCI bus somewhere. Of course, these days, CPUs are fast enough that it's difficult to find applications that really need enough more horsepower than general-purpose processors can provide, but there are still enough edgy things to try that it could be worthwhile.

  • Anonymous Coward writes: "People at the at Chinese University of Hong Kong ..."
    --> Shouldn't it be "The People's University of China"?
  • Would there be any speed advantage to using a reconfigurable chip vs. a programmable DSP for a very processor-intensive task, like MPEG encoding or real-time full-screen graphics rendering? (Think Fractal Flames [draves.org] as a music visualization plug-in. Assume that all your algorithm code can fit in on-chip cache or high-speed L2, so you don't clog up the memory bus.)
    • by svirre ( 39068 ) on Sunday November 04, 2001 @07:10PM (#2520311)
      A DSP is just a very specialized CPU, primarily focusing on math-intensive stuff, but less on branching and conditionals.

      Like any CPU, they are sequential devices: they load an instruction, decode it, execute it, and repeat. Though a modern DSP can parallelize many instructions, its resources are still statically allocated at design time. A DSP with two multipliers may perform at most two multiplications at any one time.

      Using an FPGA, on the other hand, allows you to design the circuit from the ground up. Now if your algorithm needs to do 20 multiplications at a time, you can do so simply by building them on the device.

      Using an FPGA is fundamentally different from using a DSP or microcontroller/processor. The latter is a finished circuit with an assortment of operators selectable by an instruction opcode. The former can be configured into any circuit.
  • by Space cowboy ( 13680 ) on Sunday November 04, 2001 @04:37PM (#2519719) Journal
    There are several FPGA cpu's available already. For loadsadetails, go to http://www.fpgacpu.org/ and see just how easy it is to create a CPU. I've even managed to (starting with Jan's work) build my own without any prior knowledge of verilog.

    The main drawback is always going to be speed though - it's simply far and away more complex to have reconfigurable hardware than static h/w. The current "hot" CPU of any generation will almost certainly never be reconfigurable!

    Simon.
  • Asus Board (Score:2, Funny)

    by tempmpi ( 233132 )
    I find it very interesting that a Chinese university is allowed to use a board produced in Taiwan. Maybe it is just too hard to find a board that isn't produced in Taiwan.
    • Well, China believes Taiwan is a part of China , so why shouldn't it use Taiwanese boards?
    • Re:Asus Board (Score:1, Informative)

      by Anonymous Coward
      "Chinese" in the name "Chinese University of
      Hong Kong" refers to "Chinese language" (which
      is obvious from the Chinese name of the
      university), so politics are irrelevant.
    • You have demonstrated an amazing lack of understanding of current Chinese politics... First of all, the university is in the Hong Kong SAR (Special Administrative Region), which, while technically a part of the PRC, has its own economic system and is pretty independent from the central government. Also, the mainland Chinese don't have any real problem with Taiwan, and they believe that it is in fact a part of their country. Taiwanese citizens can buy land in China, for example. They aren't happy with the current government, though. But they wouldn't have much of a problem with Taiwanese motherboards (especially since Taiwan is the only country in the world where motherboards are made... which was the point of your joke, I know)
  • by C0vardeAn0nim0 ( 232451 ) on Sunday November 04, 2001 @04:41PM (#2519738) Journal
    as expansion slots were used by a few companies that sold G3 expansion cards for older PowerMacs.

    IIRC, they had an expansion card that you'd attach to the cache slot near the original PowerPC CPU.

    This way the new CPU would have all the memory bandwidth it needed to run at 400 MHz. 400 MHz in a Performa 6200... wow!
  • by Anonymous Coward
    Most mobos only come with 2-3 memory slots. 4 if you're lucky, more if you're paying through the nose for a server mobo.
    • You can spare a memory slot for the FPGA board, if your mobo has three slots.
      Memory has become ridiculously large and cheap. 512MB boards are under $50. I'm sure there are people who need more than 1GB on a non-production machine (obviously production machines like database servers need all they can fit), but for most applications, by the time you need to fill the third memory slot on your box, you could just as well buy a new card that's 4X larger than the old one you're rolling out.

      If you've only got two slots, you may have problems, but usually the main time you need the third slot is if you're upgrading a machine and want to keep the old memory as well. And sometimes you've got a board that doesn't have enough address lines or has a BIOS that doesn't understand them (my home machine *says* it can use 3x256MB memory, but it looks like I'll have to flash a new BIOS to do it, and so far 192MB has been plenty.)
      If you don't have something else special to do, like FPGA, you might as well keep the old RAM - doesn't hurt, and more memory is always usable as long as it doesn't force you to a lower speed. I recently added 512MB to a 128MB machine at work, giving 640MB. Bill Gates says that ought to be enough for anybody :-)

  • http://booya.dorm.duke.edu/temp/fccm01_pilchard.pdf
    Suck that bandwidth up.
  • by Bowie J. Poag ( 16898 ) on Sunday November 04, 2001 @04:56PM (#2519782) Homepage


    If Gates were reprogrammable, then we wouldn't be in this mess in the first place.

  • http://www.fpgacpu.org/log/aug01.html#010821-dimm

    On FPGAs as PC Coprocessors, redux:
    http://www.fpgacpu.org/log/aug01.html#010811

    On FPGAs as PC Coprocessors (1996):
    http://www.fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html

  • Let's see, you'd need to modify whatever OS you've got to NOT use everything it has, modify the BIOS to NOT check the memory status in certain areas (or tell the peripheral to emulate being RAM for the moment), and then to actually use it you'd have to poke values into memory and peek at them after blowing off a couple of clock cycles (kinda like on a Commodore 64). Heck, isn't that how normal buses work anyhow? Just slower, and with a way to find your peripheral other than memorizing memory locations?

    Then again I hardly know squat about how this stuff works, apologies to those who know what they're talking about. (:
  • as was stated in the main post, it was just *developed*. Do you even try to think about what you write before you write it?
  • FPGA's are really a pretty neat piece of hardware. They're cheap, easily constructed memory arrays. I've had experience using them in my digital design class at Georgia Tech. Basically you use a piece of software to plunk down whatever gates you want, and the software compiles your schematic into a series of truth tables that get loaded onto the chip. There were a few articles on Slashdot in the past that really interested me. One was on the reprogrammable "supercomputer desktop" [slashdot.org] and the other was on using computers with FPGA's that could evolve [slashdot.org] to perform tasks faster. Actually, the computer optimized the FPGA to use electromagnetic noise from other cells in the chip to perform the same task. It used the FPGA's in ways the current paradigm never intended. Imagine a computer that can evolve to work faster....


    If someone is looking to tinker with some (F)PGA's, I would recommend Altera's student kits and software [altera.com]. You can use the standard part schematics included or you can define your own using VHDL. Only $150, or $105 if you're a Georgia Tech student.

  • I am afraid that I must disagree with many of the comments posted concerning FPGAs. First off, FPGAs have been successfully demonstrated in the multiple-GHz frequency range using SiGe as a base material (Dr. Jack McDonald's group at RPI has done such an implementation with SiGe BiCMOS based systems). Further, the contention that FPGAs are "difficult" to program is, I believe, an oversimplification of the hardware/software relation in general. Are FPGAs more difficult to program than implementing C++ code on a PC? Yes, but they are also significantly more powerful pieces of hardware than the current computer architecture. For example, one of the most visionary uses of high-speed FPGAs would be to replace component cards in the PC of today. In a base system today, one typically has a video, audio, and I/O type card (i.e. hard disk/floppy disk/CD-CDRW-DVD). Imagine now a computer that consists of a large number of FPGAs: essentially reconfigurable hardware. Now drivers can be reset on the fly, power-up-ready OSs with no boot time (using non-volatile configurations) become possible, and a host of other interesting and desirable properties follow. If you want to send email, the FPGA bank can reconfigure itself into a network or wireless ethernet card. This has some significant advantages over the current paradigm.

    Several readers commented that the adoption of FPGAs is not going to happen quickly (i.e. no development support) or that the problems with bus interface speeds are nontrivial. However, these difficult problems are not the fault of the hardware, and attempting to interface it to the standard PC, however kludgy, is a rational approach. Criticizing the implementation here is a bit like telling someone that they should have used a Porsche instead of a Pinto to build a time machine.

  • Well, while I was doing my final-year project at the Department of Computing, Imperial College London, we had the actual card made by the Chinese University of HK. Basically it is just a matter of having a SDRAM controller "programmed" onto the FPGA, and wired correctly (it uses the Xilinx Virtex series of FPGA, and Xilinx has implemented a SDRAM controller on it - see their tech notes - their site is down as I write this). Having done my project with FPGAs, I would say the problem with this thing is that it is very fiddly to program - it requires an understanding of software as well as hardware. I also agree with some comments above that it lacks interrupts on the SDRAM bus - therefore it is even more difficult to program the card.

    Anyway, PCI FPGA cards have always been available, and they are hugely expensive. But they are coming down in price. One of the problems with FPGAs is that the chips are slow (depending on the complexity of the circuit, you can only clock it to around 1GHz for very simple cores, a lot slower for complex circuits), so considering the speed of microprocessors it is not worthwhile to use them in normal computer systems - but a new niche is opening up in the embedded market.

    Notice the name of the university is right! The Chinese name of the university suggests that the "Chinese" in the university's name actually means the Chinese language, not China the country.
  • The closest thing like this I have seen in person is my Mac desktop machine. It was a PowerMac 6400 (a PPC 603e fixed to the motherboard). It was never meant to be upgraded. 3rd-Party upgraders developed G3 processor cards that snapped into the L2 Cache slot. The card then tells the original 603e CPU to "sleep" and the L2 mounted G3 takes over all CPU functions. It works perfectly.
  • Back when machines came with 4K of memory (yes, K, not M), but could address 64K, it wasn't uncommon to memory map devices. It was an easy thing to do; a few discrete nand gates (7400 series ICs) for decode logic and you were done. Since a lot of the code was written in assembler then, it was easy to move stuff in and out of the memory locations.

    Everything old is new again... I wonder how many other ancient techniques would be useful now...

    Anyone remember hardware memory swapping (bank switching)? You could take that machine with a measly 8GB memory limit and expand to 256 banks of 8GB for 2TB of memory (assuming you could afford all that) with a single memory-mapped bank-select byte. Only 1 memory access cycle to swap 8GB.

    Actually, that might be pretty cool. :-)
  • When I first heard about FPGA tech a few years ago, I realised that while it presented some interesting "tricky" problems, I completely failed to think of one application, both interesting and vaguely plausible, for which an FPGA would be the best practical "way to go". Recently I had several ideas, for which only I/O bandwidth seemed an unavoidable problem. Is there someone who can comment on the feasibility of implementing:-

    1. A bit-addressable cache ... By this I mean: accepting that main store is typically addressed by words of length 1, 2, 4 or 8 octets, and words can be read/written at word boundaries... can I use FPGA technology to allow me to write the first n bits from my register to an arbitrary bit offset in main store in a single processor cycle? This would be a boon for processing data structures for transmission/storage. Yes, storage and bandwidth are cheap, but in applications where serialisation is demanded, efficient and flexible data structures are very important for scalability. I see this becoming more of an issue as processor word sizes increase.
    2. Is it yet feasible to implement an AVL (red-black or similar) tree in hardware? I'd love to be able to consider sets a natively supported data structure:-)
    • well, i'm no guru, but in response to (1)...

      there are a lot of things going on here. by "main store" do you mean main memory? if so, forget about it. DRAM is never going to be as fast as the SRAM in your cache. IMHO, capacitors just can't do it. writing to cache is a different story, but now you're talking about cache control issues by having two devices dirtying up the cache. maybe this wouldn't be such a problem because the processor is going to know when it tells the "bit manipulator" to do its work.

      I have no idea what kind of clock speed you can get out of different FPGA's. I could look it up, but I won't. ASIC would probably be better for this. it's fast. the operation you're talking about is something like this (i think):

      Given an arbitrary 8-bit word, bits numbered 8..1, say we want to modify bits 4, 3, 2:

                x x x x x x x x
      AND   1 1 1 1 0 0 0 1   to get          x x x x 0 0 0 x
      OR    0 0 0 0 y y y 0   to end up with  x x x x y y y x

      So you need something that can do a word-length AND + OR in one clock cycle... Can you do it at 2GHz+? I have no idea...

  • The Chinese are the most technologically advanced people on earth, and will likely eclipse the Americans as world leaders in short order. As a nation, their industriousness and ingenuity are unmatched. While "fat cat" nations like the USA and those of Western Europe rest on their laurels of years long gone by, China forges ahead with a clear purpose and not a little justifiable umbrage at the rest of the world's perception of it. Advances like this FCPGA will more and more often come from China in the future.
    • Pilchard does look pretty cool. However, it looks like the Nuron board and the Penn State board both predate the Pilchard.

      Don't know what the status of the boards are, I just took a little offense at being called a fat cat.
  • The cheapest FPGA I know of is the Altera educational board; you can get it for about $150 if you are a student (I'm not sure what else, you might have to go to one of the schools (colleges) that participate in their program). For $150, you get surprisingly little performance. I'd be surprised if you could get a MIPS R3000 (one of the first MIPS CPUs, 32 bits and really slow) or an early SPARC CPU in one. I'd be really surprised if you could get a fast 486 or early Pentium in one.
  • It seems a lot of people think all peripherals are created equal. Considering that many high-end CPUs have more cache memory than the total memory of any PC produced before 1989, one could consider main memory to be "peripheral" already anyway.

    At any rate, placing the FPGA on the memory bus allows the massive amount of communication required to reconfigure the board to occur in a reasonable time, and also effectively gives you parallel processing for free. You treat part of the FPGA board as a second CPU (but not a general-purpose one) and the rest of it as memory. When you've got an instruction stream that is better adapted to the FPGA's current configuration, you write the data to its memory, let it run, and you can use the CPU for other tasks. Since most algorithms you'd want to run on the FPGA have a deterministic time of completion (even searches, because you can bound the time it would take to search the entire data set), you just read the results off the FPGA memory after a little while.

    Anyway I haven't read the article yet, so I'll go do that now.
