A Generic PCI Based FPGA Coprocessor?

Posted by Cliff
from the programmable-hardware dept.
graveyhead asks: "Inspired by a recent Slashdot article, I came across this fantastic package from the fine students in the Configurable Computing Laboratory at BYU: JHDL, which is a set of open-source FPGA CAD tools. I am writing a proposal for a prototype system, and I need a 32- or 64-bit PCI card compatible with JHDL. I do not require any IP cores (Xilinx seems to want to sell me hundreds of 'em), since our project uses its own custom cores. Also, I will not be building additional hardware around the FPGA processor, so the package needs to be fully self-contained. I simply need the ability to use JHDL to program the FPGA device over PCI under Linux, execute my circuit with parameterized values, and return the result. I found these boards, but I'm not sure which is appropriate or compatible, or whether there are better alternatives in a similar price range (up to USD 2,000) that I am not finding. Preferably, I want a chip with lots (1-10M+) of programmable gates."

  • A bit overkill (Score:5, Interesting)

    by brejc8 (223089) on Saturday April 26, 2003 @08:10PM (#5816841) Homepage Journal
    1-10 mil gates is a very large number.
    Take a look at my MIPS on an FPGA [man.ac.uk]. That used fewer than 100k Virtex gates, including the MMU and other things you probably don't need.

    Also, why PCI? Why not talk to it via serial/USB/network? And why not make your own? We made these [man.ac.uk] for just over £100 ($150) each (plus the Virtex). Having the board outside the PC gives you more freedom and external connections to do things like this [man.ac.uk]. It also lets you write simpler download software to program the thing (serial vs. PCI).
    • Re:A bit overkill (Score:3, Interesting)

      by Jamie Lokier (104820)
      How do you get a PCB like that made with all those components and connectors for just £100? Is it just a 2 layer board?

      Also, you didn't mention how much the Virtex cost. Last time I looked, which was about a year ago, they were many hundreds of pounds from RS. I've been told that with FPGAs the real price depends on where you buy them and your relationship with the supplier. How much did you get yours for?

      Thanks,
      -- Jamie
      • It's a four-layer board, but it costs about £50 to produce in quantities over 50, and the other components were standard cheap ones, making it just over £100 (including a Spartan, which is big enough for a 32-bit CPU). The Virtex is an extra, and its price depends on where you get it. Xilinx was nice enough to give us quite a few Virtex-Es, as we are a university; make a plea and they might be generous with you too. Bought separately, they cost from £100 up, depending on size, quantity and supplier.
    • Re:A bit overkill (Score:4, Interesting)

      by chriss (26574) <chriss@memomo.net> on Saturday April 26, 2003 @09:50PM (#5817088) Homepage
      Also why PCI? Why not talk to it via serial/usb/network? And why not make your own?

      I'm coming from the software side, so for me an FPGA has always been a kind of hardware accelerator for my software. My preferred idea is to plug one of these into my machine, learn to program it to do a specific part of, e.g., my web server, and never bother with the fact that it is really hardware. So I need the bandwidth of PCI to do something useful, and I do not want to touch a soldering iron. I'm aware that there are many more possibilities, but I assume that, in absolute numbers, more programmers than hardware engineers are actually interested in FPGAs, since there are many more programmers than hardware engineers.

      Chriss

      • Exactly, thank you. This is what I'm after.

        In the future I'd like to be able to build high level mathematical expressions and have the chip build my expression in circuit.

        The bandwidth is important because certain operations would block and wait. In a time sensitive operation, the less time spent waiting, the better.
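One way to picture "having the chip build my expression in circuit" is to flatten an expression tree into a list of operator cells, each of which a hardware generator would instantiate as an adder, subtractor or multiplier. The sketch below assumes a hypothetical encoding (nested Python tuples such as ("+", "a", ("*", "b", "c"))) and a made-up cell format; it is not JHDL's API, and `simulate()` is only a software stand-in for the resulting circuit.

```python
from typing import Union

# Hypothetical encoding: a variable name (str), an int constant, or a
# tuple (op, left, right) with op in {"+", "-", "*"}.
Expr = Union[str, int, tuple]
Cell = tuple[str, str, str, str]  # (output_wire, op, input_a, input_b)


def to_netlist(expr: Expr) -> tuple[list[Cell], str]:
    """Flatten an expression tree into operator cells plus the output wire name."""
    cells: list[Cell] = []

    def build(e: Expr) -> str:
        if isinstance(e, str):
            return e                    # primary input wire
        if isinstance(e, int):
            return f"const_{e}"         # constant driver
        op, left, right = e
        a, b = build(left), build(right)
        out = f"w{len(cells)}"          # children are emitted before parents
        cells.append((out, op, a, b))
        return out

    output = build(expr)
    return cells, output


def simulate(cells: list[Cell], out: str, inputs: dict[str, int]) -> int:
    """Evaluate cells in order: a software stand-in for the generated circuit."""
    ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
    values = dict(inputs)

    def read(wire: str) -> int:
        return int(wire[len("const_"):]) if wire.startswith("const_") else values[wire]

    for wire, op, a, b in cells:
        values[wire] = ops[op](read(a), read(b))
    return read(out)
```

For example, ("+", "a", ("*", "b", "c")) flattens to a multiplier feeding an adder. On an FPGA, cells that do not depend on each other would evaluate in parallel rather than in the sequential order of the Python loop.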

      • Re:A bit overkill (Score:4, Interesting)

        by megabeck42 (45659) on Sunday April 27, 2003 @12:49AM (#5817600)
        Well, why constrain yourself to the PCI level? If you recall, the Pilchard is exactly what you suggest: it's a Virtex FPGA attached to the PC133 memory bus. You interface with the board using regular push/mov/movsw instructions. The author was able to demonstrate almost a gigabyte/second of DES, much faster than any other PCI accelerator. Furthermore, you could do fancy offloading using DMA, e.g., DMA from the Ethernet card into a ring buffer maintained by the FPGA, and have the FPGA preprocess the data.
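The ring-buffer offload described above can be modeled in software. This is a minimal sketch, assuming a single producer (say, the Ethernet card's DMA engine) and a single consumer (the FPGA), with a power-of-two capacity so slot indices can be computed with a mask; the `RingBuffer` class is a hypothetical illustration, not a Pilchard or driver API.

```python
class RingBuffer:
    """Fixed-size single-producer/single-consumer ring buffer.

    head and tail are free-running counters; the slot index is the counter
    masked by (capacity - 1), so capacity must be a power of two.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a positive power of two")
        self.slots: list = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # next slot the producer writes
        self.tail = 0  # next slot the consumer reads

    def __len__(self) -> int:
        return self.head - self.tail

    def push(self, item) -> bool:
        """Producer side (e.g. DMA from the NIC). Returns False when full."""
        if len(self) == len(self.slots):
            return False
        self.slots[self.head & self.mask] = item
        self.head += 1
        return True

    def pop(self):
        """Consumer side (e.g. the FPGA preprocessing stage). None when empty."""
        if self.head == self.tail:
            return None
        item = self.slots[self.tail & self.mask]
        self.tail += 1
        return item
```

In real hardware the head and tail counters would live in registers both sides can read, and a driver would also have to handle memory ordering, which this Python model ignores.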

        For a truly general approach, you would need another card in the system, probably PCI, to be able to program the FPGA, and it would allow the FPGA to raise an interrupt if necessary. Personally, I would be extremely interested in one of these because of the phenomenal possibilities it allows.

        For example, it would make the GNU Radio project much easier: build an FPGA design with all the requisite processor-intensive FIR/IIR filters, FFTs, and Viterbi decoders.

        Furthermore, you could hook the FPGA up to a high-speed A/D converter and use the FPGA to do the initial signal processing. That would really make a difference for very low-latency sound processing, for example.

        You could implement custom cache functions in hardware for databases, and you could accelerate SSL like mad. With open cores, you could patch libmad, etc., to use the FPGA when it's available, accelerating MPEG audio decoding, for example.

        For some reason I can't access his web page, but his work is at http://www.cse.cuhk.edu.hk/~phwl/ [cuhk.edu.hk]

        If you're interested in this route, and want some help, shout.
      • Fellow moderators: this is not a troll. :)

        I'm coming from the software side, so for me an FPGA has always been kind of a hardware accelerator for my software.

        Then please, find a hardware engineer to help you. I make my money as an ECE (working with FPGAs since 1995), and let me tell you that the worst VHDL/Verilog/whatever source code that I have seen is always written by software programmers. Unless you are a programmer that can context switch into thinking about what happens to your data at the gat
  • It's probably a very bad suggestion, but current video cards do come to mind...

    They're basically a very fast and very programmable math coprocessor that runs 3D-to-2D software. Maybe it's possible to disable the RAMDAC and just use the video chip?
    • by Jamie Lokier (104820) on Saturday April 26, 2003 @09:24PM (#5817032) Homepage
      I believe video coprocessors tend to be quite specialised for video ops.

      Even if they weren't, programming a maths coprocessor is very different from programming an FPGA, and the things you can do are very different.

      An FPGA is a programmable logic circuit, and you can connect almost any kind of digital electronics to it. From flashing lights to memory to network interfaces to bus interfaces (like PCI and USB) to whatever else turns you on.
      • I can't think of a good use of an FPGA for video off the top of my head, frankly. Almost anything that an FPGA could do for video coprocessing could be done better, cheaper and faster with a DSP. That doesn't mean there aren't a lot of things I haven't thought of; this is not my bread-and-butter tech zone, and I make no pretense of expertise. Just offering my $.02 in hopes that it may be useful, if only to elicit the rebuttal of the better informed.
        • I've got one for you, real-time video processing.

          And DSPs are good for stuff like this too. But they are generally less suited for large data sets. (Like operating on entire images or even multiple images.)

          Also, it might be that encoding and such are easier to do fast with FPGAs than with DSPs. I haven't really looked in detail at what modern video codecs require, though.

          But there's certainly a place for DSPs as well. So let's put both DSPs and FPGAs on our PCI "coprocessor". ;-)
          • Well, the obvious difference between the two is that you can implement a DSP (or any other kind of processor) using an FPGA, but not vice versa. However, it's a lot more expensive per gate on the FPGA; there's always a tradeoff between cost and flexibility. If you know you're going to be doing a lot of the type of signal processing for which DSPs were designed, using a DSP is probably the way to go. On the other hand, if you might be doing some signal processing, or some DESing, or experimenting with a com
      • But aren't FPGAs essentially SRAM? Have you ever looked at Xilinx's or Altera's patents?
  • by graveyhead (210996) <`fletch' `at' `fletchtronics.net'> on Saturday April 26, 2003 @11:04PM (#5817317)

    The experience of looking for this card got me to thinking:

    Does anyone else remember that, back in the day, OrangeMicro [orangemicro.com] used to sell a card [orangemicro.com] (now discontinued :( ) for Macintoshes that put a fully working PC on a PCI card in your Mac? In fact, I think I still have that lying around here somewhere! You could switch between Windows 95 and Mac OS, both running on native hardware, by hitting Command-Enter. It was very neat, like VirtualPC except on an actual Pentium instead of a virtual one.

    Anyhow, the inspiring part of the old OrangePC in this case is the multifunction cable attached to the back. It was a truly monstrous wonder. One end was a huge cable with hundreds of pins that plugged into the PCI card, and the other end split off into VGA video (which could pass through the Mac signal, or interrupt it and output the PC signal), audio I/O, two serial ports, a parallel port and a game port! It was truly an engineering masterpiece :)

    So why doesn't someone build a generic PCI device with such an awesome cable attached? It would give a whole new meaning to opencores.org [opencores.org]. Software could be written to drop in an arbitrary core and turn your card into any device you desired that minute. Remember what Homer says: "Aww, I want it now!" With such a device, you could have it, or build it yourself right from your desktop if you were so inclined ;) For example, if the bass is rattling on my friend's new album and we want to try cutting off frequencies below 10 Hz:

    • install open extensible DSP core
    • install custom logic: if (f < 10) v = 0
    • Play/record through card
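The "if (f < 10) v = 0" line is pseudocode for an ideal cutoff. A common practical stand-in is a high-pass filter; the sketch below swaps in a first-order (one-pole) RC high-pass filter, assuming 44.1 kHz samples and a 10 Hz cutoff. The function name and defaults are illustrative, not part of any existing DSP core.

```python
import math


def high_pass(samples: list[float], cutoff_hz: float = 10.0,
              sample_rate: float = 44_100.0) -> list[float]:
    """First-order RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)                      # close to 1 for a low cutoff
    out: list[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A single-pole filter only rolls off at about 6 dB per octave below the cutoff, so an actual "cut below 10 Hz" would likely use a higher-order design; in hardware each sample costs roughly one multiply and a couple of additions.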

    You could even do what OrangePC did and drop a whole processor/OS combination (or develop one) on the board and seamlessly switch between it and the host OS. If the card had multiple FPGAs, it could even drive multiple custom devices simultaneously.

    Bye bye PCI hardware vendors (except ones to make the general purpose boards). Next, let's build an AGP8x version (with a stable on-board backup VGA core, just in case ;) and set our sights on NVidia and ATI! Now, any volunteers to build an open-source OpenGL accelerated VGA core? All it is is a couple of multipliers, right? ;)

    The problem may be that no PCI/AGP vendor in their right minds would ever build such a thing, because it would replace all their products. Still, it's fun to dream about such a useful piece of hardware.

    • > The problem may be that no PCI/AGP vendor in their right minds would ever build such a thing

      You are forgetting one important point: FPGAs can never be as fast as custom-made silicon.
      AFAIK nVidia uses Xilinx stations to emulate their cores at some kHz frequency.
      (sorry, couldn't dig up the link :8) /winke
      • You're correct; I've read the same article. It was from the Anandtech tour of an Nvidia plant, I'm 99% sure. They looked at a pre-fab lab that had a large IKOS box, on which they were emulating the NV30 core at a speed ratio of about 1/10,000 or lower. The article is located here on Anandtech [anandtech.com].

        Basically, what it boils down to is that it would work great for lower-performance applications like sound, but would fail miserably at higher-demand applications such as video processing.
      • The company QuickTurn [quickturn.com] specializes in boxes whose purpose in life is to simulate a CPU or other silicon design in a giant array of FPGAs. Many silicon vendors make use of these machines.

        --Joe
      • I'm positive that NV could run the cores in the high MHz range, not KHz. Modern FPGAs can run at up to 200-300 MHz IIRC.

        But in large volumes (like the ones NVidia operates with), FPGAs are MUCH more expensive than ASICs. For low volumes, the NRE (nonrecurring expense) of an ASIC makes its per-unit price higher, but over a certain number, the ASIC gets cheaper. NV could probably release an FPGA-based video accelerator nearly as fast as the current units, but it would cost $2000 instead of $200.
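The crossover the comment describes is simple arithmetic: an ASIC costs its NRE up front plus a low per-unit price, while an FPGA has no NRE but a higher per-unit price, so the ASIC wins once N * (FPGA unit price - ASIC unit price) >= NRE. A minimal sketch; the dollar figures in the usage line are made-up placeholders, not actual NVIDIA or Xilinx costs.

```python
import math


def break_even_units(asic_nre: float, asic_unit: float, fpga_unit: float) -> int:
    """Smallest volume N at which asic_nre + N * asic_unit <= N * fpga_unit."""
    if fpga_unit <= asic_unit:
        raise ValueError("FPGA unit cost must exceed ASIC unit cost")
    return math.ceil(asic_nre / (fpga_unit - asic_unit))
```

With a hypothetical $1,000,000 NRE, $20 ASIC parts and $120 FPGA parts, `break_even_units(1_000_000, 20, 120)` gives 10,000 units; above that volume the ASIC is cheaper overall.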
      • However,
        As you move to smaller process technology, speed and gate counts tend to increase exponentially (Moore's law and all that). But mask costs also increase, which means FPGAs definitely are the future of IC design for economic reasons. As we move to 90 nm and below, the speed difference between ASIC and FPGA will matter much less than the overall price difference of bringing a product to market.
        Not only does an FPGA allow you to use one mask to create thousands
  • by wowbagger (69688) * on Saturday April 26, 2003 @11:21PM (#5817365) Homepage Journal
    The biggest problem with the idea of generic FPGA accelerators is sharing the resource in a multitasking environment.

    Suppose you had a nice 10-million-gate FPGA on your PCI bus. If you program it to do one thing and one thing only, there's no problem. So if that FPGA is set up to be a hardware DivX encoder and that's it, all is well.

    But suppose you fire up your video editor, and it wants the FPGA to be a DivX encoder. Then you fire up SSH, and SSH wants the FPGA to be an encryption engine.

    It is currently VERY hard to dynamically reprogram sections of an FPGA (some do support partial reprogramming while running, but not all). It is also very hard for the hardware compiler to merge tasks - you would have to re-generate the layout of the FPGA each time you added a function.

    What we need is the equivalent of a malloc() call to allow a single large FPGA to be used by multiple applications at the same time (until you run out of gates).

    Once that exists, then you will see vendors making generic FPGA accelerators.

    Until then, FPGA boards will be the province of a very few people, and prices won't come down very much, I fear.
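The malloc() analogy can be made concrete with a toy first-fit allocator. This sketch rests on a large simplifying assumption: that the fabric divides into equally sized, independently reprogrammable regions. Real partial reconfiguration is constrained by device-specific region layouts and routing, and the `FabricAllocator` class is hypothetical.

```python
from typing import Optional


class FabricAllocator:
    """Toy first-fit allocator over `total` equally sized FPGA regions."""

    def __init__(self, total: int) -> None:
        self.owner: list[Optional[str]] = [None] * total  # None means free

    def alloc(self, name: str, size: int) -> Optional[int]:
        """Reserve `size` contiguous free regions for `name`; return the start index."""
        if size <= 0:
            raise ValueError("size must be positive")
        run = 0  # length of the current run of free regions
        for i, who in enumerate(self.owner):
            run = run + 1 if who is None else 0
            if run == size:
                start = i - size + 1
                for j in range(start, i + 1):
                    self.owner[j] = name
                return start
        return None  # out of gates, or free space too fragmented

    def free(self, name: str) -> None:
        """Release every region held by `name`, like free()."""
        self.owner = [None if who == name else who for who in self.owner]
```

Even this toy shows the fragmentation problem: after `free()`, the free regions may not be contiguous, and a real system would also have to place and route each core it loads.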
    • I don't see this as a problem. There are plenty of resources that can be used by only one application at a time. That may make them slightly less useful, but they are still a lot more useful than not having them at all.
  • Hey...heads up! (Score:3, Informative)

    by cybermace5 (446439) <g.ryan@macetech.com> on Sunday April 27, 2003 @12:58AM (#5817626) Homepage Journal
    *tosses you a link* The FPGA-FAQ Development Boards List [fpga-faq.com]

    Contains a BIG listing (updated at the end of last month) showing boards and prices when available. Also lets you know if it's a PCI board or whatever.

    I really doubt you will need a million or more gates. What are you trying to do, physically model a barrel of quarters? Is your code going to be as complex as a late-model Pentium?

    To start out with, a 600k-gate Spartan-IIE-ish chip will have lots of room and won't break the bank. A megagate Virtex, on the other hand, will eat into your budget for the chip alone.

    But check out that link and you'll probably find something.
    • Re:Hey...heads up! (Score:3, Interesting)

      by graveyhead (210996)
      Well, first off, thanks. That list is great.

      What are you trying to do, physically model a barrel of quarters?

      As fun as that sounds, no. Actually I am interested in genetic programming and a huge bottleneck is the time that it takes to test a generated program or circuit. Substantial time savings could be made if some of the calculations during that step could be offloaded to a coprocessor that was dynamically tuned for each generated program. This means that genetic runs with much larger populations that


  • I wonder if the libc code, or at least its larger functions, could be transferred to the coprocessor's memory during boot, with running programs then just passing data to it.

    Another idea is to send the driver code of all installed hardware to the coprocessor and just communicate with it in a standard way, so the coprocessor takes on the driver overhead. I don't think trying this with AGP, for instance, is a good idea, but it should work well with winmodems or CD-ROMs, which sometimes freeze the entire system.

    Yet ano
    • I doubt you'll see anything like this in real systems. First off, the big bonus with FPGAs is that they can crunch a /lot/ of data very fast. They can also do a lot of parallel computations at once. Using one to communicate with a modem would (IMHO) be an almost criminal misuse of hardware.

      A better alternative is to put algorithms, or parts of them, on an FPGA: typically simulations, intensive mathematical processes and similar. It's noteworthy that floating point is very expensive in hardware, so you pro
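Because floating point is expensive in gates, hardware designs often use fixed-point arithmetic instead. A minimal sketch, assuming a Q16.16 format (16 integer bits and 16 fractional bits, chosen here for illustration): multiplication becomes an integer multiply followed by a shift, which maps onto cheap hardware.

```python
FRAC_BITS = 16          # Q16.16: 16 fractional bits (an illustrative choice)
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16, rounding to the nearest representable step."""
    return round(x * ONE)


def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values: integer multiply, then shift the scale back."""
    return (a * b) >> FRAC_BITS
```

Python integers never overflow, so this sketch ignores the wraparound or saturation that a real 32-bit datapath would need to handle.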
  • These guys [edt.com] have an assortment of PCI cards with FPGAs and Linux drivers. I've worked with their boards for several years with success. Just don't forget cables when ordering.
  • FPGA boards are very expensive because they're not mainstream and are typically a real PITA to integrate into software projects. I worked on FPGAs for about a year. The industry has come a long way, and new tools and compilers are becoming available almost monthly, but it's not a cheap place to play. Even at the low end of the scale, you're talking about USD 1,000-1,500 for a board with a mid-sized Virtex on it.

  • I remember that Star Bridge Systems (www.starbridgesystems.com) is developing what they call "hypercomputers", basically large collections of FPGAs attached to a Windows host. They claim they can build the world's most powerful supercomputer (in terms of MIPS, which we all know really stands for Meaningless Integer Performance Statistic). In any case, they also have a software suite called Viva for developing and compiling software to the chips. I also seem to remember that chips are dynamically readjusted,
    • That's one of the links from the older Slashdot article that I linked to in this Ask Slashdot submission.

      Their work is very cool, but their machines are very expensive. Also, I am looking at the PCI solution as a possible upgrade solution for an existing cluster, so it needs to be cost effective and fit in the boxes currently doing the job.

      cybermace5 posted a great list earlier that led me to this neat card [insight-electronics.com] which may do the job nicely for $250 US! It doesn't have quite the gate count that I wanted, but at