Comment Re:Some questions for those who may know... (Score 1) 103
I'm assuming this thing is MIMD. Separate processors with separate memory seems to me to imply that. Am I right?
- Yes, you are correct. Epiphany is MIMD.
How does this design relate to the old Inmos Transputer, which, from what I recall, was conceptually fairly similar? Is it a development of the same ideas, or is it something completely different?
- Transputers are now fairly antiquated - the early parts were 16- and 32-bit engines from the mid-1980s. Still, the basic concepts are pretty similar. It's been a long time since I thought about Transputer implementation details, but the biggest, most obvious difference in my mind is the standard, open programming environment... Inmos was very pigheaded about supporting only OCCAM; the CTO was quoted as saying "we'll support C when someone can show me something that can be coded in C that can't be coded in OCCAM"... clearly he was missing the point, so they missed the market.
My understanding, so far, for what it's worth, is that the key features of the Epiphany architecture are:
- Each processor is able to address four other processors directly, through its own router, implying a grid or surface architecture rather than a cube or hypercube architecture.
- Not exactly correct. Any core can directly address any other core, but with added latency beyond the nearest neighbors. Theoretically, cubes and other topologies could be implemented, but Epiphany is most naturally suited to a 2D grid.
There is a single flat memory map, of which each processor 'owns' a local page; each processor can read from (and write to?) pages belonging to other processors. Presumably there is some overhead in accessing other nodes' memory?
- I'm not a SW guy, but I don't think the concept of 'owning' is quite right. Each processor node physically has local memory, but that memory can be 'owned' (used) by another processor. Assuming a non-blocked, optimized network and an available memory resource, accessing any memory on the same chip has no bandwidth overhead (i.e. the bandwidth can be the same as for the processor's own local memory); the only cost is the added latency of going through the routers (1.5 cycles per node).