The high speed comes from the fact that they currently don't have any on-chip flash (flash is slower to access than SRAM and is typically what slows 32-bit microcontrollers down). That means this isn't a single-chip solution like most microcontrollers, though they are working on changing that.
Instead of flash, they store their program in the same SRAM used for data, which makes that 8 kB of SRAM far more limiting than it would be on a Cortex-M0 with the same amount of SRAM plus 16–256 kB of flash. Most microcontrollers use a Harvard architecture with separate program and data memory, so instructions can be fetched from flash while the core reads from and writes to SRAM. If they don't adopt something similar once they add flash, I wonder what sort of performance they'll see when they have to make regular reads from a slow flash memory in between SRAM accesses. Or will they just copy the entire program into SRAM? That isn't ideal for power consumption, since it requires a much bigger memory array than they would otherwise need, and the problem will only get worse as they try to compete with larger microcontrollers.
The Harvard architecture also has some security advantages: the system can be set up so that a very specific sequence of actions has to be performed before program memory can be written. With IoT devices, this sort of protection is becoming more important. It's not an issue at present, with only 8 kB of memory, but it's worth considering when thinking about this chip's future.