AFAIK, the problem with VLIW processors in general is that they attempt to exploit instruction-level parallelism (ILP).
This is an entirely different beast to what's presented in the paper.
Instruction-level parallelism occurs when, within a (fixed) window of the instruction stream, two or more instructions have no dependencies on one another.
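To make that concrete, here's a rough Python sketch of checking a small window for independent instructions. The Instr type, the register names, and the three-instruction window are all made up for illustration, and it only looks at direct dependencies between a pair (it ignores chains that run through a third instruction), so treat it as a toy rather than how any real compiler or CPU does it.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str      # register written (hypothetical names)
    srcs: tuple    # registers read

def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` has a direct dependency on `earlier`."""
    raw = earlier.dest in later.srcs   # read-after-write
    war = later.dest in earlier.srcs   # write-after-read
    waw = later.dest == earlier.dest   # write-after-write
    return raw or war or waw

def independent_pairs(window):
    """Index pairs (i, j), i < j, with no direct dependency between them."""
    return [(i, j)
            for i in range(len(window))
            for j in range(i + 1, len(window))
            if not depends(window[j], window[i])]

window = [
    Instr("r1", ("r2", "r3")),   # 0: r1 = r2 + r3
    Instr("r4", ("r5", "r6")),   # 1: r4 = r5 * r6  (independent of 0)
    Instr("r7", ("r1", "r4")),   # 2: r7 = r1 - r4  (needs both 0 and 1)
]
print(independent_pairs(window))   # [(0, 1)]
```

Only instructions 0 and 1 could go in parallel here; instruction 2 has to wait for both.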
The VLIW paradigm is to have bundles of instructions, where each bundle contains only instructions that can be executed simultaneously. This shifts complexity from the hardware[1] to the compiler.
Unfortunately, ILP can be very difficult to extract from arbitrary code, though cases exist where it's trivial.
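A minimal sketch of what the compiler's job looks like, assuming a very naive greedy in-order packer (real VLIW compilers reorder code and deal with functional-unit constraints and latencies, none of which is modelled here). The pack_bundles function and the (dest, srcs) instruction format are hypothetical, just for illustration.

```python
def pack_bundles(instrs, width):
    """Greedy in-order packing of (dest, srcs) tuples into bundles.

    An instruction joins the current bundle only if it has no
    dependency (conservatively: RAW, WAR or WAW) on anything
    already in that bundle and the bundle is not full.
    """
    bundles = []
    current = []
    for dest, srcs in instrs:
        conflict = any(
            d in srcs or dest in s or dest == d
            for d, s in current
        )
        if conflict or len(current) == width:
            bundles.append(current)   # close the bundle, start a new one
            current = []
        current.append((dest, srcs))
    if current:
        bundles.append(current)
    return bundles

# The trivial case: four unrelated operations fill one 4-wide bundle.
independent = [("a", ("x", "y")), ("b", ("x", "z")),
               ("c", ("y", "z")), ("d", ("x", "w"))]

# The hard case: a dependency chain, one instruction per bundle.
chain = [("a", ("x",)), ("b", ("a",)), ("c", ("b",)), ("d", ("c",))]

print(len(pack_bundles(independent, width=4)))   # 1
print(len(pack_bundles(chain, width=4)))         # 4
```

In the second case three quarters of the issue slots go unused, which is the basic problem when the code doesn't have much ILP to begin with.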
[1] Later RISC chips and today's non-mobile CPUs take advantage of ILP through the use of multi-issue out-of-order execution. Out-of-order execution typically defers execution of any given instruction until all its dependencies have been fulfilled, e.g. memory/cache accesses have occurred, previous results are available, etc. By making these units multi-issue, the CPU dynamically exploits ILP to the extent the available hardware allows, with no recompilation required (though it may help).
These hardware techniques are only slowly coming to the mobile arena because they are relatively expensive transistor-wise.
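For the footnote, here's a toy model of multi-issue out-of-order execution, heavily simplified. It assumes every register is written exactly once (so register renaming is a non-issue), an unbounded instruction window, and made-up latencies; run_out_of_order and the program below are hypothetical. The point is only that the same instruction stream finishes sooner on wider hardware without being recompiled.

```python
def run_out_of_order(instrs, width):
    """instrs: list of (dest, srcs, latency). Returns the finish cycle.

    Each cycle, issue up to `width` of the oldest pending instructions
    whose sources are all available. Sources that no instruction
    produces are treated as program inputs, ready from cycle 0.
    """
    produced = {dest for dest, _, _ in instrs}
    ready_at = {}                      # dest -> cycle its result is available
    pending = list(range(len(instrs)))
    cycle = 0
    finish = 0
    while pending:
        issued = []
        for i in pending:
            if len(issued) == width:
                break
            _, srcs, _ = instrs[i]
            if all(s not in produced
                   or ready_at.get(s, float("inf")) <= cycle
                   for s in srcs):
                issued.append(i)
        for i in issued:
            dest, _, latency = instrs[i]
            ready_at[dest] = cycle + latency
            finish = max(finish, cycle + latency)
            pending.remove(i)
        cycle += 1
    return finish

program = [
    ("a", ("p",), 3),        # a = load p   (assumed 3-cycle latency)
    ("b", ("q",), 3),        # b = load q
    ("c", ("a", "b"), 1),    # c = a + b
    ("d", ("x", "y"), 1),    # d = x * y
    ("e", ("d", "z"), 1),    # e = d + z
]

print(run_out_of_order(program, width=1))   # 5
print(run_out_of_order(program, width=2))   # 4
```

Note that even at width 1 the machine runs d and e while the loads are outstanding instead of stalling on c, which is the out-of-order part; going to width 2 then overlaps the two loads as well.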