I conclude this from section 5.3, which I think states that MPTCP over two links was slower than ordinary TCP over a single link when the message size was 30KB.
For very small flow sizes (less than about 30KB), MPTCP should not try to create additional subflows, because the whole data fits in the initial window of the first subflow. However, at the moment the Linux implementation always tries to establish new subflows. In the paper's stress-testing scenario these additional subflows just consumed CPU cycles, which explains the "bad" results for MPTCP with very small flows.
An easy fix would be to delay the establishment of additional subflows until a certain threshold of data has been sent or a certain time has passed.
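To make the idea concrete, here is a minimal sketch in C of that decision logic. This is not the actual multipath-tcp.org kernel code; the struct, the field names, and both threshold values are hypothetical, chosen only to illustrate the byte-or-time condition described above:

/*
 * Sketch of the proposed fix: the path manager opens extra subflows
 * only once the connection has pushed more than a byte threshold or
 * has been alive longer than a time threshold. All names are
 * hypothetical, not taken from the real implementation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SUBFLOW_BYTES_THRESHOLD (30 * 1024)  /* assumed 30KB cutoff, from the paper's result */
#define SUBFLOW_TIME_THRESHOLD_MS 200        /* assumed: roughly a couple of RTTs */

struct mptcp_conn {                 /* hypothetical per-connection state */
    uint64_t bytes_sent;            /* data pushed on the first subflow */
    uint64_t age_ms;                /* time since the connection was created */
    bool extra_subflows_started;    /* set once additional subflows exist */
};

/* Decide whether it is worth paying the CPU cost of new subflows. */
static bool should_open_extra_subflows(const struct mptcp_conn *c)
{
    if (c->extra_subflows_started)
        return false;               /* already done, nothing to decide */
    return c->bytes_sent >= SUBFLOW_BYTES_THRESHOLD ||
           c->age_ms >= SUBFLOW_TIME_THRESHOLD_MS;
}

int main(void)
{
    struct mptcp_conn small = { .bytes_sent = 10 * 1024, .age_ms = 5, .extra_subflows_started = false };
    struct mptcp_conn big   = { .bytes_sent = 64 * 1024, .age_ms = 5, .extra_subflows_started = false };

    /* A 10KB flow stays on one subflow; a 64KB flow triggers more. */
    printf("small flow -> %s\n",
           should_open_extra_subflows(&small) ? "open subflows" : "stay single-path");
    printf("big flow   -> %s\n",
           should_open_extra_subflows(&big) ? "open subflows" : "stay single-path");
    return 0;
}

With such a check in place, short flows would complete on the first subflow alone and never pay the setup cost, while long flows would still get the full multipath benefit after the threshold is crossed.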