I recommend skimming the paper (second link in TFS), it's short and quite readable. At the very least, check out the provided sample of successful manipulations (PDF; the notation is explained on page 2).
Highlights include:
Our intuition that abstract principles would involve more moderate attitudes, and engender less detection, was not supported by the data.
The more the participants agreed or disagreed with a statement, the more likely they were to correct the manipulation.
But:
The overall rating of the non-detected manipulated trials was notably high. Using a 9-point scale, the average rating was 2.8 or 7.2 depending on the direction of the rating, which means that the average ‘distance’ being manipulated when a statement was reversed was 4.4 units on the scale. This is evidence that the participants cared about the issues involved, and expressed seemingly polarized opinions about the manipulated issues they failed to detect.
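The arithmetic behind that quoted "distance" is worth spelling out: on a 9-point scale, reversing a statement maps a rating r to 10 - r, so a non-detected average of 2.8 corresponds to a reversed average of 7.2, and the manipulated distance is the gap between the two. A minimal sketch (the reversal formula is my reading of a standard reversed Likert scale, not something the excerpt states explicitly):

```python
# On a 1..9 Likert scale, reversing a statement maps rating r -> 10 - r.
def reverse(rating: float, scale_max: int = 9) -> float:
    return (scale_max + 1) - rating

avg = 2.8                      # average rating of non-detected manipulated trials
reversed_avg = reverse(avg)    # 7.2, the same attitude seen from the other direction
distance = reversed_avg - avg  # 4.4 units, the 'distance' moved by the manipulation

print(reversed_avg, round(distance, 1))
```

The point of the computation: 4.4 units on a 9-point scale is more than half its range, so the undetected reversals were flipping genuinely polarized answers, not near-neutral ones.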
Of course, serious multiple-choice questionnaires often repeat the same questions with different wording (or with a reversed scale), precisely to limit problems with unreliable self-reporting. It would be interesting to see whether there's a correlation between giving consistent replies to differently worded versions of the same question and the ability to detect manipulations like those in this study. If so, multiple-choice questionnaires might be a useful tool after all.