There is so much wrong with this study that I can hardly decide where to begin...
1) They used a single project of 4500 lines of code. That's quite small. The project may have been fairly easy to analyze/debug/maintain even if the code quality is crap.
2) No objective measure was given of the "bad code smells" identified in the selected project.
3) It is possible they did a very bad job refactoring the code base. To control for this, they should have had independent software professionals evaluate whether the "refactored" code was actually more maintainable.
4) There is no indication that they controlled for variability in the students' skill sets. The students who worked on the original code may have been much better students than those who worked on the refactored code.
5) The student group sizes were far too small to get meaningful results.
6) The "quiz" used to determine analyzability consisted of only 15 questions. That is far too small a sample to measure how analyzable the code is.
7) The mean analyzability scores were 7 vs. 6.63. This suggests the students in both groups may have had a poor understanding of the code bases.
8) The 10 refactoring techniques were not chosen based on their applicability to the system in question; they were instead chosen based on previous studies that ranked the impact of refactoring on code quality. In other words, they treated refactoring as a "silver bullet" rather than deciding which types of refactoring were most applicable to this particular project.
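To put points 5–7 in perspective: the study does not report its group sizes, but even under generous assumptions the 7 vs. 6.63 difference is statistical noise. Here is a rough back-of-the-envelope sketch, assuming a hypothetical 10 students per group and a per-student score standard deviation of 2 (both numbers are my assumptions, not the study's):

```python
import math

def welch_t(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Welch's t statistic for the difference of two independent sample means."""
    # Standard error of the difference between the two means
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se

# Reported means: 7 (original) vs. 6.63 (refactored).
# Group sizes and SDs below are hypothetical, since the study gives neither.
t = welch_t(7.0, 6.63, 2.0, 2.0, 10, 10)
print(f"t = {t:.2f}")  # roughly 0.41, nowhere near the ~2.1 needed for p < 0.05
```

With a t-statistic this small, you cannot distinguish the two groups from random variation, which is exactly why the tiny group sizes and the 15-question quiz doom the comparison.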
There are so many flaws in the methodology that the results are meaningless. The only reason to waste time with this sort of nonsense is publicity.