I think the model wrongly assumes that elites draw down essential resources faster than commoners. In pre-modern societies, that appears to have been incorrect. What was drawn down in pre-modern civilizations was soil fertility, through over-farming, and not "resources" more generally. (For example, there is reasonably good evidence that soil degradation contributed to the collapse of the Western Roman Empire.) Elites do not consume much more food than commoners. As a result, I'm not sure it would make any difference how stratified society is. Take the chateaux of the Loire Valley as an example: they're extravagant, but the materials they're built from (such as stone) were never exhausted anywhere, and using them never threatened civilization.
In pre-modern societies, elites subsisted off the surplus labor which was left over after commoners had provided for their own subsistence. According to best estimates, this "surplus" labor available for exploitation by elites was never more than 20% of the total commoner labor available. Most labor in pre-modern societies went simply to producing enough food for everyone to survive. In ancient Egypt, more than 90% of the population spent all their working time on agriculture or household work, and similar ratios existed in other civilizations. As a result, the total consumption of elites in pre-modern society was never a large fraction of the total production of society. Some elites may have had extremely extravagant lifestyles compared to commoners, but that was possible only because elites were so few, generally much less than 1% of the population.
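To make the arithmetic above concrete, here is a rough sketch in Python. It is only an illustration under simplifying assumptions of my own: every commoner produces one unit of output, elites produce nothing, and elites consume the entire surplus (in reality much of it went to armies, transport and so on). The function name and its parameters are hypothetical, and the 20% and 1% inputs are just the upper-bound figures mentioned above, not measured data.

```python
def elite_consumption(surplus_share, elite_fraction):
    """Compare elite and commoner consumption under toy assumptions.

    surplus_share  -- fraction of commoner output taken by elites (assumed)
    elite_fraction -- fraction of the population that is elite (assumed)
    """
    if not 0.0 <= surplus_share <= 1.0:
        raise ValueError("surplus_share must be between 0 and 1")
    if not 0.0 < elite_fraction < 1.0:
        raise ValueError("elite_fraction must be strictly between 0 and 1")

    commoner_fraction = 1.0 - elite_fraction

    # Assumption: each commoner produces 1 unit; elites produce nothing.
    total_output = commoner_fraction * 1.0

    # Assumption: the whole surplus is consumed by elites.
    elite_total = surplus_share * total_output
    commoner_total = total_output - elite_total

    elite_per_capita = elite_total / elite_fraction
    commoner_per_capita = commoner_total / commoner_fraction

    return {
        "elite_share_of_output": elite_total / total_output,
        "per_capita_ratio": elite_per_capita / commoner_per_capita,
    }


if __name__ == "__main__":
    # Upper-bound figures from the text: 20% surplus, 1% elites.
    result = elite_consumption(surplus_share=0.20, elite_fraction=0.01)
    print(f"Elite share of total output: {result['elite_share_of_output']:.0%}")
    print(f"Elite vs commoner consumption per head: {result['per_capita_ratio']:.2f}x")
```

With those inputs, each elite consumes roughly 25 times as much as each commoner, yet elites as a group still account for only a fifth of total output. That is the point: very lavish individual lifestyles are compatible with elites making up a small part of society's total consumption.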
Another important consideration here is the difference between a reduction in population and the collapse of some political order. As far as I can tell, soil degradation often leads to a gradual reduction in population over centuries until some political order suddenly cannot be sustained. Often, ancient civilizations were empires in which some center had a large army and long transportation networks. The empire dominated a group of subject peoples on the periphery, extracted the products of their surplus labor beyond subsistence, and transported those surplus products to the center. Usually, the subject peoples disliked being so dominated. It seems possible to me that soil degradation could lead to a reduction in the size of the surplus, and thus in the size and power of the empire's army, until the arrangement suddenly could not be maintained. Take the Western Roman Empire as an example: soil degradation and population decline had been happening for centuries, until the army weakened and a barbarian tribe invaded and suddenly overran and destroyed the empire.
Of course, the main criticism of the paper is that it's wildly speculative. There is no data whatsoever in the paper. This is excusable because there is very little "data" in the modern sense left over from pre-modern civilizations. Pre-modern peoples were extremely good at telling stories and writing epics, but poor at keeping records and statistics of commoners' well-being. For this reason, among others, the causes of the collapses of many civilizations (such as the Mesoamerican civilizations) are not well understood, and the proposed explanations are highly speculative and differ from one another. Many researchers speculate that the American civilizations collapsed because of long-lasting mega-droughts, which obviously would not fit this model of resource draw-down.
Usually, when constructing a model, it's at least necessary to verify that the model agrees with past evidence. Even then, the model may not be predictive at all; however, constructing a model which agrees with past evidence is often a first step. Unfortunately, the model in this case is just wildly speculative. There are virtually no examples of egalitarian civilizations prior to the 18th century, and so no data on how egalitarian civilizations would have fared. There is no data on the quantities this model refers to: soil fertility, consumption by elites, resource draw-down, total populations of civilizations, etc. Instead, the model rests on reasoning along the lines of "this seems plausible".