I've got some experience sorting huge stacks of pages. You basically want to maximize the work done per trivial-human-step. If you stick with some algorithm based on binary-comparisons you're missing out on some of the work a brain can do essentially for free.
If you're sorting based on a number, it's a pretty quick easy step to drop the current paper in one of ten piles. If you're sorting by alphabetical then you can do one pass 26 piles (bulky but workable) or two pass (first pass A-F, G-M, N-S, T-Z, second pass sort into individual letters). This provides you with more than one bit-comparison of sorting per action. If you're sorting by date then year, month, first-digit-day, second-digit-day make excellent radix values.
Merge sort isn't bad, but it's probably less efficient. If you work with two-stack merge you're only getting one bit of work per step. If you work with more than two stacks you have to scan the tops of the stacks to figure out which page to pick up. Contrast this with radix sort - it's quicker/easier to look at one page and drop it in one of N piles than it is to scan N piles to find which one to pick up.
I see a lot of people mentioning bubble sort and related sorts, but I doubt those people ever had to deal with a few hundred pages. Those sorts are O(N^2), inherently worse. And shuffling the order of pages in a stack is a much messier and slower physical operation than simply dropping pages on the top of stack.
All the other sorting algorithms I can think of seem to suffer from smaller work per step and/or messy physical manipulation. I'm open to other suggestions, but Radix sort seems to be best suited to human work. I had great success with it.
-