Some responses (informed by the actual paper):
The second DB doesn't have any of the password hashes; it just knows which one is correct. It's a single table of (userid, hashid), where hashid is just a small integer saying which of the user's stored hashes is the real one.
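To make that concrete, here's a minimal in-memory sketch of what the second DB amounts to. The class and method names (`Honeychecker`, `set_index`, `check`) are my own hypothetical choices, not from the paper; the point is just that it stores nothing but (userid, hashid) pairs and answers yes/no.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Honeychecker:
    """Hypothetical in-memory stand-in for the second DB.

    It never sees passwords or hashes. hashid is the index of the real
    hash among the candidate hashes the login server stores for the user.
    """
    table: Dict[str, int] = field(default_factory=dict)

    def set_index(self, userid: str, hashid: int) -> None:
        # Called when the user sets or changes their password.
        self.table[userid] = hashid

    def check(self, userid: str, hashid: int) -> bool:
        # True only if the login server matched the real password's hash;
        # unknown users and decoy indexes both come back False.
        return self.table.get(userid) == hashid
```

Stealing this table alone tells an attacker nothing useful, since it contains no hashes; stealing only the hash file leaves them unable to tell the real hash from the decoys.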
The idea seems to be that the second system can be a smaller, simpler, single-function server that is easier to harden and could run a different OS/webserver/DB stack. You could (by sacrificing real-time validation) even have the second system entirely firewalled off and unreachable to an attacker, just polling the login servers to validate the sessions at some small interval.
If the second system goes down, one approach would be to just accept any of the passwords until it comes back up. Then check the logs of what happened while it was offline and act accordingly (invalidate sessions, raise alarms, whatever).
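A rough sketch of that fail-open approach on the login server side, assuming a checker client that raises an exception when the second system is unreachable. `CheckerUnavailable`, `login`, `audit` and the PBKDF2 parameters are all illustrative assumptions, not the paper's protocol verbatim.

```python
import hashlib
from typing import Callable, List, Tuple

Check = Callable[[str, int], bool]  # (userid, hashid) -> is it the real one?


class CheckerUnavailable(Exception):
    """Assumed to be raised by the checker client when the second system is down."""


def hash_pw(password: str, salt: bytes) -> bytes:
    # Illustrative KDF choice and iteration count; use whatever you already use.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


def login(userid: str, password: str, salt: bytes, stored: List[bytes],
          check: Check, pending: List[Tuple[str, int]]) -> bool:
    h = hash_pw(password, salt)
    if h not in stored:
        return False  # matches none of the stored hashes: ordinary wrong password
    hashid = stored.index(h)
    try:
        # True: real password. False: a decoy matched, which suggests the
        # hash file leaked; a real system would raise an alarm here.
        return check(userid, hashid)
    except CheckerUnavailable:
        # Fail open: accept any of the stored passwords, but log it.
        pending.append((userid, hashid))
        return True


def audit(pending: List[Tuple[str, int]], check: Check) -> List[Tuple[str, int]]:
    # Once the checker is back, return the logins that used a decoy:
    # these are the sessions to invalidate and the events to alarm on.
    return [(u, j) for (u, j) in pending if not check(u, j)]
```

The same `audit` replay is roughly what the firewalled, polling-only variant would do on every interval, just without the real-time `check` call at login.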
Overall, I like the idea tremendously. It seems like it's not quite all there yet, but we're probably going to start implementing some variant of it immediately.