Wow. You've never done this for a living, right?
Network failures in such a complex, distributed system cause unexpected problems. 'Router' in this scenario should be read as 'data flow device', and of course data is at risk. Transaction rollbacks, session timeouts, and more all become problems that turn into data loss events.
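For anyone wondering how a 'router failure' turns into lost data, here's a minimal sketch of the classic trap, assuming a toy booking service (every name here is made up for illustration): a client that times out mid-commit cannot tell a lost write from a lost acknowledgment, and treating the timeout as a clean rollback is exactly how a session timeout becomes a data loss event.

    import random

    class NetworkPartition(Exception):
        """Simulates the router / data-flow device dropping traffic mid-request."""
        pass

    committed = {}  # the database's durable state

    def commit_booking(booking_id, record):
        """Server side: write, then acknowledge. A partition can hit either step."""
        if random.random() < 0.3:
            raise NetworkPartition("request lost before the write")  # write never happened
        committed[booking_id] = record
        if random.random() < 0.3:
            raise NetworkPartition("ack lost after the write")  # write happened, client can't know

    def book_with_timeout(booking_id, record):
        """Client side: on a timeout the outcome is unknowable without an
        idempotent retry or a later reconcile pass."""
        try:
            commit_booking(booking_id, record)
            return "confirmed"
        except NetworkPartition as e:
            # Rolling back here may discard a write the customer was told succeeded,
            # or retrying blindly may double-book: either way, data diverges.
            return "unknown ({})".format(e)

    print(book_with_timeout("B123", {"pax": "niece", "flight": "SWA1"}))

The usual fix is idempotency keys plus a reconciliation pass, but that is exactly the recovery path that never gets exercised until the router actually dies.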
Not that SWA is without blame here. At work we had a server failure that impacted thousands of virtual machines. What began as a storage failure became a corruption failure, and ultimately we lost most of those VMs. Recovery varied from restoring images from backup to, for our team, rebuilding from source. Total loss: 3 years of data, of which we could rebuild only 9 months' worth due to unforeseen limitations. And silence from the technology team. We had to go to C-level execs to be included in the M&M and analysis, and were asked continually why, since we were 'just customers'. Accountability was not even considered until we demonstrated the ultimate costs for *our* real customers.

Even now they keep trying to write it off as unpredictable, and we keep coming back to the apparent lack of testing, the lack of disaster recovery validation, and the abject failure of a three-letter vendor to recover their flagship system from an error induced by their own software update. After pointing out that the only real penalty for their team would be removing a team member, we had to say out loud, in front of execs, "and if we do not, will this happen again?" Of course not, they say. And of course, they could not claim they had never led us to believe that prior to the failure.
To this day, and I will reference this on a call in about 2 hours when they take up my current top issue, it will be blamed on an unexpected failure. And I'll say 'like $%^&* this spring?' And everyone on the call will remember, and know that I called them out again. Even the C-level is reluctant to actually cost the team anything, though this was a failure of routine maintenance, a preferred and strategic vendor failure, a recovery and data-loss-prevention failure, and even a system design deficiency, resulting in a significant loss and concurrent brand damage, customer dissatisfaction, and recovery cost impacts. To put it simply, everything failed, and no one is willing to acknowledge it. They may, unknown to me, be in an investigation that will result in changes, but sadly I doubt it.
SWA will, however, be looking into this, since it is not just lost bookings but huge overtime costs, make-up flight costs, penalties, and compensation. My niece was flying then, and this turned a 6-hour trip into an 11-hour ordeal with lost baggage and a very unsatisfactory experience at the counters, since, after all, the systems were down and no info was readily available. We won't get to see that analysis, though. And this is a first for SWA, but Continental failed like this a few years ago, and the USAir merger with an airline to be named later resulted in a huge system merge and a similar failure. Big systems fail big. It is hard to test recovery when it costs so much to replicate the hardware and the production system is 24x7x365. Glad I'm not in that business any more, though there is nothing like a realistic DR exercise to sharpen your focus and get the blood flowing, and when it actually works, it's a huge validation.