I have actually seen something similar to this before, also involving an Air Traffic Control system.
They were having a problem handling "Large Messages". I am not sure of the exact details or circumstances; I was only peripherally involved. Anyway, the programmer wrote these messages to a file, where they were processed asynchronously and then deleted. This minor change was tested, as usual at that site, by someone shooting an hour's worth of production traffic through the test system and checking for unexpected aborts or other abnormalities. All was fine, and the spooling file was 1% full.
The patch went online. Four days later (it was a Sunday morning and it was snowing), the file hit some limit and refused to accept new messages. At that point things went "Keystone Cops".
- All department heads were informed, except programming. Given that only one patch had been applied in the previous week, that was not very helpful. Headless chickens ran around trying to find a solution.
- Standard practice in this type of situation was to switch to the backup/standby system. Since ATC data is very short-lived, the backup system had an empty database, which would then be populated dynamically. All "Station Chiefs" had to approve this step. One refused because he could not see any problem. Finally someone managed to make him understand what the problem was, and then it was "oh yes, we are seeing that as well". His was the smallest station, of course.
- Standard procedure was also to switch to manual rather than automated control and to cancel short-haul flights. The railways could take up the slack. This was done.
The switch was duly made and everything was working again.
It turned out that the deletion of the processed records had a bug. One hour of live data left the file 1% full. 100 hours... do the math. It took the programmer 5 or 10 minutes to fix the problem; he could have done it live on the Sunday if anyone had bothered to tell him what was going on.
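For anyone who wants the math spelled out, here is a minimal sketch. It assumes (my assumption, not something I know about that system) that traffic was roughly constant and that the buggy deletion removed nothing, so the file filled linearly at the 1% per hour seen in the test. The names are purely illustrative.

```python
# Hypothetical model: constant traffic, no records ever deleted,
# so the spool file fills linearly.
FILL_PER_HOUR = 0.01  # observed in the one-hour test: 1% of capacity


def hours_until_full(fill_per_hour: float, start_fill: float = 0.0) -> float:
    """Hours until the spool file reaches 100% at a constant fill rate."""
    if fill_per_hour <= 0:
        return float("inf")  # nothing accumulates, so it never fills
    return (1.0 - start_fill) / fill_per_hour


hours = hours_until_full(FILL_PER_HOUR)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")
```

Roughly 100 hours is just over four days, which lines up with the file falling over four days after the patch went live.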
One of the lessons from that is also relevant here: one hour of live data left the file 1% full. I'd bet that they were testing that the new feature worked, not looking for hidden side effects.
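To show the difference between "the feature works" and "no hidden side effects", here is a toy sketch. `Spool`, `run_test_traffic` and the `delete_works` flag are all hypothetical stand-ins, not anything from the real system. Both spools pass the functional check. Only the side-effect check (is the spool empty again after processing?) catches the deletion bug.

```python
class Spool:
    """Hypothetical stand-in for the large-message spooling file."""

    def __init__(self, capacity: int, delete_works: bool = True):
        self.capacity = capacity
        self.delete_works = delete_works  # False reproduces the deletion bug
        self.records: list[str] = []
        self.processed = 0  # records already handled but still on disk

    def write(self, msg: str) -> None:
        if len(self.records) >= self.capacity:
            raise OSError("spool file full")
        self.records.append(msg)

    def process_pending(self) -> int:
        """Consume every unprocessed record; delete them if deletion works."""
        pending = len(self.records) - self.processed
        self.processed += pending
        if self.delete_works:
            self.records.clear()
            self.processed = 0
        return pending

    def occupancy(self) -> float:
        return len(self.records) / self.capacity


def run_test_traffic(spool: Spool, n_messages: int) -> int:
    """Feed a batch of messages through the spool and process them."""
    for i in range(n_messages):
        spool.write(f"msg-{i}")
    return spool.process_pending()


good = Spool(capacity=10_000)
buggy = Spool(capacity=10_000, delete_works=False)

for name, spool in (("good", good), ("buggy", buggy)):
    handled = run_test_traffic(spool, 100)
    # Functional check: every message was processed (both spools pass).
    feature_ok = handled == 100
    # Side-effect check: the spool should be empty again afterwards.
    drained = spool.occupancy() == 0.0
    print(f"{name}: feature_ok={feature_ok}, drained={drained}, "
          f"occupancy={spool.occupancy():.0%}")
```

The buggy spool reports 1% occupancy after one batch. That is exactly the kind of number that looks harmless unless someone asks why it isn't zero.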