Tests need to be fast and repeatable (among other characteristics). Tests must be of high quality as your production code. If you would fix "timing related" issues in your production code, there is no reason your tests suffer from the "timing related" issues either.
There's no reason they *should*, but they do unless you correct the test. The problem is in the test code, or in the wrapper that runs the test code. But consider an automated login test on an isolated network with a credentials server that races to come up with the browser that's attempting the login in the test case. If the login happens to start before the login server gets up and stable, then your login fails, and so does your test case, even though it's not a problem with the browser you are nominally testing.
This is/was a pretty common failure case with the ChomeOS build waterfall because Chrome was considered an "upstream" product, and therefore changes in Chrome, when they occurred, could throw off the timing. There wasn't a specific, separate effort to ensure that the test environment was free from timing issues. And since you can't let any test run forever, if you intend to get a result that you can act upon it in an automated way, you get transient failures.
Transient test failures can (sort of) be addressed by repeating failed tests; by the time you attempt to reproduce, the cache is likely warmed up anyway, and the transient failure goes away. Problem solved. Sort of. But what if everyone starts taking that tack? Then you end up with 5 or 6 transient failures, and any one of them is enough to shoot you in the foot on any given retry.
Now add that these are reactive tests: they're intended to avoid the recurrence of a bug which has occurred previously, but is probabilistically unlikely to occur again; when do you retire one of these tests? Do you retire one of these tests?
Consider that you remove a feature, a login methodology, a special URL, or some other facility that used to be there; what do you do with the tests which used to test that code? If you remove them, then your data values are no longer directly comparable with historical data; if you don't remove them, then your test fails. What about the opposite case: what are the historical values, necessarily synthetic, for a new feature? What about for a new feature where the test is not quite correct, or where the test is correct, but the feature is not yet fully stable, or not yet implemented, but instead merely stubbed out?
You see, I think, the problem.
And while in theory your build sheriff or other person, who's under fire to reopen the tree, rather than actually root-causing the problem, doesn't have time to actually determine a root cause. At that point, you're back to fear driven development, because for every half hour you keep the tree closed, you have 120 engineers unable to commit new code that's nor related to fixing the build failure. Conservatively estimate their salary at $120K/year, then their TCO for computers and everything else is probably $240K/year, and for every half hour you don't open the tree back up, that's ~$14K of lost productivity, and then once you open it up, there's another half hour for the next build to be ready, so even if you react immediately, you're costing the company at least $25K one of those bugs pops on you and you don't just say "screw it" and open the tree back up. Have that happen 3X a day on average, and that's $75K lost money per day, so let's call it $19.5M a year in lost productivity.
This very quickly leads to a "We Fear Change" mentality for anyone making commits. At the very least, it leads to a "We Fear Large Change" mentality which won't stop forward progress, but will insure that all forward progress is incremental and evolutionary. The problem then becomes that you never make anything revolutionary because sometimes there's no drunkard's walk from where you are to the new, innovative place you want to get to (eventually). So you don't go there.
The whole "We Fear Large Change" mentality - the anti-innovation mentality - tends to creep in any place you have the Agile/SCRUM coding pattern, where you're trying to do large things in small steps, and it's just not possible to, for example, change an API out from everyone, without committing changes to everyone else at the same time.
You can avoid the problem (somewhat) by adding the new API before taking the old API away. So you end up with things like "stat64" that returns a different structure from "stat", and then when you go and try to kill "stat" after you've changed everywhere to call "stat64" instead, with the new structure, you have to change the "stat" API to be the same as the "stat64" API, and then convert all the call sites back, one by one, until you can then get rid of the "stat64".
That leads to things like Solaris, where the way you endure binary compatibility is "give the hell up; you're never going to kill off the old stat, just live with carrying around two APIs, and pray people use the new one and you can kill off the old one in a decade or so". So you're back to another drunkard's walk of very slow progress, but at least you have the new API out of it.
And maybe someday the formal process around the "We Fear Change" mentality, otherwise known as "The Architectural Board" or "The Change Control Committee" or "Senior VP Bob" will let you finally kill off the old API, but you know, at that point, frankly you don't care, and the threat to get rid of it is just a bug in a bug database somewhere that someone has helpfully marked "NTBF" because you can close "Not To Be Fixed" bugs immediately, and hey, it gets the total number of P2 or P3 bugs down, and that looks good on the team stats.