
Comment In Other Words (Score 1) 81

In other words:

"We are famously over-stocked on items that we are not actually using because of huge budget allocations. We don't want to lose those budget numbers, and the government is saying we need to buy their defense contractor friends' goods. The plan is to purchase a billion dollars of equipment and sink it, never to be seen again. Everybody wins, except maybe the taxpayers."

--Tirian

Comment Re:"and they halt operations when they do so" (Score 2) 112

Many supercomputers that use specialized hardware just can't tolerate component failure. For example, on a Cray XT5, if a single system interconnect link (SeaStar) goes dead, the entire system will come to a screeching halt, because with SeaStar all the interconnect routes are calculated at boot and cannot be updated during operation. In any tightly coupled system these failures are a real challenge, not just because the entire system may crash, but also because a job that requests 50,000 cores cannot run as submitted when only 49,900 cores are available.

Checkpoints are necessary, but at large scale they are often difficult. You usually have a walltime allocation for your job, and you certainly don't want to use 20% of it writing checkpoint files to Lustre (or whatever high-performance filesystem you are using). Perhaps frequent checkpointing works on smaller systems/jobs, but for a capability job on a large system you are talking about a significant block of non-computational cycles being burned.
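
To put rough numbers on that tradeoff, here is a minimal Python sketch. It uses Young's first-order approximation for the checkpoint interval, which is a standard textbook rule of thumb and not something specific to any Cray or Lustre setup. The 15-minute checkpoint write time and 24-hour MTBF are made-up values for illustration only.

```python
import math

def checkpoint_overhead(checkpoint_s, interval_s):
    """Fraction of walltime spent writing checkpoints, ignoring failures and restarts."""
    return checkpoint_s / (interval_s + checkpoint_s)

def young_interval(checkpoint_s, mtbf_s):
    """Young's first-order approximation of the compute time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)

# Hypothetical numbers, not measurements from any real system.
checkpoint_s = 15 * 60   # assumed time to write one checkpoint
mtbf_s = 24 * 3600       # assumed system mean time between failures

interval_s = young_interval(checkpoint_s, mtbf_s)
overhead = checkpoint_overhead(checkpoint_s, interval_s)

print(f"checkpoint every {interval_s / 3600:.2f} h of compute")
print(f"walltime spent checkpointing: {overhead:.1%}")
```

Plug in your own write time and failure rate. The point is just that the slower the checkpoint write and the shorter the MTBF, the bigger the slice of your allocation that goes to I/O instead of computation.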
