
Comment: Re:"and they halt operations when they do so" (Score 1) 112

by Anthony (#42062453) Attached to: Supercomputers' Growing Resilience Problems
This is my experience as well. I have supported a shared-memory system and now support a distributed-memory cluster. The resilience of the latter means that resubmitting the jobs that were running on the failing node is the standard response. On the shared-memory system, a failed blade (an Altix brick in my case) meant the entire NUMAlink-connected system would go down. Component-level resilience and predictive diagnostics help. Job suspension and resumption and/or migration are also useful for working around predicted component failures or degraded components.
User Journal

Journal: Honours Update

Journal by Anthony

Honours completed some time ago, with a mark that enables further research. The model results were nothing to write home about. I was able to tweak the model to match, broadly, alkalinity observations towards the end of the last glacial period. I did also get to learn more about myself. After years of denial, I sought help with mild depression and my outlook improved significantly.

Comment: Re:Ho ho ho (Score 1) 314

by Anthony (#37545622) Attached to: Ask Slashdot: Successful Software From Academia?
Thanks for the links, especially the SIAM one. I touched on some of those ideas in my Honours thesis, as my project was directly affected by the issues of lost programs: old storage media where parts of programs were found; a paper with an idealised function described only by a graph; an incomplete mathematical description of a model component. I have begun planning future research, along with a methodology to avoid many of the above issues.

