Awesome link! I'm the OP and I really appreciate your response. The reason I'm looking into this is that I work with many scientists who use commercial software packages where they don't control the code or compiler, and whose results are archived and can be reanalyzed years later. I was recently helping someone revive an old server to perform just such a reanalysis, and we had so much trouble getting the machine going again that I started planning to clone/virtualize it. That got me thinking about where to put the virtual machine (dedicated hardware, cloud, etc.) and it also got me curious about hypervisors. I found some papers indicating that commercial hypervisors can show variability in floating point performance, and all of that culminated in my post. Thanks again.
Thank you, I appreciate the link. I'm the OP and I didn't mean to speculate about or disparage Mathematica; I meant to use it as an example of a commercial software package where the person running the calculation doesn't control the code or the compilation process.
Awesome! Thank you for both the posts, I'm the OP and I really appreciate them. Your comment about scientific computing is spot on with the use cases I'm interested in.
I'm the OP and I really appreciate this comment. I did give some thought as to whether it was reproducibility or repeatability and I decided on reproducibility because the experimental equipment (underlying hardware and firmware) would be different, different analysts would be involved, and the replication of analysis would be occurring after a long period of time. I agree though that it's not clear cut in my post.
I'm the OP and I agree that I should've proofread my post.
Nice, I'm the OP and I appreciate your comment and experimental design. I agree that in situations where you are coding the algorithm, it's easier to control/adjust for the variability with each hardware platform. The use cases I'm really wanting to learn more about are with commercial software packages that have traditionally been run on dedicated hardware and are now being virtualized and moved across multiple hardware types. I like your approach though and may do some testing along those lines.
Thanks for your comments, I really appreciate them. Your mention of experiments was spot on with the use cases I'm trying to learn about. I've worked with many scientists who use commercial software packages for biomedical research, where their experimental results may be archived for 10+ years before being reanalyzed. I recently helped a colleague pull a Windows 2000 server out of storage to rerun an experiment. We got it going after some difficulty, and that got me thinking about virtualizing the hard drive, which then led me to wonder about the portability of virtual machines between hardware hosts (including cloud providers) and the reproducibility issues that could result. I then read through several interesting papers showing variability of floating point math in commercial hypervisors, which led to my posting on Slashdot. Thanks again. Some interesting links:
http://faculty.cs.gwu.edu/~timwood/papers/im2013_tech.pdf
http://www.vmware.com/pdf/hypervisor_performance.pdf
http://www.cc.iitd.ernet.in/misc/cloud/XenExpress.pdf
goodminton (825605) writes "I'm researching the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who "archive" hardware just for the purpose of ensuring reproducibility and I'm wondering how this translates to the world of cloud and virtualization across multiple hardware types."
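The core of the concern is easy to demonstrate even without a hypervisor: IEEE 754 floating point addition is not associative, so anything that changes the evaluation order of the same arithmetic (a different compiler optimization, SIMD width, or parallel reduction on new hardware) can change the low bits of a result. A minimal Python sketch of the effect (the specific numbers here are just illustrative):

```python
# IEEE 754 double-precision addition is not associative: the grouping
# of the same three terms changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one grouping
right = a + (b + c)   # the other grouping
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6

# Summation order matters too: the same three numbers summed in a
# different order lose or keep the small term entirely.
print(sum([1e16, -1e16, 1.0]))  # 1.0
print(sum([1e16, 1.0, -1e16]))  # 0.0
```

This is why bit-identical reproducibility across platforms generally requires pinning not just the software version but everything that influences instruction selection and ordering.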
I agree with the previous commenters that it's important to have a solution or plan before raising the alarm. Having said that, once you raise the alarm and you're not being heard to your satisfaction, there are several options available:

* First, clear your mind of what you think you know about software development and what SHOULD be, and try to see the situation from an open-minded perspective. Are the issues you're seeing really an indicator of poor quality, or are they an indicator of a system that's different from what you know/like? As a quality/regulatory person myself, I've seen many unnecessary projects and alarm bells raised simply because of a lack of understanding/perspective on a given topic. I'm not saying that's the case with you, just that this is the kind of issue that's good to be absolutely clear with yourself about.

* Once you're clear that there is in fact an issue, go to QA and request an internal audit of your software development/quality systems. If your QA team doesn't have a procedure by which you can request an audit, then you should find a QA partner who can work with you on this. It's good to have a QA partner anyway, so building the bridge on this project won't be a waste of time regardless of the outcome.

* If auditing isn't an option, or if you need ammo to sell the audit idea, another approach is to analyze deviation and/or CAPA trends in your software development process, as well as your validation process. For example, try to find out how many validation deviations are being generated when new/updated software is released from your development team. Working with QA, you could develop an estimated cost per deviation, which would be a huge pile of ammo for your management presentation. Pretty charts and graphs will help too.

* Find other tangible evidence of the issue. Without specific examples it will be difficult to be clear about the problem and/or the solution(s).
* If you find evidence and QA and/or management still won't listen, it's time to consider your options. You can either stay, knowing that a ticking time bomb exists, or carefully plan and execute your exit from the company. My litmus test for working at a company is to regularly ask myself whether I'd give my company's medicine to a family member [a family member I love :P ]. If the answer is no, I don't stick around. So far, I've only had to do that once in 10 years and it was absolutely the right choice. Good luck!