Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
Microsoft

Interrobang's Journal: Data Corruption from Excel Autocorrect 1

Journal by Interrobang

Someone on TECHWR-L posted a link to this paper (under the paradoxical title "The Cupertino Effect"), which is about how Excel's autocorrect feature can corrupt statistical analysis of genetic data if/when Excel "makes the wrong assumption" about an entry based on how it looks:

When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.

As the author points out, this can cause gene names to come back in analyses as "unknown," because "[a] default date conversion feature in Excel ... was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 ... was being converted to '1-DEC.'"

The authors also note that there is a problem with "RIKEN [4] clone identifiers of the form nnnnnnnEnn" being converted to a floating-point number.

The paper also gives some idea of the devastating scale of the problem and its significance for people doing these sorts of analyses: "A non-expert user might well fail to notice that approximately 3% of the identifiers on a microarray with tens of thousands of genes had been converted to an incorrect form, yet the potential for 2,000 identifiers to be transmogrified without notice is a considerable concern."

As far as I know personally and can glean from the paper, the autocorrect and/or conversion feature is nearly impossible to disable completely, and can only be worked around, possibly unsuccessfully 100% of the time. This suggests that perhaps Excel is not the tool of choice for doing these sorts of analyses. (Does the spreadsheet application in OpenOffice work differently?)

This discussion has been archived. No new comments can be posted.

Data Corruption from Excel Autocorrect

Comments Filter:
  • I have both Excel and OpenOffice at work, so I'll check tomorrow.

    It's a bit silly that they provided a C program to check for corruptions. Providing a VB program (that runs in Excel) would be much more useful for people.

    Hugs

Modeling paged and segmented memories is tricky business. -- P.J. Denning

Working...