Hacker News
Gene name errors are widespread in the scientific literature due to Excel (biomedcentral.com)
2 points by samwillis on Aug 6, 2022 | 5 comments


What is convenient behavior for Excel in one domain can cause errors in another. Scientists should not use it for what it is not meant for, and they should learn R, Python, etc. when those tools are more suitable.


I would phrase it differently, in three statements:

1) scientists did not know of this behaviour of Excel when importing data

2) scientists did not check imported data

3) the editors, peer reviewers, and whoever else was involved in publishing the scientific articles did not check the data

... and the culprit is Excel.


They're using it more like Python or Perl than like a spreadsheet. Being able to check large datasets for silent corruption like this would imply they'd be using a more appropriate tool.

As for (3), the data sets are often unavailable before publication or released well before the paper is submitted.

Sadly Excel is one of the more popular tools in this space. I agree Excel is the culprit; imagine what would happen if it treated dollar amounts as poorly as gene names!
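For what it's worth, checking a column for this kind of silent corruption doesn't take much. Here is a minimal sketch in plain Python, assuming the gene names have already been pulled into a list of strings. The two patterns (day-month strings like "2-Sep", and scientific notation like "2.31E+13") are my assumption about what the mangled values look like, and `find_suspect_names` is just an illustrative name, not part of any library.

```python
import re

# Day-month strings such as "2-Sep" or "1-Mar", the form a symbol like
# SEPT2 or MARCH1 can take after being read as a date (assumed pattern).
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)

# Scientific notation such as "2.31E+13", the form a digits-plus-E
# identifier can take after being read as a number (assumed pattern).
SCI_LIKE = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def find_suspect_names(names):
    """Return (index, value) pairs that look like converted gene names."""
    suspects = []
    for i, raw in enumerate(names):
        value = raw.strip()
        if DATE_LIKE.match(value) or SCI_LIKE.match(value):
            suspects.append((i, value))
    return suspects


# Hypothetical column contents, for illustration only.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13", "DEC1"]
suspects = find_suspect_names(column)
for index, value in suspects:
    print(f"row {index}: {value!r} looks like a converted gene name")
```

This only flags candidates; someone still has to look at the flagged rows, since a value matching either pattern could in principle be legitimate in another column.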


In my mind, Excel can treat data however it wants. Things can be redefined. The import data process has a "Transform Data" button. Click it. It opens a window where you can check the data and manually edit the query. It's called Power Query. They aren't even using the tools right there in front of them to fix these things. Imagine them using Python and R.


Still, if you know how to use Excel, you can set the columns with names as text.

The final effect is a combination of a stupid default behaviour and the incompetence and laziness/sloppiness of the people involved.

Everyone talking about the issue puts all the blame on Excel, which is (IMHO) not fair.
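To illustrate what "names as text" buys you, here is a small sketch using only Python's standard `csv` module, which hands every field back as a string and does no type guessing. The gene list and the column names `gene` and `count` are made up for the example.

```python
import csv
import io

# Hypothetical identifiers of the kind that tend to get auto-converted.
genes = ["SEPT2", "MARCH1", "DEC1", "2310009E13"]

# Write a small CSV into an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["gene", "count"])
for gene in genes:
    writer.writerow([gene, 10])

# Read it back: the csv module returns every field as a plain string,
# so the identifiers come through unchanged.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
read_back = [row["gene"] for row in rows]

print(read_back == genes)
```

The flip side is that numeric columns also come back as strings here, so any conversion to numbers has to be done explicitly, column by column, which is roughly the same discipline as setting column types by hand in Excel.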



