Yesterday, I mentioned the problems that text miners have when they want to parse gene names out of scientific texts. Maybe they should just do it the Microsoft way. Few people know that Microsoft has incorporated a high-end text parser in their MS-Excel program, which automatically recognizes and corrects gene names. The recognition rate is so high that the user doesn’t even have to be bothered with a confirmation question. Here is how it works:
Text before MS-Excel:
Uniprot-ID    Gene         Description
SEPT7_HUMAN   SEPT7        Septin-7, Cdc10 homolog
Text after MS-Excel:
Uniprot-ID    Gene         Description
SEPT7_HUMAN   2007-09-07   Septin-7, Cdc10 homolog
🙂
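For anyone who would rather catch these before Excel does, here is a minimal Python sketch that flags gene symbols which look like a month name followed by a day number. The prefix list and the pattern are my own rough guess at what triggers the conversion, not Excel's actual parsing rules, and the function names are made up for this example.

```python
import re

# Month-like prefixes. This list is an assumption for illustration,
# not a description of what Excel really recognizes.
MONTH_PREFIXES = (
    "MARCH", "SEPT",
    "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional hyphen, then one or two digits.
DATE_LIKE = re.compile(
    r"^(?:%s)-?(\d{1,2})$" % "|".join(MONTH_PREFIXES),
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles a month plus a day (1 to 31)."""
    match = DATE_LIKE.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 31


def flag_date_like(fields):
    """Return the fields of one row that a spreadsheet might read as dates."""
    return [field for field in fields if looks_like_date(field)]


row = ["SEPT7_HUMAN", "SEPT7", "Septin-7, Cdc10 homolog"]
print(flag_date_like(row))  # prints ['SEPT7']
```

Only the bare symbol is flagged here. The Uniprot-ID and the description contain extra characters, so this pattern leaves them alone, which matches the example above.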
As I always tell people, just because it’s in rows and columns doesn’t mean you need a spreadsheet.
By: nsaunders on July 24, 2007
at 8:43 am
See also the BMC Bioinformatics paper by Zeeberg et al (2004): “Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics” (http://www.biomedcentral.com/1471-2105/5/80)
By: Jan Aerts on July 25, 2007
at 3:47 pm
LEaD by excel,
They thought they excelLED
By: Animesh on July 26, 2007
at 4:29 am
eheh, I was reading that article last week.
These kinds of artificial intelligence techniques are really impressive.
By: dalloliogm on August 6, 2007
at 10:53 am