I found something misleading: the size of the pseudogene protein families database is quite different from that of the Eukaryote Database. For example,
gene ENSPTRG00000021298 is contained in Pseudogene Families in Chimp (http://www.pseudogene.org/FAMILY/genome_seq_show.php?genome_ac=9598), but not in the Eukaryote Database (http://tables.pseudogene.org/chimp). I wonder whether a gene that has an Ensembl ID is a pseudogene or not.
Which database should I depend on?
For the latest pseudogene families, you may want to take a look at our
Pseudofam database, recently published in NAR (http://nar.oxfordjournals.org/cgi/content/abstract/gkn758v1).
However, the pseudogene families were built on the parent proteins of the pseudogenes, which means they use Ensembl Peptide/Protein IDs rather than Gene IDs. Also, the pseudogene families contain only pseudogenes that
can be classified into families.
As a result, if you have a set of gene IDs and want to see whether they have any pseudogenes, I recommend downloading the chimp pseudogene set available at Pseudogene.org (http://tables.pseudogene.org/flatfiles/chimp.txt) and searching it for the
gene ID annotation.
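The lookup described above can be sketched in a few lines of Python. This is a minimal sketch, not an official Pseudogene.org tool: it assumes the flat file is still served at the URL given above, and because the file's column layout is not described here, it makes no assumption about columns. It simply reports every line in which a gene ID appears as a whole token. The helper names `find_gene_ids`, `load_flatfile` and `report` are hypothetical.

```python
import re
import urllib.request
from typing import Dict, Iterable, List

# URL quoted in the answer; it may no longer be served. No column layout is assumed.
CHIMP_PSEUDOGENE_URL = "http://tables.pseudogene.org/flatfiles/chimp.txt"


def find_gene_ids(lines: Iterable[str], gene_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each gene ID to the flat-file lines that mention it."""
    # Word boundaries keep an ID from matching inside a longer identifier.
    patterns = {gid: re.compile(r"\b" + re.escape(gid) + r"\b") for gid in set(gene_ids)}
    hits: Dict[str, List[str]] = {gid: [] for gid in patterns}
    for line in lines:
        for gid, pattern in patterns.items():
            if pattern.search(line):
                hits[gid].append(line.rstrip("\n"))
    return hits


def load_flatfile(url: str = CHIMP_PSEUDOGENE_URL) -> List[str]:
    """Hypothetical helper: download the flat file and return its text lines."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return text.splitlines()


def report(gene_ids: Iterable[str], url: str = CHIMP_PSEUDOGENE_URL) -> None:
    """Hypothetical helper: print whether each gene ID appears in the downloaded set."""
    for gid, rows in sorted(find_gene_ids(load_flatfile(url), gene_ids).items()):
        print(gid, f"{len(rows)} matching line(s)" if rows else "not found")
```

Calling `report(["ENSPTRG00000021298"])` downloads the file and prints one status line per ID. Keeping the search in `find_gene_ids`, separate from the download, means the same check can be run on a copy of the file that was saved locally.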
For details of our pseudogene identification method, you might want to read our paper previously published in Bioinformatics:
Thursday, February 26, 2009
How to reconcile between the Pseudogene Family Database and Eukaryote Database?
Posted by Gerstein Lab FAQs at 10:31 AM