Thursday, February 26, 2009

How do I reconcile the Pseudogene Family Database with the Eukaryote Database?

I found something confusing: the size of the pseudogene protein families database is quite different from that of the Eukaryote Database. For example,
gene ENSPTRG00000021298 is contained in the Pseudogene Families in Chimp but not in the Eukaryote Database. I wonder whether a gene that has an Ensembl ID is a pseudogene or not.
Which database should I depend on?

For the latest pseudogene families, you may want to take a look at our
Pseudofam database, recently published in NAR.

However, pseudogene families are built on the parent proteins of the pseudogenes, which means they use Ensembl peptide/protein IDs rather than gene IDs. Also, pseudogene families contain only those pseudogenes that
can be classified into families.

As a result, if you have a set of gene IDs and wish to see whether they have any pseudogenes, I recommend that you download the chimp pseudogene set and search it for the
gene ID annotation.
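
The search described above can be sketched in a few lines of Python. This is only a minimal illustration, not our pipeline: it assumes the downloaded pseudogene set is a tab-delimited text file in which the parent gene ID appears as its own column. The function name `find_pseudogenes` and the file name in the usage note are hypothetical, so please check the actual file layout before relying on it.

```python
import csv
from collections import defaultdict


def find_pseudogenes(path, gene_ids, delimiter="\t"):
    """Return {gene_id: [rows]} for every row containing a queried gene ID.

    Assumption: the file is delimited text and each gene ID sits in its own
    field. The column position is not assumed, so every field is checked.
    """
    wanted = set(gene_ids)
    hits = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            # Record the row once for each queried ID that it mentions.
            for gene_id in wanted.intersection(row):
                hits[gene_id].append(row)
    return dict(hits)
```

For example, `find_pseudogenes("chimp_pseudogenes.txt", ["ENSPTRG00000021298"])` would return the rows annotated with that gene, or an empty dictionary if the gene has no pseudogenes in the set. Matching whole fields rather than substrings avoids accidental hits on longer IDs that happen to contain the query.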

For the details of our pseudogene identification, you might want to read our earlier paper published in Bioinformatics.
