Thursday, February 26, 2009

How to reconcile the Pseudogene Family Database with the Eukaryote Database?

I found something confusing: the size of the pseudogene protein families database is quite different from that of the Eukaryote Database. For example, gene ENSPTRG00000021298 is contained in Pseudogene Families in Chimp but not in the Eukaryote Database. I wonder whether a gene that has an Ensembl ID is a pseudogene or not.
Which database should I rely on?

For the latest pseudogene families, you may want to take a look at our
Pseudofam database, recently published in NAR.

However, pseudogene families are built from the parent proteins of the pseudogenes, which means they are keyed by Ensembl Peptide/Protein ID rather than Gene ID. Also, pseudogene families contain only those pseudogenes that can be classified into families.

As a result, if you have a set of gene IDs and wish to see whether they have any pseudogenes, I recommend that you download the chimp pseudogene set and search its gene ID annotation.
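
The search described above could be scripted roughly as follows. This is only a minimal sketch: it assumes the downloaded set is a tab-delimited text file with a header row, and the column names (`pseudogene_id`, `parent_gene`, `parent_protein`) as well as the sample rows and their identifiers are made up for illustration. Please check the header of the actual file and adjust `parent_column` accordingly.

```python
import csv
import io


def find_pseudogenes(handle, gene_ids, parent_column="parent_gene"):
    """Return the rows whose parent-gene annotation is one of gene_ids.

    `parent_column` is a hypothetical column name; the real file may
    label its gene ID annotation differently.
    """
    wanted = set(gene_ids)
    reader = csv.DictReader(handle, delimiter="\t")
    if parent_column not in (reader.fieldnames or []):
        raise KeyError(f"column {parent_column!r} not found in header")
    return [row for row in reader if row[parent_column] in wanted]


# Made-up sample standing in for the downloaded pseudogene set.
# Only ENSPTRG00000021298 comes from the question; all other IDs are placeholders.
SAMPLE = (
    "pseudogene_id\tparent_gene\tparent_protein\n"
    "PG_0001\tENSPTRG00000021298\tENSPTRP00000000001\n"
    "PG_0002\tENSPTRG00000000002\tENSPTRP00000000002\n"
    "PG_0003\tENSPTRG00000021298\tENSPTRP00000000001\n"
)

hits = find_pseudogenes(io.StringIO(SAMPLE), ["ENSPTRG00000021298"])
for row in hits:
    print(row["pseudogene_id"], row["parent_gene"])
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`, where `path` points to the downloaded set.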

For details of our pseudogene identification method, you might want to read our paper previously published in Bioinformatics:

Sunday, February 15, 2009

Huge discrepancy in the numbers in "Modeling ChIP Sequencing in Silico with Applications"

In your article "Modeling ChIP Sequencing in Silico with Applications", you mention an initial 2,915,382 sequence reads obtained in Robertson's experiments. When I check this number against the original paper, however, the total number of sequenced reads is 24.1M, which is significantly different from your figure. Could you clarify, please?

For our ChIP-seq analysis, we used the read sequences that Robertson et al. sent us at our request, which we made well before the publication of their paper.