I found something confusing: the size of the pseudogene protein families database is quite different from that of the Eukaryote Database. For example,
gene ENSPTRG00000021298 is listed in Pseudogene Families in Chimp (http://www.pseudogene.org/FAMILY/genome_seq_show.php?genome_ac=9598), but not in the Eukaryote Database (http://tables.pseudogene.org/chimp). I wonder whether a gene that has an Ensembl ID is a pseudogene or not.
Which database should I rely on?
For the latest pseudogene families, you may want to take a look at our
Pseudofam database, recently published in NAR (http://nar.oxfordjournals.org/cgi/content/abstract/gkn758v1).
However, pseudogene families are built on the parent proteins of the pseudogenes (which means they use Ensembl peptide/protein IDs rather than gene IDs). Also, pseudogene families contain only those pseudogenes that
can be classified into families.
As a result, if you have a set of gene IDs and wish to see whether they have any pseudogenes, I recommend downloading the chimp pseudogene set available at Pseudogene.org (http://tables.pseudogene.org/flatfiles/chimp.txt) and searching it for the
gene ID annotation.
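As a rough illustration of that search, here is a minimal Python sketch. It assumes the flat file has been saved locally as chimp.txt and is plain text with one pseudogene record per line; the function name find_gene_rows is hypothetical, and because the column layout is not assumed, the sketch simply looks for each ID anywhere in the line.

```python
from typing import Dict, Iterable, List


def find_gene_rows(lines: Iterable[str], gene_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Return, for each ID, the records (lines) that mention it.

    Assumption: each line is one record and the ID appears verbatim
    somewhere in it; no particular column layout is assumed.
    """
    wanted = set(gene_ids)
    hits: Dict[str, List[str]] = {gid: [] for gid in wanted}
    for line in lines:
        row = line.rstrip("\n")
        for gid in wanted:
            if gid in row:
                hits[gid].append(row)
    return hits


# Example use with a locally downloaded copy of the flat file
# (the local file name is an assumption):
#
#     with open("chimp.txt") as handle:
#         hits = find_gene_rows(handle, ["ENSPTRG00000021298"])
#     for gid, rows in hits.items():
#         print(gid, len(rows), "matching record(s)")
```

An ID with an empty list has no record in the file that mentions it. Since the families are keyed on parent proteins, it may be worth searching with the Ensembl protein IDs as well as the gene IDs.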
For the details of our pseudogene identification method, you might want to read our earlier paper in Bioinformatics:
http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1437
Thursday, February 26, 2009
Sunday, February 15, 2009
Huge discrepancy in the numbers in "Modeling ChIP Sequencing in Silico with Applications"
In your article "Modeling ChIP Sequencing in Silico with Applications", you mention 2,915,382 initial sequence reads obtained in Robertson's experiments, but when I check this number against the original paper, the total number of sequenced reads is 24.1M, which is significantly different from your figure. Could you clarify, please?
For our ChIP-seq analysis, we used the read sequences that Robertson et al. sent us at our request, which we made well before their paper was published.
Tuesday, January 6, 2009
How do you submit private data to the morph server?
There is now a "Private" check box on the morph submission form. If checked, the submitter will still receive an email upon submission but the general public will have no way to find the morph. It will not appear when people use the search box on the front page, nor will it appear among the user-submitted morphs on the movies page. Search engines do not index morph pages since they are dynamically generated. The only way third parties could know of its existence is if they were somehow able to intercept the email sent from the server to the submitter. Thus it is highly unlikely that third parties would ever know of or find it. We trust this will provide sufficient privacy.
Monday, November 3, 2008
I have studied your article "The role of disorder in interaction networks: a structural analysis". I am trying to get a list of hub (party and date) and non-hub proteins, so I am looking for the datasets you used; however, I can't find them. Could you point me to where I can get the datasets?
Thank you for your interest in our paper. To answer your question, we used the datasets provided as supplementary material by Han et al., Bertin et al. and Batada et al. The references for those papers are:
- Han JDJ et al. (2004) Nature 430, 88-93
- Bertin N et al. (2007) PLoS Biol 5(6):e153
- Batada NN et al. (2007) PLoS Biol 5(6):e154
From Han et al., Supplementary Table S1 includes date and party hub information. In our paper, this is referred to as FYI 2004. From Bertin et al., we used the filtered-HC protein-protein interaction dataset, provided as Supplementary Table S1; it is called FYI 2007 in our paper.
From the Batada et al. paper, we used the High-Confidence Interaction Dataset provided as Dataset S1 in the Supplementary Information. As described in our paper, hubs are then defined as ORFs with more than 10 interacting partners. Finally, to determine whether a hub is a party or a date hub, we computed the co-expression correlation with its partners. Party hubs have a correlation higher than 0.25 with their interacting partners.
To perform the correlation analysis, we used the compendium dataset by Hughes et al. [Cell (2000) 102:109-126].
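For readers who want to reproduce the hub labels from these datasets, the classification described above can be sketched roughly as follows. This is an illustration rather than the code used for the paper: the function names are hypothetical, and it assumes that "co-expression correlation" means the Pearson correlation of expression profiles averaged over a hub's partners.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def classify_hubs(
    interactions: Iterable[Tuple[str, str]],
    expression: Dict[str, List[float]],
    min_partners: int = 10,      # hubs have MORE than this many partners
    party_cutoff: float = 0.25,  # party hubs exceed this correlation
) -> Dict[str, str]:
    """Label each ORF as 'non-hub', 'party', 'date' or 'hub (no expression data)'."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)

    labels: Dict[str, str] = {}
    for orf, nbrs in partners.items():
        if len(nbrs) <= min_partners:
            labels[orf] = "non-hub"
            continue
        # Assumption: average the correlation over partners with expression data.
        corrs = [
            pearson(expression[orf], expression[p])
            for p in nbrs
            if orf in expression and p in expression
        ]
        if not corrs:
            labels[orf] = "hub (no expression data)"
        else:
            avg = sum(corrs) / len(corrs)
            labels[orf] = "party" if avg > party_cutoff else "date"
    return labels
```

Here interactions would be the ORF pairs from one of the interaction datasets, and expression would map each ORF to its profile across the conditions of the expression compendium.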
Monday, August 11, 2008
Where can I get access to the data driving PubNet?
PubNet is based exclusively on PubMed, which you can access via query or bulk download from the NCBI.
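As one possible way to query PubMed programmatically, the sketch below builds a search URL for NCBI's E-utilities ESearch service. It is only an illustration and not part of PubNet; the function name build_pubmed_search_url and the example search term are hypothetical, and it is worth checking NCBI's usage guidelines before sending requests in bulk.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Build an ESearch URL that returns PubMed IDs matching `term` as JSON."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # query string, same syntax as the PubMed search box
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Example use (performs a network request, so it is left commented out):
#
#     from urllib.request import urlopen
#     import json
#     with urlopen(build_pubmed_search_url("protein motion", 5)) as resp:
#         ids = json.load(resp)["esearchresult"]["idlist"]
```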
Thursday, August 7, 2008
Can you provide an example script for the integrated system for studying residue coevolution in proteins?
Regarding the integrated system for studying residue coevolution in proteins, can you provide an example script that takes one input file on the command line and writes the result to an output file? Also, can you send me the Matlab code for the SCA implementation?
The system was implemented in Java. It requires the Java virtual machine, plus some additional packages in order to run locally. If you are interested in installing it on your own machine, I can help sort out the steps. The Matlab code for SCA is unfortunately licensed and we cannot redistribute it. You may contact Rama Ranganathan for it. I think they are also releasing a newer version of SCA.
What is the Protein in the Logo at the Top of the Molmovdb Page?
I've been using your molecular movement database to find candidate proteins for a functionalization experiment that my group at Oregon State University is working on. The logo at the very top of the browsing page caught my attention (http://www.molmovdb.org/images/ProtMotDB.lrg-logo.gif). Would you happen to know what protein this is?
Lactoferrin. It's also part of Figure 4 in the original database publication from a very long time ago:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=9722650
This has references to the specific structure publications.