Monday, January 21, 2008
PARE Perl Script
Saturday, January 12, 2008
Missing information regarding mutation rates
I need to use your results for my research; however, some information is missing from your paper. Specifically, the following mutation rates are missing from Figure 2-A:
- G->C with a neighboring C
- G->T with a neighboring C
- G->A with a neighboring C
We excluded CpG dinucleotides from our analysis because they are known to have much higher mutation rates than any other dinucleotide, owing to the methylation–deamination of cytosine. In fact, mammalian genomes are depleted of CpGs, except in CpG islands.
We also mentioned this in the paper.
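Excluding CpG sites amounts to dropping every position that is the C or the G of a 5'-CG-3' dinucleotide before counting substitutions. A minimal sketch, assuming a single-strand DNA string as input; the helper names are hypothetical and not taken from the paper's own pipeline:

```python
def is_cpg_site(seq, i):
    """True if position i is part of a 5'-CG-3' dinucleotide on this strand.

    A site is in CpG context if it is a C followed by G,
    or a G whose 5' neighbour is C.
    """
    seq = seq.upper()
    if seq[i] == "C":
        return i + 1 < len(seq) and seq[i + 1] == "G"
    if seq[i] == "G":
        return i > 0 and seq[i - 1] == "C"
    return False


def non_cpg_positions(seq):
    """Positions kept for mutation-rate estimates once CpG sites are excluded."""
    return [i for i in range(len(seq)) if not is_cpg_site(seq, i)]


print(non_cpg_positions("ACGTTGCA"))  # [0, 3, 4, 5, 6, 7]
```

Only the "CG" at positions 1-2 is removed; the "GC" at positions 5-6 is not a CpG and is kept.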
Friday, January 4, 2008
Trouble Running Program to Reproduce Results in Defective Clique Paper
So I had some trouble reproducing the results in your paper. I got the negative and positive gold standards from this website: http://networks.gersteinlab.org/intint/supplementary.htm and passed them in. The G+ matched up (8250), but my G- was off. These are the results I got running your large dataset with the above-mentioned gold standard: initial G+ = 8250, G- = 2697594, bogus(?) = 7573, new edges = 388, initial max cliques = 4934, 61 new pos interactions, 37 neg interactions (total 98), and LR = 539.08. The results in your paper were as follows: G- = 2708622, new edges = 437, edges detected in the gold standard = 73, in neg = 21; thus LR = 1141.3. So I am definitely passing in the wrong variables/datasets.
Can you please point me to the gold-standard datasets you used for the large dataset? I looked at the additional websites/supplements mentioned in your paper but was unable to find them. If I am doing this correctly, can you explain why I am getting different numbers? That would be very kind of you and greatly, greatly appreciated.
Furthermore, for the 56x56 network, if I understand correctly, you just run it without a negative gold standard? So all you are doing in your paper is comparing the maximum cliques created after clique completion, and I do not have to worry about the LR results, right?
Thanks a lot for your interest in my paper. You are doing the right things. I got the same numbers as you did using the same gold-standard sets. My only explanation is that this part was done by another co-author (Valery), and I guess he must have used slightly different sets. Unfortunately, I couldn't get in touch with him anymore (hence the delay in my response). But the numbers are in the ballpark, and this does not in any way diminish the effectiveness of our method.
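The post does not spell out how LR is computed, but the quoted numbers are consistent with the ratio of hit rates in the two gold standards: LR = (hits in G+ / |G+|) / (hits in G- / |G-|). A minimal sketch under that assumption (the function name is hypothetical, not from the paper's code) reproduces both figures:

```python
def likelihood_ratio(pos_hits, neg_hits, n_pos, n_neg):
    """Ratio of the hit rate in G+ to the hit rate in G- (assumed definition).

    pos_hits: new edges found in the positive gold standard (G+)
    neg_hits: new edges found in the negative gold standard (G-)
    n_pos, n_neg: sizes of G+ and G-
    """
    if neg_hits == 0:
        raise ValueError("no hits in the negative set; LR is undefined")
    return (pos_hits / n_pos) / (neg_hits / n_neg)


# Counts quoted in the post
paper = likelihood_ratio(73, 21, 8250, 2708622)
reader = likelihood_ratio(61, 37, 8250, 2697594)
print(f"paper LR  = {paper:.1f}")   # paper LR  = 1141.3
print(f"reader LR = {reader:.2f}")  # reader LR = 539.08
```

Because |G-| is large, a shift of a few thousand pairs in the negative set moves LR only slightly; most of the gap comes from the different hit counts (73/21 versus 61/37).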
As for the 56x56 network, you don't need a negative set or need to worry about the LR results.
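The "initial max cliques" count and the 56x56 comparison both rest on enumerating maximal cliques in an interaction network. The post does not say which enumeration the program uses; purely as an illustration, here is the standard Bron-Kerbosch algorithm with pivoting, over a graph stored as a dict of neighbour sets (an assumed representation, not the paper's input format):

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each node to the set of its neighbours.
    Uses Bron-Kerbosch with pivoting.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


# Toy network: a triangle 1-2-3 plus a pendant edge 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(c) for c in maximal_cliques(adj)))  # [[1, 2, 3], [3, 4]]
```

For networks of realistic size, a library routine such as NetworkX's `find_cliques` does the same job.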
Wednesday, January 2, 2008
Rice Genome Pseudogenes
Unfortunately, we have not run our PseudoPipe program on the rice genome. The PseudoPipe software itself is rather complex and, because it is computationally intensive, is designed to run on a computing cluster. If you wish to try installing the software anyway, the source is located at http://www.pseudogene.org/DOWNLOADS/pipeline_codes/