Friday, January 4, 2008

Trouble Running Program to Reproduce Results in Defective Clique Paper

To test the algorithm in your paper "Predicting interactions in protein networks by completing defective cliques", I ran the dcc code in C. I am assuming the arguments are passed in this order: (repetition, if needed), minimal overlap size, maximum size of the non-overlapping parts, the network in which to find cliques, the negative gold standard binfile, the positive gold standard binfile, and the output file for the results.

I had some trouble reproducing the results in your paper. I took the negative and positive gold standards from this website: http://networks.gersteinlab.org/intint/supplementary.htm. The G+ matched (8250), though my G- was off. These are the results I got running your large data set with the above-mentioned gold standards: initial G+ = 8250, G- = 2697594, bogus(?) = 7573, new edges = 388, initial max cliques = 4934, 61 new pos interactions, -37 neg interactions (total 98), and LR = -539.08. The results in your paper were as follows: G- = 2708622, new edges = 437, edges detected in gold standard = 73, in neg = 21; thus LR = 1141.3. So I must be passing in the wrong variables or datasets.
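
For reference, the LR figures quoted above can be checked by hand. The sketch below assumes that LR is the likelihood ratio (TP / |G+|) / (FP / |G-|). That formula is not stated in this exchange; it is used here only because it reproduces the paper's 1141.3 from the quoted counts. The function name and its arguments are illustrative and are not part of the dcc code.

```python
def likelihood_ratio(tp, fp, n_pos, n_neg):
    """Assumed LR definition: (tp / n_pos) / (fp / n_neg).

    tp    -- predicted edges found in the positive gold standard
    fp    -- predicted edges found in the negative gold standard
    n_pos -- size of the positive gold standard (G+)
    n_neg -- size of the negative gold standard (G-)
    """
    if fp == 0 or n_pos == 0 or n_neg == 0:
        raise ValueError("LR is undefined when fp, n_pos, or n_neg is zero")
    return (tp / n_pos) / (fp / n_neg)


# Counts quoted from the paper: 73 in G+, 21 in G-, G- = 2708622.
paper_lr = likelihood_ratio(73, 21, 8250, 2708622)

# Counts from the rerun described above: 61 in G+, 37 in G-, G- = 2697594.
rerun_lr = likelihood_ratio(61, 37, 8250, 2697594)

print(f"paper: {paper_lr:.1f}")   # paper: 1141.3
print(f"rerun: {rerun_lr:.2f}")   # rerun: 539.08
```

Under this assumed definition, the rerun counts give about 539.08, which matches the magnitude of the LR reported for the rerun. Most of the gap to 1141.3 comes from the different positive and negative counts (61 and 37 versus 73 and 21), not from the small difference in G-.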

Can you please direct me to the gold standard datasets you used for the large dataset? I looked at the additional websites and supplements mentioned in your paper, but was unable to find them. If I am doing this correctly, can you explain why I am getting different numbers? That would be very kind of you and greatly appreciated.

Furthermore, for the 56by56 network, if I understand correctly, you just run it without a negative gold standard? So all you are doing in your paper is comparing the maximum cliques created after clique completion, so I do not have to worry about LR results, right?

Thanks a lot for your interest in my paper. You are doing the right things. I got the same numbers as you did using the same gold standard sets. My only explanation is that this part was done by another co-author (Valery). I guess he must have used slightly different sets. Unfortunately, I can no longer get in touch with him (which is why my response was delayed). But the numbers are in the same ballpark. It does not in any way diminish the effectiveness of our method.

As for the 56x56 network, you don't need a negative set, and you don't need to worry about the LR results.
