Gerstein Lab FAQs: 2007

Tuesday, December 18, 2007

Affy CEL files for Tilescope

I was trying to use the TILESCOPE program for analyzing Arabidopsis tiling array data, but was unsuccessful. First of all, the .CEL files would not upload; the page always reported that the file extension is incorrect. Does Tilescope not support .CEL files? The sample you have provided shows an option for them, but that option never appears on my computer (I am using a Mac). Do you think I should download the program in order to solve this?

Tilescope does support CEL files. It is possible that it doesn't recognize your extension if it is not in all lower case. Since it is a web program, it doesn't matter if you're using a Mac or PC or any other OS as long as you're using a browser.
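
If extension case does turn out to be the cause, renaming the files locally before upload is a quick way to test it. The following is only a sketch under that assumption; the helper names `lowercase_extension` and `rename_in_directory` are made up for illustration and are not part of Tilescope.

```python
from pathlib import Path


def lowercase_extension(name: str) -> str:
    """Return the file name with its final extension lower-cased."""
    p = Path(name)
    return p.stem + p.suffix.lower()


def rename_in_directory(directory: str) -> None:
    """Rename files in `directory` whose extension contains upper-case letters."""
    for path in Path(directory).iterdir():
        new_name = lowercase_extension(path.name)
        if path.is_file() and new_name != path.name:
            path.rename(path.with_name(new_name))
```

For example, `lowercase_extension("sample1.CEL")` gives `sample1.cel`, while a name that is already lower case is left unchanged.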

Thursday, December 13, 2007

Tiling array analysis tools

I wonder whether you could give me any advice regarding software that could evaluate (measure) transcription levels detected by tiling arrays for established annotation sets. In other words I'm looking for programs that would do the same thing they do for expression arrays (e.g. GeneChip Affy arrays) but in the tiling context.

You might want first to take a look at:

http://tiling.gersteinlab.org/platformcmp/

and the tool Tilescope as described in this paper:
http://papers.gersteinlab.org/papers/tilescope/

Monday, November 5, 2007

How to Cite Morph Server and CNS Script

I have found your Morph Server and CNS script very useful, particularly for providing a good visual aid for presentations and such. I am currently preparing some manuscripts for several protein crystal structures, which have two different conformational states. I was considering using your script to generate a morph of the two states, and creating an animated video of the conformational change to be included in the supplementary material of the paper. I was wondering if you had any requirements or wishes regarding the citation of your script. Obviously we would cite your work in the paper itself, but I was wondering if you had any other preferences for the inclusion of a citation in the supplementary material, such as putting a citation at the end of the animation. Any recommendation you could provide would be much appreciated.

Best to just cite
http://papers.gersteinlab.org/papers/molmovdb-update-nar/
http://papers.gersteinlab.org/papers/morphs-nar/
in the paper and end of suppl. material (if possible).

Wednesday, October 24, 2007

Mixed up figures in "Divergence of TFBS across related yeast species"

I am a PhD student at CS Dept, Univ of Washington, Seattle. From your Science paper "Divergence of TFBS across related yeast species" I have some questions.

Should Fig 2C be current Fig 2E?
Should Fig 2D_1 be current Fig 2C?
Should Fig 2E be current Fig 2D_1?

Looks like the Figure legend was mixed up. The text in the paper matches
up with the Figure, but in the legend parts C-E are mixed up. It should
read:

Comparison of binding by Ste12 and Tec1 across S. cerevisiae (red), S.
mikatae (blue), and S. bayanus (green). (A) Conserved binding. (B)
Conserved binding with quantitative signal differences. (C) Species-specific
binding despite conserved consensus sequences. (D) Binding only in S.
mikatae and S. bayanus. (E) Conserved binding with loss of consensus
sequences in one species. ChIP-chip enrichment signals are shown (log2
ratios). Circles and squares represent matches to Tec1 PWM and Ste12
PWM, respectively. Triangles, nonconserved peaks; **, >2-fold difference
in peak signal intensity; *, >1.5-fold difference in peak signal
intensity.

Friday, October 19, 2007

Electronic Versions of Papers

Could you please send me an electronic version of your paper "The packing density in proteins: standard radii and volumes", JMB 290, 1999, p253-266?

Essentially all of my work is available on-line. Go to:

http://papers.gersteinlab.org

and click on the appropriate "preprint" link. You will get a preprint or (if appropriate) journal reprint of the paper you want. There should be NO password challenges or other barriers. Usually, the papers are in PDF format but some are in HTML. (Other formats are available directly from http://papers.gersteinlab.org/e-print.)

Thursday, October 18, 2007

How to input parameters in NucProt Calculator

I would like to use the NucProt Calculator for an RNA/protein mix, more exactly the genome of a virus plus several copies of its viral enzymes. I am not sure how I should input things in the program. Shall I introduce the sequence of the viral genome followed by those of the viral proteins?

The NucProt Packing-Eff calculator works on PDB structures, NOT sequences, so that cannot be the one you mean:
http://www.molmovdb.org/cgi-bin/voronoi.cgi

The NucProt PSV and Volume Calculator:
http://www.molmovdb.org/cgi-bin/psv.cgi
is not the best-written CGI code. It can take several sequences at once, but since I didn't understand HTTP GET and POST very well, I had it put all of the information in the GET rather than the POST, so there is a browser limit on the input length. I don't think I have write access to this file anymore.

Technically it can take several sequences in FASTA format, though.
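
To see why putting everything in the GET matters, here is a rough sketch of how a multi-sequence FASTA input ends up inside the URL itself. The form address (`example.org`) and the parameter name `sequence` are placeholders, not the actual interface of psv.cgi, and the sequences are dummy data.

```python
from urllib.parse import urlencode


def fasta_block(records):
    """Join (name, sequence) pairs into one multi-sequence FASTA string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def get_url(base_url, fasta_text):
    """Build the URL a GET form would request; the whole input is in the URL."""
    # "sequence" is a hypothetical field name used only for illustration.
    return base_url + "?" + urlencode({"sequence": fasta_text})


# Dummy genome and enzyme sequences, just to show how fast the URL grows.
records = [("genome", "ACGU" * 500), ("enzyme", "MKV" * 100)]
url = get_url("http://example.org/cgi-bin/form.cgi", fasta_block(records))
print(len(url))
```

With a POST the same text would travel in the request body instead, so the URL length would not depend on the size of the input.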

Wednesday, October 17, 2007

How to obtain datasets from interolog?

In the process of evaluating our "interologs" studies, we found that we urgently need the interaction datasets you used in Yu et al. (2004) Genome Res 14:1107-1118. We would like to acquire the datasets for fly (4768 interactions) and worm (410 interactions).

Please visit http://interolog.gersteinlab.org/

Tuesday, October 9, 2007

Are there errors in "Relating 3D structures to protein networks provides evolutionary insights"?

After reading your paper "Relating 3D structures to protein networks provides evolutionary insights", which was published in Science in 2006, I have some questions about some details. In the third paragraph you said "proteins connected by simultaneously possible interactions are more likely to share the same function than are those connected by mutually exclusive ones", but the result shown in Table 1 looks reversed. I also want to know the method used to compute coexpression correlation and expression correlation, because their respective results in Table 1 and Table 2 also appear reversed.

You're right. Please also refer to our website, sin.gersteinlab.org, where we list this typo.

Data for Ste12 and Tec1

Is your raw data available for the tiling arrays of Ste12 and Tec1?

Please find it on GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5421

Wednesday, October 3, 2007

A tricky problem with PDB files

I've submitted this request a few times over the past month, never really looking too far into what might be causing the error. Today, however, I checked my pdb files and they have the same numbering and chain identification yet the error is the same. This is a pdb of an RNA under two conditions (one is the crystal structure, one is our model of an intermediate). Can you suggest any way forward?

The most common problems arise from use of a flavor of the PDB format that the morph server doesn't recognize, as you may be doing for the modelled structure. You could try submitting a truncated version of the crystal structure (which I'm guessing you downloaded from the PDB) as structure 1, change a few coordinates, and submit that modified file as structure 2. This will give us a quick test to determine if it is your modelled structure that is causing the problem.
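
The truncate-and-perturb test can be scripted. The sketch below is an illustration only: the function names and the defaults (200 atoms, 5 moved atoms, a 0.5 Å shift) are arbitrary choices, and it relies on the standard fixed-column PDB layout in which the x coordinate occupies columns 31-38 of ATOM/HETATM records.

```python
def truncate_pdb(lines, max_atoms):
    """Keep only the first `max_atoms` ATOM/HETATM records, then add END."""
    kept = [ln for ln in lines if ln.startswith(("ATOM", "HETATM"))][:max_atoms]
    return kept + ["END"]


def shift_x(line, delta):
    """Return an ATOM/HETATM record with its x coordinate shifted by `delta`."""
    x = float(line[30:38]) + delta  # columns 31-38 (1-based) hold x
    return f"{line[:30]}{x:8.3f}{line[38:]}"


def make_test_pair(lines, max_atoms=200, n_moved=5, delta=0.5):
    """Build structure 1 (truncated) and structure 2 (a few atoms moved)."""
    structure1 = truncate_pdb(lines, max_atoms)
    structure2 = [
        shift_x(ln, delta) if i < n_moved and ln != "END" else ln
        for i, ln in enumerate(structure1)
    ]
    return structure1, structure2
```

Here `lines` is the crystal-structure file read into a list with trailing newlines stripped. If this pair morphs successfully, the format of the modelled structure becomes the likely culprit.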

Monday, October 1, 2007

Morph Server functioning correctly?

Is the morph server functioning correctly? I’ve used it for several years for course projects but students are having difficulty with it this year. We have a new network system and I am trying to find out if it is on your end or ours where the problem lies.

The morph server has experienced an unexpected surge in popularity recently. We are discussing upgrading hardware and this should help. Also we will be investigating whether we have software issues that need to be addressed. I can tell you that new submissions are successfully being added daily, so the server does work. My suggestion for fast results is to try small proteins and ask for a small number of frames (maybe just 4 or so). Also we have three different morph engines as you may appreciate, so if one doesn't work you might try another the next day. Please don't flood our server with multiple submissions of similar proteins in a single day though.

Thursday, September 13, 2007

Tilescope not working

Is Tilescope working?

Currently Tilescope is going through some revisions and will be offline until further notice. Thank you for your patience.

Thursday, September 6, 2007

CK19 pseudogenes effect on primer design

I read with great interest your article "Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome" Genome Res. 2003 13:2541-2558. As I'm especially interested in CK19 pseudogenes, because we use this gene in the tumour biology department as a marker to detect tumour cells in lymph nodes of breast cancer patients, I have a few questions for you and would be very grateful if you could answer them:

We designed our primers for the PCR approach in that way that the two pseudogenes (Genbank accession number M33101 (CK19a) and U85961 (CK19b)) mentioned in the literature cannot be amplified. You wrote in your article that there exist 4 pseudogenes of CK19 on chromosome 4, 6, 10 and 12. As we want to avoid false positive results in our detection approach it is very important for us to know the sequence of the other 2 pseudogenes. Could you provide us with the sequence or some further information on these pseudogenes – possibly the alignment of the CK19 sequence with the 4 pseudogenes? Do you know if CK19 pseudogenes can be transcribed? I looked in the literature and databases for this information but could not find an answer.

There are indeed 4 processed pseudogenes identified by the pipeline methodology developed in Dr. Gerstein's lab. You can retrieve the nucleotide sequences and the alignment of each pseudogene to CK19 from the following URL

http://homes.gersteinlab.org/people/suganthi/outbox/CK19/

I have also included one other potentially pseudogenic fragment upstream of CK19 at Chr 17. This is labeled 'ambiguous' as only a small portion matches to the parent gene. I have provided nucleotide alignments as I assume that is what is relevant for PCR purposes. Also, please note that all the pseudogenes are processed pseudogenes.

Wednesday, August 22, 2007

Method for applying spectral biclustering algorithm

Regarding your spectral biclustering algorithm, I need to apply it to biclustering of web pages to analyse Internet users' behaviour. Would you mind telling me what tool you used to apply this algorithm, or sending me the program if possible? I need to apply many biclustering algorithms and it's impossible for me to program them all.

The code was written in MATLAB. The key steps involve using an iterative bi-normalization procedure followed by the standard SVD function. The subsequent steps involve partitioning of the left and right eigenvectors using a k-means like procedure.
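
As a rough idea of what the bi-normalization step looks like, here is a small Python sketch (not the original MATLAB code). It assumes one common reading of "iterative bi-normalization": alternately rescaling rows and columns of a strictly positive matrix until the row sums and column sums are each constant. The SVD and k-means-like partitioning steps are not shown, and the function name and tolerances are illustrative.

```python
def binormalize(matrix, max_iter=1000, tol=1e-9):
    """Alternately rescale rows and columns of a strictly positive matrix.

    On convergence every row sums to 1 and every column sums to m/n,
    where the matrix has m rows and n columns.
    """
    m, n = len(matrix), len(matrix[0])
    b = [[float(x) for x in row] for row in matrix]
    col_target = m / n
    for _ in range(max_iter):
        # Row step: make each row sum to 1.
        for i in range(m):
            s = sum(b[i])
            b[i] = [x / s for x in b[i]]
        # Column step: make each column sum to m/n.
        col_sums = [sum(b[i][j] for i in range(m)) for j in range(n)]
        for i in range(m):
            b[i] = [b[i][j] * col_target / col_sums[j] for j in range(n)]
        # Stop once the row sums are still close to 1 after the column step.
        if max(abs(sum(row) - 1.0) for row in b) < tol:
            break
    return b
```

The normalized matrix would then be passed to a standard SVD routine, and the resulting singular vectors partitioned, as described above.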

We have not put all the components in one fully automated version. If you think that getting the components is useful for you I will dig in the directories of my old computer.

Turning morph to a movie

I have a morph I created some time ago that illustrates an essential motion in the protein I study. I'd like to have the morph as part of my thesis talk. Can I turn it into a movie that I can store on my computer and have as a part of my PowerPoint presentation (in other words, can I have a copy of it so that I don't have to access the webpage)? How do I do this? Are there obvious directions on the webpage that I'm just missing?

On the jmol morph page you will see a link that says:

Orient the molecule to your liking and then:
Generate high-res gif

Unfortunately it is just a wireframe animation. I had it set to do cartoons but for some reason it's back to wireframe now.

Alternatively, you can download the interpolated trajectory, then animate using vmd or jmol.

Look for an NMR-formatted PDB file called movie.pdb here:

http://www.molmovdb.org/uploads/b061676-11139
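
If your viewer prefers one file per frame, a multi-model PDB like movie.pdb can be split up with a few lines of Python. This is a sketch that assumes the usual MODEL ... ENDMDL layout of NMR-style files; the function names and the output file naming are illustrative.

```python
def split_models(lines):
    """Split a multi-model PDB (MODEL ... ENDMDL blocks) into per-frame atom lists."""
    frames, current, inside = [], [], False
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current, inside = [], True
        elif record == "ENDMDL":
            frames.append(current)
            inside = False
        elif inside and record in ("ATOM", "HETATM"):
            current.append(line.rstrip("\n"))
    return frames


def write_frames(frames, prefix="frame"):
    """Write each frame to its own file: frame_001.pdb, frame_002.pdb, ..."""
    for k, atoms in enumerate(frames, start=1):
        with open(f"{prefix}_{k:03d}.pdb", "w") as out:
            out.write("\n".join(atoms) + "\nEND\n")
```

Typical use would be `write_frames(split_models(open("movie.pdb")))`, after which the individual frames can be rendered in the style you like and assembled into a movie.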

Compiled Sequences for Human and Chimp

I am seeking sequences for all the pseudogenes listed in the flat files for (at minimum) human and chimp (9606.71.gtf and 9598.2.gtf). I tried to look at the assembled sets on the website but I only found compiled sequences for processed or putative pseudogenes, and not duplicate pseudogenes. I wanted to ask you if there are files somewhere on the site that have sequence data for all pseudogenes listed in the species gtf files.

Sorry, we don't have that.

Monday, August 20, 2007

Local Clustering of Expression Data Software Download

I am interested in using your local clustering of expression data software
(J Mol Biol. 2001 Dec 14;314(5):1053-66). I would really appreciate if
you provide me the link to download the software for my academic use.

http://bioinfo.mbb.yale.edu/expression/cluster/program.html

Friday, August 17, 2007

Real motions for residues in paper "Normal Modes for Predicting Protein Motions"

About your paper "Normal modes for predicting protein motions". Probably I missed something when reading... I didn't understand how you get the real (observed) motions for residues. Did you perform MD simulations for all proteins in your set?

Regarding my "Normal modes" paper: the “real (observed) motions for residues” are not actually “real motions” – as long as we could find two substantially different conformations for the same protein in the PDB, we assumed that such a motion (from the 1st conformation to the 2nd) could potentially take place. We didn’t do any MD simulations in that paper (although we did possess technologies that would allow us to simulate such motion – e.g. our MorphServer). In our normal modes paper we figured that any such simulation is unnecessary – all we needed for that study was just a set of vectors connecting the residues from the two conformations (to examine how they correlate with the NM-predicted motion vectors).
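
To make the idea concrete, the sketch below builds per-residue displacement vectors from two conformations and compares them with a set of predicted vectors. It assumes the two conformations are already superimposed and listed residue by residue in the same order. The cosine-style overlap is one common comparison measure chosen here for illustration, not necessarily the exact statistic used in the paper.

```python
import math


def displacement_vectors(conf_a, conf_b):
    """Per-residue vectors pointing from conformation A to conformation B."""
    return [tuple(b - a for a, b in zip(ra, rb)) for ra, rb in zip(conf_a, conf_b)]


def overlap(observed, predicted):
    """Cosine between two sets of per-residue vectors, flattened into one vector."""
    flat_o = [x for vec in observed for x in vec]
    flat_p = [x for vec in predicted for x in vec]
    dot = sum(o * p for o, p in zip(flat_o, flat_p))
    norm = math.sqrt(sum(o * o for o in flat_o)) * math.sqrt(sum(p * p for p in flat_p))
    return dot / norm
```

An overlap near 1 (or -1, since the sign of a normal mode is arbitrary) means the predicted direction matches the observed displacement; a value near 0 means they are essentially unrelated.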

Wednesday, July 18, 2007

Non-PDF format for 1136174s_TableS6 in Paper

In the paper "Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights", the file (1136174s_TableS6) that can be obtained online is only in PDF format. I wonder whether you have the file in another format (like .txt)?

We have put up Table S6 (and other supplementary data) in .xls and .tab formats on our website:
http://sin.gersteinlab.org

Monday, July 2, 2007

Information concerning source code of the calculation for betweenness

Is there source code for the betweenness calculation referenced in your paper "The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics"?

The supplementary website: http://www.gersteinlab.org/proj/bottleneck/ contains more information about this paper.

Saturday, June 23, 2007

Program for calculating number of waters inside protein

Concerning your paper entitled Packing at the protein-water interface (PNAS 93), do you have an algorithm available for distribution (or know of one) that can calculate the number of waters inside a protein?

The programs available at geometry.molmovdb.org only calculate packing density; positioning waters is best done by other programs.

Monday, June 18, 2007

Adding Original Genes with Gene Names to Pseudogenes Website

On the Pseudogenes website, only a few (~300) of the 16K pseudogene hits could be linked to a gene in the RefSeq gene list file. Would it be possible to add a list of all the original genes, with their genome locations, to your website?

Most of the original genes are listed by Ensembl ID. You can look up their information at http://www.ensembl.org/

Sunday, June 17, 2007

Deciding which Fragment is Flexible or not on MolmovDB

We want to know which fragment is flexible or not according to the information from “the molmovdb” database.

If your protein has been crystallized in two conformations you can submit a job to our "morph server." Visual inspection may then give you the information you seek. If you have only one structure, you can submit it to our HingeMaster server, and several different flexibility analysis programs will be used to find the hinge location. Note that this is designed for use with domain hinge bending proteins. It will not be very helpful if your protein moves by shear or order-disorder transition mechanisms.

We also have a motion prediction program, the Conformation Explorer, which predicts conformational change for hinge bending proteins, either induced by ligand or otherwise. It's still under development, and use would require further discussion.

Tuesday, June 12, 2007

MolMovDB Data Dump

You offer to provide the MolMovDB as a complete data set upon request. Where can we get a copy of it, preferably as a MySQL database?

The Hinge Atlas dataset, described in our recently accepted BMC Bioinformatics paper, is available for download here:

http://molmovdb.org/cgi-bin/sets.cgi

Just scroll down to the Hinge Atlas section. You probably want the coordinate set data more than the MySQL dump. The rest of the database is not yet available for download.

Monday, June 11, 2007

How to Create a Movie of Motions if there's DNA Lesion

We would like to create a movie of these motions for our publication and would prefer not to use the public morphing server if possible. The morphing of our structures might not be possible at all, due to the cisplatin-containing DNA lesion. We could also send you the CNS parameter and topology files we used for refinement.

If you submit the protein and DNA separately this should work. You would then have the problem, though, of putting the two back together. What I would suggest is submitting the two jobs and emailing me when they are done. I would then remove them from the public part of the database, but you could still download the structural interpolations. I would suggest using a small number of interpolated frames, at least for a first try, to minimize compute time.

Tuesday, June 5, 2007

Pseudogene Sequence Data Download

We know there is a way to download the pseudogenes of each organism; the file that is downloaded comes with the name of the pseudogene, the start and end position, etc. But we wanted to download the sequences of each pseudogene of each organism directly, and we didn't find a way to do that in the database. Is it possible to download the sequences? Or do we have to write a program that, given the genome of the organism and the start/end positions of each pseudogene, extracts the corresponding sequence?

None of the flatfiles contain the raw sequence information. On an individual pseudogene basis, however, you can query the system for either the amino acid or nucleotide sequence. Simply search for the pseudogene you're looking for and, on the results page, click either the red or yellow button.

(Example results page: http://www.pseudogene.org/cgi-bin/search-results.cgi?tax_id=9606&set_search=63&criterion0=&operator0=&searchValue0=&all=View+All+Pseudogenes&sort=1&output=html )

To get the sequence information for a large set of pseudogenes, however, it would probably be best to write the program you suggested.
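
Such a program can be quite short. The sketch below is a starting point only: it assumes the genome is available as a FASTA file, that the flatfile coordinates are 1-based and inclusive, and that minus-strand entries should be reverse-complemented. Please check those conventions against the flatfile documentation before relying on it; the function names are illustrative.

```python
def read_fasta(path):
    """Read a FASTA file into a {sequence_name: sequence} dictionary."""
    seqs, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                # Use the first word of the header as the sequence name.
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(chrom_seq, start, end, strand="+"):
    """Return the 1-based inclusive interval [start, end] of a chromosome.

    Minus-strand intervals are returned reverse-complemented.
    """
    piece = chrom_seq[start - 1:end]
    if strand == "-":
        piece = piece.translate(_COMPLEMENT)[::-1]
    return piece
```

For each pseudogene you would look up its chromosome in the dictionary returned by `read_fasta` and call `extract` with the start, end, and strand from the flatfile.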

Tuesday, May 29, 2007

Packing Software on Mac OS X

Does the packing software on the molmovdb geometry site work on Mac OS X? If not, can you help with porting it to work on Mac OS X?

There have been problems compiling the program and making it work on Mac OS X. The program was written by a lab member who is no longer available for this. Please continue using the web interface.

PNAS Paper Error

Regarding the PNAS paper titled Genomic analysis of the hierarchical structure of regulatory networks, I am having trouble understanding the organization of the columns. How was the placement of proteins in columns determined? And what is the purpose behind the duplication of level 1 proteins such as SPT8, and gaps within the columns?

Unfortunately, I think that is a typo introduced when PNAS edited our paper. It is not in the PDF file that we sent to PNAS.

Sunday, May 20, 2007

Background Probability

In your paper, "Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures" what is the background probability for the Viterbi and forward algorithm you used?

We used a spatial ellipsoid of the coordinate distribution in the alignment for each atom's position, and used a flat prior in estimating this.

Table 1 Discrepancy

Regarding the Science paper titled Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights the numbers in the Table 1 tell something different. According to Table 1, Simultaneously possible interactions have less fraction of same functions, as well as less fraction in co-expression correlation. Can you please clarify?

The table headings unfortunately got switched at some unknown point during copy editing. We have posted a note to that point on our site. Do let me know if you need more help on this!

Friday, May 18, 2007

Sieve-Fit Program

Regarding the availability of the "sieve-fit". Do you have the program (source, script or whatever) or know who could provide it to me?

see http://geometry.molmovdb.org/
then
http://bioinfo.mbb.yale.edu/geometry/screw-axis/

http://geometry.molmovdb.org/files/geometry/readme.html
then
http://geometry.molmovdb.org/files/geometry/src-prog3/sieve-fit.main.c

Unix Version of Voronoi Calculator?

Where is there a stand alone version of the Voronoi packing efficiency calculator program for unix platform?

The website runs the packing-eff.exe program (in the src-prog3/ folder) from this package:

http://geometry.molmovdb.org/files/libproteingeometry-2.2.tgz

Currently there is no unix version.

Monday, May 14, 2007

How do I get your original, published papers online?

Essentially all of my work is available on-line. Go to:

http://papers.gersteinlab.org

and click on the appropriate "preprint" link. You will get a preprint or (if appropriate) journal reprint of the paper you want. There should be NO password challenges or other barriers. Usually, the papers are in PDF format but some are in HTML. (Other formats are available directly from http://papers.gersteinlab.org/e-print.)

Please let me know if you have any problems with this service. If you can't get
what you want, we can easily post you normal paper reprints.

PS I'm CC'ing this message to our FAQ archive (faq@bioinfo.mbb.yale.edu) as it
enables me to track the popularity of various papers.

Monday, April 16, 2007

Complicated question regarding a protocol to decrease the hysteresis of the transition

We are trying to calculate the energetic cost of the transition between two structures of the TFIIB protein. I am using the CNS script available on molmovdb to obtain multiple states along the transition between both structures. I am doing this between the X-ray and NMR structures in both directions. My problem is the following. After a few attempts I found a protocol to decrease the hysteresis of the transition. I began with 500 frames, with every frame energy-minimized with 500 Powell steps. I tried to decrease the hysteresis even more by increasing to 1000 frames with the same 500 Powell steps. My problem began when, in another attempt to decrease the hysteresis, I made 1500 frames with the same 500 Powell steps: as you can see on the graph that I attached, the hysteresis appears to have increased with the 1500 frames. When I repeated the 1500 frames and minimized with 1000 Powell steps, the hysteresis decreased.

Do you think that this behavior is correct? Does it look strange that I have any energetic barrier of the transition?

I would be very cautious about assuming that a linear interpolation between two structures represents the thermodynamically most probable trajectory of motion. I don't know why using more frames would increase hysteresis -- presumably you mean that an energy difference resulted when the protein nearly finished its morph trajectory. Maybe the coarser interpolation jumped over a barrier and found a more favorable path. If you clarify what you are doing I may be able to provide a hint, though I think most likely I do not have a rigorous answer for you.
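
For reference, a plain linear interpolation between two coordinate sets is nothing more than the sketch below. It is an illustration and not the CNS script: it does no energy minimization, and it assumes both structures list the same atoms in the same order and are already superimposed.

```python
def interpolate(start, end, n_frames):
    """Return n_frames coordinate sets from `start` to `end`, both inclusive.

    `start` and `end` are lists of (x, y, z) tuples; n_frames must be >= 2.
    """
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([
            tuple(a + t * (b - a) for a, b in zip(pa, pb))
            for pa, pb in zip(start, end)
        ])
    return frames
```

Every atom moves along a straight line at constant speed, so intermediate frames can have distorted geometry before any minimization is applied. Changing the number of frames changes the step size between consecutive frames, and that may affect which minimum each frame relaxes into.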

Sunday, April 15, 2007

Atomic Structure

Regarding the paper "Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights", how do I obtain the atomic information about your structural network, namely PDB files containing your interacting interfaces?

I think this is possible. I think we can put this on our data download page on sin.gersteinlab.org. We'll follow up shortly on this.

Tuesday, January 30, 2007

Clarification for table 1 of "Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights"

I have been reading your paper titled "Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights" with great interest. Specifically, there is one section that I am confused about that I would like to ask for clarification of: page 1939, column three, paragraph 1, description of table 1. In comparing hubs with simultaneously possible interaction partners to those with mutually exclusive interaction partners, table 1 seems to indicate that the fraction of protein partners with same biological process, molecular function, cellular component and coexpression correlation is higher in hubs with mutually exclusive interactions with partners, however, the text seems to indicate the reverse. I feel that I must be misinterpreting something, so any clarification would be greatly appreciated.

Yes, you're right, unfortunately the column headings of table 1 got switched at a late stage of editing. Since we've been getting a fair share of email about this, there is a note regarding this on the paper website http://sin.gersteinlab.org