Wednesday, April 23, 2008

Human genome build for hg16 Tiling HMM

I read your 2006 paper describing a supervised hidden Markov model framework for tiling array data from transcriptional experiments. To gain some insight, I downloaded the transcriptional experimental data from the website, but I am not sure which human genome build the positions in these data map to. I would appreciate it if you could tell me which build it is.

hg16 (NCBIv34)

Monday, April 21, 2008

Do you call mutations at key residues of a protein pseudogenes?

If a gene is transcribed and translated, but mutations at key residues make the protein non-functional or very unstable (with a very short half-life), do we call this kind of gene a pseudogene? If not, it will be less meaningful for me to classify such genes, because they are not under the same evolutionary pressure as functional genes. If so, computer algorithms may have difficulty identifying them.

This is a very interesting question. The definition of "gene" and "pseudogene" is extremely fuzzy. Currently, our pipeline will not call this a pseudogene because we primarily look for frame-shifts and nonsense mutations. We can identify processed pseudogenes that have no frame-shifts or nonsense mutations, specifically pseudogenes of multi-exon genes, which appear as single-exon retrotransposed genes. We don't have a clear way of differentiating between a functional retrogene and a pseudogene; we simply flag it as a processed pseudogene. If the gene of interest has many exons and this structure is retained in the non-functional copies, we will not call it a pseudogene unless we detect a frame-shift or a nonsense mutation. But we are constantly adding new features to our pipeline, and I will discuss this aspect with Prof. Gerstein and the rest of the team.
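To make the distinction concrete, here is a minimal sketch of the kind of check described above. It is not our pipeline's code: the function name `find_disablements` and the crude length test are assumptions made for illustration, and a real pipeline would detect frame-shifts from an alignment to the parent protein rather than from sequence length alone.

```python
# Hypothetical illustration only; not the actual pseudogene pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_disablements(cds):
    """Scan a candidate coding sequence (read in frame 0) for disablements.

    Returns a tuple (premature_stops, length_not_multiple_of_3), where
    premature_stops lists the 0-based nucleotide offsets of in-frame stop
    codons that occur before the final codon.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    premature_stops = []
    # Skip the final codon, where a stop codon is expected.
    for i in range(n_codons - 1):
        codon = seq[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            premature_stops.append(3 * i)
    # A length that is not a multiple of 3 is only a crude hint of a frame-shift.
    return premature_stops, len(seq) % 3 != 0

print(find_disablements("ATGTAAGGGTAG"))
```

A copy that carries only missense changes at key residues passes both tests, which is why a check of this kind would not flag the case asked about here.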

How good is Pseudogene Identification?

How good is the current algorithm at identifying pseudogenes? For the maize example, how can we know whether the thousands of copies of RVP genes on transposons are functional or not?

1. Our pipeline has specific criteria for identifying pseudogenes and the first step involves filtering out exons annotated as protein coding. Therefore, if the underlying genome annotation is incorrect, then we will miss some pseudogenes. The scenario you have described is similar to ribosomal protein pseudogenes where we observe several retrotransposed pseudogenes. In this case, we specifically modified the pipeline to not mask the exons as most of the ribosomal proteins were misannotated in databases.

2. I am not very familiar with work on the maize genome or on pseudogenes in plants. I will discuss this further with my colleagues and get back to you if there are new insights. But based on my experience with ribosomal protein pseudogenes, most such processed pseudogenes are non-functional. While one can never be sure that something is non-functional, there are a few things one could do:

a. Compare multiple genomes at various evolutionary distances from the maize genome to see if that region is conserved. If it is, that suggests some biological preference for retaining those pseudogenes.

b. Look to see if there are known promoter elements upstream of these regions which could potentially enable transcription/translation.
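As a rough illustration of the conservation check in (a), the sketch below computes percent identity between two sequences that have already been aligned (gaps written as `-`). The function name `percent_identity` and the gap convention are assumptions made for this example, and a real analysis would rely on whole-genome alignments and a proper conservation score rather than raw identity.

```python
# Hypothetical helper for illustration; assumes the two inputs are
# rows of the same pairwise alignment, with '-' marking gaps.
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

print(percent_identity("ACGT", "ACGA"))
```

Comparing this value for the candidate region against the value for neighbouring, presumably neutral sequence in the same pair of genomes gives a first impression of whether the region is unusually well preserved.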

You might want to refer to a paper we recently published on ribosomal protein pseudogenes, "Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes".

Saturday, April 12, 2008

How do I get data for the Bayesian Networks Paper?

I am a graduate student at the University of Science and Technology of China majoring in bioinformatics. I am about to perform some experiments concerning PPI networks, and I found your article "A Bayesian networks approach for predicting protein-protein interactions from genomic data" very useful for my work. I need the various datasets preprocessed by your group. Could you send me a copy or tell me where I can download the data?

You can find the data on the supplementary website:

Thursday, April 10, 2008

How to obtain Standalone version of Calc-surface?

I am wondering if I could get a stand-alone version of the calc-surface program. I cannot find it on the software page of Prof. Gerstein's lab.

I don't release binaries because I am not good enough at making them compatible with most systems. I would be happy to compile it for you, but I need to know what type of system you have: Linux, Mac, 32-bit, 64-bit, G4, G5. I am sorry to say I can't do Windows. Please contact the lab.

Wednesday, April 9, 2008

How do I modify source code of Calc-Surface?

I want to use calc-surface with a large probe of 14 Å, but calc-surface seg faults when the probe is larger than 3 to 4 Å. I would like to modify the code to handle larger probes. Can you point me in the right direction as to where to edit the code? I am using version 2.3.1.

The code you should be editing is called calc-surface.main.c which is in:


I would also be careful about the other scripts and sub-routines that this program calls. Most of them are in the same directory. All the programs are in one of these three directories:
src-prog, src-pro2 and src-pro3.

Wednesday, April 2, 2008

We have a new integrin ectodomain structure with two molecules in the asymmetric unit. There is a small amount of breathing at what we call the headpiece-tailpiece interface when the two molecules are compared. This is very important to some molecular dynamics simulations that we are doing. The movement between the two molecules is small, a few degrees, but involves large units of the molecule. We would like to use the morph server to extrapolate, rather than interpolate, this motion. I have looked at your description of frodo lite and this seems to be a good approach. So, following up on our meeting at Yale, we would appreciate some help with this.

By "extrapolate" I believe you mean that you want to predict an unknown conformation from one or two known ones. FRODA can in fact be run in "undirected" mode, and this will sample the accessible phase space consistent with sterics and the hydrogen bonding pattern. However, it does not pick out the desired conformer from the large number of generated conformers. Certain assumptions are also made about the hydrogen bonding pattern, which may result in the desired conformation not being present at all in the generated ensemble.

Instead of FRODA, perhaps you want to try our soon to be announced motion prediction tool, the Conformation Explorer. It is specifically designed to predict the motion of domains. Domains are often too large and slow moving to be dynamically characterized by MD. Further complicating matters, the motions are stochastic and the MD force fields are far from perfect; therefore the motion may not be observed even when the trajectory has been computed for a period of time experimentally known to be sufficient. We have been successful in predicting large scale domain hinge bending motions for five proteins, including biotin carboxylase, glutamine binding protein, and MurA.

We do need to know something about the conformation which is to be predicted, however, in order to pick the right conformer out of the ensemble. If it binds a small ligand, for instance, we can find the holo structure given the apo by computing stability, free energy of ligand binding, gyration radius, and other quantities. Your use of the word "extrapolate" suggests you may have some geometric information about the target conformer which we can use.
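Of the quantities mentioned, the gyration radius is the simplest to compute directly from coordinates. The sketch below uses the standard mass-weighted definition; the function name `radius_of_gyration` and the plain list-of-tuples input are assumptions made for illustration, and this is not code from the Conformation Explorer.

```python
import math

# Hypothetical helper for illustration; coords is a list of (x, y, z)
# tuples in Angstroms and masses is an optional list of atomic masses.
def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration: sqrt(sum(m * |r - r_cm|^2) / sum(m))."""
    if not coords:
        raise ValueError("coords must contain at least one atom")
    if masses is None:
        masses = [1.0] * len(coords)  # unit masses if none are given
    total_mass = sum(masses)
    # Centre of mass, one component at a time.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)

print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

Evaluating this for every conformer in a generated ensemble gives one simple geometric filter, for example to separate more compact closed forms from more extended open forms.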