Tuesday, June 5, 2007

Pseudogene Sequence Data Download

We know there is a way to download the pseudogenes of each organism; the downloaded file contains the name of each pseudogene, its start and end positions, etc. But we wanted to download the sequence of each pseudogene of each organism directly, and we didn't find a way to do that in the database. Is it possible to download the sequences? Or do we have to write a program that, given the genome of the organism and the start/end positions of each pseudogene, extracts the corresponding sequence?

None of the flat files contain the raw sequence information. For an individual pseudogene, however, you can query the system for either the amino acid or the nucleotide sequence. Simply search for the pseudogene you're looking for and, on the results page, click either the red or the yellow button.

(Example results page: http://www.pseudogene.org/cgi-bin/search-results.cgi?tax_id=9606&set_search=63&criterion0=&operator0=&searchValue0=&all=View+All+Pseudogenes&sort=1&output=html )

To get the sequence information for a large set of pseudogenes, however, it would probably be best to write the program you suggested.
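
As a rough starting point, such a program might look like the Python sketch below. It is only an illustration under several assumptions, not the layout of our actual flat files: the column names (id, chromosome, start, end, strand) are hypothetical, coordinates are assumed to be 1-based and inclusive, and pseudogenes on the minus strand are assumed to need reverse-complementing. Check these against the header of the file you downloaded and adjust accordingly.

```python
import csv
import io

# Complement table used to reverse-complement minus-strand pseudogenes.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Return {sequence_name: sequence} parsed from a FASTA file handle."""
    sequences, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Use the first word of the header line as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_pseudogenes(genome, annotation_handle):
    """Yield (pseudogene_id, sequence) for each row of a tab-delimited table.

    Assumed (hypothetical) columns: id, chromosome, start, end, strand.
    Coordinates are assumed to be 1-based and inclusive.
    """
    reader = csv.DictReader(annotation_handle, delimiter="\t")
    for row in reader:
        start, end = int(row["start"]), int(row["end"])
        seq = genome[row["chromosome"]][start - 1:end]
        if row["strand"] == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        yield row["id"], seq


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("genome.fa") etc.
    genome = read_fasta(io.StringIO(">chr1\nACGTACGTAA\nCCGGTT\n"))
    table = io.StringIO(
        "id\tchromosome\tstart\tend\tstrand\n"
        "PG1\tchr1\t2\t5\t+\n"
        "PG2\tchr1\t9\t12\t-\n"
    )
    for pg_id, seq in extract_pseudogenes(genome, table):
        print(f">{pg_id}\n{seq}")
```

The sketch loads each chromosome fully into memory, which is simple but may be heavy for large genomes; reading one chromosome at a time would be a reasonable refinement.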
