How good is the current algorithm at identifying pseudogenes? For the maize example, how can we know whether the thousands of copies of RVP genes on transposons are functional or not?
1. Our pipeline has specific criteria for identifying pseudogenes, and the first step filters out exons annotated as protein coding. Therefore, if the underlying genome annotation is incorrect, we will miss some pseudogenes. The scenario you describe is similar to that of ribosomal protein pseudogenes, where we observe several retrotransposed pseudogenes. In that case, we specifically modified the pipeline to not mask the exons, because most of the ribosomal proteins were misannotated in databases.
2. I am not very familiar with work on the maize genome or on pseudogenes in plants. I will discuss this further with my colleagues and get back to you if there are new insights. Based on my experience with ribosomal protein pseudogenes, however, most such processed pseudogenes are non-functional. While one can never be sure that something is non-functional, there are a few things one could do:
a. Compare multiple genomes at various evolutionary distances from the maize genome to see whether that region is conserved. If it is, that suggests some biological preference for retaining those pseudogenes.
b. Check whether there are known promoter elements upstream of these regions that could potentially enable transcription/translation.
You might want to refer to a paper we recently published on ribosomal protein pseudogenes, "Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes".
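
To make the exon-masking step in point 1 more concrete, here is a minimal Python sketch. It is not our pipeline's actual code: the `Interval` type, the `filter_candidates` function, and the `mask_exons` switch are hypothetical names made up for illustration, and coordinates are assumed to be 0-based and half-open. It only shows why a candidate that overlaps a (possibly misannotated) coding exon gets dropped, and what turning the masking off changes.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical type; 0-based, half-open coordinates assumed)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_candidates(candidates: List[Interval],
                      coding_exons: List[Interval],
                      mask_exons: bool = True) -> List[Interval]:
    """Drop candidate hits that overlap annotated protein-coding exons.

    With mask_exons=False every candidate is kept, which mimics the
    modification described for ribosomal protein pseudogenes.
    """
    if not mask_exons:
        return list(candidates)
    return [c for c in candidates
            if not any(overlaps(c, e) for e in coding_exons)]


# Toy data: one annotated exon and two candidate hits (made-up coordinates).
exons = [Interval("chr1", 100, 200)]
hits = [Interval("chr1", 150, 300), Interval("chr1", 500, 700)]

masked = filter_candidates(hits, exons)                      # overlapping hit is removed
unmasked = filter_candidates(hits, exons, mask_exons=False)  # all hits are kept
```

If the exon at chr1:100-200 in this toy example were really a misannotated pseudogene, the first hit would be lost with masking on, which is the failure mode described above. The pairwise overlap check is quadratic and is only meant for illustration; a real genome-scale run would need an interval index.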