Identification of endogenous retroviral reading frames in the human genome

Research output: Contribution to journalJournal articleResearchpeer-review

Background: Human endogenous retroviruses (HERVs) comprise a large class of repetitive retroelements. Most HERVs are ancient and invaded our genome at least 25 million years ago, except for the evolutionary young HERV-K group. The far majority of the encoded genes are degenerate due to mutational decay and only a few non-HERV-K loci are known to retain intact reading frames. Additional intact HERV genes may exist, since retroviral reading frames have not been systematically annotated on a genome-wide scale. Results: By clustering of hits from multiple BLAST searches using known retroviral sequences we have mapped 1.1% of the human genome as retrovirus related. The coding potential of all identified HERV regions were analyzed by annotating viral open reading frames (vORFs) and we report 7836 loci as verified by protein homology criteria. Among 59 intact or almost-intact viral polyproteins scattered around the human genome we have found 29 envelope genes including two novel gammaretroviral types. One encodes a protein similar to a recently discovered zebrafish retrovirus (ZFERV) while another shows partial, C-terminal, homology to Syncytin (HERV-W/FRD). Conclusions: This compilation of HERV sequences and their coding potential provide a useful tool for pursuing functional analysis such as RNA expression profiling and effects of viral proteins, which may, in turn, reveal a role for HERVs in human health and disease. All data are publicly available through a database at http://www.retrosearch.dk.

Original languageEnglish
Article number32
JournalRetrovirology
Volume1
ISSN1742-4690
DOIs
Publication statusPublished - 11 Oct 2004
Externally publishedYes

ID: 203902389