The split structure of most mammalian protein-coding genes allows multiple mRNA and protein isoforms to be produced from a single gene locus through alternative splicing (AS). We propose a computational approach called UNCOVER, based on a pair hidden Markov model, to discover conserved coding exonic sequences subject to AS that have so far gone undetected. Applied to orthologous introns of known human and mouse genes, UNCOVER predicts skipped exons or retained introns present in both species, while discriminating them from conserved noncoding sequences. The accuracy of the model is evaluated on a curated set of genes with known conserved AS events. Predicting skipped exons in the ENCODE regions, which represent ~1% of the human genome, yields more than 50 new exon candidates. RT-PCR and sequencing analysis of 15 introns with strong UNCOVER predictions and no EST evidence validated five novel predicted AS exons. These results imply that a considerable number of conserved exonic sequences and associated isoforms are still completely missing from the current annotation of known genes. UNCOVER also identifies a small number of candidates for conserved intron retention.
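The abstract does not describe UNCOVER's state topology or parameters. To illustrate the pair-HMM machinery it builds on, here is a minimal sketch of a generic three-state pair HMM of the kind described in standard sequence-analysis textbooks. The three states are match (M), gap in one sequence (X), and gap in the other (Y). The sketch uses log-space Viterbi to score the most probable alignment of two orthologous sequences. All names and parameter values (`delta`, `epsilon`, `q`, `p_same`) are hypothetical. The model also has no exon, intron, or coding states, so it is not UNCOVER's model.

```python
import math

NEG_INF = float("-inf")


def log_emit_match(a, b, p_same=0.2):
    # Hypothetical match-state emissions: the 4 identical nucleotide pairs
    # share 4 * p_same; the rest is split evenly over the 12 mismatched pairs.
    p_diff = (1.0 - 4 * p_same) / 12
    return math.log(p_same if a == b else p_diff)


def viterbi_pair_hmm(x, y, delta=0.2, epsilon=0.1, q=0.25):
    """Log-probability of the best alignment of x and y under a generic
    three-state pair HMM (illustrative parameters, not UNCOVER's)."""
    n, m = len(x), len(y)
    log_mm = math.log(1 - 2 * delta)      # M -> M
    log_gap_open = math.log(delta)        # M -> X or M -> Y
    log_gap_ext = math.log(epsilon)       # X -> X or Y -> Y
    log_gap_close = math.log(1 - epsilon) # X -> M or Y -> M
    log_q = math.log(q)                   # background emission in gap states

    # v*[i][j]: best log-probability of aligning x[:i] with y[:j], ending in that state.
    vM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 0.0  # treat the begin state like M at the origin

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Match state emits the aligned pair x[i-1], y[j-1].
                vM[i][j] = log_emit_match(x[i - 1], y[j - 1]) + max(
                    vM[i - 1][j - 1] + log_mm,
                    vX[i - 1][j - 1] + log_gap_close,
                    vY[i - 1][j - 1] + log_gap_close,
                )
            if i > 0:
                # X state emits x[i-1] against a gap in y.
                vX[i][j] = log_q + max(vM[i - 1][j] + log_gap_open,
                                       vX[i - 1][j] + log_gap_ext)
            if j > 0:
                # Y state emits y[j-1] against a gap in x.
                vY[i][j] = log_q + max(vM[i][j - 1] + log_gap_open,
                                       vY[i][j - 1] + log_gap_ext)

    return max(vM[n][m], vX[n][m], vY[n][m])


print(viterbi_pair_hmm("ACGTACGT", "ACGTACGT"))
print(viterbi_pair_hmm("ACGTACGT", "TGCATGCA"))
```

Identical sequences score higher than dissimilar ones. In practice, such a score is usually compared against a background model of unrelated sequence (a log-odds ratio) rather than used on its own.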