Ph.D. Case Western Reserve University
The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions about human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.