Ph.D. Case Western Reserve University
Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. We further combine information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen identified de novo genes with high transcript levels were chosen to verify their protein expressions. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and have the potential to be quickly integrated into ancient cellular network.