Gene Annotation

Prof. Mark Gerstein is the Albert L. Williams Professor of Biomedical Informatics, Molecular Biophysics and Biochemistry, and Computer Science at Yale. His research group develops and applies computational methods to identify and annotate the functional regions of the human genome. As part of these efforts, he is a member of several international consortia, including the Encyclopedia of DNA Elements (ENCODE), exRNA, and 1000 Genomes projects. A better understanding of these functional regions will allow better prevention, diagnosis, and treatment of diseases. The high-performance computing (HPC) facilities at Yale University are an essential part of the Gerstein lab’s daily work. The group uses them to integrate highly heterogeneous functional genomic datasets in order to annotate regions of the human genome and infer their function.

Professor Gerstein’s group uses the Yale supercomputers to run its in-house pseudogene annotation pipeline as part of the GENCODE/ENCODE project. The human pseudogene pipeline is run monthly, providing the scientific community with a rigorous and up-to-date annotation. The results are automatically published on the pseudogene.org website and made freely available. The group also uses the HPC facilities in its work for the 1000 Genomes project, analyzing and annotating variation in numerous genomic elements. The results of this work, conducted on the Yale clusters, have been published in high-impact journals and picked up by several news and media outlets. These studies are funded by the National Institutes of Health, and their results are disseminated widely within the community.