Repetitive Regions of Genomes

Repetitive DNA consists of nucleotide sequences that appear multiple times across the genome. Such sequences are abundant in a broad range of species, from bacteria to mammals, and cover nearly half of the human genome. The role of repetitive DNA in the genome remained speculative for decades; however, several recent studies have demonstrated that repetitive DNA may be an important driver in stabilizing genome contacts. For instance, short tandem repeats, whose expansion is the direct cause of more than 25 inherited human disorders such as fragile X syndrome and Huntington's disease, have been shown to co-localize with chromatin domain boundaries (Sun et al. 2018. Cell).

My interest in repetitive DNA began when I was participating in the ENCODE III project, where I developed Permseq (Xin et al. 2015. PLOS Computational Biology), an R package for mapping protein-DNA interactions in highly repetitive regions of genomes with prior-enhanced read mapping. I found that discarding reads that originate from repetitive regions because of alignment uncertainty leads to striking false positives and false negatives.

Later on, I developed mHi-C (Zheng et al. 2018. bioRxiv) to address a similar multi-mapping read issue in 3D nucleome studies, yielding a significant improvement in sequencing depth and refined inference of genome structure and function.

Ye Zheng
Ph.D. Candidate in Statistics

Publications

Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells’ regulatory programs. …