The first end-to-end (‘telomere-to-telomere’), fully gapless DNA sequence of a human chromosome is a major milestone for genomics research.
Although the current human reference genome is the most accurate and complete vertebrate genome ever produced, there are still gaps in the DNA sequence, even after two decades of improvements. Now, for the first time, scientists have determined the complete sequence of a human chromosome from one end to the other (‘telomere to telomere’) with no gaps and an unprecedented level of accuracy.
The publication of the telomere-to-telomere assembly of a complete human X chromosome on July 14, 2020, in Nature is a landmark achievement for genomics researchers. Lead author Karen Miga, a research scientist at the UC Santa Cruz Genomics Institute, said the project was made possible by new sequencing technologies that enable “ultra-long reads,” such as the nanopore sequencing technology pioneered at UC Santa Cruz.
Repetitive DNA sequences are common throughout the genome and have always posed a challenge for sequencing, because most technologies produce relatively short “reads” of the sequence, which then have to be pieced together like a jigsaw puzzle to assemble the genome. Repetitive sequences yield many short reads that look almost identical, like a large expanse of blue sky in a puzzle, with no clues to how the pieces fit together or how many repeats there are.
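The jigsaw analogy can be made concrete with a toy Python sketch. The sequences are invented and every read is assumed to be error-free. A tandem repeat sits between two short stretches of unique sequence, and the script checks whether the set of 10-base reads can distinguish five repeat copies from eight. It then shows that a single read spanning the whole repeat, anchored in the unique flanks on both sides, gives the copy number directly. This illustrates the principle only and does not reproduce how any particular assembler works.

```python
from __future__ import annotations

# Toy genome (invented for illustration): unique left flank, tandem repeat, unique right flank.
LEFT, UNIT, RIGHT = "ACGTTGCA", "GATTACA", "TTGACCGT"


def genome(copies: int) -> str:
    """Build the toy genome with the given number of repeat copies."""
    return LEFT + UNIT * copies + RIGHT


def short_reads(seq: str, length: int) -> set[str]:
    """Every distinct error-free read of the given length, taken at every offset."""
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


def copies_from_long_read(read: str) -> int:
    """Count repeat copies in a read that spans the repeat and both unique flanks."""
    start = read.index(LEFT) + len(LEFT)  # end of the left anchor
    end = read.index(RIGHT)               # start of the right anchor
    return (end - start) // len(UNIT)


# 10-base reads cannot tell 5 copies from 8: the two sets of read sequences are identical.
print(short_reads(genome(5), 10) == short_reads(genome(8), 10))  # True
# One read spanning flank to flank reveals the copy number directly.
print(copies_from_long_read(genome(8)))  # 8
```

Short reads from the middle of the repeat look the same whatever the copy number, which is the "blue sky" problem. Only a read long enough to reach unique sequence on both sides can resolve it.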
“These repeat-rich sequences were once deemed intractable, but now we’ve made leaps and bounds in sequencing technology,” Miga said. “With nanopore sequencing, we get ultra-long reads of hundreds of thousands of base pairs that can span an entire repeat region, so that bypasses some of the challenges.”
Filling in the remaining gaps in the human genome sequence opens up new regions of the genome where researchers can look for associations between sequence variations and disease, and for other clues to important questions about human biology and evolution.
“We’re starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations, so we’ve been missing a lot of information that could be important to understanding human biology and disease,” Miga said.
Telomere to telomere
Miga and Adam Phillippy at the National Human Genome Research Institute (NHGRI), both corresponding authors of the new paper, co-founded the Telomere-to-Telomere (T2T) consortium to pursue a complete genome assembly after working together on a 2018 paper that demonstrated the potential of nanopore technology to produce a complete human genome sequence. That effort used the Oxford Nanopore Technologies MinION sequencer, which sequences DNA by detecting the change in current flow as single molecules of DNA pass through a tiny hole (a “nanopore”) in a membrane.
The new project built on that effort, combining nanopore sequencing with other sequencing technologies from PacBio and Illumina, and optical maps from BioNano Genomics. Using these technologies, the team produced a whole-genome assembly that exceeds all previous human genome assemblies in terms of continuity, completeness, and accuracy, even surpassing the current human reference genome by some metrics.
Nevertheless, there were still several breaks in the sequence, Miga said. To finish the X chromosome, the team had to manually resolve several gaps in the sequence. Two segmental duplications were resolved with ultra-long nanopore reads that completely spanned the repeats and were uniquely anchored on either side. The remaining break was at the centromere, a notoriously difficult region of repetitive DNA found in every chromosome.
In the X chromosome, the centromere contains a region of highly repetitive DNA spanning 3.1 million base pairs (the bases A, C, T, and G form pairs in the DNA double helix and encode genetic information in their sequence). The team was able to identify variants within the repeat sequence to serve as markers, which they used to align the long reads and connect them together to span the entire centromere.
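The marker idea can be illustrated with a toy Python sketch. This is a simplified illustration under stated assumptions and not the T2T consortium's pipeline. The "centromere" is an invented 10-base unit repeated 12 times, with two copies carrying a single-base variant. Markers are taken to be k-mers that occur exactly once in the sequence; as a shortcut, they are computed here from the known answer, which a real assembler would not have. Two reads that share a marker must sit at a fixed offset from each other, so the reads can be chained onto one coordinate system and merged across the whole array. The helper names (`single_copy_kmers`, `marker_offsets`, `layout`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def single_copy_kmers(seq: str, k: int) -> set[str]:
    """k-mers occurring exactly once in seq (toy shortcut: computed from the known answer)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1}


def marker_offsets(read: str, markers: set[str], k: int) -> dict[str, int]:
    """Offset within the read of every marker k-mer it contains."""
    return {read[i:i + k]: i for i in range(len(read) - k + 1) if read[i:i + k] in markers}


def layout(reads: list[str], markers: set[str], k: int) -> str:
    """Place reads on one coordinate system via shared markers, then merge them."""
    offsets = [marker_offsets(r, markers, k) for r in reads]
    start = {0: 0}  # read index -> start coordinate, relative to read 0
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(reads)):
            if j in start:
                continue
            shared = offsets[i].keys() & offsets[j].keys()
            if shared:
                m = min(shared)  # any shared single-copy marker yields the same offset
                start[j] = start[i] + offsets[i][m] - offsets[j][m]
                frontier.append(j)
    if len(start) != len(reads):
        raise ValueError("some reads share no marker with the others")
    shift = min(start.values())
    length = max(start[j] + len(reads[j]) for j in start) - shift
    merged: list[str | None] = [None] * length
    for j, read in enumerate(reads):
        for offset, base in enumerate(read):
            p = start[j] - shift + offset
            if merged[p] is None:
                merged[p] = base
    if None in merged:
        raise ValueError("the reads leave a gap in the layout")
    return "".join(merged)


UNIT = "GATTACAGGC"
copies = [UNIT] * 12
copies[3] = "GATTTCAGGC"  # variant copy: A -> T at offset 4
copies[7] = "GACTACAGGC"  # variant copy: T -> C at offset 2
truth = "".join(copies)   # 120-base toy repeat array

k = 8
markers = single_copy_kmers(truth, k)
reads = [truth[30:90], truth[0:50], truth[70:120]]  # overlapping long reads, out of order
print(layout(reads, markers, k) == truth)  # True
```

Without the two variant copies the array would contain no single-copy k-mers at all, and the reads could not be placed relative to one another.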
“For me, the idea that we can put together a 3-megabase-size tandem repeat is just mind-blowing. We can now reach these repeat regions covering millions of bases that were previously thought intractable,” Miga said.
The next step was a polishing strategy that used data from multiple sequencing technologies to ensure the accuracy of every base in the sequence.
“We used an iterative process over three different sequencing platforms to polish the sequence and reach a high level of accuracy,” Miga explained. “The unique markers provide an anchoring system for the ultra-long reads, and once you anchor the reads, you can use multiple data sets to call each base.”
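Using several data sets to call each base can be sketched as a per-position vote. In this hypothetical Python example, reads from three platforms are assumed to be already anchored to a draft sequence. Each position takes the base that most covering reads agree on, and the draft base is kept where no read covers it. Real polishing tools realign reads between rounds and weigh evidence according to each platform's error profile. The single, unweighted round shown here is a deliberate simplification, and the platform names are only labels.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def polish(draft: str, placed_reads: list[tuple[str, int, str]]) -> str:
    """One round of majority-vote polishing over reads already anchored to the draft.

    Each read is (platform_label, start_in_draft, sequence). The labels are only
    illustrative: every read gets one equal vote here.
    """
    votes: dict[int, Counter] = defaultdict(Counter)
    for _platform, start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(draft):
                votes[pos][base] += 1
    # Keep the draft base wherever no read covers a position.
    return "".join(
        votes[i].most_common(1)[0][0] if votes[i] else draft[i]
        for i in range(len(draft))
    )


draft = "ACGAACGGTTA"  # toy draft with errors at positions 3 and 9
reads = [
    ("nanopore", 0, "ACGTACGG"),
    ("nanopore", 0, "ACGTACGA"),   # this read has an error at position 7
    ("pacbio", 2, "GTACGGTCA"),
    ("illumina", 3, "TACGGTCA"),
    ("illumina", 1, "CGTTCGGTC"),  # this read has an error at position 4
]
print(polish(draft, reads))  # ACGTACGGTCA
```

Both draft errors and both read errors are outvoted, because at every position the correct base is supported by more reads than any single mistake.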
In addition to providing ultra-long reads, nanopore sequencing can detect bases that have been modified by methylation, an “epigenetic” change that doesn’t alter the sequence but has important effects on DNA structure and gene expression. By mapping patterns of methylation on the X chromosome, the team was able to confirm previous observations and reveal some intriguing trends in methylation patterns within the centromere.
The new human genome sequence, derived from a human cell line called CHM13, closes many gaps in the current reference genome, known as Genome Reference Consortium build 38 (GRCh38).
The T2T consortium is continuing to work toward completing all of the CHM13 chromosomes. “It’s an open consortium, so in many respects this is a community-driven project, with a lot of people dedicating time and resources to it,” Miga said.
Reference: “Telomere-to-telomere assembly of a complete human X chromosome” by Karen H. Miga, Sergey Koren, Arang Rhie, Mitchell R. Vollger, Ariel Gershman, Andrey Bzikadze, Shelise Brooks, Edmund Howe, David Porubsky, Glennis A. Logsdon, Valerie A. Schneider, Tamara Potapova, Jonathan Wood, William Chow, Joel Armstrong, Jeanne Fredrickson, Evgenia Pak, Kristof Tigyi, Milinn Kremitzki, Christopher Markovic, Valerie Maduro, Amalia Dutra, Gerard G. Bouffard, Alexander M. Chang, Nancy F. Hansen, Amy B. Wilfert, Françoise Thibaud-Nissen, Anthony D. Schmitt, Jon-Matthew Belton, Siddarth Selvaraj, Megan Y. Dennis, Daniela C. Soto, Ruta Sahasrabudhe, Gulhan Kaya, Josh Quick, Nicholas J. Loman, Nadine Holmes, Matthew Loose, Urvashi Surti, Rosa ana Risques, Tina A. Graves Lindsay, Robert Fulton, Ira Hall, Benedict Paten, Kerstin Howe, Winston Timp, Alice Young, James C. Mullikin, Pavel A. Pevzner, Jennifer L. Gerton, Beth A. Sullivan, Evan E. Eichler and Adam M. Phillippy, 14 July 2020, Nature.
In addition to Miga and Phillippy, the authors of the paper include co-first author Sergey Koren at the National Human Genome Research Institute and scientists at nearly two dozen institutions in the U.S. and U.K., including the University of Washington, Johns Hopkins University, UC San Diego, and the Wellcome Sanger Institute. This work was supported by the U.S. National Institutes of Health.