Making Complete Genome Sequencing Affordable, Accessible
Automation software uses advanced computational tools to generate gapless genome sequences in just a couple of days.
In October 2004, the International Human Genome Sequencing Consortium published the first-ever “near-complete” human genome sequence. The sequence contained 2.85 billion nucleotides (the building blocks of the DNA molecule) interrupted by 341 gaps.1 The caveat was that the researchers were able to sequence only the “euchromatin” part of human DNA. Euchromatin consists of loosely coiled and packed regions of DNA, and the genes located in these regions are accessible and often actively engaged in protein production.2
Sequencing only the euchromatic parts of human DNA, with 341 gaps remaining, meant the result still wasn’t a complete, perfect picture, despite being 99.999 percent accurate.3 The 2004 sequence missed several pieces of crucial genetic information. Completing it required advanced technology and robust research to sequence the highly repetitive, less accessible regions of DNA (heterochromatin) and stitch all the pieces together into the full picture.
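To make the 2004 accuracy figure concrete, a short Python sketch can translate a per-base accuracy into an expected number of wrong bases. It assumes errors are independent and evenly spread, which real sequencing errors are not, and the function name `expected_errors` is hypothetical:

```python
# Minimal sketch (assumption: errors are independent and evenly spread,
# so the expected count is simply length x per-base error rate).

def expected_errors(length_bp: int, accuracy_pct: float) -> float:
    """Expected number of incorrect bases at a given per-base accuracy (%)."""
    error_rate = 1.0 - accuracy_pct / 100.0
    return length_bp * error_rate

euchromatin_2004 = 2_850_000_000  # 2.85 billion nucleotides, per the 2004 sequence
errors = expected_errors(euchromatin_2004, 99.999)
bases_per_error = euchromatin_2004 / errors
print(f"about {errors:,.0f} expected errors, i.e. 1 per {bases_per_error:,.0f} bases")
```

Run as written, it reports about 28,500 expected errors, or roughly one per 100,000 bases, which is what "99.999 percent accurate" means at genome scale.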
Piecing the Jigsaw Puzzle
Almost two decades later, in 2022, the Telomere-to-Telomere (T2T) consortium—which includes leadership from the National Institutes of Health’s National Human Genome Research Institute (NHGRI), as well as several universities—published the complete, gap-free human genome sequence.4
The full sequence comprises the euchromatic DNA that was mapped in 2004 (about 92 percent) and the newly added 8 percent, made up largely of highly repetitive, heterochromatic DNA near the telomeres (the protective ends of each chromosome) and the centromeres (the constricted regions where the two sister chromatids are joined).4
Role of Technology in Completing the Human Genome Sequence
Two DNA sequencing technologies that emerged over the last decade, Oxford Nanopore and PacBio HiFi, generate much longer sequence reads than earlier methods: Nanopore reads are ultra-long but less accurate, while PacBio HiFi reads are shorter but highly accurate. The T2T consortium researchers combined both methods to complete the genome sequence efficiently. Because long reads can span repetitive stretches that short reads cannot, they helped the team crack the previously inaccessible, repeat-rich parts of the genome.4
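A simplified way to see why read length matters: in the hypothetical Python sketch below, a read can place a repeat unambiguously only if it covers the entire repeat plus some unique sequence on each side. The 1,000-base anchor, the 50 kb repeat, and the read lengths are illustrative assumptions (HiFi reads are typically on the order of 15 to 20 kb, and ultra-long Nanopore reads can exceed 100 kb). Real assemblers weigh more nuanced evidence than single spanning reads.

```python
# Simplified model (assumption): a read resolves a repeat only if it spans
# the whole repeat plus a unique "anchor" of flanking sequence on both sides.

def spans_repeat(read_len: int, repeat_len: int, anchor_len: int = 1_000) -> bool:
    """True if a single read can cover the repeat and both unique anchors."""
    return read_len >= repeat_len + 2 * anchor_len

repeat_len = 50_000  # hypothetical 50 kb repeat array
reads = {"short read": 150, "HiFi-scale read": 20_000, "ultra-long read": 100_000}
for name, length in reads.items():
    print(f"{name:>16}: spans repeat = {spans_repeat(length, repeat_len)}")
```

Only the ultra-long read clears the 52 kb threshold here, which mirrors why combining the two read types was useful for the repeat-rich regions.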
Thanks to these technological advances, researchers sequenced, assembled, and validated the roughly three billion nucleotides of the human genome at a relatively low cost. Every chromosome except the Y was sequenced telomere-to-telomere without any gaps; the CHM13 cell line the consortium sequenced carries no Y chromosome.
“In the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their health care,” said the consortium co-chair Adam Phillippy, PhD, in a press release.4 “Truly finishing the human genome sequence was like putting on a new pair of glasses. Now that we can clearly see everything, we are one step closer to understanding what it all means.”
Automation for Democratization and Accessibility
To democratize complete genome sequencing and keep it accessible and cost-effective, NHGRI researchers turned to automation. The team developed and recently launched a software tool called Verkko that assembles complete, gapless genome sequences of various species on demand.5
Born from the T2T strategy of integrating ultra-long reads, Verkko (“network” in Finnish) assembles long, accurate reads using an iterative, graph-based method. It produces complete, gapless, diploid genomes, in which both the maternal and the paternal copy of each chromosome are assembled, with 99.9997 percent accuracy.6
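Genome assembly accuracy is commonly reported as a Phred-scaled quality value (QV), where QV = -10 log10(per-base error rate), so every 10 QV points means ten times fewer errors. As a rough sketch, assuming the quoted percentages are per-base accuracies, the conversion looks like this (the function name `phred_qv` is hypothetical):

```python
import math

def phred_qv(accuracy_pct: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    error_rate = 1.0 - accuracy_pct / 100.0
    return -10.0 * math.log10(error_rate)

# Compare the 2004 euchromatic sequence with the figure quoted for Verkko.
for acc in (99.999, 99.9997):
    err = 1.0 - acc / 100.0
    print(f"{acc}% accurate -> QV {phred_qv(acc):.1f}, about 1 error per {1 / err:,.0f} bases")
```

On this scale, 99.999 percent corresponds to QV 50 and 99.9997 percent to roughly QV 55, or about one error per 333,000 bases.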
What took thousands of highly skilled researchers several years to achieve, Verkko finishes in a couple of days.
How Verkko Works
Verkko starts by assembling the shorter, highly accurate reads produced by PacBio HiFi sequencing into discontinuous segments, then compares those assembled regions with the longer, less precise reads produced by Oxford Nanopore sequencing to connect the segments across repeats. Both technologies were part of the T2T strategy. “Now with Verkko, we can essentially push a button and automatically get a complete genome sequence,” said NHGRI associate investigator Sergey Koren, PhD, who led Verkko’s creation and is the senior author of the study published in Nature Biotechnology.
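The two-stage idea described above can be illustrated with a deliberately tiny Python toy; it is not Verkko's code, and Verkko's published method is considerably more involved. Accurate short reads are turned into a de Bruijn graph and compacted into unambiguous segments (unitigs), then one noisy long read is used to choose the route through a repeat, while the final sequence is spelled from the accurate segments. All names, the toy genome, k = 15, and the substitution-only 5 percent error model are hypothetical assumptions.

```python
import random
from collections import defaultdict

def build_graph(reads, k):
    """De Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers in the reads."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred

def unitigs(succ, pred):
    """Compact maximal non-branching paths into unitig strings.

    Isolated cycles made only of 1-in/1-out nodes are skipped (fine for this toy).
    """
    def simple(v):
        return len(pred[v]) == 1 and len(succ[v]) == 1

    result = []
    for v in sorted(set(succ) | set(pred)):
        if simple(v):
            continue
        for w in sorted(succ[v]):
            path = v + w[-1]
            while simple(w):
                w = next(iter(succ[w]))
                path += w[-1]
            result.append(path)
    return result

def mismatch_fraction(a, b):
    """Share of differing bases over the overlapping length of a and b."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a, b)) / n if n else 1.0

def thread_read(read, utgs, k):
    """Pick a path of accurate unitigs by following a noisy long read."""
    by_prefix = defaultdict(list)
    for u in utgs:
        by_prefix[u[:k - 1]].append(u)
    path, pos, candidates = [], 0, utgs
    while candidates and pos + k - 1 < len(read):
        # The candidate that best agrees with the read at this position wins.
        best = min(candidates, key=lambda u: mismatch_fraction(u, read[pos:pos + len(u)]))
        path.append(best)
        pos += len(best) - (k - 1)  # consecutive unitigs overlap by k-1 bases
        candidates = by_prefix[best[-(k - 1):]]
    return path

def spell(path, k):
    """Spell the sequence of a unitig path, dropping each k-1 overlap."""
    return path[0] + "".join(u[k - 1:] for u in path[1:])

rng = random.Random(7)

def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))

# Hypothetical toy genome: repeat R occurs three times between unique segments.
A, B, C, D = (random_dna(30) for _ in range(4))
R = random_dna(20)
genome = A + R + B + R + C + R + D

k, read_len, step = 15, 30, 5
starts = list(range(0, len(genome) - read_len + 1, step))
if starts[-1] != len(genome) - read_len:
    starts.append(len(genome) - read_len)
short_reads = [genome[i:i + read_len] for i in starts]  # short and error-free

# One long read covering the genome, with ~5% random substitutions (no indels).
long_read = "".join(
    rng.choice([b for b in "ACGT" if b != c]) if rng.random() < 0.05 else c
    for c in genome
)

succ, pred = build_graph(short_reads, k)
utgs = unitigs(succ, pred)
assembly = spell(thread_read(long_read, utgs, k), k)
print(f"{len(utgs)} unitigs; assembly matches genome: {assembly == genome}")
```

The short reads alone leave the order of B and C around the repeat ambiguous; the noisy long read only decides the route, so its errors never reach the assembled sequence. Real long reads also contain insertions and deletions, which call for error-tolerant alignment rather than the position-by-position comparison used here.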
To validate Verkko’s method, researchers fed it human and non-human genome sequencing data. The software assembled the sequences of whole chromosomes quickly and precisely.6
What Lies Ahead
As a robust and affordable tool, Verkko can be applied to research problems at many levels. It can help generate whole genome sequences of experimental models such as zebrafish, mice, fruit flies, and other insects, making it a valuable asset for biomedical research.6 Verkko can also aid in studying the evolutionary relationships among humans, other animals, plants, and a diverse range of organisms.
Above all, a deeper insight into the truly complete human genome would help propel research in the fields of precision medicine, cancer diagnostics, therapeutics, infectious diseases, genomics, proteomics, pharma, and biotechnology.
References
1. https://www.genome.gov/12513430/2004-release-ihgsc-describes-finished-human-sequence
2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2904936/
3. https://www.nature.com/articles/nature03001
4. https://www.genome.gov/news/news-release/researchers-generate-the-first-complete-gapless-sequence-of-a-human-genome
5. https://www.nih.gov/news-events/news-releases/nih-software-assembles-complete-genome-sequences-demand
6. https://www.nature.com/articles/s41587-023-01662-6
Subscribe to Clinical Diagnostics Insider to view