NextCODE Health (Boston) has launched clinical genomic services leveraging a platform originally developed at deCODE genetics. With the advent of affordable whole-genome sequencing, NextCODE’s end-to-end solution is designed to tackle the interpretation and data management bottleneck to expedite the incorporation of sequence data into clinical care.
NextCODE secured a five-year, exclusive license for the genomics platform—including the information technology (IT) infrastructure and data analysis capabilities—from Amgen, which acquired deCODE genetics in December 2012. NextCODE also secured $15 million in Series A financing in fall 2013. Together, the license and financing are enabling NextCODE to rapidly scale the integration of its genomics services into clinical settings. NextCODE’s services include clinical- and research-grade sequencing, analysis of clients’ legacy data, genome interpretation tools, and big genomic data solutions proven scalable for the efficient management of genomic and medical data on up to hundreds of thousands of patients.
Analysis and interpretation by the Clinical Sequencing Analyzer (CSA) are powered by the proprietary Genomic Ordered Relations (GOR) database. The CSA is also backed by what the company says is the world’s largest clinical genomics reference database, including more than 40 million validated variants paired with clinical data. Together the GOR and CSA enable real-time confirmation of potentially pathogenic mutations through visualization of raw sequence reads, while the Sequence Miner tool enables researchers to perform sophisticated real-time queries and data mining on tens of thousands of samples.
DTET recently spoke with Jeff Gulcher, M.D., Ph.D., co-founder and former chief scientific officer of deCODE and now co-founder and chief scientific officer of NextCODE, to learn more about the company’s platform and genomic services as well as industrywide challenges to the adoption of clinical next-generation sequencing testing.
The scale of data the NextCODE system can handle is remarkable. Can you tell us about the system’s development?
We came from deCODE, where we developed software and systems to deal with both the research and clinical aspects of large amounts of genetic data. Our goal was to enroll most of the Icelandic population, a little more than 300,000 people, and we successfully enrolled well over half of the adult population with blood samples and informed consent to use medical data. Over the years we built up this very large biobank in Iceland and then added another 300,000 samples from outside Iceland for confirmation studies. When we started out, we were using microsatellite markers. So we were measuring 2,000 markers per patient times 150,000 patients, and we had no trouble using a conventional database infrastructure. These are traditional databases designed for bank transactions, and the average person isn’t going to do more than 2,000 to 3,000 bank transactions per year.
But when we got into the DNA chip era, we started measuring 1 million single nucleotide polymorphism markers per patient. And what we found was that when we tried to load in 1 million columns (each marker needs its own column) times 150,000 patients, we had enormous problems. The problem was not in storing the data but in getting the data out. There was a huge input-output problem, and when you tried to hunt for a subset of those million markers in a subset of those 150,000 patients, the system simply gummed up. It couldn’t get the data out quickly enough for the statistical algorithms to handle or for immediate quality control.
That is when we invented a different way of storing this huge amount of data. We call it the GOR—Genomically Ordered Relational—database infrastructure. The principle is very simple: make sure all of the data you have are tied to genomic positions. With DNA data, chip data, RNA data, or annotation data, it is easy to assign each record a specific location in the human genome. As a result, your algorithms end up being orders of magnitude more efficient than if you were to use a traditional relational database. That’s what we initially designed eight or nine years ago, and it continues to work very well for us now in the sequencing era. Today you don’t have 1 million but 3 billion letters you need to keep track of per genome, so we’ve solved a problem that everyone who is going to sequence whole genomes is about to face.
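For readers who want to picture the ordering principle Gulcher describes, the short Python sketch below keeps every record sorted by chromosome and position so that a region query becomes a binary search plus a short scan, rather than a lookup across millions of marker columns. It is purely illustrative; the class, field names, and coordinates are our own invention, not NextCODE’s GOR implementation.

```python
import bisect
from collections import namedtuple

# A record tied to a genomic position, as in the GOR principle described above.
Variant = namedtuple("Variant", ["chrom", "pos", "ref", "alt", "sample"])

class GenomicStore:
    def __init__(self, records):
        # Sort once by (chromosome, position); every later query exploits this order.
        self.records = sorted(records, key=lambda r: (r.chrom, r.pos))
        self.keys = [(r.chrom, r.pos) for r in self.records]

    def query(self, chrom, start, end):
        """Return all records falling within chrom:start-end."""
        lo = bisect.bisect_left(self.keys, (chrom, start))
        hi = bisect.bisect_right(self.keys, (chrom, end))
        return self.records[lo:hi]

# Illustrative records only; coordinates and sample IDs are made up.
store = GenomicStore([
    Variant("chr7", 117559590, "CTT", "-", "patient_002"),
    Variant("chr7", 117559593, "C", "T", "patient_001"),
    Variant("chr17", 43051071, "A", "G", "patient_001"),
])
print(store.query("chr7", 117559000, 117560000))
```

Because the data are kept in genomic order, the same range scan works whether the store holds a thousand records or data on hundreds of thousands of genomes; that scaling behavior is the point of the approach.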
What is the business case for using NextCODE’s services?
With sequencing, a big data problem ends up being an enormous data problem. You see academic groups trying to retrofit Oracle and IBM databases to handle this much data, and it ends up crashing. They just aren’t able to access their own data.
The broad business case for using NextCODE’s platform is that it offers a holistic solution to many of the shortcomings that clinical geneticists currently face using traditional technology. Even in major medical centers with substantial expertise in genetics, clinicians are often analyzing data by running bespoke R and Perl scripts that have to be generated by their bioinformaticians. And the results come back as Excel tables with lists of potential causative variants that they then have to follow up on one-by-one.
By making it possible to correlate all the variants in the genome with all the public and proprietary annotation data available, in real time, our system delivers huge time savings as well as increased power to find causative variants. That translates into better care, and our customers have reported many instances of our tools solving cases in which other approaches had failed. With current approaches, many leading medical institutions report that they can solve about 25 percent of pediatric rare disease cases. The feedback we are getting from our users suggests NextCODE tools can increase that yield substantially. For the medical system, that means huge savings in terms of ending diagnostic odysseys earlier and making it possible to get through more cases in a much shorter time. The business case in that sense is compelling on several fronts.
We talk to lots of academic centers that may be sequencing 100 genomes per month on the clinical side and hundreds on the research side. They would like solutions to aggregate the data and learn something more than just making a diagnosis in an individual, although in most cases they won’t be able to make a diagnosis because most patients don’t have a disease mutation that’s already known. The information can be quite useful to collate, yet in order to do that you must be able to store data efficiently and have algorithms to query it swiftly to discover new things.
That’s the other part of our business: to enable groups to aggregate data rather than just sequence a patient, make a diagnosis, and never use the data again. That is throwing away valuable information, and we can provide a mechanism and a set of tools and infrastructure that enables them to use it. We are also working with big pharma companies to make use of large next-generation sequencing data sets. We see them as natural customers for this system. They need these tools to be successful. And there are other population-scale projects like Genomics England, which is going to sequence 100,000 people in the United Kingdom in the context of clinical care. It is a very large project, and we hope to take part in it and several other large projects around the world that are following in the footsteps of deCODE’s work in Iceland.
Your clients to date are primarily clinicians at large, academic centers. Do you have plans to sell services to smaller laboratories?
Yes. Our customer base includes diagnostic laboratories that do not have extensive informatics expertise. One of the advantages of our system is that it enables smaller labs to begin offering whole-exome and whole-genome testing without having to build up their own IT infrastructure.
One of the notable features of your service is its user-friendly interface. How can this aid clinicians less comfortable with interpreting genomic test results?
When you talk to even some top medical centers, they are taking sequence data and the informatics department is doing by hand what our tools can provide systematically and efficiently. What do they provide to the physician? Not a nice system to ask questions of the data. Instead they give the geneticist an Excel table with hundreds of variants that he or she must now sort through to make a diagnosis. I still can’t believe it when I see this.
We made interfaces that let clinicians analyze their patients’ genomes and all of the annotation data available, using signs and symptoms as simple search terms. By clicking on different buttons they can apply different inheritance models, use commonly used or custom-made panels of genes, and filter out all common variants to quickly narrow their search to rarer or even novel variants. So a clinical geneticist can search the entire genome, home in quickly on the culprit variant, and instantly visualize it in all the raw sequence reads for confirmation. They can trust their own eyes rather than a black-box algorithm. They can drill down to look at the function of the variant—what effect it has on protein structure—to end up with a detailed picture of the root of the patient’s disease. All of this is summarized in concise and comprehensible terms, so that it can inform a diagnosis and treatment options.
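As a rough illustration of the kind of filtering such an interface drives behind the scenes, the sketch below restricts candidate variants to a gene panel, discards common variants, and applies a simple recessive-inheritance check across a parent-child trio. The function, field names, and frequency threshold are hypothetical choices for illustration, not NextCODE’s actual parameters.

```python
def filter_candidates(variants, panel_genes, max_pop_freq=0.01):
    """Keep rare variants in panel genes that fit a simple recessive model."""
    candidates = []
    for v in variants:
        if v["gene"] not in panel_genes:
            continue  # outside the chosen gene panel
        if v["population_freq"] > max_pop_freq:
            continue  # too common to explain a rare disease
        # Genotypes as alternate-allele counts, e.g.
        # {"proband": 2, "mother": 1, "father": 1}
        gt = v["genotypes"]
        homozygous_in_proband = gt.get("proband") == 2
        carrier_parents = gt.get("mother") == 1 and gt.get("father") == 1
        if homozygous_in_proband and carrier_parents:
            candidates.append(v)  # consistent with recessive inheritance
    return candidates
```

In practice each filter button in the interface corresponds to a step like these, applied across the whole genome and all annotation sources at once rather than to a hand-built spreadsheet.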
How can these results be integrated into electronic health records (EHRs)?
In order to bring genomic data into clinical care, the results of genomic analyses have to be simply and seamlessly integrated into patients’ medical records. For this reason we have made all our systems HL7 compatible, and this compatibility is critical in two ways. First, the CSA is able to extract information from medical records and lab results to provide the phenotypic information that needs to be correlated with sequence data to inform the analysis and diagnosis. Second, once the analysis is complete, the CSA stores the detailed data used and distills it into a concise form that can be uploaded to an EHR.
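To give a very rough sense of what handing a finding to an HL7 v2 interface can look like, the snippet below assembles a minimal ORU^R01-style result message by hand. The system names, observation codes, and example variant are illustrative assumptions only; a production interface would rely on an HL7 interface engine and each site’s own mappings rather than hand-built strings.

```python
from datetime import datetime

def genomic_result_to_hl7(patient_id, gene, variant, interpretation):
    ts = datetime.now().strftime("%Y%m%d%H%M%S")
    segments = [
        # Message header: sending/receiving systems and message type (illustrative names).
        f"MSH|^~\\&|CSA|NEXTCODE|EHR|HOSPITAL|{ts}||ORU^R01|{ts}|P|2.5",
        # Patient identification (identifier only; demographics omitted).
        f"PID|1||{patient_id}",
        # Observation request: the ordered genomic analysis.
        "OBR|1|||GENOMIC-ANALYSIS^Whole genome sequencing",
        # Observation results: the reported variant and its clinical interpretation.
        f"OBX|1|ST|VARIANT^Reported variant||{gene} {variant}||||||F",
        f"OBX|2|TX|INTERP^Interpretation||{interpretation}||||||F",
    ]
    return "\r".join(segments)  # HL7 v2 separates segments with carriage returns

print(genomic_result_to_hl7("12345", "CFTR",
                            "c.1521_1523delCTT (p.Phe508del)",
                            "Pathogenic; consistent with cystic fibrosis"))
```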
Side Box:
NextCODE By-the-Numbers
$1 billion invested in development of the platform, with capabilities including:
- Proven scalability to 350,000 whole genomes
- 40 million-plus validated variant frequencies generated from mining 30 times more data than the 1000 Genomes Project
- 10 times decrease in hands-on time for genetic diagnosis
- Analysis and IT behind 350-plus publications