Looking for a cancer drug target database to guide sequencing of patient tumor DNA


I have a question I would like to pose to the community. I have recently received access to a bench-top Ion Torrent DNA sequencer. Our idea is to use this machine to sequence the DNA from patients' tumors in order to guide treatment options. My job is to identify a list of all currently used anti-neoplastic drugs along with their known targets (i.e., specific genes and mutations) and accession numbers. I would like to put these data in a table in which each row corresponds to a different drug.

For example, a row in the table might read (column names are indicated in brackets): [disease] breast cancer, [drug] trastuzumab, [drug target] HER2/neu receptor, [gene] ERBB2, [location] chr17:37844393-37884915, [mutation type] amplification, [accession number] ENSG00000141736. The pathologists would then be able to use this database in order to select appropriate genes for sequencing whenever they receive a tumor specimen. If the patient's tumor had an amplified ERBB2 gene, they could be given trastuzumab.
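As a rough illustration of the table layout described above, here is a minimal sketch in Python (R would work equally well). The class name DrugTargetRow, the field names, and the two helper functions are hypothetical choices made for this example; only the single example row comes from the question itself.

```python
from dataclasses import dataclass, asdict
import csv
import io


@dataclass(frozen=True)
class DrugTargetRow:
    """One row of the proposed drug-target table (field names are illustrative)."""
    disease: str
    drug: str
    drug_target: str
    gene: str
    location: str
    mutation_type: str
    accession: str


def genes_to_sequence(rows, disease):
    """Return the sorted, de-duplicated gene symbols listed for one disease."""
    return sorted({r.gene for r in rows if r.disease == disease})


def to_csv(rows):
    """Serialize rows to CSV text with a header line, one drug per row."""
    fields = list(DrugTargetRow.__dataclass_fields__)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        writer.writerow(asdict(r))
    return buf.getvalue()


# The example row given in the question.
ROWS = [
    DrugTargetRow(
        disease="breast cancer",
        drug="trastuzumab",
        drug_target="HER2/neu receptor",
        gene="ERBB2",
        location="chr17:37844393-37884915",
        mutation_type="amplification",
        accession="ENSG00000141736",
    )
]
```

A pathologist-facing lookup would then be a call such as genes_to_sequence(ROWS, "breast cancer"), and to_csv(ROWS) gives a flat file that can be read back into R.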

Currently our study is in pre-planning stages (i.e., we won't actually be testing this on patients any time soon). I would appreciate it if anyone could give me advice on how to go about creating such a database. I am aware of online databases including COSMIC, Sanger's Cancer Gene Census, and the Potential Drug Target Database (PDTD), but they don't have everything that I'm looking for. I am familiar with R and could use it to combine data from multiple sources if necessary. If anyone else has comments or suggestions for further reading, those would also be appreciated. Thanks!

Note: This question has also been posed on a ResearchGate forum.

I am not sure that exome sequencing is the way to go for this kind of task, especially if you already have an idea of the mutations you might be looking for. Current arrays perform well and are much faster and cheaper.

For the data, you might consider having a look at The Cancer Genome Atlas. Otherwise, Biological Networks might provide an API for you, if you aren't afraid of doing a little bit of Java to interface it with R.

There are no perfect resources for this information in the public domain quite yet. However, there are three that are making really good progress.

The first is mycancergenome at Vanderbilt. They were one of the first resources to put this type of information on the web. They tend to be pretty stringent about the level of evidence and the type of aberrations that go on the Web site. However, I am not sure if you can programmatically access their information.

The second resource is the one developed at MD Anderson. This is a really good resource; however, it covers only a handful of genes, albeit some of the most frequently mutated. I don't see programmatic access to the information.

The third resource, and the most promising for the community, is the CIViC database. This is a crowdsourcing resource that is set up to aggregate associations between cancer mutations and drugs, phenotypes, and outcomes. I highly recommend this site and encourage you not only to consume the data but also to engage in the comments.

As someone who has been doing drug-to-genome matching, I have a few pieces of advice. DNA-based alterations are the best evidence to use for matching. If you use RNA, be a little more cautious about matching based on an 'is a target' mechanism, so rank DNA-based changes above RNA. Additionally, start with drug targets as placeholders, but try to acquire information on each alteration as it relates to drug response. Alterations in a drug target can be (and often are) passengers rather than drivers, in which case the tumor will not respond to the drug. Frequency in other tumors and mutations in important domains are good guides for triaging variants like this. The information on the disease and the mutation is also important: if a drug-variant association is observed in one disease, there may be mechanisms intrinsic to another tumor type that would preclude acting on that association in the different tumor type. Lastly, drug mechanism is something to take into account. Certain drugs hit the same target through different mechanisms, which may preclude them from being given in the context of particular mutations. Good luck in your endeavors.

The emerging clinical relevance of genomics in cancer medicine

The combination of next-generation sequencing and advanced computational data analysis approaches has revolutionized our understanding of the genomic underpinnings of cancer development and progression. The coincident development of targeted small molecule and antibody-based therapies that target a cancer's genomic dependencies has fuelled the transition of genomic assays into clinical use in patients with cancer. Beyond the identification of individual targetable alterations, genomic methods can gauge mutational load, which might predict a therapeutic response to immune-checkpoint inhibitors or identify cancer-specific proteins that inform the design of personalized anticancer vaccines. Emerging clinical applications of cancer genomics include monitoring treatment responses and characterizing mechanisms of resistance. The increasing relevance of genomics to clinical cancer care also highlights several considerable challenges, including the need to promote equal access to genomic testing.


Fig. 1. Clinical utility of genomic assays in cancer care.

Following the introduction of next-generation…

Fig. 2. Clinical trial designs invoking cancer genomics assays.

Two basic clinical trial designs for…

Fig. 3. NGS-based neoantigen discovery.

Neoantigen discovery is pursued using next-generation sequencing (NGS) data from…

20 mer) or short (8–11 mer) peptides, RNA-based or DNA-based vaccines, or dendritic cell vaccines. MHC, major histocompatibility complex; TCR, T cell receptor. Figure is adapted from images courtesy of J. Hundal, Washington University School of Medicine in St Louis, MO, USA, and K. Campbell, Washington University in St Louis, MO, USA.

Fig. 4. Liquid biopsy assays enable the monitoring of genomic alterations present in circulating tumour…

Foundation Medicine: Personalizing Cancer Drugs

Michael Pellini fires up his computer and opens a report on a patient with a tumor of the salivary gland. The patient had surgery, but the cancer recurred. That’s when a biopsy was sent to Foundation Medicine, the company that Pellini runs, for a detailed DNA study. Foundation deciphered some 200 genes with a known link to cancer and found what he calls “actionable” mutations in three of them. That is, each genetic defect is the target of anticancer drugs undergoing testing—though not for salivary tumors. Should the patient take one of them? “Without the DNA, no one would have thought to try these drugs,” says Pellini.

Starting this spring, for about $5,000, any oncologist will be able to ship a sliver of tumor in a bar-coded package to Foundation’s lab. Foundation will extract the DNA, sequence scores of cancer genes, and prepare a report to steer doctors and patients toward drugs, most still in early testing, that are known to target the cellular defects caused by the DNA errors the analysis turns up. Pellini says that about 70 percent of cases studied to date have yielded information that a doctor could act on—whether by prescribing a particular drug, stopping treatment with another, or enrolling the patient in a clinical trial.

The idea of personalized medicine tailored to an individual’s genes isn’t new. In fact, several of the key figures behind Foundation have been pursuing the idea for over a decade, with mixed success. “There is still a lot to prove,” agrees Pellini, who says that Foundation is working with several medical centers to expand the evidence that DNA information can broadly guide cancer treatment.

Foundation’s business model hinges on the convergence of three recent developments: a steep drop in the cost of decoding DNA, much new data about the genetics of cancer, and a growing effort by pharmaceutical companies to develop drugs that combat the specific DNA defects that prompt cells to become cancerous. Last year, two of the 10 cancer drugs approved by the U.S. Food and Drug Administration came with a companion DNA test (previously, only one drug had required such a test). So, for instance, doctors who want to prescribe Zelboraf, Roche’s treatment for advanced skin cancer, first test the patient for the BRAF V600E mutation, which is found in about half of all cases.

About a third of the 900 cancer drugs currently in clinical trials could eventually come to market with a DNA or other molecular test attached, according to drug benefits manager Medco. Foundation thinks it makes sense to look at all relevant genes at once—what it calls a “pan-cancer” test. By accurately decoding cancer genes, Foundation says, it uncovers not only the most commonly seen mutations but also rare ones that might give doctors additional clues. “You can see how it will get very expensive, if not impossible, to test for each individual marker separately,” Foundation Medicine’s COO, Kevin Krenitsky, says. A more complete study “switches on all the lights in the room.”

So far, most of Foundation’s business is coming from five drug companies seeking genetic explanations for why their cancer drugs work spectacularly in some patients but not at all in others. The industry has recognized that drugs targeted to subsets of patients cost less to develop, can get FDA approval faster, and can be sold for higher prices than traditional medications. “Our portfolio is full of targets where we’re developing tests based on the biology of disease,” says Nicholas Dracopoli, vice president for oncology biomarkers at Janssen R&D, which is among the companies that send samples to Foundation. “If a pathway isn’t activated, you get no clinical benefit by inhibiting it. We have to know which pathway is driving the dissemination of the disease.”

Cancer is the most important testing ground for the idea of targeted drugs. Worldwide spending on cancer drugs is expected to reach $80 billion this year—more than is spent on any other type of medicine. But “the average cancer drug only works about 25 percent of the time,” says Randy Scott, executive chairman of the molecular diagnostics company Genomic Health, which sells a test that examines 16 breast-cancer genes. “That means as a society we’re spending $60 billion on drugs that don’t work.”

Analyzing tumor DNA is also important because research over the past decade or so has demonstrated that different types of tumors can have genetic features in common, making them treatable with the same drugs. Consider Herceptin, the first cancer drug approved for use with a DNA test to determine who should receive it (there is also a protein-based test). The FDA cleared it in 1998 to target breast cancers that overexpress the HER2 gene, a change that drives the cancer cells to multiply. The same mutation has been found in gastric, ovarian, and other cancers—and indeed, in 2010 the drug was approved to treat gastric cancer. “We’ve always seen breast cancer as breast cancer. What if a breast cancer is actually like a gastric cancer and they both have the same genetic changes?” asks Jennifer Obel, an oncologist in Chicago who has used the Foundation test.

The science underlying Foundation Medicine had its roots in a 2007 paper published by Levi Garraway and Matthew Meyerson, cancer researchers at the Broad Institute, in Cambridge, Massachusetts. They came up with a speedy way to find 238 DNA mutations then known to make cells cancerous. At the time, DNA sequencing was still too expensive for a consumer test—but, Garraway says, “we realized it would be possible to generate a high-yield set of information for a reasonable cost.” He and Meyerson began talking with Broad director Eric Lander about how to get that information into the hands of oncologists.

In the 1990s, Lander had helped start Millennium Pharmaceuticals, a genomics company that had boldly promised to revolutionize oncology using similar genetic research. Ultimately, Millennium abandoned the idea—but Lander was ready to try again and began contacting former colleagues to “discuss next steps in the genomics revolution,” recalls Mark Levin, who had been Millennium’s CEO.

Levin had since become an investor with Third Rock Ventures. Money was no object for Third Rock, but Levin was cautious—diagnostics businesses are difficult to build and sometimes offer low returns. What followed was nearly two years of strategizing between Broad scientists and a parade of patent lawyers, oncologists, and insurance experts, which Garraway describes as being “like a customized business-school curriculum around how we’re going to do diagnostics in the new era.”

In 2010, Levin’s firm put $18 million into the company; Google Ventures and other investors have since followed suit with $15.5 million more. Though Foundation’s goals echo some of Millennium’s, its investors say the technology has finally caught up. “The vision was right 10 to 15 years ago, but things took time to develop,” says Alexis Borisy, a partner with Third Rock who is chairman of Foundation. “What’s different now is that genomics is leading to personalized actions.”

One reason for the difference is the falling cost of acquiring DNA data. Consider that last year, before his death from pancreatic cancer, Apple founder Steve Jobs paid scientists more than $100,000 to decode all the DNA of both his cancerous and his normal cells. Today, the same feat might cost half as much, and some predict that it will soon cost a few thousand dollars.

So why pay $5,000 to know the status of only about 200 genes? Foundation has several answers. First, each gene is decoded not once but hundreds of times, to yield more accurate results. The company also scours the medical literature to provide doctors with the latest information on how genetic changes influence the efficacy of specific drugs. As Krenitsky puts it, data analysis, not data generation, is now the rate-limiting factor in cancer genomics.

Although most of Foundation’s customers to date are drug companies, Borisy says the company intends to build its business around serving oncologists and patients. In the United States, 1.5 million cancer cases are diagnosed annually. Borisy estimates that Foundation will process 20,000 samples this year. At $5,000 per sample, it’s easy to see how such a business could reward investors. “That’s … a $100-million-a-year business,” says Borisy. “But that volume is still low if this truly fulfills its potential.”

Pellini says Foundation is receiving mentoring from Google in how to achieve its aim of becoming a molecular “information company.” It is developing apps, longitudinal databases, and social-media tools that a patient and a doctor might use, pulling out an iPad together to drill down from the Foundation report to relevant publications and clinical trials. “It will be a new way for the world to look at molecular information in all types of settings,” he says.

Several practical obstacles stand in the way of that vision. One is that some important cancer-related genes have already been patented by other companies—notably BRCA1 and BRCA2, which are owned by Myriad Genetics. These genes help repair damaged DNA, and mutations in them increase the risk of breast or ovarian cancer. Although Myriad’s claim to a monopoly on testing those genes is being contested in the courts and could be overturned, Pellini agrees that patents could pose problems for a pan-cancer test like Foundation’s. That’s one reason Foundation itself has been racing to file patent applications as it starts to make its own discoveries. Pellini says the goal is to build a “defensive” patent position that will give the company “freedom to operate.”

Another obstacle is that the idea of using DNA to guide cancer treatment puts doctors in an unfamiliar position. Physicians, as well as the FDA and insurance companies, still classify tumors and drug treatments anatomically. “We’re used to calling cancers breast, colon, salivary,” says oncologist Thomas Davis, of the Dartmouth-Hitchcock Medical Center, in Lebanon, New Hampshire. “That was our shorthand for what to do, based on empirical experience: ‘We tried this drug in salivary [gland] cancer and it didn’t work.’ ‘We tried this one and 20 percent of the patients responded.’”

Now the familiar taxonomy is being replaced by a molecular one. It was Davis who ordered DNA tests from several companies for the patient with the salivary-gland tumor. “I got bowled over by the amount of very precise, specific molecular information,” he says. “It’s wonderful, but it’s a little overwhelming.” The most promising lead that came out of the testing, he thinks, was evidence of overactivity by the HER2 gene—a result he says was not picked up by Foundation but was found by a different test. That DNA clue suggests to him that he could try prescribing Herceptin, the breast-cancer drug, even though evidence is limited that it works in salivary-gland cancer. “My next challenge is to get the insurance to agree to pay for these expensive therapies based on rather speculative data,” he says.

Insurance companies may also be unwilling to pay $5,000 for the pan-cancer test itself, at least initially. Some already balk at paying for well-established tests, says Christopher-Paul Milne, associate director of the Tufts Center for the Study of Drug Development, who calls reimbursement “one of the biggest impediments to personalized medicine.” But Milne predicts that it’s just a matter of time before payers come around as the number of medications targeted to people’s DNA grows. “Once you get 10 drugs that require screening, or to where practitioners wouldn’t think about using a drug without screening first, the floodgates will open,” he says. “Soon, in cancer, this is the way you will do medicine.”


In this article we present a framework to match tumor genomes to targeted therapies. For that, we use public databases to filter potentially actionable variants from a tumor sample and then classify the results using an evidence-based system. In this section we first detail the databases used. Then, we present the patients’ datasets used for evaluating the feasibility and provide a proof-of-concept of our approach. Finally, we describe all statistical analyses.

Databases of actionable variants

In selecting databases, we focused on those that compile information not only on the drug and actionable gene, but also on the actionable variant (SNV, CNV, rearrangement), the type of association (response, resistance), the strength of the evidence (approved, clinical trials, preclinical) and the cancer type. As a result, we selected the following databases: (1) Gene Drug Knowledge database (GDKD) [20] (version 19, downloaded from Synapse syn2370773); (2) Clinical Interpretation of Variants in Cancer database (CIViC) [21] (version 01_June_2017); and (3) Tumor Alterations Relevant for Genomics-driven Therapy database (TARGET) [12] (version 3). GDKD is a manually curated database of predictive biomarkers updated monthly. It integrates several layers of annotations, comprising cancer type, gene, variant, response/resistance, consensus/emerging, and the corresponding references. CIViC uses very similar layers of annotation, but is a community-driven web resource. For this work, both GDKD and CIViC were modified in such a way that variants in the same gene sharing annotations of disease, drug, evidence, and association levels were aggregated into one single entry. Finally, TARGET was published in 2014 and has not been updated since. Taken together, the three databases compile a comprehensive list of variants comprising a total of 289 actionable genes (conferring either resistance or response to anticancer drugs). The main characteristics of the three databases can be found in Table 1. We also consulted other sources, such as NCCN guidelines and Meric-Bernstam et al. [22], and manually added some expert rules. The complete list of 312 actionable genes can be found in Additional file 1.
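The aggregation step described above (collapsing variants of the same gene that share disease, drug, evidence, and association annotations into a single entry) can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary keys, the function name aggregate_entries, and the placeholder records are assumptions, and the real GDKD and CIViC files use their own column names.

```python
from collections import defaultdict


def aggregate_entries(entries):
    """Merge entries that share gene, disease, drug, evidence and association.

    Each entry is a dict with the (hypothetical) keys 'gene', 'variant',
    'disease', 'drug', 'evidence' and 'association'. The result has one
    dict per distinct annotation combination, with its variants collected
    in first-seen order under the key 'variants'.
    """
    grouped = defaultdict(list)
    for e in entries:
        key = (e["gene"], e["disease"], e["drug"], e["evidence"], e["association"])
        if e["variant"] not in grouped[key]:
            grouped[key].append(e["variant"])

    merged = []
    for (gene, disease, drug, evidence, association), variants in grouped.items():
        merged.append({
            "gene": gene,
            "disease": disease,
            "drug": drug,
            "evidence": evidence,
            "association": association,
            "variants": variants,
        })
    return merged


# Placeholder records for illustration only; they are not taken from any database.
TOY_ENTRIES = [
    {"gene": "GENE1", "variant": "V1", "disease": "D1", "drug": "DRUG_A",
     "evidence": "approved", "association": "response"},
    {"gene": "GENE1", "variant": "V2", "disease": "D1", "drug": "DRUG_A",
     "evidence": "approved", "association": "response"},
    {"gene": "GENE1", "variant": "V3", "disease": "D1", "drug": "DRUG_A",
     "evidence": "preclinical", "association": "resistance"},
]
```

With the placeholder input, the first two records collapse into one entry listing V1 and V2, while the third stays separate because its evidence and association levels differ.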


TCGA and GENIE datasets

The Pan Cancer 12 data freeze of TCGA was used as a patient cohort for testing our reporting method. Three data types were downloaded from the Synapse repository: clinical data (syn2325436), somatic SNVs (syn1729383), and somatic CNVs (syn1711454). Data on a total of 5277 samples from 12 different cancer types were collected. Only samples with both mutation and copy number data (3184 samples) were considered for the analyses in this paper. Regarding mutation data, “silent” variants were not studied. For CNVs, the output from GISTIC 2.0 all_thresholded.by_genes was used. This file contains the copy number data of genes discretized into values of the set {−2, −1, 0, 1, 2}, where 0 means no deletion or amplification, ±1 means amplification or deletion above the low noise threshold, and ±2 means amplification or deletion above the high level threshold. High level thresholds are calculated on a sample basis and are an approximation to homozygous events. Only high level amplifications and deep losses (±2) were considered for the analysis, as previously done [23]. The cancer type abbreviations used throughout the article and number of samples are: BRCA (breast cancer, 756), BLCA (bladder cancer, 97), UCEC (uterine cancer, 244), READ (rectal cancer, 69), COAD (colon cancer, 155), OV (ovarian cancer, 313), LUSC (lung squamous carcinoma, 178), LUAD (lung adenocarcinoma, 172), LAML (acute myeloid leukaemia, 190), KIRC (kidney cancer, 417), HNSC (head and neck cancer, 306), and GBM (glioblastoma multiforme, 287).
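The copy-number filter described above keeps only the +2 and −2 calls from the thresholded GISTIC values. A minimal sketch of that selection is shown below, assuming the gene-by-sample values have already been parsed into a nested dictionary; the function name, the input layout, and the placeholder gene and sample names are hypothetical and do not reflect the actual file format.

```python
def high_level_cnvs(thresholded):
    """Keep only high-level amplifications (+2) and deep losses (-2).

    `thresholded` maps gene -> {sample -> value}, with each value in
    {-2, -1, 0, 1, 2}. Returns a dict mapping (sample, gene) to a label;
    low-level events (+1, -1) and neutral calls (0) are dropped.
    """
    calls = {}
    for gene, by_sample in thresholded.items():
        for sample, value in by_sample.items():
            if value not in (-2, -1, 0, 1, 2):
                raise ValueError(f"unexpected thresholded value {value!r}")
            if value == 2:
                calls[(sample, gene)] = "high-level amplification"
            elif value == -2:
                calls[(sample, gene)] = "deep loss"
    return calls


# Placeholder values for illustration only.
TOY_THRESHOLDED = {
    "GENE_A": {"S1": 2, "S2": 1, "S3": 0},
    "GENE_B": {"S1": -1, "S2": -2, "S3": 0},
}
```

On the placeholder input, only GENE_A in S1 (amplification) and GENE_B in S2 (deep loss) survive the filter.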

The GENIE dataset was downloaded from the Synapse repository (SNVs, syn7851250; CNVs, syn7851245; fusions, syn7851249; clinical data, syn7851246). This dataset comprises data of 18,804 advanced cancer patients with more than 50 different cancer entities [24]. Mutation and copy number data were analyzed in the same way as described for the TCGA dataset.

NCT MASTER dataset


A proof-of-concept of the method in a clinical context was investigated within a retrospective study. We used the data of 11 patients with advanced tumor diseases who had undergone whole exome and transcriptome sequencing within the so-called NCT MASTER trial, an institutional review board-approved clinical sequencing program for young adults with advanced-stage hematological and oncological diseases across all malignancies. A tumor tissue and a matched normal blood sample for whole-exome sequencing were obtained following written informed consent under an institutional review board-approved protocol.

Whole-exome sequencing data

Tissue samples were provided by the NCT Heidelberg Tissue Bank. Whole-exome sequencing of normal and tumor tissue samples was followed by a bioinformatic analysis for detecting SNVs, small insertions and deletions (indels), CNVs, and structural variations that might lead to gene fusions. On average, coverage was 133× and 126× for tumor and normal samples, respectively (sample-wise data can be found in Additional file 2). Reads were mapped to the 1000 Genomes phase 2 assembly of the human reference genome (NCBI build 37.1) using BWA (version 0.6.2) with default parameters and maximum insert size set to 1000 bp [25]. BAM files were sorted with SAMtools (version 0.1.19) [26] and duplicates were marked with Picard tools (version 1.90). For the detection of SNVs, we applied our in-house analysis pipeline based on SAMtools mpileup and bcftools with parameter adjustments to allow the calling of somatic variants with heuristic filtering as previously described [27,28,29]. We used Platypus [30] version 0.5.2 to identify indels with a similar reliability scoring as for SNVs. All mutations were annotated with ANNOVAR [31] version September 2013 using the RefSeq gene model. From the set of somatic high confidence mutations, we extracted nonsynonymous, stopgain, and stoploss SNVs as well as SNVs at splice sites, and indels that are located in a coding sequence or splice site. CNVs were analyzed by read depth plots and an in-house pipeline using the VarScan2 copynumber and copyCaller modules [32]. Regions were filtered for unmappable genomic stretches and merged by requiring at least 70 markers per called copy number event. We selected regions with a log ratio of tumor coverage over control coverage higher than 0.55 or lower than − 0.55 as copy number gains and losses, respectively, and annotated them with RefSeq genes using BEDTools [33]. We searched for structural variants such as translocations that might lead to gene fusions with CREST [34] on the DNA level.
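The gain/loss rule stated above (log ratio of tumor over control coverage above 0.55 or below −0.55) can be written as a small classifier. This is only a sketch of the thresholding step, not the in-house pipeline: the function name is hypothetical, the log ratio is taken as an already computed input (the text does not state its base), and regions within the thresholds are labeled "neutral" as an assumption.

```python
def classify_log_ratio(log_ratio, threshold=0.55):
    """Label a region from its tumor/control coverage log ratio.

    Values strictly above +threshold are gains, values strictly below
    -threshold are losses, and everything in between is treated as neutral.
    """
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "neutral"


def call_regions(regions, threshold=0.55):
    """Return (chrom, start, end, label) for non-neutral regions only.

    `regions` is an iterable of (chrom, start, end, log_ratio) tuples.
    """
    calls = []
    for chrom, start, end, log_ratio in regions:
        label = classify_log_ratio(log_ratio, threshold)
        if label != "neutral":
            calls.append((chrom, start, end, label))
    return calls
```

The comparison is strict, matching the wording "higher than 0.55 or lower than −0.55", so a region at exactly 0.55 is not called.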

The variant calls were delivered in Excel files, and for each patient, a group of expert bioinformaticians and oncologists manually reviewed the list of somatic alterations, looking for actionable alterations that could guide the treatment decision. The genomic somatic calls of the patients, together with the experts’ interpretations, are summarized in Additional file 2.

Statistical analyses

We performed an unsupervised clustering using the molecular status of the 312 actionable genes in the 3184 TCGA tumor samples. Four molecular categories were used: wild type, mutated, high-level amplification, and deep loss. Genes with no mutations or with “silent” variants were considered as wild type. Genes with any other type of mutation were considered as mutated. With respect to CNVs, the GISTIC output all_thresholded.by_genes was used; genes with a value of −2 were classified as deep losses and those with a value of +2 as high-level amplifications. Next, our algorithm to filter variants was applied, and a molecular status matrix (sample × gene) was constructed. Genes without any alteration in any sample were removed. Complete-linkage hierarchical clustering was performed on rows and columns based on the Gower distance metric for nominal data (daisy function from R package cluster). The respective heatmap of all samples and the top 50 genes was plotted with a dendrogram on the columns (heatmap.2 from the gplots R package).
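To make the clustering step concrete, the sketch below re-creates its two ingredients in plain Python rather than R. It relies on the fact that, for purely nominal variables with equal weights and no missing values, the Gower distance reduces to the proportion of variables on which two samples differ. The naive complete-linkage routine, the function names, and the toy status matrix are illustrative assumptions; the analysis itself used daisy and heatmap.2, and cutting the tree at a fixed number of clusters is a simplification of the dendrogram shown in the heatmap.

```python
from itertools import combinations

STATUSES = ("wild type", "mutated", "high-level amplification", "deep loss")


def gower_nominal(a, b):
    """Gower distance for two equal-length vectors of nominal values.

    With nominal variables only, this is the fraction of positions that differ.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage(vectors, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest members are closest,
    until `n_clusters` remain. Returns a list of sorted index lists.
    """
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(gower_nominal(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Toy molecular status matrix (4 samples x 3 genes), for illustration only.
TOY_MATRIX = [
    ("mutated", "wild type", "wild type"),
    ("mutated", "wild type", "deep loss"),
    ("wild type", "high-level amplification", "wild type"),
    ("wild type", "high-level amplification", "deep loss"),
]
```

Calling complete_linkage(TOY_MATRIX, 2) groups the two samples that share the mutated first gene and, separately, the two that share the amplified second gene. Clustering the genes instead of the samples only requires transposing the matrix first.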

Choice of clinical sample

Most specimens that are examined by anatomical pathologists are fixed in formalin (4 % formaldehyde) and embedded in paraffin (FFPE). The formalin introduces crosslinks that can both fragment DNA and cause chemical alterations that may alter sequencing results [19]. Early studies demonstrated that using FFPE specimens in PCR-based sequencing led to more errors than using frozen specimens [20]. Some projects, including The Cancer Genome Atlas (TCGA), required the use of fresh frozen tissue [21]. There has been great progress in altering DNA extraction methods such that FFPE specimens are just as useful for NGS as fresh frozen samples [22]. While there have been some early attempts at using FFPE specimens for other modalities besides DNA sequencing [23, 24], these tests are not yet widely used clinically, and the reliability of FFPE versus frozen samples is less well established. Clinicians should feel comfortable requesting NGS on FFPE samples, and do not necessarily have to handle the specimens differently from other diagnostic samples.

For most cancers, the standard pathological diagnosis will require a direct sample of tissue for biopsy. However, many research groups are exploring the diagnostic and therapeutic utility of “liquid biopsies”. One such source of genetic material for disease monitoring is circulating tumor cells (CTCs). These suffer from a low frequency (approximately 1 cell in 10^6–10^8 total circulating cells) and must, therefore, go through an enrichment step. A large number of CTC collection and sequencing protocols have been reported and are being evaluated prospectively [25, 26]. Alternatively, DNA released from apoptotic cells in the tumor can be assayed from the peripheral blood, and is usually referred to as circulating tumor DNA (ctDNA). Progress in utilizing ctDNA was recently reviewed [27], with the authors concluding that this approach shows great promise for the purpose of detecting minimal residual disease [28], or helping to improve diagnosis by looking for mutations specifically associated with a particular disease type [29]. RNA is much less stable than DNA in circulating blood, but RNA species can be preserved in extracellular vesicles and information about tumor recurrence can be gleaned from them as well [30]. However, reproducibility has plagued RNA-based studies, and RNA assays are not yet ready for clinical use [31].

Tumor heterogeneity is both a challenge for liquid biopsies and the reason they can be more useful than tissue biopsies [32]. Initially, mutations with a low allele fraction owing to only being present in a subset of tumor cells may be missed by liquid biopsies, as the low amount of DNA input to the assay is compounded by the low incidence of the mutation. This makes distinguishing low allele fraction mutants from errors that are inherent to high-throughput sequencing very difficult (see below). However, the ability for minimally invasive samples to be sequenced repeatedly over time will allow for faster recognition of known resistance mutations. Sequencing artifacts should be random, but sequences that appear serially can be weighted and followed more closely. It should also be noted that errors in aligning reads to the correct locus will give what appear to be recurrent mutations, so all mutations that are used for serial tracking of tumor burden should be manually reviewed. Overall, there is much promise in sequencing tumor DNA from peripheral blood, but its use is still under investigation and clinicians should rely on other methods for tracking disease progression.


Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands

Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD, USA

Ludwig Institute for Cancer Research and University of Lausanne, Lausanne, Switzerland

Howard Hughes Medical Institute and Cancer Biology and Genetics Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA

Institute for Genomic Medicine at Nationwide Children’s Hospital, Ohio State University College of Medicine, Columbus, OH, USA

Howard Hughes Medical Institute and University of Texas Southwestern Medical Center, Dallas, TX, USA

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA

Bloomberg-Kimmel Institute for Cancer Immunotherapy and Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA

Francis Crick Institute, London, UK

BIOPIC, Beijing Advanced Innovation Center for Genomics, School of Life Sciences, Peking University, Beijing, China

Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China

This work was supported by the Consejo Nacional de Ciencia y Tecnología (SEP-CONACYT-2016-285544 and FRONTERAS-2017-2115), and the National Institute of Genomic Medicine, México. Additional support has been granted by the Laboratorio Nacional de Ciencias de la Complejidad, from the Universidad Nacional Autónoma de México. EH-L is recipient of the 2016 Marcos Moshinsky Fellowship in the Physical Sciences.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


The implementation of precision medicine through molecular profiling technologies has increasingly been integrated with standard clinicopathological evaluations to enhance diagnosis, prognostication, and prediction of clinical outcomes. Although there have been clear successes in the era of molecular characterization, the utility of NGS and other omics-based tests remains unproven on many fronts. A vision for the future of precision medicine will integrate comprehensive multi-omic tumor characterization, dynamic monitoring of liquid biopsy samples, annotation that is automated through advancements in artificial intelligence but guided by experts’ clinical input, the enrollment of patients into innovative clinical trials that not only test molecular profile–drug matching but also investigate the utility of different drug-assignment algorithms [179], and the real-time addition of information from each case to global knowledgebases to enhance precision cancer medicine learning. The path forward in precision medicine will require not only extension beyond genomics from a technical viewpoint, but also the education and engagement of end-users such as clinicians and patients, the increase of access to genotype–drug matching through adaptive and other innovative clinical trial designs, and the promotion of data sharing to maximize knowledge gain.

Pros and Cons of Genomic-Based Clinical Trials

Compared with an “all comers” trial, genomic-based clinical trials offer some definite advantages. Chief among these is the opportunity to develop clinical evidence that targeting several particular molecular abnormalities in the same trial will reveal one or several potentially clinically beneficial treatments. An umbrella trial allows evaluation of treatments for several molecular subclasses of patients within the same histologic tumor type. If such a trial is designed to proceed to a randomized phase III trial, as in LUNG-MAP (28, 29), it may significantly shorten time to regulatory approval in a particular disease, while effectively “weeding out” targeted agents without significant benefit and allowing inclusion of additional “arms” as promising data emerge. In LUNG-MAP, for example, the phase II arm sets an HR of 0.4 to 0.5 for progression-free survival. If this endpoint is met, the phase II trial proceeds to phase III within LUNG-MAP, with the endpoint of overall survival. In addition, because not all molecularly characterized patients with the same histology can be presumed to respond uniformly, such trials allow research into other genomic or clinical features that may modify or prevent response, leading to new insights about combinations with drugs that address resistance pathways. Similar advantages apply to a basket trial, in which a broad screening effort can identify patients with various tumors that share the same molecular abnormality. Cancer histologies with a relatively high prevalence of the molecular target create subgroups that may develop solid signals of efficacy within a particular histologic tumor type. Here, in addition to discerning clinical and molecular features that modify response, one can also examine such features across histologic tumor types. Such a trial could speed registration of a given treatment for many histologic tumor types rather than just one.

However, there are also certain challenges to genomic-based clinical trials. International collaboration may be required for all but the most common histologic tumor types and genomic mutations. Time and effort are required to set up proper infrastructure and account for different regulations across countries as well as to negotiate agreements with several pharmaceutical companies. For trials to be most efficient, high levels of evidence should exist that the chosen drug will result in benefit for tumors with a particular molecular characteristic. It is not often possible to discern “best in class” drugs, or promising drugs may still be too early in development to be included. Many molecular abnormalities have low prevalence in any given tumor type (<10%), requiring a broad screening effort, and molecular abnormalities with high prevalence (e.g., TP53 or RAS mutation) may not have effective targeted regimens. Time delays may increase the risk that standard of care treatment may change during the time required to activate and/or complete the trial, or new drugs may be approved that affect the design of, or accrual to, the trial. For basket trials, the regulatory pathway may not be clear, unless the signal of activity is exceptionally strong. In rare tumors, without existing standard treatment, the regulatory path may be easier than in tumors with standard treatment. In tumors where standard treatment is of high efficacy, there is significant challenge to show improvement by a novel treatment. Finally, there is also the possibility that NGS may identify previously undiagnosed germline mutations (which may be present in tumor tissue as well), even in patients without a family history (30, 31). Best practices to address these issues have yet to be worked out.
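The screening burden implied by low prevalence can be made concrete with simple arithmetic. The sketch below assumes that every patient who carries the alteration is eligible and enrolls, which overstates real-world accrual; the function name and the example numbers are illustrative only and are not taken from any trial.

```python
import math

def patients_to_screen(target_enrolled, prevalence_percent):
    """Expected number of patients to screen to find target_enrolled carriers.

    prevalence_percent: share of tumors carrying the alteration, in percent.
    """
    if not 0 < prevalence_percent <= 100:
        raise ValueError("prevalence_percent must be in (0, 100]")
    return math.ceil(target_enrolled * 100 / prevalence_percent)

# An arm needing 30 patients with an alteration present in 5% of tumors
print(patients_to_screen(30, 5))  # 600
```

At a prevalence of 5%, filling a 30-patient arm means screening on the order of 600 patients, which is why multi-arm designs that share one screening effort are attractive.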

Acute Myeloid Leukemia

The TARGET Acute Myeloid Leukemia projects use comprehensive molecular characterization to determine the genetic changes that drive the initiation and progression of high-risk or hard-to-treat childhood cancers. Acute myeloid leukemia (AML) is a cancer that originates in the bone marrow from immature white blood cells known as myeloblasts. About 25% of all children with leukemia have AML. Although survival rates have increased since the 1970s, approximately half of all childhood AML cases relapse despite intensive treatment. Additional therapies following relapse are often unsuccessful and can be especially difficult and damaging for children. These patients would clearly benefit from targeted therapeutic approaches.

Through comprehensive genome-wide characterization, TARGET researchers are identifying the genetic and epigenetic alterations of relapsed disease. The ultimate goal is to translate their discoveries into novel treatments that will improve outcomes for children with AML. To learn more about pediatric AML and current treatment strategies, visit the NCI pediatric AML website.

TARGET investigators are analyzing tumors from pediatric patients, many of whom have relapsed, to identify biomarkers that correlate with poor clinical outcome and/or new therapeutic approaches to treat childhood AML. The tissues used in this study were collected from patients enrolled in Children's Oncology Group (COG) biology studies and clinical trials.

The AML project team members (like other TARGET researchers) are generating data in two phases: Discovery and Validation. Visit the TARGET Research page to learn more.

Discovery Dataset

The TARGET AML project has produced comprehensive genomic profiles of nearly 200 relapse-enriched, clinically annotated patient cases in the discovery dataset. This cohort includes 100 patients with adequate relapse specimens to study as trios (see three sample types below). Each fully-characterized TARGET AML case includes data from nucleic acid samples extracted from peripheral blood or bone marrow tissues as follows:

  • Primary tumor sample collected at diagnosis
  • Case-matched tissue sample collected at remission (<5% blasts detected following standard induction therapy)
  • Relapsed tumor sample (case-matched) when available

Additional cases with partial molecular characterization and/or sequencing data are available to the research community.

Case Selection Criteria

Tissues and clinical data used for the TARGET AML project were obtained from patients enrolled on biology studies and clinical trials managed through the Children’s Oncology Group (COG). Patient samples with full characterization were chosen based on the following criteria:

  • Patients achieved a remission following a standard two rounds of induction therapy (fewer than 5% blasts)
  • Bone marrow and peripheral blood blast counts of >50% in tumor specimens
  • Adequate amount of high-quality nucleic acids for comprehensive genomic profiling
  • 3 or fewer clinically-relevant cytogenetic findings (majority of cases)
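The four criteria above can be expressed as a simple filter over case records. This is only a sketch under stated assumptions: the dictionary keys are hypothetical and do not correspond to actual COG or TARGET field names, and the cytogenetic criterion is applied strictly here even though the source says it held only for the majority of cases.

```python
def meets_full_characterization_criteria(case):
    """Apply the four listed selection criteria to one case record.

    `case` is a dict with hypothetical keys; real COG/TARGET data
    dictionaries may name and encode these fields differently.
    """
    return bool(
        case["remission_blast_pct"] < 5          # remission after two induction rounds
        and case["tumor_blast_pct"] > 50         # blast count in tumor specimens
        and case["nucleic_acid_adequate"]        # enough high-quality nucleic acid
        and case["n_cytogenetic_findings"] <= 3  # strict here; "majority of cases" in the source
    )

# Two made-up cases for illustration
cases = [
    {"id": "case-A", "remission_blast_pct": 2, "tumor_blast_pct": 80,
     "nucleic_acid_adequate": True, "n_cytogenetic_findings": 1},
    {"id": "case-B", "remission_blast_pct": 12, "tumor_blast_pct": 70,
     "nucleic_acid_adequate": True, "n_cytogenetic_findings": 2},
]
selected = [c["id"] for c in cases if meets_full_characterization_criteria(c)]
print(selected)  # ['case-A']
```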

Molecular Characterization

The TARGET AML project team relied on a variety of platforms to obtain a fully characterized dataset of 200 relapse-enriched cases. The COG AML Statistics and Data Center provided clinical annotations and outcome data for all cases. Visit the TARGET Project Experimental Methods page for detailed information and protocols.

General Methodology | Platform
Clinical Annotation | COG Protocols (CCG-2961, AAML03P1, AAML0531)
Gene Expression | Affymetrix Gene ST Array
Chromosome Copy Number Analyses & Loss of Heterozygosity | Affymetrix SNP 6.0 Array
Epigenetics (DNA Methylation) | Illumina Infinium 27K or 450K
Whole Genome Sequencing | Complete Genomics Incorporated
Whole Exome Sequencing | Illumina Hi-Seq 2000
mRNA-seq | Illumina Hi-Seq 2000
miRNA-seq | Illumina Hi-Seq 2000

Verification of Discovery Variants

The TARGET AML project team used a variety of sequencing approaches to confirm that candidate variants identified in the discovery sample cohort are somatic. For example, mRNA-seq results are being used to determine which of the variants originally identified through whole genome or exome sequencing are expressed. These verified variants will be made available as open-access data. The TARGET AML project team has additionally employed the same targeted capture sequencing approach described in the validation strategy below to verify variants seen in a variety of tumor, remission and/or relapse samples from a majority of fully-characterized patient cases in the TARGET AML discovery cohort.

Validation Strategy

Some sequence mutations identified in the relapse-enriched discovery cohort, along with some previously published variants in adult AML, were further analyzed in an additional 600-plus cases. The TARGET AML project team employed targeted capture sequencing to look at the presence and frequency of alterations in 400 gene variants. This validation effort was performed in an unbiased cohort that was randomly selected from patients enrolled on a single COG protocol, which allowed for determination of the frequency of these changes across a broader spectrum of AML subtypes.


Children with AML whose disease is refractory to standard induction chemotherapy or who experience relapse after initial response have dismal outcomes. Pediatric AML miRNA samples were sequenced to identify dysregulated genes and assess the utility of miRNA signatures for improved outcome prediction. RNA-seq and other high-throughput next-generation sequencing platforms have emerged as powerful approaches for discovering pathogenic pathways and potential targets for clinical intervention in patients with AML. Within the Fred Hutchinson Cancer Research Center (FHCRC) AML dataset, 18 cases with RNA-seq data overlap with the AML Discovery cohort and 26 cases overlap with the AML Validation cohort. These data from the Children’s Oncology Group are stored together at the Data Coordinating Center and Genomic Data Commons for ease of use.

All data from the discovery and validation efforts are made available as specified in the Using TARGET Data and TARGET Publication Guidelines pages. The TARGET Data Matrix provides an overview of the data generated and described above.

TARGET includes comprehensive genomic characterization of some highly aggressive subtypes of pediatric cancers, including acute myeloid leukemia “induction failures” (AML-IF). Upon diagnosis, AML patients without high-risk genetic markers undergo a standard chemotherapy regimen, called the primary induction phase, to eliminate most cancer cells and induce a remission state. Successful treatment reduces the percentage of myeloblasts (immature white blood cells) detected in the patient following primary induction chemotherapy to less than 15%. Patients with greater than 15% myeloblasts have not responded sufficiently to standard therapy and are classified as “induction failures.” Currently, these patients have few clinical options for further treatment.

Discovery Dataset

The TARGET AML-IF subproject has produced comprehensive genomic profiles of 30 clinically annotated patient cases. Each fully-characterized TARGET AML-IF case consists of data generated from nucleic acid samples extracted from case-matched tumor and normal tissues as follows:

  • Primary tumor sample collected at diagnosis
  • Control fibroblast sample grown from the patient’s bone marrow (case-matched)
  • Tumor obtained at the end of induction phase of treatment (2 rounds)

Case Selection Criteria

Tissues and clinical data used for the TARGET AML-IF project were obtained from patients enrolled on biology studies and clinical trials managed through the Children’s Oncology Group (COG). Patient samples with full characterization were chosen based on the following criteria:

  • Patients failed to achieve remission following a standard two rounds of induction therapy (>15% blasts)
  • Adequate amount of high-quality nucleic acids for comprehensive genomic profiling

Molecular Characterization

The TARGET AML project team relied on a variety of platforms to obtain a fully characterized dataset of 30 AML-IF cases. The COG AML Statistics and Data Center provided clinical annotations and outcome data for all cases. Visit the TARGET Project Experimental Methods page for detailed information and protocols.