Is nucleosomal DNA ever degraded?

A hallmark of apoptotic DNA fragmentation is the digestion of DNA into individual nucleosomes. In addition, in the following experiment that tried to isolate the components of the nucleosome, the micrococcal nuclease only cleaved the linker DNA and not the nucleosomal DNA.

I've tried searching the internet but I couldn't find a clear answer to my question. Is nucleosomal DNA ever degraded? Are there nucleases that degrade it, and if there are, are they common? If nucleosomal DNA is not degraded, what protects it? The histone proteins?

Nice question! The nucleosome is quite a stable structure because of electrostatic attraction between the phosphate groups of DNA and the lysine and arginine residues of the histone octamer [1]. In fact, histones and DNA are known to protect each other from degradation: histones protect DNA from some kinds of damage [2], and DNA protects histones from proteolysis [3]. Histones are also modified when nucleosomal DNA is damaged, so that the DNA repair machinery can be activated [4]. Thus, the nucleosome is not only structurally stable; its constituents also help maintain that stability by protecting each other.


This stability has also been taken to a whole new level. In one paper, the authors proposed that the stability of the nucleosome is the reason DNA thousands of years old has been recovered intact [5]. They even predicted that nucleosomal DNA could remain readable for millions of years, because by the time histones begin to degrade, proteases and nucleases have degraded as well, leaving little opportunity for the DNA to be degraded.

But nucleosomal degradation is not impossible. One study concluded that the enzyme known as caspase-activated DNase (CAD) or DNA fragmentation factor 40 (DFF40), written together as CAD/DFF40, is indirectly required for destruction of the nucleosome [6]. See this excerpt (I pasted only the main points):

We report here that in response to apoptotic signals from a death receptor (CD95 and tumor necrosis factor-α) or mitochondrial (staurosporine) apoptotic stimulus, the core nucleosomal histones H2A, H2B, H3, and H4 become separated from DNA during apoptosis in Jurkat and HeLa cells and are consequently detectable in the cell lysate prepared using a non-ionic detergent. The timing of this histone release from DNA correlates well with the progression of apoptosis… Taken together, these data demonstrate that CAD/DFF40 functions indirectly in mediating nucleosomal destruction during apoptosis… In this study, we have found that core nucleosomal histones separate from chromatin in apoptotic cells, but that this is not simply a by-product of DNA fragmentation. This event is indirectly related to CAD/DFF40 in the sense that CAD/DFF40 is required but is insufficient for the apoptotic histone release.

Thus, although it was the only paper I could find, it suggests that nucleosomal DNA can also be degraded, much like linker DNA, when enough CAD/DFF40 is present.


  1. Histone structure and nucleosome stability; Leonardo Mariño-Ramírez, Maricel G Kann, Benjamin A Shoemaker, and David Landsman
  2. Nucleosomal histone protein protects DNA from iron mediated damage; Helen U. Enright, Wesley J. Miller and Robert P. Hebbel
  3. Actin and DNA Protect Histones from Degradation by Bacterial Proteases but Inhibit Their Antimicrobial Activity; Asaf Sol, Yaniv Skvirsky, Edna Blotnick, Gilad Bachrach, and Andras Muhlrad
  4. Histone - Wikipedia
  5. Degradation of ancient DNA; Zvi Kelman, Lori Moran
  6. Apoptotic Release of Histones from Nucleosomes; Dongcheng Wu, Alistair Ingram, Jill H. Lahti, Brie Mazza, Jose Grenet, Anil Kapoor, Lieqi Liu, Vincent J. Kidd and Damu Tang

Centromeres are maintained by fastening CENP-A to DNA and directing an arginine anchor-dependent nucleosome transition

Maintaining centromere identity relies upon the persistence of the epigenetic mark provided by the histone H3 variant, centromere protein A (CENP-A), but the molecular mechanisms that underlie its remarkable stability remain unclear. Here, we define the contributions of each of the three candidate CENP-A nucleosome-binding domains (two on CENP-C and one on CENP-N) to CENP-A stability using gene replacement and rapid protein degradation. Surprisingly, the most conserved domain, the CENP-C motif, is dispensable. Instead, the stability is conferred by the unfolded central domain of CENP-C and the folded N-terminal domain of CENP-N that becomes rigidified 1,000-fold upon crossbridging CENP-A and its adjacent nucleosomal DNA. Disrupting the 'arginine anchor' on CENP-C for the nucleosomal acidic patch disrupts the CENP-A nucleosome structural transition and removes CENP-A nucleosomes from centromeres. CENP-A nucleosome retention at centromeres requires a core centromeric nucleosome complex where CENP-C clamps down a stable nucleosome conformation and CENP-N fastens CENP-A to the DNA.

Conflict of interest statement

The authors declare no competing financial interests.


Figure 1. CENP-C CD is the only nucleosome-binding domain of CENP-C required for retention of…

Figure 2. The arginine anchor of CENP-C CD is critical for the CENP-A nucleosome structural…

Figure 3. The arginine anchor of CENP-C CD is required for CENP-A nucleosome stability at…

Figure 4. CENP-N NT crossbridges CENP-A to DNA. (a) Coomassie Blue-stained SDS–PAGE of co-purification…

Figure 5. CENP-N NT undergoes global stabilization upon binding to the CENP-A nucleosome.

Figure 6. CENP-C CD and CENP-N NT simultaneously bind to the same CENP-A NCP and…

Figure 7. CENP-C and CENP-N collaborate to maintain CENP-A nucleosomes at centromeres.

Figure 8. Model of the physical basis for the stability of CENP-A nucleosomes within the…

Author Summary

The octameric structure of eukaryotic nucleosomes is universally accepted as the basic unit of chromatin. This is certainly the case for the vast bulk of nucleosomes; however, there have been no reports of the in vivo structure of nucleosomes associated with centromeres. Though centromeres make up only a minute fraction of the genomic landscape, their role in segregating chromosomes during mitosis is essential for maintaining genomic integrity. We report the characterization of centromeric chromatin from Drosophila cells, using detailed biochemical, electron microscopic, and atomic force microscopic analyses. Surprisingly, we found that, in striking contrast to bulk chromatin, centromeric nucleosomes are stable heterotypic tetramers in vivo, with one copy of CenH3 (the centromere-specific H3 variant), H2A, H2B, and H4 each, wrapping one full turn of DNA at interphase (the cell growth phase of the cell cycle). This results in nucleosome particles that are only half as high as bulk nucleosomes. These unexpected findings can help account for the dynamic behavior of CenH3-containing nucleosomes, whereby they are deposited promiscuously but are turned over in noncentromeric regions. Our demonstration of the existence of stable half-nucleosomes at centromeres suggests a novel mechanism for maintaining centromere identity.

Citation: Dalal Y, Wang H, Lindsay S, Henikoff S (2007) Tetrameric Structure of Centromeric Nucleosomes in Interphase Drosophila Cells. PLoS Biol 5(8): e218.

Academic Editor: Jim Kadonaga, University of California San Diego, United States of America

Received: April 24, 2007 Accepted: June 12, 2007 Published: July 31, 2007

Copyright: © 2007 Dalal et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Howard Hughes Medical Institute. YD was supported by an award (to SH) from the National Science Foundation (DBI 0234960).

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: AFM, atomic force microscopy; BC, bulk chromatin; CenH3, centromere-specific histone H3; DMS, dimethyl suberimidate; IP, immunoprecipitated; MNase, micrococcal nuclease

Phylogenomics of the nucleosome

Histones are best known as the architectural proteins that package the DNA of eukaryotic organisms, forming octameric nucleosome cores that the double helix wraps tightly around. Although histones have traditionally been viewed as slowly evolving scaffold proteins that lack diversification beyond their abundant tail modifications, recent studies have revealed that variant histones have evolved for diverse functions. H2A and H3 variants have diversified to assume roles in epigenetic silencing, gene expression and centromere function. Such diversification of histone variants and 'deviants' contradicts the perception of histones as monotonous members of multigene families that indiscriminately package and compact the genome. How these diverse functions have evolved from ancestral forms can be addressed by applying phylogenetic tools to increasingly abundant sequence data.

Denisovans: Another Human Relative

Scientists have also found DNA from another extinct hominin population: the Denisovans. The only remains of the species that have been found to date are a single fragment of a phalanx (finger bone) and two teeth, all of which date back to about 40,000 years ago (Reich 2010). This species is the first fossil hominin identified as a new species based on its DNA alone. Denisovans are relatives of both modern humans and Neanderthals, and likely diverged from these lineages around 300,000 to 400,000 years ago. You might be wondering: If we have the DNA of Denisovans, why can’t we compare them to modern humans like we do Neanderthals? Why isn’t this article about them too? The answer is simply that we don’t have enough DNA to make a comparison. The three specimen pool of Denisovans found to date is statistically far too small a data set to derive any meaningful comparisons. Until we find more Denisovan material, we cannot begin to understand their full genome in the way that we can study Neanderthals.

Neanderthals and modern humans shared habitats in Europe and Asia

We can study Neanderthal and modern human DNA to see whether the two groups interbred

We can study the DNA of Neanderthals because we have a large enough Neanderthal sample size (number of individual Neanderthals) to compare to humans


Different antibodies against proteins of the nuclear envelope were used in a detailed characterization of nuclear apoptosis in staurosporine-treated BRL cells. Pair-wise comparison of double-labeled individual apoptotic cells by fluorescence microscopy suggested that proteins from the nuclear envelope disappeared in a sequential order. This was confirmed by western blot analysis of nuclear envelope proteins from apoptotic cell cultures. The elimination of proteins from the nuclear envelope in apoptotic BRL cells was compared with other hallmarks of apoptosis (degree of chromatin condensation, DNA degradation and NPC clustering) and compiled into a model (Fig. 9). Three different stages of apoptotic progression could be clearly distinguished: stage I, moderately condensed chromatin surrounded by a smooth nuclear periphery; stage II, compact patches of condensed chromatin collapsing against a smooth nuclear periphery; stage III, round compact chromatin bodies surrounded by a grape-shaped nuclear periphery. Condensation of chromatin was an early event in apoptotic BRL cell nuclei (stage I). Fragmentation of DNA, as judged by TUNEL assay, started at a later time point (stage II). Interestingly, disappearance of POM121 and RanBP2 started in stage I and was more or less complete in stage II. Degradation of these two nucleoporins clearly preceded the degradation of NUP153 and lamin B, which started in stage II and proceeded gradually through stage III. By contrast, p62 still remained in stage III, which allowed detection of pore clustering, another hallmark of apoptosis (Buendia et al., 1999; Falcieri et al., 1994). Pore clustering developed gradually, starting clearly after POM121 and RanBP2 had been eliminated. Pore clustering apparently correlated with the disappearance of NUP153 and lamin B, and became more pronounced with alterations of nuclear peripheral shape.

Although one must keep in mind that the apoptotic program may differ depending on cell type and the method of initiation, the sequential order of elimination of proteins from the nuclear envelope reported here is consistent with previous publications. For example, it has been shown that degradation of lamin B, Lap2 and NUP153 preceded proteolysis of p62, LBR and gp210 (Buendia et al., 1999). Another study (Gotzmann et al., 2000) showed that degradation of lamin B preceded degradation of emerin. In both of the above reports, lamin B was degraded to a large extent in cells showing a high degree of chromatin condensation, corresponding to stages II-III in our investigation. Faleiro and Lazebnik reported decreased immunostaining in apoptotic cells due to inaccessibility or modification of the epitope recognized by mAb414 antibodies (Faleiro and Lazebnik, 2000). This was not the case in our study, where data from immunofluorescence and western blotting correlated well (Fig. 5, Fig. 7).

Since NPCs are believed to be connected to the nuclear lamina (Daigle, 2001; Dwyer and Blobel, 1976), it is tempting to speculate that clustering of nuclear pores and loss of membrane integrity are dependent on nuclear lamina disruption. In fact, NPC clustering was one of the most striking features in Drosophila embryos lacking B-type lamins (Lenz-Bohme et al., 1997) and mice lacking lamin A displayed nuclei with perturbed discontinuous nuclear envelopes (Sullivan et al., 1999). Apoptotic elimination of lamin B has been reported to be associated with the appearance of discontinuities in the nuclear envelope of chicken DU249 hepatoma cells (Duband-Goulet et al., 1998).

This study presents, for the first time, data that relate the apoptotic elimination of the integral pore membrane protein POM121 to that of other NE proteins and other hallmarks of apoptosis. Surprisingly, POM121 together with RanBP2 appear to be the earliest NPC proteins to be degraded. At present we have not identified the caspase responsible for POM121 degradation and we have been unable to detect any proteolytic fragment(s). The cytoplasmically exposed C-terminal portion of POM121 contains two tetrapeptides, D146PRD and D528KTD, which could act as potential proteolytic sites for caspases (Cohen, 1997), although this remains to be analyzed experimentally.

The sequential order of elimination of the NE proteins observed can be interpreted in two different ways. If the NE proteins were simply degraded by a common set of downstream effector caspases, one would expect the order of degradation to be limited by accessibility. It appears possible that apoptosis proceeds in a centripetal direction as a gradient of activated caspases starting in the cytoplasm and working its way into the nucleus. In such a scenario, proteins accessible from the cytoplasmic side of the NPC would be degraded first. This is true for both RanBP2, located on the cytoplasmic fibrils (Yokoyama et al., 1995), and POM121, whose C-terminal portion has been shown to be accessible for antibodies in digitonin-treated BRL cells (Soderqvist and Hallberg, 1994). In a centripetal mechanism, activated caspases would first have to enter the nucleus before getting access to NUP153 and lamin B on the nucleoplasmic side, explaining their delayed degradation. However, this model does not explain the resistance against degradation shown by p62, gp210 and emerin even in late apoptosis (this study; Buendia et al., 1999; Gotzmann et al., 2000). Furthermore, an ultra-structural study shows that clustered pore complexes in late nuclear apoptosis essentially maintain their overall structure and density (Falcieri et al., 1994), indicating that most of the NPC proteins are still intact. A more likely interpretation of the sequential degradation observed in our study is that initially only a selective group of strategic targets are attacked by upstream proteases. This initial attack may pave the way for more substantial destruction in later stages, perhaps facilitated by loss of pore function and increased permeability.
The C-terminal portion of POM121 is believed to be located in the central spoke region separating the smaller peripheral channels of the NPC believed to allow diffusion of smaller proteins between the cytoplasmic and nucleoplasmic compartments (Hinshaw et al., 1992). Elimination of POM121 would thus be expected to facilitate increased diffusion. Pore membrane proteins are believed to function in pore formation and anchoring peripheral NUPs, and are thought to be important for NPC stability. Thus, an initial degradation of POM121 in apoptosis may serve to destabilize the NPC and perhaps allow nuclear entry of effector caspases and nucleases. The latter model would be consistent with a recent study, showing that disruption of the nucleocytoplasmic barrier is dependent on caspase-9 and precedes activation of caspase-3 (Faleiro and Lazebnik, 2000).

The sorting determinants of gp210, another vertebrate pore membrane protein, were localized to its single transmembrane segment and to its 58 amino acid cytoplasmically exposed C-terminal tail (Wozniak and Blobel, 1992), suggesting that either or both of these domains are able to interact with other pore complex proteins (e.g. POM121), as was proposed (Hallberg et al., 1993). In POM121, a portion (amino acids 129-618) of its cytoplasmically exposed C-terminal domain has been shown to be responsible for targeting to the nuclear pores (Söderqvist et al., 1997). Here we show that gp210 remained associated with nuclear pores of apoptotic cells when POM121 had been eliminated. Our data are consistent with an earlier study (Buendia et al., 1999) showing that gp210 becomes degraded at a step succeeding degradation of NUP153 and lamin B in apoptosis. Our data also suggest that POM121 is not required for keeping gp210 in the pore membrane. Naturally, this does not exclude interaction(s) between these two proteins in other situations (e.g. during pore formation).

In this paper we have shown that overexpressed POM121-GFP and endogenous POM121 are degraded synchronously, that POM121 proteolysis is caspase dependent and appears to be a general phenomenon in apoptosis. Furthermore, we have shown that POM121 was one of the earliest proteins of the NE to be eliminated, occurring before many other signs of nuclear apoptosis (i.e. TUNEL assay detectable DNA fragmentation and NPC clustering). It will be interesting to compare elimination of POM121 with other hallmarks of apoptosis (e.g. release of cytochrome c from mitochondria, decreased mitochondrial inner membrane potential or appearance of phosphatidylserine on the cell surface). Non-invasive markers, such as GFP-tagged POM121 and cytochrome c (Goldstein et al., 2000), make it possible to study propagation of apoptosis in individual living cells by time-lapse microscopy and are thus valuable tools for studies of mechanistic and temporal aspects of cell death progression. This is especially true in light of the recent discovery that reversible epigenetic factors may influence apoptotic development (Jones, 2001).

Characterization of new antibodies against POM121. (A) Immunostaining of monolayer BRL cells with anti-POM121 antibodies. A grazing section (a), an equatorial section (b) and the corresponding DNA staining (c) of nuclei of BRL cells are shown. Bar, 20 μm. (B) Western blotting of SDS-PAGE separated proteins of rat nuclear envelope membranes extracted with 7 M urea (7M up) or of whole cell lysates of monolayer cultures of BRL cells (BRL lysate), using anti-POM121 antibodies.

Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA

DNA methylation occurs on CpG sites and is important to form pericentric heterochromatin domains. The satellite 2 sequence, containing seven CpG sites, is located in the pericentric region of human chromosome 1 and is highly methylated in normal cells. In contrast, the satellite 2 region is reportedly hypomethylated in cancer cells, suggesting that the methylation status may affect the chromatin structure around the pericentric regions in tumours. In this study, we mapped the nucleosome positioning on the satellite 2 sequence in vitro and found that DNA methylation modestly affects the distribution of the nucleosome positioning. The micrococcal nuclease assay revealed that the DNA end flexibility of the nucleosomes changes, depending on the DNA methylation status. However, the structures and thermal stabilities of the nucleosomes are unaffected by DNA methylation. These findings provide new information to understand how DNA methylation functions in regulating pericentric heterochromatin formation and maintenance in normal and malignant cells.

1. Introduction

DNA methylation is an important epigenetic mark that regulates the formation of chromatin domains, such as heterochromatin [1–5]. In mammals, DNA methylation occurs in the CpG dinucleotide and is considered to affect the structure and stability of the nucleosome, which is the basic architecture in chromatin [6–10]. In the nucleosome, about 150 base pairs of DNA are left-handedly wrapped around the histone octamer, composed of two each of the core histones H2A, H2B, H3 and H4 [11–13].

DNA methylation is reportedly correlated with nucleosome positioning in plant and mammalian genomes [14,15]. The genomic DNA regions with high CpG content are known as CpG islands, and the CpG methylation apparently plays pivotal roles in gene regulation and genomic DNA maintenance [4,16,17]. Abnormal DNA methylation statuses have been detected in various cancer cells [18,19]. CpG islands are mostly hypomethylated in normal cells, but are hypermethylated in cancer cells, especially in the promoters of tumour suppressor genes [4,20,21]. In contrast, large-scale CpG island demethylation has been detected at the tissue-specific gene promoters in lung cancers [22]. These previous findings suggested that DNA methylation functions in proper gene expression and genomic DNA stability [23,24].

Heterochromatin instability in pericentromeric satellite regions has also been detected as an early and frequent event during human carcinogenesis [25]. Interestingly, this heterochromatin instability occurs concomitantly with the hypomethylation of the CpG sites on the satellite DNA [25–28]. However, the means by which this difference in the CpG methylation status affects the structural features of the nucleosome remain elusive.

In this study, we reconstituted nucleosomes with methylated and unmethylated human satellite 2 DNA fragments, in which the CpG sites are reportedly hypomethylated in cellular carcinomas [29]. Our biochemical and structural analyses revealed that the DNA methylation influenced the positioning and the DNA end flexibility of the nucleosomes assembled on the satellite 2 sequence, without affecting the nucleosome structures and stabilities.

2. Results

2.1. Nucleosome formation on the human satellite 2 sequence

We first prepared a 160 base-pair human satellite 2 DNA fragment. This satellite 2 fragment contained seven CpG sites, TTCGAT, TTCGAT, TTCGAT, TCCGAG, TTCGAT, TTCGAT and TTCGAG (from 5′ to 3′), which are potentially methylated in normal cells (figure 1a, upper panel). To ensure that these CpG sites are fully methylated, all of the CpG sites were replaced by TTCGAA, which can be cleaved by the restriction enzyme BstBI (figure 1a, lower panel). In this study, this satellite 2 derivative was named Sat2. As shown in figure 1b (lane 1), all of the CpG sites in the Sat2 160 base-pair fragment were digested by BstBI. As anticipated, the BstBI cleavage was completely inhibited when the Sat2 160 base-pair fragment was treated with the DNA methyltransferase M.SssI (figure 1b, lane 2), indicating that all seven CpG sites of Sat2 were fully methylated.
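The site substitution described above can be illustrated with a minimal Python sketch. It uses only the seven hexamers quoted in the text; the `SPACER` filler and the helper names are hypothetical, the real 160 base-pair satellite 2 sequence is not reproduced, and methylation is modelled simply as blocking every BstBI site, as the gel result suggests.

```python
# The seven CpG-containing hexamers quoted in the text, in 5'->3' order.
NATIVE_SITES = ["TTCGAT", "TTCGAT", "TTCGAT", "TCCGAG", "TTCGAT", "TTCGAT", "TTCGAG"]
BSTBI_SITE = "TTCGAA"  # BstBI recognition sequence, as given in the text
SPACER = "ATAT"        # hypothetical filler; NOT the real satellite 2 sequence


def build_fragment(sites, spacer=SPACER):
    """Join hexamers with C/G-free filler so no extra CG dinucleotide appears."""
    return spacer + spacer.join(sites) + spacer


def bstbi_cuttable_sites(seq, methylated=False):
    """Count BstBI sites open to cleavage (assumption: methylation blocks all)."""
    return 0 if methylated else seq.count(BSTBI_SITE)


native = build_fragment(NATIVE_SITES)
sat2_like = build_fragment([BSTBI_SITE] * len(NATIVE_SITES))

print(native.count("CG"))                               # 7 CpG sites
print(bstbi_cuttable_sites(native))                     # 0: no native hexamer is TTCGAA
print(bstbi_cuttable_sites(sat2_like))                  # 7: every CpG now sits in a BstBI site
print(bstbi_cuttable_sites(sat2_like, methylated=True)) # 0: cleavage blocked
```

The point of the design is that, once every CpG lies inside a BstBI site, complete resistance to BstBI can be read as complete methylation of all seven sites.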

Figure 1. Translational positions of nucleosomes on methylated and unmethylated human satellite 2 DNAs. (a) Human pericentric satellite 2 DNA sequence (NCBI accession code: 603562, upper panel). The seven CpG sites are represented by capital red letters. The satellite 2 derivative (Sat2), in which the seven CpG sites of the satellite 2 DNA were substituted with BstBI recognition sites (highlighted by yellow rectangles), is represented in the lower panel. For DNA fragment preparation, Sat2 contains EcoRV sites at both ends of the DNA (highlighted by purple rectangles). (b) Non-denaturing PAGE analysis of the methylated and unmethylated Sat2 DNAs. The DNA fragment was methylated by the M.SssI DNA methyltransferase, and then treated with the BstBI restriction enzyme (8 units µg⁻¹ DNA, lane 2). Lane 1 indicates a control experiment without M.SssI. The DNA (200 ng) was analysed by 10% PAGE with ethidium bromide staining. Lane 3 indicates the 10 base-pair DNA ladder markers. (c) The methylated and unmethylated DNA fragments (30 ng), with or without MNase treatment, were analysed by non-denaturing PAGE. Lane 1 indicates the 10 base-pair DNA ladder markers. Lanes 2 and 3 indicate the unmethylated and methylated Sat2 DNA fragments. Lanes 4 and 5 indicate the nucleosomal unmethylated and methylated Sat2 DNA fragments, protected from MNase. (d) Schematic of the translational nucleosome positions, determined by deep sequencing after MNase treatment. Yellow boxes indicate the CpG sites. The red (left), blue (centre) and green (right) ellipses represent the three translational nucleosome positions, with dyad axes located in the 75 (±3), 81 (±3) and 88 (±3) regions, respectively. (e) Graphic representation of the nucleosome ratios, located at the left, centre and right positions. White and grey bars represent the experiments with unmethylated and methylated Sat2 DNAs, respectively. Standard deviation values are shown (n = 3).

We then reconstituted the nucleosomes with methylated or unmethylated 160 base-pair Sat2 DNA fragments, by the salt dialysis method. The reconstituted nucleosomes were treated with micrococcal nuclease (MNase), which preferentially cleaves the linker DNA segments detached from the histone surface, and the resulting approximately 145 base-pair DNA fragments were purified (figure 1c, lanes 4 and 5). We then performed massively parallel sequencing (deep sequencing) with these MNase-treated DNA fragments and found one major (right, denoted as R) and two minor (centre and left, denoted as C and L, respectively) nucleosome positions on the Sat2 sequence (figure 1d,e). The major R position was mapped on the right edge of the Sat2 DNA fragment, and the minor C and L positions were shifted by about 7 and 13 base pairs from the right edge, respectively (figure 1d). In both the methylated and unmethylated Sat2 DNAs, about 70% of the nucleosomes were formed at the R position, although a slight decrease was observed with the methylated Sat2 (figure 1e). Similarly, upon the DNA methylation, the nucleosome population at the C position was decreased (figure 1e). In contrast, the population of the L position was increased 1.5-fold when the methylated Sat2 was used as the substrate (figure 1e).
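As a rough consistency check on the positions described above, the dyad coordinate of each protected fragment can be computed from the fragment length and its shift from the right edge. This is a sketch under stated assumptions: coordinates are 1-based on the 160 base-pair fragment, the protected length is taken as exactly 145 base pairs (the text says "approximately 145"), and `dyad_position` is a hypothetical helper, not part of the authors' analysis.

```python
FRAGMENT_LEN = 160   # Sat2 fragment length from the text
PROTECTED_LEN = 145  # assumption: "approximately 145" taken as exactly 145


def dyad_position(shift_from_right_edge,
                  fragment_len=FRAGMENT_LEN,
                  protected_len=PROTECTED_LEN):
    """Return the 1-based midpoint of a protected fragment ending
    `shift_from_right_edge` base pairs before the right end."""
    end = fragment_len - shift_from_right_edge
    start = end - protected_len + 1
    return (start + end) // 2


# Shifts of 0, 7 and 13 bp correspond to the R, C and L positions in the text.
for name, shift in [("R", 0), ("C", 7), ("L", 13)]:
    print(name, dyad_position(shift))
```

Under these assumptions the sketch gives dyads at 88, 81 and 75, which agrees with the dyad regions quoted in the Figure 1 caption (88, 81 and 75, each ±3).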

2.2. Crystal structures of the nucleosomes containing the methylated Sat2R and Sat2L DNAs

We crystallized the nucleosomes containing the methylated Sat2L (145 base pairs) and Sat2R (146 base pairs) DNA fragments and determined their structures at 2.63 Å and 3.15 Å resolutions, respectively (table 1 and figure 2a,b). For a reference, we also determined the structure of the nucleosome containing the unmethylated Sat2R sequence at 2.90 Å resolution (table 1 and figure 2c). The histone octamer structures in the nucleosomes containing the methylated Sat2R and Sat2L DNAs were the same as that in the nucleosome containing the unmethylated Sat2R DNA (figure 2a–c). In addition, the DNA binding path in the methylated Sat2R nucleosome was not different from that in the unmethylated Sat2R nucleosome (figure 2d). The DNA binding path in the methylated Sat2L nucleosome was also the same as that in the unmethylated Sat2R nucleosome (figure 2e). Therefore, these results indicate that the hypermethylation at the seven CpG positions of the Sat2 DNA does not affect the intrinsic DNA wrapping property of the histone octamer.

Figure 2. Crystal structures of nucleosomes containing methylated Sat2 DNAs. (a,b) The crystal structures of the nucleosomes containing the methylated satellite 2 left (Sat2L) DNA (a) and the methylated satellite 2 right (Sat2R) DNA (b). The 5-methyl-cytosines were not visible in these structures, because the nucleosomes were packed in a nested manner in the crystals. (c) The crystal structure of the nucleosome containing unmethylated Sat2R DNA. (d) The methylated Sat2R DNA structure is superimposed on the unmethylated Sat2R DNA structure in the nucleosomes. (e) The methylated Sat2L DNA structure is superimposed on the unmethylated Sat2R DNA structure in the nucleosomes.

Table 1. Data collection and refinement statistics (molecular replacement).

Possible Dinosaur DNA Has Been Found

The tiny fossil is unassuming, as dinosaur remains go. It is not as big as an Apatosaurus femur or as impressive as a Tyrannosaurus jaw. The object is just a scant shard of cartilage from the skull of a baby hadrosaur called Hypacrosaurus that perished more than 70 million years ago. But it may contain something never before seen from the depths of the Mesozoic era: degraded remnants of dinosaur DNA.

Genetic material is not supposed to last over such time periods, not by a long shot. DNA begins to decay at death. Findings from a 2012 study on moa bones show an organism’s genetic material deteriorates at such a rate that it halves itself every 521 years. This speed would mean paleontologists can only hope to recover recognizable DNA sequences from creatures that lived and died within the past 6.8 million years, far short of even the last nonavian dinosaurs.
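To get a feel for what a 521-year half-life implies, here is a minimal Python sketch of simple exponential decay. It assumes a constant half-life and nothing else; the function names are hypothetical, and the sketch does not reproduce how the moa study arrived at the 6.8-million-year figure, which depends on conditions and criteria not given here.

```python
import math

HALF_LIFE_YEARS = 521  # half-life quoted in the text


def fraction_remaining(years, half_life=HALF_LIFE_YEARS):
    """Fraction of the original material left after `years` of decay."""
    return 0.5 ** (years / half_life)


def log10_fraction_remaining(years, half_life=HALF_LIFE_YEARS):
    """Same quantity on a log10 scale, usable when the fraction underflows."""
    return -(years / half_life) * math.log10(2)


print(fraction_remaining(521))    # one half-life
print(fraction_remaining(5210))   # ten half-lives
print(log10_fraction_remaining(6_800_000))
print(log10_fraction_remaining(70_000_000))
```

The log-scale helper is there because, over millions of years, the plain fraction is too small for a floating-point number to represent and simply evaluates to zero.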

But then there is the Hypacrosaurus cartilage. In a study published earlier this year, Chinese Academy of Sciences paleontologist Alida Bailleul and her colleagues proposed that in that fossil, they had found not only evidence of original proteins and cartilage-creating cells but a chemical signature consistent with DNA.

Recovering genetic material of such antiquity would be a major development. Working on more recently extinct creatures, such as mammoths and giant ground sloths, paleontologists have been able to revise family trees, explore the interrelatedness of species and even gain some insights into biological features such as variations in coloration. DNA from nonavian dinosaurs would add a wealth of new information about the biology of the “terrible lizards.” Such a find would also establish the possibility that genetic material can remain detectable not just for one million years, but for tens of millions. The fossil record would not be bones and footprints alone: it would contain scraps of the genetic record that ties together all life on Earth.

Yet first, paleontologists need to confirm that these possible genetic traces are the real thing. Such potential tatters of ancient DNA are not exactly Jurassic Park-quality. At best, their biological makers seem to be degraded remnants of genes that cannot be read: broken-down components rather than intact parts of a sequence. Still, these potential tatters of ancient DNA would be far older (by millions of years) than the next closest trace of degraded genetic material in the fossil record.

If upheld, Bailleul and her colleagues’ findings would indicate that biochemical traces of organisms can persist for tens of millions of years longer than previously thought. And that would mean there may be an entire world of biological information experts are only just getting to know. “I think exceptional preservation is really more common than what we think, because, as researchers, we have not looked at enough fossils yet,” Bailleul says. “We must keep looking.”

The question is whether these proteins and other traces are really what they seem. Hot on the heels of Bailleul’s paper, and inspired by the controversy over what the biomolecules inside dinosaur bones represent, a separate team, led by Princeton University geoscientist Renxing Liang, recently reported on unexpected microbes found inside a bone from Centrosaurus, a horned dinosaur of similar age to Hypacrosaurus. The researchers said that they unearthed DNA inside the bone, but it was from lineages of bacteria and other microorganisms that had not been seen before. The bone had its own unique microbiome, which could cause confusion as to whether proteins and possible genetic material belonged to the dinosaur itself or to bacteria that had come to reside within it during the fossilization process.

The discovery that such fossils can harbor bacterial communities different from those in the surrounding stone complicates the search for dinosaur DNA, proteins and other biomolecules. The modern may be overlaid on the past, creating a false image. “Even if any trace organics could be preserved,” Liang says, “the identification processes would be as challenging as finding a needle in the haystack and thus will likely lead to potential false claims.”

“Right now, molecular paleontology is controversial,” Bailleul says. The first sticking point is that when researchers look for traces of ancient biological molecules, they use technologies invented to find intact traces, not ones that have been degraded or altered by vast amounts of time. On top of that issue, there remains much experts do not know about how a dinosaur bone changes from organic tissue in a recently alive animal to a fossil hardened by minerals. “We have not figured out all of the complex mechanisms of molecular fossilization using chemistry. And we don’t know enough about the roles that microbes play,” Bailleul says. For example, it is unclear how modern microbes outside of fossils might interact with those that have been living within the bones.

These unknowns, as well as protocols that are still in development, fuel the ongoing debate over what the biological tidbits inside dinosaur bones represent. The research on the Hypacrosaurus cartilage looked at its microscopic details and used chemical stains that bind to DNA. In contrast, the study on the Centrosaurus bone used DNA sequencing to understand the nature of the genetic traces inside it, but did not look at its microstructure.

Bailleul acknowledges that considering previously unknown forms of microorganisms when studying dinosaur bone microbiology is important. But she proposes that it is unlikely bacteria would find their way into a cartilage cell and mimic its nucleus in such a way that researchers would mistake the microorganisms for the genuine article. Yet “you can never be too skeptical of your own results,” says paleogeneticist and author Ross Barnett, who was not involved in the two studies described above.

One of the largest difficulties in the ongoing debate, Barnett says, is a lack of replication. And paleogenetics has been through this problem before: Around the time the film Jurassic Park debuted in 1993, research papers heralded the discovery of Mesozoic DNA. Those claims were later overturned when other research teams could not replicate the same results. Even though the science of paleogenetics has changed since that time, the need for multiple labs to confirm the same result remains important. “If a different lab could be independently sent fossils from the same site, work up their own antibodies, do their own staining and get the same results, it would make things more believable,” Barnett says. Such collaboration has yet to take place for some of the assertions of exceptional dinosaurian preservation.

Nevertheless, molecular paleobiology is developing standards of evidence and protocols as it continues to search for clues held inside ancient bones. “I hope that many paleontologists or biologists, or both, are also trying to do this,” Bailleul says. “We can figure out the answers faster if we are all working on this together.”

Even if proposed dinosaur organics turn out to be false, the effort could still yield unexpected benefits. Bacterial communities are thought to be involved in the preservation of bones and in their replacement with minerals, thus helping dinosaur remains become fossils. “Future studies about ancient DNA from past microbial communities that used to live inside the dinosaur bones could shed more light on the roles of microorganisms in the fossilization and preservation of bones through geological time,” Liang says.

“These are very difficult questions,” Bailleul says. “But if we keep trying, there is hope that we will figure out most answers.” As the situation stands now, nothing is written in stone.

Phenotypes from cell-free DNA

Cell-free DNA (cfDNA) has the potential to enable non-invasive detection of disease states and progression. Beyond its sequence, cfDNA also represents the nucleosomal landscape of cell(s)-of-origin and captures the dynamics of the epigenome. In this review, we highlight the emergence of cfDNA epigenomic methods that assess disease beyond the scope of mutant tumour genotyping. Detection of tumour mutations is the gold standard for sequencing methods in clinical oncology. However, limitations inherent to mutation targeting in cfDNA, and the possibilities of uncovering molecular mechanisms underlying disease, have made epigenomics of cfDNA an exciting alternative. We discuss the epigenomic information revealed by cfDNA, and how epigenomic methods exploit cfDNA to detect and characterize cancer. Future applications of cfDNA epigenomic methods that act complementarily and orthogonally to current clinical practices have the potential to transform cancer management and improve cancer patient outcomes.

1. Introduction

Blood is a minimally invasive source for tracking an individual's health status. Most known biomolecules found in blood, such as proteins, DNA, RNA, lipids, and metabolites inform us of some aspect of body function. However, using blood to diagnose and track cancer is still a major challenge. Minimally invasive diagnostics for cancer can greatly reduce pain and suffering of patients, and if cheaper than current methods, could be performed more often in the course of treatment to monitor disease state and inform clinical care. Genomic characterization of cancer either from biopsies or plasma cell-free DNA (cfDNA) has the added potential of providing personalized treatment options.

cfDNA is DNA bound to mononucleosomes and is found circulating extracellularly in the blood. cfDNA was first identified in 1948 [1] in the blood of patients; however, the hypothesis that cfDNA acts as a reflection of disease state arose from observations in 1977 [2]. This relationship between cfDNA and disease has been a subject of study ever since. There are multiple possible pathways for the release of DNA fragments from cells, including apoptosis, necrosis, and exosome secretion [3–6]. Processes that increase the release of cfDNA include disease, inflammation, tissue injury, and exercise [7,8]. In healthy individuals, haematopoietic maturation is a major contributor to the normal cfDNA pool. Lui et al. elegantly demonstrated that lymphoid/myeloid tissues are the major contributors to cfDNA by identifying Y-chromosome sequences in plasma of female recipients of bone marrow transplantations from male donors [9]. Multiple studies since then further confirmed that the lymphoid/myeloid tissues mainly contribute to the normal cfDNA pool [10–13]. Tumours, when present, also contribute to cfDNA. Thus, cfDNA is a molecular barcode of cells undergoing turnover and is an attractive target for clinical diagnostics and real-time monitoring of many cancers.

Provided that there are ways to distinguish cfDNA originating at the disease site from cfDNA produced during normal turnover, cfDNA could be used for diagnosis. The major focus of using cfDNA for cancer diagnosis has been the identification of a limited disease-specific panel of mutations. However, mutation detection has a significant limitation: that clonal haematopoiesis contributes to a significant fraction of mutations in cfDNA, including mutations in prominent cancer-associated genes like TP53 and DNMT3A [14]. Moreover, these mutations in cfDNA from the early stages of disease can be indistinguishable from mutations that arise at appreciable rates in healthy tissues [15,16]. This major obstacle has led to the exploration of other orthogonal information in cfDNA that could identify its tissue-of-origin and, in turn, disease states. Epigenomes reflect cellular identity and phenotype, and our ability to connect epigenomes to cellular identity could be used to infer disease phenotypes from cfDNA. Significantly, epigenome-based cfDNA approaches would be orthogonal and complementary to mutation-based cfDNA assays. In this review, we discuss epigenome features that can be characterized in cfDNA, and which provide information on the cfDNA tissue-of-origin.

2. Cell-free DNA contains a map of the chromatin state in the tissue-of-origin

The periodicity of cytoplasmic DNA released during red blood cell maturation in mouse fetal liver led to the hypothesis that there was a regular arrangement of protein protections on the genome [17,18]. This work preceded both the field of apoptosis and the biochemical characterization of the nucleosome and provided the first hint that cfDNA could be a map of chromatin structure. Protein-bound DNA lasts longer than naked DNA in serum [19], where nucleases are abundant, which suggests that cfDNA is probably double-stranded and protein-bound to inhibit degradation by endogenous nucleases in plasma. This is supported by the ability to detect nucleosomes in plasma by sandwich enzyme-linked immunosorbent assays (ELISA) and by chromatin immunoprecipitation (ChIP) [20–24].

Apoptosis is one of the processes that could lead to genome fragmentation into small protein-DNA complexes that are found in circulation. Apoptosis involves the upregulation of nucleases that attack the cell's genome. Characteristic of this ‘attack' is DNA fragmentation that results in a laddering of repeat species less than 5 kb [5,25]. The repeating unit in laddering seen in apoptosis corresponds to approximately 150 bp, which parallels the length of DNA wrapped around a nucleosome. Therefore, plasma cfDNA, which is typically in the form of a mononucleosome, informs us about the chromatin structure of cells undergoing turnover. In other words, the ‘epigenome' of the tissue-of-origin can be measured from cfDNA (figure 1).

Figure 1. Cell-free DNA reflects the structural epigenomic information of the cell-of-origin. Schematic of the nucleosomal landscape differences of gene X when it is not expressed (left) and expressed (right) in different cell-types. (a) A non-expressed gene features promoter-proximal DNA methylation (red flags), methylation of nucleosomal histone tails (red flags, H3K27me3), nucleosome occlusion of the promoter (arrow), a lack of transcription factors (TFs) upstream of the promoter, and the absence of RNA polymerase II (RNAPII). (b) At expressed genes, nucleosomes are well-positioned, modifications like H3K4me3 (green flags) are present, RNAPII occupies the promoter, and TFs are bound upstream. At the +1 nucleosome during transcriptional elongation, RNAPII transiently breaks DNA-histone contacts allowing H2A-H2B dimers to exchange (light blue crescent). Nucleases, shown as scissors, are ubiquitously present and preferentially cleave accessible DNA. At gene X, when protein protections of DNA change during chromatin remodelling events, such as transcription, nuclease activity captures the different DNA length protections (see fragment lengths). (c) During cell-turnover, DNA-protein complexes are released into circulation. (d) Circulating DNA-protein complexes in plasma preserve the epigenomic features of the transcriptional status of the cell-of-origin. These features include fragment lengths (protein-DNA protections), DNA methylation, nucleosome positioning profiles, and nucleosome post-translational modifications. Note the preservation of these transcriptional and non-transcriptional DNA-protein species from the body to plasma.

3. Structural epigenomics

It has long been known that chromatin structure in cells can be inferred from digestion patterns of nucleases that are sensitive to protein-DNA contacts [26]. Micrococcal nuclease (MNase) has been used to study chromatin structure for decades [27]. Understanding chromatin structure by sequencing fragments protected from MNase digestion has an unexpected parallel in cfDNA. The information derived from fragment length and genome location can inform us of biochemical activity at a genomic locus for MNase and cfDNA. For cfDNA, this can be used to infer transcriptional activity and tissue-of-origin. As we will demonstrate below, the length and genomic location of fragments protected from a nuclease can tell us the locus-specific distribution of nucleosomes, transcription factors (TF), and nucleosomal intermediates formed during transcription (figure 1). The ability to identify locus-specific structures of protein-DNA complexes from sequencing data, termed ‘structural epigenomics', is a general strategy applicable to a wide variety of datasets, including cfDNA sequencing datasets [11].

4. Nucleosome positions reflect genome function

Eukaryotic genomic DNA is packaged with nucleosomes like ‘beads on a string' and consecutive nucleosomes are separated by short linker DNA [28,29]. MNase preferentially degrades linker DNA and is inhibited when it encounters a protein-DNA contact [27,30]. A nucleosome protects 147 bp of DNA, a chromatosome (nucleosome with a linker histone bound) protects 167 bp of DNA, and TFs protect less than 50 bp of DNA. MNase has traditionally been used to map positions of whole nucleosomes in intact nuclei by purifying approximately 147 bp DNA after MNase digestion and subjecting this DNA to massively parallel short-read sequencing [31]. Nucleosome positioning impacts all biochemical processes that occur on the genome and consequently, knowing nucleosome positions enables us to predict biochemical activities that occurred in the cells that resulted in these protections [32]. A striking example of a distinct nucleosome organization is found in active genes. There is a depletion of nucleosomes at active promoters and nucleosomes are well-ordered upstream and downstream of active promoters. cfDNA contains information on nucleosome positioning and high-quality nucleosome maps were obtained from deep, whole-genome cfDNA sequencing from the plasma of healthy donors [12,33–35].
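The protection sizes quoted above can be turned into a simple lookup, sketched below. Treating 50, 147 and 167 bp as hard cutoffs is an illustrative assumption (real fragment-length distributions overlap), and the function and label names are hypothetical.

```python
# Protection sizes taken from the text. Using them as sharp boundaries is a
# simplification made only for this illustration.
TF_MAX_BP = 50
NUCLEOSOME_BP = 147
CHROMATOSOME_BP = 167


def classify_protection(length_bp: int) -> str:
    """Label a nuclease-protected fragment by the complex its length suggests."""
    if length_bp < TF_MAX_BP:
        return "TF-sized"
    if length_bp < NUCLEOSOME_BP:
        return "intermediate"  # longer than a TF footprint, shorter than a nucleosome
    if length_bp < CHROMATOSOME_BP:
        return "nucleosome-sized"
    return "chromatosome-sized or longer"


for length in (35, 120, 147, 167):
    print(length, classify_protection(length))
```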

Nucleosome density inferred from cfDNA of healthy individuals had the same features as that of lymphoid cell lines: genes highly expressed in lymphoid cell lines featured nucleosome depletion at promoters and ordered nucleosome positions upstream and downstream of the promoter in cfDNA (figure 1). By contrast, genes that were not expressed in these cell lines showed significant nucleosome density over the promoters and lack of ordering over gene bodies in cfDNA [12,35]. These observations strongly validated the lymphoid/myeloid origin of cfDNA in healthy humans.

Quantitative scores were developed based on these striking observations to connect nucleosome profiles to gene activity in the cfDNA tissue-of-origin. Snyder et al. showed that stronger periodicity of nucleosomes in the gene body (the region between the transcription start site (TSS) and TSS+5000 bp) correlated with higher gene expression in lymphoid/myeloid tissues for healthy donors [12]. However, in cfDNA from donors with cancer, the correlation of nucleosome periodicity at gene bodies with gene expression of lymphoid/myeloid tissues was much weaker, suggesting other cell types contributing to the cfDNA pool. This was confirmed by the fact that the correlation of periodicity with gene expression of other cell types increased for donors with cancer [12]. Thus, nucleosome periodicity could inform the gene expression of the cfDNA tissue(s)-of-origin.
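One generic way to put a number on "periodicity" is the autocorrelation of a coverage track at the nucleosome repeat length. The sketch below is a stand-in for illustration, not necessarily the score used by Snyder et al.; the 190 bp repeat length and the toy coverage tracks are assumptions.

```python
import math


def autocorrelation(signal, lag):
    """Normalised autocorrelation of a per-base signal at `lag` bp."""
    n = len(signal)
    if lag <= 0 or lag >= n:
        return 0.0
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal)
    if variance == 0:
        return 0.0  # a flat track has no periodicity to measure
    covariance = sum(
        (signal[i] - mean) * (signal[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


REPEAT_BP = 190      # assumed nucleosome repeat length, for illustration only
GENE_BODY_BP = 5000  # TSS to TSS+5000 bp, as in the text

# Toy tracks: a well-ordered nucleosome array versus featureless coverage.
ordered = [1 + math.cos(2 * math.pi * i / REPEAT_BP) for i in range(GENE_BODY_BP)]
flat = [1.0] * GENE_BODY_BP

print("ordered array:", round(autocorrelation(ordered, REPEAT_BP), 2))
print("flat coverage:", autocorrelation(flat, REPEAT_BP))
```

A gene body with well-phased nucleosomes scores close to 1 at the repeat length, while featureless coverage scores 0.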

Ulz et al. noted that overall nucleosome density was lower in 2 kb upstream and downstream of the TSS (which they termed ‘2 K-TSS') and lowest at the promoter itself (usually termed the ‘nucleosome depleted region', NDR) in expressed genes in both MNase-seq of a lymphoblastoid cell line and the cfDNA sequencing data from healthy donors [35]. They developed a model that used the nucleosome occupancy at 2 K-TSS and NDR to predict gene expression. In a person with cancer, tumour-contributed cfDNA would have a depletion of nucleosomes at active genes, but this depletion may be masked by high nucleosome occupancy in cfDNA from lymphoid/myeloid tissues. Ulz et al. circumvented this problem by focusing on regions in the genome that featured copy-number gains in cfDNA of donors with cancer. These genomic regions would have a higher representation in tumour cfDNA if the tumour was the source of these copy number gains. Their simulations suggested that at least 75% of cfDNA at a given TSS must be released from tumour cells to infer expression status using their method. In the regions of copy number gain, they demonstrated significant changes in nucleosome occupancy at 2 K-TSS and NDR that correlated with increased expression of these genes in the tumour as assessed by RNA-seq of matched tumour biopsy samples. Thus, promoter and gene-body depletion of cfDNA fragments are indicative of gene activity in the tissue-of-origin of cfDNA.
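The two coverage features described above can be sketched as window averages around a TSS. This is not the published model: the window offsets (2 kb either side for 2 K-TSS, −150 to +50 bp for the NDR) and the function names are assumptions chosen for illustration, and the coverage track is a toy example.

```python
def mean_coverage(coverage, tss_index, start_offset, end_offset):
    """Mean per-base coverage in [TSS + start_offset, TSS + end_offset)."""
    window = coverage[tss_index + start_offset : tss_index + end_offset]
    return sum(window) / len(window)


def promoter_features(coverage, tss_index, flank_bp=2000, ndr=(-150, 50)):
    """Two coverage features around one TSS (window sizes are assumptions)."""
    return {
        "2K-TSS": mean_coverage(coverage, tss_index, -flank_bp, flank_bp),
        "NDR": mean_coverage(coverage, tss_index, ndr[0], ndr[1]),
    }


# Toy track: uniform coverage of 10 with a dip to 2 over the assumed NDR,
# mimicking nucleosome depletion at an active promoter.
tss = 3000
coverage = [10.0] * 6001
for i in range(tss - 150, tss + 50):
    coverage[i] = 2.0

print(promoter_features(coverage, tss))
```

In practice such features would be normalised against genome-wide coverage before being compared between genes or samples.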

5. Subnucleosomes in cell-free DNA inform expression state of genes

Transcription also results in the formation of subnucleosomes proximal to the promoter and identification of subnucleosomes could serve as a proxy for measuring gene activity. In eukaryotic cells, transcription occurs on a chromatin template. Because RNA polymerase II (RNAPII) needs to unwind the DNA strands to make RNA, every protein-DNA contact in its path needs to be disrupted. In vitro, nucleosomes present a substantial barrier to transcription elongation [36,37], resulting in specific stalls by RNAPII at sites of strong histone-DNA contacts. In cells, we can map the positions where RNAPII stalls with base pair-resolution maps of the 3′ ends of nascent transcripts [38–41]. These maps have shown that the first nucleosome downstream of TSS (+1 nucleosome) presents a strong barrier to RNAPII in cells.

The effect of RNAPII stalling on the +1 nucleosome could be understood by mapping intermediate nucleosome states during transcription elongation. With sequencing library protocols that capture all fragment lengths combined with paired-end sequencing, the full spectrum of fragments generated by MNase treatment of nuclei can be uncovered [11,42–44]. Fragments between 50 and 147 bp are too short to be protected by a whole nucleosome, but too long to be TF footprints. These intermediate-length protections represent a discrete loss of contacts asymmetrically from either side of the nucleosome as it unwraps during transcription and remodelling (figure 1). Correlating a base-pair resolution map of RNAPII with the high-resolution distribution of nucleosomal intermediates at the +1 nucleosome revealed that nucleosome unwrapping occurs in a stepwise manner—first, the contacts to the H2A-H2B dimer proximal to the promoter are lost. Then as RNAPII elongates through the nucleosome, contacts to the H2A-H2B dimer distal to the promoter are lost [11]. Cryo-electron microscopy of unwrapped nucleosomes and of RNAPII transcribing nucleosome templates yielded structures that matched the in vivo structures inferred from paired-end sequencing [45–47]. These observations highlight the ability of genomic sequencing of short DNA fragments to detect transcription-dependent substructures of nucleosomes in cells.

cfDNA is highly nicked and nicked DNA is lost in standard double-stranded library preparation protocols. Inspired by methods used in Neanderthal genome sequencing projects, Snyder et al. used a single-stranded library protocol (SSP) that captures both nicked and non-nicked fragments [12,48]. Surprisingly, they found an abundance of fragments less than nucleosomal size, ranging from approximately 40 bp onwards. These short fragments resemble subnucleosomes observed in MNase sequencing data from Drosophila cells [11]. When the subnucleosomal cfDNA fragments were mapped at +1 nucleosome positions of genes expressed in lymphoid/myeloid tissues, the same asymmetric unwrapping intermediates that were seen in Drosophila cells could also be observed in the cfDNA data [11]. Thus, it seems that the long residence time of RNAPII near the +1 nucleosome results in long-lived nucleosomal intermediate states that are mappable both by MNase, and by the endogenous nucleases that give rise to cfDNA.

In Drosophila cells, it was found that the amount of the subnucleosomal particles relative to nucleosomal particles at the +1 nucleosome of a gene correlated with the extent to which that gene was transcribed [11]. Therefore, it was hypothesized that the amount of these intermediates relative to whole nucleosomes (subnucleosome enrichment) at the +1 nucleosome of genes in cfDNA would correlate with the composite expression profile of cells giving rise to cfDNA. Indeed, in a healthy donor, cfDNA subnucleosome enrichment correlated much better with expression profiles of lymphoid/myeloid tissue types compared to other tissue types. However, in donors with cancer, the subnucleosome enrichment corresponding to lymphoid/myeloid tissues weakened substantially, enabling robust differentiation of healthy and cancer plasma cfDNA [11]. Thus, subnucleosomes in cfDNA represent relics of transcription in cfDNA tissues-of-origin that can inform us about the transcriptional programmes active at disease sites.

6. Chromatin structure correlations across megabases

The observation that nucleosome dynamics are reflected by shorter protections in cfDNA can be extended to much larger length scales than single nucleosomes, which allows maximal use of low depth cfDNA sequencing data. Using a machine learning model, Cristiano et al. were able to observe similar ‘fragmentation profiles' at megabase scales between cfDNA from healthy donors and the DNA released by MNase treatment of healthy lymphocytes [49]. Fragmentation profile is a measure similar to subnucleosome enrichment: a ratio of small fragments (100–150 bp) to large cfDNA fragments (151–220 bp). A significant difference in fragmentation profiles was observed between cfDNA from healthy donors and donors with cancer, enabling detection of cancer with low-depth whole-genome cfDNA sequencing. This method works presumably because fragmentation profiles at large length scales differentiate active domains of the chromosome from inactive domains. A tumour is expected to have significantly different active chromatin domains at the megabase scale compared to lymphoid/myeloid tissues.
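The short-to-long ratio defined above can be computed per genomic bin as in the following sketch. The length ranges come from the text; the 5 Mb bin size, the `(chromosome, start, length)` tuple layout and the function name are assumptions for illustration.

```python
from collections import defaultdict

SHORT = range(100, 151)  # 100-150 bp inclusive, as defined in the text
LONG = range(151, 221)   # 151-220 bp inclusive, as defined in the text


def fragmentation_profile(fragments, bin_size=5_000_000):
    """Ratio of short to long fragments per (chromosome, bin).

    `fragments` is an iterable of (chromosome, start, length) tuples.
    The bin size is an assumed value at the megabase scale.
    """
    counts = defaultdict(lambda: [0, 0])  # [short, long] per bin
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        if length in SHORT:
            counts[key][0] += 1
        elif length in LONG:
            counts[key][1] += 1
    # Bins with no long fragments are skipped to avoid dividing by zero.
    return {key: s / l for key, (s, l) in counts.items() if l > 0}


toy_fragments = [
    ("chr1", 100, 120), ("chr1", 200, 140), ("chr1", 300, 167),
    ("chr1", 6_000_000, 180), ("chr1", 6_000_100, 110), ("chr1", 6_000_200, 200),
    ("chr1", 10, 40),  # outside both ranges, so ignored
]
print(fragmentation_profile(toy_fragments))
```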

A similar hypothesis is that the strength of correlation of cfDNA length profiles between genomic loci is proportional to the correlation of contacts made by the loci to the rest of the chromosome [50,51]. The correlation matrix of contact probability for a cell type can be obtained by Hi-C [52]. The Hi-C method involves crosslinking cells, fragmenting the genome into large pieces with the crosslinks intact, and then ligating the cross-linked fragments. Sequencing of the ligated contacts reveals chromatin contacts genome-wide. Regions of the genome with similar activity share similar contact profiles across the chromosome. Liu et al. showed that the strength of correlation of healthy cfDNA length profiles between different genomic regions is proportional to the strength of correlation of Hi-C contacts between different genomic regions as measured in lymphoblastoid cells. Furthermore, this correlation of cfDNA length profiles between different regions of a chromosome could be modelled as a combination of Hi-C maps of different tissue types to uncover tissue contributions to cfDNA in tumour samples with high (greater than 30%) tumour fractions [50]. Thus, cfDNA profiles could be used to infer chromosome contacts in the tissue(s)-of-origin.
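The idea of correlating length profiles between loci can be sketched with a plain Pearson correlation matrix. Here each locus is summarised as a vector of fragment counts per length bin, which is an assumed representation; the locus names and counts are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; reported as 0 here
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Pairwise correlations between loci, keyed by (locus, locus)."""
    names = sorted(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}


# Hypothetical fragment counts in four length bins for three loci.
profiles = {
    "locus_A": [5, 20, 60, 15],
    "locus_B": [6, 22, 58, 14],
    "locus_C": [30, 40, 20, 10],
}
matrix = correlation_matrix(profiles)
print(round(matrix[("locus_A", "locus_B")], 3), round(matrix[("locus_A", "locus_C")], 3))
```

A matrix like this is what would then be compared against a Hi-C contact correlation matrix for a given cell type.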

7. Distinct cell-free DNA patterns at transcription factor binding sites

TF binding sites at promoters, enhancers, and insulators show enrichment of short protections (less than 50 bp) and depletion of nucleosomes in MNase-seq experiments [11,42,43,53–55]. Binding of many TFs also results in the ordering of nucleosomes around the TF binding sites [56]. Thus, protected TF binding sites represent a discrete class of genomic loci that show distinctive chromatin structural profiles in nuclease-protection assays. The enrichment of short fragments in cfDNA using SSP enabled Snyder et al. to observe similar enrichment of short protections and ordered nucleosome arrays at TF binding sites in cfDNA datasets from healthy plasma. They also observed an enrichment of short fragments in cfDNA at promoters that was proportional to expression levels of a lymphoid cell line, demonstrating the ability to possibly identify transcriptional complexes at a promoter from cfDNA [12].

Ulz et al. found that mononucleosome depletion observed in cfDNA at known TF-binding sites (TFBSs) is a good proxy for inferring TF binding [57]. This enables inference of TF binding from cfDNA sequenced using standard double stranded protocols that do not enrich for short fragments. They selected a set of TFs which are lineage-specific, for example, Androgen Receptor (prostate), Even-Skipped Homeobox 2 (colon), Forkhead box A1 (breast), and for each of them, profiled 1000 binding sites that were concordant across tissue samples [58]. Using plasma from patients with prostate, breast, and colon cancer, they showed that nucleosome depletion levels at binding sites of tumour-specific TFs are significantly higher compared to healthy cohorts and the reverse was true at binding sites for haematopoietic lineage-specific TFs such as LYL1, and EVI1. In addition, they also showed that nucleosome depletion levels at specific TFBSs were predictive of tumour subtypes. For instance, they found a significant reduction in nucleosome depletion at Androgen Receptor-binding sites when comparing samples from before and after a patient's prostate adenocarcinoma became androgen-independent. Thus, TF binding can be inferred from cfDNA either directly through short fragment protections when using SSP or indirectly via nucleosome depletion at TFBSs when using standard double-stranded library protocols and tissue-specific TFBSs enable identification of tissues that contribute to cfDNA.

8. Cell-free DNA fragment length

As we have discussed above, cfDNA length reflects the structure of chromatin in cells of origin and can be explicitly modelled as such. Beyond trying to understand cfDNA profiles in terms of chromatin structure, there have been several useful observations about the use of cfDNA length in biomarker development, for which we can only speculate the underlying molecular details. At its most general, the size of cfDNA fragments is used to delineate between baseline homeostasis and other biological phenomena that give rise to cfDNA. cfDNA has already been directly applied in the clinic for pre-natal testing. In plasma, fetal cfDNA is shorter than maternal cfDNA [13,59,60], which enables detection of aneuploidy or recessive disorders in the fetus non-invasively. Urine cfDNA features short periodic fragments that resemble subnucleosomes [51]. In individuals with cancer, circulating tumour DNA resides in the shorter cfDNA fraction [61–63]. The frequency of shorter, non-tumour cfDNAs is typically low and thus size selection for shorter fragments could dramatically increase the sensitivity of detection for mutant tumour cfDNA. Mutation targeting and identification of copy number variation and/or single copy number alterations is achieved with a higher sensitivity if the shorter cfDNA fraction is profiled [61,62]. This is a consequence of mutant tumour cfDNA being diluted less with non-tumour cfDNA in the shorter fraction. In the light of these observations, one prediction is that shorter cfDNA is a product of a specific mechanism that occurs at a higher frequency in fetal cells, cells that give rise to cfDNA in urine, and cancer cells relative to normal lymphoid/myeloid cell turnover.
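The dilution argument can be made concrete with a little arithmetic. All numbers below are hypothetical (they are not from the cited studies): 1000 fragments of which 50 are tumour-derived, with 80% of tumour fragments and 20% of other fragments falling in the short fraction.

```python
def tumour_fraction_after_size_selection(total, tumour, p_short_tumour, p_short_other):
    """Tumour fraction among short fragments only.

    `p_short_tumour` and `p_short_other` are the assumed probabilities that a
    tumour-derived or non-tumour fragment falls in the short size window.
    """
    short_tumour = tumour * p_short_tumour
    short_other = (total - tumour) * p_short_other
    return short_tumour / (short_tumour + short_other)


# Hypothetical inputs, chosen only to show the direction of the effect.
before = 50 / 1000
after = tumour_fraction_after_size_selection(1000, 50, 0.8, 0.2)
print(f"tumour fraction: {before:.1%} before, {after:.1%} after size selection")
```

Under these made-up proportions the tumour fraction rises from 5% to roughly 17%, which is the sense in which size selection reduces dilution by non-tumour cfDNA.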

9. Cell-free DNA chromatin immunoprecipitation uses cell type-specific genome-wide patterns of histone modifications to identify tissues of origin

Several studies in the early 2000s used ELISA to capture and calculate the concentration of nucleosomes in plasma [20,22–24,64,65]. Specifically, nucleosomes were captured in an antibody ‘sandwich', which used an antibody for a histone and an antibody for double-stranded DNA. This approach was an alternative means to assess cell death using plasma from cancer patients. In a notable study, cancer patients exhibited a statistically significant increase in nucleosome concentration compared to healthy individuals [20]. These findings suggest that nucleosome concentration could act as a biomarker for a disease state and/or discriminate between individuals with disease and those who are healthy. However, this application lacks specific genomic information to identify the tissue-of-origin and specify the disease state.

Beyond differences in overall nucleosome concentrations among physiological states, an intriguing question is whether post-translational modifications (PTMs) of nucleosomal histones in plasma can be used to differentiate healthy individuals and those with disease (figure 1). PTMs of nucleosomal histones and DNA differ in euchromatic regions versus heterochromatic regions [66,67]. PTMs range from small chemical groups to larger proteins, such as ubiquitin, that are added to specific amino acids of proteins. For example, tri-methylation of histone 3 lysine 27 (H3K27me3) and H3K9me3 are found at heterochromatic regions where gene activity is repressed or ‘silent'. By contrast, H3K4me3 and H3K36me3 are associated with euchromatic regions where genes are active [68,69].

The link between cancer and chromatin regulation is well documented [70]. The majority of oncogenic mutations occur in genes of chromatin modifiers [71–73], which include a number of tumour-suppressor genes, such as SIRT1 [74]. Genomic instability is a hallmark of cancer and a consequence of alterations in heterochromatin that disrupt silencing [75,76]. Deligezer et al. analysed histone PTMs on circulating nucleosomes in plasma [22]. In light of the evidence for deregulation of repressive PTMs associated with gene silencing in cancer, histone methylation, specifically H3K9me1, was first assessed on circulating nucleosomes from myeloma patient plasma samples. Follow-up papers identified a reduction in repressive chromatin marks (H3K9me3, H4K20me3 and H3K27me3) in patients with colorectal cancer and metastatic prostate cancer [22,23]. However, these studies did not use high-throughput sequencing technology that would provide information on the genomic locations of histone PTMs, an aspect that is important to consider when delineating between different physiological states (i.e. disease and non-disease states) that produce changes in PTM levels in circulating nucleosomes.

A recent study sought to assess circulating post-translationally modified nucleosomes in a variety of physiological states using a combination of ChIP and next-generation sequencing [21]. The authors created a workflow to isolate modified circulating nucleosomes from plasma and performed high-throughput sequencing of the associated DNA (cfChIP-seq). Modified nucleosomes containing PTMs associated with active genes or cis-regulatory elements (i.e. enhancers) were specifically targeted: H3K4me1/2/3 and H3K36me3. H3K4me3 cfChIP-seq signal was used as a proxy for active genes and recapitulated earlier findings that cell-type specific gene activation programmes belonged to mainly blood lineages in a healthy individual. Furthermore, the cfChIP-seq signals reflected previously identified ChIP-seq signals for PTMs in these lineages.

This analysis was extended to profile patients with a variety of physiological insults, including surgery and cancer. cfChIP-seq detected differences in gene expression in individuals with cancer compared to healthy individuals. In response to surgical interventions, changes in tissue-specific cfChIP-signatures were detectable in the blood of patients supporting tissue-of-origin specificity. Combinations of PTMs were also assessed for complementary information to further profile tissue-type specific gene activity from non-coding regions. The authors observed changes in H3K4me2 (found at putative enhancer elements) and H3K36me3 (found within the body of transcribed genes) that correlated with differential gene activity unique to cancer tumours. Overall, the ability to identify the tissue-of-origin based on enrichment of circulating nucleosome PTMs associated with transcription at specific regions of the genome is exciting. cfChIP-sequencing provides an additional layer of information about an individual's underlying physiological state derived from the blood.

10. Cell-free DNA methylation patterns identify tissues of origin

Methylation of cytosines in the CpG dinucleotide context is associated with cellular identity [77] and can be identified on cfDNA [78,79]. The two main methods used to map CpG methylation genome wide are deamination of unmethylated cytosine to uracil using sodium bisulfite and pulldown of methylated DNA using an antibody specific to CpG methylation. Promoter hypermethylation of a gene is often associated with silencing [79]. DNA methylation profiles are stable and cell-type specific, and differentially methylated regions (DMRs) have been extensively used in quantification of subpopulations in DNA extracted from heterogeneous cell populations [80,81]. Accomando et al. showed that cell-types in leucocytes can be quantified using methylation status at as few as 20 CpG loci [82]. Many genomic locations exhibit highly cell-type specific DNA methylation, which has been instrumental in developing computational methods to delineate the contribution of tissues to cfDNA. CpG islands, regions 300–3000 bp in length and rich in G + C content, are usually hypomethylated, but aberrant methylation of these regions is associated with diseases like cancer [83]. Some CpG islands share high methylation levels across tumour types, while others are tumour specific [84]. Promoter hypermethylation of tumour suppressor genes is an especially striking feature of tumours, particularly during carcinogenesis [85,86]. Targeted amplification of candidate promoter regions has been used to find that cfDNA from cancer patients exhibits hypermethylation in contrast to healthy donors [87–89]. Using methylated DNA immunoprecipitation (MeDIP-seq) optimized for low DNA amounts, Shen et al. detected numerous DMRs in cfDNA from individuals with cancer that are specific to tumour tissue-of-origin [90]. In particular, they showed that plasma DMRs in individuals with colorectal cancer closely matched solid tumour-derived DMRs. Thus, tissue-specific methylation patterns and hypermethylation of tumour suppressor genes can be used as signatures to detect cancer as well as identify cfDNA tissue(s)-of-origin [91].
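
As a rough illustration of how computational methods can apportion cfDNA among tissues, the sketch below models the methylation observed at each CpG locus as a weighted mixture of reference tissue profiles and recovers the weights by projected gradient descent on the squared error. The atlas values, the tissue names and the functions `mix` and `deconvolve` are hypothetical and chosen only for illustration; this is not the method of any study cited here, and published approaches rely on far larger reference panels and more careful statistics.

```python
# Hypothetical reference atlas: methylation fraction (0..1) at six CpG loci
# for three tissues. All values are invented for illustration.
ATLAS = {
    "leucocyte": [0.9, 0.1, 0.8, 0.2, 0.1, 0.9],
    "liver":     [0.1, 0.9, 0.7, 0.1, 0.8, 0.2],
    "colon":     [0.2, 0.2, 0.1, 0.9, 0.9, 0.8],
}


def mix(atlas, fractions):
    """Expected cfDNA methylation at each locus for the given tissue fractions."""
    n_loci = len(next(iter(atlas.values())))
    return [sum(fractions[t] * atlas[t][i] for t in atlas) for i in range(n_loci)]


def deconvolve(atlas, observed, steps=5000, lr=0.05):
    """Estimate non-negative tissue fractions by projected gradient descent."""
    tissues = list(atlas)
    est = {t: 1.0 / len(tissues) for t in tissues}  # start from equal shares
    for _ in range(steps):
        # Residual between the predicted and observed methylation per locus.
        residual = [p - o for p, o in zip(mix(atlas, est), observed)]
        for t in tissues:
            grad = sum(r * a for r, a in zip(residual, atlas[t]))
            # Gradient step, then projection onto the non-negative range.
            est[t] = max(0.0, est[t] - lr * grad)
    total = sum(est.values())
    return {t: v / total for t, v in est.items()}  # rescale to sum to 1


# Simulate a plasma sample dominated by blood cells (assumed fractions).
true_fractions = {"leucocyte": 0.85, "liver": 0.10, "colon": 0.05}
observed = mix(ATLAS, true_fractions)
estimated = deconvolve(ATLAS, observed)
for tissue, fraction in estimated.items():
    print(f"{tissue}: {fraction:.3f}")
```

With noise-free simulated input the estimated fractions match the assumed ones; real plasma data are noisy, so the estimates would carry uncertainty that this sketch does not model.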

11. Conclusion/future perspectives

The remarkable observation in 1970 that hinted at a relationship between cytoplasmic DNA and the chromatin structure of the cell-of-origin now underlies the conceptual paradigm of cfDNA epigenomics [18]. As there is an increasing awareness of the limitations associated with detecting and identifying diseases, such as cancer, solely based on mutant genotypes, the emergence of cfDNA approaches based on epigenomics provides a valuable alternative (figure 2). Chromatin remodellers, chromatin-modifying complexes and DNA methyltransferases are frequently dysregulated in cancer highlighting the relationship between chromatin regulation and oncogenesis [92–97]. Furthermore, simulations show that using hundreds of features throughout the genome (which would be the case for epigenomic features) improves the limit of detection of disease states from cfDNA compared to a few features (which would be the case when looking for tumour mutations) [49,90]. Combining epigenomic assessments with mutation-based targeting may better guide personalized treatment of cancer patients rather than solely relying on one method.

Figure 2. Clinical applications for cancer patients using epigenomic features of cell-free DNA.

Early detection of cancer still remains a challenge for all cfDNA-based methods. An early cancer diagnosis increases a patient's chance for curative treatment. However, tumour shedding of DNA into the blood is at a lower concentration in early stages than what is observed in advanced cancer stages. Most methods described above have been used on samples of advanced cancer patients, indicating a much-needed effort to examine their use and sensitivity for early detection. However, significant inroads are being made. A recent multi-centre, randomized study used cfDNA methylation analysis to prospectively assess the largest cohort (n = 6689, 4207 healthy and 2482 cancer) to date for early detection and localization of more than 50 types of cancers [91]. With 99.3% specificity, the sensitivity increased as cancer stage increased (Stage I: 18% to Stage IV: 93%), with tissue-of-origin predicting cancer in 96% of samples with an accuracy of 93%. These are promising results for the future development of a cancer screening test for the general population. With improvements over time, cfDNA epigenomic methods described in this review can be combined with current clinical practices to improve detection of cancer in patients.
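
To see what these figures could mean in a screening setting, the sketch below works through the arithmetic using the reported specificity and the Stage I and Stage IV sensitivities. The 1% prevalence is an assumed value that does not come from the study, and the cohort's own case fraction (2482 of 6689) is deliberately not used as a prevalence. The function names are hypothetical.

```python
def expected_false_positives(n_healthy, specificity):
    """Healthy individuals expected to test positive at a given specificity."""
    return n_healthy * (1.0 - specificity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive result reflects true disease (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SPECIFICITY = 0.993                                # reported in the text
SENSITIVITY = {"Stage I": 0.18, "Stage IV": 0.93}  # reported in the text
N_HEALTHY = 4207                                   # reported in the text
ASSUMED_PREVALENCE = 0.01                          # assumption, not from the study

false_positives = expected_false_positives(N_HEALTHY, SPECIFICITY)
ppv = {
    stage: positive_predictive_value(sens, SPECIFICITY, ASSUMED_PREVALENCE)
    for stage, sens in SENSITIVITY.items()
}

print(f"Expected false positives among healthy participants: {false_positives:.1f}")
for stage, value in ppv.items():
    print(f"PPV at {stage} sensitivity: {value:.1%}")
```

Under these assumptions, roughly 29 of the 4207 healthy participants would be expected to test positive, and a positive result would be correct about 21% of the time at Stage I sensitivity and about 57% of the time at Stage IV sensitivity. A different assumed prevalence would shift these values considerably.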

Obtaining molecular signatures of cancer based on epigenomic features highlights the variety of information contained in the blood of an individual. The studies we reviewed here represent an orthogonal network of disease-specific epigenomic information that is rich for exploitation. Combined with current clinical molecular diagnostics, the timely emergence of epigenomic cfDNA methods heralds a chance to revolutionize disease management, specifically cancer. One can envision the successful implementation of these approaches to help guide personalized medicine at different stages of disease beginning with diagnosis and throughout the course of treatment. Beyond biomarkers, diverse epigenomic cfDNA methodologies also provide the opportunity to directly investigate the molecular mechanisms underlying pathology in humans.

For all of their complexity and rich diversity of constituent cellular phenotypes, multicellular organisms can be characterized by a common foundation: their genome. With all of our cells sharing the same genetic code, regulation of gene expression serves as the root of heterogeneity in cellular identity, response, and role. Given all of the information (form, function, development from a single cell, etc.) that must be encoded in the human genome, it is perhaps no surprise that the diploid human genome is very long, spanning 6 billion base pairs. Stretched end-to-end, the DNA in each diploid human somatic cell would measure roughly 2 m; however, a need for space-efficient storage of DNA results in its compaction by orders-of-magnitude to fit inside small nuclei less than 10 μm in diameter (Greeley, Crapo, & Vollmer, 1978; Piovesan et al., 2019). Despite this compaction, DNA must also be dynamically accessible to allow gene activation, regulation, and replication as the cell grows, divides, and responds to stimuli. These considerations define the two seemingly contradictory challenges of chromatin organization: packaging DNA so that it fits within the cell while retaining sufficient accessibility for processes necessary for cell functionality.
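
The figures quoted above can be checked with a short calculation. The sketch assumes the commonly cited B-form DNA rise of about 0.34 nm per base pair, a value that is not given in the text, and compares the stretched-out length with the 10 μm upper bound on nuclear diameter. The variable names are illustrative.

```python
BASE_PAIRS = 6e9             # diploid human genome size, from the text
RISE_PER_BP_M = 0.34e-9      # assumed B-DNA rise per base pair, in metres
NUCLEUS_DIAMETER_M = 10e-6   # upper bound on nuclear diameter, from the text

# End-to-end length of all the DNA in one diploid cell.
contour_length_m = BASE_PAIRS * RISE_PER_BP_M

# How many times longer the DNA is than the nucleus is wide.
linear_ratio = contour_length_m / NUCLEUS_DIAMETER_M

print(f"Stretched length: {contour_length_m:.2f} m")
print(f"Length relative to nuclear diameter: {linear_ratio:,.0f}x")
```

The result is about 2 m of DNA, some 200,000 times the diameter of the nucleus, which is consistent with the description of compaction by orders of magnitude. Because the nucleus is stated to be smaller than 10 μm, the true ratio would be at least this large.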

Looking for the forces managing the balance of packaging versus functional accessibility, researchers dove into an exploration of the linear genome in the 1970s. The recombinant DNA revolution heralded the development of new experimental techniques for molecular genetics (e.g., isolating genes for study), and genes were sequenced for the first time (Lis, 2019). By the 1980s and 90s, scientists had uncovered a myriad of factors involved with transcriptional initiation and regulation at regulatory motifs proximal to the gene of interest (Lambert et al., 2018; Roeder, 2019). As our understanding of the complexity of transcriptional regulation deepened, however, it became apparent that proximal regulatory elements are just one part of a wider regulatory landscape. The subject of the 3D genome and distal regulation of transcription began capturing greater interest as researchers identified regulatory elements termed enhancers thousands to millions of base pairs distal to their target genes (Spitz, 2016). For instance, a nuclear ligation assay developed in 1993 probed the rat prolactin gene and reported that the distal enhancer and proximal promoter regions are spatially juxtaposed, an interaction stimulated by estrogen ligand acting upon an estrogen receptor bound to the distal enhancer (Cullen, Kladde, & Seyfred, 1993; Gothard, Hibbard, & Seyfred, 1996). Given the central role that gene expression plays in cell phenotype and the onset of disease, unpacking the functional ramifications of these distal genetic interactions holds great promise for advancing our understanding of the genome and has thus become the impetus for the development of chromosome conformation capture technologies. In this review, we chronicle major developments in chromosome conformation capture technology and the biological insights their application has given us, with particular attention given to the recently developed Micro-C method.

Million year old mammoth DNA breaks record

Scientists have just broken the record for oldest DNA ever sequenced - using some mammoth teeth that are over a million years old. That’s so long, in fact, that the DNA inside has mostly broken down into tiny scraps, many of which are indistinguishable now as being from a mammoth. But thanks to some amazing techniques, Love Dalén and his colleagues have not only pieced the scraps together, but also uncovered part of the mammoth family tree, as well as some hints about when they got their woolly fur. Love told Phil Sansom the story.

Love - We managed to recover DNA from three mammoth specimens that are more than 1 million years old. And this is the oldest DNA ever recovered.

Phil - That isn't just the oldest DNA, right? That's by far the oldest DNA!

Love - This is by far the oldest DNA recovered to date. The previous record, if you wish, was approximately 650,000 years old. So we are nearly doubling that record with these mammoths.

Phil - What were the samples in this case?

Love - The samples were teeth - molar teeth - from three different mammoths from three different localities in Siberia. These specimens were actually found in the 1970s by a Russian palaeontologist named Andrei Sher. It actually took until 2017 until we felt that we had the technology and the know-how to do this. We had to use very new methods to identify which of all these billions of short DNA sequences we generated actually came from mammoth.

Phil - And what did you find inside? Because you said that the DNA... you had it in very short pieces?

Love - What we see in the data, once we get it, is the DNA is degraded into extremely small fragments. Normally a chromosome is well above 100 million base pairs or letters long, but in this case, our DNA sequences are on average between 40 and 50 base pairs long. So basically the genome is broken down into many, many millions of small pieces. And we also see that the vast, vast majority of DNA in these samples are not from mammoth: they are instead from bacteria, there is also DNA from plants in the sediments, and there's DNA from humans. Imagine that you have a puzzle with millions or even billions of small pieces that you have to piece together, and all we have to go by is the cover of the box. And in our case, that is the African elephant reference genome. But the problem here is that we don't only have one puzzle; instead, we actually have maybe 10 different puzzles, where the pieces have been mixed together. And the challenge here is to find the pieces that only belong to one of those puzzles, and then put them together in the right order.

Phil - Did you manage it though? Did you get the full jigsaw puzzle made for each of these three teeth?

Love - Yes and no. We managed to put together the puzzles as well as could possibly be done, but only for one of these mammoths do we have all the pieces in the puzzle. For the other two, we have partial genomes, but from a sort of population genomic perspective, that doesn't really matter. What matters is that we have enough data from each individual so that we can say something about their population history and relationship to other mammoths.

Phil - Right, so were they all very different kinds of mammoths?

Love - Well, one of them to our surprise came from a previously unknown type of mammoth that we didn't know existed. We refer to it as the Krestovka lineage.

Love - Yes. The second specimen was actually what we expected to find. Our expectation was that this one would be the ancestor of all woolly mammoths, and indeed that's what we find. And the third specimen is also interesting because it's a bit younger, it's about 700,000 years old, and this is actually one of the first known woolly mammoths.

Phil - Do you have any idea what this lost group of mammoths would've looked like?

Love - No, we didn't get enough genomic data to actually say anything about the sort of physical appearance of the Krestovka mammoth. We did get enough data from the other million year old mammoth, that was the ancestor of the woolly mammoth. This one is likely what we refer to as the steppe mammoth. Before this study it has been a bit unknown what they look like, because what we have are bones and teeth and so on from them, and tusks. So we knew they were big - they were much bigger than woolly mammoths - but what we now can see is that they actually appear to have had nearly all of these cold adaptations that we also see in woolly mammoths. They had woolly fur, adaptations for thermoregulation, and they probably also had this gene variant that made them slightly less sensitive to cold temperatures, which we also see in woolly mammoth.

Phil - I keep going back to the 1 million years old figure. Are we going to keep seeing these dates go further and further back? Or is that it, call it a day, 1 million - we're done!

Love - I don't think we're done at all. I mean, we can see in the data that the specimens are close to the limit, but they are not at the limit. So we know based on this data that we can go further back in time. The question is how far. These theoretical models basically say that there is no DNA at all left, the limit there is something like 7 million years. But at that limit the DNA is going to be broken down into very small fragments, like a few base pairs, and we can't work with that, because you can't prove that it comes from the animal in question. So the real question is: how far back in time can we go, and still be able to identify these fragments as being from mammoth or whatever animal we analyse? I think we can go beyond 2 million.
