
Why is DNA Shuffling more efficient than Point Mutation?


In a related post on Biology-SE the following insightful comment was made:

The advantage of DNA shuffling over introducing single mutations is that you have to screen fewer mutants and the activity/stability of the protein could be improved several hundred fold more.

Is there a mathematical argument as to why DNA Shuffling should be more efficient than introducing point mutations?

Even supposing each DNA Shuffling / Point Mutation step is equally "expensive" (and in reality DNA shuffling seems much more work), they are both generating one variant per step, right? So why is the variant in one case grossly better than the other?

Related Post: Directed evolution: Point mutation vs Insertion-Deletion vs Shuffling


You should read this paper. Here is the gist of what you are interested in:

Because most point mutations are deleterious or neutral, the random point mutation rate must be low and the accumulation of beneficial mutations and the evolution of a desired function is relatively slow in such experiments. For example, the evolution of a fucosidase from a galactosidase required five rounds of shuffling and screening before a >10-fold improvement in activity was detected [4]. Naturally occurring homologous sequences are pre-enriched for 'functional diversity' because deleterious variants have been selected against over billions of years of evolution…

Although shuffling of a single gene creates a library of genes that differ by only a few point mutations [1–6], the block-exchange nature of family shuffling creates chimaeras that differ in many positions. For example, in previous work a single beta-lactamase gene was shuffled for three cycles, yielding only four amino-acid mutations [3], whereas a single cycle of family shuffling of the four cephalosporinases resulted in a mutant enzyme which differs by 102 amino acids from the Citrobacter enzyme, by 142 amino acids from the Enterobacter enzyme, by 181 amino acids from the Klebsiella enzyme and by 196 amino acids from the Yersinia enzyme. The increased sequence diversity of the library members obtained by family shuffling results in a 'sparse sampling' of a much greater portion of sequence space [15], the theoretical collection of all possible sequences of equal length, ordered by similarity (Fig. 4). Selection from 'sparse libraries' allows rapid identification of the most promising areas within an extended sequence landscape (a multidimensional graph of sequence space versus function).
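To make the 'sparse sampling' point concrete, here is a small Python sketch that counts how far a single-crossover chimera sits from each parent, compared with the one-position step of a point mutant. The function names and the ten-residue toy parents are hypothetical illustrations, not data from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def chimera(parents: list[str], breakpoints: list[int], order: list[int]) -> str:
    """Assemble a chimera from aligned parents.

    Segment k runs from the previous breakpoint (or 0) to breakpoints[k]
    (or the end of the alignment) and is copied from parents[order[k]].
    """
    bounds = [0] + breakpoints + [len(parents[0])]
    if len(order) != len(bounds) - 1:
        raise ValueError("need exactly one parent index per segment")
    return "".join(parents[p][bounds[k]:bounds[k + 1]] for k, p in enumerate(order))


# Toy parents that differ at 5 of 10 aligned positions.
p1 = "MKVLAAGTTE"
p2 = "MRILSAGSVE"
c = chimera([p1, p2], breakpoints=[5], order=[0, 1])
print(c, hamming(c, p1), hamming(c, p2))
```

Here the chimera `MKVLAAGSVE` differs from the two parents at 2 and 3 positions after one crossover, whereas a point mutant differs from its parent at exactly one. With real family-shuffling parents that differ at hundreds of positions, the same count reaches the scale of the 102 to 196 differences quoted above.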


DNA shuffling would be more efficient if you already have a big repertoire of variants (which have a lot of point differences among them; see Cohen, 2001). When you do not have that, then introducing point mutations using error prone PCR would be better.

A detailed mathematical framework to compare these two methods is not available. However, there are many mutation models and there is at least one model for DNA shuffling too (Sun F., 1999).

Basically, for a point mutation experiment the number of mutants will depend on the error rate of the polymerase. In a DNA shuffling experiment you are just creating random combinations of existing variants. So, which one is more efficient depends on these (and many more) parameters. Moreover, the "efficiency" also depends on what experiment you are trying.
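A back-of-the-envelope version of the point-mutation side can be written down if one assumes, as is common for error-prone PCR, that the number of mutations per gene follows a Poisson distribution whose mean is the error rate times the gene length. The sketch below (hypothetical function names; the rate is an assumed effective value, not a measured constant for any polymerase) also counts how many distinct k-point mutants exist.

```python
import math


def poisson_mutation_spectrum(error_rate: float, gene_length: int, max_k: int = 5) -> list[float]:
    """P(k mutations in one gene copy) for k = 0..max_k under a Poisson model.

    error_rate is an assumed effective rate (mutations per nucleotide per
    final gene copy), not a measured constant for any particular polymerase.
    """
    lam = error_rate * gene_length  # expected number of mutations per gene
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_k + 1)]


def distinct_point_mutants(gene_length: int, k: int) -> int:
    """Number of distinct genes carrying exactly k nucleotide substitutions:
    choose k positions, then one of 3 alternative bases at each."""
    return math.comb(gene_length, k) * 3**k


spectrum = poisson_mutation_spectrum(0.002, 1000, max_k=3)
print([round(p, 3) for p in spectrum])
print(distinct_point_mutants(1000, 1), distinct_point_mutants(1000, 2))
```

With 0.002 mutations per nucleotide on a 1000-nucleotide gene (mean 2), about 13.5% of clones carry no mutation, and there are already about 4.5 million distinct double mutants, most of them presumably neutral or deleterious. Shuffling, by contrast, recombines mutations that have already passed a screen or been filtered by natural selection.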


Anticipatory evolution and DNA shuffling

DNA shuffling has proven to be a powerful technique for the directed evolution of proteins. A mix of theoretical and applied research has now provided insights into how recombination can be guided to more efficiently generate proteins and even organisms with altered functions.

Proteins are machines created by evolution, but it is unclear just how finely evolution has guided their sequence, structure, and function. It is undoubtedly true that individual mutations in a protein affect both its structure and its function and that such mutations can be fixed during evolutionary history, but it is also true that there are other elements of protein sequence that have been acted upon by evolution. For example, the genetic code appears to be laid out so that mutations and errors in translation are minimally damaging to protein structure and function [1]. Could the probability that a beneficial mutation is found and fixed in the population also have been manipulated during the course of evolution, so that the proteins we see today are more capable of change than the proteins that may have been cobbled together following the 'invention' of translation? Have proteins, in fact, evolved to evolve? There is already some evidence that bacteria are equipped to evolve phenotypes that are more capable of further adaptation (reviewed in [2,3,4]). For example, mutator [5] and hyper-recombinogenic [6] strains arise as a result of selection experiments. The development of DNA shuffling (reviewed in [7,8]) and the appearance of several recent papers using this technique [9,10,11] provide us with a surprising new opportunity to ask and answer these fundamental questions at the level of individual genes, and perhaps even genomes.

DNA shuffling, a method for in vitro recombination, was developed as a technique to generate mutant genes that would encode proteins with improved or unique functionality [12,13]. It consists of a three-step process that begins with the enzymatic digestion of genes, yielding smaller fragments of DNA. The small fragments are then allowed to randomly hybridize and are filled in to create longer fragments. Ultimately, any full-length, recombined genes that are recreated are amplified via the polymerase chain reaction. If a series of alleles or mutated genes is used as a starting point for DNA shuffling, the result is a library of recombined genes that can be translated into novel proteins, which can in turn be screened for novel functions. Genes with beneficial mutations can be shuffled further, both to bring together these independent, beneficial mutations in a single gene and to eliminate any deleterious mutations. Although multiple, beneficial mutations could potentially be generated just as well by serial mutagenesis and screening, DNA shuffling is much quicker: for example, the starting population of a library generated by mutagenic PCR typically contains 70-99% nonfunctional variants [14], whereas most variants formed by DNA shuffling are functional. Thus, DNA shuffling should allow streamlined exploration of sequence space and easy acquisition of novel protein phenotypes, as has indeed proven to be the case for a number of protein targets [15,16].
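The three steps can be caricatured in a few lines of Python. This toy model assumes aligned, equal-length parents, cuts the gene at random points, and copies each fragment from a randomly chosen parent. It deliberately ignores the sequence identity that real reannealing requires and any PCR bias, so it illustrates the idea rather than modelling a published protocol; all names are hypothetical.

```python
import random


def shuffle_once(parents: list[str], mean_fragment: int, rng: random.Random) -> str:
    """Return one toy shuffled product built from aligned, equal-length parents.

    Digestion: fragment lengths are drawn uniformly from 1 to
    2 * mean_fragment - 1, so their mean is mean_fragment.
    Reassembly: each fragment is copied from a parent chosen at random.
    Homology requirements and PCR bias are ignored.
    """
    n = len(parents[0])
    pieces, pos = [], 0
    while pos < n:
        length = rng.randint(1, 2 * mean_fragment - 1)
        parent = rng.choice(parents)
        pieces.append(parent[pos:pos + length])  # slice is clipped at the gene end
        pos += length
    return "".join(pieces)


def shuffled_library(parents: list[str], size: int, mean_fragment: int, seed: int = 0) -> list[str]:
    """Generate a reproducible toy library of shuffled products."""
    rng = random.Random(seed)
    return [shuffle_once(parents, mean_fragment, rng) for _ in range(size)]


for product in shuffled_library(["A" * 20, "C" * 20], size=3, mean_fragment=5):
    print(product)
```

Each product is full length and mixes blocks from both parents; repeating the procedure on the winners of a screen is what gathers independent beneficial mutations into one gene.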

Beyond biotechnology applications, DNA shuffling can potentially be used to recapitulate natural recombination and to ask whether recombination generally leads to better or novel proteins. In this regard, DNA shuffling can be carried out not only with genes that are closely related alleles, but also with a group of phylogenetically related genes that may differ by up to 40%, a process known as family shuffling [15]. As mentioned above, it was strongly suspected that by starting with a population of genes already known to be functional, family shuffling could move the most beneficial mutations into the same gene and thus quickly optimize or alter protein function. In fact, however, this intuition should hold true only if mutant alleles generally act in either an additive or a synergistic fashion. If mutant alleles are neutral or interfere with each other, then there will be no generic benefit to recombination.

In order to address this hypothesis, Joern et al. [9] have developed a novel technique for mapping recombination events by probe-hybridization analysis. Shuffled libraries were generated by crossing genes for three dioxygenases: toluene dioxygenase (todC1C2), tetrachlorobenzene dioxygenase (tecA1A2) and biphenyl dioxygenase (bphA1A2). Shuffled variants from the three-parent library were screened for toluene dioxygenase activity, and randomly selected variants were sequenced to determine the actual number of crossovers that had occurred to give rise to functional and nonfunctional variants. Unsurprisingly, it was found that crossovers commonly occurred in regions of high homology: although regions that contained ten or more common, identical residues made up less than 10% of the lengths of the genes, over 60% of the crossovers occurred in these regions. Interestingly, it was found that the number of crossover events did not correlate with protein function, suggesting that individual segments of a protein might act independently during evolution [9]. It is also possible that the proteins were so closely related to one another that multiple crossovers did not reduce or alter functionality.
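The 'ten or more identical residues' criterion is easy to compute from an alignment. This Python sketch (hypothetical helper names; crossover positions would come from sequencing, as in the study) finds such identity blocks, the fraction of the alignment they cover, and the fraction of observed crossovers that fall inside them. It works the same way on aligned nucleotide or amino-acid strings.

```python
def identity_runs(aligned: list[str], min_len: int = 10) -> list[tuple[int, int]]:
    """[start, end) intervals where all aligned sequences carry the same
    non-gap symbol for at least min_len consecutive positions."""
    n = len(aligned[0])
    runs, start = [], None
    for i in range(n + 1):  # i == n acts as a sentinel that closes an open run
        same = i < n and aligned[0][i] != "-" and all(s[i] == aligned[0][i] for s in aligned)
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def fraction_in_runs(runs: list[tuple[int, int]], length: int) -> float:
    """Fraction of the alignment covered by the identity blocks."""
    return sum(end - start for start, end in runs) / length


def fraction_crossovers_in_runs(crossovers: list[int], runs: list[tuple[int, int]]) -> float:
    """Fraction of crossover positions (hypothetical input) lying inside a block."""
    inside = sum(any(start <= x < end for start, end in runs) for x in crossovers)
    return inside / len(crossovers)


runs = identity_runs(["ABCDEFGHIJKLMNOP", "ABCDEFGHIJKXMNOP"])
print(runs, fraction_in_runs(runs, 16), fraction_crossovers_in_runs([3, 5, 12], runs))
```

In the Joern et al. data, the first fraction was below 10% while the second exceeded 60%; these two functions compute exactly the quantities being compared.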

Building on these results, Voigt et al. [10] hypothesized that functional genes derived by DNA shuffling (and perhaps by natural recombination) should preserve clustered sets of structural interactions (the so-called 'schemas') of the original protein (Figure 1a). In order to validate this hypothesis, the authors developed an algorithm that attempted to predict the effect of crossover events at specific sites in a gene. In particular, the algorithm assessed which amino acids were close to one another in both the primary and the tertiary protein structure and predicted which interaction subsets could be manipulated in a way that minimally disrupted protein structure and function. This analysis results in a 'schema profile' for the proteins, which indicates the amount of disruption to the schemas that recombination at each point along the sequence will cause (Figure 1b). Several proteins that had previously been evolved in vitro by family shuffling were evaluated, and the schema profiles of these proteins correlated well with the experimentally determined crossover points [14].

A graphical representation of the relationship between protein structure and schemas. (a) The β-lactamase protein is shown divided into different colored substructures (schemas), which are derived from the schema profile of the protein. (b) An example of a schema profile for a (simpler) hypothetical protein. Peaks correlate with positions in the protein where recombination will be maximally disruptive; valleys correlate with positions that are predicted to minimally disrupt the structure and function of the protein. (c) Intron structure may correlate with schema structure. To the extent it is now possible to calculate schema profiles, it can be hypothesized that introns (white) may generally fall at minima while exons (black) may generally contain larger disruption values.
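A simplified reading of the schema disruption idea can be written in Python. Count the contacting residue pairs whose combination in the chimera occurs in no parent, then evaluate that count for a single crossover at every position to get a profile like panel (b). This sketch assumes a precomputed contact list (in practice derived from a distance cutoff in the 3D structure, here supplied directly as position pairs) and is not Voigt et al.'s implementation.

```python
def schema_disruption(chimera: str, parents: list[str], contacts: list[tuple[int, int]]) -> int:
    """Number of contacts (i, j) whose residue pair in the chimera occurs in no parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((p[i], p[j]) != pair for p in parents):
            broken += 1
    return broken


def single_crossover_profile(parent_a: str, parent_b: str,
                             contacts: list[tuple[int, int]]) -> list[float]:
    """Average disruption of the two single-crossover chimeras (A then B,
    B then A) for a crossover placed just before each position x = 1 .. n-1."""
    parents = [parent_a, parent_b]
    profile = []
    for x in range(1, len(parent_a)):
        e_ab = schema_disruption(parent_a[:x] + parent_b[x:], parents, contacts)
        e_ba = schema_disruption(parent_b[:x] + parent_a[x:], parents, contacts)
        profile.append((e_ab + e_ba) / 2)
    return profile


print(single_crossover_profile("AAAA", "BBBB", contacts=[(0, 1), (2, 3)]))
```

With contacts (0, 1) and (2, 3) in this four-residue toy, the profile is [1.0, 0.0, 1.0]: the only non-disruptive crossover is the one that falls between the two contacting pairs, that is, between two 'schemas'.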

This algorithm was then used to generate schema profiles between two β-lactamases, TEM-1 and PSE-4, which confer ampicillin resistance and share only 40% amino-acid sequence identity. Hybrid enzymes that had varying degrees of recombination between schemas were then constructed, and the recombined variants were transformed into bacteria, which were assayed for ampicillin resistance. The most resistant hybrids contained recombined genes with crossovers that had been predicted in advance to occur between schemas [10].

What is particularly surprising is not that DNA shuffling occurs between domains; even a brief observation of the three-dimensional structures of proteins immediately suggests that recombinational breakpoints will probably have the smallest effect on protein function if they occur outside of major structural units found by Voigt et al. [16] (although certain breakpoints between structural subunits, such as in the middle of α helices, would probably not have been predicted without schema profiling). Rather, the amazing thing is that proteins have evolved so that they are by and large composed of structural domains that can undergo recombination. As Voigt et al. [10] point out, Gô [17,18] found a correlation between intron locations and structural domains. This was expanded on by Gilbert and his co-workers [19], who advanced the notion that proteins could be modularly constructed from structural domains as an attempt to explain the origin of introns. Although the 'introns early' hypothesis has long since been shown to be implausible [20,21,22], the original notion that introns could act as buffers for recombination is still intellectually compelling, and it may be consistent with the results of Voigt et al. [10].

Interestingly, to the extent that proteins have evolved as modular machines that are capable of taking advantage of recombination during their evolutionary history, the very mathematical models proposed by Joern et al. [9] and Voigt et al. [10] may be unnecessary. 'Blind' DNA shuffling between closely related proteins may already be more than good enough to generate proteins with novel phenotypes. For example, we have evolved a β-glucuronidase in vitro to switch its substrate specificity from β-glucuronides to β-galactosides and have achieved an over 500-fold increase in activity towards the new substrate [23]. This catalytic conversion was achieved in three rounds of shuffling and screening, but further rounds of selection failed to achieve greater cleavage of β-galactosides. The initial library of this selection was constructed using mutagenic PCR, and a large fraction of the population was inactive, yet the selection produced an over 52-million-fold switch in substrate preference.
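Changes in substrate preference of this size are usually expressed as a ratio of specificity constants. Assuming that convention (the passage does not say how the 52-million-fold figure was computed), with "new" denoting the β-galactoside and "old" the β-glucuronide substrate:

$$
\text{fold switch} = \frac{\left(k_{\mathrm{cat}}/K_M\right)^{\mathrm{new}}_{\mathrm{evolved}} \big/ \left(k_{\mathrm{cat}}/K_M\right)^{\mathrm{old}}_{\mathrm{evolved}}}{\left(k_{\mathrm{cat}}/K_M\right)^{\mathrm{new}}_{\mathrm{wild\ type}} \big/ \left(k_{\mathrm{cat}}/K_M\right)^{\mathrm{old}}_{\mathrm{wild\ type}}}
$$

Because the switch multiplies the gain on the new substrate by the loss on the old one, it can far exceed the 500-fold gain in activity alone.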

Similarly, new experiments from Zhang et al. [11] provide additional evidence that blind shuffling is fully capable of functional improvement, not just at the protein level, but even at the organismal level. These researchers coupled classical strain improvement (mutation and selection) with genetic recombination. Protoplast fusion results in very efficient recombination between the genomes of Streptomyces species, and iterative protoplast fusion results in the reassortment of multiple markers between species. To show the power of this new method, a Streptomyces strain producing the complex polyketide antibiotic tylosin was selected for improved function and was then forced to undergo the equivalent of sexual reproduction. The genomes of several surviving mutants were shuffled after every round of selection to generate a combinatorial library of organisms that could again be screened for improved function. A strain generated by only two rounds of shuffling could produce tylosin at a rate comparable to strains that had undergone 20 rounds of classical selection. These results demonstrate that genome shuffling will probably lead to changes and improvements in organismal function as radical as those that have previously been observed for proteins.

Overall, these results further support the idea that evolution can act reflexively, that is, to enhance its own ability to act. From the results of Arnold and co-workers [9,10], it is possible that regions that fall between predicted schemas might be conserved in sequence in order to facilitate recombination; this hypothesis could be checked directly by database analysis. The application of the techniques described by Arnold and co-workers [9,10], del Cardayré and co-workers [11], and others may allow researchers to more effectively design libraries for screening. A large fraction of the products generated in traditional screening or even shuffling reactions are nonfunctional. Schema profiling and pathway shuffling may eventually make it possible to design directed evolution experiments in which structural and metabolic subunits are preserved, thereby limiting the exploration of sequence space largely to functional molecules. Ultimately, these advances should expand our understanding of natural genetic processes and thereby allow biologists to generate novel proteins and pathways in a fraction of the time that nature or conventional breeding would take.


Background

How do genes with new functions originate? This remains one of the most intriguing open questions in evolutionary genetics. Three principal mechanisms can create genes of novel function: point mutations and small insertions or deletions in existing genes; duplication of entire genes or domains within genes, in combination with mutations that cause functional divergence of the duplicates [1–3]; and recombination between dissimilar genes to create new recombinant genes (see, for example, [4, 5]). We here choose to call only this kind of recombination gene shuffling, excluding, for example, duplication of domains within a gene. In such a gene shuffling event, the parental genes may be either destroyed or preserved [6]. Gene shuffling is clearly the most potent of the three causes of functional innovation because it can generate new genes with a structure drastically different from that of either parental gene. Laboratory evolution studies show that gene shuffling allows new gene functions to arise at rates orders of magnitude higher than point mutations [7, 8].

Much is known about rates of point mutations [9] and of gene duplications [10, 11]. In contrast, the rate at which gene shuffling occurs is relatively unexplored, despite the importance of shuffling for functional innovation. To be sure, anecdotal evidence suggests that successful gene shuffling occurs and that it creates genes with new functions [4]. In particular, proteins are often mosaics of domains that are characterized by sequence and structural similarity [12–19]. Many domains occur in multiple proteins of different functions, suggesting that new proteins can arise through the combination of domains of other proteins, a process requiring recombination. In addition, many studies have systematically identified one subclass of gene-recombination events: gene fusions [20–24]. These studies count gene fusion events in a genome of interest relative to multiple, often very distantly related, species. Because fused genes often have similar functions, identification of fusion events can aid in inferring gene functions. Here we address a question that goes beyond the above studies: how frequent is gene shuffling in comparison with other forces of genome change, such as gene duplication? This problem is difficult because of the many possible outcomes of recombination events. These outcomes fall into three principal categories: gene fusions, domain deletions, and domain insertions (Figure 1a). Identifying these outcomes systematically on a genomic scale is computationally intensive, which has limited our analyses to a modest number of genomes (Table 1).

Identifying gene shuffling. (a) Gene shuffling and how it changes gene structure. The three scenarios of 'domain insertion' represent insertions of domains from gene 2 into gene 1. The reciprocal insertions (gene 1 into gene 2) are not shown. (b) Distinguishing true from spurious recombination events. In a spurious recombination event, reference genome R1 has two separate genes, whereas both T and R2 have a single, shuffled gene. The most parsimonious explanation for this observation is that the shuffled gene was present in R1 but was lost since R1's divergence from T.

One can identify gene-shuffling events either from protein sequence information or from information about protein structure. Structure-based approaches [12–15] have the advantage of being able to detect recombination events where sequence similarity between a recombination product and its parents has eroded beyond recognition. However, because two very distantly related structural domains can also have arisen through convergent evolution [25, 26], identifying common ancestry of two domains based on structure alone can be problematic. As a further limitation, structure-based approaches can only identify recombination events that respect the boundaries of protein domains, whereas some successful recombination events may occur within domains [27–29]. In addition, structural information is not available for all genes. For example, the Pfam database of protein domains [30] contains no structural information for more than 40% of proteins in budding yeast (Saccharomyces cerevisiae). Structure-based approaches may thus miss many shuffled genes. Because of these issues we chose a sequence-based approach which allows us to search for shuffling events without making restrictive assumptions regarding their nature. Essentially, our search imposes no restrictions on shuffling except that it must merge in a single gene two protein-coding sequences that were previously a part of two different genes. We thus avoid assuming that shuffling occurs only at domain boundaries or with certain recombination mechanisms without precluding either possibility. Our analysis can also account for gene-duplication events in either parental or recombined genes.

We here identify gene-shuffling events that have occurred in a 'test' species T since its divergence from a reference species R1. A gene in the test genome whose parts match more than one gene in the reference genome is a candidate for a gene-shuffling event that has occurred since the common ancestor of the two genomes. Our analysis also uses a third genome (reference genome R2) to prevent gene fission or gene loss in the reference genome R1 from resulting in spurious identification of gene shuffling events. Because R2 is an outgroup relative to T and R1, it allows us to detect such events in R1 (see Figure 1b). Like any comparative sequence-based approach, our analysis depends on detectable sequence similarity among genes. In other words, our analysis excludes rapidly evolving genes.
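The candidate test and the outgroup filter can be sketched in Python. The data model is a hypothetical simplification: each hit (for example, from a similarity search) is an interval on the test gene matched by one reference gene, and the 10-residue overlap tolerance is an assumed parameter. This is not the authors' pipeline. A test gene is flagged when two different R1 genes match largely distinct parts of it and no single R2 gene matches both parts, the case Figure 1b identifies as spurious.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """A similarity hit of one reference-genome gene on the test gene T."""
    subject: str  # gene identifier in the reference genome (hypothetical)
    start: int    # first matched residue of the test gene, 0-based
    end: int      # one past the last matched residue


def overlap(a: Hit, b: Hit) -> int:
    """Number of test-gene residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_shuffling_candidate(r1_hits: list[Hit], r2_hits: list[Hit], max_overlap: int = 10) -> bool:
    """True if two different R1 genes match largely distinct parts of the test
    gene and no single R2 (outgroup) gene matches both parts."""
    r2_by_gene = defaultdict(list)
    for h in r2_hits:
        r2_by_gene[h.subject].append(h)
    for a, b in combinations(r1_hits, 2):
        if a.subject == b.subject or overlap(a, b) > max_overlap:
            continue  # same R1 gene, or both hits cover the same region
        fused_in_r2 = any(
            any(overlap(h, a) > 0 for h in hits) and any(overlap(h, b) > 0 for h in hits)
            for hits in r2_by_gene.values()
        )
        if not fused_in_r2:
            return True  # split in R1 and not fused in the outgroup R2
    return False


r1 = [Hit("r1_geneA", 0, 120), Hit("r1_geneB", 125, 300)]
print(is_shuffling_candidate(r1, [Hit("r2_x", 0, 110), Hit("r2_y", 130, 290)]))  # True
print(is_shuffling_candidate(r1, [Hit("r2_z", 0, 290)]))  # False: fission or loss in R1
```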


Methods

We take as input the amino acid sequences of the parent proteins to be shuffled, aligned to a length of n (amino acids and gaps) based on sequence and/or structure. For simplicity of exposition, we present our methods for the most common case of shuffling two parents, a1 and a2. Our methods readily extend to creating equivalent sites for recombination in multiple parents, and it remains interesting future work to allow for non-uniform shuffling (i.e., where different cross-overs are possible between different pairs of parents).

To optimize the shuffling experiment, we select a codon for each amino acid for each parent, yielding DNA sequences d1 and d2 of length 3n (maintaining gaps for those in the amino acid sequences). To expand the pool of codons being considered at a particular position, we may choose to make an amino acid substitution. Thus we take as additional input a specification of the allowed substitutions for each residue position for each parent, along with a number m of them to make. The allowed substitution specification may be derived from sequence and/or structural analysis of the parents, including general amino acid substitution matrices [18], position-specific amino acid statistics from related proteins [19], and $\Delta\Delta G^{\circ}_{\mathrm{fold}}$ predictions for possible substitutions [20]. The results presented below determine allowed substitutions under the BLOSUM62 substitution matrix, considering only "conservative" substitutions which score no more than 4 worse than wild-type [15].

In describing the algorithms, we use possible codon sets representing the codons allowed at each position in the wild-type and under the allowed substitutions. For position i, set C1[i] contains the possible codons for a1[i], pairing each with an indication of whether or not it requires a substitution, e.g., <(TTT, 0), (TTC, 0), (TGG, 1)> for an F that could potentially be mutated to W. Set C2[i] is defined similarly for the second parent. We note that these may readily be used to restrict where to employ mutations (e.g., masking based on structural analysis, as discussed by Moore and Maranas [17]), by allowing only wild-type codons (or amino acids) in some positions.
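The codon sets can be built mechanically from the standard genetic code. This Python sketch assumes the allowed substitutions for a position arrive as a string of one-letter amino-acid codes (a hypothetical input format) and represents an alignment gap as the codon `---`; passing no substitutions gives the wild-type-only masking described above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in the order TTT, TTC, TTA, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str) -> list[str]:
    """All codons encoding one amino acid, in table order."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def codon_set(wild_type: str, allowed_subs: str = "") -> list[tuple[str, int]]:
    """C[i] for one parent position: (codon, 0) for wild-type codons and
    (codon, 1) for codons of each allowed substitution. A gap maps to ('---', 0)."""
    if wild_type == "-":
        return [("---", 0)]
    out = [(c, 0) for c in codons_for(wild_type)]
    for aa in allowed_subs:
        out += [(c, 1) for c in codons_for(aa)]
    return out


print(codon_set("F", "W"))  # the F -> W example from the text
```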

We consider four types of objective function, targeting common nucleotides (at aligned positions), nearest-neighbor approximation to change in free energy of annealing (from dinucleotide pairs), common nucleotide runs (in contiguous strings), or library diversity (among resulting chimeras). We develop increasingly more complex dynamic programming algorithms to optimize these objectives.

Common nucleotide optimization

In this most basic optimization for DNA shuffling, the goal is to maximize the number of identical nucleotides at common positions:

$$\max_{d_1, d_2} \sum_{j=1}^{3n} I\left(d_1[j] = d_2[j]\right)$$

where I is the indicator function (1 for true, 0 for false).

With no substitutions allowed, each residue position is independent of every other. Thus we simply select for each position a pair of codons (one for each parent) with a maximal number of common nucleotides. When substitutions are allowed, we need to allocate them for optimal impact. While several approaches are possible, we develop here one based on dynamic programming, to serve as the basis for the more complex objective functions we pursue in subsequent subsections.
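A minimal dynamic-programming sketch of this allocation in Python. It assumes a budget of at most m substitutions in total across both parents; the text says "a number m of them to make", so treating m as an upper bound is an assumption here. `best[k]` holds the highest common-nucleotide count over the positions processed so far using exactly k substitutions. With m = 0 it reduces to the independent per-position choice described above.

```python
NEG_INF = float("-inf")


def common_nucleotides(c1: str, c2: str) -> int:
    """Identical, non-gap nucleotides at aligned positions of two codons."""
    return sum(x == y and x != "-" for x, y in zip(c1, c2))


def optimize_common_nucleotides(C1, C2, m):
    """Choose one codon per position for each parent to maximize common
    nucleotides, spending at most m substitutions in total.

    C1, C2: per-position lists of (codon, needs_substitution) pairs.
    Returns (score, d1, d2).
    """
    best = [0] + [NEG_INF] * m  # best[k]: top score so far with exactly k substitutions
    back = []                   # back[i][k]: (codon1, codon2, k before position i)
    for opts1, opts2 in zip(C1, C2):
        new = [NEG_INF] * (m + 1)
        choice = [None] * (m + 1)
        for c1, s1 in opts1:
            for c2, s2 in opts2:
                cost, gain = s1 + s2, common_nucleotides(c1, c2)
                for k in range(cost, m + 1):
                    if best[k - cost] == NEG_INF:
                        continue  # no feasible assignment with k - cost substitutions
                    if best[k - cost] + gain > new[k]:
                        new[k] = best[k - cost] + gain
                        choice[k] = (c1, c2, k - cost)
        best = new
        back.append(choice)
    k = max(range(m + 1), key=lambda j: best[j])  # "at most m": best over all budgets
    score, d1, d2 = best[k], [], []
    for choice in reversed(back):  # walk back from the last position
        c1, c2, k = choice[k]
        d1.append(c1)
        d2.append(c2)
    return score, "".join(reversed(d1)), "".join(reversed(d2))


C1 = [[("AAA", 0), ("CCC", 1)], [("GGG", 0)]]
C2 = [[("CCC", 0)], [("GGT", 0), ("TTT", 1)]]
print(optimize_common_nucleotides(C1, C2, 0))  # (2, 'AAAGGG', 'CCCGGT')
print(optimize_common_nucleotides(C1, C2, 1))  # (5, 'CCCGGG', 'CCCGGT')
```

The running time is proportional to n times the product of the codon-set sizes times m. The toy codon sets are hypothetical; in practice they would be the C1[i] and C2[i] sets defined above.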


Anticipatory evolution and DNA shuffling

DNA shuffling has proven to be a powerful technique for the directed evolution of proteins. A mix of theoretical and applied research has now provided insights into how recombination can be guided to more efficiently generate proteins and even organisms with altered functions.

Proteins are machines created by evolution, but it is unclear just how finely evolution has guided their sequence, structure, and function. It is undoubtedly true that individual mutations in a protein affect both its structure and its function and that such mutations can be fixed during evolutionary history, but it is also true that there are other elements of protein sequence that have been acted upon by evolution. For example, the genetic code appears to be laid out so that mutations and errors in translation are minimally damaging to protein structure and function [1]. Could the probability that a beneficial mutation is found and fixed in the population also have been manipulated during the course of evolution, so that the proteins we see today are more capable of change than the proteins that may have been cobbled together following the 'invention' of translation? Have proteins, in fact, evolved to evolve? There is already some evidence that bacteria are equipped to evolve phenotypes that are more capable of further adaptation (reviewed in [2,3,4]). For example, mutator [5] and hyper-recombinogenic [6] strains arise as a result of selection experiments. The development of DNA shuffling (reviewed in [7,8]) and the appearance of several recent papers using this technique [9,10,11] provide us with a surprising new opportunity to ask and answer these fundamental questions at the level of individual genes, and perhaps even genomes.

DNA shuffling, a method for in vitro recombination, was developed as a technique to generate mutant genes that would encode proteins with improved or unique functionality [12,13]. It consists of a three-step process that begins with the enzymatic digestion of genes, yielding smaller fragments of DNA. The small fragments are then allowed to randomly hybridize and are filled in to create longer fragments. Ultimately, any full-length, recombined genes that are recreated are amplified via the polymerase chain reaction. If a series of alleles or mutated genes is used as a starting point for DNA shuffling, the result is a library of recombined genes that can be translated into novel proteins, which can in turn be screened for novel functions. Genes with beneficial mutations can be shuffled further, both to bring together these independent, beneficial mutations in a single gene and to eliminate any deleterious mutations. Although multiple, beneficial mutations could potentially be generated just as well by serial mutagenesis and screening, DNA shuffling is much quicker: for example, the starting population of a library generated by mutagenic PCR typically contains 70-99% nonfunctional variants [14], whereas most variants formed by DNA shuffling are functional. Thus, DNA shuffling should allow a streamlined exploration of sequence space and acquisition of novel protein phenotypes easily, as has indeed proven to be the case for a number of protein targets [15,16].

Beyond biotechnology applications, DNA shuffling can potentially be used to recapitulate natural recombination and to ask whether recombination generally leads to better or novel proteins. In this regard, DNA shuffling can be carried out not only with genes that are closely related alleles, but also with a group of phylogenetically related genes that may differ by up to 40%, a process known as family shuffling [15]. As mentioned above, it was strongly suspected that by starting with a population of genes already known to be functional, family shuffling could move the most beneficial mutations into the same gene and thus quickly optimize or alter protein function. In fact, however, this intuition should hold true only if mutant alleles can generally either act in an additive or synergistic fashion. If mutant alleles are neutral or interfere with each other, then there will be no generic benefit to recombination.

In order to address this hypothesis, Joern et al. [9] have developed a novel technique for mapping recombination events by probe-hybridization analysis. Shuffled libraries were generated by crossing genes for several dioxygenase: toluene dioxygenase, todC1C2 tetrachlorobenzene dioxygenases, tecA1A2 and biphenyl dioxygenase, bphA1A2.Shuffled variants from the three-parent library were screened for toluene dioxygenase activity, and randomly selected variants were sequenced to determine the actual number of crossovers that had occurred to give rise to functional and nonfunctional variants. Unsurprisingly, it was found that crossovers commonly occurred in regions of high homology: although regions that contained ten or more common, identical residues made up less than 10% of the lengths of the genes, over 60% of the crossovers occurred in these regions. Interestingly, it was found that the number of crossover events did not correlate with protein function, suggesting that individual segments of a protein might act independently during evolution [9]. It is also possible that the proteins were so closely related to one another that multiple crossovers did not reduce or alter functionality.

Building on these results, Voigt et al. [10] hypothesized that functional genes derived by DNA shuffling (and perhaps by natural recombination) should preserve clustered sets of structural interactions (the so-called 'schemas') of the original protein (Figure ​ (Figure1a). 1a ). In order to validate this hypothesis, the authors developed an algorithm that attempted to predict the effect of crossover events at specific sites in a gene. In particular, the algorithm assessed which amino acids were close to one another in both the primary and the tertiary protein structure and predicted which interaction subsets could be manipulated in a way that minimally disrupted protein structure and function. This analysis results in a 'schema profile' for the proteins, which indicates the amount of disruption to the schemas that recombination at each point along the sequence will cause (Figure ​ (Figure1b). 1b ). Several proteins that had previously been evolved in vitro by family shuffling were evaluated, and the schema profiles of these proteins correlated well with the experimentally determined crossover points [14].

A graphical representation of the relationship between protein structure and schemas. (a) The β-lactamase protein is shown divided into different colored substructures (schemas), which are derived from the schema profile of the protein. (b) An example of a schema profile for a (simpler) hypothetical protein. Peaks correlate with positions in the protein where recombination will be maximally disruptive valleys correlate with positions that are predicted to minimally disrupt the structure and function of the protein. (c) Intron structure may correlate with schema structure. To the extent it is now possible to calculate schema profiles, it can be hypothesized that introns (white) may generally fall at minima while exons (black) may generally contain larger disruption values.

This algorithm was then used to generate schema profiles between two β-lactamases, TEM-1 and PSE-4, which confer ampicillin resistance and share only 40% amino-acid sequence identity. Hybrid enzymes that had varying degrees of recombination between schemas were then constructed, and the recombined variants were transformed into bacteria, which were assayed for ampicillin resistance. The most resistant hybrids contained recombined genes with crossovers that had been predicted in advance to occur between schemas [10].

What is particularly surprising is not that DNA shuffling occurs between domains; even a brief look at the three-dimensional structures of proteins immediately suggests that recombination breakpoints will probably have the smallest effect on protein function if they occur outside the major structural units found by Voigt et al. [16] (although certain breakpoints between structural subunits, such as in the middle of α helices, would probably not have been predicted without schema profiling). Rather, the amazing thing is that proteins have evolved so that they are by and large composed of structural domains that can undergo recombination. As Voigt et al. [10] point out, Gô [17,18] found a correlation between intron locations and structural domains. This was expanded on by Gilbert and his co-workers [19], who, in an attempt to explain the origin of introns, advanced the notion that proteins could be constructed modularly from structural domains. Although the 'introns early' hypothesis has long since been shown to be implausible [20,21,22], the original notion that introns could act as buffers for recombination is still intellectually compelling, and it may be consistent with the results of Voigt et al. [10].

Interestingly, to the extent that proteins have evolved as modular machines capable of taking advantage of recombination during their evolutionary history, the very mathematical models proposed by Joern et al. [9] and Voigt et al. [10] may be unnecessary. 'Blind' DNA shuffling between closely related proteins may already be more than good enough to generate proteins with novel phenotypes. For example, we have evolved a β-glucuronidase in vitro to switch its substrate specificity from β-glucuronides to β-galactosides and have achieved an over 500-fold increase in activity towards the new substrate [23]. This catalytic conversion was achieved in three rounds of shuffling and screening, but further rounds of selection failed to increase cleavage of β-galactosides any further. The initial library for this selection was constructed using mutagenic PCR, and a large fraction of the population was inactive, yet the selection produced a switch of over 52-million-fold in substrate preference.
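One way to see how a 500-fold activity gain can coexist with a 52-million-fold switch in preference: if preference is expressed as a ratio of specificity constants (kcat/KM) for the new and old substrates, the switch multiplies the gain on the new substrate by the loss on the old one. A minimal sketch, assuming that definition; `preference`, `specificity_switch` and the numbers in the example are hypothetical and are not the values reported in [23].

```python
def preference(kcat_km_new, kcat_km_old):
    """Preference for the new substrate over the old one (ratio of kcat/KM values)."""
    return kcat_km_new / kcat_km_old


def specificity_switch(evolved, wild_type):
    """Hypothetical helper: fold change in preference between two enzymes.
    Each argument is a (kcat/KM on new substrate, kcat/KM on old substrate) pair."""
    return preference(*evolved) / preference(*wild_type)


# Hypothetical values: a 10-fold gain on the new substrate combined with a
# 100-fold loss on the old one gives a switch of about 1,000-fold.
print(specificity_switch(evolved=(10.0, 1.0), wild_type=(1.0, 100.0)))
```

The design choice is simply to keep both substrates in the calculation: an activity gain on the new substrate alone understates the change in preference whenever activity on the original substrate also falls.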

Similarly, new experiments from Zhang et al. [11] provide additional evidence that blind shuffling is fully capable of functional improvement, not just at the protein level but even at the organismal level. These researchers coupled classical strain improvement (mutation and selection) with genetic recombination. Protoplast fusion results in very efficient recombination between the genomes of Streptomyces species, and iterative protoplast fusion results in the reassortment of multiple markers between species. To show the power of this new method, a Streptomyces strain producing the complex polyketide antibiotic tylosin was selected for improved function and then forced to undergo the equivalent of sexual reproduction. The genomes of several surviving mutants were shuffled after every round of selection to generate a combinatorial library of organisms that could again be screened for improved function. A strain generated by only two rounds of shuffling produced tylosin at a rate comparable to strains that had undergone 20 rounds of classical selection. These results suggest that genome shuffling can lead to changes and improvements in organismal function as radical as those previously observed for proteins.

Overall, these results further support the idea that evolution can act reflexively, that is, to enhance its own ability to act. From the results of Arnold and co-workers [9,10], it is possible that regions that fall between predicted schemas are conserved in sequence in order to facilitate recombination; this hypothesis could be checked directly by database analysis. The application of the techniques described by Arnold and co-workers [9,10], del Cardayré and co-workers [11], and others may allow researchers to design libraries for screening more effectively. A large fraction of the products generated in traditional mutagenesis or even shuffling reactions are nonfunctional. Schema profiling and pathway shuffling may eventually make it possible to design directed evolution experiments in which structural and metabolic subunits are preserved, thereby limiting the exploration of sequence space largely to functional molecules. Ultimately, these advances should expand our understanding of natural genetic processes and thereby allow biologists to generate novel proteins and pathways in a fraction of the time that nature or conventional breeding would take.


DNA shuffling of a family of genes from diverse species accelerates directed evolution

DNA shuffling is a powerful process for directed evolution, which generates diversity by recombination 1 , 2 , combining useful mutations from individual genes. Libraries of chimaeric genes can be generated by random fragmentation of a pool of related genes, followed by reassembly of the fragments in a self-priming polymerase reaction. Template switching causes crossovers in areas of sequence homology. Our previous studies used single genes and random point mutations as the source of diversity 3,4,5,6 . An alternative source of diversity is naturally occurring homologous genes, which provide ‘functional diversity’. To evaluate whether natural diversity could accelerate the evolution process, we compared the efficiency of obtaining moxalactamase activity from four cephalosporinase genes evolved separately with that from a mixed pool of the four genes. A single cycle of shuffling yielded eightfold improvements from the four separately evolved genes, versus a 270- to 540-fold improvement from the four genes shuffled together, a 50-fold increase per cycle of shuffling. The best clone contained eight segments from three of the four genes as well as 33 amino-acid point mutations. Molecular breeding by shuffling can efficiently mix sequences from different species, unlike traditional breeding techniques. The power of family shuffling may arise from sparse sampling of a larger portion of sequence space.
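The fragmentation-and-reassembly process in this abstract can be caricatured at the string level: walk along the aligned parents and let the growing strand switch template only where the parents share a run of identical sequence, mimicking homology-driven template switching. This is a toy model under stated assumptions, not the published protocol: parents are pre-aligned to equal length, "homology" means a run of `min_identity` columns identical in every parent, and `family_shuffle`, `crossover_sites`, `min_identity` and `switch_prob` are hypothetical names and parameters. Fragment size, annealing conditions and polymerase behaviour, which govern real crossover frequency, are not modeled.

```python
import random


def crossover_sites(parents, min_identity=6):
    """Columns where a template switch is allowed: positions at least
    `min_identity` columns into a run that is identical in every parent."""
    sites, run = set(), 0
    for i in range(len(parents[0])):
        if len({p[i] for p in parents}) == 1:
            run += 1
            if run >= min_identity:
                sites.add(i)
        else:
            run = 0
    return sites


def family_shuffle(parents, min_identity=6, switch_prob=0.3, rng=None):
    """Toy model (hypothetical helper): build one chimera from aligned parents
    by switching template only inside shared identical stretches."""
    rng = rng if rng is not None else random.Random()
    sites = crossover_sites(parents, min_identity)
    template = rng.randrange(len(parents))
    chimera = []
    for i in range(len(parents[0])):
        if i in sites and rng.random() < switch_prob:
            template = rng.randrange(len(parents))  # may re-pick the same parent
        chimera.append(parents[template][i])
    return "".join(chimera)


# Hypothetical aligned parents; output depends on the random seed.
parents = ["ACGTACGTTTACGGA", "ACGAACGTTTACGCA", "TCGTACGTTTACGGT"]
print(family_shuffle(parents, min_identity=4, rng=random.Random(1)))
```

Passing a single parent returns that parent unchanged, whereas a diverse family yields chimeras that differ from every parent at several positions at once, which is the intuition behind the 'sparse sampling' explanation offered in the abstract.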


8 Approaches to Random Mutagenesis

Random mutagenesis is an incredibly powerful tool for altering the properties of enzymes. Imagine, for example, you were studying a G-protein coupled receptor (GPCR) and wanted to create a temperature-sensitive version of the receptor or one that was activated by a different ligand than the wild-type. How could you do this?

Firstly, you would clone the gene encoding the receptor, then randomly introduce mutations into the gene sequence to create a “library” containing thousands of versions of the gene. Each version (or “variant”) of the gene in the library would contain different mutations and so encode receptors with slightly altered amino acid sequences giving them slightly different enzymatic properties than the wild-type.

Next, you could transform the library into a strain where the receptor would be expressed and apply a high throughput screen to pick out variants in the library that have the properties you are looking for. Using a high throughput screen for GPCR activity (see here for examples) you could pick out the variants from the library that were temperature-sensitive or were activated by different ligands.
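How many clones a screen must cover to give "a good chance" of seeing a particular variant is a sampling question. A minimal sketch, assuming every variant in the library is equally likely; `clones_needed` is a hypothetical helper. The probability that one particular variant out of n is missed in T clones is (1 - 1/n)^T, so T = ln(1 - P) / ln(1 - 1/n) clones give coverage probability P.

```python
import math


def clones_needed(n_variants, p_cover=0.95):
    """Hypothetical helper: clones to screen so that one specific variant, out of
    n_variants equally likely variants, is sampled at least once with probability p_cover."""
    if not 0 < p_cover < 1:
        raise ValueError("p_cover must be between 0 and 1")
    if n_variants < 2:
        return 1
    # P(missed in T clones) = (1 - 1/n)^T; solve (1 - 1/n)^T <= 1 - p_cover for T.
    return math.ceil(math.log(1 - p_cover) / math.log1p(-1 / n_variants))


print(clones_needed(1000))  # 2995, roughly three times the number of variants
```

Real mutant libraries are not uniform (polymerase and codon biases make some variants rarer than others), so the estimate is optimistic for the rarer variants.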

Sound easy? Well, of course it’s not that easy. Creating a random mutant library that contains enough variants to give you a good chance of obtaining the altered enzyme you desire is a challenge in itself. There are many ways to create random mutant libraries, each with its own pros and cons. Here are some of them:

1. Error-prone PCR. This approach uses a “sloppy” version of PCR, in which the polymerase has a fairly high error rate (up to 2%), to amplify the wild-type sequence. The PCR can be made error-prone in various ways, including increasing the MgCl2 concentration in the reaction, adding MnCl2 or using unequal concentrations of each nucleotide. Here is a good review of error-prone PCR techniques and theory. After amplification, the library of mutant coding sequences must be cloned into a suitable plasmid. The drawback of this approach is that the size of the library is limited by the efficiency of the cloning step. Although point mutations are the most common type of mutation in error-prone PCR, deletions and frameshift mutations are also possible. There are a number of commercial error-prone PCR kits available, including those from Stratagene and Clontech.

2. Rolling circle error-prone PCR is a variant of error-prone PCR in which the wild-type sequence is first cloned into a plasmid and then the whole plasmid is amplified under error-prone conditions. This eliminates the ligation step that limits library size in conventional error-prone PCR, but of course the amplification of the whole plasmid is less efficient than amplifying the coding sequence alone. More details can be found here.

3. Mutator strains. In this approach the wild-type sequence is cloned into a plasmid and transformed into a mutator strain, such as Stratagene’s XL1-Red. XL1-Red is an E. coli strain whose deficiency in three of the primary DNA repair pathways (mutS, mutD and mutT) causes it to make errors during replication of its DNA, including the cloned plasmid. As a result, each copy of the plasmid replicated in this strain has the potential to differ from the wild-type. One advantage of mutator strains is that a wide variety of mutations can be incorporated, including substitutions, deletions and frame-shifts. The drawback of this method is that the strain becomes progressively sick as it accumulates more and more mutations in its own genome, so several rounds of growth, plasmid isolation, transformation and re-growth are normally required to obtain a meaningful library.

4. Temporary mutator strains. Temporary mutator strains can be built by over-expressing a mutator allele such as mutD5 (a dominant negative version of mutD) which limits the cell’s ability to repair DNA lesions. By expressing mutD5 from an inducible promoter it is possible to allow the cells to cycle between mutagenic (mutD5 expression on) and normal (mutD5 expression off) periods of growth. The periods of normal growth allow the cells to recover from the mutagenesis, which allows these strains to grow for longer than conventional mutator strains.

If a plasmid with a temperature-sensitive origin of replication is used, the mutagenic plasmid can easily be removed to restore normal DNA repair, allowing the mutants to be grown up for analysis/screening. An example of the construction and use of such a strain can be found here. As far as I am aware, there are no commercially available temporary mutator strains.

5. Insertion mutagenesis. Finnzymes have a kit that uses a transposon-based system to randomly insert a 15-base-pair sequence throughout a sequence of interest, be it an isolated insert or a plasmid. This inserts 5 codons into the sequence, allowing any gene with an insertion to be expressed (i.e. no frame-shifts or stop codons are caused). Since the insertion is random, each copy of the sequence will have a different insertion, thus creating a library.

6. Ethyl methanesulfonate (EMS) is a chemical mutagen. EMS alkylates guanine residues, causing them to be incorrectly copied during DNA replication. Since EMS modifies DNA directly, EMS mutagenesis can be carried out either in vivo (i.e. whole-cell mutagenesis) or in vitro. An example of in vitro mutagenesis with EMS, in which a PCR-amplified gene was reacted with EMS before being ligated into a plasmid and transformed, can be found here.

7. Nitrous acid is another chemical mutagen. It acts by deaminating adenine and cytosine residues (although other mechanisms are discussed here), causing transition point mutations (A/T to G/C and vice versa). An example of a study using nitrosoguanidine mutagenesis can be found here.
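Whichever method is used, the average number of mutations per clone determines how much of the library is still wild-type and how much carries multiple, often deleterious, changes. A minimal sketch, assuming mutations per clone follow a Poisson distribution (a common approximation) and that the mean has been estimated, for example by sequencing a few clones; `mutation_load` is a hypothetical helper.

```python
import math


def mutation_load(mean_mutations, max_k=3):
    """Hypothetical helper: Poisson probabilities of a clone carrying exactly
    k = 0..max_k mutations, plus the probability of carrying more than max_k."""
    probs = [
        math.exp(-mean_mutations) * mean_mutations**k / math.factorial(k)
        for k in range(max_k + 1)
    ]
    return probs, 1.0 - sum(probs)


probs, tail = mutation_load(2.0, max_k=1)
print([round(p, 3) for p in probs], round(tail, 3))  # [0.135, 0.271] 0.594
```

At an average of two mutations per clone, about 14% of clones would be unmutated and almost 60% would carry two or more mutations, which is one reason random point mutagenesis is usually run at a low rate.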

Note: I have only mentioned two chemical mutagens but there are many others. Hirokazu Inoue has written an excellent article describing some of them and their use in mutagenesis, see here (pdf).

Another note: Chemical mutagens are, of course… mutagens and therefore should be handled with great care. Be especially careful with EMS as it is volatile at room temperature. Read the MSDS and do a proper risk assessment before carrying out these experiments.


DNA shuffling: Modifying the hand that nature dealt

DNA shuffling is a technique being utilized for in vitro recombination of a single gene or pools of homologous genes. The genes are fragmented into randomly sized pieces, and polymerase chain reaction (PCR) reassembly of full-length genes from the fragments, via self-priming, yields recombination due to PCR template switching. After these PCR products are screened and the interesting products sequenced, improved clones are reshuffled to recombine useful mutations in additive or synergistic ways, in effect mimicking the process of natural sexual recombination. Proteins can be ‘bred’ with the appropriate individual properties and then their ‘progeny’ screened for the desired combination of traits. DNA shuffling is a powerful tool enabling rapid and directed evolution of new genes, operons and whole viral genomes.

Journal: In Vitro Cellular & Developmental Biology - Plant (Springer Journals)
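The reshuffling step described above, in which improved clones are recombined so that useful mutations can be combined, can be illustrated by enumerating the genotypes recombination makes available. A toy sketch, assuming free and independent recombination between mutation sites, which real shuffling only approximates because closely spaced mutations tend to be inherited together; `recombinant_genotypes`, `fraction_with_all` and the mutation labels are hypothetical.

```python
from itertools import product
from math import prod


def recombinant_genotypes(mutations):
    """Hypothetical helper: every presence/absence combination of the given mutation sites."""
    return [
        frozenset(m for m, present in zip(mutations, mask) if present)
        for mask in product((False, True), repeat=len(mutations))
    ]


def fraction_with_all(frequencies):
    """Expected fraction of progeny carrying every listed mutation, assuming each
    site is inherited independently with the given frequency in the shuffled pool."""
    return prod(frequencies)


# Hypothetical mutation labels, each found in a different improved clone.
combos = recombinant_genotypes(["M1", "M2", "M3", "M4"])
print(len(combos))  # 16
print(fraction_with_all([0.5, 0.5, 0.5, 0.5]))  # 0.0625
```

Under these assumptions, with each of four beneficial mutations present in half of the shuffled pool, about one progeny clone in sixteen carries all four, so a modest screen can recover the combination in a single round rather than accumulating the mutations one round at a time.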


Future perspectives

As structural genomics matures, these versatile screening strategies are starting to bridge the gap between structural biology and cellular biology. For example, split GFP can be used for tracking pathogen effector proteins in host cells [58], mapping cell-cell contacts [59], and monitoring viral/cell membrane fusion [60]. Future applications may include tagging membrane proteins on either side of the cellular lumen. Domain screening technologies coupled with deep sequencing will likely play increasingly important roles in antigen generation for phage-based antibody development and vaccinology [61]. One can expect hybrid approaches to increasing the stability of individual proteins and protein complexes that combine computational design of ‘smart’ libraries biased towards stability or activity (Rosetta3 [62]), in-house microfluidic gene synthesis [63,64] of the corresponding constrained-diversity DNA libraries, and microfluidic screens or selections for protein stability and activity [65,66].


References

[1] Freeland SJ, Hurst LD: The genetic code is one in a million. J Mol Evol. 1998, 47: 238-248.

[2] Radman M, Matic I, Taddei F: Evolution of evolvability. Ann NY Acad Sci. 1999, 870: 146-155.

[3] Radman M, Taddei F, Matic I: Evolution-driving genes. Res Microbiol. 2000, 151: 91-95. 10.1016/S0923-2508(00)00122-4.

[4] Tenaillon O, Taddei F, Radman M, Matic I: Second-order selection in bacterial evolution: selection acting on mutation and recombination rates in the course of adaptation. Res Microbiol. 2001, 152: 11-16. 10.1016/S0923-2508(00)01163-3.

[5] Sniegowski PD, Gerrish PJ, Lenski RE: Evolution of high mutation rates in experimental populations of E. coli. Nature. 1997, 387: 703-705. 10.1038/42701.

[6] Guttman DS, Dykhuizen DE: Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science. 1994, 266: 1380-1383.

[7] Farinas ET, Bulter T, Arnold FH: Directed enzyme evolution. Curr Opin Biotechnol. 2001, 12: 545-551. 10.1016/S0958-1669(01)00261-0.

[8] Kolkman JA, Stemmer WP: Directed evolution of proteins by exon shuffling. Nat Biotechnol. 2001, 19: 423-428. 10.1038/88084.

[9] Joern JM, Meinhold P, Arnold FH: Analysis of shuffled gene libraries. J Mol Biol. 2002, 316: 643-656. 10.1006/jmbi.2001.5349.

[10] Voigt CA, Martinez C, Wang ZG, Mayo SL, Arnold FH: Protein building blocks preserved by recombination. Nat Struct Biol. 2002, 9: 553-558.

[11] Zhang YX, Perry K, Vinci VA, Powell K, Stemmer WP, del Cardayré SB: Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature. 2002, 415: 644-646. 10.1038/415644a.

[12] Stemmer WP: DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA. 1994, 91: 10747-10751.

[13] Stemmer WP: Rapid evolution of a protein in vitro by DNA shuffling. Nature. 1994, 370: 389-391. 10.1038/370389a0.

[14] Matsumura I, Ellington AD: Mutagenic PCR of protein-coding genes for in vitro evolution. In: In Vitro Mutagenesis Protocols. Edited by: Braman J. 2001, Totowa NJ: Humana Press.

[15] Crameri A, Raillard SA, Bermudez E, Stemmer WP: DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature. 1998, 391: 288-291. 10.1038/34663.

[16] Ness JE, Welch M, Giver L, Bueno M, Cherry JR, Borchert TV, Stemmer WP, Minshull J: DNA shuffling of subgenomic sequences of subtilisin. Nat Biotechnol. 1999, 17: 893-896. 10.1038/12884.

[17] Gô M: Correlation of DNA exonic regions with protein structural units in haemoglobin. Nature. 1981, 291: 90-92.

[18] Gô M: Modular structural units, exons, and function in chicken lysozyme. Proc Natl Acad Sci USA. 1983, 80: 1964-1968.

[19] Gilbert W, Glynias M: On the ancient nature of introns. Gene. 1993, 135: 137-144. 10.1016/0378-1119(93)90058-B.

[20] Palmer JD, Logsdon JM: The recent origins of introns. Curr Opin Genet Dev. 1991, 1: 470-477.

[21] Cavalier-Smith T: Intron phylogeny: a new hypothesis. Trends Genet. 1991, 7: 145-148.

[22] Rogers JH: The role of introns in evolution. FEBS Lett. 1990, 268: 339-343. 10.1016/0014-5793(90)81282-S.

[23] Matsumura I, Ellington AD: In vitro evolution of beta-glucuronidase into a beta-galactosidase proceeds through non-specific intermediates. J Mol Biol. 2001, 305: 331-339. 10.1006/jmbi.2000.4259.


Evolution of a cytokine using DNA family shuffling

DNA shuffling of a family of over 20 human interferon-α (Hu-IFN-α) genes was used to derive variants with increased antiviral and antiproliferation activities in murine cells. A clone with 135,000-fold improved specific activity over Hu-IFN-α2a was obtained in the first cycle of shuffling. After a second cycle of selective shuffling, the most active clone was improved 285,000-fold relative to Hu-IFN-α2a and 185-fold relative to Hu-IFN-α1. Remarkably, the three most active clones were more active than the native murine IFN-αs. These chimeras were derived from up to five parental genes but contained no random point mutations. These results demonstrate that diverse cytokine gene families can be used as starting material to rapidly evolve cytokines that are more active, or have superior selectivity profiles, than native cytokine genes.
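Because the same clone is compared with two parents, the two fold improvements together imply how those parents compare with each other in this murine assay. A quick check, assuming both fold values were measured in the same assay on the same cells; `implied_parent_ratio` is a hypothetical helper.

```python
def implied_parent_ratio(fold_vs_a2a, fold_vs_a1):
    """Hypothetical helper: activity of Hu-IFN-alpha1 relative to Hu-IFN-alpha2a
    implied by one clone's fold improvements over each parent (same assay assumed)."""
    return fold_vs_a2a / fold_vs_a1


print(round(implied_parent_ratio(285_000, 185)))  # 1541
```

On these figures, Hu-IFN-α1 would be roughly 1,500-fold more active than Hu-IFN-α2a in murine cells, so the 185-fold improvement over Hu-IFN-α1 is the more stringent of the two comparisons.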

