# How do genetic drift and selection affect fixation of an allele?

I want to know how genetic drift and the selection coefficient (s) together affect the fixation of an allele. For example, if an allele is neutral (s = 0), is genetic drift alone responsible for its fixation or loss? Similarly, if an allele is beneficial (s > 0), how do genetic drift and the selection coefficient together determine its fixation or loss? And what happens when s < 0, that is, when the allele is deleterious? Lastly, how are genetic drift and selection affected by population size? I am reading about population genetics and I am totally confused by this point. I would appreciate it if someone could clear this up for me.

It would take a lot of writing to fully answer your question. Below I give an answer without spending too much time on the underlying math, assuming you are comfortable with probability theory and Taylor series. If you really want to understand more of this, you should consider reading a book on population genetics (see the end of the post for a recommendation).

# Probability of fixation under drift only

In the absence of selection, mutation and migration, drift alone eventually drives an allele to fixation or loss. The probability that an allele becomes fixed is then simply equal to its current frequency, $$P_{fix}=p$$. When a neutral mutation has just arisen, this probability is therefore $$P_{fix}=p=\frac{1}{2N}$$ for a diploid population of size $$N$$.
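The claim that a neutral allele fixes with probability equal to its frequency can be checked with a small simulation. The following is a minimal sketch, assuming a Wright-Fisher population with binomial sampling of $$2N$$ gene copies each generation; the function names, the population size and the number of replicates are illustrative choices, not part of the original answer.

```python
import random

def fixes(n_diploid, p0, rng):
    """Run one neutral Wright-Fisher population until fixation or loss.

    Returns True if the allele fixes and False if it is lost.
    """
    copies_total = 2 * n_diploid
    count = round(p0 * copies_total)
    while 0 < count < copies_total:
        p = count / copies_total
        # Binomial sampling: each of the 2N offspring copies carries the
        # allele with probability p (the parental frequency).
        count = sum(rng.random() < p for _ in range(copies_total))
    return count == copies_total

def fixation_fraction(n_diploid, p0, replicates, seed=1):
    """Fraction of replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    fixed = sum(fixes(n_diploid, p0, rng) for _ in range(replicates))
    return fixed / replicates

if __name__ == "__main__":
    # Illustrative values: N = 10 diploids, starting frequency 0.3.
    print(fixation_fraction(n_diploid=10, p0=0.3, replicates=2000))
```

With enough replicates the printed fraction should be close to the starting frequency, whatever the population size; population size changes how long fixation takes, not its probability under neutrality.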

Drift and population size - Intuition

For some intuition about the effect of population size on drift, you might want to have a look at the post "Why is the strength of genetic drift inversely proportional to the population size?".

Drift and population size - Effective population size

To go a little further than the linked post, note that the strength of genetic drift is defined by what we call the effective population size $$N_e$$. But before talking about that, let's talk about the variance in possible allele frequencies in the next generation (which we'll call $$var(p')$$).

The number of copies of allele $$A$$ in the next generation follows a binomial distribution (if we follow the Wright-Fisher model). The variance of this binomial distribution is $$2 N p (1-p)$$, where $$p$$ is the frequency of the allele $$A$$ in the previous generation. Dividing this variance by $$(2N)^2$$ (the square comes from the fact that $$var(k X) = k^2 var(X)$$, where $$X$$ is a random variable) gives the variance in allele frequency in the next generation

$$var(p') = \frac{p(1-p)}{2N}$$

We define the strength of genetic drift as $$var(p')$$. For intuition, we usually express it as the value of $$N$$ that would produce this variance in an ideal population, and to avoid confusion with the actual population size we call this new measure $$N_e$$. Therefore,

$$var(p') = \frac{p(1-p)}{2N_e}$$

Solving for $$N_e$$ yields

$$N_e = \frac{p(1-p)}{2var(p')}$$

and this effective population size $$N_e$$ is what we use to talk about the strength of genetic drift.
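To make the definition concrete, here is a minimal sketch that estimates the effective population size from the observed variance in allele frequency after one generation. It assumes an ideal Wright-Fisher population, in which the estimate should come out close to the census size $$N$$; the function names and parameter values are illustrative assumptions.

```python
import random
import statistics

def next_generation_frequency(n_diploid, p, rng):
    """Allele frequency after one round of binomial sampling of 2N copies."""
    copies_total = 2 * n_diploid
    count = sum(rng.random() < p for _ in range(copies_total))
    return count / copies_total

def estimate_ne(n_diploid, p, replicates, seed=1):
    """Estimate the effective size as p(1-p) / (2 var(p'))."""
    rng = random.Random(seed)
    freqs = [next_generation_frequency(n_diploid, p, rng)
             for _ in range(replicates)]
    var_p = statistics.pvariance(freqs)  # variance across replicate populations
    return p * (1 - p) / (2 * var_p)

if __name__ == "__main__":
    # Illustrative values: in an ideal population the result is near N = 50.
    print(estimate_ne(n_diploid=50, p=0.5, replicates=10000))
```

In a real population the same formula applied to observed frequency changes would generally return a value different from the census size, which is the point of the definition.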

Probability of not reproducing

As I will refer to this statistic later, I will introduce it here. What is the probability that a given individual does not reproduce? Let's call this quantity $$P(k=0)$$, the probability that the number of offspring $$k$$ is $$0$$.

Imagine you sample $$2N$$ haplotypes (with replacement) to form the next generation. At each draw, a given haplotype has probability $$\frac{1}{2N}$$ of being chosen, and hence probability $$1-\frac{1}{2N}$$ of not being chosen. The probability of not reproducing at all is therefore $$P(k=0) = \left(1-\frac{1}{2N}\right)^{2N}$$. Taking the Taylor series of this expression for large values of $$N$$ yields the approximation $$P(k=0) = \left(1-\frac{1}{2N}\right)^{2N} ≈ e^{-1} ≈ 0.37$$. So any given gene copy has a probability greater than a third of leaving no descendant in the following generation.
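The convergence towards $$e^{-1}$$ is easy to verify numerically. The snippet below is a small sketch (the function name is an illustrative choice) that evaluates the exact expression for a few population sizes and prints it next to the limit.

```python
import math

def prob_no_offspring(n_diploid):
    """Probability that a given gene copy is never drawn in 2N samples."""
    copies_total = 2 * n_diploid
    return (1 - 1 / copies_total) ** copies_total

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Exact value for this N, followed by the large-N limit exp(-1)
        print(n, prob_no_offspring(n), math.exp(-1))
```

The exact value approaches the limit from below, so the approximation is already good for populations of a few hundred individuals.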

# Probability of fixation under both drift and selection

If an allele is beneficial, then its probability of fixation is greater than its frequency $$p$$ (and conversely, it is lower than $$p$$ if the allele is deleterious).

Approximation

There are a number of ways to derive this probability and they all end up with a somewhat complicated form. A convenient approximation (which I will not derive here) is

$$P_{fix}≈\frac{1-e^{-4Nsp}}{1-e^{-4Ns}}$$
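As a numerical illustration, the sketch below evaluates the standard diffusion approximation for the fixation probability, $$P_{fix}≈\frac{1-e^{-4Nsp}}{1-e^{-4Ns}}$$, which equals $$p$$ in the neutral limit and lies between 0 and 1. It assumes additive selection and uses $$N$$ where a more careful treatment would use the effective size; the function name and parameter values are illustrative.

```python
import math

def kimura_fixation_probability(n_diploid, s, p):
    """Diffusion approximation for the fixation probability of an allele.

    n_diploid: population size N, s: selection coefficient,
    p: current allele frequency.
    """
    if s == 0:
        return p  # neutral limit of the formula
    # expm1(x) = exp(x) - 1 keeps precision when 4Ns is small; the two
    # minus signs cancel in the ratio. Very strongly deleterious alleles
    # (4N|s| above roughly 700) would overflow in this simple sketch.
    return (math.expm1(-4 * n_diploid * s * p)
            / math.expm1(-4 * n_diploid * s))

if __name__ == "__main__":
    n = 1000
    p_new = 1 / (2 * n)  # frequency of a newly arisen mutation
    for s in (-0.01, 0.0, 0.01):
        print(s, kimura_fixation_probability(n, s, p_new))
```

For a new mutation, the beneficial case gives a value well above $$\frac{1}{2N}$$ and the deleterious case a value far below it, which is the interplay of drift and selection the question asks about.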

Just to give you some intuition about the form of the above approximation (but without doing a full demonstration, which would be quite long), consider that the number of offspring an individual has follows a Poisson distribution with fitness as the rate. The fitness of the mutant individual is $$1+s$$, and therefore the probability that this individual has $$k$$ offspring is

$$P(k) = \frac{(1+s)^k e^{-(1+s)}}{k!}$$

Think of the generation in which the mutation has just arisen and only one individual carries it. The probability that this mutation disappears immediately is therefore

$$P(0) = e^{-1-s}$$

which, for reasonable values of $$s$$, is not very different from the probability of not reproducing when $$s=0$$, namely $$e^{-1}$$ (as already shown above by other means).

A less accurate approximation (but with a very simple form) for the probability of fixation of a new beneficial mutation, valid when $$s$$ is small and $$Ns$$ is large, is

$$P_{fix} ≈ 2s$$
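The Poisson offspring argument above can be pushed one step further. If every carrier of the mutation independently leaves a Poisson number of offspring with mean $$1+s$$, the probability $$q$$ that the mutant lineage eventually dies out satisfies $$q = e^{-(1+s)(1-q)}$$, and the survival probability $$1-q$$ is close to $$2s$$ for a new beneficial mutation with small $$s$$. The sketch below solves this equation by fixed-point iteration; treating the lineage as a branching process in a very large population is an assumption of the sketch, and the function name is illustrative.

```python
import math

def survival_probability(s, tol=1e-15, max_iter=1_000_000):
    """Survival probability of a lineage with Poisson(1+s) offspring.

    Iterates q -> exp(-(1+s)(1-q)) from q = 0, which converges to the
    extinction probability. Intended for s > 0; for s <= 0 the true
    answer is 0 and the iteration converges very slowly.
    """
    q = 0.0
    q_next = q
    for _ in range(max_iter):
        q_next = math.exp(-(1 + s) * (1 - q))
        if abs(q_next - q) < tol:
            break
        q = q_next
    return 1 - q_next

if __name__ == "__main__":
    for s in (0.01, 0.05, 0.1):
        # Survival probability next to the simple approximation 2s
        print(s, survival_probability(s), 2 * s)
```

The first iterate is exactly $$e^{-(1+s)}$$, the one-generation loss probability discussed above; later iterates add the chance of loss in subsequent generations.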

Some more calculations about the interplay of drift and selection can be found in this post.

# Source of information

The post "Book recommendation on population/evolutionary genetics?" will give you great sources of information to learn more on the subject. These books will also re-explain what I just said at a slower pace. I particularly recommend Population Genetics: A Concise Guide.

## Genetic Drift vs. Gene Flow vs. Natural Selection

Genetic drift, gene flow, and natural selection may sound similar or even confusing to some. All three are mechanisms in the evolutionary process that have to do with alleles and/or gametes, but there are several significant differences.

Discussions about genes and natural selection usually include the term allele. An allele is just one version of a gene found at the same place (locus) on a chromosome. For example, a gene affecting the color of a bird’s feathers may exist as several alleles, each associated with a different color. In sexually reproducing organisms, alleles occur in pairs because the offspring receive one from each parent.

## Drift and Selection

The Hardy-Weinberg equation describes allele frequencies in populations. It predicts the future genetic structure of a population the way that Punnett Squares predict the results of an individual cross. The equation calculates allele frequencies in non-evolving populations. It is based on the observation that in the absence of evolution, allele frequencies in large randomly breeding populations remain stable from generation to generation.

In real populations, evolution does occur and allele frequencies vary over time. This divergence between real, evolving populations and theoretical, non-evolving populations allows the Hardy-Weinberg equation to be used to explore the effect of evolution on populations. Two major factors that cause real populations to diverge from the equilibrium predicted by the Hardy-Weinberg equation are genetic drift and natural selection.

The following illustration shows changes in actual allele frequencies over time compared to the stable structure predicted by the Hardy-Weinberg equation.

Genetic drift is the random variation in allele frequencies that arises because, by chance alone, some individuals leave more or fewer offspring than expected. This is most pronounced in small populations and is a major reason real allele frequencies do not remain at Hardy-Weinberg equilibrium values. Genetic drift is random and as such does not result in populations becoming more adapted to their environment.

Natural selection increases the frequency of a favored allele over another and can cause significant departures from Hardy-Weinberg equilibrium.

Assuming a trait controlled by two alleles where p is the frequency of one allele and q is the frequency of the other allele, the sum of the frequencies must equal 1:

p + q = 1

Given p and q, the Hardy-Weinberg equation is:

p² + 2pq + q² = 1

• p² equals the proportion of the population that is homozygous for allele 1
• q² equals the proportion of the population that is homozygous for allele 2
• 2pq is the proportion of heterozygotes in the population.

The Hardy-Weinberg equilibrium rests on the following assumptions:

1. No mutations - Allele frequencies do not change due to mutation.
2. No natural selection - All genotypes have the same reproductive success.
3. The population is infinitely large
4. Mating is completely random
5. No migration - There is no flow of genes in or out of the population due to migration.
6. All individuals produce the same number of offspring.
7. Generations are non-overlapping
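The genotype proportions described above follow directly from the allele frequencies. Here is a minimal sketch of that calculation; the function name and the genotype labels "AA", "Aa" and "aa" are illustrative choices rather than notation from the text.

```python
def hardy_weinberg(p):
    """Expected genotype proportions for allele frequencies p and q = 1 - p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

if __name__ == "__main__":
    freqs = hardy_weinberg(0.9)
    print(freqs)
    # Share of the rarer allele's copies carried by heterozygotes:
    # each Aa individual carries one copy, each aa individual two.
    in_heterozygotes = freqs["Aa"] / (freqs["Aa"] + 2 * freqs["aa"])
    print(in_heterozygotes)
```

With a rare allele at frequency 0.1, about 90% of its copies sit in heterozygotes, which is one way to see why rare recessive alleles persist even when the homozygous genotype is uncommon.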

While real populations don’t maintain the stable allele frequencies predicted by the Hardy-Weinberg equilibrium, the equation can be used to determine the rates and types of evolutionary change occurring in a population.

Exploration of population dynamics using Hardy-Weinberg frequencies reveals many patterns. For example, the Hardy-Weinberg equation shows how poorly represented alleles persist in populations and the role heterozygotes play in producing individuals with deleterious, homozygous recessive traits.

## Are genetic drift and inbreeding the same thing?

Does it ever happen to you that the more you try to understand something, the more difficult to understand it turns out to be? Recently, I’ve had such a problem with two of the very basic microevolutionary phenomena – genetic drift and inbreeding.
Genetic drift and inbreeding are associated with changes in allele frequencies and heterozygosity, and are particularly important in small populations. Their causes and effects are so intertwined that I ended up asking “Are genetic drift and inbreeding the same thing?”

The problem is that at small population sizes, the combined effect of genetic drift and inbreeding leads to increased homozygosity and fixation of alleles, including deleterious alleles. Sometimes these processes are described as independent forces operating at the same level, while elsewhere inbreeding tends to be addressed as a result of genetic drift.
I realised that despite using these terms in everyday scientific discussions, writing, and when presenting my research, I wasn’t able to properly address them. Feeling like a very poor PhD student, I turned to the best friend of every desperate [fill in your occupation], i.e. I asked Google.
I’m not sure if it made me feel relieved or even more concerned when I found out that the relationship of genetic drift and inbreeding was something that already (two of) the fathers of the modern evolutionary synthesis couldn’t agree upon.

R. A. Fisher. Wikimedia Commons/Flickr Commons

Sewall Wright (Barton 2016, Genetics)

The turbulent relationship of genetic drift and inbreeding (as well as the relationship of R. A. Fisher and Sewall Wright) has been evolving for decades. It started in the 1920s and 1930s with Wright’s definition of the inbreeding coefficient (Wright 1922), the Wright-Fisher model of binomial sampling in a finite population (Fisher 1930, Wright 1931) and the introduction of the concept of effective population size (Ne; Wright 1931, 1939).

“[Effective population size] is the size of an idealized population with the same gene frequency drift or inbreeding as the observed population. An idealized population is panmictic with each parent having an equal expectation of progeny.” (Crow 2010).

Aged 94, a year before his death, James F. Crow wrote a perspective article titled “Wright and Fisher on inbreeding and random drift” (Crow 2010). On three pages, Crow managed to explain both of the giants of theoretical population genetics, and provided a simple explanation to the complex problem.

“Fisher did not consider the irregular consanguineous matings that occur, especially in animal pedigrees, and for which Wright’s inbreeding algorithm is especially useful. I doubt, however, that this would have changed Fisher’s opinion. He clearly thought that consanguineous mating within a large population, whether systematic or not, was quite different from increased fixation due to small population size.” (Crow 2010)

“Wright was particularly pleased that his F statistics could be used to measure random drift as well as consanguineous mating…. Since he could use the same formulas for both inbreeding and random drift, Wright naturally thought of these as two sides of the same coin.” (Crow 2010)

And not surprisingly, Crow found the answer to the problem in the effective population size. He realised that there is more than one way of defining Ne and described the variance effective size (NeV), associated with genetic drift, and the inbreeding effective size (NeI), associated with inbreeding (Crow 1954). These were later developed further to account for separate sexes and selection (Crow & Denniston 1988, Caballero 1994).

I won’t go into details, but what these formulas clearly show is that NeI and NeV are not the same thing, in which case R. A. Fisher is the winner, voilà!
…unless the population size remains constant, and then they can be both reduced to:

and then Wright is the winner. Touché!
In reality, the conditions of constant population size are much less common, so if I were to choose the winner, it would be Fisher. Or maybe Crow.
References
Caballero A (1994) Developments in the prediction of effective population size. Heredity 73, 657–679.
Crow JF (2010) Wright and Fisher on Inbreeding and Random Drift. Genetics, 184(3), 609-611. doi: 10.1534/genetics.109.110023
Crow JF & Denniston C (1988) Inbreeding and variance population numbers. Evolution 42, 482–495.
Fisher RA (1930) The Genetical Theory of Natural Selection. Clarendon Press, Oxford.
Wright S (1922) Coefficients of inbreeding and relationship. Am. Nat. 56, 330–338.
Wright S (1931) Evolution in Mendelian Populations. Genetics 16, 97–159.
Wright S (1939) Statistical Genetics in Relation to Evolution (Exposes de Biometrie et de Statistique Biologique, Vol. 13) Hermann & Cie, Paris.

## Adaptive fixation in two-locus models of stabilizing selection and genetic drift

The relationship between quantitative genetics and population genetics has been studied for nearly a century, almost since the existence of these two disciplines. Here we ask to what extent quantitative genetic models in which selection is assumed to operate on a polygenic trait predict adaptive fixations that may lead to footprints in the genome (selective sweeps). We study two-locus models of stabilizing selection (with and without genetic drift) by simulations and analytically. For symmetric viability selection we find that ∼16% of the trajectories may lead to fixation if the initial allele frequencies are sampled from the neutral site-frequency spectrum and the effect sizes are uniformly distributed. However, if the population is preadapted when it undergoes an environmental change (i.e., sits in one of the equilibria of the model), the fixation probability decreases dramatically. In other two-locus models with general viabilities or an optimum shift, the proportion of adaptive fixations may increase to >24%. Similarly, genetic drift leads to a higher probability of fixation. The predictions of alternative quantitative genetics models, initial conditions, and effect-size distributions are also discussed.

Keywords: pleiotropy; polygenic adaptation; stabilizing selection; two-locus model.

### Figures

Quality of the QLE approximation. (A) Logarithm of the mean relative error for…

Distribution of the maximum eigenvalues in the equilibrium states (see Table 2) obtained…

Mean speed of adaptation for the constant-environment model. Mean speed is measured by…

## Results

### Experimental results

We begin by reporting our measurements of the average fraction of each strain, the two-point correlation functions between strains, and the relative rates of annihilations and coalescences as a function of length expanded for our four competing strains of E. coli. As discussed in the Materials and Methods, we found that our eCFP and eYFP strains had the fastest expansion velocities followed by the black strain and finally the mCherry strain (see Table 1). We expected that our experimental measurements would reflect this hierarchy of speeds: faster expanding strains should have a larger fitness than slower expanding ones. To illustrate the presence of selection, we used neutral theory (discussed in detail in S1 Appendix) as a null expectation; selection caused deviations from the neutral predictions. To calibrate neutral theory to our experiments we fit R0 and Dw, two model parameters illustrated in Fig 1, following the procedures discussed in the Materials and Methods. The fit values of R0 and Dw can be seen in Table 2. In later sections, we show how to predict the average fraction, two-point correlation functions, and relative rates of annihilation and coalescences using our random-walk model and simulation.

The liquid-culture fitness was the fitness of each strain relative to mCherry in liquid culture, defined with respect to their basal growth rates gi and gR. The radial expansion velocity fitness siR did not match the well-mixed liquid-culture fitness. However, every strain in liquid culture still grew faster than mCherry. Interestingly, the black strain grew faster than the eCFP and eYFP strains in liquid culture, while on agar the eCFP and eYFP strains expanded faster than the black strain. See the Materials and Methods for additional information.

We experimentally measured R0, Dw, and the wall velocities using the procedures outlined in the Materials and Methods so that we could compare experimental results with our model’s predictions.

#### Average fractions.

The average fraction of strain i at a length expanded of L = R − R0 is defined as Fi(L) = ⟨fi(ϕ, L)⟩ (1), where fi(ϕ, L) is the local fraction of strain i at angle ϕ and length L (i.e. at a pixel specified in polar coordinates by ϕ and L). The angular brackets represent an average over many range expansions and fi is normalized such that ∑i fi(ϕ, L) = 1 for each location in the colony as discussed in the Image Analysis section. In the neutral case, the average fraction of each strain should equal their inoculated fractions and should be independent of length expanded. Selection forces the average fractions of less fit strains to decrease.
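As an illustration of this definition, the following sketch computes average fractions from discretized colony data. It assumes, purely for illustration, that each replicate expansion at a fixed length L is stored as a list of integer strain labels, one per angular bin, so that the local fraction of a strain in a bin is either 0 or 1; the function name and data layout are hypothetical and are not the paper's image analysis package.

```python
def average_fractions(replicates, n_strains):
    """Average fraction of each strain over angle and over replicates.

    replicates: list of rings; each ring is a list of strain labels
    (integers 0 .. n_strains-1), one label per angular bin.
    """
    totals = [0.0] * n_strains
    for ring in replicates:
        for label in ring:
            # Each bin contributes an equal share of its ring
            totals[label] += 1 / len(ring)
    # Average the per-ring fractions over all replicate expansions
    return [t / len(replicates) for t in totals]

if __name__ == "__main__":
    # Two toy replicates with four angular bins and three strains
    rings = [[0, 0, 1, 2], [0, 1, 1, 2]]
    print(average_fractions(rings, n_strains=3))
```

Because every bin holds exactly one label, the returned fractions sum to 1, mirroring the normalization of the local fractions.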

We measured the average fraction versus radial length expanded in two separate sets of experiments where we inoculated different fractions of our eYFP, eCFP, and mCherry strains. In one experiment, we inoculated the eYFP, eCFP, and mCherry strains with equal initial fractions of 33% while in the other we inoculated 80% of the mCherry strain and 10% each of the eCFP and eYFP strains. We conducted 20 replicates in each case and calculated the average fraction of each strain using our image analysis package. Fig 2 displays the trajectories of the 20 expansions and the mean trajectory (the average fraction) as ternary composition diagrams for both sets of initial conditions [37].

The red dot indicates the composition at the radius R0 = 3.50 mm where distinct domain walls form and the blue dot indicates the composition at the end of the experiment. The red dots are dispersed about the initial inoculated fractions due to the stochastic dynamics at the early stages of the range expansions when R < R0. The highly stochastic trajectories illustrate the importance of genetic drift at the frontier in the E. coli range expansions. The smaller ternary diagrams display the average fraction over all expansions vs. length expanded for each set of experiments. For both initial conditions, we see a small systematic drift away from the mCherry vertex indicating that the mCherry strain has a lower fitness, in agreement with the independent radial expansion velocities of each strain (see Table 1). Note that two replicates on the right resulted in the complete extinction of eCFP due to strong spatial diffusion, indicated by the trajectories pinned on the absorbing line connecting the eYFP and mCherry vertices.

In both sets of experiments, we observed a systematic drift away from the mCherry vertex as a function of radius as illustrated by the mean trajectories shown as insets. We witnessed two cases where the 10% initial inoculant of the eCFP strain became extinct, represented by the pinning of trajectories to the absorbing boundary connecting the eYFP and mCherry vertex, a consequence of the strong genetic drift at the frontiers of our E. coli range expansions. These measurements indicate that the mCherry strain was less fit than the eCFP and eYFP strains, consistent with the order of the radial expansion velocities.

#### Two-point correlation functions.

Next, we measured the two-point correlation functions given by Fij(ϕ, L) = ⟨fi(ϕ′, L) fj(ϕ′ + ϕ, L)⟩ (2), where fi(ϕ′, L) is again the local fraction of strain i at angle ϕ′ and expansion length L. Fij gives us the probability that strain i is located at an angular distance of ϕ away from strain j at a length expanded L. Note that Fij = Fji and Fij(ϕ) = Fij(−ϕ). Although the average fraction is constant in the neutral case, the two-point correlation functions broaden due to the coarsening of genetic domains [2]. Neutral q-color Voter models analytically predict the form of the two-point correlation functions [2] as seen in equation (S1.3) in S1 Appendix.
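A discretized version of such a correlation function can be sketched as follows. The code assumes, for illustration only, that each replicate at fixed L is a ring of integer strain labels (one per angular bin) and estimates the probability of finding strain i in one bin and strain j a given number of bins further along the ring; the function name and data layout are hypothetical.

```python
def two_point_correlation(replicates, i, j, shift):
    """Estimate the probability that strain i occupies a bin and strain j
    occupies the bin `shift` positions further along the ring."""
    total = 0.0
    for ring in replicates:
        n = len(ring)
        hits = sum(
            1 for k in range(n)
            if ring[k] == i and ring[(k + shift) % n] == j  # periodic ring
        )
        total += hits / n
    return total / len(replicates)

if __name__ == "__main__":
    rings = [[0, 0, 1, 1]]  # one toy replicate with two domains
    print(two_point_correlation(rings, 0, 0, shift=0))  # fraction of strain 0
    print(two_point_correlation(rings, 0, 1, shift=1))  # one domain wall in four bins
```

At zero shift the self-correlation reduces to the average fraction of the strain, and for a single ring the estimates over all pairs of strains at a fixed shift sum to 1. The symmetry between i and j only emerges after averaging over many replicates.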

Deviations from neutral predictions are caused by selection. Analytical results describing these deviations are not available for reasons discussed in the S1 Appendix (the hierarchy of moments does not close); numerical simulations must be used to calculate the precise shape of the correlation functions, as seen in the second half of our Results section. Regardless, selection-induced deviations can be understood in the limit of both large and small angular separations. For large angular separations, spatial correlations will be negligible; the two-point correlation functions will consequently factorize and plateau at the value Fij = FiFj where Fi is the average fraction at length L from above. Therefore, in neutrality, the two-point correlation functions Fij should plateau at the product of the initial fractions inoculated of strains i and j (in neutrality, Fi does not change). Selection can thus be identified by comparing the experimentally measured plateau value to the neutral prediction value. Furthermore, in the limit of zero angular separation, it is known that ∂ϕFij measures the density of ij domain walls [2] (where i ≠ j). In general, if strain i is less fit than the other strains, it will have fewer domain walls, decreasing the domain-wall density and thus the slope near ϕ = 0.

We measured the correlation functions between each pair of strains in three sets of experiments where we inoculated equal well-mixed fractions of the eCFP, eYFP, and black strains, then eCFP, eYFP, and mCherry, and then finally all four strains. We conducted 20 replicates of each experiment, measured all two-point correlation functions at the final radius of R = 10 mm corresponding to a length expanded of L = R − R0 = 6.5 mm, and averaged the results. In Fig 3, we plotted the neutral correlation function prediction and compared it to the experimentally measured correlation functions.

The shaded regions in these plots indicate standard errors of the mean. Using the measured diffusion coefficient Dw and initial radius where domain walls form R0 (see Table 2), we also plot the theoretical neutral two-point correlation functions (black dashed line; see eq. (S1.3)). The colors of each plotted correlation function were chosen to correspond to their composite strain colors; for example, two-point correlation functions associated with mCherry were red or were blended with red. The subscripts correspond to the color of each strain: C = eCFP, Y = eYFP, R = mCherry, and B = Black. As judged by the magnitude of the deviation from neutral predictions, the black strain has a small selective disadvantage relative to eCFP and eYFP and the mCherry strain has an even greater disadvantage, in agreement with the independent radial expansion velocities of each strain (see Table 1).

The two-point correlation functions in the experiment between eCFP, eYFP, and the black strains (first column of Fig 3) are consistent with the order of radial expansion velocities (see Table 1). The correlation between the eCFP and eYFP strains plateaued at a higher value than the neutral prediction while the correlation between eCFP and black plateaued at a lower value, indicating that the eCFP and eYFP strains were more fit. The self-correlation for the black strain, FBB, also plateaued at a value below eCFP, eYFP, and the neutral prediction, further indicating that it had a smaller fitness. The self-correlation data were noisier than the correlations between strains; however, we consistently found that correlations between strains were better at detecting fitness differences than self-correlations.

In contrast, combining eCFP, eYFP, and mCherry in one set of experiments and all four strains in another revealed that mCherry had a larger fitness defect. Correlation functions including mCherry always plateaued at a significantly smaller value than correlation functions excluding it. Furthermore, off-diagonal (bottom row of Fig 3) correlation functions involving the mCherry strain had a smaller slope at zero angular separation, indicating that fewer mCherry domain walls were present and that the mCherry strain was less fit than the others. The two-point correlation functions were thus consistent with the black strain having a small selective disadvantage relative to eCFP and eYFP and the mCherry strain having a larger disadvantage relative to all others.

#### Annihilation asymmetry.

The last quantity we measured was the relative rate of annihilations and coalescences per domain wall collision; examples of annihilations and coalescences can be seen on the left side of Fig 1. Many theoretical results exist describing the neutral dynamics of annihilations and coalescences and they are summarized in S1 Appendix. To succinctly quantify the difference between the annihilation and coalescence probabilities per wall collision, we define the “annihilation asymmetry” ΔP(L) = PA(L) − PC(L) as the difference in probability of obtaining an annihilation versus a coalescence per collision at a distance expanded of L. If q neutral colors are inoculated in equal fractions, it can be shown that ΔP(q) = (3 − q)/(q − 1) (3). Note that in neutrality, the annihilation asymmetry ΔP is independent of the length expanded L; it depends only on the number of strains q inoculated in equal fractions. In the presence of selection, however, less fit strains should be squeezed out as the length expanded L increases, forcing q and thus ΔP to change.

To gain insight into the behavior of ΔP, for the case of q neutral colors in equal proportions, we have limq→∞ ΔP(q) = −1 (only coalescences), ΔP(q = 3) = 0 (equal numbers of annihilations and coalescences), and ΔP(q = 2) = 1 (only annihilations). The quantity ΔP thus provides a simple way to characterize the annihilation/coalescence difference in a single curve that varies smoothly between −1 and 1 as 2 ≤ q < ∞. In S1 Appendix we develop and discuss the case when strains are inoculated in non-equal proportions (see supplementary equations (S1.8)–(S1.10)); in that scenario, it is useful to define a “fractional q” by inverting eq (3) to read q = (3 + ΔP)/(1 + ΔP) (i.e. a fractional q can be evaluated for a given ΔP).
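The relation between ΔP and q can be written out explicitly. Solving q = (3 + ΔP)/(1 + ΔP) for ΔP gives ΔP = (3 − q)/(q − 1), which reproduces the three limiting values quoted above. The sketch below implements both directions; the function names are illustrative.

```python
def annihilation_asymmetry(q):
    """Neutral annihilation asymmetry for q colors in equal fractions (q > 1)."""
    return (3 - q) / (q - 1)

def fractional_q(delta_p):
    """Effective number of colors corresponding to a given asymmetry."""
    return (3 + delta_p) / (1 + delta_p)

if __name__ == "__main__":
    for q in (2, 3, 4, 1000):
        # q = 2 gives only annihilations, q = 3 a balance, and large q
        # approaches -1 (only coalescences).
        print(q, annihilation_asymmetry(q))
    print(fractional_q(0.5))
```

An asymmetry of 0.5 corresponds to a fractional q of 7/3, close to the value of about 2.33 reported for the unequal inoculation described in the text.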

To experimentally quantify the annihilation asymmetry, we examined the average cumulative difference in annihilations and coalescences vs. the average cumulative number of domain wall collisions as colonies expanded; ΔP is given by the slope of this quantity and can be seen in Fig 4 (see Supplementary S1 Fig for a display of cumulative count vs. length expanded). Regardless of which strains were inoculated and their selective differences, our results were consistent with the neutral theory prediction in eq (3) for q = 2, q = 3, and q = 4 as judged by the overlap of the black dashed line with the shaded standard error of the mean in each case. ΔP appeared to be constant as a function of length. We also tested an initial condition where we inoculated strains in unequal proportions: we inoculated 10% of eCFP and eYFP and 80% of mCherry. This experiment again matched the neutral prediction of ΔP ≈ 0.51 (and correspondingly q ≈ 2.33) within error. Evidently, as discussed in more detail below, certain observables like the average fraction and two-point correlation functions show stronger signatures of selection than others like the annihilation asymmetry.

The slope of this plot gives the annihilation asymmetry ΔP. The shaded regions represent the standard error of the mean between many experiments. We use the notation C = eCFP, Y = eYFP, B = black, and R = mCherry. Despite the presence of selection, ΔP was consistent with the standard neutral theory prediction of eq (3) for q = 2, q = 3, and q = 4 (equal initial fractions of q strains), as judged by the overlap of the black dashed lines with the shaded areas in every case. We also explored an initial condition where we inoculated unequal fractions of three strains: we inoculated 10% of both eCFP and eYFP and 80% of mCherry. Our experiments agreed with the prediction of ΔP ≈ 0.51, or an effective q ≈ 2.33, from the neutral theory developed in supplementary equations (S1.8)–(S1.10).

### Simulation results

In this section, we introduce three key combinations of our random walk model’s input parameters R0, Dw, and the wall velocities (see Fig 1) that control the evolutionary dynamics of our four competing E. coli strains. Using simulation, we show that we can utilize these key combinations to collapse the simulated evolutionary dynamics (focusing on the experimental quantities we measured above: the average fraction, two-point correlation function, and annihilation asymmetry) of an arbitrary number of competing strains in a range expansion.

#### Key parameters.

What key combinations of the variables seen on the right side of Fig 1 govern the evolutionary dynamics of our competing strains? Our goal is to describe the dynamics as a function of length expanded by our colonies L = R − R0, with R0 the initial radius where domain walls form, the domain wall diffusion coefficient per length expanded Dw (units of length), and all wall velocities per length expanded (dimensionless). The two-point correlation functions must include an additional independent variable: the angular distance ϕ between strains.

Investigating the width of a single sector of a more fit allele sweeping through a less fit allele, as illustrated on the right of Fig 1, reveals important parameter combinations (see S1 Appendix for additional details). In a linear expansion, the deterministic, selection-induced growth of a sector of genotype i sweeping through a less fit genotype j will scale as 2vijL, where vij denotes the wall velocity per length expanded between strains i and j, while its diffusive growth will scale as √(4DwL). At short lengths expanded, diffusion will thus dominate deterministic growth, and at larger lengths selection will dominate diffusion. A crossover expansion length Ls [2, 9, 32] beyond which selection dominates follows by equating the deterministic and diffusive growth, Ls = Dw/vij² (4). The factor of 2 in front of vij and 4 in front of Dw arises because we are monitoring the distance between two domain walls (i.e. a sector); similar arguments can be applied to describe the motion of individual walls. It is worth noting that the chirality of sector boundaries reported in E. coli range expansions [18, 36] would result in a wall velocity pointing in the same direction (left or right) for every domain wall. We can ignore this constant bias in our models because sectors will still expand at the same rate despite an additional superposition of all domain walls moving in a specific direction. To avoid complications arising from chirality in this paper, we focus on quantifying the growth of sectors, i.e. the distance between two domain walls, as opposed to tracking the motion of an individual domain wall whenever possible. Ls is the characteristic length that the colony must expand in order for selection to dominate over diffusion for strain i sweeping through strain j and acts as the first key parameter.
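The crossover argument can be made concrete with a short calculation. The sketch below assumes that the deterministic sector growth is 2vL and the diffusive growth is √(4DwL), which is our reading of the factors of 2 and 4 mentioned in the text; equating the two gives a crossover length of Dw/v². The symbol v for the wall velocity, the function names and the numerical values are all hypothetical.

```python
import math

def selection_length(d_w, v_w):
    """Expansion length beyond which selection dominates over diffusion.

    d_w: wall diffusion coefficient per length expanded (units of length)
    v_w: wall velocity per length expanded (dimensionless)
    """
    return d_w / v_w ** 2

def sector_growth(d_w, v_w, length):
    """Deterministic and diffusive contributions to sector width at `length`."""
    deterministic = 2 * v_w * length
    diffusive = math.sqrt(4 * d_w * length)
    return deterministic, diffusive

if __name__ == "__main__":
    d_w, v_w = 0.1, 0.05  # hypothetical values, not the measured ones
    l_s = selection_length(d_w, v_w)
    print(l_s)
    for length in (1.0, l_s, 10 * l_s):
        print(length, sector_growth(d_w, v_w, length))
```

Below the crossover length the diffusive term is the larger of the two, at the crossover they are equal, and beyond it the deterministic term takes over.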

Upon repeating this argument for domains on a radially inflating ring (see S1 Appendix), we identify $$L_I^{ij}$$ [32, 38] as the inflationary analog of $$L_s^{ij}$$: the expansion length beyond which selection dominates over diffusion, and find the relation given in eq (5). κij is a dimensionless prefactor that can be thought of as an “inflationary selective advantage” controlling the expansion length at which selection dominates over diffusion and is given by eq (6). Fig 5 illustrates the importance of κij; it displays the ratio of the inflationary to the linear selection length scale as a function of κij from the numerical solution of eq (5). We find that the ratio of the length scales has the asymptotic behavior given in eq (7). Thus, if κij ≫ 1, inflation can be ignored (relative to selection and genetic drift), and the inflating selection length scale approaches the linear selection length scale. In contrast, if κij ≪ 1, the inflationary selection length will be many times larger than the linear selection length scale [32]. As κij becomes smaller, inflation and genetic drift dominate over selection for a larger length expanded. κij is the second key parameter describing the dynamics of our system. Note that in contrast to a linear expansion, which just features competition between genetic drift and selection (captured by the quantity $$L_s^{ij}$$), a radial inflation has three separate, competing effects: genetic drift, selection, and inflation. κij quantifies the strength of selection relative to inflation and diffusion.

If κ ≳ 1, inflation does not appreciably slow selective sweeps as LI approaches the linear selection length scale Ls. In contrast, if κ ≪ 1, the inflationary selection length scale LI will be many times larger than the linear selection length scale Ls, indicating that selection will be weak compared to inflation and diffusion (but will ultimately dominate at very large lengths expanded). The three black points correspond to measurements of the κij that govern the dynamics of our competing strains: N stands for the two selectively neutral strains (eCFP and eYFP), B for black, and R for mCherry (red). See the Predicting experimental results with simulations section for more details.

The third and final key parameter is the characteristic angular correlation length ϕc between selectively neutral genotypes. This parameter arises naturally when analytically calculating the neutral two-point correlation functions from the Voter model (see eq. (S1.3)). The parameter also has an intuitive description. When moving into polar coordinates, the angular diffusion coefficient Dϕ is related to the standard linear domain wall diffusion coefficient by Dϕ = Dw/R². The characteristic scale for the radius is R0; the angular diffusive growth of domains should consequently scale as $$\phi_c \sim \sqrt{D_w/R_0}$$. Note that this characteristic angular length does not depend on the total number of strains; it describes the diffusive coarsening of a single strain sector propagating through one or more other strains.

We have now identified the three key parameters that govern the evolutionary dynamics of our competing strains. $$L_s^{ij}$$ is the length that a linear expansion must grow in order for selection to dominate over diffusion for strain i sweeping through strain j, κij controls whether selection (κij ≪ 1) or inflation (κij ≫ 1) may be neglected relative to other effects in radially inflating expansions, and ϕc sets the characteristic angular correlation length between selectively neutral genotypes. These key parameters are listed in Table 2.

#### Collapsing the evolutionary dynamics with the key parameters.

We used simulations of annihilating and coalescing random walkers constrained to lie on the edge of an inflating ring with deterministic biases due to selection (see the Simulation methods section for additional details) to investigate the effect of the parameters R0, Dw, and the set of all $$v_w^{ij}$$ on the evolutionary dynamics of our competing strains. As we varied R0, Dw, and $$v_w^{ij}$$, we calculated the average fraction of each strain, the two-point correlation functions between strains, and the relative rate of annihilations and coalescences per domain wall collision (the quantities we measured experimentally). We also investigated the role of the three key combinations of parameters $$L_s^{ij}$$, κij, and ϕc for both linear and radial expansions.

We first simulated q = 3 competing strains where two neutral strains swept through a third less fit strain with wall velocity vw, similar to our experiments with two neutral strains (eCFP and eYFP) and the less fit mCherry strain. The three strains were numerically inoculated in equal proportions. Note that in this simulation, there was only one non-zero vw and consequently one Ls and one κ. We varied vw over 10⁻³ ≤ vw ≤ 10⁻¹ and N0 over 10² ≤ N0 ≤ 10⁵ (altering R0 = N0a/(2π)) and computed the average fraction F of the less fit strain and the annihilation asymmetry ΔP. We found that both F and ΔP from simulations with identical κ, despite different values of R0 and vw, collapsed if L, the length traveled, was rescaled by Ls, as seen in Fig 6. Each curve in Fig 6 consists of six collapsed simulations with unique values of R0 and vw but with the same value of κ. Further simulations revealed that the two-point correlation functions Fij could be collapsed from simulations with identical κ if L was rescaled by Ls and ϕ was rescaled by ϕc, provided ϕc ≪ 2π (see Supplementary S2 Fig).

Two neutral strains swept through a less fit strain with a wall velocity vw; each strain was numerically inoculated in equal proportions and the colony’s initial radius was R0. For identical κ, despite different values of R0 and vw, both F and ΔP can be collapsed if the length traveled L is rescaled by Ls, the linear selection length scale. Each universal curve at a fixed κ consists of six simulations with different values of vw and R0; each set of parameters has a different marker. As κ decreases, inflation slows the selective sweep of the more fit strains through the less fit strain, as illustrated by the slower decrease of F. ΔP transitioned from 0 to 1 as the number of strains present in the expansion decreased from q = 3 to q = 2 (the less fit strain was squeezed out); this is expected from eq (3), ΔP = (3 − q)/(q − 1). Supplementary S3 Fig is identical to this figure except the y-axis of F(L/Ls, κ) is placed on a linear scale; this may be useful for comparison with experiments.

We now consider the collapsed curves F(L/Ls, κ) and ΔP(L/Ls, κ) as a function of the parameter κ as seen in Fig 6. κ had a pronounced effect on both quantities. For κ ≳ 5 the dynamics of F and ΔP approached the dynamics of a linear expansion at all L/Ls, illustrated by the bright pink line on the left and the bright pink dots on the right of Fig 6; the more fit strain swept so quickly through the less fit strain that the colony’s radial expansion could be ignored. As κ decreased, the less fit strain was squeezed out more slowly due to the inflation of the frontier, resulting in slower transitions from q = 3 to q = 2 colors and consequently slower transitions from ΔP = 0 to ΔP = 1. For κ ≪ 1, ΔP barely shifted from 0 over the course of the simulation. Interestingly, ΔP peaked at a finite L/Ls for small κ; it is not clear what causes this effect, but it may be related to the transition from linear to inflation-dominated dynamics as L increases.
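The limiting values ΔP = 0 and ΔP = 1 quoted for q = 3 and q = 2 can be checked directly against eq (3) as written in the text, ΔP = (3 − q)/(q − 1). The helper name below is hypothetical, and the q = 4 value is included only to show how the expression continues.

```python
from fractions import Fraction

def annihilation_asymmetry(q):
    """Evaluate eq (3) as quoted in the text: (3 - q) / (q - 1) for q strains."""
    if q < 2:
        raise ValueError("at least two strains are needed for domain walls to exist")
    return Fraction(3 - q, q - 1)

for q in (2, 3, 4):
    print(f"q = {q}: Delta P = {annihilation_asymmetry(q)}")
```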

Additional simulations revealed that for expansions composed of many strains with different fitnesses (multiple $$v_w^{ij}$$) and consequently various κij, all of our observables (F, ΔP, and Fij) could again be collapsed onto a master curve by rescaling L by any one of the selection length scales (i.e. $$L/L_s^{ij}$$) and by rescaling ϕ by ϕc; the set of κij specified the master curve. An example of a simulation that exhibits collapsed dynamics for three κij can be seen in Supplementary S4 Fig.

To summarize the results of this section, we found that we could collapse the average fraction F, annihilation asymmetry ΔP, and the two-point correlation functions Fij by the rescalings in eqs (8)-(10), where the braces indicate a set of variables parameterized by i and j (i.e. $$\{v_w^{ij}\}$$ represents the set of all ij wall velocities). As long as L was rescaled by any selection length scale $$L_s^{ij}$$ and ϕ was rescaled by the characteristic angular correlation length ϕc, the set of {κij} completely dictated the evolutionary dynamics.

### Predicting experimental results with simulations

A major goal of this paper is to test if the annihilating and coalescing random-walk model can predict the experimental evolutionary dynamics of our four competing strains (alleles) with different fitnesses (radial expansion velocities). To the best of our knowledge, analytical results for the random-walk model are unavailable (as discussed in S1 Appendix); we consequently used our simulations to predict the dynamics. In this section we quantify the three key parameter combinations for our experimental expansions and then use them to predict the evolutionary dynamics of all four of our competing E. coli strains in an independent experiment.

In the last section, we found that our simulation dynamics could be collapsed onto master curves for a fixed set of κij by rescaling the length expanded L by any single $$L_s^{ij}$$ and by rescaling ϕ by ϕc. These simulated master curves were invariant to the alteration of simulation parameters provided that the set of κij remained the same. This insight allowed us to develop a novel method of characterizing the experimental dynamics. Namely, we could experimentally determine $$L_s^{ij}$$, κij, and ϕc, collapse the experimental data the same way as the simulations (i.e. $$L/L_s^{ij}$$, ϕ/ϕc), and compare the two to predict the dynamics of many competing alleles in a range expansion. As discussed below, this technique ultimately allowed for accurate predictions of the evolutionary dynamics of the four competing strains and, surprisingly, allowed us to make much more precise measurements of selective differences between strains.

As mentioned above in the Experimental Results section, using the procedures outlined in the Materials and Methods, we had previously determined R0 = 3.50 ± 0.05 mm and Dw = 0.100 ± 0.005 mm (Table 2). In order to fit $$L_s^{ij}$$ and κij, however, we needed to measure $$v_w^{ij}$$. By tracking the growth of a more fit sector sweeping through a less fit strain (see the Materials and methods), we found that each strain swept through mCherry with the wall velocity listed in Table 2; we could not detect the wall velocity of the eYFP and eCFP strains sweeping through the black strain.

In principle, the measured values of R0, Dw, and should have allowed us to totally calibrate the three key parameter combinations. For example, . The value of followed from the measurement of using the known value of R0. Unfortunately, the final parameter was more difficult to calibrate. Using , we found that the error on this value was too large for it to be predictive in our simulations. Furthermore, as we were unable to accurately measure the wall velocity of the eCFP/eYFP strains sweeping through the black strain, we could not calculate the corresponding selection length scale. We therefore needed a new technique to determine . As our eCFP and eYFP strains were neutral within error, we treated our system as composed of one neutral (N) eCFP/eYFP strain, a red (R) mCherry strain, and a black (B) strain (q = 3 colors). As the eCFP/eYFP expanded faster than the black followed by the mCherry strain, we needed to determine the values of , , and .

## Drift versus selection

Genetic drift and natural selection rarely occur in isolation from each other; both forces are always at play in a population. However, the degree to which alleles are affected by drift and selection varies according to circumstance.

In a large population, where genetic drift occurs very slowly, even weak selection on an allele will push its frequency upwards or downwards (depending on whether the allele is beneficial or harmful). However, if the population is very small, drift will predominate. In this case, weak selective effects may not be seen at all as the small changes in frequency they would produce are overshadowed by drift.
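The contrast between small and large populations can be seen in a small simulation. The sketch below is a minimal illustration, assuming a haploid Wright-Fisher population of N individuals, a single new copy of the allele, and a selection coefficient s = 0.05. The function name, parameter values, and trial counts are all hypothetical choices made for this example. With s = 0 the estimate should sit near the neutral expectation 1/N.

```python
import random

def fixation_probability(N, s, trials=1000, seed=0):
    """Estimate how often one new copy of an allele with advantage s reaches
    fixation in a haploid Wright-Fisher population of N individuals."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new mutant copy
        while 0 < count < N:
            p = count / N
            # Selection shifts the expected frequency in the next generation...
            p_sel = p * (1 + s) / (1 + p * s)
            # ...and binomial sampling of N offspring adds drift.
            count = sum(rng.random() < p_sel for _ in range(N))
        fixed += (count == N)
    return fixed / trials

for N in (10, 100):
    neutral = fixation_probability(N, s=0.0)
    beneficial = fixation_probability(N, s=0.05)
    print(f"N={N}: neutral {neutral:.3f} (expected 1/N = {1 / N:.3f}), "
          f"beneficial {beneficial:.3f}")
```

In the small population the beneficial allele should fix only modestly more often than a neutral one, whereas in the larger population the same s should raise the fixation probability several-fold relative to the neutral case.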

## Abstract

Species conservation can be improved by knowledge of evolutionary and genetic history. Tigers are among the most charismatic of endangered species and garner significant conservation attention. However, their evolutionary history and genomic variation remain poorly known, especially for Indian tigers. With 70% of the world’s wild tigers living in India, such knowledge is critical. We re-sequenced 65 individual tiger genomes representing most extant subspecies with a specific focus on tigers from India. As suggested by earlier studies, we found strong genetic differentiation between the putative tiger subspecies. Despite high total genomic diversity in India, individual tigers host longer runs of homozygosity, potentially suggesting recent inbreeding or founding events, possibly due to small and fragmented protected areas. We suggest the impacts of ongoing connectivity loss on inbreeding and persistence of Indian tigers be closely monitored. Surprisingly, demographic models suggest recent divergence (within the last 20,000 years) between subspecies and strong population bottlenecks. Amur tiger genomes revealed the strongest signals of selection related to metabolic adaptation to cold, whereas Sumatran tigers show evidence of weak selection for genes involved in body size regulation. We recommend detailed investigation of local adaptation in Amur and Sumatran tigers prior to initiating genetic rescue.

## 8.1 Consequences of Genetic Drift

The more skewed the allele frequency is away from $$0.5$$, the more likely the population will be to become fixed for one allele or the other.

### 8.1.1 Time to Fixation

In fact, there is a well-known relationship between the time to fixation and the combination of both allele frequency and population size. Namely, the expected time to fixation, tfix (in generations), for a population of size Ne with two alleles (occurring at frequencies p and q) is:

which can be evaluated across values of $$N$$ for $$p = 0.01, 0.1, 0.25, 0.5$$.
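Such values can be tabulated with a short script. This is a minimal sketch, assuming the standard diffusion-approximation result of Kimura and Ohta for the mean time to fixation conditional on the allele eventually fixing, $$t_{fix} = -4N_e(1-p)\ln(1-p)/p$$. This may not be the exact expression the passage has in mind, and the function name and the population sizes 10, 100, and 1000 are chosen here only for illustration.

```python
import math

def time_to_fixation(Ne, p):
    """Mean generations to fixation, conditional on fixation, for an allele
    starting at frequency p (Kimura-Ohta diffusion approximation)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4.0 * Ne * (1.0 - p) * math.log(1.0 - p) / p

print("p     " + "".join(f"Ne={Ne:<8}" for Ne in (10, 100, 1000)))
for p in (0.01, 0.1, 0.25, 0.5):
    row = "".join(f"{time_to_fixation(Ne, p):<11.1f}" for Ne in (10, 100, 1000))
    print(f"{p:<6}{row}")
```

Under this approximation the time scales linearly with Ne and approaches 4Ne generations as the starting frequency becomes small.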

This parameter Ne can deviate considerably from the census size $$N$$, depending upon several features of the organism's life history. We return to that later and discuss it in depth; for the time being, let's just assume it is a measure of the size of a population. That said, the stochastic sampling of alleles due only to population size can have significant effects on allele frequencies, available genetic diversity, and genotypic composition. In this simple two-allele system (often referred to as the Wright-Fisher model), if drift is the only feature that is influencing allele and genotype frequencies, the variance in allele frequencies through time has an expectation of:

$$\sigma^2_p = pq\left[1 - \exp\left(-\frac{t}{2N_e}\right)\right]$$

which if examined for changes in Ne for fixed p=0.5

Figure 8.1: Expected variance in allele frequencies through time for a Wright-Fisher model of genetic drift for three different effective population sizes.

or changes in p for fixed Ne = 100 are

Figure 8.2: Expected variance in allele frequencies through time for a 2-allele Wright-Fisher model of genetic drift across different starting allele frequencies.

The important distinctions between these two graphs are:

1. For different effective population sizes, the larger the population, the more stable the allele frequencies through time. For $$N_e = 1000$$, the variance in allele frequencies is relatively small compared to the other population sizes, even after 100 generations.
2. Allele frequencies show a different pattern: at low allele frequencies, the variance is smaller than at more intermediate allele frequencies (it is maximized when $$p = \frac{1}{\ell}$$, where $$\ell$$ is the number of alleles). There is more genetic variance with a more even distribution of allele frequencies (something we will return to when we talk about Fisher's Fundamental Theorem).
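Both points can be checked numerically from the drift variance expectation, $$\sigma^2_p = pq[1 - \exp(-t/(2N_e))]$$. The sketch below is illustrative only: the function name is hypothetical, and the population sizes (10, 100, 1000) and starting frequencies are assumed values, not necessarily those plotted in Figures 8.1 and 8.2.

```python
import math

def drift_variance(p, Ne, t):
    """Expected variance in allele frequency after t generations of pure drift."""
    q = 1.0 - p
    return p * q * (1.0 - math.exp(-t / (2.0 * Ne)))

t = 100  # generations

# Point 1: fixed p = 0.5, varying effective population size.
for Ne in (10, 100, 1000):
    print(f"Ne={Ne:<5} p=0.5  variance={drift_variance(0.5, Ne, t):.4f}")

# Point 2: fixed Ne = 100, varying starting allele frequency.
for p in (0.01, 0.1, 0.25, 0.5):
    print(f"Ne=100   p={p:<5} variance={drift_variance(p, 100, t):.4f}")
```

The variance approaches its ceiling pq as t grows, so populations with small Ne reach that ceiling quickly while large ones stay well below it over the same number of generations.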

### 8.1.2 Time to Allele Loss

Time to fixation relates to the loss of all other alleles, though in the data we often deal with, there is not a simple $$p=q$$, two-allele system. However, the models developed thus far can give us an idea of the expected time to allele loss by rearranging the expectations a bit.
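One way to make this concrete is the companion Kimura-Ohta diffusion result for the mean time to loss, conditional on the allele eventually being lost, $$t_{loss} = -4N_e\,p\ln(p)/(1-p)$$. This is offered as an assumed standard form rather than as the rearrangement the text intends, and the function name is hypothetical.

```python
import math

def time_to_loss(Ne, p):
    """Mean generations until loss, conditional on loss, for an allele
    starting at frequency p (Kimura-Ohta diffusion approximation)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4.0 * Ne * p * math.log(p) / (1.0 - p)

for p in (0.01, 0.1, 0.25, 0.5):
    print(f"p={p:<5} Ne=100  generations to loss = {time_to_loss(100, p):.1f}")
```

Rare alleles are lost quickly under this approximation, while at p = 0.5 the expected time to loss equals the expected time to fixation by symmetry.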

Genetic drift is a random change in the frequency of an allele within a population over time. For example, a population of rabbits may carry alleles for brown fur and white fur, with brown being dominant. By random chance, the offspring may all be brown, which could reduce or eliminate the allele for white fur.

Genetic drift is more important in small populations because the chances of an allele being lost or fixed are much higher: each individual in a small population represents a larger proportion of the entire population than in a large population.