Information

What fraction of sites are expected to be polymorphic?


Question

Consider a very long (eventually infinite) DNA sequence of neutral sites. Consider a panmictic population of constant size $N$ with a per-site mutation rate of $\mu$, in which all individuals have exactly the same fitness.

What is the fraction of sites that we'd expect to be polymorphic in the population (SNPs)?

Motivation behind this question

I am asking this question to verify the results of simulations I run. For example, I run a simulation with $x$ neutral sites ($x$ varies below), a per-site mutation rate of $\mu = 10^{-9}$ and a population size of $N=100$. I run the simulations for 10,000 generations. There is no recombination. When the number of sites is:

  • $x=10^3$ I get 0 SNP
  • $x=10^4$ I get 1 SNP
  • $x=10^5$ I get 3 SNPs
  • $x=10^6$ I get 25 SNPs
  • $x=10^7$ I get 238 SNPs

Is there a bug in my model or is it what we'd expect given the parameters?

In the human genome, about 1 out of 300 sites is polymorphic (SNPs) (ref.). This is a frequency of SNPs that is 100 times greater than what I observe in my simulations. Note, however, that the assumption of neutrality and our demographic assumptions would not perfectly hold for humans, so this result could be pretty far off the neutral expectation. My goal is not to reproduce something that looks like the human genome but only to reproduce the neutral expectations for the moment.


Reiterating the above comments: have a look at Tajima's D. The Watterson estimator it builds on gives the expected number of segregating sites for a population under a neutral mutation model.

The general form of the estimate for a diploid population is $E[S]=4N\mu\sum_{i=1}^{n-1} \frac{1}{i}$, where $n$ is the number of sampled sequences. Here the mutation rate is per genome, not per site, so $\mu = L \times 10^{-9}$, where $L$ is the genome size. Estimating the segregating sites of an entire population of $n=N=100$ with a genome size of $L=10^{7}$, and therefore a per-genome mutation rate of $\mu=10^{-2}$, one would expect $E[S] \approx 20.7$. So, your numbers seem higher than expected.
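As a rough check of the arithmetic in this answer, here is a minimal Python sketch of the expectation $E[S]=4N\mu\sum_{i=1}^{n-1} 1/i$. It takes $n=N=100$ at face value, as above, and sums from $i=1$ to $n-1$, which gives about 20.71 for $L=10^{7}$ (including the $i=n$ term as well would give about 20.75). The function names are illustrative only.

```python
def harmonic(n_terms):
    """Return sum_{i=1}^{n_terms} 1/i."""
    return sum(1.0 / i for i in range(1, n_terms + 1))


def expected_segregating_sites(pop_size, sample_size, mu_per_site, n_sites):
    """E[S] = 4 * N * mu * sum_{i=1}^{n-1} 1/i, with mu the per-genome rate."""
    mu_per_genome = mu_per_site * n_sites
    theta = 4 * pop_size * mu_per_genome
    return theta * harmonic(sample_size - 1)


# Parameters from the question: N = 100, per-site mu = 1e-9, varying L.
for n_sites in (10**3, 10**4, 10**5, 10**6, 10**7):
    e_s = expected_segregating_sites(100, 100, 1e-9, n_sites)
    print(f"L = {n_sites:>8}: E[S] = {e_s:.4f}")
```

Because $E[S]$ is linear in $L$ under this formula, the expected fraction of segregating sites is the same for every $L$, roughly $2 \times 10^{-6}$ per site with these parameters.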

I've written some example simulation software that is capable of performing such evolutionary scenarios (Clotho manuscript). Similarly, you can check your numbers against a population generated using MS.


The fraction of polymorphic sites that exist in a population is dependent on the biology of the organism. For instance, you would expect to find different rates of polymorphism in related plants that have different breeding systems, e.g. in Silene [1]. Past bottlenecks are also expected to decrease polymorphisms [2]. So, the answer to your question would depend on the exact species and population that you are looking at.


We included a script to calculate this in the supplemental material:

http://onlinelibrary.wiley.com/doi/10.1111/mec.13034/full

… single segregating site per locus or up to a maximum of four SNPs, as is expected for short-read genomic data (see attached R script for estimation).


Restriction fragment length polymorphism

In molecular biology, restriction fragment length polymorphism (RFLP) is a technique that exploits variations in homologous DNA sequences, known as polymorphisms, in order to distinguish individuals, populations, or species or to pinpoint the locations of genes within a sequence. The term may refer to a polymorphism itself, as detected through the differing locations of restriction enzyme sites, or to a related laboratory technique by which such differences can be illustrated. In RFLP analysis, a DNA sample is digested into fragments by one or more restriction enzymes, and the resulting restriction fragments are then separated by gel electrophoresis according to their size.

Although now largely obsolete due to the emergence of inexpensive DNA sequencing technologies, RFLP analysis was the first DNA profiling technique inexpensive enough to see widespread application. RFLP analysis was an important early tool in genome mapping, localization of genes for genetic disorders, determination of risk for disease, and paternity testing.


Introduction

The discovery of clustered regularly interspaced short palindromic repeats (CRISPR) in bacteria [1,2] and the Cas9 enzyme (CRISPR associated protein 9) [3,4,5] has revolutionized our capacity to genetically engineer a wide range of organisms. The subsequent development of CRISPR-Cas9-based gene drives [6] has further increased the potential application of this technology. Gene drives promote the spread of introduced genetic elements (e.g., alternative alleles, exogenous genes) through populations by altering the way in which they are inherited, such that the desired genetic element is over-represented among progeny (“Super-Mendelian inheritance”) [7]. This leads to an increase in frequency of the introduced genetic element, potentially until fixation in the targeted population.

One application of CRISPR-Cas9 gene drive that has gained a great deal of attention is the possibility of controlling populations of disease vectors like mosquitoes. The focus of current efforts is Anopheles gambiae and An. coluzzii, which transmit malaria, and Aedes aegypti, which transmits dengue, chikungunya, yellow fever, and Zika. Collectively, these diseases cause hundreds of thousands of human deaths per year [8]. New strategies for controlling these vectors are sorely needed because currently available control methods are costly, increasingly ineffective due to insecticide resistance [9], and generally difficult to deploy in rural endemic areas. Alternative genetic-based strategies for vector control are not new; however, the recent advances in genetic engineering and gene drive have sparked increased interest in this approach. There are two broad categories of strategies involving genetically engineered mosquitoes (GEM) with gene drive currently under development: population suppression, aimed at greatly reducing or eliminating the mosquito population [10], and population modification, which renders mosquitoes incapable of transmitting a pathogen but otherwise leaves the population unaltered [11]. Recently, CRISPR-Cas9-based gene drive systems have been designed for population modification in Anopheles [12] and Aedes [13] and for population suppression in Anopheles [10,14] mosquitoes.

Experiments demonstrating the capacity of gene-drive constructs to spread through wild-type populations in laboratory cages have yielded promising results [10]. A major limitation of these experiments is that they use populations of mosquitoes derived from long-standing laboratory colonies that do not replicate populations as they occur in nature [15,16]. Specifically, founder effects during establishment, repeated bottlenecks experienced during maintenance, and selection for adaptation to the laboratory environment in these colonies all result in the loss of genetic variability relative to their counterparts in nature [17,18,19].

Recently, several population genomic studies have amassed a large volume of genomic data from natural populations of An. gambiae [20,21], An. coluzzii [21], and Ae. aegypti [22]. These surveys revealed exceptionally high levels of genetic variability, leading some authors to warn that CRISPR-Cas9-based gene-drive systems (CGD) may be prone to failure due to drive resistance resulting from standing genetic variation. This includes uncleavable alleles within the target sequence that are not recognized by the guide RNA [21,23]. A study of the impact of drive resistance alleles (DRAs) on the performance of CGD in natural populations of the flour beetle, Tribolium castaneum, concluded that population-specific rare alleles will probably reduce or eliminate drive efficacy [24]. General modeling approaches revealed that standing genetic variation could even exceed de novo mutations in contributing to CGD resistance [25]. Given the interest in the development of CGD, a systematic evaluation of the distribution of polymorphisms within the genomes of these critical mosquito species and its impact on potential target sites for CRISPR-Cas9 editing is warranted.

Here we present genome-wide screens of the three principal human disease vector species An. gambiae, An. coluzzii, and Ae. aegypti for the presence of CRISPR-Cas9 target sites and an analysis of the degree of polymorphism therein. In detail, we search all transcribed regions of protein-coding genes in the species’ reference genomes for potential CRISPR-Cas9 target sites. We then subject each target site to a screen for nucleotide polymorphisms (single nucleotide polymorphisms, insertions, deletions) in the genomes of mosquitoes sampled directly from natural populations. Our analyses include 111 An. gambiae, 100 An. coluzzii, and 132 Ae. aegypti genomes from our lab plus publicly available polymorphism data from 937 additional An. gambiae s.l. samples. The special interest in An. gambiae as the principal vector of malaria in Africa has resulted in more individual mosquito sequence data for this species than for any other mosquito species. Additional insights gained from including the larger number of sequences compared with An. coluzzii and Ae. aegypti outweigh the benefits of having equal numbers per species. We find that >30% of protein-coding genes have potential CRISPR-Cas9 targets with GC content between 30 and 70% and no off-target sequence. This drops to 8.4% if sites with DRAs at frequencies >1% in natural populations are excluded. Nonetheless, 90% of all protein-coding genes contain at least one target site that remains after this filtering. Based on these observations we conclude that DRAs within the standing variation that exists in natural populations of the mosquito species studied will not pose a problem to the successful deployment of CRISPR-Cas9-based gene drive for population modification strategies. Gene drives used as part of population suppression strategies are more likely to be unsustainable because of the presence of low-frequency DRAs and the fact that such drives impose much stronger selection favoring them.
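The first screening step described above (finding candidate target sites and filtering on GC content) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes a 20-nt protospacer followed by an NGG PAM (the usual SpCas9 convention), scans the forward strand only, and omits the off-target and polymorphism (DRA) filters entirely. The function names and the toy sequences are hypothetical.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in an upper-case DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_candidate_targets(seq, min_gc=0.30, max_gc=0.70):
    """Return (start, protospacer, pam) for every 20-nt site followed by NGG.

    Assumptions: forward strand only, NGG PAM, GC filter on the protospacer.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        protospacer, pam = match.group(1), match.group(2)
        if min_gc <= gc_fraction(protospacer) <= max_gc:
            hits.append((match.start(), protospacer, pam))
    return hits


# Toy sequences (hypothetical): one passes the GC filter, one does not.
balanced = "ATGC" * 5 + "AGG"   # protospacer GC content is 50%
at_rich = "A" * 20 + "TGG"      # protospacer GC content is 0%
print(find_candidate_targets(balanced))
print(find_candidate_targets(at_rich))
```

A fuller screen would also scan the reverse complement and then drop sites that overlap variants above a chosen frequency threshold in the population samples.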


Author Summary

Population genomics, the study of genome-wide patterns of sequence variation within and between closely related species, can provide a comprehensive view of the relative importance of mutation, recombination, natural selection, and genetic drift in evolution. It can also provide fundamental insights into the biological attributes of organisms that are specifically shaped by adaptive evolution. One approach for generating population genomic datasets is to align DNA sequences from whole-genome shotgun projects to a standard reference sequence. We used this approach to carry out whole-genome analysis of polymorphism and divergence in Drosophila simulans, a close relative of the model system, D. melanogaster. We find that polymorphism and divergence fluctuate on a large scale across the genome and that these fluctuations are probably explained by natural selection rather than by variation in mutation rates. Our analysis suggests that adaptive protein evolution is common and is often related to biological processes that may be associated with gene expression, chromosome biology, and reproduction. The approaches presented here will have broad applicability to future analysis of population genomic variation in other systems, including humans.

Citation: Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh Y-P, Hahn MW, et al. (2007) Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans. PLoS Biol 5(11): e310. https://doi.org/10.1371/journal.pbio.0050310

Academic Editor: Mohamed A. F. Noor, Duke University, United States of America

Received: March 19, 2007 Accepted: September 26, 2007 Published: November 6, 2007

Copyright: © 2007 Begun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: DJB was supported by National Institutes of Health (NIH) Grant R01 GM071926. CHL was supported by NIH R01HG2107–3 and NIH HG02942-01A1. AKH was supported by a National Science Foundation (NSF) Postdoctoral Fellowship in Biological Informatics (Grant No. 0434670). MWH and PMN were supported by an Indiana University-Purdue University Collaborations in Life Sciences and Informatics Research grant to MWH. MWH was also supported by an NSF Postdoctoral Fellowship in Biological Informatics while he was at UC-Davis. CDJ was supported by NSF grant DEB 0512106. ADK was supported by a Howard Hughes Predoctoral Fellowship. Y-P. Poh was supported by a graduate student fellowship (Academica Sinica) and Grant NSC 9402917-1-007–011. LP and CD were supported by NIH R01-HG02362–03 and NSF grant CCF 03–47992. Generation of the D. simulans and D. yakuba sequences was supported by grants from the National Human Genome Research Institute.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: CDS, coding sequence; GO, gene ontology; indel, insertion/deletion; MK test, McDonald and Kreitman test; UTR, untranslated region


Selection, whether natural or artificial, changes the frequency of morphs within a population; this occurs when morphs reproduce with different degrees of success. A genetic (or balanced) polymorphism usually persists over many generations, maintained by two or more opposed and powerful selection pressures. [6] Diver (1929) found that banding morphs in Cepaea nemoralis could be seen in prefossil shells going back to the Mesolithic Holocene. [10] [11] Non-human apes have similar blood groups to humans; this strongly suggests that this kind of polymorphism is ancient, at least as far back as the last common ancestor of the apes and man, and possibly even further.


The relative proportions of the morphs may vary; the actual values are determined by the effective fitness of the morphs at a particular time and place. The mechanism of heterozygote advantage assures the population of some alternative alleles at the locus or loci involved. Only if competing selection disappears will an allele disappear. However, heterozygote advantage is not the only way a polymorphism can be maintained. Apostatic selection, whereby a predator consumes a common morph whilst overlooking rarer morphs, is possible and does occur. This would tend to preserve rarer morphs from extinction.

Polymorphism is strongly tied to the adaptation of a species to its environment, which may vary in colour, food supply, and predation and in many other ways. Polymorphism is one good way for a species to use these varied opportunities; it has survival value, and the selection of modifier genes may reinforce the polymorphism. In addition, polymorphism seems to be associated with a higher rate of speciation.

Polymorphism and niche diversity

G. Evelyn Hutchinson, a founder of niche research, commented "It is very likely from an ecological point of view that all species, or at least all common species, consist of populations adapted to more than one niche". [13] He gave as examples sexual size dimorphism and mimicry. In many cases where the male is short-lived and smaller than the female, he does not compete with her during her late pre-adult and adult life. Size difference may permit both sexes to exploit different niches. In elaborate cases of mimicry, such as the African butterfly Papilio dardanus, [4] : ch. 13 female morphs mimic a range of distasteful models, often in the same region. The fitness of each type of mimic decreases as it becomes more common, so the polymorphism is maintained by frequency-dependent selection. Thus the efficiency of the mimicry is maintained in a much increased total population.

The switch

The mechanism which decides which of several morphs an individual displays is called the switch. This switch may be genetic, or it may be environmental. Taking sex determination as the example, in humans the determination is genetic, by the XY sex-determination system. In Hymenoptera (ants, bees and wasps), sex determination is by haplo-diploidy: the females are all diploid, the males are haploid. However, in some animals an environmental trigger determines the sex: alligators are a famous case in point. In ants the distinction between workers and guards is environmental, by the feeding of the grubs. Polymorphism with an environmental trigger is called polyphenism.

The polyphenic system does have a degree of environmental flexibility not present in the genetic polymorphism. However, such environmental triggers are the less common of the two methods.

Investigative methods

Investigation of polymorphism requires use of both field and laboratory techniques. In the field:

  • detailed survey of occurrence, habits and predation
  • selection of an ecological area or areas, with well-defined boundaries
  • capture, mark, release, recapture data (see Mark and recapture)
  • relative numbers and distribution of morphs
  • estimation of population sizes

And in the laboratory:

  • genetic data from crosses
  • population cages
  • chromosome cytology if possible
  • use of chromatography or similar techniques if morphs are cryptic (for example, biochemical)

Without proper field-work, the significance of the polymorphism to the species is uncertain, and without laboratory breeding the genetic basis is obscure. Even with insects, the work may take many years; examples of Batesian mimicry noted in the nineteenth century are still being researched.


Discussion

Statistics, coverage, and sequencing errors

It is striking that p-values for the non-divergent sites increase with coverage. For instance, out of the 36000 non-divergent sites, we expect approximately 36 sites by chance to have a p-value less than $10^{-3}$. For 10x coverage, we find 9, for 20x, we find 35, and for 40x we find 70. This indicates that p-values are biased upwards with increasing coverage, and must consequently be interpreted with care [13]. The expectation and variance of $F_{ST}$ similarly depend on coverage. In contrast, low coverage in combination with sequencing errors and incorrectly mapped reads here results in a large number of high-scoring non-divergent sites. Using a combination of these measures may be effective, but also effectively narrows the data set, much like a stringent filtering for coverage.
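The "approximately 36 sites by chance" figure, and how far the observed counts sit from it, can be checked with a short sketch. It assumes that sites are independent and that null p-values are uniformly distributed, so that the number of sites below the threshold is binomial; those assumptions and the function name are illustrative and not taken from the paper.

```python
import math


def null_count_summary(n_sites, threshold):
    """Expected count and standard deviation of sites with p < threshold,
    assuming independent sites with uniformly distributed null p-values."""
    expected = n_sites * threshold
    sd = math.sqrt(n_sites * threshold * (1.0 - threshold))
    return expected, sd


expected, sd = null_count_summary(36000, 1e-3)
observed = {"10x": 9, "20x": 35, "40x": 70}  # counts quoted in the text
z_scores = {cov: (count - expected) / sd for cov, count in observed.items()}
for cov, z in z_scores.items():
    print(f"{cov}: observed {observed[cov]}, expected {expected:.0f}, z = {z:+.2f}")
```

Under these assumptions the 10x count falls several standard deviations below the expectation and the 40x count several above it, which is the coverage dependence the paragraph describes.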

Simulated data is by definition a simplification of reality. For instance, here the data assumes uniform probability of reads across the genome, and unbiased and context-independent sequencing errors. Also, divergent and non-divergent positions occur in similar numbers in the simulated data; in reality, there will be a continuous spectrum of allele frequencies, which will depend globally on the degree of divergence between populations, and locally on selection and other non-random evolutionary pressures. Results from simulated data must, as always, be interpreted as optimistic. In practice, coverage will vary substantially across a sequenced genome. In general, high-variance regions tend to have lower mapping [22], but other factors include bias caused by GC content, misassembly and collapsed repeats, copy-number and other structural variations, incorrect mapping, and sampling bias (including from variation of molarity in DNA samples). Real data sets must therefore be expected to contain a wide range of coverages, mapping reliability, and sequencing error rates.

Other information theoretic measures

Although not commonly applied, information theoretic measures have been used previously in analyzing genetic variation. Expected site information is related to Kullback-Leibler divergence [23], but differs in that it is symmetric and extended to multiple alleles. Rosenberg [6] gives a summary of several alternative statistics, and also develops an information theoretic measure that contrasts individual populations with an average of all populations. This measure is then used to infer ancestry, and applied to microsatellite data. Here, we develop an information theoretic measure in a Bayesian context, and apply it to high-throughput sequencing data.
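To make the remark about symmetry concrete, the sketch below computes the plain Kullback-Leibler divergence between two allele-frequency distributions in both directions. It illustrates only the standard KL divergence, not the expected site information developed in the paper; the allele frequencies are made up, and the summed form shown is just one generic way to symmetrize.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in bits for two discrete distributions over the same alleles.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetrized_kl(p, q):
    """Sum of the two directed divergences (one generic symmetric variant)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical biallelic site: allele frequencies in two populations.
pop_a = [0.9, 0.1]
pop_b = [0.5, 0.5]
print(f"D(a||b) = {kl_divergence(pop_a, pop_b):.4f} bits")
print(f"D(b||a) = {kl_divergence(pop_b, pop_a):.4f} bits")
print(f"symmetrized = {symmetrized_kl(pop_a, pop_b):.4f} bits")
```

The two directed values differ, which is why a measure intended to compare populations without privileging one of them needs some symmetric construction.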

Dealing with sequencing errors and artifacts

Based on the assumption that most sequencing errors will be singletons, Achaz [24] developed variants of several estimators for Θ that avoid taking singletons into account. Achaz' formulas were later adapted to high-throughput sequencing experiments, and given a more generalized (but approximate) form that allowed an arbitrary lower bound on number of observed alleles [4]. However, much of the genetic diversity is in the form of low frequency alleles, and as singletons also have a high impact on many statistics [24], these estimators have lower power [24, 2]. It is also possible to attempt to quantify the errors more precisely by leveraging characteristics of the data [5].
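The idea of a singleton-free estimator can be sketched from the standard neutral expectation that the number of sites with $i$ derived copies is $\Theta/i$. The code below is an illustration under that assumption, using an unfolded site frequency spectrum with known ancestral states; it is not claimed to reproduce the exact formulas of [24] or [4], and the function names and toy spectra are hypothetical.

```python
def a_n(n):
    """Sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs):
    """Watterson-style estimate: S / a_n.

    sfs[i-1] is the number of sites where the derived allele occurs i times,
    for i = 1..n-1 (unfolded spectrum).
    """
    n = len(sfs) + 1
    return sum(sfs) / a_n(n)


def theta_without_singletons(sfs):
    """Same idea, but ignoring singletons: (S - xi_1) / (a_n - 1)."""
    n = len(sfs) + 1
    return sum(sfs[1:]) / (a_n(n) - 1.0)


# Toy spectrum matching the neutral expectation for theta = 12, n = 5.
clean = [12, 6, 4, 3]
# Same spectrum with 8 extra singletons standing in for sequencing errors.
noisy = [20, 6, 4, 3]
for name, sfs in (("clean", clean), ("noisy", noisy)):
    print(name, round(watterson_theta(sfs), 2), round(theta_without_singletons(sfs), 2))
```

In this toy case the extra singletons inflate the standard estimate while the singleton-free one is unchanged, at the cost of discarding the real singletons, which is the loss of power noted above.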

Future work

Here, we have focused on the expected information content. As this is an additive measure, it is straightforward to sum over multiple sites to get the expected information for a set of SNPs. Since rare alleles yield more information than common ones, a natural extension might be to consider instead the minimum information content from a set of loci, ensuring that we can reach a conclusion even if we are unlucky with the actual alleles observed. Yet another option is to calculate a confidence interval for the information.


Discussion

I. elegans is a geographically widely distributed damselfly species that occurs across the Palearctic (Boudot & Salamun, 2015) and has become a well-known study system in evolutionary ecology. Population studies on I. elegans have addressed a variety of topics, including the age-, frequency- and density-dependent dynamics of male-mating harassment (Svensson et al., 2005; Gosden & Svensson, 2009; Van Gossum et al., 2011; Willink et al., 2019) and the maintenance of heritable morphs by balancing selection (Takahashi et al., 2014; Le Rouzic et al., 2015). Population studies with I. elegans have also been used to investigate phenotypic correlations between alternative mechanisms of defence against parasites (Willink & Svensson, 2017), and the role of environmental and social effects on temperature sensitivity and range expansion (Lancaster et al., 2017). These studies rely on female colour morphs as visual genetic markers of suites of correlated traits on which natural and sexual selection operate. However, the majority of such field studies have taken place near the northern limit of the distribution range of I. elegans. How these phenomena are influenced by large-scale environmental variation is therefore an important question, which could be addressed with comparative studies across the geographic range of I. elegans. As a first step on the road towards such a broader geographic scope of studies of I. elegans, here we present data on the basic population biology and phenology of this trimorphic species in Cyprus. This is the southernmost region where populations of this widespread species have been systematically studied to date.

Three main features distinguish breeding populations in Cyprus from those in Northern Europe. Firstly, male-mimicking A-females are the minority morph in Cyprus, occurring only at ∼5% frequency, while I- and O-females occur at similarly high frequencies (Fig. 2a,b). Such a low frequency of A-females is striking, given that this may be the most visually conspicuous morph to human observers. A-females display a bright blue colouration throughout their adult life, which may increase their detectability compared to the other morphs, which develop a darker and duller colour pattern during sexual maturation (Henze et al., 2019; Willink et al., 2020). In contrast, the frequency of A-females increases with latitude, with females in Southern Sweden typically being composed of 60–80% male mimics (Gosden et al., 2011; Le Rouzic et al., 2015). The increasing frequency of A-females in northern Europe is likely due to their developmental advantages at cooler temperatures, whereby A-females enjoy higher pre-reproductive survival and faster sexual maturation and colour development (Svensson et al., 2020). The developmental success of I- and O-females after emergence from the last nymphal stage is in contrast more sensitive to temperature, and these two morphs are therefore expected to benefit more from the warmer Mediterranean climate of Cyprus (Svensson et al., 2020).

The low island-wide frequency of A-females might imply that some populations in Cyprus are effectively dimorphic, with only I- and O-females. In fact, a previous survey across continental Europe suggested that about 10% of populations were dimorphic, although they all included A-females (Gosden et al., 2011). The loss of a female morph may cause increased pre-mating and mating harassment in the other two morphs, as males would have fewer competing targets while forming a search image of potential mates (Dukas & Kamil, 2001). However, if male-mimicry is effective, the local absence of otherwise rare A-females should not dramatically alter male-mating harassment in the other two morphs, which would already account for most male-mating attempts under negative frequency-dependent selection. The local absence of A-females might also be temporary. In I. elegans, the A-allele is dominant over I- and O-alleles. While dominant alleles under negative frequency-dependent selection are more likely to be lost by genetic drift, they also re-invade populations more easily due to Haldane's Sieve, the expectation that a weakly advantageous mutation will increase more rapidly in frequency if dominant (Haldane, 1924; Pannell et al., 2005). Haldane's Sieve acting on migrants might thus contribute to the maintenance of the dominant A-allele at a regional scale (Roux & Pannell, 2019).

As in Cyprus, A-females are usually not the majority morph in the south of continental Europe, but typically I-females are more common than O-females (Gosden et al., 2011; Svensson et al., 2020). In the closely related species Ischnura genei, populations in the Mediterranean island of Sardinia also have relatively low frequencies of A-females, but O-females are generally more common than I-females (Sanmartín-Villar & Cordero-Rivera, 2016). Although several ecological differences between A- and both I- and O-females have previously been reported for I. elegans (Lancaster et al., 2017; Willink & Svensson, 2017; Svensson et al., 2020), ecological differences between I- and O-morphs have not been investigated to a similar extent, probably because O-females are so rare in northern Europe. The role of ecological mechanisms versus historical contingencies and ‘island effects’ that might shape morph-frequency variation within the Mediterranean region therefore remains an interesting question that should be addressed in the future, to get a better understanding of the ecological factors and evolutionary processes operating in this trimorphic system.

Secondly, adult individuals of I. elegans are relatively small in Cyprus (Fig. 3). This is particularly the case for males, which are also generally smaller than females (Abbott & Gosden, 2009), and when compared to the reported European range of total body length and hind wing length (Dijkstra & Lewington, 2006; Fig. 3b,c). This qualitative result is consistent with Bergmann's rule, a pattern of increasing body size with decreasing temperature. Bergmann's rule is supported in vertebrate endotherms (Ashton et al., 2000; Meiri & Dayan, 2003; Salewski & Watt, 2017; but see Riemer et al., 2018), but is not generally supported across diverse clades of insects and other ectotherms (Mousseau, 1997; Blanckenhorn & Demont, 2004; Adams & Church, 2008; Shelomi, 2012; Wonglersak et al., 2020). One mechanistic explanation for Bergmann's rule that is applicable for I. elegans, and odonates in general, is that developmental rate (i.e. cell division and differentiation) increases more rapidly with temperature than does metabolism (Blanckenhorn & Demont, 2004). Therefore, higher developmental temperature at more southern latitudes should result in faster maturation at a smaller body size. Because growth in odonates occurs only during the aquatic larval stage, differences in developmental temperature should affect both the duration of the growing period (i.e. voltinism) and the adult body size (Johansson, 2003; De Block et al., 2008; Hassall et al., 2014). Our study did not address whether Bergmann's rule is met throughout the distribution range of I. elegans. However, a decreasing number of generations per year at higher latitudes has been previously reported (Corbet et al., 2006), and the extended flight season of I. elegans in Cyprus (see below) also suggests that populations on this island are multivoltine.

A-females in Cyprus were more male-like in size than either I- or O-females. This is consistent with previous analyses in Swedish populations, showing that A-females are more male-like in shape than the other morphs (Abbott & Gosden, 2009). Recent studies in female-polymorphic insects, including the widespread tropical and subtropical I. senegalensis (a congener of I. elegans that has been intensively studied in Japan), suggest that the development of male-coloured females is more masculinised, compared to the development of alternative female morphs (Takahashi et al., 2019). These developmental differences between morphs may be caused by alternative splicing and differential expression patterns of the regulatory gene doublesex (Takahashi et al., 2019, 2020), which also underlies the development of somatic sex differences across many insect taxa (Kopp, 2012). To date, there is no direct evidence of a male-like expression pattern of dsx in A-females of I. elegans. However, the locus or set of tightly linked loci that govern colour morph development in I. elegans seems to have pleiotropic effects during colour development and differentiation of the female morphs (Willink et al., 2020). Such widespread pleiotropy may also impact the rate and duration of larval development, in turn generating size differences between morphs. In southern Sweden, the larval developmental period is shorter in the offspring of O-females, suggesting this allele is associated with a faster developmental rate (Abbott & Svensson, 2005). In contrast, the A-allele may be associated with a slower growth rate, as A-females have a similar developmental period as I-females but become mature at a smaller size (Abbott & Svensson, 2005). Whether these developmental differences occur at warmer temperatures has not been investigated.

Finally, the flight season, during which adult damselflies emerge and can potentially mate, is considerably longer in Cyprus than in any other location with comparable data (Fig. 4a; Boudot & Salamun, 2015). This is consistent with overall fast development and multivoltinism caused by warm to mild temperatures throughout the year. Flight activity in Cyprus spanned more than 9 months (Fig. 4a), resulting in partial evidence for a seasonal pattern in the island-wide dataset (Table 2; Fig. 4b). Some core localities with long-term data had even less support for seasonality in flight activity (Table 2). Variation in seasonality among localities could be explained by regional differences in the length of growth periods and breeding seasons, for instance due to an altitudinal temperature gradient. Although a test of this hypothesis would entail estimating seasonal strength from a larger number of localities with long-term data, such a parallel between latitudinal and altitudinal gradients in the seasonality of (potential) mating activity is known for other ectotherms (Morrison & Hero, 2003). Nevertheless, the marked contrast between the flight season of I. elegans in Cyprus and more Northern European sites (Fig. 4a) suggests a pervasive role of temperature influencing life-history evolution in this widely distributed damselfly.

In conclusion, species with broad distribution ranges, such as I. elegans, provide excellent opportunities to investigate how large-scale climatic variation shapes the phenotypic outcomes of selection driven by local interspecific and intraspecific interactions. However, for many widespread species, local field studies tend to closely match the geographic distribution of the scientists who study them, which in turn reflects economic factors and research traditions, rather than strict biological considerations, and can lead to biases in the perception of which ecological factors are most important (Zuk, 2016). Here, we have studied the population biology of the widely distributed damselfly I. elegans in Cyprus, which is the southern range limit of this species in Europe. Populations of I. elegans in Cyprus are distinguished by the rarity of male-mimicking females, reduced body size, and a long flight season. These ecological differences from northern Europe, where the majority of field studies have been conducted, underscore the importance of broadening the geographic scope of field studies in I. elegans and many other widespread organisms.


Methods

We searched for all bird species in which true colour polymorphism is known to occur, by consulting the existing literature on the topic and a number of books devoted to birds of the world or specific geographical areas (King & Dickinson, 1975; Howard & Moore, 1980; Brown et al., 1992; Fry et al., 1992; Meyer De Schauensee, 1992; Pizzey & Knight, 1997; Cleere & Nurney, 1998; Cramp, 1998; Grimmet et al., 1998; Westoll, 1998; del Hoyo et al., 1999; Scott, 1999). We considered as polymorphic those species in which colour polymorphism occurred in one or in both sexes, regardless of age-specific plumage variation. We recorded, when available, the occurrence of clinal variation in the relative frequency of morphs (morph-ratio) in order to identify potential environmental factors that change in parallel with morph frequency. For descriptive purposes, morph-ratio clines were classified according to type (i.e. geographic, climatic and habitat clines). Then, clines were assigned to one of the following categories: (a) clines with a clear morph-background matching, e.g. pale morph in open habitat, dark morph in closed habitat; (b) clines with the opposite trend, e.g. pale morph in closed habitat and dark morph in open habitat; and (c) clines that could not be interpreted in either direction. Finally, we compared the number of clines in category (a) with the number of clines in category (b) using a binomial test.
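The comparison of category (a) against category (b) can be sketched as an exact two-sided binomial test against an expected proportion of 0.5. The text does not say whether the test was one- or two-sided, and the counts below are hypothetical placeholders, not results from the study; the function name is likewise only illustrative.

```python
from math import comb


def binomial_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k out of n trials.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts, for illustration only (not taken from the study).
clines_matching = 15   # category (a): morph matches background
clines_opposite = 5    # category (b): opposite trend

p_value = binomial_test_two_sided(
    clines_matching, clines_matching + clines_opposite
)
print(f"two-sided p = {p_value:.4f}")
```

Clines in category (c) are left out of the test because they carry no direction.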


Ecology

Selection, whether natural or artificial, changes the frequency of morphs within a population; this occurs when morphs reproduce with different degrees of success. A genetic (or balanced) polymorphism usually persists over many generations, maintained by two or more opposed and powerful selection pressures. [9] Diver (1929) found that banding morphs in Cepaea nemoralis could be seen in pre-fossil shells going back to the Mesolithic Holocene. [12] [13] Apes have blood groups similar to those of humans; this suggests rather strongly that this kind of polymorphism is quite ancient, dating at least as far back as the last common ancestor of the apes and man, and possibly even further.

The relative proportions of the morphs may vary; the actual values are determined by the effective fitness of the morphs at a particular time and place. The mechanism of heterozygote advantage assures the population of some alternative alleles at the locus or loci involved. Only if competing selection disappears will an allele disappear. However, heterozygote advantage is not the only way a polymorphism can be maintained. Apostatic selection, whereby a predator consumes a common morph whilst overlooking rarer morphs, is possible and does occur. This would tend to preserve rarer morphs from extinction.
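How heterozygote advantage keeps both alleles in the population can be illustrated with the standard one-locus, two-allele selection model. This is a minimal sketch, assuming an infinite, randomly mating population with relative fitnesses 1 - s (AA), 1 (Aa) and 1 - t (aa); the coefficients s and t and the function names are illustrative assumptions, not values from the text.

```python
def next_allele_freq(p: float, s: float, t: float) -> float:
    """One generation of viability selection on allele A at frequency p.

    Assumed relative fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t.
    """
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * p * (1 - s) + p * q) / mean_fitness


def iterate_freq(p0: float, s: float, t: float, generations: int) -> float:
    """Apply the selection recursion repeatedly, starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_allele_freq(p, s, t)
    return p


# Hypothetical selection coefficients against the two homozygotes.
s, t = 0.1, 0.2
predicted = t / (s + t)  # equilibrium frequency of A under this model

for start in (0.01, 0.99):
    final = iterate_freq(start, s, t, generations=2000)
    print(f"start {start:.2f} -> {final:.4f} (predicted {predicted:.4f})")
```

Under these assumptions the allele frequency approaches t / (s + t) from either side, so neither allele is lost while both homozygotes remain at a disadvantage.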

Polymorphism is closely tied to the adaptation of a species to its environment, which may vary in colour, food supply, predation and in many other ways. It is one way in which these varied opportunities can be exploited; it has survival value, and the selection of modifier genes may reinforce the polymorphism. In addition, polymorphism seems to be associated with a higher rate of speciation (Hugall & Stuart-Fox 2012).

Polymorphism and niche diversity

G. Evelyn Hutchinson, a founder of niche research, commented "It is very likely from an ecological point of view that all species, or at least all common species, consist of populations adapted to more than one niche". [15] He gave as examples sexual size dimorphism and mimicry. In many cases where the male is short-lived and smaller than the female, he does not compete with her during her late pre-adult and adult life. Size difference may permit both sexes to exploit different niches. In elaborate cases of mimicry, such as the African butterfly Papilio dardanus, [6] : ch. 13 female morphs mimic a range of distasteful models, often in the same region. The fitness of each type of mimic decreases as it becomes more common, so the polymorphism is maintained by frequency-dependent selection. Thus the efficiency of the mimicry is maintained in a much increased total population.
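The statement that each mimic's fitness falls as it becomes more common can be turned into a small numerical sketch. The model below is an assumption made for illustration: two clonally inherited morphs, each with fitness 1 - c * f, where f is the morph's own frequency and c is a hypothetical cost per unit frequency. It is not a model of Papilio dardanus genetics.

```python
def next_morph_freq(f: float, c1: float, c2: float) -> float:
    """One generation of negative frequency-dependent selection.

    f is the frequency of morph 1. Each morph's fitness declines
    linearly with its own frequency (assumed form: 1 - c * frequency).
    """
    w1 = 1.0 - c1 * f
    w2 = 1.0 - c2 * (1.0 - f)
    mean_fitness = f * w1 + (1.0 - f) * w2
    return f * w1 / mean_fitness


def run_morphs(f0: float, c1: float, c2: float, generations: int) -> float:
    """Iterate the recursion from an initial frequency f0."""
    f = f0
    for _ in range(generations):
        f = next_morph_freq(f, c1, c2)
    return f


# Hypothetical costs: morph 1 loses protection faster as it spreads.
c1, c2 = 0.6, 0.3
balance = c2 / (c1 + c2)  # frequency at which both fitnesses are equal

for start in (0.05, 0.9):
    print(f"start {start:.2f} -> {run_morphs(start, c1, c2, 500):.4f}")
print(f"fitnesses balance at {balance:.4f}")
```

Whichever morph is rarer than its balance point has the higher fitness and increases, which is why both morphs persist in this sketch.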

The switch

The mechanism which decides which of several morphs an individual displays is called the switch. This switch may be genetic, or it may be environmental. Taking sex determination as the example, in humans the determination is genetic, by the XY sex-determination system. In Hymenoptera (ants, bees and wasps), sex determination is by haplo-diploidy: the females are all diploid, the males are haploid. However, in some animals an environmental trigger determines the sex: alligators are a famous case in point. In ants the distinction between workers and guards is environmental, by the feeding of the grubs. Polymorphism with an environmental trigger is called polyphenism.

The polyphenic system does have a degree of environmental flexibility not present in the genetic polymorphism. However, such environmental triggers are the less common of the two methods.

Investigative methods

Investigation of polymorphism requires use of both field and laboratory techniques. In the field:

  • detailed survey of occurrence, habits and predation
  • selection of an ecological area or areas, with well-defined boundaries
  • capture, mark, release, recapture data (see Mark and recapture)
  • relative numbers and distribution of morphs
  • estimation of population sizes

In the laboratory:

  • genetic data from crosses
  • population cages
  • cytology if possible
  • use of chromatography or similar techniques if morphs are cryptic (for example, biochemical)
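
The capture, mark, release and recapture data mentioned above feed directly into population size estimates. The text does not name an estimator; the sketch below assumes the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant for a closed population with two sampling occasions. The counts are hypothetical.

```python
def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Lincoln-Petersen estimate: N = M * C / R.

    marked     (M): animals marked and released on the first visit
    caught     (C): animals caught on the second visit
    recaptured (R): marked animals among those caught on the second visit
    """
    if recaptured == 0:
        raise ValueError("no recaptures: estimate is undefined")
    return marked * caught / recaptured


def chapman(marked: int, caught: int, recaptured: int) -> float:
    """Chapman's estimator, less biased when recaptures are few."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


# Hypothetical field counts, for illustration only.
M, C, R = 120, 100, 30
print(f"Lincoln-Petersen: {lincoln_petersen(M, C, R):.1f}")
print(f"Chapman:          {chapman(M, C, R):.1f}")
```

Both estimators assume that marks are not lost, that marked and unmarked animals are equally catchable, and that the population is closed between the two visits. Running the calculation separately per morph would give the relative numbers of morphs as well.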

Both types of work are equally important. Without proper field-work, the significance of the polymorphism to the species is uncertain; without laboratory breeding, the genetic basis is obscure. Even with insects, the work may take many years; examples of Batesian mimicry noted in the nineteenth century are still being researched.


Reindeer and caribou

Genetic polymorphism of serum transferrins in reindeer is used in population and genetic studies. [92] [93] Allele frequencies in populations of reindeer from the North-East of Siberia were compared with those in reindeer inhabiting Norway and the northern regions of the European part of the USSR, and with those in North American caribou. Researchers found that the frequencies of Tf alleles in the Siberian reindeer differed from all the others. It is possible that resistance to necrobacteriosis is related to the frequencies of particular alleles in certain reindeer populations. [93]