We present here a study of the genetic diversity and
population structure in a collection of red clover (Trifolium
pratense) using genotyping by sequencing (GBS) to produce the molecular marker
data. GBS is a high-throughput and cost-effective procedure that can be used to
identify and select target plants for breeding programs (Kim, et al., 2016).
GBS was performed on a collection of 640 individuals from 75 collections. However,
10 genotypes did not meet the sequencing quality criteria and were therefore
omitted from the analysis. ApeKI was used to reduce the complexity of the
genome and a high number of tags were identified in conjunction with the red
clover genome assembly (De Vega, et al., 2015).
ApeKI was selected as the restriction enzyme because it is partially methylation
sensitive and rarely cuts retrotransposons. ApeKI digestion products are
therefore preferentially fragments from low-copy genomic regions (Elshire, et al., 2011, Sonah, et al., 2013),
and are more likely to be genic in origin.
A total of 1.8 million tags were produced in TASSEL5v2. Within these
tags, 8,118 high quality SNP were identified across the 640 genotypes. For a SNP
to pass into the final cohort there had to be a minimum of 10 reads per variant
tag, which ensured that heterozygous calling was stable and reliable across the
genome. Red clover is a heterozygous plant and this was reflected in the number
of heterozygous biallelic SNP identified. The 8,118 SNP were predominantly
transition rather than transversion SNP (4,999 vs 3,119; Table 3), a
bias that has also been reported in other species, including chickpea (Kujur, et al., 2015).
This bias towards transition SNP is advantageous during natural selection as
these SNP are more likely to conserve protein structure than are transversion
SNP (Wakeley, 1996).
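The transition/transversion distinction used above can be made concrete with a short sketch. This is illustrative only and is not the pipeline used in this study; the function names are hypothetical, and the only values taken from the text are the two SNP counts reported in Table 3.

```python
# A transition swaps a purine for a purine or a pyrimidine for a pyrimidine;
# a transversion swaps a purine for a pyrimidine or vice versa.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a biallelic SNP as 'transition' or 'transversion' (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid biallelic SNP: {ref}/{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(n_transitions, n_transversions):
    """Return the transition/transversion ratio from two counts."""
    if n_transversions == 0:
        raise ValueError("transversion count must be non-zero")
    return n_transitions / n_transversions


# Counts reported in the text (Table 3): 4,999 transitions and 3,119 transversions.
ratio = ts_tv_ratio(4999, 3119)
print(f"Ts/Tv ratio: {ratio:.2f}")
```

With the reported counts the ratio is about 1.6, that is, transitions outnumber transversions even though there are twice as many possible transversion substitutions.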
Phylogenetic relationships and genetic diversity
This paper describes the genetic diversity and population
structure of red clover and indicates a strong relationship between geography
and accession. The germplasm was a collection of material from ecotype
populations from Europe and Asia and five breeding lines originating from
Europe. There were some anomalies in the genetic structure analysis, and it
remains uncertain how many groups best explain this collection of
red clover from the IBERS gene bank. According to k-means clustering, the
change in slope identified either a two-group or a four-group structure, which
was also reflected in the PCA. STRUCTURE analysis likewise identified the two-group
structure, but indicated nine groups as the more likely number. In all cases, the
two-group structure separated Asia from Europe; the four-group structure
differentiated Asia, UK, Iberia and the rest of Europe, with no discernible
genetic differentiation between the cultivars and their region of origin. The
nine-group structure again separated Asia from Europe. However, the European
accessions were less well defined, with the likely scenario being a division of
Iberia and UK from the rest of Europe.
Landraces and natural ecotype populations, and to some
extent cultivars, are heterogeneous populations derived from cross-pollination
of many individuals (Bowley, et al., 1984, Taylor and Quesenberry, 1996).
Therefore, high levels of within-population variability are expected (Kolliker, et al., 2003). This has been shown in
studies of alfalfa (Mengoni, et al., 2000), and many other legume
species (Smykal, et al., 2015). AMOVA revealed high
levels of within population variation with a range of 8-15% of the total
variation, and low levels between accessions (<1% to 6%). This was reflected in the population-wide analysis in Pegas, where HO (0.239) was significantly higher than DST (0.019). The high heterogeneity and heterozygosity of red clover are expected as a result of cross-pollination and its gametic self-incompatibility (Riday and Krohn, 2010). Previous studies of the genetic diversity of red clover have shown that the majority of the diversity is at the population level. A study using SSR markers in a Ukrainian red clover cultivar collection (Dugar and Popov, 2013) found that the among-group genetic variability was low and accounted for only around 7% of the total variability. An AMOVA of a core collection of red clover, also genotyped with SSR markers (Gupta, et al., 2016), revealed that most of the genetic diversity was within populations and that a much smaller proportion was accounted for by among-group variability. However, a study based on only 6 SSR markers in seven cultivars of red clover (Berzina, et al., 2008) revealed low levels of genetic polymorphic variation (2%) within groups, as well as very low ? scores (0.006 to 0.043). Overall, the AMOVA results of the present study for the ecotypes and varieties are in line with previous analyses.
There was no genetic bottleneck identified in the population as a result of domestication or as a result of small isolated ecotype populations (HO was not lower than HE in any case). However, domestication in general tends to reduce allelic variation and genetic diversity (Doebley, et al., 2006). There was evidence for this in the lower DST scores and lower overall heterozygosity (FIS) calculated for the varieties compared with the ecotypes (Table 6).
This low level of diversity may be due to continuous selection reducing the effective population size and increasing genetic drift and hitchhiking during the domestication process (Doebley, et al., 2006, Gross and Olsen, 2010, Tang, et al., 2010). The low levels of diversity were especially evident when the varieties were analysed separately. The low recorded DST and FST values may indicate a common, unknown ancestry for the varieties Milvus, Britta and AberRuby. The results for Crossway and Broadway, however, were predictable as these are related varieties (Rumball, et al., 2003).
The ecotype analysis of the Asian accessions indicated a diverse and moderately unrelated population. These genotypes cover a very wide geographic region and it is not surprising that they show the highest levels of genetic differentiation both between and within the samples. The samples from the Iberian Peninsula also consisted of a moderately unrelated set of genotypes. At the local level, these ecotypes are geographically separated, which may have influenced the potential for gene flow between the populations. The increase in genetic diversity may also be a result of selective grazing pressure; one sixth of the Iberian population was collected from fields where horses were freely grazing. This diversity may enable these accessions to survive the close grazing of horses. The UK population showed the lowest FST value for the ecotypes. Island populations have lower genetic diversity than continental populations (Crawford, et al., 1992), most likely due to reduced genetic variation in the initial population.
In all circumstances, there was a departure from Hardy-Weinberg (HW) equilibrium, as indicated by the significant negative FIS scores. These negative scores, which show an excess of heterozygosity, are predictable for red clover as it is an obligate outbreeder and is self-incompatible (Riday and Krohn, 2010).
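To illustrate why a negative FIS indicates heterozygote excess, the following minimal sketch computes FIS = 1 - HO/HE for a single biallelic locus. It is a simplification and not the calculation performed in this study: the function name and the genotype counts are hypothetical, and HE is taken as the uncorrected expected heterozygosity 2p(1 - p) with no small-sample correction.

```python
def fis_biallelic(n_AA, n_Aa, n_aa):
    """Return (HO, HE, FIS) for one biallelic locus from genotype counts.

    Hypothetical helper: HE is the uncorrected Hardy-Weinberg expectation
    2p(1 - p), and FIS = 1 - HO / HE.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    h_exp = 2 * p * (1 - p)           # expected heterozygosity under HW
    if h_exp == 0:
        raise ValueError("locus is monomorphic; FIS is undefined")
    h_obs = n_Aa / n                  # observed heterozygosity
    return h_obs, h_exp, 1 - h_obs / h_exp


# Hypothetical counts with more heterozygotes than HW predicts.
h_obs, h_exp, fis = fis_biallelic(20, 60, 20)
print(f"HO={h_obs:.2f} HE={h_exp:.2f} FIS={fis:.2f}")
```

In this made-up example HO (0.60) exceeds HE (0.50), so FIS is negative (about -0.2), the same direction of departure as described for red clover above.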
However, departures from HW are also possible due to non-random mating during the domestication process. A small reproductive population during breeding will inevitably cause departures from HW, as only a few individuals contribute to the next generation, leading to significant deviation from random mating (Balloux, 2004, Pudovkin, et al., 1996).
Outlier detection
Red clover is a Mediterranean species adapted to many edaphic conditions (Taylor and Quesenberry, 1996). As in many other species, this broad range of adaptation is due, on the whole, to the existence of high numbers of locally adapted genotypes rather than to a single ubiquitous genotype (Schmid, et al., 2001), and most cultivars are not adapted to areas far from where they were developed. Outlier loci differ significantly from the background genomic levels (Storz, 2005, Storz, 2010), and loci that are under divergent selection are likely to be responsible for the genetic variation that affects fitness in different environments. Environmental conditions, including longitude, latitude and altitude, may result in physiological challenges that in turn may lead to morphological and molecular adaptations (Storz, 2010). Samβada indicated a strong correlation between longitude and adaptive SNP, and a regression of the first principal coordinate identified the same correlation (Figure 5). These correlations also reflected the population structure as defined by cluster analysis in UPGMA and PCA.
In this study, we used three methods to detect outlier markers. The proportion of outliers detected varied between the methods; R detected 10.7%, BayeScan 7.2% and Samβada 12.6%. The three-way analysis of SNP outliers identified 56 SNP detected by all three methods, providing strong evidence that these loci are recent targets of geographically variable natural selection. The 56 variants detected in the genome scans included both transition and transversion SNP and were found in a diverse array of gene models.
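The three-way comparison of outlier sets amounts to taking the loci flagged by every method. A minimal sketch is given below; the function name and the SNP identifiers are hypothetical placeholders and do not correspond to loci from this study.

```python
def consensus_outliers(*outlier_sets):
    """Return the loci flagged as outliers by every method supplied (hypothetical helper)."""
    if not outlier_sets:
        return set()
    return set.intersection(*(set(s) for s in outlier_sets))


# Hypothetical SNP identifiers flagged by three independent genome scans.
method_1 = {"snp_001", "snp_014", "snp_200", "snp_315"}
method_2 = {"snp_014", "snp_200", "snp_777"}
method_3 = {"snp_001", "snp_014", "snp_200", "snp_999"}

shared = consensus_outliers(method_1, method_2, method_3)
print(sorted(shared))
```

Requiring agreement between all methods is conservative: it reduces method-specific false positives at the cost of discarding loci that only one or two scans detect.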
There are several genes of interest to the present study, one of which contains a SNP variant in the intron of gene 11182. This gene is predicted to encode an SRS family protein, a family of putative transcription factors of which SHORT INTERNODES (SHI) is a member. In Populus, this gene has been found to regulate shoot growth and xylem proliferation (Zawaski, et al., 2011). The authors found that the suppression of Populus SHI-like genes increased vegetative stem and leaf growth. Coupled with this gene involved in stem growth are two SNP in gene 12650, which encodes a pentatricopeptide repeat protein (PRP). These PRPs may be involved in plant growth and development and in the response to environmental conditions (Han, et al., 2016, Lee, et al., 2017, Wu, et al., 2016). Other genes of interest involved in plant growth and the response to environmental stress are gene 1512, a WD40 protein, and gene 1450, the DREB family transcription factor DDF2 (Lehti-Shiu, et al., 2015). A WD protein was also found to be an outlier in a study of Medicago symbiont genes (Grillo, et al., 2016). These SNP variants may provide evidence for the morphological changes seen across the red clover populations.
This is the first detailed account of outlier variants found in red clover, and what is clear from the evidence is that there are few outlier loci that are under selection for growth parameters. No loci were reported for flowering time, which was surprising as there was considerable variation in the time to flowering among the accessions. It remains to be seen whether the differences in growth seen amongst red clover ecotypes are due to environmental selection pressure or to enhanced breeding and local use.
Haplotype reconstruction of outlier region
The haplotype reconstruction PCA (Figure 6) clearly depicted the variation in the SNP in this region for the Iranian accessions.
This was especially true for Aa3507, which had the most prominent FST values for a mildew resistance gene. This provides further evidence that this region of chromosome 7 has undergone some selective divergence in the Iranian accessions. It remains to be investigated whether this is a true region of adaptive variation. Divergent evolution generally occurs in the presence of gene flow (Wu, 2001), and population differentiation may occur in the face of gene flow if adaptively driven, which results in local adaptation and reduced gene flow between populations (Schluter, 2009). Populations in different environments may initially differ by only a few genomic sites, and the surrounding DNA may differ due to linkage disequilibrium.
Linkage disequilibrium
Linkage disequilibrium (LD) may be used to discover past evolutionary and population structure changes (Flint-Garcia, et al., 2003). It is affected by population size, genetic drift, population admixture and mating system (Flint-Garcia, et al., 2003), and by the marker system (Stich, et al., 2006). Obligate outcrossing produces many recombination events, which cause LD to decay rapidly (Flint-Garcia, et al., 2003). The analysis of LD in red clover presented here indicated moderate LD in the varieties (r2 = 0.03) and low LD in the ecotypes (average r2 = 0.007). Investigations into LD in other outbreeding crop species have revealed moderate LD in cauliflower (r2 = 0.06) (Matschegewski, et al., 2015) and maize (r2 = 0.07) (Hoyle, et al., 2007). However, the LD measures in this study are not consistent with those of the red clover population studied for the genome assembly (De Vega, et al., 2015). In that population, LD varied between 0.15 and 0.25 across the seven chromosomes of a synthetic population with multiple parents. This high level of LD may have occurred as a result of reduced genetic diversity during the breeding process. Breeding populations are typically small, and by their nature are selected for traits such as flowering time.
The moderate r2 value for the varieties studied here suggests that they are not entirely related, which is reflected in the AMOVA and PCA. It is therefore not surprising that the recorded LD is much lower than that reported in the genome assembly study. However, the higher estimate of LD in the varieties compared with the ecotypes may also be a result of population size and genetic drift during domestication and breeding, as smaller populations will tend to increase LD (Bouchet, et al., 2012).
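For readers less familiar with the r2 statistic, the sketch below shows the standard definition for two biallelic loci from phased haplotype counts, r2 = D^2 / (pA(1 - pA)pB(1 - pB)) with D = pAB - pA pB. It is an illustration under stated assumptions rather than the procedure used in this study: the function name and the haplotype counts are hypothetical, and GBS data are typically unphased, so in practice r2 is estimated from genotypes.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Return r2 between two biallelic loci from phased haplotype counts (hypothetical helper)."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total       # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total       # frequency of allele B at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic; r2 is undefined")
    d = p_AB - p_A * p_B              # coefficient of linkage disequilibrium
    return d * d / denom


# Hypothetical haplotype counts: strongly associated loci versus independent loci.
r2_linked = ld_r2(40, 10, 10, 40)
r2_independent = ld_r2(25, 25, 25, 25)
print(f"linked: {r2_linked:.2f}, independent: {r2_independent:.2f}")
```

Values close to zero, such as those reported for the ecotypes, indicate that alleles at the two loci are nearly independent, which is the expected outcome of frequent recombination in an obligate outcrosser.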