Plant material, field experiment and phenotypic data evaluation
The association mapping (AM) panel comprised 96 diverse genotypes of lentil (Table 1), including advanced breeding lines, released varieties, and exotic germplasm lines from the Mediterranean region.
The experiment was conducted at three geographically diverse locations: Delhi (28°63’24” N, 77°15’14” E; 218 m AMSL), Indore (22°42’30” N, 75°53’32” E; 560 m AMSL) and Dharwad (15°29’3” N, 74°58’30” E; 681 m AMSL) during the winter season of 2014–2015. The experimental material was planted in a randomized complete block design with two replications. Each genotype was planted in three rows of 4.0 m length with 5 × 30 cm spacing. All the experiments were managed using recommended agronomic practices across the locations, and a normal, healthy crop was raised.
Phenotypic data on grain size and weight traits were collected for each genotype of the AM panel at all three locations. Protein content was also estimated in these samples. All traits were measured in three replicates. Seed size (mm) was determined by measuring 10 seeds with a Vernier caliper, and the average value was used for the analysis. For average seed weight (g), a random sample of 100 seeds was taken for each genotype. Protein content was estimated from a 100 g sample of each genotype.
The XLSTAT tool (http://www.xlstat.com/en/) was used for the statistical analysis.
Genomic DNA extraction and SSR amplification
Pooled leaf samples (~5 g) were collected three weeks after sowing from five plants of each genotype, and genomic DNA was isolated using a modified CTAB method (Murray and Thompson 1980).
The quality and quantity of DNA were estimated using a spectrophotometer, and the samples were diluted to 10 ng per µL. PCR was performed in a 20 µL volume consisting of 10× buffer (100 mM Tris-HCl, 15 mM MgCl2, and 500 mM KCl); 0.5 µM each of forward and reverse primers from Sigma-Aldrich (Spruce Street, St. Louis, USA); 200 µM of each dNTP; 1 U of Taq DNA polymerase; and 40 ng of genomic DNA as template. Thermal cycling was carried out in a 96-well Veriti™ thermal cycler (Applied Biosystems, Life Technologies, Singapore) with one denaturation cycle at 94°C for 4 min, followed by 30 cycles of denaturation at 94°C for 1 min, annealing at 59°C–62°C (primer specific) for 30 s, and extension at 72°C for 1 min, and a final extension at 72°C for 10 min. The amplified products were electrophoresed at 100 V in 1× TBE buffer for 3 h in 3% MetaPhor™ agarose gels (Lonza, Rockland, ME, USA). The gel was stained with ethidium bromide and visualized at 260 nm using a CCD camera attached to a gel documentation system (Alpha Imager).
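The component volumes implied by the reaction recipe above follow from C1 × V1 = C2 × V2. The short Python sketch below works through that arithmetic. Only the 20 µL reaction volume, the 10 ng/µL dilution, the 40 ng template and the final concentrations come from the text; the primer and dNTP stock concentrations (10 µM and 10 mM) are hypothetical values chosen for illustration, and the buffer is assumed to be diluted to 1× in the final reaction.

```python
def stock_volume_ul(final_conc, final_volume_ul, stock_conc):
    """Volume of stock (in uL) needed to reach final_conc in the reaction.

    Uses C1 * V1 = C2 * V2; both concentrations must share the same unit.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * final_volume_ul / stock_conc


REACTION_UL = 20.0  # total PCR volume given in the text

# From the text: 40 ng of template taken from a 10 ng/uL dilution.
template_ul = 40.0 / 10.0

# Assumption: the 10x buffer ends up at 1x in the final reaction.
buffer_ul = stock_volume_ul(1.0, REACTION_UL, 10.0)

# Assumption (hypothetical stocks): 10 uM primers, 10 mM (10,000 uM) per dNTP.
primer_ul = stock_volume_ul(0.5, REACTION_UL, 10.0)
dntp_ul = stock_volume_ul(200.0, REACTION_UL, 10_000.0)

if __name__ == "__main__":
    print(f"template DNA : {template_ul:.1f} uL")
    print(f"10x buffer   : {buffer_ul:.1f} uL")
    print(f"each primer  : {primer_ul:.1f} uL")
    print(f"each dNTP    : {dntp_ul:.1f} uL")
```

With different stock concentrations, only the last two calls change; the template volume is fixed by the dilution stated in the text.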
Sixty genomic SSRs (Saha et al., 2010; Hamwieh et al. 2005; 2009) and 260 EST-SSRs (Jain et al. 2013; Kaur et al. 2011; 2014) were used for the polymorphism survey. A total of 73 polymorphic SSRs, comprising 20 genomic SSRs and 53 EST-SSRs, were utilized for the study.
Diversity Analysis and Population Structure
The PCR-amplified products were scored manually against a 50 bp DNA ladder.
The binary matrix was transformed into a genetic similarity matrix using Jaccard’s coefficient. The dendrogram based on the unbiased genetic distances among genotypes was constructed by the un-weighted neighbour joining (UNJ) method in DARwin 5.0.145 software (http://darwin.cirad.fr/). Polymorphism information content (PIC) was computed using the formula $\mathrm{PIC} = 1 - \sum_{i=1}^{n} P_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 P_i^2 P_j^2$, where $n$ is the total number of alleles detected for the SSR marker, $P_i$ and $P_j$ are the frequencies of the $i$th and $j$th alleles in the set of 96 genotypes investigated, and $j$ starts at $i + 1$ (Botstein et al., 1980).
To determine the genetic structure and the number of subgroups in the population, a model-based approach was used. The STRUCTURE 2.3.4 program (Pritchard et al., 2000; Thornsberry et al., 2001), which uses a Bayesian clustering approach, was used to identify the number of subpopulations by assuming prior values of k = 1 to 10. The analysis used a burn-in period of 250,000 followed by 250,000 Markov chain Monte Carlo (MCMC) iterations, keeping ? constant.
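The PIC calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: it assumes the Botstein et al. (1980) form with the cross term 2 Pi² Pj², and it assumes one allele call per genotype (for example, the scored band size), with `None` marking missing data. The function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(calls):
    """Allele frequencies from one allele call per genotype (None = missing)."""
    counts = Counter(c for c in calls if c is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-missing allele calls")
    return [n / total for n in counts.values()]


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs) - 1)
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    # Hypothetical marker with two alleles split evenly across 96 genotypes.
    calls = ["150bp"] * 48 + ["160bp"] * 48
    print(pic(allele_frequencies(calls)))  # 1 - 0.5 - 0.125 = 0.375
```

A marker with two equally frequent alleles reaches the two-allele maximum of 0.375, and a monomorphic marker gives 0.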
Each k value from 1 to 10 was run 10 times using the admixture model with correlated allele frequencies to estimate the genomic proportions of the diverse individuals. Structure Harvester v 6.92 (Earl and vonHoldt, 2012) was used to obtain the optimum k value, determined by plotting the LnP(D) value against the given k value.
Association Mapping and Favourable Allele Mining
To identify markers associated with grain size, weight and protein content, association analysis was performed using the 73 polymorphic SSR markers. Using TASSEL 3.01 software, LD was estimated between each pair of polymorphic loci as the squared correlation coefficient (r²). An LD block refers to all pairs of adjacent loci that are in LD within a chromosome or linkage group (Stich et al., 2005).
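For readers unfamiliar with the r² statistic used above, the following Python sketch computes it for a single pair of loci. It is a simplified illustration, not TASSEL's implementation: it assumes each locus is coded 0/1 per genotype (for example, presence or absence of one allele), whereas SSR loci are often multi-allelic.

```python
import math


def ld_r2(locus_a, locus_b):
    """Squared Pearson correlation between two 0/1-coded loci."""
    if len(locus_a) != len(locus_b):
        raise ValueError("loci must be scored on the same genotypes")
    n = len(locus_a)
    mean_a = sum(locus_a) / n
    mean_b = sum(locus_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(locus_a, locus_b))
    var_a = sum((a - mean_a) ** 2 for a in locus_a)
    var_b = sum((b - mean_b) ** 2 for b in locus_b)
    if var_a == 0 or var_b == 0:
        return math.nan  # a monomorphic locus carries no LD information
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # complete LD
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # no association
```

Values near 1 indicate that the two loci are almost always inherited together in the panel; values near 0 indicate independence.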
The LD between each pair of polymorphic loci and the significance of the LD coefficient were estimated with 10,000 permutations in TASSEL 3.01 (Bradbury et al., 2007).
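The idea behind a permutation test of LD significance can be sketched as below. This is a generic illustration under stated assumptions and is not the TASSEL procedure: loci are assumed to be 0/1 coded and polymorphic, one locus is shuffled across genotypes to break any association, and the p value is the share of permuted r² values at least as large as the observed one (with the usual +1 correction). The function names and the fixed seed are hypothetical choices.

```python
import random


def r2(locus_a, locus_b):
    """Squared Pearson correlation; both loci must be polymorphic."""
    n = len(locus_a)
    mean_a = sum(locus_a) / n
    mean_b = sum(locus_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(locus_a, locus_b))
    var_a = sum((a - mean_a) ** 2 for a in locus_a)
    var_b = sum((b - mean_b) ** 2 for b in locus_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic locus")
    return cov * cov / (var_a * var_b)


def permutation_p_value(locus_a, locus_b, n_perm=10_000, seed=0):
    """Empirical p value for the observed r2 under random reassignment."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = r2(locus_a, locus_b)
    shuffled = list(locus_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the pairing between the two loci
        if r2(locus_a, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = [0] * 10 + [1] * 10
    print(permutation_p_value(a, a))          # strongly linked pair
    print(permutation_p_value(a, [0, 1] * 10))  # unassociated pair
```

With 10,000 permutations, the smallest p value this scheme can report is 1/10,001.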
The General Linear Model (GLM with Q matrix) was used to study the association between the grain parameters and the SSR markers. An LD plot with p and r² values was generated to represent the overall LD among all the polymorphic SSR markers. Further, the average r² and the percentage of observations were estimated over all pairwise comparisons, and significant associations between marker loci and grain parameters were determined based on p (probability) and r² values (Pritchard and Przeworski 2001).
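The logic of a GLM with a Q matrix can be illustrated by comparing two least-squares fits: trait ~ Q, and trait ~ Q + marker. The Python sketch below is an assumption-laden simplification and not TASSEL's code: the marker is coded as a single 0/1 column, the Q matrix is passed with its last column already dropped (the memberships sum to 1, so the full matrix would be collinear with the intercept), and all function names are hypothetical. Converting the F statistic to a p value needs the F distribution, which the standard library does not provide.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def residual_ss(design, y):
    """Residual sum of squares of the least-squares fit y ~ design."""
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def marker_test(y, q_reduced, marker):
    """Return (F statistic, marker r2) for adding one marker to trait ~ Q."""
    n = len(y)
    reduced = [[1.0] + [float(v) for v in q] for q in q_reduced]
    full = [row + [float(m)] for row, m in zip(reduced, marker)]
    rss_reduced = residual_ss(reduced, y)
    rss_full = residual_ss(full, y)
    mean_y = sum(y) / n
    total_ss = sum((v - mean_y) ** 2 for v in y)
    marker_ss = rss_reduced - rss_full      # variation explained by the marker
    df_resid = n - len(full[0])
    if rss_full <= 0:
        return float("inf"), marker_ss / total_ss
    return marker_ss / (rss_full / df_resid), marker_ss / total_ss


if __name__ == "__main__":
    # Hypothetical data: 8 genotypes, K = 2 (one Q column kept), one marker.
    trait = [5.1, 2.0, 4.9, 2.1, 5.0, 1.9, 5.2, 2.0]
    q = [[0.9], [0.8], [0.2], [0.1], [0.7], [0.3], [0.2], [0.8]]
    marker = [1, 0, 1, 0, 1, 0, 1, 0]
    print(marker_test(trait, q, marker))
```

The marker r² returned here is the share of total phenotypic variation explained by the marker after accounting for population structure, which is the sense in which r² is used alongside p for marker-trait associations.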
The markers found to be significantly associated with the grain traits were represented in Manhattan plots, as implemented in TASSEL. The distribution of p values of all the polymorphic SSRs was also displayed using Manhattan plots (Bradbury et al., 2007). Quantile-quantile (QQ) plots of the expected and observed p values were generated to evaluate the adequacy of Type I error control.
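The coordinates of a QQ plot can be computed as in the following Python sketch. It is an illustration rather than the TASSEL routine: observed p values are sorted and paired with the quantiles expected under a uniform null distribution, here taken as rank / (n + 1), which is one common convention and an assumption of this example. Both axes are on the −log10 scale.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest p value first."""
    if not p_values:
        raise ValueError("no p values supplied")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    observed = sorted(p_values)
    n = len(observed)
    return [
        (-math.log10(rank / (n + 1)), -math.log10(p))
        for rank, p in enumerate(observed, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical p values for four markers.
    for expected, obs in qq_points([0.5, 0.01, 0.2, 0.9]):
        print(f"expected {expected:.3f}  observed {obs:.3f}")
```

Points lying close to the diagonal suggest that the model controls false positives adequately; a systematic upward departure across most markers suggests unaccounted population structure.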