Establishment of a core collection of Cynodon based on morphological data

In a field plot study conducted in Danzhou, Hainan province, China, 537 wild Cynodon accessions from 22 countries, classified into 11 groups according to taxonomy and origin, were characterized for 11 phenotypic traits in order to construct a core collection. To develop the optimal sampling strategy, the following factors were screened: (i) 7 sampling proportions (5, 10, 15, 20, 25, 30 and 35%); (ii) 3 sampling methods (preferential sampling, deviation sampling and random sampling); (iii) 5 clustering methods [single linkage, complete linkage, median linkage, unweighted pair-group average (UPGMA) and Ward's method]; (iv) 3 genetic distances (Euclidean distance, Mahalanobis distance and principal component distance); and (v) 3 sampling proportions within groups (simple, logarithmic and square root proportions). Mean difference percentage, variance difference percentage, coincidence rate of range and variation coefficient changing rate were the criteria used to evaluate how well the core collection represented the original collection. Trait correlations were compared between the original and core collections, and the core collection was validated with scatter diagrams of the principal components. The optimal sampling method for constructing a Cynodon core collection was preferential sampling, with an optimal sampling proportion of 20%. The optimal sampling proportion within groups was the square root proportion, the optimal genetic distance was Mahalanobis distance and the optimal clustering method was UPGMA. The proposed Cynodon core collection, constructed following this optimal strategy, comprises 108 accessions and retains the phenotypic diversity, phenotypic trait correlations and phenotypic group structure of the original collection. It can therefore be considered a representative sample of the entire resource.


Introduction
Cynodon Rich., a C4 herbaceous genus of the family Poaceae, subfamily Chloridoideae, tribe Cynodonteae, is one of the 3 main warm-season turf grasses as well as a high-quality pasture grass (Harlan and de Wet 1969); the genus can be divided into 9 species and 10 varieties (Taliaferro 1995). Most Cynodon species originated in Africa; they are mainly distributed in warm, humid tropical or subtropical regions, with some in temperate regions (Rochecouste 1962; Harlan et al. 1970; Wofford and Baltensperger 1985). Two Cynodon taxa, Cynodon dactylon (L.) Pers. var. dactylon and C. transvaalensis Burtt Davy, are important to the turf grass industry (Kenworthy et al. 2007). C. dactylon var. dactylon is a cosmopolitan species with a latitudinal range of 53° N to 45° S, and it grows from sea level up to 4,000 m in the Himalayas. Because of its widespread distribution and prevalence, it is frequently referred to as 'common' Bermuda grass. C. transvaalensis, often referred to as African Bermuda grass, is indigenous only to the Transvaal region of South Africa (Harlan et al. 1970). Two species, C. dactylon and C. radiatus Roth, plus an as yet undetermined Cynodon taxon, occur in China. Researchers in China and elsewhere have made progress in collecting Cynodon germplasm resources and evaluating their genetic diversity (Wu et al. 2004, 2006; Wang et al. 2009; Tiwari et al. 2016; Zheng et al. 2017). These studies provide a good basis for constructing a Cynodon core collection.
Currently, core collections have been constructed mainly for crops and horticultural plants (Ortiz et al. 1998; Li et al. 2002; Martínez et al. 2017). Few studies have focused on pasture species; those with constructed core collections are mainly Lolium perenne, perennial Medicago, annual Medicago, Cajanus cajan and Pennisetum glaucum (Basigalup et al. 1995; Charmet and Balfourier 1995; Diwan et al. 1995; Reddy et al. 2005; Bhattacharjee et al. 2007). Cynodon core collections have received even less attention. Anderson (2005) constructed a C. dactylon core collection of 169 genotypes from phenotypic data for 598 C. dactylon samples. Jewell et al. (2011) performed EST-SSR amplification on the DNA of 690 Australian C. dactylon samples and combined the SSR data with the geographical origin of the samples to construct an Australian C. dactylon core collection; the core, selected by hierarchical cluster sampling, represented 13% of the original collection. Zheng et al. (2014) used phenotypic data for 831 C. dactylon samples to construct a preliminary core collection of 208 genotypes.
In recent years, the Tropical Crops Genetic Resources Institute of the Chinese Academy of Tropical Agricultural Sciences has assembled a large array of Cynodon germplasm. The collected accessions have been studied in terms of morphology, molecular biology, stress resistance, turf value, feed value and other characteristics (Huang et al. 2013; …), providing a broad range of material for systematic research and genetic breeding of Cynodon. Nevertheless, preserving, evaluating and identifying such a large collection remains difficult. There is an urgent need to examine the existing germplasm resources to accelerate the identification of germplasm useful for breeding. Constructing a Cynodon core collection is therefore crucial for germplasm innovation, effective conservation and utilization, species improvement and the breeding of new varieties.

Plant material
The plant material used was 537 wild Cynodon accessions collected from 22 different countries and regions from 2006 to 2014 by the Prataculture Research Office in the Tropical Crops Genetic Resources Institute of the Chinese Academy of Tropical Agricultural Sciences (CATAS). These accessions included 476 C. dactylon, 59 C. radiatus, 1 C. transvaalensis and 1 as yet unidentified taxon (Cynodon sp., with 2-flowered spikelets). The sources are shown in Table 1.

Experiment setup
Plants were established in December 2013 in 1 m² plots in the experimental fields of the Tropical Crops Genetic Resources Institute of the Chinese Academy of Tropical Agricultural Sciences (CATAS), Danzhou, Hainan, China. A randomized block design with 3 replicates per accession was used.

Morphological traits
The morphological traits were measured on 2 occasions, in June 2014 and June 2015, after standardization cuts in March of each year. Eleven traits were assessed: 7 quantitative traits (erect branch leaf length and width, stolon leaf length and width, stolon stem diameter and length, and turf height), each measured 15 times per plot and averaged; and 4 qualitative traits (leaf hair, leaf posture, leaf color and stolon stem color), each scored 3 times (Huang et al. 2012). The measurements were made as follows. Erect branch leaf length and width: on a randomly selected erect branch, the fourth mature leaf counted from the top was measured for length, and for width at its widest point, with a Vernier caliper. Stolon leaf length and width: on a randomly selected stolon, the fourth mature leaf counted from the top was measured in the same way. Stolon stem diameter and length: the fourth stem segment of the stolon counted from the top was measured for diameter and length with a Vernier caliper. Turf height: the natural height of the accession was measured with a ruler. Leaf hair, leaf posture, leaf color and stem color were scored by visual assessment.

Screening of sampling method, sampling proportion, clustering method and genetic distance
The sampling methods compared were preferential sampling, deviation sampling and random sampling (Hu et al. 2000; Li et al. 2004). Sampling proportions of the entire collection were set at 7 levels (5, 10, 15, 20, 25, 30 and 35%). Five clustering methods were adopted: single linkage, complete linkage, median linkage, unweighted pair-group average (UPGMA) and Ward's method (Ward 1963; Sibson 1973). The genetic distances were Euclidean distance, Mahalanobis distance and principal component distance. Combining all levels of these 4 factors (sampling method, entire sampling proportion, clustering method and genetic distance) gave 3 × 7 × 5 × 3 = 315 Cynodon core subsets. Evaluating these core subsets showed how effective each level of each factor was for selecting a core collection from the entire collection.
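The full factorial design described above can be enumerated directly. The short Python sketch below only builds the list of factor combinations, one per core subset; the level labels are illustrative strings, not identifiers from the original analysis. It confirms the count of 315 but does not construct any core subset.

```python
from itertools import product

# Factor levels as listed in the text (labels are illustrative strings).
SAMPLING_METHODS = ["preferential", "deviation", "random"]
PROPORTIONS = [5, 10, 15, 20, 25, 30, 35]  # percent of the entire collection
CLUSTERING = ["single", "complete", "median", "UPGMA", "Ward"]
DISTANCES = ["Euclidean", "Mahalanobis", "principal component"]


def design_grid():
    """Return every combination of the 4 factors (one core subset each)."""
    return list(product(SAMPLING_METHODS, PROPORTIONS, CLUSTERING, DISTANCES))


grid = design_grid()
print(len(grid))  # 3 * 7 * 5 * 3 combinations
print(grid[0])
```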

Screening of sampling proportion within groups
The 537 original Cynodon accessions were first divided into 4 groups by hierarchical grouping on the basis of botanical name: C. dactylon, C. radiatus, Cynodon sp. (unid.) and C. transvaalensis. These 4 groups were further divided into 11 subgroups according to geographical origin. Using this grouping together with the sampling method, entire sampling proportion, clustering method and genetic distance selected in the previous screening, 3 sampling proportions within groups (simple, logarithmic and square root proportion) were compared. For each proportion, the number of core accessions to draw from each subgroup was calculated, and that number was sampled from each of the 11 subgroups with the selected sampling method, entire sampling proportion, clustering method and genetic distance. Analysis of variance of the evaluation parameters was then used to determine the optimal sampling proportion within groups.

Construction of Cynodon core collection
The phenotypic data for Cynodon were processed with the selected sampling method, entire sampling proportion, sampling proportion within groups, clustering method and genetic distance, and the resulting core accessions were extracted to construct a phenotypic core collection of Cynodon.

Evaluation of Cynodon core collection
To determine whether the constructed core collection retained the genetic diversity of the original collection, and was thus representative of it, 4 evaluation parameters were used: mean difference percentage, variance difference percentage, coincidence rate of range and variation coefficient changing rate (Hu et al. 2000). The core collection was considered representative only when the mean difference percentage was <20% and the coincidence rate of range was ≥80%. A smaller mean difference percentage and a larger variance difference percentage indicate a better core collection, and larger values of the coincidence rate of range and the variation coefficient changing rate indicate that more of the genetic diversity of the original collection has been retained. The 4 evaluation parameters were calculated with the following formulae (Hu et al. 2000):

$$MD = \frac{S_t}{n} \times 100\%, \qquad VD = \frac{S_F}{n} \times 100\%$$

$$CR = \frac{1}{n}\sum_{i=1}^{n}\frac{R_{C(i)}}{R_{I(i)}} \times 100\%, \qquad VR = \frac{1}{n}\sum_{i=1}^{n}\frac{CV_{C(i)}}{CV_{I(i)}} \times 100\%$$

where MD is the mean difference percentage, VD the variance difference percentage, CR the coincidence rate of range and VR the variation coefficient changing rate; $i$ denotes the ith trait and $n$ the total number of traits; $S_t$ is the number of traits that differ significantly between the core and original collections in a t-test; $S_F$ is the number of traits that differ significantly between the core and original collections in an F-test; $R_{C(i)}$ and $R_{I(i)}$ are the ranges of the ith trait in the core and original collections; and $CV_{C(i)}$ and $CV_{I(i)}$ are the coefficients of variation of the ith trait in the core and original collections.
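As a minimal sketch, the 4 evaluation parameters can be computed as follows. This assumes the per-trait values of both collections are available as plain lists and that the counts of significant t-tests and F-tests have already been obtained elsewhere. The function names and the one-trait data set are hypothetical, and the sample coefficient of variation (standard deviation divided by mean) is assumed for CV.

```python
from statistics import mean, stdev


def _range(values):
    return max(values) - min(values)


def _cv(values):
    # Sample coefficient of variation: standard deviation / mean.
    return stdev(values) / mean(values)


def evaluation_parameters(core, original, n_sig_t, n_sig_f):
    """Return (MD, VD, CR, VR) in percent for n traits.

    core, original: lists of n per-trait value lists, in the same trait order.
    n_sig_t, n_sig_f: numbers of traits with a significant t-test / F-test
    between the core and original collections (computed elsewhere).
    """
    n = len(original)
    md = 100.0 * n_sig_t / n
    vd = 100.0 * n_sig_f / n
    cr = 100.0 * sum(_range(c) / _range(o) for c, o in zip(core, original)) / n
    vr = 100.0 * sum(_cv(c) / _cv(o) for c, o in zip(core, original)) / n
    return md, vd, cr, vr


def is_representative(md, cr):
    # Acceptance rule from the text: MD < 20% and CR >= 80%.
    return md < 20.0 and cr >= 80.0


# Hypothetical single-trait example.
original = [[1.0, 2.0, 3.0, 4.0, 5.0]]
core = [[1.0, 3.0, 5.0]]
md, vd, cr, vr = evaluation_parameters(core, original, n_sig_t=0, n_sig_f=0)
print(md, vd, cr, round(vr, 1), is_representative(md, cr))
```

In this toy case the core keeps the full range (CR = 100%) and a larger coefficient of variation than the original (VR > 100%), which is the direction of change the criteria reward.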

Confirmation of Cynodon core collection
To evaluate the representativeness of the constructed core collection, principal component analysis of the original and core collections was conducted. Taking the first and second principal components as the horizontal and vertical axes, respectively, a 2D distribution diagram of their principal components was drawn to compare their population structures. The correlations of the 11 phenotypic traits of both the original and core collections were analyzed to decide whether the correlations of the traits in the original collection were retained in the core collection (Ortiz et al. 1998).

Data processing and analysis
Microsoft Office Excel 2007 was used for data processing, and SPSS 19.0 for correlation analysis and principal component analysis.

Screening of sampling method, entire sampling proportion, clustering method and genetic distance
Without grouping the original collection, 3 sampling methods (preferential sampling, deviation sampling and random sampling), 7 entire sampling proportions (5, 10, 15, 20, 25, 30 and 35%), 5 clustering methods (single linkage, complete linkage, median linkage, UPGMA and Ward's method) and 3 genetic distances (Euclidean distance, Mahalanobis distance and principal component distance) were combined to construct 315 core subsets. These 315 core subsets were evaluated using the mean difference percentage, variance difference percentage, coincidence rate of range and variation coefficient changing rate, and the 4 evaluation parameters were subjected to analysis of variance (Table 2).

Screening of sampling method
The core subsets constructed by the 3 sampling methods were evaluated comprehensively using the mean difference percentage, variance difference percentage, coincidence rate of range and variation coefficient changing rate (Table 2). The mean coincidence rate of range of the core subsets constructed by both preferential sampling and deviation sampling was 100%, and the other 3 parameters for preferential sampling were significantly better than those for deviation sampling and random sampling. We therefore concluded that preferential sampling was the optimal sampling method.
Table 2 also shows that, for the core subsets constructed with Euclidean distance, Mahalanobis distance and principal component distance, the average mean difference percentage ranged from 2.86 to 4.16%, the average variance difference percentage from 65.3 to 65.6% and the average variation coefficient changing rate from 136 to 158%, while the average coincidence rate of range was 100% in all cases. The 3 genetic distances did not differ significantly in any of the 4 evaluation parameters, so any of them could be used; Mahalanobis distance was adopted as the optimal genetic distance.
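The practical difference between 2 of the distances compared can be illustrated with a small sketch. The Python below, whose function names are hypothetical, computes Euclidean distance and Mahalanobis distance $d_M(x, y) = \sqrt{(x-y)^\top S^{-1}(x-y)}$, where $S$ is the trait covariance matrix. Mahalanobis distance rescales traits by their variances and covariances, so a trait with large variance contributes less. The covariance matrix is assumed to be non-singular, and principal component distance (distance computed on principal component scores) is not shown.

```python
from statistics import mean


def covariance_matrix(data):
    """Sample covariance matrix; data is a list of accessions (trait vectors)."""
    n, p = len(data), len(data[0])
    mu = [mean(row[j] for row in data) for j in range(p)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [list(a[i]) + [b[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, p + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][p] / m[i][i] for i in range(p)]


def euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5


def mahalanobis(x, y, cov):
    # d^T S^-1 d, obtained by solving S z = d instead of inverting S.
    d = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, d)
    return sum(di * zi for di, zi in zip(d, z)) ** 0.5


# Trait 1 has variance 4 and trait 2 variance 1 (hypothetical values).
cov = [[4.0, 0.0], [0.0, 1.0]]
print(euclidean([2.0, 0.0], [0.0, 0.0]), mahalanobis([2.0, 0.0], [0.0, 0.0], cov))
```

A 2-unit difference in the high-variance trait counts as 2 in Euclidean distance but only 1 in Mahalanobis distance, which is why the 2 measures can rank accession pairs differently.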

Screening of clustering method
Table 2 also shows that the average mean difference percentage of the core subsets constructed with the 5 clustering methods ranged from 0.87 to 4.33%, the average variance difference percentage from 62.3 to 68.8% and the average variation coefficient changing rate from 135 to 160%, while the average coincidence rate of range was 100% for all methods. The 5 clustering methods showed no significant differences in the 4 evaluation parameters, so any of them could be used to construct the core collection. We adopted the widely used unweighted pair-group average (UPGMA) method as the optimal clustering method.
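The clustering methods differ in how they measure the distance between 2 clusters of accessions. The sketch below (hypothetical function name, Euclidean distance between accessions) shows this for 3 of the 5 methods: single linkage uses the closest pair, complete linkage the farthest pair, and UPGMA the average over all pairs. Median linkage and Ward's method use different update rules and are not shown.

```python
from itertools import product
from math import dist


def between_cluster_distance(a, b, method):
    """Distance between clusters a and b (lists of trait vectors)."""
    pairwise = [dist(x, y) for x, y in product(a, b)]
    if method == "single":
        return min(pairwise)  # nearest pair of accessions
    if method == "complete":
        return max(pairwise)  # farthest pair of accessions
    if method == "UPGMA":
        return sum(pairwise) / len(pairwise)  # unweighted average over all pairs
    raise ValueError(f"unsupported method: {method}")


# Two hypothetical one-trait clusters.
a = [[0.0], [1.0]]
b = [[3.0], [5.0]]
for method in ("single", "complete", "UPGMA"):
    print(method, between_cluster_distance(a, b, method))
```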

Preliminary screening of sampling proportion
The evaluation parameters of the core subsets differed considerably among the entire sampling proportions of 5, 10, 15, 20, 25, 30 and 35% used to construct the Cynodon core subsets (Table 2). Comprehensive analysis of the 4 evaluation parameters showed that 5 and 10% gave the best values for all 4 parameters, with no significant differences between them. The 15 and 20% proportions differed significantly from 5% only in the average variation coefficient changing rate, whereas all parameter values for 25, 30 and 35% were inferior to those of the other 4 proportions. In principle, therefore, 5 or 10% would be selected as the entire sampling proportion, but such low proportions are rarely used in similar research.
Because 15 and 20% differed little from 5 and 10%, we retained them as alternatives for further screening in subsequent tests.

Screening of sampling proportion within groups and entire sampling proportions
The sampling proportions within groups were simple proportion, square root proportion and logarithmic proportion, and the entire sampling proportions were 10 and 20%. Combining these gave 6 strategies, and for each the number of core accessions to be extracted from each group was calculated. Table 3 shows that the 9th, 10th and 11th groups contained 4, 1 and 1 Cynodon accessions, respectively. To ensure that the core collection retained this special material from the original collection, core accessions had to be extracted from all 3 of these groups, and only the square root proportion achieved this. The square root proportion was therefore selected as the sampling proportion within groups. The evaluation parameters of the core subsets constructed with 10 and 20% were then compared. Table 4 shows that the differences between the 2 proportions were small and did not consistently favour either one, so 20% was adopted as the optimal sampling proportion.
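The effect of the 3 allocation rules on very small groups can be checked with a short sketch. It assumes that each group receives a share of the core proportional to $N_i$ (simple), $\ln N_i$ (logarithmic) or $\sqrt{N_i}$ (square root), rounded to the nearest integer and capped at the group size; the exact rounding rule used in the study is not stated, and surplus from capping is not redistributed here. For illustration it uses the 4 species-level group sizes from the Plant material section rather than the 11 subgroups of Table 3, so the allocations it prints are not those of the study. The names `WEIGHTS` and `allocate` are hypothetical.

```python
import math

# Weight functions w(N_i) for the 3 within-group sampling proportions.
WEIGHTS = {
    "simple": lambda size: size,
    "logarithmic": lambda size: math.log(size),
    "square root": lambda size: math.sqrt(size),
}


def allocate(group_sizes, core_size, rule):
    """Core accessions to draw from each group: round(core * w_i / sum(w)), capped at N_i."""
    w = [WEIGHTS[rule](size) for size in group_sizes]
    total = sum(w)
    return [min(size, round(core_size * wi / total)) for size, wi in zip(group_sizes, w)]


sizes = [476, 59, 1, 1]  # C. dactylon, C. radiatus, C. transvaalensis, Cynodon sp.
core_size = round(0.20 * sum(sizes))  # 20% of 537 accessions
for rule in WEIGHTS:
    print(rule, allocate(sizes, core_size, rule))
```

Because $\ln 1 = 0$ and $107 \times 1/537 < 0.5$, the logarithmic and simple rules allocate nothing to single-accession groups, whereas the square root rule still draws one accession from each of them.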

Cynodon core collection
The phenotypic data for the 537 Cynodon accessions in the collection were analyzed with the optimal construction strategy identified above. The extracted core accessions formed a core collection of 108 accessions: 89 C. dactylon, 17 C. radiatus, 1 Cynodon sp. (unid.) and 1 C. transvaalensis (Table 5).

Evaluation of core collection
Correlation analysis of phenotypic traits. Correlation analysis of the 11 phenotypic traits in the original and core collections (Table 6) showed that, in the original Cynodon collection, 28 pairs of traits were highly significantly (P<0.01) correlated and 4 pairs were significantly (P<0.05) correlated. In the core collection, 24 pairs of traits were highly significantly (P<0.01) correlated and 4 pairs were significantly (P<0.05) correlated. Approximately 85% of the phenotypic trait correlations in the original collection were maintained in the core collection.
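Retention of significant correlations can be expressed as a simple set operation. In this sketch, which uses hypothetical trait labels and a hypothetical function name, each significant trait pair is stored as an unordered pair. The retention rate is the percentage of pairs that are significant in the original collection and also significant in the core. How significance is tested is left outside the sketch. With 11 traits there are 55 possible trait pairs.

```python
from math import comb

N_PAIRS = comb(11, 2)  # possible trait pairs among 11 traits


def retention_rate(sig_original, sig_core):
    """Percent of significant pairs in the original that stay significant in the core."""
    orig = {frozenset(pair) for pair in sig_original}  # order-free pairs
    core = {frozenset(pair) for pair in sig_core}
    return 100.0 * len(orig & core) / len(orig)


# Hypothetical significant pairs (trait labels are placeholders).
sig_original = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
sig_core = [("b", "a"), ("a", "c"), ("b", "c")]
print(N_PAIRS, retention_rate(sig_original, sig_core))
```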

Principal component analysis
To further determine how representative the core collection was of the original collection, principal component analysis was conducted on the phenotypic data of both collections (Table 7). Four principal components with eigenvalues >1 were retained. The eigenvalues, contribution rates and cumulative contribution rates of the principal components were similar in the core and original collections, although the eigenvalues of principal components 1-3 were slightly higher in the core collection than in the original collection. The 2D scatter diagram of the first and second principal components (Figure 1) showed that the overall shape and features of the core collection's distribution were similar to those of the original collection. In both collections, the accessions were concentrated mainly in the left part of the diagram, with sporadic points in the lower part, indicating that the core collection retained the genetic structure of the original collection well.
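The retention rule and contribution rates above follow from the eigenvalues of the trait correlation matrix. The sketch below is a self-contained illustration, not the SPSS procedure used in the study, and it assumes that the PCA was run on the correlation matrix of standardized traits. It computes the eigenvalues of a symmetric matrix with cyclic Jacobi rotations, expresses each eigenvalue as a percentage of the total (the trace, which equals the number of traits for a correlation matrix), accumulates these percentages, and counts components with eigenvalue >1 (the Kaiser criterion). The function names are hypothetical, and the principal component scores needed for the scatter diagram are not computed.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    n = len(a)
    a = [list(row) for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_summary(corr):
    """Eigenvalues, % contribution, cumulative %, and number of PCs with eigenvalue > 1."""
    eig = jacobi_eigenvalues(corr)
    total = sum(eig)  # trace = number of traits for a correlation matrix
    contrib = [100.0 * e / total for e in eig]
    cumulative = [sum(contrib[: i + 1]) for i in range(len(contrib))]
    n_retained = sum(1 for e in eig if e > 1.0)
    return eig, contrib, cumulative, n_retained


# Hypothetical 2-trait correlation matrix with r = 0.5.
print(kaiser_summary([[1.0, 0.5], [0.5, 1.0]]))
```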

Construction strategy of core collection
Core collection construction is a sampling process that follows a particular sampling strategy. A good sampling method removes as much genetic redundancy as possible and maximizes the variation in the core collection while conserving the genetic diversity of the original collection (Brown 1989). The main methods for constructing a core collection are completely random sampling; grouping followed by completely random sampling; and clustering followed by sampling based on genetic structure (Spagnoletti Zeuli and Qualset 1993). Because genetic diversity is distributed unevenly within a collection, random sampling and cluster sampling can yield core collections with different diversity structures (Spagnoletti Zeuli and Qualset 1993). Most studies show that cluster sampling performs better than random sampling (Spagnoletti Zeuli and Qualset 1987; Diwan et al. 1995). Cluster sampling removes a proportion of genetically close accessions, which reduces genetic redundancy while maintaining the genetic structure of the original collection. In the present research, 3 sampling methods (the multiple clustering preferential sampling method, the multiple clustering deviation sampling method and the non-clustered completely random sampling method) were compared, and the multiple clustering preferential sampling method proved most suitable for constructing the Cynodon core collection. The sampling proportion is also a key parameter in constructing a core collection and comprises the entire sampling proportion and the sampling proportion within groups. Too high a proportion may retain redundant accessions, whereas too low a proportion may lose important material (Brown 1989).
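To make the contrast between the 3 sampling methods concrete, the sketch below implements one plausible, simplified reading of stepwise clustering sampling in the spirit of Hu et al. (2000); it is not the procedure used in the study. Assumptions: at each step the 2 closest remaining accessions are found (Euclidean distance on raw trait values; this pair is also the first merge under any of the linkage methods) and one of them is discarded. Under "preferential" sampling the accession holding the current maximum or minimum for more traits is kept. Under "deviation" sampling the accession farther from the current centroid is kept. "Random" sampling draws accessions without clustering. The function names and the tie-breaking rule (keep the first accession of the pair) are hypothetical.

```python
import random
from math import dist
from statistics import mean


def _nearest_pair(data, idx):
    """Indices of the 2 closest remaining accessions."""
    best = None
    for pos, i in enumerate(idx):
        for j in idx[pos + 1:]:
            d = dist(data[i], data[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


def _extreme_count(data, idx, k):
    """Number of traits for which accession k holds the current max or min."""
    count = 0
    for t in range(len(data[k])):
        values = [data[i][t] for i in idx]
        if data[k][t] in (max(values), min(values)):
            count += 1
    return count


def _deviation(data, idx, k):
    """Distance of accession k from the centroid of the remaining accessions."""
    centroid = [mean(data[i][t] for i in idx) for t in range(len(data[k]))]
    return dist(data[k], centroid)


def stepwise_sample(data, target, method="preferential", seed=0):
    """Return indices of `target` accessions kept from `data` (list of trait vectors)."""
    idx = list(range(len(data)))
    if method == "random":
        return sorted(random.Random(seed).sample(idx, target))
    while len(idx) > target:
        i, j = _nearest_pair(data, idx)
        if method == "preferential":
            keep_i = _extreme_count(data, idx, i) >= _extreme_count(data, idx, j)
        elif method == "deviation":
            keep_i = _deviation(data, idx, i) >= _deviation(data, idx, j)
        else:
            raise ValueError(f"unknown method: {method}")
        idx.remove(j if keep_i else i)  # drop the redundant member of the pair
    return idx


# Hypothetical 2-trait data: accessions 0 and 1 are near-duplicates.
data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 10.0]]
print(stepwise_sample(data, 3, "preferential"), stepwise_sample(data, 3, "deviation"))
```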
In the construction of core collections for different plants, the selected proportion ranges from 5 to 30% of the total collection of the species, with about 10% being most common (Lindroth et al. 2002). In this study, we used 20% as the sampling proportion for constructing the Cynodon core collection and compared 3 sampling proportions within groups (simple, logarithmic and square root proportion). The square root proportion proved optimal, in agreement with the screening results of Zheng et al. (2014), who used phenotypic traits to construct a preliminary C. dactylon core collection.

Evaluation of core collection
Correlation among phenotypic traits is an intrinsic characteristic of a species and an external manifestation of the correlation among genetic materials within it. Sampling should not change these correlations, so a well-selected core collection should maintain the correlations among the phenotypic traits of the original collection (Ortiz et al. 1998; Xu et al. 2006). Because the phenotypic core collection contains only part of the germplasm of the original population, the 2 collections have different degrees of freedom, and their correlation coefficients cannot be compared directly. We therefore performed correlation analysis of the 11 phenotypic traits in both collections to determine whether the trait pairs that were significantly correlated in the original population were also significantly correlated in the core collection. The core collection maintained more than 85% of the significant trait correlations of the original collection.
Principal component analysis is a data analysis technique that reduces many correlated variables to a few composite variables; it is widely used in biodiversity research, including the construction of core collections (Zheng et al. 2014). A 2D scatter diagram of the scores of the germplasm accessions on the first and second principal components reflects the distribution of the accessions in the original population and thus gives an intuitive picture of its genetic structure. Distances between accessions in the diagram reflect their genetic similarity: short distances indicate high similarity, and long distances a high degree of dissimilarity. Because the first and second principal components had high contribution rates, the diagram reflected the genetic distribution of the original population with acceptable accuracy. A comparison of the 2D principal component diagrams of the original and core collections showed that the core collection closely reflects the genetic make-up of the original collection.

Conclusions
Using the optimal strategy identified for sampling method, entire sampling proportion, sampling proportion within groups, genetic distance and clustering method, we constructed a core collection of 108 Cynodon accessions from 537 wild accessions based on 11 phenotypic traits. This core collection is important for germplasm resource innovation and for efficient conservation and utilization, and should be useful for species improvement and variety breeding.