Principal component analysis applied to the study of yield and nutritional characteristics of forage cultivars

The objective of this study was to evaluate the importance of various yield and nutritional characteristics for the differentiation of forage cultivars using principal component analysis (PCA). Data were obtained from an experiment conducted in a randomized complete block design (RCBD) with 6 replications. Eleven cultivars of forage grasses of the species Urochloa brizantha, U. ruziziensis, Megathyrsus maximus, Cenchrus ciliaris, Andropogon gayanus and Setaria sphacelata were evaluated. For yield characteristics, PCA revealed that the first 3 components explained 82.0% of total variation between forage cultivars. Similar results were observed for nutritional characteristics, with the first 3 components explaining 91.4% of total variation in leaf chemical composition and 83.8% of variation in stem chemical composition. Variables that contributed most to discrimination between forage cultivars were: number of tillers per plant; number of leaves per plant; median leaf width; stem dry matter yield; leaf:stem ratio; % dry matter, % crude protein (CP) and % neutral detergent fiber of leaves; and % CP, % ether extract and % acid detergent fiber of stems. PCA was effective in identifying the key parameters that need to be measured in evaluating grass species and allowed a reduction in the number of yield and nutritional characteristics to be assessed in experiments designed to evaluate forage cultivars. This reduced both the workload and the costs involved while still allowing valid conclusions.


Introduction
For animal production to be economically viable, pasture management must be a tool for increasing profits. Therefore, analysis of structural and yield characteristics and/or chemical composition to compare the performance of forages between and within genera is important for the selection of cultivars (Luna et al. 2016; Silva et al. 2016). It is expected that the superiority or inferiority of a cultivar over others is maintained over time (Martuscello et al. 2015). However, in analyses involving a great number of variables, many do not contribute to the discrimination between individuals, either because they are invariable or because they are redundant due to their correlation with other variables in the analysis. Thus, it is necessary to identify characters that contribute little to the discrimination between individuals in order to discard redundant and difficult-to-measure characters and, consequently, reduce the time, labor and cost of measurements during experiments (Cruz et al. 2011; Jolliffe 1972).
One method for reducing the dimensionality of variables in agricultural experiments is principal component analysis (PCA) (Jolliffe 1973). This multivariate analysis methodology consists of obtaining a new set of variables, i.e. the principal components, each of which is a linear combination of the original variables measured in the experiment. The components obtained are uncorrelated with each other and are considered to provide an acceptable estimate of the total variation contained in the complete data set. By using the relative importance of each principal component and the weighting coefficients of the original variables in these components, it is possible to assess the contribution of each variable towards the total variation among individuals (Cruz et al. 2012; Hongyu et al. 2016).
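As an illustrative restatement of this construction (the symbols z, R, a_j and λ_j are notation assumed here, not taken from the cited sources), the principal components of p standardized variables can be written as:

```latex
% Assumed notation: z = (z_1, ..., z_p)^T is the vector of standardized
% variables, R is their correlation matrix, and (\lambda_j, a_j) are the
% eigenvalue/eigenvector pairs of R.
\begin{align}
Y_j &= a_j^{\top} z = a_{j1} z_1 + a_{j2} z_2 + \dots + a_{jp} z_p,
      \qquad j = 1, \dots, p, \\
R\, a_j &= \lambda_j a_j, \qquad a_j^{\top} a_j = 1, \qquad
      \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
\operatorname{Var}(Y_j) &= \lambda_j, \qquad
      \operatorname{Cov}(Y_j, Y_k) = 0 \quad (j \ne k), \\
\sum_{j=1}^{p} \lambda_j &= p, \qquad
      \text{share of total variance explained by } Y_j = \frac{\lambda_j}{p}.
\end{align}
```

Under this formulation, the elements of a_j are the weighting coefficients of the original variables in component j, and λ_j/p measures the relative importance of that component.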
The efficiency of PCA in reducing the dimensionality of variables and discriminating between genotypes was evidenced, among others, by Rêgo et al. (2003) and Matias et al. (2020). This method of analysis has been used in several areas of knowledge, such as fruit production (Souza et al. 2017), soil science (Gama-Rodrigues et al. 2018), poultry farming (Traldi et al. 2018), sheep farming (Silva Filho et al. 2019) and forages (Castañeda-Pimienta et al. 2017; Moreira et al. 2018). Da Silva and Sbrissia (2010) emphasized the potential of PCA for the interpretation of experimental data for forage species. PCA makes it possible to obtain conclusions similar to those obtained by conventional univariate techniques, with the advantage of reducing the number of variables measured to a few principal components, thereby reducing the workload. Gallo et al. (2013) reported that multivariate analysis, such as PCA, helps in studies evaluating forage cultivars, as it can discriminate between qualitative and yield characteristics, since there is a complex correlation between nutritional value and yield.
Given the above, the objective of this study was to evaluate the importance of yield and nutritional characteristics for the differentiation between forage cultivars using principal component analysis in order to determine the appropriate parameters to measure to obtain a valid comparison.

Materials and Methods
The study was carried out in Barra, BA, Brazil (11°05'20" S, 43°08'31" W; 406 masl). The climatic type is BSh (semi-arid region), with an average annual temperature of 25.7 °C and an average annual rainfall of 649 mm (INMET 2018).
The data were obtained from an experiment in a greenhouse with a randomized complete block design and 6 replications (pots). Eleven cultivars of forage grasses were evaluated: Urochloa brizantha (syn. classified as a Quartzarenic Neosol of medium texture, with the following chemical characteristics in the 0-20 cm layer: pH in CaCl₂ = 6.1; P = 44.2 mg/dm³; Ca+Mg = 4.10 cmolc/dm³; K = 0.37 cmolc/dm³; Al = 0.05 cmolc/dm³; H+Al = 1.10 cmolc/dm³; clay = 105 g/dm³; silt = 25 g/dm³; and sand = 870 g/dm³. To obtain experimental units, 10 seeds were sown in each pot. After 21 days, seedlings were thinned to leave 5 plants per pot. Thirty-three days later, a second thinning was performed to leave 3 plants in each pot. Following thinning, a uniform cutting of all plants was performed at 5 cm from ground level. After a further 41 days of regrowth, measurements were made on all plants before harvesting. Plants were then separated into leaf and stem fractions, and dry matter (DM) yield and chemical composition were determined prior to data analysis.
The following cultural treatments were performed based on the results of soil chemical analysis. At sowing, 30.4 g phosphorus (P), 368 mg nitrogen (N) and 448 mg potassium (K) were applied to each pot. After the first and second thinnings, 768 mg N and 832 mg K were applied on each occasion to each pot. Sources of P, N and K were simple superphosphate, urea and potassium chloride, respectively. Soil was kept at field capacity through manual irrigation.
For comparison between cultivars, yield and nutritional characteristics of each cultivar were determined. Yield characteristics were: plant height (PH), corresponding to the height of curvature of leaves (average height of the canopy) measured with a ruler graduated in cm; number of tillers per plant (NTPP), i.e. average number of tillers for the 3 plants in each pot; length of expanded leaves (LEL), distance between the apex and the leaf ligule; number of leaves per plant (NLPP), sum of emerging, completely expanded, senescent and dead leaves per plant; median leaf width (MLW), width (cm) of the median leaf area measured using a graduated ruler; diameter of the median internode (ID), diameter (mm) of the internode of the median stem region determined using a caliper; leaf dry matter yield (LDMY), sum of the leaf mass of the 3 plants in each pot; stem dry matter yield (SDMY), stem mass of the 3 plants in each pot; and leaf:stem ratio (L:S), the ratio between dry matter mass of leaves and dry matter mass of stems.
For nutritional characteristics, chemical compositions of leaf and stem (stem and leaf sheath) were analyzed separately. Samples of leaves and stems were packed in paper bags, weighed and dried in an oven with forced-air ventilation at 55 °C for 72 hours. Then, they were ground in a Wiley knife mill, passed through a 1-mm sieve and stored in closed containers for further chemical analysis. DM concentration was determined by the INCT-CA method G-003/1; mineral matter (MM) by the INCT-CA method M-001/1; total nitrogen and crude protein (CP) by the INCT-CA method N-001/1; ether extract (EE) by the INCT-CA method G-004/1; neutral detergent fiber (NDF) by the INCT-CA method F-002/1; and acid detergent fiber (ADF) by the INCT-CA method F-004/1. These methodologies have been described by Detmann et al. (2012).
Principal component analysis was performed according to the procedures presented by Cruz et al. (2012), based on the standardization of original data. The correlation matrix was used as the basis for obtaining the components. For the identification of variables that could be discarded, the criteria proposed by Jolliffe (1972) and corroborated by Jolliffe (1973) were adopted. According to these criteria, the number of variables to be discarded corresponded to the number of principal components with an eigenvalue below 0.7. For each of these components, the variable with the highest weighting coefficient, in absolute value, was discarded, because such variables contribute little to discrimination between individuals. When the variable with the highest coefficient in a component had already been discarded on the basis of another component, no further variable was discarded for that component. Analyses were performed in the GENES software, a computational application for genetics and statistical analyses (Cruz 2013).
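The discard rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy rather than GENES, the function name `jolliffe_discard` and the variable names `x1` to `x3` are hypothetical, and the data are synthetic (not the experimental data), with `x3` built to be almost a sum of `x1` and `x2` so that it is redundant.

```python
import numpy as np

def jolliffe_discard(X, names, threshold=0.7):
    """Flag variables for discard from components with eigenvalue < threshold.

    X: (n_observations, n_variables) array; names: list of column labels.
    Hypothetical helper, not part of GENES or any library.
    """
    # PCA on standardized data is equivalent to an eigendecomposition
    # of the correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so iteration starts
    # with the last (smallest-variance) principal component.
    eigvals, eigvecs = np.linalg.eigh(R)
    discarded = []
    for value, vector in zip(eigvals, eigvecs.T):
        if value >= threshold:
            break  # all remaining components have larger eigenvalues
        candidate = names[int(np.argmax(np.abs(vector)))]
        # If the candidate was already discarded for another component,
        # nothing further is discarded for this component.
        if candidate not in discarded:
            discarded.append(candidate)
    return discarded

# Synthetic illustration only: x3 is nearly x1 + x2, hence redundant.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

discarded = jolliffe_discard(X, names)
print("Flagged for discard:", discarded)
```

In this toy case only one component falls below the 0.7 threshold, and the variable that dominates it is the redundant one. Applying the same function to real data would require one column per measured characteristic and one row per experimental unit.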

Results
Tables 1 and 2 show the principal components and their respective eigenvalues and percentages of explained variance, as well as the accumulated variance for yield and nutritional characteristics, respectively.
To determine the variables to be discarded, the character with the highest coefficient, in absolute value, was identified first in the last principal component and then in each preceding component (of successively higher variance), up to the last one whose eigenvalue did not exceed 0.7. Table 1 shows that of the 9 principal components obtained for yield characteristics, 5 had an eigenvalue lower than 0.7. For nutritional characteristics, 3 of the 6 principal components for chemical composition of both leaf and stem presented an eigenvalue below 0.7 (Table 2).
For yield components, variables identified for discarding, in order of lesser importance for differentiation between cultivars, were: leaf dry matter yield (LDMY); plant height (PH); length of expanded leaves (LEL); and diameter of median internode (ID) (Table 3). In the sixth principal component, no characteristic was eliminated, since LDMY had already been eliminated in the ninth component. According to weighting coefficients shown in Table 4, nutritional variables recommended for discard were: ADF and EE for leaves; NDF and DM for stems; and MM for both. Thus, among the 21 variables analyzed, 11 were considered relevant for distinguishing among the evaluated genotypes. The other variables were discarded due to their low contribution to the total variation between individuals. One of the reasons for this low contribution is their high association with other variables (Tables 5 and 6). Table 5 indicates that yield characteristics recommended for discard showed significant and strong correlations with the remaining variable MLW. For nutritional characteristics, all variables recommended for discard except leaf MM and stem DM showed significant correlations with the remaining variables (Table 6).
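The kind of check summarized in Tables 5 and 6 (whether a discarded variable is strongly associated with a retained one) can be sketched as follows. This is a hedged illustration: the helper `strongest_retained_partner` and the labels `a`, `b` and `c` are hypothetical, the data are synthetic, and no significance test is performed, only the Pearson correlation coefficient is reported.

```python
import numpy as np

def strongest_retained_partner(X, names, discarded):
    """For each discarded variable, find the retained variable with the
    largest absolute Pearson correlation. Hypothetical helper."""
    R = np.corrcoef(X, rowvar=False)
    retained = [name for name in names if name not in discarded]
    partners = {}
    for variable in discarded:
        i = names.index(variable)
        # Index of the retained variable most correlated with `variable`.
        j = max((names.index(r) for r in retained),
                key=lambda k: abs(R[i, k]))
        partners[variable] = (names[j], float(R[i, j]))
    return partners

# Synthetic illustration only: c is almost a copy of a, b is unrelated.
rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.1 * rng.normal(size=n)
X = np.column_stack([a, b, c])
names = ["a", "b", "c"]

partners = strongest_retained_partner(X, names, discarded=["c"])
for variable, (partner, r) in partners.items():
    print(f"{variable}: most correlated retained variable is {partner} (r = {r:.2f})")
```

A high absolute correlation with a retained variable suggests that little information is lost by dropping the discarded one.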

Discussion
The relative importance of a principal component is assessed by the percentage of total variance it explains, which decreases from the first to the last component, i.e. the last component is responsible for explaining a minimum fraction of the total variance available (Cruz et al. 2011). For yield characteristics, PCA revealed that the first 3 components explained 82.0% of the total variation between forage cultivars. Similar results were observed for nutritional characteristics, with the first 3 components explaining 91.4% of total variation for leaf chemical composition and 83.8% for stem chemical composition.
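For readers who wish to reproduce this kind of variance accounting, a minimal sketch follows. It assumes PCA on the correlation matrix (as stated in Materials and Methods), uses NumPy rather than GENES, and runs on synthetic data; the function name `explained_variance` is hypothetical, and the printed percentages are not the values reported in Tables 1 and 2.

```python
import numpy as np

def explained_variance(X):
    """Eigenvalues of the correlation matrix (descending), with the
    percentage and cumulative percentage of total variance explained."""
    R = np.corrcoef(X, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse for component order.
    eigvals = np.linalg.eigvalsh(R)[::-1]
    # For a correlation matrix, the eigenvalues sum to the number of variables.
    pct = 100.0 * eigvals / eigvals.sum()
    return eigvals, pct, np.cumsum(pct)

# Synthetic illustration only: 4 variables, two of them correlated.
rng = np.random.default_rng(2)
n = 200
base = rng.normal(size=(n, 3))
X = np.column_stack([base, base[:, 0] + 0.3 * rng.normal(size=n)])

eigvals, pct, cumulative = explained_variance(X)
for k, (value, p, c) in enumerate(zip(eigvals, pct, cumulative), start=1):
    print(f"PC{k}: eigenvalue = {value:.3f}, variance = {p:.1f}%, cumulative = {c:.1f}%")
```

The cumulative column is the quantity used when stating that the first few components explain a given share of the total variation.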
In evaluation of nutritional divergence among Brachiaria ruziziensis (now Urochloa ruziziensis) clones carried out by Moreira et al. (2018), only the first 2 principal components were needed to explain 96.2% of variation between genotypes. Castañeda-Pimienta et al. (2017) evaluated agronomic characteristics of 6 accessions of 4 Brachiaria species (which now belong to the genus Urochloa) and observed that 90.5% of variation in the data set was explained by the first 3 components. By contrast, Daher et al. (1997) analyzed accessions of elephant grass (Pennisetum purpureum, now Cenchrus purpureus) and found that at least 7 principal components were required for the percentage of total variance explained to exceed 80.0%.
A significant benefit of the principal component technique is reduction of the dimensions of the data set, while retaining maximum variability and using a low number of principal components. This number of components varies according to the researcher's interest (Da Silva and Sbrissia 2010). When aiming to determine genetic divergence by graphically dispersing accessions in a two-dimensional space using scores, Daher et al. (1997) found that the first 2 components explained at least 80.0% of total data variation. However, when this level was not reached by incorporating data from the first 2 components, Cruz et al. (2012) proposed complementing the analysis with the graphic dispersion of the third and fourth components. Thus, it appears that the analysis of principal components can be efficient in summarizing the total variance of a data set, allowing, if needed, the analysis of diversity between cultivars using graphic dispersion.
By analyzing the importance of 22 variables for the study of genetic diversity between accessions of elephant grass, Daher et al. (1997) identified that 8 variables were sufficient to discriminate between accessions. Strapasson et al. (2000) analyzed 58 botanical-agronomic descriptors for the characterization of accessions of Paspalum guenoarum and Paspalum plicatulum and concluded that there is no need to work with an excessive number of descriptors, since 86.0% of them were non-discriminant. Cruz et al. (2012) also reported that some characteristics can have minor importance because they are correlated with others considered in the study or because they do not vary among the evaluated genotypes.
According to Daher et al. (1997), the efficiency of PCA in comparing accessions and the criteria adopted for discarding variables are debatable because of the possibility of eliminating variables that have considerable weights in the first components. In our study, LDMY was strongly associated with the first principal component and was nevertheless recommended for discard (Table 3). However, it is noteworthy that this variable presented a correlation coefficient of 0.81 with MLW, one of the remaining variables. An analogous case is observed for stem NDF, which was recommended for discard and had a significant weight in the first principal component but had a high correlation with stem ADF (Table 6).

Conclusions
PCA proved effective and allowed a reduction in the number of yield and nutritional characteristics which need to be measured in experiments designed to evaluate forage cultivars. Based on our findings, collection of data for PH, LEL, ID and LDMY; MM, EE and ADF% of leaves; and DM, MM and NDF% of stems is unnecessary. This can result in considerable savings in time and resources in forage evaluation without a significant loss of information. The variables that contributed most to discrimination between forage cultivars were: NTPP, NLPP, MLW, SDMY and L:S; plus DM, CP and NDF% of leaves; and CP, EE and ADF% of stems.