
Edited by: Han Mulder, Wageningen University & Research, Netherlands

Reviewed by: Zhe Zhang, South China Agricultural University, China; Xiangdong Ding, China Agricultural University (CAU), China; Mario Calus, Wageningen University & Research, Netherlands

*Correspondence: Patrik Waldmann

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The large number of markers in genome-wide prediction demands the use of methods with regularization and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the Pearson correlation coefficient ($r^{2}$).

At the heart of classical quantitative genetics is linear model theory.

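One of the most important factors is the number of markers $p$. For an $n \times p$ marker matrix $X$ and a vector of phenotypes $y$, the ordinary least squares (OLS) estimator of the regression coefficients is $\hat{\beta}_{OLS} = (X^{T}X)^{-1}X^{T}y$.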

We can also obtain the variances (diagonal terms) and covariances (off-diagonal terms) of the regression coefficients as $\mathrm{Var}(\hat{\beta}_{OLS}) = \sigma^{2}(X^{T}X)^{-1}$, where $\sigma^{2}$ is the residual variance.
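The OLS estimate and its covariance matrix can be sketched in NumPy as follows. The simulated genotypes (coded 0/1/2), effect sizes and noise level are hypothetical and not the QTLMAS2010 data, and the unknown $\sigma^{2}$ is replaced by the usual residual estimate with $n - p$ degrees of freedom (an assumption; the model has no intercept).

```python
import numpy as np

# Hypothetical toy data (not QTLMAS2010): n individuals, p markers coded 0/1/2.
# OLS needs n > p so that X^T X is invertible.
rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta_true = np.array([1.0, -0.5, 0.0, 0.25, 0.0])
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# OLS estimate (X^T X)^{-1} X^T y, computed with a linear solve
# rather than an explicit inverse for numerical stability.
XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)

# Residual variance estimate with n - p degrees of freedom (no intercept).
resid = y - X @ beta_ols
sigma2_hat = resid @ resid / (n - p)

# Var(beta_hat) = sigma^2 (X^T X)^{-1}: variances on the diagonal,
# covariances off the diagonal.
cov_beta = sigma2_hat * np.linalg.inv(XtX)
se_beta = np.sqrt(np.diag(cov_beta))

print(np.round(beta_ols, 3))
print(np.round(se_beta, 3))
```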

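Although the number of genotyped individuals is generally increasing, the experimental setting in genomic prediction is often that the number of markers exceeds the number of individuals ($p > n$). Then $X^{T}X$ is singular and the OLS estimator is not defined. Ridge regression (RR) solves this by adding a positive constant $\lambda$ to the diagonal of $X^{T}X$, which gives $\hat{\beta}_{RR} = (X^{T}X + \lambda I_{p})^{-1}X^{T}y$, where $I_{p}$ is the $p \times p$ identity matrix.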
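A minimal sketch of the ridge estimator $(X^{T}X + \lambda I_{p})^{-1}X^{T}y$ in a $p > n$ setting, assuming simulated standard-normal marker covariates and an arbitrary $\lambda = 10$ (both hypothetical). The helper names `ridge` and `ridge_dual` are illustrative only. The second function uses the standard identity $(X^{T}X + \lambda I_{p})^{-1}X^{T} = X^{T}(XX^{T} + \lambda I_{n})^{-1}$, which only requires solving an $n \times n$ system and is therefore cheaper when $p \gg n$.

```python
import numpy as np

def ridge(X, y, lam):
    """Primal ridge solution (X^T X + lam I_p)^{-1} X^T y; solves a p x p system."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Equivalent form X^T (X X^T + lam I_n)^{-1} y; solves an n x n system."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

# Hypothetical p > n setting: 50 individuals, 500 markers.
rng = np.random.default_rng(2)
n, p = 50, 500
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(size=n)

# rank(X^T X) = rank(X) <= n < p, so (X^T X)^{-1} does not exist.
print(np.linalg.matrix_rank(X))

beta_rr = ridge(X, y, lam=10.0)
print(beta_rr.shape, np.allclose(beta_rr, ridge_dual(X, y, lam=10.0)))
```

Larger $\lambda$ shrinks the coefficient vector more strongly towards zero, which is the source of the bias discussed next.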

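Another interesting feature of RR appears when considering the MSE. In general, for any estimator $\hat{\theta}$ of a parameter $\theta$, the MSE can be decomposed as $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^{2}] = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^{2}$. For RR, $E[\hat{\beta}_{RR}] = (X^{T}X + \lambda I_{p})^{-1}X^{T}X\beta$, so the estimator is biased whenever $\lambda > 0$, but its variance is smaller than that of the OLS estimator.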
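The bias-variance trade-off of RR can be made concrete with a small calculation in which $X$ is held fixed and the true $\beta$ and $\sigma^{2}$ are known, which is only possible in a simulation (all values below are hypothetical). The sketch uses $\mathrm{Var}(\hat{\beta}_{RR}) = \sigma^{2}WX^{T}XW$ and $\mathrm{Bias}(\hat{\beta}_{RR}) = WX^{T}X\beta - \beta$ with $W = (X^{T}X + \lambda I_{p})^{-1}$, and sums the MSE over all coefficients.

```python
import numpy as np

# Hypothetical fixed design, true coefficients and residual variance.
rng = np.random.default_rng(3)
n, p, sigma2 = 50, 10, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
XtX = X.T @ X

def ridge_var_bias2(lam):
    """Return (total variance, squared bias norm) of the ridge estimator."""
    W = np.linalg.inv(XtX + lam * np.eye(p))
    var = sigma2 * W @ XtX @ W        # Var(beta_hat_RR)
    bias = W @ XtX @ beta - beta      # E[beta_hat_RR] - beta
    return np.trace(var), bias @ bias

for lam in [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]:
    v, b2 = ridge_var_bias2(lam)
    print(f"lambda={lam:7.2f}  variance={v:.4f}  bias^2={b2:.4f}  MSE={v + b2:.4f}")
```

$\lambda = 0$ reproduces the unbiased OLS case. As $\lambda$ grows the variance falls and the squared bias rises, and for small enough $\lambda > 0$ the total MSE is below the OLS value.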

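RR can be written as the optimization problem $\hat{\beta}_{RR} = \arg\min_{\beta} \{ \|y - X\beta\|_{2}^{2} + \lambda \|\beta\|_{2}^{2} \}$, where $\|\cdot\|_{2}$ denotes the $\ell_{2}$-norm. The first term is the loss function and the second term the penalty. By changing the penalty into an $\ell_{1}$-norm, we end up with the least absolute shrinkage and selection operator (LASSO), $\hat{\beta}_{LASSO} = \arg\min_{\beta} \{ \|y - X\beta\|_{2}^{2} + \lambda \|\beta\|_{1} \}$.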
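Unlike RR, the LASSO has no closed-form solution. One common way to solve it is cyclic coordinate descent with soft-thresholding, sketched below under the objective $\|y - X\beta\|_{2}^{2} + \lambda\|\beta\|_{1}$. This is not necessarily the solver used for the results in this article, and `lasso_cd` and `soft_threshold` are hypothetical names. Note that library implementations often scale the loss differently (scikit-learn's `Lasso`, for example, uses $(1/2n)\|y - Xw\|_{2}^{2} + \alpha\|w\|_{1}$), so their penalty values are not directly comparable to $\lambda$ here.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Minimise ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)          # x_j^T x_j for each column
    r = y - X @ b                          # current residual
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            r += X[:, j] * b[j]            # partial residual without column j
            # Exact minimiser of ||r - x_j b_j||^2 + lam * |b_j|
            b[j] = soft_threshold(X[:, j] @ r, lam / 2.0) / col_ss[j]
            r -= X[:, j] * b[j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b

# Hypothetical example with three non-zero effects among eight covariates.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y = X @ beta + rng.normal(size=100)
b_lasso = lasso_cd(X, y, lam=50.0)
print(np.round(b_lasso, 3))
```

With $\lambda = 0$ the routine returns the OLS solution, and for $\lambda \geq 2\max_{j}|x_{j}^{T}y|$ every coefficient is set exactly to zero, which is the variable-selection property that the $\ell_{2}$ penalty lacks.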

In order to determine the best model, it is important to have a good estimate of the test error, because the training error will decrease when more variables or parameters are added to the model. There are a number of approaches (e.g., Mallows' $C_{P}$) that adjust the training error for model size.
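As an illustration of adjusting the training error for model size, the sketch below computes Mallows' $C_{P}$ for nested OLS models in the scaled form $C_{P} = (\mathrm{RSS} + 2d\hat{\sigma}^{2})/n$, where $d$ is the number of predictors. This form ranks models in the same order as Mallows' original $\mathrm{RSS}/\hat{\sigma}^{2} - n + 2d$. The data, the nested model sequence and the choice of estimating $\hat{\sigma}^{2}$ from the largest model are all assumptions made for the example.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Hypothetical data: only the first three of eight covariates have effects.
rng = np.random.default_rng(5)
n, p_max = 100, 8
X = rng.normal(size=(n, p_max))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# sigma^2 estimated from the largest model (one common choice).
sigma2_hat = rss(X, y) / (n - p_max)

for d in range(1, p_max + 1):
    r = rss(X[:, :d], y)                  # nested model with the first d covariates
    cp = (r + 2 * d * sigma2_hat) / n     # scaled Mallows' C_P
    print(f"d={d}  training MSE={r / n:.3f}  C_P={cp:.3f}")
```

The training MSE can only go down as $d$ increases, whereas the $2d\hat{\sigma}^{2}$ term makes $C_{P}$ penalize the extra parameters.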

An alternative approach is to use cross-validation (CV). There are several variants of CV, but the general idea is to average the MSE over several sets of hold-out test data.
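A minimal sketch of $K$-fold CV used to choose the RR penalty $\lambda$ by the average hold-out MSE. The data, the grid of $\lambda$ values, $K = 5$ and the helper names are hypothetical. The same random fold assignment is reused for every $\lambda$ so that the candidate values are compared on identical splits. No intercept is fitted because the simulated phenotypes have zero mean.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X^T X + lam I_p)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv_mse(X, y, lam, k=5, seed=0):
    """Average hold-out MSE of ridge regression over k folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    mses = []
    for test in folds:
        train = np.setdiff1d(idx, test)        # all individuals not in this fold
        b = ridge(X[train], y[train], lam)
        e = y[test] - X[test] @ b
        mses.append(np.mean(e ** 2))
    return np.mean(mses)

# Hypothetical p > n data: 120 individuals, 300 markers, 10 with effects.
rng = np.random.default_rng(6)
X = rng.normal(size=(120, 300))
y = X[:, :10] @ rng.normal(size=10) + rng.normal(size=120)

lams = [0.1, 1.0, 10.0, 100.0, 1000.0]
cv = {lam: kfold_cv_mse(X, y, lam) for lam in lams}
best = min(cv, key=cv.get)                     # lambda with the lowest CV MSE
print(cv, best)
```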

for model evaluation in genome-enabled prediction. The use of $r^{2}$ for model comparison has been questioned.

It is also possible to assess the goodness of fit of the models using the coefficient of determination, $R^{2} = 1 - \sum_{i}(y_{i} - \hat{y}_{i})^{2} / \sum_{i}(y_{i} - \bar{y})^{2}$.
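The practical difference between $r^{2}$ and $R^{2}$ can be shown in a few lines: $r^{2}$ is unchanged by any shift or rescaling of the predictions, whereas $R^{2}$ (like the MSE) penalizes predictions that are biased in location or scale. The phenotypes, predictions and the transformation $5 + 0.3\hat{y}$ below are hypothetical.

```python
import numpy as np

def r2_pearson(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

def r2_determination(y, y_hat):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

rng = np.random.default_rng(7)
y = rng.normal(10.0, 3.0, size=500)
y_hat = y + rng.normal(0.0, 2.0, size=500)   # hypothetical unbiased predictions
y_hat_biased = 5.0 + 0.3 * y_hat             # shifted and over-shrunk predictions

# r^2 cannot tell the two sets of predictions apart ...
print(r2_pearson(y, y_hat), r2_pearson(y, y_hat_biased))
# ... while R^2 drops sharply for the biased predictions.
print(r2_determination(y, y_hat), r2_determination(y, y_hat_biased))
```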

In a recent publication (^{2}^{2}

In our previous paper (^{2}^{2}^{2}^{2}^{2}^{2}

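Mean squared error (MSE), predictive correlation accuracy ($r^{2}$), coefficient of determination ($R^{2}$), covariance between observed and predicted phenotypes (COV[y, ŷ]) and variance of predicted phenotypes (VAR[ŷ]) for RR, LASSO and ALASSO.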

Method | MSE | $r^{2}$ | $R^{2}$ | COV[y, ŷ] | VAR[ŷ] |
---|---|---|---|---|---|
RR | 83.07 | 0.300 | 0.291 | 32.22 | 29.54 |
LASSO | 65.73 | 0.460 | 0.439 | 44.30 | 36.41 |
ALASSO | 64.52 | 0.455 | 0.449 | 50.68 | 48.17 |
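The reported columns (MSE, $r^{2}$, $R^{2}$, COV[y, ŷ], VAR[ŷ]) are mutually consistent with $R^{2} = 1 - \mathrm{MSE}/\mathrm{VAR}[y]$ and $r^{2} = \mathrm{COV}[y, \hat{y}]^{2}/(\mathrm{VAR}[y]\,\mathrm{VAR}[\hat{y}])$. The check below back-calculates $\mathrm{VAR}[y]$ from the MSE and $R^{2}$ columns, since the phenotypic variance itself is not given here. This check is an illustration added for the reader, not part of the original analysis.

```python
# Values copied from the table: (MSE, r^2, R^2, COV[y, y_hat], VAR[y_hat]).
table = {
    "RR":     (83.07, 0.300, 0.291, 32.22, 29.54),
    "LASSO":  (65.73, 0.460, 0.439, 44.30, 36.41),
    "ALASSO": (64.52, 0.455, 0.449, 50.68, 48.17),
}

for method, (mse, r2, R2, cov, var_hat) in table.items():
    var_y = mse / (1.0 - R2)                    # implied VAR[y] from R^2 = 1 - MSE/VAR[y]
    r2_implied = cov ** 2 / (var_y * var_hat)   # squared Pearson correlation
    print(f"{method:6s} VAR[y]={var_y:6.1f}  r2 reported={r2:.3f}  implied={r2_implied:.3f}")

print("best by r2: ", max(table, key=lambda m: table[m][1]))
print("best by MSE:", min(table, key=lambda m: table[m][0]))
```

All three methods imply a common $\mathrm{VAR}[y]$ of about 117.1 to 117.2. Because $R^{2}$ is then a decreasing function of the MSE, it ranks the methods exactly as the MSE does (ALASSO best), whereas $r^{2}$ ranks LASSO first.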

Ranking of individuals in terms of breeding values and predicted phenotypes is important in breeding. The order of the 10 best individuals differs not only between RR, LASSO and ALASSO, but also within each model, depending on whether min MSE or max $r^{2}$ is used as the selection statistic.

Ranking of the 10 best individuals from the simulated QTLMAS2010 data based on min[MSE] and max[$r^{2}$].

Method/selection statistic | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 | Rank 7 | Rank 8 | Rank 9 | Rank 10 |
---|---|---|---|---|---|---|---|---|---|---|
RR/min[MSE] | 2,586 | 2,772 | 2,977 | 3,050 | 3,195 | 3,056 | 2,756 | 2,738 | 2,821 | 3,184 |
RR/max[$r^{2}$] | 2,586 | 2,772 | 3,195 | 2,977 | 3,050 | 3,184 | 2,589 | 2,821 | 2,756 | 2,738 |
LASSO/min[MSE] | 2,967 | 2,820 | 2,586 | 2,809 | 3,050 | 2,977 | 3,195 | 2,582 | 2,688 | 2,765 |
LASSO/max[$r^{2}$] | 2,967 | 2,820 | 2,809 | 2,688 | 2,582 | 2,586 | 3,195 | 3,050 | 2,977 | 2,972 |
ALASSO/min[MSE] | 2,820 | 2,582 | 2,586 | 2,809 | 3,050 | 2,832 | 3,195 | 3,006 | 2,589 | 2,817 |
ALASSO/max[$r^{2}$] | 2,820 | 2,582 | 2,809 | 2,586 | 3,050 | 3,195 | 2,832 | 3,006 | 2,817 | 2,972 |
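The extent to which the two selection statistics disagree can be quantified directly from the ranking table. The short script below (an added illustration, with thousands separators dropped from the individual IDs) counts, for each method, how many of the top-10 individuals are shared between min[MSE] and max[$r^{2}$] and how many of them occupy the same rank.

```python
# Top-10 individual IDs copied from the ranking table.
rankings = {
    "RR/min[MSE]":     [2586, 2772, 2977, 3050, 3195, 3056, 2756, 2738, 2821, 3184],
    "RR/max[r2]":      [2586, 2772, 3195, 2977, 3050, 3184, 2589, 2821, 2756, 2738],
    "LASSO/min[MSE]":  [2967, 2820, 2586, 2809, 3050, 2977, 3195, 2582, 2688, 2765],
    "LASSO/max[r2]":   [2967, 2820, 2809, 2688, 2582, 2586, 3195, 3050, 2977, 2972],
    "ALASSO/min[MSE]": [2820, 2582, 2586, 2809, 3050, 2832, 3195, 3006, 2589, 2817],
    "ALASSO/max[r2]":  [2820, 2582, 2809, 2586, 3050, 3195, 2832, 3006, 2817, 2972],
}

overlap = {}
for method in ("RR", "LASSO", "ALASSO"):
    by_mse = rankings[f"{method}/min[MSE]"]
    by_r2 = rankings[f"{method}/max[r2]"]
    shared = len(set(by_mse) & set(by_r2))                  # same individuals, any order
    same_rank = sum(a == b for a, b in zip(by_mse, by_r2))  # same individual at same rank
    overlap[method] = (shared, same_rank)
    print(f"{method}: {shared}/10 individuals shared, {same_rank} at identical rank")
```

For each method, 9 of the 10 selected individuals are the same under both statistics, but only 2 (RR), 3 (LASSO) and 4 (ALASSO) of them keep the same rank.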

The simulated dataset QTLMAS2010ny012.zip can be found in

The author wrote, read and approved the final version of the manuscript.

Financial support was provided by the Beijer Laboratory for Animal Science, SLU, Uppsala.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
