
Edited by: Guilherme J. M. Rosa, University of Wisconsin-Madison, United States

Reviewed by: Breno De Oliveira Fragomeni, University of Connecticut, United States; Nicolas Gengler, Gembloux Agro-Bio Tech, University of Liege, Belgium

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Diagonal elements of the inverse of the coefficient matrix are necessary to calculate genomic prediction accuracy. Here, an improved methodology is described to update the inverse of the coefficient matrix (C^{−1}) for new individuals.

In the last decade, technological advances have significantly decreased genotyping costs, particularly for agricultural livestock and crop species. This reduction in costs has enabled routine genomic best linear unbiased prediction (GBLUP) (VanRaden,

Here, we propose a method to calculate the accuracy of new individuals, with and without phenotypes, by updating the inverse of the coefficient matrix (C^{−1}) for the new individuals only, without reprocessing the whole population. This method significantly reduces computing time and demand by updating only the accuracy of new individuals, and it helps reduce redundancy in the reference population.

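We consider a simple animal model without fixed effects. This model is

$$\mathbf{y} = \mathbf{Z}\mathbf{g} + \mathbf{e}, \qquad (1)$$

where y is the vector of phenotypes, Z is the incidence matrix linking records to individuals, g ~ N(0, Gσ_g^2) is the vector of genomic breeding values with genomic relationship matrix G, and e ~ N(0, Iσ_e^2) is the vector of residuals. The mixed model equations (MME) are

$$(\mathbf{Z}'\mathbf{Z} + \lambda\mathbf{G}^{-1})\,\hat{\mathbf{g}} = \mathbf{Z}'\mathbf{y}, \qquad (2)$$

and the accuracy of individual i is

$$r_i = \sqrt{1 - \lambda\, c^{ii} / g_{ii}},$$

where λ = σ_e^2/σ_g^2, C = Z'Z + λG^{−1} is the coefficient matrix, c^{ii} is the diagonal element of C^{−1} for individual i, and g_{ii} is the diagonal element of G for individual i.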
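As a point of reference, the accuracy can be evaluated by brute force: build C, invert it, and read its diagonal. The sketch below assumes one animal effect per record and no fixed effects, so Z'Z is diagonal with the number of records per individual. It also assumes λ = σ_e^2/σ_g^2 and the reliability form 1 − λc^{ii}/g_{ii}, and it uses a small toy G. The function name `gblup_accuracy` and the ridge added to G are illustrative and are not taken from the original R implementation.

```python
import numpy as np


def gblup_accuracy(G, n_records, lam):
    """Accuracy r_i = sqrt(1 - lam * c^{ii} / g_ii) for every individual.

    G         : genomic relationship matrix (assumed positive definite)
    n_records : phenotypic records per individual (0 = no phenotype); with one
                animal effect per record and no fixed effects, Z'Z = diag(n_records)
    lam       : variance ratio sigma_e^2 / sigma_g^2
    """
    ZtZ = np.diag(np.asarray(n_records, dtype=float))
    C = ZtZ + lam * np.linalg.inv(G)            # coefficient matrix of the MME
    c_diag = np.diag(np.linalg.inv(C))          # c^{ii}: diagonal of C^{-1}
    reliability = 1.0 - lam * c_diag / np.diag(G)
    # Guard against tiny negative values caused by floating-point rounding
    return np.sqrt(np.clip(reliability, 0.0, 1.0))


# Toy data: 6 individuals, 50 SNPs coded 0/1/2 and centred; a small ridge keeps G invertible
rng = np.random.default_rng(1)
W = rng.integers(0, 3, size=(6, 50)).astype(float)
W -= W.mean(axis=0)
G = W @ W.T / W.shape[1] + 0.05 * np.eye(6)
acc = gblup_accuracy(G, n_records=[1, 1, 1, 1, 0, 0], lam=2.0)
print(np.round(acc, 3))
```

Every update scheme discussed below should reproduce these values. The brute-force version, however, requires inverting the full coefficient matrix.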

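To calculate the accuracy of individuals with or without phenotypes, each new individual can be added to C^{−1} separately. In this case, the coefficient matrix of the MME (Equation 2) is partitioned as

$$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{pp} & \mathbf{C}_{pq} \\ \mathbf{C}_{qp} & \mathbf{C}_{qq} \end{bmatrix} = \begin{bmatrix} \mathbf{Z}_p'\mathbf{Z}_p + \lambda(\mathbf{G}^{-1})_{pp} & \lambda(\mathbf{G}^{-1})_{pq} \\ \lambda(\mathbf{G}^{-1})_{qp} & \mathbf{Z}_q'\mathbf{Z}_q + \lambda(\mathbf{G}^{-1})_{qq} \end{bmatrix},$$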

where subscripts p and q denote the core individuals forming the reference population and the new individuals, respectively. New individuals may or may not have phenotypes.
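For a partitioned coefficient matrix, the q-block of C^{−1} follows from the Schur complement of C_pp: c^{qq} = (C_qq − C_qp C_pp^{−1} C_pq)^{−1}. The sketch below uses Python with NumPy, and its helper names are hypothetical. It checks this identity against a full inversion. It also shows that C_pp, which is built from the p-block of the inverse of the *enlarged* G, differs from the core-only coefficient matrix of a previous run. A stored core inverse therefore cannot simply be reused along this route.

```python
import numpy as np


def partitioned_C(G, n_records, lam, n_core):
    """Blocks C_pp, C_pq, C_qq of C = Z'Z + lam * G^{-1} (core first, then new)."""
    C = np.diag(np.asarray(n_records, dtype=float)) + lam * np.linalg.inv(G)
    p, q = slice(0, n_core), slice(n_core, None)
    return C[p, p], C[p, q], C[q, q]


def cqq_schur(C_pp, C_pq, C_qq):
    """q-block of C^{-1} from the Schur complement of C_pp (C is symmetric)."""
    S = C_qq - C_pq.T @ np.linalg.solve(C_pp, C_pq)
    return np.linalg.inv(S)


rng = np.random.default_rng(3)
n_core, n_new, lam = 8, 2, 2.0
X = rng.normal(size=(n_core + n_new, 30))
G = X @ X.T / 30 + 0.05 * np.eye(n_core + n_new)   # toy positive-definite G
n_rec = [1] * n_core + [1, 0]                       # second new individual is unphenotyped

C_pp, C_pq, C_qq = partitioned_C(G, n_rec, lam, n_core)
c_qq = cqq_schur(C_pp, C_pq, C_qq)

# The p-block of the enlarged G^{-1} is not inv(G_pp), so C_pp differs from the
# coefficient matrix that a core-only run would have produced
C_core_only = np.eye(n_core) + lam * np.linalg.inv(G[:n_core, :n_core])
```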

As demonstrated in Equation (2), ^{−1} becomes ^{−1} becomes

based on Equations (2) and (3). Inverting

With ^{−1} = ^{−1} and ^{−1} is

where ^{−1} as (^{−1})^{−1} is used for simplification and is shown below in Equation (8). For the partitioned matrices in Equation (4), ^{−1} becomes

where ≈ denotes approximation. By using lemma (6), ^{−1} is not required, and only the middle matrix needs to be inverted (^{−1} can be updated for each new individual using Cholesky decomposition and multiplying the Cholesky factors, i.e., ^{−1} = ^{−T}^{−1} (Harville,
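The exact form of lemma (6) is not reproduced in this section. One standard matrix inversion identity with the property described is (A^{−1} + B^{−1})^{−1} = A − A(A + B)^{−1}A. When every individual has at least one record, it gives C^{−1} = G/λ − (G/λ)M^{−1}(G/λ) with middle matrix M = G/λ + (Z'Z)^{−1}, so G^{−1} is never formed. The sketch below checks that identity and inverts M through its Cholesky factor (M^{−1} = L^{−T}L^{−1}). It also extends L by one row for a new individual without refactorising the core part. This is an illustration under those assumptions, not necessarily the authors' Equation (7). The function names are hypothetical.

```python
import numpy as np


def cinv_via_lemma(G, n_records, lam):
    """C^{-1} = A - A M^{-1} A, with A = G / lam and middle matrix M = A + D^{-1}.

    Uses (A^{-1} + B^{-1})^{-1} = A - A (A + B)^{-1} A with A^{-1} = lam * G^{-1}
    and B^{-1} = D = Z'Z, so G^{-1} itself is never formed. Requires every
    individual to have at least one record so that D is invertible.
    """
    A = G / lam
    M = A + np.diag(1.0 / np.asarray(n_records, dtype=float))   # middle matrix
    L = np.linalg.cholesky(M)                                   # M = L L^T, L lower
    L_inv = np.linalg.solve(L, np.eye(M.shape[0]))              # L^{-1}
    M_inv = L_inv.T @ L_inv                                     # M^{-1} = L^{-T} L^{-1}
    return A - A @ M_inv @ A


def extend_cholesky(L, b, c):
    """Cholesky factor of [[M, b], [b^T, c]] given M = L L^T (one new individual)."""
    l = np.linalg.solve(L, b)        # solves L l = b
    d = np.sqrt(c - l @ l)           # real and positive when the bordered matrix is PD
    n = L.shape[0]
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l
    L_new[n, n] = d
    return L_new


rng = np.random.default_rng(5)
n, lam = 7, 2.0
X = rng.normal(size=(n, 25))
G = X @ X.T / 25 + 0.05 * np.eye(n)          # toy positive-definite G
n_rec = np.array([1, 2, 1, 1, 3, 1, 1])

C_inv = cinv_via_lemma(G, n_rec, lam)

# Grow the middle matrix by one individual, reusing the core Cholesky factor
M = G / lam + np.diag(1.0 / n_rec)
L_core = np.linalg.cholesky(M[:-1, :-1])
L_full = extend_cholesky(L_core, M[:-1, -1], M[-1, -1])
```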

Therefore, Equation (7) can be written as

based on Equation (8)

and

and ^{qq} which is the inverse of _{qq} becomes

where “—” denotes right division (multiplying the numerator by the inverse of the denominator), and

Only _{pq}, _{qq}, and

For animals without phenotypes ^{qq} is

In summary, Equations (14) and (15) can be used to calculate the prediction accuracies of individuals with and without phenotypes, respectively.
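Closed forms consistent with this summary can be written using only the core middle matrix M_pp = G_pp/λ + D_p^{−1}, where D_p is the diagonal matrix of record counts for the core. The other ingredients are the small matrices G_pq and G_qq. For new individuals without phenotypes, c^{qq} = G_qq/λ − (G_qp/λ)M_pp^{−1}(G_pq/λ). When new individuals have D_q records, their own phenotypes add precision, giving c^{qq} = (r^{−1} + D_q)^{−1}, where r is the previous expression. These forms follow from the Woodbury identity under the assumptions of the earlier sketches and are offered as a sketch. They may differ in form from Equations (14) and (15). The helper names are hypothetical.

```python
import numpy as np


def cqq_new(M_pp_inv, G_pq, G_qq, lam, n_records_q):
    """q-block of C^{-1} for new individuals, reusing the stored core inverse.

    M_pp_inv    : inverse of the core middle matrix G_pp / lam + D_p^{-1} (previous run)
    G_pq, G_qq  : relationships core x new and new x new
    lam         : sigma_e^2 / sigma_g^2
    n_records_q : records per new individual (0 = no phenotype)
    """
    b = G_pq / lam
    # Without phenotypes: c^{qq} = G_qq/lam - (G_qp/lam) M_pp^{-1} (G_pq/lam)
    r = G_qq / lam - b.T @ M_pp_inv @ b
    # Own records add precision: (r^{-1} + D_q)^{-1}, which equals r when D_q = 0
    D_q = np.diag(np.asarray(n_records_q, dtype=float))
    return np.linalg.inv(np.linalg.inv(r) + D_q)


def accuracy(c_qq, G_qq, lam):
    """r_i = sqrt(1 - lam * c^{ii} / g_ii) for the new individuals."""
    return np.sqrt(1.0 - lam * np.diag(c_qq) / np.diag(G_qq))


rng = np.random.default_rng(7)
n_core, lam = 40, 2.0
X = rng.normal(size=(n_core + 3, 60))
G = X @ X.T / 60 + 0.05 * np.eye(n_core + 3)     # toy positive-definite G
p, q = slice(0, n_core), slice(n_core, None)

# Previous run: every core individual has one record, so D_p^{-1} = I
M_pp_inv = np.linalg.inv(G[p, p] / lam + np.eye(n_core))

n_q = [1, 0, 2]          # one record, no phenotype, two records
c_qq = cqq_new(M_pp_inv, G[p, q], G[q, q], lam, n_q)
print(np.round(accuracy(c_qq, G[q, q], lam), 3))
```

Only M_pp^{−1} is large here. Everything computed for the new individuals is of size (number of new individuals) squared, which is where the saving comes from.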

Based on Equations (14) and (15) only

using the previous assumptions and Equation (8). ^{−1} is the largest matrix generated in the previous run; it can be compressed and stored in binary format to avoid memory issues. The other matrices are small and can be built efficiently using the optimized Linear Algebra PACKage (LAPACK). Equations (14), (15), and (16) were implemented as an R function (
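A minimal sketch of persisting the large stored inverse between runs uses NumPy's compressed `.npz` format as a stand-in for the binary storage described above; the original implementation is in R. Because the matrix is symmetric, the sketch keeps only its upper triangle, which roughly halves the stored size. That choice, the file name, and the helper names are assumptions.

```python
import os
import tempfile

import numpy as np


def save_core_inverse(path, M_pp_inv):
    """Store a symmetric matrix as its compressed upper triangle (.npz)."""
    iu = np.triu_indices(M_pp_inv.shape[0])
    np.savez_compressed(path, n=M_pp_inv.shape[0], upper=M_pp_inv[iu])


def load_core_inverse(path):
    """Rebuild the full symmetric matrix from the stored upper triangle."""
    with np.load(path) as data:
        n = int(data["n"])
        upper = data["upper"]
    M = np.zeros((n, n))
    M[np.triu_indices(n)] = upper
    return M + np.triu(M, 1).T       # mirror the strict upper triangle into the lower


rng = np.random.default_rng(11)
X = rng.normal(size=(50, 80))
M_pp_inv = np.linalg.inv(X @ X.T / 80 + np.eye(50))   # stand-in for the stored inverse

path = os.path.join(tempfile.mkdtemp(), "core_inverse.npz")
save_core_inverse(path, M_pp_inv)
restored = load_core_inverse(path)
```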

Genotype matrices with seven columns, representing seven single-nucleotide polymorphisms (SNPs), and 1000, 2000, 3000, …, 24000, and 25000 rows, one per individual, were created and filled randomly with 0 (AA), 1 (AB), and 2 (BB). The genomic relationship matrices (
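The simulation can be sketched as follows. The construction of G is not given in this section. The sketch therefore assumes VanRaden's first method, G = WW'/(2Σp_j(1 − p_j)), where W holds the genotypes centred by twice the allele frequencies. With only seven SNP columns, G has rank at most seven and is singular for more than seven individuals. For that reason, the sketch adds a small value to the diagonal before any inversion; that `ridge` parameter is hypothetical.

```python
import numpy as np


def simulate_genotypes(n_ind, n_snp=7, seed=0):
    """Random genotypes coded 0 (AA), 1 (AB) or 2 (BB); rows are individuals."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 3, size=(n_ind, n_snp))


def vanraden_G(M, ridge=0.01):
    """G = W W' / (2 sum p(1-p)) with a hypothetical ridge added to the diagonal."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequency per SNP
    W = M - 2.0 * p                          # centred genotypes
    G = W @ W.T / (2.0 * np.sum(p * (1.0 - p)))
    return G + ridge * np.eye(M.shape[0])


# Population sizes used in the comparison: 1000, 2000, ..., 25000 individuals
sizes = list(range(1000, 25001, 1000))
M = simulate_genotypes(sizes[0])
G = vanraden_G(M)
```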

To evaluate performance, each set was run in three steps. In the first step, the elapsed time to build the coefficient matrix by using the classic approach (i.e., inverting ^{qq} was measured. In the third step, the time to calculate c^{qq} by using the initial matrices was measured.
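A timing comparison in the spirit of the first and third steps can be sketched as below. The classic route builds and inverts the full coefficient matrix, while the update reuses a core inverse prepared in a previous, untimed run. The sketch assumes one record per core individual, one new individual, and a toy G with a hypothetical ridge; the helper names are also hypothetical. Timings depend on the hardware and on the BLAS/LAPACK build, so no figures are implied.

```python
import time

import numpy as np


def make_G(n, n_snp=50, ridge=0.05, seed=0):
    """Toy VanRaden-style G from random 0/1/2 genotypes, plus a hypothetical ridge."""
    rng = np.random.default_rng(seed)
    M = rng.integers(0, 3, size=(n, n_snp)).astype(float)
    p = M.mean(axis=0) / 2.0
    W = M - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p))) + ridge * np.eye(n)


def compare(n_core, lam=2.0, new_has_pheno=True):
    """c^{qq} of one new individual: classic full inversion vs. update from the core."""
    G = make_G(n_core + 1)
    p, q = slice(0, n_core), slice(n_core, n_core + 1)
    d = np.ones(n_core + 1)                  # one record per individual ...
    if not new_has_pheno:
        d[-1] = 0.0                          # ... except an unphenotyped newcomer

    # Classic approach: build C = Z'Z + lam * G^{-1} and invert it completely
    t0 = time.perf_counter()
    C = np.diag(d) + lam * np.linalg.inv(G)
    c_classic = np.linalg.inv(C)[q, q]
    t_classic = time.perf_counter() - t0

    # Previous run (not timed): inverse of the core middle matrix G_pp/lam + I
    M_pp_inv = np.linalg.inv(G[p, p] / lam + np.eye(n_core))

    # Update: only G_pq, G_qq and the stored M_pp^{-1} are used
    t0 = time.perf_counter()
    b = G[p, q] / lam
    r = G[q, q] / lam - b.T @ M_pp_inv @ b
    c_update = np.linalg.inv(np.linalg.inv(r) + np.diag(d[q]))
    t_update = time.perf_counter() - t0
    return c_classic, c_update, t_classic, t_update


for n_core in (250, 500, 1000):
    _, _, t_c, t_u = compare(n_core)
    print(f"{n_core:5d} core: classic {t_c:.4f} s, update {t_u:.6f} s")
```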

Calculating the accuracy of young individuals using Equations (14) and (15) significantly reduced computational time. This method is considerably faster than existing methods, as shown in ^{−16}). The proposed approach using ^{qq} compared to when using the classic approach to calculate accuracies. This method can be extended to accommodate fixed effects and dense ^{qq} is updated. Furthermore, the part of ^{pp}) must be updated as more individuals are phenotyped.

The graph shows the elapsed time required to calculate c^{qq} using different approaches. ^{qq} for a new individual. ^{qq}^{qq} with and without a phenotype for the individual, respectively. Their performance was very similar, so the lines overlap.

This method could be used within routine breeding value estimation to expedite accuracy calculations. Breeding value accuracy depends on an individual's relatedness to the core reference population, such that high accuracy indicates high relatedness. Rapid accuracy calculation can therefore guide how genotypes are used, according to how informative they are for prediction, and improve efficiency by reducing redundant information.

New individuals with phenotypes and low accuracy can be added to the core population, as these animals are likely to be weakly related to it. Their addition improves the diversity and informativeness of the core reference population. It can also improve the imputation accuracy of missing genotypes by adding diversity to the imputation haplotype library. Individuals with high accuracy, with or without phenotypes, do not need to be added to the core: their accuracy indicates that their relatives are already included in the reference population, making their addition redundant. New individuals without phenotypes and with low accuracy should have relatives genotyped to improve their accuracy and/or have their phenotypes recorded to improve the core population.
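The decision rules in this paragraph can be summarised as a small function. The accuracy threshold separating "high" from "low" is not specified here, so the default value of 0.5 and the returned labels are hypothetical.

```python
def core_decision(accuracy, has_phenotype, threshold=0.5):
    """Suggested action for a new individual (threshold and labels are hypothetical)."""
    if accuracy >= threshold:
        # Relatives are already in the core, so adding this animal would be redundant
        return "not needed in core"
    if has_phenotype:
        # Weakly related and phenotyped: adds diversity to the core
        return "add to core"
    # Weakly related and unphenotyped: genotype relatives and/or record phenotypes
    return "genotype relatives or record phenotype"


new_animals = [("A1", 0.82, True), ("A2", 0.31, True), ("A3", 0.27, False)]
for name, acc, pheno in new_animals:
    print(name, core_decision(acc, pheno))
```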

The accuracy calculation can also serve as a quality-control filter for population data. If an individual expected to be related to the reference population obtains a low accuracy, this may indicate a genotyping or sampling error, a mis-assigned breed, etc. For individuals without phenotypes, the rapid accuracy calculation can provide important context for quickly developing a phenotyping strategy.

Updating the inverse of

MF developed the method, structured the manuscript, and wrote the method and theory sections. NC wrote the introduction, results, and discussion, and performed a major revision. BT provided comments to improve the final method and article.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Parts of this article have been published in the proceedings of

R functions that show the prototype.