Edited by: Mariza De Andrade, Mayo Clinic, United States

Reviewed by: Lucas Lodewijk Janss, Aarhus University, Denmark; Kui Zhang, Michigan Technological University, United States

This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Genome-wide association mapping (GWA) has been widely applied to a variety of species to identify genomic regions responsible for quantitative traits. The use of multivariate information could enhance the detection power of GWA. Although mixed-effect models are frequently used for GWA, the utility of _{d}), and missing proportion of phenotypic records (_{prop}). Simulation results showed that, when _{prop} was low, the multivariate _{d} differ, and as _{prop} increased, the multivariate _{d} increased. These observations were consistent with results of the analytical evaluation of the _{prop} was at the maximum, i.e., when no individual had phenotypic values for multiple variates, as in the case of meta-analysis, the multivariate _{d} increased. Although using multivariate information in mixed-effect model contexts did not always ensure more detection power than with univariate tests, the multivariate

Genome-wide association mapping (GWA) has been widely applied to humans, animals, and plants to identify genomic regions responsible for quantitative traits; this has been made feasible by decreases in the cost and time required to obtain genome-wide single-nucleotide polymorphisms (SNPs) and sequences. Although various statistical methods have been proposed for GWA, mixed-effect models are often used, particularly in recent animal and plant studies, to correct for population stratification (e.g., Zhao et al.,

In contrast, biological data are usually multivariate. For example, phenotypes are often measured for multiple traits. Phenotypes may also be measured on multiple occasions, in different locations, and/or by multiple individuals using a variety of methods. For plants, multi-year and/or multi-environment evaluation is often conducted to understand the complex reactions of plants to environmental stimuli, which can be observed as genotype-by-environment interactions. Moreover, data from meta-analyses, or phenotypes belonging to different groups of individuals (for example, groups defined by sex, age, or geographical population), can be considered multivariate, with the phenotypic record of each sample consisting of only a single variate.

Thus, the use of multivariate information to enhance the detection power of GWA is straightforward. Indeed, various methods for multivariate GWA or quantitative trait locus (QTL) mapping have been proposed (Piepho,

In the present study, we formulated the

GWA is often conducted assuming the following univariate mixed-effect model,

where

where

where

Here,

where

A multivariate mixed model can be written as
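A form consistent with the definitions that follow is sketched here. The symbol letters ($\mathbf{y}_m$, $\mathbf{X}_m$, $\boldsymbol{\beta}_m$, $\mathbf{Z}_m$, $\mathbf{u}_m$, $\mathbf{e}_m$) are assumed for illustration and follow common mixed-model notation; they denote, in order, the response vector, the covariate matrix, the fixed effects, the design matrix, the polygenic effects, and the residuals:

```latex
\mathbf{y}_m = \mathbf{X}_m \boldsymbol{\beta}_m + \mathbf{Z}_m \mathbf{u}_m + \mathbf{e}_m
```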

Here, _{m} is the vector of response variables of length _{m} is the matrix of covariates, including the intercepts, SNP genotypes, and so on. For example, when

_{m} is the vector of fixed effects of length _{m} is the design matrix, which for the example of

_{m} and _{m} are the polygenic effects and residuals, respectively, and

where ⊗ indicates the Kronecker product and

where

In the

Here,

We assume that when the fixed effect of interest is 0, the _{m} includes missing records, the dimensions of _{m}, _{m}, and _{m} and the denominator of _{1} and μ_{2} are the intercepts and _{1} and _{2} are the effects of a SNP,

We refer to this test as the multivariate

Korte et al. (

Genome-wide markers were generated using a coalescent simulator, Genome (Liang et al.,

We selected four SNPs per chromosome randomly as QTLs (i.e., a total of 20 QTLs). Among the QTLs, one QTL was randomly selected as a “target QTL,” and the remaining ones were used as “background QTLs.” For the background QTLs, the additive genetic effects were simulated using a multivariate normal distribution,

where _{m,bQTL} is a vector of background QTL effects with length

For the target QTL, the QTL effect on the first variate was generated from the standard normal distribution,

and then the effect on the

where _{j} is a real value between −1 and 1. We investigated the detection power of the multivariate _{j} and _{m}, were generated by summing all of the products between the QTL effects and QTL genotypes. The residuals of individual

where _{m}). Thus, the heritability was 0.5 throughout the simulation. The phenotypic correlation was largely

We set

When _{d} was −1, −0.95, −0.9, −0.8, …, 0.8, 0.9, 0.95, and 1. When _{d} was 0, 0.1, 0.2, …, 0.9, 0.95, and 1. Then, _{j} (_{d} with constant intervals. For example, when _{4} = 0.4, _{2} = 0.8 and _{3} = 0.6, respectively.
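The constant-interval rule for the intermediate ratios can be sketched as follows. This is a minimal illustration, assuming the ratio for the first variate is 1 and the ratio for the last variate is the specified value; the function and variable names (`effect_ratios`, `ratio_last`) are hypothetical and are not taken from the original analysis.

```python
import numpy as np


def effect_ratios(n_variates: int, ratio_last: float) -> np.ndarray:
    """Relative QTL effect sizes for variates 1..n_variates.

    The first variate has ratio 1, the last has ratio `ratio_last`,
    and the intermediate variates are equally spaced between the two.
    """
    if n_variates < 2:
        raise ValueError("at least two variates are required")
    if not -1.0 <= ratio_last <= 1.0:
        raise ValueError("ratio_last must lie between -1 and 1")
    return np.linspace(1.0, ratio_last, n_variates)


# Four variates with the last ratio set to 0.4:
# the intermediate ratios come out as approximately 0.8 and 0.6.
ratios = effect_ratios(4, 0.4)

# Target QTL effects: the effect on the first variate is drawn from a
# standard normal distribution and scaled by each ratio for the others.
rng = np.random.default_rng(0)
effect_first = rng.standard_normal()
target_effects = ratios * effect_first
```

With the last ratio set to 0, the same function returns ratios of roughly 1, 0.667, 0.333, and 0 for four variates.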

We did not assess the detection power of _{d}, a wide range of _{d} values are obtained. For example, when we draw random values from

Approximately 80% of variable pairs are represented with _{d} < 0.9, and even 10% of pairs show _{d} < 0. However, as we show later, _{d} has an impact on the detection power of multivariate

Missing phenotypic values were randomly generated, such that every individual had a phenotypic value for at least one variate. Thus, when _{prop} was set to 0, 0.125, 0.25, 0.375, and 0.5; when _{prop} was 0, 0.188, 0.375, 0.563, and 0.75; and when _{prop} was 0, 0.219, 0.438, 0.656, and 0.875. Note that when _{prop} is the maximum value, every individual has a phenotypic value only for a single variate. When _{prop} > 0, _{d} was 0, 0.1, 0.2, …, 0.9, 0.95, and 1 when _{d} and _{j} (_{prop} = 0.
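One way to generate such a missing-data pattern is sketched below. The text does not describe the sampling procedure, so the approach here (protect one randomly chosen variate per individual, then sample missing cells uniformly from the rest) is an assumption, as are the names `make_missing_mask`, `n_ind`, `n_var`, and `miss_prop`. Under this constraint the largest attainable missing proportion is (n_var − 1)/n_var, for example 0.75 with four variates.

```python
import numpy as np


def make_missing_mask(n_ind, n_var, miss_prop, rng):
    """Boolean mask (True = missing) with overall proportion `miss_prop`,
    leaving at least one observed variate for every individual."""
    n_missing = int(round(miss_prop * n_ind * n_var))
    max_missing = n_ind * (n_var - 1)
    if not 0 <= n_missing <= max_missing:
        raise ValueError("miss_prop is outside the attainable range")

    # Protect one randomly chosen variate per individual from removal.
    protected = rng.integers(0, n_var, size=n_ind)
    candidates = np.ones((n_ind, n_var), dtype=bool)
    candidates[np.arange(n_ind), protected] = False

    # Sample the missing cells uniformly among the unprotected cells.
    rows, cols = np.nonzero(candidates)
    chosen = rng.choice(rows.size, size=n_missing, replace=False)
    mask = np.zeros((n_ind, n_var), dtype=bool)
    mask[rows[chosen], cols[chosen]] = True
    return mask


rng = np.random.default_rng(42)
mask = make_missing_mask(200, 4, 0.375, rng)       # intermediate proportion
mask_max = make_missing_mask(200, 4, 0.75, rng)    # maximum for four variates
```

At the maximum proportion every unprotected cell is removed, so each individual keeps exactly one observed variate, which mirrors the meta-analysis-like setting described above.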

In the analysis of each scenario, the genomic relationship matrix for the polygenic effects was constructed from the SNPs that were not selected as QTLs, and the effect of the target QTL was then tested. No principal components were included as covariates. Apart from the QTL genotypes, the only fixed effects in the model were the intercepts for each variate.
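As an illustration of building a genomic relationship matrix from non-QTL markers only, the sketch below uses the widely used VanRaden-type estimator. The text does not state which estimator was used, so this choice is an assumption, and the toy binomial genotypes stand in for the coalescent-simulated markers. The names `vanraden_grm`, `geno`, and `qtl_idx` are hypothetical.

```python
import numpy as np


def vanraden_grm(genotypes, qtl_idx):
    """VanRaden-type genomic relationship matrix from 0/1/2 genotypes
    (individuals in rows, SNPs in columns), excluding the QTL columns."""
    n_snp = genotypes.shape[1]
    marker_idx = np.setdiff1d(np.arange(n_snp), qtl_idx)
    m = genotypes[:, marker_idx].astype(float)

    # Allele frequencies estimated from the sample; drop monomorphic SNPs.
    p = m.mean(axis=0) / 2.0
    polymorphic = (p > 0.0) & (p < 1.0)
    m, p = m[:, polymorphic], p[polymorphic]

    # Center each SNP by twice its allele frequency, then scale.
    w = m - 2.0 * p
    return w @ w.T / (2.0 * np.sum(p * (1.0 - p)))


rng = np.random.default_rng(1)
geno = rng.binomial(2, 0.3, size=(50, 500))          # toy genotypes
qtl_idx = rng.choice(500, size=20, replace=False)    # 20 SNPs treated as QTLs
K = vanraden_grm(geno, qtl_idx)
```

Excluding the QTL columns keeps the tested marker out of the polygenic term, in line with the description above.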

The univariate and multivariate

We assessed the power of the multivariate

When _{d} and _{d} and

where _{2},

Mean difference of –log10_{d} denote the number of variates, phenotypic correlation, and the relative size of QTL effects between variates, respectively. Missing proportions of phenotypes are zero. For the results when

Distribution of _{2}, _{2},

The results when _{d} are near zero (_{j} (_{d} with constant intervals (see Materials and Methods). For example, when _{4} = 0, we set _{2} and _{3} to 0.666 and 0.333, respectively. This means that the QTL effects on the first to fourth variates become _{tQTL,1}, 0.666_{tQTL,1}, 0.333_{tQTL,1}, and 0, respectively. If we focus on variates 1 and 2 and examine the heat map for _{d} value (0.666), the multivariate _{j} will take various values when

For each _{d} where the multivariate _{prop} = 0.656 because of a failure of variance component estimation in 98.1% of analyses. We return to this issue later. When _{prop} was the maximum, i.e., when every individual had a phenotypic value only for a single variate, the superiority of the multivariate _{d}, i.e., as _{d} increased, the test outperformed the univariate one. However, it was also observed that the gain in –log10_{prop} was the maximum can be illustrated analytically. Suppose that

Mean difference of –log10_{d} denote the number of variates, phenotypic correlation, and the relative size of QTL effects between variates, respectively. _{prop} denotes the missing proportion of phenotypes. At each _{prop} is the maximum (i.e., each individual has a phenotypic record only for a single variate) are presented in the last column. The results when _{prop} = 0.656 (the third column on the bottom row) are not shown because of the high frequency of failure in variance component estimation. For results when

Then, the phenotype (co)variance matrix appearing in equation (2) can be written as

where _{X} and _{Y} are the identity matrices for groups _{m} resulting in:

The parameter

To examine the multivariate ^{2} < 0.1 with any QTLs;

A major drawback of the multivariate _{prop} increased. We observed that the solver we used (remlf90) occasionally failed to converge within the default iteration number (5,000) or to return a positive-definite matrix. As mentioned above, when _{prop} = 0.656, the estimation failed in most cases in this scenario. In other scenarios, the frequencies of failure were 0.004, 0.054, and 0.081 when ^{3}. Considering this computational difficulty, meta-analyses may be an attractive alternative for exploiting multivariate information. For example, TETAS can combine

As mentioned in the Materials and Methods, explicitly, the statistics in equations (1) and (2) do not follow the

When missing records are included, both the multivariate and univariate

The author confirms being the sole contributor of this work and has approved it for publication.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Dr. Mai Minamikawa and Mr. Kosuke Hamazaki for their helpful comments on the draft of this manuscript.

The Supplementary Material for this article can be found online at: