Edited by: John Anthony Hammond, Pirbright Institute (BBSRC), United Kingdom

Reviewed by: Fabyano Fonseca Silva, Universidade Federal de Viçosa, Brazil; Gregor Gorjanc, University of Edinburgh, United Kingdom

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Network-based statistical models accounting for putative causal relationships among multiple phenotypes can be used to infer single-nucleotide polymorphism (SNP) effects that are transmitted through a given causal path in genome-wide association studies (GWAS). In GWAS with multiple phenotypes, reconstructing underlying causal structures among traits and SNPs using a single statistical framework is essential for understanding the entirety of genotype-phenotype maps. A structural equation model (SEM) can be used for such purposes. We applied SEM to GWAS (SEM-GWAS) in chickens, taking into account putative causal relationships among breast meat (BM), body weight (BW), hen-house production (HHP), and SNPs. We assessed the performance of SEM-GWAS by comparing the model results with those obtained from traditional multi-trait association analyses (MTM-GWAS). Three different putative causal path diagrams were inferred from highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 using the inductive causation algorithm. A positive path coefficient was estimated for BM → BW, and negative values were obtained for BM → HHP and BW → HHP in all implemented scenarios. Further, the application of SEM-GWAS enabled the decomposition of SNP effects into direct, indirect, and total effects, identifying whether a SNP effect acts directly or indirectly on a given trait. In contrast, MTM-GWAS only captured overall genetic effects on traits, which is equivalent to combining the direct and indirect SNP effects from SEM-GWAS. Although MTM-GWAS and SEM-GWAS use similar probabilistic models, we provide evidence that SEM-GWAS captures complex relationships in terms of causal meaning and mediation and delivers a more comprehensive understanding of SNP effects than MTM-GWAS. Our results showed that SEM-GWAS provides important insight into the mechanism by which identified SNPs control traits by partitioning their effects into direct, indirect, and total SNP effects.

Genome-wide association studies (GWAS) have become a standard approach for investigating relationships between common genetic variants in the genome (e.g., single-nucleotide polymorphisms, SNPs) and phenotypes of interest in human, plant, and animal genetics (Hayes and Goddard,

Complex traits are the product of various cryptic biological signals that may affect a trait of interest either directly or indirectly through other intermediate traits (Falconer and Mackay,

Although MTM-GWAS is a valuable approach, it only captures correlations or associations among traits and does not provide information about causal relationships. Knowledge of the causal structures underlying complex traits is essential, as correlation does not imply causation. For example, a correlation between two traits, T1 and T2, could be attributed to a direct effect of T1 on T2 or T2 on T1, or to additional variables that jointly influence both traits (Rosa et al.,

A SEM methodology has the ability to handle complex genotype-phenotype maps in GWAS, placing an emphasis on causal networks (Li et al.,

The analysis included records for 1,351 broiler chickens provided by Aviagen Ltd. (Newbridge, Scotland) for three phenotypic traits: ultrasound of breast muscle (BM) at 35 days of age, body weight (BW), and hen-house egg production (HHP), defined as the total number of eggs laid between weeks 28 and 54 per bird. The sample consisted of 274 full-sib families, 326 sires, and 592 dams. More details regarding population and family structure were provided by Momen et al. (

Each bird was genotyped for 580,954 SNP markers with a 600k Affymetrix SNP (Kranis et al., ^{−6}) were removed. The main reason for conducting the HWE test was to remove SNPs with potential genotyping error. Finally, 354,364 autosomal SNP markers were included in the analysis.
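The Hardy-Weinberg equilibrium (HWE) filter mentioned above can be illustrated with a one-degree-of-freedom chi-square test on genotype counts. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and the default threshold of 1e-6 is an assumption, since the exact cutoff did not survive in this section.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no departure from HWE is possible
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa, n_ab, n_bb, threshold=1e-6):
    """Keep a SNP when its HWE p-value is not below the (assumed) threshold."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= threshold
```

A marker with a complete absence of heterozygotes, for example, would be flagged as a likely genotyping error, whereas a marker whose counts match the expected proportions would be kept.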

MTM-GWAS is a single-trait GWAS model extended to multi-dimensional responses. When only considering additive effects of SNPs, the phenotype of a quantitative trait using the single-trait model can be described as:

where y_{i} is the phenotypic trait of individual i; w_{j} = (w_{1}, …, w_{p}) is the number of A alleles (i.e., w_{j} ∈ {0, 1, 2}) in the genotype of SNP marker j; s_{j} is the allele substitution effect for SNP marker j; and se(ŝ_{j}) is the standard error of its estimate ŝ_{j}.
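As an illustration of the single-locus test, the sketch below regresses a phenotype on allele counts and returns the estimated allele substitution effect, its standard error, and their ratio. It is a simplified assumption-laden example: it uses ordinary least squares with an intercept and omits the polygenic term that the mixed model adds, and the function name and the symbols `y`, `w`, `s_hat` are chosen here for illustration.

```python
import math


def single_snp_test(y, w):
    """Least-squares regression of phenotypes y on allele counts w (0/1/2).

    Returns (s_hat, se, t): the allele substitution effect estimate,
    its standard error, and the t statistic s_hat / se.
    """
    n = len(y)
    if n != len(w) or n < 3:
        raise ValueError("need matching y and w with at least 3 records")
    y_bar = sum(y) / n
    w_bar = sum(w) / n
    sxx = sum((wi - w_bar) ** 2 for wi in w)
    if sxx == 0:
        raise ValueError("monomorphic marker: effect is not estimable")
    sxy = sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y))
    s_hat = sxy / sxx
    intercept = y_bar - s_hat * w_bar
    rss = sum((yi - intercept - s_hat * wi) ** 2 for wi, yi in zip(w, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit has zero standard error, so the statistic is unbounded.
    t = s_hat / se if se > 0 else math.inf
    return s_hat, se, t
```

In a GWAS the same calculation is repeated for every marker, and the resulting statistics are converted to p-values for the Manhattan plot.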

The single locus model described above is naive for a complex trait because the data typically contain hidden population structure and individuals have varying degrees of genetic similarity (Listgarten et al., _{i}, including a covariance matrix reflecting pairwise similarities between additive genetic effects of individuals, can be included to control population stratification. The similarity metrics can be derived from pedigree information or from whole-genome marker genotypes. This model, extended for analysis of

where

The positive definite matrix

where _{j} and _{j} = 1 − _{j} are the allele frequencies at marker locus
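A marker-based relationship matrix of this kind can be sketched as follows. The construction shown, centering allele counts by twice the allele frequency and scaling by 2Σp_{j}q_{j}, is the widely used VanRaden-style form and is an assumption here, because the exact formula did not survive in this section; the function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a marker-based relationship matrix from allele counts.

    genotypes: list of individuals, each a list of allele counts (0/1/2),
    one entry per marker. Allele frequencies are estimated from the sample.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # p_j: frequency of the counted allele at marker j; q_j = 1 - p_j
    p = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Center each allele count by its expectation 2 * p_j
    centered = [[ind[j] - 2.0 * p[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) / scale for row_k in centered]
        for row_i in centered
    ]
```

With sample-based frequencies each centered marker column sums to zero, so every row of the resulting matrix sums to zero as well, which is a quick sanity check on an implementation.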

A SEM consists of two essential parts: a measurement model and a structural model. The measurement model depicts the connections between observable variables and their corresponding latent variables (Anderson and Gerbing,

SEM can be applied to GWAS as an alternative to MTM-GWAS to study how different causal paths mediate SNP effects on each trait. The following SEM model was considered:

where

The vectors

We considered the following GWAS models, whose causal structures were recovered by the inductive causation (IC) algorithm (Pearl,

A direct SNP effect is the path coefficient between a SNP as an exogenous variable and a dependent variable without any causal mediation by any other variable. The indirect effects of a SNP are those mediated by at least one other intervening endogenous variable. Indirect effects are calculated by multiplying path coefficients for each path linking the SNP to an associated variable, and then summing over all such paths (Mi et al., _{BM}, ŝ_{BW}, ŝ_{HHP}], with corresponding standard errors and respective t-values.
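The path-tracing rule described above, multiplying coefficients along each path and summing over paths, can be sketched for an arbitrary acyclic trait network. The function name, the dictionary layout, and the numeric values in the usage note are hypothetical and serve only as an illustration.

```python
def decompose_snp_effects(lam, s_direct):
    """Split SNP effects into (direct, indirect, total) for each trait.

    lam: {(parent, child): path coefficient} for an acyclic trait network.
    s_direct: {trait: direct SNP effect on that trait}.
    """
    traits = list(s_direct)

    def path_weight(src, dst):
        # Sum over all directed paths src -> ... -> dst of the product
        # of path coefficients; terminates because the graph is acyclic.
        if src == dst:
            return 1.0
        return sum(
            coef * path_weight(child, dst)
            for (parent, child), coef in lam.items()
            if parent == src
        )

    result = {}
    for t in traits:
        direct = s_direct[t]
        # Indirect effect: the SNP's direct effect on every other trait,
        # carried to t along all causal paths between the two traits.
        indirect = sum(s_direct[m] * path_weight(m, t) for m in traits if m != t)
        result[t] = (direct, indirect, direct + indirect)
    return result
```

For a fully recursive network with `lam = {("BM", "BW"): 2.0, ("BM", "HHP"): -0.2, ("BW", "HHP"): -0.3}` and direct effects `{"BM": 0.5, "BW": 1.0, "HHP": -4.0}`, the trait BM has no indirect component, while HHP receives contributions from three mediated paths.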

In the SEM-GWAS formulation described earlier, the structure of the underlying causal phenotypic network needs to be known. Because this is not so in practice, we used a causal inference algorithm to infer the structure. Residuals are assumed to be independent in all SEM analyses, so associations between observed traits are viewed as due to causal links between traits and by correlations among genetic values (i.e., _{1}, _{2}, and _{3}). Thus, to eliminate confounding problems when inferring the underlying network among traits, we used the approach of Valente et al. (_{t} and _{t} and

Figure

Causal graphs inferred using the IC algorithm among three traits: breast meat (BM), body weight (BW), and hen-house production (HHP) in the chicken data. SEM-A75 and SEM-G75 were the inferred fully recursive causal structures with HPD > 0.75 and corrected for genetic confounder using A (pedigree-based) and G (marker-based) matrices. SEM-A85 and SEM-A95 were obtained with HPD > 0.85 and HPD > 0.95, respectively, corrected with A. Arrows indicate direction of causal relationships. Dashed lines indicate negative coefficients, and the continuous arrows indicate positive coefficients.

Given the causal structures inferred from the IC algorithm, the following SEM was fitted:

Note that the structural coefficient matrix (λ in Equation 5) is sparse: only a small number of its entries are non-zero. These non-zero entries specify the effect of one phenotype on other phenotypes. The corresponding directed acyclic graph is shown in the accompanying figure, where y_{1}, y_{2}, and y_{3} represent BM, BW, and HHP, respectively; SNP_{j} is the genotype of the j-th SNP; ŝ_{jl} is the direct SNP effect on trait l; and the remaining variables are as presented earlier. This diagram depicts a fully recursive structure in which all recursive relationships among the three phenotypic traits are shown. Arrows represent causal connections, whereas double-headed arrows between polygenic effects are correlations.
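For the fully recursive three-trait structure, the structural equations and the coefficient matrix can be written out as below. This is a sketch consistent with the description above rather than a reproduction of the original equation: the symbols g_t and e_t for polygenic and residual effects are assumed notation, and fixed effects are omitted for brevity.

```latex
\begin{aligned}
y_{1} &= \hat{s}_{j(y_1)}\,\mathrm{SNP}_{j} + g_{1} + e_{1},\\
y_{2} &= \lambda_{21}\,y_{1} + \hat{s}_{j(y_2)}\,\mathrm{SNP}_{j} + g_{2} + e_{2},\\
y_{3} &= \lambda_{31}\,y_{1} + \lambda_{32}\,y_{2} + \hat{s}_{j(y_3)}\,\mathrm{SNP}_{j} + g_{3} + e_{3},
\end{aligned}
\qquad
\Lambda =
\begin{pmatrix}
0 & 0 & 0\\
\lambda_{21} & 0 & 0\\
\lambda_{31} & \lambda_{32} & 0
\end{pmatrix}.
```

Setting one of the three coefficients to zero removes the corresponding arrow from the graph and yields a non-fully-recursive structure.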

A diagram for causal path analysis of SNP effects in a fully recursive structural equation model for three traits, _{g}), polygenic effects (_{t}), environmental effect on trait _{t}), effects of _{j(yt)}), and recursive effect of phenotype

We examined the fit of each model implemented to assess how well it describes the data (Table

Model comparison criteria: the logarithm of the restricted maximum likelihood function (log L), Akaike's information criterion (AIC), and Schwarz's Bayesian information criterion (BIC) were used to evaluate model fit for two multiple-trait models (MTM) and four structural equation models (SEM).

Model | log L | AIC | BIC |
---|---|---|---|
MTM-A | −7093.480 | −7105.48 | −7142.436 |
SEM-A75 | −7098.370 | −7110.415 | −7147.321 |
SEM-A85 | −7095.188 | −7107.188 | −7144.143 |
SEM-A95 | −7097.517 | −7109.517 | −7146.470 |
MTM-G | −6529.270 | −6541.276 | −6578.232 |
SEM-G75 | −6537.391 | −6549.391 | −6586.34 |

Table

Estimates of three causal structural coefficients (λ) derived from four different structural models.

λ_{BM → BW}(λ_{21}) |
2.13 | 2.19 | 2.14 | 2.14 |

λ_{BM → HHP}(λ_{31}) |
−0.17 | −0.280 | ||

λ_{BW → HHP}(λ_{32}) |
−0.27 | −0.096 | −0.31 |

In the table of structural coefficients, λ_{21} quantifies the (direct) causal effect of BM on BW: a 1-unit increase in BM results in a λ_{21}-unit increase in BW. The negative causal effects λ_{31} and λ_{32} are interpreted in the same way, as the change in HHP per 1-unit increase in BM and BW, respectively.

We can decompose SNP effects into direct and indirect effects by tracing the path diagram in the figure. The direct effect of SNP_{j} on y_{3} (HHP) is given by d_{SNPj → y3}: Ŝ_{j(y3)}, where d denotes the direct effect. Note that there is only one direct path but several indirect paths. We find three indirect paths from SNP_{j} to y_{3} mediated by y_{1} and y_{2} (i.e., the nodes formed by other traits). The first indirect effect is ind_{(1)SNPj → y3}: λ_{32}(λ_{21}Ŝ_{j(y1)}), in the path mediated by y_{1} and y_{2}, where ind denotes the indirect effect. The second indirect effect, ind_{(2)SNPj → y3}: λ_{32}Ŝ_{j(y2)}, is mediated by y_{2}. The last indirect effect, ind_{(3)SNPj → y3}: λ_{31}Ŝ_{j(y1)}, is mediated by y_{1}. Therefore, the overall effect is given by summing all four paths, T_{SNPj → y3}: λ_{32}(λ_{21}Ŝ_{j(y1)}) + λ_{32}Ŝ_{j(y2)} + λ_{31}Ŝ_{j(y1)} + Ŝ_{j(y3)}. The fully recursive model of the overall SNP effect is then:

For y_{1} (BM), there is only one effect, so the overall effect is equal to the direct effect. For y_{2} (BW) and y_{3} (HHP), direct and indirect SNP effects are involved. There are two paths for y_{2}: one indirect, ind_{Sj → y2}: Ŝ_{j(y1)} → y_{1} → y_{2}, and one direct, d_{Sj → y2}: Ŝ_{j(y2)} → y_{2}. Here, the SNP effect is both direct and mediated through other phenotypes according to the causal networks in SEM-GWAS shown in the figures. The decomposition of the overall SNP effect on y_{3} into four direct and indirect paths is T_{Ŝj → y3}: λ_{32}λ_{21}Ŝ_{j(y1)} + λ_{32}Ŝ_{j(y2)} + λ_{31}Ŝ_{j(y1)} + Ŝ_{j(y3)}.
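Collecting the paths described for each trait, the overall SNP effects in the fully recursive model can be summarized as follows. The compact matrix form in the last line is a standard identity for recursive systems and is included here as an aid to the reader; the symbol Λ for the lower-triangular matrix of path coefficients is assumed notation.

```latex
\begin{aligned}
T_{\hat{S}_j \to y_1} &= \hat{S}_{j(y_1)},\\
T_{\hat{S}_j \to y_2} &= \lambda_{21}\hat{S}_{j(y_1)} + \hat{S}_{j(y_2)},\\
T_{\hat{S}_j \to y_3} &= \lambda_{32}\lambda_{21}\hat{S}_{j(y_1)} + \lambda_{32}\hat{S}_{j(y_2)}
                        + \lambda_{31}\hat{S}_{j(y_1)} + \hat{S}_{j(y_3)},\\[4pt]
\mathbf{T}_j &= (\mathbf{I} - \Lambda)^{-1}\hat{\mathbf{S}}_j,
\qquad
(\mathbf{I} - \Lambda)^{-1} =
\begin{pmatrix}
1 & 0 & 0\\
\lambda_{21} & 1 & 0\\
\lambda_{31} + \lambda_{32}\lambda_{21} & \lambda_{32} & 1
\end{pmatrix}.
\end{aligned}
```

Each off-diagonal entry of the inverse collects the products of path coefficients along every causal route between two traits, which is why the third row reproduces the three indirect paths to y_{3}.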

The scatter plots in the figure compare the overall SNP effects (T_{Ŝj → y3}) obtained from SEM-GWAS with those from MTM-GWAS. We observed good agreement between SEM-GWAS and MTM-GWAS. The total SNP signals derived from SEM and MTM are the same, but SEM provides additional, biologically relevant information.

Comparison of multiple trait (MTM) and fully recursive overall SNP effects obtained with

Figures S1–S4 present scatter plots of MTM-GWAS and SEM-GWAS signals (SEM-A75, SEM-G75, SEM-A85, and SEM-A95) for the BM → BW path, which was a common path across all SEM-GWAS considered. These two traits have a genetic correlation of 0.5 (results not shown). We partitioned the SEM causal link into direct, indirect, and overall effects based on directed links inferred from the IC algorithm with HPD > 0.85, whereas MTM-GWAS captures an overall SNP effect on BW. Scatter plots of the overall effects from SEM-GWAS and those of the total effects from MTM-GWAS indicated almost perfect agreement (top left plots, Figures S1–S4). We also observed concomitance between estimated overall and direct effects (top right plots, Figures S1–S4). In contrast, there was less agreement in the magnitude of the SNP effects when comparing overall vs. indirect effects (bottom left plots, Figures S1–S4). There was no linear relationship between the indirect and direct SNP effects (bottom right plots, Figures S1–S4). In short, genetic signals detected in SEM-GWAS were close to those of MTM-GWAS for overall effects because both models are based on a multivariate approach with the same covariance matrix. In all SEM-GWAS, results showed that direct effects contributed to overall effects more than the indirect effects.

Figure

Manhattan plot showing overall, direct and indirect SNP effects using a full recursive model based on

The corresponding Manhattan plot based on –log_{10} (_{10} (

As an illustration, the six most significant SNPs with the highest –log_{10} (

Six most significant SNPs selected according –log_{10} (

Category | Chr | SNP | Gene | –log_{10}(p) d_{Sj → y(HHP)} | –log_{10}(p) ind_{Sj → y(HHP)} | –log_{10}(p) T_{Sj → y(HHP)} | –log_{10}(p) MTM_{Sj → y(HHP)} | d_{Sj → y(HHP)} | ind_{Sj → y(HHP)} | T_{Sj → y(HHP)} | MTM_{Sj → y(HHP)} |
---|---|---|---|---|---|---|---|---|---|---|---|
Top SNPs for direct effects | 14 | Gga_rs313620413 | GRIN2A | 7.4242 | 0.1499 | 9.6599 | 7.4525 | −5.7827 | −0.0498 | −5.8326 | −5.78511 |
 | 7 | Gga_rs16591372 | OLA1 | 7.0868 | 0.2220 | 9.0119 | 6.9783 | −22.5681 | 0.2983 | −22.2698 | −22.3520 |
 | 3 | Gga_rs15390496 | EPHA7 | 7.0209 | 0.2214 | 8.6122 | 7.0297 | −22.4233 | −0.2149 | −22.6382 | −22.4098 |
 | 1 | Gga_rs314001234 | – | 7.0147 | 1.1067 | 9.0710 | 7.1653 | −26.6538 | −0.9018 | −27.5556 | −26.9360 |
 | 7 | Gga_rs315626061 | – | 6.8300 | 0.3360 | 8.9974 | 6.9529 | 5.1767 | 0.0910 | 5.26783 | 5.22295 |
 | 7 | Gga_rs316509306 | – | 6.8241 | 0.3442 | 8.9952 | 6.9485 | 5.1742 | 0.0928 | 5.267116 | 5.22105 |
Top SNPs for indirect effects | 4 | Gga_rs316082590 | LOC422264 | 0.7137 | 3.6868 | 0.4754 | 0.5696 | −1.2913 | 0.4505 | −0.84073 | −1.07339 |
 | 4 | Gga_rs313358833 | LOC422265 | 0.6449 | 3.2345 | 0.4310 | 0.5202 | −1.2067 | 0.4235 | −0.78322 | −1.01618 |
 | 4 | Gga_rs314615897 | MAEA | 0.1170 | 2.9505 | 0.0474 | 0.0387 | −0.2799 | 0.3853 | 0.105456 | −0.09807 |
 | 1 | Gga_rs15301842 | – | 0.0393 | 2.9408 | 0.1436 | 0.0149 | −0.1301 | 0.5053 | 0.375199 | 0.050463 |
 | 1 | Gga_rs314551852 | – | 0.0632 | 2.8858 | 0.1100 | 0.0065 | −0.2038 | 0.4994 | 0.295514 | −0.02218 |
 | 1 | Gga_rs317379325 | – | 0.1599 | 2.8473 | 0.0070 | 0.0931 | −0.4789 | 0.5000 | 0.021148 | −0.29321 |
Overall effects | 14 | Gga_rs313620413 | GRIN2A | 7.4242 | 0.1499 | 9.6599 | 7.4525 | −5.7827 | −0.0498 | −5.83262 | −5.7851 |
 | 1 | Gga_rs314001234 | – | 7.0147 | 1.1067 | 9.0710 | 7.1653 | −26.653 | −0.9018 | −27.5556 | −26.9360 |
 | 7 | Gga_rs315626061 | – | 7.0868 | 0.2220 | 9.0119 | 6.9783 | −22.5681 | 0.2983 | −22.2698 | −22.3520 |
 | 7 | Gga_rs315626061 | – | 6.8300 | 0.3360 | 8.9974 | 6.9529 | 5.1767 | 0.0910 | 5.26783 | 5.2229 |
 | 7 | Gga_rs316509306 | – | 6.8241 | 0.3442 | 8.9952 | 6.9485 | 5.1742 | 0.0928 | 5.267116 | 5.2210 |
 | 7 | Gga_rs15850017 | ZNF385B | 6.6582 | 0.0499 | 8.6397 | 6.6176 | −20.8591 | −0.0718 | −20.9310 | −20.7681 |
MTM | 14 | Gga_rs313620413 | GRIN2A | 7.4242 | 0.1499 | 9.6599 | 7.4525 | −5.7827 | −0.0498 | −5.8326 | −5.7851 |
 | 1 | Gga_rs314001234 | – | 7.0147 | 1.1067 | 9.0710 | 7.1653 | −26.6538 | −0.9018 | −27.5556 | −26.936 |
 | 3 | Gga_rs15390496 | EPHA7 | 7.0209 | 0.2214 | 8.6122 | 7.0297 | −22.4233 | −0.2149 | −22.6382 | −22.4098 |
 | 7 | Gga_rs16591372 | OLA1 | 7.0868 | 0.2220 | 9.0119 | 6.9783 | −22.5681 | 0.2983 | −22.2698 | −22.352 |
 | 7 | Gga_rs315626061 | – | 6.8300 | 0.3360 | 8.9974 | 6.9529 | 5.1767 | 0.0910 | 5.26780 | 5.2229 |
 | 7 | Gga_rs316509306 | – | 6.8241 | 0.3442 | 8.9952 | 6.9485 | 5.1742 | 0.0928 | 5.2671 | 5.2210 |

d_{Sj → y(HHP)}, ind_{Sj → y(HHP)}, T_{Sj → y(HHP)}, and MTM_{Sj → y(HHP)} represent the direct, indirect, and overall effects from SEM and the effect from MTM, respectively, of the j-th SNP on HHP. The bold values are –log_{10} (corrected p-value) for each category of significant SNP effects.

We noted that the six SNPs selected according to the –log_{10} (p-value) of their direct effects (d_{SNPj → y(HHP)}) had small indirect effects, ranging from −0.9018 to 0.2983. These indirect effects were negligible compared with their corresponding direct and total effects. In contrast, the effect sizes of the six most significant SNPs for indirect effects showed that indirect effects transmitted through inferred causal networks can change the magnitude of overall SNP effects, even reversing their direction (i.e., from positive to negative or vice versa).
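The arithmetic behind these observations can be checked directly from the tabulated effects, since the overall effect is the sum of the direct and indirect effects. The short sketch below uses three (direct, indirect) pairs copied from the table of top SNPs; the helper names are hypothetical.

```python
# (direct, indirect) effects on HHP taken from the table of top SNPs
effects = {
    "Gga_rs313620413": (-5.7827, -0.0498),   # top SNP for direct effects
    "Gga_rs16591372": (-22.5681, 0.2983),    # top SNP for direct effects
    "Gga_rs314615897": (-0.2799, 0.3853),    # top SNP for indirect effects
}


def total_effect(direct, indirect):
    """Overall SNP effect as the sum of its direct and indirect parts."""
    return direct + indirect


def reverses_direction(direct, indirect):
    """True when the indirect effect flips the sign of the direct effect."""
    return direct * total_effect(direct, indirect) < 0


for snp, (d, ind) in effects.items():
    print(snp, round(total_effect(d, ind), 4), reverses_direction(d, ind))
```

For the two direct-effect SNPs the indirect component barely moves the total, whereas for Gga_rs314615897 the positive indirect effect outweighs the negative direct effect and the overall effect changes sign.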

It should also be noted that the estimated additive SNP effects obtained from the four SEM-GWAS can be used for inferring pleiotropy. For instance, a pleiotropic QTL may have a large positive direct effect on BW but may exhibit a negative indirect effect coming from BM, which in turn reduces the total QTL effect on BW. Arguably, the methodology employed here would be most effective when the direct and indirect effects of a QTL are in opposite directions. If the direct and indirect QTL effects are in the same direction, the power of SEM-GWAS may be the same as the overall power of MTM-GWAS. The overall effect (T_{Ŝj → y(HHP)}) of a given SNP consisted of large indirect (ind_{Ŝj → y(HHP)}) and small direct (d_{Ŝj → y(HHP)}) effects on HHP, as observed for the most significant indirect SNPs located on GGA4 and GGA1, whereas the opposite pattern was observed for the most significant direct SNPs on GGA3, GGA7, and GGA14, which showed large direct and small indirect effects. Although the overall effects of these SNPs from SEM-GWAS and MTM-GWAS were similar, the decomposition allowed us to determine that the trait of interest is affected in different ways. For instance, a given SNP effect may largely act directly on HHP without any mediation by BM and BW, whereas another SNP may transmit a large effect through a causal path mediated by BM and BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

It is becoming increasingly common to analyze a set of traits simultaneously in GWAS by leveraging genetic correlations between traits (Gao et al.,

The primary purpose of estimating the goodness-of-fit criteria was to determine whether fully recursive SEM and MTM models with different assumptions yield the same or nearly the same BIC and AIC scores. Because our results showed that SEM and MTM produced nearly the same goodness-of-fit criteria, we conclude that the essential difference between these models cannot be articulated in terms of the expressive power of joint distributions or goodness of fit (Valente et al.,

Estimated path coefficients reflect the strength of each causal link, quantifying the proportion of direct and indirect effects of a given SNP or genes on the outcome of interest via the mediator phenotypic traits or the predefined causal pathway between a set of mediators and the target outcome. For instance, a positive path coefficient from BM to BW suggests that a unit increase in BM directly results in an increase in BW. Our results showed that MTM-GWAS and SEM-GWAS were similar in terms of the goodness of fit as per the AIC and BIC criteria. This finding is in agreement with theoretical work of Gianola and Sorensen (

The results obtained in this study using the three economic traits in chickens suggest that causal inference and the SEM framework can be used for a set of phenotypes by considering both the raw and partial correlation relationships among traits in breeding programs. For example, in model SEM-A85, BM and HHP are unconditionally independent. However, conditioning on BW results in a non-zero partial correlation. Conditioning on BW breaks the causal chain from BM to HHP, as observed in the case of the fully recursive models (SEM-A75 and SEM-G75), and their partial correlation becomes non-zero. This indicates that when all three variables are causally connected, both raw and partial correlations are non-zero, but their magnitudes change depending on the signs of the path coefficients.
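The distinction between raw and partial correlations can be made concrete with the standard first-order partial correlation formula. The sketch below is illustrative only: the function name is hypothetical, and the correlation values in the usage note are made up rather than estimated from the chicken data.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after conditioning on a third variable z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom
```

With a raw correlation of zero between two traits, `partial_correlation(0.0, 0.5, -0.3)` is positive, so two unconditionally uncorrelated traits can become correlated once the third trait is conditioned on. Conversely, when the raw correlation equals the product of the other two correlations, as in `partial_correlation(0.5 * -0.3, 0.5, -0.3)`, the partial correlation vanishes.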

The advantage of SEM-GWAS over MTM-GWAS is that the former decomposes SNP effects by tracing inferred causal networks. Our results showed that by partitioning SNP effects into direct, indirect, and total components, an alternative perspective of SNP effects can be obtained. As shown in Table

SEM offers insights into how phenotypic traits relate to each other. We illustrated potential advantages of SEM-GWAS relative to the commonly used standard MTM-GWAS by using three chicken traits as an example. SNP effects pertaining to SEM-GWAS have a different meaning than those in MTM-GWAS. Our results showed that SEM-GWAS made it possible to identify whether a SNP effect acts directly or indirectly (i.e., is mediated) on a given trait. In contrast, MTM-GWAS only captures overall genetic effects on traits, which is equivalent to combining the direct and indirect SNP effects from SEM-GWAS. Thus, SEM-GWAS offers more information and provides an alternative view of putative causal networks, enabling a better understanding of the genetic quiddity of traits at the genomic level.

MM carried out the study and wrote the first draft of the manuscript. GR and DG designed the experiment, supervised the study, and critically contributed to the final version of the manuscript. GM contributed to the interpretation of results, provided critical insights, and revised the manuscript. BV and AA participated in discussion and reviewed the manuscript. MA, AK, and RM contributed materials and revised the manuscript. All authors read and approved the final manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

MM wishes to acknowledge the Ministry of Science, Research and Technology of Iran for financially supporting his visit to the University of Wisconsin-Madison. Work was partially supported by the Wisconsin Agriculture Experiment Station under hatch grant 142-PRJ63CV to DG.

The Supplementary Material for this article can be found online at: