Edited by: Mariza De Andrade, Mayo Clinic Minnesota, USA

Reviewed by: Zhan Ye, Marshfield Clinic, USA; Karim Oualkacha, Université du Québec à Montréal, Canada

*Correspondence: Meng Wang

This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The efficient analysis of hybrid designs [e.g., affected families, controls, and (optionally) independent cases] is attractive because it should have increased power to detect associations between genetic variants and disease. However, the computational complexity of such an analysis is not trivial, especially when the data contain pedigrees of arbitrary size and structure. To address this concern, we developed a pragmatic test of association that summarizes all of the available evidence in certain hybrid designs, irrespective of pedigree size or structure. Under the null hypothesis of no association, our proposed test statistic (POPFAM+) is the quadratic form of two correlated tests: a population-based test (e.g., wQLS), and a family-based test (e.g., PDT). We use the parametric bootstrap in conjunction with an estimate of the correlation to compute

To boost the power of genetic association studies, researchers are invariably compelled to increase sample size (Hirschhorn and Daly,

Often, researchers employ the “divide and conquer” approach: that is, they analyze (possibly dependent) study-specific subsets of the data separately [e.g., a case-control analysis, and a family-based association analysis (Spielman et al.,

Likelihood-based tests (Bourgain et al., ^{1}^{2}

A third approach uses “meta-analysis” to combine the separate study-specific results, and usually provides a more efficient summary of the evidence for association (Kazeem and Farrall,

Here, we propose POPFAM+^{3}

For a given single nucleotide polymorphism (SNP) and a dichotomous trait, suppose that we want to test the null hypothesis of no association. If the SNP is in linkage disequilibrium (LD) with a causal genetic factor, then the alleles at the SNP will be associated with the trait, and the expected allele frequency difference between cases and controls will be nonzero. In addition, heterozygous parents will preferentially transmit risk alleles at the SNP to their affected offspring, thereby providing a second line of evidence for association. POPFAM+ combines these two lines of evidence to form a single, unified, and more powerful test of association.

Let (^{4}^{5}^{6}

In order to summarize the overall evidence for association, we combine

Note that POPFAM+ does not follow a χ^{2} distribution with 2 degrees of freedom under the null hypothesis of no association. Therefore, we use the parametric bootstrap with ^{1},Y^{1}), …, (X^{t},Y^{t}) from a
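The bootstrap step can be illustrated with a minimal sketch. Because Equation (1) is not reproduced here, the sketch assumes a form that matches the limiting cases described in the text: a quadratic form in (X, Y) whose off-diagonal term is τρ, so that τ = 1 gives the usual 2-degree-of-freedom quadratic form and τ = 0 gives squared Euclidean distance. The function names, the standard bivariate normal null with correlation ρ, and the default number of draws t are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def popfam_stat(x, y, rho, tau=0.5):
    """Assumed statistic: quadratic form with the correlation shrunk to tau * rho."""
    r = tau * rho
    return (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)


def bootstrap_pvalue(x_obs, y_obs, rho, tau=0.5, t=100_000, seed=1):
    """Parametric bootstrap p-value under an assumed bivariate normal null.

    Draws t pairs (X, Y) with zero means, unit variances and correlation rho,
    then reports the fraction of draws whose statistic is at least as large
    as the observed one.
    """
    rng = random.Random(seed)
    q_obs = popfam_stat(x_obs, y_obs, rho, tau)
    s = math.sqrt(1.0 - rho * rho)
    exceed = 0
    for _ in range(t):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1                      # X ~ N(0, 1)
        y = rho * z1 + s * z2       # Y ~ N(0, 1) with corr(X, Y) = rho
        if popfam_stat(x, y, rho, tau) >= q_obs:
            exceed += 1
    return exceed / t
```

In practice ρ would be replaced by its estimate from the data. With τ = 1 the bootstrap p-value should agree, up to Monte Carlo error, with the closed-form tail probability exp(−Q/2) of a χ^{2} distribution with 2 degrees of freedom, which gives a convenient sanity check.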

Ideally, we would like to use the best value of τ in Equation (1), but this would require knowledge of the joint distribution of (X, Y) under the alternative hypothesis of association. Because this information is never known, and because we found our results to be robust to different values of τ, we recommend setting τ = 0.5. In practice, this strikes a reasonable balance between a χ^{2} test with 2 degrees of freedom, and the test that defines “more extreme” in terms of Euclidean distance from the origin (see Figure

^{2} distribution with 2 degrees of freedom. The dotted line corresponds to τ = 0, and the solid line is the contour of constant probability for POPFAM+ when τ = 0.5. The X- and Y-axes are the values of (normalized) test statistics, with X representing the appropriately signed square root of wQLS, and Y representing PDT.

To better understand the impact of τ, and how our proposed test compares with other competing tests (including a χ^{2} test with 2 degrees of freedom), let's consider a few instructive examples. In each example, we assume that the null hypothesis of no association is true, and that the sample is sufficiently large so that the finite sampling distribution of POPFAM+ is well approximated by its limiting distribution. First, if τ were 1, then the limiting distribution of POPFAM+ would be a χ^{2} distribution with 2 degrees of freedom. The contour of constant probability for this distribution is shown in Figure
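The first example (τ = 1) can be checked numerically. The sketch below estimates the null critical value of the statistic by simulation, again assuming the quadratic form with off-diagonal term τρ and a standard bivariate normal null; the function name, the value of ρ, and the number of draws are hypothetical choices made for illustration.

```python
import math
import random


def null_critical_value(rho, tau, alpha=0.05, t=200_000, seed=2):
    """Empirical (1 - alpha) quantile of the assumed statistic under the null."""
    rng = random.Random(seed)
    r = tau * rho
    s = math.sqrt(1.0 - rho * rho)
    stats = []
    for _ in range(t):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x, y = z1, rho * z1 + s * z2          # correlated standard normals
        stats.append((x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r))
    stats.sort()
    return stats[int(math.ceil((1.0 - alpha) * t)) - 1]


# For a chi-square distribution with 2 degrees of freedom, the exact
# critical value at level alpha is -2 * ln(alpha).
exact_chi2_cutoff = -2.0 * math.log(0.05)
```

When τ = 1 the simulated cutoff should be close to `exact_chi2_cutoff`. For τ < 1 the cutoff generally depends on both τ and ρ, which is why a simulation-based reference distribution is needed rather than a χ^{2} table.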

We carried out extensive simulations to assess the performance of POPFAM+. Our simulations mimic the structure of an ongoing family study of idiopathic generalized epilepsy (IGE); summary statistics for the IGE families from this study are shown in Table ^{7}

3 | 47 | 0 | 47 |
4 | 117 | 9 | 126 |
5 | 74 | 15 | 89 |
6 | 39 | 9 | 48 |
7 | 27 | 2 | 29 |

When generating data under the null hypothesis of no association, D′ (a measure of linkage disequilibrium between the test SNP and the disease gene) was fixed at zero. For all alternative hypotheses (i.e., when there was association), D′ was fixed at 0.8. For all scenarios, the minor allele frequencies (MAFs) at the disease gene and test SNP were the same, and these frequencies were varied from 0.04 to 0.10. Prevalence of the disease was fixed at 5%, while penetrance, relative risk, and MAFs were modified accordingly to satisfy this prevalence constraint. The results for each scenario under both null and alternative hypotheses were evaluated on the basis of 1,000 replicates with τ set to 0.5.
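To make the prevalence constraint and the role of D′ concrete, the following sketch shows one way such simulation parameters could be derived. It assumes Hardy-Weinberg proportions at the disease gene, a relative risk defined against the non-risk genotypes, and positive LD between the two minor alleles; these assumptions and all function names are ours and are not taken from the simulation code used in the study.

```python
def baseline_penetrance(maf, rr, prevalence=0.05, model="dominant"):
    """Baseline penetrance f0 such that the population prevalence is met.

    prevalence = f0 * P(non-risk genotype) + rr * f0 * P(risk genotype),
    with genotype probabilities taken from Hardy-Weinberg proportions.
    """
    if model == "dominant":
        p_risk = 1.0 - (1.0 - maf) ** 2   # carriers of at least one risk allele
    elif model == "recessive":
        p_risk = maf ** 2                 # homozygous for the risk allele
    else:
        raise ValueError("model must be 'dominant' or 'recessive'")
    return prevalence / ((1.0 - p_risk) + rr * p_risk)


def haplotype_freqs(maf, d_prime):
    """Two-locus haplotype frequencies when both loci share the same MAF.

    For equal MAFs and non-negative LD, D_max = maf * (1 - maf), so
    D = d_prime * maf * (1 - maf).
    """
    d = d_prime * maf * (1.0 - maf)
    return {
        "minor-minor": maf * maf + d,
        "minor-major": maf * (1.0 - maf) - d,
        "major-minor": maf * (1.0 - maf) - d,
        "major-major": (1.0 - maf) ** 2 + d,
    }
```

For example, `baseline_penetrance(0.10, 2.0)` gives the baseline penetrance for a dominant model with RR = 2 at the 5% prevalence, and `haplotype_freqs(0.10, 0.0)` reduces to the product of allele frequencies, as expected when there is no LD.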

In a real data application, we analyzed 350 SNPs located in a 6 Mb region on chromosome 18 that was previously identified by cosegregation analysis (HLOD = 4.5; Durner et al.,

To confirm the validity of POPFAM+, Table

POPFAM+ | 0.051 | 0.052 | 0.046 |
wQLS | 0.056 | 0.056 | 0.045 |
PDT | 0.047 | 0.054 | 0.045 |
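One way to read these empirical rates is to compare them with the Monte Carlo error expected from 1,000 replicates. The short calculation below assumes a nominal level of 0.05 (the nominal level for this table is not visible in the text, so this is an assumption) and a normal approximation to the binomial.

```python
import math


def mc_interval(alpha=0.05, n_rep=1000, z=1.96):
    """Approximate 95% range for an empirical rate when the true rate is alpha."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - z * se, alpha + z * se


# Empirical type I error rates as listed in the table above.
reported = {
    "POPFAM+": [0.051, 0.052, 0.046],
    "wQLS": [0.056, 0.056, 0.045],
    "PDT": [0.047, 0.054, 0.045],
}

low, high = mc_interval()
all_within = all(low <= v <= high for vals in reported.values() for v in vals)
```

Under these assumptions the interval is roughly 0.036 to 0.064, and `all_within` is true, so none of the listed rates departs from 0.05 by more than simulation noise would explain.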

Furthermore, Figure

To compare the power of POPFAM+, PDT, wQLS, and max(PDT, wQLS) (denoted MAX), we simulated and analyzed data for relative risks (RRs) of 2 and 3 for dominant models, and 5 and 10 for recessive models. The power results are summarized in Figure ^{2} test with 2 degrees of freedom (data not shown), which is consistent with the finding of Joo et al. (^{2} test as well. Although all five tests are statistically consistent when association is present (i.e., the power approaches 1.0 as the sample size increases), the relative gains in power achieved by POPFAM+ are important because power is typically far from 1.0 in most genetic studies of common complex traits.

In a real data application, we used POPFAM+ to analyze the genotypes and phenotypes of a complex hybrid design for IGE [

CTIF | TXNL1 | SETBP1 | CTIF |

SETBP1 | PSTPIP2 | ME2^{*} |
SERPINB5 |

ME2^{*} |
KATNAL2 | SLC14A2 | CFAP53 |

RNF165 | SMAD2 | ||

SERPINB5 | DYM | PSTPIP2 | |

DYM | MEX3C | ST8SIA5 | MYO5B |

MYO5B | SKOR2 | EPG5 | SKOR2 |

LOXHD1 | ZBTB7C | LOXHD1 | |

ST8SIA5 | SMAD7 | MBD2 | SETBP1 |

RNF165 | HDHD2 | CTIF | ZBTB7C |

The results of our analysis identified a promising epilepsy candidate gene for follow-up sequencing: malic enzyme 2 (ME2).

Because our real data analysis is fundamentally a fine-mapping study, we are not concerned with achieving statistical significance

Compared to competing methods like PDT and wQLS, POPFAM+ has increased power to detect coherent alternatives (e.g., alternatives where risk alleles are over-represented in cases, and preferentially transmitted from heterozygous parents to their affected offspring). This could be particularly useful for detecting variants with MAFs < 5%, where the statistical power could be low unless sample sizes or effect sizes happen to be large (Lee et al.,
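The notion of a coherent alternative can be illustrated in an idealized setting. The sketch below assumes that (X, Y) is bivariate normal with unit variances and correlation ρ, that a coherent alternative shifts both means in the same direction, and that the statistic is the quadratic form with off-diagonal term τρ. The shift size, ρ, and all names are hypothetical, and this is not a substitute for the pedigree-based simulations described earlier.

```python
import math
import random


def q_stat(x, y, rho, tau):
    """Assumed quadratic-form statistic with correlation shrunk to tau * rho."""
    r = tau * rho
    return (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)


def draw_pairs(rho, t, rng, shift=(0.0, 0.0)):
    """t correlated normal pairs with unit variances and the given mean shift."""
    s = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(t):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1 + shift[0], rho * z1 + s * z2 + shift[1]))
    return pairs


def estimated_power(rho, tau, shift, alpha=0.05, t=100_000, seed=3):
    """Monte Carlo power: null draws set the cutoff, shifted draws are tested."""
    rng = random.Random(seed)
    null_q = sorted(q_stat(x, y, rho, tau) for x, y in draw_pairs(rho, t, rng))
    cutoff = null_q[int(math.ceil((1.0 - alpha) * t)) - 1]
    alt = draw_pairs(rho, t, rng, shift)
    return sum(q_stat(x, y, rho, tau) > cutoff for x, y in alt) / t
```

Comparing `estimated_power(0.5, 0.5, (2.0, 2.0))` with `estimated_power(0.5, 1.0, (2.0, 2.0))` contrasts τ = 0.5 with the 2-degree-of-freedom test when both statistics point in the same direction. In this idealized setup with positive ρ, the τ = 0.5 region is less stretched along the X = Y diagonal, so one would expect it to reject more often for such shifts.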

For several reasons, we fixed the scaling factor τ at 0.5 for all analyses. First, when little is known about the distribution of genetic effects, fixing τ at 0.5 strikes a nice balance between a χ^{2} test with 2 degrees of freedom, and the test that defines “more extreme” as more distant from the origin (Figure

In principle, POPFAM+ could be extended to

MW: Methodology development, data analysis, manuscript drafting, and revisions. WS: Methodology conception and development, data acquisition, manuscript drafting, and revisions.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We are grateful to Dr. David A. Greenberg for providing the pedigree structures and phenotypes from his ongoing family and case-control study of the genetics of common epilepsies. We also thank Rajeswari Swaminathan for compiling the list of test SNPs used in the real data application, and Darel Finley for providing extensive database support. Furthermore, we are grateful to the Reviewers for their thoughtful comments, which contributed substantially to the overall strength and clarity of this manuscript.

For a fixed significance level (say 5%), the rejection regions for MAX (shaded gray), POPFAM+ (complement of the interior of the solid ellipse), and the χ^{2} test with 2 degrees of freedom (complement of the interior of the dashed ellipse) are shown (Figure ^{2} test with 2 degrees of freedom, so too does MAX.

^{2} with 2 degrees of freedom^{2} test with 2 degrees of freedom is outside the dashed ellipse, and the rejection region of MAX, which is defined as max(X,Y), is shaded in gray.

It is also interesting to compare the collection of points that give rise to the same

To facilitate an understanding of composite rejection regions, consider Figure ^{2} test with 2 degrees of freedom of size 0.1α is the complement of the interior of the dashed ellipse, and the rejection region for POPFAM+ with τ = 0.5 and size α is the complement of the interior of the solid ellipse. The rejection region of POPFAM+ can be partitioned into two parts: an area shaded with slanted lines, which is the rejection region for a χ^{2} test with 2 degrees of freedom and size 0.1α, and the two crescent shapes in gray. The area shaded with slanted lines contains 10% of the total type I error (i.e., 0.1α), while the twin crescents contain the remaining fraction of the type I error (namely 0.9α). Thus, the total type I error for POPFAM+ is still α.

^{2} test with 2 degrees of freedom with size 0.1α, while the area outside the solid ellipse defines a rejection region of size α for POPFAM+. Under the null hypothesis of “no association,” the twin crescents shaded in gray each have probability 0.45α, so that the probabilities of the three regions (0.1α, 0.45α, and 0.45α) sum to α.
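The partition of the type I error can be checked by simulation. The sketch below again assumes a standard bivariate normal null with correlation ρ and the quadratic form with off-diagonal term τρ; the values ρ = 0.5 and α = 0.05, the number of draws, and the function name are hypothetical. It estimates the null probability of the χ^{2} region of size 0.1α, the null probability of the remaining part of the POPFAM+ rejection region, and the fraction of draws rejected by the χ^{2} region but not by POPFAM+.

```python
import math
import random


def region_probabilities(rho=0.5, tau=0.5, alpha=0.05, t=200_000, seed=4):
    """Null probabilities of the pieces of the composite rejection region."""
    rng = random.Random(seed)
    r = tau * rho
    s = math.sqrt(1.0 - rho * rho)
    # Exact chi-square (2 df) cutoff at level 0.1 * alpha.
    chi_cut = -2.0 * math.log(0.1 * alpha)
    pairs = []
    for _ in range(t):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x, y = z1, rho * z1 + s * z2
        q_tau = (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)
        q_chi = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
        pairs.append((q_tau, q_chi))
    # Simulated cutoff that gives the tau-based test size alpha.
    tau_cut = sorted(q for q, _ in pairs)[int(math.ceil((1.0 - alpha) * t)) - 1]
    outer = sum(1 for _, qc in pairs if qc > chi_cut)
    between = sum(1 for qt, qc in pairs if qt > tau_cut and qc <= chi_cut)
    stray = sum(1 for qt, qc in pairs if qc > chi_cut and qt <= tau_cut)
    return outer / t, between / t, stray / t
```

For these illustrative values, the first fraction should be close to 0.1α, the second close to 0.9α, and the third equal to zero, which reflects that the smaller χ^{2} region lies entirely inside the POPFAM+ rejection region.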

^{4}Because wQLS is a chi-squared statistic with 1 degree of freedom under the null hypothesis of no association, our modified version is sgn
