This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

When studying the genetics of inherited diseases, researchers often collect data on affected families, unrelated cases, and healthy controls. However, the joint analysis of such heterogeneous data is difficult, and the simpler analysis of homogeneous subsets is often suboptimal. For example, while case-control tests of association are sensitive to allele frequency differences, they typically ignore the preferential transmission of risk alleles from heterozygous parents to their affected offspring. Similarly, the transmission disequilibrium test (TDT) fails to incorporate the difference in allele frequencies when testing for association. To boost the power of modern genetic studies, we propose POPFAM, a fast and efficient test of association that can accommodate large affected families, unrelated cases, and controls. We use simulations to assess the type I error and power of POPFAM across different genetic models and minor allele frequencies. For comparison, we examine the power of competing methods: the trend test, a Wald test (equivalent to the TDT), and SCOUT. Our results show that POPFAM maintains the correct type I error and that it is more powerful than the trend test or the TDT. It performs as well as, or better than, the likelihood ratio test SCOUT, which was developed specifically for case-parent/case-control data. Furthermore, when applied to the human leukocyte antigen genotypes of 401 type 1 diabetic families, POPFAM confirmed the previously reported association between DRB1^{*}03:01 and microvascular complications (

Many researchers have highly informative, but statistically complicated data. For example, their data may contain a mixture of affected families, independent cases, and controls that are often (but not necessarily) collected at different times. Since these

In contrast to the

Using the theory of linear models,

Recall that for most mixed data sets, the population-based and family-based tests will use the genotypes of a shared set of cases, which means that the two tests will be correlated. Like

For ease of exposition, let’s suppose that we have genotype data at a single nucleotide polymorphism (SNP) on affected trios and controls sampled from a genetically homogeneous population (i.e., there is no PS), and let’s assume that every case is an affected offspring with unphenotyped parents. Later, we will discuss how related cases can be incorporated into the analysis. Define a population-based test of association

where τ denotes Cov(
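As a generic illustration of why the covariance between component tests matters, the sketch below combines two standardized statistics into a single standardized statistic. It is not POPFAM's actual weighting scheme: the function names, the equal default weights, and the use of `tau` as the covariance between two unit-variance statistics are assumptions made for this example.

```python
import math


def combine_correlated_z(z_pop, z_fam, tau, w_pop=1.0, w_fam=1.0):
    """Weighted sum of two unit-variance statistics, rescaled to unit variance.

    tau is the (assumed known or estimated) covariance between z_pop and z_fam.
    Ignoring a positive tau would understate the variance of the sum and
    inflate the type I error of the combined test.
    """
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1] for unit-variance statistics")
    variance = w_pop ** 2 + w_fam ** 2 + 2.0 * w_pop * w_fam * tau
    if variance <= 0.0:
        raise ValueError("combined variance must be positive")
    return (w_pop * z_pop + w_fam * z_fam) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical inputs, chosen only to show the call pattern.
z_combined = combine_correlated_z(z_pop=2.1, z_fam=1.4, tau=0.3)
p_value = two_sided_p(z_combined)
```

With independent components (`tau = 0`) and equal weights, this reduces to the familiar sum of the two statistics divided by the square root of 2.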

POPFAM uses the trend (i.e., case-control) test for _{i} count the number of “1” alleles transmitted from a heterozygous parent to his/her affected offspring, and let _{j} and _{k} count the number of “1” alleles in the
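The transmission counts described above feed a family-based test. A minimal sketch of the classical McNemar-type TDT statistic follows, assuming the counts have already been pooled over heterozygous parents. The function and argument names are hypothetical, and POPFAM's Wald-type component may be scaled differently.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT for one biallelic marker.

    transmitted:   number of heterozygous parents who passed allele "1"
                   to an affected offspring
    untransmitted: number of heterozygous parents who passed the other allele
    Returns the chi-square statistic (1 df) and its p-value.
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions of allele "1" versus 40 of the other allele.
chi2, p_value = tdt_statistic(60, 40)
```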

where ^{2} = 2^{2} can be estimated with the method of moments by [4g_{2} + g_{1} - (N + M)^{2}] / (N + M). Here, g_{1} and g_{2} count the number of cases and controls with 1 and 2 alleles of type “1,” and _{2} + g_{1})/(N + M). The minor allele frequency (MAF) is denoted by
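One plausible reading of the moment estimator above is the sample variance of the per-person count of “1” alleles in the pooled cases and controls. The sketch below assumes that reading, with `n_total` standing for N + M; the function names are illustrative only.

```python
def allele_count_mean(g1, g2, n_total):
    """Mean number of "1" alleles per person: (2*g2 + g1) / (N + M)."""
    if n_total <= 0 or g1 < 0 or g2 < 0 or g1 + g2 > n_total:
        raise ValueError("inconsistent genotype counts")
    return (2 * g2 + g1) / n_total


def allele_count_variance(g1, g2, n_total):
    """Moment estimate of the variance of the allele count.

    Second moment (4*g2 + g1) / n_total minus the squared mean, which is
    algebraically the same as [4*g2 + g1 - n_total * mean**2] / n_total.
    """
    mean = allele_count_mean(g1, g2, n_total)
    return (4 * g2 + g1) / n_total - mean ** 2


# Hypothetical pooled sample: 64, 32, and 4 people carrying 0, 1, and 2 copies.
variance = allele_count_variance(g1=32, g2=4, n_total=100)
allele_freq = allele_count_mean(g1=32, g2=4, n_total=100) / 2.0
```

In this example the genotype counts sit exactly at Hardy-Weinberg proportions for an allele frequency of 0.2, so the estimate agrees with 2p(1 - p).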

Furthermore, to safeguard against the negative effects of cryptic PS, users could for example base the component test

To assess the power (and type I error) of POPFAM, and to examine the sensitivity of a competing parametric method named SCOUT (

Genetic models for simulated data.

Scenario | DAF | Pr(Aff \| D/d) | Pr(Aff \| D/D) |
---|---|---|---|
S1 | 0.03 | 0.125 | 0.440 |
S2 | 0.06 | 0.084 | 0.218 |
S3 | 0.09 | 0.066 | 0.237 |
S4 | 0.12 | 0.060 | 0.173 |
S5 | 0.20 | 0.053 | 0.100 |
S6 | 0.30 | 0.050 | 0.080 |
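To see how a disease allele frequency and a set of penetrances translate into genotype frequencies among cases, the sketch below applies Bayes' rule under Hardy-Weinberg proportions. The penetrance for the d/d genotype is not listed in the table, so it appears here as a required argument, and the value 0.05 in the usage line is a placeholder rather than a value from the study. The function name is likewise hypothetical.

```python
def case_genotype_probs(daf, pen_dd, pen_Dd, pen_DD):
    """Population prevalence and genotype distribution among affected people.

    daf:    frequency of the disease allele D
    pen_*:  Pr(Aff | genotype) for d/d, D/d, and D/D
    Assumes Hardy-Weinberg proportions in the source population.
    """
    if not 0.0 < daf < 1.0:
        raise ValueError("daf must lie strictly between 0 and 1")
    q = 1.0 - daf
    hwe = (q * q, 2.0 * daf * q, daf * daf)          # Pr(d/d), Pr(D/d), Pr(D/D)
    penetrances = (pen_dd, pen_Dd, pen_DD)
    joint = [f * p for f, p in zip(hwe, penetrances)]  # Pr(genotype and Aff)
    prevalence = sum(joint)
    if prevalence <= 0.0:
        raise ValueError("at least one penetrance must be positive")
    return prevalence, [j / prevalence for j in joint]


# Scenario S1 from the table, with a placeholder baseline penetrance of 0.05.
prevalence, case_probs = case_genotype_probs(0.03, 0.05, 0.125, 0.440)
```

Sampling case genotypes from `case_probs` (and control genotypes from the analogous distribution among unaffected people) is one simple way to generate data conditional on affection status.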

A recent case-control study among type 1 diabetics reported an association between DRB1^{*}03 and microvascular complications (e.g., retinopathy, nephropathy, and/or neuropathy).

From

Type I error of POPFAM,

| Test | 5% | 1% |
|---|---|---|
| POPFAM | | |
| | 0.054 | 0.009 |
| | 0.064 | 0.007 |

In our secondary data analysis of type 1 diabetics, POPFAM confirmed the association between DRB1^{*}03 and microvascular complications (

Power (%) to detect an association.

Scenario | POPFAM | SCOUT^{*} | SCOUT^{*} [min, max] |
---|---|---|---|
S1 | **88.0** | 74.8 | [29.1, 84.1] |
S2 | **63.3** | 54.5 | [24.1, 68.1] |
S3 | **60.8** | 55.0 | [49.0, 66.5] |
S4 | **47.1** | 43.0 | [38.3, 54.0] |
S5 | **31.2** | 32.4 | [25.2, 41.0] |
S6 | **24.8** | 23.9 | [15.9, 33.2] |

For each scenario, the power of POPFAM is given in bold, while the minimum and maximum power of SCOUT across additive, dominant, multiplicative, and recessive models are shown in brackets. The scenarios (S1–S6) are defined in

We have demonstrated that POPFAM, our fast, flexible, and nonparametric test of association, increases power and controls the type I error. Relative to tests that rely on either population-based or family-based data alone, POPFAM can substantially boost power, with gains as large as 14%. We are particularly interested in the increased power for SNPs with MAFs between 0.03 and 0.12 because the power to detect associations using SNPs in this class is quite low (

The meta-analysis literature in statistical genetics and epidemiology has been somewhat misleading with respect to claims of optimally weighted linear combinations of component tests. The problem arises from the fact that several authors have used the term

POPFAM is more than a test of association; it is a testing framework. The attractive features that POPFAM brings to the case-parent/case-control design are likely to transfer to more complex designs as well (i.e., designs that involve covariates and that may require a larger and more diverse set of component tests). For example, we have already successfully applied POPFAM to a real mixed data set with large, extended families and publicly available controls (i.e., HapMap CEU samples; data not shown). In principle, these wider applications can be carried out in one of two ways: (1) decompose the large families into trios whenever both parents provide genotype data, or (2) choose a family-based test statistic that can accommodate large families (e.g., GDT). To facilitate power analyses, POPFAM can also simulate case-parent/case-control genotype data conditional on the affectedness status of each case and control. Overall, POPFAM represents the next logical step for detecting genetic associations with disease from the analysis (or re-analysis) of mixed data. The software is freely available from the web at:
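As an aside on option (1) above, decomposing an extended pedigree into trios can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used by the POPFAM software: the `Person` record, its field names, and the rule of keeping every affected, genotyped child with two genotyped parents are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Person(NamedTuple):
    pid: str
    father: Optional[str]      # parent IDs; None for founders
    mother: Optional[str]
    affected: bool
    genotype: Optional[int]    # number of "1" alleles; None if not genotyped


def extract_trios(pedigree: List[Person]) -> List[Tuple[str, str, str]]:
    """Return (father, mother, child) IDs for every usable affected-offspring trio."""
    by_id = {person.pid: person for person in pedigree}
    trios = []
    for child in pedigree:
        if not child.affected or child.genotype is None:
            continue
        father = by_id.get(child.father) if child.father else None
        mother = by_id.get(child.mother) if child.mother else None
        if father is None or mother is None:
            continue  # founder, or a parent missing from the pedigree file
        if father.genotype is None or mother.genotype is None:
            continue  # both parents must provide genotype data
        trios.append((father.pid, mother.pid, child.pid))
    return trios


# Hypothetical three-generation family; "m" is not genotyped.
family = [
    Person("gf", None, None, False, 1),
    Person("gm", None, None, False, 0),
    Person("f", "gf", "gm", True, 1),
    Person("m", None, None, False, None),
    Person("c", "f", "m", True, 1),
]
trios = extract_trios(family)
```

Trios drawn from the same extended family share relatives and are therefore not independent, which is one reason a statistic designed for large families may be preferable in practice.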

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Dr. Susan E. Hodge for her thoughtful comments and suggestions; Dr. David A. Greenberg and Dr. Ettie M. Lipner for providing the microvascular complications data; and the National Institutes of Health (grants ES017875, MH48858, MH65213, and NS27941).

For ease of exposition, let ^{2}_{i}, _{j}, _{k},

Now, let _{v} = _{j} whenever the _{v} = the number of transmitted “1” alleles, and _{v} = the number of heterozygous parents, where both _{v} and _{v} are defined relative to the

Now, the inner expectation is easily computed using the unconditional probabilities of the seven possible genotypic configurations for an informative trio. The outer expectation is tedious, but tractable since