Edited by: Mariza De Andrade, Mayo Clinic, United States

Reviewed by: Karim Oualkacha, Université du Québec à Montréal, Canada; Alexandre Bureau, Laval University, Canada

This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Family-based designs have been shown to be powerful for detecting rare variants associated with human diseases. However, very few significant results have been found, because sample sizes are relatively small and statistical analyses therefore often suffer from high false-negative error rates. These limitations can be mitigated by combining results from multiple studies via meta-analysis. However, statistical methods for meta-analysis with rare variants are limited for family-based samples. In this report, we propose metaFARVAT, a tool for the meta-analysis of family-based rare variant associations. metaFARVAT is based on a quasi-likelihood score for each variant. These scores are combined to generate burden test, variable-threshold (VT) test, sequence kernel association test (SKAT), and optimal SKAT (SKAT-O) statistics. The proposed method tests homogeneous and heterogeneous effects of variants among different studies and can be applied to both quantitative and dichotomous phenotypes. Simulation results demonstrated the robustness and efficiency of the proposed method in different scenarios. By applying metaFARVAT to data from a family-based study and a case-control study, we identified a few promising candidate genes, including

In recent decades, genome-wide association studies (GWAS) have identified tens of thousands of common variants associated with various complex diseases. However, in spite of their success in discovering disease susceptibility loci (DSL), the DSL identified by GWAS only partially explain disease heritability, and rare variants have been implicated as one contributor to this missing heritability (Manolio et al.,

Multiple rare variants can affect disease status, and thus, association analyses with rare variants suffer from genetic heterogeneity among affected individuals. In families, Mendelian transmission results in family members sharing the same alleles, and therefore, affected relatives have a greater chance of being affected by the same disease-causing single-nucleotide polymorphisms (SNPs) than unrelated subjects. For instance, the probability of sibling pairs sharing rare alleles can be calculated (Ionita-Laza et al.,

Aggregation of association signals across multiple genetic variants was expected to provide sufficient statistical power for rare variant analyses and to identify various DSL. However, very few genome-wide significant results have been found because of relatively small sample sizes. When the sample size is small, statistical analyses suffer from high false-negative error rates, and this limitation can be avoided by combining results from multiple studies via mega- or meta-analysis. Mega-analysis assumes that subjects' genotypes and phenotypes from different studies are available, and these are pooled for genetic association analyses. Meta-analysis directly utilizes test statistics from separate studies and combines them into a single test statistic. The choice between mega- and meta-analysis depends on the heterogeneity among studies and the availability of individual genotype and phenotype data from all studies. In particular, if there are systematic differences in phenotype diagnosis or sequencing technology, meta-analysis is often preferred. Otherwise, mega-analysis is recommended if genotypes and phenotypes are available. Recently, several meta-analysis methods for rare variant association tests have been proposed, such as MASS (Tang and Lin,

In this study, we proposed a new meta-analysis method for family-based, population-based, and case-control rare variant association tests, metaFARVAT. metaFARVAT generates a quasi-likelihood score for each variant and combines them to generate burden, VT, SKAT, and SKAT-O statistics. metaFARVAT can assume homogeneous or heterogeneous effects of variants among different studies and can be applied to both quantitative and dichotomous phenotypes. We evaluated the statistical validity of metaFARVAT using simulated data and compared its estimated power with those of RAREMETAL and seqMeta under various scenarios. Furthermore, metaFARVAT was applied to identify rare variants for chronic obstructive pulmonary disease (COPD) using whole-exome sequencing (WES) data from family-based samples from the Boston Early-Onset COPD Study (EOCOPD) and case-control samples from the COPDGene study.

We assume that there are _{k}_{imk}_{ik}

In some cases, rare variants may be observed only in a subset of studies. If variant

Parental genotypes are transmitted to offspring under Mendelian transmission, and thus our test statistics consider the genetic correlation between family members. The genetic variance-covariance matrix among family members can be specified by a kinship coefficient matrix, _{k}. If we let _{ik}_{k} is defined by:

Under the presence of population substructure, the genetic relationship matrix (GRM) can be estimated with large-scale genotyping data and should alternatively replace _{k} (Thornton et al.,
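As an illustration of how a kinship-based covariance structure can be derived from a pedigree, the following minimal Python sketch applies the standard recursive kinship algorithm. The function name `kinship_matrix` and the tuple-based pedigree format are hypothetical, founders are assumed to be unrelated and non-inbred, and the code is not taken from the metaFARVAT software.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients for a pedigree.

    pedigree: list of (subject_id, father_id, mother_id) tuples in which parents
    appear before their children; founders have father_id = mother_id = None.
    Assumption: founders are unrelated and non-inbred.
    """
    ids = [row[0] for row in pedigree]
    index = {subject: i for i, subject in enumerate(ids)}
    n = len(ids)
    phi = [[0.0] * n for _ in range(n)]
    for i, (_, father, mother) in enumerate(pedigree):
        if father is None and mother is None:
            phi[i][i] = 0.5  # self-kinship of a non-inbred founder
            continue
        f, m = index[father], index[mother]
        # Self-kinship depends on how related the two parents are.
        phi[i][i] = 0.5 * (1.0 + phi[f][m])
        # Kinship with every earlier subject is the average over both parents.
        for j in range(i):
            phi[i][j] = phi[j][i] = 0.5 * (phi[j][f] + phi[j][m])
    return ids, phi


# Nuclear family: two founders and two full siblings.
family = [
    ("father", None, None),
    ("mother", None, None),
    ("child1", "father", "mother"),
    ("child2", "father", "mother"),
]
ids, phi = kinship_matrix(family)
# Twice the kinship coefficient gives the expected genotype correlation.
correlation = [[2.0 * value for value in row] for row in phi]
```

In this example the parent-offspring and full-sibling kinship coefficients are both 0.25, so the corresponding genotype correlations are 0.5, while the two founders are uncorrelated.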

Last, meta-analysis of rare variant association analyses with multiple studies requires two different types of weights. First, when multiple studies are combined, each study has different features, such as sample size and disease diagnosis, and such differences can be handled with an _{k}, and their _{B}. Second, rare variants have different gene annotations, genomic coordinates, and functional characterization, and various annotation tools have been proposed to choose important features based on their biological properties. We denote the weight for rare variant _{m}_{W} be their

We introduce the offset μ_{ik} for subject

The most efficient choice of

where

where ^{−1} − ^{−1}^{t}^{−1}^{−1}^{t}^{−1}. For a dichotomous phenotype, use of the generalized linear mixed model might be considered an appropriate approach, but we estimated

We let

If we denote the covariance between _{m, k} and

If we let _{m, k}(Choi et al.,

The score vector of rare variants in study

The score statistic tests whether the coded genotypes are linearly independent from the phenotypes; for dichotomous phenotypes, it is equivalent to comparing the minor allele frequencies (MAFs) between cases and controls.
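The equivalence with a comparison of MAFs can be checked numerically in a simplified setting. The sketch below assumes unrelated subjects, a single variant, and an offset equal to the sample proportion of cases; these simplifications and the function names are illustrative assumptions, and no kinship adjustment is applied.

```python
def score_statistic(genotypes, phenotypes):
    """Sum over subjects of (phenotype - offset) * minor allele count."""
    offset = sum(phenotypes) / len(phenotypes)  # assumed offset: proportion of cases
    return sum((y - offset) * x for x, y in zip(genotypes, phenotypes))


def maf_difference(genotypes, phenotypes):
    """Minor allele frequency in cases minus that in controls."""
    cases = [x for x, y in zip(genotypes, phenotypes) if y == 1]
    controls = [x for x, y in zip(genotypes, phenotypes) if y == 0]
    return sum(cases) / (2 * len(cases)) - sum(controls) / (2 * len(controls))


genotypes = [1, 0, 2, 0, 0, 1, 0, 0]   # minor allele counts
phenotypes = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = case, 0 = control

n_cases = sum(phenotypes)
n_controls = len(phenotypes) - n_cases
scale = 2 * n_cases * n_controls / len(phenotypes)

score = score_statistic(genotypes, phenotypes)
rescaled_difference = scale * maf_difference(genotypes, phenotypes)
```

Under these assumptions the score equals the MAF difference multiplied by 2 n_cases n_controls / n, so both quantities carry the same information about association.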

The homogeneous model assumes that the effect sizes of each variant are expected to be similar among different studies, and thus the proposed scores for each study can be collapsed across studies as follows:

Here, we set _{k} to be one. However, the proposed statistics are sometimes unavailable, and the appropriate choice can vary according to the available information. For instance, if standardized test statistics and sample sizes are available, then the inverse function to the square root of the sample size can be utilized.
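A minimal sketch of collapsing per-study scores is given below. It assumes that studies are independent, so that the variance of the weighted sum is the sum of the squared-weight-scaled variances; the function name, the nested-list data layout, and the toy numbers are hypothetical.

```python
def collapse_across_studies(scores, covariances, study_weights):
    """Weighted sum of per-study score vectors and of their covariance matrices.

    scores[k][m]         : score of variant m in study k
    covariances[k][m][l] : covariance between the scores of variants m and l in study k
    study_weights[k]     : weight of study k (all ones for equal weighting)
    Assumption: studies are independent, so covariances add across studies.
    """
    n_variants = len(scores[0])
    total_score = [0.0] * n_variants
    total_cov = [[0.0] * n_variants for _ in range(n_variants)]
    for s_k, v_k, w_k in zip(scores, covariances, study_weights):
        for m in range(n_variants):
            total_score[m] += w_k * s_k[m]
            for l in range(n_variants):
                total_cov[m][l] += w_k ** 2 * v_k[m][l]
    return total_score, total_cov


# Two studies, two variants, equal study weights.
scores = [[1.0, 2.0], [0.5, 1.5]]
covariances = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
]
collapsed_score, collapsed_cov = collapse_across_studies(scores, covariances, [1.0, 1.0])
```

With all study weights equal to one, the collapsed score is simply the sum of the per-study scores, which matches the choice described above.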

Rare variant association analysis can be categorized into burden and variance-component tests (Li and Leal,

Variance component tests use the collapsed squared scores (Neale et al.,
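The contrast between the two families of tests can be sketched as follows. The code assumes a collapsed score vector with a known covariance matrix, uses generic textbook forms of the burden and SKAT-type statistics rather than the exact metaFARVAT definitions, and uses hypothetical function names. The null distribution of the SKAT-type statistic is a mixture of chi-square distributions and is not computed here.

```python
import math


def burden_statistic(score, cov, weights):
    """Squared weighted sum of scores, standardized by its variance.

    Returns the statistic and its p-value from a chi-square distribution with 1 df.
    """
    weighted_sum = sum(w * s for w, s in zip(weights, score))
    variance = sum(
        weights[m] * cov[m][l] * weights[l]
        for m in range(len(score))
        for l in range(len(score))
    )
    statistic = weighted_sum ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # chi-square (1 df) upper tail
    return statistic, p_value


def skat_type_q(score, weights):
    """Weighted sum of squared scores; the sign of each score does not matter."""
    return sum((w * s) ** 2 for w, s in zip(weights, score))


identity = [[1.0 if m == l else 0.0 for l in range(4)] for m in range(4)]
flat_weights = [1.0] * 4

same_direction = [2.0, 2.0, 2.0, 2.0]
mixed_direction = [2.0, -2.0, 2.0, -2.0]

burden_same, p_same = burden_statistic(same_direction, identity, flat_weights)
burden_mixed, p_mixed = burden_statistic(mixed_direction, identity, flat_weights)
q_same = skat_type_q(same_direction, flat_weights)
q_mixed = skat_type_q(mixed_direction, flat_weights)
```

In this toy example the burden statistic drops to zero when the scores have opposite signs, whereas the SKAT-type statistic is unchanged, which is why variance-component tests are preferred when deleterious and protective variants coexist.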

We denote eigenvalues for _{l}. If we let

A balanced approach for both scenarios can be achieved by the SKAT-O type statistic (Lee et al.,

If we let its _{0} = 0 < _{1} < … < _{L}

Its

Last, rare variant association analysis utilizes rare variants, but the definition of a rare variant is not clear. VT approaches are very useful in such scenarios. We assume that rare variants are sorted in ascending order of overall MAF. We let _{(m)} be an

then the covariance between

Therefore, we let:

If we denote the realization of _{(m)} and let _{(|max|)} = max{ |_{(1)}|,…, |_{(M)}|}, the

Here,

where

As in the homogeneous model, we propose burden and variance component tests for the heterogeneous model. The heterogeneous model assumes that the effects of specific variants are heterogeneous among studies. If we let _{m, k}) = β_{mk}, the null hypothesis can be expressed by β_{11} = … = β_{MK} = 0, and we consider the following score vector and its variance matrix:

where _{kk} is a

We let:

and we let

Then,

Therefore, the variance component test is defined by:

If we denote the

and its

The performance of metaFARVAT was evaluated via extensive simulation studies. metaFARVAT can be applied to population-based and case-control designs by calculating GRM among samples. Therefore, we only focused on family-based designs in our simulation studies and considered unbalanced families consisting of trios, nuclear families, and extended families with three generations; the family structures that we considered are presented in ^{−8}, and haplotypes were randomly chosen with replacement to build founder genotypes. We defined variants with MAFs < 0.01 as being rare, and 60 rare variants were randomly selected from their haplotypes. Then, non-founder haplotypes were chosen from their parents' haplotypes in Mendelian fashion under the assumption of no recombination.

Phenotypes were generated under the null and alternative hypotheses. Simulation of dichotomous phenotypes was performed using the liability threshold model. Once the quantitative phenotypes were generated, they were transformed into case-control status for dichotomous phenotypes. If quantitative phenotypes were larger than the threshold, they were considered affected and otherwise were considered unaffected. The threshold was chosen to preserve the assumed disease prevalence of 0.1. If the disease prevalence is misspecified, loss of statistical power is expected; however, it has been shown with simulation studies that the effect of misspecification is not very substantial (Won and Lange,

Quantitative phenotypes were defined by summing the phenotypic mean, genetic effect, polygenic effect, main genetic effect, and random error, and we assumed there was no environmental effect shared between family members. The phenotypic mean was denoted by α = 0.3. The polygenic effect for each founder was independently generated from _{g}^{2} = 1), and for non-founders, the average of maternal and paternal polygenic effects was combined with values independently sampled from _{g}^{2}). Random error was independently sampled from _{e}^{2} = 1). Therefore, the heritability of the simulated trait is 0.5. The genetic effect at variant _{mk} and the number of disease susceptibility alleles. To evaluate the type-1 error estimates, β_{mk} was assumed to be 0. To evaluate the statistical power estimates, if we let _{a}^{2} be the proportion of variance explained by rare variants, β_{mk} values were iteratively sampled with a two-step approach.
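The null-model phenotype generation and the liability threshold step can be sketched for a single trio as follows. This is a minimal illustration and not the simulation code used in the study: the main genetic effect is set to zero, the Mendelian sampling term for the child is assumed to have variance equal to half the polygenic variance, the threshold is taken from the theoretical normal distribution of the phenotype, and all function names are hypothetical.

```python
import random
from statistics import NormalDist

ALPHA = 0.3        # phenotypic mean
SIGMA_G2 = 1.0     # polygenic variance
SIGMA_E2 = 1.0     # random-error variance
PREVALENCE = 0.1   # assumed disease prevalence


def simulate_trio(rng):
    """Quantitative phenotypes of father, mother, and child under the null."""
    sd_g = SIGMA_G2 ** 0.5
    g_father = rng.gauss(0.0, sd_g)
    g_mother = rng.gauss(0.0, sd_g)
    # Assumption: child's polygenic effect is the parental average plus a
    # Mendelian sampling term with variance 0.5 * SIGMA_G2.
    g_child = 0.5 * (g_father + g_mother) + rng.gauss(0.0, (0.5 * SIGMA_G2) ** 0.5)
    sd_e = SIGMA_E2 ** 0.5
    return [ALPHA + g + rng.gauss(0.0, sd_e) for g in (g_father, g_mother, g_child)]


def liability_threshold():
    """Threshold whose upper-tail probability equals the assumed prevalence."""
    total_sd = (SIGMA_G2 + SIGMA_E2) ** 0.5
    return NormalDist(mu=ALPHA, sigma=total_sd).inv_cdf(1.0 - PREVALENCE)


def to_case_status(phenotypes, threshold):
    """1 = affected (phenotype above the threshold), 0 = unaffected."""
    return [1 if y > threshold else 0 for y in phenotypes]


rng = random.Random(2024)
threshold = liability_threshold()
statuses = []
for _ in range(20000):
    statuses.extend(to_case_status(simulate_trio(rng), threshold))

empirical_prevalence = sum(statuses) / len(statuses)
heritability = SIGMA_G2 / (SIGMA_G2 + SIGMA_E2)
```

With both variance components equal to one, the heritability computed in the last line is 0.5, in agreement with the value stated above, and the empirical prevalence should be close to 0.1.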

β_{mk} values were sampled from the uniform distribution _{k}_{k}_{a}^{2} = 0.01. β_{mk} was generated from heterogeneous or homogeneous scenarios. For homogeneous scenarios, we assumed that the effects of each rare variant were in the same direction in all studies. For heterogeneous scenarios, the signs (±) of β_{mk} values sampled from _{k}

We considered previously reported WES data from Boston Early-Onset COPD Study (EOCOPD) families and COPDGene case-control subjects for meta-analysis (Qiao et al., _{1}) of ≤40%, physician-diagnosed COPD, and without severe alpha-1 antitrypsin deficiency. All first-degree relatives, older second-degree relatives, and additional affected family members were enrolled. There were 49 pedigrees with at least two affected family members selected for WES. COPDGene was a multi-center study of smokers with and without COPD and included African-Americans and non-Hispanic whites (Regan et al., _{1} <50% and ratio of FEV_{1} to forced vital capacity (FEV_{1}/FVC) <0.7), as well as 195 controls with normal spirometry (frequency-matched to COPD cases on pack-years of cigarette smoking), were chosen for WES.

Sequencing for both cohorts was performed at the University of Washington (Seattle, WA), using Nimblegen V2 capture (Roche NimbleGen, Inc., Madison, WI), and the Illumina platform (Illumina, Inc., San Diego, CA). Participants selected from the COPDGene cohort were sequenced via the NHLBI Exome Sequencing Program, and EOCOPD subjects were sequenced as part of the Center for Mendelian Genomics. Quality control (QC) filtering for both data sets was performed by the method of Qiao et al. (^{−8}, and average sequencing depth <12, as well as excluding subjects with pedigree, racial, or sex mismatches. After QC, there were 303 individuals from 49 families and 124,288 variants in the EOCOPD data set, and there were 394 unrelated individuals and 108,443 variants in the COPDGene data set. For rare variant analyses, we assumed that variants with MAFs <5% in dbSNP were rare, and in both studies, we separately filtered out singleton variants or genes with minor allele counts (MACs) <10. Finally, 88,737 rare variants in 13,935 genes were analyzed in the EOCOPD data set, and 24,846 rare variants in 10,550 genes were tested in the COPDGene data set. For both EOCOPD and COPDGene data, GRMs were estimated for variants with MAFs >5% and were incorporated as variance-covariance matrices of genotypes to adjust for population substructure. Effects of covariates for binary phenotypes were adjusted by using the BLUP as an offset. First, we fitted the linear mixed model with adjustments for age, sex, and pack-years of smoking as covariates, and then BLUP was set as the offset for the proposed methods. A description of the two datasets is provided in

To evaluate statistical validity, type-1 error estimates for both dichotomous and quantitative phenotypes were calculated at various significance levels using 20,000 replicates of 200 unbalanced families. For each replicate, we performed three different meta-analyses, combining 3, 6, and 9 studies. The table below shows type-1 error estimates for homogeneous metaFARVAT (metaFARVAT^{Hom}) and heterogeneous metaFARVAT (metaFARVAT^{Het}) at the 0.1, 0.01, 10^{−3}, and 10^{−4} significance levels with dichotomous phenotypes. Estimates of type-1 error rates were close to the nominal significance levels. However, VT type metaFARVAT^{Hom} showed inflation, especially when there were three studies, and it is not recommended when the number of rare variants is small. Quantile-quantile (QQ) plots likewise indicated that metaFARVAT^{Hom} and metaFARVAT^{Het} are statistically valid.
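One way to judge whether an empirical type-1 error estimate is compatible with its nominal level is to compare it with the Monte Carlo sampling band implied by the number of replicates. The sketch below uses a normal approximation to the binomial distribution, which is an assumption of this illustration and not part of the reported analysis; the function names are hypothetical.

```python
import math


def monte_carlo_band(alpha, n_replicates, z=1.96):
    """Approximate 95% band for an empirical rejection rate whose true value is alpha."""
    standard_error = math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - z * standard_error, alpha + z * standard_error


def within_band(estimate, alpha, n_replicates):
    """True if the estimate is compatible with the nominal level alpha."""
    low, high = monte_carlo_band(alpha, n_replicates)
    return low <= estimate <= high


low, high = monte_carlo_band(0.1, 20000)
smallest_ok = within_band(0.0960, 0.1, 20000)
largest_ok = within_band(0.1100, 0.1, 20000)
```

With 20,000 replicates the band at the 0.1 level is roughly 0.0958 to 0.1042, so an estimate of 0.0960 is compatible with the nominal level, whereas an estimate of 0.1100 is not.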

Type-1 error estimates from simulation study with dichotomous phenotypes.

Model | Number of studies | Significance level | | | | VT |
---|---|---|---|---|---|---|
Hom | 3 | 0.1 | 0.0960 | 0.0950 | 0.0953 | 0.1100 |
Hom | 3 | 0.01 | 0.0103 | 0.0099 | 0.0100 | 0.0116 |
Hom | 3 | 10^{−3} | 0.0009 | 0.0012 | 0.0014 | 0.0017 |
Hom | 3 | 10^{−4} | 0.0001 | 0.0001 | 0.0001 | 0.0004 |
Hom | 6 | 0.1 | 0.1002 | 0.0953 | 0.0957 | 0.1018 |
Hom | 6 | 0.01 | 0.0094 | 0.0085 | 0.0088 | 0.0106 |
Hom | 6 | 10^{−3} | 0.0008 | 0.0009 | 0.0008 | 0.0011 |
Hom | 6 | 10^{−4} | 0.0001 | 0.0000 | 0.0000 | 0.0001 |
Hom | 9 | 0.1 | 0.1000 | 0.1015 | 0.1025 | 0.1018 |
Hom | 9 | 0.01 | 0.0096 | 0.0098 | 0.0093 | 0.0110 |
Hom | 9 | 10^{−3} | 0.0007 | 0.0009 | 0.0007 | 0.0015 |
Hom | 9 | 10^{−4} | 0.0001 | 0.0000 | 0.0000 | 0.0001 |
Het | 3 | 0.1 | 0.0987 | 0.1006 | 0.0981 | – |
Het | 3 | 0.01 | 0.0100 | 0.0091 | 0.0094 | – |
Het | 3 | 10^{−3} | 0.0008 | 0.0008 | 0.0013 | – |
Het | 3 | 10^{−4} | 0.0001 | 0.0002 | 0.0002 | – |
Het | 6 | 0.1 | 0.1036 | 0.0986 | 0.0985 | – |
Het | 6 | 0.01 | 0.0094 | 0.0106 | 0.0105 | – |
Het | 6 | 10^{−3} | 0.0008 | 0.0014 | 0.0012 | – |
Het | 6 | 10^{−4} | 0.0001 | 0.0003 | 0.0002 | – |
Het | 9 | 0.1 | 0.1041 | 0.1026 | 0.1046 | – |
Het | 9 | 0.01 | 0.0107 | 0.0095 | 0.0107 | – |
Het | 9 | 10^{−3} | 0.0009 | 0.0011 | 0.0009 | – |
Het | 9 | 10^{−4} | 0.0001 | 0.0002 | 0.0001 | – |

Type-1 error estimates were calculated at the 0.1, 0.01, 10^{−3}, and 10^{−4} significance levels for dichotomous phenotypes. We assumed that the number of rare variants is 60 and that their minor allele frequencies are <0.01. Both homogeneous (Hom) and heterogeneous (Het) models were considered.

Secondly, empirical power estimates for dichotomous phenotypes were calculated at the 2.5 × 10^{−6} significance level, showing the changes in power under different scenarios. Empirical power estimates were calculated with 2,000 replicates for seven different statistics: burden, SKAT, SKAT-O, and VT type statistics for metaFARVAT^{Hom} and burden, SKAT, and SKAT-O type statistics for metaFARVAT^{Het}. Results are provided in _{k}

Empirical power estimates for dichotomous phenotype for homogeneous variants among studies.

60/0 | Fisher | 0.1990 | 0.6495 | 0.8940 | |||||||||

minP | 0.0315 | 0.0610 | 0.0715 | ||||||||||

Hom | 0.0195 | 0.3590 | 0.3660 | 0.3915 | 0.1690 | 0.9265 | 0.9150 | 0.9240 | 0.4920 | 0.9975 | 0.9945 | 0.9965 | |

Het | 0.0115 | 0.3390 | 0.4160 | – | 0.0750 | 0.9095 | 0.9330 | – | 0.1865 | 0.9930 | 0.9960 | – | |

48/12 | Fisher | 0.0270 | 0.1060 | 0.2400 | |||||||||

minP | 0.0060 | 0.0070 | 0.0070 | ||||||||||

Hom | 0.0105 | 0.0335 | 0.0670 | 0.0450 | 0.1105 | 0.2290 | 0.3720 | 0.2665 | 0.4000 | 0.5355 | 0.7565 | 0.5720 | |

Het | 0.0045 | 0.0310 | 0.0720 | – | 0.0225 | 0.2080 | 0.3305 | – | 0.0760 | 0.4825 | 0.6325 | – | |

30/30 | Fisher | 0.0000 | 0.0015 | 0.0035 | |||||||||

minP | 0.0000 | 0.0000 | 0.0000 | ||||||||||

Hom | 0.0050 | 0.0000 | 0.0025 | 0.0010 | 0.0555 | 0.0000 | 0.0270 | 0.0000 | 0.2615 | 0.0000 | 0.1650 | 0.0065 | |

Het | 0.0000 | 0.0000 | 0.0005 | – | 0.0020 | 0.0000 | 0.0015 | – | 0.0120 | 0.0000 | 0.0090 | – | |

30/0 | Fisher | 0.0440 | 0.2090 | 0.4520 | |||||||||

minP | 0.0090 | 0.0170 | 0.0205 | ||||||||||

Hom | 0.0140 | 0.0725 | 0.1145 | 0.0900 | 0.1790 | 0.4260 | 0.5760 | 0.4785 | 0.5515 | 0.7970 | 0.9125 | 0.8220 | |

Het | 0.0070 | 0.0605 | 0.1290 | – | 0.0555 | 0.3905 | 0.5545 | – | 0.1410 | 0.7545 | 0.8590 | – | |

24/6 | Fisher | 0.0075 | 0.0365 | 0.0895 | |||||||||

minP | 0.0020 | 0.0020 | 0.0015 | ||||||||||

Hom | 0.0095 | 0.0045 | 0.0215 | 0.0085 | 0.1285 | 0.0465 | 0.1980 | 0.0610 | 0.4480 | 0.1440 | 0.5480 | 0.1765 | |

Het | 0.0025 | 0.0035 | 0.0240 | – | 0.0225 | 0.0340 | 0.1215 | – | 0.0630 | 0.1105 | 0.2890 | – | |

15/15 | Fisher | 0.0000 | 0.0030 | 0.0045 | |||||||||

minP | 0.0000 | 0.0010 | 0.0005 | ||||||||||

Hom | 0.0020 | 0.0000 | 0.0005 | 0.0010 | 0.0550 | 0.0000 | 0.0270 | 0.0025 | 0.2700 | 0.0000 | 0.1650 | 0.0060 | |

Het | 0.0000 | 0.0000 | 0.0000 | – | 0.0025 | 0.0000 | 0.0025 | – | 0.0090 | 0.0000 | 0.0030 | – |

The empirical power of metaFARVAT^{Hom} and metaFARVAT^{Het} was calculated for dichotomous phenotypes with homogeneous effects at the 2.5 × 10^{−6} significance level.

Empirical power estimates for dichotomous phenotype for heterogeneous variants among studies.

48/12 | Fisher | 0.0240 | 0.1040 | 0.2235 | |||||||||

minP | 0.0065 | 0.0070 | 0.0105 | ||||||||||

Hom | 0.0040 | 0.0340 | 0.0425 | 0.0460 | 0.0130 | 0.2170 | 0.2450 | 0.2555 | 0.0420 | 0.5200 | 0.5305 | 0.5680 | |

Het | 0.0080 | 0.0325 | 0.0590 | – | 0.0270 | 0.1865 | 0.3180 | – | 0.0715 | 0.4555 | 0.6115 | – | |

30/30 | Fisher | 0.0000 | 0.0030 | 0.0020 | |||||||||

minP | 0.0000 | 0.0000 | 0.0000 | ||||||||||

Hom | 0.0005 | 0.0000 | 0.0000 | 0.0005 | 0.0005 | 0.0000 | 0.0005 | 0.0000 | 0.0015 | 0.0000 | 0.0015 | 0.0000 | |

Het | 0.0005 | 0.0000 | 0.0005 | – | 0.0050 | 0.0000 | 0.0030 | – | 0.0090 | 0.0000 | 0.0070 | – | |

30/0 | Fisher | 0.0460 | 0.2220 | 0.4690 | |||||||||

minP | 0.0115 | 0.0145 | 0.0160 | ||||||||||

Hom | 0.0065 | 0.0670 | 0.0945 | 0.0880 | 0.0400 | 0.4385 | 0.4595 | 0.4730 | 0.1185 | 0.7880 | 0.7875 | 0.7980 | |

Het | 0.0060 | 0.0570 | 0.1340 | – | 0.0510 | 0.3930 | 0.5580 | – | 0.1370 | 0.7425 | 0.8380 | – | |

24/6 | Fisher | 0.0095 | 0.0325 | 0.0850 | |||||||||

minP | 0.0030 | 0.0035 | 0.0030 | ||||||||||

Hom | 0.0020 | 0.0070 | 0.0115 | 0.0120 | 0.0045 | 0.0470 | 0.0680 | 0.0655 | 0.0125 | 0.1520 | 0.1875 | 0.1900 | |

Het | 0.0040 | 0.0065 | 0.0270 | – | 0.0215 | 0.0335 | 0.1230 | – | 0.0590 | 0.1185 | 0.2825 | – | |

15/15 | Fisher | 0.0010 | 0.0015 | 0.0020 | |||||||||

minP | 0.0005 | 0.0010 | 0.0005 | ||||||||||

Hom | 0.0000 | 0.0000 | 0.0000 | 0.0005 | 0.0000 | 0.0000 | 0.0000 | 0.0005 | 0.0000 | 0.0000 | 0.0000 | 0.0005 | |

Het | 0.0015 | 0.0000 | 0.0015 | – | 0.0030 | 0.0000 | 0.0010 | – | 0.0095 | 0.0000 | 0.0060 | – |

The empirical power of metaFARVAT^{Hom} and metaFARVAT^{Het} was calculated for dichotomous phenotypes with heterogeneous effects at the 2.5 × 10^{−6} significance level.

According to our results, the minimum ^{Hom} and metaFARVAT^{Het} under homogeneous and heterogeneous scenarios, respectively. Statistical power estimates also depend on the proportion of rare variants with deleterious or protective effects on the phenotypes, which is often unknown. For example, when all rare causal variants had deleterious effects on the phenotype, burden, and VT type metaFARVAT outperformed all other approaches, but if there were variants with deleterious and protective effects, SKAT-type metaFARVAT was the most efficient. SKAT-O metaFARVAT was not always the most powerful, but its empirical power estimates were usually very close to those of the most efficient approach.
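The rows labeled Fisher and minP in the power tables refer to approaches that combine study-level p-values. A minimal sketch of two such combinations is given below. It assumes independent studies, and the Šidák-type adjustment used for the minimum p-value is an assumption of this illustration, because the exact definition used in the comparison is not stated in the text; the function names are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(log p) follows a chi-square with 2K df under the null."""
    k = len(p_values)
    half_statistic = -sum(math.log(p) for p in p_values)  # statistic / 2
    # Closed-form upper tail of a chi-square distribution with even df (2K).
    term = 1.0
    series = 1.0
    for j in range(1, k):
        term *= half_statistic / j
        series += term
    return math.exp(-half_statistic) * series


def min_p_combined(p_values):
    """Minimum p-value adjusted for K independent tests (assumed Sidak-type form)."""
    return 1.0 - (1.0 - min(p_values)) ** len(p_values)


study_p_values = [0.01, 0.5, 0.9]
fisher_p = fisher_combined_p(study_p_values)
min_p = min_p_combined(study_p_values)
```

Both approaches use only the study-level p-values, so, unlike score-based combinations, they do not use the direction or magnitude of the per-variant scores in each study.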

The proposed methods can be applied to quantitative phenotypes, and results for quantitative phenotypes are provided in ^{Hom} under homogeneous scenarios. The SKAT-O type statistic in seqMeta did not perform well when there were as many protective variants as deleterious variants in the gene. metaFARVAT^{Het} outperformed other methods when the effects of each rare variant differed among studies and when there were variants with deleterious and protective effects within a gene.

To identify rare variants associated with COPD, we separately conducted rare variant analyses with EOCOPD and COPDGene data. Manhattan and QQ plots are provided in ^{Hom} and metaFARVAT^{Het}. For both statistics, _{1} and _{2} were set to 1. The QQ plots in ^{Het} and metaFARVAT^{Hom} preserved the nominal significance level. However, VT type metaFARVAT exhibited some inflation, and its results are therefore not included in ^{−6} and is indicated by a solid blue line. ^{−4}) by metaFARVAT^{Hom} SKAT-O.

The candidate genes found by meta-analysis in COPD studies.

Method | Data | Gene | Sample size | Chromosome | Start | End | # Rare variants | MAC | P_B | P_S | | P_O |
---|---|---|---|---|---|---|---|---|---|---|---|---|
metaFARVAT^{Hom} | EOCOPD & COPDGene | | 697 | 3 | 38,080,978 | 38,163,785 | 9 | 66 | 1.21e-05 | 1.02e-04 | 0.25 | 5.24e-06 |
metaFARVAT^{Hom} | EOCOPD & COPDGene | | 697 | 19 | 11,890,983 | 11,892,255 | 2 | 24 | 9.88e-05 | 3.46e-04 | 1 | 2.13e-05 |
metaFARVAT^{Het} | EOCOPD & COPDGene | | 697 | 3 | 38,080,978 | 38,163,785 | 15 | 66 | 8.03e-06 | 9.54e-04 | 0.16 | 5.43e-06 |
FARVAT | EOCOPD | | 303 | 3 | 38,080,978 | 38,163,785 | 9 | 28 | 3.70e-03 | 0.147 | 1 | 7.24e-03 |
FARVAT | EOCOPD | | 303 | 19 | 11,890,983 | 11,892,255 | 2 | 13 | 1.12e-03 | 3.83e-03 | 1 | 1.37e-03 |
FARVAT | COPDGene | | 394 | 3 | 38,080,978 | 38,163,785 | 6 | 38 | 5.80e-04 | 7.96e-04 | 0.25 | 3.53e-04 |
FARVAT | COPDGene | | 394 | 19 | 11,890,983 | 11,892,255 | 1 | 11 | 3.09e-02 | 3.09e-02 | 1 | 3.09e-02 |

Family-based association methods are robust against population substructure, and because of genetic homogeneity among family members, they are often utilized for rare variant association analyses. Multiple approaches have been proposed, and Tang and Lin (

In this study, we proposed a new meta-analysis method for family-based rare variant association analyses with dichotomous phenotypes, which can test both homogeneous and heterogeneous effects of variants in different studies. metaFARVAT can also be applied to quantitative phenotypes, and is able to combine all study designs, including family-based, case-control, and population-based designs. Furthermore, the proposed method was applied to a meta-analysis of EOCOPD and COPDGene data, and

Despite its robustness and efficiency, the proposed method has some limitations. First, VT methods sort rare variants according to their MAFs and search for the optimal threshold for rare variants. This approach is useful when it is not clear how to define rare variants. However, we found that type-1 error can be inflated if the number of rare variants is too small, and the approach is computationally intensive if there are a large number of variants to investigate. This problem may be addressed by using a permutation method, and further investigation of this approach is necessary. Secondly, sufficiently large samples are necessary to guarantee that the SKAT-O statistic follows its assumed asymptotic distribution under the null hypothesis. Therefore, the SKAT-O type metaFARVAT also has this limitation when it is applied to a dichotomous phenotype with a small sample size. Thirdly, the proposed method cannot be applied to X- or Y-linked genes because the distributions of variants on the X and Y chromosomes differ between males and females. Such an extension will be considered in our future work. Lastly, in the simulation studies, we considered a limited number of rare variants and excluded noise variants. However, in practice, it is not known which rare variants are causal and which represent noise. More extensive simulations are thus necessary in our future work.

Despite the importance of rare variant analyses with family-based samples, this field of study has suffered over the last decades from a lack of statistical methods. In this study, we proposed new methods for family-based samples, enabling such statistical analyses.

The R package for metaFARVAT can be downloaded from

LW and SW conceived and designed the experiments, performed the experiments, analyzed and interpreted the data, and drafted the manuscript. SL maintains the software homepage. DQ, MC, ES, and CL edited the manuscript. All authors read and approved the final version of the manuscript.

In the past 3 years, ES received honoraria from Novartis for Continuing Medical Education Seminars and grant and travel support from GlaxoSmithKline. MC has received grant support from GSK. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Sequencing for the Boston Early-Onset COPD Study was provided by the University of Washington Center for Mendelian Genomics (UW CMG) and was funded by the National Human Genome Research Institute and the National Heart, Lung, and Blood Institute grant 1U54HG006493. Sequencing for the COPDGene subjects was provided by the NHLBI Exome Sequencing Program (ESP) at the University of Washington.

The Supplementary Material for this article can be found online at: