
Edited by: Pedro Antonio Valdes-Sosa, Centro de Neurociencias de Cuba, Cuba

Reviewed by: Anand Joshi, University of Southern California, USA; Eugene Duff, University of Oxford, UK

*Correspondence: Srinivas Rachakonda

This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Principal component analysis (PCA) is widely used for data reduction in group independent component analysis (ICA) of fMRI data. Commonly, group-level PCA of temporally concatenated datasets is computed prior to ICA of the group principal components. This work focuses on reducing very high dimensional temporally concatenated datasets into their group PCA space. Existing randomized PCA methods can determine the PCA subspace with minimal memory requirements and, thus, are ideal for solving large PCA problems. Since the number of dataloads is not typically optimized, we extend one of these methods to compute PCA of very large datasets with a minimal number of dataloads. We call this method multi power iteration (MPOWIT). The key idea behind MPOWIT is to estimate a subspace larger than the desired one, while checking for convergence of only the smaller subset of interest. The number of iterations is reduced considerably (as well as the number of dataloads), accelerating convergence without loss of accuracy. More importantly, in the proposed implementation of MPOWIT, the memory required for successful recovery of the group principal components becomes independent of the number of subjects analyzed. Highly efficient subsampled eigenvalue decomposition techniques are also introduced, furnishing excellent PCA subspace approximations that can be used for intelligent initialization of randomized methods such as MPOWIT. Together, these developments enable efficient estimation of accurate principal components, as we illustrate by solving a 1600-subject group-level PCA of fMRI with standard acquisition parameters, on a regular desktop computer with only 4 GB RAM, in just a few hours. MPOWIT is also highly scalable and could realistically solve group-level PCA of fMRI on thousands of subjects, or more, using standard hardware, limited only by time, not memory.
Also, the MPOWIT algorithm is highly parallelizable, which would enable fast, distributed implementations ideal for big data analysis. Implications for other methods, such as expectation maximization PCA (EM PCA), are also presented. Based on our results, general recommendations for efficient application of PCA methods are given according to problem size and available computational resources. MPOWIT and all other methods discussed here are implemented and readily available in the open source GIFT software.

Principal component analysis (PCA) is used as both a data reduction and de-noising method in group independent component analysis (ICA) (Calhoun et al.,

There are several methods to estimate dominant PCA components with minimal memory requirements, like sequential SVD, cascade recursive least squares (CRLS) PCA, and randomized PCA approaches, to name a few. Sequential or “online” SVD is usually applied in a streaming memory setting where the data streams over time and only a single pass over the datasets is possible. There exist algorithms (Brand,

Randomized PCA methods are a class of algorithms that iteratively estimate the principal components from the data and are particularly useful when only a few components need to be estimated from very large datasets. They provide a much more efficient solution than the EVD approach, which always estimates the complete set of eigenvectors, many of which are eventually discarded for data reduction and de-noising purposes. Clearly, iterative approaches can make a much more intelligent use of the available computational resources. Some popular and upcoming randomized PCA approaches are: implicitly restarted Arnoldi iteration (IRAM; (Lehoucq and Sorensen,

In this paper, we show how to overcome the problem of slow convergence in subspace iteration when a high number of components is estimated by introducing a new approach, named multi power iteration (MPOWIT). Our approach takes into account the number of dataloads, which has often been overlooked in the development of randomized PCA methods. We also show that both subspace iteration and EM PCA methods converge to the same subspace in each iteration. Thus, the acceleration scheme we propose in MPOWIT can also be applied to EM PCA. In addition, we compare the performance of MPOWIT with existing PCA methods like EVD and Large PCA using real fMRI data from 1600 subjects with standard acquisition parameters. Moreover, acknowledging the recent popularization and promising developments in the area of multi-band EPI sequences (Feinberg and Setsompop,

We provide descriptions of EVD, Subsampled PCA, Large PCA, MPOWIT and EM PCA in the Materials and Methods Section. The same section also includes a description of the datasets and experiments conducted for each PCA method. Experiments are performed on the real fMRI data. In the Results Section, we present our experimental results and compare the performance of MPOWIT with existing PCA methods. Finally, we discuss these results and draw conclusions based on the analyses we performed. Additional details are provided in the appendices, including a proof that EM PCA is equivalent to subspace iteration.

In this paper, we are interested in group ICA of fMRI data as originally described in Calhoun et al., where Y_{i} denotes the fMRI data of subject i and each Y_{i} is mean-centered on zero at each time point. Each subject's data is reduced along the time dimension using PCA to retain the top components.^{1} Let Y = [Y_{1}, Y_{2}, …, Y_{M}] be the temporally concatenated data, where Y_{i} is the zero-mean, PCA-reduced data of subject i.

Group ICA is regularly being used to analyze large numbers of subjects (Biswal et al., ^{2}

Group ICA of temporally concatenated fMRI can be used to identify either spatial independent components (Calhoun et al., ^{3}_{i}. In the following, we present algorithms for PCA estimation assuming the case depicted in Figure

In the following, we present a selection of approaches for group PCA, starting with the traditional EVD method, which we consider the standard for accuracy in later comparisons. Then, based on considerations made on the EVD method and properties of the fMRI data, two approaches are proposed for efficient approximate estimation of the group PCA solution, namely subsampled voxel PCA (SVP) and subsampled time PCA (STP). Both approaches are useful for efficient initialization and accelerated convergence of the highly accurate randomized methods presented later. Next, Large PCA, a recent block Lanczos method with high accuracy and potential for application in large group PCA, is introduced for the purpose of comparison. Power methods, including our novel MPOWIT technique, are discussed next. The connections and implications of MPOWIT for the popular expectation maximization PCA (EM PCA) approach are presented last. At every stage, we strive to present each algorithmic improvement and theoretical development in the context of fMRI data under various conditions. Nevertheless, all considerations should be straightforwardly extensible to other modalities and datatypes. Throughout the following,

Using the EVD approach, the group-level PCA space

Compute the sample covariance matrix

EVD factorizes the (symmetric) covariance matrix

The

From Equation (1) and the description in Figure

Exploiting this structure, the covariance matrix can be computed in the smaller of the two data dimensions (with memory cost quadratic in that dimension) and accumulated one subject at a time; the voxel-space covariance matrix C^{v} is:

The final estimate of the group-level PCA space is obtained from the top eigenvectors of C^{v} [which occupies on the order of v^{2}) bytes], and note it only requires storing C^{v} rather than stacking the entire data in memory to form Y.
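To make the accumulation strategy concrete, the following sketch (illustrative NumPy only, not the GIFT implementation; the function name and data layout are our own) forms the voxel-space covariance one subject at a time and factorizes it with a standard symmetric eigensolver:

```python
import numpy as np

def group_pca_evd(subject_data, k):
    """Group PCA via EVD of the accumulated voxel-space covariance.

    subject_data: list of (t, v) arrays, each mean-centered per time point.
    Accumulates C = sum_i Y_i^T Y_i one subject at a time, so the full
    stacked (M*t, v) matrix never needs to be held in memory.
    """
    v = subject_data[0].shape[1]
    C = np.zeros((v, v))
    for Y in subject_data:
        C += Y.T @ Y                          # per-subject covariance contribution
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    evals, evecs = np.linalg.eigh(C)
    idx = np.argsort(evals)[::-1][:k]         # keep the top-k eigenpairs
    return evals[idx], evecs[:, idx]
```

Only the v × v covariance is ever held in memory, which is why EVD becomes expensive for large voxel counts even though the subject loop itself is cheap.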

Clearly, trade-offs exist between time, memory, and dataloads depending on the exact values of

While EVD is very well-developed and accurate, it still becomes computationally and memory intensive when applied to large data^{4}

Subsampled voxel PCA (SVP) works by subsampling the data by a factor of two along each spatial axis, reducing the voxel dimension by a factor of 2^{3} = 8. The PCA subspaces U_{a} and U_{b} are estimated from the odd- and even-voxel subsampled datasets Y_{a} and Y_{b}, respectively, using EVD with Equation (5), since the subsampled voxel dimension is typically much smaller. The estimates U_{a} and U_{b} are then projected onto the data to bring the subspaces estimated from Y_{a} and Y_{b}, respectively, back to dimension v.

Y_{a} and Y_{b} refer to subsampled data in odd- and even-voxel spaces, respectively. The PCA estimates U_{a} and U_{b} are estimated in subsampled space.

SVP is much faster than EVD, as the voxel dimension is smaller by at least a factor of 8, but it only gives an approximate PCA solution. SVP PCA estimates are a great initial solution for any of the randomized PCA methods discussed later, leading to considerably faster convergence.
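A simplified sketch of the SVP idea follows (illustrative NumPy, not the GIFT code; the paper combines odd- and even-voxel subsamples, whereas this sketch uses a single strided subsample for brevity):

```python
import numpy as np

def svp_init(Y, k, stride=8):
    """Subsampled voxel PCA (SVP) sketch. Y is the stacked data (Mt, v).

    EVD is solved in the small subsampled-voxel space, and the resulting
    estimates are projected back onto the full data to recover an
    orthonormal dimension-v subspace, suitable for initializing a
    randomized PCA method.
    """
    Ys = Y[:, ::stride]                       # subsampled voxel space
    lam, Vs = np.linalg.eigh(Ys.T @ Ys)       # small EVD in subsampled space
    Vs = Vs[:, np.argsort(lam)[::-1][:k]]     # top-k subsampled eigenvectors
    U = Ys @ Vs                               # temporal projections (Mt, k)
    V = Y.T @ U                               # project back to full voxel space
    Q, _ = np.linalg.qr(V)                    # orthonormalize
    return Q                                  # (v, k) approximate PCA basis
```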

The time (stacked) dimension increases as more and more subjects are analyzed in a group PCA framework (Figure _{i}) were randomly organized in groups of size ^{5}

Here, we present a modified version of this three-step data reduction, which we call sub-sampled time PCA (STP). It estimates the PCA subspace

U_{g} and Λ_{g} are the eigenvector and eigenvalue estimates of group g, respectively. The estimates U_{g} are projected onto the data Y_{g} to obtain a reduced representation of group g (Equation 13). Equations (11)–(13) are repeated for the next group to compute PCA estimates for data Y_{g+1}. The PCA estimates of group g and group g+1 are stacked in the time dimension and the common subspace U_{g} is obtained using Equations (14) and (15), since typically twice the number of retained components is still much smaller than the voxel dimension, leaving a single set of estimates U_{g} at the end of the estimation:

STP requires only a single pass through the data to determine an approximate PCA space and is a very useful method when data loading is a bottleneck. Both the estimation accuracy and the memory requirements are proportional to the number of subjects included in each group and the number of components estimated in the intermediate group PCA. In this paper, we select the number of subjects in each group as

Large PCA (Halko et al., ^{T} to obtain the reduced PCA space

A Krylov subspace ^{T} matrix is generated iteratively from an initial standard Gaussian random matrix _{0} of size _{0} = _{Mp × b}, _{0} = _{0}, and

After

Next, χ is projected onto the data matrix as follows:

In order to obtain the PCA space

Finally, retrieve only the first

The choice of the number of block iterations^{6} affects both accuracy and cost. In our implementation, we set q_{0} = 6 before the first estimation of the singular values (Equation 21), and continue to augment q_{0} until convergence is attained. Our approach incurs additional dataloads and some additional computational effort, but controlled memory usage, all while guaranteeing the accuracy of the solution with respect to EVD. Finally, Equation (20) is computed only after convergence is attained.

The time complexity of Large PCA scales with the number of block iterations q_{0}. When initialized with STP or SVP, G_{0} is set to equal U^{STP} or U^{SVP}, respectively, and we recommend q_{0} = 1 since convergence is attained much faster with this initialization.
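The block Krylov construction can be illustrated as follows (a NumPy sketch under our own simplifications; the actual Large PCA implementation uses economy-size SVDs and the incremental q_{0} scheme described above):

```python
import numpy as np

def block_krylov_pca(Y, k, q=6, b=None):
    """Block Krylov (Large PCA-style) sketch.

    Builds K = [G, AG, A^2 G, ...] with A = Y^T Y applied via associative
    products, orthonormalizes the Krylov basis, and solves a small
    eigenproblem in that basis. `q` plays the role of the number of block
    iterations; `b` is the block size (k plus a small oversampling)."""
    v = Y.shape[1]
    b = b or k + 5
    rng = np.random.default_rng(0)
    G = rng.standard_normal((v, b))               # Gaussian starting block
    blocks = [G]
    for _ in range(q):
        G = Y.T @ (Y @ G)                         # next Krylov block
        G, _ = np.linalg.qr(G)                    # keep it well-conditioned
        blocks.append(G)
    K, _ = np.linalg.qr(np.hstack(blocks))        # orthonormal Krylov basis
    B = Y @ K                                     # project data onto the basis
    lam, W = np.linalg.eigh(B.T @ B)              # small eigenproblem
    order = np.argsort(lam)[::-1][:k]
    return K @ W[:, order]                        # top-k approximate eigenvectors
```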

Power iteration is an iterative technique which uses powers of the covariance matrix ^{v} contain the same eigenvectors (^{v} itself, the largest eigenvalues become more dominant, emphasizing the direction of largest variability. However, power iteration techniques require a normalization step to avoid ill-conditioned situations. Different normalization approaches and the choice of initial PCA subspace mark the key differences among power iteration techniques. In traditional power iteration, the _{2}-norm of the PCA estimates is used for normalization in each iteration, as shown below:

Subspace iteration (Rutishauser, uses orthonormalization, rather than the column-wise L_{2}-norm scaling above, of the intermediate estimates V_{j} at each iteration to prevent them from becoming ill-conditioned. The following equations summarize subspace iteration (Saad, where V_{j} is the subspace estimate at the j^{th} iteration. Equations (26) and (27) are iterated until convergence. Subspace iteration is straightforward to implement but converges slowly, especially for the last few eigenvalues. Preconditioning techniques like shift-and-invert and Chebyshev polynomials (Saad,
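For reference, a minimal NumPy sketch of subspace iteration with QR orthonormalization follows (the convergence rule on the Ritz values is our own simplification):

```python
import numpy as np

def subspace_iteration(C, k, tol=1e-6, max_iter=1000):
    """Classic subspace iteration on a symmetric covariance matrix C:
    multiply by C, re-orthonormalize with QR, and repeat until the Ritz
    values stabilize. Converges slowly for the trailing eigenvalues."""
    rng = np.random.default_rng(0)
    V = np.linalg.qr(rng.standard_normal((C.shape[0], k)))[0]
    prev = np.zeros(k)
    for _ in range(max_iter):
        V, _ = np.linalg.qr(C @ V)                       # power step + QR
        ritz = np.sort(np.diag(V.T @ C @ V))[::-1]       # eigenvalue estimates
        if np.max(np.abs(ritz - prev)) < tol * max(ritz[0], 1e-12):
            break
        prev = ritz
    return V
```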

Here, we introduce a novel method called MPOWIT, which accelerates the subspace iteration method. It relies on making the projecting subspace larger than the desired eigen space in order to overcome the slow convergence associated with the subspace iteration approach. The MPOWIT algorithm starts with a standard Gaussian random matrix of size _{2}-norm of each column of χ_{j−1}^{T} on large data is inefficient in memory, the associative matrix multiplications shown in the center and right hand side of Equation (30) are used instead. Finally, Equation (31) is the EVD of ^{7}_{j}. Equations (29)–(31) are iterated for _{0} = 0, where 0 is a

After convergence, the reduced PCA space ^{T}_{2}-norm. A rank

The time complexity of MPOWIT over all _{2}-norm normalization of the columns of _{j} without full orthogonalization would suffice in Equation (29). However, in our experiments, we determined that assessments about the convergence of the algorithm are considerably more reliable if they are based on the eigenvalues Λ_{j} obtained from orthogonal _{j} instead. Thus, explicit orthonormalization of _{j} in each iteration is preferred.

Also, when MPOWIT is initialized with STP or SVP, V_{0} is set as the top components of U^{STP} or U^{SVP}, respectively, and
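The core of MPOWIT can be sketched as follows (illustrative NumPy, not the GIFT implementation; the multiplier value and the exact convergence rule are simplified assumptions). Note the associative products Y^{T}(YV), which avoid forming the v × v covariance, and the convergence check restricted to the top-k eigenvalue estimates of the enlarged block:

```python
import numpy as np

def mpowit(Y, k, multiplier=5, tol=1e-6, max_iter=1000, V0=None):
    """MPOWIT sketch: subspace iteration on an enlarged block of size
    multiplier*k, checking convergence only on the top-k eigenvalue
    estimates, which greatly reduces the number of iterations (and
    therefore dataloads)."""
    v = Y.shape[1]
    b = min(multiplier * k, v)                  # enlarged subspace size
    rng = np.random.default_rng(0)
    V = V0 if V0 is not None else rng.standard_normal((v, b))
    prev = np.zeros(k)
    for _ in range(max_iter):
        V, _ = np.linalg.qr(V)                  # orthonormalize the block
        # associative products: never form the (v, v) covariance
        X = Y.T @ (Y @ V)                       # one pass over the data
        lam, W = np.linalg.eigh(X.T @ X)        # small (b, b) EVD
        order = np.argsort(lam)[::-1]
        lam, W = lam[order], W[:, order]
        top = np.sqrt(np.maximum(lam[:k], 0))   # top-k eigenvalue estimates
        V = X @ W                               # rotate toward eigen directions
        if np.max(np.abs(top - prev)) < tol * max(top[0], 1e-12):
            break
        prev = top
    return np.linalg.qr(V)[0][:, :k]            # top-k PCA basis
```

In an un-stacked variant, the product Y^{T}(YV) would be accumulated one subject at a time, which is what makes the memory footprint independent of the number of subjects.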

To complete our discussion on methods for PCA of large datasets, we present expectation maximization PCA (EM PCA). Our focus is on the connections and the implications that certain MPOWIT concepts have on this popular technique. EM PCA (Roweis,

In Equation (34), a Gaussian random matrix is used for initialization. In the expectation step (Equation 35), the subspace V_{j−1} is fixed and the transformation matrix X_{j} is determined, while in the maximization step (Equation 36), X_{j} is fixed and the subspace V_{j} is determined. Equations (35) and (36) are iterated until the algorithm converges to within the specified error tolerance as shown below:

After convergence, the reduced PCA space

The time complexity of EM PCA over all iterations is modest, and each iteration can be computed by loading one Y_{i} at a time rather than the entire stacked
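A minimal sketch of the EM PCA iterations follows (after Roweis; illustrative NumPy with data arranged as dimensions × samples, and a reconstruction-error stopping rule of our own):

```python
import numpy as np

def em_pca(Y, k, tol=1e-6, max_iter=1000):
    """EM PCA sketch: alternate a least-squares expectation step and
    maximization step. Each iteration touches Y only through matrix
    products, so the data could be streamed one subject at a time."""
    d = Y.shape[0]
    rng = np.random.default_rng(0)
    C = rng.standard_normal((d, k))                    # random initial subspace
    prev = np.inf
    for _ in range(max_iter):
        X = np.linalg.solve(C.T @ C, C.T @ Y)          # E-step: latent coords
        err = np.linalg.norm(Y - C @ X) / np.linalg.norm(Y)
        C = (Y @ X.T) @ np.linalg.inv(X @ X.T)         # M-step: update subspace
        if abs(prev - err) < tol:
            break
        prev = err
    return np.linalg.qr(C)[0]                          # orthonormal PCA basis
```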

As a final remark on methods, our MPOWIT method relates to normalized power iteration (Martinsson et al.,

We use 1600 pre-processed subjects from resting state fMRI data (a superset of the data presented in Allen et al.,

In the initial subject-level PCA step of group ICA (Calhoun et al.,

A number of experiments have been conducted to assess memory usage and computation time for all group PCA methods discussed previously. Firstly, we assessed memory usage for varying number of subjects (

After applying a common binary mask to all time points, there were 66,745 in-brain voxels per time point. Figure _{i} at a time for each PCA iteration. Thus, un-stacked versions of these algorithms are also considered. Subsampled EVD methods like SVP and STP are also considered as these have fixed memory requirements and are independent of the number of subjects analyzed.

Some notes about multi-band EPI sequences and subject-level PCA are in order here. First, even if the fMRI data were acquired at a 2 × 2 × 2 mm voxel resolution (roughly,

Here, we present results for the group PCA experiments described in Section Data and Preprocessing. If not specified otherwise, all processes were tested on a server running Linux Centos OS release 6.4 with 512 GB RAM, and MATLAB R2012a. We also note that the files with the results from the initial subject-level PCA step were always saved as uncompressed “.mat” files to later speed up the data loading process during the group PCA step. The parameter settings used to solve the group PCA problem for each algorithm are described below:

The error tolerance was set to 10^{−6} and the maximum number of iterations was set to 1000.

q_{0} was set to 6 and the error tolerance was set to 10^{−6}. Note that as the number of block iterations (

The error tolerance and the maximum number of iterations were set to 10^{−6} and 1000, respectively. The multiplier

The L_{2}-norm of the absolute difference between the top

From Figure

The L_{2}-norm of the error is computed between the eigenvalues of each method and the eigenvalues of the EVD method.

Spatial ICA was performed on the final group-level PCA components to determine maximally statistically independent components. The Infomax ICA algorithm (Bell and Sejnowski,

We demonstrated the entire group ICA process including the group PCA step on a Linux server with 512 GB RAM. We infer from Figure

Comparing among EVD, Large PCA, and MPOWIT, we notice that MPOWIT and Large PCA outperform EVD (IRAM) in terms of speed when all datasets are already loaded in memory. As depicted in Figure ^{−6}.

The PCA methods we discussed in this paper are generic and can be applied to any dataset without any major modifications. However, our goal here is to demonstrate the applicability of these algorithms to real-valued fMRI data in the context of group ICA.

The execution time for the largest group ICA analysis in this paper, i.e., 1600 subjects and 100 independent components, using the un-stacked version of our MPOWIT PCA algorithm for the group-level PCA, was 329 min (5.48 h, Figure

L_{2}-norm of the difference between the eigenvalues obtained from the selected PCA method and the EVD method.

Our results also emphasize that EVD is not preferred for large scale analysis as there is an extra cost in storing the covariance matrix in memory. One way to overcome this issue is based on Equation (5). If one dimension of the data is fixed, the covariance matrix can be computed in that dimension and the net covariance matrix could be computed by adding the covariance matrices across subjects. Moreover, since the covariance matrix is symmetric, only the lower or upper triangular portions need to be stored in memory (for eigenvalue problems). LAPACK^{8}

The PCA methods applied for performing spatial group ICA are equally valid for temporal group ICA (Smith et al.,

Based on the methods discussed here, we present our findings in a flowchart (Figure _{0} and

We presented a new approach for PCA-based data reduction for group ICA called MPOWIT and demonstrated that it can efficiently solve the large scale PCA problem without compromising accuracy. The un-stacked version of MPOWIT takes almost the same time to complete the analysis as EVD but requires much less RAM. We showed that MPOWIT enables group ICA on very large cohorts using standard fMRI acquisition parameters within 4 GB RAM. Computationally efficient data reduction approaches like MPOWIT are becoming more important due to the larger datasets resulting from new studies using high-frequency multi-band EPI sequences and from an increased tendency to share data in the neuroimaging community. Even in such challenging scenarios, un-stacked MPOWIT could realistically solve group-level PCA on virtually any number of subjects, limited only by time, not memory, constraints. Given its high scalability and natural fit for parallelism, MPOWIT sets the stage for future groundbreaking developments toward extremely efficient PCA of big data using GPU acceleration and distributed implementations.

SR—Developed the PCA algorithm (MPOWIT) for data reduction. RS—Worked extensively on revising the paper and helped with the mathematical proofs. JL—Provided valuable feedback on improving the manuscript. VC—Provided valuable feedback on improving the manuscript; the work was done under his supervision.

This work was funded by NIH 2R01EB000840, R01EB020407, and COBRE 5P20RR021938/P20GM103472 (Calhoun).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors would like to thank Tim Mounce, IT administrator at the MIND Research Network, for valuable help installing and mounting software and drives in our Linux servers. We would also like to thank Dr. Tülay Adalı, Professor at the University of Maryland Baltimore County, for giving valuable comments.

The Supplementary Material for this article can be found online at:

^{1}The choice of whitening or not the subject-level data changes the final group PCA estimates (Calhoun et al.,


^{3}Direct reduction of the largest dimension of subject-level data is

^{4}Large data means

^{5}This was in addition to subject-level whitening.

^{6}Notice that the Krylov subspace size increases with larger block sizes, as well as the time to solve economy-size singular value decomposition for each iteration.

^{7}MATLAB's
