A Constrained ICA-EMD Model for Group Level fMRI Analysis

Independent component analysis (ICA), being a data-driven method, has been shown to be a powerful tool for functional magnetic resonance imaging (fMRI) data analysis. One drawback of this multivariate approach is that it is not, in general, compatible with the analysis of group data. Various techniques have been proposed to overcome this limitation of ICA. In this paper, a novel ICA-based workflow for extracting resting-state networks from fMRI group studies is proposed. An empirical mode decomposition (EMD) is used, in a data-driven manner, to generate reference signals that can be incorporated into a constrained version of ICA (cICA), thereby eliminating the inherent ambiguities of ICA. The results of the proposed workflow are then compared to those obtained by a widely used group ICA approach for fMRI analysis. In this study, we demonstrate that intrinsic modes, extracted by EMD, are suitable to serve as references for cICA. This approach yields typical resting-state patterns that are consistent over subjects. By introducing these reference signals into the ICA, our processing pipeline yields comparable activity patterns across subjects in a mathematically transparent manner. Our approach provides a user-friendly tool to adjust the trade-off between a high similarity across subjects and preserving individual subject features of the independent components.


Introduction
Independent component analysis (ICA) is a data driven tool, which is widely employed for functional magnetic resonance imaging (fMRI) data analysis.Based on a linear mixture model, either spatially (McKeown et al., 1998) or temporally (Biswal and Ulmer, 1999) independent components (ICs) can be obtained with ICA, without the requirement of prior information in form of anatomical regions of interest or temporal activation profiles.One problem of ICA is that, because of inherent indeterminacies, it is in general not suitable for group studies.Different subjects have different time courses and spatial maps, and the extracted components will be sorted differently.This can make it difficult to find comparable activation patterns between subjects and draw inferences from subject groups.So far various approaches have been proposed in order to overcome these shortcomings of ICA (Calhoun et al., 2009).Combining components obtained by single subject ICA-based on spatial correlation or clustering was proposed by Calhoun et al. (2001a) and Esposito et al. (2005).Another possibility is the spatial or temporal concatenation of the individual datasets to obtain components in a single ICA step from a group dataset and the employment of back-reconstruction approaches to obtain subject specific components (Calhoun et al., 2001b;Svensén et al., 2002).These concatenation-based approaches were compared with a simple across-subject averaging by Schmithorst and Holland (2004).A more involved approach considered a tensorial extension of ICA, presented by Beckmann and Smith (2005).The authors extended the probabilistic ICA (PICA) model by adding a third dimension representing subject-related dependencies in addition to the spatio-temporal dimensions.The model represents a three-way factor analysis similar to the well-known PARAFAC model (Harshman and Lundy, 1994).
A popular paradigm used to generate data for applying the above mentioned exploratory matrix factorization techniques is the so-called restingstate.During this paradigm, subjects either rest with eyes open fixating a fixation cross, or with eyes closed.Usually, subjects are instructed not to fall asleep and to let their mind wander.Contrary to the simplicity of the paradigm, the generated database has complex spatial structure and temporal dynamics, which arise from low-frequency fluctuations in the BOLD signal (Fox and Raichle, 2007).Furthermore the data is characterized by large spatial dimension and the lack of triggers or any underlying paradigm, which are usually used in task-based fMRI investigations.Because of the latter two aspects, exploratory matrix factorization techniques are appropriate to handle the large amount of data and explore the complex spatial and temporal structure.In this context, ICA-based pipelines have emerged as a state-of-the-art approach to investigate rs-fMRI data (Allen et al., 2011;Remes et al., 2011).This decomposition of resting-state fMRI data results in a so-called parcellation of the cortex into brain networks composed of functionally similar brain areas.In the literature, common brain networks are default-mode, cognitive control, visual, somatomotor, sub-cortical, auditory, cerebellar, depending on the function of the brain areas included in each network (Allen et al., 2014).
In this paper, a hybrid method is proposed for extracting resting state networks (RSNs) from fMRI data, based on constrained ICA (cICA) and empirical mode decomposition (EMD).This constrained extension of ICA optimizes besides the statistical independence, the similarity to a given reference signal.Incorporating a reference into ICA, in the framework of an augmented Lagrangian approach, helps to obtain more robust ICs and can eliminate the ambiguities of ICA (Lin et al., 2009;Lu and Rajapakse, 2006;Rodriguez et al., 2014).In this paper, like (Lin et al., 2009), spatial reference maps were employed to extract resting state networks from fMRI data.It is shown that an EMD based image decomposition technique, denoted as Green's function in tension based bi-dimensional ensemble EMD (GiT-BEEMD) (Al-Baddai et al., 2016b), produces suitable references for cICA.This two-dimensional extension of EMD allows to decompose images into so called bi-dimensional intrinsic mode functions (BIMFs) and can also be used to decompose volumetric fMRI images slice-wise.Because of its inherent natural ordering of the extracted intrinsic modes according to their spatial frequencies, EMD can easily generate prototype spatial maps.Similar spatial maps obtained with the EMD for each subject can be identified and averaged across subjects.In a next step these prototype spatial maps can serve as reference signals for a constrained ICA applied in parallel to the entire group of subjects.In this workflow the references are obtained from the same dataset as used for the analysis, so no prior information is required.A resting state fMRI dataset from the Human Connectome Project (Essen et al., 2012) is used to compare the results to those obtained by the widely used group ICA (gICA) based on temporal concatenation (Calhoun et al., 2001b).Finally potential benefits of the proposed approach are emphasized, by showing that this approach allows the user to actively shape the extracted resting state networks.The trade-off between enforcing a certain similarity across subjects and preserving individual subject features can be determined, and can be well adapted to optimally fulfill the requirements of different studies.

Material and Methods
The following subsections introduce the dataset employed, and describe the data analysis techniques, which combine cICA and GiT-BEEMD, as well as the processing steps of the gICA approach used for comparison.Also a flowchart of the proposed signal processing chain is provided.

Dataset
For this study a data set from the Human Connectome Project was employed (Essen et al., 2012).The S1200 release includes data from subjects which participated in four resting state fMRI sessions, lasting 14.4 minutes each and resulting in 1200 volumes per session.Customized Siemens Connectome Skyra magnetic resonance imaging (MRI) scanners at Washington University with a field strength of B 0 = 3 Tesla were employed for data acquisition, using a multi-band (factor 8) technique (Feinberg et al., 2010;Moeller et al., 2010;Setsompop et al., 2012;Xu et al., 2012).The data was collected by gradient-echo echo-planar imaging (EPI) sequences with a repetition time T R = 720 ms and an echo time T E = 31.1 ms, using a flip angle of θ = 52 • .The field of view was F OV = 208 mm × 180 mm and N s = 72 slices with a thickness of d s = 2 mm were obtained, containing voxels with a size of 2 mm × 2 mm × 2 mm.The preprocessed version, including motion-correction, structural preprocessing and ICA-FIX denoising was chosen (Fischl, 2012;Glasser et al., 2013;Griffanti et al., 2014;Jenkin-son et al., 2002Jenkin-son et al., , 2012;;Salimi-Khorshidi et al., 2014;Smith et al., 2013).For the comparison of the two approaches, 10 sessions from 10 different subjects were selected from the database.A Gaussian smoothing with a half width F W HM = 5 mm was then applied, using the SPM12 software package1 , and the first five images were discarded to account for magnetic saturation effects.

A hybrid cICA -EMD approach
In this section a new approach to deal with an ICA analysis across a group of subjects will be described.The flowchart in Figure 1 presents an overview of the various steps of the data analysis.All processing steps were performed in MATLAB 9.3 Release 2017b.

Preprocessing
The data as obtained from the data repository will be further pre-processed as explained in the following.
• In a first step, the voxel time series were linearly detrended and the voxel values were transformed to have zero mean and unit variance.
• Next a mask common to all subjects was used to exclude voxels that were located outside of the brain.The mask was created, employing the GIFT toolbox2 , using the "Generate mask" option.
• The data of subject s is stored in a K × L-dimensional matrix X (s) , containing in its columns x l (k) the temporal evolution of L brain voxels at K time points.
• Then principal component analysis (PCA) related dimension reduction is performed, based on the singular value decomposition (SVD) of contains the eigenvectors of the covariance matrix C (s) ∝ X (s) X (s)T in its columns.The eigenvectors with largest eigenvalues indicate the directions of largest variance, which are denoted as principal components (PCs) (Jolliffe, 2014).First reducing the data with PCA, then extracting BIMFs with BEMD from the reduced data.Further these BIMFs of each subject are combined in order to get shared references for cICA, which finally help to obtain comparable ICs across subjects.
• Next, the fMRI images of each subject were projected onto the first s) .This reduces the number of images per session to M = 20 < K.
• Finally, from the reduced data sets the image slices were reconstructed to enter into the GiT-BEEMD analysis, while the reduced data X (s) M entered directly into the cICA processing path.The number of selected principal components M determines the number of sources, estimated in the cICA step.A relatively low order of M = 20 was chosen, to obtain robustly observed, large-scale resting state networks (van den Heuvel and Pol, 2010), and to make it easy to identify extracted networks, which are suitable for comparison of results in section 3.

Green's function-based ensemble empirical mode decomposition
For the next step, the extraction of suitable reference signals for cICA, the GiT-BEEMD technique was employed (Al-Baddai et al., 2016b).The idea of this technique is to decompose a two-dimensional brain slice I(r) = I(x, y) into bi-dimensional intrinsic mode functions (BIMFs): Here b j (x, y) denotes the j-th BIMF, which is estimated iteratively as described in Appendix A.1, and, in our notation, we include the residuum r(x, y) as intrinsic mode b J (x, y).The first extracted BIMF contains the highest spatial frequency, which will decrease in every additionally extracted BIMF (Al-Baddai et al., 2016b).Each brain slice was decomposed into five intrinsic modes and one residuum by repeating the sifting step five times.The ensemble step was repeated only twice, whereby noise was either added or subtracted from the data once at each step.The assisting noise was generated with a noise amplitude of a η = 0.2.The tension parameter was initialized to T 1 = 0.9 and reduced after the extraction of the j-th BIMF b j to T j+1 = T j − 1 J .This avoids blob-like artifacts in low frequency modes, if the tension parameter is set too high (Al-Baddai et al., 2016b).An example of a decomposition is provided in figure 2. BIMFs with high spatial frequencies show highly localized spatial activation patterns which are spread all over the brain slice, while BIMFs with lower spatial frequencies concentrate the activity in specific areas in the brain.A combination of lowest spatial frequency intrinsic mode plus the residuum, i. e. b 5 (x, y) + b 6 (x, y) ≡ b 56 (x, y), proved most appropriate to be used as a reference mode for cICA.From all decomposed two-dimensional slices, the corresponding modes b 56 (x, y) were organized into a three-dimensional data array, which then was concatenated into a 3D volume intrinsic mode function (VIMF).Decomposing the M = 20 brain volumes per subject in PC subspace results in M VIMFs per subject.For the next processing steps the voxels inside of the brain are sorted into an M × L matrix again, denoted as Figure 2: Example of a decomposition of a slice with GiT-BEEMD into 5 BIMFs and a residuum.Also the combination of fifth BIMF and residuum is illustrated.A mask was applied after the decomposition to set all intensity values that were located outside of the scanned brain to zero.
In order to extract from each subject RSNs, which are consistent across the proband cohort, corresponding intrinsic modes need to be identified.Averaging most similar modes between subjects then yields proper common reference signals for all subjects to be employed in a cICA of fMRI datasets.A visual inspection of the VIMFs of any two randomly chosen subjects showed that corresponding spatial patterns might occur in different rows of Hence first the extracted VIMFs need to be ordered according to their similarity between subjects.An efficient way to assign similar VIMFs between subjects is offered by the assignment algorithm proposed by Munkres (1957), based on the Hungarian method developed by Kuhn (1955).After computing the proper correspondences between the VIMFs of all subjects, the references R, to be used in the cICA algorithm, then are obtained by averaging the corresponding VIMFs across all subjects.
To summarize, the reference signals are computed as follows: 1. Initialize the reference as R = V (1) M 2. For s = 2, . . ., S, do: (a) Apply the Hungarian algorithm and re-order the rows of To apply the Hungarian algorithm, a cost function 1 − ρ(R, V M ) is defined, which, if optimized, results in an ordering of the rows in V (s) M such that the sum of the correlation coefficients between pairs of rows in the two matrices is maximized.Note that the algorithm achieves the re-ordering without calculating all M !possible assignments.Finally, each row of R is normalized, having entries with zero mean and unit variance, and M = 20 references for cICA algorithm are obtained.

Constrained ICA
Sources Y M = [y 1 , . . ., y L ], y ∈ R M can be blindly estimated from the mixtures X M = [x 1 , . . ., x L ], x ∈ R M according to: with the demixing matrix defined as W = [w 1 , . . ., w M ] T , where w T m are the rows of the demixing matrix and X M collects all L samples of the projected data after being spatially transformed to zero mean and unit variance.
Finding a demixing matrix is solved by designing an optimization problem where inequality and equality constraints are integrated in an augmented Lagrangian formulation.The inequality constraint terms in the Lagrange function are re-written as equality constraints with the help of a slack variable (Lu and Rajapakse, 2006).After finding the optimal value of these slack variables, the modified version of the augmented Lagrangian function is written as where the µ m are the Lagrangian parameters, while γ m represents a user defined penalty.The first term J(W) reflects the cost function of ICA and the second term in the Lagrangian is related with the inequality constraint, which compares the m-th extracted component with the corresponding reference signal: where (•) is a similarity measure and ς m is a threshold parameter.Similarity is conventionally expressed either through a correlation measure E{y m r m } or the mean squared error E{(y m − r m ) 2 }, with y m = w T m x.The expected value is approximated by an average over the available data.
Estimating the demixing matrix W, given the constraint introduced above, can be achieved in different ways, based either on negentropy-like costs functions (points 1 -3) or on a maximum likelihood estimate (point 4): 1. Simply one IC, most similar to the given reference signal, can be extracted.This approach can easily be extended to a multi-reference cICA.However, this additionally requires a decorrelation of the weights during each iteration to prevent different weights from converging to identical estimations (Lu and Rajapakse, 2006).
2. Lu and Rajapakse (2006) introduced an objective function for cICA,which contained an additional equality constraint to bound the weights.Later, a simplification was introduced by Lin et al. (2007), where equality constraints were omitted, rather the weight vectors were normalized at each iteration instead.
3. Also cICA based on fixpoint learning (Lin et al., 2009) was proposed, which should overcome the limitations of the second order Newton-like learning used in the cICA algorithm of Lu and Rajapakse (2006).

4.
Finally, yet another version, using a cost function J(W) based on a maximum likelihood estimate, has been proposed (Rodriguez et al., 2014) according to An iterative procedure is then derived to update the parameters µ = [µ 1 , . . ., µ M ] T and the de-mixing matrix W. Thereby a decoupling scheme based on a Gram -Schmidt orthogonalization is proposed, finally yielding the following objective function where the decoupling vector d m ∈ R M ×1 is defined through Wm d m = 0 and where Wm ∈ R (M −1)×M denotes the de-mixing matrix without entries to the m-th row.
It has been shown by Cardoso (1997) that a maximum likelihood approach to ICA is equivalent to the seminal Infomax approach put forward by Bell and Sejnowski (1995).Thus, in analogy to the maximum likelihood approach, a constrained and decoupled version of the extended Infomax algorithm can be obtained (Rodriguez et al., 2014).The extended Infomax algorithm is often used in an analysis of fMRI data (Correa et al., 2007) and was also used in this study as the basis of the cICA.A more detailed description of this algorithm and a proper metacode are given in Appendix A.2 for the convenience of the reader.
The data, projected onto the first M = 20 PCs, and the M references, transformed to zero mean and unit variance, together enter the cICA algorithm to finally extract M = 20 ICs.The weights are initialized with small random values, and the learning rate for the weights is set to η = 0.5.The scalar penalty can be set to 3 (Rodriguez et al., 2014).The influence of the references can be well determined by adjusting the threshold parameter.Therefore different settings have been studied, using the correlations E{y m r m } as distance measures, and results are presented in section 3.

Group-ICA
Generally fMRI data is compared across a group of subjects by employing the gICA algorithm put forward by Calhoun and his group (Calhoun et al., 2001a).This gICA is made available in the GIFT toolbox 3 and was incorporated in the study for comparison.Voxel time series were preprocessed by variance normalization through linear detrending and transforming the data to zero mean and unit variance.The single subject data matrices X (s) K×L enter the first PCA step, with the temporal evolution of L brain voxels at K time points in columns.The subject datasets then were projected onto the first M = 1.5 • M = 30 PCs in this step by applying an SVD to the data matrix.Then the S reduced M × L matrices X

Results
The goal of the study was to compare RSNs obtained with the newly proposed cICA-EMD approach as opposed to RSNs resulting from the conventional gICA approach.RSNs denote functionally connected brain areas which, however, are anatomically separated but maintain a high level of activity in a resting state of the proband.They are represented in this study by the ICs extracted with the discussed techniques.In this study M = 20 ICs were extracted with either method.Comparable RSNs, obtained by the different approaches, were identified by visual inspection and are depicted in figure 3. There, references used for cICA are shown in the first row, while in the second row the ICs obtained therewith are presented, computed as mean ICs over subjects.In the third row of figure 3 the mean ICs obtained by gICA are exhibited.The significance of the resulting ICs was tested with a one sample student's T-test by employing the SPM12 software package4 .The resulting spatial maps of t-values are depicted in the fourth and fifth row in figure 3. Spatial maps were thresholded at a significance level of p < 0.001 (t = 4.30, df = 9).
Most prominent brain areas are in IC 10/6 (obtained by cICA-EMD/gICA) the left inferior parietal lobule, in IC 17/8 the right angular/supramarginal Figure 3: This figure illustrates RSNs, which were obtained by the two different approaches.The first row shows the references used for cICA, while the second row exhibits mean ICs, averaged over the subject cohort and computed with the newly proposed cICA-EMD method.These ICs are contrasted in the third row with mean ICs obtained by gICA.In the fourth and fifth row of this figure, t-values of the RSNs can be found.All depicted slices were chosen such that they intersect the peak activation voxel of the corresponding ICs obtained by gICA.For visualization purposes, the activations of the networks shown in the first three rows are normalized to zero mean and unit variance.Furthermore, in accordance with common usage, the voxel intensities Î(r) were thresholded by Î(r) > 2, and in the sixth and ninth column the threshold was adjusted to Î(r) > 1.5 for better recognizability of the networks.The color range of the heatmap was adjusted to the largest intensity value in every pictured slice.gyrus, in IC 4/11 the superior occipital gyrus, in IC 9/13 the right inferior frontal gyrus, in IC 20/18 the anterior cingulate cortex, in IC 6/1 the precentral gyrus, in IC 2/9 the paracentral lobule, in IC 8/2 the middle occipital gyrus and in IC 3/7 the middle temporal gyrus.The obtained independent networks, representing well observed RSNs (van den Heuvel and Pol, 2010), and can be further grouped based on to their functions.The corresponding attentional and default mode networks are depicted in the first five columns, while the extracted auditory and sensorimotor networks are shown in the sixth and seventh column.Next, in the eighth and ninth column, visual networks are represented, and in the last column the cerebellum is shown.The similarity threshold for cICA was set to ς = 0.5 in this example.All resting state networks obtained with both approaches, including the employed references are provided in the supplement.
The motivation of different group ICA approaches is to make this explorative analysis technique suitable for studies, where it is necessary to compare extracted networks between different subjects.This means issues with permutation indeterminacy and reproducibility of ICA have to be overcome, to obtain well comparable networks.Therefore a measure of interest for the evaluation of the two different approaches could be the consistency of activation patterns across subjects.This consistency was quantified by measuring how strong a resting state network (y (s) ) T m ∈ R L from one subject s differs on average from the mean network y T m = 1 S S s=1 (y (s) ) T m across subjects.Pearson's correlation ρ (y (s) ) T m , y m was used to measure the correlation between standardized subject networks y (s) m and the related mean networks y T m .The following consistency measure is used: This consistency measure was evaluated for all ICs obtained with the cICA-EMD approach for different settings of the similarity threshold ς.An equivalent procedure was followed using the results from applying the gICA algorithm.The results are listed in table 1.By adjusting the threshold parameter ς, it is possible to well determine the influence of the constraint during the optimization, so choosing a smaller threshold allows for more variability in the estimated components across subjects.Increasing the threshold increases the similarity between subject specific components and common references.This means that if the similarity between every subject component and the shared reference increases, also the similarity of components across subjects will increase, what is quantified by the consistency measure in equation 7. Figure 4 illustrates the behavior of the consistency in dependence of the similarity threshold ς in comparison to gICA.If the threshold parameter is set to a value of ς = 0.40, the consistency

Discussion
The motivation of this paper was to propose a novel workflow for extracting resting state networks consistent across a group of subjects.At first the dimensionality of the fMRI dataset was reduced at subject level with PCA.From that data intrinsic modes were extracted by employing the GiT-BEEMD algorithm (Al-Baddai et al., 2016b).These subject-specific intrinsic modes reflect spatial activity patterns at different spatial frequencies.Hence, for each underlying spatial frequency a common reference mode can be formed.It turned out that low frequency modes concentrate the activity into spatially contiguous patterns and were especially well suited to serve as  The numbers on the x-axis refer to the IC number of the cICA-EMD versus the gICA approach.
reference modes for the extraction of independent components with a cICA algorithm.Note that if intrinsic spatial modes, which are naturally ordered according to their dominant local spatial frequency, are chosen as reference signals within a cICA, the resulting independent modes are also ordered in correspondence to their assigned intrinsic modes.Thus, the natural ordering of the intrinsic modes with respect to their spatial frequencies helps to overcome the permutation ambiguity of ICA in extracting consistent resting state networks across subjects.It was demonstrated that our fMRI data processing pipeline produces commonly observed resting state patterns.These functional networks were then compared to those obtained by the widely used gICA, based on a temporal concatenation of individual datasets (Calhoun et al., 2001b).It was shown that with the constrained extended Infomax algorithm (Rodriguez et al., 2014), the influence of the references upon the estimation of the related ICs could be well controlled.Based on the mathematically well described augmented Lagrangian framework in our workflow, it is transparent to the user with respect to how homologous resting state networks across subjects are deduced.In the presented processing pipeline, the cICA-EMD approach also allowed to shape the optimization procedure by adjusting the threshold parameter, which determines the impact of the reference onto the IC extraction.Choosing a lower threshold, e.g.allowing for a lower similarity to the references, more freedom could be given to the exploratory character of ICA and the formation of subject specific features.Defining a high threshold resulted, across subjects, in a higher consistency of the extracted resting state networks.These RSNs were even more consistent than those obtained by the conventional gICA.Although there is no exact ground truth on how resting state networks should ideally look like, the threshold can be chosen in a way such that the obtained networks optimally fulfill the requirements of a particular study.For example, when performing a classification task, the threshold can be chosen to maximize the accuracy of the classifier.So the good interpretability and high flexibility of the proposed processing pipeline can offer beneficial properties for applications in resting state studies.
considered as the known points of the envelope surface, which can be found with an 8-connected neighborhood strategy.Then surface envelopes s(r u ), at Cartesian coordinates r u = [x u , y u ] T , are represented as a weighted sum of Green's functions (Wessel and Bercovici, 1998): where Φ(r u , r n ) represent the Green's functions and v n the corresponding weights.Further r u denotes a point where the surface is unknown and r n describes the n-th constraint, which corresponds to a local extremum.The Green's function, expressed in 2D-Cartesian coordinates, reads as: with K 0 (•) representing the modified Bessel function of the second kind and zero order and |•| the Euclidean distance.Here p 2 = T D is related with tension T at the boundaries, and D describes the flexural rigidity of the surface (Wessel and Bercovici, 1998).The estimation of the envelope surface is based on two steps.In a first step weights v n can be estimated by taking the known values of local maxima (or minima) as the values s(r n ) = [s(r 1 ), . . ., s(r N )] T in a total of N locations r n and solving a linear system of N equations, described by equation 8.In a second step, if the weights v n are now obtained, the surface can be estimated at any point r u .
To avoid mode mixing and boundary artifacts, a noise assisted ensemble version of the Green's function based BEMD (GiT-BEEMD) can be used (Al-Baddai et al., 2016a).Adding and subtracting noise η from the original image f (x, y) leads to two noisy versions f (x, y) * : f + (x, y) = f (x, y) + η and f − (x, y) = f (x, y) − η.Both versions f (x, y) * can now be decomposed into BIMFs.By computing the mean as 0.5 ( f + (x, y) + f − (x, y)) the original array f (x, y) could be reconstructed and therefore after decomposing the two versions f (x, y) * , the BIMFs of f (x, y) can be naturally obtained by averaging BIMFs of the noisy versions.For this version of BEMD it is sufficient to use a few ensemble steps only to improve the image quality significantly (Al-Baddai et al., 2016b), reducing computational load.The two-dimensional image decomposition was applied to the slices of volumetric fMRI images in the transverse anatomical plane.

A.2 Non-orthogonal constrained extended Infomax
The objective function for ICA, based on Maximum Likelihood, can be derived as (Hyvärinen et al., 2001) J(W) ≈ E For the algorithm a convergence metric like [(vec(∆W)) T vec(∆W)] < τ can be adapted (Rodriguez et al., 2014).Here ∆W is the element by element difference of W after each iteration, vec(•) stores all elements of a matrix in a column vector and τ is the tolerance value.So the non-orthogonal constrained extended Infomax can be summarized in the following steps: B.2 ICs obtained by the cICA -EMD approach

M
on subject level were concatenated to an (S • M ) × L group matrix X (g) S•M entering a second PCA step.The group matrix is projected onto M = 20 PCs, resulting in a reduced M × L matrix X (g) M .The spatial maps S M are extracted from X (g) M = AS M by the extended Infomax algorithm (Lee et al., 1999)  and by additionally employing the ICASSO option(Himberg et al., 2004), running the ICA algorithm ten times with different initializations to assure greater stability.Subject specific spatial S (s) M maps were obtained by the GICA3(Erhardt et al., 2011) back-reconstruction approach.From these maps M mean networks S m * , m = 1, . . ., M were obtained by averaging S (s) M over the subjects, i. e. S M = 1

Figure 4 :
Figure 4: The figure illustrates the consistency values of the respective ICs.The numbers on the x-axis refer to the IC number of the cICA-EMD versus the gICA approach.
a decoupling strategy for the second term of cost function resulting in a objective function for each of the rows of the mixing matrix(Rodriguez et al., 2014)J(w m ) ≈ E log(p(w T m x)) + log |(d T m w m )| + log (S)(11)withS = |det Wm WT m |,where Wm is the de-mixing matrix without m − th row.The decoupling vector is the vector Wm d m = 0 and can be computed as d m = (I − WT m a vector with Gaussian random values.The vector gradient of this function is then derived as∇ wm J(w m ) = E f m (w T m x)x T + called score functions defined as:f m (y m ) = ∂ log p(y m ) ∂y m(14)The extended Infomax algorithm defines these non-linearities f m (y m ) differently for sub-Gaussian or super-Gaussian components(Lee et al., 1999).The Lagrangian multipliers µ m are updated by gradient ascentµ m ← max{0, γ m h m (y m , r m ) + µ m }(15)

Figure 6 :
Figure6: ICs obtained by the proposed approach.The depicted ICs are computed as mean ICs over subjects.Here the similarity threshold for cICA was set to ς m = 0.5.The ICs are presented in the three principal anatomical planes: transverse, sagittal and frontal.The three planes intersect the voxel with highest intensity value in every volumetric image.For visualization purposes, the activations of the networks shown are normalized to zero mean and unit variance, and the intensity of the pictured brain slice Î(r) was thresholded by Î(r) > 2. The color range is adjusted to the largest intensity value in every pictured slice.

Figure 7 :
Figure 7: ICs obtained by gICA.The depicted ICs are computed as mean ICs over subjects.The ICs are presented in the three principal anatomical planes: transverse, sagittal and frontal.The three planes intersect the voxel with highest intensity value in every volumetric image.For visualization purposes, the activations of the networks shown are normalized to zero mean and unit variance, and the intensity of the pictured brain slice Î(r) was thresholded by Î(r) > 2. The color range is adjusted to the largest intensity value in every pictured slice.

Table 1 :
Consistency values of ICs obtained by the proposed cICA-EMD approach for different settings of the similarity threshold ς, as well as the values of ICs obtained by gICA.ICs were sorted like in figure3, so each column shows consistency values of comparable extracted networks.