
Edited by: Pulin Gong, University of Sydney, Australia

Reviewed by: Ben David Fulcher, Monash University, Australia; James A. Roberts, QIMR Berghofer Medical Research Institute, Australia

*Correspondence: Ricardo P. Monti

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

An exciting avenue of neuroscientific research involves quantifying the time-varying properties of functional connectivity networks. As a result, many methods have been proposed to estimate the dynamic properties of such networks. However, one of the challenges associated with such methods involves the interpretation and visualization of high-dimensional, dynamic networks. In this work, we employ graph embedding algorithms to provide low-dimensional vector representations of networks, thus facilitating traditional objectives such as visualization, interpretation and classification. We focus on linear graph embedding methods based on principal component analysis and regularized linear discriminant analysis. The proposed graph embedding methods are validated through a series of simulations and applied to fMRI data from the Human Connectome Project.

Functional connectivity describes the pairwise statistical dependencies which exist across spatially remote brain regions (Friston,

This has led to the development of several methods through which to quantify the dynamic properties of functional connectivity networks (Allen et al.,

However, obtaining robust and easily interpretable insights from the results of such algorithms raises important statistical challenges. The difficulties are further exacerbated by the fact that often a distinct network is estimated at each observation and potentially across many subjects. One potential solution involves testing for statistical correlations between the estimated edge strengths over time and underlying changes in cognitive task, thereby reporting the set of edges which is functionally modulated by a given task. While such methods are often advocated (Yao et al.,

In this work, we look to address the challenges associated with interpreting time-varying, high-dimensional networks via the use of linear graph embedding methods. Generally speaking, the objective of graph embedding techniques is to map estimated graphs into a (potentially low-dimensional) vector space (Yan et al.,

While a wide range of graph embedding techniques may be employed, in this work we restrict our attention to methods based on linear projections over the edge structure of an estimated graph. This allows us to obtain a clear interpretation of the embedding in the context of functional connectivity. As a result, we consider two distinct graph embedding algorithms. The first embedding considered is based on Principal Component Analysis (PCA). This embedding, which is closely related to the work of Leonardi et al. (

The remainder of this manuscript is organized as follows: We introduce the aforementioned linear graph embedding techniques based on principal component and linear discriminant analysis in Section 2. An extensive simulation study is presented in Section 3. Finally, in Section 4 the proposed methods are applied to task-based fMRI datasets taken from the Human Connectome Project (Elam and Van Essen,

Throughout this section it is assumed that estimates of time-varying functional connectivity networks have been obtained across a cohort of

The dynamic properties of functional connectivity networks can be quantified in many ways. One popular method for estimating such networks involves the use of sliding windows (Hutchison et al.,

In this work our objective is to understand dynamic functional connectivity networks using linear graph embedding methods. Such methods allow for the representation of graphs or networks in real-valued vector spaces, resulting in two advantages. First, by embedding graphs in a Euclidean vector space we are able to employ traditional visualization and classification techniques. Second, by focusing on linear projections we are able to directly interpret the embeddings in the context of functional connectivity networks. The linear embedding methods considered in this work are based on principal component analysis and regularized linear discriminant analysis. These correspond to unsupervised and supervised learning algorithms, respectively, indicating that they may be used in conjunction to further understand dynamic connectivity networks.

The remainder of this section is organized as follows: we introduce and discuss graph Laplacians in Section 2.1. In Sections 2.2 and 2.3 we introduce two distinct graph embedding methods.

The graph embedding techniques described in this work are based on the Laplacian of each estimated functional connectivity network. While there are a wide variety of different graph Laplacians which may be employed, throughout this work we consider the normalized graph Laplacian (Chung, ). Each estimated normalized Laplacian, L^{(s)}, is used directly as input to the proposed graph embedding algorithms. Moreover, we define
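As a concrete illustration, the normalized Laplacian can be computed from a weighted adjacency matrix in a few lines. The sketch below uses NumPy; the function name is ours rather than taken from the accompanying code.

```python
import numpy as np

def normalized_laplacian(W):
    """Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.

    W is a symmetric (weighted) adjacency matrix with zero diagonal;
    D is the diagonal degree matrix with entries d_i = sum_j W_ij.
    """
    d = W.sum(axis=1)
    # Guard against isolated nodes (zero degree) to avoid division by zero.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    return np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
```

The eigenvalues of this Laplacian lie in [0, 2], which makes embeddings comparable across networks of different sizes.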

In this section we discuss an unsupervised embedding method through which to obtain a low-dimensional embedding that maximizes the amount of explained variance. Following from the method described in Leonardi et al. (

Formally, PCA is an unsupervised dimensionality reduction technique which produces a new set of uncorrelated variables. This is achieved via the eigendecomposition of the empirical covariance matrix of the vectorized Laplacians. The resulting principal components, u_{k}, can be studied in two ways. First, by considering the entries of each principal component we are able to quantify the contribution of each edge to the principal component in question. As such, combinations of edges which co-vary highly within a dataset can therefore be expected to provide a large contribution to the leading principal components. As each principal component is defined as a weighted sum over the set of edges, it may be interpreted as recovering a functional connectivity network. Second, the embedding produced by the k-th principal component, u_{k}, is obtained as:
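To make the PCA-driven embedding concrete, the following sketch vectorizes the upper triangle of each Laplacian, performs PCA via an eigendecomposition of the empirical covariance of edge weights, and projects onto the leading components. The function names and the choice to use only the upper triangle (the Laplacians are symmetric) are our own illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def vectorize_laplacians(laplacians):
    """Stack the upper-triangular entries of each Laplacian into a row."""
    p = laplacians[0].shape[0]
    iu = np.triu_indices(p, k=1)
    return np.array([L[iu] for L in laplacians])

def pca_embedding(X, k=1):
    """Project edge vectors onto the k leading principal components.

    Returns (scores, components); each component is a weighting over
    edges and can itself be read as a functional connectivity network.
    """
    Xc = X - X.mean(axis=0)
    # Eigendecomposition of the empirical covariance of edge weights.
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    U = vecs[:, order]
    return Xc @ U, U
```

Stacking observations from all subjects and time points into `X` yields group-level components, as in the eigen-connectivity approach.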

While the PCA-driven embedding detailed in Section 2.2 was motivated by understanding the components of functional connectivity which demonstrated the greatest variability over time, we may also be interested in understanding which functional networks are most discriminative across multiple tasks. In this section we describe a supervised graph embedding method through which to achieve this goal.

We propose the use of LDA to learn the functional connectivity networks which are most discriminative between tasks. LDA is a simple and robust classification algorithm which can also be interpreted as a linear projection (Hand,

In high-dimensional supervised learning problems, such as the one considered in this work, it is of paramount importance to avoid overfitting. Two popular methods to guard against overfitting involve the introduction of regularization, thereby penalizing overly complex models which are more susceptible to overfitting, and cross-validation. Here a combination of both approaches is employed. The proposed graph embedding first employs ℓ_{1} regularization methods in order to reduce the number of candidate edges to be included. This acts as a variable screening procedure whose goal is to retain all discriminative variables (in our case edges) whilst removing noise variables. The latter will correspond to edges which are not discriminative of the tasks in question. Such a variable screening procedure is discussed in Section 2.3.1. Given a subset of screened variables, an LDA classifier is subsequently trained as described in Section 2.3.2.

In this section we detail the variable screening procedure employed within the LDA-driven embedding. As discussed previously, overfitting is a significant problem in the context of high-dimensional supervised learning. Throughout this work, we look to address this issue via the use of a screening procedure.

Formally, the objective of the screening procedure employed is to significantly reduce the number of candidate variables. To this end, an ℓ_{1} penalized LDA classifier was estimated for each subject. Such models can be efficiently estimated as described in Clemmensen et al. ( ). The ℓ_{1} penalty is parameterized by a regularization parameter. Such a parameter plays a fundamental role in the variable selection procedure and must therefore be carefully selected. While there are a wide range of methods through which to select the regularization parameter, in this work cross-validation was employed.

As a result, a regularized LDA model was estimated for each subject. This resulted in a sparse discriminant vector, β^{(s)}, for each subject. The sparse support of each β^{(s)} may then be studied in order to recover the set of most reproducible edges. Formally, we define the reproducibility of the i-th edge, ρ_{i}, as the proportion of subjects in which the i-th entry of β^{(s)} is non-zero. An edge is retained if ρ_{i} is greater than some specified threshold, ρ ∈ [0, 1]. This serves to retain only the edges which are active within at least a proportion ρ of all subjects. The set of screened edges which are active is defined as:
This yields a data matrix in ℝ^{S·n×|A|}, consisting only of the selected variables which have demonstrated reproducible discriminative power across all subjects. Finally, we note that it is important that the aforementioned variable selection procedure is implemented using only the training data and never the test dataset.
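The reproducibility-based screening step described above can be sketched directly: given the per-subject sparse discriminant vectors, count the proportion of subjects in which each edge is active and threshold. The function and argument names below are illustrative.

```python
import numpy as np

def screen_edges(betas, rho=0.5):
    """Variable screening from per-subject sparse discriminant vectors.

    betas : (S, E) array, row s holding the sparse beta^{(s)} over E edges.
    rho   : reproducibility threshold in [0, 1].

    Returns (reproducibility, active): reproducibility[i] is the proportion
    of subjects in which edge i has a non-zero coefficient, and active is
    the index set of edges meeting the threshold.
    """
    support = (betas != 0).astype(float)    # per-subject sparse support
    reproducibility = support.mean(axis=0)  # rho_i for each edge i
    active = np.flatnonzero(reproducibility >= rho)
    return reproducibility, active
```

Applied to the training subjects only, `active` defines the reduced edge set used in all subsequent model fitting.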

The first step of the proposed LDA-driven embedding corresponds to splitting the data into training and test data. Throughout this work, the training and test data were obtained by randomly dividing subjects. As such, a subset of subjects, S_{train}, was used for training and the remainder, S_{test}, were retained for the purpose of testing. The variable selection procedure, detailed in Section 2.3.1, is subsequently applied to the data corresponding to training subjects in order to prune the number of edges considered.

Once the variable screening procedure has been employed, screened Laplacian matrices are obtained for the training and test data, respectively. We write

Sparse LDA-driven embedding
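Once screening has been performed, the final classifier can be fitted on the retained edges. The sketch below implements plain two-class Fisher LDA with a small ridge term for numerical stability; it is a simplified stand-in for the regularized estimators discussed above, not the authors' exact implementation.

```python
import numpy as np

def fit_lda_direction(X, y):
    """Two-class Fisher LDA direction: w = Sw^{-1} (mu_1 - mu_0).

    X : (n, d) matrix of screened edge variables.
    y : binary labels in {0, 1} (e.g., task vs. rest).
    The returned w defines the 1-dimensional embedding X @ w.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for stability.
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, mu1 - mu0)
```

Because only two classes are present, the embedding is one-dimensional; the entries of `w` can be read as the discriminative network over the screened edges.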

Python code implementing all the graph embedding algorithms discussed in this section is provided in the supplementary material or can be downloaded directly from

In this section we provide empirical evidence to demonstrate the capabilities of the two graph embedding methods introduced in Section 2. Throughout these simulations, we produce simulated time series data giving rise to a number of connectivity patterns which reflect those reported in real fMRI data. The data is generated such that the underlying connectivity varies over time and the Smooth Incremental Graphical Lasso Estimation (SINGLE) algorithm (Monti et al.,

In order to thoroughly test the capabilities of the proposed graph embedding algorithms, we look to generate simulated data which contains many of the characteristic properties often associated with fMRI data. There are two main properties of fMRI data which we wish to recreate in the simulation study. The first is the high autocorrelation which is typically present in fMRI data (Poldrack et al.,

In order to achieve this, we follow the simulation study described in (Monti et al.,

Following Monti et al. (

In many task based studies, subjects are required to alternate between performing a cognitive task and resting in a cyclic fashion. As such, the simulations presented in this work consist of a cyclic connectivity structure, where the underlying connectivity varies between two simulated networks. As a result, multivariate, simulated data was generated where the underlying covariance structure alternated in a cyclic fashion. As noted previously, we consider three distinct network structures: Erdős-Rényi, scale-free and small-world networks. Furthermore, networks were simulated with an increasing number of nodes, p, so that estimation becomes more challenging as the ratio ^{n}/_{p} decreases. Throughout this simulation,
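One simple way to generate such cyclically alternating, autocorrelated data is sketched below. It is our own illustrative construction (Cholesky-correlated innovations passed through an AR(1) filter), not necessarily the exact procedure used in the simulation study.

```python
import numpy as np

def simulate_cyclic_data(cov_a, cov_b, segment_len=50, n_segments=4,
                         ar=0.5, seed=0):
    """Simulate autocorrelated data whose covariance alternates cyclically.

    Segments of length segment_len alternate between cov_a and cov_b; an
    AR(1) filter with coefficient `ar` induces temporal autocorrelation.
    Returns (data, labels) with labels indicating the active regime.
    """
    rng = np.random.default_rng(seed)
    p = cov_a.shape[0]
    chunks, labels = [], []
    prev = np.zeros(p)
    for seg in range(n_segments):
        cov = cov_a if seg % 2 == 0 else cov_b
        C = np.linalg.cholesky(cov)
        innov = rng.normal(size=(segment_len, p)) @ C.T
        seg_data = np.empty_like(innov)
        for t in range(segment_len):
            # The sqrt(1 - ar^2) factor keeps the stationary covariance
            # of each segment equal to the target covariance.
            prev = ar * prev + np.sqrt(1 - ar**2) * innov[t]
            seg_data[t] = prev
        chunks.append(seg_data)
        labels.extend([seg % 2] * segment_len)
    return np.vstack(chunks), np.array(labels)
```

The resulting labels serve as the binary ground truth against which the estimated embeddings are scored.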

In order to evaluate the empirical performance of the graph embedding methods we consider the discriminatory power of the estimated embeddings when predicting the underlying covariance structure. As the underlying covariance structure is simulated to alternate between two network structures, this corresponds to a binary classification task and traditional classification scores, such as the area under the ROC curve (AUC), can be employed (Krzanowski and Hand,
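The AUC of a one-dimensional embedding against the binary state labels can be computed directly via the rank-sum (Mann-Whitney) identity, as in this sketch:

```python
import numpy as np

def auc_score(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores : 1-D embedding values; labels : binary ground-truth states.
    Returns P(score of a random positive > score of a random negative),
    counting ties as 1/2.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Explicit pairwise comparisons; fine for modest sample sizes.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two connectivity regimes.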

Data was simulated as described in Section 3.1. The SINGLE algorithm was subsequently applied in order to estimate time-varying functional connectivity networks for each subject. A detailed review of the algorithm is provided in Appendix

We begin by studying the performance of the PCA-driven embeddings. Recall that the objective of this method is to obtain a low-dimensional embedding which maximizes the amount of explained variance. Figure

In order to obtain a more comprehensive understanding of the performance of the embedding, we consider the predictive power of the embeddings when trying to uncover the underlying covariance structure. In this setting, the underlying covariance structure was treated as a binary variable with two classes, each of which indicates one of the two underlying connectivity regimes. The embedding corresponding to the leading principal component was then employed to discriminate between the two classes. The AUC score was then employed to obtain a measure of the discriminative capabilities of the embedding (Krzanowski and Hand,

p | Erdős-Rényi | Scale-free | Small-world
10 | 0.94 (0.02) | 0.94 (0.04) | 0.92 (0.05)
25 | 0.95 (0.03) | 0.88 (0.07) | 0.80 (0.08)
50 | 0.91 (0.03) | 0.84 (0.06) | 0.79 (0.07)
100 | 0.73 (0.05) | 0.76 (0.06) | 0.70 (0.05)
150 | 0.66 (0.06) | 0.64 (0.05) | 0.67 (0.06)

p | Erdős-Rényi | Scale-free | Small-world
10 | 0.97 (0.01) | 0.96 (0.05) | 0.97 (0.06)
25 | 0.95 (0.03) | 0.93 (0.06) | 0.83 (0.07)
50 | 0.90 (0.04) | 0.89 (0.06) | 0.78 (0.07)
100 | 0.75 (0.06) | 0.77 (0.05) | 0.73 (0.06)
150 | 0.68 (0.05) | 0.70 (0.04) | 0.69 (0.06)

While the PCA-driven embeddings are motivated by the need to understand components of estimated networks which demonstrate the greatest variability, it is also important to consider embeddings which are discriminative across multiple cognitive tasks. The LDA-driven embeddings introduced in Section 2.3 are one potential method through which to achieve this. Briefly, the objective of such an embedding is to learn a linear combination of edges which is maximally discriminative across tasks.

The fundamental difference between the PCA and LDA-driven embeddings is that the latter is a supervised embedding. As a result, it is crucial to avoid any potential overfitting. As described in Section 2.3, the proposed method employs a variable screening procedure based on regularized models. This use of regularization also serves to penalize complex models which are naturally more prone to overfit.

We note that the underlying covariance structure was simulated in a cyclic fashion which alternated between two distinct regimes. As a result, the objective of the proposed embedding is to differentiate between two distinct classes. Due to the properties of LDA, this results in a 1-dimensional embedding (Hastie et al., ). A decrease in the ratio ^{n}/_{p} leads to a corresponding increase in the variability of estimated networks. This may be partially responsible for the difference in embeddings shown in Figure . We note, however, that varying ^{n}/_{p} does not result in significant changes to the magnitude of the estimated embeddings.

In this section we present an application of the proposed graph embedding techniques to a task-based fMRI dataset taken from the Human Connectome Project (Elam and Van Essen,

The data consisted of working memory task data taken from the Human Connectome Project (Elam and Van Essen, ^{1}

Preprocessing involved regression of Friston's 24 motion parameters from the fMRI data (Friston et al.,

As in the simulation study, time-varying functional connectivity networks were estimated for each subject using the SINGLE algorithm. This required the specification of three hyper-parameters: the kernel width together with two regularization parameters, λ_{1} and λ_{2}. A fixed kernel width was employed across all subjects, while λ_{1} and λ_{2} were selected in a data-driven manner for each subject. In order to reduce the computational burden associated with selecting λ_{1} and λ_{2}, an initial search was performed on a reduced subset of the subjects. This served to identify a region of the parameter space that was consistently selected across subjects, thereby greatly reducing the computational cost.

The estimated functional connectivity networks produced by the SINGLE algorithm were subsequently analyzed using the proposed graph embedding methods. Recall that the objective of the PCA-driven embedding was to provide a low-dimensional embedding which captures a large portion of the variability present in the data. This was achieved in an unsupervised manner by considering the embeddings associated with the

Figure ^{2}

The left panel of Figure

In contrast to the PCA-driven embeddings, the LDA-driven embeddings are a supervised method which seeks to identify a reduced subset of edges which are discriminative across tasks. In this section we study the contrast between 0-back and 2-back working memory tasks^{3}

The results for the LDA-driven embedding are shown in Figure

We are also able to study the embedding in the context of the associated functional connectivity network, shown in Figure

The study of dynamic functional connectivity networks is an important avenue of neuroscientific research which has become popular in recent years (Calhoun et al.,

In this work we look to address these issues via the use of graph embedding methods based on linear projections over the set of edges. The motivation behind the use of linear methods stems from the fact that they may subsequently be interpreted in the context of functional connectivity. As a result, such methods allow for the identification of entire networks which vary throughout a task. In this manner, we are able to obtain a more holistic understanding of the dynamic reconfigurations which are taking place.

Formally, the two embedding methods presented in this work are based on principal component analysis and linear discriminant analysis. These correspond to unsupervised and supervised learning methods, respectively, and can therefore be seen as complementary tools through which to understand dynamic functional connectivity in further detail. The PCA-driven embedding presented is closely related to the eigen-connectivity approach introduced by Leonardi et al. (

The empirical capabilities of the proposed embeddings are studied throughout a series of simulation studies. These involved the generation of synthetic data whose properties resemble many of those typically reported in fMRI data. These simulations provide an important insight into the performance of the proposed graph embeddings. In particular, they serve to highlight the drop in performance as the number of regions increases given a fixed number of observations as can be clearly seen in Table

We further note that the simulations presented have employed the SINGLE algorithm to obtain sparse functional connectivity networks. However, the graph embedding methods presented do not require sparse networks

It is also important to consider the limitations of the proposed methods. First, both embeddings rest on the assumption that an equal number of observations is available for each subject; if this were not the case, the estimated embeddings would be biased toward subjects with a larger number of observations. Furthermore, the PCA-driven embedding described in Section 2.2 is premised on the assumption that variability in connectivity structure over time dominates inter-subject variability. If this were not the case, the embedding would instead recover the set of edges which display the greatest inter-subject variability. This is perhaps a more pertinent issue in the context of resting-state data, since in task-based data we expect clear changes in network structure to be induced by the distinct cognitive tasks.

An important area for future work would be to study the performance of the proposed graph embedding methods on fMRI data consisting of very large numbers of brain regions (e.g., in the hundreds or thousands). It would also be of interest to consider more complex graph embeddings which further exploit the properties of networks, for example via the use of heat kernels (Chung et al.,

RM: Conceived methods. Ran simulations. Analyzed application results. Wrote the manuscript. RLo: Analyzed application results. Wrote the manuscript. PH: Analyzed application results. Wrote the manuscript. RLe: Analyzed application results. Wrote the manuscript. CA: Conceived methods. Wrote the manuscript. GM: Conceived methods. Wrote the manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at:

^{1}The selection of 206 subjects was based on the data available at the time of the study. Due to the computational burden associated with preprocessing and preparing data, only data for 206 randomly selected subjects was available.

^{2}Note that only LR acquisition datasets are plotted here, as the task design varied from LR to RL acquisitions.

^{3}We note that alternative contrasts may also have been employed.