
Edited by: Xi-Nian Zuo, Institute of Psychology (CAS), China

Reviewed by: Wolfgang Schoppek, University of Bayreuth, Germany; Ariel Telpaz, General Motors, United States

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Circular data is data that is measured on a circle in degrees or radians. It is fundamentally different from linear data due to its periodic nature (0° = 360°). Circular data arises in a large variety of research fields, including ecology, the medical sciences, personality measurement, educational science, sociology, and political science. The most direct examples of circular data within the social sciences arise in cognitive and experimental psychology. However, although circular data is collected in many areas of cognitive and experimental psychology, knowledge of this type of data is not widespread, and literature in which such data is analyzed with methods for circular data is relatively scarce. This paper therefore aims to give a tutorial on working with and analyzing circular data for researchers in cognitive psychology and the social sciences in general. It does so by focusing on data inspection, model fit, estimation and hypothesis testing for two specific models for circular data, using packages for the statistical programming language R.

Circular data arises in almost all fields of research, from ecology where data on the movement direction of animals is investigated (Rivest et al.,

However, although circular data is being collected in different areas of cognitive and experimental psychology, knowledge of this type of data is not widespread. Circular data is fundamentally different from linear data due to its periodic nature. On the circle, measurements at 0° and 360° represent the same direction, whereas on a linear scale they would be located at opposite ends of the scale. For this reason circular data requires specific analysis methods. Some less technical textbooks on analysis methods for circular data have been written (Batschelet,

Therefore, this paper aims at giving a tutorial in working with and analysing circular data to researchers in cognitive psychology and the social sciences in general. The main goal of this tutorial is to explain how to inspect and analyse your data when the outcome variable is circular. We will discuss data inspection, model fit, estimation and hypothesis testing in general linear models (GLM) and mixed-effects models. In this tutorial we decide to mainly focus on one particular approach to the analysis of circular data, the embedding approach. We do so for the flexibility of this approach and the resulting variety in types of models that have already been outlined in the literature on circular data for this approach. Note that for an optimal understanding of the paper, the reader should ideally have some knowledge on

The structure of the tutorial is such that the reader is guided by two examples throughout the paper. One is an example for an ANOVA model and the other for a mixed-effects model. First however, we give a short introduction to circular data in general. Then we introduce the ANOVA example after which descriptive methods for circular data are explained through a section on data inspection for this example. After that we will continue with an analysis of the example datasets. First we analyse the ANOVA dataset using a method for circular GLM and give interpretation guidelines for this model. Subsequently we will introduce and analyse the mixed-effects example data. Again, we analyse this data and include guidelines for interpretation. The analyses of both datasets, the ANOVA and mixed-effects dataset, are performed using the

In the introduction we have briefly mentioned that circular data is data of a periodic nature. The most intuitive form of circular data comes in the form of directions on a compass. For example, a participant in an experiment could be instructed to move or point to a certain target. We can then measure the direction, North, South, East or West on a scale from 0 to 360°. A plot with simulated data containing such measurements for several participants is shown in Figure

Data from participants in an experiment that were instructed to move East. The plot on the left shows the data on a 0°–360° scale. The plot on the right shows the data on the compass.

Clock times are another type of circular data. We might for instance be interested at what time of day a certain event takes place, e.g., the time of day at which positive affect is highest. Figure

Data for the hour at which positive affect is highest for two groups of psychiatric patients who are being treated for depression at different clinics.

The two examples of circular data that we have just given illustrate why it is important to treat circular data differently from linear data. This goes both for describing your data, e.g., computing circular means, as well as analyzing them, e.g., testing whether the circular means of two groups differ. In the next section we will introduce an example dataset on which we will show several ways to inspect and compute descriptive measures for circular data.
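To make the difference between a linear and a circular mean concrete, the following minimal sketch compares the two for a pair of compass directions that lie just either side of North. It is written in Python purely for illustration (the analyses in this tutorial rely on R packages); the function name `circular_mean_deg` and the two example directions are our own assumptions and are not taken from any dataset in this paper.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees on [0, 360]: sum the unit vectors, then take their angle."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

# Hypothetical data: two directions just either side of North.
directions = [350.0, 10.0]

# The arithmetic mean treats 350 and 10 as far apart and points South.
linear_mean = sum(directions) / len(directions)

# The circular mean points North (0 degrees, equivalently 360 degrees),
# up to floating-point rounding.
circ_mean = circular_mean_deg(directions)
```

The arithmetic mean of 180° points in exactly the opposite direction of both observations, whereas the circular mean respects the fact that 350° and 10° are only 20° apart.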

In the previous section we have seen that the computation of a circular mean differs from that of a linear mean. Methods for data inspection, the computation of descriptive statistics and plotting methods, are different for circular data. Because data inspection should be done before performing inference of any kind we will outline a basic way to inspect circular data using the

In this section we introduce data from an article by Puglisi et al. (

The motor resonance data can be found in the package

The main question of interest for the motor resonance data is whether the phase difference differs between the three experimental conditions; more precisely, whether the phase difference is smaller in the explicit condition than in the other two. Differences that are observed in the phase difference are interpreted as differences in the strength of the resonant response (Puglisi et al.,

Figure

Plots of the phase differences for each condition of the motor resonance data.

Table

Descriptives for the motor resonance data with mean direction, mean resultant length, circular variance and circular standard deviation.

Condition | Mean direction | Mean resultant length | Circular variance | Circular standard deviation |
---|---|---|---|---|
Explicit | 49.55° | 0.77 | 0.23 | 41.39° |
Semi.implicit | 18.45° | 0.54 | 0.46 | 63.82° |
Implicit | 31.94° | 0.56 | 0.44 | 61.72° |

Graphically we can illustrate the computation of

The computation of a circular mean and (mean) resultant length

In Table

Table _{m} for the circular variance is defined as

We have seen that both the average phase difference and variances of the phase difference seem to be different for the three conditions in the motor resonance data. To test whether these differences in circular means also exist in the population, we can use a projected normal circular GLM. In the next section we will introduce this model and fit it to the motor resonance data.
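The descriptive measures reported for the motor resonance data can be sketched in a few lines. The illustration below is in Python rather than R and assumes the common definitions of the circular variance as one minus the mean resultant length and of the circular standard deviation as the square root of minus two times the log of the mean resultant length; the tabulated values for the explicit condition (0.77, 0.23 and 41.39°) agree with these relations up to rounding. The function names and the small sample of angles are hypothetical.

```python
import math

def circular_sd_deg(r_bar):
    """Circular standard deviation in degrees, assuming sd = sqrt(-2 * ln(R))."""
    if r_bar <= 0.0:
        return math.inf  # directions cancel out completely
    return math.degrees(math.sqrt(max(0.0, -2.0 * math.log(r_bar))))

def circular_descriptives(angles_deg):
    """Mean direction, mean resultant length, circular variance and circular sd."""
    n = len(angles_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    # Length of the average unit vector; capped at 1 to absorb rounding error.
    r_bar = min(1.0, math.hypot(mean_cos, mean_sin))
    return {
        "mean_direction": math.degrees(math.atan2(mean_sin, mean_cos)),
        "resultant_length": r_bar,
        "variance": 1.0 - r_bar,  # assumed definition: 1 - R
        "sd": circular_sd_deg(r_bar),
    }

# Hypothetical phase differences in degrees (not the motor resonance data).
sample = [40.0, 55.0, 20.0, 70.0, 45.0]
summary = circular_descriptives(sample)

# Relation check against the rounded resultant length of the explicit condition.
sd_explicit = circular_sd_deg(0.77)  # roughly 41.4 degrees
```

A resultant length close to 1 indicates that the observations are tightly concentrated around the mean direction, which corresponds to a small circular variance and standard deviation.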

In this section we will introduce a projected normal circular regression model. Because it is a regression model we can also fit AN(C)OVA type models with it, and we therefore refer to it as a projected normal (PN) circular GLM. The PN circular GLM falls within the embedding approach to circular data. The embedding approach is characterized by the fact that it takes an indirect approach to modeling circular data. Instead of directly defining a model on the circular outcome θ, we use a mathematical trick that allows us to define a model in bivariate linear space. The results of the model in bivariate linear space can then be translated back to the circle. Next, we will outline the theoretical background to the PN circular GLM and the embedding approach. Subsequently we will fit an ANOVA to the motor resonance data. At the end of this section we will briefly consider different methods for circular ANOVA.

In the previous section, at the computation of the circular mean, we have seen that a circular variable θ, e.g., the phase difference in the motor resonance data, can be expressed as a unit vector

A set of circular datapoints (closed dots) connected to two sets of datapoints in bivariate space (open dots) that could have produced them.
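The many-to-one relation between points in bivariate space and points on the circle can be illustrated with a short sketch. It is written in Python for illustration only and assumes the usual convention in which the first coordinate is the cosine component and the second the sine component; the function name `project_to_circle` and the example points are hypothetical.

```python
import math

def project_to_circle(y1, y2):
    """Angle in radians, in (-pi, pi], of the bivariate point (y1, y2).

    y1 is taken to be the cosine component and y2 the sine component.
    """
    return math.atan2(y2, y1)

theta = math.radians(60.0)

# Two different bivariate points that lie on the same ray from the origin.
near_point = (0.5 * math.cos(theta), 0.5 * math.sin(theta))
far_point = (3.0 * math.cos(theta), 3.0 * math.sin(theta))

# Both project onto the same circular datapoint of 60 degrees.
angle_near = math.degrees(project_to_circle(*near_point))
angle_far = math.degrees(project_to_circle(*far_point))
```

Because only the direction of a bivariate point is observed and its distance to the origin is not, every circular observation is compatible with infinitely many points in bivariate space.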

From the assumption that

Because

where _{i} is a vector of predictor values for individual _{i} equals 1. Note that the vectors _{i} are allowed to differ for the two components

In terms of the interpretation of the circular effect of a variable the two component structure in (1) poses a problem. Note that we could rotate and shift the components (axes) in bivariate space such that for a categorical predictor the x-component points to the mean of the reference category and the beta weights of the y-component refer to a deviation from this reference mean. This way we could test whether the means of the groups differ in bivariate space. However, this still does not lead to means or effects that are interpretable on the circle. The two components do not necessarily have a useful interpretation for each type of circular data, e.g., we cannot talk of a 12 o'clock (sine component) and 3 o'clock axis (cosine component) in Figure

In this section we will fit a circular ANOVA model to the motor resonance data using the PN circular GLM from the package

To investigate the effect of condition on the phase difference we specify the prediction equation for the mean vector in the PN circular GLM as follows:

where the variables semi.implicit and implicit are dummy variables indicating condition membership,

We use the package

In a Bayesian model that uses MCMC sampling for estimation we always have to assess convergence of the MCMC chain for all parameters in the model. A traceplot is one way to assess the convergence of a parameter. As an illustration we only show the traceplot for the MCMC chains for one of the parameters of the model in Figure

A traceplot showing convergence of the parameter

From the traceplot in Figure

To answer the question whether the phase differences in the three conditions of the motor resonance data differ we investigate their circular means. To do so we use methods from Cremers et al. (

Because we use a Bayesian method we get the posterior distributions of the three circular means. Philosophically, in Bayesian statistics each parameter is said to have its own distribution. The posterior distribution is the result of the prior knowledge we have about a parameter before conducting a study, formalized as a “prior” distribution (in this paper we choose non-informative priors for the parameters), and the information that lies in the data obtained from a study, formalized as the likelihood. The fact that we obtain the distribution of a parameter is convenient for inference purposes, since this means that we do not just have a point estimate of a parameter (the mean or mode of the posterior distribution) but also automatically get an uncertainty estimate (the standard deviation of the posterior distribution). For more background on Bayesian statistics (see e.g., Gelman et al.,

Summary statistics for the posterior distributions of the circular means for each condition are shown in Table

Posterior estimates of the circular means of the phase difference for the three conditions of the motor resonance data.

Condition | Mode | Mean | SD | HPD lower bound | HPD upper bound |
---|---|---|---|---|---|
Explicit | 42.70° | 45.56° | 11.67° | 22.26° | 67.99° |
Semi-implicit | 21.08° | 19.40° | 18.36° | –18.27° | 55.22° |
Implicit | 37.22° | 33.47° | 17.77° | –2.25° | 68.22° |

HPD intervals can also be used to test whether a parameter is different from a certain value or whether two parameter estimates are different. In Table

In addition to testing whether the circular means of the three conditions are different, the circular ANOVA also allows us to test whether there is an effect of condition on the circular variances of the phase differences. Table

Posterior estimates of the circular variances of the phase difference for the three conditions of the motor resonance data.

Condition | Mode | Mean | SD | HPD lower bound | HPD upper bound |
---|---|---|---|---|---|
Explicit | 0.21 | 0.26 | 0.09 | 0.09 | 0.45 |
Semi-implicit | 0.37 | 0.44 | 0.13 | 0.19 | 0.36 |
Implicit | 0.36 | 0.42 | 0.13 | 0.18 | 0.68 |
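As an illustration of how an HPD interval can be obtained from MCMC output, the sketch below computes the shortest interval that contains a given share of the posterior draws and checks whether two such intervals overlap. It is written in Python and is not the routine used by the R package in this tutorial. It assumes a unimodal posterior on a linear scale, which suits circular variances; for circular means the draws would additionally have to stay away from the point where the scale wraps around. The function names and the simulated draws are hypothetical, and reading non-overlapping intervals as evidence for a difference is one common convention that we assume here.

```python
import math
import random

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing at least `mass` of the draws (unimodal case)."""
    ordered = sorted(draws)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws that must fall inside
    # Slide a window of k consecutive ordered draws and keep the narrowest one.
    widths = [(ordered[i + k - 1] - ordered[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return ordered[start], ordered[start + k - 1]

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Simulated stand-ins for posterior draws of two circular variances.
rng = random.Random(101)
draws_a = [rng.gauss(0.25, 0.05) for _ in range(4000)]
draws_b = [rng.gauss(0.40, 0.05) for _ in range(4000)]

hpd_a = hpd_interval(draws_a)
hpd_b = hpd_interval(draws_b)
overlap = intervals_overlap(hpd_a, hpd_b)
```

For a skewed posterior the HPD interval is not symmetric around the posterior mean, which is why it is reported by its lower and upper bound rather than as an estimate plus or minus a margin.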

In the previous section we have tested whether the average phase differences of the three conditions of the motor resonance data differ in the population using a Bayesian PN circular GLM. We can also do this using a frequentist ANOVA for circular data that tests the hypothesis H_{0}: μ_{explicit} = μ_{semi-implicit} = μ_{implicit}. One such test is the Watson-Williams test. This test can be performed using the function

As in ANOVA models for linear data, we have to meet a set of assumptions for this test to be valid. Firstly, in the Watson-Williams test the samples from the different conditions are assumed to be von-Mises distributed. Like the projected normal distribution this is a distribution for circular data. It is unimodal with mean μ and concentration κ. Secondly, the samples are assumed to have the same circular variance. This assumption of homogeneity of variance is tested within the _{0} is not rejected). This means that it is not completely valid to perform the Watson-Williams test on the motor resonance data.

For educational purposes, however, we do conduct this test. Similar to the projected normal circular GLM we conclude from this test that the average phase differences of the three conditions are not significantly different: [F_{(2, 39)} = 1.02,

An advantage of employing the embedding approach to circular data over the intrinsic approach is that it is easier to model more complex data, e.g., repeated measures data, since we can “borrow” methods from the bivariate linear context. In this section we will introduce such a method: the circular mixed-effects model. We will first introduce a new dataset, the cognitive maps data, and give descriptive statistics. Then, we will briefly outline the theoretical background to the mixed-effects model and fit it to the cognitive maps data.

The cognitive maps data is a subset of data from a study by Warren et al. (

The type of maze is a between-subjects factor: participants had to navigate through either a “Euclidean” maze or a “non-Euclidean” maze. The Euclidean maze is the standard maze and is a maze just as we know it in the real world. The other version of the maze, the non-Euclidean maze, has exactly the same layout as the standard maze but has virtual features that do not exist in reality. Specifically, it contains wormholes by which participants can be “teleported” from one place in the maze to another.

In the test phase of the experiment all participants had to complete 8 trials. In each of these trials participants had to walk to a specific target object. A within-subjects factor is the type of target object. Pairs of start and target objects were of two types: probe and standard. The probe objects were located near the entrance and exit of a wormhole in the non-Euclidean maze whereas the standard objects were located at some distance from the wormholes. For each of these two types of objects participants had to find 4 different targets resulting in a total of 8 trials per participant.

For this experiment we could be interested in the question whether the participants in the non-Euclidean maze make use of the wormholes when navigating to the target objects and whether this is true for both the probe and standard target objects. Due to the design of the mazes the expected angular error was larger if a participant used the wormhole to walk to the target object in the non-Euclidean maze. We can thus use the angular error, our outcome variable, to differentiate between participants that used the wormhole and those that took another path to the target object. Additionally we can control for the number of trials that a participant completed in the training phase.
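An angular error is itself a circular quantity, since it is the difference between two directions. The minimal Python sketch below shows one way such a signed error could be computed so that it respects the periodicity of the scale. The sign convention (walked direction minus target direction), the range of −180° up to but not including 180°, and the function name `angular_error_deg` are assumptions made for illustration and are not taken from the original study.

```python
def angular_error_deg(walked_deg, target_deg):
    """Signed difference walked - target in degrees, wrapped to [-180, 180)."""
    return (walked_deg - target_deg + 180.0) % 360.0 - 180.0

# Hypothetical trials: the target lies at 350 degrees and at 10 degrees.
error_clockwise = angular_error_deg(10.0, 350.0)          # 20.0, not -340.0
error_counterclockwise = angular_error_deg(350.0, 10.0)   # -20.0, not 340.0
```

Without the wrapping step, a participant who misses a target at 350° by only 20° would be recorded with an error of −340°.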

The cognitive maps data is incorporated in the package

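Descriptives for the cognitive maps data with mean direction and mean resultant length.

Variable | Maze | Target type | Mean direction | Mean resultant length |
---|---|---|---|---|
Angular error | Euclidean | Standard | –4.91° | 0.89 |
Angular error | Euclidean | Probe | 4.46° | 0.92 |
Angular error | non-Euclidean | Standard | –17.59° | 0.78 |
Angular error | non-Euclidean | Probe | 37.34° | 0.93 |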

In this section we will first introduce a circular mixed effects model and fit this model to the cognitive maps data. Next we discuss the output produced by the

The circular mixed-effects model from the package _{ij}, one for each measurement _{ij}. The Bayesian method used in the package

For the cognitive maps data with

where the variables

The interpretation problems caused by the two-component structure in (3) are of a similar nature as those in the GLM. Cremers et al. (Submitted) introduce new tools that solve the interpretation of circular effects in PN mixed-effects models. In this tutorial we will also use these tools.

To fit the model in (3) we use the

Note that the syntax for the model specification in this function is similar to that of the package

Next we investigate the coefficients of the fixed effects for this model. First we show results for the categorical variables type of maze (

Table

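Posterior estimates of the circular mean of the angular error for each condition.

Variable | Maze | Target type | Mode | Mean | SD | HPD lower bound | HPD upper bound |
---|---|---|---|---|---|---|---|
Angular error | Euclidean | Standard | –12.97° | –13.48° | 3.9° | –21.42° | –6.06° |
Angular error | Euclidean | Probe | 11.38° | 11.78° | 3.29° | 5.26° | 18.30° |
Angular error | non-Euclidean | Standard | –1.42° | –2.04° | 6.68° | –15.75° | 10.49° |
Angular error | non-Euclidean | Probe | 31.04° | 30.37° | 4.31° | 22.03° | 38.92° |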

By looking at the 95% HPD intervals of the angular errors in Table

For the continuous variable _{c},

In Figure _{c}, _{c} represents the slope of the circular regression line at the inflection point (the square in Figure

Predicted circular regression line for the relation between a linear predictor

For the effect of L.c on the angular error in the cognitive maps data, the HPD intervals for all three circular coefficients, _{c}, _{c} can be interpreted as: at the inflection point, for a 1 unit increase in

Posterior estimates of the coefficients of the effect of L.c on the angular error.

_{c} |
–0.89° | –0.21° | 1.73° | –2.84° | 2.55° |

0.58° | –0.84° | 90.80° | –11.51° | 12.73° | |

–0.63° | –1.11° | 92.44° | –13.17° | 13.14° |

In mixed-effects models we are also interested in evaluating the variance of the random effects. In the model for the cognitive maps data we included a random intercept. This means that we estimate a separate intercept for each participant. How to compute random effect variances on the circle is outlined in Cremers et al. (Submitted). For the cognitive maps data the posterior mode of the intercept variance on the circle is estimated at 3.5*10^{−5} and its HPD interval is (4.2*10^{−6}; 1.4*10^{−3}). This variance is very low, which means that the individual intercept estimates of the participants differ very little. Note that this is not necessarily problematic. In some cases we are not interested in the variances of the random effects but simply want to fit a mixed-effects model because we have within factors, such as

When fitting mixed-effects (or multilevel) models we often fit a set of nested models to our data and follow a model building strategy (Hox,

We then update this model with fixed effects for the predictors at the lowest level (within-subjects factors), in this case

We then add fixed effects for the predictors at the higher level (between-subjects factors), in this case

Because we have already seen that the effect of

Additional steps, such as adding random slopes for first level predictors and cross-level interactions, can be taken. In this paper we will however restrict the analysis to the previous three models.

To assess the fit of the models we look at 4 different model fit criteria: two versions of the deviance information criterion (DIC and DIC_{alt}) and two versions of the Watanabe-Akaike information criterion (WAIC_{1} and WAIC_{2}). We choose these four criteria because they are specifically useful in Bayesian models where MCMC methods have been used to estimate the parameters. All four criteria have a fit part consisting of a measure based on the loglikelihood and include a penalty in the form of an effective number of parameters. For all criteria lower values indicate better fit. Gelman et al. (

Model fit criteria for several models fit to the cognitive maps data.

DIC | 304.61 | 267.91 | 253.97 | 257.94 |
DIC_{alt} | 324.33 | 286.97 | 257.14 | 260.78 |
WAIC_{1} | 308.41 | 271.61 | 255.00 | 258.41 |
WAIC_{2} | 308.43 | 271.77 | 255.40 | 259.02 |
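To show how a criterion of this kind combines a fit part with a penalty, the sketch below computes two versions of the WAIC from a matrix of pointwise loglikelihood values, one row per posterior draw and one column per observation. It is written in Python for illustration and assumes that the two versions correspond to the two effective-number-of-parameters penalties described by Gelman and colleagues: twice the gap between the log of the average likelihood and the average loglikelihood, and the posterior variance of the loglikelihood. It is not the implementation used by the R package in this tutorial, and the function names and the small input matrix are hypothetical.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def waic(loglik):
    """WAIC with two penalty versions; loglik[s][i] is the loglikelihood of
    observation i under posterior draw s (at least two draws are needed)."""
    n_draws = len(loglik)
    n_obs = len(loglik[0])
    lppd = penalty_1 = penalty_2 = 0.0
    for i in range(n_obs):
        column = [loglik[s][i] for s in range(n_draws)]
        log_avg_lik = log_mean_exp(column)
        avg_loglik = sum(column) / n_draws
        lppd += log_avg_lik                           # fit part
        penalty_1 += 2.0 * (log_avg_lik - avg_loglik)
        penalty_2 += sum((v - avg_loglik) ** 2 for v in column) / (n_draws - 1)
    return {
        "waic_1": -2.0 * (lppd - penalty_1),
        "waic_2": -2.0 * (lppd - penalty_2),
        "penalty_1": penalty_1,
        "penalty_2": penalty_2,
    }

# Hypothetical loglikelihood values: 3 posterior draws, 2 observations.
example = [[-1.0, -2.0], [-1.5, -2.5], [-0.8, -2.2]]
result = waic(example)
```

The more the loglikelihood of an observation varies across posterior draws, the larger both penalties become, so a model is only preferred when its gain in fit outweighs its added flexibility.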

In the results for the example we see that the fit improves in all 4 model diagnostics for each model except for the last one. This means that the predictor

Apart from information about whether adding predictors improves the fit of the model we are also interested in whether these predictors explain a part of the random effect variances. For the cognitive maps data we are interested in whether the

The posterior mode of the intercept variance in the intercept-only model equals 6.61 ∗ 10^{−5}(8.20 ∗ 10^{−6}; 3.62 ∗ 10^{−3}). This means that there is almost no random intercept variance. The posterior mode of the circular variance is very close to 0. This also means that there is hardly any intercept variance that the ^{−5}(4.40 ∗ 10^{−6}; 1.59 ∗ 10^{−3}). As expected, there is hardly any change in estimates for the variance in the model with

In this paper we have given a tutorial for researchers in cognitive psychology on how to analyse circular data using the package

Apart from the embedding approach to circular data, as used in this tutorial, there are two other approaches to the analysis of circular data. In the wrapping approach the data on the circle is assumed to have originated from wrapping a univariate distribution on the real line onto the circle. In the intrinsic approach distributions, such as the von Mises distribution, are directly defined on the circle. For both approaches models have been described in the literature (Fisher and Lee,

The idea for this paper was conceived by JC with feedback from IK. JC performed the data analysis and developed the software package (bpnreg) to execute them. Methodology from the software package (bpnreg) was developed by JC with contributions from IK. Manuscript textual content, formatting, and figures were produced by JC. IK contributed to manuscript revision, read, and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.