
Edited by: Holmes Finch, Ball State University, USA

Reviewed by: Jill S. Budden, National Council of State Boards of Nursing, USA; Michael Smithson, Australian National University, Australia

*Correspondence: Cyril R. Pernet, Brain Research Imaging Center, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK. e-mail:

This article was submitted to Frontiers in Quantitative Psychology and Measurement, a specialty of Frontiers in Psychology.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab® based toolbox (

Robust statistical procedures have been developed since the 1960s (Tukey,

Generally, a correlation refers to any of a broad class of statistical relationships involving dependence. Correlation also refers to a broad class of statistical measures aimed at characterizing the strength of the association between two variables. Among these latter measures, Pearson’s correlation is the most widely used technique, despite its lack of robustness (Wilcox,

Alongside the computation of correlations, the toolbox includes tools for visualization and basic assumption checking. The

We illustrate the use of the toolbox with the Anscombe’s (
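Anscombe’s quartet makes the point concisely: all four pairs share nearly the same Pearson correlation despite radically different structures. The toolbox itself is Matlab-based; as an outside illustration, a short Python sketch using the published quartet values shows the effect:

```python
import numpy as np

# Anscombe's quartet: four x/y pairs with near-identical summary statistics
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

pairs = [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]
for i, (x, y) in enumerate(pairs, 1):
    r = np.corrcoef(x, y)[0, 1]
    print(f"pair {i}: Pearson r = {r:.3f}")  # ~0.816 for every pair
```

Only pair 1 is a well-behaved linear association; the identical r values for pairs 2–4 are artifacts of non-linearity and outliers.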

To compute skipped-correlations, first we estimate the robust center of the data cloud. Because a single outlier can result in the bivariate mean giving a poor reflection of the typical response, one relies here on the minimum covariance determinant (MCD) estimator, which is a robust estimator of multivariate location and scatter (Rousseeuw,
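The logic of a skipped correlation can be sketched in a few lines of Python. This is a deliberate simplification of what the toolbox does: the coordinate-wise median and rescaled MAD stand in for the MCD estimate of location and scatter, and a chi-square cutoff on robust distances stands in for the projection-based outlier detection.

```python
import numpy as np
from scipy import stats

def skipped_pearson(x, y):
    """Simplified skipped correlation: flag bivariate outliers by their
    robust distance from a robust center, then compute Pearson's r on
    the remaining points. (Sketch only: the toolbox uses the MCD
    estimator and projection-based detection instead of median/MAD.)"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cx, cy = np.median(x), np.median(y)
    # robust per-coordinate scale: MAD rescaled to be consistent with sigma
    sx = stats.median_abs_deviation(x, scale='normal')
    sy = stats.median_abs_deviation(y, scale='normal')
    d = np.sqrt(((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2)
    keep = d <= np.sqrt(stats.chi2.ppf(0.975, df=2))  # cutoff ~ 2.72
    return stats.pearsonr(x[keep], y[keep])[0]
```

On clean linear data all points are kept and the estimate matches Pearson’s r; a single gross outlier is removed before the correlation is computed, instead of dominating it.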

To compute the percentage-bend correlation, a specified percentage of marginal observations deviating from the median are down-weighted. Pearson’s correlation is then computed on the transformed data. A skipped correlation is a robust generalization of Pearson’s r: it measures the strength of the linear association while ignoring outliers detected by taking into account the overall structure of the data. In contrast, the percentage-bend correlation only protects against outliers in the marginal distributions. Under normality, the percentage-bend and Pearson’s correlations have very similar values, but these values can differ markedly as soon as there is deviation from normality (Wilcox,
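A sketch of this estimator, following Wilcox’s percentage-bend formulas (the function names are ours, and the toolbox’s Matlab implementation may differ in details):

```python
import numpy as np

def _pb_params(x, beta=0.2):
    """omega: the (1 - beta) quantile of |x - median(x)|;
    phi: Wilcox's percentage-bend measure of location."""
    med = np.median(x)
    omega = np.sort(np.abs(x - med))[int(np.floor((1 - beta) * len(x))) - 1]
    psi = (x - med) / omega
    lo, hi = (psi < -1).sum(), (psi > 1).sum()
    mid = x[(psi >= -1) & (psi <= 1)]
    phi = (mid.sum() + omega * (hi - lo)) / (len(x) - lo - hi)
    return omega, phi

def percentage_bend_corr(x, y, beta=0.2):
    """Percentage-bend correlation: marginal deviations beyond omega are
    clipped (down-weighted) before the Pearson-like cross-product."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    om_x, phi_x = _pb_params(x, beta)
    om_y, phi_y = _pb_params(y, beta)
    a = np.clip((x - phi_x) / om_x, -1.0, 1.0)  # down-weight marginal outliers
    b = np.clip((y - phi_y) / om_y, -1.0, 1.0)
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))
```

Because extreme marginal values are clipped at ±1 rather than discarded, a single gross outlier moves the estimate only slightly, whereas it can flip the sign of Pearson’s r.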

The toolbox also computes percentile bootstrap 95% CIs for each correlation. For Pearson’s, Spearman’s, and percentage-bend correlations, pairs of observations are resampled with replacement and their correlation values obtained. For skipped-correlations, the data after outlier removal are resampled, before computing correlation values^{1}
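The percentile bootstrap for a correlation CI is straightforward to sketch (illustrative Python; the toolbox’s Matlab implementation, adjusted cutoffs, and bootstrap counts may differ):

```python
import numpy as np
from scipy import stats

def bootstrap_ci_pearson(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample (x, y) *pairs* with replacement,
    recompute the correlation each time, and take the alpha/2 and
    1 - alpha/2 percentiles of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample pairs, not marginals
        rs[b] = stats.pearsonr(x[idx], y[idx])[0]
    return np.percentile(rs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

Resampling pairs preserves the dependence between the two variables, which is what the CI is about; resampling each margin separately would destroy it.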

To assess the sensitivity of the different correlation methods, we ran several simulations in which we recorded the actual correlation value (effect size) and the number of times the null hypothesis of independence was rejected (false positive rate and power). In the first simulation, a parent bivariate normal (

To investigate effect sizes, we first tested if the correlations differed from the true population value. Differences between observed correlation values (

To evaluate the false positive rate and power, the average number of times the null hypothesis was rejected was computed. The different correlation techniques were then compared for each sample size based on their binomial distributions (accept/reject H0) using a method for discrete cases with adjustment for multiple comparisons (Kulinskaya et al.,
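The rejection bookkeeping can be sketched for Pearson’s test (illustrative Python with parameters of our choosing; it covers only the counting step, not the binomial comparison across techniques):

```python
import numpy as np
from scipy import stats

def rejection_rate(rho, n, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated bivariate-normal samples in which
    H0: rho = 0 is rejected by Pearson's test -- the power when
    rho != 0, and the false positive rate when rho = 0."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    hits = 0
    for _ in range(n_sims):
        xy = rng.multivariate_normal([0, 0], cov, size=n)
        hits += stats.pearsonr(xy[:, 0], xy[:, 1])[1] < alpha
    return hits / n_sims
```

Under H0 the rate should sit near the nominal alpha; under an effect it traces the power curve as n grows.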

As put forward by Anscombe (

The Henze–Zirkler test for multivariate normality confirmed visual inspection: only pair 1 is normally distributed (HZ = 0.1,

As designed by Anscombe, Pearson’s correlation is fooled by outliers and, for each pair, a significant correlation of

| | Pair 1 | Pair 2 | Pair 3 | Pair 4 |
|---|---|---|---|---|
| Pearson | | | | |
| Spearman | | | | |
| 20% bend | | | | |
| Skipped Pearson | | | | |
| Skipped Spearman | | | | |

Figure

Zero-correlation was well estimated by all methods: all correlation values were close to 0 and the 99.99% CIs of all methods included 0 (Figure

Again, zero-correlation was well estimated by all methods as all correlation values were close to 0. The 99.99% CIs of all methods included 0 (Figure

Only the skipped correlation methods estimated zero-correlation well. On average, Pearson’s correlation was

Effect sizes for Gaussian data (Figure

Power analyses showed similar trends for all techniques, with maximum power for Pearson’s correlations and minimum power for skipped Spearman’s correlations. In general, power increased up to 100% as a function of the sample size except for

Effect sizes for Gaussian data contaminated by 10% of marginal outliers (Figure

Power curves revealed that when effect sizes were well estimated, Spearman’s (0.001 <

Effect sizes for Gaussian data contaminated by 10% of bivariate outliers (Figure

The power of Pearson’s correlation did not differ from that of other methods for the few correct estimations (

When data were normally distributed, Pearson’s correlation was the best method: it estimated the true effect sizes best and had the most power. The robust alternatives still estimated the true effect sizes properly, with slight differences (from −0.001 to −0.02 for the 20% percentage-bend correlation and from −0.006 to −0.001 for the skipped Pearson’s correlation). These results can be explained by the fact that the robust techniques down-weight or remove data points from the samples being drawn. As a consequence, they also have less power (at most −6% for the 20% percentage-bend correlation and −23% for the skipped Pearson’s correlation). However, the assumption of normality rarely holds (e.g., Micceri,

The first point to consider is the estimation of the true effect sizes in the context of marginal and bivariate outliers. In our simulations, Pearson’s and Spearman’s correlations failed most of the time but occasionally estimated ρ properly. These accurate estimations should not be taken as an indication of the robustness of the methods, but simply as an illustration of the effect of the position of outliers. In the case of univariate outliers, the outliers were located in such a way that their positions were between −0.3° and −9.94° relative to the population of interest. As a consequence, Pearson’s and Spearman’s correlations always underestimated ρ, being attracted toward [6, 0], the center of the outlier population. In the case of bivariate outliers, the outliers were located in such a way that their positions were between +0.4° and +13.4° relative to the population of interest. As a consequence, Pearson’s and Spearman’s correlations almost always overestimated ρ (the exception being ρ = 0.9, where the two populations were aligned), being attracted toward [6, 6]. To further illustrate this effect of the position of outliers, consider the toy example in Figure

The second point to consider is the power of each method. It has been argued that skipped correlation can lack power compared to Pearson’s correlation (Schwarzkopf et al., ^{2}

The last point to consider is the type I error rate. Schwarzkopf et al. (

To conclude, we demonstrated that robust alternatives to standard correlation methods provide accurate estimates of the true correlations for normal and contaminated data, with no or minimal loss of power and adequate control over the false positive rate. Given the range of possible data configurations, not all scenarios can be tested, but some recommendations can be drawn from our results.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

There are various methods to detect outliers. In the context of skipped-correlations, one relies on the detection of univariate outliers among projected data points (points orthogonally projected onto lines joining each data point to the robust estimate of location, see
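For reference, the univariate MAD-median rule compared below can be written in a few lines (a sketch; the toolbox additionally offers cutoffs adjusted for bivariate data and finite samples):

```python
import numpy as np

def mad_median_outliers(x):
    """MAD-median rule: flag x_i as an outlier when
    |x_i - M| / (MAD / 0.6745) > sqrt(chi2_{0.975, 1}) ~ 2.24,
    where M is the median and MAD the median absolute deviation.
    Dividing the MAD by 0.6745 makes it consistent with the SD
    under normality."""
    x = np.asarray(x, float)
    m = np.median(x)
    mad = np.median(np.abs(x - m))
    return np.abs(x - m) / (mad / 0.6745) > 2.2414
```

Because both the center (median) and the scale (MAD) resist contamination, the rule keeps flagging gross outliers even when several are present, unlike mean/SD-based rules that suffer from masking.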

A normal bivariate population of 100 data points [mu(0,0), sigma([1, 0.5; 0.5, 1])] was generated and 10% of outliers added. Outliers came from a similar bivariate population rotated by 90° and shifted along one dimension by 0, 2, 4, or 6 standard deviations (SD) (Figure
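This data-generation scheme can be sketched as follows (illustrative Python; the correlation of 0.5 and the boolean labels are our parameterization of the description above):

```python
import numpy as np

def contaminated_sample(n=100, pct=0.1, shift=4, seed=0):
    """Bivariate normal sample (rho = 0.5) plus pct outliers drawn from
    the same population rotated by 90 degrees and shifted by `shift` SDs
    along one dimension. Returns the data and a boolean outlier label."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])
    main = rng.multivariate_normal([0, 0], cov, size=n)
    # a 90-degree rotation maps (x, y) to (-y, x), flipping the
    # sign of the correlation in the outlier cluster
    rot = np.array([[0.0, -1.0], [1.0, 0.0]])
    n_out = int(round(n * pct))
    out = rng.multivariate_normal([0, 0], cov, size=n_out) @ rot.T
    out[:, 0] += shift  # shift along one dimension
    labels = np.r_[np.zeros(n, bool), np.ones(n_out, bool)]
    return np.vstack([main, out]), labels
```

At shift = 0 the outlier cluster overlaps the main population (hardest case); at 6 SD it is well separated, which is why detection rates climb across the columns of the tables below.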

Techniques that rely on the mean performed the worst: they showed high specificity only because they flagged very few points, and therefore failed to detect outliers (i.e., they had low sensitivity). In our simulations, the best such method only achieved 74.7% [73.6; 75.7] detection for outliers located at 4 SD from the population of interest, and 92.6% [91.99; 93.34] detection for outliers located at 6 SD from the population of interest. For such obvious outliers, robust methods showed 100% or close to 100% detection rates. Among robust methods, the box-plot rule with adjustment as implemented in the skipped correlation function had the highest specificity, i.e., it removed very few data points from the population of interest (Table

| Method | Distance to the center = 0 | Distance to the center = 2 | Distance to the center = 4 | Distance to the center = 6 |
|---|---|---|---|---|
| Marginal means | 0.10 [0.07, 0.13] | 0.032 [0.01, 0.05] | 0.01 [0.006, 0.028] | 0.02 [0.007, 0.03] |
| Mahalanobis distance | 2.96 [2.85, 3.08] | 1.80 [1.71, 1.90] | 1.01 [0.92, 1.09] | 0.86 [0.79, 0.93] |
| Bootstrapped Mahalanobis distance | 3.44 [3.31, 3.56] | 2.20 [2.09, 2.31] | 1.26 [1.17, 1.35] | 1.07 [0.99, 1.15] |
| Box-plot | 6.40 [6.18, 6.63] | 4.94 [4.73, 5.15] | 3.96 [3.78, 4.13] | 3.67 [3.50, 3.85] |
| Box-plot with Carling’s adjustment | 4.56 [4.36, 4.76] | 3.45 [3.29, 3.61] | 2.76 [2.60, 2.91] | 2.56 [2.42, 2.69] |
| Box-plot adjusted for bivariate data | 1.91 [1.78, 2.04] | 1.28 [1.17, 1.39] | 0.96 [0.87, 1.04] | 0.96 [0.87, 1.04] |
| MAD-median rule | 15.41 [15.11, 15.71] | 13.23 [12.94, 13.52] | 11.76 [11.47, 12.03] | 11.30 [11.01, 11.58] |
| MAD-median rule adjusted for bivariate data | 9.23 [8.97, 9.49] | 7.60 [7.35, 7.85] | 6.56 [6.33, 6.78] | 6.29 [6.07, 6.50] |
| MAD-median rule for finite samples | 15.15 [14.85, 15.45] | 13.01 [12.71, 13.32] | 11.55 [11.27, 11.82] | 11.11 [10.82, 11.40] |
| MAD-median rule for finite samples adjusted for bivariate data | 9.04 [8.78, 9.30] | 7.45 [7.20, 7.71] | 6.41 [6.18, 6.62] | 6.13 [5.92, 6.35] |
| S-outliers | 15.63 [15.34, 15.91] | 13.61 [13.31, 13.90] | 12.67 [12.41, 12.92] | 12.84 [12.55, 13.13] |

| Method | Distance to the center = 0 | Distance to the center = 2 | Distance to the center = 4 | Distance to the center = 6 |
|---|---|---|---|---|
| Marginal means | 5.67 [5.00, 6.33] | 6.91 [6.24, 7.57] | 11.89 [11.04, 12.73] | 15.94 [15.01, 16.86] |
| Mahalanobis distance | 28.77 [27.72, 29.81] | 43.96 [42.84, 45.07] | 69.65 [68.61, 70.68] | 88.09 [87.31, 88.86] |
| Bootstrapped Mahalanobis distance | 30.55 [29.53, 31.56] | 46.74 [45.55, 47.92] | 74.73 [73.68, 75.77] | 92.67 [91.99, 93.34] |
| Box-plot | 38.65 [37.42, 39.87] | 61.24 [59.91, 62.56] | 95.79 [95.20, 96.37] | 99.98 [99.98, 100] |
| Box-plot with Carling’s adjustment | 35.64 [34.45, 36.82] | 56.81 [55.47, 58.14] | 94.06 [93.37, 94.74] | 99.96 [99.96, 100] |
| Box-plot adjusted for bivariate data | 28.56 [27.33, 29.78] | 46.59 [45.19, 47.98] | 87.50 [86.55, 88.44] | 99.85 [99.69, 100] |
| MAD-median rule | 49.50 [48.12, 50.87] | 74.78 [73.61, 75.94] | 98.68 [98.32, 99.03] | 100 |
| MAD-median rule adjusted for bivariate data | 42.97 [41.58, 44.35] | 66.87 [65.60, 68.13] | 97.48 [97.00, 97.95] | 100 |
| MAD-median rule for finite samples | 49.19 [47.80, 50.57] | 74.58 [73.34, 75.81] | 98.62 [98.25, 98.98] | 100 |
| MAD-median rule for finite samples adjusted for bivariate data | 42.64 [41.30, 43.97] | 66.51 [65.22, 67.79] | 97.34 [96.86, 97.81] | 100 |
| S-outliers | 49.90 [48.58, 51.21] | 75.14 [73.89, 76.38] | 98.80 [98.50, 99.09] | 100 |

^{1}Removing outliers first and then bootstrapping, rather than bootstrapping and re-running the whole skipped-correlation procedure, raises possible issues about dependence among order statistics. However, at the moment it is unclear which method is best.

^{2}In Schwarzkopf et al. (