Edited by: Fabio Galbusera, Galeazzi Orthopedic Institute (IRCCS), Italy

Reviewed by: Tito Bassani, Galeazzi Orthopedic Institute (IRCCS), Italy; Nicola Francesco Lopomo, University of Brescia, Italy

This article was submitted to Biomechanics, a section of the journal Frontiers in Bioengineering and Biotechnology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

This study aimed to compare the accuracy of machine learning algorithms in classifying data from healthy subjects and patients with Parkinson’s disease (PD) across different time window lengths and numbers of features. Thirty-two healthy subjects and eighteen patients with PD took part in this study. Inertial recordings were obtained with an accelerometer and a gyroscope placed on both hands of the subjects during the hand resting state. We extracted time and temporal frequency domain features to feed seven machine learning algorithms: k-nearest-neighbors (

More than 6.1 million people worldwide are affected by Parkinson’s disease (PD) (

Literature has proposed alternative ways to quantify PD symptoms in order to assist its diagnosis and progression (

Although many investigations have evaluated the machine learning classifier performance to precisely categorize the inertial measurements from patients with PD, there are few methodological studies concerning the influence of the technical parameters of this kind of approach. Parameters like the time interval of the inertial sensor readings, type of features extracted from the inertial sensor readings, the number of features used, the type of machine learning classifier, and the type of inertial sensor used have potential to increase or decrease the accuracy of the algorithm (

Studies that used inertial sensor features as input to machine learning algorithms to evaluate hand tremor in PD patients.

References | Hand activity | Sensor (AR) | Recording duration | Methods of classification | Accuracy |

Resting tremor | Acc and gyros (200 Hz) | 25–30 s | Support vector machine | 59–88.9% | |

Kinetic tremor | Acc (100 Hz) | 5 s | Support vector machine | 100% | |

Kinetic tremor | Gyros (100 Hz) | 10 s | Support vector machine, logistic regression, neural network classifier | 76.2–83.1% | |

Finger tapping | Acc (167 Hz) | Free | Ordinal logistic regression | 87.2–96.5% | |

Resting tremor | Acc (125 Hz) | 10 s | SVM, decision tree, random forest, discriminant analysis | 80.9–85.6% | |

Several investigations have used a number of machine learning algorithms to classify and/or to quantify the resting hand tremor of patients with PD, obtaining high accuracy levels (

Several studies have segmented inertial recordings in different window size durations to extract dozens or hundreds of features that fed a machine learning algorithm (

The present study aimed to compare the performance of machine learning algorithms in classifying inertial sensor recordings as belonging to healthy people or to patients with PD, considering different numbers of features extracted from inertial recordings of various window lengths. The results may help in choosing the best parameters for classifying inertial sensor measurements with machine learning algorithms.

All individual participants included in this study gave us their informed and written consent. Every procedure carried out in the present study was in accordance with the ethical standards of the Ethics Committee in Research with Humans from the University Hospital João de Barros Barreto (report #1.338.241) and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Our sample comprised 50 right-handed participants grouped into healthy control participants (

We used a MetaMotionC wearable device (mbientlab, San Francisco, United States) with on-board sensors, including a triple-axis gyroscope and an accelerometer (16 bits, ± 2000°/s, ± 16 g). Researchers positioned a wearable device over each participant’s third metacarpal bone, midway between the carpal and digital extremities of the metacarpal (

IMU Positioning in the hand of the participant.

To carry out the data analysis, researchers wrote Python scripts (Python v3.7.4) using SciPy (version 1.3.1), NumPy (version 1.17.2), PyWavelets (version 1.0.3), and LibROSA (version 0.7.2). SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering; NumPy is a Python library for operating on arrays; LibROSA is a Python package that provides the building blocks necessary to create music information retrieval systems; and PyWavelets is open-source wavelet transform software for Python.

Our sequence of analysis consisted of: (1) inertial recordings; (2) raw data filtering; (3) segmentation of the time series in different sets of waveform lengths; (4) data normalization; (5) extraction of features; (6) selection of the best features; (7–8) performing machine learning algorithms with training and test phases; and (9) measuring machine learning performance.

Flow chart of the data analysis steps.

We computed a magnitude vector from each sensor dimension (x, y, and z) using Eq. (1), which is less sensitive to orientation changes (

where

After this, we applied the

We segmented the inertial recordings into fixed-size windows, with no gaps and no overlap between adjacent windows. Each time series was segmented into sets of waveforms with window sizes of 1, 5, 10, and 15 s.
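The magnitude and segmentation steps can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that Eq. (1) is the usual Euclidean norm of the x, y, and z samples, that trailing samples which do not fill a whole window are discarded, and that the recording is sampled at 100 Hz. The function names, the sampling rate, and the synthetic recording are all hypothetical.

```python
import numpy as np


def magnitude(xyz):
    """Euclidean norm of each (x, y, z) sample; xyz has shape (n_samples, 3)."""
    xyz = np.asarray(xyz, dtype=float)
    return np.sqrt(np.sum(xyz ** 2, axis=1))


def segment(signal, sampling_rate_hz, window_s):
    """Split a 1-D signal into adjacent, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped
    (an assumption; the source does not say how remainders were handled).
    """
    signal = np.asarray(signal)
    samples_per_window = int(round(sampling_rate_hz * window_s))
    n_windows = signal.size // samples_per_window
    trimmed = signal[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window)


# Hypothetical 30 s recording at 100 Hz (illustrative values only).
rng = np.random.default_rng(0)
recording = rng.normal(size=(3000, 3))
mag = magnitude(recording)
windows = {w: segment(mag, 100, w) for w in (1, 5, 10, 15)}
```

Each entry of `windows` is a 2-D array with one row per window, so features can be computed row by row.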

We extracted features from the time and temporal frequency domains for each sensor dimension.

Features extracted from the inertial readings.

Features | Python code |

Range | range = values.max() - values.min() |

Standard deviation | std = values.std() |

Root mean square | rms = numpy.sqrt(numpy.mean(values**2)) |

Skewness | sk = scipy.stats.skew(values) |

Kurtosis | kt = scipy.stats.kurtosis(values) |

Linear prediction coefficients | lp_coefs = librosa.lpc(values, 3) |

Wavelet transform detail coefficients (cD) | _, cD = pywt.dwt(values, 'db3') |

cD variance | variance = numpy.var(cD) |

cD entropy | def approximate_entropy(U, m = 2, r = 3): |

U = numpy.array(U) | |

N = U.shape[0] | |

def phi(m): | |

z = N - m + 1.0 | |

x = numpy.array([U[i:i + m] \ | |

for i in range(int(z))]) | |

x_ = numpy.repeat(x[:, \ | |

numpy.newaxis], 1, axis = 2) | |

C = numpy.sum(numpy.absolute(x - \ | |

x_).max(axis = 2) <= r, \ | |

axis = 0)/z | |

return numpy.log(C).sum()/z | |

entropy = abs(phi(m + 1) - phi(m)) | |

Third order cumulant | third_order_cum = scipy.stats.moment(values, moment = 3) |

Peak of energy | p_tf = frequency_values.max() |

Frequency at the peak energy | xf = numpy.linspace(0, af/2, |

frequency_values.size) | |

tf_p = xf[numpy.argmax(frequency_values)] | |

Skewness_tf | sk_tf = scipy.stats.skew(frequency_values) |

Kurtosis_tf | kt_tf = scipy.stats.kurtosis(frequency_values) |

Mean frequency | def mean_frequency(frequency_values): |

xf = numpy.linspace(0, af/2, | |

frequency_values.size) | |

mask = xf >= 1 # keep only the bins at or above 1 Hz | |

xf = xf[mask] | |

frequency_values = frequency_values[mask] # same length as xf | |

total_area = numpy.trapz(frequency_values, xf) | |

for i, x in enumerate(xf): | |

partial_area = numpy.trapz(frequency_values[:i], | |

xf[:i]) | |

if partial_area > total_area/2: | |

return xf[i-1] # frequency that splits the spectral area in half | |

Power ratio (1–6 Hz/6–12 Hz) | xf = numpy.linspace(0, af/2, |

frequency_values.size) | |

num = frequency_values[(xf >= 1) & | |

(xf <= 6)] | |

den = frequency_values[(xf >= 6) & | |

(xf <= 12)] | |

power_ratio = num.mean()/den.mean() |
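The frequency-domain rows of the table rely on `frequency_values` and `af`, which are not defined in the table itself. The sketch below shows one way these could be obtained, assuming that `frequency_values` is the single-sided FFT amplitude spectrum of a window and that `af` is the acquisition frequency in Hz. Both are assumptions, and the 100 Hz rate and the synthetic signal are illustrative only.

```python
import numpy as np

af = 100.0  # hypothetical acquisition frequency in Hz


def amplitude_spectrum(values):
    """Single-sided FFT amplitude spectrum of a 1-D window (one possible definition)."""
    values = np.asarray(values, dtype=float)
    return np.abs(np.fft.rfft(values)) / values.size


# Synthetic 5 s window: a 5 Hz oscillation plus a weaker 9 Hz component.
t = np.arange(500) / af
values = np.sin(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 9 * t)

frequency_values = amplitude_spectrum(values)
xf = np.linspace(0, af / 2, frequency_values.size)

# Frequency at the peak energy and the 1-6 Hz / 6-12 Hz ratio, as in the table.
peak_frequency = xf[np.argmax(frequency_values)]
low_band = frequency_values[(xf >= 1) & (xf <= 6)]
high_band = frequency_values[(xf >= 6) & (xf <= 12)]
power_ratio = low_band.mean() / high_band.mean()
```

With this synthetic signal the dominant 5 Hz component falls in the lower band, so the ratio exceeds 1.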

The study extracted 272 features from each one of our participants, considering data extracted: (a) from each one of their hands (dominant and non-dominant); (b) from each inertial sensor parameter (accelerometer and gyroscope); and, (c) from the four dimensions of each sensor (
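The total of 272 features can be checked with simple arithmetic. The decomposition below is one reading that is consistent with the total, not a statement from the source: it assumes that the four dimensions are x, y, z, and the magnitude vector, and that each signal yields 17 scalar features (the 14 single-valued entries of the feature table plus three linear prediction coefficients).

```python
hands = ["dominant", "non-dominant"]
sensors = ["accelerometer", "gyroscope"]
dimensions = ["x", "y", "z", "magnitude"]  # assumed fourth dimension

# Assumed: 14 single-valued features plus 3 linear prediction coefficients.
features_per_signal = 14 + 3

signals_per_participant = len(hands) * len(sensors) * len(dimensions)
total_features = signals_per_participant * features_per_signal
```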

The study applied

The study used algorithm

To validate the predictive models, we applied the tenfold cross-validation method by using the
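The idea behind tenfold cross-validation can be sketched without any library. The helper below is hypothetical and dependency-free, and it is not the function the authors used: it shuffles the sample indices, splits them into ten folds, and uses each fold once for testing while the remaining nine are used for training.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Illustrative split of 50 samples into 10 folds of 5.
splits = list(k_fold_indices(n_samples=50, k=10))
```

The sketch splits individual samples. Whether the study formed folds per window or per participant is not stated in the text above.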

We applied seven types of machine learning algorithms to classify the data from both healthy and PD groups. The algorithms were:

The following sentences describe the Python functions used to run the machine learning algorithms, as well as the parameters that differed from their default values. These parameters were changed to protect the models from overfitting.

Support Vector Classifier (SVC): an SVC algorithm was applied (

Logistic Regression (LR): a binary logistic regression algorithm

Linear Discriminant Analysis (LDA): the study applied the function

Random Forest (RF): we used the function

Decision Tree (DT): similarly to the random forest classifiers, the tree algorithm was run using the

Gaussian Naïve Bayes (GNB): the function used to run the Gaussian Naïve Bayes algorithm was the

Accuracy, calculated with Eq. (3), measured the success levels of the classifiers:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

where TP is the true positive value; TN is the true negative value; FP is the false positive value; and, FN is the false negative value.
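As a small worked example, accuracy can be computed directly from the four counts defined above, assuming the standard definition of accuracy as the share of correct predictions. The function name and the counts are illustrative and are not results from this study.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Illustrative counts: 46 correct predictions out of 50.
example = accuracy(tp=16, tn=30, fp=2, fn=2)
```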

The study applied the unpaired

Accelerometric and gyroscopic recordings as a function of the time (upper rows) and temporal frequency (lower row) from representative participants of the control and PD groups, using the time window of 5 s. Recordings were carried out on the non-dominant and dominant hands (red and green lines, respectively).

Regardless of time window length, the most important features detected were mean frequency, linear prediction coefficients, power ratio, and the power density skewness and kurtosis.

Most important features extracted from recordings lasting 1 s

Most of the comparisons had significant differences between training and testing phases. Whenever statistical significance (

The comparisons with no statistical significance were in time windows of:

1 s: random forest algorithm using all features and 70% of them, GNB using 50 and 10%;

5 s: GNB using all features, 70 and 50% of them;

10 s: GNB using 30 and 10% of the features;

15 s: GNB using all features, 70, 50, and 10% of them, SVC using all features, 70 and 50% of them, LDA using all features and 70% of them, LR using 50% of the features, and RF using 30% of the features.

Comparison of the classifiers’ performance in the training (solid bars) and testing (empty bars) phases according to the number of features and time window length.

In general, the effects of the machine learning phases on the accuracies were statistically significant. The main effect for classifier type yielded an

Number of victories of each classifier in the significant multiple comparisons for each number of feature condition.

Algorithm | Number of features: 100% | 70% | 50% | 30% | 10% |

SVC | 5 | 5 | 3 | 0 | 4 |

GNB | 12 | 16 | 16 | 13 | 2 |

RF | 40 | 40 | 39 | 31 | 27 |

kNN | 54 | 58 | 61 | 50 | 50 |

LR | 53 | 48 | 41 | 31 | 6 |

LDA | 34 | 38 | 35 | 27 | 3 |

DT | 36 | 37 | 34 | 28 | 5 |

Number of significant multiple comparisons | 234 | 242 | 229 | 180 | 97 |

χ² | 63.53 | 57.72 | 63.50 | 57.38 | 142.51 |

p | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |

The main effect for time window length yielded an

Number of victories per time window length in the significant multiple comparisons for each number of feature condition.

Time window length | Number of features: 100% | 70% | 50% | 30% | 10% |

1 s | 58 | 61 | 54 | 39 | 12 |

5 s | 64 | 68 | 66 | 52 | 35 |

10 s | 60 | 62 | 60 | 47 | 27 |

15 s | 52 | 51 | 49 | 42 | 23 |

Number of significant multiple comparisons | 234 | 242 | 229 | 180 | 97 |

χ² | 1.28 | 2.46 | 2.84 | 2.17 | 11.33 |

p | 0.73 | 0.48 | 0.51 | 0.53 | <0.01 |

The interaction effect was significant for all numbers of features conditions (for all the features:

Comparison of the classifier’s performance in the testing phase when using all the features

This paper assessed hand tremor in individuals with PD and healthy controls by using machine learning algorithms based on inertial sensor recordings. Our objectives were: (i) to identify the best machine learning algorithms to classify hand tremor using inertial data; (ii) to describe the best recording duration for the classification methods; and (iii) to establish the number of features necessary for the best performance of the algorithms.

Concerning these objectives, the results of this study showed that the

Many types of machine learning classifiers have been used to analyze PD tremor (

The

Random Forest is a combination of multiple tree predictors that make decisions based on random vectors of features. The RF decision is the most common decision among the collection of tree classifiers (

Logistic Regression is a classification algorithm that uses a logistic sigmoid function to assign observations to two or more classes.
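The role of the sigmoid in a binary logistic regression can be illustrated with a few lines of plain Python. This is a generic sketch, not the configuration used in the study: the weights, bias, and 0.5 decision threshold are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted label) for one observation."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(score)
    return probability, int(probability >= threshold)


# Hypothetical two-feature observation and model parameters.
probability, label = predict([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

The linear score is squashed into a probability between 0 and 1, and the threshold turns that probability into a class label.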

Both GNB and SVC had the worst outcomes. When compared with other algorithms, the GNB classifier delivered lower (

It is important to highlight that the performance of classifiers in different studies must be compared with care. Each study implements different parameters in the algorithms, and these are not always fully described. Furthermore, the number and type of features may influence classifier accuracy. The present study observed that a small number of features makes the classifiers’ decisions more similar, while a larger number of features allows the classifiers’ performance to be distinguished, reaching a plateau at around 176 features. One must find a trade-off between the number of features and the computational cost of each algorithm, especially when implementing such a method on wearable or mobile devices.

The use of machine learning algorithms to recognize patterns of human motion requires the segmentation of motion recording time series. Previous studies have segmented time series in different lengths for pattern recognition tasks (

This study evaluated the accuracy of classifiers using different time window lengths. We observed that recordings lasting 5 s or 1 s delivered the highest accuracy levels. The study also found an interaction between time window length and classifier, indicating that some classifiers were better at analyzing short recordings (i.e.,

The most common features extracted from inertial readings express the amplitude of oscillatory series, their spectral content, regularity, and coherence (

We based our approach exclusively on accelerometer and gyroscope sensors, though other sensors are reported in the literature to quantify PD hand tremor using machine learning algorithms. For example,

This study has some potential limitations that deserve further comment. To date, research on this topic has been exploratory. There are no guidelines on the use of machine learning approaches to quantify hand tremor in PD patients, nor established parameters for the choice of inertial sensors. A larger sample size and longitudinal follow-up could reinforce the present interpretations.

The present study suggested

All datasets generated for this study are included in the article/

The studies involving human participants were reviewed and approved by the Ethics Committee in Research with Humans from the University Hospital João de Barros Barreto. The patients/participants provided their written informed consent to participate in this study.

GS, AK, GP, and AAC conceived of the presented idea. ES and GP performed the computations. AA, KS, VF, FS, and RL collected the inertial recordings. LK and BS-L collected the clinical data. AA, ES, GS, AAC, and BC verified the analytical methods. ASC and AB contributed to the interpretation of the results. GS and AAC drafted the manuscript. All authors discussed the results and contributed to the final manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: