Edited by: Aifric O'Sullivan, University College Dublin, Ireland

Reviewed by: Athanasios Jamurtas, University of Thessaly, Greece; Brian P. Carson, University of Limerick, Ireland

This article was submitted to Sport and Exercise Nutrition, a section of the journal Frontiers in Nutrition

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The concept of personalized nutrition and exercise prescription represents a topical and exciting progression for the discipline, given the large inter-individual variability that exists in response to virtually all performance- and health-related interventions. Appropriate interpretation of intervention-based data from an individual or group of individuals requires practitioners and researchers to consider a range of concepts, including the confounding influence of measurement error and biological variability. In addition, the means to quantify likely statistical and practical improvements are facilitated by concepts such as confidence intervals (CIs) and smallest worthwhile change (SWC). The purpose of this review is to provide accessible and applicable recommendations for practitioners and researchers who interpret and report personalized data. To achieve this, the review is structured in three sections that progressively develop a statistical framework. Section 1 explores fundamental concepts related to measurement error and describes how typical error and CIs can be used to express uncertainty in baseline measurements. Section 2 builds upon these concepts and demonstrates how CIs can be combined with the concept of SWC to assess whether meaningful improvements occur post-intervention. Finally, section 3 introduces the concept of biological variability and discusses the subsequent challenges in identifying individual response and non-response to an intervention. Worked numerical examples and interactive Supplementary Material are incorporated to solidify concepts and assist with implementation in practice.

It is widely recognized that traditional group intervention-based studies that focus on mean response are limited in the context of personalized sports nutrition, and that across populations, large inter-individual variability exists in response to health and performance related interventions. This variation occurs due to a myriad of factors, including an individual's genotype, phenotype, training status, and nutritional intake (

Key terms that will be used throughout the review have been defined in Table _{110%}, a time-to-exhaustion test]. The study design is illustrated in Figure

Definitions of key terms.

Term | Definition |
True score | A hypothetical value representing the score on a test that would be achieved if there were no measurement error. |
Measurement error | Processes that cause an observed score on a test to be different from the true score. Measurement error comprises instrumentation and/or biological noise. |
Observed score | The recorded value from a test, which comprises the true score, along with measurement error. |
Instrumentation noise | Measurement error caused solely by the measurement apparatus, while true score remains unchanged. |
Biological noise | Measurement error caused by biological processes (such as circadian rhythm, nutritional intake, sleep or motivation), while true score remains unchanged. |
Typical error | The standard deviation of observed scores in repeated tests where true score remains unchanged. |
Confidence interval | An interval that provides a range of plausible values for quantities that must be estimated (for example, true score) given the observed data. |
Biological variability | Non-intervention related processes that cause true scores to change. |
Smallest worthwhile change | A reference value selected by a practitioner or researcher to indicate a value beyond which a change in true score is likely to be meaningful in practice. |
Response | Occurs when change in true score directly attributable to an intervention exceeds the smallest worthwhile change. |

Schematic of hypothetical study design. CCT_{110%}, High-intensity cycling capacity test; B-A, Beta-alanine supplementation; PLA, Placebo.

Practitioners and researchers routinely select and evaluate interventions depending on baseline information collected from an individual. Therefore, it is essential to consider the accuracy of baseline information and account for error in any decision-making process. An individual's observed score (O_{s}) comprises a hypothetical true score (T_{s}) and measurement error (ϵ), such that O_{s} = T_{s} + ϵ (

Measurement error associated with any test comprises two primary sources, namely _{110%} to differ from the individuals true score (

As all observed measurements include error, it is important to estimate the potential magnitude of this error and thereby quantify uncertainty in any single measurement. Based on the assumption that observed scores follow a normal distribution centered on the true score, ~68% of observed scores lie in the interval T_{s} ± σ and ~95% of observed scores lie in the interval T_{s} ± 2σ (Figure

Graphical representation of the normal distribution of observed scores centered on true score. T_{s}, true score; σ, standard deviation of repeated observed scores [also referred to as typical error (TE)].

Two primary methods are available to estimate the TE of a test, including: (1) multiple repeated tests performed by a single individual; or (2) a single test-retest performed by a group of individuals. Using the first approach, the TE is estimated by calculating the standard deviation of observed scores obtained from a single individual performing multiple tests within a time-frame whereby the true score remains theoretically stable. Suitable time-frames will depend on the specific characteristics of a given test. For example, true score in the CCT_{110%} is largely dependent on the capacity of the cardiovascular and muscular systems, neither of which are likely to undergo substantial physiological changes in the absence of intervention within short time-frames. The true score for CCT_{110%} performance should therefore remain stable across days or even weeks, although biological noise in particular (e.g., motivational factors), may cause observed scores to fluctuate within this time-frame (_{110%} example, repeated performance of a high-intensity activity to exhaustion is likely to create a strong stimulus for adaptation (

Based upon the aforementioned limitations, the most popular method to estimate the TE of a test relies on multiple individuals each performing a single test-retest assessment. Because the test and the retest each contribute measurement error with variance TE^{2}, the variance of the difference scores is the sum of the two variances (TE^{2} + TE^{2} = 2TE^{2}), and the standard deviation of the difference scores is therefore √2 × TE. Therefore, to obtain the TE estimate with a group test-retest design, we first calculate the difference score for each individual, calculate the standard deviation of the difference scores, then divide this value by √2. In our hypothetical example, this gives a TE estimate for muscle carnosine content of 0.52 mmol·kg^{−1}dm. It is important to note that this calculation represents an estimate of TE and is unlikely to exactly match the real value. Therefore, we use the notation
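The group test-retest calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `typical_error_test_retest` and the scores are hypothetical and are not taken from the review's data set.

```python
import math
import statistics


def typical_error_test_retest(test, retest):
    """Estimate TE from a group test-retest design.

    TE is estimated as the standard deviation of the individual
    difference scores divided by sqrt(2).
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired scores from at least two individuals")
    diffs = [second - first for first, second in zip(test, retest)]
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return sd_diff / math.sqrt(2)


# Hypothetical scores for five individuals (illustrative values only)
test_scores = [10.0, 12.0, 11.0, 13.0, 9.0]
retest_scores = [11.0, 11.0, 12.0, 13.0, 10.0]

te_hat = typical_error_test_retest(test_scores, retest_scores)
print(f"Estimated TE: {te_hat:.2f}")
```

With only five individuals the estimate is imprecise, which is why sample-size-adjusted multiples are preferable when building CIs from small test-retest groups.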

Once an observed score and TE estimate have been obtained, a

The measurement assumptions outlined in the previous section enable practitioners to calculate various CI widths by multiplying their TE estimate by values that are based on the normal distribution. In the first row of Table ^{−1}dm, an approximate 95% true score CI for an individual with an observed score of 11.3 would equal 11.3 ± (1.96 × 0.52) = (10.3−12.3) mmol·kg^{−1}dm. It is important to note that the values provided in the first row of Table ^{−1}DM. In contrast, if ^{−1}dm. To identify the number of individuals required for a test-retest, the values presented in Table

Typical error multiples required to calculate confidence intervals of different widths (non-adjusted and adjusted for sample size).

Confidence interval width | 50% | 60% | 70% | 75% | 80% | 85% | 90% | 95% | 99% |
TE multiple non-adjusted | 0.67 | 0.84 | 1.04 | 1.15 | 1.28 | 1.44 | 1.64 | 1.96 | 2.58 |
TE multiple adjusted (n = 50) | 0.68 | 0.85 | 1.05 | 1.16 | 1.30 | 1.46 | 1.68 | 2.01 | 2.68 |
TE multiple adjusted (n = 30) | 0.68 | 0.85 | 1.06 | 1.17 | 1.31 | 1.48 | 1.70 | 2.05 | 2.76 |
TE multiple adjusted (n = 20) | 0.69 | 0.86 | 1.07 | 1.19 | 1.33 | 1.50 | 1.73 | 2.10 | 2.86 |
TE multiple adjusted (n = 10) | 0.70 | 0.88 | 1.10 | 1.23 | 1.38 | 1.57 | 1.83 | 2.26 | 3.25 |
TE multiple adjusted (n = 5) | 0.74 | 0.94 | 1.19 | 1.34 | 1.53 | 1.78 | 2.13 | 2.78 | 4.60 |
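As a worked illustration of how a TE multiple is applied, the sketch below reproduces the approximate 95% true score CI quoted in the text (observed score 11.3, TE estimate 0.52, non-adjusted multiple 1.96). The function name `true_score_ci` is a hypothetical helper introduced for this example.

```python
def true_score_ci(observed, te, multiple):
    """Return (lower, upper) bounds: observed score +/- multiple * TE."""
    half_width = multiple * te
    return observed - half_width, observed + half_width


# Values quoted in the text: observed score 11.3, TE estimate 0.52,
# non-adjusted 95% multiple 1.96 (units: mmol/kg dry muscle)
lower, upper = true_score_ci(11.3, 0.52, 1.96)
print(f"95% true score CI: {lower:.1f} to {upper:.1f}")
```

Swapping 1.96 for a sample-size-adjusted multiple from the table widens the interval to reflect uncertainty in the TE estimate itself.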

In circumstances where it is not feasible to perform repeated measurements on a single individual or group, practitioners can create CIs for true scores using reliability data published in the literature. To obtain accurate CIs it is recommended that practitioners source reliability data collected using the same test protocols employed with their own clients, and that the populations match as close as possible. TE estimates are commonly reported in reliability studies and practitioners can directly use these published values to calculate CIs using the methods described in Section 1.1.1. It is also common for researchers to report other reliability statistics that can be transformed into a TE estimate. One commonly reported reliability statistic that can easily be transformed is the coefficient of variation (CV). The coefficient of variation is a percentage that expresses the size of the TE relative to the mean [_{110%}, and therefore we describe here (and in _{110%} was 4.94% (
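Because the CV expresses the TE as a percentage of the mean, a published CV can be converted back to a TE estimate by multiplying by the relevant mean score. The sketch below assumes this simple relationship (CV% = 100 × TE / mean); the helper name `te_from_cv` and the mean of 42.0 kJ are hypothetical, while the CV of 4.94% is the value quoted above.

```python
def te_from_cv(cv_percent, mean_score):
    """Convert a coefficient of variation (%) to a typical error estimate.

    Assumes CV% = 100 * TE / mean, so TE = (CV% / 100) * mean.
    """
    return (cv_percent / 100.0) * mean_score


# CV quoted in the text; the mean score is a hypothetical value
cv_percent = 4.94
hypothetical_mean_kj = 42.0

te_estimate = te_from_cv(cv_percent, hypothetical_mean_kj)
print(f"TE estimate: {te_estimate:.2f} kJ")
```

The resulting TE estimate can then be combined with a TE multiple in the usual way to build a true score CI.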

Schematic overview of procedures to estimate typical error and calculate true score confidence intervals.

As described in the previous section, an individual's true score cannot be known due to the existence of measurement error and this uncertainty must be accounted for when interpreting pre- to post-intervention change. This requirement is particularly relevant in sports nutrition based interventions where improvements are often small in magnitude whilst many performance based outcome measures may be prone to relatively large measurement errors. For example, Jeukendrup et al. (

If we assume that measurement error of a test is not only consistent across individuals in a group, but also consistent for individuals across an intervention, then observed scores will display the same variation around the true pre-intervention score and the true post-intervention score. It follows that observed change scores (_{post} − _{pre}) are described by a normal distribution with mean equal to the true score change and standard deviation (i.e., standard deviation of the change scores) equal to ^{.}kg^{−1}DM, with participant 8 (from the beta-alanine group) displaying an observed change score (difference pre-post) of 4.37 mmol·kg^{−1}DM. For this example, we will calculate an unadjusted 50% true score change CI using the appropriate multiplier presented in Table ^{.}kg^{−1}DM. Interactive true score change CI calculators are provided in the
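A true score change CI follows the same pattern as a baseline CI, except that the standard deviation of the change scores (√2 × TE) replaces the TE. The sketch below uses the observed change of 4.37 quoted for participant 8 and the non-adjusted 50% multiple (0.67). The TE value of 0.52 is an assumption carried over from the baseline example, and `true_score_change_ci` is a hypothetical helper name.

```python
import math


def true_score_change_ci(observed_change, te, multiple):
    """CI for true score change: change +/- multiple * sqrt(2) * TE."""
    sd_change = math.sqrt(2) * te  # SD of change scores under equal pre/post error
    half_width = multiple * sd_change
    return observed_change - half_width, observed_change + half_width


# Observed change for participant 8 is quoted in the text;
# TE = 0.52 is an ASSUMED value reused from the baseline example.
lower, upper = true_score_change_ci(observed_change=4.37, te=0.52, multiple=0.67)
print(f"50% true score change CI: {lower:.2f} to {upper:.2f}")
```

Note that 0.67 × √2 is roughly 0.95, so an unadjusted 50% true score change CI is approximately the observed change plus or minus one TE.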

In the previous section we described procedures to calculate true score change CIs that provide a range of plausible values given the data observed. In practice, it is recommended that interventions are classified as successful or not for each individual based on whether CIs for true score change lie within a pre-defined region (^{*}^{1}

Interpretation of true score change confidence intervals using zero-based thresholds.

Thus far in this section, we have focused on scenarios where any true score change greater than 0 in the desired direction is considered meaningful. In many research settings, this approach will be appropriate, given that researchers are likely to deal with experimental scenarios and unknown outcomes. In contrast, in other situations, researchers and practitioners may implement interventions whereby relatively large improvements are expected, such that more substantive changes are required in order to classify an intervention as a success. Take for example our hypothetical intervention, which aims to increase muscle carnosine content through beta-alanine supplementation. Previous investigations indicate that 4 weeks of supplementation can increase muscle carnosine content by 40–60% (

Interpretation of true score change confidence intervals using smallest worthwhile change.

To effectively implement these procedures, tests that comprise appropriate measurement error relative to the SWC are required. It is recommended when implementing this approach that an _{110%} by multiplying the baseline between-participant standard deviation by 0.2 (i.e., a ‘small’ effect), providing a value of 1.6 KJ. From previous research, 4 weeks of beta-alanine supplementation has been shown to improve CCT_{110%} performance by ~10–15% (
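The SWC calculation and the resulting success criterion can be sketched as follows. This is a minimal illustration under stated assumptions: the baseline standard deviation of 8.0 kJ is the value implied by the quoted SWC (1.6 kJ = 0.2 × 8.0 kJ), the TE and observed changes are hypothetical, and success is judged when the observed change minus the TE (approximately the lower bound of a 50% true score change CI) exceeds the SWC.

```python
def smallest_worthwhile_change(baseline_sd, effect_size=0.2):
    """SWC as a fraction of the baseline between-participant SD."""
    return effect_size * baseline_sd


def intervention_successful(observed_change, te, swc):
    """True when the observed change exceeds the SWC by at least the TE."""
    return (observed_change - te) > swc


# Baseline SD implied by the quoted SWC of 1.6 kJ (1.6 / 0.2 = 8.0)
swc = smallest_worthwhile_change(baseline_sd=8.0)

# Hypothetical TE and observed changes (kJ), for illustration only
hypothetical_te = 2.0
for change in (4.0, 3.0):
    outcome = intervention_successful(change, hypothetical_te, swc)
    print(f"change = {change} kJ -> success: {outcome}")
```

The second case shows why the ratio of TE to SWC matters: an observed change almost twice the SWC is still not classified as a success when the TE is large.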

Throughout sections 1 and 2 we described procedures to quantify the level of uncertainty in baseline values, quantify the level of uncertainty in change across an intervention, and identify whether observed changes represent meaningful improvements. The procedures outlined do not, however, identify whether underlying changes occurred as a direct result of the intervention or as a result of unrelated confounding factors. Across time periods reflecting those typically used for chronic supplementation or training interventions, it is possible that an individual's true score may change due to factors external to the intervention. Take for example our 12-week hypothetical study, where CCT_{110%} was used to assess cycling capacity. High-intensity exercise performance is influenced by a wide range of factors, including nutritional intake, chronic sleep patterns and physical activity levels, with 12 weeks providing sufficient time for true scores to change in response to alterations in any of these factors. We refer to these intervention-independent causes of change as biological variability.

In the remaining sections of this review we describe procedures in group-based interventions to estimate variability in true score change directly attributable to the intervention, and, subsequently, to estimate the proportion of response in a group. The procedures outlined are required during interventions with periods long enough for true score change to occur as a result of biological variability. In contrast, many nutritional supplements (e.g., caffeine or sodium bicarbonate) acutely function after a single dose (

It is widely recognized that the most logical means of quantifying variability caused by an intervention is to include a control group or to use data from similar controls published in literature (_{IR}). In practice, this standard deviation is estimated with the following formula ^{.}kg^{−1}DM. Note, this large difference in standard deviations measured between groups provides evidence that true change directly attributable to the intervention was highly variable across participants (^{.}kg^{−1}DM, and as explained in the following section, this value can then be used to estimate the proportion of response.
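One commonly used way to estimate the intervention response standard deviation, assumed here for illustration, is to take the square root of the difference between the variance of observed change scores in the intervention group and the variance of observed change scores in the control group. The function name `intervention_response_sd` and the input values below are hypothetical.

```python
import math


def intervention_response_sd(sd_change_intervention, sd_change_control):
    """Estimate sigma_IR as sqrt(SD_int^2 - SD_con^2).

    Both inputs are standard deviations of observed change scores.
    Raises ValueError when the control group is more variable than the
    intervention group, because the variance difference is then negative.
    """
    variance_diff = sd_change_intervention ** 2 - sd_change_control ** 2
    if variance_diff < 0:
        raise ValueError("control SD exceeds intervention SD; sigma_IR undefined")
    return math.sqrt(variance_diff)


# Hypothetical change-score SDs (illustrative values only)
sigma_ir = intervention_response_sd(sd_change_intervention=5.0, sd_change_control=3.0)
print(f"sigma_IR estimate: {sigma_ir:.2f}")
```

Subtracting the control group variance removes the contribution of measurement error and biological variability, leaving variability attributable to the intervention.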


Consistent with all approaches used previously in this review to estimate quantities of interest (e.g., baseline true score, true score change, or here, proportion of response), we assume a normal distribution, such that true score change directly attributable to the intervention follows a normal distribution centered on the mean observed score change, with standard deviation equal to σ_{IR} (see Figure ^{.}kg^{−1}DM and standard deviation 5.07. If we select a SWC from standard procedures by calculating 0.2 times the baseline standard deviation, we obtain a threshold value of 2.0 mmol^{.}kg^{−1}DM. Using the interactive calculator in the
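Under the normality assumption described above, the proportion of response is the area of the normal distribution lying beyond the SWC. The sketch below uses the standard deviation (5.07) and SWC (2.0) quoted in the text. The mean observed change of 10.0 is a hypothetical value chosen for illustration, and `proportion_of_response` is a hypothetical helper name.

```python
from statistics import NormalDist


def proportion_of_response(mean_change, sigma_ir, swc):
    """Proportion of true intervention-related changes exceeding the SWC.

    Assumes changes attributable to the intervention are normally
    distributed with the given mean and standard deviation sigma_IR.
    """
    if sigma_ir <= 0:
        raise ValueError("sigma_ir must be positive")
    return 1.0 - NormalDist(mu=mean_change, sigma=sigma_ir).cdf(swc)


# sigma_IR and SWC are quoted in the text; the mean change is HYPOTHETICAL
proportion = proportion_of_response(mean_change=10.0, sigma_ir=5.07, swc=2.0)
print(f"Estimated proportion of response: {proportion:.0%}")
```

A larger σ_{IR} or a mean change closer to the SWC both reduce the estimated proportion of response.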

Proportion of intervention group response. SWC, smallest worthwhile change; σ_{IR}, Intervention response standard deviation.


Throughout this review, we have described procedures required to interpret data collected from individuals both pre- and post-intervention. Careful and deliberate procedures are required to interpret the data appropriately, due to the fact that all measurement incorporates some degree of error (measurement error = instrumentation noise + biological noise), and changes can often occur due to factors independent of the intervention (biological variability). The procedures we have outlined enable practitioners and researchers in the area of sports nutrition to (1) establish plausible baseline values; (2) assess whether meaningful changes have occurred after an intervention; and (3) estimate the proportion of individuals in a group-based intervention that responded/did not respond to the intervention. We conclude this review with a brief summary including practical recommendations.

Prior to conducting any intervention, practitioners and researchers require baseline data to direct their choice of intervention and provide initial values to monitor and assess an intervention's progress and effectiveness. Tests and measurement procedures adopted should seek to minimize measurement error, which includes both instrumentation and biological noise. It must be recognized, however, that even when the testing environment is controlled as much as possible, some degree of measurement error will always exist. Therefore, typical error should be calculated and CIs applied to baseline measurements to provide a range of plausible true scores given the data observed. Ideally, CIs should be calculated with reliability data obtained by the practitioner using the actual equipment and procedures implemented with their clients. However, where this is not feasible, it is recommended that practitioners obtain data from published reliability studies that match their own procedures as closely as possible with regards to testing protocols and participants.

In situations where CI widths are so wide as to provide no actionable baseline information, practitioners should re-consider the specific %CI used and consider whether this can be reduced given the context of the measurement. For example, 95% CIs frequently produce large ranges for true scores and practitioners have to consider whether they require the actual true score to reside within intervals calculated in 95% of occasions. Where the safety of a client is not influenced by the intervention, narrower %CIs can be justified. For example, practitioners may choose instead to construct CIs with the observed score plus/minus the estimated TE. This calculation is simple to create and maintain across spreadsheets that practitioners may create and for baseline scores provides approximate 70% true score CIs. However, if true score intervals calculated with similar %CIs still provide limited actionable information, this suggests that the test and/or measurement processes adopted create measurement errors too large to be of practical use, and therefore an alternative and more reliable test should be considered.
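The coverage implied by a simple "plus/minus k × TE" interval can be checked directly from the normal distribution. The sketch below is an illustration under the normality assumption used throughout; the function names are hypothetical. For baseline scores the relevant standard deviation is the TE, whereas for change scores it is √2 × TE.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def coverage_baseline(k):
    """Coverage of observed score +/- k * TE for a baseline true score."""
    return 2.0 * STANDARD_NORMAL.cdf(k) - 1.0


def coverage_change(k):
    """Coverage of observed change +/- k * TE for a true score change.

    The SD of change scores is sqrt(2) * TE, so k TEs equal
    k / sqrt(2) standard deviations.
    """
    return 2.0 * STANDARD_NORMAL.cdf(k / math.sqrt(2)) - 1.0


print(f"Baseline, +/- 1 TE: {coverage_baseline(1.0):.0%}")
print(f"Change,   +/- 1 TE: {coverage_change(1.0):.0%}")
```

The same ±1 TE rule therefore gives a CI of roughly 70% for a baseline score but only about 50% for a change score.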

Once an intervention has been completed, it is good practice to estimate true score change and provide a CI to identify a range of plausible values given the observed data. Such CIs represent the all cause change across the intervention and do not distinguish between change caused by the intervention and external factors. Where appropriate, practitioners can identify the SWC deemed to be of practical relevance for the individual, with success judged to occur when the observed score change plus/minus the estimated TE lie beyond the threshold set. In research settings, the threshold value may be set at 0, however, practitioners should select this value

The existence of biological variability renders it challenging to isolate true score change directly caused by the intervention. For this reason, we recommend that researchers interested in this area and limited to designs with infrequent data collection (e.g., pre-intervention and post-intervention), focus at the group level and estimate proportion of response rather than attempt to identify any one individual as a responder or non-responder, and where appropriate, attempt to identify factors associated with response/non-response [see Hopkins (

Finally, it is important to acknowledge the differences between combining the procedures outlined to identify whether an intervention was successful for an individual (e.g., true score change CIs and SWC, as demonstrated in SF-S9), and estimating the proportion of response in group-based interventions (SF-S11). With the former, there is no attempt to distinguish between intervention and non-intervention causes of change. In addition, the procedures outlined for the individual are heavily influenced by the relative magnitudes of measurement error and SWC. The approach described herein requires that an individual's observed score change exceeds the SWC by at least the TE of the test. In scenarios where the TE is large, individuals will typically require true score changes substantially beyond the SWC for an intervention to be identified as a success. Note that this conservative approach is required to avoid routinely classifying individuals as successful when their observed score changes exceed the SWC due to the randomness of measurement error alone. In contrast, the procedures described in section 3 to estimate proportion of response do distinguish between intervention and non-intervention causes of change. Estimating the proportion of response using this approach is, to some extent, less influenced by large measurement errors. This is because the effects of measurement error are accounted for by variation observed in the control group and are thus removed from the final calculation. With greater participant numbers in the intervention and control groups, estimates will become more precise and uncertainty will be reduced. As a result of these differences, it is possible that the proportion of individuals identified as experiencing a successful intervention (SF-S9) and the estimate of the proportion of response (SF-S11) will differ.
Given the infrequent data collection points routinely used in practice (e.g., pre- and post-intervention), caution is required when interpreting at the level of individuals, and it should be remembered that CIs are to be interpreted over the long run. In scenarios where large measurement errors occur, practitioners and researchers can use knowledge of group-based estimates of response to provide greater context when evaluating data observed from individuals.

A personalized approach to sports nutrition is increasing in popularity due to recognition of the myriad of factors that influence individual response to nutrition- and exercise-related interventions. The presence of measurement error and biological variability renders identification of baseline values, change values and response status challenging; thus, strategies to account for these issues have been proposed, enabling practitioners and researchers to make informed decisions and judgements from the data they collect.

ED and PS originally conceived the idea for this review. PS provided the statistical expertise and led the writing of the review, along with the development of the Supplementary Files, with support from BH. Ongoing critical input was received from BS, BG, and ED. All authors read and approved the final manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

ED (2015/11328-2 & 2017/09635-0), BS (2016/50438-0 & 2017/04973-4), and BG (2013/14746-4) were all supported by research grants from the

The Supplementary Material for this article can be found online at:

A maintained version of this Supplementary File can be found at github.com/sportscientist.

^{1}The calculation used to obtain a true score change is _{post} − _{pre})± TE we have set

For the calculation of CIs it is useful to introduce additional notation and concepts. The first is the notation 100(1 − α)%, which describes the width of the CI. Here, α is a value that we choose to set the width of the interval and, importantly, to link that width to the correct multiple of our TE estimate. For example, to set a 90% CI, α must be set to 0.1 to give 100(1 − 0.1)% = 90%. Given the consistent assumption that observed scores are normally distributed, we invoke the relevant properties of the distribution, such that a 100(1 − α)% CI for true score is obtained with O_{s} ± TE × z_{(1−α/2)}, where O_{s} is the observed score. The coefficient z_{(1−α/2)} is referred to as the (1 − α/2)-th quantile of the standard normal distribution. In our example where we set α to 0.1 (i.e., for a 90% confidence interval), we require z_{(1−0.1/2)}, or the 0.95th quantile of the standard normal distribution. To obtain this value we can look up standard statistical tables or use software such as MS Excel. Using these methods, we find that z_{0.95} is equal to 1.64 and so a 90% true score CI for an individual would equal O_{s} ± 1.64 × TE.
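The quantile lookup described above can also be done with Python's standard library instead of statistical tables or a spreadsheet. The sketch below is illustrative only; `te_multiple` is a hypothetical helper name. It reproduces the non-adjusted TE multiples for the CI widths used in this review.

```python
from statistics import NormalDist


def te_multiple(ci_percent):
    """Non-adjusted TE multiple for a 100(1 - alpha)% CI.

    Returns the (1 - alpha/2)-th quantile of the standard normal
    distribution, where alpha = 1 - ci_percent / 100.
    """
    alpha = 1.0 - ci_percent / 100.0
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


ci_widths = [50, 60, 70, 75, 80, 85, 90, 95, 99]
multiples = [round(te_multiple(width), 2) for width in ci_widths]

for width, multiple in zip(ci_widths, multiples):
    print(f"{width}% CI: TE x {multiple:.2f}")
```

These values apply when the TE is treated as known; when it is estimated from a small test-retest sample, the larger sample-size-adjusted multiples are more appropriate.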

It is important to acknowledge that we can never definitively state the TE and studies only report imperfect estimates _{19, 0.95} = 1.73 and so our 90% true score CI is calculated with _{19, 0.75} = 0.69 to give