This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Cost-effective milking plans have been adopted to supplement the standard supervised twice-daily monthly testing scheme since the 1960s. Various methods have been proposed to estimate daily milk yields (DMY), focusing on yield correction factors. The present study evaluated the performance of existing statistical methods, including a recently proposed exponential regression model, for estimating DMY using 10-fold cross-validation in Holstein and Jersey cows. The initial approach doubled the morning (AM) or evening (PM) yield as the estimated DMY in AM-PM plans, assuming equal 12-h AM and PM milking intervals. In reality, however, AM milking intervals tended to be longer than PM milking intervals. Additive correction factors (ACF) provided additive adjustments beyond twice the AM or PM yield. Hence, an ACF model equivalently assumed a fixed regression coefficient, or multiplier, of 2.0 for AM or PM yields. Similarly, a linear regression model was viewed as an ACF model, yet it estimated the regression coefficient for a single milk yield from the data. Multiplicative correction factors (MCF) represented ratios of daily to partial milk yield. Hence, multiplying the yield from a single milking by an appropriate MCF gave a DMY estimate. The exponential regression model was analogous to an exponential growth function with the yield from a single milking as the initial state and the rate of change tuned by a linear function of milking interval. In the present study, all the methods had high precision in the estimates, but they differed considerably in biases. Overall, the MCF and linear regression models had smaller squared biases and greater accuracies for estimating DMY than the ACF models. The exponential regression model had the greatest accuracies and smallest squared biases. Model parameters were compared. Discretized milking interval categories led to a loss of accuracy of the estimates.
Characterization of ACF and MCF revealed their similarities and dissimilarities, as well as the biases arising from unequal milking intervals. The present study focused on estimating DMY in AM-PM milking plans. Yet, the methods and relevant principles are generally applicable to cows milked more than two times a day.
Accurate milking data are essential for herd management and genetic improvement in dairy cattle. In reality, lactation (305 days) yields are not directly measured, but they are calculated from the test-day yields, either with or without explicitly imputing DMY for non-test dates (
Nevertheless, test-day yields are not directly measured either. In the US, reduced-cost milking plans started to displace the standard supervised twice-daily, monthly testing scheme in the 1960s, motivated by the goal of reducing visits by a DHIA supervisor (
Various methods have been proposed to estimate daily milk, fat, and protein yields. The landmark developments date to the 1980s and 1990s, focusing on adjustment criteria in two broad categories, namely, additive (
On the other hand, MCFs are ratios of daily yield to yield from a single milking, computed for each MIC. MCFs are also referred to as ratio factors. Multiplying a yield from a single milking by an appropriate ratio factor gives an estimate of daily yield. Various MCF forms have been proposed, yet the statistical interpretations differ (
Previous studies almost exclusively assessed the accuracy of estimated daily yield in the same datasets from which the correction factors were derived (
Milking records were extracted from the data repositories maintained by the Council for Dairy Cattle Breeding (
Number (n) and percentage (%n) of milking records by parities, lactation years, and states in the Holstein and Jersey cattle, respectively.
Variable  Level  Holstein n  Holstein %n  Jersey n  Jersey %n
Parity  1  3,006  39.9  366  30.6
Parity  2  4,482  59.4  831  69.4
Parity  3+  56  0.70  0  0
Parity  SUM  7,544  100  1,197  100
Year  2006  153  2.00  434  36.3
Year  2007  338  4.50  0  0
Year  2008  7,000  92.8  360  30.1
Year  2009  53  0.70  403  33.7
Year  SUM  7,544  100  1,197  100
State  Vermont  1,738  23.0  4  0.30
State  New York  361  4.80  182  15.2
State  Pennsylvania  1,224  16.2  333  27.8
State  Indiana  375  5.00  206  17.2
State  Minnesota  338  4.50  0  0
State  Iowa  153  2.00  434  36.3
State  Delaware  511  6.80  2  0.20
State  Maryland  900  11.9  0  0
State  West Virginia  252  3.30  0  0
State  Georgia  945  12.5  36  3.00
State  Florida  747  9.90  0  0
State  SUM  7,544  100  1,197  100
The initial AM-PM milking plan alternately sampled the AM or PM milking on a test day throughout lactation, and the daily yield was obtained by doubling the single milk weight recorded on each test day (
Additive correction factors are evaluated by the expected values of the differences between AM and PM yields, computed locally for each MIC, coupled with other categorical variables such as lactation months (
Given the computed ACF and a single milk yield that has been measured for cow
In the aforementioned equation, we see that an ACF model is equivalent to a regression model assuming a fixed regression coefficient (2.0) for AM or PM yield. ACF models can be fit on AM or PM milk yields separately or jointly.
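The equivalence noted above can be illustrated with a minimal sketch. It assumes the ACF for a milking interval class (MIC) is the class mean of the difference between the unmeasured and the measured milking, which equals the class mean of DMY minus twice the measured yield. The toy records, the 1-h class width, and the function names (`interval_class`, `compute_acf`, `estimate_dmy`) are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (milking interval in h, known-milking yield in kg,
# other-milking yield in kg). Values are illustrative, not from the study.
RECORDS = [
    (11.2, 14.0, 16.0), (11.7, 15.0, 16.0),
    (12.1, 16.0, 16.0), (12.6, 17.0, 16.0),
    (13.3, 18.0, 15.0), (13.8, 19.0, 16.0),
]


def interval_class(interval_h):
    """Assign a milking interval to a 1-h class labeled by its lower bound."""
    return math.floor(interval_h)


def compute_acf(records):
    """ACF per class = mean of (DMY - 2 * known yield) = mean of (other - known)."""
    diffs = defaultdict(list)
    for interval_h, known, other in records:
        diffs[interval_class(interval_h)].append(other - known)
    return {mic: mean(values) for mic, values in diffs.items()}


def estimate_dmy(known_yield, interval_h, acf):
    """DMY estimate: fixed multiplier 2.0 on the known yield plus the class ACF."""
    return 2.0 * known_yield + acf[interval_class(interval_h)]


ACF = compute_acf(RECORDS)
dmy_hat = estimate_dmy(15.0, 11.5, ACF)
```

In this sketch the multiplier on the known yield is never estimated; only the additive term changes with the class. A linear regression model would instead estimate that multiplier from the data.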
An ACF model can also be fitted with continuous variables for milking interval (denoted by
Given the estimated model parameters, DMY is estimated by
The linear model approach treats DMY as the response variable. Let
In
Linear regression also offers two methods of estimating DMY. First, DMY for a cow can be estimated directly given the estimated model parameters in
The aforementioned equation is referred to as the model M3A. Second, ACF can be computed on discretized MIC, following the same formula as (7), and then DMY are estimated by the following (denoted by M3B):
Linear regression models can be defined with varying complexity (
Given the estimated model parameters, DMY is estimated directly as follows:
MCF could be derived similar to M2B, yet considering the quadratic terms, but they were not evaluated in the present study.
In the aforementioned equation,
Given the estimated PM MCF, the AM MCF can be computed indirectly (
Given the MCF (
In
Given the computed MCF, DMY is estimated by
The aforementioned model also applies to cows milked more than three times and, arguably, it applies to cows milked twice a day. In the latter case, however, the model is subject to the violation of linearity with a longer milking interval (
In the latter case (denoted by M7B), MCF are obtained by locally taking the expected value on both sides of
Similarly, by taking the first-order Taylor series approximation of
Considering milking interval and days in milk, the exponential regression model for estimating DMY takes the following form (
By noting
The model parameters can be estimated by taking the following logarithm transformation:
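As an illustration of the log-linearization idea, the sketch below assumes the exponential regression model takes the form DMY = x^beta * exp(alpha + b*t + gamma*d), where x is the yield from a single milking, t is the milking interval, and d is days in milk. This form is one reading of the model description and may not match the authors' exact parameterization. The symbols, the parameter values, and the helper names (`solve_linear_system`, `fit_ols`, `exp_model`) are hypothetical. Taking logarithms gives a model that is linear in the parameters, so ordinary least squares on the log scale recovers them from noise-free simulated data.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def exp_model(x, t, d, alpha, b, gamma, beta):
    """Assumed form: DMY = x**beta * exp(alpha + b*t + gamma*d)."""
    return x ** beta * math.exp(alpha + b * t + gamma * d)


# Hypothetical "true" parameters, used only to simulate noise-free data.
ALPHA, B, GAMMA, BETA = 1.8, -0.06, 0.0002, 0.85

grid = [(x, t, d)
        for x in (8.0, 12.0, 16.0, 20.0)          # single-milking yield, kg
        for t in (10.0, 11.0, 12.0, 13.0, 14.0)   # milking interval, h
        for d in (30.0, 150.0, 270.0)]            # days in milk

# Log scale: log(DMY) = alpha + b*t + gamma*d + beta*log(x), linear in parameters.
design = [[1.0, t, d, math.log(x)] for x, t, d in grid]
log_dmy = [math.log(exp_model(x, t, d, ALPHA, B, GAMMA, BETA)) for x, t, d in grid]

alpha_hat, b_hat, gamma_hat, beta_hat = fit_ols(design, log_dmy)

# Direct approach: plug the estimates back into the exponential form.
dmy_hat = exp_model(15.0, 12.0, 100.0, alpha_hat, b_hat, gamma_hat, beta_hat)
```

The normal-equation solver is used here only to keep the sketch free of third-party dependencies; a standard least-squares routine would serve the same purpose.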
As a direct approach, DMY is estimated given the model parameter estimates (
Alternatively, MCF is computed locally for discretized MIC (
The logarithm linear regression also suggests that ACF can be computed for estimating
The performance of eight selected models and two strategies (
Statistical methods and correction factors used in the present study
Model  Equation  Additive (
M0
M1
M2A
M2B
M3A
M3B
M4
M5
M6
M7A
M7B
M8A
M8B
M0 = daily milk yield (DMY) estimated by doubling morning (AM) or evening (PM) milk yield; M1 = additive correction factor (ACF) model with categorical milking interval classes (MIC) and lactation months; M2A = ACF model with continuous variables for milking interval and days in milk (DIM); M2B = M2A with ACF computed on discretized MIC; M3A = linear regression of daily milk yield on milking interval and DIM; M3B = M3A with ACF computed on discretized MIC; M4 = M3A with quadratic terms for milking interval and DIM; M5 = multiplicative correction factor (MCF) model according to
 = computing yield correction factors is not required.
The correlation between the estimate and actual DMY and the following
Here,
To infer the origin of errors, the mean squared error (MSE) of DMY estimates from the 10-fold cross-validation was decomposed into the variance (
In the aforementioned equation,
Cubic smoothing splines of the individual
Let
In the aforementioned equation,
In the Holstein cows, the mean and median of AM milking intervals were 12.3 h and 12.1 h, respectively, whereas the mean and median of PM milking intervals were 11.6 h and 11.9 h, respectively. The AM milking intervals had a wider range (5.6–23.67 h) than the PM milking intervals (5.0–18.4 h) (
Distributions of morning (AM) and evening (PM) milking interval time in Holstein cows
Longer AM milking intervals led to greater average AM milk yields (
Distribution of morning (AM) and evening (PM) milk yields in Holstein cows
Accuracy and precision are two primary measures of observational or estimation errors. For estimating DMY, accuracy tells how close an estimated DMY is to the actual value, whereas precision shows how well the estimates agree with each other. Precision was measured by the inverse of the variance of DMY estimates: the smaller the variance, the greater the precision. The decomposed MSE are shown in
Decomposed mean squared error,
Method  Holstein Var  Holstein Bias^{2}  Holstein MSE  Holstein Acc  Holstein Cor  Jersey Var  Jersey Bias^{2}  Jersey MSE  Jersey Acc  Jersey Cor
M0  0  22.8  22.8  0.821 (0)  0.927 (0)  0.000  14.54  14.54  0.798 (0)  0.948 (0)  
M1  0.003  11.3  11.3  0.902 (<0.001)  0.951 (<0.001)  0.012  6.718  6.730  0.895 (<0.001)  0.952 (0.001)  
M2A  <0.001  11.3  11.3  0.902 (<0.001)  0.951 (<0.001)  0.002  6.910  6.912  0.892 (<0.001)  0.952 (<0.001)  
M2B  <0.001  11.4  11.4  0.902 (<0.001)  0.951 (<0.001)  0.002  6.746  6.748  0.895 (<0.001)  0.952 (<0.001)  
M3A  <0.001  10.3  10.3  0.910 (<0.001)  0.951 (<0.001)  0.002  6.078  6.080  0.904 (<0.001)  0.953 (<0.001)  
M3B  <0.001  10.3  10.3  0.910 (<0.001)  0.951 (<0.001)  0.003  6.226  6.229  0.902 (<0.001)  0.952 (<0.001)  
M4  <0.001  10.2  10.2  0.911 (<0.001)  0.952 (<0.001)  0.025  6.280  6.305  0.901 (<0.001)  0.953 (<0.001)  
M5  0.002  11.0  11.0  0.905 (<0.001)  0.951 (<0.001)  0.029  6.707  6.736  0.895 (<0.001)  0.954 (<0.001)  
M6  0.001  11.0  11.0  0.904 (<0.001)  0.952 (<0.001)  0.008  6.517  6.525  0.898 (<0.001)  0.953 (<0.001)  
M7A  <0.001  10.9  10.9  0.905 (<0.001)  0.952 (<0.001)  0.002  6.570  6.572  0.897 (<0.001)  0.954 (<0.001)  
M7B  <0.001  11.0  11.0  0.904 (<0.001)  0.951 (<0.001)  0.004  6.910  6.914  0.892 (<0.001)  0.943 (<0.001)  
M8A  0.001  10.1  10.1  0.912 (<0.001)  0.952 (<0.001)  0.003  6.072  6.075  0.905 (<0.001)  0.954 (<0.001)  
M8B  0.001  11.0  11.0  0.910 (<0.001)  0.952 (<0.001)  0.010  6.088  6.098  0.903 (<0.001)  0.953 (<0.001) 
M0 = daily milk yield (DMY) estimated by doubling morning (AM) or evening (PM) milk yield; M1 = additive correction factor (ACF) model with categorical milking interval classes (MIC) and lactation months; M2A = ACF model with continuous variables for milking interval and days in milk (DIM); M2B = M2A with ACF computed on discretized MIC; M3A = linear regression of daily milk yield on milking interval and DIM; M3B = M3A with ACF computed on discretized MIC; M4 = M3A with quadratic terms for milking interval and DIM; M5 = multiplicative correction factor (MCF) model according to
Var = variance; Bias^{2} = squared bias; MSE = mean squared error; Acc =
Numbers in the brackets were standard errors of the
The standard deviation of the mean
Correlation has been widely used to measure prediction accuracy, e.g., in genomic prediction and machine learning. However, correlation is not as informative as the
A couple of reasons are worth noting for the lower accuracies of the ACF models compared with the linear regression models. First, an ACF model is equivalent to assuming a fixed regression coefficient for partial milk yield, which can limit its predictive ability. For example, consider the models M2A and M2B. These two models can be rearranged into linear regression models of DMY on milking interval and DIM, plus a variable for AM or PM milk yield with a fixed regression coefficient (
Concerning an ACF or MCF model with continuous variables for milking intervals and DIM, discretizing a continuous variable to a categorical variable often leads to loss of information (and, therefore, accuracy) to some extent.
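The loss of information from discretization can be sketched for an MCF model. The sketch assumes the expected proportion of DMY contributed by the measured milking is linear in its milking interval, so that the MCF is the reciprocal of that proportion. The coefficients `B0` and `B1`, the 1-h class width, and the use of the class midpoint as the representative interval are all hypothetical choices for illustration, not the study's estimates or procedure.

```python
import math

# Hypothetical coefficients: proportion of DMY from the measured milking,
# modeled as B0 + B1 * interval (interval in hours).
B0, B1 = 0.07, 0.036


def mcf_continuous(interval_h):
    """MCF evaluated at the exact milking interval."""
    return 1.0 / (B0 + B1 * interval_h)


def mcf_discretized(interval_h, width_h=1.0):
    """MCF shared by every interval in the same class (evaluated at the midpoint)."""
    lower = math.floor(interval_h / width_h) * width_h
    return mcf_continuous(lower + width_h / 2.0)


single_yield = 16.0  # kg, hypothetical yield from the measured milking

# Two cows in the same 12-13 h class but with different actual intervals.
dmy_cont_short = single_yield * mcf_continuous(12.1)
dmy_cont_long = single_yield * mcf_continuous(12.9)
dmy_disc_short = single_yield * mcf_discretized(12.1)
dmy_disc_long = single_yield * mcf_discretized(12.9)
```

With the continuous function, the two cows receive different factors. After discretization they receive the same factor, so the within-class variation in milking interval no longer contributes to the estimate.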
Relative to model M0 (doubling AM or PM milk yields), ACF and MCF models have considerably improved the DMY accuracy. To probe into the details, we computed the
Distribution of individual
Furthermore, the cubic smoothing spline (
Relationships between smoothing spline means of individual
Model parameters were estimated and compared for four selected models (M2A, M3A, M7A, and M8A) using all milking data in Holstein and Jersey cows; each was implemented for AM or PM milkings separately and jointly (
Estimated parameters obtained from four models (M2A, M3A, M7A, and M8A), each implemented separately or jointly for known morning (AM) or evening (PM) milk yields
Statistical model  Model parameter  Holstein  Jersey  

AM  PM  Joint  AM  PM  Joint  
M2A 

25.80 (0.431)    26.04 (0.302)  9.593 (1.170)    9.789 (0.807) 

  27.01 (0.870)  26.79 (0.285)    11.84 (0.951)  11.64 (0.692)  

−2.190 (0.035)  −2.222 (0.034)  −2.206 (0.024)  −0.898 (0.090)  −0.905 (0.085)  −0.889 (0.062)  

0.001 (3E4)  −0.001 (3E4)  4.7E5 (2E4)  0.001 (0.001)  0.001 (0.001)  1.4E4 (4E04)  
M3A 

27.76 (0.404)    26.64 (0.283)  13.52 (1.402)    11.22 (0.701) 

  28.02 (0.382)  27.35 (0.267)    12.49 (0.947)  12.90 (0.652)  

−1.898 (0.033)  −1.934 (0.034)  −1.909 (0.024)  −0.797 (0.078)  −0.782 (0.086)  −0.746 (0.059)  

−0.005 (3E4)  −0.005 (3E4)  −0.005 (2E4)  −0.003 (0.001)  −0.003 (0.001)  −0.003 (0.001)  

1.720 (0.008)  1.780 (0.008)  1.749 (0.005)  1.664 (0.017)  1.860 (0.022)  1.750 (0.014)  
M7A 

0.071 (0.008)    0.068 (0.005)  0.269 (0.029)    0.268 (0.020) 

  0.053 (0.007)  0.056 (0.005)    0.231 (0.024)  0.231 (0.017)  

0.036 (0.001)  0.037 (0.001)  0.037 (4E04)  0.021 (0.002)  0.021 (0.002)  0.021 (0.002)  

7E06 (5E06)  5E06 (5E06)  8E07 (4E06)  2E05 (1E05)  2E05 (1E05)  3.3E06 (1E05)  
M8A 

1.779 (0.018)    1.856 (0.013)  1.580 (0.067)    1.575 (0.048)

  1.946 (0.017)  1.877 (0.012)    1.621 (0.060)  1.638 (0.042)  

−0.059 (0.001)  −0.070 (0.001)  −0.065 (0.001)  −0.037 (0.005)  −0.025 (0.005)  −0.032 (0.004)  

2E04 (1E05)  2E04 (1E05)  2E04 (9E06)  3E04 (3E05)  3E04 (4E05)  3E04 (3E05)  

0.861 (0.004)  0.852 (0.004)  0.856 (0.003)  0.812 (0.010)  0.757 (0.011)  0.784 (0.008) 
M2A = additive correction factor model with continuous variables for milking interval and days in milk (DIM); M3A = linear regression of daily milk yield (DMY) on milking interval and DIM; M7A = linear regression of the AM or PM proportion of DMY on milking interval and DIM (
The models M7A and M8A are the baseline models for the MCF models, M7B and M8B. The MCF models represented substantially different modeling strategies (
Analyzing AM and PM milk yields separately led to slightly different model parameters in Holstein and Jersey cows (
Scatterplot and linear regression fits of the actual daily milk yield against estimated daily milk yields under three scenarios:
Average DMY by milking intervals between 9 and 15 h were computed based on the estimated model parameters by joint analyses for the four selected models (M2A, M3A, M7A, and M8A), compared to the model M0 and the CSS means of actual DMY over milking interval (
Average daily milk yields were obtained from five models and smooth spline (SS) means of the daily milk yield against morning
Additive and multiplicative factors were computed based on the parameter values of the data density functions or smoothing functions. Plots of ACF and MCF by MIC are shown in
Comparison of additive correction factors
With equal (12–12 h) AM and PM milking intervals, the ACF obtained from the M1 and M2B models were all close to zero (0.09–0.123 kg in Holstein cows and 0.67–0.41 kg in Jersey cows). Because these two models each assumed a fixed regression coefficient of 2.0 for the AM or PM milk yield, we concluded that doubling AM or PM milk yields provided an approximate estimate of DMY with equal AM and PM milking intervals. Put another way, with equal AM and PM milking intervals, the additive correction beyond twice the AM or PM milk yield was approximately zero. The results agreed with some early studies. For example,
Unlike the ACF model, the MCF models implemented substantially different modeling strategies (
Doubling AM or PM milk yields gave approximate DMY estimates under the assumption of equal AM and PM milking intervals, but the estimates were subject to large errors when the AM and PM milking intervals were unequal. The further the AM and PM milking intervals deviated from 12–12 h, the larger the errors. ACF and MCF provided effective adjustments to the estimated DMY with unequal AM and PM milking intervals. ACF provided additive adjustments, evaluated by the expected difference between AM and PM milk yield for each MIC and other categorical variables when applicable. An ACF model equivalently assumed a fixed multiplier (2.0) for AM or PM milk yields. In practice, ACF models with many discrete variables are challenged by insufficient or missing data points for specific MIC categories. Similarly, a linear regression model was implemented as an ACF model, except that it estimated the multiplier (regression coefficient) for AM or PM milk yield from the data. Relaxing the constraint of a fixed multiplier for AM and PM milkings allowed linear regression models to fit and predict the data better than ACF models. Multiplicative correction factors were computed as ratios of daily yield to yield from a single milking. Thus, multiplying a known AM or PM yield by an MCF gave an estimated DMY. Overall, the MCF models outperformed the ACF models, providing more accurate DMY estimates in the Holstein and Jersey cows. Nevertheless, ACF or MCF computed on discretized milking intervals lost information, leading to larger errors and lower accuracies. The exponential regression model (
The present study represented a preliminary effort to revisit the existing statistical methods for estimating DMY, compared to the newly proposed exponential regression model, using milking data collected between 2006 and 2009. In a continuing effort, large-scale, high-resolution milking data are being collected for follow-up studies, jointly supported by the US Council on Dairy Cattle Breeding, the USDA Agricultural Genomics and Improvement Laboratories, and the National Dairy Herd Information Association. This is a 3-year data collection project. We expect that the MCF in use will be updated by then. Finally, we illustrated the methods for estimating DMY in AM and PM milking plans. Yet, these methods and principles are generally applicable, either directly or with necessary modifications, to cows milked more than two times a day.
The data analyzed in this study are subject to the following licenses/restrictions: The Holstein and Jersey milking record data maintained by CDCB are not publicly available, but can be requested from JD, subject to signing an official agreement for noncommercial use only. Requests to access these datasets should be directed to Joao Durr,
XLW conceived and designed this study, with a series of discussion meetings with GW, HDN, AM, CVT, RB, JB, and JD. XLW carried out the data extraction and data analyses and drafted this manuscript. All authors have reviewed and approved the final manuscript.
This project was financially supported by CDCB, a nonprofit organization for dairy genetic/genomic evaluations and data storage.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.