Edited by: Katharina Stärk, SAFOSO AG, Switzerland

Reviewed by: Andrea Apolloni, Agricultural Research Centre For International Development, France; Matthias Greiner, Federal Institute for Risk Assessment (BfR), Germany

Specialty section: This article was submitted to Veterinary Epidemiology and Economics, a section of the journal Frontiers in Veterinary Science

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structure of the different antibiotic resistance outcomes and structural zero measurements on both the resistance outcome and the exposure side are challenges for the epidemiological model-building process. Appropriate regression models that acknowledge these complexities are essential to ensure valid epidemiological interpretations. The aims of this paper are (i) to explain the model-building process by comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable for identifying potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant

Antimicrobial resistance in livestock is a matter of public concern. In general, its occurrence is promoted by exposure to antimicrobial substances and the subsequent selection of resistant bacteria, as well as by the horizontal or vertical transfer of resistance determinants. To identify points of action for measures to prevent, reduce, and control antimicrobial resistance in farming and veterinary practice, epidemiological studies on a population level are crucial. Such studies usually face methodological constraints concerning sample size, zero measurements on both the outcome and the exposure side, and associations between potential risk factors. Therefore, the study design as well as the characteristics of the collected data should be considered in the process of epidemiological model selection.

The diagnostic methods to identify antibiotic resistance as well as the statistical regression techniques that could be applied are manifold (e.g., Poisson model, negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model). To decide on the most suitable statistical model, a structured process of model selection is crucial to avoid misleading results and interpretation.

One major problem in analyzing bacterial data is the presence of structural zero measurements. Recently, two studies gave an overview of the impact of the choice of different probability distributions in the context of quantitative microbiological risk assessment to estimate the true prevalence and concentration of microorganisms in foods (

To illustrate such a modeling process with the aim of identifying potential factors associated with antibiotic resistance in livestock, data from a cross-sectional investigation on fattening pig farms in Germany were analyzed. The study population comprised 48 fattening pig farms. In each farm, 10 samples were taken and tested for resistance against cefotaxime in

The aims of this paper are to explain the model-building process by comparing competing models for count data, with special emphasis on the modeling of zero measurements, and to compare these models using the study data as an example. Based on this comparison, we recommend a scheme for evaluating statistical models for studies of antibiotic resistance in livestock.

The cross-sectional investigation was part of the RESET research network.^{1}

Overall, 288 fecal samples, 96 pairs of boot swab samples, and 95 dust samples were collected. The proportion of positive samples was 61% for fecal samples, 54% for boot swabs, and 10% for dust samples. In seven farms, no cefotaxime-resistant

While the questionnaire consisted of 242 items that could potentially influence the outcome of resistance, the management and structure of the observed pig farms were, in general, rather homogeneous. To avoid sparse-data problems, 145 items were excluded from the analyses because fewer than five farms fell into one of the answer categories. Prior to multivariable regression analyses, a univariable analysis was performed for each of the remaining 95 items. Variables with a
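The exclusion rule for sparse answer categories can be expressed as a short filter. The following Python sketch is illustrative only (the analyses themselves were run in R); the function and item names are hypothetical, the counts for the two disinfection items are taken from the descriptive table of farm characteristics, and the third item is invented to show an exclusion.

```python
# Sketch of the sparse-data rule: an item is dropped if fewer than five
# farms fall into any one of its answer categories.
MIN_FARMS_PER_CATEGORY = 5


def keep_item(category_counts, min_count=MIN_FARMS_PER_CATEGORY):
    """Return True if every answer category has at least `min_count` farms."""
    return min(category_counts) >= min_count


def filter_items(items, min_count=MIN_FARMS_PER_CATEGORY):
    """Split a {item_name: [farms per category]} mapping into kept and excluded names."""
    kept, excluded = [], []
    for name, counts in items.items():
        (kept if keep_item(counts, min_count) else excluded).append(name)
    return kept, excluded


items = {
    "disinfection_of_livestock_trail": [8, 33, 7],  # never / after housing out / less frequent
    "disinfection_with_chlorine": [40, 8],          # no / yes
    "hypothetical_rare_practice": [46, 2],          # hypothetical item: only 2 farms answered "yes"
}
kept, excluded = filter_items(items)
print(kept)      # ['disinfection_of_livestock_trail', 'disinfection_with_chlorine']
print(excluded)  # ['hypothetical_rare_practice']
```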

The standard method for analyzing count data is Poisson regression modeling (…). The Poisson model assumes that the mean μ and the variance σ^{2} of the counts are the same. This is usually assessed with the dispersion index ϕ = σ^{2}/μ, which equals 1 under the Poisson assumption. In addition, an excess number of zeros is often observed, which cannot be modeled by the Poisson distribution.
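As a minimal sketch in Python with hypothetical counts, the marginal dispersion index can be computed from the per-farm counts as the sample variance divided by the sample mean. This ignores covariates, so it is only a first screening step; in a regression setting, the dispersion is assessed on the fitted model.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Empirical dispersion index phi = s^2 / ybar for a list of counts.

    Values near 1 are compatible with a Poisson distribution; values above 1
    suggest overdispersion and values below 1 suggest underdispersion.
    """
    ybar = mean(counts)
    return variance(counts) / ybar  # sample variance uses the n - 1 denominator


# Hypothetical numbers of positive samples (out of 10) for eight farms.
counts = [0, 0, 3, 5, 6, 8, 9, 10]
print(round(dispersion_index(counts), 2))  # 2.92
```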

To overcome these pitfalls of the Poisson model, two common options to handle overdispersion are the negative binomial model and the quasi-Poisson model (

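In epidemiology, the Poisson distribution is the most common model for the analysis of count data. Assuming a cross-sectional study design, let $Y_i$ denote the number of positive samples in farm $i$ ($i = 1, \dots, n$) and $x_{i1}, \dots, x_{ik}$ the values of $k$ potential risk factors. In the Poisson model, $Y_i$ follows a Poisson distribution with mean $\mu_i$,

$$P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots,$$

so that $E(Y_i) = \mathrm{Var}(Y_i) = \mu_i$. The mean is related to the risk factors through the log link,

$$\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},$$

with regression coefficients $\beta_0, \beta_1, \dots, \beta_k$.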
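To make the log link concrete, the sketch below fits a Poisson regression with one binary covariate by Newton-Raphson iterations, written out by hand in Python for illustration. It is not the R routine used in the analyses, and the data are hypothetical. With a single binary covariate, the maximum likelihood estimate of exp(β_1) equals the ratio of the two group means, which is how the exp(β) columns in the results tables can be read: as multiplicative changes in the expected number of positive samples.

```python
import math


def fit_poisson_binary(y, x, iterations=25):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson.

    y: non-negative counts; x: 0/1 covariate values (both groups must have
    at least one positive count, otherwise the MLE does not exist).
    With the canonical log link, Newton-Raphson equals Fisher scoring.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only model
    for _ in range(iterations):
        # Score vector U = X^T (y - mu) and information I = X^T diag(mu) X
        u0 = u1 = i00 = i01 = i11 = 0.0
        for yi, xi in zip(y, x):
            mu = math.exp(b0 + b1 * xi)
            u0 += yi - mu
            u1 += (yi - mu) * xi
            i00 += mu
            i01 += mu * xi
            i11 += mu * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step beta += I^{-1} U, with the 2x2 inverse written out
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1


# Hypothetical data: group x = 0 has mean 2.5, group x = 1 has mean 5.5.
y = [2, 3, 1, 4, 5, 6, 4, 7]
x = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_poisson_binary(y, x)
print(round(math.exp(b0), 3), round(math.exp(b1), 3))  # 2.5 2.2
```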

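In real data sets, the assumption that the dispersion ϕ = 1 is often violated. The consequence of underdispersion [ϕ < 1, i.e., $\mathrm{Var}(Y_i) < E(Y_i)$] or overdispersion [ϕ > 1, i.e., $\mathrm{Var}(Y_i) > E(Y_i)$] is that the Poisson model overestimates or underestimates, respectively, the SEs of the regression coefficients, so that p-values become too large or too small; overdispersion in particular can produce spuriously significant associations.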

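One way to account for overdispersion in the regression model is to assume a negative binomial distribution for the outcome $Y_i$. In the common (NB2) parametrization, the negative binomial distribution has mean $E(Y_i) = \mu_i$ and variance $\mathrm{Var}(Y_i) = \mu_i + \mu_i^2/\theta$, where the additional parameter $\theta > 0$ allows the variance to exceed the mean; the Poisson model is the limiting case $\theta \to \infty$.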
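Assuming the NB2 parametrization with size parameter θ (an assumption about notation, since other parametrizations exist), the following Python sketch evaluates the negative binomial probability function and checks numerically that its variance exceeds the mean by μ²/θ. The parameter values are hypothetical.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial (NB2) probability P(Y = y) with mean mu and size theta.

    Var(Y) = mu + mu**2 / theta; theta -> infinity recovers the Poisson model.
    """
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def moments(pmf, upper=2000):
    """Mean and variance of a count distribution by summing its pmf up to `upper`."""
    probs = [pmf(y) for y in range(upper + 1)]
    m = sum(y * p for y, p in enumerate(probs))
    v = sum((y - m) ** 2 * p for y, p in enumerate(probs))
    return m, v


mu, theta = 4.0, 2.0  # hypothetical values
m, v = moments(lambda y: nb_pmf(y, mu, theta))
print(round(m, 6), round(v, 6))  # 4.0 12.0, i.e., variance = 4 + 16 / 2
```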

Another way to handle moderate overdispersion (or underdispersion) is the quasi-Poisson regression model, which introduces the dispersion index ϕ into the Poisson model, so that the variance of the response is modeled as a linear function of the mean:

$$\mathrm{Var}(Y_i) = \phi\,\mu_i.$$

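The dispersion index ϕ can be estimated by the Pearson statistic divided by the residual degrees of freedom,

$$\hat{\phi} = \frac{1}{n - p}\sum_{i=1}^{n}\frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i},$$

where $n$ is the number of farms, $\hat{\mu}_i$ the fitted mean of farm $i$, and $p$ the number of estimated regression parameters.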
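A minimal Python sketch of the two quasi-Poisson ingredients: the Pearson estimate of ϕ and the resulting rescaling of the SEs by √ϕ, while the point estimates remain those of the Poisson model. The observed and fitted values in the first example are hypothetical. As a plausibility check, applying ϕ = 1.55 (the value reported for our Poisson model) to the Poisson SEs of “moving single pigs” (0.22) and “separate pen for diseased pigs” (0.20) reproduces the quasi-Poisson SEs of 0.27 and 0.25 shown in the model comparison table.

```python
import math


def pearson_dispersion(y, mu_hat, n_params):
    """Pearson estimate of phi: sum((y - mu)^2 / mu) / (n - p)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    return chi2 / (len(y) - n_params)


def quasi_poisson_se(poisson_se, phi):
    """Quasi-Poisson keeps the Poisson estimates but inflates SEs by sqrt(phi)."""
    return poisson_se * math.sqrt(phi)


# Hypothetical observed counts and fitted means from an intercept-only model.
print(round(pearson_dispersion([1, 3, 0, 6], [2.0, 2.0, 2.0, 4.0], n_params=1), 3))  # 1.333

# Reported dispersion index and two Poisson SEs from the comparison table.
phi = 1.55
for se in (0.22, 0.20):
    print(se, "->", round(quasi_poisson_se(se, phi), 2))
# 0.22 -> 0.27
# 0.2 -> 0.25
```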

A second problem with count data can be an excessive number of zero counts (zero inflation) that cannot be suitably modeled by the Poisson or negative binomial distribution. This problem often appears in resistance research, where many samples are negative [e.g., in broiler chickens, see Ref. (
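One quick way to gauge zero inflation is to compare the observed number of zero counts with the number expected under a fitted Poisson model, Σ exp(−μ̂_i). The Python sketch below assumes hypothetical fitted means (48 farms, each with μ̂ = 5); with such means, fewer than one zero would be expected, so several observed zero farms would point to excess zeros.

```python
import math


def expected_poisson_zeros(mu_values):
    """Expected number of zero counts if unit i is Poisson with mean mu_i."""
    return sum(math.exp(-mu) for mu in mu_values)


# Hypothetical fitted means: 48 farms, each with an assumed mean of 5 positive samples.
mu_values = [5.0] * 48
print(round(expected_poisson_zeros(mu_values), 2))  # 0.32
```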

In the presence of a large proportion of zero counts, a zero-inflated model or hurdle model should be considered as alternatives to the Poisson or the negative binomial models (

The idea of the zero-inflated model is that some of the zeros are modeled as part of the Poisson (or negative binomial) distribution, while the remaining zeros are modeled through a binomial distribution (

Unlike zero-inflated models, hurdle models do not distinguish between different types of zeros but handle them identically (

To distinguish between zero-inflated/hurdle models with a Poisson count component and those with a negative binomial count component, the overdispersion within the count-part of the data needs to be evaluated. One option might be to evaluate the dispersion index after excluding all zero counts. However, since we cannot determine which of the zeros belong to the count-part and which belong to the zero-part, the dispersion index might be biased toward underdispersion after excluding all zeros. For clarification, the histogram of the data distribution needs to be evaluated and compared to simulated data with overdispersion (compare Figure
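The bias toward underdispersion can be checked analytically. If counts are exactly Poisson with mean μ and all zeros are removed, the remaining zero-truncated Poisson counts have mean m = μ/(1 − e^{−μ}) and variance m(1 + μ − m), so the dispersion index equals 1 + μ − m, which is always below 1 because m > μ. The Python sketch below evaluates this for a few illustrative values of μ.

```python
import math


def zero_truncated_poisson_dispersion(mu):
    """Dispersion index Var/mean of Poisson(mu) counts after dropping all zeros.

    Closed form: mean m = mu / (1 - exp(-mu)), variance = m * (1 + mu - m).
    """
    m = mu / (1.0 - math.exp(-mu))
    var = m * (1.0 + mu - m)
    return var / m  # equals 1 + mu - m, which is < 1 since m > mu


# Illustrative Poisson means: the truncated data look underdispersed in every case.
for mu in (0.5, 2.0, 5.0):
    print(mu, round(zero_truncated_poisson_dispersion(mu), 3))
```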

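In more formal detail, zero-inflated models (…) combine a point mass at zero with a count distribution. The probability mass function of the zero-inflated model is

$$f(y_i) = \pi_i\, I_{\{0\}}(y_i) + (1 - \pi_i)\, f_{count}(y_i; x_i, \beta),$$

where $I_{\{0\}}(y_i)$ equals 1 if $y_i = 0$ and 0 otherwise, $f_{count}$ is the Poisson or negative binomial probability function with mean $\mu_i = \exp(x_i^\top \beta)$ for the covariate vector $x_i$ (including a leading 1 for the intercept), and the probability of an excess zero $\pi_i$ is modeled by logistic regression, $\mathrm{logit}(\pi_i) = z_i^\top \gamma$, with covariate vector $z_i$.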

The parameter vectors β and γ can be estimated using maximum likelihood. The corresponding regression equation for the mean is given by

$$E(Y_i) = (1 - \pi_i)\,\mu_i.$$
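A minimal Python sketch of the ZIP probability function and its mean for given values of π_i and μ_i. The numbers are hypothetical; in the regression model, they would come from the logistic and log-linear predictors. It shows that P(Y = 0) combines excess zeros and Poisson zeros, and that the mean is (1 − π)μ.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) for mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def zip_pmf(y, pi, mu):
    """Zero-inflated Poisson: an excess zero with probability pi, else Poisson(mu)."""
    count_part = (1.0 - pi) * poisson_pmf(y, mu)
    return pi + count_part if y == 0 else count_part


def zip_mean(pi, mu):
    """E(Y) = (1 - pi) * mu."""
    return (1.0 - pi) * mu


pi, mu = 0.15, 5.0  # hypothetical values
print(round(zip_pmf(0, pi, mu), 4))  # excess zeros plus Poisson zeros
print(round(zip_mean(pi, mu), 4))    # 4.25
```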

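In contrast, the probability mass function of the hurdle model is defined as

$$f(y_i) = \begin{cases} f_{zero}(0; z_i, \gamma), & y_i = 0,\\[4pt] \big(1 - f_{zero}(0; z_i, \gamma)\big)\,\dfrac{f_{count}(y_i; x_i, \beta)}{1 - f_{count}(0; x_i, \beta)}, & y_i > 0, \end{cases}$$

so that all zeros arise from the binary zero-part, and the positive counts follow a zero-truncated count distribution.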
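For comparison, a Python sketch of a Poisson hurdle model with a fixed zero probability (hypothetical values, no covariates). Unlike in the zero-inflated model, P(Y = 0) equals the zero-part probability exactly, and the positive counts follow a zero-truncated Poisson distribution, so the mean is (1 − P(Y = 0))μ/(1 − e^{−μ}).

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) for mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def hurdle_pmf(y, p_zero, mu):
    """Poisson hurdle model.

    p_zero: probability of a zero (the binary zero-part).
    mu: mean of the untruncated Poisson used for the positive counts, which are
        renormalized to exclude zero (zero-truncated Poisson).
    """
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(y, mu) / (1.0 - math.exp(-mu))


def hurdle_mean(p_zero, mu):
    """E(Y) = (1 - p_zero) * mu / (1 - exp(-mu))."""
    return (1.0 - p_zero) * mu / (1.0 - math.exp(-mu))


p_zero, mu = 0.15, 5.0  # hypothetical values
print(hurdle_pmf(0, p_zero, mu))           # 0.15: all zeros come from the zero-part
print(round(hurdle_mean(p_zero, mu), 4))
```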

Therefore, in general, zero-inflated Poisson (ZIP), zero-inflated binomial, and hurdle models are often much more appropriate for taking into account the several ways in which resistance can be introduced into farms.

To decide which model fits our data best, we first compared the observed counts with the model predictions, followed by a comparison of the Akaike information criterion (AIC). The AIC is a measure of goodness of fit that describes the trade-off between the accuracy and the complexity of a model; the model with the minimum AIC value is preferred. Furthermore, we calculated the Pearson residuals and plotted them against the fitted values. If the model assumptions are appropriate, the residuals should scatter randomly within a horizontal band. Finally, we checked the plausibility of the estimates and their SEs. Inflated estimates and SEs might give a further hint of violated model assumptions.
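The criteria used above can be written down in a few lines. The Python sketch below computes the AIC as 2k − 2 log L, the Poisson log-likelihood, and Pearson residuals (y − μ̂)/√V(μ̂), where the variance function V defaults to the Poisson V(μ) = μ; for a negative binomial model, one would pass V(μ) = μ + μ²/θ instead. The observed and fitted values are hypothetical; in practice, the residuals would be plotted against the fitted values.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def poisson_log_likelihood(y, mu_hat):
    """Sum of Poisson log-probabilities of the observed counts given fitted means."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu_hat))


def pearson_residuals(y, mu_hat, variance=lambda mu: mu):
    """(y - mu) / sqrt(V(mu)); the default variance function is the Poisson V(mu) = mu."""
    return [(yi - mi) / math.sqrt(variance(mi)) for yi, mi in zip(y, mu_hat)]


# Hypothetical observed counts and fitted means from a two-parameter model.
y = [0, 2, 5, 7]
mu_hat = [1.0, 2.0, 4.0, 7.0]
print(pearson_residuals(y, mu_hat))  # [-1.0, 0.0, 0.5, 0.0]
print(round(aic(poisson_log_likelihood(y, mu_hat), n_params=2), 2))
```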

Given the available sample size, we applied the following procedure to reduce the number of variables considered in the model comparison process. Using backward selection based on the AIC, we retained eight variables in the Poisson model. Since the Poisson model discussed here is limited to only 7 degrees of freedom, the variables with the highest
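Backward selection by AIC can be sketched as a greedy loop: starting from all candidate variables, drop the variable whose removal lowers the AIC most, and stop when no removal helps. The Python sketch below is generic and not necessarily identical to the routine used in our analyses; the aic_of callable is a hypothetical stand-in for refitting the model, and the toy AIC surface is entirely hypothetical.

```python
def backward_select(variables, aic_of):
    """Greedy backward elimination by AIC.

    aic_of: hypothetical user-supplied callable mapping a tuple of variable
    names to the AIC of the model fitted with exactly those variables.
    """
    current = tuple(variables)
    current_aic = aic_of(current)
    while current:
        # Candidate models, each with one variable removed
        candidates = [tuple(v for v in current if v != drop) for drop in current]
        best = min(candidates, key=aic_of)
        best_aic = aic_of(best)
        if best_aic >= current_aic:
            break  # no single removal improves the AIC
        current, current_aic = best, best_aic
    return list(current), current_aic


USEFUL = {"a", "b"}  # hypothetical: variables that genuinely improve the fit


def toy_aic(variables):
    # Hypothetical AIC surface: each useful variable lowers -2 log L by 5,
    # and every variable costs 2 (the AIC penalty per parameter).
    return 100 - 5 * len(USEFUL.intersection(variables)) + 2 * len(variables)


selected, best_aic = backward_select(["a", "b", "c", "d"], toy_aic)
print(selected, best_aic)  # ['a', 'b'] 94
```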

| Variable | Farms with fattening pigs, n (%) |
|---|---|
| Total | 48 |
| Moving single pigs^{a} | 39 (81.25) |
| Separate pen for diseased pigs^{a} | 32 (66.67) |
| Use of purchased feed only^{a} | 11 (22.92) |
| Water birds in 1 km radius of farm^{a} | 17 (35.42) |
| Disinfection of livestock trail | |
| Never^{b} | 8 (16.67) |
| After housing out | 33 (68.75) |
| Less frequent than after housing out | 7 (14.58) |
| Disinfection with chlorine^{a} | 8 (16.67) |
| Number of fattening pigs | |
| 0 to ≤1,000^{b} | 20 (41.67) |
| >1,000 to ≤1,500 | 14 (29.17) |
| >1,500 | 14 (29.17) |

^{a}“No” as reference category.

^{b}Reference category.

| Part | Variable | Poisson exp(β) | SE | p | Quasi-Poisson exp(β) | SE | p | Negative binomial exp(β) | SE | p | ZIP exp(β) | SE | p | Hurdle exp(β) | SE | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count model/Count-part | Intercept | 0.82 | 0.34 | 0.548 | 0.82 | 0.42 | 0.632 | 0.46 | 0.45 | 0.095 | 2.18 | 0.73 | 0.290 | 2.61 | 0.41 | |
| | Moving single pigs | 1.76 | 0.22 | | 1.76 | 0.27 | | 1.83 | 0.31 | 0.061 | 1.72 | 0.28 | 0.056 | 1.73 | 0.24 | |
| | Separate pen for diseased pigs | 1.74 | 0.20 | | 1.74 | 0.25 | | 1.85 | 0.30 | | 1.50 | 0.26 | 0.127 | 1.44 | 0.21 | 0.084 |
| | Use of purchased feed only | 0.56 | 0.19 | | 0.56 | 0.24 | | 0.49 | 0.29 | | 0.64 | 0.24 | 0.072 | 0.65 | 0.21 | |
| | Water birds in 1 km radius of farm | 1.24 | 0.16 | 0.177 | 1.24 | 0.20 | 0.285 | 1.44 | 0.26 | 0.170 | 1.10 | 0.17 | 0.576 | 1.09 | 0.16 | 0.592 |
| | Disinfection of livestock trail after housing out | 2.21 | 0.24 | | 2.21 | 0.30 | | 3.01 | 0.33 | | 1.15 | 0.45 | 0.761 | 0.99 | 0.26 | 0.977 |
| | Disinfection of livestock trail less frequent than after housing out | 1.64 | 0.29 | 0.086 | 1.64 | 0.36 | 0.176 | 2.46 | 0.44 | | 0.95 | 0.46 | 0.914 | 0.83 | 0.31 | 0.555 |
| | Disinfection with chlorine | 1.38 | 0.19 | 0.090 | 1.38 | 0.24 | 0.182 | 1.72 | 0.31 | 0.089 | 1.24 | 0.21 | 0.307 | 1.20 | 0.19 | 0.318 |
| | Number of fattening pigs (>1,000 to ≤1,500) | 1.07 | 0.17 | 0.697 | 1.07 | 0.22 | 0.756 | 1.19 | 0.29 | 0.547 | 0.93 | 0.18 | 0.696 | 0.93 | 0.18 | 0.700 |
| | Number of fattening pigs (>1,500) | 2.01 | 0.18 | <0.001 | 2.01 | 0.22 | | 2.83 | 0.29 | | 1.47 | 0.24 | 0.104 | 1.42 | 0.20 | 0.082 |
| Zero-part^{a} | Intercept | | | | | | | | | | 2.81 | 1.28 | 0.419 | 3.32 | 1.18 | 0.309 |
| | Separate pen for diseased pigs | | | | | | | | | | 0.09 | 1.48 | 0.097 | 0.07 | 1.27 | |
| | Use of purchased feed only | | | | | | | | | | 6.32 | 2.16 | 0.393 | 5.04 | 1.33 | 0.223 |
| | Water birds in 1 km radius of farm | | | | | | | | | | 0.10 | 2.6 | 0.382 | 0.16 | 1.55 | 0.240 |
| | Disinfection with chlorine | | | | | | | | | | 0.00 | 7,210.56 | 0.998 | 0.00 | 5,847.27 | 0.998 |
| | Number of fattening pigs (>1,000 to ≤1,500) | | | | | | | | | | 0.09 | 2.24 | 0.287 | 0.12 | 1.62 | 0.199 |
| | Number of fattening pigs (>1,500) | | | | | | | | | | 0.02 | 3.86 | 0.331 | 0.04 | 1.66 | 0.056 |
| Akaike information criterion (AIC) | | 231.6385 | | | 231.6385 | | | 265.2818 | | | 233.2511 | | | 233.0947 | | |

^{a}The ZIP model estimates the probability of zero inflation, and the hurdle model estimates the probability of crossing the hurdle (non-zeros). Therefore, the estimates of the hurdle model were multiplied by (−1) to make them comparable to the ZIP model.

Analyses were performed using R 3.0.3 (

Forty-eight fattening pig farms in Germany were investigated. For each farm, the primary outcome was the count of samples with cefotaxime-resistant

In our analysis, the Poisson model exhibited slight overdispersion (dispersion index ϕ = 1.55). Therefore, we investigated the negative binomial and quasi-Poisson models within the model selection process. The descriptive analysis of the observed data (Figure

Comparing the model predictions for the Poisson, negative binomial, ZIP, and hurdle models (Figure

The Pearson residuals for the Poisson and the negative binomial models appeared to violate the assumption of homoscedasticity (Figure

The model with the highest number of statistically significant factors associated with antibiotic resistance was the Poisson model (Table
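Because the tables report exp(β) and the SE of β, the Wald p-values of the Poisson model can be approximately recomputed as 2[1 − Φ(|log exp(β)|/SE)]. The Python sketch below does this; since the tabulated estimates and SEs are rounded, the recomputed values match only approximately (for “water birds in 1 km radius of farm,” about 0.18 versus the reported 0.177). It assumes normal-based Wald tests, which applies to the Poisson model but not exactly to the quasi-Poisson model, for which t-based p-values are usually reported.

```python
import math


def wald_p_value(exp_beta, se):
    """Two-sided Wald p-value from a reported exp(beta) and the SE of beta.

    z = log(exp_beta) / se; p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = math.log(exp_beta) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Poisson model, values as tabulated (rounded to two decimals).
rows = {
    "Water birds in 1 km radius of farm": (1.24, 0.16),  # reported p = 0.177
    "Moving single pigs": (1.76, 0.22),
}
for name, (exp_beta, se) in rows.items():
    print(f"{name}: p = {wald_p_value(exp_beta, se):.3f}")
```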

Next, a variable selection was performed to decide which of these two models should be applied. In the main analysis, factors for the zero-part of the ZIP and hurdle models were chosen by backward selection based on the hurdle model. This resulted in the following factors in the zero-part: “separate pen for diseased pigs,” “use of purchased feed,” “water birds in 1 km radius of farm,” “disinfection with chlorine,” and “number of fattening pigs” (Table

| Part | Variable | ZIP exp(β) | SE | p | Hurdle exp(β) | SE | p |
|---|---|---|---|---|---|---|---|
| Count-part | Intercept | 1.54 | 0.38 | 0.264 | 2.61 | 0.41 | |
| | Moving single pigs | 1.82 | 0.23 | | 1.73 | 0.24 | |
| | Separate pen for diseased pigs | 1.66 | 0.20 | | 1.44 | 0.21 | 0.084 |
| | Use of purchased feed only | 0.62 | 0.20 | | 0.65 | 0.21 | |
| | Water birds in 1 km radius of farm | 1.10 | 0.16 | 0.537 | 1.09 | 0.16 | 0.592 |
| | Disinfection of livestock trail after housing out | 1.39 | 0.27 | 0.229 | 0.99 | 0.26 | 0.977 |
| | Disinfection of livestock trail less frequent than after housing out | 1.13 | 0.32 | 0.703 | 0.83 | 0.31 | 0.555 |
| | Disinfection with chlorine | 1.30 | 0.19 | 0.158 | 1.20 | 0.19 | 0.318 |
| | Number of fattening pigs (>1,000 to ≤1,500) | 0.94 | 0.18 | 0.712 | 0.93 | 0.18 | 0.700 |
| | Number of fattening pigs (>1,500) | 1.59 | 0.19 | | 1.42 | 0.20 | 0.082 |
| Zero-part^{a} | Intercept | 0.38 | 0.68 | 0.151 | 0.40 | 0.58 | 0.114 |
| | Use of purchased feed only | >1,000 | >1,000 | 0.993 | 3.63 | 1.14 | 0.259 |
| | Water birds in 1 km radius of farm | 0.00 | >1,000 | 1.000 | 0.19 | 1.26 | 0.185 |
| | Number of fattening pigs (>1,000 to ≤1,500) | 0.00 | >1,000 | 1.000 | 0.22 | 1.20 | 0.214 |
| | Number of fattening pigs (>1,500) | 0.00 | >1,000 | 0.988 | 0.15 | 1.30 | 0.148 |
| AIC | | 229.91 | | | 236.71 | | |

^{a}The ZIP model estimates the probability of zero inflation, and the hurdle model estimates the probability of crossing the hurdle (non-zeros). Therefore, the estimates of the hurdle model were multiplied by (−1) to make them comparable to the ZIP model.

Therefore, for our dataset we conclude that the hurdle model is more stable than the ZIP model. Furthermore, interpretation of the zero-part of the hurdle model is much more in line with the underlying biology of the transmission of antibiotic resistance. Factors in the zero-part of the model may be related to farms without any resistant bacteria at all, whereas factors in the count-part of the model may be interpreted as factors associated with the success of handling the resistance problem.

In contrast to the results by Hering et al. (

In summary, we conclude that the hurdle model is the most appropriate model for our data, even though the Poisson model achieved the lowest AIC. The residual plot of the Poisson model indicated variance heterogeneity and partly larger residuals, whereas the model predictions and Pearson residuals indicated that the hurdle model provided a better fit with plausible estimates and SEs.

In this paper, we address the problem of statistical model building and selection for studies analyzing the association of antibiotic resistance with environmental factors. These regression models based on epidemiological data on a population level are needed to identify potential intervention measures to reduce the occurrence and transmission of resistance.

The Poisson model is the most common model for count data (here, the number of resistant samples per farm). However, in many situations, it is not an appropriate model because its general assumptions are violated. Therefore, it is essential to evaluate the assumptions of the Poisson model systematically in order to identify relevant factors associated with resistance. Hence, the main objective of this paper was to demonstrate possible strategies for model selection that take into account the two most important problem areas when modeling count data, namely, over- or underdispersion and zero inflation, which apply especially to epidemiological studies on the occurrence of antimicrobial resistance in livestock.

Strong overdispersion can be handled through the use of a negative binomial model. For more subtle overdispersion and even underdispersion, the quasi-Poisson model can be used to correct the SEs and resulting

In practical studies, over- or underdispersion and zero inflation are accompanied by the problem of outcome misclassification. For the topic addressed here, observed zeros comprise “true zeros” and “false-negative zeros,” where the “true zeros” include farms with no resistance at all. As in other epidemiological studies in veterinary science, this aspect has to be discussed in detail, especially in light of the multistep isolation procedure suggested in this study. In principle, misclassification has two components: the laboratory sensitivity and the sampling sensitivity. In our study, the diagnostic protocol was developed with special emphasis on identifying ESBL positives, i.e., using selective enrichment for the resistant bacteria. This protocol, developed in the RESET consortium, has since been approved by EFSA. In general, the diagnostic sensitivity is considered to be close to 100% due to the enrichment procedures. Therefore, false negatives arising from the laboratory process may be neglected, and false negatives may arise mainly from the sampling procedure. Here, 10 samples per farm were taken, following the general concept of EFSA baseline studies, to reduce this error.^{2}
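The role of the sampling sensitivity can be illustrated with a simple binomial argument. Assuming independent samples, a fixed within-farm prevalence p, and a per-sample test sensitivity Se, the probability of detecting at least one positive among n samples is 1 − (1 − p·Se)^n. The Python sketch below uses n = 10 as in our study; the prevalence values are hypothetical, and this simplified calculation is not the EFSA sample-size method.

```python
def detection_probability(prevalence, n_samples=10, test_sensitivity=1.0):
    """P(at least one positive sample) under independent binomial sampling.

    Assumes every sample is positive with probability prevalence * test_sensitivity.
    """
    p_positive = prevalence * test_sensitivity
    return 1.0 - (1.0 - p_positive) ** n_samples


# Hypothetical within-farm prevalences; 10 samples per farm as in the study.
for prevalence in (0.05, 0.10, 0.30):
    print(prevalence, round(detection_probability(prevalence), 3))
```

Under these assumptions, a farm with low within-farm prevalence can easily yield an observed zero, which is one source of the “false-negative zeros” discussed above.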

This paper investigates statistical models for the analyses of count data using a study of cefotaxime-resistant

Motivated by our analysis, we conclude that the recommendation scheme provided in Figure

Study design and data collection: JH, KH, CM, and LK. Data management and analyses: AH, CF, KI, MH, and LK. Wrote the paper: AH, CF, KI, KH, and LK.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We thank the working group at the Institute for Animal Hygiene and Environmental Health, Department of Veterinary Medicine, Free University Berlin, for laboratory processing of samples. The data this study is based on were collected as part of the RESET Consortium financially supported by the German Federal Ministry of Education and Research (BMBF) through the German Aerospace Centre (DLR), grant number 01KI1013A (RESET).
