^{2}Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
Key words: QMRA; betaPoisson doseresponse model; bootstrap; confidence intervals for prediction; Monte Carol simulation.
Here, it is shown that the (95%) bootstrap confidence bands reported in microbial risk literature normally contain far less than 95% of the data points because they are not confidence bands for prediction. This implies that the existing literature confidence band may not be a good representation of the degree of uncertainty of an estimated doseresponse relation. To fill this knowledge gap in QMRA analysis, a new, samplesizedependent bootstrap (95%) confidence band construction algorithm is proposed which can normally contain 95% of the data points and the results are validated by an extensive Monte Carlo simulation study using four well studied experimental doseresponse datasets.
The rest of the paper is organized as follows. Section 2 begins with a description of the general setting for this study which includes: the notation and definitions; datasets; model specification and parameter estimation. The existing and the new bootstrap confidence band algorithms are specified in details in Section .The comparisons between the existing and the new algorithms were made and the results are presented and discussed in the last part of Section 2. The paper is completed with a conclusion section. For a better reading flow and repeatability of our research findings, an appendix section is added at the end which contains: table 4, the details of the selected real doseresponse datasets; table 5, the core parts of R code programs [11] for random sample generation from a betaPoisson doseresponse model.
A detailed description and derivation of the exponential and the betaPoisson doseresponse models are given by Haas et al..[1] For the purpose of this paper, the following model specifications are needed.
If we assume that the actual number of organisms ingested, D, follows a Poisson distribution, i.e.,D~ Poisson $({D}_{i});r$ a constant (a homogeneous susceptibility assumption); and K_{min} = 1 (a singlehit assumption), the simplest QMRA doseresponse $${P}_{I}(d)=1\text{exp(}rd),\text{(1)}$$ where is the mean/effective dose.
In Equation (1), if we take into account the individual susceptibility/host sensitivity by allowing rtofollow a beta distribution with the probability density function (pdf) $$f(r\alpha ,\beta )=\frac{\Gamma (\alpha +\beta )}{\Gamma (\alpha )\Gamma (\beta )}{r}^{\alpha 1}{(1r)}^{\beta 1},\text{(2)}$$ where $0<r<1,$ and parameters $\alpha >0$ and $\beta >0$ we obtain the wellknown betaPoisson (singlehit) doseresponse model[1]: $${P}_{I}(d)=1{\text{}}_{1}{F}_{1}(\alpha ,\alpha +\beta ,d),\text{(3)}$$ where ${}_{1}{F}_{1}(\alpha ,\alpha +\beta ,d)$ is the Kummer confluent hypergeometric function [15] defined by $${}_{1}{F}_{1}(\alpha ,\alpha +\beta ,d)=1+\frac{\Gamma (\alpha +\beta )}{\Gamma (\alpha )}{\displaystyle {\sum}_{j=1}^{\infty}\left[\frac{\Gamma (\alpha +j)}{\Gamma (\alpha +\beta =j)}\frac{{(1)}^{j1}{(d)}^{j}}{j!}\right]}.\text{(4)}$$ Since there are no analytic solutions to ${}_{1}{F}_{1}(\alpha ,\alpha +\beta ,d),$ only numeric approximation solutions or asymptotic solutions can be obtained[15, 16]. In their original paper, Furumoto and Mickey [17] derived the simple, attractive approximation betaPoisson doseresponse formula: $${P}_{I}(d)=1{\left(1+\frac{d}{\beta}\right)}^{\alpha}$$ When $\alpha \ll \beta $ and $\beta \gg 1,\text{}{P}_{I}(d)$ obtained from Equation (5) is a very good approximation to the true values of ${P}_{I}(d)$ defined by Equation (3) [1, 10]
A singlehit model reaches its maximum risk limit when r=1 in Equation (1). This is an important property of the beta Poisson model (Equation (3)) which provides an upper bound of the confidence level of the doseresponse relation. It is worth noting that this maximum risk limit property is not retained in the approximate betaPoisson model (Equation (5)) [10]
Notation 
Definition 
Ds= $\left\{{D}_{1,}{D}_{2},\mathrm{...},{D}_{m}\right\}$ 
mean dose levels (i.e., the average number of organisms per dose or average concentrations); where the meaning is clear, denote $d\equiv {D}_{i}$ for $i=1,2,\mathrm{...},m$ . 
N = $\left\{{N}_{1},{N}_{2},\mathrm{...},{N}_{m}\right\}$ 
${N}_{i}$ is the number of individuals participated in the i th exposure group ($m$ groups, each corresponding to one mean dose level ${D}_{i}$ ). 
y = $\left\{{y}_{1},{y}_{2},\mathrm{...},{y}_{m}\right\}$ 
${y}_{i}$ is the number of infected individuals in the th exposure group 
r 
survival probability of each single organism ingested 
K 
number of organisms surviving to the target site inside a host body 
K_{min} 
the threshold value of the surviving organisms, i.e., the minimum number of organisms required for causing an infection event 
${P}_{I}$ 
probability of infection estimated by a doseresponse model 
${\pi}_{i}^{o}$ 
observed proportion of infected individuals in the i th exposure group, i.e., ${\pi}_{i}^{o}={y}_{i}/{N}_{i}\text{for}i=1,2,\mathrm{...},m.$ 
${\pi}_{i}$ 
predicted binary response by a model, e.g., if response is defined by probability of infection, ${\pi}_{i}={P}_{I}({D}_{i})\equiv {P}_{I}(d)$ . 
exp(.) and log 
exp(1) = $e\approx 2.71828$ is the natural logarithm base such that log(e) = 1. 
$\Gamma (.)$ 
the gamma function [13] 
Existing bootstrap algorithm (confidence band for means) 
New bootstrap algorithm (confidence band for prediction) 

Step 1 
Generate a large number of simulation samples, e.g., N*=2000, based on Ds, N, $\hat{\alpha},\hat{\beta}$ ; Determine the plotting mean dose levels (D_{sim}). 
Step 1 
Determine at which mean dose levels (Dsim) and the number of people (N_{sim}={max(N)}) exposed to hazard for generating simulation samples; 
Step 2 
Fit a betaPoisson model to the generated simulation sample so that 2000 sets of $\left\{{\hat{\alpha}}^{*},{\hat{\beta}}^{*}\right\}$ are obtained; 
Step 2 
Generate a large number of simulation samples, e.g., N*=2000, based on D_{sim}, N_{sim}, $\hat{\alpha},\hat{\beta}$; 
step 3 
Calculate the probability of infection based on Equation (3) or (5) using parameter estimates obtained in Step 2 (2000 values at each D_{sim} level ); 
Step 3 
Calculate the probability of infection using the ratio y_{sim}/N_{sim}, where y_{sim}are the simulation sample values from Step 2 (2000 values at each D_{sim} level); 
Step 4 
Obtain a 95 percentile (2.5% to 97.5%) confidence interval for each D_{sim} level using the results from Step 3; connect all lower/upper bound points to form a confidence band about the infection probability curve. 
Step 4 
Obtain a 95 percentile (2.5% to 97.5%) confidence interval for each D_{sim} level using the results from Step 3; connect all lower/upper bound points to form a confidence band about the infection probability curve. 
The two different approaches to estimate the probabilities of infection, ${P}_{I}(d),$ have a substantial impact on the resulting confidence bands. With respect to each bootstrap sample (i.e., the simulated data) ysim, the parameter estimates ${\hat{\alpha}}^{*}$ and ${\hat{\beta}}^{*}$ are obtained first and the existing algorithm then estimates ${P}_{I}(d)$ using Equation (3) (for an exact betaPoisson model) or Equation (5) (for an approximate betaPoisson model), whereas the new algorithm directly calculates the ratio ysim/N_{sim} as the model estimates for ${P}_{I}(d)$.
Although both algorithms construct the parametric bootstrap confidence bands, it is important to realize that the band based on the existing algorithm counts for the variation of the probability of infection given a mean dose level while the band based on the new algorithm counts for the variation for the probability of infection given a specific dose.
Therefore, the difference between these two confidence bands is an analogy of the difference between confidence bands for a regression line and for prediction of a new data point in a linear regression model. Furthermore, these two algorithms are fundamentally different in evaluating the discrepancy between Equation (3) and Equation (5): the existing algorithm treats the discrepancy as a difference caused by two different doseresponse models (i.e., the exact versus the approximate models) while the new algorithm treats the discrepancy as a difference caused by two different parameter sets under the exact betaPoisson doseresponse model.
Because the new algorithm employs the proportion data as the estimates for ${P}_{I}(d)$ significant data discretization effects are expected in the resulting confidence bands when the maximum number of ${N}_{i}\text{f}s$ (denoted by max(N)) is relatively small (e.g., max(N) < 15). Because the observed proportion of infection $\hat{{\pi}_{i}^{o}}={y}_{i}/{N}_{i},$ where ${\pi}_{i}{}^{o}={\text{log}}_{{N}_{i}\to \infty}\left(\frac{{y}_{i}}{{N}_{i}}\right),$ is used as the optimization target in the ${P}_{I}(d)$ parameter estimation process for Equations (3) and (5), the (median) curve estimated using the new algorithm should converge to the analogous value using the existing algorithm for large max(N), e.g., max(N) > 50. Whereas the large sample assumption is implicitly assumed in the existing algorithm, the new algorithm is critically dependent on the choice of N_{sim}. Smaller (larger) N_{sim} would result in a wider (narrower) confidence band. With respect to each mean dose level $\left(d\equiv {D}_{i}\right),$ the number of individuals exposed to a hazard, ${N}_{i}$ (or N_{sim}) may be considered as a sample size in a binomial process. In this sense, the new algorithm confidence bands are, therefore, samplesizedependent (i.e. N_{sim} =max(N)dependent) These theoretical properties are examined in the Figures presented in next section.
Ds 
{ 0.009, 0.09, 0.9, 9, 90, 900, 9000, 90000} 

N 
{5, 7, 7, 11, 7, 8, 7, 3} 

y 
{0, 0, 1, 8, 6, 7, 5, 3} 

Approximate BP model: 
$\hat{\alpha}$ = 0.253; $\hat{\beta}$ = 0.422 
Exact BP model: 
$\hat{\alpha}$ = 0.167; $\hat{\beta}$ = 0.191 

D_{sim} 
{0.005, 0.009, 0.05, 0.09, 0.5, 0.9, 5, 9, 50, 90, 500, 900, 5000, 9000, 50000, 90000} 

N_{sim}=max(N) 
{11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11 } 
Dataset 1: Rotavirus (CJN) and infection in healthy volunteers [8, 10] 

Ds (mean dose) 
N (total) 
y(infected) 
9×10^{3} 
5 
0 
9×10^{2} 
7 
0 
9×10^{1} 
7 
1 
9 
11 
8 
9×10^{1} 
7 
6 
9×10^{2>} 
8 
7 
9×10^{3} 
7 
5 
9×10^{4} 
3 
3 
Dataset2: Campylobacter and infection in healthy volunteers [8, 10] 

Ds(mean dose) 
N(total) 
y(infected) 
8×10^{2} 
10 
5 
8×10^{3} 
10 
6 
9×10^{4} 
13 
11 
8×10^{5} 
11 
8 
1×10^{6} 
19 
15 
1×10^{8} 
5 
5 
Dataset3: Salmonella (Nontyphoid Strains) and infection in human volunteers (p399,[12]) 

Ds(mean dose) 
N(total) 
y(infected) 
1.52×10^{5} 
6 
3 
3.85×10^{5} 
8 
6 
1.35×10^{6} 
6 
6 
1.39×10^{5} 
6 
3 
7.05×10^{5} 
6 
4 
1.66×10^{6} 
6 
4 
1.5×10^{7} 
6 
4 
1.25×10^{5} 
6 
5 
6.95×10^{5} 
6 
6 
1.7×10^{6} 
6 
5 
1.2×10^{4} 
5 
2 
2.4×10^{4} 
6 
3 
6.6×10^{4} 
6 
4 
1.41×10^{5} 
6 
3 
2.56×10^{5} 
6 
5 
5.87×10^{5} 
6 
4 
8.6×10^{5} 
6 
6 
8.9×10^{4} 
6 
5 
4.48×10^{5} 
6 
4 
1.04×10^{6} 
6 
6 
3.9×10^{6} 
6 
4 
1×10^{7} 
6 
6 
2.39×10^{7} 
6 
5 
4.45×10^{7} 
6 
6 
6.73×10^{7} 
8 
8 
1.26×10^{6} 
6 
6 
4.68×10^{6} 
6 
6 
1.2×10^{4} 
6 
3 
2.4×10^{4} 
6 
4 
5.2×10^{4} 
6 
3 
9.6×10^{4} 
6 
3 
1.55×10^{5} 
6 
5 
3×10^{5} 
6 
6 
7.2×10^{5} 
5 
4 
1.15×10^{6} 
6 
6 
5.5×10^{6} 
6 
5 
2.4×10^{7} 
5 
5 
5×10^{7} 
6 
6 
1×10^{6} 
6 
6 
5.5×10^{6} 
6 
6 
1×10^{7} 
6 
5 
2×10^{7} 
6 
6 
4.1×107 
6 
6 
1.5×106 
6 
5 
7.68×106 
6 
6 
1×107 
6 
5 
1.58×105 
6 
1 
Dataset4: (292 and 295 pooling) Listeria monocytogenes and infection in mice [2] 

Ds(mean dose) 
N(total) 
y(infected) 
2 
6 
0 
5 
6 
1 
1.1102 
6 
2 
5.5×103 
10 
7 
3.24×104 
10 
7 
3.9×104 
6 
4 
5.5×104 
10 
9 
2.51×105 
10 
10 
5.5×105 
10 
10 
2.82×106 
10 
10 
An approximate betaPoisson model can perform poorly in that its predicted P_I (d) curve may go above the maximum risk line (the bold red dashed line); figure1(a) essentially reproduced the result reported by Teunis in 2000[10]. Figure 2(a) is a reproduction of figure 1(a) but with one extra curve added in – the thin red dashed line in the farthest left part of the plot.
It is noted that, in the bottom left corner region of figure 2(a), the upper dotted line exceeds the upper dotdashed line which is already above the bold red dashed line. However, the dotted line remains under the thin red dashed line. Therefore, the confidence band based on the new algorithm (dotted lines) does not fail here, because it is a samplesize dependent confidence band and the thin red dashed line is the finite sample maximum risk line (choosing the same N_{sim} values, as shown in table 3, to take account of the sampling variation in estimating the maximum risk level). On the other hand, the existing algorithm for computing the approximate betaPoisson confidence band (dotdashed lines) does violate the maximum risk limit for MLEs are obtained under the large sample assumption and the bold reddashed line (N_{sim}→ ∞) applies as the maximum risk limit. Figure 2(b) is drawn to verify how the new algorithm confidence band will change as N_{sim} changes (N_{sim} = {200}, compared with N_{sim} = {11} in figure 2(a)). Figure2(b) shows that the thin red dashed line moves much closer to the bold red dashed line and the new algorithm band (dotted lines) shrinks significantly, with the upper dotted line being always under the thinred dashed line. It is noted that the (existing algorithm) upper dotdashed line even exceeds the thin red dashed line in figure 2(b) which confirms the finding reported by Teunis [10]: the maximum risk limit property is not retained in the approximate betaPoisson model. Noting the fact that N_i is seldom more than 20 in the real doseresponse data, it is, therefore, necessary to take into account the sampling variation effects in determination of the maximum risk limits using Equation (1).
As shown by figure 1(b), by ignoring the sampling variation effects, the existing algorithm could underestimate the maximum risk in the very low dose level range (e.g., mean/ effective dose d< 0.05or log(mean/effective dose) = 3 ) which is of particular interest (and concern) in microbial risk assessment practice.
Figures 4 and 5 are drawn in order to show the impacts of the sampling variation on the new algorithm confidence band while the existing algorithm band should remain unchanged. The sampling variation is determined by the sample size N_{sim}. The plots in figure 4(a), (b), (c), and (d), show how the bands, constructed based on the new algorithm, change as the sample size N_{sim} changes.
The literature doseresponse analysis results of dataset 4 can be found in [2].
http://qmrawiki.msu.edu/index.php?title=Listeria_monocytoge nes_%28Infection%29%3A_Dose_Response_Models
(accessed on 23/04/2017). In the analysis, the author(s) compared the fit of the betaPoisson and exponential models to the data and concluded that the betaPoisson model was a adequate model. Figure 7 depicts the results of the doseresponse analysis with dataset 4. Clearly, the new algorithm has provided a much better fit.
 Haas CN, JB Rose,CP Gerba. Quantitative Microbial Risk Assessment. second ed. New York: John Wiley & Sons;INC2014.
 Rose J. RA Wiki 23 January, 2015 [cited 2017 February]; Quantitative Microbial Risk Assessment (QMRA) Wiki] Available from: http://qmrawiki.canr.msu.edu/index.php/Quantitative_Microbial_Risk_Assessment_%28QMRA%29_Wiki
 Mara DD. Sleigh PA, Blumenthal UJ, Carr RM. Health risks in wastewater irrigation: comparing estimates from quantitative microbial risk analyses and epidemiological studies. J Water Health. 2007;5(1): 3950.
 Karavarsamis N, AJ Hamilto. Estimators of annual probability of infection for quantitative microbial risk assessment. Journal of Water and Health, 2010;8(2): 365373.
 Teunis P, Takumi K, and Shinagawa K. Dose response for infection by Escherichia coli O157: H7 from outbreak data. Risk Analysis. 2004;24(2): 401407.
 Xie G, Stratton H, Lemckert C, Dunn PK, Mengersen K. A Generalized QMRA Beta‐Poisson Dose‐Response Model. Risk Analysis, 2016;36(10):19481958. doi:10.1111/risa.12561
 The Interagency MRA Guideline Workgroup, Microbial Risk Assessment Guideline, Pathogenic Microorganisms with Focus on Food and Water. 2012, U.S. Environment Protection Agency (EPA) and U.S. Department of Agriculture/Food Safety and Inspection Service (USDA/FSIS).
 Teunis P, O G. van der Heijden, JWB van der Giessen, A H Havelaar. The doseresponse relation in human volunteers for gastrointestinal pathogens. National Institute of Public Health and the Environment (RIVM); The Netherlands:1996.
 Efron B. RJ. Tibshirani. An introduction to the bootstrap. Vol. 57;1994: CRC press.
 Teunis PF, Havelaar AH. The Beta Poisson dose‐response model is not a single‐hit model. Risk Analysis. 2000;20(4): 513520.
 R Development Core Team. R: A Language and Environment for Statistical Computing. 2014. Available from: http://www.Rproject.org
 Haas CN, JB Rose, CP Gerba. Quantitative Microbial Risk Assessment. first ed. New York:John Wiley & Sons, INC;1999.
 Kreyszig E. Advanced engineering mathematics. 10th edition ed:John Wiley & Sons;2011.
 Haas CN. Conditional doseresponse relationships for microorganisms: Development and application. Risk analysis. 2002; 22(3):455463.
 Muller KE, Computing the confluent hypergeometric function, M (a, b, x). Numerische Mathematik. 2001;90(1): 179196. doi:10.1007/s002110100285
 Butler RW, Andrew T. A. Wood. Laplace Approximations for Hypergeometric Functions with Matrix Argument. The Annals of Statistics. 2002;30(4):11551177.
 Furumoto WA, Mickey R. A mathematical model for the infectivitydilution curve of tobacco mosaic virus: Theoretical considerations. Virology. 1967;32(2):216223. doi:10.1016/00426822(67)902723
 McCullagh P, J Nelder. Generalized linear models. Monographs on statistics and applied probability. Second edition London; Chapman and Hall:1989.
 Morgan B.J. Analysis of quantal response data. Vol. 46. 1992: CRC Press.
 Hankin, RK. Package ‘gsl’. 2013.