BETTER CONFIDENCE INTERVALS FOR A BINOMIAL PROPORTION

Introduction
Interval estimation of a binomial proportion is one of the basic problems in statistics. In technical practice the binomial proportion is often used in statistical quality control.
Let the random variable $X$ follow a binomial distribution with parameters $n \in N$ and $\pi \in (0,1)$, abbreviated $X \sim Bi(n,\pi)$. The probability that the random variable $X$ is equal to the value $x$ is given by
$$P(X = x) = \binom{n}{x} \pi^{x} (1-\pi)^{n-x}, \quad x = 0, 1, \dots, n. \qquad (1)$$
The parameter $\pi$ is also called the binomial proportion. In practice the value of $\pi$ is usually unknown and must be estimated from a sample. Let $X$ be the number of successes in a random sample of size $n$. The maximum likelihood estimator of $\pi$ from the sample is $p = X/n$. This estimator is unbiased and consistent. The $100(1-\alpha)\%$ two-sided confidence interval for the parameter $\pi$ is an interval $\langle p_L, p_U \rangle$ such that $P(p_L \leq \pi \leq p_U) \geq 1-\alpha$, where $(1-\alpha)$ is the desired confidence coefficient, $\alpha \in (0,1)$.
Due to the discrete nature of the binomial distribution, the interval estimation of a binomial proportion is a complicated problem. The standard Wald interval (Laplace, 1812) and the exact Clopper-Pearson interval (Clopper-Pearson, 1934) are the most frequently used intervals. They are presented in the majority of statistical literature. The standard Wald interval (Wald) is based on the standard normal approximation to the binomial distribution. This interval is simple to compute and narrow, but it performs poorly. It is known that its coverage probability behaves irregularly even when $\pi$ is not close to 0 or 1. The coverage probability is below the nominal level even for very large sample sizes. The Wald interval also suffers from zero width intervals and from overshoot (the lower bound can be below 0 and the upper bound can be above 1). Many authors have pointed out that this interval should not be used (Vollset, 1993; Newcombe, 1998; Brown, Cai, DasGupta, 2001; Pires, Amado, 2008).
The exact Clopper-Pearson interval is based on the exact binomial distribution. This interval eliminates overshoot and zero width intervals, but it is known to be strictly conservative and too wide (Newcombe, 1998; Brown, Cai, DasGupta, 2001; Pires, Amado, 2008). Its coverage probability is always equal to or above the nominal level. In this paper we recommend alternative confidence intervals for a binomial proportion that perform better and are often used in practice, but are presented only sporadically in the basic statistical literature. Here we consider confidence interval methods that are based on the standard normal approximation: the Wilson score interval (Wilson), the Wilson score interval with continuity correction (Wilson+CC) and the Agresti-Coull interval (Agresti-Coull), and finally an interval that is based on the Bayesian approach: the Jeffreys interval (Jeffreys).
The intervals are compared in terms of the coverage probability, interval length, average expected length, and root mean square error. We summarize the results for the coverage probability in terms of the observed minimum coverage probability and the average coverage probability, and we classify the alternative confidence intervals into two classes of acceptable intervals: strictly conservative intervals, and intervals that are not strictly conservative but are conservative on average.
Our recommendation of these selected alternative confidence intervals is based on our own investigation of these intervals and on the existing comparative studies presented in recent statistical literature, see e.g. Newcombe (1998).

Alternatives of Confidence Intervals
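Clopper-Pearson interval. The exact Clopper-Pearson interval (Clopper-Pearson, 1934) is based on inverting two-sided binomial tests of the null hypothesis $H_0: \pi = \pi_0$ against the alternative $H_1: \pi \neq \pi_0$. If $X = x$ is observed, the lower and upper bounds are the solutions of the equations
$$\sum_{k=x}^{n} \binom{n}{k} p_L^{k} (1-p_L)^{n-k} = \frac{\alpha}{2}, \qquad \sum_{k=0}^{x} \binom{n}{k} p_U^{k} (1-p_U)^{n-k} = \frac{\alpha}{2}. \qquad (2)$$
The lower and upper bounds of the $100(1-\alpha)\%$ Clopper-Pearson interval for $0 < X < n$ are
$$p_L = \frac{x}{x + (n-x+1)\, F_{1-\alpha/2}(2(n-x+1),\, 2x)}, \qquad p_U = \frac{(x+1)\, F_{1-\alpha/2}(2(x+1),\, 2(n-x))}{n - x + (x+1)\, F_{1-\alpha/2}(2(x+1),\, 2(n-x))}, \qquad (3)$$
where $F_{\alpha}(k_1, k_2)$ is the $\alpha$-quantile of the F-distribution with $k_1$ and $k_2$ degrees of freedom. For $X = 0$, $p_L = 0$ and $p_U = 1 - (\alpha/2)^{1/n}$.
For $X = n$, $p_L = (\alpha/2)^{1/n}$ and $p_U = 1$.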
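The Clopper-Pearson bounds can also be obtained by solving the two binomial tail equations numerically instead of using F-quantiles. The following is a minimal Python sketch under that assumption; the function names (`upper_tail`, `solve_increasing`, `clopper_pearson`) and the use of bisection are illustrative choices, not part of the original computations.

```python
from math import comb


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Bi(n, p); this is an increasing function of p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def solve_increasing(f, target, tol=1e-12):
    """Bisection on (0, 1): find p with f(p) close to target, f increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Clopper-Pearson bounds from the two tail equations (hypothetical helper)."""
    # Lower bound solves P(X >= x | p_L) = alpha/2; p_L = 0 when x = 0.
    p_l = 0.0 if x == 0 else solve_increasing(
        lambda p: upper_tail(x, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p_U) = alpha/2, which is equivalent to
    # P(X >= x + 1 | p_U) = 1 - alpha/2; p_U = 1 when x = n.
    p_u = 1.0 if x == n else solve_increasing(
        lambda p: upper_tail(x + 1, n, p), 1 - alpha / 2)
    return p_l, p_u


print(clopper_pearson(0, 10))
print(clopper_pearson(3, 10))
```

For $X = 0$ the numerical upper bound should agree with the closed form $1 - (\alpha/2)^{1/n}$, which gives a simple check of the sketch.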

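Wald interval. The Wald interval (Laplace, 1812) is based on inverting the Wald test and is obtained by using the Central Limit Theorem. The lower and upper bounds are
$$p_L = p - k_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}, \qquad p_U = p + k_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}},$$
where $k_{\alpha}$ is the $\alpha$-quantile of the standard normal $N(0,1)$ distribution.
For $X = 0$, $p_L = 0$; for $X = n$, $p_U = 1$.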
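A short Python sketch of the Wald bounds $p \pm k_{1-\alpha/2}\sqrt{p(1-p)/n}$ illustrates the overshoot and zero width problems mentioned in the Introduction. The function name `wald_interval` is an illustrative assumption, and the bounds are deliberately left untruncated so that the overshoot is visible.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Untruncated Wald bounds for x successes in n trials (illustrative)."""
    p_hat = x / n
    k = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0,1)
    half_width = k * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# One success in 20 trials: the lower bound falls below 0 (overshoot).
print(wald_interval(1, 20))
# No successes: the interval collapses to a single point (zero width).
print(wald_interval(0, 20))
```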
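Wilson score interval with continuity correction. The continuity correction was suggested by Blyth and Still (1983). The lower and upper bounds for $0 < X < n$ are
$$p_L = \frac{2np + k_{1-\alpha/2}^2 - 1 - k_{1-\alpha/2} \sqrt{k_{1-\alpha/2}^2 - 2 - \frac{1}{n} + 4p\,(n(1-p) + 1)}}{2\,(n + k_{1-\alpha/2}^2)}, \qquad p_U = \frac{2np + k_{1-\alpha/2}^2 + 1 + k_{1-\alpha/2} \sqrt{k_{1-\alpha/2}^2 + 2 - \frac{1}{n} + 4p\,(n(1-p) - 1)}}{2\,(n + k_{1-\alpha/2}^2)},$$
where $k_{\alpha}$ is the $\alpha$-quantile of the standard normal $N(0,1)$ distribution.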
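The continuity-corrected Wilson bounds can be sketched in Python as follows. This assumes the form of the bounds given by Newcombe (1998); the function name `wilson_cc_interval` and the handling of the boundary cases ($p_L = 0$ for $X = 0$, $p_U = 1$ for $X = n$) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc_interval(x, n, alpha=0.05):
    """Wilson score bounds with continuity correction (illustrative sketch)."""
    p = x / n
    k = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0,1)
    denom = 2 * (n + k * k)
    if x == 0:
        lower = 0.0  # assumed boundary convention
    else:
        root = sqrt(k * k - 2 - 1 / n + 4 * p * (n * (1 - p) + 1))
        lower = (2 * n * p + k * k - 1 - k * root) / denom
    if x == n:
        upper = 1.0  # assumed boundary convention
    else:
        root = sqrt(k * k + 2 - 1 / n + 4 * p * (n * (1 - p) - 1))
        upper = (2 * n * p + k * k + 1 + k * root) / denom
    # Keep the bounds inside the parameter space.
    return max(0.0, lower), min(1.0, upper)


print(wilson_cc_interval(81, 263))
```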

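Jeffreys interval. This interval is based on the Bayesian approach. The Beta distribution is the conjugate prior for the binomial distribution. Let the random variable $X \sim Bi(n,\pi)$ and $\pi \sim Beta(k_1, k_2)$. Then the posterior distribution of $\pi$ is $Beta(x + k_1, n - x + k_2)$.
Thus the $100(1-\alpha)\%$ Bayesian interval is
$$\langle Beta_{\alpha/2}(x + k_1,\, n - x + k_2),\ Beta_{1-\alpha/2}(x + k_1,\, n - x + k_2) \rangle.$$
It is known that the non-informative Jeffreys prior is $Beta(1/2, 1/2)$.
Then the lower and upper bounds of the $100(1-\alpha)\%$ Jeffreys interval are
$$p_L = Beta_{\alpha/2}(x + 1/2,\, n - x + 1/2), \qquad p_U = Beta_{1-\alpha/2}(x + 1/2,\, n - x + 1/2), \qquad (8)$$
where $Beta_{\alpha}(k_1, k_2)$ is the $\alpha$-quantile of the Beta distribution with parameters $k_1$ and $k_2$.
For $X = 0$, $p_L = 0$; for $X = n$, $p_U = 1$.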
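The Jeffreys bounds are quantiles of the $Beta(x + 1/2, n - x + 1/2)$ posterior. The sketch below computes them with the Python standard library only, assuming a simple numerical scheme chosen here for illustration: the Beta distribution function is integrated by Simpson's rule after the substitution $t = \sin^2\theta$, and the quantile is found by bisection. The helper names and the numerical method are assumptions, and the sketch is intended for small and moderate $n$.

```python
from math import asin, cos, exp, lgamma, sin, sqrt


def beta_cdf(p, a, b, panels=1000):
    """Beta(a, b) distribution function at p for a, b >= 0.5 (Simpson's rule).

    The substitution t = sin(theta)**2 removes the endpoint singularities
    of the Beta density when a or b equals 1/2.
    """
    theta_max = asin(sqrt(p))
    h = theta_max / panels

    def g(theta):
        return 2 * sin(theta) ** (2 * a - 1) * cos(theta) ** (2 * b - 1)

    total = g(0.0) + g(theta_max)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(i * h)
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)
    return total * h / 3 / exp(log_beta)


def beta_quantile(q, a, b, tol=1e-10):
    """q-quantile of Beta(a, b) by bisection on the distribution function."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def jeffreys_interval(x, n, alpha=0.05):
    """Jeffreys bounds with p_L = 0 for x = 0 and p_U = 1 for x = n."""
    a, b = x + 0.5, n - x + 0.5
    lower = 0.0 if x == 0 else beta_quantile(alpha / 2, a, b)
    upper = 1.0 if x == n else beta_quantile(1 - alpha / 2, a, b)
    return lower, upper


print(jeffreys_interval(3, 10))
```

In practice a statistical library routine for Beta quantiles would normally replace the two helper functions.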

Criteria for Comparing the Confidence Intervals
In this section we introduce the criteria that are used for comparing the confidence intervals.
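Coverage Probability. For fixed values of $n$ and $\pi$, the coverage probability is the probability that the confidence interval $CI(X, n)$ contains the parameter $\pi$. The coverage probability is defined for the given $n$ and $\pi$ as
$$C(n, \pi) = \sum_{x=0}^{n} I(x, \pi) \binom{n}{x} \pi^{x} (1-\pi)^{n-x}, \qquad (9)$$
where $0 < \pi < 1$, and $I(x, \pi) = 1$ if $\pi \in CI(x, n)$ and $I(x, \pi) = 0$ otherwise.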
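The coverage probability is a finite sum over the possible outcomes $x = 0, \dots, n$, so it can be computed exactly for any interval method. The Python sketch below uses the Wald interval only as an example; `coverage_probability` and `wald_interval` are illustrative names, and the interval is treated as closed, which is an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Untruncated Wald bounds, used here only as an example method."""
    p_hat = x / n
    k = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = k * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage_probability(interval, n, pi, alpha=0.05):
    """Sum the binomial probabilities of all x whose interval contains pi."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n, alpha)
        if lower <= pi <= upper:  # indicator that pi lies in CI(x, n)
            total += comb(n, x) * pi**x * (1 - pi) ** (n - x)
    return total


# Coverage of the nominal 95% Wald interval for n = 50 at two values of pi.
print(coverage_probability(wald_interval, 50, 0.5))
print(coverage_probability(wald_interval, 50, 0.01))
```

For $n = 50$ and $\pi = 0.01$ the outcome $X = 0$ has probability $0.99^{50} \approx 0.605$ and yields a zero width interval, so the Wald coverage there cannot exceed about 0.395.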
Due to the discrete nature of the binomial distribution, the coverage probability cannot be exactly equal to the nominal level $(1-\alpha)$ at all possible values of $\pi$. Therefore, our goal is to construct [...] The confidence interval is conservative on average if $AVEC(n) \geq 1-\alpha$, where $AVEC(n)$ denotes the average coverage probability.
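Expected Length. The expected length of the confidence interval is defined as
$$EL(n, \pi) = \sum_{x=0}^{n} \left( p_U(x,n) - p_L(x,n) \right) \binom{n}{x} \pi^{x} (1-\pi)^{n-x}, \qquad (11)$$
where $p_L(x,n)$ and $p_U(x,n)$ are the lower and upper bounds of a particular confidence interval.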
This criterion measures the confidence interval length. In addition to the coverage probability the interval length is important for evaluating the confidence interval. The confidence interval is better if it has a shorter expected interval length with the similar performance of the other criteria.
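The expected length weights the length of each possible interval by the binomial probability of the corresponding outcome. A minimal Python sketch follows; the name `expected_length` is an assumption, and the untruncated Wald interval is again used only as an example method.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Untruncated Wald bounds, used here only as an example method."""
    p_hat = x / n
    k = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = k * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def expected_length(interval, n, pi, alpha=0.05):
    """Binomial-weighted average of the interval lengths p_U - p_L."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n, alpha)
        total += (upper - lower) * comb(n, x) * pi**x * (1 - pi) ** (n - x)
    return total


print(expected_length(wald_interval, 50, 0.5))
```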

Comparison of Confidence Intervals
In this section we demonstrate the performance of the confidence intervals, which are compared in terms of the criteria mentioned above. The coverage probability, conservatism and interval length are important for evaluating the confidence intervals. To evaluate and compare the performance of the confidence intervals, the coverage probability was computed at 2001 equally spaced values of $\pi$ in the interval $\langle 0,1 \rangle$ for $n = 1$ to 1 000 and for $\alpha = 0.05$. The calculations were performed in Matlab. As it is impossible to analyze a large number of plots, we summarize the results for the coverage probability in terms of the observed minimum coverage probability and AVEC. The confidence intervals were grouped into two classes of acceptable intervals:
1. strictly conservative intervals: intervals whose minimum coverage probability is at least the nominal level $(1-\alpha)$ for all $n$: $\min_{\pi} C(n,\pi) \geq 1-\alpha$;
2. not strictly conservative intervals, but conservative on average: intervals whose average coverage probability is at least the nominal level $(1-\alpha)$ for all $n$: $AVEC(n) \geq 1-\alpha$.
In the first class, the ideal confidence interval is one whose minimum coverage probability is equal to or a little above the nominal level $(1-\alpha)$. In the second class, the ideal confidence interval is one whose AVEC is equal to or a little above the nominal level $(1-\alpha)$ and whose minimum coverage probability is only a little below the nominal level $(1-\alpha)$. A shorter expected length and a smaller average expected length are preferred.
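The grid evaluation described above can be sketched in Python as follows. The sketch assumes that the average coverage is taken as the plain mean over the 2001 grid points, endpoints included; the function names are illustrative, and the untruncated Wald interval serves only as an example (the original calculations were performed in Matlab).

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Untruncated Wald bounds, used here only as an example method."""
    p_hat = x / n
    k = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = k * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage_summary(interval, n, alpha=0.05, grid_size=2001):
    """Return (minimum, mean) coverage over equally spaced pi in [0, 1]."""
    # The bounds depend only on x, so compute them once per n.
    bounds = [interval(x, n, alpha) for x in range(n + 1)]
    coverages = []
    for i in range(grid_size):
        pi = i / (grid_size - 1)
        c = sum(
            comb(n, x) * pi**x * (1 - pi) ** (n - x)
            for x, (lower, upper) in enumerate(bounds)
            if lower <= pi <= upper
        )
        coverages.append(c)
    return min(coverages), sum(coverages) / len(coverages)


min_cov, avg_cov = coverage_summary(wald_interval, 50)
print(min_cov, avg_cov)
```

An interval would fall into the first class if the minimum is at least $1-\alpha$ for all $n$, and into the second class if only the mean satisfies this condition.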
Coverage probability. Fig. 1 shows the coverage probabilities of the 95% confidence intervals for the case $n = 50$. The figures for other values of $n$ are similar. It is evident why the Wald performs poorly and why the Clopper-Pearson is known as an overly conservative interval. The Clopper-Pearson guarantees that the coverage probability is always equal to or above the nominal level $(1-\alpha)$. The coverage probability of the Wald is very poor for $\pi$ near the boundaries 0 and 1, and the problems persist even for large $n$. This interval behaves chaotically and should not be used (Brown, Cai, DasGupta, 2001). The coverage probability of the Wilson fluctuates near the nominal level $(1-\alpha)$ and improves significantly as $n$ gets larger, but it is problematic near the boundaries 0 and 1. The Wilson+CC falls among the conservative intervals, with performance similar to the Clopper-Pearson. Compared with the Wilson+CC, the Clopper-Pearson is more conservative for $\pi$ near 0 and 1. The Agresti-Coull is even more conservative, especially for small $n$. Its coverage probability is as good as that of the Wilson, but the Agresti-Coull is quite conservative for $\pi$ near the boundaries. The Jeffreys has a coverage probability qualitatively similar to the Wilson. Its coverage probability is reasonable, except for very deep spikes near 0 and 1, and it improves as $n$ gets larger. Fig. 2 shows the minimum coverage probabilities of the 95% confidence intervals for $n = 1$ to 1 000.
Average expected length. Fig. 5 shows the AVEL of the 95% confidence intervals for $n = 1$ to 100. As shown in the figure, the Clopper-Pearson and the Wilson+CC are comparable and their AVEL is the largest of all the intervals. The Wilson and the Jeffreys are comparable. Compared with them, the AVEL of the Agresti-Coull is larger for small $n$.
From the figure it is evident that the differences between the intervals diminish as $n$ gets larger.

Average coverage probability.
Root mean square error. Fig. 6 shows the RMSE of the 95% confidence intervals for $n = 1$ to 100. It is evident that the RMSE of the Wald is much larger than that of the other intervals. The Clopper-Pearson and the Wilson+CC are comparable; the RMSE of the Clopper-Pearson is slightly larger than that of the Wilson+CC. The RMSE of the Jeffreys and the Agresti-Coull are comparable. The Wilson has the smallest RMSE.

Concluding Remarks
In this section we summarize the classification and performance of the alternatives of confidence intervals.
The Wald interval should not be used. It performs poorly in terms of the coverage probability and the RMSE, though its expected length is short. All the other intervals mentioned above outperform the Wald.
Of the alternative confidence intervals mentioned above, only the Clopper-Pearson is strictly conservative. The Clopper-Pearson guarantees a minimum coverage probability that is equal to or above the nominal level. This interval is too conservative on average, too wide and has a larger RMSE.
The other confidence intervals (the Wilson, the Wilson+CC, the Agresti-Coull and the Jeffreys) are not strictly conservative, but are conservative on average. The Wilson and the Jeffreys have similar properties, such as a relatively small length and comparable AVEL and RMSE. The Wilson has excellent properties, with a coverage probability near the nominal level, except for problems with the coverage probability for values near 0 and 1, which result in a very low minimum coverage probability. Similar problems with the minimum coverage probability exist for the Jeffreys, due to deep spikes near the boundaries 0 and 1. Otherwise the Jeffreys also has good properties.
The Agresti-Coull has a better minimum coverage probability than the others. Compared with the other intervals in this class, except for the Wilson+CC, the Agresti-Coull is slightly conservative and wider on average, but its advantages are easy calculation and presentation.
The Wilson+CC is similar to the Clopper-Pearson. It is too conservative on average, wide and has a larger RMSE. This interval is almost strictly conservative: the coverage probability for some values near the boundaries 0 and 1 is slightly below the nominal level. (For example, for $n = 15$, $\pi = 0.003485$ and $\alpha = 0.05$, $C(n,\pi) = 0.9490$.)
Which method should be used in practical applications? The choice among the alternatives depends on the situation in which they are to be used and on the preferences of users. The strictly conservative Clopper-Pearson is the choice for situations in which the coverage probability must be guaranteed to be equal to or above the nominal level. Otherwise, if strict conservativeness is not a major criterion, the preference is to use confidence intervals that are conservative on average, whose coverage probability is quite close to the nominal level and which are narrower. The almost strictly conservative Wilson+CC is also a valid choice. The Jeffreys is also an appropriate choice for practice, but it is more complicated to compute. Considering the properties of the alternative confidence intervals, the Wilson and the Agresti-Coull are the best choices in this class. They perform very well and are simple to compute. These recommended confidence intervals estimate a binomial proportion much more reliably than the standard and frequently used Wald interval.
Table 1. The comparison of 95% confidence intervals in terms of minimum coverage probability (MCP), average coverage probability (AVEC), root mean square error (RMSE) and average expected length (AVEL) for $n$ = 10, 30, 50, 100, 500, 1 000.