
Confidence Interval for a Single-Sample Proportion

Asymptotic Normal Method

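Assume the number of responders follows a binomial distribution, so that the response rate \(p\) satisfies \(E(p) = p\) and \(Var(p) = p(1-p)/n\). The asymptotic normal method computes the confidence interval as follows [2]: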

\[ L = p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \ , \ U = p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \]

The limits given by the formula above can fall outside \([0, 1]\) in some cases, so the interval width must account for the boundaries:

\[ d = \min\left\lbrace U, 1\right\rbrace - \max\left\lbrace L, 0\right\rbrace \]

Solving this equation for the sample size \(n\) requires a case analysis:

Case 1. \(U \ge 1, L \gt 0\)
\[ d = 1 - \left(p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}\right) \Rightarrow n = \frac{z_{1-\alpha/2}^2 p(1-p)}{\left(d+p-1\right)^2} \]

Substituting this into the case conditions:

\[ p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \ge 1 \Rightarrow d \ge 2(1-p) \]
\[ p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \gt 0 \Rightarrow d \lt 1 \]

Therefore,

\[ n = \frac{z_{1-\alpha/2}^2 p(1-p)}{\left(d+p-1\right)^2} \ ,\ 2(1-p) \le d \lt 1 \]
Case 2. \(U \ge 1, L \le 0\)
\[ d = 1 - 0 = 1 \]

In this case the interval width is identically 1 regardless of the sample size \(n\), so it is of no practical use.

Case 3. \(U \lt 1, L \gt 0\)
\[ d = \left(p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}\right) - \left(p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}\right) \Rightarrow n = \frac{4 z_{1-\alpha/2}^2 p(1-p)}{d^2} \]

Substituting this into the case conditions:

\[ p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \lt 1 \Rightarrow d \lt 2(1-p) \]
\[ p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \gt 0 \Rightarrow d \lt 2p \]

Therefore, since both \(d \lt 2(1-p)\) and \(d \lt 2p\) must hold,

\[ n = \frac{4 z_{1-\alpha/2}^2 p(1-p)}{d^2} \ ,\ d \lt \min\left\lbrace 2p, 2(1-p) \right\rbrace \]
Case 4. \(U \lt 1, L \le 0\)
\[ d = \left(p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}\right) - 0 \Rightarrow n = \frac{z_{1-\alpha/2}^2 p(1-p)}{(d-p)^2}\ \text{and} \ d \gt p \]

Substituting this into the case conditions:

\[ p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \lt 1 \Rightarrow d \lt 1 \]
\[ p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \le 0 \Rightarrow d \ge 2p \]

Therefore,

\[ n = \frac{z_{1-\alpha/2}^2 p(1-p)}{(d-p)^2} \ ,\ 2p \le d \lt 1 \]

In summary,

\[ n = \begin{cases} \frac{z_{1-\alpha/2}^2 p(1-p)}{\left(d+p-1\right)^2} &, 2(1-p) \le d \lt 1 \\ \frac{4 z_{1-\alpha/2}^2 p(1-p)}{d^2} &, d \lt \min\left\lbrace 2p, 2(1-p) \right\rbrace \\ \frac{z_{1-\alpha/2}^2 p(1-p)}{(d-p)^2} &, 2p \le d \lt 1 \end{cases} \]
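The case analysis above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `wald_width` and `wald_sample_size` are hypothetical, the unclipped case uses the condition \(d \lt \min\{2p, 2(1-p)\}\) (what the two inequalities of Case 3 imply jointly), and the result is a real-valued \(n\) that would normally be rounded up.

```python
import math
from statistics import NormalDist


def wald_width(n, p, alpha=0.05):
    """Width of the asymptotic normal interval after clipping to [0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return min(p + half, 1.0) - max(p - half, 0.0)


def wald_sample_size(p, d, alpha=0.05):
    """Real-valued n from the piecewise formula (hypothetical helper name)."""
    if not (0 < p < 1 and 0 < d < 1):
        raise ValueError("p and d must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    k = z**2 * p * (1 - p)
    if d < min(2 * p, 2 * (1 - p)):  # neither limit is clipped
        return 4 * k / d**2
    if d >= 2 * (1 - p):             # upper limit clipped at 1
        return k / (d + p - 1) ** 2
    return k / (d - p) ** 2          # lower limit clipped at 0 (d >= 2p)
```

Feeding the returned \(n\) back into `wald_width` should reproduce the requested width \(d\), which gives a quick consistency check on each branch; `math.ceil` can then be applied to obtain an integer sample size.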

Asymptotic Normal Method (Continuity Correction)

Adding the correction term \(\frac{1}{2n}\) to the asymptotic normal method gives the following confidence interval formula:

\[ L = p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n}\ , \ U = p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n} \]

The limits given by the formula above can fall outside \([0, 1]\) in some cases, so the interval width must account for the boundaries:

\[ d = \min\left\lbrace U, 1 \right\rbrace - \max\left\lbrace L, 0 \right\rbrace \]

Solving this equation for the sample size \(n\) requires a case analysis:

Case 1. \(U \ge 1, L \gt 0\)
\[ d = 1 - \left(p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n}\right) \Rightarrow \frac{1}{2n} + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - (p + d - 1) = 0 \]

Let \(x = \frac{1}{\sqrt{n}}\) and \(A = z_{1 - \alpha/2}\sqrt{p(1-p)}\); then:

\[ \frac{1}{2}x^2 + Ax - (p + d - 1) = 0 \]

Solving this quadratic equation and taking the positive root:

\[ x = -A + \sqrt{A^2 + 2(p + d - 1)} \]

Substituting \(x = \frac{1}{\sqrt{n}}\) gives:

\[ n = \frac{1}{x^2} = \frac{1}{\left(-A + \sqrt{A^2 + 2(p + d - 1)}\right)^2} \]

Substituting this into the case conditions and using \(\frac{1}{2}x^2 + Ax = p + d - 1\):

\[ p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n} \ge 1 \Rightarrow p + Ax + \frac{1}{2}x^2 \ge 1 \Rightarrow d \ge 2(1-p) \]
\[ p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n} \gt 0 \Rightarrow p - Ax - \frac{1}{2}x^2 \gt 0 \Rightarrow d \lt 1 \]

Therefore,

\[ n = \frac{1}{\left(-A + \sqrt{A^2 + 2(p + d - 1)}\right)^2} \ , \ 2(1-p) \le d \lt 1 \]
Case 2. \(U \ge 1, L \le 0\)
\[ d = 1 - 0 = 1 \]

In this case the interval width is identically 1 regardless of the sample size \(n\), so it is of no practical use.

Case 3. \(U \lt 1, L \gt 0\)
\[ d = \left(p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n}\right) - \left(p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n}\right) \]

Simplifying:

\[ d = 2z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{n} \]

Let \(x = \frac{1}{\sqrt{n}}\) and \(A = z_{1 - \alpha/2}\sqrt{p(1-p)}\); then:

\[ x^2 + 2Ax - d = 0 \]

Solving this quadratic equation and taking the positive root:

\[ x = -A + \sqrt{A^2 + d} \]

Substituting \(x = \frac{1}{\sqrt{n}}\) gives:

\[ n = \frac{1}{x^2} = \frac{1}{\left(-A + \sqrt{A^2 + d}\right)^2} \]

Substituting this into the case conditions and using \(x^2 + 2Ax = d\):

\[ p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n} \lt 1 \Rightarrow p + Ax + \frac{1}{2}x^2 \lt 1 \Rightarrow d \lt 2(1-p) \]
\[ p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n} \gt 0 \Rightarrow p - Ax - \frac{1}{2}x^2 \gt 0 \Rightarrow d \lt 2p \]

Therefore,

\[ n = \frac{1}{\left(-A + \sqrt{A^2 + d}\right)^2} \ , \ d \lt \min\left\lbrace 2p, 2(1-p) \right\rbrace \]
Case 4. \(U \lt 1, L \le 0\)
\[ d = \left(p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n}\right) - 0 \]

Let \(x = \frac{1}{\sqrt{n}}\) and \(A = z_{1 - \alpha/2}\sqrt{p(1-p)}\); then:

\[ \frac{1}{2}x^2 + Ax + p - d = 0 \]

Solving this quadratic equation and taking the positive root:

\[ x = -A + \sqrt{A^2 - 2(p - d)} \]

Substituting \(x = \frac{1}{\sqrt{n}}\) gives:

\[ n = \frac{1}{x^2} = \frac{1}{\left(-A + \sqrt{A^2 - 2(p - d)}\right)^2} \]

Substituting this into the case conditions and using \(\frac{1}{2}x^2 + Ax = d - p\):

\[ p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n} \lt 1 \Rightarrow p + Ax + \frac{1}{2}x^2 \lt 1 \Rightarrow d \lt 1 \]
\[ p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} - \frac{1}{2n} \le 0 \Rightarrow p - Ax - \frac{1}{2}x^2 \le 0 \Rightarrow d \ge 2p \]

Therefore,

\[ n = \frac{1}{\left(-A + \sqrt{A^2 - 2(p - d)}\right)^2} \ , \ 2p \le d \lt 1 \]

In summary,

\[ n = \begin{cases} \frac{1}{\left(-A + \sqrt{A^2 + 2(p + d - 1)}\right)^2} &, 2(1-p) \le d \lt 1 \\ \frac{1}{\left(-A + \sqrt{A^2 + d}\right)^2} &, d \lt \min\left\lbrace 2p, 2(1-p) \right\rbrace \\ \frac{1}{\left(-A + \sqrt{A^2 - 2(p - d)}\right)^2} &, 2p \le d \lt 1 \end{cases} \]

where \(A = z_{1 - \alpha/2}\sqrt{p(1-p)}\).
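As a hedged sketch of the continuity-corrected result, note that all three branches share the form \(x = -A + \sqrt{A^2 + c}\), with \(c = 2(p + d - 1)\), \(c = d\), or \(c = 2(d - p)\) depending on the case. The function names below are hypothetical, and the returned \(n\) is real-valued.

```python
import math
from statistics import NormalDist


def wald_cc_width(n, p, alpha=0.05):
    """Clipped width of the continuity-corrected asymptotic normal interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p * (1 - p) / n) + 1 / (2 * n)
    return min(p + half, 1.0) - max(p - half, 0.0)


def wald_cc_sample_size(p, d, alpha=0.05):
    """Real-valued n solving the clipped width equation (hypothetical name)."""
    if not (0 < p < 1 and 0 < d < 1):
        raise ValueError("p and d must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a = z * math.sqrt(p * (1 - p))
    if d < min(2 * p, 2 * (1 - p)):  # neither limit is clipped
        c = d
    elif d >= 2 * (1 - p):           # upper limit clipped at 1
        c = 2 * (p + d - 1)
    else:                            # lower limit clipped at 0 (d >= 2p)
        c = 2 * (d - p)
    x = -a + math.sqrt(a * a + c)    # positive root, x = 1 / sqrt(n)
    return 1 / x**2
```

For the same \(p\) and \(d\), this should return a larger \(n\) than the uncorrected formula, because the correction term widens the interval at every sample size.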

Clopper-Pearson

The Clopper-Pearson method computes the confidence interval as follows:

\[ L = \left[ 1 + \frac{n - np + 1}{np F_{\frac{\alpha}{2};\ 2np,\ 2(n - np + 1)}} \right]^{-1} \]
\[ U = \left[ 1 + \frac{n - np}{(np + 1) F_{1-\frac{\alpha}{2};\ 2(np + 1), \ 2(n - np)}} \right]^{-1} \]

Confidence interval width:

\[ d = U - L \]

This equation has to be solved for \(n\) numerically.

Wilson Score

The Wilson Score method computes the confidence interval as follows:

\[ L = \frac{\left(2np + z_{1-\alpha/2}^2\right) - z_{1-\alpha/2} \sqrt{z_{1-\alpha/2}^2 + 4np(1-p)}}{2\left(n + z_{1-\alpha/2}^2\right)} \]
\[ U = \frac{\left(2np + z_{1-\alpha/2}^2\right) + z_{1-\alpha/2} \sqrt{z_{1-\alpha/2}^2 + 4np(1-p)}}{2\left(n + z_{1-\alpha/2}^2\right)} \]

Confidence interval width:

\[ d = U - L = \frac{z_{1-\alpha/2} \sqrt{z_{1-\alpha/2}^2 + 4np(1-p)}}{n + z_{1-\alpha/2}^2} \]

Rearranging:

\[ \begin{align} & \left(n + z_{1-\alpha/2}^2\right)^2 d^2 = z_{1-\alpha/2}^2 \left(z_{1-\alpha/2}^2 + 4np(1-p)\right) \\ \Rightarrow & \left(n^2 + 2z_{1-\alpha/2}^2 n + z_{1-\alpha/2}^4\right) d^2 = z_{1-\alpha/2}^4 + 4p(1-p)z_{1-\alpha/2}^2 n \\ \Rightarrow & d^2 n^2 + 2z_{1-\alpha/2}^2 \left(d^2 - 2p(1-p)\right) n + z_{1-\alpha/2}^4 (d^2 - 1) = 0 \end{align} \]

Let \(A = d^2\), \(B = 2z_{1-\alpha/2}^2 \left(d^2 - 2p(1-p)\right)\), and \(C = z_{1-\alpha/2}^4 (d^2 - 1)\); then:

\[ n = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A} \]

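The larger root should be taken here. For \(0 \lt d \lt 1\) we have \(A \gt 0\) and \(C \lt 0\), so the two roots have opposite signs and only the larger one is positive: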

\[ n = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \]
Proof that the discriminant satisfies \(B^2 - 4AC \ge 0\)
\[ \begin{align} B^2 - 4AC = & 4 z_{1-\alpha/2}^4 \left(d^2 - 2p(1-p)\right)^2 - 4 d^2 z_{1-\alpha/2}^4 (d^2 - 1) \\ = & 4 z_{1-\alpha/2}^4 \left(d^4 - 4p(1-p)d^2 + 4p^2(1-p)^2 - d^4 + d^2\right) \\ = & 4 z_{1-\alpha/2}^4 \left(d^2(1-4p(1-p)) + 4p^2(1-p)^2\right) \\ = & 4 z_{1-\alpha/2}^4 \left(d^2 (1-2p)^2 + 4p^2(1-p)^2\right) \end{align} \]

Since \(d^2 (1-2p)^2 \ge 0\) and \(4p^2(1-p)^2 \ge 0\) always hold, the discriminant satisfies \(B^2 - 4AC \ge 0\) in all cases.
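The closed-form Wilson Score solution can be sketched as below. The function names are hypothetical, the coefficients follow the definitions of \(A\), \(B\), and \(C\) above, and the code assumes \(0 \lt d \lt 1\) so that the larger root is the positive one.

```python
import math
from statistics import NormalDist


def wilson_width(n, p, alpha=0.05):
    """Width U - L of the Wilson Score interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(z**2 + 4 * n * p * (1 - p)) / (n + z**2)


def wilson_sample_size(p, d, alpha=0.05):
    """Larger root of A n^2 + B n + C = 0 (hypothetical helper name)."""
    if not (0 < p < 1 and 0 < d < 1):
        raise ValueError("p and d must lie strictly between 0 and 1")
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    a = d**2
    b = 2 * z2 * (d**2 - 2 * p * (1 - p))
    c = z2**2 * (d**2 - 1)
    # The discriminant is non-negative, so the square root is always real.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Substituting the returned value into `wilson_width` should give back \(d\); rounding up with `math.ceil` then yields an integer sample size whose interval is no wider than requested.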

Wilson Score with Continuity Correction

The confidence interval for the Wilson Score method with continuity correction is:

\[ L = \frac{\left(2np + z_{1-\alpha/2}^2 - 1\right) - z_{1-\alpha/2} \sqrt{z_{1-\alpha/2}^2 - \frac{1}{n} + 4np(1-p) + 4p - 2}}{2\left(n + z_{1-\alpha/2}^2\right)} \]
\[ U = \frac{\left(2np + z_{1-\alpha/2}^2 + 1\right) + z_{1-\alpha/2} \sqrt{z_{1-\alpha/2}^2 - \frac{1}{n} + 4np(1-p) - 4p + 2}}{2\left(n + z_{1-\alpha/2}^2\right)} \]

Confidence interval width:

\[ d = U - L \]

This equation has to be solved for \(n\) numerically.

Wilson Score with Continuity Correction: Interval Width as a Function of Sample Size \(n\)

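Taking \(p = 0.9, \alpha = 0.05\) as an example, the interval width is plotted against the sample size \(n\) below:

[Figure: interval width of the Wilson Score method with continuity correction versus sample size \(n\)]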

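If \(n\) is treated as a continuous variable, the interval width first increases and then decreases as \(n\) grows, which can cause trouble for numerical root finding.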

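If the target interval width is set to \(0.8\), there are in theory two numerical solutions, and the larger one should be taken as the sample size estimate.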

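brentq requires the function values at the two ends of the bracketing interval to have opposite signs. In this situation, first use minimize_scalar to locate the maximum of the width within the interval, take the maximizer as the lower end of the bracket, and then apply brentq to solve numerically.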
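The two-step strategy can be illustrated with the standard library alone. In this sketch a golden-section search stands in for minimize_scalar and plain bisection stands in for brentq; these are substitutes chosen to keep the example dependency-free, not the SciPy routines themselves. The function names and the default bracket `n_lo=1.0`, `n_hi=1e6` are assumptions, and the lower end presumes the square-root arguments are non-negative at \(n = 1\), which holds for \(\alpha = 0.05\).

```python
import math
from statistics import NormalDist


def wilson_cc_width(n, p, alpha=0.05):
    """Unclipped width U - L of the continuity-corrected Wilson interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    base = z**2 - 1 / n + 4 * n * p * (1 - p)
    denom = 2 * (n + z**2)
    lower = (2 * n * p + z**2 - 1 - z * math.sqrt(base + 4 * p - 2)) / denom
    upper = (2 * n * p + z**2 + 1 + z * math.sqrt(base - 4 * p + 2)) / denom
    return upper - lower


def argmax_golden(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(a) < f(b):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    return (lo + hi) / 2


def bisect_root(g, lo, hi, tol=1e-10):
    """Bisection; like brentq, it needs g(lo) and g(hi) of opposite sign."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2


def wilson_cc_sample_size(p, d, alpha=0.05, n_lo=1.0, n_hi=1e6):
    """Larger solution of width(n) = d, found to the right of the peak."""
    def width(n):
        return wilson_cc_width(n, p, alpha)

    n_peak = argmax_golden(width, n_lo, n_hi)
    return bisect_root(lambda n: width(n) - d, n_peak, n_hi)
```

For the example in the text (\(p = 0.9\), \(\alpha = 0.05\), \(d = 0.8\)), starting the bracket at the peak ensures the search returns the larger of the two solutions; applying `math.ceil` to the result gives the integer sample size.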