Learning Objectives
Building on Previous Sections
This section builds heavily on Sufficiency (Section 04). You should understand sufficient and minimal sufficient statistics before proceeding. We'll combine these concepts to find optimal estimators!
By the end of this section, you will be able to:
- Completeness: understand when a sufficient statistic has "no extra parts"
- Ancillarity: identify statistics whose distribution is independent of θ
- Basu's theorem: prove independence between complete sufficient and ancillary statistics
- Lehmann-Scheffé theorem: find the unique UMVUE using complete sufficient statistics
- Connections: see how sufficiency, completeness, and ancillarity work together
The Big Picture: Completing the Estimation Theory
Sufficiency tells us what information to keep. Completeness tells us we've kept exactly the right amount — no more, no less.
In Section 04, we learned that sufficient statistics capture all information about θ. But a sufficient statistic might still contain "extra" information unrelated to θ. This section addresses:
- Completeness: Does T have any "useless parts"? Can we construct a non-trivial unbiased estimator of zero from T? If T is complete, no such useless parts exist.
- Ancillarity: Which statistics carry NO information about θ at all? Ancillary statistics, whose distribution is the same for every θ.
| Property | What It Tells Us | Analogy |
|---|---|---|
| Sufficiency | Contains all info about θ | "Nothing lost" |
| Minimal Sufficiency | Maximum compression of sufficient stat | "Most compact form" |
| Completeness | No non-trivial unbiased estimators of zero | "No extra baggage" |
| Ancillarity | Contains no info about θ | "Irrelevant to θ" |
The Grand Goal
When we have a complete sufficient statistic, we can find the unique best unbiased estimator (UMVUE) for any estimable function of θ. This is the culmination of estimation theory!
What Is Completeness?
Intuitive Understanding
Imagine a sufficient statistic as a compressed summary of your data. Completeness asks: does this summary contain any "junk", that is, parts that are unrelated to the parameter?
- Complete statistic: every piece of the statistic tells us something about θ. No "dead weight".
- Incomplete statistic: contains some "noise" or "extra parts" that don't help with estimation.
Technical Insight: A complete statistic has no non-trivial unbiased estimator of zero. You can't construct a function g(T) with E[g(T)] = 0 for all θ unless g(T) = 0.
Formal Definition
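A statistic T(X) is complete for θ if, for every function g,
$$E_\theta[g(T)] = 0 \text{ for all } \theta \in \Theta \;\Longrightarrow\; P_\theta\big(g(T) = 0\big) = 1 \text{ for all } \theta \in \Theta.$$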
In words: If a function g(T) has expected value zero for every possible θ, then g(T) must be identically zero (almost surely).
What does this mean for estimation?
If T is complete and sufficient, then:
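- Any unbiased estimator of g(θ) that is a function of T is unique
- There cannot be two different unbiased estimators of the same quantity that are both functions of T (other unbiased estimators may exist, but not based on T)
- The unbiased estimator we find this way is the best one (this is the Lehmann-Scheffé theorem)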
Why? If h1(T) and h2(T) were both unbiased estimators of the same quantity, then g(T) = h1(T) - h2(T) would be an unbiased estimator of zero. By completeness, g(T) = 0 almost surely, so h1(T) = h2(T)!
A sufficient statistic T is complete exactly when, for every estimable function of θ, there is at most ONE unbiased estimator that is a function of T.
Completeness Examples
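A numerical sketch of the definition, assuming X1, ..., Xn i.i.d. Bernoulli(p), so that T = ΣXi ~ Binomial(n, p). Because T takes only the values 0, ..., n, a function g(T) is just a vector of n + 1 numbers, and E_p[g(T)] = Σ g(t) P_p(T = t). If this vanishes for all p, it vanishes in particular at n + 1 distinct values of p, which gives a square linear system M g = 0; when M has full rank, g = 0 is the only solution, which is completeness (numerical rank stands in for exact invertibility here). As a contrast, the full sample (X1, X2) with n = 2 is sufficient but not complete: g = X1 - X2 has mean zero for every p without being zero. The helper names are illustrative, not from any library.

```python
from math import comb

import numpy as np


def binomial_pmf_matrix(n, ps):
    """Row i holds P_p(T = t) for t = 0..n at p = ps[i], where T ~ Binomial(n, p)."""
    return np.array([[comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)]
                     for p in ps])


def only_zero_solution(n):
    """True if E_p[g(T)] = 0 at n + 1 distinct p values forces g = 0."""
    ps = np.linspace(0.1, 0.9, n + 1)          # n + 1 distinct parameter values
    M = binomial_pmf_matrix(n, ps)             # (n+1) x (n+1) system M @ g = 0
    return np.linalg.matrix_rank(M) == n + 1   # full rank => null space is {0}


def mean_of_difference(p):
    """E_p[X1 - X2] for two i.i.d. Bernoulli(p) draws, by enumerating all outcomes."""
    total = 0.0
    for x1 in (0, 1):
        for x2 in (0, 1):
            prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)
            total += (x1 - x2) * prob
    return total


print("T = sum(X) complete for n = 5?", only_zero_solution(5))
print("E[X1 - X2] at p = 0.2, 0.5, 0.9:",
      [round(mean_of_difference(p), 12) for p in (0.2, 0.5, 0.9)])
```

The second line shows an unbiased estimator of zero built from an incomplete sufficient statistic, exactly the kind of "useless part" that completeness rules out.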
Bounded Completeness
Sometimes we have a weaker form:
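T is boundedly complete if, for every bounded function g,
$$E_\theta[g(T)] = 0 \text{ for all } \theta \in \Theta \;\Longrightarrow\; P_\theta\big(g(T) = 0\big) = 1 \text{ for all } \theta \in \Theta.$$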
Complete ⇒ Boundedly complete, but not vice versa.
What Is Ancillarity?
Intuitive Understanding
An ancillary statistic is, in a sense, the opposite of a sufficient one: it contains NO information about θ. Its distribution is the same regardless of what θ is.
Key Insight: Ancillary statistics tell us about the "shape" or "configuration" of the data, but not about the parameter we're trying to estimate.
Formal Definition
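A statistic A(X) is ancillary for θ if its distribution does not depend on θ:
$$P_\theta\big(A(X) \in B\big) = P_{\theta'}\big(A(X) \in B\big) \quad \text{for all sets } B \text{ and all } \theta, \theta' \in \Theta.$$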
Equivalently: The PDF/PMF of A does not involve θ.
An ancillary statistic has a distribution that doesn't depend on θ. It carries information about the "shape" of the data, not the parameter.
(Figure: the distribution of X̄ shifts with μ, while the distribution of Xi - X̄ does not depend on μ.)
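For a location family Xi = μ + Zi, where Z1, ..., Zn are i.i.d. with a known distribution:
- Xi - X̄ = Zi - Z̄ is ancillary for μ (μ cancels, so its distribution doesn't depend on μ)
- X̄ is sufficient for μ when the Zi are normal (it is not sufficient in every location family)
- In the normal case they split the information cleanly: X̄ carries all of it about μ, and the deviations carry none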
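A minimal sketch of why μ cancels: reusing one fixed draw of the noise Z for several values of μ, X̄ moves with μ while the deviation vector stays the same up to floating-point rounding. Standard normal noise and n = 8 are assumptions for the demo; any fixed noise distribution behaves the same way, and `location_summary` is a hypothetical helper.

```python
import numpy as np


def location_summary(mu, z):
    """Return (X_bar, deviations) for the sample X_i = mu + Z_i."""
    x = mu + z                     # location family: shift the same noise by mu
    return x.mean(), x - x.mean()  # deviations equal Z_i - Z_bar, so mu cancels


rng = np.random.default_rng(0)
z = rng.standard_normal(8)         # one fixed draw of Z_1, ..., Z_n (n = 8)

for mu in (0.0, 5.0, 100.0):
    xbar, dev = location_summary(mu, z)
    print(f"mu = {mu:6.1f}  X_bar = {xbar:9.4f}  deviations = {np.round(dev, 4)}")
```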
Ancillarity Examples
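Two further examples, sketched by Monte Carlo under assumed models. In the scale family Xi = σZi with Exponential noise, the ratio X1/(X1 + X2) = Z1/(Z1 + Z2) does not involve σ (for i.i.d. exponentials it is Uniform(0, 1), mean 0.5). In the location family Uniform(θ, θ + 1), the range X(n) - X(1) does not involve θ (it follows Beta(n - 1, 2), mean (n - 1)/(n + 1)). The simulated means should agree across parameter values up to Monte Carlo error; the function names, sample sizes, and seeds are illustrative choices.

```python
import numpy as np


def ratio_means(scales, n_sim=20_000, seed=1):
    """Mean of X1 / (X1 + X2) for i.i.d. Exponential(scale) pairs, per scale value."""
    rng = np.random.default_rng(seed)
    out = {}
    for s in scales:
        x = rng.exponential(scale=s, size=(n_sim, 2))
        out[s] = float(np.mean(x[:, 0] / x.sum(axis=1)))  # ratio is free of the scale
    return out


def range_means(thetas, n=5, n_sim=20_000, seed=2):
    """Mean sample range of Uniform(theta, theta + 1) samples of size n, per theta."""
    rng = np.random.default_rng(seed)
    out = {}
    for th in thetas:
        x = rng.uniform(th, th + 1, size=(n_sim, n))
        out[th] = float(np.mean(x.max(axis=1) - x.min(axis=1)))  # range is free of theta
    return out


print("ratio means (expect about 0.5):",
      {k: round(v, 3) for k, v in ratio_means([0.1, 1.0, 50.0]).items()})
print("range means (expect about (n-1)/(n+1) = 0.667):",
      {k: round(v, 3) for k, v in range_means([-3.0, 0.0, 100.0]).items()})
```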
Basu's Theorem
One of the most elegant results in statistics connects completeness and ancillarity:
If T is a complete sufficient statistic and A is an ancillary statistic, then T and A are independent:
$$T \perp A \quad \text{under } P_\theta.$$
This independence holds for all values of θ.
For N(μ, σ²) with σ² known:
- X̄ is complete sufficient for μ
- Xi - X̄ is ancillary for μ
- By Basu's theorem: X̄ ⊥ (Xi - X̄), and hence X̄ ⊥ S², since S² is a function of the deviations
Why it matters:
- Proves independence without calculation
- Simplifies variance computations
- Key tool for theoretical statistics
- Connects sufficiency and ancillarity
Why Is Basu's Theorem So Powerful?
- Proves independence without calculation: Instead of computing covariances or joint distributions, just verify completeness and ancillarity.
- Simplifies variance computation: In the normal model, Basu's theorem gives X̄ ⊥ S², so Var(X̄ + S²) = Var(X̄) + Var(S²) with no covariance term.
- Theoretical elegance: Shows that sufficient and ancillary statistics partition information in a clean way.
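A Monte Carlo sketch of both points, under assumed settings (n = 10, 20,000 replications). In the normal model, X̄ and S² should show near-zero correlation, and Var(X̄ + S²) - [Var(X̄) + Var(S²)] should be near zero. The exponential model is a contrast chosen for illustration: X̄ is complete sufficient for the scale, but S² is not ancillary (it scales with the parameter), so Basu's theorem does not apply and the two are clearly correlated; the scale-free ratio S²/X̄² is ancillary, so Basu's theorem makes it independent of X̄. A near-zero sample correlation is only consistent with independence, not a proof of it.

```python
import numpy as np


def xbar_and_s2(sampler, n=10, n_sim=20_000, seed=3):
    """Simulate n_sim samples of size n; return arrays of X_bar and S^2."""
    rng = np.random.default_rng(seed)
    data = sampler(rng, (n_sim, n))            # one sample per row
    return data.mean(axis=1), data.var(axis=1, ddof=1)


def corr(a, b):
    """Sample correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Normal(mu=5, sigma=2): X_bar complete sufficient for mu (sigma fixed), S^2 ancillary for mu
xb, s2 = xbar_and_s2(lambda rng, size: rng.normal(5.0, 2.0, size))
gap = np.var(xb + s2) - (np.var(xb) + np.var(s2))   # equals 2 * Cov(X_bar, S^2) with ddof=0
print(f"Normal:      corr(X_bar, S^2) = {corr(xb, s2):+.3f}, "
      f"Var(X_bar + S^2) - [Var(X_bar) + Var(S^2)] = {gap:+.3f}")

# Exponential(scale=2): X_bar complete sufficient for the scale, but S^2 is NOT ancillary
xb_e, s2_e = xbar_and_s2(lambda rng, size: rng.exponential(2.0, size))
print(f"Exponential: corr(X_bar, S^2) = {corr(xb_e, s2_e):+.3f}  (S^2 depends on the scale)")
print(f"Exponential: corr(X_bar, S^2 / X_bar^2) = "
      f"{corr(xb_e, s2_e / xb_e**2):+.3f}  (ancillary ratio)")
```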
The Lehmann-Scheffé Theorem
This is the crown jewel of estimation theory: it tells us how to find the best possible unbiased estimator.
Let T be a complete sufficient statistic for θ. If h(T) is an unbiased estimator of g(θ), then:
- h(T) is the unique unbiased estimator of g(θ) based on T
- h(T) is the UMVUE (Uniformly Minimum Variance Unbiased Estimator) of g(θ)
Finding UMVUE: The Recipe
- Step 1: Find a complete sufficient statistic T. Use the factorization theorem; in a full-rank exponential family, the natural sufficient statistic is complete.
- Step 2: Find any unbiased estimator U of g(θ). It doesn't need to be based on T.
- Step 3: Condition on T: h(T) = E[U | T]. This is the Rao-Blackwell step.
- Step 4: h(T) is the UMVUE. It is unique, has minimum variance, and is based on the complete sufficient statistic T.
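A worked instance of the recipe, using an assumed Poisson(λ) model and target g(λ) = P(X = 0) = e^(-λ) (an illustration, not an example from the text above). Step 1: T = ΣXi is complete sufficient (full-rank exponential family). Step 2: U = 1{X1 = 0} is unbiased. Step 3: given T = t, X1 ~ Binomial(t, 1/n), so E[U | T] = ((n - 1)/n)^T. Step 4: by Lehmann-Scheffé this is the UMVUE. The simulation checks that both estimators are unbiased and that conditioning shrinks the variance; λ = 1.5, n = 10, and the seed are arbitrary choices.

```python
import numpy as np


def poisson_zero_prob_demo(lam=1.5, n=10, n_sim=50_000, seed=4):
    """Compare U = 1{X1 = 0} with its Rao-Blackwellization ((n-1)/n)^T."""
    rng = np.random.default_rng(seed)
    data = rng.poisson(lam, size=(n_sim, n))
    T = data.sum(axis=1)                    # complete sufficient statistic
    U = (data[:, 0] == 0).astype(float)     # crude unbiased estimator of e^(-lambda)
    h = ((n - 1) / n) ** T                  # E[U | T]: the UMVUE
    return np.exp(-lam), U, h


target, U, h = poisson_zero_prob_demo()
print(f"target e^(-lambda)  = {target:.4f}")
print(f"U = 1{{X1 = 0}}:      mean = {U.mean():.4f}, var = {U.var():.5f}")
print(f"h = ((n-1)/n)^T:    mean = {h.mean():.4f}, var = {h.var():.5f}")
```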
Example (estimating the mean): In the normal model, X̄ is unbiased for μ and is a function of the complete sufficient statistic, so by Lehmann-Scheffé it is the UMVUE of μ.
Connection to Rao-Blackwell
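The Rao-Blackwell Theorem says that conditioning an unbiased estimator on a sufficient statistic keeps it unbiased and never increases its variance. Combined with completeness, this gives us the UMVUE:
For any unbiased estimator U of g(θ) and any sufficient statistic T:
$$E_\theta\big[E[U \mid T]\big] = E_\theta[U] = g(\theta), \qquad \operatorname{Var}_\theta\big(E[U \mid T]\big) \le \operatorname{Var}_\theta(U) \quad \text{for all } \theta.$$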
Conditioning never increases variance!
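The inequality comes from the law of total variance, a standard identity, written out here for an unbiased U and a sufficient T:

```latex
\operatorname{Var}_\theta(U)
  = \operatorname{Var}_\theta\big(E[U \mid T]\big)
  + E_\theta\big[\operatorname{Var}(U \mid T)\big]
  \;\ge\; \operatorname{Var}_\theta\big(E[U \mid T]\big).
```

Equality holds exactly when Var(U | T) = 0 almost surely, that is, when U is already a function of T. Sufficiency is what guarantees that E[U | T] does not depend on θ, so it is a genuine estimator.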
If T is also complete:
- E[U|T] is unique: every unbiased U leads to the same function of T (almost surely)
- No other unbiased estimator can beat it
- It's the UMVUE!
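A small sketch of the conditioning step, assuming Bernoulli(p) data, the crude unbiased estimator U = X1, and T = ΣXi. By symmetry, E[X1 | T = t] = t/n, so Rao-Blackwellizing X1 gives X̄. The code estimates E[X1 | T = t] empirically by grouping simulated samples on their value of t and compares it with t/n; p, n, the minimum group size, and the seed are illustrative choices.

```python
import numpy as np


def conditional_mean_of_x1(p=0.3, n=6, n_sim=200_000, min_count=500, seed=5):
    """Empirical E[X1 | T = t] for Bernoulli(p) samples of size n, keyed by t = sum(X)."""
    rng = np.random.default_rng(seed)
    data = rng.binomial(1, p, size=(n_sim, n))
    T = data.sum(axis=1)
    result = {}
    for t in range(n + 1):
        mask = T == t
        if mask.sum() >= min_count:          # skip values of t that are rarely observed
            result[t] = float(data[mask, 0].mean())
    return result


n = 6
for t, m in conditional_mean_of_x1(n=n).items():
    print(f"t = {t}: empirical E[X1 | T = t] = {m:.3f}, t/n = {t / n:.3f}")
```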
The Power of Complete Sufficiency
With a complete sufficient statistic, finding UMVUE is mechanical: find any unbiased estimator and condition on T. The result is guaranteed optimal!
Python Implementation
Let's implement these concepts and verify the theorems in Python:
```python
import numpy as np


def demonstrate_completeness_uniqueness():
    """Compare the UMVUE of a normal mean (X_bar) with another unbiased estimator (median)."""
    np.random.seed(42)

    # Normal model: X_bar is the UMVUE for mu
    true_mu = 5.0
    sigma = 2.0
    n = 50
    n_simulations = 10000

    umvue_estimates = []
    alternative_estimates = []

    for _ in range(n_simulations):
        data = np.random.normal(true_mu, sigma, n)

        # UMVUE: sample mean
        umvue_estimates.append(np.mean(data))

        # Alternative unbiased estimator: the sample median
        # (unbiased for mu because the normal distribution is symmetric about mu)
        alternative_estimates.append(np.median(data))

    print("Demonstrating UMVUE Optimality (Normal Mean)")
    print("=" * 55)
    print(f"True mu: {true_mu}")
    print("\nUMVUE (Sample Mean):")
    print(f"  Mean of estimates: {np.mean(umvue_estimates):.4f}")
    print(f"  Variance: {np.var(umvue_estimates):.6f}")
    print(f"  Theoretical variance: {sigma**2 / n:.6f}")
    print("\nAlternative (Median):")
    print(f"  Mean of estimates: {np.mean(alternative_estimates):.4f}")
    print(f"  Variance: {np.var(alternative_estimates):.6f}")
    print(f"  Large-sample variance (pi/2 * sigma^2 / n): {np.pi / 2 * sigma**2 / n:.6f}")
    ratio = np.var(umvue_estimates) / np.var(alternative_estimates)
    print(f"\nEfficiency ratio: {ratio:.4f}")
    print("(The UMVUE has variance no larger than any other unbiased estimator)")


def demonstrate_ancillarity():
    """Show that ancillary statistics have the same distribution for every mu."""
    np.random.seed(42)

    n = 30
    n_simulations = 5000

    # Test several values of the location parameter mu
    mus = [0, 5, 10, 100]
    sigma = 3

    print("\nDemonstrating Ancillarity: Deviations from the Mean")
    print("=" * 55)
    print("For Normal(mu, sigma^2), the deviations X_i - X_bar are ancillary for mu")
    print()

    for mu in mus:
        deviation_vars = []  # spread of the deviations within each sample
        ranges = []          # sample range max - min (also ancillary for mu)

        for _ in range(n_simulations):
            data = np.random.normal(mu, sigma, n)
            deviations = data - np.mean(data)
            deviation_vars.append(np.var(deviations))
            ranges.append(np.max(data) - np.min(data))

        print(f"mu = {mu:3d}: Var(deviations) = {np.mean(deviation_vars):.4f}, "
              f"Range = {np.mean(ranges):.4f}")

    print("\nNotice: the statistics are nearly identical for all mu values!")
    print("This is ancillarity in action.")


def demonstrate_basu_theorem():
    """Check Basu's theorem: complete sufficient and ancillary statistics are independent."""
    np.random.seed(42)

    n = 20
    n_simulations = 10000
    mu_true = 5.0
    sigma = 2.0

    T_vals = []
    A_vals = []
    for _ in range(n_simulations):
        data = np.random.normal(mu_true, sigma, n)

        # Complete sufficient statistic for mu (sigma known)
        T_vals.append(np.mean(data))

        # Ancillary statistic for mu: the sample variance S^2
        A_vals.append(np.var(data, ddof=1))

    correlation = np.corrcoef(T_vals, A_vals)[0, 1]

    print("\nDemonstrating Basu's Theorem")
    print("=" * 55)
    print(f"Sample size: n = {n}")
    print("Complete sufficient statistic: T = X_bar (for mu)")
    print("Ancillary statistic: A = S^2 (for mu)")
    print()
    print(f"Correlation between T and A: {correlation:.6f}")
    print("(Should be close to 0: independence implies zero correlation)")
    print()
    print("By Basu's theorem: T ⊥ A for all values of mu!")


def find_umvue_examples():
    """Simulate the UMVUE in three standard models."""
    np.random.seed(42)

    print("\nFinding UMVUE via Lehmann-Scheffe")
    print("=" * 55)

    # Example 1: UMVUE for sigma^2 in Normal(mu, sigma^2)
    print("\n1. UMVUE for sigma^2 in Normal(mu, sigma^2):")
    mu, sigma = 5.0, 3.0
    n = 30
    n_sim = 10000

    s2_estimates = []
    biased_var_estimates = []

    for _ in range(n_sim):
        data = np.random.normal(mu, sigma, n)
        s2_estimates.append(np.var(data, ddof=1))          # S^2 = sum((x - x_bar)^2) / (n - 1)
        biased_var_estimates.append(np.var(data, ddof=0))  # divides by n instead

    print(f"True sigma^2: {sigma**2}")
    print(f"UMVUE S^2 mean: {np.mean(s2_estimates):.4f} (unbiased)")
    print(f"Biased estimator: {np.mean(biased_var_estimates):.4f}")

    # Example 2: UMVUE for theta in Uniform(0, theta)
    print("\n2. UMVUE for theta in Uniform(0, theta):")
    theta_true = 10.0
    n = 20

    umvue_estimates = []
    mle_estimates = []

    for _ in range(n_sim):
        data = np.random.uniform(0, theta_true, n)
        mle = np.max(data)          # MLE X_(n): complete sufficient, but biased
        umvue = (n + 1) / n * mle   # unbiased function of X_(n): the UMVUE
        mle_estimates.append(mle)
        umvue_estimates.append(umvue)

    print(f"True theta: {theta_true}")
    print(f"UMVUE ((n+1)/n * max): {np.mean(umvue_estimates):.4f} (unbiased)")
    print(f"MLE (max): {np.mean(mle_estimates):.4f} (biased)")

    # Example 3: UMVUE for p in Bernoulli(p)
    print("\n3. UMVUE for p in Bernoulli(p):")
    p_true = 0.3
    n = 50

    umvue_estimates = []

    for _ in range(n_sim):
        data = np.random.binomial(1, p_true, n)
        umvue_estimates.append(np.mean(data))  # sample proportion is the UMVUE

    print(f"True p: {p_true}")
    print(f"UMVUE (sample proportion): {np.mean(umvue_estimates):.4f}")
    print(f"Variance: {np.var(umvue_estimates):.6f}")
    print(f"CRLB (p(1-p)/n): {p_true * (1 - p_true) / n:.6f}")
    print("(The UMVUE attains the Cramer-Rao bound here!)")


# Run all demonstrations
if __name__ == "__main__":
    demonstrate_completeness_uniqueness()
    demonstrate_ancillarity()
    demonstrate_basu_theorem()
    find_umvue_examples()
```
Try It Yourself
Run this code to verify the theorems numerically. Notice that the simulated correlation between X̄ and S² is close to zero, as Basu's theorem predicts, and that the UMVUE has the smaller variance in each comparison!
Key Insights
- Completeness: A complete statistic has no "useless" parts. You can't construct non-trivial unbiased estimators of zero from it.
- Ancillarity: Ancillary statistics carry information about the "shape" of data, not about θ. They're useful for diagnostics but not for estimating θ.
- Basu's theorem: Complete sufficient statistics are independent of ancillary statistics. This elegant result simplifies many calculations in statistics.
- Lehmann-Scheffé: With a complete sufficient statistic T, any unbiased estimator that is a function of T is automatically the UMVUE. The search for optimal estimators becomes mechanical!
- The big picture: Sufficiency, completeness, ancillarity, efficiency, and consistency all work together. Understanding these connections is key to mastering estimation theory.
Summary
| Symbol | Name | Meaning |
|---|---|---|
| T | Complete Sufficient | Has no useless parts |
| A | Ancillary Statistic | Distribution free of θ |
| T ⊥ A | Independence | By Basu's Theorem |
| UMVUE | Uniformly Minimum Variance Unbiased Estimator | Best unbiased estimator |
| E[U\|T] | Rao-Blackwell | Improves estimator U |
Completeness:
- E[g(T)] = 0 for all θ ⇒ g(T) = 0 (almost surely)
- No non-trivial unbiased estimators of zero
- Full-rank exponential families have a complete sufficient statistic
- Ensures uniqueness of UMVUE
Ancillarity:
- Distribution doesn't depend on θ
- Carries no info about parameter
- Examples: deviations, ratios, ranges
- Independent of complete sufficient statistics (Basu)
Congratulations! You've completed the foundations of point estimation theory:
- Section 01: Estimators and parametric framework
- Section 02: Bias, Variance, and MSE
- Section 03: Consistency and Efficiency
- Section 04: Sufficiency and Minimal Sufficiency
- Section 05: Completeness and Ancillarity
In Chapter 12, we'll apply these concepts to specific methods: Method of Moments, Maximum Likelihood Estimation, and the EM Algorithm!