Chapter 11

Completeness and Ancillarity

Point Estimation

Learning Objectives

Building on Previous Sections

This section builds heavily on Sufficiency (Section 04). You should understand sufficient and minimal sufficient statistics before proceeding. We'll combine these concepts to find optimal estimators!

By the end of this section, you will be able to:

Define Completeness

Understand when a sufficient statistic has "no extra parts"

🔧
Define Ancillarity

Identify statistics whose distribution is independent of θ

🔗
Apply Basu's Theorem

Prove independence between complete sufficient and ancillary statistics

🏆
Use Lehmann-Scheffé Theorem

Find the unique UMVUE using complete sufficient statistics

💡
Connect All Properties

See how sufficiency, completeness, and ancillarity work together


The Big Picture: Completing the Estimation Theory

Sufficiency tells us what information to keep. Completeness tells us we've kept exactly the right amount — no more, no less.

In Section 04, we learned that sufficient statistics capture all information about θ. But a sufficient statistic might still contain "extra" information unrelated to θ. This section addresses:

Completeness

Does T have any "useless parts"? Can we construct a non-trivial (not identically zero) unbiased estimator of zero from T?

If T is complete: No such useless parts exist.

🔧Ancillarity

What statistics carry NO information about θ at all?

Ancillary statistics: Their distribution is constant across all θ.

🗺The Complete Picture of Estimation Theory
| Property | What It Tells Us | Analogy |
|---|---|---|
| Sufficiency | Contains all info about θ | "Nothing lost" |
| Minimal Sufficiency | Maximum compression of a sufficient statistic | "Most compact form" |
| Completeness | No non-trivial unbiased estimators of zero | "No extra baggage" |
| Ancillarity | Contains no info about θ | "Irrelevant to θ" |

The Grand Goal

When we have a complete sufficient statistic, we can find the unique best unbiased estimator (UMVUE) of any estimable function of θ. This is the culmination of estimation theory!


What Is Completeness?

Intuitive Understanding

Imagine a sufficient statistic as a compressed summary of your data. Completeness asks: does this summary contain any "junk", meaning parts that are unrelated to the parameter?

Complete Statistic

Every piece of the statistic tells us something about θ. No "dead weight".

Like a perfectly packed suitcase — everything has a purpose.
Incomplete Statistic

Contains some "noise" or "extra parts" that don't help with estimation.

Like carrying empty containers in your luggage.
Technical Insight: A complete statistic has no non-trivial unbiased estimator of zero. You can't construct a function g(T) with Eθ[g(T)] = 0 for all θ unless g(T) = 0 with probability 1.

Formal Definition

📚Definition: Complete Statistic

A statistic T(X) is complete for θ if:

$$E_\theta[g(T)] = 0 \text{ for all } \theta \implies P_\theta(g(T) = 0) = 1 \text{ for all } \theta$$

In words: If a function g(T) has expected value zero for every possible θ, then g(T) must be identically zero (almost surely).
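To see the definition at work, take Bernoulli(p) data with T = X1 + ... + Xn, so T ~ Binomial(n, p) and Ep[g(T)] = Σ g(t) C(n, t) p^t (1 − p)^(n−t). If this vanishes for every p, it vanishes in particular at any n + 1 distinct values of p, which gives a square linear system in g(0), ..., g(n). The sketch below is a numerical illustration for one n (the helper name `binomial_pmf_matrix` and the p grid are hypothetical choices): the system has full rank, so g = 0 is its only solution and T is complete.

```python
import numpy as np
from scipy import stats

def binomial_pmf_matrix(n, ps):
    """Return M with M[i, t] = P(T = t) when T ~ Binomial(n, ps[i]).

    For any g defined on {0, ..., n}, (M @ g)[i] = E_{ps[i]}[g(T)].
    """
    t = np.arange(n + 1)
    return stats.binom.pmf(t[None, :], n, np.asarray(ps)[:, None])

n = 5
ps = np.linspace(0.1, 0.9, n + 1)   # n + 1 distinct values of p (arbitrary grid)
M = binomial_pmf_matrix(n, ps)

# E_p[g(T)] = 0 at every p in the grid  <=>  M @ g = 0
rank = np.linalg.matrix_rank(M)
g_solution = np.linalg.solve(M, np.zeros(n + 1))
print(f"rank of M: {rank} (unknowns g(0), ..., g({n}): {n + 1})")
print("only solution of M @ g = 0:", np.round(g_solution, 12))
```

The same argument works for any n: after dividing row i by (1 − p_i)^n, M becomes a Vandermonde matrix in p_i/(1 − p_i) (times binomial coefficients), which is nonsingular whenever the p_i are distinct.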

What does this mean for estimation?

💡The Key Implication

If T is complete and sufficient, then:

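  • Any unbiased estimator that is a function of T is unique (with probability 1)
  • There cannot be two different unbiased estimators of the same quantity that are both functions of T
  • That unique estimator is the best unbiased estimator (this is the Lehmann-Scheffé Theorem)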

Why? If we had two unbiased estimators h1(T) and h2(T) of the same quantity, then g(T) = h1(T) - h2(T) would satisfy Eθ[g(T)] = 0 for every θ, making it an unbiased estimator of zero. By completeness, g(T) = 0 with probability 1, so h1(T) = h2(T) almost surely!

🔎Understanding Completeness

A sufficient statistic T is complete exactly when, for every estimable function of θ, there is at most ONE unbiased estimator that is a function of T.

Completeness Examples
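A standard example of a sufficient statistic that is not complete (a textbook case, not taken from the original page): for X1, ..., Xn iid Uniform(θ, θ + 1), T = (X(1), X(n)) is minimal sufficient, yet the range X(n) − X(1) has expectation (n − 1)/(n + 1) for every θ. So g(T) = X(n) − X(1) − (n − 1)/(n + 1) is an unbiased estimator of zero that is not zero. The simulation below is a sketch; the helper name `g_of_T`, the θ values, n, and the seed are illustrative assumptions.

```python
import numpy as np

def g_of_T(x):
    """g(T) = (X(n) - X(1)) - (n - 1)/(n + 1), computed row-wise.

    It depends on the data only through T = (X(1), X(n)).
    """
    n = x.shape[1]
    sample_range = x.max(axis=1) - x.min(axis=1)
    return sample_range - (n - 1) / (n + 1)

rng = np.random.default_rng(0)
n, reps = 5, 200_000
results = {}
for theta in [-3.0, 0.0, 2.5, 10.0]:
    x = rng.uniform(theta, theta + 1.0, size=(reps, n))
    g = g_of_T(x)
    frac_far = np.mean(np.abs(g) > 0.1)   # g(T) is clearly not identically 0
    results[theta] = (g.mean(), frac_far)
    print(f"theta = {theta:5.1f}: mean g(T) = {g.mean():+.4f}, "
          f"P(|g(T)| > 0.1) = {frac_far:.3f}")
```

The mean of g(T) stays near 0 for every θ while g(T) itself is frequently far from 0, which is exactly the failure of completeness.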

Bounded Completeness

Sometimes we have a weaker form:

📚Definition: Bounded Completeness

T is boundedly complete if for any bounded function g:

$$E_\theta[g(T)] = 0 \text{ for all } \theta \implies P_\theta(g(T) = 0) = 1 \text{ for all } \theta$$

Complete ⇒ Boundedly complete, but not vice versa.


What Is Ancillarity?

Intuitive Understanding

An ancillary statistic sits at the opposite end from a sufficient one: on its own, it contains NO information about θ. Its distribution is the same regardless of what θ is.

Information spectrum: Sufficient (contains all info about θ) → Regular statistics (contain some info about θ) → Ancillary (contains NO info about θ)
Key Insight: Ancillary statistics tell us about the "shape" or "configuration" of the data, but not about the parameter we're trying to estimate.

Formal Definition

📚Definition: Ancillary Statistic

A statistic A(X) is ancillary for θ if its distribution does not depend on θ:

$$\text{Distribution of } A(X) \text{ is the same for all } \theta$$

Equivalently: The PDF/PMF of A does not involve θ.
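The definition can be checked empirically: simulate a statistic under two parameter values and compare the two sampling distributions. The sketch below uses `scipy.stats.ks_2samp` (the two-sample Kolmogorov-Smirnov distance) for N(μ, 1) samples; the helper names, μ values, and sample sizes are illustrative assumptions. The sample range should give a KS distance near zero, while X̄ should not.

```python
import numpy as np
from scipy import stats

def simulate(stat, mu, n=10, reps=20_000, seed=0):
    """Apply `stat` to `reps` independent N(mu, 1) samples of size n."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, 1.0, size=(reps, n))
    return stat(x)

def sample_range(x):
    return x.max(axis=1) - x.min(axis=1)   # candidate ancillary for mu

def sample_mean(x):
    return x.mean(axis=1)                  # not ancillary: shifts with mu

mu_a, mu_b = 0.0, 1.0
# Different seeds so the two simulated samples are independent
ks_range = stats.ks_2samp(simulate(sample_range, mu_a, seed=1),
                          simulate(sample_range, mu_b, seed=2)).statistic
ks_mean = stats.ks_2samp(simulate(sample_mean, mu_a, seed=1),
                         simulate(sample_mean, mu_b, seed=2)).statistic
print(f"KS distance between mu = {mu_a} and mu = {mu_b}:")
print(f"  range: {ks_range:.4f}")
print(f"  mean:  {ks_mean:.4f}")
```

A simulation can only fail to detect a dependence on θ; it does not prove ancillarity, which needs an argument about the distribution itself.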

🔧Ancillary Statistics Visualization

An ancillary statistic has a distribution that doesn't depend on θ. It carries information about the "shape" of the data, not the parameter.

  • Sample mean X̄ (NOT ancillary): its distribution depends on μ (it shifts with μ)
  • Deviations Xi − X̄ (ancillary for μ): their distribution does NOT depend on μ

Key Insight: Location-Scale Families

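For a location family X = μ + Z (Z has a known distribution):

  • The deviations Xi − X̄ are ancillary for μ, because Xi − X̄ = Zi − Z̄ and μ cancels
  • In the normal case N(μ, σ²) with σ² known, X̄ is (complete) sufficient for μ; in other location families, such as the Cauchy, X̄ is not sufficient
  • In the normal case they split the data cleanly: X̄ carries the information about μ, the deviations carry none

For a scale family X = σZ, ratios such as Xi/Xn are ancillary for σ, because σ cancels.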
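The cancellation behind these claims can be seen directly. The sketch below fixes one draw of Z (assumed standard normal for the location family and standard exponential for the scale family; the sample size and parameter values are arbitrary) and checks that deviations and ratios come out identical whatever μ or σ is plugged in.

```python
import numpy as np

rng = np.random.default_rng(42)
z_loc = rng.standard_normal(8)         # base draws for a location family X = mu + Z
z_scale = rng.standard_exponential(8)  # base draws for a scale family X = sigma * Z

def deviations(x):
    return x - x.mean()                # X_i - X_bar

def ratios(x):
    return x[:-1] / x[-1]              # X_i / X_n for i = 1, ..., n - 1

dev_checks = [np.allclose(deviations(mu + z_loc), deviations(z_loc))
              for mu in [-5.0, 0.0, 3.0, 100.0]]
ratio_checks = [np.allclose(ratios(sigma * z_scale), ratios(z_scale))
                for sigma in [0.1, 1.0, 7.0]]
print("deviations identical for every mu:", all(dev_checks))
print("ratios identical for every sigma: ", all(ratio_checks))
```

Because the statistic is the same function of Z for every parameter value, its distribution cannot depend on that parameter.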

Ancillarity Examples


Basu's Theorem

One of the most elegant results in statistics connects completeness and ancillarity:

Basu's Theorem

If T is a complete sufficient statistic and A is an ancillary statistic, then:

$$T \perp\!\!\!\perp A \quad (\text{T and A are independent})$$

This independence holds for all values of θ.

🔗Basu's Theorem: The Independence Bridge. Diagram: T (complete sufficient) and A (ancillary) are independent for ALL values of θ.

Example: Normal Distribution

For N(μ, σ²) with σ² known:

  • X̄ is complete sufficient for μ
  • Xi - X̄ is ancillary for μ
  • By Basu's Theorem: X̄ ⊥ (X1 − X̄, ..., Xn − X̄), and therefore X̄ ⊥ S², since S² is a function of the deviations
Why This Matters
  • Proves independence without calculation
  • Simplifies variance computations
  • Key tool for theoretical statistics
  • Connects sufficiency and ancillarity

Why Is Basu's Theorem So Powerful?

  1. Proves independence without calculation: Instead of computing covariances or joint distributions, just verify completeness and ancillarity.
  2. Simplifies variance computation: If X̄ ⊥ S² (from Basu), then Var(X̄ + S²) = Var(X̄) + Var(S²).
  3. Theoretical elegance: Shows that complete sufficient and ancillary statistics partition information in a clean way.
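Point 2 can be checked by simulation for normal samples, where Basu's Theorem gives X̄ ⊥ S². Using the standard normal-sample results Var(X̄) = σ²/n and Var(S²) = 2σ⁴/(n − 1), the simulated variance of X̄ + S² should match their sum. This is a sketch; μ, σ, n, the seed, and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 2.0, 1.5, 10, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)             # S^2 with divisor n - 1

var_sum_sim = np.var(xbar + s2)
var_sum_theory = sigma**2 / n + 2 * sigma**4 / (n - 1)   # Var(X_bar) + Var(S^2)
corr = np.corrcoef(xbar, s2)[0, 1]

print(f"Var(X_bar + S^2), simulated:        {var_sum_sim:.4f}")
print(f"Var(X_bar) + Var(S^2), theoretical: {var_sum_theory:.4f}")
print(f"corr(X_bar, S^2), simulated:        {corr:+.4f}")
```

Additivity only needs zero covariance, which independence guarantees; Basu's Theorem delivers the stronger property without any covariance calculation.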


The Lehmann-Scheffé Theorem

This is the crown jewel of estimation theory: it tells us how to find the best possible unbiased estimator.

🏆Lehmann-Scheffé Theorem

Let T be a complete sufficient statistic for θ. If h(T) is an unbiased estimator of g(θ), then:

  1. h(T) is the unique unbiased estimator of g(θ) based on T
  2. h(T) is the UMVUE (Uniformly Minimum Variance Unbiased Estimator) of g(θ)

Finding UMVUE: The Recipe

📋Steps to Find UMVUE
  1. Find a complete sufficient statistic T

    Use the factorization theorem; in a full-rank exponential family, the natural sufficient statistic is complete

  2. Find any unbiased estimator of g(θ)

    Call it U (doesn't need to be based on T)

  3. Condition on T: h(T) = E[U | T]

    This is the Rao-Blackwell step

  4. h(T) is the UMVUE!

    Unique, minimum variance, based on complete sufficient T
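As a worked instance of the recipe (a standard textbook example, not from the original page): for X1, ..., Xn iid Poisson(λ), T = ΣXi is complete sufficient. To estimate P(X = 0) = e^(−λ), start from the crude unbiased estimator U = 1{X1 = 0}. Given T = t, X1 ~ Binomial(t, 1/n), so E[U | T = t] = (1 − 1/n)^t, which is therefore the UMVUE. The sketch compares U and h(T) by simulation; λ, n, and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)
lam, n, reps = 1.3, 8, 200_000
target = np.exp(-lam)                      # g(lambda) = P(X = 0)

x = rng.poisson(lam, size=(reps, n))
T = x.sum(axis=1)                          # step 1: complete sufficient statistic
U = (x[:, 0] == 0).astype(float)           # step 2: crude unbiased estimator
h = (1.0 - 1.0 / n) ** T                   # step 3: h(T) = E[U | T]

print(f"target e^(-lambda):   {target:.4f}")
print(f"mean of U:            {U.mean():.4f}   variance {U.var():.5f}")
print(f"mean of h(T) (UMVUE): {h.mean():.4f}   variance {h.var():.5f}")
```

Unbiasedness of h(T) can also be confirmed exactly: T ~ Poisson(nλ), and the Poisson generating function gives E[(1 − 1/n)^T] = exp(nλ((1 − 1/n) − 1)) = e^(−λ).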

🏆Finding UMVUE via Lehmann-Scheffé

Once a complete sufficient statistic is known, the Lehmann-Scheffé theorem hands us the Uniformly Minimum Variance Unbiased Estimator directly. For example:

Estimating the Mean

  • Distribution: X ~ N(μ, σ²), σ² known
  • Complete sufficient statistic: T = X̄
  • UMVUE: X̄
  • Variance: σ²/n

X̄ is unbiased and a function of the complete sufficient statistic, so it's UMVUE.

Connection to Rao-Blackwell

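The Rao-Blackwell Theorem says that conditioning an unbiased estimator on a sufficient statistic gives another unbiased estimator whose variance is never larger. (Sufficiency is what makes E[U | T] free of θ, so it is a genuine estimator.) Combined with completeness, this gives us the UMVUE: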

📈Rao-Blackwell

For any unbiased U and sufficient T:

$$\text{Var}(E[U|T]) \leq \text{Var}(U)$$

Conditioning never increases variance!
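The inequality is easy to see in Uniform(0, θ). U = 2X1 is unbiased for θ, and T = X(n) is sufficient. Given X(n) = m, X1 equals m with probability 1/n and is otherwise Uniform(0, m), so E[2X1 | X(n) = m] = (n + 1)m/n: Rao-Blackwellization turns U into the (n + 1)/n · max estimator. Their variances are θ²/3 and θ²/(n(n + 2)). This is a simulation sketch; θ, n, and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 4.0, 6, 200_000

x = rng.uniform(0.0, theta, size=(reps, n))
U = 2.0 * x[:, 0]                          # unbiased, but ignores most of the data
rb = (n + 1) / n * x.max(axis=1)           # E[U | X(n)], the Rao-Blackwellized version

print(f"U:  mean {U.mean():.3f}, variance {U.var():.3f} (theory {theta**2 / 3:.3f})")
print(f"RB: mean {rb.mean():.3f}, variance {rb.var():.3f} "
      f"(theory {theta**2 / (n * (n + 2)):.3f})")
```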

🏆+ Completeness = UMVUE

If T is also complete:

  • E[U|T] is the same function of T no matter which unbiased U you start from
  • No other unbiased estimator has smaller variance, for any θ
  • It's the UMVUE!

The Power of Complete Sufficiency

With a complete sufficient statistic, finding the UMVUE is mechanical: find any unbiased estimator and condition on T. The result is guaranteed to have the smallest variance among all unbiased estimators, uniformly in θ!
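Because the recipe is mechanical, its output can sometimes be checked exactly rather than by simulation. For Bernoulli(p) data, U = X1(1 − X2) is unbiased for p(1 − p). Given T = ΣXi = t, the pair (X1, X2) equals (1, 0) with probability t(n − t)/(n(n − 1)), so h(T) = T(n − T)/(n(n − 1)). The sketch below (the helper name `h_of_t`, n, and the p grid are illustrative) sums h(t) against the Binomial(n, p) pmf and compares with p(1 − p).

```python
import numpy as np
from scipy import stats

def h_of_t(t, n):
    """h(T) = E[X1 (1 - X2) | T = t] = t (n - t) / (n (n - 1))."""
    return t * (n - t) / (n * (n - 1))

n = 12
t = np.arange(n + 1)
exact = {}
for p in [0.05, 0.3, 0.5, 0.9]:
    # E_p[h(T)] computed exactly by summing over the Binomial(n, p) pmf
    exact[p] = np.sum(h_of_t(t, n) * stats.binom.pmf(t, n, p))
    print(f"p = {p:.2f}: E[h(T)] = {exact[p]:.6f}, p(1 - p) = {p * (1 - p):.6f}")
```

Algebraically, E[T(n − T)] = n·np − (np(1 − p) + n²p²) = n(n − 1)p(1 − p), so the match holds for every p, and Lehmann-Scheffé makes h(T) the UMVUE of p(1 − p).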


Real-World Applications


Python Implementation

Let's implement these concepts and verify the theorems in Python:

```python
import numpy as np


def demonstrate_completeness_uniqueness():
    """Compare the UMVUE (sample mean) with another unbiased estimator (median).

    X_bar is an unbiased function of the complete sufficient statistic, so by
    Lehmann-Scheffe it is the UMVUE. The median is also unbiased for mu (the
    normal is symmetric), but it is not a function of X_bar, so its variance
    can only be larger.
    """
    np.random.seed(42)

    # Normal distribution: X_bar is UMVUE for mu
    true_mu = 5.0
    sigma = 2.0
    n = 50
    n_simulations = 10000

    umvue_estimates = []
    alternative_estimates = []

    for _ in range(n_simulations):
        data = np.random.normal(true_mu, sigma, n)

        # UMVUE: sample mean
        umvue = np.mean(data)
        umvue_estimates.append(umvue)

        # Alternative unbiased estimator: the median
        # (unbiased for mu because the normal distribution is symmetric)
        alt = np.median(data)
        alternative_estimates.append(alt)

    print("UMVUE vs. Another Unbiased Estimator (Normal Mean)")
    print("=" * 55)
    print(f"True mu: {true_mu}")
    print("\nUMVUE (Sample Mean):")
    print(f"  Mean of estimates: {np.mean(umvue_estimates):.4f}")
    print(f"  Variance: {np.var(umvue_estimates):.6f}")
    print(f"  Theoretical variance: {sigma**2/n:.6f}")
    print("\nAlternative (Median):")
    print(f"  Mean of estimates: {np.mean(alternative_estimates):.4f}")
    print(f"  Variance: {np.var(alternative_estimates):.6f}")
    # pi/2 * sigma^2 / n is the large-sample (asymptotic) variance of the median
    print(f"  Asymptotic variance: {np.pi/2 * sigma**2/n:.6f}")
    print(f"\nEfficiency ratio: "
          f"{np.var(umvue_estimates)/np.var(alternative_estimates):.4f}")
    print("(The UMVUE has variance no larger than any other unbiased estimator)")


def demonstrate_ancillarity():
    """Show that the distribution of ancillary statistics doesn't depend on mu."""
    np.random.seed(42)

    n = 30
    n_simulations = 5000

    # Test for different values of mu (location parameter)
    mus = [0, 5, 10, 100]
    sigma = 3

    print("\nDemonstrating Ancillarity: Deviations from Mean")
    print("=" * 55)
    print("For Normal(mu, sigma^2), deviations X_i - X_bar are ancillary for mu")
    print()

    for mu in mus:
        # Collect functions of the deviations across simulations
        deviation_vars = []
        ranges = []

        for _ in range(n_simulations):
            data = np.random.normal(mu, sigma, n)
            deviations = data - np.mean(data)
            deviation_vars.append(np.var(deviations))
            ranges.append(np.max(data) - np.min(data))

        print(f"mu = {mu:3d}: Var(deviations) = {np.mean(deviation_vars):.4f}, "
              f"Range = {np.mean(ranges):.4f}")

    print("\nNotice: the averages agree (up to simulation noise) for every mu!")
    print("This is ancillarity in action.")


def demonstrate_basu_theorem():
    """Check Basu's theorem: complete sufficient and ancillary are independent."""
    np.random.seed(42)

    n = 20
    n_simulations = 10000
    mu_true = 5.0
    sigma = 2.0

    pairs = []
    for _ in range(n_simulations):
        data = np.random.normal(mu_true, sigma, n)

        # Complete sufficient statistic for mu (when sigma known)
        T = np.mean(data)

        # Ancillary statistic for mu: S^2 depends only on the deviations
        A = np.var(data, ddof=1)

        pairs.append((T, A))

    T_vals = [pair[0] for pair in pairs]
    A_vals = [pair[1] for pair in pairs]

    correlation = np.corrcoef(T_vals, A_vals)[0, 1]

    print("\nDemonstrating Basu's Theorem")
    print("=" * 55)
    print(f"Sample size: n = {n}")
    print("Complete sufficient statistic: T = X_bar (for mu)")
    print("Ancillary statistic: A = S^2 (for mu)")
    print()
    print(f"Correlation between T and A: {correlation:.6f}")
    print("(Should be close to 0: independence implies zero correlation)")
    print()
    print("By Basu's theorem: T ⊥ A for all values of mu!")


def find_umvue_examples():
    """Find UMVUE for various parameters."""
    np.random.seed(42)

    print("\nFinding UMVUE via Lehmann-Scheffe")
    print("=" * 55)

    # Example 1: UMVUE for variance in Normal
    print("\n1. UMVUE for sigma^2 in Normal(mu, sigma^2):")
    mu, sigma = 5.0, 3.0
    n = 30
    n_sim = 10000

    s2_estimates = []
    biased_var_estimates = []

    for _ in range(n_sim):
        data = np.random.normal(mu, sigma, n)
        s2_estimates.append(np.var(data, ddof=1))  # S^2 = sum(x-xbar)^2/(n-1)
        biased_var_estimates.append(np.var(data, ddof=0))

    print(f"True sigma^2: {sigma**2}")
    print(f"UMVUE S^2 mean: {np.mean(s2_estimates):.4f} (unbiased)")
    print(f"Biased estimator: {np.mean(biased_var_estimates):.4f}")

    # Example 2: UMVUE for theta in Uniform(0, theta)
    print("\n2. UMVUE for theta in Uniform(0, theta):")
    theta_true = 10.0
    n = 20

    umvue_estimates = []
    mle_estimates = []

    for _ in range(n_sim):
        data = np.random.uniform(0, theta_true, n)
        mle = np.max(data)  # MLE (biased)
        umvue = (n + 1) / n * mle  # UMVUE
        mle_estimates.append(mle)
        umvue_estimates.append(umvue)

    print(f"True theta: {theta_true}")
    print(f"UMVUE ((n+1)/n * max): {np.mean(umvue_estimates):.4f} (unbiased)")
    print(f"MLE (max): {np.mean(mle_estimates):.4f} (biased)")

    # Example 3: UMVUE for p in Bernoulli
    print("\n3. UMVUE for p in Bernoulli(p):")
    p_true = 0.3
    n = 50

    umvue_estimates = []

    for _ in range(n_sim):
        data = np.random.binomial(1, p_true, n)
        umvue = np.mean(data)  # Sample proportion is UMVUE
        umvue_estimates.append(umvue)

    print(f"True p: {p_true}")
    print(f"UMVUE (sample proportion): {np.mean(umvue_estimates):.4f}")
    print(f"Variance: {np.var(umvue_estimates):.6f}")
    print(f"CRLB (p(1-p)/n): {p_true*(1-p_true)/n:.6f}")
    print("(UMVUE achieves the Cramer-Rao bound!)")


# Run all demonstrations
if __name__ == "__main__":
    demonstrate_completeness_uniqueness()
    demonstrate_ancillarity()
    demonstrate_basu_theorem()
    find_umvue_examples()
```

Try It Yourself

Run this code to check the theorems numerically. Basu's theorem predicts that X̄ and S² are independent, so their sample correlation should be near zero, and the UMVUE should show the smallest variance among the unbiased estimators compared!


Key Insights

💡Insight 1: Completeness = No Junk

A complete statistic has no "useless" parts. You can't construct non-trivial unbiased estimators of zero from it.

💡Insight 2: Ancillary = θ-Free Information

Ancillary statistics carry information about the "shape" of data, not about θ. They're useful for diagnostics but not for estimation.

💡Insight 3: Basu Bridges Complete Sufficient and Ancillary

Complete sufficient statistics are independent of ancillary statistics. This elegant result simplifies many calculations in statistics.

💡Insight 4: Lehmann-Scheffé Gives UMVUE

With a complete sufficient statistic, any unbiased estimator that is a function of it is automatically the UMVUE. The search for optimal estimators becomes mechanical!

💡Insight 5: Everything Connects

Sufficiency, completeness, ancillarity, efficiency, and consistency all work together. Understanding these connections is key to mastering estimation theory.


Summary

📚Symbol Glossary
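| Symbol | Name | Meaning |
|---|---|---|
| T | Complete sufficient statistic | Has no useless parts |
| A | Ancillary statistic | Distribution free of θ |
| T ⊥ A | Independence | By Basu's Theorem |
| UMVUE | Uniformly Minimum Variance Unbiased Estimator | Best unbiased estimator |
| E[U\|T] | Rao-Blackwellization | Improves (never worsens) the unbiased estimator U |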
Completeness
  • E[g(T)] = 0 for all θ ⇒ g(T) = 0
  • No non-trivial unbiased estimators of zero
  • The natural sufficient statistic of a full-rank exponential family is complete
  • Ensures uniqueness of UMVUE
🔧Ancillarity
  • Distribution doesn't depend on θ
  • Carries no info about parameter
  • Examples: deviations and ranges (location families), ratios (scale families)
  • Independent of complete sufficient (Basu)
🎉Chapter 11 Complete!

Congratulations! You've completed the foundations of point estimation theory:

  • Section 01: Estimators and parametric framework
  • Section 02: Bias, Variance, and MSE
  • Section 03: Consistency and Efficiency
  • Section 04: Sufficiency and Minimal Sufficiency
  • Section 05: Completeness and Ancillarity

In Chapter 12, we'll apply these concepts to specific methods: Method of Moments, Maximum Likelihood Estimation, and the EM Algorithm!
