Chapter 15

Likelihood Ratio Tests

Common Statistical Tests

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

  • Understand the principle of comparing model likelihoods
  • Derive and interpret the likelihood ratio statistic
  • State Wilks' theorem and its conditions
  • Calculate degrees of freedom for nested model comparisons
  • Connect LRT to information criteria (AIC, BIC)

🔧 Practical Skills

  • Perform likelihood ratio tests for common distributions
  • Compare nested regression models using LRT
  • Implement LRT in Python with scipy and statsmodels
  • Choose between LRT, Wald, and Score tests

🧠 AI/ML Applications

  • Feature Selection - Test whether adding features significantly improves model fit
  • Model Comparison - Compare neural network architectures with nested structures
  • Regularization - Understand connection between LRT and penalty terms (L1, L2)
  • Cross-Entropy Connection - See how likelihood maximization relates to minimizing cross-entropy loss
  • Mixture Models - Test number of components in GMMs via LRT variants
Central Message: The Likelihood Ratio Test provides a unified, principled framework for comparing any two nested statistical models. It answers the fundamental question: "Does adding complexity to my model significantly improve its fit to the data?"

The Big Picture: A Unified Framework

Imagine you're building a machine learning model and face a crucial decision: Should I add more features? More layers? More parameters? Every addition increases your model's capacity to fit the training data, but also risks overfitting. You need a principled way to decide when added complexity is truly justified by the evidence in your data.

The Fundamental Question

"Given two nested models, does the more complex model fit the data significantly better than the simpler model, or could the improvement be due to chance?"

The Likelihood Ratio Test (LRT) answers this question elegantly by comparing how well each model explains the observed data. The test is remarkably general—it works for virtually any parametric model where we can compute the likelihood.

Historical Origins: Neyman, Pearson, and Wilks

📜

The Birth of Modern Hypothesis Testing

The LRT emerges from the collaborative work of Jerzy Neyman and Egon Pearson in the late 1920s and early 1930s, which culminated in the Neyman-Pearson lemma (covered in Section 14.5).

In 1938, Samuel S. Wilks proved the remarkable result that the LR statistic follows a chi-square distribution asymptotically—making it practical for real-world applications.

"The likelihood ratio principle is arguably the most important single idea in the theory of testing hypotheses." — Statistical tradition

The LRT occupies a special place in statistics for two reasons. For simple hypotheses it is the most powerful test (by the Neyman-Pearson lemma), and it is highly versatile: it applies to any situation where we can write down a likelihood function.


The Likelihood Ratio Test Statistic

Mathematical Formulation

Consider two nested models:

  • H₀ (Restricted Model): Parameter lies in subset $\Theta_0$
  • H₁ (Full Model): Parameter lies in full space $\Theta$

The Likelihood Ratio is:

Likelihood Ratio

$$\lambda = \frac{\sup_{\theta \in \Theta_0} L(\theta; X)}{\sup_{\theta \in \Theta} L(\theta; X)} = \frac{L(\hat{\theta}_0; X)}{L(\hat{\theta}; X)}$$

Ratio of maximized likelihoods: restricted model vs. unrestricted model

The Likelihood Ratio Test Statistic is:

LR Test Statistic

$$\Lambda = -2 \log \lambda = -2 \left[ \ell(\hat{\theta}_0) - \ell(\hat{\theta}) \right] = 2 \left[ \ell(\hat{\theta}) - \ell(\hat{\theta}_0) \right]$$

where $\ell(\theta) = \log L(\theta)$ is the log-likelihood

| Symbol | Meaning | Interpretation |
| --- | --- | --- |
| λ | Likelihood ratio (0 ≤ λ ≤ 1) | Closer to 0 = more evidence against H₀ |
| Λ | LR statistic (-2 log λ ≥ 0) | Larger = more evidence against H₀ |
| θ̂₀ | MLE under H₀ (restricted) | Best fit possible under null hypothesis |
| θ̂ | MLE under H₁ (unrestricted) | Best fit possible overall |
| ℓ(θ) | Log-likelihood | Log of probability (or density) of data given θ |

Why -2 log? The factor of 2 is chosen so that the statistic follows a chi-square distribution asymptotically, and the negative sign makes it non-negative (since log λ ≤ 0). The log transformation converts the ratio to a difference, which is more convenient mathematically.
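To make the definitions concrete, here is a minimal sketch that computes λ, Λ, and the asymptotic p-value for a simple case: testing H₀: p = p₀ for Bernoulli data. The function name `bernoulli_lrt` and the numbers (60 successes in 100 trials, p₀ = 0.5) are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy import stats
from scipy.special import xlogy  # xlogy(a, b) = a * log(b), defined as 0 when a == 0


def bernoulli_lrt(successes, n, p0):
    """LRT of H0: p = p0 against H1: p unrestricted, for n Bernoulli trials."""
    p_hat = successes / n  # unrestricted MLE

    def log_lik(p):
        # Bernoulli log-likelihood up to the data-only binomial coefficient
        return xlogy(successes, p) + xlogy(n - successes, 1 - p)

    ll_restricted = log_lik(p0)    # nothing to estimate under H0
    ll_full = log_lik(p_hat)       # log-likelihood at the unrestricted MLE
    lam = np.exp(ll_restricted - ll_full)       # likelihood ratio, between 0 and 1
    lr_stat = -2 * (ll_restricted - ll_full)    # Lambda = -2 log(lambda) >= 0
    p_value = stats.chi2.sf(lr_stat, df=1)      # one free parameter under H1
    return lam, lr_stat, p_value


# Hypothetical data: 60 successes out of 100 trials, testing p0 = 0.5
lam, lr_stat, p_value = bernoulli_lrt(successes=60, n=100, p0=0.5)
print(f"lambda = {lam:.4f}, Lambda = {lr_stat:.3f}, p = {p_value:.4f}")
```

Because the restricted model has no free parameters here and the full model has one, the reference distribution is χ²(1).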

Intuition: Comparing Model Fits

The LRT has a beautiful intuitive interpretation:

When λ ≈ 1

The restricted model fits almost as well as the unrestricted model.

→ No evidence against H₀. The extra parameters don't help.

Λ ≈ 0, Fail to reject H₀

When λ ≈ 0

The unrestricted model fits much better than the restricted model.

→ Strong evidence against H₀. The extra parameters matter!

Λ → large, Reject H₀

The ML Analogy

Think of the LRT as asking: "Is the training loss improvement from adding features large enough to be real, or could it just be fitting noise?"

  • Restricted model (H₀): Like a simpler neural network (fewer layers/units)
  • Unrestricted model (H₁): Like a more complex network with additional capacity
  • The test: Does the loss improvement justify the added complexity?

Interactive: LR Statistic Explorer

Explore how the likelihood ratio statistic works by adjusting the restricted and unrestricted model parameters. See how the statistic responds to different fits.

[Interactive figure: log-likelihood plotted against the mean μ, with the restricted value (H₀) and the unrestricted MLE (H₁, μ̂ = 0.85) marked.]

Example snapshot (df = 1):
  • Log-likelihoods: ℓ(H₀) = -60.97, ℓ(H₁) = -58.08, difference = 2.89
  • LR statistic: Λ = -2(log L₀ - log L₁) = 5.788
  • Critical value: χ²₀.₀₅(1) = 3.841; p-value ≈ 0.016
  • Decision: Reject H₀
Interpretation: The LR statistic measures how much better the unrestricted model fits the data compared to the restricted model. Under H₀, Λ ~ χ²(1). Try moving the restricted mean away from the MLE to see the statistic increase.

Wilks' Theorem: The Asymptotic Distribution

The practical power of the LRT comes from Wilks' theorem, which tells us exactly what distribution the test statistic follows under the null hypothesis.

Theorem Statement and Conditions

Wilks' Theorem (1938)

Under H₀ and appropriate regularity conditions, as $n \to \infty$:

$$\Lambda = -2 \log \lambda \xrightarrow{d} \chi^2(r)$$

where $r = \dim(\Theta) - \dim(\Theta_0)$ is the difference in number of free parameters

Regularity conditions include:

  1. The true parameter is an interior point of the parameter space
  2. The log-likelihood is three times differentiable
  3. The Fisher information matrix is positive definite
  4. The models are nested (H₀ is a special case of H₁)
Why Chi-Square? The intuition is that near the MLE, the log-likelihood is approximately quadratic (Taylor expansion). Under H₀, the MLE is constrained, creating a quadratic loss that follows a chi-square distribution with degrees of freedom equal to the number of constraints.
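The quadratic-approximation argument can be sketched for the simplest case: a scalar parameter with H₀: θ = θ₀. This is a heuristic outline under the regularity conditions listed above, not a full proof; here $I(\theta)$ denotes the Fisher information of a single observation, which is notation introduced for this sketch.

```latex
\begin{aligned}
\ell(\theta_0) &\approx \ell(\hat{\theta}) + \ell'(\hat{\theta})(\theta_0 - \hat{\theta})
  + \tfrac{1}{2}\,\ell''(\hat{\theta})(\theta_0 - \hat{\theta})^2,
  \qquad \ell'(\hat{\theta}) = 0 \\
\Lambda = 2\bigl[\ell(\hat{\theta}) - \ell(\theta_0)\bigr]
  &\approx -\ell''(\hat{\theta})\,(\hat{\theta} - \theta_0)^2
  \approx n\,I(\theta_0)\,(\hat{\theta} - \theta_0)^2 \\
\sqrt{n}\,(\hat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}\bigl(0,\, I(\theta_0)^{-1}\bigr)
  \;&\Longrightarrow\; \Lambda \xrightarrow{d} Z^2 \sim \chi^2(1), \qquad Z \sim \mathcal{N}(0, 1)
\end{aligned}
```

With r constraints, the same argument gives a sum of r squared standard normals, which is the χ²(r) limit in the theorem.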

Calculating Degrees of Freedom

| Comparison | Full Model Params | Restricted Model Params | df |
| --- | --- | --- | --- |
| Mean = 0 vs Mean free (Normal) | 2 (μ, σ²) | 1 (σ²) | 1 |
| Linear vs Quadratic regression | 4 (β₀, β₁, β₂, σ²) | 3 (β₀, β₁, σ²) | 1 |
| ANOVA: k groups equal vs different means | k+1 | 2 | k-1 |
| Logistic: model with vs without feature | p+1 | p | 1 |
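As a small illustration of how the degrees of freedom feed into the test, the sketch below turns parameter counts into a χ² critical value. The helper name `lrt_threshold` and the choice of k = 4 groups for the ANOVA case are assumptions made for the example.

```python
from scipy import stats


def lrt_threshold(k_full, k_restricted, alpha=0.05):
    """Degrees of freedom and chi-square critical value for a nested comparison."""
    df = k_full - k_restricted
    if df <= 0:
        raise ValueError("The full model must have more free parameters than the restricted one.")
    return df, stats.chi2.ppf(1 - alpha, df)


# Parameter counts for three nested comparisons (ANOVA uses k = 4 groups as an example)
examples = {
    "Normal mean (2 vs 1 parameters)": (2, 1),
    "Linear vs quadratic regression (4 vs 3)": (4, 3),
    "ANOVA with k = 4 groups (5 vs 2)": (5, 2),
}
for label, (k_full, k_restricted) in examples.items():
    df, crit = lrt_threshold(k_full, k_restricted)
    print(f"{label}: df = {df}, reject H0 when Lambda > {crit:.3f}")
```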

Interactive: Wilks' Theorem Demonstration

See Wilks' theorem in action! Generate samples under H₀, compute the LR statistic for each, and watch the histogram converge to the theoretical chi-square distribution.

[Interactive figure: histogram of simulated LR statistics (x-axis: LR statistic Λ; y-axis: density) overlaid with the theoretical χ²(1) density; the critical value χ²₀.₀₅ = 3.84 is marked.]

Example snapshot:
  • Simulated statistics: mean 0.878, variance 1.372
  • Theoretical χ²(1): mean 1.000, variance 2.000
Key Insight: As sample size increases, the simulated histogram converges more closely to the theoretical χ² curve. This is Wilks' theorem in action! The approximation improves with larger samples.
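The same experiment can be reproduced offline. The sketch below assumes Normal data and tests H₀: μ = 0 with unknown variance; the sample sizes, number of simulations, and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats


def simulate_lr_stats(n, n_sims, rng):
    """LR statistics for H0: mu = 0 (variance unknown), with data drawn under H0."""
    data = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
    sigma0_sq = np.mean(data ** 2, axis=1)   # variance MLE with the mean fixed at 0
    sigma1_sq = np.var(data, axis=1)         # variance MLE with the mean free (ddof=0)
    # For Normal data, Lambda = 2 * (ll_full - ll_restricted) = n * log(sigma0^2 / sigma1^2)
    return n * np.log(sigma0_sq / sigma1_sq)


rng = np.random.default_rng(0)
critical_value = stats.chi2.ppf(0.95, df=1)
results = {}
for n in (10, 50, 500):
    lr = simulate_lr_stats(n, n_sims=10_000, rng=rng)
    results[n] = {
        "mean": lr.mean(),
        "variance": lr.var(),
        "rejection_rate": np.mean(lr > critical_value),
    }
    print(f"n = {n:>3}: mean = {results[n]['mean']:.3f}, "
          f"variance = {results[n]['variance']:.3f}, "
          f"rejection rate = {results[n]['rejection_rate']:.3f}")
```

If the χ²(1) approximation holds, the mean should approach 1, the variance 2, and the rejection rate 0.05 as n grows; for small n the rejection rate tends to sit above the nominal level.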

Testing Nested Models

The LRT is designed for nested models—where one model is a special case (a restriction) of another. This is the most common scenario in practice.

Examples of Nested Models

Regression

H₀: y = β₀ + β₁x (linear)
H₁: y = β₀ + β₁x + β₂x² (quadratic)

Classification

H₀: Logistic with 5 features
H₁: Logistic with 8 features

Mixture Models

H₀: GMM with k components
H₁: GMM with k+1 components*

Distribution Testing

H₀: μ = μ₀ (specific value)
H₁: μ free (any value)

*Note: Standard LRT doesn't directly apply to mixture models due to boundary issues (see Limitations section)

Interactive: Nested Model Comparison

Compare a linear model against a quadratic model using the LRT. Adjust the true data-generating process to see when the quadratic term is detected as significant.

[Interactive figure: scatter plot of the data (x vs. y) with the true model, the linear fit (H₀), and the quadratic fit (H₁). A slider sets the true quadratic coefficient: 0 means the true model is linear, values above 0 add curvature.]

Example snapshot:
  • Linear model (H₀): y = 1.88 + 0.51x; RSS = 93.92, R² = 0.301, log L = -86.71
  • Quadratic model (H₁): y = 1.12 + 0.51x + 0.24x²; RSS = 71.22, R² = 0.470, log L = -79.79
  • LR test result: Λ = 13.83, χ²₀.₀₅(1) = 3.841, p ≈ 0.0002
  • Decision: Quadratic term significant
Try this: Set the true quadratic coefficient to 0 (linear truth) and observe how often the LRT incorrectly rejects H₀. Then increase it and watch the test gain power to detect the true curvature.
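The experiment suggested above can also be run as a small simulation. This is a sketch under assumed settings (n = 50 points on [-3, 3], intercept 1, slope 0.5, unit noise variance); the function names are hypothetical. For Gaussian errors the LR statistic reduces to n·log(RSS₀/RSS₁).

```python
import numpy as np
from scipy import stats


def lrt_linear_vs_quadratic(x, y):
    """LR statistic and p-value for H0: linear vs H1: quadratic (Gaussian errors)."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), x])           # restricted design: intercept, x
    X1 = np.column_stack([np.ones(n), x, x ** 2])   # full design adds x^2
    rss0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
    rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    lr_stat = n * np.log(rss0 / rss1)
    return lr_stat, stats.chi2.sf(lr_stat, df=1)


def rejection_rate(beta2, n=50, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the LRT rejects the linear model."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-3, 3, n)
    rejections = 0
    for _ in range(n_sims):
        y = 1 + 0.5 * x + beta2 * x ** 2 + rng.normal(0, 1, n)
        _, p_value = lrt_linear_vs_quadratic(x, y)
        rejections += p_value < alpha
    return rejections / n_sims


type_one_error = rejection_rate(beta2=0.0)   # truth is linear: rejections are false alarms
power = rejection_rate(beta2=0.3)            # truth has curvature: rejections are correct
print(f"Type I error rate: {type_one_error:.3f}, power at beta2 = 0.3: {power:.3f}")
```

With a linear truth the rejection rate should sit near the nominal 5% (slightly above it at this sample size, since the χ² reference is only asymptotic), and it should rise toward 1 as the curvature grows.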

LRT vs Information Criteria

The LRT is closely related to information criteria like AIC and BIC. Understanding this connection reveals deep insights about model selection.

LRT

$$\Lambda = -2(\ell_0 - \ell_1)$$

Tests if improvement is statistically significant. Binary decision.

AIC

$$\text{AIC} = -2\ell + 2k$$

Penalizes by 2× parameters. Good for prediction-focused selection.

BIC

$$\text{BIC} = -2\ell + k \log(n)$$

Stronger penalty growing with n. Consistent model selection.

The Deep Connection

When comparing two nested models, the LRT statistic equals the difference in deviances. Information criteria add penalties to this comparison:

ΔAIC = Λ - 2(k₁ - k₀) = -2(ℓ₀ - ℓ₁) - 2Δk
ΔBIC = Λ - log(n)(k₁ - k₀)

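Intuition: AIC and BIC use the same fit comparison as the LRT but subtract a penalty for the extra parameters. With ΔAIC = AIC₀ - AIC₁ and ΔBIC = BIC₀ - BIC₁, the larger model is preferred when the difference is positive, that is, when Λ > 2Δk (AIC) or Λ > log(n)·Δk (BIC). For one extra parameter, the LRT at α = 0.05 requires Λ > 3.84, so AIC (threshold 2, equivalent to α ≈ 0.16) is more liberal, while BIC (threshold log n) is stricter than the LRT once n is larger than about 46.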
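A short sketch of these relations in code, using the sign convention Δ = (criterion of the restricted model) - (criterion of the full model), so positive values favor the larger model. The function name, the log-likelihood values, and the sample size n = 100 are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def compare_nested(ll_restricted, ll_full, k_restricted, k_full, n, alpha=0.05):
    """Summarize the LRT, AIC and BIC verdicts for one pair of nested models."""
    delta_k = k_full - k_restricted
    lr_stat = -2 * (ll_restricted - ll_full)
    delta_aic = lr_stat - 2 * delta_k            # AIC_restricted - AIC_full
    delta_bic = lr_stat - np.log(n) * delta_k    # BIC_restricted - BIC_full
    p_value = stats.chi2.sf(lr_stat, delta_k)
    return {
        "lr_stat": lr_stat,
        "p_value": p_value,
        "delta_aic": delta_aic,
        "delta_bic": delta_bic,
        "lrt_prefers_full": bool(p_value < alpha),
        "aic_prefers_full": bool(delta_aic > 0),
        "bic_prefers_full": bool(delta_bic > 0),
    }


# Large improvement in fit for one extra parameter: all three criteria agree
first = compare_nested(ll_restricted=-149.1, ll_full=-133.9, k_restricted=1, k_full=2, n=100)
# Negligible improvement for two extra parameters: none of them favors the larger model
second = compare_nested(ll_restricted=-133.9, ll_full=-133.8, k_restricted=2, k_full=4, n=100)
for label, result in (("first", first), ("second", second)):
    print(label, {key: round(value, 3) if isinstance(value, float) else value
                  for key, value in result.items()})
```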

Interactive: LRT vs AIC/BIC

Compare how LRT, AIC, and BIC select among nested models of varying complexity. See how sample size affects each criterion's behavior.

Example snapshot from the interactive comparison (* marks the lowest value in the column):

| Model | Params | Log-Lik | AIC | BIC | LR vs M₀ | p-value |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept only | 1 | -149.1 | 300.1 | 302.7 | - | - |
| 1 predictor (true model) | 2 | -133.9 | 271.8* | 277.0* | 30.27 | <0.001 |
| 2 predictors | 3 | -133.9 | 273.8 | 281.6 | 30.27 | <0.001 |
| 3 predictors | 4 | -133.8 | 275.7 | 286.1 | 30.44 | <0.001 |
| 4 predictors | 5 | -133.8 | 277.7 | 290.7 | 30.44 | <0.001 |
| 5 predictors | 6 | -133.8 | 279.6 | 295.3 | 30.46 | <0.001 |

  • AIC selection: best model is 1 predictor. AIC = -2ℓ + 2k (penalizes complexity lightly).
  • BIC selection: best model is 1 predictor. BIC = -2ℓ + k·log(n) (penalizes more heavily).
  • LRT selection (α = 0.05): the most complex significant model is 5 predictors, because each model is tested against the intercept-only model rather than against the next-smaller model.
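Key insight: BIC penalizes complexity more heavily than AIC whenever log(n) > 2 (that is, n ≥ 8), which makes it more conservative. The LRT yields a binary reject / fail-to-reject decision, while information criteria provide continuous rankings. As n grows, BIC tends to identify the true model when it is among the candidates, whereas AIC and a fixed-α LRT keep a nonzero probability of selecting an overly complex model.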
Practical Guidance:
  • Use LRT when you have a specific hypothesis to test (e.g., "Is this feature important?")
  • Use AIC when optimizing for prediction accuracy
  • Use BIC when trying to identify the true data-generating process
  • In deep learning, these ideas motivate regularization and early stopping


Applications in AI/ML

The likelihood ratio test has profound connections to modern machine learning, even when not used explicitly. Understanding these connections deepens your intuition about model selection.

🔍 Feature Importance Testing

In GLMs and other likelihood-based models, the LRT provides p-values for the contribution of a feature: fit the model with and without the feature and compare the two fits.

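```python
# reduced_model and full_model are fitted statsmodels OLS results (nested fits).
# OLS results expose a likelihood ratio comparison directly:
lr_stat, p_value, df_diff = full_model.compare_lr_test(reduced_model)

# Note: anova_lm(reduced_model, full_model) reports the closely related F-test,
# not the LR chi-square statistic. For Logit/GLM fits, compute the statistic
# from the .llf attributes of the two results instead.
```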

⚖️ Cross-Entropy Loss Connection

Cross-entropy loss = -log-likelihood (for categorical outcomes). Minimizing cross-entropy is equivalent to maximizing likelihood!

CrossEntropyLoss = -Σ y·log(p) = -ℓ(θ)
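A quick numerical check of this identity for binary outcomes, as a minimal sketch: the labels and predicted probabilities below are made-up values. Note that deep learning libraries usually report the mean rather than the sum, which rescales the loss by 1/n without changing the minimizer.

```python
import numpy as np
from scipy import stats

# Hypothetical binary labels and predicted probabilities P(y = 1)
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.7, 0.4])

# Summed binary cross-entropy, written out explicitly
cross_entropy = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Bernoulli log-likelihood of the same labels under the same probabilities
log_lik = np.sum(stats.bernoulli.logpmf(y, p))

print(f"cross-entropy = {cross_entropy:.6f}, -log-likelihood = {-log_lik:.6f}")
print(f"mean cross-entropy (typical reported loss) = {cross_entropy / len(y):.6f}")
```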

🎚️ Regularization as Bayesian Prior

L2 regularization (Ridge) corresponds to a Gaussian prior on weights. L1 (Lasso) corresponds to a Laplace prior. The penalty term is like comparing to a restricted model.

L = -ℓ(θ) + λ||θ||² ≈ comparing to θ=0
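The correspondence between penalties and priors follows from writing out the log-posterior. In the sketch below, τ² (Gaussian prior variance) and b (Laplace scale) are symbols introduced for illustration, and λ denotes the regularization strength, not the likelihood ratio from earlier in this section.

```latex
\begin{aligned}
\hat{\theta}_{\text{MAP}}
  &= \arg\max_{\theta}\, \bigl[\ell(\theta) + \log p(\theta)\bigr] \\[4pt]
\theta_j \sim \mathcal{N}(0, \tau^2):\quad
  \log p(\theta) &= -\frac{1}{2\tau^2}\lVert\theta\rVert_2^2 + \text{const}
  \;\Longrightarrow\;
  \hat{\theta}_{\text{MAP}} = \arg\min_{\theta}\, \Bigl[-\ell(\theta) + \lambda \lVert\theta\rVert_2^2\Bigr],
  \quad \lambda = \frac{1}{2\tau^2} \\[4pt]
\theta_j \sim \text{Laplace}(0, b):\quad
  \log p(\theta) &= -\frac{1}{b}\lVert\theta\rVert_1 + \text{const}
  \;\Longrightarrow\;
  \hat{\theta}_{\text{MAP}} = \arg\min_{\theta}\, \Bigl[-\ell(\theta) + \lambda \lVert\theta\rVert_1\Bigr],
  \quad \lambda = \frac{1}{b}
\end{aligned}
```

A tighter prior (smaller τ² or b) means a larger λ, pulling the estimate more strongly toward the restricted value θ = 0.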

🏗️ Architecture Search

When comparing neural network architectures, the LRT mindset applies: Is the validation loss improvement worth the added complexity? AIC/BIC formalize this for smaller models.

Deep Learning Caveat: The classical LRT with chi-square distribution requires certain regularity conditions that may not hold for deep neural networks (non-convex loss surfaces, many local optima). For DNNs, use cross-validation, held-out test sets, or Bayesian methods instead.

Python Implementation

Let's implement the likelihood ratio test from scratch and then see how to use established libraries.

Likelihood Ratio Test for Normal Mean
```python
import numpy as np
from scipy import stats  # chi-square distribution for the p-value


def lrt_normal_mean(data, mu0=0.0):
    """LRT for testing H0: mu = mu0 vs H1: mu != mu0 (Normal data)"""
    data = np.asarray(data, dtype=float)
    n = len(data)
    x_bar = np.mean(data)

    def log_lik(mu, sigma):
        # Normal log-likelihood for a given mean and standard deviation
        return (-n / 2 * np.log(2 * np.pi) - n * np.log(sigma)
                - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

    # Under H0 the mean is fixed at mu0 and only the variance is estimated
    # (MLE of sigma^2 = average squared deviation from mu0)
    sigma0 = np.sqrt(np.mean((data - mu0) ** 2))
    ll_H0 = log_lik(mu0, sigma0)

    # Under H1 both mean and variance are estimated by MLE
    # (sample mean and biased sample variance, ddof=0)
    sigma1 = np.sqrt(np.var(data, ddof=0))
    ll_H1 = log_lik(x_bar, sigma1)

    # LR statistic: -2 times the log-likelihood difference;
    # larger values indicate that H1 fits better
    lr_stat = -2 * (ll_H0 - ll_H1)
    df = 1  # one parameter difference between the models

    # P-value from the chi-square distribution (sf = 1 - cdf, more accurate in the tail)
    p_value = stats.chi2.sf(lr_stat, df)

    return {'statistic': lr_stat, 'p_value': p_value, 'df': df}
```

Now let's see the full implementation with model comparison utilities:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm
from sklearn.datasets import make_classification

# ============================================
# 1. LRT for Comparing Nested Linear Models
# ============================================

def lrt_nested_models(ll_restricted, ll_full, df, alpha=0.05):
    """
    Likelihood ratio test for comparing nested models.

    Parameters
    ----------
    ll_restricted : float
        Log-likelihood of the restricted (smaller) model
    ll_full : float
        Log-likelihood of the full (larger) model
    df : int
        Degrees of freedom (difference in number of parameters)
    alpha : float
        Significance level for the reject / fail-to-reject decision

    Returns
    -------
    dict with LR statistic, p-value, and decision
    """
    lr_stat = -2 * (ll_restricted - ll_full)
    p_value = stats.chi2.sf(lr_stat, df)  # sf = 1 - cdf, accurate for tiny p-values

    return {
        'lr_statistic': lr_stat,
        'p_value': p_value,
        'df': df,
        'reject_H0': p_value < alpha
    }


# Example: Polynomial Regression Comparison
np.random.seed(42)
n = 100
x = np.linspace(-3, 3, n)
y_true = 1 + 0.5*x + 0.3*x**2
y = y_true + np.random.normal(0, 1, n)

# Fit linear model (H0)
X_linear = sm.add_constant(x)
model_linear = sm.OLS(y, X_linear).fit()

# Fit quadratic model (H1)
X_quad = sm.add_constant(np.column_stack([x, x**2]))
model_quad = sm.OLS(y, X_quad).fit()

# LRT comparison
ll_linear = model_linear.llf
ll_quad = model_quad.llf
result = lrt_nested_models(ll_linear, ll_quad, df=1)
print(f"Linear vs Quadratic: Λ = {result['lr_statistic']:.2f}, p = {result['p_value']:.4f}")


# ============================================
# 2. Using statsmodels' Built-in Comparisons
# ============================================

# For OLS results, compare_lr_test runs the likelihood ratio test directly
lr_builtin, p_builtin, df_diff = model_quad.compare_lr_test(model_linear)
print(f"\nBuilt-in LRT: Λ = {lr_builtin:.2f}, p = {p_builtin:.4f}, df = {df_diff:.0f}")

# anova_lm compares the same nested OLS fits with an F-test (not an LR chi-square)
anova_result = anova_lm(model_linear, model_quad)
print("\nANOVA Table (F-test):")
print(anova_result)


# ============================================
# 3. LRT for Logistic Regression
# ============================================

# Generate classification data under separate names so the regression x, y
# above stay available for section 5. n_redundant=0 avoids exactly collinear
# columns, so the parameter count difference equals the degrees of freedom.
X_clf, y_clf = make_classification(n_samples=500, n_features=10,
                                   n_informative=5, n_redundant=0, random_state=42)

# Full model (all features)
X_full = sm.add_constant(X_clf)
model_full = sm.Logit(y_clf, X_full).fit(disp=0)

# Reduced model (first 5 features)
X_reduced = sm.add_constant(X_clf[:, :5])
model_reduced = sm.Logit(y_clf, X_reduced).fit(disp=0)

# LRT
lr_stat = -2 * (model_reduced.llf - model_full.llf)
df = X_full.shape[1] - X_reduced.shape[1]  # difference in parameters
p_value = stats.chi2.sf(lr_stat, df)

print("\nLogistic Regression LRT:")
print(f"  Full model LL: {model_full.llf:.2f}")
print(f"  Reduced model LL: {model_reduced.llf:.2f}")
print(f"  LR statistic: {lr_stat:.2f}")
print(f"  df: {df}")
print(f"  p-value: {p_value:.4f}")


# ============================================
# 4. LRT for Distribution Parameters
# ============================================

def lrt_exponential_rate(data, lambda0):
    """Test H0: lambda = lambda0 vs H1: lambda != lambda0 for Exp(lambda)"""
    n = len(data)

    # MLE under H1
    lambda_mle = 1 / np.mean(data)

    # Log-likelihoods
    ll_H0 = n * np.log(lambda0) - lambda0 * np.sum(data)
    ll_H1 = n * np.log(lambda_mle) - lambda_mle * np.sum(data)

    lr_stat = -2 * (ll_H0 - ll_H1)
    p_value = stats.chi2.sf(lr_stat, df=1)

    return {
        'lr_statistic': lr_stat,
        'p_value': p_value,
        'mle_lambda': lambda_mle
    }

# Example: Test if waiting times follow Exp(0.5)
waiting_times = stats.expon.rvs(scale=2, size=100, random_state=42)  # True rate = 0.5
result = lrt_exponential_rate(waiting_times, lambda0=0.5)
print(f"\nExponential rate test: Λ = {result['lr_statistic']:.2f}, p = {result['p_value']:.4f}")


# ============================================
# 5. Model Comparison with AIC/BIC
# ============================================

def compare_models(models, names=None):
    """Compare multiple fitted models by log-likelihood, AIC, and BIC"""
    if names is None:
        names = [f"Model_{i}" for i in range(len(models))]

    results = []
    for model, name in zip(models, names):
        results.append({
            'name': name,
            'params': int(model.df_model) + 1,  # +1 for intercept
            'log_lik': model.llf,
            'aic': model.aic,
            'bic': model.bic
        })

    # Print comparison table
    print("\nModel Comparison:")
    print(f"{'Model':<15} {'Params':<8} {'Log-Lik':<12} {'AIC':<12} {'BIC':<12}")
    print("-" * 60)
    for r in results:
        print(f"{r['name']:<15} {r['params']:<8} {r['log_lik']:<12.2f} "
              f"{r['aic']:<12.2f} {r['bic']:<12.2f}")

    # Find best by each criterion
    best_aic = min(results, key=lambda r: r['aic'])
    best_bic = min(results, key=lambda r: r['bic'])
    print(f"\nBest by AIC: {best_aic['name']}")
    print(f"Best by BIC: {best_bic['name']}")

    return results

# Fit models of increasing complexity (regression data from section 1)
X1 = sm.add_constant(x)
X2 = sm.add_constant(np.column_stack([x, x**2]))
X3 = sm.add_constant(np.column_stack([x, x**2, x**3]))

models = [
    sm.OLS(y, X1).fit(),
    sm.OLS(y, X2).fit(),
    sm.OLS(y, X3).fit()
]
compare_models(models, ['Linear', 'Quadratic', 'Cubic'])
```

Limitations and When Not to Use LRT

While powerful, the LRT has important limitations that every practitioner should understand:

⚠️ Non-Nested Models

The standard LRT only works for nested models. For non-nested comparisons (e.g., Random Forest vs. Neural Network), use AIC, cross-validation, or the Vuong test.

⚠️ Boundary Parameters

When H₀ places parameters on the boundary of the parameter space (e.g., testing σ² = 0 or testing number of mixture components), the χ² approximation fails. The true null distribution is often a mixture of chi-squares.
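A minimal simulation can show the boundary effect in the simplest setting. The sketch assumes Normal data with known unit variance and the one-sided model μ ≥ 0, so H₀: μ = 0 sits on the boundary; in that case Λ = n·max(x̄, 0)², whose null distribution is a 50:50 mixture of a point mass at 0 and χ²(1). The sample size, simulation count, and seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sims = 50, 20_000

# Data generated under H0: mu = 0, with known variance 1
x_bar = rng.normal(0.0, 1.0, size=(n_sims, n)).mean(axis=1)

# Constrained MLE is max(x_bar, 0), so the LR statistic is zero whenever x_bar <= 0
lr = n * np.maximum(x_bar, 0.0) ** 2

frac_zero = np.mean(lr == 0)                             # about half the mass sits at 0
naive_rate = np.mean(lr > stats.chi2.ppf(0.95, df=1))    # naive chi2(1) cutoff
# Mixture cutoff c solves 0.5 * P(chi2(1) > c) = 0.05, i.e. c = chi2.ppf(0.90, 1)
mixture_rate = np.mean(lr > stats.chi2.ppf(0.90, df=1))

print(f"share of Lambda = 0: {frac_zero:.3f}")
print(f"rejection rate with chi2(1) cutoff: {naive_rate:.3f}")
print(f"rejection rate with mixture cutoff: {mixture_rate:.3f}")
```

In this setting the naive χ²(1) cutoff rejects only about half as often as the nominal 5%, so the test loses power; the mixture-based cutoff restores the intended level.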

⚠️ Small Sample Sizes

Wilks' theorem is an asymptotic result. For small n, the chi-square approximation may be poor. Consider exact tests, parametric bootstrap, or Bartlett corrections.
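One way to sidestep the asymptotic approximation is a parametric bootstrap: simulate datasets under H₀, recompute Λ for each, and compare the observed statistic with that simulated null distribution. The sketch below does this for an Exponential rate with n = 12; the function names, sample, and settings are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def exp_lr_stat(data, rate0):
    """LR statistic for H0: rate = rate0 with Exponential(rate) data."""
    n = len(data)
    total = np.sum(data)
    rate_hat = n / total                                # unrestricted MLE
    ll_restricted = n * np.log(rate0) - rate0 * total
    ll_full = n * np.log(rate_hat) - rate_hat * total
    return 2 * (ll_full - ll_restricted)


def bootstrap_lrt_pvalue(data, rate0, n_boot=5000, seed=0):
    """Parametric bootstrap p-value: simulate under H0 and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = exp_lr_stat(data, rate0)
    # Datasets of the same size drawn from the null model Exp(rate0)
    sims = rng.exponential(scale=1 / rate0, size=(n_boot, len(data)))
    boot_stats = np.array([exp_lr_stat(row, rate0) for row in sims])
    # Add-one correction keeps the Monte Carlo p-value strictly positive
    p_value = (1 + np.sum(boot_stats >= observed)) / (n_boot + 1)
    return observed, p_value


rng = np.random.default_rng(42)
small_sample = rng.exponential(scale=2.0, size=12)      # true rate 0.5, only 12 points
observed, p_boot = bootstrap_lrt_pvalue(small_sample, rate0=0.5)
p_chi2 = stats.chi2.sf(observed, df=1)
print(f"Lambda = {observed:.3f}, bootstrap p = {p_boot:.4f}, chi-square p = {p_chi2:.4f}")
```

Here H₀ fixes the only parameter, so simulating from Exp(rate0) is exact up to Monte Carlo error. With nuisance parameters, the usual approach is to simulate from the model fitted under H₀ and refit both models on every simulated dataset.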

⚠️ Model Misspecification

The LRT compares two specific models. If both are wrong, you're just picking the "least wrong" model. Always check model assumptions separately.

| Situation | Problem | Alternative Approach |
| --- | --- | --- |
| Comparing RF vs NN | Non-nested models | Cross-validation, AIC, Vuong test |
| Testing # GMM components | Boundary parameter problem | Bootstrap LRT, BIC |
| n < 30 | Poor χ² approximation | Exact tests, parametric bootstrap |
| Deep neural networks | Non-convex, many optima | Validation set, cross-validation |
| Misspecified likelihood | Both models wrong | Robust methods, sandwich estimators |


Summary

Key Takeaways

  1. The LRT compares model fits: It measures whether a restricted model (H₀) fits significantly worse than an unrestricted model (H₁).
  2. Test statistic: $\Lambda = -2(\ell_0 - \ell_1)$, which equals twice the difference in log-likelihoods (or equivalently, the difference in deviances).
  3. Wilks' theorem: Under H₀ and regularity conditions, Λ follows a chi-square distribution with df equal to the difference in number of parameters.
  4. Connection to information criteria: AIC and BIC are penalized versions of the LRT, balancing fit against complexity.
  5. Applications in ML: Feature selection, architecture comparison, hypothesis testing about model parameters.
  6. Limitations: Requires nested models, may fail at parameter boundaries, needs adequate sample size, assumes correct model specification.

Quick Reference

| Concept | Formula / Rule |
| --- | --- |
| Likelihood Ratio | λ = L(θ̂₀)/L(θ̂) |
| LR Statistic | Λ = -2 log λ = -2(ℓ₀ - ℓ₁) |
| Degrees of Freedom | df = dim(Θ) - dim(Θ₀) |
| Null Distribution | Λ ~ χ²(df) asymptotically |
| Reject H₀ when | Λ > χ²_{α}(df) or p < α |
| AIC connection | ΔAIC = Λ - 2·Δk |
| BIC connection | ΔBIC = Λ - log(n)·Δk |

The Trinity of Likelihood-Based Tests

Three asymptotically equivalent tests exist for parametric hypotheses:

LRT

Compares likelihoods

Wald Test

Uses MLE distance from H₀

Score Test

Uses slope at H₀

All three converge to the same χ² distribution as n → ∞. The next section covers Wald and Score tests.
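For a first feel of how the three statistics relate, the sketch below evaluates them on the same Bernoulli data for H₀: p = p₀. The formulas are the standard one-parameter versions; the function name and the data (60 successes in 100 trials) are assumptions for illustration.

```python
import numpy as np
from scipy import stats
from scipy.special import xlogy  # xlogy(a, b) = a * log(b), defined as 0 when a == 0


def trinity_bernoulli(successes, n, p0):
    """LRT, Wald, and Score statistics for H0: p = p0 with Bernoulli data."""
    p_hat = successes / n

    def log_lik(p):
        return xlogy(successes, p) + xlogy(n - successes, 1 - p)

    lrt = 2 * (log_lik(p_hat) - log_lik(p0))                  # compares likelihoods
    # Wald: distance of the MLE from p0, variance evaluated at the MLE
    # (undefined when p_hat is exactly 0 or 1)
    wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)
    # Score: same distance, but variance evaluated under H0 (no full-model fit needed)
    score = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)
    return {"LRT": lrt, "Wald": wald, "Score": score}


statistics = trinity_bernoulli(successes=60, n=100, p0=0.5)
for name, value in statistics.items():
    print(f"{name:<5}: statistic = {value:.3f}, p = {stats.chi2.sf(value, df=1):.4f}")
```

The three values differ in finite samples but are all referred to the same χ²(1) distribution under H₀.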

Looking Ahead: In the next section, we'll explore Wald and Score Tests, which complement the LRT. The Wald test is computationally simpler (requires only the full model MLE), while the Score test is useful when the full model is hard to fit.