Can You Compare AIC Across Different Dependent Variables?

AIC (Akaike Information Criterion) is a widely used metric for model selection, but can you compare AIC values across models with different dependent variables? The short answer is no. This article explains why.

AIC is calculated as 2k – 2log(L), where k is the number of estimated parameters and L is the maximized value of the model's likelihood function. It balances model complexity (k) against goodness of fit (L): a lower AIC suggests a better trade-off between the two.
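To make this concrete, here is a minimal Python sketch (using NumPy and SciPy, with simulated data) that computes AIC straight from this formula for an intercept-only Gaussian model; the values and seed are illustrative only:

```python
import numpy as np
from scipy import stats

# Simulated data for an intercept-only Gaussian model
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)

mu_hat = x.mean()            # MLE of the mean
sigma_hat = x.std(ddof=0)    # MLE of the standard deviation
log_L = stats.norm.logpdf(x, mu_hat, sigma_hat).sum()  # maximized log-likelihood

k = 2                        # estimated parameters: mean and variance
aic = 2 * k - 2 * log_L
print(f"AIC = {aic:.2f}")
```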

The formula for Gaussian log-likelihood highlights why comparing AIC across different dependent variables is problematic:

log(L(θ)) = -(|D|/2)log(2π) - (1/2)log(|K|) - (1/2)(x-μ)^T K⁻¹ (x-μ)

Where:

  • |D| is the number of data points
  • K is the covariance matrix (|K| is its determinant)
  • μ is the model’s predicted mean response
  • x is the vector of observed values of the dependent variable
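For illustration, a direct Python translation of this log-likelihood might look like the following sketch (assuming NumPy; the function name and argument layout are just for this example):

```python
import numpy as np

def gaussian_log_likelihood(x, mu, K):
    """Log-likelihood of the multivariate Gaussian formula above.

    x  : observed dependent variable, shape (n,)
    mu : mean response, shape (n,)
    K  : covariance matrix, shape (n, n)
    """
    n = len(x)                                     # |D|, the number of data points
    resid = x - mu
    _, logdet_K = np.linalg.slogdet(K)             # log|K|, computed stably
    fit_term = resid @ np.linalg.solve(K, resid)   # (x - mu)^T K^{-1} (x - mu)
    return -0.5 * (n * np.log(2 * np.pi) + logdet_K + fit_term)
```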

The key lies in the fit term: -(1/2)(x-μ)^T K⁻¹ (x-μ). This term measures the covariance-weighted squared distance between the observed dependent variable (x) and the model’s prediction (μ). If x represents different variables across models (e.g., raw data vs. log-transformed data), the term is computed on different scales, so the resulting log-likelihoods, and hence the AIC values, are incomparable. Essentially, you’re comparing apples and oranges.
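A quick simulation makes the problem visible. In the hypothetical sketch below, the same intercept-only Gaussian model is fitted once to raw data and once to its logarithm; the two AIC values differ wildly, but that gap reflects the change of variable, not model quality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.lognormal(mean=1.0, sigma=0.5, size=200)  # skewed, positive-valued data

def gaussian_aic(values):
    """AIC of an intercept-only Gaussian model (k = 2: mean and variance)."""
    mu, sigma = values.mean(), values.std(ddof=0)
    log_L = stats.norm.logpdf(values, mu, sigma).sum()
    return 2 * 2 - 2 * log_L

print("AIC on raw y: ", gaussian_aic(y))
print("AIC on log(y):", gaussian_aic(np.log(y)))
# The second AIC comes out far lower, but the two numbers live on different
# scales: the models describe different variables (y vs. log y), so the gap
# says nothing about which model is better.
```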

For least-squares models with Gaussian errors, AIC can also be expressed (up to an additive constant that is irrelevant when comparing models on the same data) as:

|D|log(RSS/|D|) + 2k

Where RSS is the residual sum of squares. This formulation further emphasizes the issue. RSS, the sum of squared differences between observed and predicted values, is inherently dependent on the scale of the dependent variable. Comparing RSS (and thus AIC) across models with different dependent variables is meaningless.
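As a sketch, this version of AIC can be computed from any least-squares fit; the helper name aic_from_rss and the simulated data below are purely illustrative:

```python
import numpy as np

def aic_from_rss(rss, n, k):
    """Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k."""
    return n * np.log(rss / n) + 2 * k

# A simulated straight-line fit
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=50)
slope, intercept = np.polyfit(x, y, deg=1)
rss = np.sum((y - (slope * x + intercept)) ** 2)
print(aic_from_rss(rss, n=len(y), k=3))  # k counts slope, intercept, error variance
```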

Akaike’s work is rooted in KL divergence, a measure of the difference between probability distributions. AIC estimates how far a model’s distribution is from the true data-generating distribution, so a lower AIC indicates a model whose distribution is closer to the truth. That comparison only holds, however, when the candidate models are fitted to the same data with the same dependent variable.
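To see what KL divergence measures, the sketch below uses the closed-form divergence between two univariate normal distributions (the function name is made up for this example); a model distribution closer to the “truth” scores lower:

```python
import numpy as np

def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for two univariate normal distributions."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

# Treat N(0, 1) as the "true" distribution; closer models have smaller KL.
print(kl_normal(0.0, 1.0, 0.1, 1.1))  # near the truth  -> ~0.01
print(kl_normal(0.0, 1.0, 2.0, 3.0))  # far from truth  -> ~0.88
```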

Key Considerations When Using AIC:

  • Same Dataset: AIC is only valid for comparing models trained on the same dataset.
  • Identical Dependent Variable: All candidate models must have the same dependent variable.
  • Sufficient Data: The number of data points (|D|) should significantly exceed the number of parameters (k) for reliable results.

Instead of using AIC to choose a dependent variable, examine the residuals of each model. Approximately normal residuals under a transformation (e.g., a log transform) are evidence that the transformation is appropriate. You might also check whether the raw data follow a lognormal distribution, which would further justify a log transform. Choosing the right dependent variable comes down to understanding the data and applying appropriate statistical diagnostics, not comparing AIC values across disparate models.
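One simple diagnostic along these lines is a normality test on the residuals. The sketch below (illustrative data; Shapiro–Wilk via SciPy) compares residuals before and after a log transform:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.lognormal(mean=2.0, sigma=0.4, size=200)  # skewed response variable

for label, values in [("raw y", y), ("log y", np.log(y))]:
    resid = values - values.mean()     # residuals of an intercept-only model
    _, p = stats.shapiro(resid)        # Shapiro-Wilk test of normality
    print(f"{label}: Shapiro-Wilk p = {p:.4f}")
# A much higher p-value for log y is evidence that the log transform brings
# the residuals closer to normality, supporting the transformed model.
```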
