Comparing mortality rates between countries seems straightforward, but it’s fraught with statistical challenges. While crude mortality rates are readily available, using them to directly compare the health of populations or the impact of events like pandemics can be misleading. This article delves into the complexities of comparing mortality rates, highlighting the issues with commonly used measures and proposing a more transparent alternative.
One approach to measuring excess deaths, often used to assess the impact of health crises, is the Z-score, exemplified by the EuroMOMO framework. EuroMOMO defines an excess death measure, Z(it), for country i in week t as:
Z(it) = (x(it) – μ(it)) / sigma(it)
Here, x(it) represents the weekly death count, μ(it) is the predicted baseline death count derived from historical data, and sigma(it) is the standard deviation of residuals. This model, while sophisticated, relies on several assumptions and estimations. It predicts ‘normal’ deaths based on past data, incorporating trends and seasonal variations, and uses a modified Poisson process to estimate standard deviation. However, each country within the EuroMOMO network develops its own model, potentially introducing inconsistencies across comparisons.
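As a minimal sketch, the Z-score formula above can be evaluated directly once a baseline and a standard deviation are available. The function name and the figures below are hypothetical; the sketch takes μ(it) and sigma(it) as given inputs and does not attempt to reproduce EuroMOMO's baseline model.

```python
def z_score(deaths, baseline, sigma):
    """Evaluate Z(it) = (x(it) - mu(it)) / sigma(it) for one country-week."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (deaths - baseline) / sigma


# Hypothetical week: 1300 observed deaths, predicted baseline 1000, sigma 50.
z = z_score(deaths=1300, baseline=1000, sigma=50)
print(z)  # 6.0
```

The result reads as "six standard deviations above the baseline", so its size depends on how sigma(it) was estimated as much as on the death count itself.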
A more transparent and non-parametric measure is the P-score:
P(it) = (x(it) – x̄(it)) / x̄(it)
In this formula, x̄(it) is simply the average weekly death count over the preceding 5 years. This P-score offers a direct comparison of current deaths to the recent historical average, making it easily interpretable. There’s also a parametric version, P^EM(it), which, using EuroMOMO’s predicted values, can be expressed as:
P^EM(it) = (x(it) – μ(it)) / μ(it)
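The two P-score variants can be sketched in a few lines. This is an illustration only: the function names and numbers are hypothetical, and it assumes the five historical values are the counts for the same calendar week in each of the preceding five years.

```python
from statistics import mean


def p_score(deaths, past_counts):
    """Non-parametric P(it): relative deviation from the historical average."""
    avg = mean(past_counts)
    return (deaths - avg) / avg


def p_score_em(deaths, baseline):
    """Parametric P^EM(it): relative deviation from a model-predicted baseline."""
    return (deaths - baseline) / baseline


# Hypothetical counts for the same week in the five preceding years (mean 1000).
past = [980, 1010, 1000, 990, 1020]
print(p_score(1300, past))     # 0.3, i.e. 30% above the 5-year average
print(p_score_em(1300, 1040))  # 0.25, against an assumed model baseline of 1040
```

The only difference between the two is the denominator and reference level: a plain average in one case, a model prediction in the other.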
However, the statistical underpinnings of the Z-score, particularly the Poisson distribution assumption, are questionable when describing weekly death counts. The assumptions of a constant mean and independence over time are also problematic. As highlighted in previous research, systematic mean shifts can occur even during seasons typically unaffected by flu or heatwaves. EuroMOMO’s sigma, intended to represent the standard deviation for ‘normal’ seasons, therefore incorporates variation from these systematic factors alongside random noise.
A more realistic model for death counts might be:
x(it) = β × W(it) + ε(it)
Here, W(it) represents systematic factors influencing death variations, and ε(it) is random white noise, potentially approximated by distributions like Poisson, binomial, or normal, assuming constant variance σ(i)². EuroMOMO’s sigma then becomes a blend of the true standard deviation, σ(i), and the variability of W(it).
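A small simulation can illustrate why a standard deviation estimated from the raw counts mixes the two sources of variation. The sketch below makes assumptions that are not in the text: W(it) and ε(it) are drawn independently from normal distributions, and all parameter values are hypothetical. Under independence, the variance of x(it) is β² times the variance of W(it) plus σ(i)².

```python
import math
import random
from statistics import pvariance

random.seed(0)

beta = 1.0          # hypothetical coefficient on the systematic factor
sd_w = 40.0         # assumed spread of the systematic factor W(it)
sigma_true = 30.0   # assumed true white-noise standard deviation sigma(i)
n_weeks = 20000

# Simulate x(it) = beta * W(it) + eps(it) with independent W and eps.
w = [1000 + random.gauss(0, sd_w) for _ in range(n_weeks)]
eps = [random.gauss(0, sigma_true) for _ in range(n_weeks)]
x = [beta * wi + ei for wi, ei in zip(w, eps)]

observed_sd = math.sqrt(pvariance(x))
expected_sd = math.sqrt(beta**2 * sd_w**2 + sigma_true**2)  # 50.0

print(f"true white-noise sd: {sigma_true}")
print(f"sd of simulated counts: {observed_sd:.1f} (theory: {expected_sd})")
```

The standard deviation of the simulated counts sits well above the white-noise value of 30, because it also absorbs the variation in W(it).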
Our proposed P-score, a simpler measure of excess death rate, follows by substituting this model into the definition of P(it), with x̄(it) = β × W̄(it) + ε̄(it):
P(it) = [β × (W(it) – W̄(it)) + (ε(it) – ε̄(it))] / (β × W̄(it) + ε̄(it))
where W̄(it) and ε̄(it) are 5-year moving averages. When a significant event like a pandemic occurs, W(it) deviates substantially from its historical average. The P-score effectively captures this jump in W(it) and is readily understandable even for those without specialized statistical knowledge. While the P-score’s properties, such as serial independence and constant variance, may vary depending on W(it), it offers a practical empirical measure. Econometric methods could further refine the understanding of the stochastic process driving death counts by estimating W(it) using deterministic variables and state-space terms.
When comparing regions within a country or, by extension, countries with different population sizes, the inherent noisiness of weekly death counts becomes crucial. Smaller populations, and consequently fewer normal deaths, lead to greater relative variability in weekly death counts compared to expected values. By the definitions above, the ratio Z(it)/P^EM(it) equals μ(it)/sigma(it), which tends to be lower in smaller regions. This issue extends to comparing populous countries to smaller ones, assuming similar overall normal mortality rates. While P^EM(it) and P(it) exhibit similar trends, especially during pandemics when the surge in W(it) dominates, averaging data across larger populations reduces the ratio σ(i)/μ(i). At a country versus region level, this ratio might vary inversely with the square root of normal deaths. Importantly, this relationship should hold regardless of the specific distribution of the white noise ε(it).
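The square-root relationship can be checked with a short calculation. The sketch below assumes, purely for illustration, that a population is made up of independent units with identical mean and standard deviation of weekly deaths, and that only white noise contributes to the standard deviation; the function names and figures are hypothetical.

```python
import math


def relative_noise(unit_mean, unit_sd, n_units):
    """sd/mean of total deaths summed over n independent, identical units."""
    total_mean = n_units * unit_mean
    total_sd = math.sqrt(n_units) * unit_sd  # variances add, so sd grows as sqrt(n)
    return total_sd / total_mean


def z_for_given_p(p, rel_noise):
    """Z implied by a relative excess p, using Z = P^EM * mu / sigma."""
    return p / rel_noise


# Hypothetical region (100 normal weekly deaths) and country (10,000).
region = relative_noise(unit_mean=10.0, unit_sd=3.0, n_units=10)
country = relative_noise(unit_mean=10.0, unit_sd=3.0, n_units=1000)

# Normal deaths differ by a factor of 100, relative noise by a factor of sqrt(100).
print(region / country)

# The same 30% excess maps to very different Z-scores.
print(z_for_given_p(0.3, region))   # about 3.2
print(z_for_given_p(0.3, country))  # about 31.6
```

Nothing in the calculation depends on the shape of the noise distribution, only on independence and a finite variance, which is consistent with the point that the relationship should hold regardless of the distribution of ε(it).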
However, EuroMOMO’s sigma, being a composite of σ(i) and W(it) variation, is less sensitive to population size or normal death levels than σ(i) alone. Systematic factors driving W(it) under normal conditions are likely similar across regions within a country on a per capita basis. During a pandemic, however, W(it) factors can diverge more significantly due to varying infection starting points and spread rates.
As previous research indicates, rankings of peak week excess death rates in European countries are similar for both Z-scores and P-scores, even for countries like Belgium and the Netherlands with smaller populations. However, for regions or nations with considerably smaller normal death counts, rankings based on Z-scores can diverge due to increased relative noisiness. Directly adjusting published Z-scores using population size to improve comparability is statistically unsound. P-scores offer a more robust and inherently comparable measure for assessing excess mortality across different populations, mitigating the issues associated with varying population sizes and the complexities of parametric models.