Medical tests help detect diseases and conditions, but they aren’t perfect. Sometimes, multiple tests exist for the same condition, using different methods. Comparing these tests head-to-head, using a paired study design, allows us to determine if one test outperforms the other, or if a cheaper, less invasive alternative exists. This article outlines methods for comparing two diagnostic tests using R.
Understanding Key Statistical Measures
Comparing diagnostic tests involves calculating several key statistics:
- Sensitivity: The proportion of positive tests among individuals with the disease. A high sensitivity indicates the test effectively identifies those with the condition.
- Specificity: The proportion of negative tests among individuals without the disease. High specificity means the test effectively rules out those without the condition.
- Positive Predictive Value (PPV): The probability of having the disease given a positive test result. PPV answers the question: “If the test is positive, how likely is it that I actually have the disease?”
- Negative Predictive Value (NPV): The probability of not having the disease given a negative test result. NPV answers: “If the test is negative, how likely is it that I truly don’t have the disease?”
Sensitivity and specificity are often used in early test development, while PPV and NPV are more relevant in clinical settings as they provide information about disease status given a test result.
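All four measures are simple ratios of cell counts in a 2×2 table of test result against gold standard. A minimal base-R sketch with made-up counts (the numbers are illustrative, not from any study):

```r
# Hypothetical 2x2 counts: 100 diseased and 100 non-diseased subjects
tp <- 80; fn <- 20  # diseased subjects:     test positive / test negative
fp <- 10; tn <- 90  # non-diseased subjects: test positive / test negative

sensitivity <- tp / (tp + fn)  # P(test+ | disease)    = 0.80
specificity <- tn / (tn + fp)  # P(test- | no disease) = 0.90
ppv         <- tp / (tp + fp)  # P(disease | test+)    = 80/90, about 0.89
npv         <- tn / (tn + fn)  # P(no disease | test-) = 90/110, about 0.82
```

Note that PPV and NPV depend on the disease prevalence in the sample, which is one reason they are compared within a single paired study rather than quoted as fixed properties of a test.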
Comparing PPV and NPV with Paired Data
Comparing PPV and NPV requires a paired study design in which both tests are performed on the same individuals. The `DTComPair` package in R offers the `pv.rpv()` function for this purpose. The function compares the relative PPV and NPV of the two tests, which is often more informative than comparing absolute differences.

The `pv.rpv()` function requires data formatted with the `tab.paired()` function, which takes three arguments:
- `d`: Gold standard results (1 = disease present, 0 = disease absent).
- `y1`: Results of diagnostic test 1 (1 = positive, 0 = negative).
- `y2`: Results of diagnostic test 2 (1 = positive, 0 = negative).
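Assuming `DTComPair` is installed, building the paired table looks roughly like this; the three vectors are made-up data for eight patients, far too few for a real analysis:

```r
library(DTComPair)

# Hypothetical paired results for eight patients
d  <- c(1, 1, 1, 1, 0, 0, 0, 0)  # gold standard: 1 = disease present
y1 <- c(1, 1, 1, 0, 0, 0, 1, 0)  # test 1: 1 = positive
y2 <- c(1, 1, 0, 0, 0, 0, 0, 0)  # test 2: 1 = positive

tab <- tab.paired(d = d, y1 = y1, y2 = y2)
print(tab)  # paired 2x2 tables, split by diseased and non-diseased subjects
```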
The output of `pv.rpv()` includes:
- PPV for each test.
- Relative PPV (rPPV), the ratio of the two tests' PPVs.
- A confidence interval for rPPV.
- A test statistic and p-value for the null hypothesis that rPPV equals 1 (no difference).
The corresponding NPV quantities (rNPV, its confidence interval, and a test) are reported in the same way.
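A hedged sketch of the call itself, assuming `DTComPair` is installed; a small paired table is rebuilt inline from made-up vectors so the snippet is self-contained:

```r
library(DTComPair)

# Made-up paired results for eight patients (illustration only)
d  <- c(1, 1, 1, 1, 0, 0, 0, 0)  # gold standard
y1 <- c(1, 1, 1, 0, 0, 0, 1, 0)  # test 1
y2 <- c(1, 1, 0, 0, 0, 0, 0, 0)  # test 2
tab <- tab.paired(d = d, y1 = y1, y2 = y2)

pv.rpv(tab)  # prints PPVs/NPVs, relative values, CIs, and p-values
```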
Comparing Sensitivity and Specificity with McNemar’s Test
The McNemar test, implemented in the `sesp.mcnemar()` function from `DTComPair`, compares the two tests' sensitivities and specificities, assessing whether the difference in each pair of proportions is statistically significant. Like `pv.rpv()`, it requires a paired data object created with `tab.paired()`. To obtain confidence intervals for the differences in sensitivity and specificity, use the `sesp.diff.ci()` function.
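A sketch of these two calls, again assuming `DTComPair` is installed and rebuilding a small paired table inline from made-up data:

```r
library(DTComPair)

d  <- c(1, 1, 1, 1, 0, 0, 0, 0)  # gold standard (made-up data)
y1 <- c(1, 1, 1, 0, 0, 0, 1, 0)  # test 1
y2 <- c(1, 1, 0, 0, 0, 0, 0, 0)  # test 2
tab <- tab.paired(d = d, y1 = y1, y2 = y2)

sesp.mcnemar(tab)   # McNemar tests for sensitivity and specificity
sesp.diff.ci(tab)   # confidence intervals for the differences
```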
Practical Example: Cystic Fibrosis
Moskowitz and Pepe (2006) compared two prognostic factors for severe respiratory infection in cystic fibrosis patients: a positive culture for Pseudomonas aeruginosa (PA) and a prior respiratory infection (PEx).
The analysis revealed that PEx exhibited significantly higher sensitivity, PPV, and NPV than PA. The `pv.rpv()` function showed a statistically significant relative difference in PPV favoring PEx, and McNemar's test confirmed a significant difference in specificity, also in favor of PEx.
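The `DTComPair` package also bundles an example dataset, `Paired1` (hypothetical paired-study data, not the cystic fibrosis data itself, as far as we are aware), which can be used to rehearse this full workflow; the column names `d`, `y1`, and `y2` are assumed from the package's documentation:

```r
library(DTComPair)

data(Paired1)  # example paired-study data shipped with DTComPair
tab <- tab.paired(d = d, y1 = y1, y2 = y2, data = Paired1)

pv.rpv(tab)        # relative predictive values with CIs and p-values
sesp.mcnemar(tab)  # McNemar tests for sensitivity and specificity
```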
Conclusion
Comparing diagnostic tests is crucial for identifying the most effective and efficient methods of disease detection. By using statistical techniques such as the relative predictive value comparison in `pv.rpv()` and McNemar's test in `sesp.mcnemar()`, clinicians and researchers can make informed decisions about which diagnostic test to use. The R package `DTComPair` provides ready-made tools for these analyses.
References
- Stock C, Hielscher T (2014). DTComPair: comparison of binary diagnostic tests in a paired study design. R package version 1.0.3.
- McNemar Q (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153-7.
- Moskowitz CS, Pepe MS (2006). Comparing the predictive values of diagnostic tests: sample size and analysis for paired study designs. Clin Trials, 3(3):272-9.
- R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- “Evaluating Risk Prediction with ROC Curves.” Columbia Public Health.