Receiver Operating Characteristic (ROC) curve analysis is a crucial statistical method for evaluating the performance of diagnostic tests and predictive models. This article delves into effective ways to analyze and compare ROC curves, particularly using the pROC package in R and S+. We will explore the accuracy of different comparison tests and work through a practical case study that highlights the functionality of pROC in clinical data analysis.
Initial evaluations of ROC comparison tests reveal consistent p-values under the null hypothesis across all unpaired tests (refer to Additional Files 1 and 2). Notably, a strong correlation exists between DeLong’s and bootstrap tests (see Additional Files 1 and 3). Further investigation also explores the relationship between Venkatraman’s test and other methods (Additional Files 1 and 4).
Clinical Case Study: Analyzing aSAH Data with ROC Curves
Let’s now illustrate a typical ROC analysis using pROC. In a recent clinical study by Turck et al. [31], researchers examined biomarker levels in patients admitted to the hospital following aneurysmal subarachnoid hemorrhage (aSAH) to predict the 6-month outcome. The study included 141 patients, classified by outcome on the Glasgow Outcome Scale (GOS). The performance of the biomarkers was then compared to the World Federation of Neurological Surgeons (WFNS) scale, a well-established neurological assessment obtained at admission.
This case study focuses on identifying patients at high risk of poor post-aSAH outcomes, as they require specialized medical attention. For this purpose, a highly specific clinical test is essential. For a comprehensive understanding of the study results, refer to Turck et al. [31]. Here, we will concentrate on the aspects directly relevant to ROC analysis and the comparison of different predictive measures.
ROC curves were generated using pROC for five biomarkers (H-FABP, S100β, Troponin I, NDKA, and UFD-1) and three clinical factors (WFNS, modified Fisher score, and age).
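Only part of the study data ships with pROC itself: the bundled aSAH dataset includes the outcome together with the wfns, s100b, ndka, and age variables. As an illustrative sketch (the predictors and roc_list names are ours, not from the study), the corresponding curves can be built in one pass:
# Build ROC curves for the predictors bundled with pROC's aSAH dataset
library(pROC)
data(aSAH)  # 141 patients: outcome plus wfns, s100b, ndka, age
predictors <- c("wfns", "s100b", "ndka", "age")
roc_list <- lapply(aSAH[predictors],
                   function(x) roc(aSAH$outcome, x, percent = TRUE))
sapply(roc_list, auc)  # empirical AUCs, in percent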
AUC and pAUC: Evaluating Predictive Power
Given the need for a clinical test with high specificity in this scenario, the analysis focused on the partial Area Under the Curve (pAUC) between 90% and 100% specificity. pAUC provides a more targeted measure of performance within a clinically relevant specificity range.
The WFNS scale demonstrated the highest pAUC at 3.1%, closely followed by S100β at 3.0% (Figure 1). In this specificity region, a perfect clinical test would achieve a pAUC of 10%, while a test with no discriminatory ability would yield 0.5%. For WFNS, the standardized pAUC computed with McClish’s formula (Equation 1) is 63.7%; the underlying raw pAUC of 3.1% decomposes into the 0.5% lying below the identity line and the 2.6% lying above it within the specified specificity range. In R, using pROC, the standardized pAUC for WFNS can be computed with the command:
roc(response = aSAH$outcome, predictor = aSAH$wfns, partial.auc = c(100, 90), partial.auc.correct = TRUE, percent = TRUE)
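To see where the 63.7% comes from, the standardization can be reproduced by hand. This is a minimal arithmetic sketch assuming McClish’s correction takes the usual form 1/2 × (1 + (pAUC - min)/(max - min)), with min the pAUC of a non-discriminant test and max that of a perfect test in the region; the variable names are ours:
# Reproduce the standardized pAUC of WFNS (Equation 1)
pauc <- 3.1       # empirical pAUC of WFNS, in percent
pauc_min <- 0.5   # pAUC of a non-discriminant test in the 90-100% SP region
pauc_max <- 10    # pAUC of a perfect test in the same region
100 * 0.5 * (1 + (pauc - pauc_min) / (pauc_max - pauc_min))  # ~63.7 (percent)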
Figure 1. ROC curves of WFNS and S100β illustrating their predictive performance. The ROC curves for WFNS (blue) and S100β (green) are displayed, with confidence intervals for WFNS at threshold 4.5 (black bars) and the confidence interval shape for S100β (light green area). The partial AUC region is highlighted in light grey. The pAUC values for both empirical curves and the p-value of their difference from a bootstrap test are shown in the plot.
For the remainder of this discussion, we will focus on non-standardized pAUC values for simplicity.
Confidence Intervals (CI) for pAUC: Assessing Variability
Given the pAUC value for WFNS, it is important to calculate a 95% confidence interval (CI) to understand the variability of this measure. Using 10,000 bootstrap replicates, a CI of 1.6-5.0% was obtained. This number of replicates provides a reliable estimate of the second significant digit; a lower number, such as the default 2,000, gives a reasonable estimate of the first significant digit only. Several types of confidence interval can be calculated: of the AUC or pAUC, and of sensitivity or specificity either at given thresholds or at predefined levels.
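In code, such an interval can be obtained with pROC’s ci.auc function. A sketch (the roc_wfns name is ours; boot.n sets the number of bootstrap replicates):
# 95% bootstrap CI of the pAUC of WFNS with 10,000 replicates
roc_wfns <- roc(aSAH$outcome, aSAH$wfns, partial.auc = c(100, 90), percent = TRUE)
ci.auc(roc_wfns, method = "bootstrap", boot.n = 10000)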
Using the coords function in pROC, the threshold with the point furthest from the diagonal line in the specified region was determined to be 4.5. A rectangular confidence interval at this threshold yields bounds of 89.0-98.9% for specificity and 26.0-54.0% for sensitivity (Figure 1). If the variability of sensitivity at a fixed 90% specificity is of greater interest, the sensitivity interval is calculated as 32.8-68.8%.
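These quantities can be computed with pROC’s coords, ci.thresholds, and ci.se functions; a sketch reusing the roc_wfns object defined above:
coords(roc_wfns, x = "best")               # threshold furthest from the diagonal (4.5 here)
ci.thresholds(roc_wfns, thresholds = 4.5)  # rectangular CI of SP and SE at that threshold
ci.se(roc_wfns, specificities = 90)        # CI of sensitivity at fixed 90% specificity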
As demonstrated in Figure 1 for S100β, a CI shape can be generated by calculating CIs for sensitivities across multiple specificity levels. These CI bounds are then connected to form the shape. The R code below demonstrates how to calculate the confidence shape:
plot(x = roc(response = aSAH$outcome, predictor = aSAH$s100b, percent = TRUE, ci = TRUE, of = "se", sp = seq(0, 100, 5)), ci.type = "shape")
It is crucial to understand that confidence intervals at a given threshold and at a predefined sensitivity or specificity level answer different analytical questions. For example, reporting only the sensitivity CI bound at a specific threshold (such as 4.5) without the corresponding specificity CI bound would be misleading. Similarly, quoting the sensitivity CI computed at a threshold as if it were the CI of sensitivity at a fixed specificity level would be statistically inaccurate.
Statistical Comparison of ROC Curves
While S100β showed the second-best pAUC at 3.0%, its difference from WFNS (3.1%) is minimal. A bootstrap test in pROC indicates that this difference is not statistically significant (p = 0.8, Figure 1). However, Venkatraman’s test, which evaluates the entire ROC curve shape, suggests a difference between the curves (p = 0.004). Indeed, focusing on the high-sensitivity region (90-100% sensitivity) reveals a significant difference in pAUCs (p = 0.005; pAUC = 4.3% for WFNS and 1.4% for S100β). Nonetheless, given our primary interest in the high-specificity region, the comparison indicates no significant difference in predictive performance between WFNS and S100β within that clinically relevant range.
pROC provides tools for pairwise comparison of ROC curves. It is important to note that multiple testing corrections are not automatically implemented. When conducting multiple comparisons, users should be mindful of the increased risk of Type I errors and apply appropriate correction methods if necessary, as recommended in statistical literature [32].
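One simple way to apply such a correction is base R’s p.adjust function; the p-values below are placeholders, not results from this study:
# Holm correction for a family of pairwise comparison p-values
pvals <- c(0.8, 0.004, 0.005)     # hypothetical p-values from three roc.test calls
p.adjust(pvals, method = "holm")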
The bootstrap test for comparing two ROC curves can be performed in R using the following code:
roc.test(response = aSAH$outcome, predictor1 = aSAH$wfns, predictor2 = aSAH$s100b, partial.auc = c(100, 90), percent = TRUE)
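The other comparisons discussed above can be sketched along the same lines; method = "venkatraman" and partial.auc.focus are documented arguments of roc.test and auc, and the object names are ours:
# Venkatraman's test on the full curves
roc_wfns_full <- roc(aSAH$outcome, aSAH$wfns, percent = TRUE)
roc_s100b_full <- roc(aSAH$outcome, aSAH$s100b, percent = TRUE)
roc.test(roc_wfns_full, roc_s100b_full, method = "venkatraman")
# Bootstrap test of the pAUC in the 90-100% sensitivity region
roc.test(response = aSAH$outcome, predictor1 = aSAH$wfns, predictor2 = aSAH$s100b, partial.auc = c(100, 90), partial.auc.focus = "sensitivity", percent = TRUE)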
Smoothing ROC Curves: Addressing Data Limitations
The decision to smooth an ROC curve is complex. Smoothing can be beneficial for ROC curves generated from datasets with limited data points. In such cases, the trapezoidal rule tends to underestimate the true AUC [17]. This is often observed with clinical scores like WFNS, as shown in Figure 2. pROC offers three smoothing methods: normal distribution fitting, density smoothing, and binormal smoothing.
In our aSAH case study:
- Normal distribution fitting (red curve in Figure 2) results in a significantly lower AUC estimate (Δ = -5.1, p = 0.0006, bootstrap test). This difference is attributed to the non-normality of the WFNS distribution. While distribution fitting can be powerful when the underlying distributions are known, it should be used cautiously in other contexts.
- Density smoothing (green curve in Figure 2) also yields a lower AUC (Δ = -1.5, p = 6 × 10⁻⁷). Interestingly, despite the smaller AUC difference, the p-value is more significant here due to the increased covariance between the estimates.
- Binormal smoothing (blue curve in Figure 2) provides a slightly higher, but not significantly different, AUC compared to the empirical ROC curve (Δ = +2.4, p = 0.3). In this instance, binormal smoothing appears to be the most appropriate of the three methods, potentially correcting for the underestimation of the empirical AUC for WFNS. For comparison, Additional File 5 presents a comparison of our binormal smoothing implementation with that in pcvsuite [15].
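The smoothed curves above can be produced with pROC’s smooth function; a sketch, with method names as in the pROC documentation ("fitdistr" performs the normal distribution fitting) and object names of our choosing:
# Smooth the empirical WFNS curve with the three methods discussed
roc_wfns_full <- roc(aSAH$outcome, aSAH$wfns, percent = TRUE)
smooth_binormal <- smooth(roc_wfns_full, method = "binormal")
smooth_density <- smooth(roc_wfns_full, method = "density")
smooth_fitdistr <- smooth(roc_wfns_full, method = "fitdistr")  # normal fit
# Bootstrap comparison of a smoothed curve against the empirical one
roc.test(roc_wfns_full, smooth_binormal, method = "bootstrap")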
Figure 2. ROC curve of WFNS with different smoothing techniques applied. The empirical ROC curve for WFNS is shown in grey, along with three smoothing methods: binormal (blue), density (green), and normal distribution fit (red).
Figure 3 illustrates how to generate plots with multiple smoothed curves using pROC in S+. Within S+, users can load the pROC library, select the ROC curve item from the Statistics menu, choose the dataset, and then configure smoothing parameters in the Smoothing tab.
Figure 3. Screenshot demonstrating the pROC graphical user interface in S+ for smoothing the WFNS ROC curve. Top left: the General tab for data input. Top right: smoothing parameter settings. Bottom left: plot customization options. The “Add to existing plot” checkbox allows overlaying multiple curves. Bottom right: the resulting plot in the standard S+ plot device.
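At the R command line, an overlay similar to the one built through the S+ GUI can be sketched as follows, reusing the objects from the smoothing example above (colors chosen to match Figure 2):
# Empirical WFNS curve in grey, with the three smoothed fits overlaid
plot(roc_wfns_full, col = "grey")
plot(smooth_binormal, add = TRUE, col = "blue")
plot(smooth_density, add = TRUE, col = "green")
plot(smooth_fitdistr, add = TRUE, col = "red")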
Conclusion: Insights from ROC Analysis with pROC
This case study demonstrates the practical application of pROC for comprehensive ROC analysis. The primary finding from this analysis is that, among the biomarkers tested, none outperformed the neurological WFNS score in predicting patient outcome following aSAH. pROC provides a robust and versatile toolkit for researchers and clinicians to effectively analyze and compare the performance of diagnostic and predictive markers using ROC curve methodologies.
Installation and Usage of pROC
R Environment
To install pROC in R, execute the following command in the R console:
install.packages("pROC")
To load the package, use:
library(pROC)
For help and documentation, use:
?pROC
S+ Environment
In S+, pROC can be installed via the File menu under Find Packages… and loaded from the same menu by selecting Load Library….
In addition to command-line functionality, a graphical user interface (GUI) becomes available within the Statistics menu. This GUI includes windows for univariate ROC curves (with options for smoothing, pAUC, CIs, and plotting) and separate windows for paired and unpaired comparisons of two ROC curves. A dedicated help file for the GUI is also accessible from the same menu.
Summary of Functions and Methods in pROC
Table 2 provides a summary of the functions available to users in the command-line version of pROC. Table 3 lists the methods provided for plotting and printing ROC analysis results.
Table 2. Functions provided in pROC for ROC curve analysis and comparison
Table 3. Methods provided by pROC for standard functions in ROC analysis