Tuesday, May 14, 2019
In the education domain, it has become increasingly popular for researchers to use Early Warning Systems (EWS) and Early Warning Indicators (EWI) to predict student outcomes such as high school dropout, college enrollment, STEM degrees, and STEM careers. To test how well EWS and EWI make these predictions, educational researchers tend to rely on statistical significance tests for each predictor. However, statistical significance says nothing about a predictor's accuracy, that is, the dual criteria of sensitivity and specificity. To address this gap, Alex J. Bowers and Xiaoliang Zhou (2019) demonstrated the usefulness of the ROC AUC for evaluating the accuracy of EWS and EWI indicators of education outcomes.
The authors used data from the Education Longitudinal Study of 2002 (ELS:2002), a survey of US high school students, their parents, and principals conducted by the U.S. National Center for Education Statistics (NCES) from 2002 to 2012. With sampling weights, findings from this dataset are generalizable to the nationwide population of US high school students. The data include over 6,500 variables, such as high school dropout status, college enrollment status, college major, and career at age 26. The authors focused their study on how accurately variables such as standardized math and English scores, number of STEM courses taken in college, and absenteeism predict these education outcomes.
The authors promoted the use of ROC AUC to evaluate the performance of predictors of education outcomes for four reasons. First, the AUCs of two predictors can be compared through a significance test. Second, AUC is easy to interpret, since it is equivalent to the Mann–Whitney U statistic (also known as the Mann–Whitney–Wilcoxon statistic). Third, AUC has advantages over similar diagnostic statistics such as kappa: for example, it is robust to skewed sample distributions (Jeni, Cohn, & De La Torre, 2013), and a higher value always indicates better accuracy (Baker, 2015). Fourth, AUC calculations and comparisons are easy to implement with R packages such as pROC (Robin et al., 2011).
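The Mann–Whitney equivalence means the AUC can be read directly as the probability that a randomly chosen positive case outscores a randomly chosen negative case (ties counting one half). A minimal Python sketch of that interpretation, using invented scores rather than the ELS:2002 data:

```python
# AUC equals the normalized Mann-Whitney U statistic: the probability
# that a randomly chosen positive case scores higher than a randomly
# chosen negative case, with ties counted as 0.5.
def auc_mann_whitney(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical math scores: graduates (positives) vs. dropouts (negatives)
grads = [62, 71, 55, 80, 68]
dropouts = [50, 58, 61, 47]
print(auc_mann_whitney(grads, dropouts))  # → 0.9
```

An AUC of 0.9 here means that 90% of graduate–dropout pairs are ranked correctly by the score.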
Most importantly, the AUC, which takes into account both sensitivity and specificity, can be shown vividly on a single plot, as in the figure below. The x-axis represents the false-positive proportion (1 − specificity), and the y-axis the true-positive proportion (sensitivity). For any predictor, a shift in specificity is associated with a shift in sensitivity, a trade-off traced by the curve in the plot. The area under the curve (the ROC AUC), which ranges from 0.0 to 1.0, indicates the accuracy of a predictor; the diagonal gray line has an AUC of 0.5 and corresponds to random guessing. The closer a curve comes to the point (0, 1), the more accurate the predictor. As a rule of thumb, an AUC above 0.85 indicates high classification accuracy, one between 0.75 and 0.85 moderate accuracy, and one below 0.75 low accuracy (D'Agostino, Rodgers, & Mauck, 2018).
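The curve itself comes from sweeping a decision threshold over the predictor's scores and recording the (false-positive rate, true-positive rate) pair at each cut point. A stdlib-only Python sketch of that construction, again with invented scores (integrating the traced points with the trapezoidal rule recovers the AUC):

```python
# Trace a ROC curve by sweeping a threshold over the scores, then
# integrate the traced points with the trapezoidal rule to get the AUC.
def roc_curve(scores, labels):
    # labels: 1 = positive outcome, 0 = negative; both classes must be present
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

def auc_trapezoid(points):
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores for 5 positives and 4 negatives
scores = [62, 71, 55, 80, 68, 50, 58, 61, 47]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
print(auc_trapezoid(roc_curve(scores, labels)))  # ≈ 0.9
```

Note that this trapezoidal area matches the Mann–Whitney pairwise probability: they are two views of the same statistic.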
The figure below shows how to compare the ROC AUCs of three predictors each for college enrollment and for a postsecondary STEM degree. Among the three predictors of college enrollment, GPA has the largest area under the curve, at 0.767, compared with the AUCs of extracurricular activities (2004) and AP. Among the three predictors of a postsecondary STEM degree, the number of STEM courses has the largest area under the curve, at 0.957, compared with the AUCs of the math t score (2002) and STEM GPA. The relative performance of each predictor is easy to see in the plot by comparing the areas under the curves.
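The selection logic behind such a comparison can be sketched in a few lines of Python: compute each predictor's AUC and keep the largest. The predictor names and scores below are invented for illustration; the AUCs of 0.767 and 0.957 reported above come from the authors' ELS:2002 analysis, not from this toy data.

```python
# Compare hypothetical predictors by AUC (probability that a positive
# case outranks a negative one) and select the most accurate.
def auc(pos, neg):
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented scores: enrollees (positives) vs. non-enrollees (negatives)
predictors = {
    "GPA":              ([3.6, 3.2, 2.9, 3.8], [2.4, 2.8, 3.0]),
    "extracurriculars": ([2, 3, 1, 4], [1, 2, 3]),
}
aucs = {name: auc(p, n) for name, (p, n) in predictors.items()}
best = max(aucs, key=aucs.get)
print(best, aucs[best])  # GPA has the larger AUC in this toy example
```

In practice the authors compare AUCs with a significance test (e.g. via pROC in R) rather than by magnitude alone, since sampling error can make two close AUCs statistically indistinguishable.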
The authors’ findings demonstrate the utility of the ROC AUC for examining and comparing the accuracy of EWS and EWI in predicting education outcomes. First, education researchers can follow the authors’ online R code and use the pROC package to diagnose the accuracy of predictors of the outcomes they study. Second, education policy makers can evaluate the predictors currently in use to see whether they are accurate or need to be supplemented with more accurate ones. Third, ROC AUC allows policy makers to compare candidate predictors for the outcome they care about and select the most accurate one, so that they can identify the most students at risk while minimizing the rate of false positives.
Bowers, A. J., & Zhou, X. (2019). Receiver Operating Characteristic (ROC) Area Under the Curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. Journal of Education for Students Placed at Risk, 24(1), 20–46.