Thinking Outside the Bubble
Published in Inside - Volume XVI, No. 8
Ian Blood started out teaching English in Africa. He’s ended up exploring the power of computerized test scoring to diagnose learning issues.
The world of computerized test scoring can seem cold to people who are passionate about teaching – a mechanistic science that misses students’ individuality. For others, though, the real promise of such scoring is that, done right, it can provide more insight into how people learn and where their understanding breaks down.
Ian Blood, who received his master’s degree in applied linguistics in May, has followed a path into the world of testing that, on the face of it, might seem surprising.
Blood, a 29-year-old native of Aurora, Illinois, taught English in southern Chad for 10 months between high school and college. “I was inspired by how seriously people there took their education,” he recalls. “I realized how important it is to get it right.”
After studying French and African languages as an undergraduate and working as an English conversation teacher in Osaka, Japan, Blood came to TC to learn about how people acquire second languages – a field that, of necessity, continually seeks to assess how much students have learned. His interest in assessment eventually led him to explore the computerized grading of essays.
Multiple-choice questions are tailor-made for computer scoring: the computer simply counts the number of bubbles filled in correctly. “But I was curious about how it could possibly work that a computer could grade an essay, and that that score would really be a valid score,” Blood says.
In fact, computers can be programmed to automatically count a wide range of linguistic features in essays, such as the number of words and sentences, average word length, the presence or absence of predetermined vocabulary words, or the frequency of various grammatical structures. Automated scoring techniques plug these counts into an equation that produces a score. The computer’s score is then compared with scores assigned by humans to the same essay; the closer the match, the more accurate the computer scoring is assumed to be.
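The approach can be sketched in a few lines of code. The sketch below is purely illustrative — the feature weights, the vocabulary list, and the scoring equation are hypothetical stand-ins, not the actual model used by ETS or any real scoring engine. It counts a handful of surface features, combines them in a weighted sum, and then measures agreement with human raters using a Pearson correlation, the kind of comparison described above.

```python
import re
from statistics import mean

# Hypothetical target vocabulary -- a real system's word lists are proprietary.
TARGET_VOCAB = {"analyze", "evidence", "therefore"}

def extract_features(essay):
    """Count simple surface features of an essay."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "num_words": len(words),
        "num_sentences": len(sentences),
        "avg_word_len": mean(len(w) for w in words) if words else 0.0,
        "vocab_hits": sum(1 for w in words if w.lower() in TARGET_VOCAB),
    }

def score(essay, weights, intercept=0.0):
    """Plug the feature counts into a weighted linear equation.
    The weights here would normally be fit against human-scored essays."""
    feats = extract_features(essay)
    return intercept + sum(weights[name] * value for name, value in feats.items())

def pearson(machine_scores, human_scores):
    """Correlation between machine and human scores on the same essays:
    the closer to 1.0, the better the machine tracks the human raters."""
    mx, my = mean(machine_scores), mean(human_scores)
    cov = sum((x - mx) * (y - my) for x, y in zip(machine_scores, human_scores))
    sx = sum((x - mx) ** 2 for x in machine_scores) ** 0.5
    sy = sum((y - my) ** 2 for y in human_scores) ** 0.5
    return cov / (sx * sy)
```

A real operational system uses far richer features (syntactic parses, discourse structure, spelling) and statistically fitted weights, but the pipeline — extract features, apply an equation, validate against human raters — is the same.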
While reviewing research for his master’s degree final project at TC, Blood learned that computerized scoring works relatively well for the Test of English as a Foreign Language (TOEFL), a standardized test developed by the Educational Testing Service (ETS) and used worldwide to measure English-language proficiency. When scoring essays for style, the current technology can exceed 90 percent correlation with human scorers.
But the same methodology works less well for other types of tests. “These are powerful tools for certain applications, but not all,” Blood says. “It all depends on what kind of information we are trying to get out of our assessments.”
Computers can be more accurate, consistent and impartial than human scorers, who might be influenced by a relationship with the test-taker or by external factors such as fatigue. They can also be programmed to offer limited diagnostic feedback on writing style, “but they can’t really give meaningful feedback on essay content yet,” Blood says. “Teachers and administrators need to think carefully about why they are testing, and whether or not automatic scoring is appropriate for that particular test use.” For now, ETS offers practice essay tests that are scored by computer, and it supplements human raters with computerized scoring on some essay tests such as the TOEFL.
Blood is a full-time research assistant at ETS, where he is currently part of a team that is designing formative reading assessments that provide “rich information” for teachers about each learner. “Our test is designed to give rise to ‘teachable moments’ – anything that presents a learning gap that can be addressed,” he says.
Blood is also working on a second TC master of education degree and hopes eventually to earn a doctorate at the College in language assessment validity. “Tests are so valued by our society,” he says. “They have an extremely important impact on the lives of students. I just came to realize that these tests are not going to go away. So we need to make them as fair, valid and useful as possible.”