As quantitative researchers get better at explaining the real world, TC is hiring some of the best young minds in the game
Long-time education scholar Henry Levin avoids absolutes, but on one point he is unequivocal.
“An educational researcher needs a qualitative understanding of the practices being studied, but increasingly policy-makers want quantitative evidence on the impacts of specific policies and practices,” says Levin, William Heard Kilpatrick Professor of Economics & Education. Several new variables factor into that equation. “An explosion of information about education is revealing greater complexity in the world than we could see before,” says Aaron Pallas, Arthur I. Gates Professor of Sociology & Education. “It’s been spurred by development of the internet and high-speed computers and the trend towards evidence-based policy and practice in fields such as medicine and human services.”
JUDITH SCOTT-CLAYTON Associate Professor of Economics & Education
Quantitative techniques have been faulted both for failing to account for real-life nuances and for enabling researchers to generate whatever analysis illustrates a preferred story. Yet researchers grounded in theories of education and human development also are bringing a new rigor to more complex, humanistic questions. TC has hired many of them in recent years. Spanning domestic and international education policy, economics and the science of data analysis itself, they seek to understand individual people and institutions as well as mass trends. “You can’t do quantitative work in a vacuum,” says Professor Madhabi Chatterji, Director of TC’s Assessment and Evaluation Research Initiative. “Without knowing a school’s culture, you’ll ask the wrong questions and draw conclusions that don’t help.” TC’s newest faculty hires get that, Pallas says. “They’re developing and using new tools to harness complexity. They’re driven by questions, not methods — and that’s what TC has always tried to do.” Here, we bring you some of TC’s most powerful quantitative work.
THE REAL SCORES ON REMEDIAL ED
Problem: Fewer than half of U.S. college students earn a degree six years after enrolling. Colleges spend upwards of $7 billion annually on remedial courses to upgrade basic skills, yet students who take them often fail to earn a two-year degree or transfer to a four-year institution.
Why isn’t remedial education working?
Two years ago, Judith Scott-Clayton, Associate Professor of Economics & Education, found that tens of thousands of students had been placed in remedial courses unnecessarily at more than 50 community colleges that based assignments on a brief standardized test. She subsequently demonstrated that remedial assignments based on high school grade point averages would be more accurate, and that a quarter of students assigned to remediation could have passed college-level courses with a B or higher.
Scott-Clayton employed a technique called regression discontinuity design, which isolates the causal effect of an intervention. The method focuses on those closest to the cut point that determines who receives treatment — in this case, students just passing or just failing remedial screening. Because those two groups are academically nearly identical, any difference in their subsequent outcomes can be attributed to remedial assignment itself. “A community college degree pays off in the labor market,” Scott-Clayton says, “so we need to target the right students for remediation.”
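The logic of regression discontinuity can be sketched in a few lines of Python. Everything here is hypothetical — the cutoff, effect size and noise levels are invented for illustration — but the mechanics mirror the design described above: fit a line on each side of the cutoff and read off the jump.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: students scoring below a cutoff on a placement
# test are assigned to remediation. The cutoff, effect size (+5 points on a
# later outcome) and noise are all invented.
n, cutoff, true_effect = 5000, 50.0, 5.0
score = rng.uniform(0, 100, n)                 # running variable (placement test)
treated = (score < cutoff).astype(float)       # below cutoff -> remedial course
outcome = 20 + 0.3 * score + true_effect * treated + rng.normal(0, 3, n)

# Local linear regression inside a narrow bandwidth around the cutoff:
# fit outcome ~ 1 + treated + centered score + interaction, and read the
# coefficient on `treated` as the jump (the causal effect) at the cutoff.
bw = 10.0
mask = np.abs(score - cutoff) < bw
c = score[mask] - cutoff
X = np.column_stack([np.ones(mask.sum()), treated[mask], c, c * treated[mask]])
beta, *_ = np.linalg.lstsq(X, outcome[mask], rcond=None)
print(f"estimated effect at the cutoff: {beta[1]:.2f}")
```

Because assignment flips discontinuously at the cutoff while everything else varies smoothly across it, the estimated jump can be read as the causal effect of placement for students near the threshold.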
GETTING PARENTS TO TUNE IN
“AN EXPLOSION OF INFORMATION ABOUT EDUCATION IS REVEALING GREATER COMPLEXITY IN THE WORLD...SPURRED BY THE INTERNET AND HIGH-SPEED COMPUTERS AND THE TREND TOWARDS EVIDENCE-BASED POLICY AND PRACTICE.”
When parents are watching them, students attend class, do their homework and get better grades. But getting parents to tune in — especially lower-income parents — can be tough. Many lack time to meet with teachers or the skills to check grades online. Those from other countries may not understand the U.S. grading system.
Two years ago, at a low-income, predominantly Latino high school in Los Angeles, TC Assistant Professor of Economics & Education Peter Bergman randomly signed up half the parents to receive bimonthly text messages about their children’s grades and missed assignments. Attendance improved, and GPAs went up. Parents initiated more contact with the school, developed sharper perceptions of how hard their kids were working, and became more likely to use punishment and revoke privileges as motivation.
“It was so effective that kids would ask each other, ‘Have you been Petered yet?’” Bergman recalls. “The school warned me not to park on their street.”
Now, funded by the Smith-Richardson Foundation, Bergman is testing the use of electronic grade books that automatically text parents whenever a teacher records a failing grade or absence. The study is “quantitative” in that a computer randomized the parents into the “treatment” and “control” groups, and because Bergman now has a database with which to model future policy approaches through simulation rather than direct observation. Meanwhile, he has addressed what economists call “information friction,” in which only the “seller” — in this case, the student — knows the quality of what he or she is providing. Each text costs just a tenth of a cent — minuscule compared with, say, paying teachers to make calls. “Parents care about their kids’ education,” Bergman says. “So we’re bringing school to them.”
Statistically Speaking: Peter Bergman
When Bergman texted parents at a Los Angeles high school that their kids were skipping class or missing assignments, attendance improved, GPAs went up and parents met more frequently with teachers. “It was so effective that kids would ask each other, ‘Have you been Petered yet?’” he says. “The school warned me not to park on their street.”
DOES TEACH FOR AMERICA REALLY GET RESULTS?
Advocates of Teach for America (TFA), which sends graduates of elite colleges to teach in schools in low-income communities, say it pairs the best and brightest with the neediest. Critics say TFA teachers are untrained, deprive qualified teachers of jobs and quit before learning their craft.
In August 2014, Douglas Ready, Associate Professor of Education & Public Policy, published an eight-year study of the math and reading performance of 500,000 children in Duval County, Florida, which has employed 500 TFA teachers since 2005. He found that students of TFA teachers, who work in the lowest-performing schools, typically scored below others on state assessments. But when he measured student progress, a different picture emerged. By comparing students’ outcomes in years they had a TFA teacher to their own outcomes in years they did not, Ready discounted the potential impact of school quality, student socioeconomic status or teacher inexperience, isolating the impact of whether the teacher was TFA or non-TFA.
The result: a small “adjusted” advantage for TFA teachers in math and literacy.
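Ready’s within-student design amounts to what statisticians call a student fixed-effects model: comparing each student only with himself or herself absorbs fixed traits such as ability and school quality. A minimal, purely illustrative sketch (sample sizes and the effect size are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration: each student is observed for four years, some
# years with a TFA teacher. Fixed student traits (ability, school, family
# background) are absorbed by demeaning within student. All numbers invented.
n_students, n_years, tfa_effect = 2000, 4, 0.1
ability = rng.normal(0, 1, n_students)              # unobserved fixed trait
tfa = rng.integers(0, 2, (n_students, n_years))     # 1 = TFA teacher that year
scores = ability[:, None] + tfa_effect * tfa + rng.normal(0, 0.5, (n_students, n_years))

# Subtract each student's own mean from both outcome and treatment, then
# regress: only within-student variation identifies the effect.
y = scores - scores.mean(axis=1, keepdims=True)
x = tfa - tfa.mean(axis=1, keepdims=True)
effect = (x * y).sum() / (x ** 2).sum()
print(f"estimated within-student TFA effect: {effect:.3f}")
```

Even though higher-ability students here have very different score levels, demeaning removes that variation, so the regression recovers the year-to-year effect of having a TFA teacher.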
“INTERNATIONAL ASSESSMENTS WERE BASED ON A LOOSE NETWORK OF SCHOLARS GUIDED BY SPECIFIC RESEARCH QUESTIONS AND HYPOTHESES. [NOW] OFFICIAL REPORTS CONTAIN MORE RANKING TABLES AND LESS ABOUT RESEARCH.”
The Policy Analysts
LOOKING UNDER THE HOOD OF INTERNATIONAL ASSESSMENTS
Matthew Johnson, Young-Sun Lee, Oren Pizmony-Levy
A slip in the global education rankings can trigger nationwide hand-wringing. Yet there is often more nuance to the story. For example, while U.S. eighth graders trailed 10 other countries in the 2007 Trends in International Mathematics and Science Study (TIMSS), analysis by TC’s Matthew Johnson, Associate Professor of Statistics & Education, and Young-Sun Lee, Associate Professor of Psychology & Education, revealed that Americans outperformed several countries on specific skills such as data analysis, probability, location and movement.
“These distinctions have major implications for how we approach math education,” Johnson says.
For Oren Pizmony-Levy, Assistant Professor of International & Comparative Education, such findings highlight another concern: how testing and international ranking of nations became a legitimized global practice. The question has intrigued Pizmony-Levy since graduate school, when he attended a meeting of the International Association for the Evaluation of Educational Achievement (IEA).
“The Association was formed by U.S., European and Israeli scholars to test hypotheses of how social contexts affect education,” he says. “Nations weren’t ranked. Now, it’s all about providing high-quality data, benchmarks and indicators to governments.”
Pizmony-Levy created a quantitative data set mapping countries’ participation in all large-scale international assessments from 1958 to 2012. He interviewed key IEA members and sifted through unopened boxes of the organization’s records. “Until the early ’90s, international assessments were based on a loose network of scholars guided by specific research questions and hypotheses,” he says. “Since then, the work has been framed in terms of global governance and auditing of education systems. Official reports contain more ranking tables and less about research. Rankings can shake public confidence, creating an ‘education crisis’ that may not exist. Schools might narrow their curriculum to focus on test prep.” Now, he’s developing courses on the social analysis of international assessments to show students what we can and cannot learn from these tests, which affect public discourse, policy and practice. “That’s where the scholarship gets really interesting,” he says.
Statistically Speaking: Douglas Ready
Ready conducted an eight-year study of students in Duval County, Florida. By comparing students’ outcomes in years they had a Teach For America teacher to their own outcomes in years they didn’t, he discounted the potential impact of school quality, student socioeconomic status or teacher inexperience, isolating the impact of whether the teacher was TFA or non-TFA.
DESIGNING TRIALS FOR THE REAL WORLD
Elizabeth Tipton and Bryan Keller
Since 2002, the federal Institute of Education Sciences has sought to establish randomized controlled trials as the gold standard for determining what works and why.
In these large-scale experiments, researchers recruit schools and districts for studies to evaluate curricular or after-school programs, whole-school reforms and teacher professional development strategies. Half of the schools are randomly assigned to receive a program and half to continue with business as usual. Outcomes — say, student test scores — are then compared, and the differences, if any, provide evidence that the program works.
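At its core, such a trial is just randomization plus a difference in means. A hypothetical sketch, with invented school counts, scores and program effect:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical illustration of the design described above: randomly assign
# half of 100 schools to a program and compare mean test scores. The scores
# and the true program effect (+4 points) are invented.
n_schools, true_effect = 100, 4.0
baseline = rng.normal(200, 10, n_schools)                  # school mean scores
assignment = rng.permutation(n_schools) < n_schools // 2   # exactly half treated
scores = baseline + true_effect * assignment

diff = scores[assignment].mean() - scores[~assignment].mean()
print(f"estimated program effect: {diff:.1f} points")
```

Because a computer, not a person, decides which schools get the program, pre-existing differences between the two groups average out, and the gap in outcomes can be credited to the program itself.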
At TC, Assistant Professor of Applied Statistics Elizabeth Tipton helps researchers make better, more thoughtful generalizations from their experiments. As part of this work, Tipton collaborates with study designers to ensure recruitment of the most broadly representative populations.
“When you evaluate a reading program in 40 schools, you really want to know whether the program works in West Virginia, or Texas or nationwide,” Tipton says. “You want to apply the results to policymaking on the largest scale. So I help think through how a study will be used. Then I work with recruiters, who often aren’t keyed in to the need for generalizability.”
Funded by the Spencer Foundation, Tipton is developing new web-based software (www.thegeneralizer.org) to facilitate this process. “There’s nothing like that right now,” she says. “It could improve the relevance of education research.”
Often, truly randomized trials are impossible or unethical. For example, you wouldn’t withhold a proven math program or make children repeat a grade just to observe the effect on future educational success or earnings. But in real life, kids get held back and schools lack funds for proven programs. When researchers observe such unscripted experiences, they often compare children who differ in income, race, cultural practices, geographic location, and school quality and culture.
In such situations, Assistant Professor of Applied Statistics Bryan Keller devises statistical methods — often after data has been gathered — to mimic random assignment to treatment. Keller specializes in separating out the effect caused by an intervention from the impact of differences in race, income or culture — variables that, in real life, may partly dictate why someone receives the intervention. For example, children of color are likelier to be retained in grade, due in part to societal preconceptions.
Keller uses a technique called “propensity score analysis” to identify subjects in both study arms — treatment and control — with the most similar probability, based on all factors, of receiving the treatment. The process yields two groups matched in terms of key variables, isolating the newly introduced variable — the treatment — as the cause of difference in outcome.
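A simplified, hypothetical sketch of propensity score matching follows, with one confounding covariate and invented numbers. The confounder inflates a naive comparison; matching treated subjects to controls with similar estimated treatment probabilities recovers something close to the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical illustration: a background covariate (think family income)
# raises both the chance of receiving an intervention and the outcome, so a
# naive comparison is confounded. The true effect here is +2.0; all numbers
# are invented.
n, true_effect = 4000, 2.0
income = rng.normal(0, 1, n)
p_treat = 1 / (1 + np.exp(-income))          # treatment likelier at high income
treated = rng.random(n) < p_treat
outcome = 3 * income + true_effect * treated + rng.normal(0, 1, n)

naive = outcome[treated].mean() - outcome[~treated].mean()

# Estimate each subject's propensity score with a logistic regression
# (fit by Newton's method), then match every treated subject to the
# control with the closest score.
X = np.column_stack([np.ones(n), income])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (treated - p)
    hess = -(X * (p * (1 - p))[:, None]).T @ X
    beta -= np.linalg.solve(hess, grad)
ps = 1 / (1 + np.exp(-X @ beta))

ctrl = np.flatnonzero(~treated)
matches = ctrl[np.abs(ps[ctrl][None, :] - ps[treated][:, None]).argmin(axis=1)]
matched = outcome[treated].mean() - outcome[matches].mean()
print(f"naive difference: {naive:.2f}, matched estimate: {matched:.2f}")
```

The naive difference overstates the effect because high-income subjects are both likelier to be treated and likelier to do well anyway; matching on the propensity score removes that imbalance.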
Now Keller is combining propensity score estimation with parallel computing methods that harness multiple processing units to parse big data. The technique, he says, “automatically handles complex relationships in the data too difficult for an analyst to detect.”
“WHEN YOU EVALUATE A READING PROGRAM IN 40 SCHOOLS, YOU WANT TO KNOW WHETHER IT WORKS IN WEST VIRGINIA OR TEXAS OR NATIONWIDE. YOU WANT TO APPLY THE RESULTS TO POLICYMAKING ON THE LARGEST SCALE.”
Statistically Speaking: Matthew Johnson
U.S. eighth graders trailed peers in 10 other countries on a major 2007 mathematics and science assessment, but Johnson and TC’s Young-Sun Lee found that Americans outperformed children from several countries on specific skills such as data analysis, probability, location and movement. “These distinctions have major implications for how we approach math education,” Johnson says.
Statistically Speaking: Ryan Baker
“Some people dismiss our data as ‘roadkill,’ meaning, figuratively speaking, that we ran it over by accident,” Baker says. “But I’m from Texas, and I say roadkill can be a delicious meal.” He adds that “learning analytics is useful in messier situations — for example, to measure science inquiry skills when it’s not yet established what ‘good inquiry’ is.”
The Data Miners
TOO MUCH INFORMATION? THEY DON’T THINK SO
Ryan Baker and Alex Bowers
“PEOPLE SAY WE NEED THE RIGHT DATA, BUT I SAY WE HAVE IT,” BOWERS SAYS. “PAIR GRADES WITH STANDARDIZED ASSESSMENT, USE CURRENT DATA-MAXIMIZING TECHNIQUES, AND YOU CAN SEE THE PROBLEMS.”
MADHABI CHATTERJI Professor of Measurement, Evaluation & Education
For decades, federal and state agencies have tracked school and student performance. Now smart tutoring systems and other technologies that record every keystroke have spawned the field of learning analytics, in which researchers search data for patterns and correlations to identify challenges facing individual learners, classes and entire school systems.
“Some people dismiss our data as ‘roadkill,’ meaning, figuratively speaking, that we ran it over by accident.” Ryan Baker, Associate Professor of Cognitive Studies, grins. “But I’m from Texas, and I say roadkill can be a delicious meal.”
As Baker puts it, “learning analytics is useful in messier situations — for example, when neither ‘good inquiry’ nor what matters in achieving it has been defined.” TC has emerged as a leader in learning analytics, with Baker and recent hire Alex Bowers, Associate Professor of Educational Leadership, two of the field’s rising stars.
Baker focuses on creating computer-based environments that best engage students in their work. He has correlated the in-the-moment intellectual decisions of teens who used intelligent tutoring systems with their subsequent academic success. He’s also taught and analyzed a MOOC (massive open online course) to determine better MOOC teaching strategies.
Now, Baker has created a learning analytics master’s degree program drawing on TC’s broad expertise in making sense of data — including diagram production and comprehension, because teachers and administrators want the reams of new data formatted to highlight key information.
Bringing Rigor to the Field
STUDYING WITH TC’s Henry Levin, Moumié Maoulidi (Ph.D. ’09) read “Let’s Take the Con out of Econometrics,” an Edward Leamer article calling for more rigorous empirical work. The critique piqued his interest in experimental research methods such as randomized controlled trials. Subsequently, while working at Columbia University’s Earth Institute, Maoulidi noticed how findings from these trials are influencing policy discussions in developing countries. Now at Stanford University’s Institute for Economic Policy Research, he is using knowledge from TC’s Economics & Education program to conduct cross-disciplinary applied research.
Are teachers using such data? In a study of an analytics-informed Texas math program called “Reasoning Mind,” Baker found that students were engaged and on task 89 percent of the time, meaning they received the equivalent of 40 more hours of math instruction than in a typical classroom.
Alex Bowers is a former pharmaceutical cell biologist who was among the first to make use of the newly sequenced human genome. His team’s challenge: to identify, from among roughly 3 billion base pairs of nucleotides (the building blocks of DNA), those that would make likely targets for new drugs.
While patenting two genetic targets for cancer drugs, Bowers encountered a knottier problem moonlighting as a community college science instructor: “My students had never written a paper or taken a class where you talk through science issues.”
Today, helping schools and districts improve students’ long-term learning, Bowers still mines data to find intervention targets. One turns out to be young children’s grades. Researchers and policymakers consider test scores more reliable than grades in predicting future performance. Yet, in his dissertation study, Bowers showed that even a student’s marks in first grade powerfully predict the odds of graduating from high school.
Bowers retrieved overlooked student records and applied a technique called cluster analysis to identify meaningful patterns. The result, now adorning his office, is a chart with horizontal lines representing the entire K–12 academic careers of two districts’ students. On each line are a student’s grades. Pairs of the most similar student performance trajectories over time are grouped together. Each pair is grouped with another to which it, in turn, is most similar — and so on, forming color-coded clusters.
“The human eye is good at picking out blocks,” Bowers says, and indeed, the chart clearly shows, by third grade, who will and won’t graduate. Students diverge into higher- and lower-achieving clusters, divided by roughly a B average in subjects such as reading. Third-graders with B-minuses and C-pluses tend to struggle in high school, and nearly half of those in the lower cluster fail to graduate.
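The pairing-up process Bowers describes is hierarchical (agglomerative) clustering. A hypothetical sketch using SciPy, with two invented trajectory groups standing in for his higher- and lower-achieving clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(3)

# Hypothetical illustration: each row is one student's yearly grades (0-4
# GPA scale) from kindergarten through 12th grade. Two invented groups
# diverge early, echoing the higher- and lower-achieving trajectories in
# Bowers' chart.
n_per_group, n_years = 50, 13
high = np.clip(rng.normal(3.3, 0.3, (n_per_group, n_years)), 0, 4)
low = np.clip(rng.normal(2.3, 0.3, (n_per_group, n_years)), 0, 4)
grades = np.vstack([high, low])

# Agglomerative clustering: repeatedly merge the most similar trajectories
# (Ward's method), then cut the tree into two clusters.
tree = linkage(grades, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(np.bincount(labels)[1:])  # cluster sizes, roughly [50, 50]
```

Sorting the rows by cluster and color-coding the grades yields exactly the kind of block pattern the eye picks out on Bowers’ chart.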
“Statistics seem impersonal, but I’m showing each student’s experience,” says Bowers. “Like qualitative work, it creates the possibility for tailored interventions.
“People say we need the right data, but I say we have it. Pair grades with standardized assessment, use current data-maximizing techniques, and you can see the problems.”
— Joe Levine, Photographs by Deborah Feingold
Published Wednesday, Nov. 4, 2015