Mixing Methods to Learn 'What Works' | Teachers College Columbia University

Mixing Methods to Learn 'What Works'

There are the Reading Wars, the Math Wars, the School Choice Wars and then there is the war over how to settle all the other wars. At its root are the questions: What research methods help us know what really works in education? What constitutes valid evidence that a program helps students? TC Professor Madhabi Chatterji explains.

What really helps students? A TC expert says finding out takes more than comparing A to B.

In 2002, an arm of the U.S. Department of Education called the What Works Clearinghouse offered its own answer for screening educational studies, in the form of the Study Design and Implementation Assessment Device (DIAD). In essence, the DIAD (which is still in draft) would sanction only studies of school programs that are "experimental" -- that is, they compare a set of students who have been randomly assigned to receive an intervention to another set who have not. If the first group achieves better academic results than the second, the program is judged a success; if not, it's consigned to the scrap heap.

The experimental method is an intuitively appealing one: after all, scientists conduct clinical trials of new medicines in much the same way.

"But education programs aren't tightly scripted entities -- they're housed in schools, rather than in controlled laboratories, and schools are complex, open-ended institutions where there's a lot going on," says Madhabi Chatterji, Associate Professor of Measurement-Evaluation and Education. "Randomized experiments alone won't yield the best evidence on what works, when it works, and whether it works in the same way in different schools. For that, one must mix different research methods and look at programs over their lifetime, as they evolve and mature in organizational settings."

Chatterji attracted national attention in December when she made those same points in a paper titled "What Works: An Argument for Extended-Term Mixed-Method (ETMM) Evaluation Designs," published in Educational Researcher.

"The developers of the WWC standards end up endorsing a single research method to the exclusion of all others," she wrote in her paper, which received an outstanding publication award from the American Educational Research Association and will be reprinted this summer. "In doing so, they not only overlook established knowledge and theory on sound evaluation designs; they also ignore critical realities about social, organizational and policy environments in which education programs and interventions reside."

The result of experiments conducted along WWC lines, Chatterji writes, is "an a-theoretical, poorly conceptualized, 'black-box' evaluation, where little is unveiled as to the reasons and conditions under which a program worked (if indeed desired outcomes were manifested), or the causes for its apparent failures (in the event outcomes could not be documented)."

Each June, many cities learn whether their students' standardized test scores have gone up or down. Some will pat themselves on the back, others will be attacked by critics -- but in Chatterji's view, to isolate what really caused the scores to change, both camps should consider using mixed research methods over longer periods of time.

"When you're answering a 'what works' question, you need to make what's called a generalized causal inference -- that is, you've got to establish a conclusive link, through the research design, between the program and the outcome," Chatterji says. She argues that the type of research method one uses should be dictated by where the program stands in its lifecycle and by the questions that need to be answered about it at that time.

"When a math or literacy program is first adopted at a school, it is young and probably not functioning well," she says. "Resources haven't yet been fully allocated, teachers haven't been trained, and there's a lot of unevenness in how the program is delivered. Under such conditions, it's hard to show a conclusive link between the program and the achievements of children."

At the early stage of a program, Chatterji advocates what she calls a "formative study" -- an in-depth, descriptive and qualitative look at the program and its environment to understand all the variables that may be helping it to succeed or getting in the way of its success. What's actually going on in the classroom? How do the teachers feel about their work? The formative study should be followed by feedback to the people running the program to help them adjust their practices to match the original intent of the program -- or, in evaluation-research lingo, "improve treatment fidelity."

When the program has matured somewhat, the investigators should return to do a "summative," or confirmatory, study that is much more quantitative and experimental in design.

The changes that come to light using these methods can be tremendously illuminating, as Chatterji illustrates with the example of an after-school program she evaluated in Harlem. In the formative study, only five percent of the teachers said they were getting sufficient support to implement the program. In the summative study, 15 percent said they were getting enough support. Teacher assessment of student engagement increased from five percent to 24 percent. Reported student misbehavior dropped from 41 percent to 16 percent.

More tellingly, in the younger grades, the formative study showed positive effects in reading and math - but in the fourth and fifth grades, there was no gain in reading and a decline in math among those in the program. But the study also revealed that the fourth and fifth graders were misbehaving badly in the program. When a second study was conducted, the misbehavior had subsided, and the results for the older children were similar to those for the younger grades.

"Without the mix of qualitative and quantitative data and the 'before' and 'after' studies, we'd have concluded that the program worked for younger children, but not for older ones," says Chatterji. "I'm not saying that experimental methods don't have a place -- only that they should be used judiciously, and that they are better used once the context for a given program has been understood with other methods."

During the coming year, Chatterji will begin ETMM-style research focused on a major program, funded by the National Science Foundation, that's designed to reduce math achievement gaps at schools in East Ramapo, New York. Still, she says, such work -- which she concedes is more costly and resource-intensive -- is done all too rarely.

"The WWC standards are still drafts, but they reflect the current Administration's predilection for rigid experiments, in keeping with the No Child Left Behind legislation," she says. "Researchers don't often propose mixed-method research projects because they're afraid their proposals will be rejected for funding.

"The polarizing debate over quantitative versus qualitative methods began years ago, but it is no longer productive to continue that debate," she says. "You can't do sound evaluation research without being expert in both types of methods. It is time for the field to come together on this."

Published Monday, Oct. 22, 2007