Debating the Use of Test Scores to Evaluate Teachers | Teachers College Columbia University

Last week, New York City released performance ratings for 18,000 teachers based on student test scores, following a ruling by the state’s highest court that the information could be made public. Here are responses from several Teachers College-affiliated experts to two questions:

Should teachers be evaluated based on their students' test scores? Does the public have the right to that information?

A. Lin Goodwin, Vice Dean and Professor of Education

It is the ultimate irony that despite our hand-wringing about U.S. rankings in international assessments and our apparent desire to learn from high-performing nations, we ignore any lesson that the latter might offer. Finland has stated clearly that it has never used, and would never use, test scores to rate or evaluate teachers. The same is true of other top performers such as Singapore and China. What is even more ironic is that these same competitors have been heavily influenced by U.S. educators, from John Dewey to Linda Darling-Hammond. It seems they learned critical lessons about curriculum and teaching from us, lessons that have helped them focus on learning rather than testing, professionalize and support teachers, and demonstrate excellence in more ways than test scores. Wouldn’t we be wise to learn from them/ourselves?

Alumna Carol Burris, Principal of South Side High, Rockville Centre, NY.

(This past July, Burris published an open letter to U.S. Secretary of Education Arne Duncan in The Washington Post protesting federal policies aimed at encouraging the evaluation of teachers based on students’ standardized test scores.)

Teachers should not be evaluated in whole or in part by student test scores. There is no evidence that evaluation systems that incorporate student test scores produce gains in student achievement. Value-added scores, while informative in the aggregate, are not stable enough to use for high-stakes decisions about teacher employment. In addition, the collateral damage will be enormous. With a focus on end-of-year testing, there inevitably will be a narrowing of the curriculum as teachers focus more on test preparation and skill-and-drill teaching. Enrichment activities in the arts, music, civics and other non-tested areas will diminish. Teachers will subtly be incentivized to avoid students with health issues, reluctant learners, students with disabilities, English Language Learners or students suffering from emotional issues. Research has shown that no model yet developed can adequately account for all of these ongoing factors.

Because of all of the above, the relationships between students and teachers will be altered, and the collaboration among teachers will be replaced by competition. There is already anecdotal evidence from the field that this is occurring. We should instead look to alternative approaches, including the evaluation systems of Cincinnati and Montgomery County, Maryland, which are effective in improving instruction without relying on student test scores.

Parents certainly have the right to their child’s scores and the public has a right to school-wide scores. The publication of individual educator scores, however, does not serve the common good. Not only are such scores prone to error, they result in public humiliation, as evidenced by some of the headlines we have seen this week. There are even greater worries, however, for those of us who are concerned about equity.

My doctoral studies at Teachers College focused on another sort-and-select system: tracking, or ability grouping. Research demonstrates how “parents in the know” and those with political power work the tracking system to gain advantage for their kids. We will see similar dynamics at work if teachers are sorted and selected into “score” groups. My prediction: We will see a tremendous push by the most skilled, demanding, and well-resourced parents to get each year’s “highly effective teacher” and for district offices to “stick” the ineffective teacher in a class (or school) where the parents are less likely to complain. Each parent will make a valid case why their child needs the highly effective teacher, or why he or she should not be with a teacher who is “developing,” terms used in the New York State system. If you doubt that this will occur, read the work of scholars such as Jeannie Oakes, Kevin Welner and Amy Stuart Wells about how tracking systems and the high-track advantage play out in the real world.

I also predict that student grades assigned by a teacher labeled less than effective will be challenged. One can only imagine the lawsuits that will arise. The evaluation scores given to teachers by principals who themselves are rated less than effective will be challenged as well. Can a teacher be fairly rated by a principal who was rated ineffective that year? And when the “ineffective principal” is dismissed, who will agree to lead that school, if the ineffective rating was based in large part on student achievement? No administrator will risk that move — achievement cannot be turned around that quickly — and the students in struggling schools will lose again.

All of the unintended consequences of the incorporation of test scores into the evaluation of teachers will be further fueled as educators suffer the worry of public humiliation. The New York State Freedom of Information Act statute should be changed to ensure that the evaluations of public employees cannot be shared with the public.

Aaron Pallas, Professor of Sociology and Education

(Pallas is a frequent commentator on testing issues. His piece, "The Right to Know What?" currently appears in his column, A Sociological Eye on Education, in The Hechinger Report of TC's Hechinger Institute on Education and the Media. Pallas also recently was interviewed about teacher evaluation on the show "Democracy Now.")

Teacher evaluations should reflect the things we want good teachers to do, and that includes advancing what students know about math and the use of the English language.  But there are two big problems in using students’ test scores as the centerpiece of teacher evaluations. The first is that there are many other things we want teachers to do--such as cultivate students’ creativity, intellectual curiosity and respect for others, or prepare students to be engaged citizens in our democracy--that are equally important, but hard to measure.  The danger in relying on students’ test scores is that they don't tell us much, if anything, about teachers’ ability to advance these other goals of schooling.

The second problem is that it’s really hard to isolate a teacher’s contribution to his or her students' achievement.  Tests cover only a small sample of the content domain they're supposed to represent, and one teacher’s class of students differs from other teachers’ classes in ways that can’t always be measured.  If we’re going to use tests in this way, we have to understand how limited and error-ridden the scores are.

Whether the public has a right to teachers’ evaluations apparently is a matter of state law.  New York State’s Freedom of Information law presumes that final government agency determinations, including personnel evaluations, are a matter of public record unless explicitly exempted by federal and/or state law.  In New York, the personnel records of police officers, firefighters and corrections officers are exempted; those of teachers are not.  I favor amending the law to protect the privacy rights of teachers.  

Eric Nadelstern, ’73, TC Visiting Professor of Practice and former New York City Deputy Chancellor of Schools

Students’ test scores must be considered when evaluating teachers, but the scores don't tell the whole story. The evaluations of supervisors, parents, peers and students, when taken together with test scores, present a more complete teacher profile.

As long as we make promotional decisions for students based on test scores, parents and the public have a right to know how well each teacher prepares her students for these exams.

James Corter, Professor of Statistics and Education

Student evaluations have always been an integral part of education (and if designed and used properly, can aid education).  Because students and teachers have a collaborative task to perform -- maximizing student learning and maturation -- it seems fair to evaluate teachers, too, and to base that evaluation in part on student test scores. 

But it’s only as fair as the tests are good. In this context, what makes a test good?  The tests must be reliable, and they should test all and only the set of relevant skills that the student needs, and that are under some degree of teacher control.  Simply defining this set of skills, let alone measuring them well, is difficult and probably controversial.

Finally, the test results need to be used in a fair and circumspect manner -- they must take into account student characteristics (including students’ previous knowledge and capabilities) and the teacher’s working environment -- and they should not account for all of a teacher's evaluation rating; other sources of information such as peer observations should play a role. This is a long set of requirements, so it seems clear that it would take a multi-year endeavor, requiring lots of motivation, political will and resources, to get it right.

Until the tests and the scoring systems have been in place and evaluated for two or three years, it seems a bit reckless to use the results for any high-stakes purpose. Telling principals or parents that Teacher X is bad (or good) is a high-stakes outcome. We should not do this until we have hard evidence that the tests are as good as we can make them, and the numbers are being used appropriately.

William Gaudelli, Associate Professor of Social Studies and Education, and a member of the South Orange-Maplewood Board of Education

I believe test scores should be used as part of a series of measures on the part of supervisors to support the growth of existing teachers. They serve as a useful, though limited, measure of a teacher's capacity, but the unaccounted-for variables (students, context, etc.) of course complicate any facile reading of what they mean. The same holds for GRE scores as a measure of student capacity: no one would accept those results as the sole evidence of student potential, yet most of us account for them in making evaluations of student capacity.

Should the public have access to teacher evaluations based on student test scores? Absolutely not.  This line of thinking is tied into the rampant consumerism that now suffuses educational discourse – the mindset that parents directly employ teachers and should be involved in judging their quality, as though shopping for a new car or a handbag.  We have competent administrators who handle those responsibilities.   I can say as a Board of Education member in New Jersey that we take special care not to unnecessarily expose teachers who are underperforming, as doing so helps no one. Rather, our approach is to get such teachers the support they need to make improvements; supervise them carefully so that children are not disadvantaged by their work; and if they do not improve, relieve them of their duties.  Public humiliation of teachers serves neither schools nor the public.

Christopher Emdin, Assistant Professor of Science Education.

(The following response is an edited excerpt from Emdin’s recent blog on this topic in The Huffington Post.)

One of the most dangerous effects of teacher ratings based on test scores is that they will deter aspiring teachers who want to make a difference in schools with low resources and underserved populations from working there. The risk of public humiliation for working in the most challenging schools will result in an exodus of teachers from already hard-to-staff schools.

Just as we will have an issue with teacher recruitment, there will be another issue of an exodus of good teachers from schools where they are most needed. This exodus is already a big piece of the equation in NYC public schools where seasoned teachers, after they have "earned their stripes" in city schools, move to more suburban school districts for better pay. 

We can easily determine what the collateral results of the release of data will be before we even see it. There will be many “bad” teachers in low-income schools populated by youth of color, who in some cases have the most dedicated teachers, and these schools will be positioned as places where neither teachers nor students would want to go. At the same time, schools with high test scores will be positioned as having the best teachers, even if the students would score well on the exams without teachers in the room.

This narrow narrative will promote the TV, movie and reality show mindset of hero vs. villain, good vs. bad. Applying this dynamic in schools will open up the arena for the privatization of schools, increased sales of test prep training and materials, and further neglect of the true needs of youth in schools.

The contradiction between advocating that teachers should not teach to the test and then using standardized test scores to evaluate teachers makes no sense, and simply shows us that despite what is being shared with the public, many people with the power to change the direction of public schools still believe that in education, testing is everything.

Published Thursday, Mar. 1, 2012