TC Media Center from the Office of External Affairs

What We Know About School Grading

Welcome to the inaugural edition of “Read All About It,” a Web feature from TC that will periodically provide background information on issues of the day in education policy and research.

In November, New York City took accountability in education to a new level by grading its public schools. Nearly a quarter received an “A” – with grades assigned on a curve, more than expected – but 99 received a “D” and 50 were designated failures. 

The largest portion of the grades was based on the improvement of schools’ students on state tests from one year to the next – the so-called growth model. The grades also reflect a comparison of schools with similar student populations.

Many observers have hailed the grades as a much needed step toward greater accountability, but some parents and educators also have complained that the grades place too much emphasis on standardized test scores.

So: How to read the grades? What precedent exists for this kind of grading, and how have such efforts fared in the past? More specifically: Has grading schools resulted in improved performance?

In the interest of helping education policymakers better understand the context of this new accountability measure, Teachers College offers the following short list of titles and abstracts of papers written by education researchers on the subject, as well as a “pop quiz” by TC Professor Celia Oyler, the answers to which may surprise you.

Reporting School Quality in Standards-Based Accountability Systems (CRESST Policy Brief 3, Spring 2001)

Robert Linn, Distinguished Professor Emeritus of Education at the University of Colorado at Boulder and Co-Director of the National Center for Research on Evaluation, Standards, and Student Testing. 

This brief discusses ways to measure and report school quality. At present, the differences in state accountability systems make comparisons of schools and school systems very difficult. The most common approach to reporting school status is in the context of current status, an approach in which the school mean or median score for students in the grade assessed is reported. A preferable approach is to place greater emphasis on improvement than on current status. One way to do this is to compare test scores between 2 years but for the same grade. Another way is to track the performance of students from one grade to the next. An alternative is to base the accountability on a comparison of the performance of all students in the school in one grade in one year with the performance of all students in the next grade tested the next year. This is the quasi-longitudinal approach. Some states report similar schools' scores as a way to account for the effects of socioeconomic status. No reporting method is without some disadvantages, but some recommendations can be made to improve reporting for accountability purposes: (1) place more emphasis on school improvement than on current performance; (2) report the margin of error for any school result; (3) evaluate the validity of the uses and interpretations of assessment results; and (4) validate trends with results from other indicators, such as the National Assessment of Educational Progress or other tests.


How Principals Level the Playing Field of Accountability in Florida’s High-Poverty/Low-Performing Schools (International Journal of Educational Reform)

Michelle Acker-Hocevar, Associate Professor, Department of Educational Leadership, University of South Florida; and Debra Touchton, Associate Professor of Education, Stetson University

In response to the pressures of accountability, an emerging field of study analyzes schools that defy the odds by exceeding state expectations of performance with low socioeconomic students and students of color… these studies not only point out that all students can learn but also suggest that learning time varies significantly… Most important, these studies provide educators with best practices for high-performing, high-poverty and minority schools… What is less well understood, however, is how principals in these struggling schools work toward school improvement… This case study, intended as part of a larger qualitative and ethnographic study planned over several years, examines how principals in these low-performing schools are confronting the stigmatizing label of “low performance.” [In interviews with 10 elementary school principals] We asked two questions: 1. What are principals’ views toward the state’s accountability measures in reference to their schools and their roles? and 2. What, if any, effect has external accountability had on internal accountability, or developing the organizational capacity of their school?

Part I: The Intersection of High-Stakes Testing and Effects of Poverty on Teaching and Learning  (International Journal of Educational Reform, Vol. 11, No. 2, Spring 2002) 

Part II: Building Organizational Capacity Under the Auspices of the A Plus Plan (International Journal of Educational Reform, Vol. 11, No. 3, Summer 2002)

Part III: Effects of High-Poverty Schools on Teacher Recruitment and Retention (International Journal of Educational Reform, Vol. 11, No. 4, Fall 2002)

Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity (The Journal of Human Resources, Vol. 37, No. 4, Autumn 2002, pp. 752-777)

Daniel M. Koretz, Professor, Harvard Graduate School of Education 

Test-based accountability rests on the assumption that accountability for scores on tests will provide needed incentives for teachers to improve student performance. Evidence shows, however, that simple test-based accountability can generate perverse incentives and seriously inflated scores. This paper discusses the logic of achievement tests, issues that arise in using them as proxy indicators of educational quality, and the mechanism underlying the inflation of scores. It ends with suggestions, some speculative, for improving the incentives faced by teachers by modifying systems of student assessment and combining them with numerous other measures, many of which are more subjective than are test scores.

Toward a Framework for Validating Gains Under High-Stakes Conditions (Unpublished) 

Daniel Koretz, Professor Harvard Graduate School of Education; Daniel F. McCaffrey, Senior Statistician, Head RAND Statistics Group; and Laura S. Hamilton, Senior Behavioral Scientist, RAND Corporation

Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains. The report draws on this framework to suggest a classification of forms of test preparation and their likely effects on the validity of gains. Finally, it suggests concrete directions for validation efforts that would be consistent with the framework.

The Promise and Pitfalls of Using Imprecise Schools Accountability Measures (The Journal of Economic Perspectives, Vol. 16, No. 4, Autumn 2002, pp. 91-114)

Thomas J. Kane, Professor of Education and Economics, Harvard Graduate School of Education; Douglas O. Staiger, Associate Professor Economics, Dartmouth College

Over the last decade, states have constructed elaborate incentive systems for schools using school-level test scores. By the spring of 2002, nearly every state and the District of Columbia had implemented some form of accountability for public schools using test scores. For instance, the state of California spent nearly $700 million on financial incentives in 2001, with bonuses of up to $25,000 for teachers in schools with the largest test score improvements. The federal No Child Left Behind Act of 2001 mandates even broader accountability, requiring all states to test children in grades three through eight within three years and to intervene in schools failing to achieve specific performance targets.

Economists have paid scant attention to the properties of school accountability systems. The nature of the incentives presented to schools ultimately depends upon the strengths and weaknesses of the school-level mean test score measures upon which most accountability systems are based. Accordingly, in this article, we describe the statistical properties of school test score measures, which are less reliable than is commonly recognized, and explore the implications for school incentives. Many accountability systems that appear reasonable at first glance perform in perverse ways when test score measures are imprecise.

Do Accountability and Voucher Threats Improve Low-Performing Schools?  (Journal of Public Economics, Volume 90, Issues 1-2, January 2006, Pages 239-255)

David N. Figlio, Professor of Economics, Warrington College of Business Administration, University of Florida; and Cecilia Elena Rouse, Theodore A. Wells ’29 Professor of Economics and Public Affairs at Princeton University

We study the effects of the threat of vouchers and stigma in Florida on the performance of “low-performing” schools. Estimates of the change in raw test scores from the first year of the reform are consistent with the early results which claimed large improvements associated with the threat of vouchers. However, we also find that much of this estimated effect may be due to other factors. The relative gains in reading are largely explained by changing student characteristics, and the gains in math – though larger – appear limited to the high-stakes grade. We also find some evidence that these improvements were due more to the stigma of receiving the low grade rather than the threat of vouchers.

Pop Quiz: How Much Do You Know About School Grading in New York City?

Celia Oyler, Associate Professor of Education, Teachers College, Columbia University

In November 2007, the New York City Department of Education issued a letter grade of A through F to each school in the city. Each grade is based on a very complex set of formulas. Test your knowledge of some of the aspects of the grades.

  1. The school grade is based on three “elements”: school environment, school performance and school progress[1].  At the elementary and middle school level, what percentage of the final grade is derived from comparing each student’s score on two achievement tests from one year to the next year?

a.       10%

b.      25%

c.       60%

d.      85%

  2. The New York State achievement tests used to calculate progress are designed by psychometricians and are normed in advance on a large group of students to ensure that the items at each grade level are appropriate for that grade level. 

TRUE                                      FALSE

  3. The New York State test scores are reported in stanines, percentiles, or standard scores so that test measurement error is factored into the final score, making each student’s final score accurate (to their hypothesized “true score”) with approximately 66% reliability. 

TRUE                                      FALSE

  4. Under No Child Left Behind, each student at a school is expected to show one year of progress as measured by achievement tests in grades 3 through 8.

TRUE                                      FALSE

  5. New York City elementary school principals receive a printout listing the students who have failed to make a year of progress in math and in reading.

TRUE                                      FALSE

  6. The year of progress is calculated using statistical methods that account for students being able to randomly guess the right answer.

TRUE                                      FALSE 

  7. To get the highest score of a “4” (1 is lowest) on the 5th-grade English Language Arts (ELA) test, a child can get only one question wrong.[2]

TRUE                                      FALSE

  8. Taking a raw score on a group achievement test, converting it into a score of 1, 2, 3 or 4, and then comparing each student’s score from year to year is recognized as a statistically reliable way to measure progress.

TRUE                                      FALSE

  9. The scoring of the writing sample of the achievement tests uses a rubric and is conducted by:

a.       Department of Education personnel to ensure that all results are reasonably fair

b.      Teachers across the city who sometimes know the schools they are grading for

c.       Personnel from the New York State Department of Education who are trained to not take into account such factors as the children’s handwriting

d.      Temporary workers hired by each school.

  10. Each school in New York City receives a Quality Review, conducted by a team of visitors who observe each classroom.

TRUE                                      FALSE

  11. The results of these Quality Reviews are then factored into the final grades each school receives.

TRUE                                      FALSE

  12. A school can receive a “proficient” on its Quality Review and still receive a school grade of “F”.

TRUE                                      FALSE

  13. Circle all that are correct: The school grades are based on how well each school:

a.       Teaches children to solve problems

b.      Uses culturally relevant pedagogy

c.       Integrates the arts

d.      Provides time for children to exercise

e.       Prepares children to make healthy food choices

f.        Helps teachers work cross-racially and cross-culturally

g.       None of the above

  14. There is a strong correlation between how students perform on the New York State achievement tests in reading and math and how New York City students performed on the national achievement test (the National Assessment of Educational Progress, administered since 1969 to samples of students across the country).

TRUE                                      FALSE

  15. There is a strong correlation between the list of schools that New York State has rated as failing and the list of schools that received a grade of “F” from the New York City Department of Education.

TRUE                                      FALSE

  16. The ARIS computer system, designed by IBM for the DOE to track student progress on annual and periodic assessments, cost approximately:

a.       Eighty million dollars

b.      Eight million dollars

c.       Eight hundred thousand dollars

d.      Eighty thousand dollars

  17. The DOE assigns each school its final grade based on its actual score, rather than in relation to its peer schools, so in theory every school could achieve an A if all students showed a year of progress.

TRUE                                      FALSE

  18. A school can receive an “F” even if 98% of its students are rated on grade level in math and 86% are on grade level in language arts, as measured by the New York State tests.

TRUE                                      FALSE

  19. After the large number of calculations is completed, including comparison with the schools in its “peer group,” each school receives a final score. These scores are then converted into a final grade.

TRUE                                      FALSE

  20. The final score of one school may be only one hundredth of a point (0.01) away from another school’s, yet one school can receive a higher letter grade than the other.

TRUE                                      FALSE

  21. Short Answer (extra credit): Since these school grades are so expensive to produce, are based predominantly (in elementary and middle schools) on state tests not designed for making year-to-year comparisons of student growth, and do not take into account standard statistical measures of error, why are these school grades being used by the Bloomberg/Klein administration?

Answer Key

  1. d. School Environment counts for 15% of the score, and is calculated from attendance and the results of parent, student, and teacher surveys. Student Performance counts for 30% of the score and is measured by elementary and middle school students’ scores each year on the New York State tests in English Language Arts and Mathematics. Student Progress counts for 55% of the score and is measured by comparing each student’s scores on the state math and English Language Arts tests from one year to the next. So two test scores account for 85% of the total grade.
  2. False. The New York State tests are not norm referenced tests.
  3. False. The raw score of each student is converted to a 1, 2, 3 or 4 and thus, does not take into account the error of measurement.
  4. True. This is why elementary school principals are often worried if their 3rd grade test scores are really high; this means that when those children are in 4th grade, they have to go even higher to show progress. Obtaining a 3 on the ELA in 3rd grade and then again on the 4th grade test does not necessarily show up as a year’s gain.
  5. True.
  6. False.
  7. True. Last year’s ELA score of 4 required 30 or 31 correct answers on the multiple choice section, out of 31 items.
  8. False
  9. b
  10. True
  11. False
  12. True
  13. g
  14. False.
  15. False
  16. a
  17. False. The grades are calculated “on a curve,” with only a certain percentage allowed in each category.
  18. True.
  19. True
  20. True
  21. Possible correct answers to this question remain a bit of a mystery. Hypotheses abound, but what Joel Klein says is: “Everyone knows what A and F mean. Summing up all relevant measures with a single, simple grade draws sharp attention to the great work at many schools and the stagnation that might otherwise escape notice elsewhere. Grades make people face facts.” (New York Times, letter, 11/16/07)
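To make the arithmetic in answer 1 concrete, the weighting can be sketched as a simple weighted sum. The element scores below are hypothetical illustrative values, and the `composite_score` helper is our own shorthand; the DOE’s actual normalization and peer-group comparisons are considerably more complex.

```python
# Published element weights: 15% environment, 30% performance, 55% progress.
WEIGHTS = {"environment": 0.15, "performance": 0.30, "progress": 0.55}

def composite_score(elements: dict) -> float:
    """Weighted sum of the three element scores (0-100 scale assumed)."""
    return sum(WEIGHTS[name] * score for name, score in elements.items())

# Hypothetical example school:
example = {"environment": 70.0, "performance": 60.0, "progress": 50.0}
print(composite_score(example))  # 0.15*70 + 0.30*60 + 0.55*50 = 56.0

# Share of the final grade driven by the two test-based elements:
test_share = WEIGHTS["performance"] + WEIGHTS["progress"]  # 85%, as in answer 1
```

Note that the two test-based elements together outweigh everything else by design, which is why the quiz emphasizes that 85% of the grade rests on state test scores.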


[1] DOE website

[2] NYSED website
