Beyond Test Scores: Adding Value to Assessment

An improvement over existing forms of measuring student performance, yet practitioners find challenges in implementation

by ROBERT ROTHMAN
Jesse Register is superintendent of the Metro Nashville Public Schools.

At a time when teacher quality has emerged as a key factor in student learning, a statistical technique that determines the “value added” that teachers bring to student achievement is getting new scrutiny.

Value-added measures compare students’ growth in achievement to their expected growth, based on prior achievement and demographic factors over which teachers have no control. Teachers whose students exceed the expected rate of growth are considered effective, while those whose students grow in achievement at a slower rate are considered less effective.

Some advocates of the approach have proposed it as a fair way of awarding teachers bonuses based on performance, and indeed some school districts, notably Houston, are using value-added methods in pay-for-performance systems. Other districts have found the assessments offer a great deal of information that can lead to school improvement, and they are using the method enthusiastically for tracking school performance, determining professional development needs and identifying effective teachers — without tying it to teacher pay.

“We limit ourselves in not really understanding the information available” from value-added assessments, says Jesse Register, director of the Metro Nashville Public Schools in Tennessee. “It adds another dimension, beyond just the snapshot of test scores.”

Statisticians caution that the method, like any statistical technique, has limitations and does not provide a precise measure of teacher or school performance. Many school districts that use the approach take steps to minimize the possibility of error and report the level of confidence in the data.

And advocates point out that, despite their limitations, value-added measures are far superior to existing measures of performance. “People today are looking under the microscope and finding areas of imperfection,” says Theodore Hershberg, a professor of public policy and history and the director of the Center for Greater Philadelphia at the University of Pennsylvania. “[They’ll say] ‘you might get things wrong, and therefore, you can’t do it because it’s imperfect.’”

“But which is more imperfect?” Hershberg asks. “The value-added system or the current system? The current system isn’t working.”

Farm Fresh
The value-added method has its roots on the farm. William L. Sanders, then a professor at the University of Tennessee’s agriculture school, developed the technique in the 1980s to determine which farms were more productive, given equivalent environmental conditions. After Sanders adapted the method to education, the Tennessee legislature in 1992 adopted it for use statewide.

Since then, two other states, Ohio and Pennsylvania, have adopted the approach, and a number of districts in Texas and Florida use it as well. Many school districts that have received funds from the federal Teacher Incentive Fund are developing value-added measures for determining teacher effectiveness. And 13 states use a variation of the approach to measure student growth to determine whether schools have made adequate yearly progress under the No Child Left Behind Act.

Under the approach, statisticians calculate an expected rate of achievement growth for each student, based on the student’s prior achievement and demographic factors. Then they compare each student’s actual achievement to the expected rate to determine the value added that teachers and schools contribute. (The growth models under NCLB determine whether students are on a trajectory toward proficiency.)
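In statistical terms, the expected score is typically a regression prediction, and a teacher's value-added estimate is the average amount by which her students beat (or fall short of) their predicted scores. The sketch below is a deliberately tiny illustration, not any district's actual model: the student scores, the single demographic indicator and the teacher labels are all invented, and real systems use far richer models and more data.

```python
import numpy as np

# Hypothetical data: one row per student (all numbers invented).
prior   = np.array([410., 380., 450., 395., 430., 470., 360., 440.])
demo    = np.array([1., 1., 0., 1., 0., 0., 1., 0.])   # e.g., a demographic indicator
actual  = np.array([432., 389., 470., 420., 441., 488., 385., 455.])
teacher = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Expected current-year score: ordinary least squares on prior achievement
# and the demographic factor, plus an intercept.
X = np.column_stack([prior, demo, np.ones_like(prior)])
coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
residual = actual - X @ coef   # how far each student beat (or missed) expectations

# A teacher's value-added estimate is the mean residual of her students:
# positive means they exceeded their expected growth, negative that they fell short.
for t in ["A", "B"]:
    print(t, round(residual[teacher == t].mean(), 2))
```

Because the model is fit with an intercept over all students, the residuals average to zero district-wide, so each teacher's score is inherently relative to the district as a whole.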

Advocates of the method say it is a much better way of determining teacher and school performance than traditional test measures. Because students are compared against their past performance, teachers and schools are not rewarded for having high-achieving students or punished for having a group of low performers.

“It levels the playing field,” says Jim Mahoney, the executive director of Battelle for Kids, a Columbus, Ohio-based organization that works with about 100 school districts to provide value-added assessment. “If you want to know the level of achievement, just look at the zip code. That’s not true with value added.”

Register, who was superintendent in Hamilton County, Tenn., for 10 years before moving to Nashville, says the value-added assessments provide much better information about school performance than traditional test scores. But the results can be controversial: the measures show that some schools that appear to be high performing might not be raising achievement from year to year.

“It’s interesting to look at high-performing schools,” Register says. “You can see if they are really stretching the students, even if they’re scoring at the 90th percentile.”

Value Added and the Teacher Incentive Fund

The Teacher Incentive Fund, created by Congress in 2006, provides grants to districts and schools to create performance-based compensation systems for teachers and administrators in high-needs schools. Under the program’s guidelines, grantees “must consider gains in student academic achievement” as part of the proposed compensation systems.


One district in Ohio, for example, mobilized the entire staff once the value-added data showed that while the district was high performing, achievement was not growing sufficiently. The superintendent created a sense of urgency and launched a districtwide effort to align instruction to state standards. The district is now improving.

Although many high-performing schools claim that tests have a “ceiling effect” that keeps them from raising performance as rapidly as low-performing schools can, that is not the case, says Mahoney. “Very, very few kids can answer every question right every year in every subject.”

Register notes the measures also show that some supposedly low-performing schools are succeeding because they demonstrate great gains in achievement, even if overall achievement levels still lag behind those of other schools. “If you’ve got to climb a 20-foot ladder, you don’t reach the top in one step,” he says.

Yet while he is enthusiastic about the measures, Register has found many educators in Nashville are unaware of their potential. “We haven’t focused on value-added data as much as I think we ought to,” he says, adding “not a lot of dissension” exists over its use.

Comparing Notes
In addition to serving as tools for accountability, value-added data are also useful for school improvement. Schools can use the measures as a means of diagnosing problems and developing solutions, says Mahoney, who was superintendent of the Muskingum Valley Education Services Center in Ohio before moving into his current role. “The goal is not to prove; it’s to improve.”

Jim Mahoney, a former superintendent, is the executive director of Battelle for Kids in Columbus, Ohio, which promotes value-added assessment.

He recalls two principals in an unidentified suburban district who oversaw schools with similar student populations and similar levels of achievement. They compared value-added scores for one grade level and found one school had much larger learning gains than the other. “One principal looked at the other and asked, ‘What are you doing?’” Mahoney recalls.

In response, the school with lower value-added scores revamped its science program by dividing the grade into four sections. Teachers wrote lesson plans and traded students every nine weeks. “Do I think that’s the answer for teaching science? No. Is it an answer? Absolutely,” Mahoney says. “People can use this (data) to answer a lot of questions.”

Similarly, teachers can use the information to find other teachers who are effective and study their practices, says Roger Bunnell, principal of Hamilton Middle School in Houston. At Hamilton, the block schedule enables teams of teachers to meet for 45 minutes every other day to compare notes on effective practices. “It might be that, in multiplication, Rose added a lot of value, and I didn’t add as much,” he says. “I could go to Rose and pick her brain. What did she do differently?”

Bunnell also notes that Houston provides bonuses to teachers with high value-added scores, and the bonuses can be substantial — a top teacher can earn a bonus of $11,130 this year. But he says the additional funding does not seem to add much of an incentive for teachers at his school, who are already highly motivated. In fact, he says, although the district offered additional financial incentives for top teachers to transfer to high-needs schools, no teachers from Hamilton took them. “They’d rather stay here because the climate is good,” he says.

Some Limitations
Despite the promise of value-added measures, educators and researchers caution education leaders about the challenges associated with their use.

One of the most significant challenges is what to do with teachers who teach in grades or subjects that are not tested. In some cases, school districts rely on school scores and hold the entire faculty responsible for gains in achievement, notes Douglas N. Harris, an associate professor of educational policy studies at the University of Wisconsin-Madison, who is writing a book on the topic. “Even if you’re the gym teacher, you have an incentive to improve,” he says.

In Houston, in addition to the bonuses for individual teachers, the district also provides bonuses to all teachers in a school that produces high value-added gains. Teachers in schools with the top 25 percent of gains can earn $1,000 bonuses, and those in schools with the next 25 percent receive bonuses of $500. Noninstructional staff receive bonuses as well.

In other cases, school districts have supplemented state examinations with their own assessments to generate data for studying teacher performance in additional grades and subject areas.

The challenge is acute in high schools, where students take different subjects each year. Prior achievement in algebra might not predict achievement in geometry, for example. But Harris says this problem is not as severe as it might appear. Previous test scores can reveal a level of general mathematics ability that might predict how a student will perform in a later mathematics course. In fact, school districts could use test scores in other subjects to predict mathematics achievement growth because scores in all subjects tend to be correlated with one another.

High-achieving students tend to be high achieving across the board, Harris notes, adding, “It’s plausible you could get the value added using that approach.”

Another significant challenge relates to the uncertainty associated with value-added scores. Like any statistical measure, value-added assessment is subject to measurement error, and it is difficult to ascribe a precise score to a teacher or a school.

Some recent research has found that the errors associated with value-added measures can be high, particularly for individual teachers. Tim R. Sass, a professor of economics at Florida State University, found teachers’ value-added scores may fluctuate widely from year to year. About 10 percent to 15 percent of teachers in the top quintile of performance would end up in the bottom quintile the following year, Sass discovered.

What accounts for these fluctuations? Sass found they tend to reflect variations in student performance that cannot be explained by students’ observable characteristics. That is, a teacher’s value-added score might drop not because she is less productive, but because her students do less well on tests than they previously did.

Similarly, the way students are assigned to teachers also might affect value-added scores. Jesse Rothstein, an economist at the University of California at Berkeley, looked at data from North Carolina and found the seemingly absurd result that 5th-grade teachers’ assignments “caused” 4th-grade students’ test-score gains. This paradoxical result came about because teachers are not randomly assigned to students. Effective teachers might be more likely to teach higher-performing students.

A Single Measure
Collecting more data can reduce the level of error. School-level results are more reliable than those for individual teachers. In addition, many school districts rely on three years of data to determine a value-added score. “The only way to suppress errors is to use large numbers,” says Mahoney.

Yet in some cases, gathering and applying three years of performance data might not be feasible. New teachers lack enough years of testing. And in determining needs for professional development, year-to-year comparisons are important, says Harris. “If you did a three-year rolling average, you might be missing improvement.”

Another way to account for measurement error is to report the level of uncertainty attached to the results. That is, districts can report the possible range of value-added scores for an individual teacher or school.
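One common way to express that uncertainty is to attach a standard error to the estimate and report a range rather than a single number. A toy sketch of the idea, with made-up residuals (the gaps between students' actual and expected scores) for one hypothetical teacher:

```python
import numpy as np

# Made-up residuals (actual minus expected scores) for one teacher's
# students, pooled across several years of data.
residuals = np.array([6.0, -2.5, 4.1, 0.8, 7.3, -1.2, 3.9, 2.4, 5.0, -0.6])

estimate = residuals.mean()

# Standard error of the mean; a rough 95 percent interval is +/- 2 SE.
# More students (larger n) shrinks the SE, which is why pooling years helps.
se = residuals.std(ddof=1) / np.sqrt(len(residuals))
low, high = estimate - 2 * se, estimate + 2 * se

print(f"value-added estimate {estimate:.2f}, plausible range [{low:.2f}, {high:.2f}]")
```

A wide interval that straddles zero signals that the data cannot distinguish the teacher from average, which is exactly the caution districts convey when they report confidence levels alongside scores.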

Ultimately, school districts should supplement value-added measures with other measures of teacher and school effectiveness, suggests Mahoney. “You’d be hard-pressed to find people who think a single measure is the only evidence for whether a teacher is effective or ineffective,” he says. “I would not make [value added] the sole measure. But I would make it a measure. This can be a tool for improvement.”

Robert Rothman is a senior fellow with the Alliance for Excellent Education in Washington, D.C. E-mail: brothman@all4ed.org

Additional Resources
The author has compiled a short list of materials for those who’d like to pursue value-added assessment in their school districts.

•  “Evaluating Value-Added Models for Teacher Accountability” by D.F. McCaffrey, J.R. Lockwood, D.M. Koretz and L.S. Hamilton. RAND Corp.

•  “Value-Added and Other Methods for Measuring School Performance” by R.H. Meyer and M.S. Christian, National Center for Performance Incentives, Vanderbilt University 

•  “Would Accountability Based on Teacher Value Added Be Smart Policy? An Examination of the Statistical Properties and Policy Alternatives” by D.N. Harris, Education Finance and Policy, Fall 2009

•  “Student Sorting and Bias in Value-Added Estimation: Selection on Observables and Unobservables” by Jesse Rothstein, Education Finance and Policy, Fall 2009

•  The National Center for Analysis of Longitudinal Data in Education Research, or CALDER, a new federally funded research center with significant resources