Separating Growth From Value Added

Two academic models offer different tools for different purposes: measuring individual learning and measuring what affects learning

By Raymond Yeagley

Could it be true that we were shortchanging our brightest students?

Over the years a handful of parents in the Rochester, N.H., schools had complained that their high-performing children were bored by our curriculum. Because these reports were small in number, we were unsure whether they signaled a teaching problem or reflected students’ desire to be entertained. These students were performing well above the 90th percentile on our standardized assessments, and the number of students moving to the top achievement levels was increasing.

Until the school district started using Measures of Academic Progress, or MAP, a computer adaptive assessment system from Northwest Evaluation Association that reports both status and growth scores, we did not recognize a disturbing pattern among our students. Rochester appeared to be effective at moving its low-performing students forward, but the high achievement scores for our most advanced students masked an alarmingly low rate of academic growth.

Rochester adopted MAP shortly after the district implemented the Education Value Added Assessment System created in the 1980s and early ’90s by William Sanders, then director of the Value-Added Assessment and Research Center at the University of Tennessee. In our first year of the value-added system, the data identified some teachers who seemed to be more effective in challenging the advanced students, and one team in particular that was generating extraordinary growth for nearly all of its students.

As we studied the growth-related data from these systems, one thing became clear to everyone: These two ways of examining academic growth offer extremely different data and serve widely different purposes. While the terms growth and value added often are used interchangeably, they do not represent the same concept.

Identifying Needs
Historically, the Rochester district had relied on annual administration of standardized tests to measure student and school performance, which became a source of frustration as we tried to identify a growth estimate and derive formative information from the test. Standardized tests of the day had almost no relationship to our curriculum, offered little help in identifying individual student needs and did not generate meaningful growth data.

For several years, we had encouraged Rochester teachers to incorporate student data into their instructional planning, but often were rebuffed with the very reasonable and supportable assertion that the data we provided were not useful in creating learning plans for individuals and small groups.

Rochester’s introduction to growth measures was the turning point. Recognizing there would be some sequencing differences between our local curriculum and the assessment’s growth scale, our teachers still found the information generated by the new test, as well as the instructional resources linked to MAP, gave them their first effective tool for identifying individual student academic needs and creating a learning sequence that made sense. Flexible grouping and differentiated instruction finally became understandable, feasible and useful.

We found a real bonus from MAP related to our outliers — those students performing significantly above or below grade level. Because each item is anchored to a vertically aligned scale covering all grades tested, the adaptive test for each student was not tied to grade-level content. Teachers, for the first time, were receiving information about the actual performance level of these students and about content appropriate for their needs.

The grade-level equivalent scores we received from traditional tests long had been a source of confusion because, in spite of frequent explanations, nearly all of our parents interpreted an 8.5 grade-level score for their 5th graders as evidence the students were capable of and should be working on 8th grade content. This myth often was perpetuated by teachers new to the school district who didn’t yet understand that grade-level equivalent indicated simply that these students had responded to items from their own grade level similarly to average students in the higher grades.

In contrast, the adaptive test selected item difficulty and cognitive complexity based on each student’s response pattern, moving outside of grade-level constraints as appropriate. With this information, our teachers finally could create an instructional plan that would support challenging and reasonable academic growth for every student, not just identify a few proficiency targets that were impossible for some and already achieved by others.

Individualized Learning
Perhaps the most exciting aspect of growth measures and computer adaptive testing is the capacity to involve students in creating their own learning plan and tracking their own progress. In many districts that use Measures of Academic Progress, students tested in the fall will meet with their teachers to review the results to plan the year ahead. Students are shown a typical growth rate for those at their grade and performance level so they can select a goal for spring testing. Most students choose targets beyond the typical growth norm.

Additionally, the students are introduced to a set of learning objectives, aligned with their state’s curriculum standards, that must be learned to achieve the target score. This serves as an advance organizer for the students and helps them to recognize new concepts important for achievement of their learning goal. Often, the biggest distraction during a spring testing session in Rochester is the sudden and loud exclamations as students view final scores and realize they have achieved their growth goal.

The main focus of currently available growth measures is formative assessment — providing data to inform instructional planning. Large-scale growth assessments can be used in conjunction with other formative tools, including frequent, short diagnostic tests related to a limited number of learning objectives. Teacher-created diagnostic assessments, even when constructed from pools of validated items, do not constitute a growth measure. However, they can be used in conjunction with growth measures to provide a more complete picture of student learning for short-term instructional planning.

Growth assessments also can be correlated with statewide summative assessments. These can be highly predictive of performance on state exams and can identify learning objectives to move lower-achieving students toward proficiency.

Estimating Effects
Value-added assessment is not a student assessment system. Rather, it is a way of analyzing results from student assessments over time to estimate the relative effect teachers and schools have on student learning. Value-added models are not intended to track individual student growth for instructional planning.

Several statistical models have emerged with a host of names, such as simple and layered mixed effect models; simple fixed effect models; hierarchical linear models; covariate adjustment models; multivariate models; and gain score models. Often a single value-added model is described by several of these names.

The various approaches are based on different assumptions and often have been developed to address a theoretical or observed weakness in another model. Among the differences are assumptions about whether and how the effect of a teacher persists into future years, procedures to control for differences in student background and previous knowledge, whether to make prior adjustments to status scores before imputing gains in assessments that were not designed to measure growth, and how to address missing data.

Regardless of their differences and unique statistical complexities, all of the models share a couple of basics. First, they do not rely on a vertically aligned growth measure, such as MAP, to work. Growth can be imputed through statistical treatment of status scores from various student assessments, which may or may not include growth-based assessments. Second, they recognize that a teacher is not responsible for what students bring to the table on the first day of class, but assume that teachers will have a major impact on what and how much the students will learn while in their charge.

The most commonly used value-added model compares scores for all students tested in a school district over multiple years, then creates a predicted score for each student based on the district average and that student’s test performance history. The student’s actual score for the most recent test event is then compared with the predicted score. If the average difference between actual and predicted scores in a class, grade or school is large enough to exceed the effect of chance factors and measurement error, whether in a positive or negative direction, then the difference is attributed to teacher or school effect. If the average difference is too small to rule out those factors with a sufficiently high degree of confidence, no significant difference is recorded. Thus the value-added model is intended to identify the outliers, not to document small differences in typical teacher performance.
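The comparison described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Education Value Added Assessment System or any other published model. The function names, the prediction rule (the district mean plus the student's average past deviation from it) and the cutoff of two standard errors are all assumptions chosen for clarity. Real value-added models use far more elaborate statistics.

```python
from math import sqrt
from statistics import mean, stdev


def predict_score(prior_scores, district_prior_means, district_current_mean):
    """Hypothetical prediction rule: the district mean for the current test
    plus the student's average past deviation from the district mean."""
    deviations = [s - m for s, m in zip(prior_scores, district_prior_means)]
    return district_current_mean + mean(deviations)


def class_effect(actual_scores, predicted_scores, cutoff=2.0):
    """Average the actual-minus-predicted differences for one class and
    report an effect only if it exceeds `cutoff` standard errors.
    The cutoff of 2.0 is an assumption, not a value from any real model."""
    differences = [a - p for a, p in zip(actual_scores, predicted_scores)]
    average = mean(differences)
    std_error = stdev(differences) / sqrt(len(differences))
    if std_error == 0:
        # Every student differed by the same amount; no spread to compare against.
        ratio = 0.0 if average == 0 else float("inf") * (1 if average > 0 else -1)
    else:
        ratio = average / std_error
    if ratio > cutoff:
        label = "above expected"
    elif ratio < -cutoff:
        label = "below expected"
    else:
        label = "no significant difference"
    return average, label


# Illustrative scores only; these are not real student data.
predicted = [200, 210, 220, 230]
print(class_effect([206, 215, 227, 236], predicted))  # consistent gains
print(class_effect([203, 207, 224, 226], predicted))  # gains and losses cancel
```

In the first class every student outscores the prediction by a similar margin, so the average difference stands out from the noise. In the second, the differences cancel and nothing is recorded, which mirrors the point that such models flag outliers rather than rank typical classrooms.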

Teacher Impact
Since the introduction of William Sanders’ Tennessee Value-Added Assessment System in 1992, the concept has been the subject of debate, study and controversy among researchers and practitioners. Much of this debate centers on the appropriateness of value-added models for teacher evaluation and high-stakes accountability.

Although the idea of using objective, data-based criteria for justifying teacher rewards and penalties is attractive, many researchers have expressed concern about the viability of this approach. They cite questions about the basic assumptions in value-added modeling and, more importantly, the difficulty of establishing causality without random assignment of teachers and students.

In spite of the teacher evaluation controversy, value-added models can provide useful data as schools and teachers look at different aspects of their operations. In Rochester, with a guarantee that we would not evaluate using these data, our principals were able to engage teachers in analyzing the learning patterns among their students. In some cases, teachers were surprised to see their own classroom reflected our general finding that the most gifted students were not experiencing the same rate of growth as their less advanced peers. Some teachers saw good performance for students near the middle, with significantly less growth for students at both ends of the spectrum. Others found differences based on gender or other student characteristics. In nearly all of these cases the value-added data caused the teacher to ask more questions and search for ways to change the pattern.

One of the most notable findings for the school district was that all of the teachers from a single grade level at one school were seeing very high growth among nearly all of their students. The high growth was evident regardless of students’ initial achievement level, gender, socioeconomic status or any other factors considered in the value-added analysis.

Further, when a new teacher was brought onto that team, whether transferring from another school or coming straight out of college, the accelerated growth rate appeared within the teacher’s first year and continued in future years.

The value-added data led us to look at a number of factors, including how the team organized itself for instruction, its student grouping practices and involvement of the team’s lead teacher in shaping the practices on that team. In the end, we identified what we believed to be significant factors in the consistent success of the team and we were able to share those factors with other principals and instructional teams.

Exercising Caution
Growth measures and value-added data both can provide powerful information to educators for improving instruction and increasing student learning. Each, however, must be used with an understanding of its strengths and limitations.

First and perhaps most important, both of these tools focus on only one aspect of a student’s educational experience — academic growth. Most often, even this is limited to a few core subjects. Neither growth measures nor value-added data should be considered a complete measure of student progress or school performance. Practitioners need to triangulate the assessment data with at least two other measures, and preferably more, that address student engagement and aspirations, physical and emotional well-being, perceptions of school climate and community support, and a host of other factors.

Educators also need to recognize that neither growth measures nor value-added analysis, in its current form, is easily applied to high schools. The multiplicity of courses, with the associated lack of a single, linear curriculum path, creates noise. Additionally, multiple scheduling models, which may compress course content to a single quarter or expand it to two periods per day for a full academic year, make it difficult to assess all students at the same time relative to course completion.

Finally, much thought and care must be exercised before using assessment data for teacher evaluation. Technical debates aside, as school administrators encourage teachers to use data to inform their instructional planning and delivery, they may want to ask whether using value-added or other student performance data as an accountability club will discourage teachers from using data for their own planning.

In Rochester, we found the value-added data supported what we already knew and had documented through careful classroom observation and use of effective evaluation procedures. We did not need student assessment data to make the case for teacher improvement, discipline or contract non-renewal, and we certainly didn’t need an almost inevitable legal debate focused on data accuracy rather than classroom performance and student needs.

Both value-added analysis and growth measures have contributed to strengthening the public schools in Rochester. Use of data among teachers and administrators is no longer an abstract exercise for looking at curious and misunderstood relationships. Rather, the accurate, high-quality data available to the district’s teachers now reveal individual student needs and serve as a source for discovering what is working and what can be strengthened.

Raymond Yeagley is chief operations officer of the Northwest Evaluation Association, 5885 S.W. Meadows Road, Suite 200, Lake Oswego, OR 97035. E-mail: Raymond.Yeagley@nwea.org. He served as superintendent in Rochester, N.H., for 17 years.