Out-of-Level Testing

An assessment practice gains in popularity, but is it just another name for low expectations for hard-to-reach students?

by JUDITH ELLIOTT and MARTHA THURLOW

Never before has there been as much scrutiny of the results of public education.

The push for accountability, accompanied by sweeping state mandates, has forced school personnel to confront a well-known challenge head on: the multitude of students who are performing significantly below grade level. The high-stakes nature of reporting test scores has heightened school district leaders' attention to how well their students are achieving and increased their concern about the news media's display of those results.

These worries often overshadow the reality of how well students are actually performing. As a result, decisions about who takes which test, or which level of a test, are sometimes based on how well the students are expected to perform. This practice of assessing students using other versions of an examination, typically lower ones, is called out-of-level testing.

Justified Practice?
Out-of-level testing occurs when a student in one grade is assessed using a level of a test developed for students in another grade. Consider a school district that annually assesses students in grades 3, 5, 8 and 10. In out-of-level testing, a 5th grader who is reading at the 3rd-grade level would be administered the 3rd-grade reading test rather than the version for grade 5.

Proponents of this practice argue that a student should take the reading test appropriate for the instruction he receives. After all, they contend, why would we want that 5th-grader to take the 5th-grade reading test if he is still reading at grade 3? And isn't it more important to avoid frustrating the student than it is to precisely measure schoolwide performance? The answers to these questions depend on the purpose of the school district's or state's accountability and assessment systems.

What is certain is that the practice of out-of-level testing is becoming more popular now under the reauthorized Individuals with Disabilities Education Act, the federal law that requires all students with disabilities to participate in state and school district assessments.

Two Accountability Types
Generally, there are two types of accountability--system accountability and student accountability.
System accountability can be thought of as a way to keep people, both inside and outside of the educational system, informed about how well students are progressing in school and meeting intended results.

At first glance this seems rather straightforward. On closer inspection, however, controversy arises because states and school districts administer different assessments. In one Midwestern state, every district must administer at least one norm-referenced test, but it is left to the district to decide which norm-referenced test, or NRT, to use. One school district may administer the Iowa Test of Basic Skills, while a neighboring district gives the Metropolitan Achievement Test.

While two different tests that measure different things cannot be used validly to compare the students' performance in the two districts, the news media make these unfair comparisons anyway. Nationally, states are compared to each other on how well they do on the National Assessment of Educational Progress, the SAT and the ACT.

Another reason for controversy in system accountability is the integrity with which the assessment programs are implemented--that is, who is included in the test, with what needed accommodations and how the scores of students are reported. A common frustration of many school leaders is the lack of equitable implementation and assessment of all students, which creates an unequal playing field, even among schools within the same district. In system accountability, educators, schools and/or districts are held accountable and rewards or consequences follow.

Many states are implementing sanctions and rewards based on school districts' assessment results. However, the nature of the norm-referenced tests most commonly used often prohibits the use of accommodations that students need to take the assessment. In turn, if a student requests an accommodation not contained on the approved list, her score is kicked out of the system and not counted. At the same time, some parents are exempting students because they view assessments as useless or unrelated to their child's learning.

The bottom line is this: School districts are reporting incomplete assessment data. Even more tragic, policy decisions are being made based on these data that don't include all children.

Meanwhile, the purpose of student accountability is to find out how well individual students are achieving in the curriculum or on the established content standards. The individual student is held accountable and reaps the rewards or consequences based on his or her performance.

Many states and districts have added or supplemented traditional norm-referenced tests with criterion-referenced tests, sometimes known as standards-based assessments. The criterion-referenced tests, or CRTs, compare students against themselves on mastery of the content tested.

The advantage of CRTs is they can be aligned with content standards and curriculum, thereby providing a more accurate estimate of how students are achieving toward identified goals. Norm-referenced tests, on the other hand, are constructed to measure a broad range of skills and generally have a 20 to 30 percent overlap with what is taught in the classroom.

Out-of-Level Proponents
The use of out-of-level testing in state and district assessments has increased during the past 10 years, largely as an accommodation or modification for students with disabilities. Those who argue for the use of out-of-level testing offer several reasons:


  • The student is not functioning on the grade level of the test.


    Some argue that the student is incapable of taking the test because she is not at the functional level of the test. Why would we have a student take an 8th-grade reading test if she is reading and receiving instruction in 3rd-grade material?

    Answer: It depends on the purpose of the test. If we want to know how all 8th graders are progressing in the expected grade-level curriculum (system accountability), then it is important to obtain information on all 8th-grade students on that material. A 3rd-grade test measures 3rd-grade skills and curricula, therefore rendering little accurate information for system accountability.


  • The content of the out-of-level test better matches the student's instruction and knowledge.


    If an 8th-grade student is reading on the 3rd-grade level, give him the 3rd-grade test. This will yield an accurate score on how the student is currently functioning. Right?

    Answer: Perhaps. But again it depends on the purpose of the test. If the intent is to gather information for deciding what and how to teach the student, then there are better ways than out-of-level testing to obtain that information. The use of curriculum-based assessment is one such method.

    If the purpose relates to system accountability, it is important for the student to be counted in the grade 8 assessment. If it is truly impossible for the student to take any part of the test, a zero should be given as a placeholder in the accountability system. This practice ensures all students are counted in the grade 8 reporting.


  • The student is not learning the curriculum that will be tested, which will cause him or her undue frustration.


    One must question why the student is not learning the curriculum. Is it due to lowered expectations of the student? Is it for lack of curricular adaptations? Is it due to the need for professional development in identified content standards and curriculum? Is it due to anticipated student anxiety and frustration on the test? Could that anxiety and frustration be related to the fact the student has not been given the opportunity to learn the content and/or lacks understanding of the test format?

    Answer: All of these questions warrant careful consideration before deciding whether out-of-level testing is even considered. It may very well be an instructional rather than student-centered issue.

Opposition Arguments
Many who argue against the use of out-of-level testing do so on these grounds:

  • Out-of-level assessments create a lack of consistency between the test and the purpose of the test.


    In other words, assessments must be consistent with the purpose for which they are being used. Out-of-level tests may be useful for making instructional decisions. However, they are viewed as inappropriate for accountability assessments.

    Since state and district assessments are now used predominantly for accountability purposes, testing at a lower grade level does not reflect a student's performance in relation to a set of standards being assessed for the majority of students.

  • Out-of-level testing reflects low expectations for students.


    A decision to administer an out-of-level test to a student reflects the belief that the student does not have enough mastery of the current grade-level curriculum or standards to pass the grade-level test. The student is expected to fail the test at his or her own grade level so is given the lower level of the test. While the student may pass the lower level, what is he or she really passing? Certainly not the curriculum his or her age mates are being tested in.

    In turn, this has a negative impact on instruction. If we allow students to take lower-level tests, who is responsible for making sure these students learn the grade-level material expected of them?

  • Reporting of test scores can become confused and convoluted.


    If an 8th-grade student takes a 5th-grade test, does the score get reported and where? There have been indications of some schools reporting the 8th-grader's 5th-grade score along with the other 8th-grade scores. Others are reporting the student's 5th-grade score with the 5th graders. Still other schools and districts do not report any scores of students who take out-of-level tests.

    What this suggests is that out-of-level testing contributes to problems in accurately assessing and communicating the academic achievement of all students--that is, unless the tests have been scaled and equated for out-of-level use.

Scaling Assessments
The use of out-of-level testing is a technical matter. The good news is that both norm-referenced tests and criterion-referenced tests can be scaled so that administration of a lower-level test is psychometrically sound and renders valid information.

Scaling is a psychometric procedure that is conducted to equate scores across different grade levels. Say a district administers its assessments at grades 3, 5, 8 and 10. If this district were to scale its assessment, some 5th graders would take both the 3rd- and 8th-grade tests, some 8th graders would take both the exams at grade 5 and grade 10, some 3rd graders would take the 5th- and 8th-grade exams, etc. Then the scores obtained from students who took the different levels are equated.

For example, Paco is an 8th grader who reads at the 5th-grade level. The district reading assessment has been scaled for out-of-level testing. (This means that some 8th graders have taken the grade 5, 8 and 10 levels of the assessment.) Paco is administered the grade 5 assessment. He receives a raw score of 20. A raw score of 20 on the grade 5 assessment has been calculated to be a raw score of 10 on the grade 8 test. Once Paco's grade-level raw score is obtained, a standard score can be assigned, as is typical for any group test.

The result: Paco's raw score of 10 on the grade 8 exam (or a raw score of 20 on the grade 5) has been equated through the scaling process to a standard score of 1300 on the grade 8 assessment. In the same manner, we can validly report the scaled score of a 10th grader who took the grade 5 assessment and received a raw score of 20, which in our scenario equates to a standard score of 1300.
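Paco's case can be sketched as a pair of lookups. The short Python example below is only an illustration under stated assumptions: the table names and the function `grade8_standard_score` are hypothetical, and the tables hold only the figures given in the example above. A real equating table would come from the test publisher's technical manual.

```python
# Hypothetical equating tables. Only the figures from the Paco example
# are filled in; a real table would cover every level and raw score.

# (test level taken, raw score) -> equivalent raw score on the grade 8 test
RAW_TO_GRADE8_RAW = {
    (5, 20): 10,  # raw 20 on the grade 5 test equates to raw 10 on grade 8
    (8, 10): 10,  # a grade 8 raw score maps to itself
}

# grade 8 raw score -> grade 8 standard score
GRADE8_RAW_TO_STANDARD = {
    10: 1300,
}


def grade8_standard_score(test_level, raw_score):
    """Convert a raw score from any scaled level to a grade 8 standard score."""
    key = (test_level, raw_score)
    if key not in RAW_TO_GRADE8_RAW:
        # No equating data: the score cannot be validly reported at grade 8.
        raise KeyError(f"no equating entry for level {test_level}, raw {raw_score}")
    grade8_raw = RAW_TO_GRADE8_RAW[key]
    return GRADE8_RAW_TO_STANDARD[grade8_raw]


# Paco took the grade 5 level and earned a raw score of 20.
paco_score = grade8_standard_score(5, 20)
print(paco_score)
```

The lookup fails for any level or score that was never equated, which mirrors the point of this section: a lower-level score can be reported with grade-level scores only if the scaling work has been done.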

Basic Considerations
If your district or state is considering, or already using, out-of-level testing, the first thing that must be examined is whether the state and district assessments have been scaled for this purpose. Regardless of the type of test (an off-the-shelf norm-referenced test, a criterion-referenced test or a standards-based test), it must be scaled in the manner described above in order for the test to psychometrically allow for a grade-level student to take a lower-level test or for a younger student to take a test at a higher level.

Some test publishers allow a test one grade level below to be administered. However, when equating is conducted, the equating sample must include a random sample of all students who may be taking an out-of-level test. Check the test's statistical manuals for that detail.

And don't overlook several basic but critical considerations in the current landscape of accountability and assessment systems:

  • Effective instruction is critical to the learning and achievement of all students.


    We can only blame a test for so long. At some point we need to examine what goes on in the classroom. Many students are not learning at grade level.

    Historically, out-of-level testing has been used because students are not on grade level when district assessments are administered. Most recently, it has been widely used with students with disabilities who often achieve at lower performance levels than their peers. As a result of changes in federal education laws, they now must participate in state and district assessments.

    The bigger issue remains, however: What does instruction look like and sound like? What kind of sustained professional development is being implemented to address needs and concerns of instruction and assessment?

  • The central importance of assessment must be kept in mind.


    Most states have established content standards and about half have established performance standards for students. However, not all students were kept in mind when standards were developed (students with limited English proficiency, students with disabilities, etc.). This situation contributes to lowered expectations for some students among many administrators and teachers.

    When a lack of expectation for student performance exists, there is a general lack of concern about progress, especially if educators know a student will be given an out-of-level test, exempted or excluded from an assessment and scores will not be reported.

  • Who is ultimately responsible?


    If out-of-level testing is an option for student assessment, who will ultimately be responsible for all students learning what is expected? What does this mean for students who must pass an exam to graduate? The purpose of graduation exams is to certify mastery of standards. If some students are continually given out-of-level tests, what will become of them at the time of the exit exam? Will an out-of-level exit exam become an option?

Where to Go From Here?
Clearly, many issues surround the use of out-of-level testing. You need to know where your state and district stand on the use of out-of-level tests and on their psychometric soundness and validity. How is out-of-level testing integrated into system accountability?

Hard discussions need to take place that focus on why students are in need of out-of-level tests. Begin by asking these tough questions:

    1. What is the purpose of the state or district assessment?

    2. Was the assessment designed to have different levels?

    3. What are the unintended consequences of out-of-level testing?

    4. What is the real purpose of out-of-level testing--to help the student or escape system accountability?

    5. If out-of-level testing is used to inform instructional decisions, how can students be assured of the opportunity to reach high standards?

    6. What instructional practices are currently used? Are they empirically sound and do they reflect best practice?

Judy Elliott is assistant superintendent for special education in the Long Beach Unified Schools, 1515 Hughes Way, Long Beach, CA 90810. E-mail: jelliott@lbusd.k12.ca.us. She formerly worked as a research associate at the National Center on Educational Outcomes at the University of Minnesota. Martha Thurlow is director of the National Center on Educational Outcomes.