Understanding Standardized Test Scores

 

Last edited October 2005, Last reviewed September 2007

 

Understanding the true meaning of standardized test scores requires more information than the numbers themselves provide. If we are forced to use these tests out of political necessity, we should at least be wary consumers of the data. To show how to “assess the assessments,” we will use the widely known CTB/McGraw Hill Terra Nova test as an example.

Let's briefly review the basics. Terra Nova scores, like those of many similar tests, are expressed as percentiles. An overall score at the 50th percentile means that roughly half of the hundreds of thousands of children taking the test nationwide will score higher and roughly half will score lower. Note the tense in that sentence. Since not all of the tests nationwide have necessarily been given by the time a particular child takes the test, the percentile score the child receives is actually a statistical prediction of the percentile most likely to result. We will discuss the overall score here, which is composed of the reading, language and mathematics sub-scores. The same principles apply to each of the sub-scores.
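To make the idea of a percentile concrete, here is a minimal sketch in Python. It is not the publisher's scoring method; the function name `percentile_rank` and the small list of raw scores are made up for illustration, and the comparison group simply stands in for the national group described above.

```python
def percentile_rank(score, comparison_scores):
    """Return the percent of comparison scores that fall strictly below `score`.

    Illustrative definition only; a test publisher's actual norming
    procedure is more involved and is not described in this article.
    """
    if not comparison_scores:
        raise ValueError("comparison_scores must not be empty")
    below = sum(1 for s in comparison_scores if s < score)
    return 100.0 * below / len(comparison_scores)


# Hypothetical raw scores (number of questions answered correctly).
comparison_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

# Five of the ten scores are below 25, so this child ranks at the 50th percentile.
print(percentile_rank(25, comparison_group))
```

The point of the sketch is that a percentile says nothing about how many questions a child answered correctly; it only says where the child falls relative to everyone else.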

A score at the 50th percentile is defined as the national average. This is related to the often-discussed bell curve: if national scores are charted on a graph, most scores are grouped in the middle, and gradually fewer scores are found toward the low and high ends. The line on such a graph forms a bell shape. Remember, the writers of this test want to create an assessment vehicle that results in such a bell-shaped graph, so they purposely seek to include questions that their research shows are answered correctly only about half of the time. Test writers are looking to rank students, not rate them.

There are two simple ways to examine a child's overall test score critically and meaningfully. First, examining a child's score in the context of his own previous school performance gives a sense of how accurate or inaccurate the testing was, and it may also provide insight into the child's state of mind during the testing. Second, putting the child's overall score in the context of the overall scores of other children in the same classroom, and in the same grade in the same school, provides a means to further refine our understanding of what actually happened on test day. We'll go over the fine points of using each context below.

The context of previous performance

Parents and teachers know each child and know his areas of academic strengths and weaknesses. The longer he has been in school, the more that is known. From his homework, from his portfolios used in lower grades, from his performances on classroom tests and quizzes, from parent-teacher conferences, from his report cards, and from similar sources there is already a clear picture of a child's academic progress. For our example here, let us suppose that a particular child is very strong in language and writing, he is at expected grade level in mathematics, he struggles somewhat with spelling and that, by and large, his teacher finds his progress satisfactory. In other words, this child's progress compares positively with what is expected at grade level and with his peers.

Now, consider the potential meaning of overall Terra Nova scores based on this background information. (Incidentally, for our discussion we will leave out the possibility that test scores will be inaccurate due to information and strategies not covered in this child’s classroom.) If the overall score of the child in our example is something between, say, the 45th percentile and 55th percentile then perhaps the test will verify what is already known. No new knowledge has been gained.

If the overall score for this child is much higher, say in the 60th to 75th percentile, then some questions are in order. We are confident in the real-world assessment of this child, based on years of previous data gathered from the sources mentioned above. Why would this child score peculiarly high? There could be several explanations. For one, he may have been introduced to test-taking strategies, which are known to artificially raise scores due to awareness of test procedures rather than true knowledge or ability. For another, he may be in a classroom where extra support was given during the testing. Consider that some classrooms with students who have special needs might have the test proctor read the test questions aloud rather than have the students read the questions themselves. Sometimes teachers make thinly veiled suggestions too.  Another possibility is that the lessons in this child's classroom were, by design or by accident, particularly well "aligned" with the types of questions on the test.

If the percentile score for this child, whom we know from real-world experience to be in the middle of his peer group, is even higher than 75, then we need another explanation. Either everything we thought we knew about this child based on years of experience is wrong, or there has been some degree of coaching for the test.

What of a score lower than the average; say in the 35th to 45th percentile? Again, based on what we know from the real-world experiences of our example child, this sort of score is cause for concern. The reason for the lower than expected score could be due to the child being a poor test-taker, or due to the child feeling stressed or under the weather on test day. It could also be because the lessons taught in his classroom are normally evaluated through performance-based assessments, student projects and other hands-on measures and that the child simply does not have experience taking bubble tests. It is a strong possibility that the test questions did not relate well to the culture and life experiences of our example child, leaving him to question the questions themselves. The difficulty of urban minority children in relating to test questions is well documented, but the same principle can apply to country-suburban students and others. Parents and families in a school district come from distinct cultural, religious and regional backgrounds. It is not inconceivable that a child taking the Terra Nova could lack personal understanding of a picture used in a test question or lack the experience described in a reading passage for a group of questions. This would directly affect the score.

Suppose the overall score for our example child is even lower than the 35th percentile, perhaps going down to a single-digit percentile. Now our concern must multiply. For a child whom we know from years of experience to be on target for his peer group, a score this low can only mean that something went seriously wrong on test day: the child shut down and did not participate in the test, he was not able to perform at all on this type of assessment, or he completely misunderstood the directions for taking the test.

Let's review so far. We understand a particular child very well from many years' worth of real-world experience. If this child's Terra Nova score corresponds to this understanding, we may have confirmed what we already know. If this child's score is much higher than our understanding would lead us to expect, then it is important to determine whether the reason is extra test preparation or classroom support during the test, lessons that are overly focused on test subjects, or something similar. If this child's score is much lower than we would expect based on our understanding, it is critical to determine whether the child "froze up" during the test, misunderstood the directions, or was feeling stressed or ill during the test period. All of these questions should be raised immediately, especially since precious months have likely gone by between the time the test was given and the day we received the test results.
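The reasoning above, for our example child who is expected to score near the 50th percentile, can be sketched as a small lookup. This is only an illustration: the function name `interpret_overall_score` and the labels are invented here, the cutoffs are the approximate ranges used in this article rather than anything published by the test maker, and the 55th to 60th percentile range, which the discussion above does not address, is simply labeled "borderline" as an assumption.

```python
def interpret_overall_score(percentile):
    """Label a score for a child whose real-world record suggests the 50th percentile.

    Cutoffs follow the approximate ranges in the article; "borderline"
    (between 55 and 60) is an assumption made for this sketch.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 35:
        return "far below expectation"   # shut down, or misunderstood directions?
    if percentile < 45:
        return "below expectation"       # stress, illness, unfamiliar test format?
    if percentile <= 55:
        return "consistent"              # confirms what is already known
    if percentile < 60:
        return "borderline"              # not covered by the article
    if percentile <= 75:
        return "above expectation"       # test-prep, extra support, aligned lessons?
    return "far above expectation"       # coaching, or our picture of the child is wrong


for score in (8, 40, 50, 70, 90):
    print(score, "->", interpret_overall_score(score))
```

For a different child the expected range would shift, so the cutoffs would need to be moved to match what parents and teachers already know about that child.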

The context of the group

Having examined a child's test score against the knowledge we have of his previous real-world performance, the next step is to examine the score against the group of children who took the test during the same time. This will give us further clues as to the accuracy of the test results.

The percentile score a child receives is based on a prediction of how all students nationally will do on the test. However, to further evaluate the accuracy of a particular child's score, we need the class level scores for that child's classroom and the grade level scores for that child's grade in the same physical school. This information will not be automatically included in the CTB/McGraw Hill home reports sent to parents; it requires deeper digging.

With the class and grade level scores, we can better judge if an unexpected score for a child is related to him alone or to an issue that affects other students too. Let's say our example child from above receives a percentile score of 35. We are concerned because this is unexpectedly low. In examining the anonymous class level scores for this child's classroom, we note that most of the other scores are centered near the 50th percentile. When we examine the grade level scores we notice the same thing. Now we can deduce that our example child had a bad day or has an individual problem taking this type of test. On the other hand, suppose the class level scores are all well below the 50th percentile, but the other grade level scores exhibit a more expected or typical range near the 50th percentile. Now we can deduce that something is different in this child's classroom. Perhaps this is the only teacher who did not resort to test-prep, or perhaps this teacher is focused on performance based assessments rather than simplistic bubble-test assessments. Further investigation is necessary to know.

In looking at class level scores, we want to compare what we feel is the relative accuracy of this child's score to the scores of others in the class. In other words, if we feel this child's score is off by twenty percentile points either way, we look at the classroom scores to see whether they exhibit a typical bell curve centered on the 50th percentile. If they do, we must further examine this child's performance. If they do not, and they seem just as erroneous as this child's score, we must further examine classroom issues.

Similarly, we must compare the class level scores for our example child's classroom to the school’s grade level scores. This tells us whether the child's class has unique issues, or whether the entire grade, and therefore the entire test or the entire curriculum, departs from the typical bell curve centered on the nationally predicted 50th percentile. Without being able to compare this child's scores to class level and grade level scores from the district, we simply cannot see the entire picture.
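The comparison of child, class and grade described above can be sketched as follows. This is a rough illustration under stated assumptions, not an established procedure: the function name `locate_disparity`, the use of the median as the "center" of a group, and the tolerance of 10 percentile points are all choices made for this sketch, and the score lists are invented.

```python
from statistics import median


def locate_disparity(child, class_scores, grade_scores, expected=50, tolerance=10):
    """Suggest where an unexpected percentile score most likely originates.

    `expected` is the percentile we anticipate; `tolerance` (an assumed
    value) is how far a score or a group's median may stray before we
    call it a disparity.
    """
    def is_off(value):
        return abs(value - expected) > tolerance

    if not is_off(child):
        return "no disparity"
    if not is_off(median(class_scores)):
        return "individual"   # the class looks typical; look at this child
    if not is_off(median(grade_scores)):
        return "classroom"    # the class is off but the grade is not
    return "grade-wide"       # the whole grade departs from expectation


# Hypothetical anonymous percentile scores.
typical_class = [48, 52, 50, 35, 55]
low_class = [30, 35, 28, 38, 33]
typical_grade = [47, 52, 50, 33, 55, 49]
low_grade = [30, 35, 32]

print(locate_disparity(35, typical_class, typical_grade))  # individual
print(locate_disparity(35, low_class, typical_grade))      # classroom
print(locate_disparity(35, low_class, low_grade))          # grade-wide
```

Whatever the sketch suggests, it only points to where further investigation should begin; it cannot say why a class or grade scored as it did.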

After critically examining a child's score and making some determination as to its accuracy, it is important to follow up with the child's teacher. If there is a disparity between real-world observation and the test score, we must find out why, and we must make every effort to ensure that decisions about the future direction of a child's academic career rest on real-world observation and not on a bubble test. It would be tragic for a child to be given overly advanced work, or unneeded remedial support, simply because of an inaccurate score from a test given one day in a classroom and accepted without question by a teacher and the child's parents.

*** Copyright 2005, rationalamerican.com ***

To cite this article:

Painter, John.  Understanding Standardized Test Scores. (May 2005). Retrieved month x, 2xxx, from

 <http://rationalamerican.com/education>


© Copyright 2003-2007 rationalamerican.com.  All rights reserved.