Testing Precision & Accuracy Problems: Supporting Detail

This page is a collection of detailed explanations to support the page evaluating MAP testing and the discussion of how implementing testing lowers standards.

(12/27/08)

Accuracy Considerations
A guide for choosing a test

Purpose vs. outcomes of standards

The standards movement started with good intentions. Business leaders noticed that students were coming out of high school with a lot of knowledge but no real skills. The graduates could not take on major projects, and they could not balance independent initiative with cooperation. College professors noticed that students were dependent learners: they failed to ask critical questions, limited their learning to the course outline, and learned only what was required for the test. Citizens' groups noticed that recent graduates could not interpret the news or interpret numbers. People could not estimate numbers related to their own accounts, or understand common numerical information in the news.

So various groups set out to create standards. The standards were intended to improve real performance in real-life circumstances. People needed to learn to work cooperatively, be more self-aware, integrate complex information, solve non-routine problems, and evaluate the quality of diverse information. These were the intentions of the standards movement.

However, the standards movement quickly became political. Committees set up by government agencies created detailed lists of knowledge and skills that students should be expected to learn. The knowledge and skills were reduced to their elemental parts and based in tradition. State curricula were created to define when, where, and how each of those skills would be taught. Standardized tests were created to measure this collection of knowledge and skills. The Federal government passed a regulatory program (NCLB) mandating that all students be able to pass those tests and stipulating that school funding would depend on the results. In the end, schools modified their strategies to fill students with the required skills and knowledge.

Schools reduced their emphasis on critical thinking, project-based learning, cooperative learning, and subject integration. They increased their emphasis on test skills, basic knowledge, and rote skills. Under state and federal mandates, schools all over the nation have been reducing their focus on high-level achievement so that they can invest their energies in preparing students for high-stakes tests. The standards movement, under political mandates, ended up promoting the very problems its originators had intended to solve.
Multiple Measures

Achievement and academic potential cannot be reduced to a single number and still carry much meaning. Every number we use has its own limitations, and no number tells the whole story. We can review some common measures and their pitfalls:

Attempts to reduce measures of cognition, learning, and education to simple numbers can prove very counterproductive. The information omitted is always greater than the information measured. The information that proves most important may never get measured, may get measured incorrectly, or may get diluted when the various measures are averaged together.
Acceleration through low level skills vs. high achievement

The word "level" creates much confusion in education. Level can mean grade level, or knowledge level, which refers to the knowledge and skills students should have by a given grade. Level can mean cognitive level, which typically refers to Bloom's Taxonomy. Or level can mean achievement level, which is similar to Bloom's Taxonomy but focuses on real outcomes. School administrators and teachers frequently confuse these definitions. But knowledge level has nothing to do with cognitive level or achievement level. Measures of knowledge level, such as RIT scores, focus on the two lowest cognitive levels and ignore the highest ones. Complex thinking and achievement are totally ignored in grade-level measures. Students can work at a very high cognitive level while still demonstrating a low knowledge level, and they can demonstrate amazingly high achievement while retaining very little knowledge. But in most schools, particularly those that use standardized tests to guide curriculum, students are expected to race through knowledge without being given time to attain high cognition or high achievement.

This is the distinction between accelerated learning and high level learning. Accelerated learning is usually low in both cognition and achievement. Test scores may be high, and students may learn information usually reserved for higher grades, but their ability to use that information wisely is rarely supported. In high (cognitive) level learning, students invest great energy in building their understanding, reasoning through information, and evaluating and integrating information. Students develop their ability to think, but they may fail to memorize facts needed to pass tests. Test-based programs emphasize accelerated learning over high achievement, and this creates a problem for high achieving students placed in accelerated programs.

Curriculum for core classes, such as math, is designed for the average student at that grade level. State and Federal laws require that the class be assessed using tests designed to identify underachieving students. So placing high achieving students in accelerated classes means they are instructed according to the needs of average students and assessed according to the needs of underachieving students. Although this tends to produce high test scores, it also tends to produce superficial learning. Students earn high scores against low standards, so that nobody has to risk low scores against high standards.
Standards for High Achievers vs. Standards for Low Achievers

The question, then, is: should standards for high achievers be the same as those for low achievers, only faster? Programs for high achievers should involve deepening their thinking. The students should be given non-routine problems that require reasoning and demand their own search for new knowledge to apply to the problem. They should be required to communicate what they have discovered or created, and how they overcame challenges along the way. They should be able to discuss the similarities and differences between the problem they solved and other real-life problems. This view of achievement is embodied in the ELS Design Principles, the NCTM Standards, the AAAS 2061 Goals, and the inquiry-based learning philosophy. There is no straightforward way to test high achievement.

Programs for low achievers usually involve very specific lists of knowledge and skills that must be learned. The information is sequenced from easier to harder. Periodic assessments are performed to ensure that the knowledge is being learned, and standardized tests are used to measure how much of the information students actually retained. This is probably not the best method for teaching low achievers; it is used simply because it is easy to measure and reinforce. Research has shown that most teachers teach to the needs of their low achievers.

When I started at my school, we were all encouraged to use the methods for high achievers with all students. Using the expeditionary inquiry-based model, we achieved outstanding results with the top half of the class but doubtful results with the bottom half. In response to this challenge we hired a curriculum coordinator. Within two years, we switched to using the methods designed for low achievers with all students. This improved test scores but eliminated the phenomenal successes of the high achievers.
Test Results: Information vs. Measure

We test because we need information. School administrators need to know which specific goals to focus on. Teachers need to know specifically which skills to remediate and how fast they can move through material. Students need to be told which specific concepts to review and how to improve. The goal of testing is to provide this information. However, the methods we use to grade tests undermine this very goal. We grade tests by scoring them, by returning a number. What does that number tell us? Using a single number to score tests washes out the very information we need. The state may tell us that a student scored a 612 on the EOG, or NWEA may tell us that the student got a RIT score of 231. But what exactly does this mean? The number does not tell us which specific skills the student did not understand. It does not tell us what caused the student's difficulty. It does not tell us what should be changed. These test results provide almost no useful information to guide decisions about future instruction.

This is the problem with using test scores to drive school improvement. The scores do not provide the information that educators need to improve student knowledge, comprehension, and achievement. Test scores can even discourage educators from distinguishing between knowledge, comprehension, and achievement. To guide improvement, educators need itemized information rather than single numbers. The information gathering needs to ask, "What appears to be the greatest problem?" and "What appears to be the cause of the problem?" These questions need to be asked for individual learners as well as for the group. Current testing methods do not provide this type of information.
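To make the point concrete, here is a minimal Python sketch of how a single score can hide exactly the information those questions ask for. The 10-item quiz, its skill tags, and the helper names `raw_score` and `missed_skills` are hypothetical illustrations, not how the EOG or MAP actually report results. Two students earn the same raw score, but the itemized view shows that one has a single concentrated gap while the other has scattered misses.

```python
from collections import Counter

# Hypothetical 10-item quiz: each entry names the skill that item tests.
ITEM_SKILLS = [
    "fractions", "fractions", "fractions",
    "decimals", "decimals", "decimals",
    "geometry", "geometry",
    "word problems", "word problems",
]


def raw_score(responses):
    """Single-number view: how many items were answered correctly."""
    return sum(responses)


def missed_skills(responses):
    """Itemized view: how many items were missed, grouped by skill."""
    if len(responses) != len(ITEM_SKILLS):
        raise ValueError("one response per item is required")
    return Counter(skill for skill, correct in zip(ITEM_SKILLS, responses)
                   if not correct)


# Two hypothetical students (True = correct answer).
student_a = [False, False, False, True, True, True, True, True, True, True]
student_b = [True, True, True, True, True, False, True, False, True, False]

for name, responses in (("A", student_a), ("B", student_b)):
    print(name, raw_score(responses), dict(missed_skills(responses)))
# A 7 {'fractions': 3}
# B 7 {'decimals': 1, 'geometry': 1, 'word problems': 1}
```

Both students "scored 7," yet student A likely needs targeted work on one concept while student B's misses point to no single cause. A report that keeps only the 7 cannot distinguish these cases.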

Accuracy vs. precision

Analogies to test accuracy

We can demonstrate the problem of test scores using analogies. If you are not a teacher, imagine that your company wanted a quantified measure of your work. Do you think it could create an assessment that successfully ranked each worker by a single number? Here are some analogies:
|
Precision Considerations

Standard Deviation, r-value, and differentiating instruction

Standard deviation (sigma) describes how much repeated measurements tend to vary around their average, which for an unbiased test is the student's true score. To use any measure reliably for decisions about individuals, its standard deviation must be small compared to the distinctions you wish to make. How small it must be depends on the quality of the measure you need. As a rule of thumb, you will want twice the standard deviation (2 sigma) to be smaller than the distinctions you need to make. For example, suppose you are creating three math groups (above grade level, at grade level, and below grade level) and want to use test scores to place your students. If your grade-level range spans 20 points and your precision estimate (1 sigma) is 3 points, you still run a high risk of placing any student who scores within 6 points (2 sigma) of a dividing line into the wrong group.
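The rule of thumb can be checked with a little arithmetic. The Python sketch below assumes that measurement error is normally distributed with the stated sigma and treats each dividing line separately; the function names are hypothetical. It reports the chance that a student whose true score sits a given distance from a cutoff is observed on the wrong side of it, and what share of a 20-point band lies within 2 sigma of either boundary.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def misplacement_probability(distance: float, sigma: float) -> float:
    """Chance that a student whose TRUE score lies `distance` points from a
    cutoff is OBSERVED on the other side of it, assuming normally
    distributed measurement error with standard deviation `sigma`."""
    return normal_cdf(-abs(distance) / sigma)


def risky_fraction_of_band(band_width: float, sigma: float, k: float = 2.0) -> float:
    """Share of a score band (cutoffs at both ends) lying within k*sigma
    of either cutoff. This is a share of the score range, not of students."""
    risky = min(band_width, 2 * k * sigma)
    return risky / band_width


if __name__ == "__main__":
    sigma = 3.0
    for d in (0, 1.5, 3, 6):
        p = misplacement_probability(d, sigma)
        print(f"{d:>4} points from a cutoff: {p:.1%} chance of misplacement")
    share = risky_fraction_of_band(20, sigma)
    print(f"Share of a 20-point band within 2 sigma of a cutoff: {share:.0%}")
```

Under these assumptions, a student exactly at a cutoff is a coin flip (50%), a student 1 sigma away is misplaced about 16% of the time, and only at 2 sigma does the risk drop to roughly 2%. With sigma = 3, 12 of the 20 points in the grade-level band (60%) fall inside that risky zone.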
|
Standard Deviation: overview

Expected Growth
MAP growth precision

Random Chance Factors in Bubble Tests
|
Statistical & Learning Errors

High Achievers vs. Outliers

Faith in testing frequently results from the innumeracy of those promoting it. Promoters of testing typically don't understand its numerical concepts. In response to testing reliability reports I made to our committee, our curriculum coordinator e-mailed me the following response: "Karl, I havent digested this fully, but reading the highlighting I do recall from my psychometrics text that all standardized tests correlate to negative growth for high achievers, simply because they are the out-liers and tend to regress toward the mean."

The errors in this statement are worthy of serious discussion. The statement demonstrates serious innumeracy, a habit of learning low on Bloom's Taxonomy (memorizing without understanding), and extremely low expectations for high achievers. The first problem is the failure to distinguish between a valid outlier (an extreme score) and a large testing precision error (a doubtful score). For a real outlier (e.g., a high achiever), repeated testing should produce approximately the same high score. High achievers are not statistical accidents, and their scores have no natural reason to regress to the mean. Any student, regardless of skill level, can have a day when the test score is an outlier relative to his or her own skill level. That personal outlier could be either low or high. On subsequent tests, the student's scores will tend to regress toward his or her own personal mean, not the mean of the group. The e-mail above fails to distinguish between high achievement and test error. It also fails to distinguish between expectations for individuals (regressing toward one's own personal mean) and expectations for the group (each individual tending to retain his relative rank within the group). In fact, high achievers should not regress toward the group mean; they should move progressively farther from it.

The very abilities, attitudes, and effort that made them rise above the rest of the group should continue to define their performance, creating a constantly growing difference between their performance and that of the group. If one driver is going 60 mph while the other drivers go 40 mph, the faster driver will not fall back into the group (regress to the mean); instead, he will keep getting farther ahead. Expecting high achievers to regress to the mean demonstrates a failure to understand both "regression to the mean" and high achievement. The claim also reveals extremely low expectations for high achievers. It suggests a belief that high achievers have no natural abilities, personal attitudes, effort, or background knowledge that lead them to succeed. This attitude sees high achievers as flukes, mere testing accidents, who should be expected to fall back into the norm. There is no natural reason, intrinsic to high achievers, that should cause their performance to fall. Such declines must result from either the test or the instructional methods.

So what does it imply if MAP, and possibly other standardized tests, correlate with negative growth for high achievers? It could imply a few things. It might suggest that the makers of the test did not know how to measure high achievement, so the high end of the test is full of errors and thus unreliable. But that would tend to make scores fluctuate wildly, not simply rise and then fall. Instead, the negative growth implies that schools that depend on standardized testing to guide curriculum make choices that hurt high achievers: testing leads to instructional and curriculum choices that are bad for them, and some aspect of testing and test-based instruction undermines their success. This is not surprising, since most testing is directed at the needs of low achievers. It is therefore advisable to reject any decisions based on standardized tests when scores are in the negative growth range.
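The distinction between a real outlier and a testing error can be simulated. The Python sketch below is a simplified model, not MAP's actual scale or error structure: each student has a fixed true score (an illustrative mean of 200 with SD 10), and each test adds independent normal error (an illustrative SD of 5). It ranks students on a first test and compares how the top 5% do on a retest. Setting `sd_true=0.0` models the view in the e-mail, in which there are no real differences and every high score is a fluke. The function name and all parameter values are assumptions for illustration.

```python
import random
import statistics


def simulate_retest(n=10_000, mean=200.0, sd_true=10.0, sd_error=5.0,
                    top_share=0.05, seed=42):
    """Test the same students twice. Each observed score is the student's
    fixed true score plus independent, normally distributed error.
    Returns (avg true score, avg test 1, avg test 2) for the students who
    ranked in the top `top_share` on test 1."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        true_score = rng.gauss(mean, sd_true)
        test1 = true_score + rng.gauss(0.0, sd_error)
        test2 = true_score + rng.gauss(0.0, sd_error)
        students.append((true_score, test1, test2))
    # Select the top group using test 1 only, as a placement decision would.
    students.sort(key=lambda s: s[1], reverse=True)
    top = students[: int(n * top_share)]
    return (statistics.fmean(s[0] for s in top),
            statistics.fmean(s[1] for s in top),
            statistics.fmean(s[2] for s in top))


if __name__ == "__main__":
    true_avg, t1_avg, t2_avg = simulate_retest()
    print(f"Real differences: test 1 {t1_avg:.1f}, test 2 {t2_avg:.1f}, "
          f"true {true_avg:.1f}")
    _, t1_fluke, t2_fluke = simulate_retest(sd_true=0.0)
    print(f"All differences are error: test 1 {t1_fluke:.1f}, "
          f"test 2 {t2_fluke:.1f}")
```

In this model the top group does slip a little on the retest, but only by the luck it had on test 1: it lands near its own average true score, still far above the group mean. Only when every difference is assumed to be test error (`sd_true=0.0`) does the top group fall all the way back to the group mean. The model holds true scores fixed, so it says nothing about growth rates.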
Summation

In these discussions we have noted that standardized testing has serious accuracy problems and serious precision problems. We have noted that standardized testing tends to lead schools to lower their standards, and that testing may actually lower the achievement of the top students. These problems are observed by many teachers, but are never acknowledged by the politicians who pass laws requiring testing or by the school administrators who implement testing in schools.
Footnote: Myers-Briggs Personality Types and Learning

In the Myers-Briggs profile, those who tend to be most proficient at math are introverted intuitive thinkers (INTP and INTJ personalities). They readily understand big-picture concepts, intuit strategies to solve complex problems, create new approaches, and reflect on their own learning. They frequently show a rapid grasp of general concepts, even while making unusual errors with the details. They learn new ideas in a seemingly random order, because the relationships between ideas drive their thinking. Those who manage schools and design curriculum programs tend to have sensing-judging (SJ) personalities. They make lists of everything that must be learned. They make calendars scheduling when each detail must be learned. They prescribe the official order for learning, typically starting with small, easy details, building into more complex skills, and ending with application of those skills. While laboring to ensure that all the details are laid out in the right order, they frequently fail to support the very purpose of the learning. The SJs create curriculum designs that do not support the instinctive intuition of the INT personalities. These designs do not promote the learning and working styles used in engineering and science, careers that INTs tend to gravitate toward. The NCTM Standards, the Expeditionary Design Principles, and the inquiry-based learning model were all designed to acknowledge the learning styles of intuitive introverts, as well as the goal of teaching how to do real work with the skills one is learning. In contrast, most states have set as their high school math standards a list of skills that will not support the vast majority of students in achieving real outcomes as adults.
|