EDDRA


Education Disinformation Detection and Reporting Agency


-- a Gerald Bracey Report on the Condition of Education

A View from the 1999 AERA Annual Conference

I must say our chairman, Jim Guthrie, saw more coherence among the papers than I was able to muster. I will address three of them individually, including Bill Schmidt's, which was not delivered to discussants in advance, but not Allan Odden's, which addressed an entirely different issue from what was listed in the program. I also won't address Jacob Adams' model, since he delivered it so fast (time was running out) that it has slipped entirely from my memory.

Concerning Schmidt…

Bill showed that when you disaggregate TIMSS data into smaller curriculum areas, American students perform very well in some areas compared to their peers in other countries and awfully in others. Bill contended that this variability of performance means that we need national standards. I say poo on that for reasons given below.

Bill also mentioned that students in other nations showed variability in performance from area to area. He alleged that in most countries there are common standards through grade 8. Since students from countries with common standards also show variability from area to area, standards would not seem to be the key.

Given the variability in performance from area to area and given that this differs from country to country, I conclude that these data prove that children do well on material they have studied and not well on material that they haven't. For this information we needed to spend $51 million of taxpayer money???

There is another reason for opposing national standards. As is common with national standards advocates, Bill presented the existence of 16,000 school districts as an anachronism. One could argue that when we had 100,000 we were better off because so much more of the community had to be involved.

More to the immediate point, I was in conversation recently with an Indian educator who stated baldly that the US has the best educational system in the world. Given the constant drumbeat of criticism within this country, I was floored at her comment. The reason, in her mind, was that given the conditions of local control, a great deal of experimentation and innovation is possible and that this is good. I concur. In nations with national standards it is not. I recall reading an article on choice in Holland. Virtually anyone can start a school, but the state-run examination system virtually eliminates the meaningfulness of that choice except that Protestant kids get to sit next to Protestant kids, Muslims next to Muslims, etc. I have never been able to reconcile, nor have I heard anyone else reconcile satisfactorily, the call for standards with the call for choice, charter schools, etc. It is NOT as simple as a means-ends distinction.

Finally, in my November 1998 Research column in Phi Delta Kappan, I used a more precise rendering of the TIMSS-NAEP linking study to show that the top third of our students do almost as well as anyone while the bottom third do almost as poorly. Only Iran, Colombia, Kuwait, and South Africa scored lower than the bottom third. Only three nations underperformed the District of Columbia in math, and only two in science. Similarly, there are states where as many as three-fourths of the students scored above the international average of all 41 nations and states where only 25% are above the international average.

Those that score low are, for the most part, readily identifiable as areas of poverty. Thus, it seems to me that those who will make use of any national standards don't need them, and those who need them won't use them--they've got much more pressing problems. What will national standards do for places that have no textbooks or where science textbooks predict that man might one day walk on the moon?

People who think national standards will do anything ought to reread Savage Inequalities and the brief from Alabama Circuit Judge Eugene W. Reese in Alabama Coalition for Equity v. Hunt and Harper v. Hunt.

 

Steve Klein presented the case for using Computerized Adaptive Testing as a replacement for paper-and-pencil tests. I think this technology has a great deal of merit. I think my book, Put to the Test, succinctly limns the limitations of multiple-choice tests. I think as well that the Annenberg/PBS series, "Minds of Our Own," makes a compelling case that when teaching is lecturing and testing is multiple choice, you are absolutely precluded from knowing if the kids really understood what you were trying to teach (there are exceptions to this in graduate school where stems can be so complex and questions so subtly worded that understanding is demonstrated, but this doesn't happen in K-12). What we need is a technology that offers the economy of paper-and-pencil testing and the power of performance testing. It is possible that CAT will eventually come to embody this promise.

But during the presentation I kept hearing "teacher-proof, teacher-proof." This, of course, was the motto of many post-Sputnik curriculum reformers. They were trying to make materials that would speak directly to children without the intervention of teachers, who were presumed to be the root cause of our educational ills. The result was a disaster, of course. Anything that doesn't include teachers as an integral part of the process from the beginning is doomed. When Steve said, half-jokingly, but only half, that the tests would be housed on some central server in DC or Santa Monica, my eyes rolled towards the heavens.

During the presentation, I found myself wishing, once again, that Dewey and not Thorndike had been the granddaddy of educational research and testing. Thorndike was a control freak who told his graduate students not to bother actually spending time in schools. Thorndike was at home only in a laboratory. He saw schools as incredibly messy. To Dewey, on the other hand, the school was a laboratory. The consequence of Thorndikian paternity is that most research, and virtually all testing, has been seen by teachers as irrelevant to what they do--as it mostly is.

Steve's model also presumes we know a lot more about what it means to learn something than we do. His presumption that "every item is placed on a vertical scale from easy to hard and that this scale is independent of the student's age or grade" contains many unsubstantiated assumptions about learning and what it means to learn material that is taught in school. Perhaps some of this issue will be cleared up by research in "expert systems", but that is also in the future. The model also bypasses the question of what is important to learn and test and who gets to decide.

Thus, while I think that Steve's analysis of the problems of the current system of testing is right on, his faith in his CAT is at least premature and, because it assumes a centralized system of testing, doomed.

Richard Rothstein brought, as usual, a great deal of attention to detail coupled with a lot of overall wisdom. Richard argued that an accountability system should include all of the variables people think are important.

He is dead right that when people are accountable for something, they concentrate on that something and not always in the right ways. When the airlines began to be held accountable for on-time arrivals and departures, many changes in flying took place. First, the time it took to get from one place to another increased. Definitions of "on time" changed in terms of the amount of time allowed, when the clock started ticking, and the criterion event. People were asked to board earlier. Some airlines give your seat away if you haven't shown up at least 10 minutes prior to "departure time" (thereby effectively changing the time). Some airlines close the gate five minutes before the "departure time." Anyone who flies much can easily observe signs of anxiety if a departure time is threatened, suggesting that there lurk sanctions against the employees somewhere in the system.

The entire accountability movement currently is predicated on the notion that educators can't be trusted. The sanctions for failing to meet the standards are almost invariably negative--you can't graduate, you'll lose your accreditation, you'll get fired. Even when the sanctions are positive, they are narrow and simplistic--the superintendent in Alexandria, VA, where I live, gets bonuses for every X increment of test scores on the Stanford 9. I sure hope he's on good terms with his teachers.

But as a consequence of the narrow and negative sanctions, we have a deputy superintendent in Austin indicted for manipulating test scores (one legislator wants to change the punishment for conviction from a misdemeanor to a felony), and a principal in Henrico County, VA, suspended for doing likewise.

And we have outrageously high standards in some instances to which people are being held accountable. In Virginia, 98% of the schools failed the first round of testing. How do we know this failure rate is too high? Well, on the face of it for one thing. But there is a data trail that supports my contention: In TIMSS, only six countries scored higher than Iowa in math, and only one scored higher in science. Students in Iowa score between the 62nd and 68th percentiles on standardized tests. Thus, given Iowa's high standing in the world, one possible definition of "world class" is the 65th percentile on a domestic commercial achievement test.

But in Virginia, some of the schools that failed score between the 75th and 80th percentiles on standardized tests. If they took the TIMSS tests, they would likely outscore every country in the world (with the possible exception of Singapore), but they are local failures.

In this setting, an accountability model like Richard's is probably the best we can hope for. When I say it's the best we can hope for now, I assume that we will continue using the dreary language of educational reform for some time before cycling back through, as John Goodlad insists we will, the more edifying language of educational renewal.

Richard's model asks the school district to describe everything it considers important--including some variables not usually thought of as achievement, but which should be. It requires measures for all variables, not just one such as test scores. Recall that Binet contended that no one test was all that important. What was important was that you use a lot of them and that you use a composite score as Richard is suggesting in his model.

It has the wisdom to use progress over time, not the attainment of some possibly unrealistic standard, as the criterion for deciding if the accountability call has been met.

Some people might recall Gene Glass making a similar proposal in 1978 in connection with the use of high school minimum competency tests. We can only hope that Richard will get more of a hearing than Gene did. I am not optimistic.

 


© 1999 Gerald Bracey
Last updated May 14, 1999