Times Of Plenty

What have they gotten into up there on 8th Avenue? It’s like catnip for school reformers. The Times’ editorial page again comes out hard on Race to the Top. Meanwhile, the magazine takes a look at the SEED school, runs a really smart package on “remaking education,” and Todd Farley discusses test grading on the op-ed page.

What do toilet water and test scoring have in common? The Farley piece has implications for scoring technology, where, somewhat counterintuitively, the reliability problems are actually smaller than with human scorers. But no one wants to believe that, because it challenges a deeply held belief. Likewise, although the technology exists to turn toilet water into clean drinking water (a big deal in places with water shortages), it’s a hard sell because it, too, challenges a deeply held belief system.

One Reply to “Times Of Plenty”

  1. Mr. Farley sounds like a great guy. And I sure do applaud him for being the first person I know of in the last 15 years to actually expose testing companies for what they are and test scoring for what it is. However, the fix he suggests is no better than the current situation he deplores.

    He suggests using committed education professionals to score tests rather than temps like himself. Logically, that should improve things. But it does not.

    Research on educators’ ability to accurately assess student performance goes back to the turn of the twentieth century. Much of it is detailed in Robert Marzano’s “Transforming Classroom Grading.” In study after study, teachers (committed education professionals) are shown to be the WORST and LEAST ACCURATE scorers of student work.

    There are many reasons for this, none of which we need to go into here. Suffice it to say that I have been in Mr. Farley’s position many times and have found that the teacher-scorers I was supervising tended to be less accurate in their scoring than intelligent, well-trained non-educators.

    The solution to the inaccuracy Farley describes is a simple statistical practice known to reduce error: using multiple raters. Any rating of anything can be expressed as Truth + Error. Sometimes the Error is high, sometimes it’s low. But if we put enough independent ratings together, the random Error components tend to cancel each other out and we get closer to “the truth.”

    Would this be more expensive? Of course. Is it practical? Yes. Does it work? Yes. And even if we do move to computerized scoring of everything, we will still likely benefit from random sampling checks by small groups of intelligent, well-trained humans WHO ARE NOT committed educators.
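The multiple-raters argument in the reply above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from Farley or the commenter: each rating is modeled as a hypothetical true score plus random error drawn independently from a normal distribution with mean zero. Under those assumptions, averaging k ratings shrinks the typical error roughly in proportion to 1/√k. Errors that are systematic (for example, a bias shared by every rater) would not cancel out this way. The function name `simulate_scoring` and all parameter values are illustrative.

```python
import random
import statistics


def simulate_scoring(true_score, num_raters, error_sd, trials=10_000, seed=0):
    """Return the average absolute error of the mean rating over many trials.

    Illustrative assumption: each rating = true_score + error, where each
    error is independent and normally distributed with mean 0 and
    standard deviation error_sd.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    total_abs_error = 0.0
    for _ in range(trials):
        # Truth + Error for each rater on the same piece of student work
        ratings = [true_score + rng.gauss(0, error_sd) for _ in range(num_raters)]
        # How far the averaged rating lands from the true score
        total_abs_error += abs(statistics.mean(ratings) - true_score)
    return total_abs_error / trials


# Compare the typical error of a single rater with averages of 2, 4, and 8 raters.
for k in (1, 2, 4, 8):
    print(k, round(simulate_scoring(true_score=3.0, num_raters=k, error_sd=1.0), 3))
```

Under these assumptions, going from one rater to four should roughly halve the typical error, which is the cost-versus-accuracy trade-off the commenter is pointing at.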
