Publish And Perish?

RiShawn Biddle takes issue with my TIME column yesterday about choosing teachers because of this line:

But don’t expect too much help from schools [when it comes to choosing teachers]. There are few formal policies, and in most places parents have little information to go on. Some misguided efforts, such as publishing teachers’ value-added scores in the newspaper, don’t do much more than confuse and scare people.

RiShawn’s all for publishing value-added scores, he’s hardly alone, and I can understand the impulse.  And I do think parents should be made aware if their child is being taught by a teacher with multiple years of unsatisfactory evaluations.  But, as I wrote at the time the LA Times went down this road, there are a few reasons I don’t support publishing individual value-added scores.  Most notably, it’s an incomplete piece of information.  Yes, the predictive leverage of value-added is better than much of the rhetoric would lead you to believe, but it’s not an entire evaluation, nor is it available for all teachers.  It would basically be like publishing the error rates of journalists without context about their beat or output, or of doctors without regard to what they do. And, yes, value-added can address many of the variables – that’s the point – but it’s not a substitute for an overall evaluation that includes other elements and professional judgment.

I find it inconsistent when value-added advocates say that ‘of course a teacher’s evaluation shouldn’t be based just on test scores’ but are then fine with publishing a statistic derived from test scores that – because it’s being published – by default becomes a summative judgment.  In my view, a more constructive approach would be for newspapers to report descriptively on what the value-added data in their community show – overall quality, variance, where high- and low-performing teachers are concentrated, and so on – without linking it to individuals.   And to the point I was trying to make in the column, that information would help parents ask the important questions they should be asking.
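To make that concrete, here is a minimal sketch of the kind of descriptive, non-identifying reporting I have in mind – the school names, teacher IDs, and scores below are all hypothetical:

```python
from statistics import mean, stdev

# Hypothetical value-added records: (school, anonymized teacher id, VA score).
# All names and numbers here are made up for illustration.
records = [
    ("Lincoln", "t01", 0.12), ("Lincoln", "t02", -0.05), ("Lincoln", "t03", 0.30),
    ("Hamilton", "t04", -0.20), ("Hamilton", "t05", -0.10), ("Hamilton", "t06", 0.02),
]

# District-wide picture: overall quality and how much scores vary.
scores = [s for _, _, s in records]
print(f"district mean VA: {mean(scores):+.3f}, spread (sd): {stdev(scores):.2f}")

# Where high- and low-scoring teachers are concentrated, by school,
# without ever linking a score to a named individual.
for school in sorted({r[0] for r in records}):
    school_scores = [s for sch, _, s in records if sch == school]
    share_low = sum(s < 0 for s in school_scores) / len(school_scores)
    print(f"{school}: mean {mean(school_scores):+.2f}, {share_low:.0%} below zero")
```

The aggregates answer the useful questions – how good is teaching here, and how unevenly is it distributed across schools – without turning a noisy statistic into a public verdict on a named individual.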

13 Replies to “Publish And Perish?”

  1. Thank you for admitting this. From my many years of experience, I know that standardized tests cannot be used for high-stakes decisions because there is no security around them. In many (most?) cases, principals encourage teachers to look at the test ahead of time and drill the students on specific items. Of course this invalidates the results. When the Los Angeles Times published the names of “very effective” teachers, they could have been praising the people who drilled on the test each year. We don’t know. The “ineffective” teachers could have been the honest ones. Again, we don’t know. When the Harvard and Columbia researchers based their research on standardized tests, the tests might have been valid or not. Again, we don’t know. I know they tried to eliminate the cheating factor by working with tests given before 2003, but my experience tells me that many teachers drilled on the exact items on the test as far back as the 1970s.

    There is a mountain of reliable research in education to guide us. It’s time to destroy the status quo of “reform” through guesswork and go with what the research has been telling us for over fifty years. (Hint: the health and early experiences of the child are vitally important.)

  2. It could also be argued that it’s inconsistent to defend VAM as a relatively reliable and accurate measure of teacher effectiveness but then balk at sharing it with the public for fear that it’s incomplete. Well, yes, but we’ve seen that it’s fairly good at what it does nonetheless. Is it wrong for it to be made public along with that caveat? And on the topic of your latest column: if we’re moving toward more “teacher choice,” should there or shouldn’t there be more publicly available information concerning teacher quality?

  3. I’m still confused about why undermining random assignment into classrooms in this way is seen as compatible with – or even complementary to – VAM. Having parents do more to sort their kids to different teachers would severely *reduce* the already-modest “predictive leverage of value-added”.

  4. Our own National Academy of Sciences says that we should not use VAM for high-stakes decisions:

    http://bit.ly/xuF7US

    Research in Educational Researcher, one of our very top, methodologically stringent, peer-reviewed education research journals, notes the methodological concerns about VAM:

    http://bit.ly/A55Ozd

    For educational policy and practice, we should be choosing actual data, scholarly analysis, and peer review over anecdotes, personal opinions, and predetermined think tank reports EVERY TIME.

  5. I have degrees in mathematics and physics. For those of you who care about the mathematical rigor of VAM, it is there. The issue is with the assumptions of the model.

    The question is whether VAM controls for INCOME AND DEMOGRAPHICS. There is no argument in the edu-reform movement that these two variables are excluded from the two VAM models: the one at SAP and the other at UW-Madison. Both were made by economists.

    As a bit of history, economists have been laboring since the 1970s to bring a sense of quantitative analysis to a more or less fly-by-the-seat-of-the-pants, descriptive, weak discipline. This relatively new field is called mathematical economics.

    As one of my graduate professors in econometrics told me: “Econometrics rests entirely upon a priori assumptions.”

    Now back to the issue at hand. Income and demographics account for nearly 85% of the variation in student test scores. The other 15% – the variation that can be explained in a statistically significant way – is related to teacher quality. Thus, when edu-reform advocates talk about the power of good teaching, they are talking about this 15%.
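    That 85/15 split can be illustrated with a toy simulation – the weights below are assumptions chosen to reproduce the split described above, not estimates from any real VAM:

```python
import random
from statistics import mean

random.seed(0)

# Toy model: each student's score is driven mostly by an income/demographics
# index and only partly by a teacher effect. The variance weights are picked
# to match the 85/15 split in the comment above; all numbers are illustrative.
n = 10_000
demo = [random.gauss(0, 1) for _ in range(n)]    # income/demographics index
teach = [random.gauss(0, 1) for _ in range(n)]   # teacher-quality effect
w_demo, w_teach = 0.85 ** 0.5, 0.15 ** 0.5       # sqrt of variance shares
score = [w_demo * d + w_teach * t for d, t in zip(demo, teach)]

def var(xs):
    m = mean(xs)
    return mean((x - m) ** 2 for x in xs)

# Share of score variance explained by demographics alone: what's left after
# removing the demographic component is only the teacher-driven part.
resid = [s - w_demo * d for s, d in zip(score, demo)]
share_demo = 1 - var(resid) / var(score)
print(f"variance explained by demographics: {share_demo:.0%}")
```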

    The VAM creators openly admit that their model is not ROBUST enough to incorporate income and demographic data without CAUSING their explanatory variables to generate coefficients that have very low t-statistics.

    Thus they have a VERY SERIOUS AUTO-CORRELATION PROBLEM when the model is built the way it should be.

    So, right now, there is hidden in the teacher variables a tremendous amount of explained variation that is DUE TO VARIABLES THAT HAVE BEEN EXCLUDED FROM VAM MODELS.

    This is a SERIOUS DEFECT and it is one that should ENTIRELY INVALIDATE present VAM MODELS.

    But what is the edu-reform movement saying? Well, they say that something is better than nothing. This is, on its face, risible.

    As a tongue-in-cheek comment to end this piece: South Africa has now passed a law that imprisons and fines meteorologists for issuing incorrect weather predictions. I kid you not.

    Perhaps it is time to do the same thing for bloggers, journalists, economists, and all edu-reformers. Unless they demonstrate a 90% accuracy rate, they should be fired, banned, and imprisoned for faulty research.

    Their faulty predictions have cost the economy trillions of dollars in BAD DECISIONS made by free market participants. And as an aside, what do they have to fear? The good predictors should get merit pay.

    Perhaps we could pass a law that states simply: academics and bloggers are no longer allowed to post academic qualifications. They are, by law, required to post their accuracy rate. And if they falsify that rate, they will be imprisoned.

    Isn’t it time for the edu-reform movement to spread throughout the free markets? Oh we could also outlaw that long treatise at the bottom of your mutual fund report that says that past performance is NO GUARANTEE OF FUTURE PERFORMANCE.

    Cheers.

  6. Bill Jones writes:

    “Now back to the issue at hand. Income and demographics account for nearly 85% of the variation in student test scores. The other 15% – the variation that can be explained in a statistically significant way – is related to teacher quality. Thus, when edu-reform advocates talk about the power of good teaching, they are talking about this 15%.”

    Interesting. So only 15% of the variation is “related” to teaching, or more broadly put “instruction.” And of that 15%, how much is within the control of individual teachers? Because instruction includes such things as:

    technology in the classroom
    textbooks and curriculum
    safety and discipline
    attendance rates
    support for special education/special needs
    class size

    And probably dozens of other instructional factors that I’m forgetting.

    If 85% of variation can be explained by income and demographics, then by definition all other factors combined account for the remaining 15%. So again I ask, what part of that 15% is really under the control of individual teachers?

    Because if individual teachers account for the entire 15% then we are saying that technology, class size, school safety, curriculum, etc. have no measurable effect.

  7. @Paul, that’s a good point to consider. If this were the direction we’re heading, we’d have to perhaps randomly sort which students get into a selected class, or only allow a small number of spots to be filled via choice. Other options?

    @Scott, I get the draw that appeals to authority have, but why also the faulty assumption that the experts who argue in favor of using VAM are only dabbling in “anecdotes, personal opinions, and predetermined think tank reports”?

    @Bill, degrees in math and physics do not make you an expert on VAM. Are you a researcher who has worked with VAM? Your last comments on this blog suggest you are a teacher. Your ideas about accuracy also suggest that maybe you’re not the go-to guy on this.

    @Kent, we already know that in-school factors are not the largest influence on student outcomes (teacher quality is considered the largest of these). However, that fact doesn’t put a ceiling on the influence that in-school factors could have.

  8. @Chris Smyr: Thanks for the thoughtful pushback. I used the quote that grabbed you ’cause that’s what’s primarily being put forth here in Iowa to accompany the Governor’s new education legislation proposal. I know there’s some other research out there (although I’m not sure it carries the same weight as what I just cited)…

  9. @ Chris Smyr,

    I actually think Bill raises some points you ought not dismiss too quickly. I’ve actually commissioned a VAM study and worked with the results. Although VAM shows promise for certain uses, I think it’s always important to remind oneself of the taken-for-granted assumptions underlying statistical models. It is easy to be impressed by some of the elegant, highly sophisticated techniques that researchers use in building their models and fail to fully appreciate some of the questionable assumptions and attributions that they make on a routine basis. At the very least, these assumptions ought to be scrutinized carefully.

    I wonder what you make of the relatively low year-to-year and even classroom-to-classroom correlations of value-added scores? If value-added scores are such accurate measures of teacher knowledge and skill, why is it that these VA score correlations are .2 in ELA and less than .5 in math? Why so much variation? Wouldn’t one expect a “good teacher” to have comparable scores across a number of class sections? I don’t think it’s simply a matter of measurement error but rather something more fundamental.

    I think many researchers using VA models make some fundamental attribution errors in interpreting their results. They assume that teachers are 100% responsible for student learning, when it is more accurate to say that what they are estimating is the amount of learning gain that students achieve as a result of spending a year in a particular classroom. Yes, a teacher has a huge amount to do with the learning experience in the classroom, but so does the curriculum and how it structures and conveys knowledge. So does the culture of the school, and school and peer norms. Many researchers are implicitly conflating teachers with teaching. They are related but not the same.

  10. To clarify, I meant 100% of the growth in student learning, controlling for the other variables in the VA model…

  11. Ed,

    If Bill wants to discuss the issues he brought up, great. In the other thread I asked him to elaborate; in this thread I’ve simply pointed out that his citing of his degrees seemed rather useless and that his ideas regarding accuracy are slightly unhinged. Since you’ve brought up different topics than he, perhaps we can start there.

    If value-added scores are such accurate measures of teacher knowledge and skill, why is it that these VA score correlations are .2 in ELA and less than .5 in math? Why so much variation? Wouldn’t one expect a “good teacher” to have comparable scores across a number of class sections?

    First, questioning the accuracy of VAM should take into account the notable results from research in the field. The NBER working paper by Chetty et al., for example, makes a strong case that VAM is accurate enough to forecast changes at the test-score level as well as to forecast long-term outcomes. The paper got some bad press from the usual suspects, but none of the critiques leveled thus far hold water. Yes, it’s obvious there is variation. This is not a novel concept, nor one that removes the benefit of utilizing VAM. Despite the variation and the error, there are key results that support VAM’s reliability.

    I should also question some of your own assumptions regarding teacher quality and variability. Suppose there were an absolute measurement that could track the effectiveness of a single teacher, based on any number of criteria at 100% accuracy throughout a number of years and classrooms. I wouldn’t be surprised to still see such a measurement vary for very mundane reasons given the way teachers can have a different fit in certain classes with certain students and certain subjects and certain content organized within certain curricula. Such a measurement — even without any error — could show variation, and I’d expect it to do so. But such a measurement would still be useful because of the trend. We’d be able to identify exactly which teachers are most often effective and ineffective at what they do.

    There is likely more variation in VAM estimates of teacher quality than actually exists, but VAM scores over time, while noting trends and teacher ratings relative to others, would still be a useful and objective (and accurate!) data point to add to the measures of accountability we currently have. In short, I think that variability is likely more than an artifact of the research design, but that heeding this caution while utilizing it within a system of accountability would still be useful.

    Yes, a teacher has a huge amount to do with the learning experience in the classroom, but so does the curriculum and how it structures and conveys knowledge. So does the culture of the school, and school and peer norms. Many researchers are implicitly conflating teachers with teaching. They are related but not the same.

    Chetty et al. also showed that the measured accuracy of VAM was not conditionally tied to any particular curriculum, school culture, or norms. Good teachers who left for another school brought their high VAM scores with them. It was not situational.

    Teacher effectiveness is thought to be the largest in-school factor of significance, and it seems intuitive. Why is it that a teacher shouldn’t be responsible for the curriculum they use? And why assume that school norms must weigh down on every teacher? Research (like the CALDER study Andy referenced) suggests that teacher quality fluctuates greatly even within underperforming schools, evidence that good teachers can lead their students to large learning gains despite obstacles outside the classroom. If factors outside of a teacher’s control were the dominant in-school factor, we shouldn’t see this kind of phenomenon taking place.
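    On the low year-to-year correlations Ed raises, it’s worth noting what noise alone can do. A toy simulation (the noise level is an assumption, purely illustrative) shows that even a perfectly stable teacher effect yields modest year-to-year correlations once classroom-level variation is added:

```python
import random
from statistics import mean

random.seed(1)

# Each teacher has a fixed true quality; each year's observed VA score is
# that quality plus classroom-level noise. The noise variance is an
# assumption chosen for illustration, not an estimate from real VAM data.
n_teachers = 5_000
quality = [random.gauss(0, 1) for _ in range(n_teachers)]
noise_sd = 1.5  # noise larger than the signal
year1 = [q + random.gauss(0, noise_sd) for q in quality]
year2 = [q + random.gauss(0, noise_sd) for q in quality]

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = mean((x - mx) ** 2 for x in xs) ** 0.5
    sy = mean((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# True quality never changes, yet the observed year-to-year correlation is
# only about var(q) / (var(q) + noise_sd**2), i.e. roughly 0.3 here.
print(f"year-to-year correlation: {corr(year1, year2):.2f}")
```

    Which is why looking at trends over multiple years, rather than any single year’s score, is the defensible way to use these estimates.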
