Square This Circle

In the NYT Michael Winerip praises teacher peer review in Montgomery County, Maryland.  In The Washington Post Michael Chandler takes a look at the state of Maryland’s struggle to design a teacher evaluation system as part of its Race to the Top plan.  Sounds easy.  Why doesn’t the state just copy Montgomery County?

Short answer: Because Winerip puts the glossiest possible spin on the numbers. Peer review is fine as far as it goes, but it’s not a comprehensive solution. Here are the numbers across several districts. What they show, overall, and what union leaders and district officials say (when not doing PR), is that peer review is pretty good at addressing observably poor teaching: teachers who can’t manage a classroom, organize a lesson, work through material sequentially, engage with the material themselves, or, worse, can’t regularly get to school on time or in a condition to teach. These are, of course, a minority of teachers, but when you’re talking about 3 million teachers and the scale of hiring in larger districts, these are all real issues. And that’s reflected in the data on peer review.

But what peer review doesn’t get at is the real problem the field faces: Unobservable poor instruction. And that’s why districts and states across the country are trying to come up with better measures of effective teaching and why – when used responsibly – test scores can tell you something, too, and are one necessary ingredient in the mix. Realistically these new measures will have problems and there will be failures along the way. Just for example, it’s worth questioning whether statewide systems are really the way to go and where the best places to invest resources and energy are – in tools or training? But the bottom line is that we need this learning process and we need new methods because even in places like Montgomery County the overall averages obscure some very real problems – problems that are even more acute for students in many other places. There are a lot of reasons for our poor school outcomes today, but this is one of them.

26 Replies to “Square This Circle”

  1. I love it – “Unobservable poor instruction”

    Here it comes finally – the reformers’ secret sauce!

    It’s going after the “unobservable”!

    Are we allowed to ask whether it exists? Or do we have to take them at their word when they say it’s “unobservable”?

  2. Steve has a fair question: is an “unobservably” poor teacher one where the observers don’t agree that the instruction is poor (e.g., someone whose organizational skills are poor but who is charismatic) or one where all the ingredients appear in place but test scores are low? Or something else?

  3. “the real problem the field faces: Unobservable poor instruction.”

    Where did that come from? If it was the real problem, why hasn’t anyone else observed the problem? How come you didn’t present your grand discovery from the rooftops?

    I’m assuming you’d like that statement back. So, why don’t you just retract it?

  4. That one can question what unobservable here entails doesn’t mean the answers are that elusive. Sherman points to the more obvious of these. Much of what goes on in (and out of) a classroom is not going to be observed by any system comprised solely of observations or mentoring. Utilizing these observations as an indicator of effectiveness, without a hard look at student outcomes, should rightfully be considered problematic.

    The “value-added” contribution of the other 2 comments above thus approaches zero.

  5. And here is a long quote from an article in the American Prospect:
    “On exam day in Sabina Trombetta’s Colorado Springs first-grade art class, the 6-year-olds were shown a slide of Picasso’s “Weeping Woman,” a 1937 cubist portrait of the artist’s lover, Dora Maar, with tears streaming down her face. It is painted in vibrant — almost neon — greens, bluish purples, and yellows. Explaining the painting, Picasso once said, “Women are suffering machines.”

    The test asked the first-graders to look at “Weeping Woman” and “write three colors Picasso used to show feeling or emotion.” (Acceptable answers: blue, green, purple, and yellow.) Another question asked, “In each box below, draw three different shapes that Picasso used to show feeling or emotion.” (Acceptable drawings: triangles, ovals, and rectangles.) A separate section of the exam asked students to write a full paragraph about a Matisse painting.”
    http://prospect.org/cs/articles?article=the_test_generation

  6. “States are effectively having to make a 180-degree turn in their teacher policies . . . and the tools we all need to do this fairly are at best in the 1.0 stage,” said Kate Walsh, president of the National Council on Teacher Quality and also a member of the Maryland State Board of Education. “We could really mess this up.”

    One would think that Ms. Walsh’s organization, which has been around for 11 years, would have come up with better “tools” during that time.

  7. “Unobservable” does not mean mysterious or undefinable. It just means that there is teaching that, when observed in short installments, looks fine; but that there are failures at the level of getting the children to mastery whether because of gaps in instruction, poorly chosen activities, inadequate assessment, or other reasons.

  8. The problem with ‘unobservable poor instruction’ is that it comes perilously close to blaming teachers for results that have nothing to do with their instruction. This theory seems to hold that: (1) The teacher is observed to be teaching well, but (2) Some students aren’t performing up to par, and so (3) The teacher is not performing well. Kind of a strange conclusion.

    An alternate (and in my opinion more likely) train of thought would be: (1) Teacher is observed teaching well; (2) Some students aren’t performing up to par; (3) Therefore: Other factors outside of the teacher’s control are likely to be hampering the students’ performance. These could include any number of factors shown to influence kids, from their families and personal lives (deaths in the family, homelessness, etc.) to peer influences and a host of other variables.

  9. Attorney DC:

    You’ve not actually addressed the problems with ignoring teaching that goes unobserved, as I’ve written above and as BB wrote as well. If the teacher performs well during observations, that doesn’t account for the teaching that goes unobserved. Thus, no, it’s not a strange conclusion to say that there may be unaddressed problems with instruction given poor student outcomes.

    Using your alternate train of thought, it would almost never be the teacher’s fault for poor student outcomes. What’s more is that ANY student outcome could then be explained away by circumstances, although I assume you wanted to suggest that student outcomes are only caused by the teacher if they are good outcomes. It’s not a very good alternative.

  10. A teacher is observed and no concerns are noticed and yet some students don’t test well. Is it possible there are teaching concerns that were missed during the observation process? Is it likely? Is “unobservable poor instruction” simply a question of how the observation process was carried out, or is the claim here that there “may” still be some poor instruction that was unobservable? How exactly do you fix a concern that is not observable? At what point does something that was not observable deserve a conclusion that it is not the teacher’s fault?

  11. Maybe we should call it “unobserved teaching.” Any observation of teaching is about a single lesson, and maybe there are several observations over the course of a school year. But teaching takes place everyday, so there is a lot of unobserved teaching.

    At the same time, the instruments used by observers to record what they see in a classroom can be superficial or thin on what observers should look for. An observer might “see” something but there is no place to record it on the evaluation checklist.

    I know someone who did a dissertation study on the parts of teaching that are totally invisible, and it was a great study. There’s a lot of what is done in classrooms that no one can see in action.

    Rotherham is correct in what he says, I think.

  12. Of course there are many facets of teaching that are not observable in a short observation: Time spent planning lessons, grading papers, making calls to parents, etc. Of course, many education pundits and the general public like to say that teachers only work 9-3 and shouldn’t be paid much (even though so much work they do goes on during their own time, after the school doors close for the day).

    However, my concern is that teachers will suffer negative consequences if their performance is evaluated based on “unobservable” factors. In the real world, I’d hate to think of a teacher who does everything right, but is fired because of an outside factor blamed on the teacher.

    Just an example: Say there are two 4th grade teachers, and luck of the draw puts all three of the 4th grade students with diagnosed emotional disabilities in the class of one teacher. If this teacher’s students don’t learn as much (measured by tests) at the end of the year compared to the other class, the correct explanation is likely that her class is a more challenging class. But proponents of “unobserved” characteristics would just say, “Well, we observed Teacher X doing all the right things in our classroom observation, but the students didn’t learn as much, so it must be that Teacher X is really a poor teacher whose bad qualities are simply unobservable.”

    If I were a teacher who was rated like this, I’d only want to teach gifted kids in a suburban school.

  13. Teachers are often told when they’re going to be observed. Therefore, it stands to reason that during that time they have prepared a fantastic lesson and a fantastic way to assess the students. The question about whether they are observed or not is silly. Of course they are! They are observed by the principal and groups such as Teach for Success. They are just putting their best foot forward during these observation times. The question should be, “How can we help teachers to put their best foot forward even when they are unobserved?” That is what’s going to help the educational system in our country.

  14. Peer review both works and doesn’t work, but I don’t think it is a matter of the minority being overlooked. I think it is the availability of quality teachers. College graduates don’t want to enter the teaching force because the pay for the amount of work is not as good as what they can get in the private sector. Some of the minority of non-performing teachers also know how to play the system. For example, they may not become subpar until after they have received tenure.

  15. I had a CT from the PAR program the first year of my teaching. I really learned a lot from her. In fact, I think it was a great professional development opportunity. She gave me advice and strategies to use in the classroom. She modeled lessons and how to differentiate learning opportunities. The principal at the school in which I was teaching had singled me out for some unsubstantiated reason and was making my life miserable. Thanks to that CT I was released from the PAR program and was able to leave that toxic working environment and pursue my love of teaching in greener pastures. So, I am all for the program.

  16. There’s no doubt that the observational process can be inadequate or even useless. Creating a better observational process is at the very least a step in the right direction. Making a conclusion about poor instruction when nothing of the kind has actually been observed is not only not helpful, it can be destructive. Most teachers would be thrilled if an observer could point out actions that could be improved and lead to students learning more and testing better. This may be why education is so often plagued by the newest fad. Evaluators have no idea what “unobserved” poor instruction needs to be improved and so are always willing to try something new (whether it makes sense or not). Even the insistence by some that teachers just need to work harder is problematic. Work harder at what? Something that can’t be observed, but we’re sure is there?

  17. Attorney DC:

    If you want to reference the “real world” in your rebuttal, you ought to note that what is being proposed in this thread for informing evaluations (a combination of observations and outcomes) is very similar to what actually happens in many professional jobs today. If only I could be judged on how hard I worked in lab instead of the cold, heartless focus of would-be employers on my publication record. Let’s remind ourselves that student outcomes are the true goal here. They are the driving purpose for efforts to make schools better. It should not be considered success if our coworkers and employers think we’re doing a good job. That’s only part of the story.

    Now, it is absolutely untrue that, as Scott E writes, “evaluators have no idea what “unobserved” poor instruction needs to be improved”. They do have some idea of these successful practices. The argument made here, however, is that observations will not fully account for whether or not teachers are making real strides in the right directions toward these effective instructional practices, and they definitely will not account for any teacher actions that are effective but, as of yet, are unsupported by current research. Again, if we want to make student outcomes the goal, it shouldn’t be upsetting if we don’t always exactly know why Teacher A produces bigger student gains than Teacher B.

    By your alternative ideas, there will always be an outside factor that you can point to if you don’t want to attribute student outcomes to a teacher’s effectiveness. This is obviously problematic and would make evaluations a nonsensical waste of time toward gauging success. We’d do well to avoid this train of thought.

    Finally, in response to your example of 4th graders, please understand that all of this talk about student outcomes also seeks to account for differences in growth targets, such as in this case for SPED students and non-SPED students. Thus, your example is off-target. This all brings us back full circle to your misunderstanding of VAM and what can be gleaned from such data, so I again will post the relevant comment and suggest you stop pretending that I’m only disagreeing with you:

    eduwonk.com/2011/03/must-reads-3.html#comment-219095

  18. It would be a very poor evaluator indeed to not notice that successful practices were not being implemented and not flag this lack as an area of concern. The assumption being put forward here is even if a teacher is observed following effective instructional practices the teacher must still be doing something wrong if student test scores are not high enough. That poor instruction is just unobservable.

    If we don’t know why teacher A produces better student gains than teacher B, how do we actually know those student gains have anything to do with teacher A or that teacher B did not do his or her job as well with what could be controlled within a classroom. The reality is there are always outside factors that affect the results. Maybe the student gains would be even higher, but for some outside factor. Maybe what teacher B did was overcome those outside factors so the gains were not totally lost even though they didn’t match the gains posted by teacher A.

    If we are going to compare teaching with jobs in the real world perhaps we should consider public defenders and ER doctors who have little control over who their clients or patients are and not a lab environment where outside factors can be reduced or even eliminated.

    The point about working harder is that this mentality is being pushed in some quarters. If lousy teachers would just work harder, student gains would follow. This line is so much easier to push when you’re not concerned with why teacher A has greater student gains than teacher B. Maybe teacher A really is better than teacher B. Simply working harder might never be enough, especially if teacher B is working harder at something that is unobservable. Then again, maybe teacher B is really better than teacher A. After all, in the real world, how often does an MVP award go to a player who is not on a playoff team?

  19. Scott: I like what you wrote: “The reality is there are always outside factors that affect the results.” and “If we are going to compare teaching with jobs in the real world perhaps we should consider public defenders and ER doctors who have little control over who their clients or patients are and not a lab environment where outside factors can be reduced or even eliminated.”

    In my experience as a teacher, there are always factors that influence the success of any classroom, and these factors cannot (or at least are not) controlled like they should be for a scientific experiment. Student placement is often intentional (not random) — For example, the principal may purposefully place the most difficult students in Teacher X’s classroom, because she knows Teacher X is the only 4th grade teacher who can handle them. Or one teacher has honors students and another teacher has remedial students. Or one teacher has coaching duties after school while another runs a homework club. There are so many differences that cannot be controlled in the school — And so many more factors OUTSIDE the school (families, friends, neighborhoods, gangs, moving/immigration, homelessness, etc.).

    Chris: I’ve read your comments, but I believe that you are off-base with your idea that teaching can be readily compared to other professions in terms of evaluation. In an office, a typical employee generally works relatively independently (or in a small team). Their evaluation does not depend on the performance of 150 teenagers.

    In an office, the receptionist answers the phone, the secretary types letters, the attorney drafts contracts, etc. In teaching, the teacher works with between 20 and 150 students a day. The students are individuals, with their own families, friends, backgrounds and motivations. Teachers are further constrained by laws and policies, limiting their ability to (for instance) remove a disruptive student from the classroom, even if the student’s actions are disrupting other students.

    If one teacher is assigned an honors class of well-behaved kids who do their homework each night, and another teacher is assigned a class of mostly immigrant students who move in and out of the district throughout the year, have little background knowledge on most school subjects, and have parents who don’t speak English, guess whose job is easier? Guess who will probably have higher student gains at the end of the year?

    You seem to maintain that VAM controls for this, but studies I’ve read about seem to point in the other direction. VAM can’t control for all these variables and it certainly doesn’t control for peer effects: If one low-performing, highly disruptive student is in a class with other average students, that one student’s behavior will likely affect the learning of the other students. VAM wouldn’t control for that. It may assume less learning on the part of the one low-performing student, but would assume that the other students should learn at a steady rate, b/c it doesn’t control for their classmates, correct?

    Chris: I think that your theories may sound nice, but in the real world, with a host of uncontrolled variables, they don’t work as well. In my opinion, subjecting teachers to these types of evaluations, based on the performance of their students, will simply push more teachers away from the most difficult schools, or away from the profession entirely.

  20. Added to all the factors that Attorney DC properly mentions are in-school factors rarely noticed by those not working in a school, such as the daily schedule and whether or not there are any school activities that impinge on academic minutes. The first period of the day is never quite like the one right after lunch, even when the same teacher is teaching the same curriculum to approximately the same group of students. Add to this that school activities interrupt different class periods disproportionately, and it can be quite challenging for a teacher to keep every class on the same page.

    And the handling of all those factors, both within school and outside of it, day after day comes down to how students with no personal stake respond to taking a test on one particular day.

  21. Scott E:

    Do you agree or disagree that a given observation would not account for most of the instructional practices that a teacher would utilize throughout a given year? 

    Do you agree or disagree that a handful of observations would not allow a committee to witness many of the complex dynamics at hand in (and out of) a classroom throughout an entire year?

    Noting your likely answers to the above, what I said appears true: observations will not account for many of the teaching practices in use by a teacher.

    In terms of what student gains should be expected, there’s plenty of efforts going on right now to get a reasonable estimate for different kids. In fact, there are some decent models that can and are being utilized for this purpose. You mention that there are always outside factors that affect student outcomes, and we all can agree on that, but given that fact we should continue pursuing better ways of controlling for these factors.

    In response to your comment on jobs, realize that I did not use laboratory experiments as an example, but rather publication records. There are large differences between laboratories and the scientific questions addressed in them that contribute to what chances a person has to publish, and in what journals. Yet that doesn’t change the fact that my publication record, along with my experiences working on my project, will both contribute to whether an employer will want to hire me over some other qualified person. A similar combination of outcomes and experiences is what is being proposed for teacher evaluations.

    You bring up the example of public defenders and ER doctors, but they, too, are accountable to many standards of practice and have quantifiable outcomes that are used in their evaluations on top of how their employers and committees think they are performing.

    I don’t think, and haven’t argued, that the answer is for teachers to “work harder”, nor did I ever say I wasn’t concerned with why teachers effect different student gains. The point is that if student outcomes are the overlying goal, they ought to inform evaluations accordingly.

    _____

    Attorney DC:

    ***”In an office, a typical employee generally works relatively independently (or in a small team). Their evaluation does not depend on the performance of 150 teenagers.”

    Their evaluation does depend, however, on how well they perform at their job. Their evaluation will include how the boss thinks he or she is doing the job as well as examples of productivity.

    ***”Teachers are further constrained by laws and policies, limiting their ability to (for instance) remove a disruptive student from the classroom, even if the student’s actions are disrupting other students.”

    Similarly, professionals do not get to simply choose what projects they will be undertaking. Realize also that dealing with disruptive students is both a necessary skill for a teacher to have to be effective (and so demanding their removal instead of working with them is definitely not helping a teacher’s case), and is understood to be a common factor in teaching. Principals know these students exist, and modeling efforts help to account for their impact.

    ***”[An honors class and a class of immigrants]…Guess who will probably have higher student gains at the end of the year?”

    Guess which class every principal (and VAM) would expect high scores from?

    ***”You seem to maintain that VAM controls for this, but studies I’ve read about seem to point in the other direction.”

    Which ones? Many of them give an indication of error bars that suggest their use to supplement observations would be worthwhile. The fact that some of these are already in use today further suggests that you are incorrect to imply that all experts think they are worthless.

    Finally, stop attributing VAM and associated ideas of controlling for outside factors as my ideas. They are not my ideas, nor are they my “theories”. They are the work of many education researchers for the purpose of using student outcomes in a more controlled and reliable way to inform accountability measures. Feel free to address these facts instead of offering more anecdotes about how you don’t think they could do so.

  22. Chris Smyr:

    You state “observations will not account for many of the teaching practices in use by a teacher.” It is curious that you have little faith that direct observations of a teacher’s practices will ever prove sufficient of correcting or eliminating poor teaching practices or for that matter removing poor teachers. You add as well, “In terms of what student gains should be expected, there’s plenty of efforts going on right now to get a reasonable estimate for different kids. In fact, there are some decent models that can and are being utilized for this purpose.” On the other hand you suggest we should pursue better ways of evaluating teachers indirectly by using test scores. You might want to check out the post in School Finance 101, “Student Test Score Based Measures of Teacher Effectiveness Won’t Improve NJ Schools,” to see some inherent problems in those models and where those efforts are headed, and it does not seem to be toward greater validity. Of course, you did not actually use the words direct or indirectly, but you are suggesting that we can accomplish indirectly what apparently cannot be done directly.

    You mention that public defenders and ER doctors are held accountable to many standards of practice and have many quantifiable outcomes that are used in their evaluations. One has to wonder how many of those quantifiable outcomes have to do with the time and expense of reaching those outcomes as opposed to the outcomes themselves. This conversation started with the idea that despite being observed there must be some poor teaching instruction that was “unobservable”. Do public defenders and ER doctors also have “unobservable” practices?

    In a previous post you stated “it shouldn’t be upsetting if we don’t always exactly know why Teacher A produces bigger student gains than Teacher B.” It’s likely closer to never or not at all in terms of what student test scores can tell us about poor teaching practices if those scores are not confirmed by observations. Many teachers might find it upsetting that poor student scores are sufficient to determine their fate without any other quantifiable outcomes or lack of meeting any professional standards of practice. While you may not have made this argument, you’re suggestion that a conclusion of poor teaching practices is reasonable from poor student gains alone because those practices may be unobservable supports that argument.

    And while you haven’t made the “work harder” argument, it is clearly being made by others. Maybe poor instruction is not really “unobservable”, but only happens when teachers are not being observed. And maybe the working harder argument reasonably follows from there. You’re not making that argument, but we should not ignore the fact that argument is being made.

    And finally, what “facts” do you want to address? The fact that some districts are using value added measurements and more districts are considering to do so does not in itself demonstrate the VAMs are valid instruments for measuring teacher effectiveness. You say “if student outcomes are the overlying goal, they ought to inform evaluations accordingly.” Are test scores student outcomes? Are they the overlying goal? And what is “accordingly” if they don’t “inform” us about actual poor teaching practices whether they are unobservable or not?

  23. Scott E:

    You did not answer my 2 questions addressed to you at the beginning of my last comment. Please do so.

    In response to my statements about VAM, you reply by pointing me in the direction of a blog. That is not a rebuttal. Please give one.

    ***”It is curious that you have little faith that direct observations of a teacher’s practices will ever prove sufficient of correcting or eliminating poor teaching practices or for that matter removing poor teachers”

    If you find it curious, then re-read what I wrote to see the explicit reasons why there are problems inherent in only using observations.

    ***”One has to wonder how many of those quantifiable outcomes have to do with the time and expense of reaching those outcomes as opposed to the outcomes themselves.”

    An employer is not going to favor having a hard worker that gets nothing done, so you don’t need to wonder too much.

    ***”Do public defenders and ER doctors also have “unobservable” practices?”

    So long as they have responsibilities to carry out when others are not there to observe, then yes. No need to use quotations.

    ***”Many teachers might find it upsetting that poor student scores are sufficient to determine their fate without any other quantifiable outcomes or lack of meeting any professional standards of practice.”

    Someone should tell those teachers that that’s not how they are being evaluated.

    ***”While you may not have made this argument, you’re [sic] suggestion that a conclusion of poor teaching practices is reasonable from poor student gains alone because those practices may be unobservable supports that argument.”

    You are correct: I did not make that argument. Student gains are intended to be used to complement the useful but flawed system of observations for evaluations.

    ***”And while you haven’t made the “work harder” argument, it is clearly being made by others.”

    Oh? Who explicitly argued that, were teachers to just work harder, then we would fix education?

    ***”And finally, what “facts” do you want to address?”

    I want people like Attorney DC to stop pretending that anecdotal evidence is convincing, especially when those anecdotes ask the very questions that studies on VAM are answering. I also want him to stop pretending that I’m just thinking up “theories” about VAM.

    ***”The fact that some districts are using value added measurements and more districts are considering to do so does not in itself demonstrate the VAMs are valid instruments for measuring teacher effectiveness.”

    You are correct, however what it DOES show is that, contrary to what has been argued before, education experts do *not* all agree that VAM is a useless measure of teacher effectiveness. It is a flawed system (as are observations), but given that student outcomes are a major goal for education, we need to utilize them as much as validity allows, and continue efforts to support and improve the models.

    ***”Are test scores student outcomes?”

    Yes.

    ***” Are [test scores] the overlying goal?”

    Student outcomes are the overlying goal, and since test scores are one such outcome, then yes.

    ***”And what is “accordingly” if they don’t “inform” us about actual poor teaching practices whether they are unobservable or not?”

    They give us information on how well students are performing. A teacher’s purpose is to get their kids to learn and be able to apply that knowledge. Thus, test scores inform evaluations by showing if a teacher is accomplishing that goal.
