Value-Added Evaluation and Dismissal of Teachers: Two Cents from an Employment Lawyer


Both our Justin and the very justifiably well-respected school finance economist Bruce Baker have weighed in from different, and equally enlightening, perspectives as to the legal problems (read: certain lawsuits) that would result from instituting systemic value-added assessment-based teacher dismissals and demotions (including de-tenuring tenured teachers). Here, I add my two cents as a former (defense) litigator of these very sorts of issues in Florida.
First, I wholeheartedly agree with both Justin and Bruce that a flood of lawsuits is certain to occur (one roughly the same size as the flood of dismissals and demotions that occur). Simply put, if you fire people, and they think their firings were unfair, then you are going to be sued. Period. In fact, I think that these lawsuits will assert not only claims under the Due Process Clause and Title VII, as Justin and Bruce explain, but also under ordinary state contract law, and possibly in some states (including Kentucky), claims under state constitutional provisions forbidding "arbitrary" governmental action. This certainty of a litigation explosion alone ought to give policy makers pause when they consider their "blame the victim" strategies for improving teaching. In fact, if I were general counsel of a school district, I would advise the administrators to run. Run fast, away from this. However, assuming that district political officials ignore their general counsels (which they sometimes do), would such suits ultimately succeed? It may be surprising, but except for the rare "unconstitutional arbitrariness" claim that might succeed in a state such as Kentucky, I'm not so optimistic for the plaintiffs. Here's why:
As to any contract-based claims, these would likely just be throw-aways to add to the complaint. Any decently represented district will cover value-added demotion and dismissal as an explicit term of the teacher contract (dealing with the union on this brings up a whole separate issue, of course).
As to the Due Process claims, Bruce makes some very forceful and valid points regarding the validity of the value-added model of assessing teacher performance, and he is, of course, the expert on that topic. However, a claim for wrongful termination in violation of the Due Process Clause is a claim of denial of legal procedures--nothing more. The "property interest" in one's teaching job is simply a threshold showing that has to be made before one can even begin to argue that the proper procedures were denied. Under well-settled precedent, a school district complies with the Due Process Clause as long as it offers teachers (1) sufficient notice of impending termination or demotion; and (2) an opportunity to be heard (usually in a hearing before the Board or its designate). Typically, districts also provide a right to counsel and a rudimentary appellate system. Now, the flaws that Bruce identifies would of course be relevant to such a proceeding, but if the decision were to stand after several layers of hearings in spite of such statistical evidence, then the teacher would be left with a very weak due process claim, regardless of the decision's substantive correctness as a matter of measurement.
As to the disparate impact claim, although I agree with much of Justin's analysis, I must disagree with the claim that school districts would have a hard time showing their policies to be neutral. The neutrality that matters in these cases is facial neutrality, and there is nothing race-based on the face of any such policies. Any race-based effects must be proven statistically. That being said, both Justin and Bruce are surely correct that substantial statistical disparities would result from the use of such measures, and these disparities would be largely based on race.
However, to survive a disparate impact claim (i.e., to win at summary judgment), the district would only have to show that the value-added measures were (1) job-related; and (2) consistent with business necessity. The first element would be a no-brainer. However flawed, a measurement of student learning gains is clearly related to the job of being a teacher. The second element would be somewhat problematic for districts if an expert witness were prepared to offer Bruce's methodological critiques. However, even these critiques concede that some useful, effectiveness-related information can be gleaned from such measures, and in such a case, the court will only be concerned with whether the district was ever presented with a race-neutral alternative that was just as effective at accomplishing the employment-based objective, but was rejected or not considered by the district. Do such equally effective, race-neutral measures of teacher effectiveness exist (I honestly do not know)? If not, then the district would at least have a decent chance of successfully defending its use of value-added, despite its flaws.
Again, none of this means that there would be no lawsuits. Every fired or demoted employee who perceives the sanction to be unfair sues. However, a few high-profile federal circuit court decisions could douse a litigation explosion pretty quickly.
Reader Comments (17)
You know it better than I do for sure, and I get how this is facially neutral especially compared to a case like Griggs, but I guess I'm struggling with how the state will defend those statistics. Their only defense, I guess, is de facto segregation - but I am not sure that is a good defense in this circumstance. That's where I think the finance issues and facilities issues would come into play. Those better freaking be equal if you as a state are going to start firing the teachers at those schools with sucky facilities at much higher rates than the teachers at suburban schools that have Olympic sized swimming pools.
On business necessity, I don't know how that would turn out. Certainly the education business has functioned without this for years, so necessity I'm not sure about. What strikes me as particularly bad is the statutory mandate that a majority of the evaluation be based on the test score (I'm just taking Bruce's word on that, I would need to see the language to be sure). But, if so, is that a business necessity? If it is just a factor, that strikes me as different than if it is mandated to be the dominant factor. On top of that, I'm not sure that the school could make the argument that the goal of teachers is to make students score highly on tests. What about a teacher consoling a student whose parents got divorced? Does the test score account for that?
On due process, is there not a substantive due process case to be made here with Bruce's statistical disparities? Your procedural due process analysis I agree with, but I do think there is more to the story than just whether there is notice and a hearing. Perhaps this is not a due process issue, but I am struggling to think of the legal theory at the moment. I'm guessing equal protection wouldn't work, but perhaps there are public employment statutes that would come into play? I don't know. There is something there, I just can't put my finger on it at the moment.
Justin, I'll do my best to address your very fair points:
1. The point as to the statistics (in the disparate impact sense) is that they only matter to show the disparity. After that, if the district can make out the affirmative defense, the statistical disparity becomes irrelevant. I do not think the districts (or states, as the case may be) will even attempt to argue that there is no disparity--that's foolish. They will just focus on job-relatedness and business necessity.
2. You are taking too narrow a view of what is "necessary." The states will argue that getting rid of bad teachers (and promoting/financially rewarding good ones) is the "necessity." As long as value-added can be considered "consistent" with that "necessity," it will survive. As I said above, this could be the weak link, but the existence of any evidence that they reveal anything useful about teacher effectiveness will likely save them legally, unless there is some other measurement practice that is as effective and not racially impactful.
3. As to your third point, unfortunately, there is no relevant substantive due process right (like a right to work or something) under the U.S. Constitution, and the only public employment statutes that would be relevant here would be Title VII and its state-law analogues (other than any state statutory mandate to use value-added, of course). There is always a point when one is really studying employment law deeply from the employee's perspective when one feels compelled to exclaim, "That's just fundamentally unfair!" This is one of those times. The simple fact is that, outside of contract (and procedural due process for public employees), employees really have few to no legal protections against even the most arbitrary employment action.
Thanks for the answers ... loving this discussion. I feel like I am learning something, and it is causing me to consider arguments and angles I hadn't considered before. Discussions like this really reaffirm my love for the blogging platform.
Very interesting discussion. My comments are more directed at policy. I find it troubling that there is such momentum to implement these value-added models when numerous statisticians (and these aren't just people in colleges of education) aren't buying that they work or that they are statistically valid. I think this may be a case where the concept and approach sound good but the assessment tool doesn't provide the needed data in a statistically sound way. Basing high-stakes decisions about teachers on a potentially flawed assessment method would be somewhat funny if not for the important outcomes at stake for many individuals. I am not a statistician but, from a policy perspective, I find it troubling that major education policy decisions are being based on an assessment approach about which so many questions exist from multiple corners.
Bruce Baker and Justin Bathon may have valid points regarding the viability of these teacher lawsuits. Here are my two cents.
First, I believe that teachers may be able to assert a substantive due process claim. The claim would be a property interest based on state law. Teacher salaries are generally based on salary scales. It could be argued that teachers have a legitimate expectation to salaries based on such scales.
After establishing the property interest, the teachers would have to establish that the deprivation was unconstitutional. Rational basis analysis would apply. A court would ask whether the deprivation was rationally related to a legitimate state interest. This is where the concept of test validity would come into play. In other words, can the state use the test for the purpose that it is trying to use it? If the test cannot be used for that purpose (i.e., the results are arbitrary), then the test might be vulnerable to a substantive challenge.
As Bruce Baker points out, the test may have so much "noise" that it cannot be validated for the purpose of measuring "value added." Thus, it is possible that the results may be too arbitrary to be constitutional.
The same logic may apply to the Title VII context. If the state cannot sufficiently limit the noise or arbitrariness of the test, then it may not be able to establish a business necessity.
A very helpful nuance, Preston, but I remain less than optimistic about the success of any such claims. The substantive due process claim that you describe--because it would clearly not involve any fundamental right that has yet been recognized in any federal court--would require only that the use of value-added is rationally related to a legitimate state interest, as you point out. In such cases, the courts generally do not get into whether the program actually works, but whether the decision to adopt the program is rational in light of the objective. For example, if the law were to require dismissal of any teacher taller than six feet, that requirement could not conceivably be linked to the state objective in any way, and it would be invalidated in a substantive due process lawsuit. This is an arbitrariness inquiry, in effect.
It is true based on Bruce's analysis that the value-added measures contain lots of "noise," but if they nevertheless measure something, anything related to teacher quality (that is, if they provide more information about teacher effectiveness than not measuring anything at all), then they are very likely to be upheld as rationally related to the legitimate objective of dismissing bad teachers and retaining good ones. If, on the other hand, the measures provide less information than not measuring anything at all (and I did not understand Bruce's analysis to so conclude), then they would be vulnerable to attack. My (amateur) reading of Bruce's findings is that about 20-30% of the identified effects are due to something other than "noise." This does not impress us as researchers, but it would arguably be rational for policy makers to rely on it in the absence of anything better. Remember just how ridiculously uneven funding was in Texas at the time of Rodriguez, and how obvious it was that the funding formula was to blame? That was still held to be rational, as a means of preserving local control. Here, even a very flawed (but minimally efficacious) value-added measure would likely be similarly upheld.
I agree that Title VII is a better avenue, as I stated above, but again, value added would have to be demonstrably worse than some non-discriminatory alternative way of measuring teacher effectiveness that would not cause the district to go bankrupt.
Good point, Scott. Let me make my point on the constitutional claims another way. A few courts have addressed the constitutionality of testing (e.g., U.S. v. South Carolina -- teacher certification). Recall that the courts have upheld these tests on the basis that sound psychometric procedures were used in their development -- and validity is a key component in determining whether these tests were rational. The failure to sufficiently eliminate noise may lead an expert witness to testify that the problems are so great that the test cannot be used for the purpose for which the state is trying to adopt it. In other words, the state is not using sound psychometric procedures in the development of the test. As a point of reference, check out the article that Stephen Sireci and I wrote back in 2000, "Legal and Psychometric Criteria for Evaluating Teaching Certification Tests" (Educational Measurement: Issues and Practice).
Thanks for the case, Preston, and I'll be sure to read that article. The case is new to me, but certainly relevant here. However, I do not see much support in it for the substantive due process argument. I want to point out the central rule statement in the "rational basis" section of the court's opinion there. In it, the court set forth the rational basis standard thus: "The constitutional safeguard is offended only if the classification rests on grounds wholly irrelevant to the achievement of the State's objective." United States v. State of South Carolina, 445 F. Supp. 1094, 1107 (D.S.C. 1977) (quoting McGowan v. Maryland, 366 U.S. 420, 425-26 (1961)). It is the words "wholly irrelevant" that are important here. The court there easily concluded--without discussion--that this rational basis standard was met. Id. at 1108.
The only discussion of the constitutional validity of the psychometric instrument used was in relation to an alternative holding (now best considered to be dicta, as we know it is wrong under current law) that even under "intermediate scrutiny" the test program would be upheld. See id. at 1108-09. At that time in our history, Washington v. Davis had just been decided, and the courts were wrestling with whether, in absence of strict scrutiny, an "intermediate" or "heightened" level of scrutiny should be applied to disparate impact claims under the Equal Protection Clause. The court upheld the tests under such "intermediate scrutiny" due to the statistical validity of the tests, but it did not even inquire as to their psychometric validity under the rational basis test.
The case's best support for the psychometric invalidity argument comes in the Title VII discussion, see id. at 1113-16, where the court engages in lengthy discussion of expert testimony, and ultimately upholds the tests based on such evidence (which appears to have been completely one-sided in favor of validity--unlikely to be the case the other way with value-added). I may change my mind after reading your article, but at present I continue to think that the substantive due process claim is likely very weak, that the best (but by no means certain) way of invalidating value-added programs is at the burden-shifting phase of a disparate impact case, and that, at this stage, a school district employer will be able to find an expert to demonstrate at least minimal validity, which may or may not be enough to secure a victory for the district.
Good point. Is it possible for a court to infer from the logic of Debra P. v. Turlington? In Debra P., the court defined due process in terms of "fundamental fairness." (I don't think "fundamental" meant fundamental right). Fundamental fairness was defined in terms of whether the test actually tested what was taught (instructional validity). The court was not convinced at the time that the state had demonstrated that there was a sufficient nexus between the test and the instruction. In a subsequent case, the court concluded that the state established a sufficient link by looking at the state's curriculum frameworks.
Is it possible that a court using Debra P.'s approach might conclude that a case is not "fundamentally fair" if there is a sufficiently high level of "arbitrariness"? Using the Debra P. approach, a court might find that the state has sufficiently addressed the concerns raised by plaintiffs. Then again, a court might find that it is "fundamentally unfair" to use this test given the possible "parade of horribles" raised by Bruce.
I agree that Title VII is the more effective way of making the case.
Yes, I agree that Debra P. is a much better case for the plaintiffs in any constitutional claim, and I agree with your interpretation of the word "fundamental" in the case. That said, I recall that the court did not care much in those cases whether the test in question was especially good at measuring what was taught--just that the topics covered on the test were actually part of the curriculum. In effect, the court was seeking to determine whether Debra P. and the other members of the class had the opportunity to succeed on the exam. Another way of looking at this is that it would be irrational to deny a diploma based on a test of skills that the test-taker was denied the opportunity to acquire.
So, how would that translate to the value-added situation? I think the "fundamental fairness" question would look to the same sort of inquiry, and this would become whether the teacher in question had any real control over her gain scores--whether she had an "opportunity" to influence them positively. There might be lots of good arguments here--transience of students, absenteeism, etc. However, we should remember that Debra P. was an unprecedented case that was decided at a time of great upheaval in the 11th Circuit and will likely not be repeated ever again, so it may just be one of those situational rulings that don't yield much in the way of precedent. Still, intriguing possibility, and plausible as a way around the more forgiving traditional rational basis analysis.
Scott, you are right to point out that the court was concerned with the "match" between curriculum and test questions. The court's conclusion about "instructional validity" went toward how much the state had to do in order to eliminate the "noise" (curriculum match, teacher failure to instruct). In Debra P., the court concluded that "curriculum match" was sufficient.
Similar issues arise in the teacher certification arena. The answers to the questions you raise may depend on how sympathetic the plaintiffs are. Even though the plaintiffs are teachers, they could be presented as hard-working, dedicated individuals fighting against all odds -- and being punished due to forces beyond their control. Hilary Swank or Michelle Pfeiffer may play these plaintiffs in the movie.
Yes, I agree. The court in such a case would probably have to draw a distinction between merely unfair or unfortunate treatment, and "fundamentally unfair" or arbitrary treatment. Not sure the current SCOTUS would relish having to make such a decision, but I would love to see the many opinions that such a case would generate.
Out of the office all day, so just catching up now and really savoring your discussion. You both seem to have the due process argument pretty well lined out, and I am 50/50 on it. I think Scott accurately points out that the legal case would be difficult, but if the right forum were chosen, I think Preston presents a path that a sympathetic judge (which I think is quite possible here) could venture down to get the "right" outcome.
But, I am still of the opinion that we are missing something here.
Just fishing, but what about dismissal statutes? Each state has a statute that typically lists the acceptable causes for dismissal in that state. Even with these policies putting test scores in the evaluation criteria, I doubt that any state will change its dismissal statutes. So, now we have an otherwise good teacher with a bad evaluation based solely on her students failing to achieve whatever value-added measure the state/school has put in place. Depending on the state, the teacher would then enter some kind of process like remediation, and if the scores didn't improve, the subsequent bad evaluations would be grounds for dismissal.
But, dismissal under what category in the teacher dismissal statutes? The only one that seems plausible in that scenario (especially if the teacher can show she was an otherwise good teacher) is incompetence. So, now you have a teacher fired for incompetence and she files suit and shows (a) she was an otherwise good teacher with testimony from parents and fellow teachers, etc. and (b) the tests are statistically questionable. Does the judge not let that get past summary judgment? Wouldn't there be an issue of material fact as to whether the teacher was actually incompetent? And, if so, once this reaches a full trial (granted it would be appealed first) then you have the state struggling to try to prove the statistical validity of not only the tests, but the application of the tests to the evaluation criteria. I think we are at a different place than rational basis at this point, I think we might be trying to factually determine whether or not the teacher was actually incompetent. That seems a much better case for the teacher.
Not sure about this, just throwing it out there. Love to hear your thoughts.
Justin, you are correct that rational basis or any other constitutional test would be irrelevant to such a claim--it's not a constitutional claim, after all. It's a statutory claim. In my first posting, I mentioned contract claims, and a statutory claim such as the one you describe would be the same sort of animal--a substantive argument that the cause for dismissal (or detenuring, or reduction in pay, or denial of a bonus) cannot be established.
While this sort of claim provides more room for a plaintiff to create genuine issues of material fact and therefore get past summary judgment, I doubt that states would adopt value-added and not specify in both contracts and statutes that poor gain scores are grounds for sanctions up to and including dismissal. States could even accomplish the same thing simply by adding a definition of "incompetence" to the relevant statute or contract that includes failure to demonstrate measurable learning gains for more than two consecutive years, for instance. I have no doubt that they would do so if they are smart.
Even if less proactive states decide to charge forward with value-added without changing their relevant statutes or contractual provisions, they may win most cases. If a dismissed or otherwise sanctioned teacher files a claim based on the "incompetence" provision, she will first have to make out some kind of a prima facie case allowing for a presumption either that (1) incompetence was not the real reason for her dismissal; or (2) she is not actually incompetent, as the statute defines it. If she does so, the district will respond with evidence of her incompetence--i.e., the test scores. Where a teacher is shown to have failed to cause learning gains in her students year over year, she will then be burdened by a presumption of incompetence, which is difficult to rebut. As you say, she will have to show that the measurement in question does not actually measure student learning gains year over year, or as I mentioned above, she can now prove that incompetence was not, in fact, the reason for her adverse employment action. That's a tough road.
Ultimately, to rebut the presumption of incompetence that will attach as a result of the production of test score evidence by the district, a teacher in such a situation may be left with little choice other than to attack the constitutionality of the use of the measurement instrument, as we have been discussing, and so we come full-circle.
I'll post more on this later (or maybe write a more scholarly piece on it), but it is possible that STATE constitutional due process clauses may be the real avenue for relief in these cases. More to come on that . . .
But, can a state simply statutorily define "incompetence" as a poor "value-added" measure and have that stick in a dismissal case, even if the value-added measure is incredibly noisy and a very weak predictor of even that same teacher's year-to-year performance? Does the quality of the measure for actually measuring competence matter - even if facially it would appear to measure competence ("it's kids' achievement! of course that matters")? Are teachers simply out of luck if they sign a contract which says they will be dismissed on the basis of "X" measure - not realizing how bad the measurement properties are - and then they find they can't produce "X" measure ... or at least in some years they can and in others they can't for no explainable reason?
Let's throw another wrinkle at this. Say I'm a principal who just doesn't like a certain teacher even though she gets pretty good value added scores each year. I can decide to do her in by intentionally assigning her all of the students who had really crappy value-added scores the previous year and/or giving her larger class sizes than others teaching the same subject/grade level. Surely, if the teacher could prove that I manipulated the system against her, she could do something about it?
But, this also points to the possibility that when negotiating contracts that might include definitions of "competence" and grounds for dismissal based on value-added scores, teachers should negotiate for a system that guarantees "comparable class size across teachers - not to deviate more than Y" and that year to year student assignment to classes should be managed through a "stratified randomized lottery system with independent auditors to oversee that system." Stratified by disability classification, poverty status, language proficiency, neighborhood context, number of books in each child's home setting, etc. This gets out of hand really fast.
Bruce, as to your first set of questions, I must respectfully ask whether there is a better output-based measure out there that could be implemented with roughly the same expense as value-added. I honestly do not know. I do know that simple one-time achievement test scores are inherently unfair to those who teach disadvantaged students--that's just common sense. I also understand that value-added assessments were initially proposed as one way of trying to mitigate the effects of the demographic factors you identify both in your blog post and in your comments above. If this is so, but the value-added measures do a poor job of limiting the noise that certainly exists in one-time assessments, then it seems that, at worst (from a legal perspective) basing employment decisions for teachers on value-added assessments is no worse than basing them on one-time assessments. I do not like this conclusion any more than you do, but the districts (or states) will receive substantial deference from the courts as to their choices of how to evaluate their teachers. What I am saying is that it's bad policy, but not necessarily illegal, to use measures that only provide a small amount of useful information.
As to your hypothetical in your second comment, I certainly can see this happening, but it also happens now, under the current system of evaluating teachers based on personal observations and checklists (what are the data on the effectiveness of those, by the way?). I agree that, if a teacher could prove that a principal deliberately set her up to fail, she might have a case, but from an evidentiary standpoint, proving such malicious intent is nearly impossible. Circumstantial evidence comparing the teaching loads and class sizes of similarly situated teachers in the same school would help, but the presumption that this evidence would generate would be relatively easy to rebut through testimony that the scheduling needs of the school required the disparity in class sizes, for instance. I also do not think that most administrators would be foolish enough to greatly imbalance such things on purpose when they can simply gerrymander bad personal observation results by carefully selecting like-minded evaluators.
As to the contract negotiation question, you have hit upon one of the reasons I left public school teaching--there simply is no ability for most individual teachers to negotiate anything about their contracts. Teachers unions negotiate most, and where this does not happen a contract of adhesion is imposed on any new hire. That being said, you mention a few negotiating points that unions would undoubtedly bring up, and I get the sense that you perceive these sorts of work rules to be somewhat far-fetched or absurd (and I agree that they point up the troubled road that output-based evaluations may be taking us down). However, with not much searching, I would bet that I can find several big-city teaching contracts with similar work rules in them already.
I am somewhat comforted by the information that I have been reading on your blog--e.g., that value-added would only be possible for a very small portion of teachers (mostly reading and math), that it would only apply to tested grades (3-8 in most places), etc.--as an indication that all of this may just be political grandstanding, rather than an actual policy intention. However, I think this discussion is relevant to teacher evaluation in general (regardless of whether we decide to include gain scores as an element), and I am greatly enjoying it.