And you thought sausage-making was messy.
If you were an eighth-grader in an unnamed southeastern state taking a writing assessment in 1997 that asked you to describe “someone or something who is important in your life,” you’d better hope your test was graded after lunch. Before lunch, a certain answer would have given you a failing score of 2. After lunch, once a state official deemed she was seeing too many 2’s and the scorers complied by inflating grades (rubric be damned), that same essay would have gotten a 3.
Such is the state of the supersecret standardized testing industry, according to Todd Farley, who tells of the 15 years he spent writing and scoring test items in the new book Making the Grades. Farley worked for and on many of the biggies: ETS, Pearson, Riverside, NCS; NAEP, WASL, the SOLs in Virginia, the California high school exit exam and more.
Farley comes across as smug, and spends a little too much time telling us the work was boring, the break schedule sucked, the companies wasted money. I don’t care that scorers had to work fast, grading an essay every minute or so. But I care that failing several qualifying tests does not disqualify you from a scoring job. I care that format often counts more than content. I care that supervisors cheat, changing scores not because they are wrong but because they don’t match each other or the psychometrician-predicted score spread. Kids and teachers and schools are being judged through a system we count on as standard, yet the process is rife with disagreement: grading one practice essay, scorers came up with every possible mark, from 1 through 6.
“I believe if a student was told he or she earned a 37 out of a possible 50 points on a test, after rescoring that very test might well be scored a 41, or a 33, or a 38, or a 34, or a 42, or a 35, ad infinitum,” Farley writes.
The book is filled with powerful examples. Coming from a newspaper and narrative background, though, having known the intensity of rushing to the bathroom to write down a conversation exactly as it happened and the tedium of transcribing those notes every day, I’m frustrated when an author presents direct dialogue or essays written years ago as if they are true examples rather than approximations. Farley’s book, he told me, contains both, and I wish he had been clearer about that.
But I don’t doubt the veracity of his experience. Sloppy tests, sloppy rubrics, sloppy scoring: I heard much of the same when I interviewed scorers for Tested. In a 2002 essay in Salon, former scorer Amy Weivoda wrote that “after a few hours or days or weeks, we’d sleepwalk and skim and assign scores sort of randomly.”
In his presidential campaign, Barack Obama called for “assessments that don’t simply measure whether students can fill in a bubble on a test.” But reading Farley’s book, you start to see the appeal of Scantron.
P.S. Speaking of Scantron, what is the problem with publishers? The book cover art is a multiple-choice answer sheet. THIS IS A BOOK ABOUT WRITTEN ANSWERS. Reminds me of when my publisher tried to put a white girl in a private school uniform on the cover of Tested. I think there were three white kids at that school.