Florida 2000 and Washington 2004
A Study of Two Elections
Having been "denied" access to the Commission's data, Thernstrom and Redenbaugh enlisted their own consultant. At their request, Yale economist and Olin scholar John Lott conducted another series of regression model runs that found no correlation between race and ballot spoilage. These, they claim, were better characterized and more thorough than Lichtman's.
"Although the commission refused--and still refuses--to provide us the machine readable data Dr. Lichtman used in his analysis, we were able to assemble the necessary material for our own analysis. We were fortunate in being able to enlist the help of a first-rate economist, Dr. John Lott of the Yale Law School. Dr. Lott agreed to evaluate the work of the commission and of Dr. Lichtman, and even to gather additional data of his own to further extend the analysis....
Dr. Lott ran a series of regressions, varying the specifications in an effort to replicate Dr. Lichtman's results. Using all the variables reported in Appendix I in the majority report, he was unable to find a consistent, statistically significant relationship between the share of voters who were African American and the ballot spoilage rate. He found that the coefficient on the percent of voters who were black was indeed positive, but it was statistically insignificant. The chance that the relationship was real was only 50.3 percent, just about the chance of getting tails to come up on any one coin toss and far below the 95 percent significance level commonly demanded in social science.
Furthermore, when Dr. Lott analyzed the data using a specification that implied that the share of African American voters in a county was significantly related to the level of ballot spoilage, he found that it explained hardly any of the overall variance. Removing race from the equation but leaving in all the other explanatory variables only reduced the amount of ballot spoilage explained by his regression from 73.4 percent to 69.1 percent, a mere 4.3 percentage point reduction.
Indeed, in none of the other specifications provided in Dr. Lott's Table 3 did taking racial information out of the analysis but leaving in other variables reduce by more than 3 percent the amount of variance in the spoiled ballot rate that is explained. Consequently, it simply is not true that the best indicator of whether or not a particular county had a high or low rate of ballot spoilage is its racial composition." (Their emphasis)
(Thernstrom & Redenbaugh, 2001)
Lott ran a series of 8 regression models of ballot spoilage vs. race and several other socio-economic variables including a few that Lichtman had not included. His data, methods and results were presented in a report published in late June 2001 as an appendix to Thernstrom and Redenbaugh's dissent. In July he ran another 8 models that included an expanded set of data and variables. The results of these analyses were included in a revised version of Thernstrom and Redenbaugh's dissent dated July 19, 2001. It is that version that is discussed in this paper.
All 16 models had conceptual and analytical flaws severe enough to invalidate them. The variables used in them were poorly characterized. Some overlapped in coverage, impairing their independence. Others that were necessary to test the dissenting claims were omitted. For instance, we saw earlier that Thernstrom and Redenbaugh attributed nearly all of Florida's year 2000 ballot spoilage to low levels of literacy among minorities and high first-time voter levels. Yet not one of Lott's 8 original runs included any of these variables. A literacy variable was added to models 9 through 16. Of these, only one produced a statistically significant correlation of literacy with ballot spoilage under Lott's defined standard of 0.1 (a 10 percent significance level; the conventional threshold in social science fields is 0.01 to 0.05). Education level and first-time voter rates were, once again, not included.
Lott reports the variables used and the regression coefficients obtained for all 16 of his model runs in Table 3 of his final report (Lott, 2001). The R2 value of each model (the square of its multiple correlation coefficient) is a direct measure of the degree to which the chosen variables account for ballot spoilage trends. Comparing R2 for each paired model run with and without literacy reveals that literacy explained a maximum of only 2.3 percent of the results (models 1 and 9). In most cases it contributes a few tenths of a percent. Models 2 and 10 are the only pair that purported to show a negative correlation of race with ballot spoilage. In this case literacy contributes a mere 0.3 percent. Thernstrom and Redenbaugh devoted entire sections of their dissent to discussions of literacy, education, and first-time voters and accused Lichtman of not addressing these variables properly. Yet only one of these (literacy) is even included in any of their models, and its contribution to their results is insignificant.
Overall, with the literacy variable included, Lott's models yield an average R2 of 0.756. In other words, they explain 75.6 percent of the observed variation in ballot spoilage. Compared to the corresponding figure of 0.25 this might seem impressive. Indeed, Thernstrom and Redenbaugh made much of this difference--it was their primary justification for claiming that Lott's models were more reliable than Lichtman's.
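The paired comparison described above (the same model fitted with and without one variable, then comparing R2) can be sketched in a few lines. This is only an illustration on synthetic data, not Lott's or Lichtman's data or code; the function names `r_squared` and `incremental_r_squared`, the variable names, and all the numbers are assumptions made up for the example.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on X (an intercept is added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r_squared(X, y, col):
    """Drop in R2 when column `col` is removed from the model."""
    full = r_squared(X, y)
    reduced = r_squared(np.delete(X, col, axis=1), y)
    return full - reduced

# Synthetic stand-in data (hypothetical): one row per imaginary county.
rng = np.random.default_rng(0)
n = 67
strong = rng.normal(size=n)   # a variable that really drives the outcome
weak = rng.normal(size=n)     # a variable with almost no effect
y = 2.0 * strong + 0.1 * weak + rng.normal(scale=0.5, size=n)
X = np.column_stack([strong, weak])

full_r2 = r_squared(X, y)
delta_strong = incremental_r_squared(X, y, 0)
delta_weak = incremental_r_squared(X, y, 1)
print(f"full R2 = {full_r2:.3f}")
print(f"R2 lost by dropping the strong variable = {delta_strong:.3f}")
print(f"R2 lost by dropping the weak variable   = {delta_weak:.3f}")
```

Note that the drop in R2 measures only what a variable explains beyond the variables left in the model. When two variables carry nearly the same information, removing either one will change R2 very little.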
Once again, a closer inspection of Lott's choice of variables is revealing. Several of his models (1, 2, 7, 8, 9, 10, and 16) contained redundant variables. Separate "independent" variables were included that controlled for the percentage of blacks among registered voters and the corresponding percentage of whites and Hispanics collectively. These groups comprised over 99 percent of Florida's year 2000 registered voters, so changes in either would be almost perfectly mirrored in the other--that is, [x,y] = [x,(1 - x)]. It's not difficult to see how this can destroy the usefulness of a model.
Suppose we were to conduct a study of local weather patterns using regression methods. Such a study might include barometric pressure, a strong indicator of weather change for well known thermodynamic reasons. If we were to include two variables, one using data from a standard barometer and another from a separate barometer located next to it that was identical in every respect except that it used a reversed pressure scale, we could "prove" that barometric pressure was unrelated to weather change. With two "independent" variables that reflected each other perfectly, our model would be well correlated, and because changes in the one are guaranteed to cancel out changes in the other, it would be surprising if we found that a falling barometer was any indication of impending bad weather. This is what statisticians refer to as multicollinearity, and it effectively destroys the usefulness of a multiple regression model (Hanushek & Jackson, 1977; Weisberg, 1985).
Thernstrom and Redenbaugh tell us that,
"In fact, using the variables provided in the report, Dr. Lott was unable to find a consistent, statistically significant relationship between the share of voters who were African American and the ballot spoilage rate. Further, removing race from the equation, but leaving in all the other variables only reduced ballot spoilage rate explained by his regression by a trivial amount. In other words, the best indicator of whether or not a particular county had a high or low rate of ballot spoilage is not its racial composition. Other variables were more important."
(Thernstrom & Redenbaugh, 2001)
Of course they were. Lott double-dipped his racial variables and pre-arranged them to cancel out of the analysis. It would have been surprising if he had reached any other conclusion.
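The effect of including a variable alongside its near-mirror image can be illustrated with a small simulation. What follows is a minimal sketch on synthetic data, not a reconstruction of Lott's models: the helper `ols_with_se`, the variable names, and every number are hypothetical. The outcome is built to depend on one group's share, and the same regression is run once with that share alone and once with its near-complement added.

```python
import numpy as np

def ols_with_se(X, y):
    """Least-squares coefficients and their standard errors (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = float(resid @ resid) / dof          # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)        # coefficient covariance matrix
    return beta, np.sqrt(np.diag(cov))

# Synthetic stand-in data (hypothetical): one row per imaginary county.
rng = np.random.default_rng(1)
n = 67
group_share = rng.uniform(0.02, 0.40, size=n)
# The second variable is almost exactly 1 - group_share (a residual
# category of under 1 percent keeps the two from being perfectly collinear).
complement_share = 1.0 - group_share - rng.uniform(0.0, 0.01, size=n)
# By construction the outcome rises with group_share.
spoilage = 0.01 + 0.10 * group_share + rng.normal(scale=0.01, size=n)

beta_one, se_one = ols_with_se(group_share.reshape(-1, 1), spoilage)
beta_two, se_two = ols_with_se(
    np.column_stack([group_share, complement_share]), spoilage
)

t_one = beta_one[1] / se_one[1]        # t-statistic with the share alone
t_two = beta_two[1] / se_two[1]        # t-statistic with the mirror variable added
se_inflation = se_two[1] / se_one[1]   # how much the standard error grew

print(f"share alone:         coef = {beta_one[1]:.3f}, t = {t_one:.2f}")
print(f"share plus 'mirror': coef = {beta_two[1]:.3f}, t = {t_two:.2f}")
print(f"standard error inflated by a factor of {se_inflation:.1f}")
```

In runs of this kind the standard error on the share coefficient grows many times over once the mirror variable is added, so an effect that was built into the data can easily fail a significance test.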
Eliminating the multicollinearity from his models is straightforward--simply remove either of the redundant variables. Lichtman did this for models 2 and 10, the only ones in Lott's ensemble that showed negative correlations of race with ballot spoilage. The result? Both found that race alone explains at least 11 percent of all ballot spoilage at statistically significant levels far better than the 0.01 to 0.05 standard of social science convention and with a 79 percent correlation--better than the originals (Lichtman, 2001b). Lott's remaining models removed the multicollinearity of models 2 and 10 only to introduce other instances of it. Variables for use of optical scan voting technology and optical scanning by precinct are both present in several of them despite the fact that they overlap considerably (the latter will by definition be a subset of the former).
Lott's results contained a number of red flags that ought to have alerted him to these problems. Models 3 and 11, which have the redundant racial variables removed, do show statistically significant impacts of race on ballot spoilage. Other models failed to yield statistically significant results for variables tracking County Supervisor party affiliation and/or race. More importantly, those that tracked the race of the County Supervisor had problems with their data. This is particularly telling because Thernstrom and Redenbaugh made much of ballot spoilage being higher where there was a Democratic County Supervisor, and even higher where he/she was black.
"Dr. Lott provides a fuller examination of the possible impact of having a Democratic supervisor of elections in his Table 3, and adds another related variable—whether or not the supervisor was African American. Having Democratic officials in charge increases the ballot spoilage rate substantially, and the effect is stronger still when that official is African American. (All African American supervisors of elections are Democrats.) Lott estimates that a 1 percent increase in the black share of voters in counties with Democratic election officials increases the number of spoiled ballots by a striking 135 percent."
(Thernstrom & Redenbaugh, 2001)
The variable they refer to was added to models 14 and 15 in Lott's suite. This variable tracked the percentage of registered voters in counties with an African American supervisor, and assigned a value of zero to all counties without one. An examination of Lott's dataset for this variable reveals that it was based on county records from 2001, immediately prior to his report--not the fall 2000 election. At the time the dissent was published, four of the counties he analyzed had African American County Supervisors. Only one of these did during the election--St. Lucie County, which had a ballot rejection rate of only 0.3 percent. Models 4 and 12 included variables for whether the County Supervisor was Democratic, Republican, and additionally, whether he/she was also black. Lott ran a total of seven models that tracked one or more of these variables. All yielded results that were statistically insignificant at Lott's declared standards (Lott, 2001). The problems don't end here. Lott goes on to draw conclusions from these data based on arithmetically incorrect assumptions. On page 5 of his report we're told that,
"[The] largest effect between the share of voters who are African-American and ballot spoilage rates exists when African-Americans are county election supervisors and a net positive effect also occurs when Democrats are county election supervisors. Because the point estimates need to be added together in evaluating the impact of the percent of voters who are African-American in counties with African-American county election supervisors, the net effect in column 6 for the percent of voters who are African-American and that variable interacted with whether the county election supervisor is African-American is just short of being statistically significant at the 10 percent level (p=.1088). The estimates imply that each one percent increase in the share of voters by African-Americans produces 135 percent more non-voted ballots when the county election supervisors are African-American than when they are of some other race."
(Lott, 2001)
Lott arrives at these conclusions by adding coefficients for the difference between black and non-black ballot spoilage in counties with black supervisors to the corresponding figures for all counties. But these figures are not additive. The former are derived only for counties with black supervisors, which are a subset of the counties behind the latter figures.
Citing Lott's figure of 135 percent, Thernstrom and Redenbaugh tell us that Democratic County Supervisors had more to do with ballot spoilage than race. The astute reader will also notice that they've misquoted him. His 135 percent figure referred to African-American County Supervisors, not Democrats. They also steered carefully around his admission--clearly stated in the previous sentence--that these results fall short of statistical significance at the 10 percent level, which is quite lenient by accepted social science standards. Either way, the argument regarding race and/or political affiliation of County Supervisors is based entirely on statistically insignificant results and a key variable for which there were only 4 data points, 3 of which were flat-out incorrect.
Lichtman also ran a set of revised regression models which were presented as Appendix X to the USCCR Report in August 2001 (Lichtman, 2001b). These runs contained variables controlling for voting technology, poverty, literacy, and education level, without multicollinearity. Larger datasets for all variables were used. Additional precinct level data for Broward, Escambia, and Gadsden Counties were included, expanding the range of his extreme analyses in demographic and voting technology variables (Lott did not include extreme analysis or any other independent means of checking his own runs). For blacks, Lichtman's original runs yielded statewide ballot spoilage rates of 14.4 percent that were attributable to race alone (Lichtman, 2001). The corresponding precinct level averages were 15 percent. In his revised runs these figures were 14.3 and 14 percent respectively (Lichtman, 2001b). The agreement with the original analysis is striking, as is that between the ecological (county level) and extreme (precinct) methods. Furthermore, these runs explained 86.6 percent of the observed ballot spoilage (R2 = 0.866) as compared to Lott's corresponding figure of 74 percent, and without the variable redundancy and dataset problems that plagued his work. No significant impact on ballot spoilage was found for any socioeconomic variables, including literacy.
Another analysis was run independently by Philip Klinker of Hamilton College, NY that expanded on Lichtman and Lott's analyses using an even broader range of variables and datasets. Klinker used data from all of Lichtman's cited sources, which as noted earlier, he was able to obtain in machine readable form in minutes from Lichtman's citations alone, contrary to Thernstrom and Redenbaugh's insistence that it was not available. He also used additional literacy data from the Florida Literacy Coalition (FLC, 1992). Like Lichtman's literacy data, it too was based on the 1992 National Adult Literacy Survey but also included data by congressional district and municipality. On page 20 of their dissent Thernstrom and Redenbaugh insisted that national literacy studies "provide no data on Florida specifically." Yet Klinker had no more problem finding this data than I did. Using his cited web source, I located and downloaded it in machine readable form within seconds. Once again we have to wonder how serious Thernstrom and Redenbaugh really were about obtaining the data they insisted were not made available to them. Other variables were included for,
- Voters per precinct (crowded precincts may increase ballot spoilage due to increased wait times and less available voter assistance)
- Increases in voter registration prior to November 2000 (Thernstrom and Redenbaugh claimed that first-time voters were a crucial factor in ballot spoilage and castigated Lichtman for not addressing it, even though Lott neglected it as well)
- County crime rates
- Percent of elderly population
- Percent of population under 25
- Party of the county election supervisor (for which care was taken to ensure the accuracy and quantity of data that were required for statistically significant results)
- Percent of population with less than a high school diploma
- Percent of population with at least some college education (not multicollinear with the previous variable)
- Percent of population in rural areas