Econometric Modeling as Junk Science
In addition to linearity, multiple regression assumes that each variable is "normally" distributed about all the others in a classic bell-curve pattern. This means that most cases should be clustered around the average within each category, with few at the extremes. Often the data violates this assumption in major ways that lead to completely erroneous results. A good example is John Lott's data on gun control.
Lott collected a massive data set that he generously made available to other researchers. Unfortunately, he did not begin by graphing his data, perhaps because he had so much of it. But it is always a good idea to begin an analysis with graphs so as to see the trends before they are obscured by all the statistical adjustments. If one cannot graph everything, it is still worthwhile to graph some representative cases. So, using Lott's data, I plotted trends in murder rates for a number of counties where "shall issue" laws had gone into effect during the period covered by his study. Before "shall issue" laws were passed, local officials had discretion in granting permits to carry concealed weapons. After the laws were passed, officials had to issue a permit to any law-abiding adult who wanted one. If Lott's hypothesis were correct, we would expect to see the murder rate go down once the laws were passed.
The following graph shows trends in murder rates for the largest counties in several states that adopted "shall issue" laws between 1977 and 1992. The date at which the laws went into effect varied from state to state. Before reading further, the reader may find it interesting to examine the graph and try to infer when the law took effect in each county.
Examining the graph, we see that the pattern in Missoula County, Montana, appears to be quite erratic, with very sharp declines in the murder rate in 1979 and 1991. This, however, is not a real phenomenon, but the result of one of the adjustments Lott made to compensate for the non-normality of his data. Instead of using the actual numbers, he converted his numbers to natural logarithms. This is a common practice, since natural logarithms often fit the assumptions of multiple regression better than the actual data. The number in John Lott’s data file for Missoula County in those years is -2.30. This is odd, since a county’s murder rate cannot go below zero, unless previously murdered people are brought back to life. No such luck, however. To get the actual murder rate in each county, one has to invert the logarithms in Lott’s data set with the formula true rate = e^(logarithmic rate), where e = 2.71828. This can be done easily with the e^x button on a scientific calculator. Entering -2.3 in such a calculator and pushing the e^x button yields .100, or one tenth of a murder per 100,000 population. Actually, the true figure for murders in Missoula County in 1979 and 1991 was zero. Lott used .1 instead of zero because the natural logarithm of zero is mathematically undefined, so leaving it at zero would have created missing data. There are a great many -2.3’s in his data files on murder, because many of the counties are quite small, much smaller than Missoula with 81,904 people in 1992.
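The arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch of the transformation as described here, not Lott's actual code; the function names are made up for illustration, and the "one murder" year is a hypothetical case built from Missoula's 1992 population.

```python
import math

ZERO_SUBSTITUTE = 0.1  # value the text says was used in place of a zero rate

def to_log_rate(rate_per_100k):
    """Natural log of a murder rate, substituting 0.1 when the rate is zero."""
    if rate_per_100k <= 0:
        rate_per_100k = ZERO_SUBSTITUTE
    return math.log(rate_per_100k)

def to_true_rate(log_rate):
    """Invert the transformation: true rate = e^(logarithmic rate)."""
    return math.exp(log_rate)

# The -2.30 entries map back to roughly 0.1 murders per 100,000.
print(round(to_true_rate(-2.30), 3))

# Hypothetical illustration: one murder in a county of 81,904 people,
# followed by a year with no murders at all.
one_murder = 1 / 81_904 * 100_000
print(round(to_log_rate(one_murder), 2), round(to_log_rate(0.0), 2))
```

In a county of this size, the difference between one murder and none moves the logged value by about 2.5, which is why a zero year shows up on the graph as a sharp plunge.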
The distribution of murder rates in American counties is not at all close to the bell-shaped normal curve. There are a great many small counties with few or no murders, and a few quite large ones with a great many. Converting the data to natural logarithms is one way of minimizing the statistical effects of non-normal distributions, but it can introduce other distortions, as we see in this case.
Leaving aside the distortions in Missoula County caused by the conversion of the data to natural logarithms, the trends in these counties are quite smooth. There is no apparent effect from the introduction of "shall issue" laws in Missoula County in 1991, in Fulton County (Atlanta, Georgia) in 1990, in Hinds County (Jackson, Mississippi) in 1990, in Fairfax County (Fairfax, Virginia) in 1988, or in Kanawha County (Charleston, West Virginia) in 1989.
One might ask, why are we dealing with these medium-sized counties instead of major population centers? This was my first clue to the fundamental flaw in Lott's argument. My first inclination was to graph the trends in America's largest cities, because that's where the homicide problem is most severe. I immediately discovered that none of these cities had a "shall issue" law. The "shall issue" laws were put into effect primarily in states with low population density. This meant that Lott's data did not meet the fundamental assumptions for a regression analysis. To work properly, multiple regression requires that the "shall issue" variable be normally distributed throughout the data set. The mathematical calculations used to "control" for spurious relationships can't work if there is not a sufficient range of variation in the key variables. This was the "smoking gun" hidden in Lott's mass of tables and sophisticated equations. At no point in the book did he acknowledge this fact. When I asked him about this, he shrugged it off. He did not see it as a problem, since he "controlled" for population size.
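One simple way to look for this kind of gap before running a regression is to count how many observations fall in each combination of the key variables. The sketch below uses a handful of made-up records, not Lott's data, and the names `records` and `empty_cells` are hypothetical. It only illustrates the check under the assumption that places can be sorted into "large city or not" and "shall issue law or not".

```python
from collections import Counter

# Made-up records for illustration: (is_large_city, has_shall_issue_law)
records = [
    (True, False), (True, False), (True, False),
    (False, False), (False, False),
    (False, True), (False, True), (False, True),
]

def empty_cells(rows):
    """Return every (large city, law) combination with no observations."""
    counts = Counter(rows)
    return [
        (big, law)
        for big in (True, False)
        for law in (True, False)
        if counts[(big, law)] == 0
    ]

# An empty (True, True) cell means no large city ever had the law, so the
# data cannot show what the law does in large cities.
print(empty_cells(records))
```

When a cell like this is empty, any estimate for it comes from the model's assumptions rather than from observed cases.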
These irregularities in the data completely invalidated Lott's analysis. It took two years before Ayres and Donohue (1999) verified this in an econometric analysis, but Zimring and Hawkins zeroed in immediately on the problem in 1997. Having studied gun control legislation, they knew that "shall issue" laws were instituted in states where the National Rifle Association was powerful, largely in the South, the West and in rural regions. These were states that already had few restrictions on guns. They observed that this legislative history frustrates (Zimring and Hawkins 1997: 50) "our capacity to compare trends in 'shall issue' states with trends in other states. Because the states that changed legislation are different in location and constitution from the states that did not, comparisons across legislative categories will always risk confusing demographic and regional influences with the behavioral impact of different legal regimes." Zimring and Hawkins (1997: 51) further observed that:
“Lott and Mustard are, of course, aware of this problem. Their solution, a standard econometric technique, is to build a statistical model that will control for all the differences between Idaho and New York City that influence homicide and crime rates, other than the "shall issue" laws. If one can "specify" the major influences on homicide, rape, burglary, and auto theft in our model, then we can eliminate the influence of these factors on the different trends. Lott and Mustard build models that estimate the effects of demographic data, economic data, and criminal punishment on various offenses. These models are the ultimate in statistical home cooking in that they are created for this data set by these authors and only tested on the data that will be used in the evaluation of the right-to-carry impacts.”