Polling Company   Number polled   Date         No     Yes    Undec/Won't Say
LL                959             Sept 7-8     42.9   43.8   13
SOM               1003            8-12         45.0   37.0   18
Compas            959             11-14        40.0   36.0   24
Createc           1004            15-19        46.2   39.8   15
Crop              2020            20-25        47     39     14
SOM               1820            19-25        48     39     13
LL                1006            25-28        45.1   43.8   11
LL                1015            Oct 1-3      44     43     13
Gallup            1013            10-12        43     39     18
LL                1002            8-12         42     45     13
SOM               981             13-16        43.4   42.9   13.6
LL                1005            16-20        42.3   45.7   12.1
The numbers that make the headlines have the undecideds allocated to either Yes or No. There was an excellent discussion of this in the G\&M on October 18 (A5). SOM allocates the undecideds in the proportions observed among the decided voters, which is about 50-50. Leger and Leger allocates them based on their answers to other questions in the poll and on demographic factors, which sends about 70% to No. This results in a substantial boost to the reported No vote. The poll corresponding to the last line in the above table was reported under the headline "Yes 50.2, No 49.8, poll suggests". (The poll of size 2020 has a margin of error of approximately 2.2\%; the poll of size 1820 has a margin of error of approximately 2.3\%.)
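The arithmetic behind the two allocation rules and the quoted margins of error can be sketched in a few lines of Python. This is only an illustration, not either firm's actual procedure: the function names are invented here, the 70% share is the rough average quoted above (so the result is not expected to reproduce the published headline exactly), and the margin of error assumes a simple random sample, a 95% level, and the worst case p = 0.5.

```python
import math

def allocate_undecided(no, yes, undecided, share_to_no):
    """Split the undecided percentage between No and Yes."""
    no_total = no + share_to_no * undecided
    yes_total = yes + (1.0 - share_to_no) * undecided
    return no_total, yes_total

def proportional_share(no, yes):
    """Share of undecideds given to No when they are split like the decided voters."""
    return no / (no + yes)

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points (assumes simple random sampling)."""
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

# Last row of the table: LL, 1005 polled, No 42.3, Yes 45.7, undecided 12.1
no, yes, und = 42.3, 45.7, 12.1

prop = allocate_undecided(no, yes, und, proportional_share(no, yes))
seventy = allocate_undecided(no, yes, und, 0.70)  # 0.70 is the rough figure from the text

print("proportional split: No %.1f, Yes %.1f" % prop)
print("70%% to No:          No %.1f, Yes %.1f" % seventy)
print("margin of error, n=2020: %.1f" % margin_of_error(2020))
print("margin of error, n=1820: %.1f" % margin_of_error(1820))
```

Under the proportional rule Yes stays ahead, while sending 70% of the undecideds to No puts No ahead, which is the boost described above.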
One way to describe a measurement that varies in a population is to quote the frequency of each possible measurement in that population. (Think of the students lined up on the lawn of Penn State, according to their height.) With many types of measurements, as you get a larger and larger population, with frequencies measured on a finer and finer grid, the plot of the frequencies will start to look like a curve of a very predictable form. (Think of getting 5 times as many students lined up by height classes, and taking a photo from an airplane. Then 500 times as many students, seen from the space shuttle...)
The frequency curve has a particular mathematical expression, in standard form.
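The expression itself does not appear in the text as it stands. Assuming the curve meant here is the standard normal (bell) curve, which is what the later discussion of the central limit theorem and of the intervals (-1,+1) and (-2,+2) suggests, its usual form for a measurement z in standard units is:

```latex
f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \qquad -\infty < z < \infty
```

The symbols f and z are notation chosen for this sketch. A raw measurement x is put into standard units by subtracting the center point and dividing by the standard unit.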
It is a surprising fact that a wide variety of potential measurements tend to follow this curve, once the measurements are converted to standard units. It is a theorem, known as the central limit theorem, that the frequency distribution of measurements that are averages comes closer and closer to this curve as the number of things being averaged increases. (With a bit of a stretch, you could imagine a particular person's height being determined by a number of effects that are averaged: genetic makeup, pre-natal nutrition, post-natal nutrition, etc. IQs are typically computed by averaging over a number of test items.)
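A small simulation can make the central limit theorem concrete. The sketch below assumes, purely for illustration, that the things being averaged are uniform random numbers between 0 and 1 (nothing in the text specifies this); it averages k of them, converts the average to standard units, and reports how often the result lands within 1 and within 2 standard units of the center.

```python
import math
import random

def standardized_mean(rng, k):
    """Average k uniform(0,1) draws and convert the average to standard units."""
    mean_u = 0.5                  # mean of a single uniform(0,1) draw
    sd_u = math.sqrt(1.0 / 12.0)  # standard deviation of a single draw
    avg = sum(rng.random() for _ in range(k)) / k
    return (avg - mean_u) / (sd_u / math.sqrt(k))

def fraction_within(values, limit):
    """Fraction of values strictly inside (-limit, +limit)."""
    return sum(1 for v in values if abs(v) < limit) / len(values)

rng = random.Random(1995)  # arbitrary seed, so that reruns give the same numbers
for k in (1, 2, 30):
    zs = [standardized_mean(rng, k) for _ in range(20000)]
    print(k, round(fraction_within(zs, 1), 3), round(fraction_within(zs, 2), 3))
```

With k = 1 the result is nothing like the bell curve: a single uniform draw is never more than about 1.73 standard units from its mean, so everything falls within 2. By k = 30 the two fractions should be close to the 2/3 and 95% figures that characterize the bell curve.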
The frequency curve given above has the property that about 2/3 of its mass is contained in the interval (-1,+1) and about 95% of its mass is contained in the interval (-2,+2). In other words, about 95% of the measurements, heights, IQ scores, whatever, will fall within 2 standard units of the center point. The center point is called the mean, and the standard unit is called the standard deviation.
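These two fractions can be checked directly. The sketch below assumes the curve is the standard normal curve and uses the error function from Python's standard library; the helper name mass_within is invented for the example.

```python
import math

def mass_within(k):
    """Fraction of the standard bell curve lying in the interval (-k, +k)."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(mass_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

So "about 2/3" is really a little over 68%, and "about 95%" is a little over 95%.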
In fact, my analogy above to looking at the students lined up by height, from space, doesn't quite work, unless we think of lining up either all females, or all males. Even from space we'd probably see two humps.
Even when measurements do have a frequency distribution that is exactly described by the bell curve, they can vary enough to look surprising. On p.172 of VDQI, Tufte shows 12 sets of 50 measurements from the bell curve. What is plotted is the observed frequency distribution based on each set of 50 measurements.
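The same kind of experiment is easy to repeat. The sketch below does not use Tufte's data; it simply draws 12 fresh sets of 50 measurements from the standard bell curve and prints the counts falling in 12 equal-width classes between -3 and +3. The number of classes, the range, and the seed are arbitrary choices made for the illustration.

```python
import random

def class_counts(sample, lo=-3.0, hi=3.0, bins=12):
    """Count the sample into equal-width classes between lo and hi."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        i = int((x - lo) / width)
        i = min(max(i, 0), bins - 1)  # rare values beyond lo or hi go in the end classes
        counts[i] += 1
    return counts

rng = random.Random(172)  # arbitrary seed
for s in range(12):
    sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
    counts = class_counts(sample)
    print(f"set {s + 1:2d}: " + " ".join(f"{c:2d}" for c in counts))
```

Each row is one observed frequency distribution. Comparing rows shows how lopsided or lumpy a set of only 50 measurements can look, even though every set comes from exactly the same bell curve.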