Wednesday, November 9, 2011
Polling bias among local pollsters?
Lew from Kiwipolitico stumbled across an interesting analysis of Australian pollsters, looking at whether the methods employed by particular polling companies might lead to biases in the numbers they produce.
As that post notes, it's impossible to tell whether any particular pollster is showing a bias with respect to the real levels of support for each party, because we only find out what the whole population is thinking when we have an election. We can, however, see if a particular polling company is pulling in a different direction than others. Since I've had about all the thesis-editing I can stand for one day, I decided to see if the data I put together for those charts from this morning could tell us anything about polling bias among New Zealand pollsters.
Obviously, the first step is to decide how to find the "average" number against which all polls should be compared. I went with a local regression, which just fits a smoothed line through all the points. From there you can measure how far a single estimate is from the smoothed value at the same date*:
Even if all the companies are perfectly sampling the population, we'd expect estimates from single polls to show a good deal of scatter, just because polls are estimates of the population value and come with uncertainty. Indeed, there are dots all over the place in that graph. By collating all those differences between a point and the smoothed lines for the same date we can see if any of the companies are consistently finding higher or lower estimates of a party's support:
And the answer is... sort of. It seems TV3 tends to get National a little higher and Labour a little lower than the rest of the pollsters, and perhaps the Herald goes the other way. I think trying to gain anything more meaningful than that from these results is probably the statistical equivalent of reading tea leaves, but feel free to stare at the graph and confirm your own political biases!
*You should note that this approach introduces biases of its own, so take these results with several grains of salt. In particular, Roy Morgan polls a lot more often than the other companies, so its polls contribute more to the smoothed line than anyone else's. It's no surprise, then, that Roy Morgan shows only small biases.
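For anyone who wants to try something similar at home, here is a rough sketch of the idea in Python. It is not the code behind the charts above: the function names and the toy numbers are invented for illustration, and the smoother is a simple tricube-weighted local linear fit standing in for whatever local regression you prefer. It fits a smoothed value at each poll date, takes the difference between each poll and that value, and averages the differences by pollster.

```python
from collections import defaultdict
from statistics import mean


def local_linear_fit(xs, ys, x0, frac=0.5):
    """Tricube-weighted local linear regression, evaluated at x0.

    frac is the share of points used in each local fit (an assumed default).
    """
    k = max(2, int(round(frac * len(xs))))
    # The distance to the k-th nearest point sets the local bandwidth.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
    ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx < 1e-12:
        # No spread in x among the weighted points: fall back to the mean.
        return ybar
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def pollster_offsets(polls, frac=0.5):
    """polls: list of (day, pollster, support) for ONE party.

    Returns each pollster's mean difference from the smoothed line.
    """
    days = [day for day, _, _ in polls]
    support = [value for _, _, value in polls]
    residuals = defaultdict(list)
    for day, pollster, value in polls:
        smoothed = local_linear_fit(days, support, day, frac)
        residuals[pollster].append(value - smoothed)
    return {p: mean(r) for p, r in residuals.items()}


# Made-up data: a steady upward trend, with hypothetical pollster "A"
# reading 2 points high and hypothetical pollster "B" reading 2 points low.
toy_polls = [
    (day, "A" if day % 2 == 0 else "B",
     30 + 0.1 * day + (2 if day % 2 == 0 else -2))
    for day in range(20)
]
offsets = pollster_offsets(toy_polls)
```

The same caveat applies here as in the footnote: every poll feeds into the smoothed line, so a company that polls more often drags the line towards itself and will look less biased than it might really be.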
If you are interested in trends in polling bias, I have these differences plotted over time here. As always, I have the raw data here so let me know if you want a csv file to play with.
Labels: election, pretty-data, sci-blogs