Fivefiftyeight: Aggregating polls: rating the pollsters (Part II)

The aggregation of polls should follow two basic principles. First, older polls should weigh less than recent ones; second, good pollsters and good polls should weigh more than pollsters with a poor record in previous elections or with questionable “cooking” methods. There are many ways to apply these two principles. Examples include the ThreeHundredEight.com website in Canada and Nate Silver’s latest methodology for rating pollsters (version 5).
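As a purely illustrative example of the two principles, here is a minimal Python sketch of a poll weight that decays with the age of the poll and grows with the quality of the pollster. The half-life decay, the exponential quality factor and the names `poll_weight`, `half_life` and `quality_scale` are hypothetical choices for this example; they are not the weighting used on this blog, by ThreeHundredEight.com or by Nate Silver.

```python
import math

def poll_weight(days_old: float, pollster_score: float,
                half_life: float = 30.0, quality_scale: float = 1.0) -> float:
    """Hypothetical weight combining recency and pollster quality.

    days_old: days since the poll was in the field.
    pollster_score: relative error of the pollster (negative = better than average).
    half_life, quality_scale: illustrative tuning parameters (assumptions).
    """
    recency = 0.5 ** (days_old / half_life)              # halves every `half_life` days
    quality = math.exp(-pollster_score / quality_scale)  # smaller error -> larger weight
    return recency * quality

def weighted_average(values, weights):
    """Weighted mean of poll estimates (e.g. a party's vote share)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total
```

With these choices a 30-day-old poll from an average pollster counts half as much as a fresh one, and a pollster with a negative (better) score counts more than one with a positive score.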

In this post we present a procedure to rate the quality of pollsters and to estimate house effects, although it is not exactly the one we will use to aggregate the polls. To rate the pollsters we use 1,340 electoral polls conducted between 1997 and 2011. The sample includes two types of elections: Congressional and European. The date of each poll (needed to calculate the number of days between the poll and election day) is the average of its fieldwork dates for polls conducted during the last three months before the election; for older polls we use the last day of fieldwork. The selection of polls follows the principles stated in the previous post: exclude “partisan” polls, exclude Internet-based polls, and restrict tracking polls to non-overlapping dates.
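The dating rule above can be sketched in Python. Two details are assumptions, since the post does not spell them out: “the last three months” is approximated as 90 days, and the last day of fieldwork decides whether a poll falls inside that window. The midpoint of the fieldwork period is rounded down to a whole day.

```python
from datetime import date, timedelta

# Assumption: "the last three months" is approximated as 90 days, and the
# last day of fieldwork decides whether a poll falls inside that window.
RECENT_WINDOW_DAYS = 90

def poll_reference_date(field_start: date, field_end: date, election: date) -> date:
    """Fieldwork midpoint for recent polls, last fieldwork day for older ones."""
    if (election - field_end).days <= RECENT_WINDOW_DAYS:
        half = (field_end - field_start).days // 2  # whole days, rounded down
        return field_start + timedelta(days=half)
    return field_end

def days_to_election(field_start: date, field_end: date, election: date) -> int:
    """Days separating the poll's reference date from election day."""
    return (election - poll_reference_date(field_start, field_end, election)).days
```

For example, a poll in the field from 1 to 5 November before a 20 November election is dated 3 November, 17 days before the vote, whereas a poll taken in May is simply dated on its last day of fieldwork.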

To assess the predictive accuracy of the polls we use a simple measure: the absolute difference between the margin of victory of the top two finishers according to the poll and their actual margin in the election. This measure is sometimes called the “Mosteller P5 index”.
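A minimal sketch of this error measure, assuming vote shares in percentage points stored as dictionaries keyed by party (the data layout and the function name `margin_error` are assumptions). The top two finishers are taken from the election result, and the poll's margin is measured between those same two parties.

```python
def margin_error(poll: dict, result: dict) -> float:
    """Absolute error of a poll on the margin between the top two finishers.

    poll, result: vote shares (percentage points) keyed by party name.
    """
    first, second = sorted(result, key=result.get, reverse=True)[:2]
    poll_margin = poll[first] - poll[second]
    actual_margin = result[first] - result[second]
    return abs(poll_margin - actual_margin)

# Made-up shares for illustration only: the poll gives A an 8-point lead,
# the election gives A a 12-point lead, so the error is 4 points.
print(margin_error({"A": 40.0, "B": 32.0, "C": 10.0},
                   {"A": 42.0, "B": 30.0, "C": 9.0}))
```

Because the margin is signed before taking the absolute value, a poll that gets the winner wrong is penalized by the full distance between its margin and the actual one.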

In our sample, the historical mean difference between the forecast and the actual margin of victory is 6.2 percentage points, with the worst pollsters averaging 10.3 and the best 1.5. This crude difference, however, is not a good indicator of pollster quality: some elections are harder to forecast than others, and not every pollster polls every type of election. To remove the influence of factors not directly related to pollster quality, we can regress the error on the margin of victory on the poll’s sample size (sampling error), the number of days between the poll and election day, the type of election (Congressional or European), the specific election (dummies that capture how difficult each election was to predict) and the pollster (dummies whose coefficients measure its quality once the other factors are accounted for). The regression gives very sensible results: the older the poll, the larger the error; the 2004 election was particularly difficult to predict; and European elections were easier to predict than Congressional ones. The only surprise is that the sampling error of the polls has no explanatory power once all the other factors are taken into account.
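The regression can be sketched with ordinary least squares in NumPy. This is a simplified, hypothetical version: it has no separate election-type dummy, because with one dummy per election the Congressional/European distinction is a linear combination of those dummies; the pollster coefficients are centered so that they sum to zero; and the data below are noise-free synthetic numbers built only to check that known effects are recovered. The function name `pollster_effects` is an assumption.

```python
import itertools
import numpy as np

def pollster_effects(abs_error, sampling_error, days, election, pollster):
    """OLS of each poll's margin error on controls plus election and pollster dummies.

    Returns {pollster: effect}, centered so the effects sum to zero
    (negative = more accurate than the average pollster).
    """
    elections = sorted(set(election))
    pollsters = sorted(set(pollster))
    cols = [np.asarray(sampling_error, float), np.asarray(days, float)]
    # One dummy per election except the first (reference category); these
    # dummies also absorb the Congressional/European distinction.
    cols += [np.array([e == lvl for e in election], float) for lvl in elections[1:]]
    # One dummy per pollster and no separate intercept: together the pollster
    # dummies play the role of the constant term.
    cols += [np.array([p == lvl for p in pollster], float) for lvl in pollsters]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, np.asarray(abs_error, float), rcond=None)
    raw = beta[-len(pollsters):]
    return dict(zip(pollsters, raw - raw.mean()))

# Hypothetical, noise-free synthetic data with known effects.
true_pollster = {"P0": -1.0, "P1": 0.0, "P2": 1.0}
true_election = {"E0": 0.0, "E1": 2.0}
rows = list(itertools.product(true_pollster, true_election, [10, 40], [2.0, 3.0]))
errors = [5 + true_election[e] + true_pollster[p] + 0.01 * d for p, e, d, _ in rows]
effects = pollster_effects(
    errors,
    [s for _, _, _, s in rows],
    [d for _, _, d, _ in rows],
    [e for _, e, _, _ in rows],
    [p for p, _, _, _ in rows],
)
print({k: round(float(v), 6) for k, v in effects.items()})
```

Dropping the intercept and keeping every pollster dummy makes each pollster's raw coefficient directly comparable; centering them afterwards gives scores that add up to zero.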

Once the pollster effects are normalized to sum to zero, we obtain the scores in the following table (a negative score means a smaller error than the average pollster). Grades are assigned according to the position of each score in a normal distribution ("grading on the curve").

Pollster	Relative error	Grade
GAD3	-1.82	A+
Opina	-1.71	A+
Opinion 2000	-1.64	A+
Simple Logica	-1.32	A
Sondaxe	-1.15	A
La Vanguardia	-1.15	A
NC-Report	-0.94	A
GESOP	-0.78	A
Sigma Dos	-0.64	B+
Metroscopia	-0.57	B+
TNS-Demoscopia	-0.52	B+
Gallup	-0.51	B+
Vox Publica	-0.32	B+
DYM	-0.02	B
GETS	0.04	B
Demoscopia	0.19	B
Ipsos-Eco	0.31	B
CIS	0.33	B
Invymark	0.58	B
Metra-6	0.97	C
Instituto Noxa	1.12	C
ASEP	1.46	C
Celeste-Tel	1.52	C
Obradoiro Socioloxia	1.82	D
Demometrica	1.99	D
Intereconomia	2.77	D
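The normalization and the grading on the curve described above can be sketched in Python. The post does not give the grade cutoffs, so the thresholds in the hypothetical `CUTOFFS` list, and the use of the population standard deviation to standardize the scores, are assumptions. Applied to the scores in the table, these particular cutoffs happen to reproduce the grades shown, but they remain a guess rather than the author's actual rule.

```python
import math
from statistics import pstdev

# Hypothetical cutoffs: minimum share of a normal curve lying above a score
# (i.e. worse than it) required for each grade. Not stated in the post.
CUTOFFS = [(0.90, "A+"), (0.72, "A"), (0.55, "B+"), (0.30, "B"), (0.08, "C")]

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def grade_on_curve(scores: dict) -> dict:
    """Center scores to sum to zero, then grade by position on a normal curve.

    Assumes at least two distinct scores (otherwise the standard deviation is 0).
    """
    mean = sum(scores.values()) / len(scores)
    centered = {name: s - mean for name, s in scores.items()}
    sd = pstdev(list(centered.values()))
    grades = {}
    for name, s in centered.items():
        share_worse = 1.0 - normal_cdf(s / sd)  # negative score = better pollster
        grades[name] = next((g for cut, g in CUTOFFS if share_worse >= cut), "D")
    return grades
```

Centering first makes the grades depend only on how pollsters compare with each other, not on the overall level of error.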

Another important effect that should be corrected when aggregating polls is the house effect: the systematic way in which a pollster’s results differ from those of other pollsters. For instance, if a pollster shows the Popular Party clearly ahead when the other pollsters show a tied race with the Socialist Party, that pollster has a Popular Party house effect. When estimating house effects it is important to account for when each poll was conducted, since voter sentiment can change over time. If a pollster conducts many polls while one party is the clear favorite and then polls less often once the race is tied, it may appear to have a strong house effect. To avoid this problem we can regress the difference between the Popular Party and the Socialist Party on pollster dummies and the time remaining until the election. Doing so for the polls leading up to the 2015 Congressional election, we find a large PSOE house effect for GETS and a significant PP house effect for NC-Report and DYM.
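The house-effect regression can be sketched in NumPy as follows. The linear trend in days to the election, the centering of the effects on the average pollster and the function name `house_effects` are assumptions of this sketch, since the post does not say how time enters its regression; the polls are hypothetical, noise-free synthetic data.

```python
import numpy as np

def house_effects(lead, days_to_election, pollster):
    """Regress the PP-minus-PSOE lead on a linear time trend and pollster dummies.

    Returns ({pollster: house effect}, trend). House effects are centered on the
    average pollster: positive = leans PP, negative = leans PSOE.
    """
    names = sorted(set(pollster))
    X = np.column_stack(
        [np.asarray(days_to_election, float)]
        + [np.array([p == name for p in pollster], float) for name in names]
    )
    beta, *_ = np.linalg.lstsq(X, np.asarray(lead, float), rcond=None)
    raw = beta[1:]  # pollster coefficients (they also carry the intercept)
    return dict(zip(names, raw - raw.mean())), beta[0]

# Hypothetical synthetic polls: H1 polls only early in the campaign, H2 only late.
pollsters = ["H1", "H1", "H2", "H2", "H3", "H3"]
days = [60, 50, 10, 20, 60, 10]
true_house = {"H1": 3.0, "H2": -3.0, "H3": 0.0}
leads = [2 + 0.05 * d + true_house[p] for p, d in zip(pollsters, days)]
effects, trend = house_effects(leads, days, pollsters)
print({k: round(float(v), 3) for k, v in effects.items()}, round(float(trend), 3))
```

On these synthetic data, simply averaging each pollster's leads and centering them would give H1 +4.0 and H2 -4.0, because H1 happened to poll when the lead was larger; including the time trend recovers the true +3.0 and -3.0.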

2.12.15

UPF Department of Economics and Business

Barcelona GSE