Predicting turnout from Google search data

Political scientists [Jesse Richman][1] and [Erik Voeten][2] wonder if Google Search data can help us predict turnout in next month’s Presidential election. To summarize, Richman finds that searches for “vote” in the past two elections predict voter turnout, and that low search volume for “vote” this year suggests a low-turnout election. In response, Voeten shows that controlling for who has come online since 2004 and 2008 explains much of why search volume for “vote” appears small this year. There are lots of reasons why it is tricky to draw good inferences from search data, but that shouldn’t keep us from trying!

My thinking is that we can look at the composition of “vote” searches across elections to see whether this year’s vote searches are indicative of greater voter turnout or greater voter demobilization/skepticism.

For instance, “vote” searches containing the terms “where” or “when” are probably more directly predictive of turning out to vote than the whole set of “vote” searches, as they indicate practical planning to turn out. On the other hand, “vote” searches containing the term “why” are probably far less predictive of turnout, and might even predict how many people are thinking “why bother?” and are considering staying home on election day.

So I went back to the data with this in mind. As a baseline, and as Voeten shows, if we consider voting searches (“vote” and “voting” together) as a proportion of searches for “university”, “science”, and “law”, this election hardly looks different. That measure is just (vote + voting) / ((university + science + law) / 3). As explained in the posts linked above, search volume data is scaled from 0 to 100, where 100 is the peak of that search term in the given time period, relative to all other search activity.
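As a rough illustration of the ratio above, here is a minimal Python sketch. It is not the R script linked at the end of the post; the function name and the weekly values are made up for the example, and the inputs are assumed to be Google's 0 to 100 scaled volumes for a single week.

```python
def normalized_vote_index(vote, voting, university, science, law):
    """Return (vote + voting) / ((university + science + law) / 3).

    All arguments are assumed to be scaled search volumes (0-100)
    for the same week. The function name is hypothetical.
    """
    baseline = (university + science + law) / 3
    if baseline == 0:
        raise ValueError("baseline index is zero; the ratio is undefined")
    return (vote + voting) / baseline


# Made-up values for one week, for illustration only.
example = normalized_vote_index(
    vote=12, voting=6, university=60, science=50, law=40
)
print(round(example, 2))
```

Dividing by the average of three unrelated, fairly stable terms is what lets weeks from different years be compared despite overall growth in search activity.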

And when we look at voting searches also containing the word “why” (as a proportion of the university-science-law index), again we find almost no difference.

But when we look at voting searches also containing the words “where” or “when” (I take the average of vote+where and vote+when as a proportion of the university-science-law index), more people appear to be asking where and when to vote this year than in the past two Presidential elections. The p-value on the difference of means between 2008 and 2012 is 0.051.
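The post does not say which test produced that p-value, so the sketch below swaps in a simple two-sided permutation test on the difference of means, using only the Python standard library. The function names, the `baseline` argument (the university-science-law average), and the sample values are all assumptions for illustration, not the author's data or method.

```python
import random
from statistics import mean


def where_when_index(vote_where, vote_when, baseline):
    """Average of the vote+where and vote+when volumes, over the baseline."""
    return ((vote_where + vote_when) / 2) / baseline


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference of means.

    Repeatedly shuffles the pooled observations and counts how often
    the shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up weekly index values for two election years.
year_a = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11]
year_b = [0.30, 0.32, 0.31, 0.29, 0.33, 0.30]
print(permutation_p_value(year_a, year_b, n_permutations=2000))
```

A permutation test makes few distributional assumptions, which suits noisy, rescaled search data, but it still treats weeks as exchangeable, and weekly search volumes are likely autocorrelated.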

Finally, I look at the difference between the why-vote searches and the where/when-vote searches (why minus where/when). The gap might be taken as an aggregate measure of how much society is using the web to think about the decision to turn out without planning to turn out, so that positive values might capture demobilization within vote searches (why bother?) and negative values reflect relatively resolved mobilization. What we find is that this election year shows more “where/when” vote searches than “why” vote searches, both at the present moment in the timeline and for the year as a whole. In both of the previous elections there were more “thinking” searches than “planning” searches, but this year there have been more planning searches. As in the previous analyses, these are as a proportion of the university-science-law index, and the difference of means between 2008 and 2012 is statistically significant at the 99.9% confidence level.
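The gap variable can be sketched in a few lines of Python. This assumes the sign convention described above (why minus where/when, each taken as a proportion of the university-science-law baseline); the function names and the weekly values are hypothetical.

```python
from statistics import mean


def thinking_planning_gap(vote_why, vote_where, vote_when, baseline):
    """Why-vote index minus where/when-vote index for one week.

    Positive values mean more "thinking" than "planning" searches;
    negative values mean more "planning" searches.
    """
    why_index = vote_why / baseline
    planning_index = ((vote_where + vote_when) / 2) / baseline
    return why_index - planning_index


def mean_gap(weeks):
    """Average gap over a list of per-week dictionaries."""
    return mean([thinking_planning_gap(**week) for week in weeks])


# Made-up weeks: one "thinking"-heavy, one "planning"-heavy.
thinking_week = {"vote_why": 10, "vote_where": 4, "vote_when": 2, "baseline": 50}
planning_week = {"vote_why": 3, "vote_where": 10, "vote_when": 8, "baseline": 50}

print(thinking_planning_gap(**thinking_week) > 0)  # more thinking than planning
print(thinking_planning_gap(**planning_week) < 0)  # more planning than thinking
```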

Arguably, this difference variable already controls for education or political information, because each term reflects a lack of knowledge, so it shouldn’t be inflated by less educated or less informed people coming online since 2004 and 2008. The same pattern is also evident and statistically significant if we look at this gap as a proportion of all “politics” search volume, or just at the raw differences in volume. Obviously, these are back-of-the-napkin analyses and these data are plagued by difficulties. But these analyses don’t suggest a low-turnout election in 2012. If anything, it looks like the web is registering more vote planning than in previous years.

[Here is the data in .csv form][7]. Here is the [R script][8] to reproduce the analyses found here.

[1]:
[2]:
[3]:
[4]:
[5]:
[6]:
[7]:
[8]:

Cite this post:

Murphy, Justin. 2012. "Predicting turnout from Google search data," (April 24, 2017).