Big Data and Elections

By Era Kraja, Pablo de la Mora, Shuqing Zhao, Taylor Jarrett  

Nowadays, retrieving information about people’s virtual interactions, commercial or intellectual transactions, feelings, and expressions is becoming easier thanks to the worldwide use of the internet. Big data collected from user searches has enabled researchers to predict the outcome of presidential elections from the general public mood. In 2012 alone, 22% of registered voters let their friends know who they planned to vote for via social media such as Facebook and Twitter (Rainie, 2012).

Nevertheless, big data analysis often fails to prove its trustworthiness and efficiency, as can be seen from wrong predictions of major political events such as Brexit and, more recently, the referendum in Colombia, where citizens voted against the peace agreement with the guerrilla group FARC.

But it is understandable why so many experts are willing to spend effort analyzing gigabytes of data before election day, even when they aren’t a hundred percent sure that the data is correct. Traditional ways of polling, be it going from door to door or calling random numbers from a telephone book, have fallen out of favor, though they might prove more accurate than online data. Meanwhile, the number of social media users has grown from 1.4 to 2.34 billion within the past 4 years, and is expected to keep growing (eMarketer, 2016). Ever more politicians value social media as an important battlefield in their campaigns.

Big data analysis in this particular field plays two major roles: predicting and marketing. However astonishing the false prediction of 2016 may be, many were equally surprised four years earlier by the accuracy of big data analysis in the Obama-Romney race.

Past glory of big data analytics



More than 10 million tweets were posted during the first general-election debate between Barack Obama and Republican nominee Mitt Romney (Heim, 2015). The “second screen” phenomenon not only created a huge stream of spontaneous information for ordinary spectators, but also left data analysts with plenty of raw material to work with.

The 2012 presidential election made a celebrity out of Nate Silver, the data cruncher. His prediction of the election outcome was so accurate throughout the campaign that in the months, even years, following the 2012 election, many political and social science scholars felt encouraged to delve deeper into the new field of big data analytics. Scholars and politicians eagerly discussed the analysis given by Silver, and numerous writings emerged on the topic, attempting to improve the existing analytical methods.

By analyzing vocabulary frequency in Twitter posts and their annotations, the Obama campaign team made smart strategic moves on many key issues. After the disastrous Hurricane Sandy in particular, Obama was able to react according to public sentiment (Tsou et al., 2013). More importantly, the large-scale analysis they had carried out gave Obama’s team a better understanding of the social and geographical distribution of potential voters, and therefore a huge advantage in targeting voters and allocating resources.
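To make the idea of vocabulary frequency concrete, here is a minimal sketch in Python. It is not the method used by the Obama campaign or by Tsou et al.; the sample posts, the stopword list and the function name `term_frequencies` are all hypothetical and only illustrate how counting words across posts can surface the topics people are talking about.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real analysis would use a large tweet collection.
SAMPLE_POSTS = [
    "Obama visits New Jersey after Hurricane Sandy",
    "Sandy relief efforts continue, Obama thanks volunteers",
    "Romney talks jobs and the economy in Ohio",
]

# A tiny, hand-picked stopword list (an assumption made for illustration).
STOPWORDS = {"a", "after", "and", "in", "of", "the", "to"}


def term_frequencies(posts, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase the text and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts


if __name__ == "__main__":
    frequencies = term_frequencies(SAMPLE_POSTS)
    for term, count in frequencies.most_common(5):
        print(term, count)
```

Counting terms is only a first step. A campaign would still need to link the frequent terms to places, dates and sentiment before drawing any strategic conclusion from them.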

How it worked in the 2016 US Presidential Election

The 2016 presidential primaries of both the Republican and Democratic parties used various techniques of online and telephone polling to gauge preferences among a plethora of potential candidates. Unlike polling for the general election, which overwhelmingly focuses on a choice between two major candidates, party primaries are unique in that candidates seek the highest percentage of support among a wide selection of politicians. Consequently, polling data proved useful both in determining the winners of the respective primaries and in identifying potential areas of weakness for the eventual nominees.

The earliest Republican presidential primary polls showed former Gov. Jeb Bush commanding a considerable advantage over other potential candidates, including the eventual winner Donald Trump. Bush’s lead quickly evaporated, however, following various statements by Donald Trump that appealed to the more extreme wings of the Republican Party. Trump overtook the lead for the first time on July 16, 2015. In a pool of over 15 official candidates, he went on to achieve a commanding 30% of preference in RealClearPolitics’ average of major scientific polls, and in December 2015 his campaign issued the now-famous statement calling for a “total and complete shutdown of Muslims entering the United States.” Donald Trump was therefore able to accurately predict which statements were either inflammatory or popular enough to generate attention from the Republican base, a tactic reinforced by his unique status as a political outsider unbound by the financial and political restraints that plagued his top competitors, Jeb Bush and Ted Cruz.

The Democratic presidential primary polls also offer significant insight into both the changes happening within the Democratic Party and the shortcomings of the eventual nominee. Former Secretary of State Hillary Clinton started what was forecast to be an easy primary campaign, rivaled by a newcomer on the American presidential stage, Sen. Bernie Sanders of Vermont. Sanders’ socialist leanings appealed to the fringes of the Democratic Party, and this, coupled with the international recognition of former First Lady Hillary Clinton, is speculated to have given Sec. Clinton her edge of more than 25 points over Sanders during the first week of polling.

Mirroring the anti-establishment theme that propelled Donald Trump to the forefront of the Republican nomination, Clinton’s advantage shrank to a mere 1-point lead over Sanders during the early weeks of April. Sen. Bernie Sanders’ rapid rise in the polls against Sec. Clinton is extraordinary given how the two Democratic candidates compared at the outset, and once fame and political influence are accounted for, it becomes easier to see why Sec. Hillary Clinton failed to win the presidency against political newcomer Donald Trump. The former Secretary of State relied heavily on her existing public profile, and a frank examination of the polling average by RealClearPolitics shows that a critical mass of voters indeed preferred the message of Bernie Sanders.

It can be concluded that electronic polling can convey information that far surpasses a mere summary of support for a candidate. Candidates such as Donald Trump can use the data extracted from polling to tailor their campaigns toward victory. And even after such a victory, as the case of Hillary Clinton reveals, potential weaknesses can be identified from this data.

…And what went wrong?



All candidates in the 2016 US Presidential Election relied on social media to stay in daily contact with their followers and to respond to popular feedback. Using the data gathered from social media and online polls, major forecasters such as the New York Times Upshot, FiveThirtyEight, and the Princeton Election Consortium made predictions of the election results that turned out to be vastly inaccurate. These forecasters gave Hillary Clinton between a 70% and a 99% chance of winning the election, which proved seriously wrong.

One reason big data analysis may show wrong results lies in details such as neglecting the margin of error. A forecast that gives one candidate a 70% or 90% chance of winning still leaves a 30% or 10% chance that the other candidate wins, and once the results came out, it was that less likely outcome that had occurred (Lohr & Singer, 2016). Another factor that makes social media and poll data unreliable is that people have no incentive to tell the truth, or even to answer at all, a weakness made worse by convenience sampling. A big problem in the 2016 US presidential election was the lack of seriousness with which people, especially the younger generation, approached political engagement.

The future of big data in politics

The use of big data has come a long way in recent decades. Evidence suggests that analyzing big data, specifically data gathered from Twitter or other social platforms, can at times be the most helpful advisory tool for nominees, suggesting strategies or behaviors that might increase their number of followers, as research by Tsou et al. (2013) demonstrates. Analysis of big data is also often accurate in its predictions, as was the case 4 years ago, when the majority of polls foresaw the victory of Barack Obama.

However, the reliability of big data has come into question since the results of the recent presidential election in the United States. Numerous analyses failed to foresee who the 45th president of the United States of America would be, forecasting at least a 70% chance that the Democratic nominee, Hillary Clinton, would win. One possible reason for this failure is the nature of the polls themselves: participants are selected because they are conveniently accessible and available, and are therefore unrepresentative of the entire population. Another reason may be the absence of margins of error, which indicate how close a survey’s results are likely to be to what a census of the whole population would show.
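As a rough illustration of what a margin of error expresses, the sketch below uses the standard textbook formula for a polled share, z * sqrt(p(1 - p) / n). It assumes a simple random sample, which is exactly what a convenience sample is not, so for the online polls discussed above the real uncertainty is likely larger than this formula suggests. The function names and the example poll figures are hypothetical.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Margin of error for a polled share, assuming simple random sampling.

    z=1.96 corresponds to a 95% confidence level.
    """
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(share * (1 - share) / sample_size)


def confidence_interval(share, sample_size, z=1.96):
    """Return the (low, high) range implied by the margin of error."""
    moe = margin_of_error(share, sample_size, z)
    return max(0.0, share - moe), min(1.0, share + moe)


if __name__ == "__main__":
    # Hypothetical poll: 48% support among 1,000 respondents.
    low, high = confidence_interval(0.48, 1000)
    print(f"48% support, n=1000: between {low:.1%} and {high:.1%}")
```

With about 1,000 respondents the margin is roughly three percentage points either way, so a candidate polling a few points ahead may not actually be ahead.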

A better approach for big data analysis would be to use revealed preferences, that is, the things people have actually done in the past or are doing in the present, since big data relies on real-time analysis (Technology Policy Institute, 2016).




eMarketer. (2016). Number of social media users worldwide from 2010 to 2020 (in billions). Retrieved 28 Nov. 2016 from

Heim, K. (April 15, 2015). Live tweeting a presidential primary debate. Retrieved 28 Nov. 2016 from

Maycotte, H.O. “Will Big Data Determine Our Next President?”. N.p., 2015. Retrieved 28 Nov. 2016 from

Tsou, M., Yang, J., Lusher, D., Han, S., Spitzberg, B., Gawron, J. M., Gupta, D. & An, L. (2013). Mapping social activities and concepts with social media (Twitter) and web search engines (Yahoo and Bing): a case study in 2012 US Presidential Election, Cartography and Geographic Information Science, 40:4, 337-348.

Lohr, Steve, and Natasha Singer. How Data Failed Us in Calling an Election. The New York Times. The New York Times, 10 Nov. 2016. Retrieved 28 Nov. 2016 from

Bliccathemes. “Election Predictions: A Failure of Bad Data, Not Big Data | The Technology Policy Institute.” The Technology Policy Institute. N.p., n.d. Retrieved 28 Nov. 2016 from

Rainie, Lee. “Social Media And Voting”. Pew Research Center: Internet, Science & Tech. N.p., 2012. Retrieved 4 Oct. 2016 from

Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012, July). A system for real-time twitter sentiment analysis of 2012 US presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations (pp. 115-120). Association for Computational Linguistics.
