US Presidential Election 2016: When sentiments ‘trump’ed trends
By: Amit Meher
Senior Manager - Data Sciences
The US election results are out, and everyone is stunned by Donald Trump’s unexpected victory over Hillary Clinton. We at Flytxt carried out an exploratory analysis of the outcome.
The main objective of this analysis was to answer a few puzzling questions. Was there good enough evidence that Trump would win the election? Did the trend analysis of poll data miss anything in predicting the election outcome? Were Twitter sentiments a better indicator of people supporting Trump, despite some controversial remarks he made?
Opinion polls failed, but collectively ‘untold’ a story
Trump’s win can be largely attributed to his winning the key swing states, namely Florida (29 electoral votes), Pennsylvania (20 electoral votes), Ohio (18 electoral votes), North Carolina (15 electoral votes) and Wisconsin (10 electoral votes). Hence, these states were considered for our post-event analysis.
Although opinion polls were inaccurate in predicting the final election outcome, we performed a finer trend analysis on the poll data to see whether it could shed some light on the unexpected swing towards Trump on Election Day. For this, state-wise poll data for roughly one month before Election Day was collected, since older data may not represent voters’ most recent preferences. Poll predictions were aggregated in accordance with the polls’ historical performance in past elections and adjusted using an exponential moving average, and linear regression models were then fitted to identify linear trends in the poll predictions.
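The three steps described above (weighted aggregation, exponential smoothing, linear fit) can be sketched as follows. This is only a minimal illustration and not the exact pipeline we ran: the function names, the smoothing factor `alpha`, the per-poll weights and the sample numbers are all assumptions made up for the example.

```python
from collections import defaultdict

def aggregate_daily(polls):
    """Weighted mean share per day; the weight stands in for a poll's track record."""
    sums, weights = defaultdict(float), defaultdict(float)
    for day, share, weight in polls:
        sums[day] += share * weight
        weights[day] += weight
    days = sorted(sums)
    return days, [sums[d] / weights[d] for d in days]

def ema(values, alpha=0.3):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up illustrative polls: (day index, candidate share in %, poll weight).
polls = [(0, 44.0, 1.0), (0, 46.0, 0.5), (7, 45.0, 1.0),
         (14, 46.0, 0.8), (14, 47.0, 0.4), (21, 47.5, 1.0)]
days, shares = aggregate_daily(polls)
slope, intercept = linear_trend(days, ema(shares))
print(f"trend: {slope:+.3f} percentage points per day")
```

A positive slope corresponds to an upward trend for the candidate in that state and a negative slope to a downward one.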
From the above plots, it is evident that public support for Hillary followed a downward trend in all 5 states, while support for Trump followed an upward trend in 4 of them, viz. Florida, Ohio, North Carolina and Pennsylvania. Of these, Florida, Ohio and North Carolina showed a quite significant upward trend for Trump.
Although the final predictions made by opinion polls before Election Day favored Hillary overall, the aggregated poll data shows a consistent upward trend for Trump, which could explain his winning the popular vote in those states. However, this model could not explain Trump’s win in Wisconsin, where it showed a downward trend for him.
Candidate’s diffusion in Twitter space was the ‘unseen’ influence
Here, we tried to analyse the influence, or diffusion, of each candidate in the Twitter space. It is interesting to know the extent to which people supported the candidates’ ideologies before the election took place on November 8, 2016. One way Twitter users express or extend support to a candidate is by retweeting tweets posted by that candidate. The more retweets a candidate’s tweets receive, the higher that candidate’s diffusion.
Diffusion for both candidates was visualised using an interactive graph visualisation. For each of Hillary and Trump, the top 50 tweets with the highest retweet counts, posted before the election, were considered.
Each of the 50 tweets of a candidate is shown in a different color. The big blue node at the center represents the candidate, and the remaining nodes represent the 50 tweets posted by the candidate. The retweet count of a node (tweet) is denoted by its size as well as its distance from the center node.
In this setting, a candidate is said to be more diffused in the Twitter space if the diffusion network is more dispersed rather than compact. The interactive visualisations for Trump and Hillary are shown below.
From the above visualisations, it is evident that Trump’s network is more diffused than Hillary’s. One of the reasons for this could be Trump’s higher number of followers on Twitter.
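To make the layout rule concrete, here is a small sketch of how such a star-shaped network could be positioned and compared. It is an assumption-laden illustration rather than the code behind our interactive graphs: the function names, the shared `scale` parameter, the use of mean distance from the center as a dispersion measure, and the retweet counts are all hypothetical.

```python
import math

def radial_layout(retweet_counts, scale):
    """Candidate sits at the origin; each tweet is placed on its own spoke.

    Both the distance from the center and the node size grow with the
    retweet count. `scale` (retweets per unit of distance) must be shared
    by both candidates so that the two networks are comparable.
    """
    n = len(retweet_counts)
    nodes = []
    for i, count in enumerate(retweet_counts):
        radius = count / scale
        angle = 2 * math.pi * i / n
        nodes.append({
            "x": radius * math.cos(angle),
            "y": radius * math.sin(angle),
            "size": radius,
        })
    return nodes

def dispersion(nodes):
    """Mean distance of the tweet nodes from the center node."""
    return sum(math.hypot(node["x"], node["y"]) for node in nodes) / len(nodes)

# Made-up retweet counts for two hypothetical candidates.
candidate_a = [120_000, 90_000, 60_000]
candidate_b = [40_000, 30_000, 20_000]
SCALE = 10_000
spread_a = dispersion(radial_layout(candidate_a, SCALE))
spread_b = dispersion(radial_layout(candidate_b, SCALE))
print(spread_a, spread_b)
```

Under these assumptions, the candidate with the larger mean distance has the more dispersed, and hence more diffused, network.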
Sentiments consolidated to ‘unsure’ votes
Twitter users do not constitute the entire world’s population. However, tweets can be analysed to understand the overall sentiments of Twitter users. Only tweets collected before the election date were considered.
A sample of tweets posted in the 2 weeks just before Election Day was collected and analyzed using NLP (Natural Language Processing) techniques to understand the sentiment of Twitter users towards the presidential candidates. Surprisingly, the ratio of positive to negative sentiment towards Trump (0.72) was higher than that for Hillary (0.56), which suggests that people were supporting Trump more on Twitter.
Additionally, the state-wise sentiment for the 5 key states mentioned above is as follows.
For all states except Wisconsin, the ratio of positive to negative sentiment for Trump is significantly higher than that for Hillary, which shows that Twitter users were supporting Trump more than Clinton in these states. Moreover, users’ sentiment towards Trump in Ohio was more positive than negative, as indicated by a ratio greater than 1 for Trump in this state.
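The positive-to-negative ratio used above is simply the number of positive tweets divided by the number of negative tweets for a candidate, so a value of 0.72 means roughly 72 positive tweets for every 100 negative ones, and a value above 1 means positive tweets outnumber negative ones. A minimal sketch of the state-wise computation follows. It assumes each tweet has already been labelled by some sentiment classifier (not shown), and the function name, record layout and sample labels are invented for illustration.

```python
from collections import Counter

def sentiment_ratios(records):
    """Positive-to-negative ratio per (state, candidate) pair.

    `records` is an iterable of (state, candidate, label) tuples, where
    label is "pos" or "neg". Pairs with no negative tweets map to None
    because the ratio is undefined there.
    """
    counts = Counter()
    for state, candidate, label in records:
        counts[(state, candidate, label)] += 1
    ratios = {}
    for state, candidate in {(s, c) for s, c, _ in counts}:
        pos = counts[(state, candidate, "pos")]
        neg = counts[(state, candidate, "neg")]
        ratios[(state, candidate)] = pos / neg if neg else None
    return ratios

# Made-up labels, purely to show the shape of the input and output.
sample = [
    ("Ohio", "Candidate A", "pos"), ("Ohio", "Candidate A", "pos"),
    ("Ohio", "Candidate A", "neg"),
    ("Ohio", "Candidate B", "pos"), ("Ohio", "Candidate B", "neg"),
    ("Ohio", "Candidate B", "neg"),
]
for key, ratio in sorted(sentiment_ratios(sample).items()):
    print(key, ratio)
```

The same function works for topic-wise ratios if the first field of each record holds a topic instead of a state.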
Identifying the topics or themes of discussion prevailing in tweets, as well as users’ sentiment towards each topic, helps to understand how users think about a candidate on specific issues. A probabilistic topic modelling algorithm called LDA (Latent Dirichlet Allocation) was used to infer 5 broad topics from the tweets: election outcome apprehension and speculation; state-wise opinion poll results reported by news channels and media; the WikiLeaks issues and the FBI’s investigation of Hillary; discussions around election rallies; and negative sentiments about both candidates.
The following visualisation shows the sentiment for each topic.
All the topics have a higher ratio of positive to negative sentiment for Trump than for Clinton, which indicates that Twitter users were more critical of Hillary than of Trump on all of these topics.
All said and done, the US has its newest President, to be sworn into office on January 20, 2017. It was clear that trend analysis gave way to sentiment analysis when it came to predicting the likely outcome of this election, arguably for the first time in the history of US elections. As our post-event analysis showed, there were many things which were unseen, unsure and untold. That probably explains why sentiment ruled the outcome, especially with so many swing states in the balance.
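For readers unfamiliar with LDA, the sketch below shows one common way to fit it, collapsed Gibbs sampling, on a toy corpus. It is not the implementation used in our analysis: the function name, the hyperparameters `alpha` and `beta`, the iteration count and the tokenised "tweets" are all assumptions, and the toy example asks for 2 topics rather than the 5 reported above.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA over tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            z = rng.randrange(num_topics)
            topics.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                wid = word_id[word]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    top_words = [
        [vocab[j] for j in sorted(range(vocab_size),
                                  key=lambda j: -topic_word[k][j])[:3]]
        for k in range(num_topics)
    ]
    return doc_topic, top_words

# Made-up tokenised tweets, for illustration only.
docs = [
    ["poll", "state", "lead", "poll"],
    ["rally", "crowd", "speech", "rally"],
    ["poll", "lead", "state"],
    ["speech", "rally", "crowd"],
]
doc_topic, top_words = lda_gibbs(docs, num_topics=2)
print(top_words)
```

Each row of `doc_topic` counts how many tokens of a tweet were assigned to each topic, so a tweet can be attributed to its dominant topic before computing topic-wise sentiment.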