US Elections prediction: How we beat the odds

After the initial shock of a win that almost no one predicted, it is time, three months later, to reflect on the validity of the methodology and assumptions behind standard approaches to opinion polling. Recently, we have seen many blatant examples of the weaknesses of such approaches when applied to political elections, where the outcomes were not even close to those anticipated: Brexit, the US Presidential Election, the left-wing and right-wing primaries of the French Presidential Election…

More surprising still, some sophisticated approaches that try to harness Big Data and statistics, such as the one developed by Nate Silver, the American analyst behind the opinion poll analysis blog FiveThirtyEight, also failed to anticipate these upsets. So, where did they fail?

Finding the right data sources and samples

One of the most obvious reasons could be the data source. For example, traditional opinion polls rely on direct responses from samples of the population. Depending on the collection method, the sample selection, the wording of the questions or the bias correction applied, the results of such methods have proved unreliable.

New approaches try to take advantage of the opportunities offered by the rise of new sets of data, such as social media data. Over the past few years, social media has become a popular medium for political discussion, and many recent research studies have focused on analyzing comments from Twitter or Facebook to compute better estimators of voters’ opinions.

Many players have been building predictive models for political elections powered by Natural Language Processing (NLP) analytics, whose purpose is to automatically derive meaning from textual data, such as tweets.

Going beyond the comments

Despite being one of the favored types of algorithms for political predictions, NLP predictive models applied to Twitter messages have historically produced mixed results and often fail to perform better than opinion polls. We believe this is because they mostly focus on the obvious – the comments – and forget about the hidden part of the iceberg – the people.

Elections are all about people: their opinions, their aspirations, their choices, their interests… Focusing only on text analytics when developing predictive models is not enough to deliver a correct and precise prediction. For this reason, QuantCube Technology has spent the past year developing Profile Analytics algorithms. Indeed, in order to make the best predictions, the challenge now is not only to analyze what we can see and read but, most of all, what is not said or shown.

By profiling in depth the individuals interacting on social media, blogs, forums or news discussions, we have been able to classify potential voters with high precision and even anticipate the behavior of undecided voters, using only public data and leveraging the massive amount of raw information accessible in this digital era. While intuitive, this kind of approach requires extensive knowledge of the science and mathematics behind online networks, and especially graph theory.
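To give a flavor of how graph theory can help infer the leaning of people who never state it, here is a minimal sketch of label propagation on an interaction graph. It is a generic textbook technique chosen for illustration, not a description of the Profile Analytics algorithms themselves; the function name, the user names, the edge list and the score scale (+1.0 for one candidate, -1.0 for the other) are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread leaning scores from accounts with a known leaning to their neighbors.

    edges: iterable of (user_a, user_b) undirected interactions (hypothetical data)
    seeds: dict mapping user -> score in [-1.0, 1.0] for accounts with a known leaning
    Returns a dict mapping every user -> estimated score.
    """
    # Build an undirected adjacency list from the interaction pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    users = set(neighbors) | set(seeds)
    # Unlabeled users start neutral (0.0); seeds start at their known score.
    scores = {user: seeds.get(user, 0.0) for user in users}

    for _ in range(iterations):
        updated = {}
        for user in users:
            if user in seeds:
                # Known leanings stay fixed throughout.
                updated[user] = seeds[user]
            else:
                # An unlabeled user takes the average score of its neighbors.
                nbrs = neighbors[user]
                updated[user] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical chain of interactions: alice - bob - carol - dave.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
seeds = {"alice": 1.0, "dave": -1.0}
scores = propagate_labels(edges, seeds)
```

In this toy chain, bob ends up with a positive score because he sits closer to alice, and carol with a negative one because she sits closer to dave, even though neither has expressed an opinion. A real system would presumably need weighted edges, far richer profile features and careful validation; this sketch only illustrates the principle of inferring what is not said from the structure of the network.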

Application to US Presidential Election

Applying our algorithms to the last US Presidential Election, we were able to identify that, as election day drew closer, undecided voters were mostly leaning towards Donald Trump. This made him the favorite to win the election for the first time since we started our analytics, with a probability of 50.4%.

Whether this sudden shift was linked to the reopening of the FBI case on Hillary Clinton’s emails remains uncertain, but what became clear at this point was that the risk of seeing Donald Trump win this election had been widely underestimated. French speakers can read the article published in Option Finance on Nov 4, 2016, four days before the election, in which we forecast this result.

Applying the same models, we also successfully predicted, one month later, the result of the much-debated Italian referendum: our analytics pointed to a clear win for the “No” vote in what rapidly turned from a national referendum on the constitution into a vote of confidence in the government of M. Renzi. We were also the only startup that correctly predicted the first and second rounds of the French Presidential Election that took place last month, using alternative sets of data.