Can Social Media Data Alone be Used to Predict Election Results and Consumer Behaviour?

Jul 22, 2024

Forecasts for national elections in the UK still rely heavily on poll data, but is that the most accurate way to predict future events such as election results?

Can elections be predicted using social media data alone and what is the best way to analyse the data and produce the most accurate forecast?

Challenges of Predicting Elections Using Twitter

Apart from polling data, one of the main ways of predicting election results has traditionally been to use data from platforms such as Twitter.

Engagement on Twitter was commonly used to extrapolate voting intention and attempt to predict the overall vote share of a particular party.

However, several factors make the results of this method less reliable and open to bias, and therefore not a fair test.

Analysis of Twitter data for predicting elections is often conducted in quite a rudimentary way. The basic principle is that if one particular candidate has, say, 60% of the overall posts related to them, they will get 60% of the vote share.
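The basic principle can be sketched in a few lines of Python. This is only a minimal illustration, assuming each post has already been tagged with the single candidate it mentions; the function name `volume_share_forecast` and the sample data are hypothetical.

```python
from collections import Counter


def volume_share_forecast(post_mentions):
    """Naive forecast: predicted vote share (%) equals each candidate's share of posts."""
    counts = Counter(post_mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {candidate: 100 * n / total for candidate, n in counts.items()}


# Hypothetical sample: one candidate label per post.
sample_posts = ["Candidate A"] * 6 + ["Candidate B"] * 4
forecast = volume_share_forecast(sample_posts)
print(forecast)  # {'Candidate A': 60.0, 'Candidate B': 40.0}
```

Note that this treats every post as an endorsement, which is exactly the weakness that sentiment analysis tries to address.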

Sentiment analysis can be used to ascertain whether a post is in favour of or against the candidate. This will be covered in more detail later.

One of the issues with Twitter is that it does not necessarily reflect a representative sample of the population. 

A study in 2019 showed that the majority of Twitter users were aged between 16 and 34, more highly educated, reasonably wealthy in terms of disposable income, and predominantly Democrat voters in the US.

However, since Elon Musk's takeover of Twitter and the rebranding of the platform as “X”, the bias has likely shifted in the opposite direction, and survey evidence supports this.

Pew Research Center found that the share of Republican-leaning Twitter users who thought the platform was bad for democracy had dropped from 60% in 2021 to only 21% in 2023.

The share of Republican Twitter users who believe the site is good for democracy has gone up from 17% to 43%. This shows a clear increase in positive views of Twitter among Republican voters.

The study shows, however, that Democrat views have shifted in the opposite direction: 47% of Democrat Twitter users said the site was good for democracy in 2021, down to only 24% in 2023.

As the overall opinion of the site shifts towards the positive among Republican voters, it would be reasonably safe to assume there would be an increase in the overall number of Republican voters using the site, as more of this demographic views the platform as a useful tool for democracy.

Therefore, the issue of biased sample data still remains in 2024, although the bias has more than likely switched from favouring Democrat users and election candidates to favouring Republican views.

Further Limitations of Twitter as a Means to Predict Election Data

Apart from the biased sample data, there are a number of limiting factors that make Twitter less reliable as a means of predicting elections and customer behaviour.

The mechanics of the platform, which dictate what content each user does and does not see, make the site something of an echo chamber. It is a case of preaching to the converted: users are fed posts that align with their views and resemble posts they have already interacted with.

For example, if a user likes or retweets a post from, say, the Green Party, there is a higher chance of a similar post appearing in the user's feed, and before long they will be sent down an avenue of environmental policies and Green Party literature.

Whilst this may not seem a particularly disastrous consequence for society, if the material “echoed” in the echo chamber relates to malicious or extremist views, then more significant issues can occur and the sample data becomes even less accurate.

In this way, it could be argued that Twitter contributes to the polarisation of political debate, entrenching each user in their own political views, with these sentiments reinforced daily whilst opposing views are pushed away until they are invisible to the user.

Of course, the same can be said for most social media, as the platforms normally operate in a similar way. However, sites such as Facebook and TikTok will perhaps offer a broader spectrum of views, because the relationship between users and the algorithms that shape their experience is somewhat different.

The research evidence shows that Twitter was becoming less reliable as a means of predicting election data in 2023. In 2024 the platform is likely even less relevant, as other sites such as TikTok are increasingly used to influence political opinion, especially among younger sections of the populace.

Using TikTok to Predict Election Results

Normally, a report carried out and published in the last year or two would be seen as quite a recent and relevant study. However, in the sphere of social media, the data can shift quite quickly from one year to the next.

As new platforms emerge, the current favourite among younger demographics may be completely replaced by another provider. If a study into social media trends on a particular platform takes three or four years to complete, there’s a good chance that the results will be irrelevant by the time the report is published.

A lot of emphasis has been placed on researching Twitter data in the past, although researchers have recently been looking more towards sites such as TikTok due to their growing popularity.

Having said that, the fact that TikTok is currently popular with younger voters does not mean this will still be the case in two or three years. The site could be banned in Western democracies over fears of Chinese influence, or some other new platform could simply take over as the new favourite.

This does not, however, undermine the overall effectiveness of social media data in predicting elections. It just means that researchers and political campaigners need to reset their sights frequently so that they draw data from the most representative sample, looking at the social media platforms primarily used by the voters they wish to target or study.

A study published by ACM looked at the impact of TikTok on the elections in Brazil in 2022. At that time, political figures were only just starting to use the platform for campaigning, in comparison to the more widespread use today in 2024, but the conclusions from the report are still relevant and applicable to current elections.

The study considered how popularity and engagement on TikTok related to actual election results, based on data from around 600 posts from the candidates themselves and approximately 10 million interactions from users. 

The report showed two significant findings:

Firstly, the candidate who was ultimately elected and became president received 55% of the total interactions on TikTok, suggesting that the overall election result could have been predicted from this statistic alone.

Perhaps even more significant, however, is the strange correlation between TikTok likes and votes received.

The study found that not only did the TikTok results signal the winner of the election, but each candidate's share of the likes was exactly the same as the vote share they received.

So in this instance, you could have predicted not only the election winner using TikTok stats but also the exact percentage of vote share for each candidate. You just need to know where to look.
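One simple way to check how closely a share of likes tracks a vote share is to measure the gap between the two. The sketch below uses invented figures, not the numbers from the study, and the helper names `to_shares` and `mean_absolute_error` are assumptions made for illustration.

```python
def to_shares(counts):
    """Convert raw counts per candidate into percentage shares."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average gap, in percentage points, between predicted and actual shares."""
    return sum(abs(predicted[name] - actual[name]) for name in actual) / len(actual)


# Hypothetical figures for illustration only.
likes = {"Candidate A": 600, "Candidate B": 400}
vote_share = {"Candidate A": 58.0, "Candidate B": 42.0}

like_share = to_shares(likes)
error = mean_absolute_error(like_share, vote_share)
print(like_share)
print(f"Mean absolute error: {error:.1f} percentage points")
```

An error of zero would correspond to the exact match described above; in practice, some gap should be expected.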

Can Google Trends be Used to Predict Customer Behaviour and Election Results?

Using Google Trends to make future predictions is a slightly different approach as the service does not reveal absolute numbers of searches but provides data on the interest in a search term relative to another.

In this way, it provides a comparison of the popularity of two things. So you could, for example, enter Coca-Cola and Pepsi as the two items and find their relative popularity in various regions of the world, over a specified amount of time.

In the same way, you can enter two search terms such as “Trump/Harris” or “Sunak/Starmer” to establish the popularity of each individual relative to the other, in a country of your choosing such as the UK.
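To make the idea of relative rather than absolute interest concrete, the following sketch scales two series so that the highest point across both becomes 100, then compares their average interest. It is a simplified illustration with hypothetical input numbers, not Google's actual method, and the function names are assumptions.

```python
def peak_scaled_index(series_by_term):
    """Scale every value so the highest point across all terms equals 100."""
    peak = max(max(values) for values in series_by_term.values())
    if peak == 0:
        return {term: [0] * len(values) for term, values in series_by_term.items()}
    return {
        term: [round(100 * value / peak) for value in values]
        for term, values in series_by_term.items()
    }


def relative_interest(indexed):
    """Share of average interest (%) that each term takes relative to the others."""
    means = {term: sum(values) / len(values) for term, values in indexed.items()}
    total = sum(means.values())
    if total == 0:
        return {term: 0.0 for term in means}
    return {term: 100 * mean / total for term, mean in means.items()}


# Hypothetical weekly search volumes for two terms.
raw = {"Term A": [20, 40, 50], "Term B": [10, 20, 25]}
indexed = peak_scaled_index(raw)
interest = relative_interest(indexed)
print(indexed)
print(interest)
```

Because only the ratios survive the scaling, the output says nothing about how many searches were made, only how the two terms compare.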

In fact, this is exactly what researchers publishing in the Journal of Big Data did in order to predict the outcomes of several multi-party elections in Germany. They focused on the 2009, 2013, 2017 and 2021 elections and found that these could be accurately forecast using Google Trends.

Similar studies were also carried out relating to the US, Canadian, Spanish and Greek elections where the results were successfully forecast using Google Trends.

Google Trends has also been used to predict other outcomes such as the spread of influenza, unemployment claims and corporate sales.

What is Sentiment Analysis?

Sentiment analysis is a relatively new method of predicting elections and future events that attaches more meaning to the data than can be represented in the numbers alone (for example, the number of posts about an election candidate).

Using natural language processing and machine learning, sentiment analysis is a way of looking at the actual opinions expressed in online conversations across a wide number of sources and extracting this information to gauge things like public opinion and election outcomes. 

So instead of just looking at the number of posts about election candidates, with sentiment analysis, it is possible to differentiate between positive and negative opinions across all online conversations and news articles studied, then aggregate this data to essentially find out who is most frequently referred to in a positive light. 

If 70% of the things being said about a particular person are positive, there’s a good chance they will win an election against someone who only has 10% good remarks and 90% negative comments.

Of course, the numbers would not normally be as pronounced as these figures, but that is the basic principle underpinning sentiment analysis to predict elections.
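The aggregation step can be sketched as follows. A tiny hand-written word list stands in here for the natural language processing and machine learning models a real system would use, so the lexicon, the function names and the sample posts are all hypothetical.

```python
import re

# Toy lexicon: an assumption for illustration, not a real sentiment model.
POSITIVE = {"great", "good", "love", "trust", "strong"}
NEGATIVE = {"bad", "awful", "hate", "weak", "liar"}


def classify(text):
    """Label a post positive, negative or neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def positive_share(posts):
    """Percentage of non-neutral posts about each candidate that are positive."""
    tallies = {}
    for candidate, text in posts:
        label = classify(text)
        if label == "neutral":
            continue
        positive, total = tallies.get(candidate, (0, 0))
        tallies[candidate] = (positive + (label == "positive"), total + 1)
    return {name: 100 * pos / total for name, (pos, total) in tallies.items()}


sample_posts = [
    ("Candidate A", "Great speech, I trust her"),
    ("Candidate A", "Strong on the economy"),
    ("Candidate A", "Awful interview"),
    ("Candidate B", "Weak answer, bad plan"),
    ("Candidate B", "Good debate"),
    ("Candidate B", "He spoke in Leeds today"),
]
sentiment = positive_share(sample_posts)
print(sentiment)
```

A word list like this cannot detect sarcasm or context, which is one reason production systems rely on trained models instead.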

Predicting Elections Using Organic Social Media Data 

Of course, you do not necessarily need to use a particularly complicated method to discover hidden information about future events such as elections.

If you know where to look, the clues are there, surrounded by all the other data. 

The important thing is identifying the data that is most representative as a sample of the total population, revealing an indicator for the whole country or data set you are looking at.

Prior to the 2024 general election, Sky News ran a short piece looking at organic social media interaction across all the main parties.

Thankfully there was an accompanying article so you can see the graphs and statistics being referred to in this section.

Using CrowdTangle, a social media engagement tracking tool, Sky News was able to demonstrate that, despite being the least well-funded and organised, the far-right Reform UK party had the highest amount of organic social media engagement.

Whilst Labour and Conservatives were spending the most, Reform were getting the most interactions online.

This could be something of a Marmite factor, where the extreme views presented elicit a stronger reaction: people either love them or hate them. However, as discussed previously, this can be examined in more depth.

Is it Possible to Predict Elections Based on Emotes Only?

Looking at the number of interactions each party gets does not necessarily tell you the full story.

If you are counting likes and ‘loves’ then this would indicate positive sentiment of course, but if a political party has received several thousand interactions and they are all frowns, angry faces or thumbs down type emotes, then this would probably not result in an ultimate election win.
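One way to separate raw interaction counts from the feeling behind them is to weight each emote before adding them up. The weights and reaction names below are assumptions chosen for illustration, as are the two sets of figures.

```python
# Hypothetical weights: positive emotes count +1, negative -1, ambiguous 0.
REACTION_WEIGHTS = {"like": 1, "love": 1, "haha": 0, "sad": -1, "angry": -1}


def net_sentiment(reactions):
    """Return a score from -1 (all negative) to +1 (all positive)."""
    total = sum(reactions.values())
    if total == 0:
        return 0.0
    weighted = sum(
        REACTION_WEIGHTS.get(name, 0) * count for name, count in reactions.items()
    )
    return weighted / total


# Invented figures: Party X has twice the interactions but mostly angry faces.
party_x = {"like": 300, "love": 100, "angry": 1600}
party_y = {"like": 500, "love": 300, "haha": 100, "angry": 100}

print(f"Party X: {sum(party_x.values())} interactions, score {net_sentiment(party_x):+.2f}")
print(f"Party Y: {sum(party_y.values())} interactions, score {net_sentiment(party_y):+.2f}")
```

In this invented example the party with the most interactions has the lower score, which is the situation the paragraph above warns about.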

Drilling further into the social media data, Sky News ranked all the posts by the number of ‘love’ responses they had received and, against all odds, Nigel Farage came out on top.

Of course, this kind of data mining can be tricky when accounting for the British sense of humour, as the second most ‘loved’ post was from Rishi Sunak talking about how you didn't need university to succeed in life.

As this post also had the most laughing emotes of any post, these were more likely ‘sarcastic loves’, with the irony not lost on most users, who left comments such as “yes just marry a billionaire like you did”.

Of course, these types of ironic love emotes are a nightmare for data analysts and show how even a relatively obvious indicator can be perceived in different ways, with various meanings attached.

Using Social Media Data to Amend or Replace Polling Data

Based on a small amount of information contained in one bar graph, researchers at Lyon Tech were able to forecast that the most recent YouGov poll, which projected one seat for Reform UK, would prove inaccurate.

It was predicted that this total number of seats would be out by a factor of around four or five and there would be higher than expected vote share overall for the Reform party.

Following the election results, the forecast proved to be accurate, even more so than those of the main polling companies: Reform did in fact win five seats, with a significantly larger percentage share of the vote than expected.

Could these methods of analysing social media interactions completely replace traditional polling as a means of forecasting future elections?

As the process becomes more refined and campaigning becomes increasingly based online, there's a significant chance that these relatively new techniques of analysing social media data will become the only method involved, with traditional polls rendered obsolete within five years.

Keep Up to Date on the Latest News in PR, Marketing, Online Campaigns and Data Analysis

At Lyon Tech, we provide the technology and information needed for data-driven industries to remain competitive, whether that is the latest updates on big data techniques that will give your business the edge, or state-of-the-art cloud-based technology and unlimited data storage.

In a data-driven economy, it is important to be able to handle large data sets with accuracy and efficiency. For our clients in industries relying heavily on data processing and analysis, we provide the technology and solutions needed to facilitate large transfers of information with ease.

At Lyon Tech we provide:

  • Remote working infrastructure 

  • Fully managed cloud solutions

  • Cybersecurity monitoring and response

  • Unlimited data through remote data centres

  • Seamless integration with your existing systems

  • Virtual workstations and virtual desktops

  • Live infrastructure monitoring

For further details on data analysis techniques and the technology needed to process big data, visit our news pages where you can find further information, or get in touch with us directly and talk to our expert advisors.