As you like it

On Saturday 17th March Chris Wylie revealed his face and shock of colourful hair to the world, announcing himself as the man who worked with a “little-known data company” called Cambridge Analytica to create a “psychological warfare tool” capable of swinging elections. In the whistleblower’s interview with The Observer he explained how Cambridge Analytica “exploited Facebook to harvest millions of people’s profiles and built models to exploit what we knew about them and target their inner demons. That was the basis the entire company was built on.” He claimed that the company “broke” Facebook. His revelations certainly broke Cambridge Analytica.

The fallout from Wylie’s interview was seismic. The US Federal Trade Commission opened an investigation into Facebook’s privacy protections; the CEOs of Cambridge Analytica and Facebook, Alexander Nix and Mark Zuckerberg respectively, were summoned to testify before US Congress; and the UK’s Information Commissioner’s Office was granted a warrant to search Cambridge Analytica’s servers in London.

Secret recordings surfaced in which Nix and other Cambridge Analytica executives boasted of using targeted ads to swing election campaigns around the world and of bringing Donald Trump to power. On 2nd May, Cambridge Analytica, which maintained that it had done nothing wrong, announced that it was shutting down due to mounting legal fees and a “siege” of negative media coverage.

I had been investigating Cambridge Analytica for almost a year before Wylie went public. I have also been researching and writing about the algorithms that control our lives: conducting experiments of my own, reading the scientific literature and talking to the algorithms’ creators. The problems I found with the company’s supposed psychological warfare tool led me to conclude that Cambridge Analytica wasted the Trump campaign’s time and money. Trump won the presidential election regardless of the personality algorithm, not because of it.

The Cambridge Analytica story has all the ingredients of a modern conspiracy thriller. It involves Donald Trump, data security, the psychology of personality, Facebook, underpaid Mechanical Turk workers, big data, Cambridge University academics, right-wing populist Steve Bannon (who used to sit on Cambridge Analytica’s board), secretive financier Robert Mercer and one-time national security advisor Michael Flynn who pops up as a consultant. When they make the movie, I can imagine Jesse Eisenberg playing a psychologist who gradually uncovers the true motives of the company he works for: to manipulate our every emotion for political ends.

Following the 8th November 2016 US presidential election the front page of Cambridge Analytica’s website featured a montage of clips from CNN, CBSN, Bloomberg and Sky News all announcing how its data-driven campaign had been instrumental in Trump’s victory. The snippets focused on how Cambridge Analytica had used targeted online marketing and micro-level polling data to influence voters. The film ended with a quote from the US political pollster Frank Luntz: “There are no longer any experts except Cambridge Analytica. They were Trump’s digital team who figured out how to win.”

When Alexander Nix presented his company’s research at the Concordia Summit in New York in September 2016, he talked about how, instead of targeting people on the basis of socio-economic background, as Barack Obama had done in his campaign, Cambridge Analytica could “predict the personality of every single adult in the United States of America”. He planned to use the ‘Big Five’, or OCEAN, model of personality, which classifies individuals in terms of five dimensions: openness, conscientiousness, extraversion, agreeableness and neuroticism. Highly neurotic and conscientious voters could be targeted with the message that the “Second Amendment is an insurance policy”. Traditional, agreeable voters could be told about how “the right to bear arms is important to hand down from father to son”. Nix claimed that he could use “hundreds and thousands of individual data points on audiences to understand exactly which messages are going to appeal to which audiences”.

To carry out such targeting on our political personalities, Cambridge Analytica would need a lot of data. In 2014 psychologist Alex Kogan, a researcher at Cambridge University, was collecting data for his scientific studies through an online crowdsourcing marketplace called Mechanical Turk. Kogan described Mechanical Turk to me as “a big pool of folks who’ll do tasks in exchange for cash”. For his scientific study, he was asking them to complete a seemingly inconsequential job: they answered two questions about their income and how long they had been on Facebook, and then clicked a button that gave Kogan and his colleagues consent to access their Facebook profile.

The study was a dramatic demonstration of how willing people are to allow researchers access to their and their friends’ Facebook data. It was also, at that time, surprisingly easy for researchers to access data on the social network site. By getting permission from the Mechanical Turk workers, it was also possible to access the location and the ‘likes’ of their friends. Eighty percent of people volunteering for Kogan’s study provided access to their profile and their friends’ location data in exchange for $1. The workers had, on average, 353 friends. With just 857 participants, Kogan and his colleagues gained access to a total of 287,739 people’s data. This is the power of the social network: collecting data from a small number of people gives researchers access to the data of a vast network of friends.

It was at this point that Kogan started talking to representatives for SCL, a group of companies that provide political and military analysis for clients throughout the world. Initially, SCL was interested in Kogan helping with questionnaire design. But when the company’s representatives realised the power of data collection on Mechanical Turk, the discussion turned to the possibility of accessing vast quantities of Facebook personality data. SCL was poised to set up a political consultancy service, which would become Cambridge Analytica in 2013, to use personality predictions to help its clients win elections.

Kogan appeared to have exactly the approach to data collection they needed. Kogan later admitted to me that he had been naive. He had never worked with a private company before, having been in academia throughout his undergraduate degree at Berkeley, his PhD in Hong Kong and now his research position at Cambridge. “I didn’t appreciate how business is done,” he told me.

Kogan says that he and his colleagues considered the ethical aspects and the risks of working with Cambridge Analytica, making sure they separated the data collection from their university research work.

But what he hadn’t considered was other people’s feelings and perceptions when they heard about the Facebook data collection. “It is pretty ironic, if you think about it,” he says. “A lot of what I study is emotions and I think if we had thought about whether people might feel weird or icky about us making personality predictions about them, then we would have made a different decision.”

What Kogan was doing had been done before, and on a much larger scale. Kogan had been attempting to replicate the work of his colleague Michal Kosinski, who as a PhD student in Cambridge had collected an even bigger data set of online personalities. Via an app called myPersonality, over three million people gave Kosinski and his colleagues permission to access and store their Facebook profiles. Many of these people then took a battery of psychometric tests, measuring intelligence, personality and happiness, and answered questions about sexual orientation, drug use and other aspects of their lifestyle.

Kosinski used this data to show that the so-called ‘Big Five’ personality traits could be predicted by the things we like on Facebook. He found that outgoing people on Facebook like dancing, theatre and beer pong; shy people like role-playing games and Terry Pratchett books; neurotic people like Kurt Cobain and emo music; and calm people like skydiving, football and business administration. Facebook ‘likes’ could also predict which way they are likely to vote.

Kosinski used a statistical technique called regression to convert the ‘likes’ people had applied on Facebook into numerical predictions about their personality and political persuasion. He saw something superhuman in the accuracy of his regression models. He believes that humans think about other people in just a small number of dimensions – such as age, race, gender and, if we know them a bit better, personality – while algorithms are already processing billions of data points and making classifications in hundreds of dimensions. “We are better than computers at doing insignificant things that we, for some reason, think are very important, like walking around,” he tells me. “But computers can do other intelligent tasks that we can never do.”
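A minimal sketch of this kind of regression, using only the Python standard library, might look like the following. It is an illustration under stated assumptions, not Kosinski’s actual method or data: the six users and their labels are invented, the page names are borrowed from examples in this article, and the helper names `fit_logistic` and `predict` are hypothetical.

```python
import math

# Page names are taken from examples in the article; the users below are invented.
PAGES = ["Lady Gaga", "Starbucks", "country music", "Alicia Keys", "Harry Potter"]

# Each toy user is (set of liked page indices, label), where 1 = Republican, 0 = Democrat.
TOY_USERS = [
    ({0, 1, 2}, 1),
    ({1, 2}, 1),
    ({0, 2}, 1),
    ({0, 3, 4}, 0),
    ({3, 4}, 0),
    ({0, 4}, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, liked):
    """Estimated probability of label 1, given a set of liked page indices."""
    return sigmoid(bias + sum(weights[j] for j in liked))


def fit_logistic(users, n_features, epochs=500, learning_rate=0.5):
    """Fit a logistic regression by plain batch gradient descent."""
    weights = [0.0] * n_features
    bias = 0.0
    n = len(users)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for liked, label in users:
            error = predict(weights, bias, liked) - label
            grad_b += error
            for j in liked:
                grad_w[j] += error
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


weights, bias = fit_logistic(TOY_USERS, n_features=len(PAGES))

# One learned weight per page: positive leans towards label 1, negative towards label 0.
for page, weight in zip(PAGES, weights):
    print(f"{page:15s} {weight:+.2f}")
```

In this toy data the page liked equally by both groups ends up with a weight near zero, while pages liked by only one group carry the signal. A real analysis would involve thousands of pages and users, and would need held-out data to check that the model is not simply memorising its training set.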

In Kosinski’s view, his work was the first step towards creating a computerised, high-dimensional understanding of human personality that outperforms the understanding we currently have of ourselves. This was exactly the psychological warfare tool that Nix and Cambridge Analytica aimed to create.

I was sceptical about the claims made by Nix and Kosinski’s interpretation of his regression analysis. I wanted to better understand the power of personality algorithms for myself. I don’t have access to the data used by Cambridge Analytica, but Kosinski and his colleagues have created a tutorial package to allow others to create regression models on an anonymised database of 20,000 Facebook users. I downloaded the package and used the data to fit a regression model with ‘likes’ as inputs and then tested the model’s performance.

The first results of my analysis were unsurprising. I found that Democrats like Barack Obama and The Colbert Report and that many Republicans like George W Bush and the Bible. So I took some of the obvious ‘likes’ out of the model. What was surprising, however, was that the model worked with 85 percent accuracy using a combination of other likes to determine political affiliations. For example, someone who liked Lady Gaga, Starbucks and country music was more likely to be a Republican, but a Lady Gaga fan who also liked Alicia Keys and Harry Potter was more likely to be a Democrat. This type of information could be very useful to a political party. Instead of Democrats focusing a campaign purely around traditional liberal media, they could focus on getting the vote out among Harry Potter fans. Republicans could target people who like to drink Starbucks coffee and people who go camping. Lady Gaga fans should be treated with caution by both sides.

While Facebook ‘likes’ might help Cambridge Analytica identify hardcore Democrats and Republicans, they do not allow personalities to be targeted. To do this, the algorithm would need to reliably identify neurotic or compassionate people from their ‘likes’. The data set I used included the results of a personality test that measured the ‘Big Five’ personality traits. I used this to test whether a regression model really could determine which individual, out of a randomly selected pair, was more neurotic. It simply couldn’t. I picked two people from the data set at random and looked at their neuroticism scores from the personality test they had taken. I compared these scores to the predictions of a regression model based on Facebook ‘likes’. The personality test and the regression model produced the same rankings for these pairs in only 60 percent of cases. If I had set scores at random, I would have been correct 50 percent of the time. The model was only slightly better than random.
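The pairwise test described above can be sketched in a few lines of Python. This is an illustration with synthetic scores, not the tutorial data set, and the function name `pairwise_agreement` is hypothetical. It draws random pairs of people and counts how often the model ranks the pair the same way as the personality test does.

```python
import random


def pairwise_agreement(test_scores, model_scores, n_pairs=10_000, seed=0):
    """Fraction of random pairs that the model ranks the same way as the test.

    Assumes the test scores are not all identical; pairs tied on the test are skipped.
    """
    rng = random.Random(seed)
    n = len(test_scores)
    agreements = 0
    counted = 0
    while counted < n_pairs:
        i, j = rng.sample(range(n), 2)
        if test_scores[i] == test_scores[j]:
            continue  # neither person scored higher on the test
        test_says_i = test_scores[i] > test_scores[j]
        model_says_i = model_scores[i] > model_scores[j]
        agreements += test_says_i == model_says_i
        counted += 1
    return agreements / n_pairs


# Synthetic stand-ins for questionnaire scores (invented, not real data).
rng = random.Random(1)
test = [rng.gauss(0, 1) for _ in range(500)]
noise = [rng.gauss(0, 1) for _ in range(500)]  # a "model" that knows nothing

perfect = pairwise_agreement(test, test)                  # 1.0
reversed_ = pairwise_agreement(test, [-s for s in test])  # 0.0
chance = pairwise_agreement(test, noise)                  # close to 0.5

print(perfect, reversed_, chance)
```

A model whose scores are unrelated to the test lands near 0.5, which is the 50 percent baseline. Agreement of 0.6 sits much closer to that baseline than to the 1.0 of a perfect model.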

The regression model was a bit better at classifying people in terms of their openness: it was correct about two-thirds of the time. But when I performed the same test on extraversion, conscientiousness and agreeableness I got results similar to those for neuroticism: the model got it right six times out of 10, compared with the five times out of 10 we would expect if we assigned people at random.

Even to get this low level of accuracy, a Facebook user has to have ‘liked’ more than 50 things. There are still a lot of people, myself included, who don’t ‘like’ very much on Facebook. I ‘like’ a grand total of four pages. No matter how good a regression technique is, without data a model can’t work.

Another problem with ‘personality algorithms’ lies in how they can be applied. The neuroticism of Kurt Cobain fans, identified by Kosinski’s work, is very different from the neuroticism of a gun owner set on protecting his family, whom Nix envisaged targeting in advertising campaigns. Showing an advert about the Second Amendment to a Nirvana fan may well prove counterproductive. Taken together, these limitations make it all but impossible to create a tool for identifying and manipulating us psychologically.

I emailed my results to Alex Kogan and found that he had reached similar conclusions to my own after working with the data he collected. He no longer believes that an algorithm that effectively classifies people’s personality can be produced. He told me that although aspects of personality could be measured from our digital footprint, the signal wasn’t strong enough to make reliable predictions about us. He was blunt about Alexander Nix. Kogan says that Nix was ultimately less interested in the reliability of the personality algorithm than in presenting the concept to clients. “He was trying to promote [the algorithm] because he had a strong financial incentive to tell a story about how Cambridge Analytica had a secret weapon.”

In the weeks after Wylie’s revelations, I was in daily contact with Kogan. He felt scapegoated, especially when Facebook, with whom he says he had previously had a good working relationship, accused him of lying in a public statement the company published online. He was hounded by journalists asking him, in all seriousness, if he was a Russian spy. And despite all these accusations, he didn’t have a lawyer, even at the height of the scandal. My impression of Kogan was that he was out of his depth. When he spoke to journalists he would focus on academic details of the personality algorithm, instead of either staying silent (as lawyers would have advised him to do) or addressing the questions they had about how he came to be working with Cambridge Analytica.

My own impression was that both Wylie and Nix had, in very different ways, a lot to gain by playing up the power of personality algorithms. As far as many news outlets were concerned, the story that these algorithms didn’t work was less interesting than the narrative of manipulation. Slowly but surely, however, the details of what Cambridge Analytica had actually achieved started to leak out. In May 2018, Wylie did a further interview with the Guardian. Over the phone, apparently during a hectic train journey, Wylie explained how their algorithm worked. His account is almost identical to a description of Michal Kosinski’s research work: using regression to predict personalities on the basis of Facebook ‘likes’ and then applying these models to predict the personalities of other voters.

Wylie’s account also highlighted a serious limitation to Cambridge Analytica’s data: wealthy American women, who perhaps had the time and inclination to fill in long personality questionnaires, were significantly over-represented in Cambridge Analytica’s ‘seed data’ – the detailed psychological information supplied by approximately 32,000 respondents which the company then used to extrapolate its conclusions about the behaviour of a much larger set of Facebook users. Other groups, such as African-American men, were under-represented.

For me Cambridge Analytica’s success and failure are just one part of a bigger story about how we don’t always respond rationally to the power of data. In the course of my research into algorithms I’ve found that buzzwords – like echo chambers, filter bubbles, fake news, automated troll bots and artificial intelligence – are often used in sensationalist stories which end up being highly misleading.

As Alex Kogan told me, “The real problem with Cambridge Analytica, and everyone working in the industry knows this, is that this sort of shit simply doesn’t work.”

David Sumpter is professor of applied mathematics at the University of Uppsala, Sweden, and the author of ‘Outnumbered: From Facebook and Google to Fake News and Filter-bubbles – the Algorithms That Control Our Lives’ published by Bloomsbury at £16.99
