2016: A Year of Data-Driven Confusion

Building a responsible data approach to how we use and understand data.

by Zara Rahman on December 12th, 2016

Quantified data, statistics and data visualisations often catch our attention more than just words. That much is not new. But this year, misleading data claims and campaigns had unexpected consequences in some of the world’s largest economies. In fact, political campaigns over the past 18 months have been characterised by “relentless statistical crossfire”, despite these “statistics” being biased and incomplete, or even completely falsified, with seemingly few consequences.

For example, this year in the United States, pre-election polling data had Clinton winning, right up until the votes started rolling in. Though it turns out that she did win the popular vote, polling data at the state level differed from the actual election outcome. What are the implications if this inaccurate, misleading or uncertain poll data in fact influenced voter behaviour?

Visualization of light pollution across the United States taken from space.

Photo CC-BY NASA.

While it will take months to review in detail why the polls differed from reality, some high-level explanations can already be offered. For one, there are many potential methodological issues that can lead to inaccuracy in election polling, but as Mona Chalabi, US Data Editor at the Guardian, wrote the day after the election, readers probably didn’t want to hear ‘it’s complicated’ as part of a newsroom’s pre-election coverage. Indeed, Chalabi had written about the many problems and limitations of polling data back in January 2016. But getting journalists to convey those limitations accurately and effectively, and getting consumers to take notice of them and apply the necessary caution and scepticism, remain fundamental challenges.
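To make one of those limitations concrete, the sketch below shows the standard margin-of-error calculation for a proportion drawn from a simple random sample. The 1,000-person poll and the 48% to 46% split are hypothetical numbers chosen purely for illustration, not figures from any real survey.

    import math

    def margin_of_error(share, sample_size, z=1.96):
        """Approximate 95% margin of error for a proportion from a simple random sample."""
        return z * math.sqrt(share * (1 - share) / sample_size)

    # Hypothetical state poll: 1,000 respondents, candidate A on 48%, candidate B on 46%.
    n = 1000
    moe = margin_of_error(0.48, n)
    print("Margin of error: +/- {:.1f} points".format(moe * 100))  # roughly +/- 3.1 points

    # A two-point lead sits well inside that band, so "A leads B" is far less
    # certain than a single headline number suggests. And sampling error is only
    # one source of inaccuracy: non-response bias and weighting choices are not
    # captured by this figure at all.

Conveying that uncertainty, rather than just the headline percentages, is exactly the kind of context Chalabi argues readers need.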

After all, numbers tend to spread further and stick in people’s minds, and they can convey a sense of certainty or accuracy that is often unmerited, a phenomenon known as the fallacy of false precision. In the UK, this was exploited in the run-up to the referendum on whether the UK should leave the European Union.

Image of bus emblazoned with "We send the EU £350 million. Let's fund our NHS instead." Campaigners speak to the media outside of the bus.

Source: InFacts

In the months leading up to the referendum, an oft-repeated data point was that the UK sends the European Union £350 million per week. A major campaign implied that this amount could be better spent domestically on the National Health Service (NHS), even though the £350 million figure wasn’t, and had never been, true. Fact-checking sites debunked it repeatedly for months before the referendum. Journalists wrote about how it wasn’t true, and the UK Statistics Authority – though it had no legal recourse – made multiple statements clarifying that the figure was “misleading” and “undermines trust in official statistics”.

But these official statements and clarifications were not enough. On the day of the referendum, I encountered multiple people in the UK who pointed to the figure as a reason they were voting to leave the EU, and nothing I said could convince them otherwise. One woman I spoke to, describing herself as “not interested in politics”, cited the bogus £350 million claim as the sole reason she voted Leave – and Brexit campaigners began backtracking on the claim mere hours after the result came in. For those of us who had pointed in vain to the fact-checkers and analysts who trawled through the numbers and pulled out multiple errors in the calculations, it was incredibly frustrating. And for people who based their voting decision on this “data”, the admission came too late. Yet somehow, there have been no repercussions or accountability to date.

Building critical data literacy

As journalists, data practitioners and technologists, we must recognise that while numbers might be a good and quick way of getting to grips with a complex topic, context is crucial, and we have a responsibility to convey that as well. Consider what Tricia Wang has labelled “Thick Data” – “data brought to light using qualitative, ethnographic research methods that uncover people’s emotions, stories, and models of their world.” Thick data can provide crucial context, bringing complexity back into what might otherwise be simplified beyond recognition. Quantification itself can often obscure nuance; for example, there has been much discussion in recent months of algorithmic bias, acknowledging that algorithms are trained on data, and that many human assumptions, decisions and biases go into building and cleaning those datasets, not to mention the steps and decisions codified in the algorithms themselves. Our technical decisions reflect our human biases, including the implicit biases within all of us, whether we recognise them or not.
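As a toy illustration of how bias in a dataset carries straight through to a model’s output (my own hypothetical example, not taken from the piece), the sketch below “trains” a trivially simple model on invented historical hiring records in which one group was hired far less often. The model dutifully reproduces that pattern, because nothing in the code distinguishes a genuine signal from a bias baked into the labels.

    # Hypothetical historical hiring records: (years_of_experience, group, hired).
    # The labels encode a past bias: group "B" candidates were hired less often
    # than group "A" candidates with comparable experience.
    records = [
        (5, "A", True), (4, "A", True), (2, "A", False), (6, "A", True),
        (5, "B", False), (4, "B", False), (2, "B", False), (6, "B", True),
    ]

    def learn_group_rates(data):
        """Learn the historical hire rate per group -- a stand-in for what a real
        classifier picks up when group membership correlates with the label."""
        rates = {}
        for group in sorted({g for _, g, _ in data}):
            outcomes = [hired for _, g, hired in data if g == group]
            rates[group] = sum(outcomes) / len(outcomes)
        return rates

    for group, rate in learn_group_rates(records).items():
        print(group, rate)  # A 0.75, B 0.25: the old bias, repackaged as a "prediction"

A real machine-learning pipeline is far more elaborate than this, but the underlying problem is the same: the data collection, cleaning and labelling decisions made by people upstream become the “ground truth” the model optimises for.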

With this in mind, bringing a diversity of perspective and experience to information intermediary roles is essential. Many of the related fields – journalism, data science, computer science – are heavily dominated by white men of relatively homogeneous backgrounds. To counter this, we need to build strong, diverse teams who can challenge each other… and the status quo. We need strong mechanisms for ethical and fair practices within teams and organisations, and a culture where pushing back on conclusions is well received and seen as a sign of strength, not of defiance. Taking a responsible data approach means looking carefully at the power dynamics at play, in workplaces as well as in the broader public space. How will the least powerful actors be able to make their opinions heard, and how are they affected by the process in question?

Series of wires spread out with a circuit board in the background.

Photo CC-BY Randall Bruder.

An extension of critical data literacy could also include feminist perspectives on data practices and use. In a recently published paper, Catherine D’Ignazio and Lauren F. Klein began to explore what feminist data visualization would look like, explaining that a “feminist approach to data visualization, while centered on design, insists that data, design, and community of use, are inextricably intertwined”. They elaborate further on this in six core principles:

  1. Rethink binaries
  2. Embrace pluralism
  3. Examine power and aspire to empowerment
  4. Consider context
  5. Legitimise embodiment and affect
  6. Make labor visible

Though these principles were constructed with the practice of data visualization in mind, many could be abstracted out to other ways of understanding and working with data: from thinking carefully about consent and a duty of care to the people reflected in a particular dataset, to rethinking quantification and prioritising context in analyses.

As a broader field, we must also ask: how do we hold each other accountable for our actions? As we saw in the UK, the UK Statistics Authority made numerous statements denouncing the misleading use of statistics in the Leave campaign; what else can be done to accurately inform the public about such falsehoods, and to help them make better-informed decisions?

From a consumer perspective, it is more crucial than ever that we learn to be critical of numbers, and to question what we see. We need to improve media literacy education, so that more people will recognise fake news, understand bias and propaganda, and be able to critically evaluate the complex messages we receive from the media. Alongside this, we need to get better at critical data literacy – understanding what the limitations of data are, realising that “statistics” and numbers writ large can be improperly and irresponsibly used to tell only part of a story, and knowing what questions to ask of the data, and when.

Many strategies for building up media literacy focus on tech-driven solutions, like online courses – but perhaps it’s time to look beyond techno-solutionism and recognise that trust and community are fundamental parts of how we receive and analyse information. Especially with marginalised communities, who may have lower levels of access to technology, it is crucial to tap into existing community networks to deliver that education in a way that is relevant and useful. For this education to really work, it needs to come from people within those communities, who understand the community’s needs and its context.

Michelle Obama said: “When they go low, we go high”. Loosely applied to this question, going high means being accountable for what we say, absorb and show: questioning methodologies, interrogating data provenance, emphasising the limits of what conclusions can be drawn from the data, sharing our sources, and educating ourselves, our peers and our communities.

We must take a responsible data approach to advocacy – address gaps in literacy proactively, be rigorous in our methods, and maintain credibility, especially on important issues. Thanks to the speed and amplification afforded to us by technology, analyses and “facts” now spread faster than ever before. Understanding the critical limitations of data and information is only going to become more important in the years to come.