Crowdsourcing, Open Data and Precarious Labour

Crowdsourcing and microtransactions are two halves of the same coin: they both mark new stages in the continuing devaluation of labour.

by Allana Mayer on February 24th, 2016

The cultural heritage industries (libraries, archives, museums, and galleries, often collectively called GLAMs) like to consider themselves the tech industry’s little siblings. We’re working to develop things like Linked Open Data, a decentralized network of collaboratively improved descriptive metadata; we’re building our own open-source tech to make our catalogues and collections more useful; we’re pushing scholarly publishing out from behind paywalls and into open-access platforms; we’re driving innovations in accessible tech.

We’re only different in a few ways. One, we’re a distinctly feminized set of professions, which comes with a large set of internally- and externally-imposed assumptions. Two, we rely very heavily on volunteer labour, and not just in the internship-and-exposure vein: often retirees and non-primary wage-earners are the people we “couldn’t do without.” Three, the underlying narrative of a “helping” profession (essentially a social service) can push us to ignore the first two distinctions, while driving ourselves to perform more and expect less.

I suppose the major way we’re different is that tech doesn’t acknowledge us, treat us with respect, build things for us, or partner with us, unless they need a philanthropic opportunity. Although, when some ingenue autodidact bootstraps himself up to a billion-dollar IPO, there’s a good chance he’s been educating himself using our free resources. Regardless, I imagine a few of the issues true in GLAMs are also true in tech culture, especially with regard to labour and how it’s compensated.


Notecards in a filing drawer: old-fashioned means of recording metadata.

Photo CC-BY Mace Ojala.

Here’s an example. One of the latest trends is crowdsourcing: admitting we don’t have all the answers, and letting users suggest some metadata for our records. (Not to be confused with crowdfunding.) The biggest example of this is Flickr Commons: the Library of Congress partnered with Yahoo! to publish thousands of images that had somehow ended up in the LOC’s collection without identifying information. Flickr users were invited to tag pictures with their own keywords or suggest descriptions using comments.

Many orphaned works (content whose copyright status is unclear) found their way conclusively out into the public domain (or back into copyright) this way. Other popular crowdsourcing models include gamification, transcription of handwritten documents (which can’t be done with Optical Character Recognition), or proofreading OCR output on digitized texts. The most-discussed side benefits of such projects include the PR campaign that raises general awareness about the organization, and a “lifting of the curtain” on our descriptive mechanisms.

The problem with crowdsourcing is that it’s been conclusively proven not to function in the way we imagine it does: a handful of users end up contributing massive amounts of labour, while the majority of those signed up might do a few tasks and then disappear. Seven users in the “Transcribe Bentham” project contributed to 70% of the manuscripts completed; 10 “power-taggers” did the lion’s share of the Flickr Commons’ image-identification work. The function of the distributed digital model of volunteerism is that those users won’t be compensated, even though many came to regard their accomplishments as full-time jobs.

It’s not what you’re thinking: many of these contributors already had full-time jobs, likely ones that allowed them time to mess around on the Internet during working hours. Many were subject-matter experts, such as the vintage-machinery hobbyist who created entire datasets of machine-specific terminology in the form of image tags. (By the way, we have a cute name for this: “folksonomy,” a user-built taxonomy. Nothing like reducing unpaid labour to a deeply colonial ascription of communalism.) In this way, we don’t have precisely the free-labour-for-exposure/project-experience problem the tech industry has; it’s not our internships that are the problem. We’ve moved past that, treating even our volunteer labour as a series of microtransactions. Nobody’s getting even the dubious benefit of job-shadowing, first-hand looks at business practices, or networking. We’ve completely obfuscated our own means of production. People who submit metadata or transcriptions don’t even have a means of seeing how the institution reviews and ingests their work, and often have no way to see how their work ultimately benefits the public.

All this really says to me is: we could’ve hired subject experts to consult, and given them a living wage to do so, instead of building platforms to dehumanize labour. It also means our systems rely on privilege, and will undoubtedly contain and promote content with a privileged bias, as Wikipedia does. (And hey, even Wikipedia contributions can sometimes result in paid Wikipedian-in-Residence jobs.)

For example, the Library of Congress’s classification and subject headings have long collected books about the genocide of First Nations peoples during the colonization of North America under terms such as “first contact,” “discovery and exploration,” “race relations,” and “government relations.” No “subjugation,” “cultural genocide,” “extermination,” “abuse,” or even “racism” in sight. Also, the term “homosexuality” redirected people to “sexual perversion” up until the 1970s. Our patrons are disrespected and marginalized in the very organization of our knowledge.

If libraries keep up their veneer as passive and objective authorities that offer free access to all knowledge, this underlying bias will continue to propagate subconsciously. As in Mechanical Turk, being “slightly more diverse than we used to be” doesn’t get us any points, nor does it assure anyone that our labour isn’t coming from countries with long-exploited workers.

Labour and Compensation

Rows and rows of books in a library, on vast curving shelves.

Photo CC-BY Samantha Marx.

I also want to draw parallels between the free labour of crowdsourcing and the free labour offered in civic hackathons or open-data contests. Specifically, I’d argue that open-data projects are less (but still definitely) abusive to their volunteers, because at least those volunteers have a portfolio object or other deliverable to show for their work. They often work in groups and get to network, whereas heritage crowdsourcers work in isolation.

There’s also the potential for converting open-data projects to something monetizable: for example, a Toronto-specific bike-route app can easily be reconfigured for other cities and sold; while the Toronto version stays free under the terms of the civic initiative, freemium options can be added. The volunteers who supply thousands of transcriptions or tags can’t usually download their own datasets and convert them into something portfolio-worthy, let alone sellable. Those data are useless without their digital objects, and those digital objects still belong to the museum or library.

Crowdsourcing and microtransactions are two halves of the same coin: they both mark new stages in the continuing devaluation of labour, and they both enable misuse and abuse of people who increasingly find themselves with few alternatives. If we’re not offering these people jobs, reference letters, training, performance reviews, a “foot in the door” (cronyist as that is), or even acknowledgement by name, what impetus do they have to contribute? As with Wikipedia, I think the intrinsic motivation for many people to supply us with free labour is one of two things: either they love being right, or they’ve been convinced by the feel-good rhetoric that they’re adding to the net good of the world. Of course, trained librarians, archivists, and museum workers have fallen prey to the conflation of labour and identity, too, but we expect to be paid for it.

As in tech, stereotypes and PR obfuscate labour in cultural heritage. For tech, an entrepreneurial spirit and a tendency to buck traditional thinking; for GLAMs, a passion for public service and opening up access to treasures ancient and modern. Of course, tech celebrates the autodidactic dropout; in GLAMs, you need a master’s. Period. Maybe two. And entry-level jobs in GLAMs require one or more years of experience, across the board.

When library and archives students go into massive student debt, they’re rarely apprised of the constant shortfall of funding for government-agency positions, nor do they get told how much work is done by volunteers (and, consequently, how much of the job is monitoring and babysitting said volunteers). And they’re not trained with enough technological competency to sysadmin anything, let alone build a platform that pulls crowdsourced data into an authoritative record. The costs of commissioning these platforms aren’t yet being made public, but I bet paying subject experts for their hourly labour would be cheaper.


I’ve tried my hand at many of the crowdsourcing and gamifying interfaces I’m here to critique. I’ve never been caught up in the “passion” ascribed to those super-volunteers who deliver huge amounts of work. But I can tally up other ways I contribute to this problem: I volunteer for scholarly tasks such as peer-reviewing, committee work, and travelling on my own dime to present. I did an unpaid internship without receiving class credit. I’ve put my research behind a paywall. I’m complicit in the established practices of the industry, which sits uneasily between academic and social work: neither of those spheres has ever been a profit-generator, and both have always used their codified altruism as a way to finagle more labour for less money.

It’s easy to suggest that we outlaw crowdsourced volunteer work, and outlaw microtransactions on Fiverr and MTurk, just as the easy answer would be to outlaw Uber and Lyft for divorcing administration from labour standards. Ideally, we’d make it illegal for technology to wade between workers and fair compensation.

But that’s not going to happen, so we need alternatives. Just as unpaid internships are being eliminated ad hoc through corporate pledges, rather than being prohibited region-by-region, we need pledges from cultural-heritage institutions that they will pay for labour where possible, and offer concrete incentives to volunteer or intern otherwise. Budgets may be shrinking, but that’s no reason not to compensate people at least through resume and portfolio entries. The best template we’ve got so far is the Society of American Archivists’ volunteer best practices, which include “adequate training and supervision” provisions that I interpret to mean outlawing microtransactions entirely. The Citizen Science Alliance, similarly, insists on “concrete outcomes” for its crowdsourcing projects, to “never waste the time of volunteers.” It’s vague, but it’s something.

We can boycott and publicly shame those organizations that promote these projects as fun ways to volunteer, and lobby them to instead seek out subject experts for more significant collaboration. We’ve seen a few efforts to shame job-posters for unicorn requirements and pathetic salaries, but they’ve flagged without productive alternatives to blind rage.

There are plenty more band-aid solutions. Groups like Shatter The Ceiling offer cash to women of colour who take unpaid internships. GLAM-specific internship awards are relatively common, but they could be bigger, focus on diverse applicants who need extra support, and have eligibility requirements that don’t exclude the people who most need them (such as part-time students, who are often working full-time to put themselves through school). Better yet, we can build a tech platform that enables paid work, or at least meaningful volunteer projects. We need a nationalized or non-profit recruiting system (a digital “volunteer bureau”) that matches subject experts with the institutions that need their help. One that doesn’t take a cut from every transaction, or reinforce power imbalances, the way Uber does. GLAMs might even find ways to combine projects, so that one person’s work can benefit multiple institutions.

GLAMs could use plenty of other help, too: feedback from UX designers on our catalogue interfaces, helpful tools, customization of our vendor platforms, even turning libraries into Tor relays or exits. The open-source community seems to be looking for ways to contribute meaningful volunteer labour to grateful non-profits; this would be a good start.

What’s most important is that cultural heritage preserves the ostensible benefits of crowdsourcing – opening our collections and processes up for scrutiny, and admitting the limits of our knowledge – without the exploitative labour practices. Just like in tech, a few more glimpses behind the curtain wouldn’t go amiss. But it would require deeper cultural shifts, not least in the self-perceptions of GLAM workers: away from overprotective stewards of information, constantly threatened by dwindling budgets and unfamiliar technologies, and towards facilitators, participants in the communities whose histories we hold.