The Politics of Digitization

History, archives and the problem with treating every problem as a tech problem.

by Misty De Meo on February 3rd, 2014

This is a story of how tech can’t always save the world.

A large statue of Sir Arthur Doughty, taken to show the statue's full profile. Doughty sits on a chair, wearing a long flowing robe and dress shoes. The chair is on top of a large, decorative square cement pedestal, with this quote inscribed on the side: Of all national assets, archives are the most precious: they are the gift of one generation to another and the extent of our care of them marks the extent of our civilization. The Canadian Archives and Its Activities, Arthur G. Doughty. A paper sign is curled around his feet, placed by archivists and supporters, that reads: Save the National Archival Development Program.

Photos of the Sir Arthur Doughty statue at Library and Archives Canada. Doughty was the first Dominion Archivist of Canada. Photo by Amanda Hill

In 2012, the national archives of Canada (Library and Archives Canada) announced sweeping budget cuts as a part of its modernization initiative. With the ostensible goal of refocusing the organization on preparation for a digital future, cuts have included:

The effect of these cuts has been to greatly reduce in-person access to Library and Archives Canada’s physical collections, and to shift its access focus to digitized content.

A crowd of 40 or so people gathers around the statue of Arthur Doughty, most facing the statue and gathered in a semi-circle. Some hold signs, but the words are facing away from the camera. A few sit on a picnic table situated near the statue.

Archivists and supporters gather at Sir Arthur Doughty’s statue to pay their respects. Photo by Amanda Hill and available on Flickr.

This has done little to assuage upset researchers; only 0.5% of the organization’s archival content is digitized and no clear roadmap exists to make up for this. In the future this digitization program is expected to follow the government’s commemorative agenda, which is likely to provide access to a litany of war records and other cultural touchstones, and precious little else.

Indeed, work on War of 1812 digitization projects was ongoing at the same time the organization was stonewalling the Indian Residential Schools Truth and Reconciliation Commission’s requests for access to government records in order to uncover and document the abuses of aboriginal students in Canada’s residential school system from the late 19th century to the mid-20th century. The organization’s goal to modernize seems to be taking the form of disregarding its own vast collections in favour of the small subset it can present digitally.

How did we get here? What led an organization to deliberately endanger its own mission in the name of change?

Inside Archives: Community Importance and the Politics of Access

Before we go any further, it’s important to understand what archives are. Where libraries are primarily collections of published works, in various formats, archives are collections of records – the documents amassed by persons or organizations in the course of their ordinary lives and business. Unlike libraries, archival collections are primarily made up of unpublished records and have an enormous diversity of formats. Both libraries and archives are valuable to researchers, but they play different roles. Archives provide researchers with direct access to primary research material, for instance the records produced by the subjects of their research, while libraries provide access to the research of others. Records often serve secondary purposes as well; for instance, historic diaries kept by explorers over many years can yield insights into climate change hundreds of years ago.

Aside from academic research, archival records are also critical for marginalized groups to shine a light on the injustices they’ve suffered and to provide evidence with which to seek reparations. In Canada, for instance, the Indian Residential Schools Truth and Reconciliation Commission has relied heavily on access to archived governmental records in order to uncover the truth about Canada’s residential school system. These boarding schools, operated from the 19th century to the mid-20th century, were used for cultural assimilation and to deliberately break family ties and weaken aboriginal communities. By combining archival research into government documents with survivor testimonies, the Truth and Reconciliation Commission documented the many abuses of the schools and their shockingly high mortality rates.

Because archival collections can be so varied and enormous, providing access to them is challenging. Most people, when they think of archives, picture researchers wandering through dusty hallways with shelves piled high with assorted boxes, unseen since they were last left there in some unknown age of the past.

Actual research is usually much less romantic; archival collections are so huge and diverse that most researchers would be hard-pressed to dig through enormous unsorted collections and find hidden gems. Indeed, many of the well-known “found in the archives” stories are about items that were already publicly catalogued but just hadn’t been researched yet, and only sometimes about discoveries in unprocessed or uncatalogued collections.

Instead, access begins with the laborious work of processing and cataloguing collections. Collections are assessed for their evidentiary value and condition, rehoused, and then catalogued in order to make their contents discoverable by researchers. This cataloguing isn’t like a library catalogue, which provides a reasonably detailed description of each book or publication; instead, it’s designed to provide an overview of the collection and its general contents, along with some high-level information about the contents of each container of objects. Even without providing item-by-item descriptions, processing and cataloguing an archival collection is time-consuming and very dependent on the availability of archivists.

Once a collection is catalogued and its description available in physical and digital forms, researchers need to be able to actually access the collections in order to do their research. Some researchers, if they’re lucky, might be close to the archives that hold the material they want to study. Many need to gather funding to be able to travel for their research. If it comes to it, researchers will go on transatlantic journeys just to get to the documents they need. An American professor of French literature, for example, would likely have to travel to France in order to access letters and other documents necessary to do original research.

The Backlog Crisis

It’s here that digitization, and microfilm before it, has traditionally been a big help to researchers. There are massive public benefits to disseminating collections worldwide and so making them available to a far larger population. And the benefit to individual researchers of being able to consult a wider variety of collections without travel is immense. I can think of dozens of amazing digitized archival collections of enormous value: from Library and Archives Canada’s immigration records, to the University of British Columbia’s Chung Collection, to the University of North Texas’s delightfully oddball collection of photographs of the remains of school lunches. Some of these collections are of wider interest than others, but each of these is an absolute goldmine to researchers. The value of making these available and discoverable to the world can’t be overestimated.

A large photo of a row of books from the Prelinger Library. The books are focused on California and Bay Area, New York City, Oregon and Midwest US history and culture. Titles include How to Live in the Country, Machine Age in the Hills, The Rural Life Problem of the United States, Under my Elm, The Countryside Ideal, and Broken Heartland.

Titles from row one of the Prelinger Library, an independent research library located in San Francisco’s South-of-Market neighborhood.

So far, all sounds good. The real problem comes when money enters the equation.

Archives are perpetually underfunded. It’s very hard for archives (especially smaller archives) to find regular funding to continue operating, and extraordinarily difficult to hire new permanent, professional archivists to deal with the volume of incoming records. Perpetual processing backlogs are extraordinarily common in archives and, as Greene and Meissner note in their seminal paper More Product, Less Process, it’s not unheard of for a lack of processing to mean turning researchers away. They note that in 2004, a quarter of users of archives had been denied access to unprocessed records. This number sounds high, but in fact they note that only 44% of surveyed archives explicitly permit access to unprocessed collections. As archives keep taking in more material than they can process, both physical and digital, backlogs continue to grow—even in the face of “more product, less process” techniques.

The bottleneck here is human resources. Archives face dwindling budgets even in the face of their ever-growing backlogs, so the cycle of debt continues. Hiring full-time workers to help pay down the backlog is challenging. Indeed, in an era where budget cuts bring about actual closures of government archives, surely other archives must feel lucky just to exist! The loss of Canada’s National Archival Development Program, a $1.7 million program that funded work at small archives across the country, almost seems like a pinprick in comparison.

Limitations of Technology in Solving the Archive Problem

It’s here that we find there’s a limit to the ability of technology to help. Processing records is simply a manual task, and arrangement and description isn’t feasible to automate, especially with older, handwritten records, where optical character recognition is impossible. Text-mining and automated analysis techniques, promising for digital archives, can’t meaningfully be applied to heterogeneous analogue collections. Arranging and describing archives requires trained archivists (real, breathing humans), and this is expensive. Even where no detailed catalogue is written, archivists need to be able to examine the materials carefully to understand the creator of the records and ensure that the creator’s processes, and the contexts in which the records were created, are clear to the researchers who will access the collections. Archives try to fill the gaps with internships, student labour, and contracts, but the cure is worse than the disease as archives erode the long-term future of their own workers to try to make ends meet.

This, for many archives, is the problem with a focus on digitization as the primary means of access for analogue records. When many archives are hard-pressed to keep up with the incoming load, and hiring permanent staff members is hindered by perpetual budget crunches, it’s easy for digitization programs to consume disproportionate amounts of budget. Facing the crunch, maybe an archive can’t afford that new employee, or has to cut reading room hours.

Declining budgets can affect which records the public can access. After all, if there isn’t enough money to digitize everything – and there never is – it’s tempting to choose based on popularity. Cutting off physical access to records while tying digitization to commemoration has the effect of narrowing the reach of researchers; this especially risks erasing the histories of marginalized groups while privileging well-documented pop history such as wars. Worse, a digitization-first funding model doesn’t help archives deal with their backlogs: a collection must be arranged and described before it can usefully be digitized.

Some archives have experimented with private partnerships, but these are dicey outside of particular subject niches. Most records simply aren’t of enough commercial value to cover the enormous cost of digitizing them in bulk. When it comes to commercial digitization, the market has spoken: there’s money to be made from a small subset of archival records (especially those serving genealogists), and from publications, but the commercial market simply isn’t a solution for most archives.

Not Every Problem is a Tech Problem

After all this, I must sound anti-digitization. Far from it—I’m a strong believer in the power of digitization to help disseminate archival collections, and I’ve worked tirelessly at my last several jobs to help build digitization technology to bring costs down and make digitization techniques more accessible. I’ve contributed to hardware that brings fast digitization within reach of lower-budget institutions; shared knowledge and worked on digitization educational materials; and automated postprocessing tasks to keep digitization in reach of tiny archives.

I believe strongly in digitization. What I’m afraid of is tech utopianism that trusts the concept of technology more than it trusts archivists. I’m afraid of decisions made to bolster high-tech visions without an understanding of the problems being solved. Sometimes, not every problem is a tech problem.

Get Involved

If you’re interested in learning about archives and what archivists do, try visiting an archives near you. Check for local community archives. Most areas have community archives or historical societies who are in need of help; local government and university archives are also open to the public. Many archives can use donations or volunteers; if you want to make a difference, you can help by getting involved. Smaller archives can often use tech help, but don’t limit yourself to tech volunteering: you might be needed just as much with fundraising, outreach, administrative tasks, etc.

A few of the archives and libraries that inspire me are: