Questions of ethics at Web Archives 2015

Published by Nicholas Taylor on 17 December 2015

A welcome complement to the growing number of web archiving-specific events, the inaugural Web Archives: Capture, Curate, Analyze conference (tweet stream) brought together an eclectic crowd of researchers, instructors, students, archivists, librarians, developers, and others interested in web archiving. A novel mix of institutions was also represented: some active principally through IIPC, many more associated with the SAA Web Archiving Roundtable and/or Archive-It Partner communities, and still others whom I had not yet encountered in these more established, practitioner-centric fora.

Echoing the sentiments of other participants, I was impressed and inspired both by the diversity of perspectives and by the excitement about moving web archiving forward. As befitted such a group, the schedule and hallway conversations ranged across a wide array of topics. Running through it all, though, questions of ethics were a persistent theme. I'll highlight three areas of ethical concern that stood out for me.

Social media archiving

Seemantani Sharma summarized her legal analysis of Twitter archiving by noting that what is permissible under the Terms of Service is far clearer than what is responsible as a matter of ethics. While GW Libraries is farther along than most in working out how a platform's Terms of Service should inform an institutional social media archiving policy, it's clear that these ethical questions are being grappled with more widely.

The most pronounced critique problematized a binary conception of privacy, in which publicness is taken as tacit approval to archive content and thereafter to determine its disposition unilaterally. Other ethical questions relating to social media archiving concerned consent, post-custodialism, and research use. How can individuals and communities using social media consent to archiving, or at least be meaningfully informed of it? How can the prerogatives of individuals and communities whose social media is archived be aligned and/or balanced with those of the archive and, by proxy, the stakeholders in its contents? How do the theory and practice of post-custodial archiving inform approaches to social media archiving? What are appropriate guidelines for the ethical use of social media archives?

Some resources I've found helpful in the last year in thinking through these questions include: danah boyd's blog post What is Privacy?; Anil Dash's essay What Is Public?; a Zotero Group Library seeded by Jarrett Drake; many blog posts from On Archivy by Bergis Jules and Ed Summers; Eira Tansey's talk at Personal Digital Archiving; a presentation by Michael Zimmer on the ethics of Twitter research; and the AoIR guide on Ethical Decision-Making and Internet Research (PDF).

Appraisal and provenance

In the opening keynote, Jefferson Bailey was eager to highlight how many of the supposedly exceptional conundra of web archives are in fact common to archives generally. Among these observations were that (web) archives are necessarily incomplete and always inflected by their curators. Consumed as we easily become by the purely operational aspects of web archiving, we often neglect to document the appraisal decisions we make about what to collect, as well as the rationale for capture parameters, which may likewise have a powerful, if more incidental, effect on the contours of the archive. Our collective choices have consequences for what and whose web heritage will persist, and with what qualifications.

These concerns dovetail with a recent public conversation sparked by Kalev Leetaru's analysis of the uneven coverage of the Internet Archive Wayback Machine. The article prompted keen responses from David Rosenthal, Andy Jackson, and other inline commenters, before Kalev himself rejoined. While programmatically generated crawl metadata and better data mining tools may get us much of the way toward understanding the limitations of our web archives, we could also do more to make our own role in shaping web archives visible, a point Christie Peterson made powerfully in her presentation at the meeting.

Digital divides in web archiving

In the middle of my own presentation on building web archiving technology, together (PDF), I ad-libbed the half-joking comment, "web archiving has never been easier or never been harder, or both, than it is now." This was in the context of CDL and Harvard Library both recently re-evaluating their web archiving plans (PDF), and of Archive-It becoming an increasingly popular way for organizations to do their own web archiving. What I meant was that, on the one hand, Archive-It has made effective, end-to-end web archiving accessible enough that it can be (and commonly is) managed as a fraction of one employee's time (PDF). On the other hand, Archive-It's success both reflects and reinforces an economy of scale that is difficult for other organizations to achieve. The result is that while more organizations can now start web archiving with minimal staffing investments, they have fewer scalable technology and service alternatives.

Conservative estimates point to a shortfall of at least several orders of magnitude between how much web content we're collectively preserving and how much exists. The institutional membership of the IIPC, the comparatively high degree of professional activity in North America and Western Europe, and perhaps even the distribution of archival coverage in the Internet Archive Wayback Machine suggest that the opportunity gap may lie not just in the volume of preserved content but also in its diversity. I'm convinced that there is also a "fat, long tail" of both under-preserved web content and organizations that would like to do web archiving but don't feel they can afford it.

All of which highlights the need for community efforts to both expand the base and enhance the capacity of web archiving organizations, through a combination of interoperating local tools and/or third-party systems. We envision our IMLS National Digital Platform grant (PDF) with Internet Archive, University of North Texas Libraries, and Rutgers University as one such effort; more to come on this soon, as we ramp up in the New Year.

Crossposted to the Stanford Libraries Digital Library Blog