Starting in 2014, I authored and published an extensive set of web archivability guidelines in my role as the Service Manager for Stanford Libraries Web Archiving. Some time after I left in 2019, the staff who succeeded me apparently concluded that they no longer had the expertise to maintain the webpages and decided to unpublish them. This was an understandable though unfortunate decision, as it was one of the few contemporary community resources on the topic.
Having only recently noticed that the content had disappeared from the live web, I reached out to my former colleagues for permission to pick up where I left off and carry the guidelines forward. In response, they have graciously applied a CC0 license to the information.
With that sanction, I am pleased to have been able to refresh and republish the web archivability guidelines on my own website. The content has been updated as needed, though, perhaps unsurprisingly, much of it remains unchanged.
A notable development over the last eight years is that tools for high-fidelity web capture (e.g., ArchiveWeb.page, Browsertrix Crawler, Brozzler, warcprox) and replay (e.g., pywb, ReplayWeb.page) have improved considerably. While these tools may obviate some of the challenges the guidelines were written to address, web archiving in the main still relies heavily on archival crawlers, for which the guidelines remain highly relevant.